Difference between revisions of "CSV"

From Lazarus wiki
(more details on sdf, csv sample)
(SDF format)
Line 26: Line 26:
 
==== SDF format ====
 
==== SDF format ====
 
A summary of the SDF format as used in TStringList, as paraphrased/synthesized from various versions of Delphi help:
 
A summary of the SDF format as used in TStringList, as paraphrased/synthesized from various versions of Delphi help:
If you get DelimitedText, strings that include spaces, Delimiter or QuoteChar must be enclosed in QuoteChars (" by default) aka quoted. (In practice, however, with StrictDelimiter on, e.g. spaced text is not quoted)
+
* If you get DelimitedText, strings that include spaces, Delimiter (e.g. ,) or QuoteChar '''must''' be enclosed in QuoteChars (" by default) aka quoted. (In practice, however, with StrictDelimiter on, e.g. spaced text is not quoted)
In addition, a QuoteChar in a string will be repeated.
+
* In addition, a QuoteChar in a string '''must''' be repeated.
 
+
* When setting DelimitedText, strings '''must''' be separated by Delimiter characters '''or''' spaces.
When setting DelimitedText, strings must be separated by Delimiter characters or spaces.
+
* They ''may'' be enclosed in QuoteChars.
They may be enclosed in QuoteChars.
+
* QuoteChars in the string '''must''' be repeated to distinguish them from the QuoteChars enclosing the string.
 
+
* Delimiters that are not contained within QuoteChar marks are delimiters; same goes for spaces if StrictDelimiter is off.
QuoteChars in the string must be repeated to distinguish them from the QuoteChars enclosing the string.
+
* Two Delimiters in a row indicate an empty string.
 
 
Delimiters that are not contained within QuoteChar marks are delimiters; same goes for spaces if StrictDelimiter is off.
 
Two Delimiters in a row indicate an empty string.
 
  
 
(See e.g. [http://bugs.freepascal.org/view.php?id=19610] for a pending patch that improves SDF input/output).
 
(See e.g. [http://bugs.freepascal.org/view.php?id=19610] for a pending patch that improves SDF input/output).
  
 
Example SDF output (tested in Delphi and FPC with the patch for issue 19610):
 
Example SDF output (tested in Delphi and FPC with the patch for issue 19610):
This series of strings in a stringlist (note: numbers in front for clarity, not part of the data):
+
This series of strings in a stringlist (note: numbers in front indicate the field for clarity, they are not part of the data):
  
 
  0normal_string
 
  0normal_string
Line 54: Line 51:
 
  9SpacesAtTheEnd
 
  9SpacesAtTheEnd
  
gets output as this with ''StrictDelimiter'' true:
+
gets output as this with ''StrictDelimiter'' true (and ; as delimiter):
  
 
  normal_string;"includes;delimiter";includes space;"includes""quote";"includes ""quote and space";tab character;"""single_starting_quote";multi
 
  normal_string;"includes;delimiter";includes space;"includes""quote";"includes ""quote and space";tab character;"""single_starting_quote";multi
Line 63: Line 60:
 
  line.";"  SpacesInFront";"SpacesAtTheEnd  "
 
  line.";"  SpacesInFront";"SpacesAtTheEnd  "
  
 +
So, if using the SDF delimitedtext format to output CSV-like data, ''StrictDelimiter'' is best set to false in order to deal with leading and trailing spaces, multiline fields, etc.
  
 
=== SDFDataset ===
 
=== SDFDataset ===

Revision as of 14:12, 15 September 2012

Overview

CSV means Comma-Separated Values, and is a popular file format that is unfortunately not completely standardized. It is a text based file format with data fields separated by comma's (or, in variants, other characters like tabs and semicolons). There may or not be a header line that lists the field names. Field data containing delimiters may be prohibited or enclosed by quotes (most commonly the double quote character). Line endings (#13 and/or #10) may or may not be allowed in field data.

RFC4180 (see [www.rfc-editor.org/rfc/rfc4180.txt]) tries to codify and standardize existing practice; it makes sense to conform to this standard when writing CSV data (and accept all RFC4180 data when reading). Another, different, specification can be found at [1]

A sample CSV snippet:

FirstName,Surname,DOB,Remarks
Jim,Weston,19560818,"Also known as ""The Butcher"""
Alice,Cooper,19760312,""

Apparently a header line is used here, as are quoting using double quotes.

Spreadsheet packages such as Microsoft Excel and OpenOffice/LibreOFfice Calc are able to export to and import from this format. However, as Microsoft Excel may interpret some fields such as date fields differently depending on a user's OS locale, it may pay to find alternative ways of transferring data (e.g. using the FPSpreadsheet code).

CSV and SDF

Delphi (and FreePascal) have a very similar format, SDF. Please see the DelimitedText section below for more details.

Implementations

DelimitedText

TStringList offers the DelimitedText property. This parses a line of text out into separate fields. Note, however, that DelimitedText is supposed to be in SDF, a Delphi-specific format that is very much like CSV, but does not conform with RFC4180 completely.

Tips: when reading CSV data, set the StrictDelimiter property to true.

When writing out CSV data, set StrictDelimiter to false and output the DelimitedText property. One oddity is that e.g. tab characters are removed when writing out data using StrictDelimiter:=false

SDF format

A summary of the SDF format as used in TStringList, as paraphrased/synthesized from various versions of Delphi help:

  • If you get DelimitedText, strings that include spaces, Delimiter (e.g. ,) or QuoteChar must be enclosed in QuoteChars (" by default) aka quoted. (In practice, however, with StrictDelimiter on, e.g. spaced text is not quoted)
  • In addition, a QuoteChar in a string must be repeated.
  • When setting DelimitedText, strings must be separated by Delimiter characters or spaces.
  • They may be enclosed in QuoteChars.
  • QuoteChars in the string must be repeated to distinguish them from the QuoteChars enclosing the string.
  • Delimiters that are not contained within QuoteChar marks are delimiters; same goes for spaces if StrictDelimiter is off.
  • Two Delimiters in a row indicate an empty string.

(See e.g. [2] for a pending patch that improves SDF input/output).

Example SDF output (tested in Delphi and FPC with the patch for issue 19610): This series of strings in a stringlist (note: numbers in front indicate the field for clarity, they are not part of the data):

0normal_string
1includes;delimiter
2includes space
3includes"quote
4includes "quote and space
5tab	character
6"single_starting_quote
7multi
line.
8   SpacesInFront
9SpacesAtTheEnd

gets output as this with StrictDelimiter true (and ; as delimiter):

normal_string;"includes;delimiter";includes space;"includes""quote";"includes ""quote and space";tab	character;"""single_starting_quote";multi
line.;   SpacesInFront;SpacesAtTheEnd  

... and output as this with StrictDelimiter false:

normal_string;"includes;delimiter";"includes space";"includes""quote";"includes ""quote and space";"tab	character";"""single_starting_quote";"multi
line.";"   SpacesInFront";"SpacesAtTheEnd  "

So, if using the SDF delimitedtext format to output CSV-like data, StrictDelimiter is best set to false in order to deal with leading and trailing spaces, multiline fields, etc.

SDFDataset

FreePascal offers the SDFDataset, which stores data in SDF format. This format may be close enough to what a reading application expects to function.

Data Export

FreePascal/Lazarus database export functionality (e.g. TCSVExporter on the Data Export tab) offers CSV export functionality for datasets.

CsvDocument

See CsvDocument.

Jan's CSV components

See JCSV_(Jans_CSV_Components).

ZMSQL

ZMSQL stores data in semicolon-delimited files (using SDF?).