Please stop writing faulty CSV parsers!
I've seen hundreds of CSV parsers and so-called tutorials for them online.
Nearly every one of them gets it wrong!
This wouldn't bother me much, since it doesn't affect me directly, but people who write CSV readers and get them wrong tend to write CSV writers, too, and get those wrong as well. And I'm the one who has to write parsers for their output.
Please keep in mind that CSV (in order of increasing non-obviousness):
- can have quoting characters around values
- can have other quoting characters than "
- can even have other quoting characters than " and '
- can have no quoting characters at all
- can even have quoting characters on some values and none on others
- can have other separators than , and ;
- can have whitespace between separators and (quoted) values
- can use character sets other than ASCII
- should have the same number of values in each row, but doesn't always
- can contain empty fields, either quoted:
"foo","","bar"
or not:
"foo",,"bar"
- can contain newlines in values
- cannot contain newlines in values that are not quoted
- cannot contain newlines between values
- can have the quoting character within a quoted value if it is properly escaped
- does not use a backslash to escape the quoting character but...
- uses the quoting character itself to escape it, by doubling it, e.g.
Frodo's Ring
will be 'Frodo''s Ring'
- can have the quoting character at the beginning or end of a value, or even as its only character ("foo""", """bar", """")
- can even have the quoting character within an unquoted value; there it is not escaped
If you think this is all obvious and not a problem, think again. I've seen every single one of these items implemented wrongly, even in major software packages (e.g. office suites, CRM systems).
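To make a few of the items from the list concrete, here is a minimal sketch using Python's standard csv module. The choice of Python and all the sample data are my own assumptions for illustration; the point is only to show what a reader should produce for doubled quotes, newlines inside quoted values, empty fields, and non-default quoting characters and separators.

```python
import csv
import io

# Made-up sample covering several items from the list: empty fields (quoted
# and unquoted), a doubled quote inside a quoted value, a newline inside a
# quoted value, quotes at the start/end of a value, and a quoting character
# inside an unquoted value.
data = (
    '"foo","","bar"\r\n'
    '"foo",,"bar"\r\n'
    '"say ""hi""","line one\nline two",plain\r\n'
    '"""","foo""","""bar"\r\n'
    'Frodo\'s Ring,5" disk,x\r\n'
)

# newline="" keeps line endings untouched so the csv module itself decides
# which newlines end a record and which belong to a quoted value.
rows = list(csv.reader(io.StringIO(data, newline="")))

# A different dialect: ' as quoting character, ; as separator, and
# whitespace after the separator.
alt = "'Frodo''s Ring'; 'x;y'; 3\r\n"
alt_rows = list(
    csv.reader(
        io.StringIO(alt, newline=""),
        delimiter=";",
        quotechar="'",
        skipinitialspace=True,
    )
)

for row in rows + alt_rows:
    print(row)
```

Note that the newline inside the quoted value does not start a new record, and that the doubled quoting character collapses to a single one only inside quoted values.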
There are good and correctly working out-of-the-box CSV readers and writers out there:
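One example, assuming Python is an option for you, is the csv module in its standard library. The sketch below (with made-up records) writes a few awkward values and reads them back, which is a cheap sanity check to run against any writer before unleashing its output on other people.

```python
import csv
import io

# Made-up records containing a quote, an empty field, an embedded newline,
# an embedded separator, and a value that is only the quoting character.
records = [
    ["id", "name", "note"],
    ["1", "Frodo's Ring", 'say "hi"'],
    ["2", "", "line one\nline two"],
    ["3", "a,b", '"'],
]

# Write with the default dialect: comma separator, " as quoting character,
# quoting only where needed, quotes escaped by doubling.
buf = io.StringIO(newline="")
csv.writer(buf).writerows(records)
text = buf.getvalue()

# Read the output back; a correct writer/reader pair must round-trip.
parsed = list(csv.reader(io.StringIO(text, newline="")))
assert parsed == records

print(text)
```

If the round trip fails for a writer you wrote yourself, the list above is a good place to look for the reason.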
If you insist on writing your own, at least read the (very short) RFC for CSV, RFC 4180.