[javascript] How can I parse a CSV string with JavaScript, which contains comma in data?

I have the following type of string

var string = "'string, duppi, du', 23, lala"

I want to split the string into an array on each comma, but only the commas outside the single quotation marks.

I can't figure out the right regular expression for the split...

string.split(/,/)

will give me

["'string", " duppi", " du'", " 23", " lala"]

but the result should be:

["string, duppi, du", "23", "lala"]

Is there a cross-browser solution?


The answer is


Adding one more to the list, because I find all of the above not quite "KISS" enough.

This one uses a regex to find either commas or newlines while skipping over quoted items. Hopefully this is something beginners can read through on their own. The splitFinder regexp matches three alternatives (separated by |):

  1. , - finds commas
  2. \r?\n - finds newlines (optionally preceded by a carriage return, if the exporter was nice)
  3. "(\\"|[^"])*?" - skips anything surrounded by quotes, because commas and newlines don't matter in there. If there is an escaped quote (\" in the data, written as \\" in the regex) inside the quoted item, it will get captured before an end quote can be found.
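
To see the three alternatives in isolation, here is a small sketch that lists what the pattern matches. The sample string and the variable names are made up for illustration. Note that the quoted item comes back as one match, comma included:

```javascript
// Same pattern as splitFinder, under a different name so the sketch stands alone
const finderDemo = /,|\r?\n|"(\\"|[^"])*?"/g;
const demoInput = 'a,"b, c"\nd';

// With the g flag, match() returns every full match in order
const demoMatches = demoInput.match(finderDemo);
console.log(demoMatches); // [ ',', '"b, c"', '\n' ]
```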

const splitFinder = /,|\r?\n|"(\\"|[^"])*?"/g;

function csvTo2dArray(parseMe) {
  let currentRow = [];
  const rowsOut = [currentRow];
  let lastIndex = splitFinder.lastIndex = 0;

  // add text from lastIndex to before a found newline or comma
  // (the default only applies to the trailing cell, so an explicit index of 0 is respected)
  const pushCell = (endIndex = parseMe.length) => {
    const addMe = parseMe.substring(lastIndex, endIndex);
    // remove quotes around the item
    currentRow.push(addMe.replace(/^"|"$/g, ""));
    lastIndex = splitFinder.lastIndex;
  };

  let regexResp;
  // for each regexp match (either comma, newline, or quoted item)
  while (regexResp = splitFinder.exec(parseMe)) {
    const split = regexResp[0];

    // if it's not a quote capture, add an item to the current row
    // (quote captures will be pushed by the newline or comma following)
    if (split.startsWith(`"`) === false) {
      const splitStartIndex = splitFinder.lastIndex - split.length;
      pushCell(splitStartIndex);

      // then start a new row if newline
      const isNewLine = /^\r?\n$/.test(split);
      if (isNewLine) { rowsOut.push(currentRow = []); }
    }
  }
  // make sure to add the trailing text (no commas or newlines after)
  pushCell();
  return rowsOut;
}

// \\" puts a real backslash-quote into the data (a bare \" in a template literal is just a quote)
const rawCsv = `a,b,c\n"test\r\n","comma, test","\r\n",",",\nsecond,row,ends,with,empty\n"quote\\"test"`;
const rows = csvTo2dArray(rawCsv);
console.log(rows);


Aside from the excellent and complete answer from ridgerunner, I thought of a very simple workaround for when your backend runs PHP.

Add this PHP file to your domain's backend (say: csv.php)

<?php
    session_start(); // Optional
    // The response body is JSON, so declare it as such (the charset belongs in the same header)
    header("Content-Type: application/json; charset=UTF-8");
    // Split the posted CSV into lines on "\n", then parse each line into fields:
    echo json_encode(array_map('str_getcsv', str_getcsv($_POST["csv"], "\n")));
?>

Now add this function to your JavaScript toolkit (it may need some revision to work across older browsers).

function csvToArray(csv) {
    var oXhr = new XMLHttpRequest();
    oXhr.addEventListener("readystatechange",
        function () {
            if (this.readyState == 4 && this.status == 200) {
                console.log(this.responseText);
                console.log(JSON.parse(this.responseText));
            }
        }
    );
    oXhr.open("POST","path/to/csv.php",true);
    oXhr.setRequestHeader("Content-type", "application/x-www-form-urlencoded; charset=utf-8");
    oXhr.send("csv=" + encodeURIComponent(csv));
}

It will cost you one Ajax call, but at least you won't duplicate code nor include any external library.

Ref: http://php.net/manual/en/function.str-getcsv.php
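
For newer browsers, the same request could be sketched with fetch instead of XMLHttpRequest. This is only a sketch under assumptions: the helper names buildCsvRequest and csvToArrayViaPhp are made up here, and path/to/csv.php is the placeholder URL from above.

```javascript
// Hypothetical helper: builds the POST options for the csv.php endpoint above.
function buildCsvRequest(csv) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded; charset=utf-8" },
    // URLSearchParams takes care of encoding commas, quotes and newlines
    body: new URLSearchParams({ csv: csv }).toString(),
  };
}

// Hypothetical wrapper: resolves with the parsed rows instead of logging them.
async function csvToArrayViaPhp(csv, url = "path/to/csv.php") {
  const response = await fetch(url, buildCsvRequest(csv));
  if (!response.ok) {
    throw new Error("CSV endpoint answered with HTTP " + response.status);
  }
  return response.json();
}
```

Returning a promise instead of logging inside the callback lets the caller decide what to do with the rows, for example csvToArrayViaPhp(text).then(rows => console.log(rows)).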


My answer presumes your input is a reflection of code/content from web sources where single and double quote characters are fully interchangeable provided they occur as a non-escaped matching set.

You cannot use regex for this. You actually have to write a micro parser to analyze the string you wish to split. For the sake of this answer, I will call the quoted parts of your strings sub-strings. You need to walk across the string character by character. Consider the following case:

var a = "some sample string with \"double quotes\" and 'single quotes' and some craziness like this: \\\" or \\'",
    b = "sample of code from JavaScript with a regex containing a comma /\,/ that should probably be ignored.";

In this case you have absolutely no idea where a sub-string starts or ends by simply analyzing the input for a character pattern. Instead you have to write logic that decides whether a quote character is used as a quote character, whether it is itself unquoted, and whether it follows an escape.

I am not going to write that level of complexity of code for you, but you can look at something I recently wrote that has the pattern you need. This code has nothing to do with commas, but is otherwise a valid enough micro-parser for you to follow in writing your own code. Look into the asifix function of the following application:

https://github.com/austincheney/Pretty-Diff/blob/master/fulljsmin.js
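
For readers who want a starting point, here is a minimal sketch of such a walk, not the code from the linked project. It assumes that a backslash escapes the next character, that single and double quotes are interchangeable as long as they match, and that surrounding whitespace can be trimmed. The name splitOutsideQuotes is made up.

```javascript
// Hypothetical helper: split on commas that are outside '...' or "..." pairs.
function splitOutsideQuotes(input) {
  const fields = [];
  let current = "";
  let openQuote = null; // the quote character currently open, if any

  for (let i = 0; i < input.length; i++) {
    const ch = input[i];
    if (ch === "\\" && i + 1 < input.length) {
      // An escaped character is copied verbatim and never opens or closes a quote
      current += ch + input[i + 1];
      i++;
    } else if (openQuote) {
      if (ch === openQuote) openQuote = null; // the closing quote is dropped
      else current += ch;                     // commas in here are plain data
    } else if (ch === "'" || ch === '"') {
      openQuote = ch;                         // the opening quote is dropped
    } else if (ch === ",") {
      fields.push(current.trim());
      current = "";
    } else {
      current += ch;
    }
  }
  fields.push(current.trim());
  return fields;
}

console.log(splitOutsideQuotes("'string, duppi, du', 23, lala"));
// [ 'string, duppi, du', '23', 'lala' ]
```

The trim() call is a simplification: it also removes whitespace at the edges of a quoted value, which may or may not be what you want.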


If you can have your quote delimiter be double quotes, then this is a duplicate of Example JavaScript code to parse CSV data.

You can either translate all single-quotes to double-quotes first:

string = string.replace( /'/g, '"' );

...or you can edit the regex in that question to recognize single-quotes instead of double-quotes:

// Quoted fields.
"(?:'([^']*(?:''[^']*)*)'|" +

However, this assumes certain markup that is not clear from your question. Please clarify what all the various possibilities of markup can be, per my comment on your question.


Regular expressions to the rescue! These few lines of code handle properly quoted fields with embedded commas, quotes, and newlines based on the RFC 4180 standard.

function parseCsv(data, fieldSep, newLine) {
    fieldSep = fieldSep || ',';
    newLine = newLine || '\n';
    var nSep = '\x1D';
    var qSep = '\x1E';
    var cSep = '\x1F';
    var nSepRe = new RegExp(nSep, 'g');
    var qSepRe = new RegExp(qSep, 'g');
    var cSepRe = new RegExp(cSep, 'g');
    var fieldRe = new RegExp('(?<=(^|[' + fieldSep + '\\n]))"(|[\\s\\S]+?(?<![^"]"))"(?=($|[' + fieldSep + '\\n]))', 'g');
    var grid = [];
    data.replace(/\r/g, '').replace(/\n+$/, '').replace(fieldRe, function(match, p1, p2) {
        return p2.replace(/\n/g, nSep).replace(/""/g, qSep).replace(/,/g, cSep);
    }).split(/\n/).forEach(function(line) {
        var row = line.split(fieldSep).map(function(cell) {
            return cell.replace(nSepRe, newLine).replace(qSepRe, '"').replace(cSepRe, ',');
        });
        grid.push(row);
    });
    return grid;
}

const csv = 'A1,B1,C1\n"A ""2""","B, 2","C\n2"';
const separator = ',';      // field separator, default: ','
const newline = ' <br /> '; // newline representation in case a field contains newlines, default: '\n' 
var grid = parseCsv(csv, separator, newline);
// expected: [ [ 'A1', 'B1', 'C1' ], [ 'A "2"', 'B, 2', 'C <br /> 2' ] ]

Contrary to what is stated elsewhere, you don't need a finite state machine. The regular expression handles RFC 4180 properly thanks to positive lookbehind, negative lookbehind, and positive lookahead.

Clone/download code at https://github.com/peterthoeny/parse-csv-js
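
One caveat for the cross-browser part of the question: lookbehind assertions arrived with ES2018, and engines without them throw a SyntaxError as soon as they see such a pattern. A minimal feature check could look like this (supportsLookbehind is a made-up name):

```javascript
// Hypothetical helper: reports whether this engine accepts lookbehind assertions.
function supportsLookbehind() {
  try {
    // Building the pattern at runtime keeps the script itself parseable on old engines
    new RegExp("(?<=a)b");
    return true;
  } catch (error) {
    return false;
  }
}

console.log(supportsLookbehind());
```

If the check fails, one of the character-by-character parsers from the other answers is the safer fallback.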


According to this blog post, this function should do it:

String.prototype.splitCSV = function(sep) {
  for (var foo = this.split(sep = sep || ","), x = foo.length - 1, tl; x >= 0; x--) {
    if (foo[x].replace(/'\s+$/, "'").charAt(foo[x].length - 1) == "'") {
      if ((tl = foo[x].replace(/^\s+'/, "'")).length > 1 && tl.charAt(0) == "'") {
        foo[x] = foo[x].replace(/^\s*'|'\s*$/g, '').replace(/''/g, "'");
      } else if (x) {
        foo.splice(x - 1, 2, [foo[x - 1], foo[x]].join(sep));
      } else foo = foo.shift().split(sep).concat(foo);
    } else foo[x] = foo[x].replace(/''/g, "'"); // assign the result; replace() does not modify in place
  } return foo;
};

You would call it like so:

var string = "'string, duppi, du', 23, lala";
var parsed = string.splitCSV();
alert(parsed.join("|"));

This jsfiddle kind of works, but it looks like some of the elements have spaces before them.


When the CSV file is read into a string, the string can contain null characters (\0) between the values, so strip them line by line. It works for me.

stringLine = stringLine.replace(/\0/g, "" );

I've used regex a number of times, but I always have to relearn it each time, which is frustrating :-)

So here's a non-regex solution:

function csvRowToArray(row, delimiter = ',', quoteChar = '"'){
    let nStart = 0, nEnd = 0, a=[], nRowLen=row.length, bQuotedValue;
    while (nStart <= nRowLen) {
        bQuotedValue = (row.charAt(nStart) === quoteChar);
        if (bQuotedValue) {
            nStart++;
            nEnd = row.indexOf(quoteChar + delimiter, nStart)
        } else {
            nEnd = row.indexOf(delimiter, nStart)
        }
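        // No further delimiter: this is the last value; drop its closing quote if it was quoted
        if (nEnd < 0) nEnd = (bQuotedValue && nRowLen > nStart && row.endsWith(quoteChar)) ? nRowLen - 1 : nRowLen;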
        a.push(row.substring(nStart,nEnd));
        nStart = nEnd + delimiter.length + (bQuotedValue ? 1 : 0)
    }
    return a;
}

How it works:

  1. Pass in the csv string in row.
  2. While the start position of the next value is within the row, do the following:
    • If this value has been quoted, set nEnd to the closing quote.
    • Else if value has NOT been quoted, set nEnd to the next delimiter.
    • Add the value to an array.
    • Set nStart to nEnd plus the length of the delimiter (plus one more if the value was quoted, to skip the closing quote).

Sometimes it's good to write your own small function, rather than use a library. Your own code is going to perform well and use only a small footprint. In addition, you can easily tweak it to suit your own needs.


I have also faced the same type of problem when I had to parse a CSV file.

The file contains an address column whose values contain commas.

After parsing that CSV file to JSON, I get mismatched mapping of the keys while converting it into a JSON file.

I used Node.js for parsing the file and libraries like baby parse and csvtojson.

Example of file -

address,pincode
foo,baar , 123456

While I was parsing directly without using baby parse in JSON, I was getting:

[{
 address: 'foo',
 pincode: 'baar',
 'field3': '123456'
}]

So I wrote code that replaces the comma (,) inside each field with another delimiter:

/*
 csvString(input) = "address, pincode\\nfoo, bar, 123456\\n"
 output = "address, pincode\\nfoo {YOUR DELIMITER} bar, 123456\\n"
*/
const removeComma = function(csvString){
    let delimiter = '|'
    let Baby = require('babyparse')
    let arrRow = Baby.parse(csvString).data;
    /*
      arrRow = [
      [ 'address', 'pincode' ],
      [ 'foo, bar', '123456']
      ]
    */
    return arrRow.map((singleRow, index) => {
        //the data will include
        /*
        singleRow = [ 'address', 'pincode' ]
        */
        return singleRow.map(singleField => {
            // replace the comma inside the field
            return singleField.split(',').join(delimiter)
        })
    }).reduce((acc, value, key) => {
        acc = acc +(Array.isArray(value) ?
         value.reduce((acc1, val)=> {
            acc1 = acc1+ val + ','
            return acc1
        }, '') : '') + '\n';
        return acc;
    },'')
}

The string returned by the function can be passed into the csvtojson library, and the result can then be used.

const csv = require('csvtojson')

let csvString = "address, pincode\\nfoo, bar, 123456\\n"
let jsonArray = []
const modifiedCsvString = removeComma(csvString)
csv()
  .fromString(modifiedCsvString)
  .on('json', json => jsonArray.push(json))
  .on('end', () => {
    /* do any thing with the json Array */
  })

Now you can get the output like:

[{
  address: 'foo, bar',
  pincode: 123456
}]

PEG(.js) grammar that handles RFC 4180 examples at http://en.wikipedia.org/wiki/Comma-separated_values:

start
  = [\n\r]* first:line rest:([\n\r]+ data:line { return data; })* [\n\r]* { rest.unshift(first); return rest; }

line
  = first:field rest:("," text:field { return text; })*
    & { return !!first || rest.length; } // ignore blank lines
    { rest.unshift(first); return rest; }

field
  = '"' text:char* '"' { return text.join(''); }
  / text:[^\n\r,]* { return text.join(''); }

char
  = '"' '"' { return '"'; }
  / [^"]

Test at http://jsfiddle.net/knvzk/10 or https://pegjs.org/online.

Download the generated parser at https://gist.github.com/3362830.


People seemed to be against RegEx for this. Why?

(\s*'[^']+'|\s*[^,]+)(?=,|$)

Here's the code. I also made a fiddle.

String.prototype.splitCSV = function(sep) {
  var regex = /(\s*'[^']+'|\s*[^,]+)(?=,|$)/g;
  return this.match(regex);
}

var string = "'string, duppi, du', 23, 'string, duppi, du', lala";
var parsed = string.splitCSV();
alert(parsed.join('|'));
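
Note that match() returns the fields exactly as they were matched, so the quotes and any leading spaces are still in there. A small sketch of a cleanup step follows. splitAndClean is a made-up name, and it assumes single-quoted fields without escaped quotes, just like the pattern itself.

```javascript
const fieldPattern = /(\s*'[^']+'|\s*[^,]+)(?=,|$)/g;

// Hypothetical helper: split with the pattern above, then tidy each field.
function splitAndClean(line) {
  const rawFields = line.match(fieldPattern) || []; // match() returns null when nothing matches
  return rawFields.map(function (field) {
    // trim surrounding whitespace, then strip one pair of enclosing single quotes
    return field.trim().replace(/^'([\s\S]*)'$/, "$1");
  });
}

console.log(splitAndClean("'string, duppi, du', 23, lala"));
// [ 'string, duppi, du', '23', 'lala' ]
```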

I had a very specific use case where I wanted to copy cells from Google Sheets into my web app. Cells could include double quotes and newline characters. With copy and paste, the cells are delimited by tab characters, and cells with odd data are double-quoted. I tried the main solution here, the linked article using regexp, jQuery-CSV, and CSVToArray. Papa Parse (http://papaparse.com/) is the only one that worked out of the box. Copy and paste from Google Sheets is seamless with the default auto-detect options.


I liked FakeRainBrigand's answer, however it has a few problems: it cannot handle whitespace between a quote and a comma, and it does not support two consecutive commas. I tried editing his answer, but my edit got rejected by reviewers who apparently did not understand my code. Here is my version of FakeRainBrigand's code. There is also a fiddle: http://jsfiddle.net/xTezm/46/

String.prototype.splitCSV = function() {
        var matches = this.match(/(\s*"[^"]+"\s*|\s*[^,]+|,)(?=,|$)/g);
        for (var n = 0; n < matches.length; ++n) {
            matches[n] = matches[n].trim();
            if (matches[n] == ',') matches[n] = '';
        }
        if (this[0] == ',') matches.unshift("");
        return matches;
}

var string = ',"string, duppi, du" , 23 ,,, "string, duppi, du",dup,"", , lala';
var parsed = string.splitCSV();
alert(parsed.join('|'));

To complement this answer

If you need to parse quotes escaped with another quote, example:

"some ""value"" that is on xlsx file",123

You can use

function parse(text) {
  const csvExp = /(?!\s*$)\s*(?:'([^'\\]*(?:\\[\S\s][^'\\]*)*)'|"([^"\\]*(?:\\[\S\s][^"\\]*)*)"|"([^""]*(?:"[\S\s][^""]*)*)"|([^,'"\s\\]*(?:\s+[^,'"\s\\]+)*))\s*(?:,|$)/g;

  const values = [];

  text.replace(csvExp, (m0, m1, m2, m3, m4) => {
    if (m1 !== undefined) {
      values.push(m1.replace(/\\'/g, "'"));
    }
    else if (m2 !== undefined) {
      values.push(m2.replace(/\\"/g, '"'));
    }
    else if (m3 !== undefined) {
      values.push(m3.replace(/""/g, '"'));
    }
    else if (m4 !== undefined) {
      values.push(m4);
    }
    return '';
  });

  if (/,\s*$/.test(text)) {
    values.push('');
  }

  return values;
}

You can use papaparse.js like the example below:

<!DOCTYPE html>
<html lang="en">

    <head>
        <title>CSV</title>
    </head>

    <body>
        <input type="file" id="files" multiple="">
        <button onclick="csvGetter()">CSV Getter</button>
        <h3>The Result will be in the Console.</h3>

        <script src="papaparse.min.js"></script>

        <script>
            function csvGetter() {

                var file = document.getElementById('files').files[0];
                Papa.parse(file, {
                    complete: function(results) {
                        console.log(results.data);
                    }
                });
            }
          </script>
    </body>

</html>

Don't forget to put papaparse.min.js in the same folder.


RFC 4180 solution

This does not solve the string in the question, since its format does not conform to RFC 4180; the acceptable encoding is escaping a double quote with a double quote. The solution below works correctly with CSV files downloaded from Google Spreadsheets.

UPDATE (3/2017)

Parsing a single line at a time would be wrong. According to RFC 4180, fields may contain CRLF, which will cause any line reader to break the CSV file. Here is an updated version that parses a whole CSV string:

'use strict';

function csvToArray(text) {
    let p = '', row = [''], ret = [row], i = 0, r = 0, s = !0, l;
    for (l of text) {
        if ('"' === l) {
            if (s && l === p) row[i] += l;
            s = !s;
        } else if (',' === l && s) l = row[++i] = '';
        else if ('\n' === l && s) {
            if ('\r' === p) row[i] = row[i].slice(0, -1);
            row = ret[++r] = [l = '']; i = 0;
        } else row[i] += l;
        p = l;
    }
    return ret;
};

let test = '"one","two with escaped """" double quotes""","three, with, commas",four with no quotes,"five with CRLF\r\n"\r\n"2nd line one","two with escaped """" double quotes""","three, with, commas",four with no quotes,"five with CRLF\r\n"';
console.log(csvToArray(test));
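
To illustrate why reading line by line breaks: the made-up sample below holds a header and one record, but the quoted field contains a CRLF, so a plain split on newlines reports three lines instead of two. This is what the character-by-character loop above avoids by tracking whether it is inside quotes.

```javascript
// Sample data invented for this illustration: 1 header + 1 record
const sampleCsv = 'id,note\r\n1,"line one\r\nline two"\r\n';

// Naive approach: treat every newline as a record boundary
const naiveLines = sampleCsv.split(/\r?\n/).filter((line) => line !== "");

console.log(naiveLines.length); // 3, although the CSV holds only 2 records
console.log(naiveLines[1]);     // 1,"line one   (cut in the middle of the quoted field)
```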

OLD ANSWER

(Single line solution)

function CSVtoArray(text) {
    let ret = [''], i = 0, p = '', s = true;
    for (let l in text) {
        l = text[l];
        if ('"' === l) {
            s = !s;
            if ('"' === p) {
                ret[i] += '"';
                l = '-';
            } else if ('' === p)
                l = '-';
        } else if (s && ',' === l)
            l = ret[++i] = '';
        else
            ret[i] += l;
        p = l;
    }
    return ret;
}
let test = '"one","two with escaped """" double quotes""","three, with, commas",four with no quotes,five for fun';
console.log(CSVtoArray(test));

And for fun, here is how you create CSV from an array:

function arrayToCSV(row) {
    for (let i in row) {
        row[i] = row[i].replace(/"/g, '""');
    }
    return '"' + row.join('","') + '"';
}

let row = [
  "one",
  "two with escaped \" double quote",
  "three, with, commas",
  "four with no quotes (now has)",
  "five for fun"
];
let text = arrayToCSV(row);
console.log(text);
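
The version above quotes every field and changes the input array in place. If that matters, a variant that leaves the array alone and only quotes fields containing a comma, a double quote, or a line break could be sketched like this (toCsvField and toCsvRow are made-up names):

```javascript
// Hypothetical helper: quote a single value only when it needs quoting.
function toCsvField(value) {
  const text = String(value);
  // Fields with a quote, comma, CR or LF get wrapped; inner quotes are doubled
  return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
}

// Hypothetical helper: build one CSV line without touching the input array.
function toCsvRow(values) {
  return values.map(toCsvField).join(",");
}

console.log(toCsvRow(["one", 'two "q"', "three, with", 4]));
// one,"two ""q""","three, with",4
```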


No regexp, readable, and according to https://en.wikipedia.org/wiki/Comma-separated_values#Basic_rules (written in TypeScript; remove the ": string" annotation for plain JavaScript):

function csv2arr(str: string) {
    let line = ["",];
    const ret = [line,];
    let quote = false;

    for (let i = 0; i < str.length; i++) {
        const cur = str[i];
        const next = str[i + 1];

        if (!quote) {
            const cellIsEmpty = line[line.length - 1].length === 0;
            if (cur === '"' && cellIsEmpty) quote = true;
            else if (cur === ",") line.push("");
            else if (cur === "\r" && next === "\n") { line = ["",]; ret.push(line); i++; }
            else if (cur === "\n" || cur === "\r") { line = ["",]; ret.push(line); }
            else line[line.length - 1] += cur;
        } else {
            if (cur === '"' && next === '"') { line[line.length - 1] += cur; i++; }
            else if (cur === '"') quote = false;
            else line[line.length - 1] += cur;
        }
    }
    return ret;
}
