[java] Read CSV with Scanner()

My csv is getting read into the System.out, but I've noticed that any text with a space gets moved into the next line (as a return \n)

Here's how my csv starts:

first,last,email,address 1, address 2
john,smith,[email protected],123 St. Street,
Jane,Smith,[email protected],4455 Roger Cir,apt 2

After running my app, any cell with a space (address 1), gets thrown onto the next line.

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class main {

    public static void main(String[] args) {
        // -define .csv file in app
        String fileNameDefined = "uploadedcsv/employees.csv";
        // -File class needed to turn stringName to actual file
        File file = new File(fileNameDefined);

        try{
            // -read from filePooped with Scanner class
            Scanner inputStream = new Scanner(file);
            // hashNext() loops line-by-line
            while(inputStream.hasNext()){
                //read single line, put in string
                String data = inputStream.next();
                System.out.println(data + "***");

            }
            // after loop, close scanner
            inputStream.close();


        }catch (FileNotFoundException e){

            e.printStackTrace();
        }

    }
}

So here's the result in the console:

first,last,email,address 
1,address 
2
john,smith,[email protected],123 
St. 
Street,
Jane,Smith,[email protected],4455 
Roger 
Cir,apt 
2

Am I using Scanner incorrectly?

This question is related to java csv java.util.scanner

The answer is


I have seen many production problems caused by code not handling quotes ("), newline characters within quotes, and quotes within the quotes; e.g.: "he said ""this""" should be parsed into: he said "this"

Like it was mentioned earlier, many CSV parsing examples out there just read a line, and then break up the line by the separator character. This is rather incomplete and problematic.

For me and probably those who prefer build verses buy (or use somebody else's code and deal with their dependencies), I got down to classic text parsing programming and that worked for me:

/**
 * Parse CSV data into an array of String arrays. It handles double quoted values.
 * @param is input stream
 * @param separator
 * @param trimValues
 * @param skipEmptyLines
 * @return an array of String arrays
 * @throws IOException
 */
public static String[][] parseCsvData(InputStream is, char separator, boolean trimValues, boolean skipEmptyLines)
    throws IOException
{
    ArrayList<String[]> data = new ArrayList<String[]>();
    ArrayList<String> row = new ArrayList<String>();
    StringBuffer value = new StringBuffer();
    int ch = -1;
    int prevCh = -1;
    boolean inQuotedValue = false;
    boolean quoteAtStart = false;
    boolean rowIsEmpty = true;
    boolean isEOF = false;

    while (true)
    {
        prevCh = ch;
        ch = (isEOF) ? -1 : is.read();

        // Handle carriage return line feed
        if (prevCh == '\r' && ch == '\n')
        {
            continue;
        }
        if (inQuotedValue)
        {
            if (ch == -1)
            {
                inQuotedValue = false;
                isEOF = true;
            }
            else
            {
                value.append((char)ch);

                if (ch == '"')
                {
                    inQuotedValue = false;
                }
            }
        }
        else if (ch == separator || ch == '\r' || ch == '\n' || ch == -1)
        {
            // Add the value to the row
            String s = value.toString();

            if (quoteAtStart && s.endsWith("\""))
            {
                s = s.substring(1, s.length() - 1);
            }
            if (trimValues)
            {
                s = s.trim();
            }
            rowIsEmpty = (s.length() > 0) ? false : rowIsEmpty;
            row.add(s);
            value.setLength(0);

            if (ch == '\r' || ch == '\n' || ch == -1)
            {
                // Add the row to the result
                if (!skipEmptyLines || !rowIsEmpty)
                {
                    data.add(row.toArray(new String[0]));
                }
                row.clear();
                rowIsEmpty = true;

                if (ch == -1)
                {
                    break;
                }
            }
        }
        else if (prevCh == '"')
        {
            inQuotedValue = true;
        }
        else
        {
            if (ch == '"')
            {
                inQuotedValue = true;
                quoteAtStart = (value.length() == 0) ? true : false;
            }
            value.append((char)ch);
        }
    }
    return data.toArray(new String[0][]);
}

Unit Test:

String[][] data = parseCsvData(new ByteArrayInputStream("foo,\"\",,\"bar\",\"\"\"music\"\"\",\"carriage\r\nreturn\",\"new\nline\"\r\nnext,line".getBytes()), ',', true, true);
for (int rowIdx = 0; rowIdx < data.length; rowIdx++)
{
    System.out.println(Arrays.asList(data[rowIdx]));
}

generates the output:

[foo, , , bar, "music", carriage
return, new
line]
[next, line]

If you absolutely must use Scanner, then you must set its delimiter via its useDelimiter(...) method. Else it will default to using all white space as its delimiter. Better though as has already been stated -- use a CSV library since this is what they do best.

For example, this delimiter will split on commas with or without surrounding whitespace:

scanner.useDelimiter("\\s*,\\s*");

Please check out the java.util.Scanner API for more on this.


Please stop writing faulty CSV parsers!

I've seen hundreds of CSV parsers and so called tutorials for them online.

Nearly every one of them gets it wrong!

This wouldn't be such a bad thing as it doesn't affect me but people who try to write CSV readers and get it wrong tend to write CSV writers, too. And get them wrong as well. And these ones I have to write parsers for.

Please keep in mind that CSV (in order of increasing not so obviousness):

  1. can have quoting characters around values
  2. can have other quoting characters than "
  3. can even have other quoting characters than " and '
  4. can have no quoting characters at all
  5. can even have quoting characters on some values and none on others
  6. can have other separators than , and ;
  7. can have whitespace between seperators and (quoted) values
  8. can have other charsets than ascii
  9. should have the same number of values in each row, but doesn't always
  10. can contain empty fields, either quoted: "foo","","bar" or not: "foo",,"bar"
  11. can contain newlines in values
  12. can not contain newlines in values if they are not delimited
  13. can not contain newlines between values
  14. can have the delimiting character within the value if properly escaped
  15. does not use backslash to escape delimiters but...
  16. uses the quoting character itself to escape it, e.g. Frodo's Ring will be 'Frodo''s Ring'
  17. can have the quoting character at beginning or end of value, or even as only character ("foo""", """bar", """")
  18. can even have the quoted character within the not quoted value; this one is not escaped

If you think this is obvious not a problem, then think again. I've seen every single one of these items implemented wrongly. Even in major software packages. (e.g. Office-Suites, CRM Systems)

There are good and correctly working out-of-the-box CSV readers and writers out there:

If you insist on writing your own at least read the (very short) RFC for CSV.


Scanner.next() does not read a newline but reads the next token, delimited by whitespace (by default, if useDelimiter() was not used to change the delimiter pattern). To read a line use Scanner.nextLine().

Once you read a single line you can use String.split(",") to separate the line into fields. This enables identification of lines that do not consist of the required number of fields. Using useDelimiter(","); would ignore the line-based structure of the file (each line consists of a list of fields separated by a comma). For example:

while (inputStream.hasNextLine())
{
    String line = inputStream.nextLine();
    String[] fields = line.split(",");
    if (fields.length >= 4) // At least one address specified.
    {
        for (String field: fields) System.out.print(field + "|");
        System.out.println();
    }
    else
    {
        System.err.println("Invalid record: " + line);
    }
}

As already mentioned, using a CSV library is recommended. For one, this (and useDelimiter(",") solution) will not correctly handle quoted identifiers containing , characters.


Split nextLine() by this delimiter: (?=([^\"]*\"[^\"]*\")*[^\"]*$)").


I agree with Scheintod that using an existing CSV library is a good idea to have RFC-4180-compliance from the start. Besides the mentioned OpenCSV and Oster Miller, there are a series of other CSV libraries out there. If you're interested in performance, you can take a look at the uniVocity/csv-parsers-comparison. It shows that

are consistently the fastest using either JDK 6, 7, 8, or 9. The study did not find any RFC 4180 compatibility issues in any of those three. Both OpenCSV and Oster Miller are found to be about twice as slow as those.

I'm not in any way associated with the author(s), but concerning the uniVocity CSV parser, the study might be biased due to its author being the same as of that parser.

To note, the author of SimpleFlatMapper has also published a performance comparison comparing only those three.


Well, I do my coding in NetBeans 8.1:

First: Create a new project, select Java application and name your project.

Then modify your code after public class to look like the following:

/**
 * @param args the command line arguments
 * @throws java.io.FileNotFoundException
 */
public static void main(String[] args) throws FileNotFoundException {
    try (Scanner scanner = new Scanner(new File("C:\\Users\\YourName\\Folder\\file.csv"))) {
         scanner.useDelimiter(",");
         while(scanner.hasNext()){
             System.out.print(scanner.next()+"|");
         }}
    }
}

Examples related to java

Under what circumstances can I call findViewById with an Options Menu / Action Bar item? How much should a function trust another function How to implement a simple scenario the OO way Two constructors How do I get some variable from another class in Java? this in equals method How to split a string in two and store it in a field How to do perspective fixing? String index out of range: 4 My eclipse won't open, i download the bundle pack it keeps saying error log

Examples related to csv

Pandas: ValueError: cannot convert float NaN to integer Export result set on Dbeaver to CSV Convert txt to csv python script How to import an Excel file into SQL Server? "CSV file does not exist" for a filename with embedded quotes Save Dataframe to csv directly to s3 Python Data-frame Object has no Attribute (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape How to write to a CSV line by line? How to check encoding of a CSV file

Examples related to java.util.scanner

How do I use a delimiter with Scanner.useDelimiter in Java? How to read multiple Integer values from a single line of input in Java? What's the difference between next() and nextLine() methods from Scanner class? Read line with Scanner Rock, Paper, Scissors Game Java Java using scanner enter key pressed Scanner is never closed how to insert a new line character in a string to PrintStream then use a scanner to re-read the file Exception in thread "main" java.util.NoSuchElementException Using a scanner to accept String input and storing in a String Array