Read CSV with Scanner

Question

My CSV is getting read into System.out, but I've noticed that any text with a space gets moved onto the next line (as a return, \n).

Here's how my CSV starts:

    first,last,email,address 1, address 2
    john,smith,blah@blah.com,123 St. Street,
    Jane,Smith,blech@blech.com,4455 Roger Cir,apt 2

After running my app, any cell with a space (address 1) gets thrown onto the next line.

    import java.io.File;
    import java.io.FileNotFoundException;
    import java.util.Scanner;

    public class main {

        public static void main(String[] args) {
            // define .csv file in app
            String fileNameDefined = "uploadedcsv/employees.csv";
            // File class needed to turn stringName to actual file
            File file = new File(fileNameDefined);

            try {
                // read from the file with Scanner class
                Scanner inputStream = new Scanner(file);
                // hasNext() loops line-by-line
                while (inputStream.hasNext()) {
                    // read single line, put in string
                    String data = inputStream.next();
                    System.out.println(data);
                }
                // after loop, close scanner
                inputStream.close();
            } catch (FileNotFoundException e) {
                e.printStackTrace();
            }
        }
    }

So here's the result in the console:

    first,last,email,address
    1,address
    2
    john,smith,blah@blah.com,123
    St.
    Street,
    Jane,Smith,blech@blech.com,4455
    Roger
    Cir,apt
    2

Am I using Scanner incorrectly?

User · Answer

Well, I do my coding in NetBeans 8.1.

First, create a new project, select Java Application, and name your project.

Then modify your code after public class to look like the following:

    // Requires: java.io.File, java.io.FileNotFoundException, java.util.Scanner

    /**
     * @param args the command line arguments
     * @throws java.io.FileNotFoundException
     */
    public static void main(String[] args) throws FileNotFoundException {
        // try-with-resources closes the scanner automatically
        try (Scanner scanner = new Scanner(new File("C:\\Users\\YourName\\Folder\\file.csv"))) {
            scanner.useDelimiter(",");
            while (scanner.hasNext()) {
                System.out.print(scanner.next());
            }
        }
    }

User · Answer

Scanner.next() does not read a newline but reads the next token, delimited by whitespace (by default, if useDelimiter() was not used to change the delimiter pattern). To read a line use Scanner.nextLine().

Once you read a single line you can use String.split(",") to separate the line into fields. This enables identification of lines that do not consist of the required number of fields. Using useDelimiter(",") would ignore the line-based structure of the file (each line consists of a list of fields separated by a comma). For example:

    while (inputStream.hasNextLine())
    {
        String line = inputStream.nextLine();
        String[] fields = line.split(",");
        if (fields.length >= 4) // At least one address specified.
        {
            for (String field : fields) System.out.print(field + "|");
            System.out.println();
        }
        else
        {
            System.err.println("Invalid record: " + line);
        }
    }

As already mentioned, using a CSV library is recommended. For one, this (and the useDelimiter(",") solution) will not correctly handle quoted identifiers containing , characters.
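To make the difference between next() and nextLine() concrete, here is a minimal sketch that runs both over the same in-memory text instead of a file. The class and method names (ScannerTokenDemo, readTokens, readLines) are made up for illustration, and the sample rows are taken from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerTokenDemo {

    // Collects tokens with Scanner.next(); the default delimiter is whitespace,
    // so spaces and line breaks both end a token.
    static List<String> readTokens(String text) {
        List<String> tokens = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNext()) {
                tokens.add(scanner.next());
            }
        }
        return tokens;
    }

    // Collects whole lines with Scanner.nextLine(); spaces inside a line are kept.
    static List<String> readLines(String text) {
        List<String> lines = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNextLine()) {
                lines.add(scanner.nextLine());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        String csv = "first,last,email,address 1,address 2\n"
                   + "john,smith,blah@blah.com,123 St. Street,\n";
        // Six whitespace-delimited tokens: every space starts a new one.
        System.out.println(readTokens(csv).size());
        // Two lines, each still containing its spaces.
        System.out.println(readLines(csv).size());
    }
}
```

Each line returned by readLines can then be passed to split(",") as in the loop above.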

User · Answer

I agree with Scheintod that using an existing CSV library is a good idea to have RFC 4180 compliance from the start. Besides the mentioned OpenCSV and Oster Miller, there are a series of other CSV libraries out there. If you're interested in performance, you can take a look at the uniVocity csv-parsers-comparison. It shows that

- uniVocity CSV parser
- SimpleFlatMapper CSV parser
- Jackson CSV parser

are consistently the fastest using either JDK 6, 7, 8, or 9. The study did not find any RFC 4180 compatibility issues in any of those three. Both OpenCSV and Oster Miller are found to be about twice as slow as those.

I'm not in any way associated with the author(s), but concerning the uniVocity CSV parser, the study might be biased due to its author being the same as that of the parser.

To note, the author of SimpleFlatMapper has also published a performance comparison comparing only those three.

User · Answer

Split nextLine() by this delimiter
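The delimiter pattern itself is not shown in this answer. As a rough sketch only, one pattern often used for this kind of split (an assumption here, not necessarily the one the answer meant) is a comma followed by an even number of double quotes, so that commas inside a quoted field are left alone. The class and method names are hypothetical.

```java
class QuoteAwareSplit {

    // Assumed pattern: a comma is a separator only if the rest of the line
    // contains an even number of double quotes, i.e. the comma is outside quotes.
    private static final String OUTSIDE_QUOTES_COMMA =
            ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    // Splits one line; the limit of -1 keeps trailing empty fields.
    static String[] splitLine(String line) {
        return line.split(OUTSIDE_QUOTES_COMMA, -1);
    }

    public static void main(String[] args) {
        for (String field : splitLine("a,\"b,c\",d")) {
            System.out.println(field);
        }
    }
}
```

This keeps the surrounding quotes in each field, does not unescape doubled quotes, and cannot handle a quoted value that spans several lines, so it is still not a full CSV parser.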

User · Answer

Please stop writing faulty CSV parsers!

I've seen hundreds of CSV parsers and so-called tutorials for them online.

Nearly every one of them gets it wrong!

This wouldn't be such a bad thing, as it doesn't affect me, but people who try to write CSV readers and get it wrong tend to write CSV writers, too. And get them wrong as well. And these are the ones I have to write parsers for.

Please keep in mind that CSV (in order of increasing non-obviousness):

1. can have quoting characters around values
2. can have other quoting characters than "
3. can even have other quoting characters than " and '
4. can have no quoting characters at all
5. can even have quoting characters on some values and none on others
6. can have other separators than , and ;
7. can have whitespace between separators and (quoted) values
8. can have other charsets than ASCII
9. should have the same number of values in each row, but doesn't always
10. can contain empty fields, either quoted: "foo","","bar" or not: "foo",,"bar"
11. can contain newlines in values
12. cannot contain newlines in values if they are not delimited
13. cannot contain newlines between values
14. can have the delimiting character within the value if properly escaped
15. does not use backslash to escape delimiters but...
16. uses the quoting character itself to escape it, e.g. Frodo's Ring will be 'Frodo''s Ring'
17. can have the quoting character at the beginning or end of a value, or even as the only character ("foo""", """bar", """")
18. can even have the quoting character within a non-quoted value; this one is not escaped

If you think this is obvious and not a problem, then think again. I've seen every single one of these items implemented wrongly. Even in major software packages (e.g. office suites, CRM systems).

There are good and correctly working out-of-the-box CSV readers and writers out there:

- opencsv
- Ostermiller Java Utilities
- Apache Commons CSV

If you insist on writing your own, at least read the (very short) RFC for CSV.
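Two of the pitfalls from the list are easy to reproduce with nothing but String.split. The following is only a small illustration with made-up names (NaiveSplitPitfalls, naiveFieldCount) and made-up sample rows, not a parser.

```java
class NaiveSplitPitfalls {

    // Counts fields the naive way: split on every comma, ignoring quotes.
    static int naiveFieldCount(String line) {
        return line.split(",").length;
    }

    public static void main(String[] args) {
        // A quoted value containing the separator is cut in two:
        // the row has 2 fields, but the naive split reports 3.
        System.out.println(naiveFieldCount("\"Smith, Jane\",42"));

        // split(",") silently drops trailing empty fields:
        // the row has 4 fields, but the naive split reports 2.
        System.out.println(naiveFieldCount("a,b,,"));

        // A negative limit keeps the trailing empty fields (4 here),
        // but still does nothing about quoting.
        System.out.println("a,b,,".split(",", -1).length);
    }
}
```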

User · Answer

If you absolutely must use Scanner, then you must set its delimiter via its useDelimiter(...) method. Otherwise it will default to using all whitespace as its delimiter. Better though, as has already been stated, use a CSV library, since this is what they do best.

For example, this delimiter will split on commas with or without surrounding whitespace:

    scanner.useDelimiter("\\s*,\\s*");

Please check out the java.util.Scanner API for more on this.
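A small sketch of that delimiter in action, reading from a String rather than a file so it can be run as-is. The class name DelimiterDemo, the method readFields, and the sample input are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class DelimiterDemo {

    // Reads tokens separated by a comma with optional whitespace around it.
    static List<String> readFields(String text) {
        List<String> fields = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            scanner.useDelimiter("\\s*,\\s*");
            while (scanner.hasNext()) {
                fields.add(scanner.next());
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Spaces inside "123 St. Street" are kept because no comma is next to them.
        System.out.println(readFields("john , smith,blah@blah.com ,  123 St. Street"));
    }
}
```

Note that a line break on its own does not match this pattern, so on multi-line input the last field of one line and the first field of the next come back as a single token. Reading line by line with nextLine() first avoids that.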

User · Answer

I have seen many production problems caused by code not handling quotes ("), newline characters within quotes, and quotes within the quotes; e.g. "he said ""this""" should be parsed into: he said "this"

As was mentioned earlier, many CSV parsing examples out there just read a line and then break up the line by the separator character. This is rather incomplete and problematic.

For me, and probably for those who prefer build versus buy (or use somebody else's code and deal with their dependencies), I got down to classic text parsing programming and that worked for me:

    // Requires: java.io.IOException, java.io.InputStream, java.util.ArrayList

    /**
     * Parse CSV data into an array of String arrays. It handles double quoted values.
     * @param is input stream
     * @param separator
     * @param trimValues
     * @param skipEmptyLines
     * @return an array of String arrays
     * @throws IOException
     */
    public static String[][] parseCsvData(InputStream is, char separator, boolean trimValues, boolean skipEmptyLines)
        throws IOException
    {
        ArrayList<String[]> data = new ArrayList<String[]>();
        ArrayList<String> row = new ArrayList<String>();
        StringBuffer value = new StringBuffer();
        int ch = -1;
        int prevCh = -1;
        boolean inQuotedValue = false;
        boolean quoteAtStart = false;
        boolean rowIsEmpty = true;
        boolean isEOF = false;

        while (true)
        {
            prevCh = ch;
            ch = (isEOF) ? -1 : is.read();

            // Handle carriage return line feed
            if (prevCh == '\r' && ch == '\n')
            {
                continue;
            }
            if (inQuotedValue)
            {
                if (ch == -1)
                {
                    inQuotedValue = false;
                    isEOF = true;
                }
                else
                {
                    value.append((char)ch);
                    if (ch == '"')
                    {
                        inQuotedValue = false;
                    }
                }
            }
            else if (ch == separator || ch == '\r' || ch == '\n' || ch == -1)
            {
                // Add the value to the row
                String s = value.toString();

                if (quoteAtStart && s.endsWith("\""))
                {
                    s = s.substring(1, s.length() - 1);
                }
                if (trimValues)
                {
                    s = s.trim();
                }
                rowIsEmpty = (s.length() > 0) ? false : rowIsEmpty;
                row.add(s);
                value.setLength(0);

                if (ch == '\r' || ch == '\n' || ch == -1)
                {
                    // Add the row to the result
                    if (!skipEmptyLines || !rowIsEmpty)
                    {
                        data.add(row.toArray(new String[0]));
                    }
                    row.clear();
                    rowIsEmpty = true;
                    if (ch == -1)
                    {
                        break;
                    }
                }
            }
            else if (prevCh == '"')
            {
                inQuotedValue = true;
            }
            else
            {
                if (ch == '"')
                {
                    inQuotedValue = true;
                    quoteAtStart = (value.length() == 0) ? true : false;
                }
                value.append((char)ch);
            }
        }
        return data.toArray(new String[0][]);
    }

Unit Test:

    String[][] data = parseCsvData(new ByteArrayInputStream("foo,\"\",,\"bar\",\"\"\"music\"\"\",\"carriage\r\nreturn\",\"new\nline\"\r\nnext,line".getBytes()), ',', true, true);
    for (int rowIdx = 0; rowIdx < data.length; rowIdx++)
    {
        System.out.println(Arrays.asList(data[rowIdx]));
    }

generates the output:

    [foo, , , bar, "music", carriage
    return, new
    line]
    [next, line]

User · Answer

scanner.useDelimiter(",");

This should work.

    import java.io.File;
    import java.io.FileNotFoundException;
    import java.util.Scanner;

    public class TestScanner {

        public static void main(String[] args) throws FileNotFoundException {
            Scanner scanner = new Scanner(new File("/Users/pankaj/abc.csv"));
            scanner.useDelimiter(",");
            while (scanner.hasNext()) {
                System.out.print(scanner.next() + "|");
            }
            scanner.close();
        }
    }

For CSV File:

    a,b,c,d,e
    1,2,3,4,5
    X,Y,Z,A,B

Output is:

    a|b|c|d|e
    1|2|3|4|5
    X|Y|Z|A|B|
