"Line contains NULL byte" in CSV reader (Python)

Question

I'm trying to write a program that looks at a .csv file (input.csv) and rewrites only the rows that begin with a certain element (corrected.csv), as listed in a text file (output.txt).

This is what my program looks like right now:

import csv

lines = []
with open('output.txt','r') as f:
    for line in f.readlines():
        lines.append(line[:-1])

with open('corrected.csv','w') as correct:
    writer = csv.writer(correct, dialect = 'excel')
    with open('input.csv', 'r') as mycsv:
        reader = csv.reader(mycsv)
        for row in reader:
            if row[0] not in lines:
                writer.writerow(row)

Unfortunately, I keep getting this error, and I have no clue what it's about.

Traceback (most recent call last):
  File "C:\Python32\Sample Program\csvParser.py", line 12, in <module>
    for row in reader:
_csv.Error: line contains NULL byte

Credit to all the people here to even get me to this point.

User · Answer

I'm guessing you have a NUL byte in input.csv. You can test that with

if '\0' in open('input.csv').read():
    print("you have null bytes in your input file")
else:
    print("you don't")

If you do,

reader = csv.reader(x.replace('\0', '') for x in mycsv)

may get you around that. Or it may indicate you have UTF-16 or something 'interesting' in the .csv file.
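To tell the two cases in this answer apart (stray NUL bytes versus a UTF-16 file), it can help to look at the raw bytes before any decoding happens. The following is a minimal sketch, not part of the original answer; the helper name `inspect_nul_bytes` and the `sample_size` parameter are made up for illustration.

```python
import codecs


def inspect_nul_bytes(path, sample_size=1 << 20):
    """Report NUL bytes and a UTF-16 byte-order mark in the start of a file.

    Hypothetical helper: the name and the sample_size default are assumptions.
    """
    # Read raw bytes so nothing is decoded before we look at the data.
    with open(path, "rb") as f:
        sample = f.read(sample_size)
    return {
        # Number of NUL bytes found in the sample.
        "nul_count": sample.count(b"\x00"),
        # UTF-16 files usually start with one of these two marks.
        "utf16_bom": sample.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)),
    }
```

A result with `utf16_bom` set and many NUL bytes suggests the file should be opened with a UTF-16 encoding rather than filtered; a handful of NUL bytes without a mark points more toward stray data.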

User · Answer

You could just inline a generator to filter out the null values if you want to pretend they don't exist. Of course this is assuming the null bytes are not really part of the encoding and really are some kind of erroneous artifact or bug.

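See the (line.replace('\0','') for line in mycsv) below. On Python 2 you'll probably also want to open that file using mode rb. On Python 3 the csv module works on text, so open the file in text mode with newline='' instead; in rb mode the lines are bytes and the str replace below fails.

import csv

# Values listed in output.txt, one per line
lines = []
with open('output.txt','r') as f:
    for line in f.readlines():
        # Strip only the trailing newline, even if the last line has none
        lines.append(line.rstrip('\n'))

with open('corrected.csv','w', newline='') as correct:
    writer = csv.writer(correct, dialect = 'excel')
    with open('input.csv', 'r', newline='') as mycsv:
        # Remove NUL characters from each line before csv parses it
        reader = csv.reader( (line.replace('\0','') for line in mycsv) )
        for row in reader:
            if row[0] not in lines:
                writer.writerow(row)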

User · Answer

I've recently fixed this issue, and in my instance it was a file that was compressed that I was trying to read. Check the file format first. Then check that the contents are what the extension refers to.
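One way to check that the contents match the extension is to compare the first few bytes against well-known file signatures. This is only a sketch under assumptions: the function name `sniff_format` and the short signature table are illustrative, and the table covers just gzip, zip, and the legacy Excel container.

```python
# Well-known leading byte signatures ("magic numbers") for a few formats.
# The table is deliberately small; it is an illustration, not a full detector.
SIGNATURES = {
    b"\x1f\x8b": "gzip",
    b"PK\x03\x04": "zip (also .xlsx)",
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": "OLE2 compound file (legacy .xls)",
}


def sniff_format(path):
    """Return a format guess from the first bytes, or None if nothing matches."""
    # Binary mode: we want the bytes exactly as stored on disk.
    with open(path, "rb") as f:
        head = f.read(8)
    for magic, name in SIGNATURES.items():
        if head.startswith(magic):
            return name
    return None
```

If `sniff_format('input.csv')` returns anything other than `None`, the file is probably not plain CSV text, whatever its name says.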

User · Answer

This is long settled, but I ran across this answer because I was experiencing an unexpected error while reading a CSV to process as training data in Keras and TensorFlow.

In my case, the issue was much simpler, and is worth being conscious of. The data being produced into the CSV wasn't consistent, resulting in some columns being completely missing, which seems to end up throwing this error as well.

The lesson: if you're seeing this error, verify that your data looks the way that you think it does!
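A quick way to verify that the rows look the way you expect is to count how many fields each row has. The sketch below is an assumption about how one might do that; `field_count_summary` is a hypothetical helper, and it assumes the file can already be parsed by the csv module in the given encoding.

```python
import csv
from collections import Counter


def field_count_summary(path, encoding="utf-8"):
    """Count how many rows have each number of fields (hypothetical helper)."""
    counts = Counter()
    # newline="" is what the csv documentation recommends for csv files.
    with open(path, newline="", encoding=encoding) as f:
        for row in csv.reader(f):
            counts[len(row)] += 1
    return counts
```

If the result has more than one key, some rows have missing or extra columns and are worth inspecting by hand.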

User · Answer

It is very simple: don't make a csv file by "create new excel" or save as ".csv" from Windows. Simply import the csv module, write a dummy csv file, and then paste your data in that. A csv made by the Python csv module itself will no longer show you encoding or blank line errors.
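For reference, writing a file with the csv module might look like the sketch below. The function name `write_rows` and the choice of UTF-8 are assumptions made for the example, not something this answer specifies.

```python
import csv


def write_rows(path, header, rows):
    """Write a header and data rows with the csv module (hypothetical helper)."""
    # newline="" lets the csv module control line endings, which avoids
    # the extra blank lines that can otherwise appear on Windows.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
```

For example, `write_rows('dummy.csv', ['name', 'value'], [['alpha', 1]])` produces a small UTF-8 file with one header line and one data line.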

User · Answer

This will tell you what line is the problem.

import csv

lines = []
with open('output.txt','r') as f:
    for line in f.readlines():
        lines.append(line[:-1])

with open('corrected.csv','w') as correct:
    writer = csv.writer(correct, dialect = 'excel')
    with open('input.csv', 'r') as mycsv:
        reader = csv.reader(mycsv)
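        try:
            for row in reader:
                if row[0] not in lines:
                    writer.writerow(row)
        except csv.Error:
            # reader.line_num counts the lines read from the file so far,
            # so it points at the line being parsed when the error occurred
            # (and it is defined even if the very first line fails).
            print('csv choked on line %s' % reader.line_num)
            raise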

Perhaps this from daniweb would be helpful:

I'm getting this error when reading from a csv file: "Runtime Error! line contains NULL byte". Any idea about the root cause of this error?

...

Ok, I got it and thought I'd post the solution. Simply yet caused me grief... Used file was saved in a .xls format instead of a .csv Didn't catch this because the file name itself had the .csv extension while the type was still .xls

User · Answer

pandas.read_csv now handles the different UTF encodings when reading/writing and therefore can deal directly with null bytes:

data = pd.read_csv(file, encoding='utf-16')

See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html

User · Answer

I've solved a similar problem with an easier solution:

import codecs
csvReader = csv.reader(codecs.open('file.csv', 'rU', 'utf-16'))

The key was using the codecs module to open the file with the UTF-16 encoding. There are a lot more encodings; check the documentation.
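On Python 3 the same idea can be written with the built-in open(), which accepts an encoding directly; the 'U' mode flag used above is rejected by recent Python 3 releases. This is a minimal sketch assuming the file really is UTF-16; the function name `read_utf16_csv` is made up for the example.

```python
import csv


def read_utf16_csv(path):
    """Read every row of a UTF-16 encoded CSV file into a list (sketch)."""
    # encoding="utf-16" uses the byte-order mark, when present, to pick the
    # byte order; newline="" is what the csv documentation recommends.
    with open(path, newline="", encoding="utf-16") as f:
        return list(csv.reader(f))
```

For large files it may be preferable to iterate over the reader inside the `with` block rather than building the whole list in memory.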

User · Answer

A tricky way:

If you develop under Linux, you can use all the power of sed:

from subprocess import check_call, CalledProcessError

PATH_TO_FILE = '/home/user/some/path/to/file.csv'

try:
    check_call("sed -i -e 's|\\x0||g' {}".format(PATH_TO_FILE), shell=True)
except CalledProcessError as err:
    print(err)

The most efficient solution for huge files.

Checked for Python3, Kubuntu

User · Answer

Turning my Linux environment into a clean, complete UTF-8 environment did the trick for me. Try the following in your command line:

export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
export LANGUAGE=en_US.UTF-8

User · Answer

If you want to replace the nulls with something you can do this:

def fix_nulls(s):
    for line in s:
        yield line.replace('\0', ' ')

r = csv.reader(fix_nulls(open(...)))
