[python] "Line contains NULL byte" in CSV reader (Python)

I'm trying to write a program that looks at a .CSV file (input.csv) and rewrites only the rows that begin with a certain element (corrected.csv), as listed in a text file (output.txt).

This is what my program looks like right now:

import csv

lines = []
with open('output.txt','r') as f:
    for line in f.readlines():
        lines.append(line[:-1])

with open('corrected.csv','w') as correct:
    writer = csv.writer(correct, dialect = 'excel')
    with open('input.csv', 'r') as mycsv:
        reader = csv.reader(mycsv)
        for row in reader:
            if row[0] not in lines:
                writer.writerow(row)

Unfortunately, I keep getting this error, and I have no clue what it's about.

Traceback (most recent call last):
  File "C:\Python32\Sample Program\csvParser.py", line 12, in <module>
    for row in reader:
_csv.Error: line contains NULL byte

Credit to all the people here to even to get me to this point.

This question is related to python csv

The answer is


I'm guessing you have a NUL byte in input.csv. You can test that with

if '\0' in open('input.csv').read():
    print "you have null bytes in your input file"
else:
    print "you don't"

if you do,

reader = csv.reader(x.replace('\0', '') for x in mycsv)

may get you around that. Or it may indicate you have utf16 or something 'interesting' in the .csv file.


You could just inline a generator to filter out the null values if you want to pretend they don't exist. Of course this is assuming the null bytes are not really part of the encoding and really are some kind of erroneous artifact or bug.

See the (line.replace('\0','') for line in f) below, also you'll want to probably open that file up using mode rb.

import csv

lines = []
with open('output.txt','r') as f:
    for line in f.readlines():
        lines.append(line[:-1])

with open('corrected.csv','w') as correct:
    writer = csv.writer(correct, dialect = 'excel')
    with open('input.csv', 'rb') as mycsv:
        reader = csv.reader( (line.replace('\0','') for line in mycsv) )
        for row in reader:
            if row[0] not in lines:
                writer.writerow(row)

I've recently fixed this issue and in my instance it was a file that was compressed that I was trying to read. Check the file format first. Then check that the contents are what the extension refers to.


This is long settled, but I ran across this answer because I was experiencing an unexpected error while reading a CSV to process as training data in Keras and TensorFlow.

In my case, the issue was much simpler, and is worth being conscious of. The data being produced into the CSV wasn't consistent, resulting in some columns being completely missing, which seems to end up throwing this error as well.

The lesson: If you're seeing this error, verify that your data looks the way that you think it does!


It is very simple.

don't make a csv file by "create new excel" or save as ".csv" from window.

simply import csv module, write a dummy csv file, and then paste your data in that.

csv made by python csv module itself will no longer show you encoding or blank line error.


This will tell you what line is the problem.

import csv

lines = []
with open('output.txt','r') as f:
    for line in f.readlines():
        lines.append(line[:-1])

with open('corrected.csv','w') as correct:
    writer = csv.writer(correct, dialect = 'excel')
    with open('input.csv', 'r') as mycsv:
        reader = csv.reader(mycsv)
        try:
            for i, row in enumerate(reader):
                if row[0] not in lines:
                   writer.writerow(row)
        except csv.Error:
            print('csv choked on line %s' % (i+1))
            raise

Perhaps this from daniweb would be helpful:

I'm getting this error when reading from a csv file: "Runtime Error! line contains NULL byte". Any idea about the root cause of this error?

...

Ok, I got it and thought I'd post the solution. Simply yet caused me grief... Used file was saved in a .xls format instead of a .csv Didn't catch this because the file name itself had the .csv extension while the type was still .xls


pandas.read_csv now handles the different UTF encoding when reading/writing and therefore can deal directly with null bytes

data = pd.read_csv(file, encoding='utf-16')

see https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html


A tricky way:

If you develop under Lunux, you can use all the power of sed:

from subprocess import check_call, CalledProcessError

PATH_TO_FILE = '/home/user/some/path/to/file.csv'

try:
    check_call("sed -i -e 's|\\x0||g' {}".format(PATH_TO_FILE), shell=True)
except CalledProcessError as err:
    print(err)    

The most efficient solution for huge files.

Checked for Python3, Kubuntu


Turning my linux environment into a clean complete UTF-8 environment made the trick for me. Try the following in your command line:

export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
export LANGUAGE=en_US.UTF-8

If you want to replace the nulls with something you can do this:

def fix_nulls(s):
    for line in s:
        yield line.replace('\0', ' ')

r = csv.reader(fix_nulls(open(...)))