[python] How to write UTF-8 in a CSV file

I am trying to create a text file in CSV format from a PyQt4 QTableWidget. I want to write the text with UTF-8 encoding because it contains special characters. I use the following code:

import codecs
...
myfile = codecs.open(filename, 'w','utf-8')
...
f = result.table.item(i,c).text()
myfile.write(f+";")

It works until a cell contains a special character. I also tried:

myfile = open(filename, 'w')
...
f = unicode(result.table.item(i,c).text(), "utf-8")

But it also stops when a special character appears. I have no idea what I am doing wrong.


The answer is


From your shell run:

pip2 install unicodecsv

Then, presuming (unlike the original question) that you're using Python's built-in csv module, change
import csv
to
import unicodecsv as csv
in your code.


For Python 2, you can pass your rows through this function before calling csv_writer.writerows(rows).
It encodes only string values; integers and other non-string values are passed through unchanged.

def encode_rows_to_utf8(rows):
    encoded_rows = []
    for row in rows:
        encoded_row = []
        for value in row:
            # Encode only unicode text; byte strings are assumed to be UTF-8 already.
            if isinstance(value, unicode):
                value = value.encode("utf-8")
            encoded_row.append(value)
        encoded_rows.append(encoded_row)
    return encoded_rows

Use this package, it just works: https://github.com/jdunck/python-unicodecsv.


A very simple hack is to use the json module instead of csv. For example, instead of using csv.writer, do the following:

    import codecs
    import json

    fd = codecs.open(tempfilename, 'wb', 'utf-8')
    for c in whatever:
        fd.write(json.dumps(c)[1:-1])   # json.dumps(c) yields '["a", ...]'; [1:-1] drops the brackets
        fd.write('\n')
    fd.close()

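Given the list of fields in the correct order, the JSON-formatted string looks like a CSV line once the [ and ] at the start and end are stripped. It is not identical, though: by default json.dumps escapes non-ASCII characters as \uXXXX (pass ensure_ascii=False to keep them), separates items with a comma followed by a space, and escapes embedded double quotes with a backslash instead of doubling them, so a strict CSV reader may parse some rows differently. Within those limits, json seems to cope well with UTF-8 in Python 2.*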
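
To see how close the JSON output really is to CSV, the two can be compared side by side. This is a minimal sketch, assuming Python 3 (where the csv module accepts text directly); the sample row is made up for illustration.

```python
import csv
import io
import json

row = ["café", 'say "hi"', 42]  # hypothetical sample row

# JSON route from the answer above: strip the surrounding [ and ].
# Result: "caf\u00e9", "say \"hi\"", 42
json_line = json.dumps(row)[1:-1]

# Same route, but keeping non-ASCII characters instead of \uXXXX escapes.
# Result: "café", "say \"hi\"", 42
json_line_raw = json.dumps(row, ensure_ascii=False)[1:-1]

# Reference output from the csv module for the same row.
# Result: café,"say ""hi""",42
buf = io.StringIO()
csv.writer(buf).writerow(row)
csv_line = buf.getvalue().rstrip("\r\n")
```

The lines agree on the overall shape but differ in quoting, separators, and escaping, so the hack is safest when the consumer is tolerant or the data contains no embedded quotes.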


For me the UnicodeWriter class from the Python 2 csv module documentation didn't really work, as it breaks the csv.writer.writerow() interface.

For example:

csv_writer = csv.writer(csv_file)
row = ['The meaning', 42]
csv_writer.writerow(row)

works, while:

csv_writer = UnicodeWriter(csv_file)
row = ['The meaning', 42]
csv_writer.writerow(row)

will throw AttributeError: 'int' object has no attribute 'encode'.

As UnicodeWriter obviously expects all column values to be strings, we can convert the values ourselves and just use the default CSV module:

def to_utf8(lst):
    return [unicode(elem).encode('utf-8') for elem in lst]

...
csv_writer.writerow(to_utf8(row))

Or we can even monkey-patch csv_writer to add a write_utf8_row function - the exercise is left to the reader.
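
On Python 3 the manual conversion is no longer needed, since the csv module works on text and stringifies non-string values itself. As a rough sketch of a writer that accepts mixed-type rows and emits UTF-8 bytes, assuming Python 3 and a binary output stream (`Utf8RowWriter` is a hypothetical name, not part of the csv module):

```python
import csv
import io


class Utf8RowWriter:
    """Hypothetical helper: format rows with csv.writer, emit UTF-8 bytes."""

    def __init__(self, binary_stream, **fmtparams):
        self._stream = binary_stream
        self._queue = io.StringIO()  # holds one formatted row at a time
        self._writer = csv.writer(self._queue, **fmtparams)

    def writerow(self, row):
        # csv.writer converts non-string values (e.g. 42) with str().
        self._writer.writerow(row)
        self._stream.write(self._queue.getvalue().encode("utf-8"))
        # Empty the queue so the next row starts from scratch.
        self._queue.seek(0)
        self._queue.truncate(0)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)


# Usage with an in-memory binary stream standing in for a file opened in 'wb' mode.
out = io.BytesIO()
writer = Utf8RowWriter(out, delimiter=";")
writer.writerow(["The meaning", 42])
writer.writerow(["café", 7])
```

Keeping the writerow()/writerows() names means calling code does not have to change, which is the interface problem described above.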


The examples in the Python documentation show how to write Unicode CSV files: http://docs.python.org/2/library/csv.html#examples

(can't copy the code here because it's protected by copyright)
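
Independently of the documentation's code, the general shape of such a write is short on Python 3. A minimal sketch, assuming Python 3 (where open() accepts an encoding argument and the csv module works on text); the sample rows and the temporary path are made up for illustration:

```python
import csv
import os
import tempfile

# Hypothetical sample data containing non-ASCII characters.
rows = [["name", "city"], ["Zoë", "Köln"], ["José", "Zürich"]]

# Hypothetical output location inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "example.csv")

# encoding="utf-8" makes the file object do the encoding;
# newline="" lets the csv module control line endings itself.
with open(path, "w", encoding="utf-8", newline="") as f:
    writer = csv.writer(f, delimiter=";")
    writer.writerows(rows)

# Read the raw bytes back to confirm what was written.
with open(path, "rb") as f:
    raw = f.read()
```

The semicolon delimiter mirrors the separator used in the question; drop the delimiter argument for comma-separated output.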

