Read and Write CSV files including unicode with Python 2.7

Question

I am new to Python, and I have a question about how to use Python to read and write CSV files. My file contains text in German, French, etc. With my code, the files are read correctly in Python, but when I write them into a new CSV file, the unicode text turns into strange characters.

The data is like:

And my code is:

```python
import csv
f = open('xxx.csv', 'rb')
reader = csv.reader(f)

wt = open('lll.csv', 'wb')
writer = csv.writer(wt, quoting=csv.QUOTE_ALL)

wt.close()
f.close()
```

And the result is like:

What should I do to solve the problem?

User · Answer

Another alternative: use the code from the unicodecsv package.

https://pypi.python.org/pypi/unicodecsv/

```python
>>> import unicodecsv as csv
>>> from io import BytesIO
>>> f = BytesIO()
>>> w = csv.writer(f, encoding='utf-8')
>>> _ = w.writerow((u'é', u'ñ'))
>>> _ = f.seek(0)
>>> r = csv.reader(f, encoding='utf-8')
>>> next(r) == [u'é', u'ñ']
True
```

This module is API compatible with the STDLIB csv module.
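For comparison, here is a minimal sketch of the same round trip on Python 3. There the standard-library csv module reads and writes str (Unicode) directly, so no third-party package is needed. It uses io.StringIO instead of BytesIO because Python 3's csv module works on text. An encoding only matters once the text goes to a real file. The helper name `roundtrip` is hypothetical.

```python
import csv
import io


def roundtrip(row):
    """Write one row to an in-memory text buffer and read it back (Python 3)."""
    buf = io.StringIO(newline='')  # no newline translation; csv writes '\r\n' itself
    csv.writer(buf).writerow(row)
    buf.seek(0)
    return next(csv.reader(buf))


print(roundtrip([u'é', u'ñ']) == [u'é', u'ñ'])  # True
```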

User · Answer

Make sure you encode and decode as appropriate.

This example round-trips some example text in UTF-8 to a CSV file and back out to demonstrate:

```python
# -*- coding: utf-8 -*-
import csv

tests = {'German': [u'Straße', u'auslösen', u'zerstören'],
         'French': [u'français', u'américaine', u'épais'],
         'Chinese': [u'中國的', u'英語', u'美國人']}

# Python 2's csv module expects files opened in binary mode
with open('/tmp/utf.csv', 'wb') as fout:
    writer = csv.writer(fout)
    writer.writerows([tests.keys()])             # header row
    for row in zip(*tests.values()):
        row = [s.encode('utf-8') for s in row]   # unicode -> UTF-8 bytes before writing
        writer.writerows([row])

with open('/tmp/utf.csv', 'rb') as fin:
    reader = csv.reader(fin)
    for row in reader:
        temp = list(row)
        fmt = u'{:<15}' * len(temp)
        # UTF-8 bytes -> unicode after reading
        print fmt.format(*[s.decode('utf-8') for s in temp])
```

Prints:

```
German         Chinese        French
Straße         中國的            français
auslösen       英語             américaine
zerstören      美國人            épais
```
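On Python 3 the explicit encode/decode steps move into open(). You pass encoding='utf-8' (and newline='', as the csv documentation recommends), and the csv module handles str values directly. Below is a minimal sketch of the same round trip under that assumption. `write_columns` and `read_rows` are hypothetical helper names, and a temporary file stands in for /tmp/utf.csv.

```python
import csv
import os
import tempfile


def write_columns(path, columns):
    """Write a dict of column name -> list of values as a CSV file (Python 3)."""
    with open(path, 'w', encoding='utf-8', newline='') as fout:
        writer = csv.writer(fout)
        writer.writerow(list(columns.keys()))      # header row
        writer.writerows(zip(*columns.values()))   # one row per position in the lists


def read_rows(path):
    """Read the CSV file back as a list of rows of str."""
    with open(path, encoding='utf-8', newline='') as fin:
        return list(csv.reader(fin))


tests = {'German': ['Straße', 'auslösen', 'zerstören'],
         'French': ['français', 'américaine', 'épais']}

path = os.path.join(tempfile.mkdtemp(), 'utf.csv')
write_columns(path, tests)
for row in read_rows(path):
    print(''.join('{:<15}'.format(s) for s in row))
```

No decode step is needed when reading: the rows come back as str because the file object was opened with the encoding.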

User · Answer

Because str in python2 is bytes actually  So if want to write unicode to csv  you must encode unicode to str using utf-8 encoding   def py2 unicode to str u         unicode is only exist in python2     assert isinstance u  unicode      return u encode  utf-8     Use class csv DictWriter csvfile  fieldnames  restval     extrasaction  raise   dialect  excel    args    kwds     py2   The csvfile  open fp   w   pass key and value in bytes which are encoded with utf-8   writer writerow  py2 unicode to str k   py2 unicode to str v  for k v in row items       py3   The csvfile  open fp   w   pass normal dict contains str as row to writer writerow row     Finally code  import sys  is py2   sys version info 0     2  def py2 unicode to str u         unicode is only exist in python2     assert isinstance u  unicode      return u encode  utf-8    with open  file csv    w   as f      if is py2          data    u Python     u Python     u Python  2   u Python  2              just one more line to handle this         data    py2 unicode to str k   py2 unicode to str v  for k  v in data items             fields   list data 0           writer   csv DictWriter f  fieldnames fields           for row in data              writer writerow row      else          data     Python      Python      Python  2    Python  2            fields   list data 0           writer   csv DictWriter f  fieldnames fields           for row in data              writer writerow row    Conclusion  In python3  just use the unicode str   In python2  use unicode handle text  use str when I O occurs
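Here is a Python 3 sketch of the same DictWriter approach. It also writes the header row with writeheader() and reads the file back with csv.DictReader to check that the non-ASCII keys and values survive. The sample rows and file name are hypothetical.

```python
import csv
import os
import tempfile

rows = [{'name': 'Müller', 'city': 'Zürich'},
        {'name': 'Dupré', 'city': 'Besançon'}]

path = os.path.join(tempfile.mkdtemp(), 'file.csv')

# Text mode with an explicit encoding; newline='' lets csv control line endings
with open(path, 'w', encoding='utf-8', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)

with open(path, encoding='utf-8', newline='') as f:
    loaded = [dict(r) for r in csv.DictReader(f)]

print(loaded == rows)  # True
```

The dict(r) conversion only normalizes the row type, since older Python 3 versions return OrderedDict from DictReader.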

User · Answer

I had the very same issue. The answer is that you are already doing it right; the problem is MS Excel. Try opening the file with another editor and you will see that your encoding already worked. To make MS Excel happy, move from UTF-8 to UTF-16. This should work:

```python
# Python 2 only
import codecs
import csv
import StringIO


class UnicodeWriter:
    def __init__(self, f, dialect=csv.excel_tab, encoding="utf-16", **kwds):
        # Redirect output to a queue
        self.queue = StringIO.StringIO()
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = f

        # Force BOM
        if encoding == "utf-16":
            f.write(codecs.BOM_UTF16)

        self.encoding = encoding

    def writerow(self, row):
        # Modified from original: now using unicode(s) to deal with e.g. ints
        self.writer.writerow([unicode(s).encode("utf-8") for s in row])
        # Fetch UTF-8 output from the queue ...
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        # ... and reencode it into the target encoding
        data = data.encode(self.encoding)

        # strip the BOM that encode() adds on every call
        if self.encoding == "utf-16":
            data = data[2:]

        # write to the target stream
        self.stream.write(data)
        # empty queue
        self.queue.truncate(0)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)
```
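On Python 3 the queue-and-reencode approach is unnecessary. Opening the file with encoding='utf-16' makes Python write the byte-order mark itself, and csv.excel_tab produces the same tab-separated layout. This is a minimal sketch that assumes, as the answer does, that the target Excel version detects UTF-16 through the BOM. `write_excel_tsv` is a hypothetical name.

```python
import codecs
import csv
import os
import tempfile


def write_excel_tsv(path, rows):
    """Write rows as tab-separated UTF-16 text with a BOM (Python 3 sketch)."""
    # The 'utf-16' codec emits a BOM in native byte order before the first character.
    with open(path, 'w', encoding='utf-16', newline='') as f:
        csv.writer(f, dialect=csv.excel_tab).writerows(rows)


path = os.path.join(tempfile.mkdtemp(), 'excel.tsv')
write_excel_tsv(path, [['French', 'français'], ['German', 'zerstören'], ['n', 42]])

with open(path, 'rb') as f:
    raw = f.read()
print(raw[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))  # True
```

The "strange characters" in the question are consistent with BOM-less UTF-8 being read as Windows-1252. For example, 'français'.encode('utf-8').decode('cp1252') gives 'franÃ§ais'.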

User · Answer

There is an example at the end of the csv module documentation that demonstrates how to deal with Unicode. Below is copied directly from that example. Note that the strings read or written will be Unicode strings. Don't pass a byte string to UnicodeWriter.writerows, for example.

```python
import csv, codecs, cStringIO

class UTF8Recoder:
    def __init__(self, f, encoding):
        self.reader = codecs.getreader(encoding)(f)
    def __iter__(self):
        return self
    def next(self):
        return self.reader.next().encode("utf-8")

class UnicodeReader:
    def __init__(self, f, dialect=csv.excel, encoding="utf-8-sig", **kwds):
        f = UTF8Recoder(f, encoding)
        self.reader = csv.reader(f, dialect=dialect, **kwds)
    def next(self):
        '''next() -> unicode
        This function reads and returns the next line as a Unicode string.
        '''
        row = self.reader.next()
        return [unicode(s, "utf-8") for s in row]
    def __iter__(self):
        return self

class UnicodeWriter:
    def __init__(self, f, dialect=csv.excel, encoding="utf-8-sig", **kwds):
        self.queue = cStringIO.StringIO()
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = f
        self.encoder = codecs.getincrementalencoder(encoding)()
    def writerow(self, row):
        '''writerow(unicode) -> None
        This function takes a Unicode string and encodes it to the output.
        '''
        self.writer.writerow([s.encode("utf-8") for s in row])
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        data = self.encoder.encode(data)
        self.stream.write(data)
        self.queue.truncate(0)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)

with open('xxx.csv', 'rb') as fin, open('lll.csv', 'wb') as fout:
    reader = UnicodeReader(fin)
    writer = UnicodeWriter(fout, quoting=csv.QUOTE_ALL)
    for line in reader:
        writer.writerow(line)
```

Input (UTF-8 encoded):

```
American,美国人
French,法国人
German,德国人
```

Output:

```
"American","美国人"
"French","法国人"
"German","德国人"
```
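In Python 3 the whole UTF8Recoder/UnicodeReader/UnicodeWriter recipe collapses into the encoding argument of open(). Below is a minimal sketch of the same xxx.csv to lll.csv conversion. It assumes UTF-8 input with or without a BOM (hence utf-8-sig) and keeps the utf-8-sig output and QUOTE_ALL from the answer. `recode_csv` is a hypothetical name, and the sketch writes its own small sample input.

```python
import csv
import os
import tempfile


def recode_csv(src, dst, src_encoding='utf-8-sig', dst_encoding='utf-8-sig'):
    """Copy a CSV file, re-encoding it and quoting every field (Python 3)."""
    with open(src, encoding=src_encoding, newline='') as fin, \
         open(dst, 'w', encoding=dst_encoding, newline='') as fout:
        writer = csv.writer(fout, quoting=csv.QUOTE_ALL)
        for line in csv.reader(fin):
            writer.writerow(line)


tmp = tempfile.mkdtemp()
src = os.path.join(tmp, 'xxx.csv')
dst = os.path.join(tmp, 'lll.csv')
with open(src, 'w', encoding='utf-8', newline='') as f:
    f.write('German,Straße\r\nFrench,français\r\n')

recode_csv(src, dst)
```

Because the output encoding is utf-8-sig, lll.csv starts with the bytes EF BB BF, which is the same BOM the recipe's incremental encoder writes.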

User · Answer

I couldn't comment on Mark's answer above, but I made one modification that fixes the error raised when the data in the cells is not unicode, i.e. float or int data. I replaced the writerow line in the UnicodeWriter class with:

```python
self.writer.writerow([s.encode("utf-8") if type(s) == types.UnicodeType else s for s in row])
```

so that it became:

```python
import codecs
import csv
import cStringIO
import types


class UnicodeWriter:
    def __init__(self, f, dialect=csv.excel, encoding="utf-8-sig", **kwds):
        self.queue = cStringIO.StringIO()
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = f
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, row):
        '''writerow(unicode) -> None
        This function takes a Unicode string and encodes it to the output.
        '''
        # Encode only unicode cells; pass int/float values through unchanged
        self.writer.writerow([s.encode("utf-8") if type(s) == types.UnicodeType else s for s in row])
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        data = self.encoder.encode(data)
        self.stream.write(data)
        self.queue.truncate(0)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)
```

You will also need to import types.
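On Python 3 this special case disappears. csv.writer converts non-string values with str() itself and writes None as an empty string, so int and float cells need no type check. Here is a small sketch using an in-memory buffer; the row contents are arbitrary.

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(['café', 1, 2.5, None])  # mixed str, int, float and None
print(repr(buf.getvalue()))  # 'café,1,2.5,\r\n'
```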
