_csv.Error: field larger than field limit (131072)

Question

I have a script reading in a csv file with very large fields:

# example from http://docs.python.org/3.3/library/csv.html?highlight=csv%20dictreader#examples
import csv
with open('some.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)

However, this throws the following error on some csv files:

_csv.Error: field larger than field limit (131072)

How can I analyze csv files with huge fields? Skipping the lines with huge fields is not an option, as the data needs to be analyzed in subsequent steps.

User · Accepted Answer

The csv file might contain very large fields; therefore, increase the field_size_limit:

import sys
import csv

csv.field_size_limit(sys.maxsize)
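To see the problem and the fix end to end, here is a minimal sketch that builds an in-memory CSV with one 200,000-character field (made-up data for illustration), shows the default limit rejecting it, and then reads it after raising the limit as this answer suggests. It assumes a fresh interpreter, where the limit is still the default 131072, and a platform where sys.maxsize fits in a C long (not 64-bit Windows; the Update in this answer covers that case).

```python
import csv
import io
import sys

# Made-up data: a header row and one row whose second field is
# 200,000 characters long, i.e. above the default limit of 131072.
data = "id,payload\n1," + "x" * 200_000 + "\n"


def read_rows(text):
    """Parse CSV text with the csv module and return all rows as lists."""
    return list(csv.reader(io.StringIO(text, newline="")))


# With the default limit the reader refuses the oversized field.
try:
    read_rows(data)
    failed_with_default = False
except csv.Error as exc:
    print("default limit:", exc)
    failed_with_default = True

# Raise the limit as in the answer (raises OverflowError where a C long is 32 bits).
csv.field_size_limit(sys.maxsize)
rows = read_rows(data)
print("payload length:", len(rows[1][1]))
```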

sys.maxsize works for Python 2.x and 3.x. sys.maxint would only work with Python 2.x (SO: what-is-sys-maxint-in-python-3)

Update

As Geoff pointed out, the code above might result in the following error: OverflowError: Python int too large to convert to C long. To circumvent this, you could use the following quick and dirty code (which should work on every system with Python 2 and Python 3):

import sys
import csv
maxInt = sys.maxsize

while True:
    # Decrease the maxInt value by a factor of 10
    # as long as the OverflowError occurs.

    try:
        csv.field_size_limit(maxInt)
        break
    except OverflowError:
        maxInt //= 10
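The same loop can be packaged as a reusable function that reports which limit it ended up setting. This is a minimal sketch; raise_field_size_limit is a hypothetical helper name, not part of the csv module, and it uses integer division so the candidate limit stays an exact int.

```python
import csv
import sys


def raise_field_size_limit():
    """Try sys.maxsize, sys.maxsize // 10, sys.maxsize // 100, ... as the csv
    field size limit and return the first value the platform accepts."""
    limit = sys.maxsize
    while True:
        try:
            csv.field_size_limit(limit)
            return limit
        except OverflowError:
            # The value does not fit in a C long on this platform; shrink and retry.
            limit //= 10


new_limit = raise_field_size_limit()
print(new_limit, csv.field_size_limit())
```

Because the candidate shrinks tenfold each time, on a platform with a 32-bit C long the loop settles on 922337203 rather than the true LONG_MAX of 2147483647. That is still far above the default, which is why this approach is described as quick and dirty.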

User · Answer

Find the cqlshrc file, usually placed in the .cassandra directory. In that file, append:

[csv]
field_size_limit = 1000000000

User · Answer

Sometimes a row contains a double quote inside a column. When the csv reader tries to read this row, it does not recognize the end of the column and raises this error. The solution is below:

reader = csv.reader(cf, quoting=csv.QUOTE_MINIMAL)
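To locate rows like the one described here, a rough heuristic is to flag physical lines that contain an odd number of double quotes. This is a minimal sketch; find_suspect_lines is a hypothetical helper, and a legitimately quoted field that spans several lines will also be flagged, so treat the output as candidates to inspect rather than definite errors.

```python
def find_suspect_lines(lines, quotechar='"'):
    """Yield (line_number, line) for each physical line that contains an odd
    number of quotechar characters, i.e. an unbalanced quote."""
    for number, line in enumerate(lines, start=1):
        if line.count(quotechar) % 2 == 1:
            yield number, line


sample = ['a,b\n', 'c,"d\n', 'e,"f"\n']
for number, line in find_suspect_lines(sample):
    print(number, repr(line))
```

Passing an open file object (for example one opened with newline='') works the same way, since iterating over it yields lines.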

User · Answer

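You can use read_csv from pandas to skip these lines:

import pandas as pd
data_df = pd.read_csv('data.csv', error_bad_lines=False)

In pandas 1.3 and later, error_bad_lines is deprecated in favour of on_bad_lines='skip', and it was removed in pandas 2.0.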

User · Answer

This could be because your CSV file has embedded single or double quotes. If your CSV file is tab-delimited, try opening it as:

c = csv.reader(f, delimiter='\t', quoting=csv.QUOTE_NONE)
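Why quoting=csv.QUOTE_NONE can make the error disappear: with the default quoting, a double quote at the start of a field opens a quoted field, and if it is never closed the reader keeps appending the following lines to that one field until it exceeds the limit. A minimal sketch with made-up tab-delimited data, assuming a fresh interpreter where the limit is still 131072:

```python
import csv
import io

# Made-up tab-delimited data: row 2 has a stray opening double quote,
# followed by 30,000 ordinary rows (about 180,000 characters).
data = 'id\tname\n1\t"Alice\n' + "2\tBob\n" * 30_000

# Default quoting: the stray quote starts a quoted field that swallows
# every following line, so the field outgrows the limit.
try:
    list(csv.reader(io.StringIO(data), delimiter="\t"))
    default_failed = False
except csv.Error as exc:
    print("default quoting:", exc)
    default_failed = True

# QUOTE_NONE: quote characters are ordinary data, so each line is its own row.
rows = list(csv.reader(io.StringIO(data), delimiter="\t", quoting=csv.QUOTE_NONE))
print(len(rows), rows[1])
```

Note that with QUOTE_NONE the quote stays in the data ('"Alice'), and genuinely quoted fields containing tabs or newlines would no longer be parsed correctly, so this only fits files that do not use quoting at all.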

User · Answer

.csv field sizes are controlled via [Python 3 Docs]: csv.field_size_limit([new_limit]):

"Returns the current maximum field size allowed by the parser. If new_limit is given, this becomes the new limit."

It is set by default to 131072 or 0x20000 (128k), which should be enough for any decent .csv:

>>> import csv
>>>
>>>
>>> limit0 = csv.field_size_limit()
>>> limit0
131072
>>> "0x{0:016X}".format(limit0)
'0x0000000000020000'

However, when dealing with a .csv file (with the correct quoting and delimiter) having (at least) one field longer than this size, the error pops up. To get rid of the error, the size limit should be increased (to avoid any worries, the maximum possible value is attempted).

Behind the scenes (check [GitHub]: python/cpython - (master) cpython/Modules/_csv.c for implementation details), the variable that holds this value is a C long ([Wikipedia]: C data types), whose size varies depending on CPU architecture and OS (ILP). The classical difference: for a 64bit OS (and Python build), the long type size (in bits) is:

Nix: 64
Win: 32

When attempting to set it, the new value is checked to be within the long boundaries. That's why in some cases another exception pops up (because sys.maxsize is typically 64bit wide), as encountered on Win:

>>> import sys, ctypes as ct
>>>
>>>
>>> sys.platform, sys.maxsize, ct.sizeof(ct.c_void_p) * 8, ct.sizeof(ct.c_long) * 8
('win32', 9223372036854775807, 64, 32)
>>>
>>> csv.field_size_limit(sys.maxsize)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: Python int too large to convert to C long

To avoid running into this problem, set the (maximum possible) limit (LONG_MAX) using an artifice (thanks to [Python 3 Docs]: ctypes - A foreign function library for Python). It should work on Python 3 and Python 2, on any CPU / OS:

>>> csv.field_size_limit(int(ct.c_ulong(-1).value // 2))
131072
>>> limit1 = csv.field_size_limit()
>>> limit1
2147483647
>>> "0x{0:016X}".format(limit1)
'0x000000007FFFFFFF'

64bit Python on a Nix-like OS:

>>> import sys, csv, ctypes as ct
>>>
>>>
>>> sys.platform, sys.maxsize, ct.sizeof(ct.c_void_p) * 8, ct.sizeof(ct.c_long) * 8
('linux', 9223372036854775807, 64, 64)
>>>
>>> csv.field_size_limit()
131072
>>>
>>> csv.field_size_limit(int(ct.c_ulong(-1).value // 2))
131072
>>> limit1 = csv.field_size_limit()
>>> limit1
9223372036854775807
>>> "0x{0:016X}".format(limit1)
'0x7FFFFFFFFFFFFFFF'

For 32bit Python, things should run smoothly without the artifice (as both sys.maxsize and LONG_MAX are 32bit wide).

If this maximum value is still not enough, then the .csv would need manual intervention in order to be processed from Python.

Check the following resources for more details on:

Playing with C types boundaries from Python: [SO]: Maximum and minimum value of C types integers from Python (CristiFati's answer)
Python 32bit vs 64bit differences: [SO]: How do I determine if my python shell is executing in 32bit or 64bit mode on OS X? (CristiFati's answer)
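The ctypes artifice can also be used as a few lines at the top of a script. A minimal sketch: c_ulong(-1) wraps around to ULONG_MAX (ctypes does not check for overflow), and ULONG_MAX // 2 equals LONG_MAX, the largest value that fits in the C long holding the limit.

```python
import csv
import ctypes as ct

# ULONG_MAX // 2 == LONG_MAX for the running platform.
long_max = ct.c_ulong(-1).value // 2

previous = csv.field_size_limit(long_max)  # the setter returns the old limit
current = csv.field_size_limit()
print(previous, current, "0x{0:016X}".format(current))
```

Unlike the divide-by-ten loop, this lands exactly on LONG_MAX whether the C long is 32 or 64 bits wide.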

User · Answer

Below is how to check the current limit:

csv.field_size_limit()

Out[20]: 131072

Below is how to increase the limit. Add it to the code:

csv.field_size_limit(100000000)

Try checking the limit again:

csv.field_size_limit()

Out[22]: 100000000

Now you won't get the error "_csv.Error: field larger than field limit (131072)", as long as no field is longer than the new limit.
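Because csv.field_size_limit changes a setting that is global to the csv module, and returns the previous limit when given a new one, the increase can also be scoped to a single block and undone afterwards. A minimal sketch; temporary_field_size_limit is a hypothetical helper name and the 500,000-character field is made-up test data.

```python
import contextlib
import csv
import io


@contextlib.contextmanager
def temporary_field_size_limit(new_limit):
    """Set csv's field size limit to new_limit inside a with-block and
    restore the previous limit afterwards, even if an exception is raised."""
    old_limit = csv.field_size_limit(new_limit)  # the setter returns the old limit
    try:
        yield old_limit
    finally:
        csv.field_size_limit(old_limit)


before = csv.field_size_limit()
data = "a,b\n1," + "x" * 500_000 + "\n"  # made-up data with one 500,000-character field

with temporary_field_size_limit(100_000_000):
    rows = list(csv.reader(io.StringIO(data)))

print(len(rows[1][1]), csv.field_size_limit() == before)
```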

User · Answer

I just had this happen to me on a "plain" CSV file. Some people might call it an invalidly formatted file: no escape characters, no double quotes, and the delimiter was a semicolon.

A sample line from this file would look like this:

First cell; Second " Cell with one double quote and leading space;'Partially quoted' cell;Last cell

The lone double quote in the second cell would throw the parser off its rails. What worked was:

csv.reader(inputfile, delimiter=';', doublequote=False, quotechar=None, quoting=csv.QUOTE_NONE)
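Applied to the sample line from this answer, that configuration splits only at semicolons and keeps every quote character as data. A minimal sketch that reads from an in-memory string instead of inputfile; it passes doublequote=False (a real boolean) and quotechar=None because, as far as I know, Python 3.11 and later reject an empty quotechar.

```python
import csv
import io

# The sample line from this answer (semicolon-delimited, one stray double quote).
line = (
    "First cell; Second \" Cell with one double quote and leading space;"
    "'Partially quoted' cell;Last cell\n"
)

reader = csv.reader(
    io.StringIO(line),  # stands in for inputfile
    delimiter=";",
    doublequote=False,
    quotechar=None,  # no quote character at all
    quoting=csv.QUOTE_NONE,  # treat every quote character as ordinary data
)
row = next(reader)
print(row)
```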
