How to solve the memory error in Python

Question

I am dealing with several large txt files, each of which has about 8,000,000 lines. A short example of the lines:

    usedfor zipper fasten_coat
    usedfor zipper fasten_jacket
    usedfor zipper fasten_pant
    usedfor your_foot walk
    atlocation camera cupboard
    atlocation camera drawer
    atlocation camera house
    relatedto more plenty

The code to store them in a dictionary is:

    dicCSK = collections.defaultdict(list)
    for line in finCSK:
        line = line.strip('\n')
        try:
            r, c1, c2 = line.split(" ")
        except ValueError:
            print line
        dicCSK[c1].append(r + " " + c2)

It runs fine on the first txt file, but when it gets to the second txt file, I get a MemoryError.

I am using Windows 7 64-bit with Python 2.7 32-bit, an Intel i5 CPU, and 8 GB of memory. How can I solve the problem?

Further explanation: I have four large files, and each file contains different information for many entities. For example, I want to find all information for cat, its father node animal, its child node persian cat, and so on. So my program first reads all the txt files into dictionaries, then scans all the dictionaries to find information for cat, its father, and its children.

User · Answer

Assuming your example text is representative of all the text, one line would consume about 75 bytes on my machine:

In [3]: sys.getsizeof('usedfor zipper fasten_coat')
Out[3]: 75

Doing some rough math:

75 bytes * 8,000,000 lines / 1024 / 1024 = ~572 MB

So roughly 572 meg to store the strings alone for one of these files. Once you start adding in additional, similarly structured and sized files, you'll quickly approach your virtual address space limits, as mentioned in @ShadowRanger's answer.
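The same back-of-the-envelope estimate can be reproduced as a small script. This is only a sketch: `estimate_string_mb` is a name made up for illustration, and `sys.getsizeof` reports different sizes depending on the Python version and on whether the build is 32-bit or 64-bit, so your per-line figure may not be 75.

```python
import sys


def estimate_string_mb(sample_line, n_lines):
    """Rough size, in MB (1024 * 1024 bytes), of n_lines strings like sample_line.

    Counts only the string objects themselves, not the dictionary
    or the lists that hold them.
    """
    per_line = sys.getsizeof(sample_line)  # includes the str object header
    return per_line * n_lines / 1024 / 1024


sample = 'usedfor zipper fasten_coat'
print(sys.getsizeof(sample), 'bytes for one sample line')
print(round(estimate_string_mb(sample, 8000000)), 'MB for one file of 8,000,000 lines')
```

The real footprint will be higher, because each line is split into several strings and stored inside lists inside a dictionary.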

If upgrading your Python isn't feasible for you, or if it only kicks the can down the road (you have finite physical memory after all), you really have two options: write your results to temporary files between reading each input file, or write your results to a database. Since you need to further post-process the strings after aggregating them, writing to a database would be the superior approach.
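A minimal sketch of the database route, using the standard-library `sqlite3` module and written for Python 3. The table name `csk`, the helper names `load_triples` and `lookup`, and the batch size are all assumptions made for illustration, not anything from the question. Malformed lines are skipped here, which is also an assumption about what you want.

```python
import sqlite3


def load_triples(conn, lines, batch_size=100000):
    """Insert 'relation concept1 concept2' lines into a table in batches."""
    conn.execute("CREATE TABLE IF NOT EXISTS csk (r TEXT, c1 TEXT, c2 TEXT)")
    batch = []
    for line in lines:
        parts = line.rstrip("\n").split(" ")
        if len(parts) != 3:
            continue  # assumption: silently skip malformed lines
        batch.append(tuple(parts))
        if len(batch) >= batch_size:
            conn.executemany("INSERT INTO csk VALUES (?, ?, ?)", batch)
            batch = []
    if batch:  # flush whatever is left over
        conn.executemany("INSERT INTO csk VALUES (?, ?, ?)", batch)
    # Index the lookup column so queries do not scan the whole table.
    conn.execute("CREATE INDEX IF NOT EXISTS csk_c1 ON csk (c1)")
    conn.commit()


def lookup(conn, concept):
    """Return (relation, concept2) pairs for one concept, in insertion order."""
    cur = conn.execute(
        "SELECT r, c2 FROM csk WHERE c1 = ? ORDER BY rowid", (concept,))
    return cur.fetchall()


# ":memory:" keeps this demo self-contained; pass a file path to store on disk.
conn = sqlite3.connect(":memory:")
sample_lines = [
    "atlocation camera cupboard\n",
    "atlocation camera drawer\n",
    "usedfor zipper fasten_coat\n",
    "this line has too many fields\n",
]
load_triples(conn, sample_lines)
print(lookup(conn, "camera"))
```

For the real files you would pass the open file object (which yields one line at a time) instead of `sample_lines`, so only one batch of rows is held in memory at once.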

User · Answer

Simplest solution: You're probably running out of virtual address space (any other form of error usually means running really slowly for a long time before you finally get a MemoryError). This is because a 32-bit application on Windows (and most OSes) is limited to 2 GB of user mode address space (Windows can be tweaked to make it 3 GB, but that's still a low cap). You've got 8 GB of RAM, but your program can't use (at least) 3/4 of it. Python has a fair amount of per-object overhead (object header, allocation alignment, etc.); odds are the strings alone are using close to a GB of RAM, and that's before you deal with the overhead of the dictionary, the rest of your program, the rest of Python, etc. If memory space fragments enough, and the dictionary needs to grow, it may not have enough contiguous space to reallocate, and you'll get a MemoryError.

Install a 64-bit version of Python (if you can, I'd recommend upgrading to Python 3 for other reasons); it will use more memory, but then, it will have access to a lot more memory space (and more physical RAM as well).

If that's not enough, consider converting to a sqlite3 database (or some other DB), so it naturally spills to disk when the data gets too large for main memory, while still having fairly efficient lookup.
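Before reinstalling anything, it may be worth confirming which build is actually running, since a 32-bit Python on 64-bit Windows is easy to overlook. A small check along these lines should work on both Python 2.7 and Python 3; `interpreter_bits` is just an illustrative helper name.

```python
import struct
import sys


def interpreter_bits():
    """Pointer size of the running interpreter in bits (32 or 64)."""
    return struct.calcsize("P") * 8


bits = interpreter_bits()
print("Python %d.%d, %d-bit build" % (sys.version_info[0], sys.version_info[1], bits))
if bits == 32:
    print("This build is limited by its address space, however much RAM is installed.")
```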
