Efficiently counting the number of lines of a text file (200 MB+)

Question

I have just found out that my script gives me a fatal error:

    Fatal error: Allowed memory size of 268435456 bytes exhausted (tried to allocate 440 bytes) in C:\process_txt.php on line 109

That line is this:

    $lines = count(file($path)) - 1;

So I think it is having difficulty loading the file into memory and counting the number of lines. Is there a more efficient way I can do this without having memory issues?

The text files that I need to count the lines of range from 2 MB to 500 MB, and sometimes maybe a gigabyte.

Thanks all for any help.

User · Answer

Counting the number of lines can be done with the following code:

    <?php
    $fp = fopen("myfile.txt", "r");
    $count = 0;
    // fgets() reads one line at a time; fgetss(), which also stripped
    // HTML tags, was removed in PHP 8.0.
    while (fgets($fp) !== false) {
        $count++;
    }
    echo "Total number of lines: " . $count;
    fclose($fp);
    ?>

User · Answer

This is an addition to Wallace de Souza's solution. It also skips empty lines while counting:

    function getLines($file)
    {
        $file = new \SplFileObject($file, 'r');
        $file->setFlags(
            \SplFileObject::READ_AHEAD | \SplFileObject::SKIP_EMPTY | \SplFileObject::DROP_NEW_LINE
        );
        $file->seek(PHP_INT_MAX);

        return $file->key() + 1;
    }

User · Answer

Using a loop of fgets() calls is a fine solution and the most straightforward to write, however:

1. Even though internally the file is read using a buffer of 8192 bytes, your code still has to call that function for each line.
2. It's technically possible that a single line may be bigger than the available memory if you're reading a binary file.

This code reads a file in chunks of 8 kB each and then counts the number of newlines within that chunk:

    function getLines($file)
    {
        $f = fopen($file, 'rb');
        $lines = 0;

        while (!feof($f)) {
            $lines += substr_count(fread($f, 8192), "\n");
        }

        fclose($f);

        return $lines;
    }

If the average length of each line is at most 4 kB, you will already start saving on function calls, and those can add up when you process big files.

Benchmark

I ran a test with a 1 GB file; here are the results:

                 +-------------+------------------+---------+
                 | This answer | Dominic's answer | wc -l   |
    +------------+-------------+------------------+---------+
    | Lines      | 3550388     | 3550389          | 3550388 |
    +------------+-------------+------------------+---------+
    | Runtime    | 1.055       | 4.297            | 0.587   |
    +------------+-------------+------------------+---------+

Time is measured in seconds of real (wall-clock) time.
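The one-line difference in the benchmark is consistent with how each approach treats the end of the file: counting "\n" characters matches wc -l, and a final line with no trailing newline is not counted. If that last line should count too, a minimal sketch could look like the following. The function name countLinesChunked and its $chunkSize parameter are made up for illustration, and the scalar type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helper (not from the answer above): counts "\n" characters in
// fixed-size chunks and adds one if the file ends without a trailing newline.
function countLinesChunked(string $path, int $chunkSize = 8192): int
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $lines = 0;
    $lastByte = "\n"; // an empty file has no unterminated last line

    // fread() returns an empty string at end of file for regular files.
    while (($chunk = fread($handle, $chunkSize)) !== false && $chunk !== '') {
        $lines += substr_count($chunk, "\n");
        $lastByte = $chunk[strlen($chunk) - 1];
    }
    fclose($handle);

    // Count a final line that is not terminated by a newline.
    if ($lastByte !== "\n") {
        $lines++;
    }

    return $lines;
}
```

Only one chunk is held in memory at a time, so memory use stays bounded by $chunkSize regardless of line length.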

User · Answer

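Based on Dominic Rodger's solution, here is what I use. It uses wc if available, otherwise it falls back to Dominic Rodger's solution.

    class FileTool
    {
        public static function getNbLines($file)
        {
            $linecount = 0;

            // Use wc when it is available on the system.
            $m = exec('which wc');
            if ('' !== $m) {
                // escapeshellarg() quotes the path safely for the shell.
                $cmd = 'wc -l < ' . escapeshellarg($file);
                $n = exec($cmd);
                return (int) $n + 1;
            }

            // Fallback: count lines with fgets().
            $handle = fopen($file, "r");
            while (!feof($handle)) {
                $line = fgets($handle);
                $linecount++;
            }
            fclose($handle);
            return $linecount;
        }
    }

https://github.com/lingtalfi/Bat/blob/master/FileTool.php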

User · Answer

    function quickAndDirtyLineCounter()
    {
        echo "<table>";
        $folders = ['C:\wamp\www\qa\abcfolder'];
        foreach ($folders as $folder) {
            foreach (scandir($folder) as $file) {
                $path = $folder . DIRECTORY_SEPARATOR . $file;
                // Skip ".", ".." and anything that is not a regular file.
                if (!is_file($path)) {
                    continue;
                }
                $handle = fopen($path, "r");
                if ($handle === false) {
                    continue; // the file could not be opened
                }
                $linecount = 0;
                while (!feof($handle)) {
                    $line = fgets($handle);
                    $linecount++;
                }
                fclose($handle);
                echo "<tr><td>" . $folder . "</td><td>" . $file . "</td><td>" . $linecount . "</td></tr>";
            }
        }
        echo "</table>";
    }

User · Answer

This will use less memory, since it doesn't load the whole file into memory:

    $file = "largefile.txt";
    $linecount = 0;
    $handle = fopen($file, "r");
    while (!feof($handle)) {
        $line = fgets($handle);
        $linecount++;
    }

    fclose($handle);

    echo $linecount;

fgets loads a single line into memory (if the second argument, $length, is omitted it will keep reading from the stream until it reaches the end of the line, which is what we want). This is still unlikely to be as quick as using something other than PHP, if you care about wall time as well as memory usage.

The only danger with this is if any lines are particularly long (what if you encounter a 2 GB file without line breaks?). In that case you're better off slurping it in chunks and counting end-of-line characters:

    $file = "largefile.txt";
    $linecount = 0;
    $handle = fopen($file, "r");
    while (!feof($handle)) {
        $line = fgets($handle, 4096);
        $linecount = $linecount + substr_count($line, PHP_EOL);
    }

    fclose($handle);

    echo $linecount;

User · Answer

There is another answer that I thought might be a good addition to this list.

If you have Perl installed and are able to run things from the shell in PHP:

    $lines = exec('perl -pe \'s/\r\n|\n|\r/\n/g\' ' . escapeshellarg('largetextfile.txt') . ' | wc -l');

This should handle most line breaks, whether from Unix- or Windows-created files.

TWO downsides (at least):

1) It is not a great idea to have your script so dependent upon the system it's running on (it may not be safe to assume Perl and wc are available).

2) Just a small mistake in escaping and you have handed over access to a shell on your machine.

As with most things I know (or think I know) about coding, I got this info from somewhere else:

John Reeve Article
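If depending on Perl and wc is a concern, the same line-break normalization can be approximated in PHP alone. The sketch below counts "\r\n", "\n" and a lone "\r" each as one line break while reading in chunks. The function name countLineBreaks and the $chunkSize parameter are hypothetical, and it counts line terminators only, so a final line without a terminator is not included.

```php
<?php
// Hypothetical helper: counts line breaks of any style ("\r\n", "\n", "\r")
// in fixed-size chunks, without shelling out.
function countLineBreaks(string $path, int $chunkSize = 8192): int
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $breaks = 0;
    $prevEndedWithCr = false;

    while (($chunk = fread($handle, $chunkSize)) !== false && $chunk !== '') {
        // Every "\r" and "\n" is a break, except that "\r\n" is a single one.
        $breaks += substr_count($chunk, "\r")
                 + substr_count($chunk, "\n")
                 - substr_count($chunk, "\r\n");

        // A "\r\n" pair split across two chunks was counted twice above.
        if ($prevEndedWithCr && $chunk[0] === "\n") {
            $breaks--;
        }
        $prevEndedWithCr = ($chunk[strlen($chunk) - 1] === "\r");
    }
    fclose($handle);

    return $breaks;
}
```

The boundary check matters because a chunk can end in the middle of a Windows line ending; without it, such a pair would be counted as two breaks.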

User · Answer

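For just counting the lines, use:

    $handle = fopen($file, "r");
    $b = 0;
    // Compare against false so that a line such as "0" does not end the loop.
    while (($a = fgets($handle)) !== false) {
        $b++;
    }
    fclose($handle);
    echo $b;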

User · Answer

If you're using PHP 5.5 or later you can use a generator. This will NOT work in any version of PHP before 5.5 though. From php.net:

"Generators provide an easy way to implement simple iterators without the overhead or complexity of implementing a class that implements the Iterator interface."

    // This function implements a generator to load individual lines of a large file
    function getLines($file) {
        $f = fopen($file, 'r');

        // read each line of the file without loading the whole file to memory
        while (($line = fgets($f)) !== false) {
            yield $line;
        }

        fclose($f);
    }

    // Since generators implement simple iterators, I can quickly count the number
    // of lines using the iterator_count() function.
    $file = '/path/to/file.txt';
    $lineCount = iterator_count(getLines($file)); // the number of lines in the file

User · Answer

You have several options. The first is to increase the available memory allowed, which is probably not the best way to do things given that you state the file can get very large. The other way is to use fgets to read the file line by line and increment a counter, which should not cause any memory issues at all, as only the current line is in memory at any one time.

User · Answer

I use this method purely for counting how many lines are in a file. What is the downside of doing this versus the other answers? I'm seeing many lines as opposed to my two-line solution. I'm guessing there's a reason nobody does this.

    $lines = count(file('your.file'));
    echo $lines;

User · Answer

If you're running this on a Linux/Unix host, the easiest solution would be to use exec() or similar to run the command wc -l $path. Just make sure you've sanitized $path first, to be sure that it isn't something like "/path/to/file ; rm -rf /".
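As a rough illustration of that advice, the sketch below validates the path and escapes it with escapeshellarg() before handing it to wc. The function name countLinesWithWc is made up for this example, and it assumes wc is installed and that exec() is not disabled in the PHP configuration.

```php
<?php
// Hypothetical helper: runs `wc -l` on a validated, shell-escaped path.
function countLinesWithWc(string $path): int
{
    // Reject anything that is not an existing, readable regular file.
    if (!is_file($path) || !is_readable($path)) {
        throw new InvalidArgumentException("Not a readable file: $path");
    }

    // Feeding the file on stdin makes wc print only the number.
    $lastLine = exec('wc -l < ' . escapeshellarg($path), $output, $exitCode);
    if ($lastLine === false || $exitCode !== 0) {
        throw new RuntimeException('wc failed');
    }

    return (int) trim($lastLine);
}
```

Because the path is passed through escapeshellarg(), shell metacharacters such as ";" in a file name are treated as part of the name rather than as a command separator.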

User · Answer

If you're on Linux you can simply do:

    $number_of_lines = intval(trim(shell_exec("wc -l " . escapeshellarg($file_name) . " | awk '{print $1}'")));

You just have to find the right command if you're using another OS.

Regards

User · Answer

There is a faster way I found that does not require looping through the entire file in PHP (only on *nix systems; there might be a similar way on Windows):

    $file = '/path/to/your.file';

    // Get number of lines
    $totalLines = intval(exec('wc -l ' . escapeshellarg($file)));

User · Answer

    function lineCount($file) {
        $linecount = 0;
        $handle = fopen($file, "r");
        while (!feof($handle)) {
            // Only count when fgets() actually returned a line.
            if (fgets($handle) !== false) {
                $linecount++;
            }
        }
        fclose($handle);
        return $linecount;
    }

I wanted to add a small fix to the function above. In one specific case, where I had a file containing only the word "testing", the function returned 2, so I needed to add a check for whether fgets returned false. Have fun!

User · Answer

Simple object-oriented solution:

    $file = new \SplFileObject('file.extension');

    while ($file->valid()) $file->fgets();

    var_dump($file->key());

Update: another way to do this is with PHP_INT_MAX in the SplFileObject::seek method.

    $file = new \SplFileObject('file.extension', 'r');
    $file->seek(PHP_INT_MAX);

    echo $file->key() + 1;

User · Answer

The most succinct cross-platform solution that only buffers one line at a time:

    $file = new \SplFileObject(__FILE__);
    $file->setFlags($file::READ_AHEAD);
    $lines = iterator_count($file);

Unfortunately, we have to set the READ_AHEAD flag, otherwise iterator_count blocks indefinitely. Without that requirement, this would be a one-liner.
