[php] Efficiently counting the number of lines of a text file. (200mb+)

I have just found out that my script gives me a fatal error:

Fatal error: Allowed memory size of 268435456 bytes exhausted (tried to allocate 440 bytes) in C:\process_txt.php on line 109

That line is this:

$lines = count(file($path)) - 1;

So I think it is having difficulty loading the file into memeory and counting the number of lines, is there a more efficient way I can do this without having memory issues?

The text files that I need to count the number of lines for range from 2MB to 500MB. Maybe a Gig sometimes.

Thanks all for any help.

This question is related to php file memory text memory-leaks

The answer is


The most succinct cross-platform solution that only buffers one line at a time.

$file = new \SplFileObject(__FILE__);
$file->setFlags($file::READ_AHEAD);
$lines = iterator_count($file);

Unfortunately, we have to set the READ_AHEAD flag otherwise iterator_count blocks indefinitely. Otherwise, this would be a one-liner.


private static function lineCount($file) {
    $linecount = 0;
    $handle = fopen($file, "r");
    while(!feof($handle)){
        if (fgets($handle) !== false) {
                $linecount++;
        }
    }
    fclose($handle);
    return  $linecount;     
}

I wanted to add a little fix to the function above...

in a specific example where i had a file containing the word 'testing' the function returned 2 as a result. so i needed to add a check if fgets returned false or not :)

have fun :)


There is a faster way I found that does not require looping through the entire file

only on *nix systems, there might be a similar way on windows ...

$file = '/path/to/your.file';

//Get number of lines
$totalLines = intval(exec("wc -l '$file'"));

public function quickAndDirtyLineCounter()
{
    echo "<table>";
    $folders = ['C:\wamp\www\qa\abcfolder\',
    ];
    foreach ($folders as $folder) {
        $files = scandir($folder);
        foreach ($files as $file) {
            if($file == '.' || $file == '..' || !file_exists($folder.'\\'.$file)){
                continue;
            }
                $handle = fopen($folder.'/'.$file, "r");
                $linecount = 0;
                while(!feof($handle)){
                    if(is_bool($handle)){break;}
                    $line = fgets($handle);
                    $linecount++;
                  }
                fclose($handle);
                echo "<tr><td>" . $folder . "</td><td>" . $file . "</td><td>" . $linecount . "</td></tr>";
            }
        }
        echo "</table>";
}

Using a loop of fgets() calls is fine solution and the most straightforward to write, however:

  1. even though internally the file is read using a buffer of 8192 bytes, your code still has to call that function for each line.

  2. it's technically possible that a single line may be bigger than the available memory if you're reading a binary file.

This code reads a file in chunks of 8kB each and then counts the number of newlines within that chunk.

function getLines($file)
{
    $f = fopen($file, 'rb');
    $lines = 0;

    while (!feof($f)) {
        $lines += substr_count(fread($f, 8192), "\n");
    }

    fclose($f);

    return $lines;
}

If the average length of each line is at most 4kB, you will already start saving on function calls, and those can add up when you process big files.

Benchmark

I ran a test with a 1GB file; here are the results:

             +-------------+------------------+---------+
             | This answer | Dominic's answer | wc -l   |
+------------+-------------+------------------+---------+
| Lines      | 3550388     | 3550389          | 3550388 |
+------------+-------------+------------------+---------+
| Runtime    | 1.055       | 4.297            | 0.587   |
+------------+-------------+------------------+---------+

Time is measured in seconds real time, see here what real means


For just counting the lines use:

$handle = fopen("file","r");
static $b = 0;
while($a = fgets($handle)) {
    $b++;
}
echo $b;

This is an addition to Wallace de Souza's solution

It also skips empty lines while counting:

function getLines($file)
{
    $file = new \SplFileObject($file, 'r');
    $file->setFlags(SplFileObject::READ_AHEAD | SplFileObject::SKIP_EMPTY | 
SplFileObject::DROP_NEW_LINE);
    $file->seek(PHP_INT_MAX);

    return $file->key() + 1; 
}

There is another answer that I thought might be a good addition to this list.

If you have perl installed and are able to run things from the shell in PHP:

$lines = exec('perl -pe \'s/\r\n|\n|\r/\n/g\' ' . escapeshellarg('largetextfile.txt') . ' | wc -l');

This should handle most line breaks whether from Unix or Windows created files.

TWO downsides (at least):

1) It is not a great idea to have your script so dependent upon the system its running on ( it may not be safe to assume Perl and wc are available )

2) Just a small mistake in escaping and you have handed over access to a shell on your machine.

As with most things I know (or think I know) about coding, I got this info from somewhere else:

John Reeve Article


If you're using PHP 5.5 you can use a generator. This will NOT work in any version of PHP before 5.5 though. From php.net:

"Generators provide an easy way to implement simple iterators without the overhead or complexity of implementing a class that implements the Iterator interface."

// This function implements a generator to load individual lines of a large file
function getLines($file) {
    $f = fopen($file, 'r');

    // read each line of the file without loading the whole file to memory
    while ($line = fgets($f)) {
        yield $line;
    }
}

// Since generators implement simple iterators, I can quickly count the number
// of lines using the iterator_count() function.
$file = '/path/to/file.txt';
$lineCount = iterator_count(getLines($file)); // the number of lines in the file

You have several options. The first is to increase the availble memory allowed, which is probably not the best way to do things given that you state the file can get very large. The other way is to use fgets to read the file line by line and increment a counter, which should not cause any memory issues at all as only the current line is in memory at any one time.


Counting the number of lines can be done by following codes:

<?php
$fp= fopen("myfile.txt", "r");
$count=0;
while($line = fgetss($fp)) // fgetss() is used to get a line from a file ignoring html tags
$count++;
echo "Total number of lines  are ".$count;
fclose($fp);
?>

I use this method for purely counting how many lines in a file. What is the downside of doing this verses the other answers. I'm seeing many lines as opposed to my two line solution. I'm guessing there's a reason nobody does this.

$lines = count(file('your.file'));
echo $lines;

If you're running this on a Linux/Unix host, the easiest solution would be to use exec() or similar to run the command wc -l $path. Just make sure you've sanitized $path first to be sure that it isn't something like "/path/to/file ; rm -rf /".


Based on dominic Rodger's solution, here is what I use (it uses wc if available, otherwise fallbacks to dominic Rodger's solution).

class FileTool
{

    public static function getNbLines($file)
    {
        $linecount = 0;

        $m = exec('which wc');
        if ('' !== $m) {
            $cmd = 'wc -l < "' . str_replace('"', '\\"', $file) . '"';
            $n = exec($cmd);
            return (int)$n + 1;
        }


        $handle = fopen($file, "r");
        while (!feof($handle)) {
            $line = fgets($handle);
            $linecount++;
        }
        fclose($handle);
        return $linecount;
    }
}

https://github.com/lingtalfi/Bat/blob/master/FileTool.php


Simple Oriented Object solution

$file = new \SplFileObject('file.extension');

while($file->valid()) $file->fgets();

var_dump($file->key());

Update

Another way to make this is with PHP_INT_MAX in SplFileObject::seek method.

$file = new \SplFileObject('file.extension', 'r');
$file->seek(PHP_INT_MAX);

echo $file->key() + 1; 

If you're under linux you can simply do:

number_of_lines = intval(trim(shell_exec("wc -l ".$file_name." | awk '{print $1}'")));

You just have to find the right command if you're using another OS

Regards


Examples related to php

I am receiving warning in Facebook Application using PHP SDK Pass PDO prepared statement to variables Parse error: syntax error, unexpected [ Preg_match backtrack error Removing "http://" from a string How do I hide the PHP explode delimiter from submitted form results? Problems with installation of Google App Engine SDK for php in OS X Laravel 4 with Sentry 2 add user to a group on Registration php & mysql query not echoing in html with tags? How do I show a message in the foreach loop?

Examples related to file

Gradle - Move a folder from ABC to XYZ Difference between opening a file in binary vs text Angular: How to download a file from HttpClient? Python error message io.UnsupportedOperation: not readable java.io.FileNotFoundException: class path resource cannot be opened because it does not exist Writing JSON object to a JSON file with fs.writeFileSync How to read/write files in .Net Core? How to write to a CSV line by line? Writing a dictionary to a text file? What are the pros and cons of parquet format compared to other formats?

Examples related to memory

How does the "view" method work in PyTorch? How do I release memory used by a pandas dataframe? How to solve the memory error in Python Docker error : no space left on device Default Xmxsize in Java 8 (max heap size) How to set Apache Spark Executor memory What is the best way to add a value to an array in state How do I read a large csv file with pandas? How to clear variables in ipython? Error occurred during initialization of VM Could not reserve enough space for object heap Could not create the Java virtual machine

Examples related to text

Difference between opening a file in binary vs text How do I center text vertically and horizontally in Flutter? How to `wget` a list of URLs in a text file? Convert txt to csv python script Reading local text file into a JavaScript array Python: How to increase/reduce the fontsize of x and y tick labels? How can I insert a line break into a <Text> component in React Native? How to split large text file in windows? Copy text from nano editor to shell Atom menu is missing. How do I re-enable

Examples related to memory-leaks

AngularJS - Does $destroy remove event listeners? Implementing IDisposable correctly Best way to increase heap size in catalina.bat file If a DOM Element is removed, are its listeners also removed from memory? Resource leak: 'in' is never closed android activity has leaked window com.android.internal.policy.impl.phonewindow$decorview Issue This Handler class should be static or leaks might occur: IncomingHandler possible EventEmitter memory leak detected performSelector may cause a leak because its selector is unknown How can I create a memory leak in Java?