How to process a file in PowerShell line-by-line as a stream

Question

I'm working with some multi-gigabyte text files and want to do some stream processing on them using PowerShell. It's simple stuff: just parsing each line and pulling out some data, then storing it in a database.

Unfortunately, get-content | whatever appears to keep the entire set of lines at this stage of the pipe in memory. It's also surprisingly slow, taking a very long time to actually read it all in.

So my question is two parts:

1. How can I make it process the stream line by line and not keep the entire thing buffered in memory? I would like to avoid using up several gigs of RAM for this purpose.
2. How can I make it run faster? PowerShell iterating over a get-content appears to be 100x slower than a C# script.

I'm hoping there's something dumb I'm doing here, like missing a -LineBufferSize parameter or something.

User · Accepted Answer

If you are really going to work on multi-gigabyte text files, then do not use PowerShell. Even if you find a way to read them faster, processing a huge number of lines will be slow in PowerShell anyway, and you cannot avoid this. Even simple loops are expensive. For 10 million iterations (quite realistic in your case), we have:

# "empty" loop: takes 10 seconds
measure-command { for($i=0; $i -lt 10000000; ++$i) {} }

# "simple" job, just output: takes 20 seconds
measure-command { for($i=0; $i -lt 10000000; ++$i) { $i } }

# "more real job": 107 seconds
measure-command { for($i=0; $i -lt 10000000; ++$i) { $i.ToString() -match '1' } }
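Loop timings like these depend heavily on hardware and PowerShell version. To compare two ways of reading lines on your own setup, a rough benchmark like the following may help. It is a sketch: the temp file name and its 200,000-line size are arbitrary, and no timings are shown because they will differ from machine to machine.

```powershell
# Throwaway test file; the name and the 200,000-line size are arbitrary.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'linestream-bench.txt'
$lines = foreach ($i in 1..200000) { "row $i" }
[System.IO.File]::WriteAllLines($path, [string[]]$lines)

# 1) Cmdlet pipeline: Get-Content emits lines one at a time to ForEach-Object.
$t1 = Measure-Command {
    $script:count1 = 0
    Get-Content -Path $path | ForEach-Object { $script:count1++ }
}

# 2) .NET enumerator consumed by the foreach statement (needs .NET 4.0+).
$t2 = Measure-Command {
    $script:count2 = 0
    foreach ($line in [System.IO.File]::ReadLines($path)) { $script:count2++ }
}

Remove-Item $path
"Get-Content | ForEach-Object : {0:N2} s, {1} lines" -f $t1.TotalSeconds, $script:count1
"File.ReadLines + foreach     : {0:N2} s, {1} lines" -f $t2.TotalSeconds, $script:count2
```

Both variants read the file incrementally; the difference being measured is mostly the per-line overhead of the cmdlet pipeline versus a plain foreach statement.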

UPDATE: If you are still not scared, then try using the .NET reader:

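# Convert-Path resolves the path against PowerShell's current location;
# .NET would otherwise resolve it against the process working directory.
$reader = [System.IO.File]::OpenText((Convert-Path "my.log"))
try {
    # for() with empty clauses loops until 'break'
    for() {
        $line = $reader.ReadLine()
        # ReadLine returns $null at end of file
        if ($null -eq $line) { break }
        # process the line
        $line
    }
}
finally {
    # Always release the file handle, even if processing throws
    $reader.Close()
}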
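On the -LineBufferSize idea from the question: Get-Content has no such parameter, but the StreamReader constructor does accept a bufferSize argument. The sketch below is a variation of the loop above that uses it and also parses each line. The 1MB buffer, the UTF-8 encoding, and the tab-separated key/value line format are assumptions, and parsed records are only counted rather than stored in a database.

```powershell
# Hypothetical input: tab-separated "key<TAB>value" lines in a temp file.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'reader-test.txt'
[System.IO.File]::WriteAllLines($path, [string[]]@("a`t1", "b`t2", "malformed", "c`t3"))

# StreamReader(path, encoding, detectEncodingFromByteOrderMarks, bufferSize);
# the 1MB buffer is an arbitrary choice, not a tuned value.
$bufferSize = 1MB
$reader = New-Object System.IO.StreamReader -ArgumentList $path, ([System.Text.Encoding]::UTF8), $true, $bufferSize
$parsed = 0
$skipped = 0
try {
    while ($null -ne ($line = $reader.ReadLine())) {
        $fields = $line -split "`t"
        # Skip lines that do not have exactly two fields
        if ($fields.Count -ne 2) { $skipped++; continue }
        # A real script would send $fields[0] and $fields[1] to the database here.
        $parsed++
    }
}
finally {
    $reader.Dispose()
}
Remove-Item $path
"parsed=$parsed skipped=$skipped"   # parsed=3 skipped=1
```

A larger buffer mainly reduces the number of underlying read calls; it does not change the fact that only one line at a time is held by the loop.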

UPDATE 2

Some comments suggested better or shorter code. There is nothing wrong with the original for loop, and it is not pseudo-code. But the shorter (shortest?) variant of the reading loop is:

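$reader = [System.IO.File]::OpenText((Convert-Path "my.log"))
# The assignment inside the condition reads the next line; $null means end of file
while($null -ne ($line = $reader.ReadLine())) {
    $line
}
$reader.Close()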
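If the loop is needed in several places, one option is to wrap it in a function that writes each line to the pipeline, so it can be combined with ordinary cmdlets while still reading one line at a time. This is a minimal sketch: Read-FileLines is a hypothetical name (not a built-in cmdlet), and the temp file exists only for the usage example.

```powershell
# Read-FileLines is a hypothetical helper name, not a built-in cmdlet.
function Read-FileLines {
    param(
        [Parameter(Mandatory = $true)]
        [string]$Path
    )
    # Resolve against PowerShell's current location, not the process directory.
    $fullPath = Convert-Path -LiteralPath $Path
    $reader = [System.IO.File]::OpenText($fullPath)
    try {
        while ($null -ne ($line = $reader.ReadLine())) {
            $line   # emit each line to the pipeline as soon as it is read
        }
    }
    finally {
        $reader.Close()
    }
}

# Usage with a throwaway temp file.
$tmp = Join-Path ([System.IO.Path]::GetTempPath()) 'read-filelines-test.txt'
[System.IO.File]::WriteAllLines($tmp, [string[]]@('alpha', 'beta', 'gamma'))
$upper = @(Read-FileLines -Path $tmp | ForEach-Object { $_.ToUpper() })
Remove-Item $tmp
$upper -join ','   # ALPHA,BETA,GAMMA
```

Note that collecting the results into $upper is only done here for the example; in real use the downstream commands would process each line and discard it.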

User · Answer

If you want to use straight PowerShell, check out the code below.

$content = Get-Content C:\Users\You\Documents\test.txt
foreach ($line in $content)
{
    Write-Host $line
}
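One caveat with the code above: assigning the output of Get-Content to $content collects every line into an array before the loop starts, which is the memory use the question wants to avoid. A sketch of two pipeline alternatives follows; the temp file and the -ReadCount value of 1000 are arbitrary assumptions. Piping straight into ForEach-Object handles lines as they are read, and -ReadCount (a real Get-Content parameter) passes lines down the pipeline in arrays, which usually reduces per-object pipeline overhead.

```powershell
# Throwaway test file with 2,500 lines (arbitrary size).
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'getcontent-test.txt'
[System.IO.File]::WriteAllLines($path, [string[]](1..2500 | ForEach-Object { "line $_" }))

# Streaming: each line is passed down the pipeline as soon as it is read,
# instead of being collected into a variable first.
$streamed = 0
Get-Content -Path $path | ForEach-Object { $streamed++ }

# Batched: with -ReadCount 1000, each $_ is an array of up to 1000 lines.
$batches = 0
$batched = 0
Get-Content -Path $path -ReadCount 1000 | ForEach-Object {
    $batches++
    foreach ($line in $_) { $batched++ }
}

Remove-Item $path
"streamed=$streamed batched=$batched batches=$batches"
```

With -ReadCount, memory use is bounded by the batch size rather than the file size, so the value is a trade-off between overhead and memory.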

User · Answer

[System.IO.File]::ReadLines() is perfect for this scenario. It returns all the lines of a file, but lets you begin iterating over the lines immediately, which means it does not have to store the entire contents in memory. Requires .NET 4.0 or higher.

foreach ($line in [System.IO.File]::ReadLines($filename)) {
    # do something with $line
}

http://msdn.microsoft.com/en-us/library/dd383503.aspx
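A sketch of that pattern doing some actual work: it tallies lines by their first word, so memory grows with the number of distinct words rather than the number of lines. The sample file and its log-level format are hypothetical. The path also goes through Convert-Path, because .NET methods resolve relative paths against the process working directory, which can differ from PowerShell's current location.

```powershell
# Hypothetical sample file in a log-like "LEVEL message" format.
$file = Join-Path ([System.IO.Path]::GetTempPath()) 'readlines-test.txt'
[System.IO.File]::WriteAllLines($file, [string[]]@('INFO start', 'ERROR disk full', 'INFO retry', 'ERROR timeout', 'WARN slow'))

# Resolve to a full path before handing it to .NET.
$fullPath = Convert-Path -LiteralPath $file

# Only the per-level counts are kept; each line is discarded after it is read.
$countsByLevel = @{}
foreach ($line in [System.IO.File]::ReadLines($fullPath)) {
    $level = ($line -split ' ', 2)[0]
    # A missing key yields $null, and [int]$null is 0
    $countsByLevel[$level] = 1 + [int]$countsByLevel[$level]
}
Remove-Item $file
$countsByLevel
```

The loop reads the file to the end, at which point the enumerator returned by ReadLines closes it, so the file can be deleted afterwards.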
