[powershell] Remove Top Line of Text File with PowerShell

I am trying to just remove the first line of about 5000 text files before importing them.

I am still very new to PowerShell so not sure what to search for or how to approach this. My current concept using pseudo-code:

set-content file (get-content unless line contains amount)

However, I can't seem to figure out how to do something like contains.

The answer is


I really admire the answer from @hoge, both for its very concise technique and for the wrapper function that generalizes it, and I encourage upvotes for it. Still, I am compelled to comment on the other two answers that use temp files (it gnaws at me like fingernails on a chalkboard!).

Assuming the file is not huge, you can force the pipeline to operate in discrete sections--thereby obviating the need for a temp file--with judicious use of parentheses:

(Get-Content $file | Select-Object -Skip 1) | Set-Content $file

... or in short form:

(gc $file | select -Skip 1) | sc $file
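For the batch case in the question, the same parenthesized pipeline can be run once per file. This is only a sketch: the folder in `$folder` and the sample files are made up for illustration, and it assumes each file is small enough to read into memory and that the default `Set-Content` encoding is acceptable.

```powershell
# Hypothetical folder holding the text files; adjust to your own location.
$folder = Join-Path ([System.IO.Path]::GetTempPath()) 'remove-top-line-demo'
New-Item -ItemType Directory -Path $folder -Force | Out-Null

# Create two small sample files so the sketch is self-contained.
'header', 'row 1', 'row 2' | Set-Content -LiteralPath (Join-Path $folder 'a.txt')
'header', 'row 3', 'row 4' | Set-Content -LiteralPath (Join-Path $folder 'b.txt')

Get-ChildItem -Path $folder -Filter *.txt | ForEach-Object {
    $file = $_.FullName
    # The parentheses make Get-Content read the whole file (and release it)
    # before Set-Content opens the same file for writing.
    (Get-Content -LiteralPath $file | Select-Object -Skip 1) |
        Set-Content -LiteralPath $file
}
```

Without the parentheses, `Set-Content` would try to open the file while `Get-Content` is still reading it.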

I just learned from a website:

Get-ChildItem *.txt | ForEach-Object { (Get-Content $_) | Where-Object { (1) -notcontains $_.ReadCount } | Set-Content -Path $_ }

Or you can use the aliases to make it short, like:

gci *.txt | % { (gc $_) | ? { (1) -notcontains $_.ReadCount } | sc -path $_ }
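The filter above relies on the `ReadCount` property that `Get-Content` attaches to each line it emits (1 for the first line when lines are read one at a time, which is the default). A small sketch of that behavior, using a made-up sample file in the temp folder:

```powershell
# Hypothetical sample file in the temp folder.
$file = Join-Path ([System.IO.Path]::GetTempPath()) 'readcount-demo.txt'
'header', 'row 1', 'row 2' | Set-Content -LiteralPath $file

# Each line from Get-Content carries its 1-based line number in ReadCount.
$numbered = Get-Content -LiteralPath $file |
    ForEach-Object { "$($_.ReadCount): $_" }
$numbered

# Keeping only lines whose ReadCount is greater than 1 drops the first line.
$kept = Get-Content -LiteralPath $file | Where-Object { $_.ReadCount -gt 1 }
$kept
```

Writing the test as `$_.ReadCount -gt 1` is equivalent to `(1) -notcontains $_.ReadCount` here; the `-notcontains` form is mainly handy when several line numbers should be dropped.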

Another approach to removing the first line from a file uses the multiple assignment technique:

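# The first variable receives line 1; the second receives all remaining lines
$firstLine, $restOfDocument = Get-Content -Path $filename
# Set-Content writes one line per element, so Out-String is not needed
# (piping through Out-String first leaves an extra blank line at the end)
$restOfDocument | Set-Content -Path $filename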
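To see what multiple assignment does on its own, here is a tiny illustration with an in-memory array instead of a file (the variable names are just for the example):

```powershell
# Multiple assignment: the first variable takes the first element,
# and the last variable takes everything that remains.
$lines = 'header', 'row 1', 'row 2'
$firstLine, $rest = $lines

$firstLine   # header
$rest        # row 1, row 2 (one per line)
```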

$x = get-content $file
$x[1..$x.count] | set-content $file

Just that much. Long boring explanation follows. Get-Content returns an array of lines. We can "index into" array variables, as demonstrated in this and other Scripting Guys posts.

For example, if we define an array variable like this,

$array = @("first item","second item","third item")

so $array returns

first item
second item
third item

then we can "index into" that array to retrieve only its 1st element

$array[0]

or only its 2nd

$array[1]

or a range of index values from the 2nd through the last (the index $array.count is one past the end, and the slice simply ignores it).

$array[1..$array.count]
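Putting the pieces together, a short runnable sketch of the slice, assuming default settings (no strict mode); `Select-Object -Skip 1` is shown alongside only for comparison:

```powershell
$array = @("first item", "second item", "third item")

# $array.Count is 3, so 1..$array.Count expands to the indexes 1, 2, 3.
# Index 3 is one past the end; the slice returns nothing for it.
$rest = $array[1..$array.Count]
$rest    # second item, third item (one per line)

# Select-Object -Skip 1 gives the same result without index arithmetic.
$same = $array | Select-Object -Skip 1
```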

For smaller files you could use this:

& C:\windows\system32\more +1 oldfile.csv > newfile.csv | out-null

... but it's not very effective on my 16 MB example file: it doesn't seem to terminate or release the lock on newfile.csv.


Using variable notation, you can do it without a temporary file:

${C:\file.txt} = ${C:\file.txt} | select -skip 1

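function Remove-Topline ( [string[]]$path, [int]$skip=1 ) {
  # Test-Path returns one Boolean per path, so check every result
  if ( (Test-Path $path -PathType Leaf) -contains $false ) {
    throw "invalid filename"
  }

  # For each file, build and run: ${fullname} = ${fullname} | select -skip N
  ls $path |
    % { iex "`${$($_.fullname)} = `${$($_.fullname)} | select -skip $skip" }
}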

I just had to do the same task, and gc | select ... | sc took over 4 GB of RAM on my machine while reading a 1.6 GB file. It didn't finish for at least 20 minutes after reading the whole file in (as reported by Read Bytes in Process Explorer), at which point I had to kill it.

My solution was to use a more .NET-style approach: StreamReader + StreamWriter. For a great discussion of the performance, see this answer: In Powershell, what's the most efficient way to split a large text file by record type?

Below is my solution. Yes, it uses a temporary file, but in my case, it didn't matter (it was a freaking huge SQL table creation and insert statements file):

PS> (measure-command{
    $i = 0
    $ins = New-Object System.IO.StreamReader "in/file/pa.th"
    $outs = New-Object System.IO.StreamWriter "out/file/pa.th"
    while( !$ins.EndOfStream ) {
        $line = $ins.ReadLine();
        if( $i -ne 0 ) {
            $outs.WriteLine($line);
        }
        $i = $i+1;
    }
    $outs.Close();
    $ins.Close();
}).TotalSeconds

It returned:

188.1224443

Inspired by AASoft's answer, I set out to improve it a bit more:

  1. Avoid the loop variable $i and the comparison with 0 on every iteration
  2. Wrap the execution into a try..finally block to always close the files in use
  3. Make the solution work for an arbitrary number of lines to remove from the beginning of the file
  4. Use a variable $p to reference the current directory

These changes lead to the following code:

$p = (Get-Location).Path

(Measure-Command {
    # Number of lines to skip
    $skip = 1
    $ins = New-Object System.IO.StreamReader ($p + "\test.log")
    $outs = New-Object System.IO.StreamWriter ($p + "\test-1.log")
    try {
        # Skip the first N lines, but allow for fewer than N, as well
        for( $s = 1; $s -le $skip -and !$ins.EndOfStream; $s++ ) {
            # Discard the line so it is not emitted as output
            $null = $ins.ReadLine()
        }
        while( !$ins.EndOfStream ) {
            $outs.WriteLine( $ins.ReadLine() )
        }
    }
    finally {
        $outs.Close()
        $ins.Close()
    }
}).TotalSeconds

The first change brought the processing time for my 60 MB file down from 5.3s to 4s. The rest of the changes are more cosmetic.
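For running this over many files, the streaming idea could be wrapped in a function that writes to a scratch file and then replaces the original. The following is only a sketch under several assumptions: `Remove-FirstLines` is a hypothetical name, the scratch file `<path>.tmp` is assumed to be free to use, and the default StreamReader/StreamWriter encodings are assumed to suit the files.

```powershell
function Remove-FirstLines {
    # Hypothetical helper; not part of PowerShell or the answers above.
    param(
        [Parameter(Mandatory)] [string] $Path,
        [int] $Skip = 1
    )

    # .NET resolves relative paths against the process working directory,
    # not the PowerShell location, so work with a full path.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $tempPath = "$fullPath.tmp"   # assumed scratch file next to the original

    $reader = New-Object System.IO.StreamReader $fullPath
    try {
        $writer = New-Object System.IO.StreamWriter $tempPath
        try {
            # Read and discard up to $Skip lines.
            for ($s = 0; $s -lt $Skip -and -not $reader.EndOfStream; $s++) {
                $null = $reader.ReadLine()
            }
            # Copy the remaining lines one at a time.
            while (-not $reader.EndOfStream) {
                $writer.WriteLine($reader.ReadLine())
            }
        }
        finally { $writer.Close() }
    }
    finally { $reader.Close() }

    # Replace the original only after both streams are closed.
    Move-Item -LiteralPath $tempPath -Destination $fullPath -Force
}

# Example: create a sample file and strip its first line.
$demo = Join-Path ([System.IO.Path]::GetTempPath()) 'remove-first-lines-demo.txt'
'header', 'row 1', 'row 2' | Set-Content -LiteralPath $demo
Remove-FirstLines -Path $demo -Skip 1
```

Opening the writer inside the outer try means the reader is still closed if the output file cannot be created.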


skip didn't work, so my workaround is:

$LinesCount = $(get-content $file).Count
get-content $file |
    select -Last $($LinesCount-1) | 
    set-content "$file-temp"
move "$file-temp" $file -Force