[bash] How to get the part of a file after the first line that matches a regular expression?

I have a file with about 1000 lines. I want the part of my file after the line which matches my grep statement.

That is:

$ cat file | grep 'TERMINATE'     # It is found on line 534

So, I want the file from line 535 to line 1000 for further processing.

How can I do that?


The answer is


If, for any reason, you want to avoid using sed, the following will print from the first line matching TERMINATE to the end of the file:

tail -n "+$(grep -n 'TERMINATE' file | head -n 1 | cut -d ":" -f 1)" file

and the following will print from the line after the first line matching TERMINATE to the end of the file:

tail -n "+$(($(grep -n 'TERMINATE' file | head -n 1 | cut -d ":" -f 1)+1))" file

This takes several processes to do what sed can do in one, and if the file changes between the execution of grep and tail, the result can be incoherent, so I recommend using sed. Moreover, if the file does not contain TERMINATE, the first command fails.
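The no-match case mentioned above can be guarded explicitly. The following is only a minimal sketch of the same grep and tail idea; the function name print_after_first_match is hypothetical and not part of the original answer. It prints nothing and returns a non-zero status when the pattern is absent.

```shell
# Hypothetical helper: print the lines after the first line matching a pattern.
# Usage: print_after_first_match PATTERN FILE
print_after_first_match() {
    # grep -n prefixes each match with "LINENO:"; keep only the first match's number.
    line=$(grep -n -e "$1" "$2" | head -n 1 | cut -d ':' -f 1)
    # No match: report failure instead of calling tail with an empty offset.
    [ -n "$line" ] || return 1
    # Start printing one line after the match.
    tail -n "+$((line + 1))" "$2"
}
```

Called as print_after_first_match 'TERMINATE' file, it still reads the file twice, so the caveat about the file changing between grep and tail applies here as well.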


These will print all lines from the last line matching TERMINATE to the end of the file:

LINE_NUMBER=$(grep -n 'TERMINATE' "$YOUR_FILE_NAME" | tail -n 1 | cut -d ':' -f 1)
tail -n "+$LINE_NUMBER" "$YOUR_FILE_NAME"

grep -A 10000000 'TERMINATE' file

  • This can be much faster than sed, especially on really big files. It prints up to 10M lines after the match (or whatever number you put in), so there is no harm in making the number big enough to handle anything you are likely to hit.

As a simple approximation you could use

grep -A100000 TERMINATE file

which greps for TERMINATE and outputs up to 100000 lines following that line.

From man page

-A NUM, --after-context=NUM

Print NUM lines of trailing context after matching lines. Places a line containing a group separator (--) between contiguous groups of matches. With the -o or --only-matching option, this has no effect and a warning is given.


If I understand your question correctly you do want the lines after TERMINATE, not including the TERMINATE-line. awk can do this in a simple way:

awk '{if(found) print} /TERMINATE/{found=1}' your_file

Explanation:

  1. Although it is not best practice, you can rely on the fact that all variables default to 0 or the empty string if not defined. So the first expression (if(found) print) will not print anything to start off with.
  2. After the printing is done, we check whether this is the starter-line (which should not be included).

This will print all lines after the TERMINATE-line.


Generalization:

  • You have a file with start- and end-lines and you want the lines between those lines excluding the start- and end-lines.
  • start- and end-lines could be defined by a regular expression matching the line.

Example:

$ cat ex_file.txt 
not this line
second line
START
A good line to include
And this line
Yep
END
Nope more
...
never ever
$ awk '/END/{found=0} {if(found) print} /START/{found=1}' ex_file.txt 
A good line to include
And this line
Yep
$

Explanation:

  1. If the end-line is found no printing should be done. Note that this check is done before the actual printing to exclude the end-line from the result.
  2. Print the current line if found is set.
  3. If the start-line is found then set found=1 so that the following lines are printed. Note that this check is done after the actual printing to exclude the start-line from the result.

Notes:

  • The code relies on the fact that all awk variables default to 0 or the empty string if not defined. This is valid but may not be best practice, so you could add a BEGIN{found=0} to the start of the awk-expression.
  • If multiple start-end-blocks are found, they are all printed.
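The three rules and the BEGIN{found=0} suggestion from the notes above can be wrapped up as a small function. This is only a sketch: the name print_between and the awk variable names start_re and end_re are hypothetical, and because awk -v interprets backslash escapes in the assigned value, patterns containing backslashes would need them doubled.

```shell
# Hypothetical helper: print the lines strictly between start-lines and end-lines.
# Usage: print_between START_REGEX END_REGEX FILE
print_between() {
    awk -v start_re="$1" -v end_re="$2" '
        BEGIN         { found = 0 }  # explicit initialisation
        $0 ~ end_re   { found = 0 }  # checked before printing, so the end-line is excluded
        found         { print }      # print only while inside a block
        $0 ~ start_re { found = 1 }  # checked after printing, so the start-line is excluded
    ' "$3"
}
```

With the example file above, print_between 'START' 'END' ex_file.txt prints the same three lines as the one-liner.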

This could be one way of doing it, if you know which line of the file contains your grep word and how many lines the file has:

grep -A466 'TERMINATE' file


sed is a much better tool for the job: sed -n '/re/,$p' file

where re is regexp.
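The sed command above includes the matching line itself. Since the question asks for the lines after it, one possible variant deletes everything from line 1 through the first match instead. The sketch below compares the two forms on a throwaway sample file (the file and variable names are made up for illustration). It assumes the match is not on line 1; GNU sed accepts 0,/re/ as a start address for that case, which is an extension and not portable.

```shell
# Build a small sample file (hypothetical content) to compare the two forms.
sample=$(mktemp)
printf 'one\nTERMINATE\ntwo\nthree\n' > "$sample"

# Includes the matching line: yields TERMINATE, two, three.
with_match=$(sed -n '/TERMINATE/,$p' "$sample")

# Excludes the matching line: deletes line 1 through the first match,
# leaving two, three.
# Caveat: if the match is on line 1, the range runs to the NEXT match
# (or to the end of the file) instead.
after_match=$(sed '1,/TERMINATE/d' "$sample")

printf '%s\n' "$after_match"
rm -f "$sample"
```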

Another option is grep's --after-context flag. You need to pass in a number of lines to print after the match; running wc -l on the file gives a value large enough to reach the end. Combine this with -n and your match expression.
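A rough sketch of that idea, using the file's own line count as the context size so no magic number is needed. The function name grep_to_end is hypothetical, -A is assumed to be available (it is in GNU and BSD grep, but it is not a POSIX option), and -n is left out so that the output contains only the lines themselves.

```shell
# Hypothetical helper: print the first matching line and everything after it.
# Usage: grep_to_end PATTERN FILE
grep_to_end() {
    # Arithmetic expansion strips the padding some wc implementations add.
    total=$(( $(wc -l < "$2") ))
    # The context is large enough to reach the end of the file from any match.
    grep -A "$total" -e "$1" -- "$2"
}
```

Because the context always reaches the end of the file, later matches fall inside the first match's context and no "--" group separators appear in the output.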


Alternatives to the excellent sed answer by jfgagne, and which don't include the matching line :


A tool to use here is awk:

awk 'BEGIN{found=0} /TERMINATE/{found=1} {if (found) print}' file

How does this work:

  1. We set the variable 'found' to zero, which evaluates to false.
  2. If a line matches the regular expression 'TERMINATE', we set it to one.
  3. If our 'found' variable evaluates to true, print :)

The other solutions might consume a lot of memory if you use them on very large files.


Use bash parameter expansion like the following:

content=$(cat file)
echo "${content#*TERMINATE}"
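Note that ${content#*TERMINATE} removes the shortest prefix ending in the first TERMINATE, so the output begins with whatever follows TERMINATE on that same line (often just an empty line). A minimal sketch that also drops the remainder of the matching line follows; the function name print_after_terminate is hypothetical, and the whole file is held in a shell variable, so this suits small files only.

```shell
# Hypothetical helper: print everything after the first line containing TERMINATE.
# Usage: print_after_terminate FILE
print_after_terminate() {
    content=$(cat "$1")
    # A literal newline, written portably (no $'\n' needed).
    newline='
'
    case $content in
        *TERMINATE*"$newline"*)
            rest=${content#*TERMINATE}   # drop up to and including the first TERMINATE
            rest=${rest#*"$newline"}     # drop the remainder of that line
            printf '%s\n' "$rest"
            ;;
    esac
    # Anything else (no match, or nothing after the matching line) prints nothing.
}
```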

There are many ways to do it with sed or awk:

sed -n '/TERMINATE/,$p' file

This looks for TERMINATE in your file and prints from that line up to the end of the file.

awk '/TERMINATE/,0' file

This is exactly the same behaviour as sed.

If you know the number of the line from which you want to start printing, you can specify it with NR (the number of the current record, which by default is the line number):

awk 'NR>=535' file

Example

$ seq 10 > a        #generate a file with one number per line, from 1 to 10
$ sed -n '/7/,$p' a
7
8
9
10
$ awk '/7/,0' a
7
8
9
10
$ awk 'NR>=7' a
7
8
9
10
