[bash] Extract data from log file in specified range of time

I want to extract information from a log file using a shell script (bash) based on time range. A line in the log file looks like this:

172.16.0.3 - - [31/Mar/2002:19:30:41 +0200] "GET / HTTP/1.1" 200 123 "" "Mozilla/5.0 (compatible; Konqueror/2.2.2-2; Linux)"

i want to extract data specific intervals. For example I need to look only at the events which happened during the last X minutes or X days ago from the last recorded data. I'm new in shell scripting but i have tried to use grep command.

This question is related to bash

The answer is


I used this command to find last 5 minutes logs for particular event "DHCPACK", try below:

$ grep "DHCPACK" /var/log/messages | grep "$(date +%h\ %d) [$(date --date='5 min ago' %H)-$(date +%H)]:*:*"

Use grep and regular expressions, for example if you want 4 minutes interval of logs:

grep "31/Mar/2002:19:3[1-5]" logfile

will return all logs lines between 19:31 and 19:35 on 31/Mar/2002. Supposing you need the last 5 days starting from today 27/Sep/2011 you may use the following:

grep "2[3-7]/Sep/2011" logfile

You can use sed for this. For example:

$ sed -n '/Feb 23 13:55/,/Feb 23 14:00/p' /var/log/mail.log
Feb 23 13:55:01 messagerie postfix/smtpd[20964]: connect from localhost[127.0.0.1]
Feb 23 13:55:01 messagerie postfix/smtpd[20964]: lost connection after CONNECT from localhost[127.0.0.1]
Feb 23 13:55:01 messagerie postfix/smtpd[20964]: disconnect from localhost[127.0.0.1]
Feb 23 13:55:01 messagerie pop3d: Connection, ip=[::ffff:127.0.0.1]
...

How it works

The -n switch tells sed to not output each line of the file it reads (default behaviour).

The last p after the regular expressions tells it to print lines that match the preceding expression.

The expression '/pattern1/,/pattern2/' will print everything that is between first pattern and second pattern. In this case it will print every line it finds between the string Feb 23 13:55 and the string Feb 23 14:00.

More info here.


You can use this for getting current and log times:

#!/bin/bash

log="log_file_name"
while read line
do
  current_hours=`date | awk 'BEGIN{FS="[ :]+"}; {print $4}'`
  current_minutes=`date | awk 'BEGIN{FS="[ :]+"}; {print $5}'`
  current_seconds=`date | awk 'BEGIN{FS="[ :]+"}; {print $6}'`

  log_file_hours=`echo $line | awk 'BEGIN{FS="[ [/:]+"}; {print  $7}'`
  log_file_minutes=`echo $line | awk 'BEGIN{FS="[ [/:]+"}; {print  $8}'`
  log_file_seconds=`echo $line | awk 'BEGIN{FS="[ [/:]+"}; {print  $9}'`    
done < $log

And compare log_file_* and current_* variables.


well, I have spent some time on your date format.....

however, finally i worked it out..

let's take an example file (named logFile), i made it a bit short. say, you want to get last 5 mins' log in this file:

172.16.0.3 - - [31/Mar/2002:19:20:41 +0200] "GET 
172.16.0.3 - - [31/Mar/2002:19:20:41 +0200] "GET 
172.16.0.3 - - [31/Mar/2002:19:20:41 +0200] "GET 
172.16.0.3 - - [31/Mar/2002:19:20:41 +0200] "GET 
172.16.0.3 - - [31/Mar/2002:19:20:41 +0200] "GET 
172.16.0.3 - - [31/Mar/2002:19:20:41 +0200] "GET 
172.16.0.3 - - [31/Mar/2002:19:20:41 +0200] "GET 
172.16.0.3 - - [31/Mar/2002:19:20:41 +0200] "GET 
172.16.0.3 - - [31/Mar/2002:19:20:41 +0200] "GET 
172.16.0.3 - - [31/Mar/2002:19:20:41 +0200] "GET 
172.16.0.3 - - [31/Mar/2002:19:20:41 +0200] "GET 
172.16.0.3 - - [31/Mar/2002:19:20:41 +0200] "GET 
172.16.0.3 - - [31/Mar/2002:19:20:41 +0200] "GET 
### lines below are what you want (5 mins till the last record)
172.16.0.3 - - [31/Mar/2002:19:27:41 +0200] "GET 
172.16.0.3 - - [31/Mar/2002:19:27:41 +0200] "GET 
172.16.0.3 - - [31/Mar/2002:19:27:41 +0200] "GET 
172.16.0.3 - - [31/Mar/2002:19:27:41 +0200] "GET 
172.16.0.3 - - [31/Mar/2002:19:27:41 +0200] "GET 
172.16.0.3 - - [31/Mar/2002:19:27:41 +0200] "GET 
172.16.0.3 - - [31/Mar/2002:19:27:41 +0200] "GET 
172.16.0.3 - - [31/Mar/2002:19:27:41 +0200] "GET 
172.16.0.3 - - [31/Mar/2002:19:27:41 +0200] "GET 
172.16.0.3 - - [31/Mar/2002:19:27:41 +0200] "GET 
172.16.0.3 - - [31/Mar/2002:19:27:41 +0200] "GET 
172.16.0.3 - - [31/Mar/2002:19:27:41 +0200] "GET 
172.16.0.3 - - [31/Mar/2002:19:27:41 +0200] "GET 
172.16.0.3 - - [31/Mar/2002:19:27:41 +0200] "GET 
172.16.0.3 - - [31/Mar/2002:19:30:41 +0200] "GET 
172.16.0.3 - - [31/Mar/2002:19:30:41 +0200] "GET 
172.16.0.3 - - [31/Mar/2002:19:30:41 +0200] "GET 
172.16.0.3 - - [31/Mar/2002:19:30:41 +0200] "GET 

here is the solution:

# this variable you could customize, important is convert to seconds. 
# e.g 5days=$((5*24*3600))
x=$((5*60))   #here we take 5 mins as example

# this line get the timestamp in seconds of last line of your logfile
last=$(tail -n1 logFile|awk -F'[][]' '{ gsub(/\//," ",$2); sub(/:/," ",$2); "date +%s -d \""$2"\""|getline d; print d;}' )

#this awk will give you lines you needs:
awk -F'[][]' -v last=$last -v x=$x '{ gsub(/\//," ",$2); sub(/:/," ",$2); "date +%s -d \""$2"\""|getline d; if (last-d<=x)print $0 }' logFile      

output:

172.16.0.3 - -  31 Mar 2002 19:27:41 +0200  "GET 
172.16.0.3 - -  31 Mar 2002 19:27:41 +0200  "GET 
172.16.0.3 - -  31 Mar 2002 19:27:41 +0200  "GET 
172.16.0.3 - -  31 Mar 2002 19:27:41 +0200  "GET 
172.16.0.3 - -  31 Mar 2002 19:27:41 +0200  "GET 
172.16.0.3 - -  31 Mar 2002 19:27:41 +0200  "GET 
172.16.0.3 - -  31 Mar 2002 19:27:41 +0200  "GET 
172.16.0.3 - -  31 Mar 2002 19:27:41 +0200  "GET 
172.16.0.3 - -  31 Mar 2002 19:27:41 +0200  "GET 
172.16.0.3 - -  31 Mar 2002 19:27:41 +0200  "GET 
172.16.0.3 - -  31 Mar 2002 19:27:41 +0200  "GET 
172.16.0.3 - -  31 Mar 2002 19:27:41 +0200  "GET 
172.16.0.3 - -  31 Mar 2002 19:27:41 +0200  "GET 
172.16.0.3 - -  31 Mar 2002 19:27:41 +0200  "GET 
172.16.0.3 - -  31 Mar 2002 19:30:41 +0200  "GET 
172.16.0.3 - -  31 Mar 2002 19:30:41 +0200  "GET 
172.16.0.3 - -  31 Mar 2002 19:30:41 +0200  "GET 
172.16.0.3 - -  31 Mar 2002 19:30:41 +0200  "GET

EDIT

you may notice that in the output the [ and ] are disappeared. If you do want them back, you can change the last awk line print $0 -> print $1 "[" $2 "]" $3