Efficiently finding the last line in a text file

Question

I need to extract the last line from a number of very large (several hundred megabyte) text files to get certain data. Currently, I am using Python to cycle through all the lines until the file is empty, and then I process the last line returned, but I am certain there is a more efficient way to do this.

What is the best way to retrieve just the last line of a text file using Python?

User · Answer

    # Reads the whole file into memory, so this is simple but not efficient.
    lines = fileHandle.readlines()
    fileHandle.close()
    last_line = lines[-1]

User · Answer

Use the file's seek method with a negative offset and whence=os.SEEK_END to read a block from the end of the file. Search that block for the last line-end character(s) and grab all the characters after it. If there is no line end, back up farther and repeat the process.

    import os

    def last_line(in_file, block_size=1024, ignore_ending_newline=False):
        # in_file must be opened in binary mode ('rb'): Python 3 does not
        # allow nonzero end-relative seeks in text mode. Returns bytes.
        suffix = b""
        in_file.seek(0, os.SEEK_END)
        in_file_length = in_file.tell()
        seek_offset = 0

        while -seek_offset < in_file_length:
            # Read from end.
            seek_offset -= block_size
            if -seek_offset > in_file_length:
                # Limit if we ran out of file (can't seek backward from start).
                block_size -= -seek_offset - in_file_length
                if block_size == 0:
                    break
                seek_offset = -in_file_length
            in_file.seek(seek_offset, os.SEEK_END)
            buf = in_file.read(block_size)

            # Search for line end.
            if ignore_ending_newline and seek_offset == -block_size and buf.endswith(b"\n"):
                buf = buf[:-1]
            pos = buf.rfind(b"\n")
            if pos != -1:
                # Found line end.
                return buf[pos + 1:] + suffix

            suffix = buf + suffix

        # One-line file.
        return suffix

Note that this will not work on things that don't support seek, like stdin or sockets. In those cases, you're stuck reading the whole thing (like the tail command does).

User · Answer

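    #!/usr/bin/python

    # First pass: print every line and count them.
    count = 0
    f = open('last_line1', 'r')
    for line in f.readlines():
        line = line.strip()
        count = count + 1
        print(line)
    print(count)
    f.close()

    # Second pass: print only the line whose number equals the total count.
    count1 = 0
    h = open('last_line1', 'r')
    for line in h.readlines():
        line = line.strip()
        count1 = count1 + 1
        if count1 == count:
            print(line)  # -------------------- this is the last line
    h.close()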

User · Answer

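Not the straightforward way, but probably much faster than a simple Python implementation:

    import subprocess

    # Requires the tail command to be available; the result is bytes in Python 3.
    line = subprocess.check_output(['tail', '-1', filename])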

User · Answer

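If you can pick a reasonable maximum line length, you can seek to nearly the end of the file before you start reading:

    import os

    # myfile must be opened in binary mode for a nonzero end-relative seek in
    # Python 3, and must be at least max_line_length bytes long.
    myfile.seek(-max_line_length, os.SEEK_END)
    line = myfile.readlines()[-1]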

User · Answer

Could you load the file into a mmap, then use mmap.rfind(string[, start[, end]]) to find the second-last EOL character in the file? A seek to that point in the file should point you to the last line, I would think.
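The mmap idea above could look roughly like the following. This is only a sketch under some assumptions: the function name last_line_mmap is made up for illustration, line endings are assumed to be "\n", and the result is returned as bytes without the trailing newline.

```python
import mmap


def last_line_mmap(path):
    """Return the last line of the file at `path` as bytes (no trailing newline).

    Hypothetical helper for illustration; assumes Unix-style "\n" line endings.
    """
    with open(path, "rb") as f:
        try:
            mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        except ValueError:
            # mmap refuses to map an empty file.
            return b""
        with mm:
            end = len(mm)
            # Skip one trailing newline so the search lands on the newline
            # that precedes the last line (the "second-last EOL").
            if mm[end - 1:end] == b"\n":
                end -= 1
            # rfind returns -1 when there is no earlier newline, so start is 0.
            start = mm.rfind(b"\n", 0, end) + 1
            return mm[start:end]
```

Because the file is memory-mapped, only the pages actually touched by rfind need to be read, so for a file with lines of ordinary length this should stay close to the cost of reading the tail of the file.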

User · Answer

Seek to the end of the file minus 100 bytes or so. Do a read and search for a newline. If there is no newline, seek back another 100 bytes or so. Lather, rinse, repeat. Eventually you'll find a newline. The last line begins immediately after that newline.

Best case scenario, you only do one read of 100 bytes.
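The loop described above can be sketched as follows. The name last_line_stepwise and the step parameter are assumptions made for illustration, and for simplicity this version re-reads the whole (growing) tail on each pass rather than only the newly uncovered bytes.

```python
import os


def last_line_stepwise(path, step=100):
    """Return the last line of `path` as bytes, without its trailing newline.

    Hypothetical helper for illustration; assumes "\n" line endings.
    """
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        window = min(step, size)  # how many bytes from the end to inspect
        while True:
            f.seek(size - window)
            tail = f.read(window)
            # Ignore a single newline that terminates the file.
            body = tail[:-1] if tail.endswith(b"\n") else tail
            pos = body.rfind(b"\n")
            if pos != -1 or window == size:
                # Either a newline was found, or the whole file is one line
                # (pos is -1, so the slice starts at 0).
                return body[pos + 1:]
            # No newline yet: back up another `step` bytes and try again.
            window = min(window + step, size)
```

For example, last_line_stepwise("big.log") on a file whose last line is shorter than 100 bytes would perform exactly one read, which is the best case mentioned above; "big.log" is just a placeholder name.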

User · Answer

The inefficiency here is not really due to Python, but to the nature of how files are read. The only way to find the last line is to read the file in and find the line endings. However, the seek operation may be used to skip to any byte offset in the file. You can, therefore, begin very close to the end of the file and grab larger and larger chunks as needed until the last line ending is found:

    from os import SEEK_END

    def get_last_line(file):
        # file must be opened in binary mode ('rb'); the result is bytes.
        CHUNK_SIZE = 1024  # Would be good to make this the chunk size of the filesystem

        file.seek(0, SEEK_END)
        file_size = file.tell()
        last_line = b""

        while True:
            # We grab chunks from the end of the file towards the beginning
            # until we get a new line.
            remaining = file_size - len(last_line)
            if remaining <= 0:
                # The whole file is one big line.
                return last_line

            # Never seek past the start of the file.
            chunk_size = min(CHUNK_SIZE, remaining)
            file.seek(remaining - chunk_size)
            chunk = file.read(chunk_size)

            if not last_line and chunk.endswith(b"\n"):
                # Ignore the trailing newline at the end of the file (but
                # include it in the output).
                last_line = b"\n"
                chunk = chunk[:-1]

            nl_pos = chunk.rfind(b"\n")
            # What's being searched for will have to be modified if you are
            # searching files with non-Unix line endings.

            last_line = chunk[nl_pos + 1:] + last_line

            if nl_pos == -1:
                # The whole chunk is part of the last line.
                continue

            return last_line

User · Answer

    # Reads the entire file into memory before taking the last line.
    with open('output.txt', 'r') as f:
        lines = f.read().splitlines()
        last_line = lines[-1]
        print(last_line)

User · Answer

Here's a slightly different solution. Instead of multi-line, I focused on just the last line, and instead of a constant block size, I have a dynamic (doubling) block size. See comments for more info.

    # Get last line of a text file using seek method. Works with non-constant block size.
    # IDK if that speeds things up, but it's good enough for us,
    # especially with constant line lengths in the file (provided by len_guess),
    # in which case the block size doubling is not performed much if at all. Currently,
    # we're using this on a text file format with constant line lengths.
    # Requires that the file is opened in binary mode. No nonzero end-rel seeks in text mode.
    # Assumes a non-empty file.
    REL_FILE_END = 2

    def lastTextFileLine(file, len_guess=1):
        file.seek(-1, REL_FILE_END)   # -1 => 1 byte back from end of file
        text = file.read(1)
        tot_sz = 1                    # total bytes consumed from the end, so we know where to seek next
        if text != b'\n':             # if newline is the last character, we want the text right before it;
            tot_sz = 0                # else, consider the text all the way at the end (after last newline)
        blocks = []                   # successive search blocks, so we don't search what was already searched
        j = file.tell() - tot_sz      # j = length of the file remainder still to be searched
        block_sz = len_guess
        while j > 0:
            if j < block_sz:          # in case block doubling takes us past the start of the file
                block_sz = j
            tot_sz += block_sz
            file.seek(-tot_sz, REL_FILE_END)   # seek() accepts negative offsets relative to the file end
            text = file.read(block_sz)
            i = text.rfind(b'\n')
            if i != -1:
                # Text after the newline, followed by the blocks read earlier (which lie later in the file).
                return (text[i + 1:] + b''.join(reversed(blocks))).decode()
            blocks.append(text)
            j -= block_sz
            block_sz <<= 1            # double block size (open-ended binary-search-like strategy)
        return b''.join(reversed(blocks)).decode()   # newline was never found: return everything read

Ideally, you'd wrap this in a class LastTextFileLine and keep track of a moving average of line lengths. This would give you a good len_guess, maybe.

User · Answer

If you do know the maximal length of a line, you can do:

    def getLastLine(fname, maxLineLength=80):
        with open(fname, "rb") as fp:
            fp.seek(0, 2)  # 2 means "from the end of the file"
            # Back up by maxLineLength + 1 bytes, but never past the start of the file.
            fp.seek(max(fp.tell() - maxLineLength - 1, 0))
            return fp.readlines()[-1]

This works on my Windows machine. But I do not know what happens on other platforms if you open a text file in binary mode. The binary mode is needed if you want to use seek.
