[python] Efficiently finding the last line in a text file

I need to extract the last line from a number of very large (several hundred megabyte) text files to get certain data. Currently, I am using python to cycle through all the lines until the file is empty and then I process the last line returned, but I am certain there is a more efficient way to do this.

What is the best way to retrieve just the last line of a text file using python?

This question is related to python text

The answer is


Use the file's seek method with a negative offset and whence=os.SEEK_END to read a block from the end of the file. Search that block for the last line end character(s) and grab all the characters after it. If there is no line end, back up farther and repeat the process.

def last_line(in_file, block_size=1024, ignore_ending_newline=False):
    suffix = ""
    in_file.seek(0, os.SEEK_END)
    in_file_length = in_file.tell()
    seek_offset = 0

    while(-seek_offset < in_file_length):
        # Read from end.
        seek_offset -= block_size
        if -seek_offset > in_file_length:
            # Limit if we ran out of file (can't seek backward from start).
            block_size -= -seek_offset - in_file_length
            if block_size == 0:
                break
            seek_offset = -in_file_length
        in_file.seek(seek_offset, os.SEEK_END)
        buf = in_file.read(block_size)

        # Search for line end.
        if ignore_ending_newline and seek_offset == -block_size and buf[-1] == '\n':
            buf = buf[:-1]
        pos = buf.rfind('\n')
        if pos != -1:
            # Found line end.
            return buf[pos+1:] + suffix

        suffix = buf + suffix

    # One-line file.
    return suffix

Note that this will not work on things that don't support seek, like stdin or sockets. In those cases, you're stuck reading the whole thing (like the tail command does).


The inefficiency here is not really due to Python, but to the nature of how files are read. The only way to find the last line is to read the file in and find the line endings. However, the seek operation may be used to skip to any byte offset in the file. You can, therefore begin very close to the end of the file, and grab larger and larger chunks as needed until the last line ending is found:

from os import SEEK_END

def get_last_line(file):
  CHUNK_SIZE = 1024 # Would be good to make this the chunk size of the filesystem

  last_line = ""

  while True:
    # We grab chunks from the end of the file towards the beginning until we 
    # get a new line
    file.seek(-len(last_line) - CHUNK_SIZE, SEEK_END)
    chunk = file.read(CHUNK_SIZE)

    if not chunk:
      # The whole file is one big line
      return last_line

    if not last_line and chunk.endswith('\n'):
      # Ignore the trailing newline at the end of the file (but include it 
      # in the output).
      last_line = '\n'
      chunk = chunk[:-1]

    nl_pos = chunk.rfind('\n')
    # What's being searched for will have to be modified if you are searching
    # files with non-unix line endings.

    last_line = chunk[nl_pos + 1:] + last_line

    if nl_pos == -1:
      # The whole chunk is part of the last line.
      continue

    return last_line

Not the straight forward way, but probably much faster than a simple Python implementation:

line = subprocess.check_output(['tail', '-1', filename])

lines = file.readlines()
fileHandle.close()
last_line = lines[-1]

Seek to the end of the file minus 100 bytes or so. Do a read and search for a newline. If here is no newline, seek back another 100 bytes or so. Lather, rinse, repeat. Eventually you'll find a newline. The last line begins immediately after that newline.

Best case scenario you only do one read of 100 bytes.


If you can pick a reasonable maximum line length, you can seek to nearly the end of the file before you start reading.

myfile.seek(-max_line_length, os.SEEK_END)
line = myfile.readlines()[-1]

If you do know the maximal length of a line, you can do

def getLastLine(fname, maxLineLength=80):
    fp=file(fname, "rb")
    fp.seek(-maxLineLength-1, 2) # 2 means "from the end of the file"
    return fp.readlines()[-1]

This works on my windows machine. But I do not know what happens on other platforms if you open a text file in binary mode. The binary mode is needed if you want to use seek().


with open('output.txt', 'r') as f:
    lines = f.read().splitlines()
    last_line = lines[-1]
    print last_line

Could you load the file into a mmap, then use mmap.rfind(string[, start[, end]]) to find the second last EOL character in the file? A seek to that point in the file should point you to the last line I would think.


Here's a slightly different solution. Instead of multi-line, I focused on just the last line, and instead of a constant block size, I have a dynamic (doubling) block size. See comments for more info.

# Get last line of a text file using seek method.  Works with non-constant block size.  
# IDK if that speed things up, but it's good enough for us, 
# especially with constant line lengths in the file (provided by len_guess), 
# in which case the block size doubling is not performed much if at all.  Currently,
# we're using this on a textfile format with constant line lengths.
# Requires that the file is opened up in binary mode.  No nonzero end-rel seeks in text mode.
REL_FILE_END = 2
def lastTextFileLine(file, len_guess=1):
    file.seek(-1, REL_FILE_END)      # 1 => go back to position 0;  -1 => 1 char back from end of file
    text = file.read(1)
    tot_sz = 1              # store total size so we know where to seek to next rel file end
    if text != b'\n':        # if newline is the last character, we want the text right before it
        file.seek(0, REL_FILE_END)    # else, consider the text all the way at the end (after last newline)
        tot_sz = 0
    blocks = []           # For storing succesive search blocks, so that we don't end up searching in the already searched
    j = file.tell()          # j = end pos
    not_done = True
    block_sz = len_guess
    while not_done:
        if j < block_sz:   # in case our block doubling takes us past the start of the file (here j also = length of file remainder)
            block_sz = j
            not_done = False
        tot_sz += block_sz
        file.seek(-tot_sz, REL_FILE_END)         # Yes, seek() works with negative numbers for seeking backward from file end
        text = file.read(block_sz)
        i = text.rfind(b'\n')
        if i != -1:
            text = text[i+1:].join(reversed(blocks))
            return str(text)
        else:
            blocks.append(text)
            block_sz <<= 1    # double block size (converge with open ended binary search-like strategy)
            j = j - block_sz      # if this doesn't work, try using tmp j1 = file.tell() above
    return str(b''.join(reversed(blocks)))      # if newline was never found, return everything read

Ideally, you'd wrap this in a class LastTextFileLine and keep track of a moving average of line lengths. This would give you a good len_guess maybe.


#!/usr/bin/python

count = 0

f = open('last_line1','r')

for line in f.readlines():

    line = line.strip()

    count = count + 1

    print line

print count

f.close()

count1 = 0

h = open('last_line1','r')

for line in h.readlines():

    line = line.strip()

    count1 = count1 + 1

    if count1 == count:

       print line         #-------------------- this is the last line

h.close()