[python] How can I tail a log file in Python?

I'd like to make the output of tail -F or something similar available to me in Python without blocking or locking. I've found some really old code to do that here, but I'm thinking there must be a better way or a library to do the same thing by now. Anyone know of one?

Ideally, I'd have something like tail.getNewData() that I could call every time I wanted more data.

This question is related to python tail

The answer is


Python is "batteries included" - it has a nice solution for it: https://pypi.python.org/pypi/pygtail

Reads log file lines that have not been read. Remembers where it finished last time, and continues from there.

import sys
from pygtail import Pygtail

for line in Pygtail("some.log"):
    sys.stdout.write(line)

Adapting Ijaz Ahmad Khan's answer to only yield lines when they are completely written (lines end with a newline char) gives a pythonic solution with no external dependencies:

def follow(file) -> Iterator[str]:
    """ Yield each line from a file as they are written. """
    line = ''
    while True:
        tmp = file.readline()
        if tmp is not None:
            line += tmp
            if line.endswith("\n"):
                yield line
                line = ''
        else:
            time.sleep(0.1)


if __name__ == '__main__':
    for line in follow(open("test.txt", 'r')):
        print(line, end='')

You can also use 'AWK' command.
See more at: http://www.unix.com/shell-programming-scripting/41734-how-print-specific-lines-awk.html
awk can be used to tail last line, last few lines or any line in a file.
This can be called from python.


All the answers that use tail -f are not pythonic.

Here is the pythonic way: ( using no external tool or library)

def follow(thefile):
     while True:
        line = thefile.readline()
        if not line or not line.endswith('\n'):
            time.sleep(0.1)
            continue
        yield line



if __name__ == '__main__':
    logfile = open("run/foo/access-log","r")
    loglines = follow(logfile)
    for line in loglines:
        print(line, end='')

Non Blocking

If you are on linux (as windows does not support calling select on files) you can use the subprocess module along with the select module.

import time
import subprocess
import select

f = subprocess.Popen(['tail','-F',filename],\
        stdout=subprocess.PIPE,stderr=subprocess.PIPE)
p = select.poll()
p.register(f.stdout)

while True:
    if p.poll(1):
        print f.stdout.readline()
    time.sleep(1)

This polls the output pipe for new data and prints it when it is available. Normally the time.sleep(1) and print f.stdout.readline() would be replaced with useful code.

Blocking

You can use the subprocess module without the extra select module calls.

import subprocess
f = subprocess.Popen(['tail','-F',filename],\
        stdout=subprocess.PIPE,stderr=subprocess.PIPE)
while True:
    line = f.stdout.readline()
    print line

This will also print new lines as they are added, but it will block until the tail program is closed, probably with f.kill().


The only portable way to tail -f a file appears to be, in fact, to read from it and retry (after a sleep) if the read returns 0. The tail utilities on various platforms use platform-specific tricks (e.g. kqueue on BSD) to efficiently tail a file forever without needing sleep.

Therefore, implementing a good tail -f purely in Python is probably not a good idea, since you would have to use the least-common-denominator implementation (without resorting to platform-specific hacks). Using a simple subprocess to open tail -f and iterating through the lines in a separate thread, you can easily implement a non-blocking tail operation in Python.

Example implementation:

import threading, Queue, subprocess
tailq = Queue.Queue(maxsize=10) # buffer at most 100 lines

def tail_forever(fn):
    p = subprocess.Popen(["tail", "-f", fn], stdout=subprocess.PIPE)
    while 1:
        line = p.stdout.readline()
        tailq.put(line)
        if not line:
            break

threading.Thread(target=tail_forever, args=(fn,)).start()

print tailq.get() # blocks
print tailq.get_nowait() # throws Queue.Empty if there are no lines to read

If you are on linux you implement a non-blocking implementation in python in the following way.

import subprocess
subprocess.call('xterm -title log -hold -e \"tail -f filename\"&', shell=True, executable='/bin/csh')
print "Done"

Another option is the tailhead library that provides both Python versions of of tail and head utilities and API that can be used in your own module.

Originally based on the tailer module, its main advantage is the ability to follow files by path i.e. it can handle situation when file is recreated. Besides, it has some bug fixes for various edge cases.


Using the sh module (pip install sh):

from sh import tail
# runs forever
for line in tail("-f", "/var/log/some_log_file.log", _iter=True):
    print(line)

[update]

Since sh.tail with _iter=True is a generator, you can:

import sh
tail = sh.tail("-f", "/var/log/some_log_file.log", _iter=True)

Then you can "getNewData" with:

new_data = tail.next()

Note that if the tail buffer is empty, it will block until there is more data (from your question it is not clear what you want to do in this case).

[update]

This works if you replace -f with -F, but in Python it would be locking. I'd be more interested in having a function I could call to get new data when I want it, if that's possible. – Eli

A container generator placing the tail call inside a while True loop and catching eventual I/O exceptions will have almost the same effect of -F.

def tail_F(some_file):
    while True:
        try:
            for line in sh.tail("-f", some_file, _iter=True):
                yield line
        except sh.ErrorReturnCode_1:
            yield None

If the file becomes inaccessible, the generator will return None. However it still blocks until there is new data if the file is accessible. It remains unclear for me what you want to do in this case.

Raymond Hettinger approach seems pretty good:

def tail_F(some_file):
    first_call = True
    while True:
        try:
            with open(some_file) as input:
                if first_call:
                    input.seek(0, 2)
                    first_call = False
                latest_data = input.read()
                while True:
                    if '\n' not in latest_data:
                        latest_data += input.read()
                        if '\n' not in latest_data:
                            yield ''
                            if not os.path.isfile(some_file):
                                break
                            continue
                    latest_lines = latest_data.split('\n')
                    if latest_data[-1] != '\n':
                        latest_data = latest_lines[-1]
                    else:
                        latest_data = input.read()
                    for line in latest_lines[:-1]:
                        yield line + '\n'
        except IOError:
            yield ''

This generator will return '' if the file becomes inaccessible or if there is no new data.

[update]

The second to last answer circles around to the top of the file it seems whenever it runs out of data. – Eli

I think the second will output the last ten lines whenever the tail process ends, which with -f is whenever there is an I/O error. The tail --follow --retry behavior is not far from this for most cases I can think of in unix-like environments.

Perhaps if you update your question to explain what is your real goal (the reason why you want to mimic tail --retry), you will get a better answer.

The last answer does not actually follow the tail and merely reads what's available at run time. – Eli

Of course, tail will display the last 10 lines by default... You can position the file pointer at the end of the file using file.seek, I will left a proper implementation as an exercise to the reader.

IMHO the file.read() approach is far more elegant than a subprocess based solution.


Ideally, I'd have something like tail.getNewData() that I could call every time I wanted more data

We've already got one and itsa very nice. Just call f.read() whenever you want more data. It will start reading where the previous read left off and it will read through the end of the data stream:

f = open('somefile.log')
p = 0
while True:
    f.seek(p)
    latest_data = f.read()
    p = f.tell()
    if latest_data:
        print latest_data
        print str(p).center(10).center(80, '=')

For reading line-by-line, use f.readline(). Sometimes, the file being read will end with a partially read line. Handle that case with f.tell() finding the current file position and using f.seek() for moving the file pointer back to the beginning of the incomplete line. See this ActiveState recipe for working code.


You could use the 'tailer' library: https://pypi.python.org/pypi/tailer/

It has an option to get the last few lines:

# Get the last 3 lines of the file
tailer.tail(open('test.txt'), 3)
# ['Line 9', 'Line 10', 'Line 11']

And it can also follow a file:

# Follow the file as it grows
for line in tailer.follow(open('test.txt')):
    print line

If one wants tail-like behaviour, that one seems to be a good option.