This link (How to get line count cheaply in Python?) has lots of potential solutions, but they all ignore one way to make this run considerably faster, namely by using the unbuffered (raw) interface, using bytearrays, and doing your own buffering.
Using a modified version of the timing tool, I believe the following code is faster (and marginally more pythonic) than any of the solutions offered:
def _make_gen(reader):
    # Read 1 MiB chunks until the reader returns b'' (end of file).
    b = reader(1024 * 1024)
    while b:
        yield b
        b = reader(1024 * 1024)

def rawpycount(filename):
    # Count newlines chunk by chunk using the unbuffered raw reader.
    with open(filename, 'rb') as f:
        f_gen = _make_gen(f.raw.read)
        return sum(buf.count(b'\n') for buf in f_gen)
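The snippet above actually works on the bytes objects returned by raw.read(); the bytearray idea mentioned at the top would instead reuse one preallocated buffer via readinto(). A minimal sketch of that variant (the name rawbytearraycount and the buffer size are just illustrative, not part of the code I timed):

from io import DEFAULT_BUFFER_SIZE  # not required; 1 MiB is used below as in the code above

def rawbytearraycount(filename):
    # Hypothetical variant: reuse one preallocated bytearray instead of
    # allocating a fresh bytes object per chunk.
    buf = bytearray(1024 * 1024)
    total = 0
    with open(filename, 'rb', buffering=0) as f:   # raw, unbuffered file object
        while True:
            n = f.readinto(buf)                    # fill buf in place; returns bytes read
            if not n:                              # 0 at end of file
                break
            total += buf.count(b'\n', 0, n)        # count only within the filled region
    return total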
Here are my timings:
function      average, s  min, s  ratio
rawpycount        0.0048  0.0046   1.00
bufcount          0.0074  0.0066   1.43
wccount           0.01    0.01     2.17
itercount         0.014   0.014    3.04
opcount           0.021   0.02     4.43
kylecount         0.023   0.021    4.58
simplecount       0.022   0.022    4.81
mapcount          0.038   0.032    6.82
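The timing tool itself isn't shown here; purely as an illustration (the file name and the repeat/number values below are assumptions, not the benchmark I actually used), a minimal timeit harness along these lines would produce per-call times of the same kind:

import timeit

def best_time(func, filename, repeats=3, number=10):
    # Best per-call time in seconds over several repeated runs.
    timer = timeit.Timer(lambda: func(filename))
    return min(timer.repeat(repeat=repeats, number=number)) / number

# e.g. print('rawpycount  %.4f' % best_time(rawpycount, 'big_file.txt'))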
I would post it there, but I'm a relatively new user on Stack Exchange and don't have the requisite reputation.
EDIT:
This can be done entirely with inline generator expressions using itertools, but it gets pretty weird looking:
from itertools import takewhile, repeat

def rawbigcount(filename):
    with open(filename, 'rb') as f:
        # Keep reading 1 MiB chunks until read() returns b'' (falsy at EOF),
        # at which point takewhile stops the stream.
        bufgen = takewhile(lambda x: x, (f.raw.read(1024 * 1024) for _ in repeat(None)))
        return sum(buf.count(b'\n') for buf in bufgen)
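For what it's worth, the same read-until-empty loop can also be expressed with the two-argument form of iter() and functools.partial; this isn't part of the answer above, and rawitercount is just an illustrative name:

from functools import partial

def rawitercount(filename):
    with open(filename, 'rb') as f:
        # iter(callable, sentinel) keeps calling f.raw.read(1 MiB) until it returns b''.
        bufgen = iter(partial(f.raw.read, 1024 * 1024), b'')
        return sum(buf.count(b'\n') for buf in bufgen)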