[python] Cheap way to search a large text file for a string

This is entirely inspired by laurasia's answer above, but it refines the structure.

It also adds some checks:

  • It will correctly return 0 when searching an empty file for the empty string. In laurasia's answer, this is an edge case that will return -1.
  • It also pre-checks whether the goal string is larger than the buffer size, and raises an error if this is the case.

In practice, the goal string should be much smaller than the buffer for efficiency, and there are more efficient methods of searching if the size of the goal string is very close to the size of the buffer.

def fnd(fname, goal, start=0, bsize=4096):
    if bsize < len(goal):
        raise ValueError("The buffer size must be larger than the string being searched for.")
    with open(fname, 'rb') as f:
        if start > 0:
            f.seek(start)
        overlap = len(goal) - 1
        while True:
            buffer = f.read(bsize)
            pos = buffer.find(goal)
            if pos >= 0:
                return f.tell() - len(buffer) + pos
            if not buffer:
                return -1
            f.seek(f.tell() - overlap)