[python] 'negative' pattern matching in python

I have the following input,

OK SYS 10 LEN 20 12 43
1233a.fdads.txt,23 /data/a11134/a.txt
3232b.ddsss.txt,32 /data/d13f11/b.txt
3452d.dsasa.txt,1234 /data/c13af4/f.txt
.

And I'd like to extract all of the input except the line containing "OK SYS 10 LEN 20" and the last line which contains a single "." (dot). That is, I want to extract the following

1233a.fdads.txt,23 /data/a11134/a.txt
3232b.ddsss.txt,32 /data/d13f11/b.txt
3452d.dsasa.txt.1234 /data/c13af4/f.txt

I tried the following,

for item in output:
    matchObj = re.search("^(?!OK) | ^(?!\\.)", item)
    if matchObj:
        print "got item "  + item

but it does not work, as it does not produce any output.

This question is related to python regex

The answer is


If this is a file, you can simply skip the first and last lines and read the rest with csv:

>>> s = """OK SYS 10 LEN 20 12 43
... 1233a.fdads.txt,23 /data/a11134/a.txt
... 3232b.ddsss.txt,32 /data/d13f11/b.txt
... 3452d.dsasa.txt,1234 /data/c13af4/f.txt
... ."""
>>> stream = StringIO.StringIO(s)
>>> rows = [row for row in csv.reader(stream,delimiter=',') if len(row) == 2]
>>> rows
[['1233a.fdads.txt', '23 /data/a11134/a.txt'], ['3232b.ddsss.txt', '32 /data/d13f11/b.txt'], ['3452d.dsasa.txt', '1234 /data/c13af4/f.txt']]

If its a file, then you can do this:

with open('myfile.txt','r') as f:
   rows = [row for row in csv.reader(f,delimiter=',') if len(row) == 2]

If the OK line is the first line and the last line is the dot you could consider slice them off like this:

TestString = '''OK SYS 10 LEN 20 12 43
1233a.fdads.txt,23 /data/a11134/a.txt
3232b.ddsss.txt,32 /data/d13f11/b.txt
3452d.dsasa.txt,1234 /data/c13af4/f.txt
.
'''
print('\n'.join(TestString.split()[1:-1]))

However if this is a very large string you may run into memory problems.


Why dont you match the OK SYS row and not return it.

for item in output:
    matchObj = re.search("(OK SYS|\\.).*", item)
    if not matchObj:
        print "got item "  + item

and(re.search("bla_bla_pattern", str_item, re.IGNORECASE) == None)

is working.


You can also do it without negative look ahead. You just need to add parentheses to that part of expression which you want to extract. This construction with parentheses is named group.

Let's write python code:

string = """OK SYS 10 LEN 20 12 43
1233a.fdads.txt,23 /data/a11134/a.txt
3232b.ddsss.txt,32 /data/d13f11/b.txt
3452d.dsasa.txt,1234 /data/c13af4/f.txt
.
"""

search_result = re.search(r"^OK.*\n((.|\s)*).", string)

if search_result:
    print(search_result.group(1))

Output is:

1233a.fdads.txt,23 /data/a11134/a.txt
3232b.ddsss.txt,32 /data/d13f11/b.txt
3452d.dsasa.txt,1234 /data/c13af4/f.txt

^OK.*\n will find first line with OK statement, but we don't want to extract it so leave it without parentheses. Next is part which we want to capture: ((.|\s)*), so put it inside parentheses. And in the end of regexp we look for a dot ., but we also don't want to capture it.

P.S: I find this answer is super helpful to understand power of groups. https://stackoverflow.com/a/3513858/4333811


 if not (line.startswith("OK ") or line.strip() == "."):
     print line

Use a negative match. (Also note that whitespace is significant, by default, inside a regex so don't space things out. Alternatively, use re.VERBOSE.)

for item in output:
    matchObj = re.search("^(OK|\\.)", item)
    if not matchObj:
        print "got item " + item