[python] Compare two files report difference in python

I have 2 files called "hosts" (in different directories)

I want to compare them using python to see if they are IDENTICAL. If they are not Identical, I want to print the difference on the screen.

So far I have tried this

hosts0 = open(dst1 + "/hosts","r") 
hosts1 = open(dst2 + "/hosts","r")

lines1 = hosts0.readlines()

for i,lines2 in enumerate(hosts1):
    if lines2 != lines1[i]:
        print "line ", i, " in hosts1 is different \n"
        print lines2
    else:
        print "same"

But when I run this, I get

File "./audit.py", line 34, in <module>
  if lines2 != lines1[i]:
IndexError: list index out of range

Which means one of the hosts has more lines than the other. Is there a better method to compare 2 files and report the difference?

This question is related to python file comparison

The answer is


You can add an conditional statement. If your array goes beyond index, then break and print the rest of the file.


hosts0 = open("C:path\\a.txt","r")
hosts1 = open("C:path\\b.txt","r")

lines1 = hosts0.readlines()

for i,lines2 in enumerate(hosts1):
    if lines2 != lines1[i]:
        print "line ", i, " in hosts1 is different \n"
        print lines2
    else:
        print "same"

The above code is working for me. Can you please indicate what error you are facing?


import difflib
f=open('a.txt','r')  #open a file
f1=open('b.txt','r') #open another file to compare
str1=f.read()
str2=f1.read()
str1=str1.split()  #split the words in file by default through the spce
str2=str2.split()
d=difflib.Differ()     # compare and just print
diff=list(d.compare(str2,str1))
print '\n'.join(diff)

The difflib library is useful for this, and comes in the standard library. I like the unified diff format.

http://docs.python.org/2/library/difflib.html#difflib.unified_diff

import difflib
import sys

with open('/tmp/hosts0', 'r') as hosts0:
    with open('/tmp/hosts1', 'r') as hosts1:
        diff = difflib.unified_diff(
            hosts0.readlines(),
            hosts1.readlines(),
            fromfile='hosts0',
            tofile='hosts1',
        )
        for line in diff:
            sys.stdout.write(line)

Outputs:

--- hosts0
+++ hosts1
@@ -1,5 +1,4 @@
 one
 two
-dogs
 three

And here is a dodgy version that ignores certain lines. There might be edge cases that don't work, and there are surely better ways to do this, but maybe it will be good enough for your purposes.

import difflib
import sys

with open('/tmp/hosts0', 'r') as hosts0:
    with open('/tmp/hosts1', 'r') as hosts1:
        diff = difflib.unified_diff(
            hosts0.readlines(),
            hosts1.readlines(),
            fromfile='hosts0',
            tofile='hosts1',
            n=0,
        )
        for line in diff:
            for prefix in ('---', '+++', '@@'):
                if line.startswith(prefix):
                    break
            else:
                sys.stdout.write(line[1:])

Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to file

Gradle - Move a folder from ABC to XYZ Difference between opening a file in binary vs text Angular: How to download a file from HttpClient? Python error message io.UnsupportedOperation: not readable java.io.FileNotFoundException: class path resource cannot be opened because it does not exist Writing JSON object to a JSON file with fs.writeFileSync How to read/write files in .Net Core? How to write to a CSV line by line? Writing a dictionary to a text file? What are the pros and cons of parquet format compared to other formats?

Examples related to comparison

Wildcard string comparison in Javascript How to compare two JSON objects with the same elements in a different order equal? Comparing strings, c++ Char Comparison in C bash string compare to multiple correct values Comparing two hashmaps for equal values and same key sets? Comparing boxed Long values 127 and 128 Compare two files report difference in python How do I fix this "TypeError: 'str' object is not callable" error? Compare cell contents against string in Excel