I have 2 files called "hosts" (in different directories)
I want to compare them using python to see if they are IDENTICAL. If they are not Identical, I want to print the difference on the screen.
So far I have tried this
hosts0 = open(dst1 + "/hosts","r")
hosts1 = open(dst2 + "/hosts","r")
lines1 = hosts0.readlines()
for i,lines2 in enumerate(hosts1):
if lines2 != lines1[i]:
print "line ", i, " in hosts1 is different \n"
print lines2
else:
print "same"
But when I run this, I get
File "./audit.py", line 34, in <module>
if lines2 != lines1[i]:
IndexError: list index out of range
Which means one of the hosts has more lines than the other. Is there a better method to compare 2 files and report the difference?
This question is related to
python
file
comparison
You can add an conditional statement. If your array goes beyond index, then break and print the rest of the file.
hosts0 = open("C:path\\a.txt","r")
hosts1 = open("C:path\\b.txt","r")
lines1 = hosts0.readlines()
for i,lines2 in enumerate(hosts1):
if lines2 != lines1[i]:
print "line ", i, " in hosts1 is different \n"
print lines2
else:
print "same"
The above code is working for me. Can you please indicate what error you are facing?
import difflib
f=open('a.txt','r') #open a file
f1=open('b.txt','r') #open another file to compare
str1=f.read()
str2=f1.read()
str1=str1.split() #split the words in file by default through the spce
str2=str2.split()
d=difflib.Differ() # compare and just print
diff=list(d.compare(str2,str1))
print '\n'.join(diff)
The difflib library is useful for this, and comes in the standard library. I like the unified diff format.
http://docs.python.org/2/library/difflib.html#difflib.unified_diff
import difflib
import sys
with open('/tmp/hosts0', 'r') as hosts0:
with open('/tmp/hosts1', 'r') as hosts1:
diff = difflib.unified_diff(
hosts0.readlines(),
hosts1.readlines(),
fromfile='hosts0',
tofile='hosts1',
)
for line in diff:
sys.stdout.write(line)
Outputs:
--- hosts0
+++ hosts1
@@ -1,5 +1,4 @@
one
two
-dogs
three
And here is a dodgy version that ignores certain lines. There might be edge cases that don't work, and there are surely better ways to do this, but maybe it will be good enough for your purposes.
import difflib
import sys
with open('/tmp/hosts0', 'r') as hosts0:
with open('/tmp/hosts1', 'r') as hosts1:
diff = difflib.unified_diff(
hosts0.readlines(),
hosts1.readlines(),
fromfile='hosts0',
tofile='hosts1',
n=0,
)
for line in diff:
for prefix in ('---', '+++', '@@'):
if line.startswith(prefix):
break
else:
sys.stdout.write(line[1:])
Source: Stackoverflow.com