[linux] Fastest way to tell if two files have the same contents in Unix/Linux?

I have a shell script in which I need to check whether two files contain the same data or not. I do this a for a lot of files, and in my script the diff command seems to be the performance bottleneck.

Here's the line:

diff -q $dst $new > /dev/null

if ($status) then ...

Could there be a faster way to compare the files, maybe a custom algorithm instead of the default diff?

This question is related to linux file unix diff

The answer is


You can compare by checksum algorithm like sha256

sha256sum oldFile > oldFile.sha256

echo "$(cat oldFile.sha256) newFile" | sha256sum --check

newFile: OK

if the files are distinct the result will be

newFile: FAILED
sha256sum: WARNING: 1 computed checksum did NOT match

Try also to use the cksum command:

chk1=`cksum <file1> | awk -F" " '{print $1}'`
chk2=`cksum <file2> | awk -F" " '{print $1}'`

if [ $chk1 -eq $chk2 ]
then
  echo "File is identical"
else
  echo "File is not identical"
fi

The cksum command will output the byte count of a file. See 'man cksum'.


Because I suck and don't have enough reputation points I can't add this tidbit in as a comment.

But, if you are going to use the cmp command (and don't need/want to be verbose) you can just grab the exit status. Per the cmp man page:

If a FILE is '-' or missing, read standard input. Exit status is 0 if inputs are the same, 1 if different, 2 if trouble.

So, you could do something like:

STATUS="$(cmp --silent $FILE1 $FILE2; echo $?)"  # "$?" gives exit status for each comparison

if [[ $STATUS -ne 0 ]]; then  # if status isn't equal to 0, then execute code
    DO A COMMAND ON $FILE1
else
    DO SOMETHING ELSE
fi

EDIT: Thanks for the comments everyone! I updated the test syntax here. However, I would suggest you use Vasili's answer if you are looking for something similar to this answer in readability, style, and syntax.


For files that are not different, any method will require having read both files entirely, even if the read was in the past.

There is no alternative. So creating hashes or checksums at some point in time requires reading the whole file. Big files take time.

File metadata retrieval is much faster than reading a large file.

So, is there any file metadata you can use to establish that the files are different? File size ? or even results of the file command which does just read a small portion of the file?

File size example code fragment:

  ls -l $1 $2 | 
  awk 'NR==1{a=$5} NR==2{b=$5} 
       END{val=(a==b)?0 :1; exit( val) }'

[ $? -eq 0 ] && echo 'same' || echo 'different'  

If the files are the same size then you are stuck with full file reads.


To quickly and safely compare any two files:

if cmp --silent -- "$FILE1" "$FILE2"; then
  echo "files contents are identical"
else
  echo "files differ"
fi

It's readable, efficient, and works for any file names including "` $()


I like @Alex Howansky have used 'cmp --silent' for this. But I need both positive and negative response so I use:

cmp --silent file1 file2 && echo '### SUCCESS: Files Are Identical! ###' || echo '### WARNING: Files Are Different! ###'

I can then run this in the terminal or with a ssh to check files against a constant file.


Doing some testing with a Raspberry Pi 3B+ (I'm using an overlay file system, and need to sync periodically), I ran a comparison of my own for diff -q and cmp -s; note that this is a log from inside /dev/shm, so disk access speeds are a non-issue:

[root@mypi shm]# dd if=/dev/urandom of=test.file bs=1M count=100 ; time diff -q test.file test.copy && echo diff true || echo diff false ; time cmp -s test.file test.copy && echo cmp true || echo cmp false ; cp -a test.file test.copy ; time diff -q test.file test.copy && echo diff true || echo diff false; time cmp -s test.file test.copy && echo cmp true || echo cmp false
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 6.2564 s, 16.8 MB/s
Files test.file and test.copy differ

real    0m0.008s
user    0m0.008s
sys     0m0.000s
diff false

real    0m0.009s
user    0m0.007s
sys     0m0.001s
cmp false
cp: overwrite âtest.copyâ? y

real    0m0.966s
user    0m0.447s
sys     0m0.518s
diff true

real    0m0.785s
user    0m0.211s
sys     0m0.573s
cmp true
[root@mypi shm]# pico /root/rwbscripts/utils/squish.sh

I ran it a couple of times. cmp -s consistently had slightly shorter times on the test box I was using. So if you want to use cmp -s to do things between two files....

identical (){
  echo "$1" and "$2" are the same.
  echo This is a function, you can put whatever you want in here.
}
different () {
  echo "$1" and "$2" are different.
  echo This is a function, you can put whatever you want in here, too.
}
cmp -s "$FILEA" "$FILEB" && identical "$FILEA" "$FILEB" || different "$FILEA" "$FILEB"

Examples related to linux

grep's at sign caught as whitespace How to prevent Google Colab from disconnecting? "E: Unable to locate package python-pip" on Ubuntu 18.04 How to upgrade Python version to 3.7? Install Qt on Ubuntu Get first line of a shell command's output Cannot connect to the Docker daemon at unix:/var/run/docker.sock. Is the docker daemon running? Run bash command on jenkins pipeline How to uninstall an older PHP version from centOS7 How to update-alternatives to Python 3 without breaking apt?

Examples related to file

Gradle - Move a folder from ABC to XYZ Difference between opening a file in binary vs text Angular: How to download a file from HttpClient? Python error message io.UnsupportedOperation: not readable java.io.FileNotFoundException: class path resource cannot be opened because it does not exist Writing JSON object to a JSON file with fs.writeFileSync How to read/write files in .Net Core? How to write to a CSV line by line? Writing a dictionary to a text file? What are the pros and cons of parquet format compared to other formats?

Examples related to unix

Docker CE on RHEL - Requires: container-selinux >= 2.9 What does `set -x` do? How to find files modified in last x minutes (find -mmin does not work as expected) sudo: npm: command not found How to sort a file in-place How to read a .properties file which contains keys that have a period character using Shell script gpg decryption fails with no secret key error Loop through a comma-separated shell variable Best way to find os name and version in Unix/Linux platform Resource u'tokenizers/punkt/english.pickle' not found

Examples related to diff

Create patch or diff file from git repository and apply it to another different git repository Comparing the contents of two files in Sublime Text Git diff between current branch and master but not including unmerged master commits Fast way of finding lines in one file that are not in another? Python - difference between two strings How to see the changes in a Git commit? unix diff side-to-side results? Find the files existing in one directory but not in the other git diff between two different files How to get the difference (only additions) between two files in linux