[file] SHA-256 or MD5 for file integrity

I know that SHA-256 is favored over MD5 for security, etc., but, if I am to use a method to only check file integrity (that is, nothing to do with password encryption, etc.), is there any advantage of using SHA-256?

Since MD5 is 128-bit and SHA-256 is 256-bit (therefore twice as big)...

  1. Would it take up to twice as long to encrypt?

  2. Where time is not of essence, like in a backup program, and file integrity is all that is needed, would anyone argue against MD5 for a different algorithm, or even suggest a different technique?

  3. Does using MD5 produce a checksum?

This question is related to file hash md5 sha256

The answer is


  1. Yes, on most CPUs, SHA-256 is two to three times slower than MD5, though not primarily because of its longer hash. See other answers here and the answers to this Stack Overflow questions.
  2. Here's a backup scenario where MD5 would not be appropriate:
    • Your backup program hashes each file being backed up. It then stores each file's data by its hash, so if you're backing up the same file twice you only end up with one copy of it.
    • An attacker can cause the system to backup files they control.
    • The attacker knows the MD5 hash of a file they want to remove from the backup.
    • The attacker can then use the known weaknesses of MD5 to craft a new file that has the same hash as the file to remove. When that file is backed up, it will replace the file to remove, and that file's backed up data will be lost.
    • This backup system could be strengthened a bit (and made more efficient) by not replacing files whose hash it has previously encountered, but then an attacker could prevent a target file with a known hash from being backed up by preemptively backing up a specially constructed bogus file with the same hash.
    • Obviously most systems, backup and otherwise, do not satisfy the conditions necessary for this attack to be practical, but I just wanted to give an example of a situation where SHA-256 would be preferable to MD5. Whether this would be the case for the system you're creating depends on more than just the characteristics of MD5 and SHA-256.
  3. Yes, cryptographic hashes like the ones generated by MD5 and SHA-256 are a type of checksum.

Happy hashing!


To 1): Yes, on most CPUs, SHA-256 is about only 40% as fast as MD5.

To 2): I would argue for a different algorithm than MD5 in such a case. I would definitely prefer an algorithm that is considered safe. However, this is more a feeling. Cases where this matters would be rather constructed than realistic, e.g. if your backup system encounters an example case of an attack on an MD5-based certificate, you are likely to have two files in such an example with different data, but identical MD5 checksums. For the rest of the cases, it doesn't matter, because MD5 checksums have a collision (= same checksums for different data) virtually only when provoked intentionally. I'm not an expert on the various hashing (checksum generating) algorithms, so I can not suggest another algorithm. Hence this part of the question is still open. Suggested further reading is Cryptographic Hash Function - File or Data Identifier on Wikipedia. Also further down on that page there is a list of cryptographic hash algorithms.

To 3): MD5 is an algorithm to calculate checksums. A checksum calculated using this algorithm is then called an MD5 checksum.


It is technically approved that MD5 is faster than SHA256 so in just verifying file integrity it will be sufficient and better for performance.

You are able to checkout the following resources:


  1. No, it's less fast but not that slow
  2. For a backup program it's maybe necessary to have something even faster than MD5

All in all, I'd say that MD5 in addition to the file name is absolutely safe. SHA-256 would just be slower and harder to handle because of its size.

You could also use something less secure than MD5 without any problem. If nobody tries to hack your file integrity this is safe, too.


Every answer seems to suggest that you need to use secure hashes to do the job but all of these are tuned to be slow to force a bruteforce attacker to have lots of computing power and depending on your needs this may not be the best solution.

There are algorithms specifically designed to hash files as fast as possible to check integrity and comparison (murmur, XXhash...). Obviously these are not designed for security as they don't meet the requirements of a secure hash algorithm (i.e. randomness) but have low collision rates for large messages. This features make them ideal if you are not looking for security but speed.

Examples of this algorithms and comparison can be found in this excellent answer: Which hashing algorithm is best for uniqueness and speed?.

As an example, we at our Q&A site use murmur3 to hash the images uploaded by the users so we only store them once even if users upload the same image in several answers.


The underlying MD5 algorithm is no longer deemed secure, thus while md5sum is well-suited for identifying known files in situations that are not security related, it should not be relied on if there is a chance that files have been purposefully and maliciously tampered. In the latter case, the use of a newer hashing tool such as sha256sum is highly recommended.

So, if you are simply looking to check for file corruption or file differences, when the source of the file is trusted, MD5 should be sufficient. If you are looking to verify the integrity of a file coming from an untrusted source, or over from a trusted source over an unencrypted connection, MD5 is not sufficient.

Another commenter noted that Ubuntu and others use MD5 checksums. Ubuntu has moved to PGP and SHA256, in addition to MD5, but the documentation of the stronger verification strategies are more difficult to find. See the HowToSHA256SUM page for more details.


Examples related to file

Gradle - Move a folder from ABC to XYZ Difference between opening a file in binary vs text Angular: How to download a file from HttpClient? Python error message io.UnsupportedOperation: not readable java.io.FileNotFoundException: class path resource cannot be opened because it does not exist Writing JSON object to a JSON file with fs.writeFileSync How to read/write files in .Net Core? How to write to a CSV line by line? Writing a dictionary to a text file? What are the pros and cons of parquet format compared to other formats?

Examples related to hash

php mysqli_connect: authentication method unknown to the client [caching_sha2_password] What is Hash and Range Primary Key? How to create a laravel hashed password Hashing a file in Python PHP salt and hash SHA256 for login password Append key/value pair to hash with << in Ruby Are there any SHA-256 javascript implementations that are generally considered trustworthy? How do I generate a SALT in Java for Salted-Hash? What does hash do in python? Hashing with SHA1 Algorithm in C#

Examples related to md5

Hashing a file in Python How to convert password into md5 in jquery? How do I calculate the MD5 checksum of a file in Python? encrypt and decrypt md5 How to generate an MD5 file hash in JavaScript? SHA-256 or MD5 for file integrity How to reverse MD5 to get the original string? Calculate a MD5 hash from a string Calculate MD5 checksum for a file How to convert md5 string to normal text?

Examples related to sha256

PHP salt and hash SHA256 for login password Are there any SHA-256 javascript implementations that are generally considered trustworthy? Mismatch Detected for 'RuntimeLibrary' SHA-256 or MD5 for file integrity Hashing a string with Sha256 How to hash some string with sha256 in Java? Generating a SHA-256 hash from the Linux command line Hash String via SHA-256 in Java Generate sha256 with OpenSSL and C++ How long is the SHA256 hash?