Hashing a file in Python

Question

I want Python to read to the EOF so I can get an appropriate hash, whether it is sha1 or md5. Please help. Here is what I have so far:

    import hashlib

    inputFile = raw_input("Enter the name of the file:")
    openedFile = open(inputFile)
    readFile = openedFile.read()

    md5Hash = hashlib.md5(readFile)
    md5Hashed = md5Hash.hexdigest()

    sha1Hash = hashlib.sha1(readFile)
    sha1Hashed = sha1Hash.hexdigest()

    print "File Name: %s" % inputFile
    print "MD5: %r" % md5Hashed
    print "SHA1: %r" % sha1Hashed

User · Answer

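    import hashlib

    # Note: this hashes the text typed by the user, not the contents of a file.
    user = input("Enter ")
    h = hashlib.md5(user.encode())
    h2 = h.hexdigest()

    # Write the hex digest to a file, then read it back and print it.
    with open("encrypted.txt", "w") as e:
        print(h2, file=e)

    with open("encrypted.txt", "r") as e:
        p = e.readline().strip()
        print(p)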

User · Answer

I would propose simply:

    import hashlib

    def get_digest(file_path):
        h = hashlib.sha256()

        with open(file_path, 'rb') as file:
            while True:
                # Reading is buffered, so we can read smaller chunks.
                chunk = file.read(h.block_size)
                if not chunk:
                    break
                h.update(chunk)

        return h.hexdigest()

All other answers here seem to complicate things too much. Python already buffers reads (in an ideal manner, or you can configure that buffering if you know more about the underlying storage), so it is better to read in chunks of the size the hash function finds ideal, which makes computing the hash faster or at least less CPU intensive. So instead of disabling buffering and trying to emulate it yourself, you use Python's buffering and control what you should be controlling: what the consumer of your data finds ideal, the hash block size.
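As a quick sanity check, reading in `h.block_size` chunks should give the same digest as hashing the whole content at once. The sketch below is only an illustration: the helper name `digest_in_chunks` and its `algorithm` parameter are assumptions, not part of the answer above.

```python
import hashlib


def digest_in_chunks(file_path, algorithm="sha256"):
    """Hash a file by feeding it to the hash object one block at a time."""
    h = hashlib.new(algorithm)
    with open(file_path, "rb") as file:
        # iter() with a sentinel keeps calling file.read() until it returns b"".
        for chunk in iter(lambda: file.read(h.block_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

For a small file whose bytes are already in memory as `data`, `digest_in_chunks(path)` should equal `hashlib.sha256(data).hexdigest()`.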

User · Answer

Here is a Python 3, POSIX solution (not Windows!) that uses mmap to map the object into memory.

    import hashlib
    import mmap

    def sha256sum(filename):
        h = hashlib.sha256()
        with open(filename, 'rb') as f:
            with mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ) as mm:
                h.update(mm)
        return h.hexdigest()
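If Windows support matters, one possible variant is to pass `access=mmap.ACCESS_READ` instead of `prot=mmap.PROT_READ`, since the `access` argument is accepted on both Unix and Windows. The sketch below also handles empty files, which cannot be memory-mapped. The function name `sha256sum_mmap` is an assumption made for illustration.

```python
import hashlib
import mmap
import os


def sha256sum_mmap(filename):
    """SHA-256 of a file via mmap, using the portable access= argument."""
    h = hashlib.sha256()
    with open(filename, "rb") as f:
        # An empty file cannot be mapped; its digest is the hash of b"".
        if os.fstat(f.fileno()).st_size == 0:
            return h.hexdigest()
        # Length 0 means "map the whole file".
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            h.update(mm)
    return h.hexdigest()
```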

User · Answer

TL;DR use buffers to not use tons of memory.

We get to the crux of your problem, I believe, when we consider the memory implications of working with very large files. We don't want this bad boy to churn through 2 gigs of RAM for a 2 gigabyte file so, as pasztorpisti points out, we gotta deal with those bigger files in chunks!

    import sys
    import hashlib

    # BUF_SIZE is totally arbitrary, change for your app!
    BUF_SIZE = 65536  # let's read stuff in 64kb chunks!

    md5 = hashlib.md5()
    sha1 = hashlib.sha1()

    with open(sys.argv[1], 'rb') as f:
        while True:
            data = f.read(BUF_SIZE)
            if not data:
                break
            md5.update(data)
            sha1.update(data)

    print("MD5: {0}".format(md5.hexdigest()))
    print("SHA1: {0}".format(sha1.hexdigest()))

What we've done is we're updating our hashes of this bad boy in 64kb chunks as we go along with hashlib's handy dandy update method. This way we use a lot less memory than the 2gb it would take to hash the guy all at once!

You can test this with:

    $ mkfile 2g bigfile
    $ python hashes.py bigfile
    MD5: a981130cf2b7e09f4686dc273cf7187e
    SHA1: 91d50642dd930e9542c39d36f0516d45f4e1af0d
    $ md5 bigfile
    MD5 (bigfile) = a981130cf2b7e09f4686dc273cf7187e
    $ shasum bigfile
    91d50642dd930e9542c39d36f0516d45f4e1af0d  bigfile

Hope that helps!

Also, all of this is outlined in the linked question: Get MD5 hash of big files in Python.

Addendum!

In general when writing Python it helps to get into the habit of following PEP 8. For example, in Python variables are typically underscore separated, not camelCased. But that's just style and no one really cares about those things except people who have to read bad style... which might be you reading this code years from now.
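The same loop can be wrapped in a reusable function that computes several digests in a single pass over the file. This is a minimal sketch: the name `hash_file` and its `algorithms` and `buf_size` parameters are assumptions for illustration, and the algorithm names are passed to `hashlib.new`.

```python
import hashlib


def hash_file(path, algorithms=("md5", "sha1"), buf_size=65536):
    """Return {algorithm name: hex digest}, reading the file only once."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as f:
        while True:
            data = f.read(buf_size)
            if not data:
                break
            # Feed the same chunk to every hash object.
            for hasher in hashers.values():
                hasher.update(data)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

Calling `hash_file("bigfile")` would return a dictionary with the keys `"md5"` and `"sha1"`, so the file is read from disk once no matter how many algorithms are requested.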

User · Answer

I have programmed a module which is able to hash big files with different algorithms.

    pip3 install py_essentials

Use the module like this:

    from py_essentials import hashing as hs
    hash = hs.fileChecksum("path/to/the/file.txt", "sha256")

User · Answer

For the correct and efficient computation of the hash value of a file (in Python 3):

- Open the file in binary mode (i.e. add 'b' to the filemode) to avoid character encoding and line-ending conversion issues.
- Don't read the complete file into memory, since that is a waste of memory. Instead, sequentially read it block by block and update the hash for each block.
- Eliminate double buffering, i.e. don't use buffered IO, because we already use an optimal block size.
- Use readinto() to avoid buffer churning.

Example:

    import hashlib

    def sha256sum(filename):
        h = hashlib.sha256()
        b = bytearray(128*1024)
        mv = memoryview(b)
        with open(filename, 'rb', buffering=0) as f:
            for n in iter(lambda: f.readinto(mv), 0):
                h.update(mv[:n])
        return h.hexdigest()
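On Python 3.11 and later, the standard library provides `hashlib.file_digest`, which runs a similar read loop internally. The sketch below uses it when it is available and otherwise falls back to the `readinto()` loop from the example. The function name `sha256sum_stdlib` is an assumption made for illustration.

```python
import hashlib


def sha256sum_stdlib(filename):
    """SHA-256 of a file, preferring hashlib.file_digest when it exists."""
    with open(filename, "rb", buffering=0) as f:
        if hasattr(hashlib, "file_digest"):
            # Python 3.11+: the standard library runs the read loop itself.
            return hashlib.file_digest(f, "sha256").hexdigest()
        # Older Pythons: reuse one buffer and hash only the bytes just read.
        h = hashlib.sha256()
        mv = memoryview(bytearray(128 * 1024))
        for n in iter(lambda: f.readinto(mv), 0):
            h.update(mv[:n])
        return h.hexdigest()
```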
