[c] How to create a md5 hash of a string in C?

I've found some md5 code that consists of the following prototypes...

I've been trying to find out where I have to put the string I want to hash, what functions I need to call, and where to find the string once it has been hashed. I'm confused with regards to what the uint32 buf[4] and uint32 bits[2] are in the struct.

struct MD5Context {
    uint32 buf[4];
    uint32 bits[2];
    unsigned char in[64];
};

/*
 * Start MD5 accumulation.  Set bit count to 0 and buffer to mysterious
 * initialization constants.
 */
void MD5Init(struct MD5Context *context);

/*
 * Update context to reflect the concatenation of another buffer full
 * of bytes.
 */
void MD5Update(struct MD5Context *context, unsigned char const *buf, unsigned len);

/*
 * Final wrapup - pad to 64-byte boundary with the bit pattern 
 * 1 0* (64-bit count of bits processed, MSB-first)
 */
void MD5Final(unsigned char digest[16], struct MD5Context *context);

/*
 * The core of the MD5 algorithm, this alters an existing MD5 hash to
 * reflect the addition of 16 longwords of new data.  MD5Update blocks
 * the data and converts bytes into longwords for this routine.
 */
void MD5Transform(uint32 buf[4], uint32 const in[16]);

This question is related to c md5

The answer is


It would appear that you should

  • Create a struct MD5context and pass it to MD5Init to get it into a proper starting condition
  • Call MD5Update with the context and your data
  • Call MD5Final to get the resulting hash

These three functions and the structure definition make a nice abstract interface to the hash algorithm. I'm not sure why you were shown the core transform function in that header as you probably shouldn't interact with it directly.

The author could have done a little more implementation hiding by making the structure an abstract type, but then you would have been forced to allocate the structure on the heap every time (as opposed to now where you can put it on the stack if you so desire).


All of the existing answers use the deprecated MD5Init(), MD5Update(), and MD5Final().

Instead, use EVP_DigestInit_ex(), EVP_DigestUpdate(), and EVP_DigestFinal_ex(), e.g.

// example.c
//
// gcc example.c -lssl -lcrypto -o example

#include <openssl/evp.h>
#include <stdio.h>
#include <string.h>

void bytes2md5(const char *data, int len, char *md5buf) {
  // Based on https://www.openssl.org/docs/manmaster/man3/EVP_DigestUpdate.html
  EVP_MD_CTX *mdctx = EVP_MD_CTX_new();
  const EVP_MD *md = EVP_md5();
  unsigned char md_value[EVP_MAX_MD_SIZE];
  unsigned int md_len, i;
  EVP_DigestInit_ex(mdctx, md, NULL);
  EVP_DigestUpdate(mdctx, data, len);
  EVP_DigestFinal_ex(mdctx, md_value, &md_len);
  EVP_MD_CTX_free(mdctx);
  for (i = 0; i < md_len; i++) {
    snprintf(&(md5buf[i * 2]), 16 * 2, "%02x", md_value[i]);
  }
}

int main(void) {
  const char *hello = "hello";
  char md5[33]; // 32 characters + null terminator
  bytes2md5(hello, strlen(hello), md5);
  printf("%s\n", md5);
}

As other answers have mentioned, the following calls will compute the hash:

MD5Context md5;
MD5Init(&md5);
MD5Update(&md5, data, datalen);
MD5Final(digest, &md5);

The purpose of splitting it up into that many functions is to let you stream large datasets.

For example, if you're hashing a 10GB file and it doesn't fit into ram, here's how you would go about doing it. You would read the file in smaller chunks and call MD5Update on them.

MD5Context md5;
MD5Init(&md5);

fread(/* Read a block into data. */)
MD5Update(&md5, data, datalen);

fread(/* Read the next block into data. */)
MD5Update(&md5, data, datalen);

fread(/* Read the next block into data. */)
MD5Update(&md5, data, datalen);

...

//  Now finish to get the final hash value.
MD5Final(digest, &md5);

Here's a complete example:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#if defined(__APPLE__)
#  define COMMON_DIGEST_FOR_OPENSSL
#  include <CommonCrypto/CommonDigest.h>
#  define SHA1 CC_SHA1
#else
#  include <openssl/md5.h>
#endif

char *str2md5(const char *str, int length) {
    int n;
    MD5_CTX c;
    unsigned char digest[16];
    char *out = (char*)malloc(33);

    MD5_Init(&c);

    while (length > 0) {
        if (length > 512) {
            MD5_Update(&c, str, 512);
        } else {
            MD5_Update(&c, str, length);
        }
        length -= 512;
        str += 512;
    }

    MD5_Final(digest, &c);

    for (n = 0; n < 16; ++n) {
        snprintf(&(out[n*2]), 16*2, "%02x", (unsigned int)digest[n]);
    }

    return out;
}

    int main(int argc, char **argv) {
        char *output = str2md5("hello", strlen("hello"));
        printf("%s\n", output);
        free(output);
        return 0;
    }

To be honest, the comments accompanying the prototypes seem clear enough. Something like this should do the trick:

void compute_md5(char *str, unsigned char digest[16]) {
    MD5Context ctx;
    MD5Init(&ctx);
    MD5Update(&ctx, str, strlen(str));
    MD5Final(digest, &ctx);
}

where str is a C string you want the hash of, and digest is the resulting MD5 digest.