How is a CRC32 checksum calculated

Question

Maybe I m just not seeing it  but CRC32 seems either needlessly complicated  or insufficiently explained anywhere I could find on the web   I understand that it is the remainder from a non-carry-based arithmetic division of the message value  divided by the  generator  polynomial  but the actual implementation of it escapes me   I ve read A Painless Guide To CRC Error Detection Algorithms  and I must say it was not painless  It goes over the theory rather well  but the author never gets to a simple  this is it   He does say what the parameters are for the standard CRC32 algorithm  but he neglects to lay out clearly how you get to it   The part that gets me is when he says  this is it  and then adds on   oh by the way  it can be reversed or started with different initial conditions   and doesn t give a clear answer of what the final way of calculating a CRC32 checksum given all of the changes he just added    Is there a simpler explanation of how CRC32 is calculated    I attempted to code in C how the table is formed   for  i   0  i  lt  256  i          temp   i       for  j   0  j  lt  8  j                  if  temp  amp  1                        temp  gt  gt   1              temp    0xEDB88320                    else  temp  gt  gt   1             testcrc i    temp      but this seems to generate values inconsistent with values I have found elsewhere on the Internet  I could use the values I found online  but I want to understand how they were created   Any help in clearing up these incredibly confusing numbers would be very appreciated

User · Answer

A CRC is pretty simple  you take a polynomial represented as bits and the data  and divide the polynomial into the data  or you represent the data as a polynomial and do the same thing    The remainder  which is between 0 and the polynomial is the CRC   Your code is a bit hard to understand  partly because it s incomplete  temp and testcrc are not declared  so it s unclear what s being indexed  and how much data is running through the algorithm   The way to understand CRCs is to try to compute a few using a short piece of data  16 bits or so  with a short polynomial -- 4 bits  perhaps   If you practice this way  you ll really understand how you might go about coding it   If you re doing it frequently  a CRC is quite slow to compute in software   Hardware computation is much more efficient  and requires just a few gates

User · Answer

In order to reduce crc32 to taking the reminder you need to:

Invert bits on each byte
xor first four bytes with 0xFF (this is to avoid errors on the leading 0s)
Add padding at the end (this is to make the last 4 bytes take part in the hash)
Compute the reminder
Reverse the bits again
xor the result again.

In code this is:


func CRC32 (file []byte) uint32 {
    for i , v := range(file) {
        file[i] = bits.Reverse8(v)
    }
    for i := 0; i < 4; i++ {
        file[i] ^= 0xFF
    }

    // Add padding
    file = append(file, []byte{0, 0, 0, 0}...)
    newReminder := bits.Reverse32(reminderIEEE(file))

    return newReminder ^ 0xFFFFFFFF
}

where reminderIEEE is the pure reminder on GF(2)[x]

User · Answer

In addition to the Wikipedia Cyclic redundancy check and Computation of CRC articles  I found a paper entitled Reversing CRC - Theory and Practice  to be a good reference     There are essentially three approaches for computing a CRC  an algebraic approach  a bit-oriented approach  and a table-driven approach   In Reversing CRC - Theory and Practice   each of these three algorithms approaches is explained in theory accompanied in the APPENDIX by an implementation for the CRC32 in the C programming language     PDF Link  Reversing CRC     Theory and Practice   HU Berlin Public Report  SAR-PR-2006-05  May 2006  Authors   Martin Stigge  Henryk Pl  tz  Wolf M  ller  Jens-Peter Redlich

User · Answer

The polynomial for CRC32 is      x32   x26   x23   x22   x16   x12   x11   x10   x8   x7   x5   x4   x2   x   1    Wikipedia CRC calculation   Or in hex and binary      0x 01 04 C1 1D B7   1 0000 0100 1100 0001 0001 1101 1011 0111   The highest term  x32  is usually not explicitly written  so it can instead be represented in hex just as     0x 04 C1 1D B7   Feel free to count the 1s and 0s  but you ll find they match up with the polynomial  where 1 is bit 0  or the first bit  and x is bit 1  or the second bit    Why this polynomial  Because there needs to be a standard given polynomial and the standard was set by IEEE 802 3  Also it is extremely difficult to find a polynomial that detects different bit errors effectively   You can think of the CRC-32 as a series of  Binary Arithmetic with No Carries   or basically  XOR and shift operations   This is technically called Polynomial Arithmetic    CRC primer  Chapter 5   To better understand it  think of this multiplication    x 3   x 2   x 0  x 3   x 1   x 0     x 6   x 4   x 3    x 5   x 3   x 2    x 3   x 1   x 0    x 6   x 5   x 4   3 x 3   x 2   x 1   x 0   If we assume x is base 2 then we get   x 7   x 3   x 2   x 1   x 0    CRC primer Chp 5   Why  Because 3x 3 is 11x 11  but we need only 1 or 0 pre digit  so we carry over    1x 110   1x 101   1x 100            11x 11   1x 10   1x 1   x 0  1x 110   1x 101   1x 100   1x 100   1x 11   1x 10   1x 1   x 0  1x 110   1x 101   1x 101            1x 11   1x 10   1x 1   x 0  1x 110   1x 110                     1x 11   1x 10   1x 1   x 0  1x 111                              1x 11   1x 10   1x 1   x 0   But mathematicians changed the rules so that it is mod 2  So basically any binary polynomial mod 2 is just addition without carry or XORs  So our original equation looks like      1x 110   1x 101   1x 100   11x 11   1x 10   1x 1   x 0   MOD 2    1x 110   1x 101   1x 100    1x 11   1x 10   1x 1   x 0     x 6   x 5   x 4   3 x 3   x 2   x 1   x 0  or that original number we had    I know this is a leap of faith but this is beyond my capability as a line-programmer  If you are a hard-core CS-student or engineer I challenge to break this down  Everyone will benefit from this analysis   So to work out a full example      Original message                  1101011011    Polynomial of  W idth 4                10011    Message after appending W zeros   11010110110000   Now we divide the augmented Message by the Poly using CRC arithmetic  This is the same division as before               1100001010   Quotient  nobody cares about the quotient                         10011   11010110110000   Augmented message  1101011011   0000   Poly   10011                  -----                   10011                  10011                  -----                   00001                  00000                  -----                   00010                  00000                  -----                   00101                  00000                  -----                   01011                  00000                  -----                   10110                  10011                  -----                   01010                  00000                  -----                   10100                  10011                  -----                   01110                  00000                  -----                   1110   Remainder   THE CHECKSUM       The division yields a quotient  which we throw away  and a remainder  which is the calculated checksum  This ends the calculation  Usually  the checksum is then appended to the message and the result transmitted  In this case the transmission would be  11010110111110    CRC primer  Chapter 7   Only use a 32-bit number as your divisor and use your entire stream as your dividend  Throw out the quotient and keep the remainder  Tack the remainder on the end of your message and you have a CRC32   Average guy review            QUOTIENT         ---------- DIVISOR   DIVIDEND                    REMAINDER    Take the first 32 bits  Shift bits If 32 bits are less than DIVISOR  go to step 2  XOR 32 bits by DIVISOR  Go to step 2     Note that the stream has to be dividable by 32 bits or it should be padded  For example  an 8-bit ANSI stream would have to be padded  Also at the end of the stream  the division is halted

User · Answer

For IEEE802 3  CRC-32  Think of the entire message as a serial bit stream  append 32 zeros to the end of the message  Next  you MUST reverse the bits of EVERY byte of the message and do a 1 s complement the first 32 bits  Now divide by the CRC-32 polynomial  0x104C11DB7  Finally  you must 1 s complement the 32-bit remainder of this division bit-reverse each of the 4 bytes of the remainder  This becomes the 32-bit CRC that is appended to the end of the message    The reason for this strange procedure is that the first Ethernet implementations would serialize the message one byte at a time and transmit the least significant bit of every byte first  The serial bit stream then went through a serial CRC-32 shift register computation  which was simply complemented and sent out on the wire after the message was completed  The reason for complementing the first 32 bits of the message is so that you don t get an all zero CRC even if the message was all zeros

User · Answer

I spent a while trying to uncover the answer to this question, and I finally published a tutorial on CRC-32 today: CRC-32 hash tutorial - AutoHotkey Community

In this example from it, I demonstrate how to calculate the CRC-32 hash for the ASCII string 'abc':

calculate the CRC-32 hash for the ASCII string 'abc':

inputs:
dividend: binary for 'abc': 0b011000010110001001100011 = 0x616263
polynomial: 0b100000100110000010001110110110111 = 0x104C11DB7

011000010110001001100011
reverse bits in each byte:
100001100100011011000110
append 32 0 bits:
10000110010001101100011000000000000000000000000000000000
XOR the first 4 bytes with 0xFFFFFFFF:
01111001101110010011100111111111000000000000000000000000

'CRC division':
01111001101110010011100111111111000000000000000000000000
 100000100110000010001110110110111
 ---------------------------------
  111000100010010111111010010010110
  100000100110000010001110110110111
  ---------------------------------
   110000001000101011101001001000010
   100000100110000010001110110110111
   ---------------------------------
    100001011101010011001111111101010
    100000100110000010001110110110111
    ---------------------------------
         111101101000100000100101110100000
         100000100110000010001110110110111
         ---------------------------------
          111010011101000101010110000101110
          100000100110000010001110110110111
          ---------------------------------
           110101110110001110110001100110010
           100000100110000010001110110110111
           ---------------------------------
            101010100000011001111110100001010
            100000100110000010001110110110111
            ---------------------------------
              101000011001101111000001011110100
              100000100110000010001110110110111
              ---------------------------------
                100011111110110100111110100001100
                100000100110000010001110110110111
                ---------------------------------
                    110110001101101100000101110110000
                    100000100110000010001110110110111
                    ---------------------------------
                     101101010111011100010110000001110
                     100000100110000010001110110110111
                     ---------------------------------
                       110111000101111001100011011100100
                       100000100110000010001110110110111
                       ---------------------------------
                        10111100011111011101101101010011

remainder: 0b10111100011111011101101101010011 = 0xBC7DDB53
XOR the remainder with 0xFFFFFFFF:
0b01000011100000100010010010101100 = 0x438224AC
reverse bits:
0b00110101001001000100000111000010 = 0x352441C2

thus the CRC-32 hash for the ASCII string 'abc' is 0x352441C2

User · Answer

Then there is always Rosetta Code  which shows crc32 implemented in dozens of computer languages  https   rosettacode org wiki CRC-32 and has links to many explanations and implementations

[c] How is a CRC32 checksum calculated?

The answer is

Examples related to c

Examples related to checksum

Examples related to crc32

Tags