Base64 length calculation

Question

After reading the base64 wiki, I'm trying to figure out how the formula works.

Given a string with a length of n, the base64 length will be:

    4 * ceil(n / 3)

Which is:

    4 * Math.Ceiling((double)s.Length / 3)

I already know that the base64 length must be % 4 == 0 to allow the decoder to know what the original text length was. The padding for a sequence can be at most = or ==.

wiki: The number of output bytes per input byte is approximately 4 / 3 (33% overhead).

Question: How does the information above settle with the output length 4 * ceil(n / 3)?

User · Answer

On Windows, I wanted to estimate the size of a Mime64 buffer, but none of the precise formulas worked for me. I finally ended up with an approximate formula like this:

Mime64 string allocation size (approximate) = (((4 * ((binary buffer size) + 1)) / 3) + 1)

The last +1 is for the ASCII zero: one extra character has to be allocated to store the terminating zero. But why is "binary buffer size" + 1? I suspect that there is some Mime64 termination character, or maybe this is an alignment issue.

User · Answer

In an attempt to give a succinct yet complete derivation:

Every input byte has 8 bits, so for n input bytes we get:

    n * 8                                    input bits

Every 6 bits is an output byte, so:

    ceil(n * 8 / 6)  =  ceil(n * 4 / 3)      output bytes

This is without padding.

With padding, we round that up to multiple-of-four output bytes:

    ceil(ceil(n * 4 / 3) / 4) * 4  =  ceil(n * 4 / 3 / 4) * 4  =  ceil(n / 3) * 4      output bytes

See Nested Divisions (Wikipedia) for the first equivalence.

Using integer arithmetic, ceil(n / m) can be calculated as (n + m - 1) div m, hence we get:

    (n * 4 + 2) div 3      without padding
    (n + 2) div 3 * 4      with padding

For illustration:

     n   with padding    (n + 2) div 3 * 4    without padding   (n * 4 + 2) div 3
    ------------------------------------------------------------------------------
     0                           0                                      0
     1   AA==                    4            AA                        2
     2   AAA=                    4            AAA                       3
     3   AAAA                    4            AAAA                      4
     4   AAAAAA==                8            AAAAAA                    6
     5   AAAAAAA=                8            AAAAAAA                   7
     6   AAAAAAAA                8            AAAAAAAA                  8
     7   AAAAAAAAAA==           12            AAAAAAAAAA               10
     8   AAAAAAAAAAA=           12            AAAAAAAAAAA              11
     9   AAAAAAAAAAAA           12            AAAAAAAAAAAA             12
    10   AAAAAAAAAAAAAA==       16            AAAAAAAAAAAAAA           14
    11   AAAAAAAAAAAAAAA=       16            AAAAAAAAAAAAAAA          15
    12   AAAAAAAAAAAAAAAA       16            AAAAAAAAAAAAAAAA         16

Finally, in the case of MIME Base64 encoding, two additional bytes (CR LF) are needed per every 76 output bytes, rounded up or down depending on whether a terminating newline is required.
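The two integer formulas translate almost directly into code. The following is a minimal JavaScript sketch rather than a reference implementation; the names `div`, `unpaddedLength` and `paddedLength` are made up for illustration, and `Math.floor` stands in for the `div` operator because JavaScript division produces floats.

```javascript
// JavaScript has no integer division operator, so floor explicitly.
const div = (a, b) => Math.floor(a / b);

// (n * 4 + 2) div 3: output characters without padding.
const unpaddedLength = (n) => div(n * 4 + 2, 3);

// (n + 2) div 3 * 4: output characters with '=' padding.
const paddedLength = (n) => div(n + 2, 3) * 4;

// Print a few rows in the same order as the table: n, padded, unpadded.
for (const n of [0, 1, 2, 3, 4, 10, 12]) {
  console.log(n, paddedLength(n), unpaddedLength(n));
}
```

The printed rows should line up with the numeric columns of the table for the same values of n.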

User · Answer

If anyone is interested in the Pedro Silva solution in JS, I just ported the same solution:

    // Size in KB of the data encoded in a Base64 string
    const getBase64Size = (base64) => {
      let padding = base64.length
        ? getBase64Padding(base64)
        : 0
      return ((Math.ceil(base64.length / 4) * 3) - padding) / 1000
    }

    // Number of trailing '=' padding characters: 2, 1 or 0
    const getBase64Padding = (base64) => {
      if (endsWith(base64, '==')) return 2
      return endsWith(base64, '=') ? 1 : 0
    }

    const endsWith = (str, end) => {
      let charsFromEnd = end.length
      let extractedEnd = str.slice(-charsFromEnd)
      return extractedEnd === end
    }

User · Answer

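Here is a function to calculate the original size of an encoded Base 64 file (as a String) in KB:

    // StringUtils is assumed to be org.apache.commons.lang3.StringUtils
    private Double calcBase64SizeInKBytes(String base64String) {
        Double result = -1.0;
        if (StringUtils.isNotEmpty(base64String)) {
            Integer padding = 0;
            if (base64String.endsWith("==")) {
                padding = 2;
            }
            else {
                if (base64String.endsWith("=")) padding = 1;
            }
            // divide by 4.0 so that Math.ceil is not applied to an integer quotient
            result = (Math.ceil(base64String.length() / 4.0) * 3) - padding;
        }
        return result / 1000;
    }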

User · Answer

For all people who speak C, take a look at these two macros:

    // calculate the size of 'output' buffer required for a 'input' buffer of length x during Base64 encoding operation
    #define B64ENCODE_OUT_SAFESIZE(x) ((((x) + 3 - 1) / 3) * 4 + 1)

    // calculate the size of 'output' buffer required for a 'input' buffer of length x during Base64 decoding operation
    #define B64DECODE_OUT_SAFESIZE(x) (((x) * 3) / 4)

Taken from here.
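The decoding macro gives an upper bound rather than the exact decoded length, because it does not subtract the padding. A small sketch of that difference, written in JavaScript and assuming Node.js (its `Buffer` class is used as the actual decoder); `decodeOutSafeSize` is just an illustrative rendering of the macro, not part of any library.

```javascript
// JavaScript rendering of the decoding macro: an upper bound on decoded bytes.
const decodeOutSafeSize = (x) => Math.floor((x * 3) / 4);

for (const encoded of ['TQ==', 'TWE=', 'TWFu']) {
  // Node.js decoder, used only to obtain the real decoded length.
  const actual = Buffer.from(encoded, 'base64').length;
  console.log(encoded, decodeOutSafeSize(encoded.length), actual);
}
```

For padded input the bound exceeds the real size by the number of `=` characters, which is harmless when sizing a buffer.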

User · Answer

While everyone else is debating algebraic formulas, I'd rather just use BASE64 itself to tell me:

    echo "Including padding, a base64 string requires four bytes for every three-byte chunk of the original string, including any partial chunks. One or two bytes extra at the end of the string will still get converted to four bytes in the base64 string when padding is added. Unless you have a very specific use, it is best to add the padding, usually an equals character. I added an extra byte for a null character in C, because ASCII strings without this are a little dangerous and you'd need to carry the string length separately." | wc -c
    525

    echo "Including padding, a base64 string requires four bytes for every three-byte chunk of the original string, including any partial chunks. One or two bytes extra at the end of the string will still get converted to four bytes in the base64 string when padding is added. Unless you have a very specific use, it is best to add the padding, usually an equals character. I added an extra byte for a null character in C, because ASCII strings without this are a little dangerous and you'd need to carry the string length separately." | base64 | wc -c
    710

So it seems the formula of 3 bytes being represented by 4 base64 characters is correct.
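The 710 is consistent with the formula once line wrapping is counted: 525 input bytes give 4 * 175 = 700 Base64 characters, and if the tool wraps its output at 76 characters (as GNU `base64` does by default), that adds 10 newlines. Here is a small JavaScript sketch of that arithmetic; the function name `wrappedLength` and its parameters are hypothetical, chosen only for this illustration.

```javascript
// Length of padded Base64 output for n input bytes, wrapped into lines of
// lineLength characters, each terminated by newlineBytes bytes.
function wrappedLength(n, lineLength = 76, newlineBytes = 1) {
  const chars = 4 * Math.ceil(n / 3);          // padded Base64 characters
  const lines = Math.ceil(chars / lineLength); // full and partial lines
  return chars + lines * newlineBytes;
}

console.log(wrappedLength(525));        // 700 characters + 10 newlines
console.log(wrappedLength(525, 76, 2)); // same data with CR LF line endings
```

Passing 2 for `newlineBytes` models MIME-style CR LF line endings instead of a single LF.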

User · Answer

For reference, the Base64 encoder's length formula is as follows:

    4 * Ceiling(n / 3)

As you said, a Base64 encoder given n bytes of data will produce a string of 4n/3 Base64 characters. Put another way, every 3 bytes of data will result in 4 Base64 characters. EDIT: A comment correctly points out that my previous graphic did not account for padding; the correct formula is Ceiling(4n/3) without padding, and 4 * Ceiling(n/3) once the output is padded to a multiple of 4.

The Wikipedia article shows exactly how the ASCII string Man is encoded into the Base64 string TWFu in its example. The input string is 3 bytes, or 24 bits, in size, so the formula correctly predicts the output will be 4 bytes (or 32 bits) long: TWFu. The process encodes every 6 bits of data into one of the 64 Base64 characters, so the 24-bit input divided by 6 results in 4 Base64 characters.

You ask in a comment what the size of encoding 123456 would be. Keeping in mind that every character of that string is 1 byte, or 8 bits, in size (assuming ASCII/UTF8 encoding), we are encoding 6 bytes, or 48 bits, of data. According to the equation, we expect the output length to be (6 bytes / 3 bytes) * 4 characters = 8 characters.

Putting 123456 into a Base64 encoder creates MTIzNDU2, which is 8 characters long, just as we expected.
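Both worked examples can be checked against a real encoder. This sketch assumes Node.js, whose `Buffer` class can produce Base64 output; `paddedLength` is a helper name invented here, not part of any API.

```javascript
// Padded Base64 length for n input bytes.
const paddedLength = (n) => 4 * Math.ceil(n / 3);

for (const text of ['Man', '123456']) {
  // ASCII input, so one character is one byte.
  const encoded = Buffer.from(text, 'utf8').toString('base64');
  console.log(text, '->', encoded, encoded.length, paddedLength(text.length));
}
```

The last two numbers on each printed line are the measured and the predicted length.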

User · Answer

I don't see the simplified formula in other responses. The logic is covered, but I wanted the most basic form for my embedded use:

    Unpadded = ((4 * n) + 2) / 3

    Padded = 4 * ((n + 2) / 3)

NOTE: When calculating the unpadded count we round up the integer division, i.e. add Divisor-1, which is +2 in this case.

User · Answer

Integers

Generally we don't want to use doubles because we don't want to use the floating point ops, rounding errors etc. They are just not necessary.

For this it is a good idea to remember how to perform the ceiling division: ceil(x / y) in doubles can be written as (x + y - 1) / y (while avoiding negative numbers, but beware of overflow).

Readable

If you go for readability you can of course also program it like this (example in Java, for C you could use macros, of course):

    public static int ceilDiv(int x, int y) {
        return (x + y - 1) / y;
    }

    public static int paddedBase64(int n) {
        int blocks = ceilDiv(n, 3);
        return blocks * 4;
    }

    public static int unpaddedBase64(int n) {
        int bits = 8 * n;
        return ceilDiv(bits, 6);
    }

    // test only
    public static void main(String[] args) {
        for (int n = 0; n < 21; n++) {
            System.out.println("Base 64 padded: " + paddedBase64(n));
            System.out.println("Base 64 unpadded: " + unpaddedBase64(n));
        }
    }

Inlined

Padded

We know that we need one 4-character block for each 3 bytes (or less). So then the formula becomes (for x = n and y = 3):

    blocks = (bytes + 3 - 1) / 3
    chars = blocks * 4

or combined:

    chars = ((bytes + 3 - 1) / 3) * 4

Your compiler will optimize out the 3 - 1, so just leave it like this to maintain readability.

Unpadded

Less common is the unpadded variant. For this we remember that we need a character for each 6 bits, rounded up:

    bits = bytes * 8
    chars = (bits + 6 - 1) / 6

or combined:

    chars = (bytes * 8 + 6 - 1) / 6

We can however still divide by two (if we want to):

    chars = (bytes * 4 + 3 - 1) / 3

Unreadable

In case you don't trust your compiler to do the final optimizations for you (or if you want to confuse your colleagues):

Padded

    ((n + 2) / 3) << 2

Unpadded

    ((n << 2) | 2) / 3

So there we are, two logical ways of calculation, and we don't need any branches, bit-ops or modulo ops - unless we really want to.

Notes:

Obviously you may need to add 1 to the calculations to include a null termination byte.
For Mime you may need to take care of possible line termination characters and such (look for other answers for that).
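As a quick sanity check, the shift-based forms can be compared against the readable ones over a range of inputs. The sketch below is in JavaScript, where division yields a float, so `Math.floor` stands in for integer division; all function names are hypothetical and chosen for this illustration only.

```javascript
// Ceiling division for non-negative integers.
const ceilDiv = (x, y) => Math.floor((x + y - 1) / y);

const paddedReadable = (n) => ceilDiv(n, 3) * 4;
const unpaddedReadable = (n) => ceilDiv(8 * n, 6);

// Shift/or variants; JavaScript bit operators work on 32-bit integers.
const paddedShift = (n) => Math.floor((n + 2) / 3) << 2;
const unpaddedShift = (n) => Math.floor(((n << 2) | 2) / 3);

// Count inputs for which the two styles disagree.
let mismatches = 0;
for (let n = 0; n <= 10000; n++) {
  if (paddedReadable(n) !== paddedShift(n)) mismatches++;
  if (unpaddedReadable(n) !== unpaddedShift(n)) mismatches++;
}
console.log('mismatches:', mismatches);
```

The `| 2` works because `n << 2` always has its two low bits clear, so or-ing in 2 is the same as adding 2.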

User · Answer

Each character is used to represent 6 bits (log2(64) = 6).

Therefore 4 chars are used to represent 4 * 6 = 24 bits = 3 bytes.

So you need 4 * (n / 3) chars to represent n bytes, and this needs to be rounded up to a multiple of 4.

The number of padding chars resulting from the rounding up to a multiple of 4 will be 0, 1 or 2 (3 never occurs, because even a single leftover byte already needs two chars).
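To make the rounding concrete, the number of padding characters can be computed from n alone as the difference between the rounded-up length and the unpadded length. A minimal JavaScript sketch (the names `unpadded`, `padded` and `paddingChars` are made up for this example):

```javascript
// Characters needed without padding, and after rounding up to a multiple of 4.
const unpadded = (n) => Math.ceil((4 * n) / 3);
const padded = (n) => 4 * Math.ceil(n / 3);

// Padding is whatever the rounding adds.
const paddingChars = (n) => padded(n) - unpadded(n);

for (let n = 0; n <= 6; n++) {
  console.log(n, unpadded(n), padded(n), paddingChars(n));
}
```

The last column cycles with n mod 3: no padding for a remainder of 0, two characters for a remainder of 1, and one character for a remainder of 2.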

User · Answer

Seems to me that the right formula should be:

    n64 = 4 * (n / 3) + (n % 3 != 0 ? 4 : 0)
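Assuming n / 3 is meant as integer division, this conditional form gives the same result as 4 * ceil(n / 3). A short JavaScript sketch of the comparison (the function names are illustrative only):

```javascript
// Conditional form: whole 3-byte groups, plus one more block if bytes remain.
const conditional = (n) => 4 * Math.floor(n / 3) + (n % 3 !== 0 ? 4 : 0);

// Ceiling form of the padded length.
const ceiling = (n) => 4 * Math.ceil(n / 3);

for (let n = 0; n <= 7; n++) {
  console.log(n, conditional(n), ceiling(n));
}
```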

User · Answer

I believe that this one is an exact answer if n % 3 is not zero, no?

        (n + 3 - n % 3)
    4 * ---------------
               3

Mathematica version:

    SizeB64[n_] := If[Mod[n, 3] == 0, 4 n/3, 4 (n + 3 - Mod[n, 3])/3]

Have fun

GI

User · Answer

I think the given answers miss the point of the original question, which is how much space needs to be allocated to fit the base64 encoding for a given binary string of length n bytes.

The answer is (floor(n / 3) + 1) * 4 + 1

This includes padding and a terminating null character. You may not need the floor call if you are doing integer arithmetic.

Including padding, a base64 string requires four bytes for every three-byte chunk of the original string, including any partial chunks. One or two bytes extra at the end of the string will still get converted to four bytes in the base64 string when padding is added. Unless you have a very specific use, it is best to add the padding, usually an equals character. I added an extra byte for a null character in C, because ASCII strings without this are a little dangerous and you'd need to carry the string length separately.
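For comparison, here is a sketch that puts this allocation size next to the exact padded length plus one byte for the terminator. It is written in JavaScript purely for illustration; `allocSize` and `exactSize` are hypothetical names. The allocation is never too small, and it comes out 4 bytes larger than strictly needed when n is a multiple of 3.

```javascript
// Allocation size from the formula: padding plus a terminating null.
const allocSize = (n) => (Math.floor(n / 3) + 1) * 4 + 1;

// Exact need: padded Base64 characters plus one null byte.
const exactSize = (n) => 4 * Math.ceil(n / 3) + 1;

for (let n = 0; n <= 6; n++) {
  console.log(n, allocSize(n), exactSize(n), allocSize(n) - exactSize(n));
}
```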

User · Answer

4 * n / 3 gives the unpadded length.

Then round up to the nearest multiple of 4 for padding; as 4 is a power of 2, you can use bitwise logical operations:

    ((4 * n / 3) + 3) & ~3
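A sketch of the same trick in JavaScript, where `Math.floor` supplies the integer division (the helper names `roundUp4` and `paddedBitwise` are invented for this example). Adding 3 and then clearing the two low bits rounds a non-negative integer up to the next multiple of 4.

```javascript
// Round a non-negative 32-bit integer up to the next multiple of 4.
const roundUp4 = (x) => (x + 3) & ~3;

// Padded length via integer division followed by the bitwise round-up.
const paddedBitwise = (n) => roundUp4(Math.floor((4 * n) / 3));

for (let n = 0; n <= 6; n++) {
  console.log(n, paddedBitwise(n), 4 * Math.ceil(n / 3));
}
```

Both printed columns should agree, which suggests the truncating division is harmless here because the round-up absorbs it.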

User · Answer

Simple implementation in JavaScript:

    function sizeOfBase64String(base64String) {
        if (!base64String) return 0;
        // number of trailing '=' padding characters
        const padding = (base64String.match(/(=*)$/) || [])[1].length;
        return 4 * Math.ceil((base64String.length / 3)) - padding;
    }
