Why do we use Base64?

Question

Wikipedia says:

"Base64 encoding schemes are commonly used when there is a need to encode binary data that needs be stored and transferred over media that are designed to deal with textual data. This is to ensure that the data remains intact without modification during transport."

But isn't data always stored and transmitted in binary, because the memory our machines have stores binary, and it just depends on how you interpret it? So, whether you encode the bit pattern 010011010110000101101110 as Man in ASCII or as TWFu in Base64, you are eventually going to store the same bit pattern.

If the ultimate encoding is in terms of zeros and ones, and every machine and medium can deal with them, how does it matter if the data is represented as ASCII or Base64?

What does it mean, "media that are designed to deal with textual data"? They can deal with binary => they can deal with anything.

Thanks everyone, I think I understand now.

When we send data over, we cannot be sure that it will be interpreted in the same format as we intended. So, we send the data coded in some format (like Base64) that both parties understand. That way, even if sender and receiver interpret the same things differently, the data will not be interpreted wrongly, because they agree on the coded format.

From Mark Byers' example: if I want to send

    Hello
    world!

one way is to send it in ASCII like

    72 101 108 108 111 10 119 111 114 108 100 33

But byte 10 might not be interpreted correctly as a newline at the other end. So, we use a subset of ASCII to encode it like this:

    83 71 86 115 98 71 56 75 100 50 57 121 98 71 81 104

which, at the cost of more data transferred for the same amount of information, ensures that the receiver can decode the data in the intended way, even if the receiver happens to have different interpretations for the rest of the character set.
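To make the Man / TWFu comparison in the question concrete, here is a small Python sketch (Python and its standard `base64` module are my choice for illustration, not something the question uses). It prints the bits that would actually be stored for each representation; the helper name `bits` is hypothetical.

```python
import base64

raw = "Man".encode("ascii")          # three bytes: 0x4D 0x61 0x6E
encoded = base64.b64encode(raw)      # b"TWFu", which is itself four ASCII bytes


def bits(data: bytes) -> str:
    """Render bytes as a string of 0s and 1s, eight per byte."""
    return "".join(f"{byte:08b}" for byte in data)


raw_bits = bits(raw)
encoded_bits = bits(encoded)

print(raw_bits)      # the 24-bit pattern quoted in the question
print(encoded_bits)  # 32 bits: the ASCII codes of 'T', 'W', 'F', 'u'
```

The two strings differ: the Base64 form is a different, longer bit pattern that only uses byte values from the safe alphabet, and decoding it gives back the original three bytes.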

User · Answer

"What does it mean, "media that are designed to deal with textual data"?"

That those protocols were designed to handle text (often, only English text) instead of binary data (like .png and .jpg images).

"They can deal with binary => they can deal with anything."

But the converse is not true. A protocol designed to represent text may improperly treat binary data that happens to contain:

- The bytes 0x0A and 0x0D, used for line endings, which differ by platform.
- Other control characters like 0x00 (NULL = C string terminator), 0x03 (END OF TEXT), 0x04 (END OF TRANSMISSION), or 0x1A (DOS end-of-file), which may prematurely signal the end of data.
- Bytes above 0x7F (if the protocol was designed for ASCII).
- Byte sequences that are invalid UTF-8.

So you can't just send binary data over a text-based protocol. You're limited to the bytes that represent the non-space, non-control ASCII characters, of which there are 94. The reason Base 64 was chosen was that it's faster to work with powers of two, and 64 is the largest power of two that fits within those 94 characters.

"One question though. How is it that systems still don't agree on a common encoding technique like the so common UTF-8?"

On the Web, at least, they mostly have. A majority of sites use UTF-8.

The problem in the West is that there is a lot of old software that ass-u-me-s that 1 byte = 1 character and can't work with UTF-8.

The problem in the East is their attachment to encodings like GB2312 and Shift_JIS.

And then there is the fact that Microsoft seems to have still not gotten over having picked the wrong UTF encoding. If you want to use the Windows API or the Microsoft C runtime library, you're limited to UTF-16 or the locale's "ANSI" encoding. This makes it painful to use UTF-8 because you have to convert all the time.
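The "94 characters, so 64" arithmetic from this answer can be checked with a few lines of Python (the language is my choice for illustration; the variable names are hypothetical):

```python
# Printable ASCII excluding space and control characters: 0x21 through 0x7E.
safe_chars = [chr(code) for code in range(0x21, 0x7F)]
safe_count = len(safe_chars)

# Whole bits that one character can carry if only safe characters are allowed.
bits_per_char = safe_count.bit_length() - 1

# Largest power-of-two alphabet that fits inside the safe set.
alphabet_size = 2 ** bits_per_char

print(safe_count, bits_per_char, alphabet_size)  # 94 6 64
```

A 128-character alphabet would need 7 bits per character, which does not fit in 94 symbols, so 6 bits (64 symbols) is the most a power-of-two scheme can use.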

User · Answer

Here is a summary of my understanding after reading what others have posted.

Important:

- Base64 encoding is not meant to provide security.
- Base64 encoding is not meant to compress data.

Why do we use Base64?

Base64 is a text representation of data that consists of only 64 characters: the alphanumeric characters (lowercase and uppercase), + and /, with = used only as padding. These 64 characters are considered "safe", that is, they cannot be misinterpreted by legacy computers and programs, unlike characters such as <, >, \n and many others.
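A short Python sketch of the two "Important" points and the alphabet claim (the sample string is a made-up placeholder, and Python is my choice of language here):

```python
import base64
import string

secret = b"correct horse battery staple"   # hypothetical sample data, 28 bytes
encoded = base64.b64encode(secret)

# Not security: no key is involved, so anyone can reverse the encoding.
recovered = base64.b64decode(encoded)

# Not compression: every 3 input bytes become 4 output characters.
grew = len(encoded) > len(secret)

# Only letters, digits, '+', '/' and the '=' padding ever appear in the output.
allowed = set(string.ascii_letters + string.digits + "+/=")
only_safe = set(encoded.decode("ascii")) <= allowed

print(len(secret), len(encoded), recovered == secret, only_safe)
```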

User · Answer

"What does it mean, "media that are designed to deal with textual data"?"

Back in the day when ASCII ruled the world, dealing with non-ASCII values was a headache. People jumped through all sorts of hoops to get these transferred over the wire without losing information.

User · Answer

Base64 instead of escaping special characters

I'll give you a very different but real example. I write JavaScript code to be run in a browser. HTML tags have ID values, but there are constraints on what characters are valid in an ID.

But I want my ID to losslessly refer to files in my file system. Files in reality can have all manner of weird and wonderful characters in them, from exclamation marks and accented characters to tildes and even emoji. I cannot do this:

    <div id="/path/to/my_strangely_named_file!@().jpg">
        <img src="http://myserver.com/path/to/my_strangely_named_file!@().jpg">
        Here's a pic I took in Moscow.
    </div>

Suppose I want to run some code like this:

    // ERROR
    document.getElementById("/path/to/my_strangely_named_file!@().jpg");

I think this code will fail when executed.

With Base64 I can refer to something complicated without worrying about which language allows what special characters and which need escaping:

    document.getElementById("18GerPD8fY4iTbNpC9hHNXNHyrDMampPLA");

Unlike with MD5 or some other hashing function, you can reverse the encoding to find out exactly what the data was, which is actually useful.

I wish I had known about Base64 years ago. I would have avoided tearing my hair out with "encodeURIComponent" and str.replace('\n','\\n').

SSH transfer of text

If you're trying to pass complex data over ssh (e.g. a dotfile so you can get your shell personalizations), good luck doing it without Base 64. This is how you would do it with Base 64 (I know you can use SCP, but that would take multiple commands, which complicates key bindings for sshing into a server):

https://superuser.com/a/1376076/114723
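The "reversible identifier" idea can be sketched in Python as follows (Python rather than the answer's JavaScript, purely for illustration). The file name is a hypothetical example, the URL-safe Base64 variant is my substitution because it avoids '+' and '/', and SHA-256 stands in for "some other hashing function".

```python
import base64
import hashlib

# Hypothetical file name with characters that are awkward in identifiers.
path = "/path/to/my strangely named file!~é.jpg"

# URL-safe Base64 uses '-' and '_' in place of '+' and '/'.
token = base64.urlsafe_b64encode(path.encode("utf-8")).decode("ascii")

# Reversible: the original path comes back exactly.
restored = base64.urlsafe_b64decode(token).decode("utf-8")

# A hash also yields "safe" text, but the path cannot be recovered from it.
digest = hashlib.sha256(path.encode("utf-8")).hexdigest()

print(token)
print(restored == path)
```

The token contains only letters, digits, '-', '_' and '=' padding, so it needs no escaping in most contexts.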

User · Answer

In addition to the other (somewhat lengthy) answers: even ignoring old systems that support only 7-bit ASCII, basic problems with supplying binary data in text-mode are:

- Newlines are typically transformed in text-mode.
- One must be careful not to treat a NUL byte as the end of a text string, which is all too easy to do in any program with C lineage.
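Both hazards can be simulated in a few lines of Python. The two helper functions below are hypothetical stand-ins for a text-mode layer and for C-style string handling; they are not real library behaviour, just a sketch of the damage each one does.

```python
import base64

# Start of a PNG file: it contains CR, LF and NUL bytes by design.
payload = b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR"


def text_mode_newlines(data: bytes) -> bytes:
    """Simulate a text-mode layer that rewrites CRLF as LF."""
    return data.replace(b"\r\n", b"\n")


def c_string(data: bytes) -> bytes:
    """Simulate C-style handling that stops at the first NUL byte."""
    return data.split(b"\x00", 1)[0]


# Raw binary is damaged by both steps.
damaged = c_string(text_mode_newlines(payload))

# The Base64 form contains no CR, LF or NUL, so it passes through unchanged.
encoded = base64.b64encode(payload)
survived = c_string(text_mode_newlines(encoded))
intact = base64.b64decode(survived) == payload

print(damaged != payload, intact)
```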

User · Answer

Why not look to the RFC that currently defines Base64?

"Base encoding of data is used in many situations to store or transfer data in environments that, perhaps for legacy reasons, are restricted to US-ASCII [1] data. Base encoding can also be used in new applications that do not have legacy restrictions, simply because it makes it possible to manipulate objects with text editors.

In the past, different applications have had different requirements and thus sometimes implemented base encodings in slightly different ways. Today, protocol specifications sometimes use base encodings in general, and "base64" in particular, without a precise description or reference. Multipurpose Internet Mail Extensions (MIME) [4] is often used as a reference for base64 without considering the consequences for line-wrapping or non-alphabet characters. The purpose of this specification is to establish common alphabet and encoding considerations. This will hopefully reduce ambiguity in other documents, leading to better interoperability."

Base64 was originally devised as a way to allow binary data to be attached to emails as a part of the Multipurpose Internet Mail Extensions.

User · Answer

Encoding binary data in XML

Suppose you want to embed a couple of images within an XML document. The images are binary data, while the XML document is text. But XML cannot handle embedded binary data. So how do you do it?

One option is to encode the images in base64, turning the binary data into text that XML can handle.

Instead of:

    <images>
      <image name="Sally">{binary gibberish that breaks XML parsers}</image>
      <image name="Bobby">{binary gibberish that breaks XML parsers}</image>
    </images>

you do:

    <images>
      <image name="Sally" encoding="base64">j23894uaiAJSD3234kljasjkSD...</image>
      <image name="Bobby" encoding="base64">Ja3k23JKasil3452AsdfjlksKsasKD...</image>
    </images>

And the XML parser will be able to parse the XML document correctly and extract the image data.
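A round trip of this approach might look like the following Python sketch, using the standard library's `xml.etree.ElementTree`. The image bytes are made-up stand-ins, and the `encoding="base64"` attribute is just the convention from the example above, not something XML itself defines.

```python
import base64
import xml.etree.ElementTree as ET

# Stand-in "image" bytes (hypothetical), including bytes XML cannot carry raw.
images = {"Sally": b"\x00\x01<binary>&\xff", "Bobby": bytes(range(8))}

# Sender: build the document with Base64 text as element content.
root = ET.Element("images")
for name, data in images.items():
    node = ET.SubElement(root, "image", name=name, encoding="base64")
    node.text = base64.b64encode(data).decode("ascii")

document = ET.tostring(root, encoding="unicode")
print(document)

# Receiver: parse the XML, then decode each payload back to bytes.
parsed = ET.fromstring(document)
recovered = {
    node.get("name"): base64.b64decode(node.text)
    for node in parsed.iter("image")
}
```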

User · Answer

Media that is designed for textual data is of course eventually binary as well, but textual media often use certain binary values for control characters. Also, textual media may reject certain binary values as non-text.

Base64 encoding encodes binary data as values that can only be interpreted as text in textual media, and is free of any special characters and/or control characters, so that the data will be preserved across textual media as well.

User · Answer

Most computers store data in 8-bit binary format, but this is not a requirement. Some machines and transmission media can only handle 7 bits (or maybe even fewer) at a time. Such a medium would interpret the stream in multiples of 7 bits, so if you were to send 8-bit data, you won't receive what you expect on the other side. Base-64 is just one way to solve this problem: you encode the input into a 6-bit format, send it over your medium, and decode it back to 8-bit format at the receiving end.
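As a rough illustration, here is a Python sketch of a channel that is not 8-bit clean. The function `seven_bit_channel` is a hypothetical simplification: it models the link by clearing the top bit of every byte, which is one common way such links misbehave, rather than literally re-framing the stream in 7-bit units.

```python
import base64


def seven_bit_channel(data: bytes) -> bytes:
    """Simulate a link that drops the most significant bit of every byte."""
    return bytes(byte & 0x7F for byte in data)


original = bytes([0x48, 0xE9, 0xFF, 0x80, 0x21])

# Sent raw: every byte above 0x7F arrives changed.
sent_raw = seven_bit_channel(original)

# Sent as Base64: all characters are below 0x80, so nothing is altered.
sent_b64 = seven_bit_channel(base64.b64encode(original))
recovered = base64.b64decode(sent_b64)

print(sent_raw == original, recovered == original)  # False True
```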

User · Answer

One example of when I found it convenient was when trying to embed binary data in XML. Some of the binary data was being misinterpreted by the SAX parser, because that data could be literally anything, including XML special characters. Base64 encoding the data on the transmitting end and decoding it on the receiving end fixed that problem.

User · Answer

Your first mistake is thinking that ASCII encoding and Base64 encoding are interchangeable. They are not. They are used for different purposes.

- When you encode text in ASCII, you start with a text string and convert it to a sequence of bytes.
- When you encode data in Base64, you start with a sequence of bytes and convert it to a text string.

To understand why Base64 was necessary in the first place we need a little history of computing.

Computers communicate in binary (0s and 1s), but people typically want to communicate with richer forms of data such as text or images. In order to transfer this data between computers it first has to be encoded into 0s and 1s, sent, then decoded again. To take text as an example, there are many different ways to perform this encoding. It would be much simpler if we could all agree on a single encoding, but sadly this is not the case.

Originally a lot of different encodings were created (e.g. Baudot code) which used a different number of bits per character, until eventually ASCII became a standard with 7 bits per character. However, most computers store binary data in bytes consisting of 8 bits each, so ASCII is unsuitable for transferring this type of data. Some systems would even wipe the most significant bit. Furthermore, the difference in line ending encodings across systems means that the ASCII characters 10 and 13 were also sometimes modified.

To solve these problems Base64 encoding was introduced. This allows you to encode arbitrary bytes to bytes which are known to be safe to send without getting corrupted (ASCII alphanumeric characters and a couple of symbols). The disadvantage is that encoding the message using Base64 increases its length: every 3 bytes of data are encoded to 4 ASCII characters.

To send text reliably you can first encode to bytes using a text encoding of your choice (for example UTF-8) and then afterwards Base64 encode the resulting binary data into a text string that is safe to send encoded as ASCII. The receiver will have to reverse this process to recover the original message. This of course requires that the receiver knows which encodings were used, and this information often needs to be sent separately.

Historically it has been used to encode binary data in email messages where the email server might modify line endings. A more modern example is the use of Base64 encoding to embed image data directly in HTML source code. Here it is necessary to encode the data to avoid characters like '<' and '>' being interpreted as tags.

Here is a working example.

I wish to send a text message with two lines:

    Hello
    world!

If I send it as ASCII (or UTF-8) it will look like this:

    72 101 108 108 111 10 119 111 114 108 100 33

The byte 10 is corrupted in some systems, so we can Base64 encode these bytes as a Base64 string:

    SGVsbG8Kd29ybGQh

which, when encoded using ASCII, looks like this:

    83 71 86 115 98 71 56 75 100 50 57 121 98 71 81 104

All the bytes here are known safe bytes, so there is very little chance that any system will corrupt this message. I can send this instead of my original message and let the receiver reverse the process to recover the original message.
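The working example can be reproduced step by step in Python (my choice of language for illustration; the variable names are hypothetical). It follows the pipeline described above: text to bytes with UTF-8, bytes to Base64 text, Base64 text to ASCII bytes, and back again.

```python
import base64

message = "Hello\nworld!"

# Step 1: text -> bytes, using a text encoding both sides agree on.
text_bytes = message.encode("utf-8")

# Step 2: bytes -> Base64 text.
b64_text = base64.b64encode(text_bytes).decode("ascii")

# Step 3: Base64 text -> the ASCII bytes that actually go on the wire.
wire_bytes = b64_text.encode("ascii")

print(list(text_bytes))  # [72, 101, 108, 108, 111, 10, 119, 111, 114, 108, 100, 33]
print(b64_text)          # SGVsbG8Kd29ybGQh
print(list(wire_bytes))  # [83, 71, 86, 115, 98, 71, 56, 75, 100, 50, 57, 121, 98, 71, 81, 104]

# Receiver: reverse both steps to recover the original message.
decoded = base64.b64decode(wire_bytes).decode("utf-8")
```

Note that byte 10 no longer appears anywhere in the transmitted sequence.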

User · Answer

It is more that the media validates the string encoding, so we want to ensure that the data is acceptable to a handling application (and doesn't contain a binary sequence representing EOL, for example).

Imagine you want to send binary data in an email with encoding UTF-8: the email may not display correctly if the stream of ones and zeros creates a sequence which isn't valid Unicode in UTF-8 encoding.

The same type of thing happens in URLs when we want to encode characters not valid for a URL in the URL itself:

    http://www.foo.com/hello my friend -> http://www.foo.com/hello%20my%20friend

This is because we want to send a space over a system that will think the space is smelly.

All we are doing is ensuring there is a 1-to-1 mapping between a known good, acceptable and non-detrimental sequence of bits and another literal sequence of bits, and that the handling application doesn't distinguish the encoding.

In your example, Man may be valid ASCII in its first form, but often you may want to transmit values that are random binary (e.g. sending an image in an email):

    MIME-Version: 1.0
    Content-Description: "Base64 encode of a.gif"
    Content-Type: image/gif; name="a.gif"
    Content-Transfer-Encoding: Base64
    Content-Disposition: attachment; filename="a.gif"

Here we see that a GIF image is encoded in base64 as a chunk of an email. The email client reads the headers and decodes it. Because of the encoding, we can be sure the GIF doesn't contain anything that may be interpreted as protocol, and we avoid inserting data that SMTP or POP may find significant.
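Headers like the ones above are normally produced by a mail library rather than by hand. As a sketch, Python's standard `email.message.EmailMessage` applies Base64 to binary attachments by default. The GIF bytes, subject line and body text below are made-up placeholders.

```python
from email.message import EmailMessage

# Stand-in bytes that merely begin with the GIF signature; not a real image.
gif_bytes = b"GIF89a" + bytes(range(256))

msg = EmailMessage()
msg["Subject"] = "A picture"               # hypothetical subject
msg.set_content("See the attached image.")
msg.add_attachment(gif_bytes, maintype="image", subtype="gif", filename="a.gif")

attachment = next(msg.iter_attachments())
print(attachment.get_content_type())              # image/gif
print(attachment["Content-Transfer-Encoding"])    # base64
print(attachment.get_filename())                  # a.gif

# The receiving side decodes the attachment back to the original bytes.
round_tripped = attachment.get_content()
```

Calling `msg.as_string()` shows the full message, with the attachment body as lines of Base64 text.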

User · Answer

Why/How do we use Base64 encoding?

Base64 is one of the binary-to-text encoding schemes, with 75% efficiency. It is used so that typical binary data (such as images) may be safely sent over legacy "not 8-bit clean" channels. In earlier email networks (until the early 1990s), most email messages were plain text in the 7-bit US-ASCII character set. So many early communication protocol standards were designed to work over "7-bit" links that were "not 8-bit clean". Scheme efficiency is the ratio between the number of bits in the input and the number of bits in the encoded output. Hexadecimal (Base16) is also a binary-to-text encoding scheme, with 50% efficiency.

Base64 Encoding Steps (Simplified)

1. Binary data is arranged in continuous chunks of 24 bits (3 bytes) each.
2. Each 24-bit chunk is split into four groups of 6 bits each.
3. Each 6-bit group is converted into its corresponding Base64 character value, i.e. Base64 encoding converts three octets into four encoded characters. The ratio of output bytes to input bytes is 4:3 (33% overhead).
4. Interestingly, the same characters will be encoded differently depending on their position within the three-octet group which is encoded to produce the four characters.
5. The receiver will have to reverse this process to recover the original message.
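The simplified steps can be sketched as a small hand-written encoder in Python. This is an illustrative version, not how any particular library implements it; the function name `encode_base64` is hypothetical. It also handles the `=` padding for inputs whose length is not a multiple of three, which the simplified steps leave out.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def encode_base64(data: bytes) -> str:
    """Encode bytes as Base64 text, following the chunking steps above."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        missing = 3 - len(chunk)
        # Step 1: pack up to three bytes into one 24-bit integer (zero-filled).
        block = int.from_bytes(chunk + b"\x00" * missing, "big")
        # Steps 2 and 3: split into four 6-bit groups and map each to a character.
        chars = [ALPHABET[(block >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # Characters that only cover fill bytes are replaced by '=' padding.
        if missing:
            chars[-missing:] = "=" * missing
        out.extend(chars)
    return "".join(out)


print(encode_base64(b"Man"))   # TWFu
print(encode_base64(b"Ma"))    # TWE=
print(encode_base64(b"M"))     # TQ==
print(encode_base64(b"xMan"))  # same letters, shifted by one byte: different output
```

Each output character carries 6 bits but occupies 8, which is where the 75% efficiency figure comes from; hexadecimal carries 4 bits per 8-bit character, hence 50%.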
