base64 encoding takes 8-bit binary byte data and encodes it uses only the characters A-Z
, a-z
, 0-9
, +
, /
* so it can be transmitted over channels that do not preserve all 8-bits of data, such as email.
Hence, it wants a string of 8-bit bytes. You create those in Python 3 with the b''
syntax.
If you remove the b
, it becomes a string. A string is a sequence of Unicode characters. base64 has no idea what to do with Unicode data, it's not 8-bit. It's not really any bits, in fact. :-)
In your second example:
>>> encoded = base64.b64encode('data to be encoded')
All the characters fit neatly into the ASCII character set, and base64 encoding is therefore actually a bit pointless. You can convert it to ascii instead, with
>>> encoded = 'data to be encoded'.encode('ascii')
Or simpler:
>>> encoded = b'data to be encoded'
Which would be the same thing in this case.
* Most base64 flavours may also include a =
at the end as padding. In addition, some base64 variants may use characters other than +
and /
. See the Variants summary table at Wikipedia for an overview.