Best way to convert string to bytes in Python 3?

Question

There appear to be two different ways to convert a string to bytes, as seen in the answers to "TypeError: 'str' does not support the buffer interface". Which of these methods would be better or more Pythonic? Or is it just a matter of personal preference?

    b = bytes(mystring, 'utf-8')

    b = mystring.encode('utf-8')

User · Accepted Answer

If you look at the docs for bytes, it points you to bytearray:

"bytearray([source[, encoding[, errors]]])

Return a new array of bytes. The bytearray type is a mutable sequence of integers in the range 0 <= x < 256. It has most of the usual methods of mutable sequences, described in Mutable Sequence Types, as well as most methods that the bytes type has, see Bytes and Byte Array Methods.

The optional source parameter can be used to initialize the array in a few different ways:

- If it is a string, you must also give the encoding (and optionally, errors) parameters; bytearray() then converts the string to bytes using str.encode().
- If it is an integer, the array will have that size and will be initialized with null bytes.
- If it is an object conforming to the buffer interface, a read-only buffer of the object will be used to initialize the bytes array.
- If it is an iterable, it must be an iterable of integers in the range 0 <= x < 256, which are used as the initial contents of the array.

Without an argument, an array of size 0 is created."

So bytes can do much more than just encode a string. It's Pythonic that it would allow you to call the constructor with any type of source parameter that makes sense.

For encoding a string, I think that some_string.encode(encoding) is more Pythonic than using the constructor, because it is the most self-documenting: "take this string and encode it with this encoding" is clearer than bytes(some_string, encoding), where there is no explicit verb.

Edit: I checked the Python source. If you pass a unicode string to bytes using CPython, it calls PyUnicode_AsEncodedString, which is the implementation of encode; so you're just skipping a level of indirection if you call encode yourself.

Also, see Serdalis' comment: unicode_string.encode(encoding) is also more Pythonic because its inverse is byte_string.decode(encoding), and symmetry is nice.
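To make the comparison concrete, here is a minimal sketch of the two spellings and of the other source types the constructor accepts. The sample text and variable names are assumptions chosen for illustration, not part of the answer above.

```python
# Assumed sample text; any str would do.
some_string = "héllo wörld"

# Both spellings produce the same bytes object.
via_constructor = bytes(some_string, "utf-8")
via_method = some_string.encode("utf-8")
print(via_constructor == via_method)

# decode() is the inverse of encode().
round_tripped = via_method.decode("utf-8")
print(round_tripped == some_string)

# The constructor also accepts the other kinds of source listed in the docs.
zero_filled = bytes(3)                 # integer: that many null bytes
from_iterable = bytes([104, 105])      # iterable of ints in range(256)
from_buffer = bytes(bytearray(b"hi"))  # object supporting the buffer protocol
empty = bytes()                        # no argument: zero-length result
print(zero_filled, from_iterable, from_buffer, empty)
```

Only the first form involves an encoding, which is arguably why the method call reads more clearly when the input is known to be a string.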

User · Answer

Answer for a slightly different problem:

You have a sequence of raw unicode that was saved into a str variable:

    s_str: str = "\x00\x01\x00\xc0\x01\x00\x00\x00\x04"

You need to be able to get the byte literal of that unicode (for struct.unpack(), etc.):

    s_bytes: bytes = b'\x00\x01\x00\xc0\x01\x00\x00\x00\x04'

Solution:

    s_new: bytes = bytes(s_str, encoding="raw_unicode_escape")

Reference (scroll up for standard encodings): Python Specific Encodings
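As a rough check of the approach above, the sketch below converts the string and feeds the result to struct.unpack(). The format string "<HHIB" is a hypothetical layout picked only because it adds up to 9 bytes; the real layout depends on where the data came from.

```python
import struct

s_str: str = "\x00\x01\x00\xc0\x01\x00\x00\x00\x04"

# Each code point below 0x100 becomes the single byte with the same value.
s_new: bytes = bytes(s_str, encoding="raw_unicode_escape")
print(s_new == b"\x00\x01\x00\xc0\x01\x00\x00\x00\x04")

# Hypothetical layout: little-endian, two uint16, one uint32, one uint8.
fields = struct.unpack("<HHIB", s_new)
print(fields)

# Caveat: code points above 0xFF are written as literal \uXXXX escapes,
# so this trick only recovers raw bytes when every character is below 0x100.
escaped = "\u20ac".encode("raw_unicode_escape")
print(escaped)
```

If the string is known to contain only code points below 0x100, encoding with "latin-1" yields the same bytes and raises an error instead of emitting escapes when that assumption is violated.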

User · Answer

It's easier than it is thought:

    my_str = "hello world"
    my_str_as_bytes = str.encode(my_str)
    type(my_str_as_bytes)  # ensure it is byte representation
    my_decoded_str = my_str_as_bytes.decode()
    type(my_decoded_str)  # ensure it is string representation
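Note that str.encode(my_str) is the same method as my_str.encode(), just looked up on the class rather than on the instance. A small sketch, using an assumed sample string with one non-ASCII character so that the character count and byte count differ:

```python
my_str = "hello wörld"  # assumed sample; "ö" takes two bytes in UTF-8

unbound = str.encode(my_str)  # method looked up on the class
bound = my_str.encode()       # same method, called on the instance
print(unbound == bound)

# isinstance() checks the types without relying on the REPL echo.
print(isinstance(bound, bytes))
print(isinstance(bound.decode(), str))

# Characters and bytes are not the same count once non-ASCII text appears.
print(len(my_str), len(bound))
```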

User · Answer

The absolutely best way is neither of the 2, but the 3rd. The first parameter to encode defaults to 'utf-8' ever since Python 3.0. Thus the best way is

    b = mystring.encode()

This will also be faster, because the default argument results not in the string "utf-8" in the C code, but NULL, which is much faster to check!

Here be some timings:

    In [1]: %timeit -r 10 'abc'.encode('utf-8')
    The slowest run took 38.07 times longer than the fastest. This could mean that an intermediate result is being cached.
    10000000 loops, best of 10: 183 ns per loop

    In [2]: %timeit -r 10 'abc'.encode()
    The slowest run took 27.34 times longer than the fastest. This could mean that an intermediate result is being cached.
    10000000 loops, best of 10: 137 ns per loop

Despite the warning, the times were very stable after repeated runs; the deviation was just about 2 per cent.

Using encode() without an argument is not Python 2 compatible, as in Python 2 the default character encoding is ASCII. There, calling encode() with no argument on a string literal that contains non-ASCII characters fails:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
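The timings above came from IPython's %timeit. A rough equivalent with the standard timeit module is sketched below for anyone who wants to repeat the measurement outside IPython; the loop and repeat counts are arbitrary assumptions, and the absolute numbers will vary by machine and Python version.

```python
import timeit

mystring = "abc"

# In Python 3 the default encoding argument is UTF-8, so the results match.
assert mystring.encode() == mystring.encode("utf-8")

number = 200_000  # calls per measurement (arbitrary choice)
explicit = min(timeit.repeat("s.encode('utf-8')", globals={"s": mystring},
                             number=number, repeat=5))
default = min(timeit.repeat("s.encode()", globals={"s": mystring},
                            number=number, repeat=5))

# Report the best run of each variant in nanoseconds per call.
print(f"encode('utf-8'): {explicit / number * 1e9:.0f} ns per call")
print(f"encode():        {default / number * 1e9:.0f} ns per call")
```

Taking the minimum over several repeats is the usual way to reduce noise from other processes; any difference measured this way is a per-call cost in the nanosecond range, so it only matters in tight loops.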
