I have read in an XML email attachment with
bytes_string=part.get_payload(decode=False)
The payload comes in as a byte string, as my variable name suggests.
I am trying to use the recommended Python 3 approach to turn this string into a usable string that I can manipulate.
The example shows:
str(b'abc','utf-8')
How can I apply the b
(bytes) keyword argument to my variable bytes_string
and use the recommended approach?
The way I tried doesn't work:
str(bbytes_string, 'utf-8')
This question is related to
python-3.x
string
type-conversion
UPDATED:
TO NOT HAVE ANY
b
and quotes at first and endHow to convert
bytes
as seen to strings, even in weird situations.
As your code may have unrecognizable characters to 'utf-8'
encoding,
it's better to use just str without any additional parameters:
some_bad_bytes = b'\x02-\xdfI#)'
text = str( some_bad_bytes )[2:-1]
print(text)
Output: \x02-\xdfI
if you add 'utf-8'
parameter, to these specific bytes, you should receive error.
As PYTHON 3 standard says, text
would be in utf-8 now with no concern.
How to filter (skip) non-UTF8 charachers from array?
To address this comment in @uname01's post and the OP, ignore the errors:
Code
>>> b'\x80abc'.decode("utf-8", errors="ignore")
'abc'
Details
From the docs, here are more examples using the same errors
parameter:
>>> b'\x80abc'.decode("utf-8", "replace")
'\ufffdabc'
>>> b'\x80abc'.decode("utf-8", "backslashreplace")
'\\x80abc'
>>> b'\x80abc'.decode("utf-8", "strict")
Traceback (most recent call last):
...
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0:
invalid start byte
The errors argument specifies the response when the input string can’t be converted according to the encoding’s rules. Legal values for this argument are
'strict'
(raise aUnicodeDecodeError
exception),'replace'
(useU+FFFD
,REPLACEMENT CHARACTER
), or'ignore'
(just leave the character out of the Unicode result).
Call decode()
on a bytes
instance to get the text which it encodes.
str = bytes.decode()
Source: Stackoverflow.com