Convert bytes to a string

Question

I'm using this code to get standard output from an external program:

>>> from subprocess import *
>>> command_stdout = Popen(['ls', '-l'], stdout=PIPE).communicate()[0]

The communicate() method returns an array of bytes:

>>> command_stdout
b'total 0\n-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file1\n-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file2\n'

However, I'd like to work with the output as a normal Python string, so that I could print it like this:

>>> print(command_stdout)
-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file1
-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file2

I thought that's what the binascii.b2a_qp() method is for, but when I tried it, I got the same byte array again:

>>> binascii.b2a_qp(command_stdout)
b'total 0\n-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file1\n-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file2\n'

How do I convert the bytes value back to a string? I mean, using the "batteries" instead of doing it manually. And I'd like it to work with Python 3.

User · Answer

While @Aaron Maenpaa's answer just works, a user recently asked:

"Is there any simpler way? 'fhand.read().decode("ASCII")' [...] It's so long!"

You can use:

command_stdout.decode()

decode() has a standard argument:

codecs.decode(obj, encoding='utf-8', errors='strict')
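The errors argument in that signature decides what happens when the bytes are not valid in the chosen encoding. The following is a minimal sketch of the three most common handlers; the sample bytes are made up for illustration and are not from the question.

```python
# Valid UTF-8 ("café") followed by one stray byte, 0xff, that is never valid UTF-8.
raw = b"caf\xc3\xa9 \xff end"

# errors="strict" is the default: invalid input raises UnicodeDecodeError.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# errors="replace" substitutes U+FFFD (the replacement character).
replaced = raw.decode("utf-8", errors="replace")

# errors="ignore" silently drops the undecodable byte.
ignored = raw.decode("utf-8", errors="ignore")

print(strict_failed)
print(replaced)
print(ignored)
```

Both "replace" and "ignore" lose information, so they suit display purposes better than data that must be written back unchanged.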

User · Answer

You can just do this:

bstr = b'your string'
print(bstr.decode())

Output:

your string

User · Answer

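def toString(string):
    try:
        return string.decode("utf-8")
    except (AttributeError, ValueError):
        # In Python 3, str has no decode() (AttributeError);
        # undecodable bytes raise UnicodeDecodeError, a ValueError subclass.
        return string

b = b'97.080.500'
s = '97.080.500'
print(toString(b))
print(toString(s))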

User · Answer

When working with data from Windows systems (with \r\n line endings), my answer is

String = Bytes.decode("utf-8").replace("\r\n", "\n")

Why? Try this with a multiline Input.txt:

Bytes = open("Input.txt", "rb").read()
String = Bytes.decode("utf-8")
open("Output.txt", "w").write(String)

All your line endings will be doubled (to \r\r\n), leading to extra empty lines. Python's text-read functions usually normalize line endings so that strings use only \n. If you receive binary data from a Windows system, Python does not have a chance to do that. Thus,

Bytes = open("Input.txt", "rb").read()
String = Bytes.decode("utf-8").replace("\r\n", "\n")
open("Output.txt", "w").write(String)

will replicate your original file.
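An alternative to replacing "\r\n" by hand is to switch off newline translation when writing, using the newline="" argument of open(). This is a sketch of a different technique from the one in this answer, and the temporary file and sample bytes are assumptions made for the example.

```python
import os
import tempfile

# Bytes as they might arrive from a Windows system.
data = b"first\r\nsecond\r\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "Output.txt")
    text = data.decode("utf-8")  # still contains "\r\n"

    # newline="" writes the string as-is, so "\n" is not expanded again.
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(text)

    with open(path, "rb") as f:
        round_tripped = f.read()

# The approach from the answer: normalize to "\n" inside the string.
normalized = data.decode("utf-8").replace("\r\n", "\n")

print(round_tripped == data)
print(repr(normalized))
```

Use newline="" when the file must stay byte-identical, and the replace() approach when the string itself should contain only "\n".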

User · Answer

I think you actually want this:

>>> from subprocess import *
>>> command_stdout = Popen(['ls', '-l'], stdout=PIPE).communicate()[0]
>>> command_text = command_stdout.decode(encoding='windows-1252')

Aaron's answer was correct, except that you need to know which encoding to use. And I believe that Windows uses 'windows-1252'. It will only matter if you have some unusual (non-ASCII) characters in your content, but then it will make a difference.

By the way, the fact that it does matter is the reason that Python moved to using two different types for binary and text data: it can't convert magically between them, because it doesn't know the encoding unless you tell it! The only way YOU would know is to read the Windows documentation (or read it here).

User · Answer

Since this question is actually asking about subprocess output, you have more direct approaches available. The most modern would be using subprocess.check_output and passing text=True (Python 3.7+) to automatically decode stdout using the system default encoding:

text = subprocess.check_output(["ls", "-l"], text=True)

For Python 3.6, Popen accepts an encoding keyword:

>>> from subprocess import Popen, PIPE
>>> text = Popen(['ls', '-l'], stdout=PIPE, encoding='utf-8').communicate()[0]
>>> type(text)
str
>>> print(text)
total 0
-rw-r--r-- 1 wim badger 0 May 31 12:45 some_file.txt

The general answer to the question in the title, if you're not dealing with subprocess output, is to decode bytes to text:

>>> b'abcde'.decode()
'abcde'

With no argument, sys.getdefaultencoding() will be used. If your data is not encoded in sys.getdefaultencoding(), then you must specify the encoding explicitly in the decode call:

>>> b'caf\xe9'.decode('cp1250')
'café'

User · Answer

From "sys: System-specific parameters and functions":

To write or read binary data from/to the standard streams, use the underlying binary buffer. For example, to write bytes to stdout, use sys.stdout.buffer.write(b'abc').
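A small sketch of how the binary buffer sits underneath a text stream. The helper write_raw and the in-memory stand-in for sys.stdout are hypothetical names introduced only so the result can be inspected; with the real stream you would pass sys.stdout instead.

```python
import io

def write_raw(text_stream, data):
    """Write bytes to the binary buffer underneath a text stream."""
    text_stream.flush()              # keep ordering with earlier text writes
    text_stream.buffer.write(data)
    text_stream.buffer.flush()

# Stand-in for sys.stdout: a text wrapper around an in-memory byte sink.
sink = io.BytesIO()
fake_stdout = io.TextIOWrapper(sink, encoding="utf-8")

fake_stdout.write("text ")           # normal str write, encoded by the wrapper
write_raw(fake_stdout, b"\xc3\xa9\n")  # raw bytes, bypassing the encoder

captured = sink.getvalue()
print(captured)
```

Flushing the text layer first matters: without it, buffered text could reach the output after the raw bytes.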

User · Answer

In Python 3, the default encoding is "utf-8", so you can directly use:

b'hello'.decode()

which is equivalent to

b'hello'.decode(encoding="utf-8")

On the other hand, in Python 2, encoding defaults to the default string encoding. Thus, you should use:

b'hello'.decode(encoding)

where encoding is the encoding you want.

Note: support for keyword arguments was added in Python 2.7.

User · Answer

For Python 3, this is a much safer and Pythonic approach to convert from byte to string:

def byte_to_str(bytes_or_str):
    if isinstance(bytes_or_str, bytes):  # Check if it's in bytes
        print(bytes_or_str.decode('utf-8'))
    else:
        print("Object not of byte type")

byte_to_str(b'total 0\n-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file1\n-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file2\n')

Output:

total 0
-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file1
-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file2

User · Answer

Try this:

bytes.fromhex('c3a9').decode('utf-8')

User · Answer

If you should get the following by trying decode():

AttributeError: 'str' object has no attribute 'decode'

You can also specify the encoding type straight in a cast:

>>> my_byte_str
b'Hello World'

>>> str(my_byte_str, 'utf-8')
'Hello World'
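One pitfall with this form is worth illustrating: if the encoding argument is left out, str() does not decode at all and returns the printable representation of the bytes object instead. A minimal sketch, reusing the variable name from the example above:

```python
my_byte_str = b"Hello World"

# With an encoding, str() decodes the bytes.
with_encoding = str(my_byte_str, "utf-8")

# Without one, str() returns the repr, including the b'' wrapper.
without_encoding = str(my_byte_str)

print(with_encoding)
print(without_encoding)
```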

User · Answer

Set universal_newlines to True, i.e.

command_stdout = Popen(['ls', '-l'], stdout=PIPE, universal_newlines=True).communicate()[0]

User · Answer

I made a function to clean a list:

def cleanLists(self, lista):
    lista = [x.strip() for x in lista]
    lista = [x.replace('\n', '') for x in lista]
    lista = [x.replace('\b', '') for x in lista]
    lista = [x.encode('utf8') for x in lista]
    lista = [x.decode('utf8') for x in lista]

    return lista

User · Answer

I think this way is easy:

>>> bytes_data = [112, 52, 52]
>>> "".join(map(chr, bytes_data))
'p44'
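Note that mapping each byte through chr() treats every byte as one code point, which is the same as decoding with latin-1. It matches UTF-8 only for ASCII data. A short sketch with a made-up two-byte sample shows the difference:

```python
# The two bytes that encode "é" in UTF-8.
bytes_data = bytes([195, 169])

via_chr = "".join(map(chr, bytes_data))     # one character per byte
via_latin1 = bytes_data.decode("latin-1")   # same result as chr() mapping
via_utf8 = bytes_data.decode("utf-8")       # the intended single character

print(via_chr == via_latin1)
print(len(via_chr), len(via_utf8))
```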

User · Answer

For your specific case of "run a shell command and get its output as text instead of bytes", on Python 3.7, you should use subprocess.run and pass in text=True (as well as capture_output=True to capture the output):

command_result = subprocess.run(["ls", "-l"], capture_output=True, text=True)

command_result.stdout  # is a `str` containing your program's stdout

text used to be called universal_newlines, and was changed (well, aliased) in Python 3.7. If you want to support Python versions before 3.7, pass in universal_newlines=True instead of text=True.
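Here is a self-contained sketch of the same call that runs on any platform. As an assumption made for portability, it launches the current Python interpreter as the child process instead of ls, and adds check=True so a failing command raises instead of passing silently.

```python
import subprocess
import sys

# The child simply prints one line; sys.executable stands in for "ls".
result = subprocess.run(
    [sys.executable, "-c", "print('hello')"],
    capture_output=True,  # collect stdout and stderr (Python 3.7+)
    text=True,            # decode to str and normalize newlines
    check=True,           # raise CalledProcessError on a non-zero exit code
)

output = result.stdout
print(type(output).__name__)
print(output.strip())
```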

User · Answer

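If you want to convert any bytes, not just a string converted to bytes:

import base64
import json

with open("bytesfile", "rb") as infile:
    # b85encode returns ASCII-only bytes, so decoding them cannot fail
    b85_text = base64.b85encode(infile.read()).decode("ascii")

or

with open("bytesfile", "rb") as infile:
    json_text = json.dumps(list(infile.read()))

This is not very efficient, however. It will turn a 2 MB picture into 9 MB.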
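To make the size cost concrete, the sketch below encodes the same payload both ways and restores it. The 1024-byte payload is an arbitrary sample chosen for the example, not data from the answer.

```python
import base64
import json

# Arbitrary binary payload: every byte value, repeated four times (1024 bytes).
payload = bytes(range(256)) * 4

# Base85 maps every 4 bytes to 5 ASCII characters.
b85_text = base64.b85encode(payload).decode("ascii")

# A JSON list spells out each byte as a decimal number plus separators.
json_text = json.dumps(list(payload))

# Both representations can be turned back into the original bytes.
restored_b85 = base64.b85decode(b85_text)
restored_json = bytes(json.loads(json_text))

print(len(payload), len(b85_text), len(json_text))
```

The JSON form is several times larger than the input, which is consistent with the growth described above, while Base85 adds 25%.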

User · Answer

If you don't know the encoding, then to read binary input into a string in a Python 3 and Python 2 compatible way, use the ancient MS-DOS CP437 encoding:

PY3K = sys.version_info >= (3, 0)

lines = []
for line in stream:
    if not PY3K:
        lines.append(line)
    else:
        lines.append(line.decode('cp437'))

Because the encoding is unknown, expect non-English symbols to translate to characters of cp437 (English characters are not translated, because they match in most single byte encodings and UTF-8).

Decoding arbitrary binary input to UTF-8 is unsafe, because you may get this:

>>> b'\x00\x01\xffsd'.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 2: invalid start byte

The same applies to latin-1, which was popular (the default?) for Python 2. See the missing points in Codepage Layout - it is where Python chokes with infamous ordinal not in range.

UPDATE 20150604: There are rumors that Python 3 has the surrogateescape error strategy for encoding stuff into binary data without data loss and crashes, but it needs conversion tests, [binary] -> [str] -> [binary], to validate both performance and reliability.

UPDATE 20170116: Thanks to comment by Nearoo - there is also a possibility to slash escape all unknown bytes with backslashreplace error handler. That works only for Python 3, so even with this workaround you will still get inconsistent output from different Python versions:

PY3K = sys.version_info >= (3, 0)

lines = []
for line in stream:
    if not PY3K:
        lines.append(line)
    else:
        lines.append(line.decode('utf-8', 'backslashreplace'))

See Python's Unicode Support for details.

UPDATE 20170119: I decided to implement slash escaping decode that works for both Python 2 and Python 3. It should be slower than the cp437 solution, but it should produce identical results on every Python version.

# --- preparation

import codecs

def slashescape(err):
    """ codecs error handler. err is UnicodeDecode instance. return
    a tuple with a replacement for the unencodable part of the input
    and a position where encoding should continue"""
    #print err, dir(err), err.start, err.end, err.object[:err.start]
    thebyte = err.object[err.start:err.end]
    repl = u'\\x'+hex(ord(thebyte))[2:]
    return (repl, err.end)

codecs.register_error('slashescape', slashescape)

# --- processing

stream = [b'\x80abc']

lines = []
for line in stream:
    lines.append(line.decode('utf-8', 'slashescape'))
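The [binary] -> [str] -> [binary] conversion test mentioned for surrogateescape can be sketched in a few lines on Python 3. This checks only reliability, not performance, and the sample input (all 256 byte values) is an assumption chosen to include bytes that are invalid in UTF-8.

```python
# Every possible byte value; the upper half is not valid UTF-8 in this order.
raw = bytes(range(256))

# Undecodable bytes become lone surrogate code points instead of raising.
text = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler turns the surrogates back into the bytes.
back = text.encode("utf-8", errors="surrogateescape")

# backslashreplace, by contrast, is for display: it is not reversible.
escaped = b"\x80abc".decode("utf-8", errors="backslashreplace")

print(back == raw)
print(escaped)
```

Strings produced this way contain lone surrogates, so encoding them without the surrogateescape handler raises UnicodeEncodeError.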

User · Answer

You can just do:

print(command_stdout.decode('utf-8'))

User · Answer

You need to decode the bytes object to produce a string:

>>> b"abcde"
b'abcde'

# utf-8 is used here because it is a very common encoding, but you
# need to use the encoding your data is actually in.
>>> b"abcde".decode("utf-8")
'abcde'

User · Answer

You need to decode the byte string and turn it into a character (Unicode) string.

On Python 2

encoding = 'utf-8'
'hello'.decode(encoding)

or

unicode('hello', encoding)

On Python 3

encoding = 'utf-8'
b'hello'.decode(encoding)

or

str(b'hello', encoding)

User · Answer

To interpret a byte sequence as text, you have to know the corresponding character encoding:

unicode_text = bytestring.decode(character_encoding)

Example:

>>> b'\xc2\xb5'.decode('utf-8')
'µ'

The ls command may produce output that can't be interpreted as text. File names on Unix may be any sequence of bytes except slash b'/' and zero b'\0':

>>> open(bytes(range(0x100)).translate(None, b'\0/'), 'w').close()

Trying to decode such byte soup using utf-8 encoding raises UnicodeDecodeError.

It can be worse. The decoding may fail silently and produce mojibake if you use a wrong incompatible encoding:

>>> '\u2014'.encode('utf-8').decode('cp1252')
'â€”'

The data is corrupted but your program remains unaware that a failure has occurred.

In general, what character encoding to use is not embedded in the byte sequence itself. You have to communicate this info out-of-band. Some outcomes are more likely than others and therefore the chardet module exists that can guess the character encoding. A single Python script may use multiple character encodings in different places.

ls output can be converted to a Python string using the os.fsdecode() function, which succeeds even for undecodable filenames (it uses sys.getfilesystemencoding() and the surrogateescape error handler on Unix):

import os
import subprocess

output = os.fsdecode(subprocess.check_output('ls'))

To get the original bytes, you could use os.fsencode().

If you pass the universal_newlines=True parameter, then subprocess uses locale.getpreferredencoding(False) to decode bytes, e.g., it can be cp1252 on Windows.

To decode the byte stream on-the-fly, io.TextIOWrapper() could be used.

Different commands may use different character encodings for their output, e.g., the dir internal command (cmd) may use cp437. To decode its output, you could pass the encoding explicitly (Python 3.6+):

output = subprocess.check_output('dir', shell=True, encoding='cp437')

The filenames may differ from os.listdir() (which uses the Windows Unicode API), e.g., '\xb6' can be substituted with '\x14': Python's cp437 codec maps b'\x14' to the control character U+0014 instead of U+00B6 (¶). To support filenames with arbitrary Unicode characters, see "Decode PowerShell output possibly containing non-ASCII Unicode characters into a Python string".
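As a sketch of the on-the-fly approach with io.TextIOWrapper, the example below wraps a child process's stdout pipe and iterates over decoded lines as they arrive. The child command is an assumption made for portability: it is the current Python interpreter writing one UTF-8 line, standing in for ls or dir.

```python
import io
import subprocess
import sys

# Child process: writes the UTF-8 bytes for "café\n" straight to its stdout.
child_code = "import sys; sys.stdout.buffer.write('caf\\xe9\\n'.encode('utf-8'))"

with subprocess.Popen(
    [sys.executable, "-c", child_code], stdout=subprocess.PIPE
) as proc:
    # Wrap the binary pipe; each iteration yields one decoded line.
    reader = io.TextIOWrapper(proc.stdout, encoding="utf-8")
    lines = [line for line in reader]

print(lines)
```

The encoding passed to TextIOWrapper has to match what the child actually emits; for a real command that is the out-of-band information discussed above.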
