How do I get rid of the b-prefix in a string in python?

Question

A bunch of the tweets I am importing are having this issue where they read

    b'I posted a new photo to Facebook'

I gather the b indicates it is a byte. But this is proving problematic because in the CSV files that I end up writing, the b doesn't go away and is interfering with future code.

Is there a simple way to remove this b prefix from my lines of text?

Keep in mind, I seem to need to have the text encoded in utf-8 or tweepy has trouble pulling them from the web.

Here's the link content I'm analyzing:

https://www.dropbox.com/s/sjmsbuhrghj7abt/new_tweets.txt?dl=0

    new_tweets = 'content in the link'

Code Attempt

    outtweets = [[tweet.text.encode("utf-8").decode("utf-8")] for tweet in new_tweets]
    print(outtweets)

Error

    UnicodeEncodeError                        Traceback (most recent call last)
    <ipython-input-21-6019064596bf> in <module>()
          1 for screen_name in user_list:
    ----> 2     get_all_tweets(screen_name,"instance file")

    <ipython-input-19-e473b4771186> in get_all_tweets(screen_name, mode)
         99             with open(os.path.join(save_location,'%s.instance' % screen_name), 'w') as f:
        100                 writer = csv.writer(f)
    --> 101                 writer.writerows(outtweets)
        102         else:
        103             with open(os.path.join(save_location,'%s.csv' % screen_name), 'w') as f:

    C:\Users\Stan Shunpike\Anaconda3\lib\encodings\cp1252.py in encode(self, input, final)
         17 class IncrementalEncoder(codecs.IncrementalEncoder):
         18     def encode(self, input, final=False):
    ---> 19         return codecs.charmap_encode(input,self.errors,encoding_table)[0]
         20
         21 class IncrementalDecoder(codecs.IncrementalDecoder):

    UnicodeEncodeError: 'charmap' codec can't encode characters in position 64-65: character maps to <undefined>

User · Answer

Assuming you don't want to immediately decode it again like others are suggesting here, you can convert it to a string and then just strip the leading b' and the trailing '. Note that any non-ASCII bytes stay in the result as literal escape sequences.

    >>> x = "Hi there \ufffd"
    >>> x = "Hi there \ufffd".encode("utf-8")
    >>> x
    b'Hi there \xef\xbf\xbd'
    >>> str(x)[2:-1]
    'Hi there \\xef\\xbf\\xbd'
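To make the difference between slicing the repr and decoding concrete, here is a minimal sketch. The sample string is made up for illustration and is not from the question.

```python
data = "caf\u00e9".encode("utf-8")   # the UTF-8 bytes b'caf\xc3\xa9'

# Slicing works on the repr of the bytes object, so escape
# sequences stay in the result as literal backslash text.
sliced = str(data)[2:-1]

# Decoding interprets the bytes as UTF-8 and yields the real characters.
decoded = data.decode("utf-8")

print(sliced)                    # caf\xc3\xa9
print(len(sliced), len(decoded)) # 11 4
```

Slicing is only lossless for pure ASCII content; for anything else, decoding is the safer route.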

User · Answer

You need to decode the bytes if you want a string:

    b = b'1234'
    print(b.decode('utf-8'))  # 1234
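If the data may arrive as either bytes or str, a small helper can normalise it before it is written anywhere. This is only a sketch: `to_text` is a hypothetical name, and replacing undecodable bytes (rather than raising) is an assumption about what is acceptable for the data.

```python
def to_text(value, encoding="utf-8"):
    """Return value as str, decoding it first if it is bytes."""
    if isinstance(value, (bytes, bytearray)):
        # errors="replace" swaps undecodable bytes for U+FFFD instead of raising
        return value.decode(encoding, errors="replace")
    return str(value)

print(to_text(b"1234"))  # 1234
print(to_text("1234"))   # 1234
```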

User · Answer

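I got it done by only encoding the output using utf-8. Here is the code example:

    import csv

    # `api`, `user`, and `file_name` are defined elsewhere
    new_tweets = api.GetUserTimeline(screen_name=user, count=200)
    result = new_tweets[0]
    try:
        text = result.text
    except AttributeError:
        text = ''

    with open(file_name, 'a', encoding='utf-8', newline='') as f:
        writer = csv.writer(f)
        # one row with one column; writerows(text) would write one row per character
        writer.writerow([text])

i.e. do not encode when collecting data from the API; encode the output (print or write) only.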
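The reason the b ends up in the CSV at all is that `csv.writer` stringifies non-string values with `str()`, and `str()` of a bytes object is its repr. A minimal sketch of both cases, using in-memory buffers and made-up sample rows so it runs without any files:

```python
import csv
import io

tweets = ["I posted a new photo to Facebook", "caf\u00e9 time"]  # made-up sample data

# Writing bytes: the repr, including the b prefix, lands in the output.
bad = io.StringIO()
csv.writer(bad).writerows([[t.encode("utf-8")] for t in tweets])

# Writing str: the text is stored as-is; encoding is the file object's job.
good = io.StringIO()
csv.writer(good).writerows([[t] for t in tweets])

print(bad.getvalue().splitlines()[0])   # b'I posted a new photo to Facebook'
print(good.getvalue().splitlines()[0])  # I posted a new photo to Facebook
```

With a real file, opening it with `open(path, "w", encoding="utf-8", newline="")` should also avoid the cp1252 `UnicodeEncodeError` from the question, since the platform default encoding is no longer used.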

User · Answer

How to remove the b'' characters when decoding a base64 string in Python:

    import base64

    a = 'cm9vdA=='
    b = base64.b64decode(a).decode('utf-8')
    print(b)

User · Answer

On Python 3.6 with Django 2.0, decode on a byte literal does not work as expected. Yes, I get the right result when I print it, but the b'value' is still there in the rendered output.

This is what I'm encoding:

    uid = urlsafe_base64_encode(force_bytes(user.pk))

This is what I'm decoding:

    uid = force_text(urlsafe_base64_decode(uidb64))

This is what Django 2.0 says:

    urlsafe_base64_encode(s)
        Encodes a bytestring in base64 for use in URLs, stripping any trailing equal signs.

    urlsafe_base64_decode(s)
        Decodes a base64 encoded string, adding back any trailing equal signs that might have been stripped.

This is my account_activation_email_test.html file:

    {% autoescape off %}
    Hi {{ user.username }},

    Please click on the link below to confirm your registration:

    http://{{ domain }}{% url 'accounts:activate' uidb64=uid token=token %}
    {% endautoescape %}

This is my console response:

    Content-Type: text/plain; charset="utf-8"
    MIME-Version: 1.0
    Content-Transfer-Encoding: 7bit
    Subject: Activate Your MySite Account
    From: webmaster@localhost
    To: testuser@yahoo.com
    Date: Fri, 20 Apr 2018 06:26:46 -0000
    Message-ID: <152420560682.16725.4597194169307598579@Dash-U>

    Hi testuser,

    Please click on the link below to confirm your registration:

    http://127.0.0.1:8000/activate/b'MjU'/4vi-fasdtRf2db2989413ba/

As you can see, uid = b'MjU', while the expected uid = MjU.

Test in console:

    $ python
    Python 3.6.4 (default, Apr  7 2018, 00:45:33)
    [GCC 5.4.0 20160609] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from django.utils.http import urlsafe_base64_encode, urlsafe_base64_decode
    >>> from django.utils.encoding import force_bytes, force_text
    >>> var1 = urlsafe_base64_encode(force_bytes(3))
    >>> print(var1)
    b'Mw'
    >>> print(var1.decode())
    Mw
    >>>

After investigating, it seems to be related to Python 3. My workaround was quite simple:

    uid = user.pk

I receive it as uidb64 in my activate function:

    user = User.objects.get(pk=uidb64)

and voila:

    Content-Transfer-Encoding: 7bit
    Subject: Activate Your MySite Account
    From: webmaster@localhost
    To: testuser@yahoo.com
    Date: Fri, 20 Apr 2018 20:44:46 -0000
    Message-ID: <152425708646.11228.13738465662759110946@Dash-U>

    Hi testuser,

    Please click on the link below to confirm your registration:

    http://127.0.0.1:8000/activate/45/4vi-3895fbb6b74016ad1882/

Now it works fine.
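The b'MjU' in the link is what appears when a bytes value is rendered into text without being decoded first. The sketch below reproduces the effect with the standard library only. `encode_uid` and `decode_uid` are hypothetical stand-ins, not Django's functions; they assume, as the console session above shows, that the encoding step returns bytes.

```python
import base64

def encode_uid(pk):
    """Hypothetical helper: URL-safe base64 of the primary key, returned as str."""
    raw = base64.urlsafe_b64encode(str(pk).encode("utf-8"))  # bytes, e.g. b'MjU='
    return raw.rstrip(b"=").decode("ascii")                  # decode before templating

def decode_uid(uidb64):
    """Hypothetical inverse: restore the stripped padding, then decode to str."""
    padding = "=" * (-len(uidb64) % 4)
    return base64.urlsafe_b64decode(uidb64 + padding).decode("utf-8")

raw = base64.urlsafe_b64encode(b"25").rstrip(b"=")  # still bytes
print(f"/activate/{raw}/")             # /activate/b'MjU'/
print(f"/activate/{encode_uid(25)}/")  # /activate/MjU/
print(decode_uid("MjU"))               # 25
```

With Django itself, the equivalent would presumably be calling `.decode()` on the encoded value before passing it to the template context, which keeps the encoded uid in the URL instead of exposing the raw primary key.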

User · Answer

You need to decode it to convert it to a string. Check the answer here about bytes literals in Python 3.

    In [1]: b'I posted a new photo to Facebook'.decode('utf-8')
    Out[1]: 'I posted a new photo to Facebook'

User · Answer

It is just letting you know that the object you are printing is not a string but a bytes object, shown as a bytes literal. People explain this in incomplete ways, so here is my take.

Consider creating a bytes object by typing a bytes literal (that is, defining the object directly in source code, e.g. by typing b'') and converting it into a string object by decoding it as UTF-8. Note that converting here means decoding.

    byte_object = b"test"  # bytes object created by literally typing characters
    print(byte_object)  # Prints b'test'
    print(byte_object.decode('utf8'))  # Prints test, without quotation marks

You see that we simply call the .decode('utf8') method.

Bytes in Python: https://docs.python.org/3.3/library/stdtypes.html#bytes

String literals are described by the following lexical definitions (https://docs.python.org/3.3/reference/lexical_analysis.html#string-and-bytes-literals):

    stringliteral   ::=  [stringprefix](shortstring | longstring)
    stringprefix    ::=  "r" | "u" | "R" | "U"
    shortstring     ::=  "'" shortstringitem* "'" | '"' shortstringitem* '"'
    longstring      ::=  "'''" longstringitem* "'''" | '"""' longstringitem* '"""'
    shortstringitem ::=  shortstringchar | stringescapeseq
    longstringitem  ::=  longstringchar | stringescapeseq
    shortstringchar ::=  <any source character except "\" or newline or the quote>
    longstringchar  ::=  <any source character except "\">
    stringescapeseq ::=  "\" <any source character>

    bytesliteral   ::=  bytesprefix(shortbytes | longbytes)
    bytesprefix    ::=  "b" | "B" | "br" | "Br" | "bR" | "BR" | "rb" | "rB" | "Rb" | "RB"
    shortbytes     ::=  "'" shortbytesitem* "'" | '"' shortbytesitem* '"'
    longbytes      ::=  "'''" longbytesitem* "'''" | '"""' longbytesitem* '"""'
    shortbytesitem ::=  shortbyteschar | bytesescapeseq
    longbytesitem  ::=  longbyteschar | bytesescapeseq
    shortbyteschar ::=  <any ASCII character except "\" or newline or the quote>
    longbyteschar  ::=  <any ASCII character except "\">
    bytesescapeseq ::=  "\" <any ASCII character>
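As a quick illustration of the distinction described above, here is a short sketch comparing a bytes literal with a str literal. The variable names are arbitrary.

```python
b1 = b"test"   # bytes literal
s1 = "test"    # str literal

print(type(b1).__name__, type(s1).__name__)  # bytes str
print(repr(b1))                              # b'test'
print(b1 == s1)                              # False: bytes and str never compare equal
print(b1.decode("utf8") == s1)               # True
print(str(b1, "utf8") == s1)                 # True: str(bytes, encoding) also decodes
```

The b prefix is therefore part of how the bytes object is displayed, not part of its content.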

User · Answer

Although the question is very old, I think this may be helpful to anyone facing the same problem. Here the text is a string like the one below:

    text = "b'I posted a new photo to Facebook'"

So you cannot remove the b by decoding it, because it is not a bytes object; the b and the quotes are ordinary characters inside the string. I did the following to remove them:

    cleaned_text = text.split("b'", 1)[1].rstrip("'")

which will give

    I posted a new photo to Facebook
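When the string really is the repr of a bytes object, an alternative to splitting is to parse it back into bytes and decode it. The following is a sketch under that assumption; `ast.literal_eval` raises `ValueError` or `SyntaxError` if the text is not a valid Python literal.

```python
import ast

text = "b'I posted a new photo to Facebook'"  # a str that contains a bytes repr

value = ast.literal_eval(text)        # parses the literal back into a bytes object
cleaned_text = value.decode("utf-8")  # then decode as usual

print(cleaned_text)  # I posted a new photo to Facebook
```

Unlike string splitting, this also turns escape sequences such as \xc3\xa9 back into the characters they encode.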
