Saving utf-8 texts with json.dumps as UTF8, not as \u escape sequence

Question

Sample code:

    >>> import json
    >>> json_string = json.dumps("ברי צקלה")
    >>> print(json_string)
    "\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4"

The problem: it's not human readable. My (smart) users want to verify or even edit text files with JSON dumps (and I'd rather not use XML).

Is there a way to serialize objects into UTF-8 JSON strings (instead of \uXXXX)?

User · Answer

Thanks for the original answer here. With Python 3 the following line of code:

    print(json.dumps(result_dict, ensure_ascii=False))

was OK. Consider trying not to write too much text in the code if it's not imperative.

This might be good enough for the Python console. However, to satisfy a server you might need to set the locale as explained here (if it is on apache2): http://blog.dscpl.com.au/2014/09/setting-lang-and-lcall-when-using.html

Basically, install he_IL or whatever language locale on Ubuntu:

    # check that it is not installed
    locale -a
    # install it, where XX is your language
    sudo apt-get install language-pack-XX

For example:

    sudo apt-get install language-pack-he

Add the following text to /etc/apache2/envvars:

    export LANG='he_IL.UTF-8'
    export LC_ALL='he_IL.UTF-8'

Then you would hopefully not get Python errors from Apache like:

    print(js)
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 41-45: ordinal not in range(128)

Also, in Apache try to make UTF-8 the default encoding, as explained here: How to change the default encoding to UTF-8 for Apache?

Do it early, because Apache errors can be a pain to debug, and you can mistakenly think the error comes from Python, which possibly isn't the case in that situation.
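
To see why the locale matters, here is a minimal Python 3 sketch. `json.dumps(..., ensure_ascii=False)` returns a `str`, and the *output stream* is what encodes it. Under an ASCII-only locale, as in the Apache traceback, that encoding step fails. Writing UTF-8 bytes to the binary layer avoids the locale entirely. `emit_json` is a hypothetical helper name, and `io.BytesIO` stands in for `sys.stdout.buffer` so the sketch can be checked.

```python
import io
import json
import sys


def emit_json(obj, binary_stream=None):
    """Write obj as UTF-8 encoded JSON plus a newline; return the JSON bytes.

    Hypothetical helper: defaults to sys.stdout.buffer, the binary layer
    beneath sys.stdout, so the locale's text encoding is never consulted.
    """
    if binary_stream is None:
        binary_stream = sys.stdout.buffer
    payload = json.dumps(obj, ensure_ascii=False).encode("utf-8")
    binary_stream.write(payload + b"\n")
    return payload


text = json.dumps({"name": "ברי צקלה"}, ensure_ascii=False)

# An ASCII-only text stream has to do this conversion, and it fails
# with the same error class as the Apache traceback.
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    print("an ASCII-only stream would fail:", exc.reason)

# Writing explicit UTF-8 bytes does not depend on LANG/LC_ALL.
buf = io.BytesIO()  # stand-in for sys.stdout.buffer
emit_json({"name": "ברי צקלה"}, buf)
```

On Python 3.7+, `sys.stdout.reconfigure(encoding="utf-8")` or the `PYTHONIOENCODING` environment variable are alternatives. Setting the locale as described above is still the system-wide fix.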

User · Answer

Using ensure_ascii=False in json.dumps is the right direction to solve this problem, as pointed out by Martijn. However, this may raise an exception:

    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe7 in position 1: ordinal not in range(128)

You need extra settings in either site.py or sitecustomize.py to set your sys.getdefaultencoding() correctly. site.py is under lib/python2.7/ and sitecustomize.py is under lib/python2.7/site-packages.

If you want to use site.py, under def setencoding(): change the first if 0: to if 1: so that Python will use your operating system's locale.

If you prefer to use sitecustomize.py, which may not exist if you haven't created it, simply put these lines in it:

    import sys
    reload(sys)
    sys.setdefaultencoding('utf-8')

Then you can do some Chinese JSON output in UTF-8 format, such as:

    name = {"last_name": u"<Chinese text>"}
    json.dumps(name, ensure_ascii=False)

You will get a UTF-8 encoded string, rather than a \u-escaped JSON string.

To verify your default encoding:

    print sys.getdefaultencoding()

You should get "utf-8" or "UTF-8", which confirms your site.py or sitecustomize.py settings.

Please note that you cannot call sys.setdefaultencoding('utf-8') at the interactive Python console.
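
For comparison, here is a minimal Python 3 sketch, where none of the `site.py`/`sitecustomize.py` changes apply. `sys.setdefaultencoding` does not exist there, and `sys.getdefaultencoding()` is always `'utf-8'`. The usual approach is to decode byte strings explicitly before they reach `json.dumps`. The `"中文"` sample and the variable names are illustrative assumptions.

```python
import json
import sys

# Python 3: the default encoding is fixed and cannot be changed at runtime.
assert sys.getdefaultencoding() == "utf-8"
assert not hasattr(sys, "setdefaultencoding")

# Bytes arriving from a file, socket, or database (illustrative sample).
raw_name = "中文".encode("utf-8")

# json.dumps refuses bytes outright instead of guessing an encoding...
try:
    json.dumps({"last_name": raw_name})
except TypeError as exc:
    print("TypeError:", exc)

# ...so decode explicitly at the boundary, then serialize without escaping.
name = {"last_name": raw_name.decode("utf-8")}
json_text = json.dumps(name, ensure_ascii=False)  # a str with literal characters
json_bytes = json_text.encode("utf-8")            # UTF-8 bytes for I/O
```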

User · Answer

Martijn Pieters' Python 2 workaround fails on an edge case:

    d = {u'keyword': u'bad credit \xe7redit cards'}
    with io.open('filename', 'w', encoding='utf8') as json_file:
        data = json.dumps(d, ensure_ascii=False).decode('utf8')
        try:
            json_file.write(data)
        except TypeError:
            # Decode data to Unicode first
            json_file.write(data.decode('utf8'))

    UnicodeEncodeError: 'ascii' codec can't encode character u'\xe7' in position 25: ordinal not in range(128)

It was crashing on the .decode('utf8') part of line 3. I fixed the problem by making the program much simpler, avoiding that step as well as the special-casing of ASCII:

    with io.open('filename', 'w', encoding='utf8') as json_file:
        data = json.dumps(d, ensure_ascii=False, encoding='utf8')
        json_file.write(unicode(data))

    $ cat filename
    {"keyword": "bad credit çredit cards"}
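
In Python 3, the str/unicode split behind this edge case is gone. `json.dumps(..., ensure_ascii=False)` always returns `str`, so no `unicode(...)` or `.decode(...)` step is needed. The minimal sketch below uses an in-memory `io.TextIOWrapper` over `io.BytesIO` as a stand-in for `io.open('filename', 'w', encoding='utf8')`. It checks that `ç` reaches the underlying bytes as UTF-8.

```python
import io
import json

d = {"keyword": "bad credit \xe7redit cards"}  # \xe7 is "ç"

# Stand-in for io.open('filename', 'w', encoding='utf8'): a text layer that
# encodes to UTF-8 on top of an in-memory byte buffer.
raw = io.BytesIO()
json_file = io.TextIOWrapper(raw, encoding="utf8")

data = json.dumps(d, ensure_ascii=False)
assert isinstance(data, str)  # never bytes in Python 3, so no decode step
json_file.write(data)
json_file.flush()  # push the text layer's buffer down into `raw`

on_disk = raw.getvalue()  # the bytes a real file would contain
```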

User · Answer

Here's my solution using json.dump():

    def jsonWrite(p, pyobj, ensure_ascii=False, encoding=SYSTEM_ENCODING, **kwargs):
        with codecs.open(p, 'wb', 'utf_8') as fileobj:
            json.dump(pyobj, fileobj, ensure_ascii=ensure_ascii, encoding=encoding, **kwargs)

where SYSTEM_ENCODING is set to:

    locale.setlocale(locale.LC_ALL, '')
    SYSTEM_ENCODING = locale.getlocale()[1]
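
On Python 3, `json.dump` no longer accepts an `encoding` argument, and the built-in `open` already encodes text. A port of this helper can therefore drop both `codecs` and the locale lookup. Here is a minimal sketch. `json_write` and `json_read` are hypothetical names, and UTF-8 is hard-coded on the assumption that the file should be portable regardless of the system locale.

```python
import json
import os
import tempfile


def json_write(path, pyobj, ensure_ascii=False, **kwargs):
    """Serialize pyobj to path as UTF-8 JSON; extra kwargs go to json.dump."""
    with open(path, "w", encoding="utf-8") as fileobj:
        json.dump(pyobj, fileobj, ensure_ascii=ensure_ascii, **kwargs)


def json_read(path):
    """Read back a UTF-8 JSON file written by json_write."""
    with open(path, encoding="utf-8") as fileobj:
        return json.load(fileobj)


# Usage: round-trip through a file in a temporary directory.
tmpdir = tempfile.mkdtemp()
target = os.path.join(tmpdir, "data.json")
json_write(target, {"name": "ברי צקלה"}, indent=2)
roundtrip = json_read(target)
```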

User · Answer

Use unicode-escape to solve the problem:

    >>> import json
    >>> json_string = json.dumps("ברי צקלה")
    >>> json_string.encode('ascii').decode('unicode-escape')
    '"ברי צקלה"'

Explanation:

    >>> s = '漢  χαν  хан'
    >>> print('unicode: ' + s.encode('unicode-escape').decode('utf-8'))
    unicode: \u6f22  \u03c7\u03b1\u03bd  \u0445\u0430\u043d

    >>> u = s.encode('unicode-escape').decode('utf-8')
    >>> print('original: ' + u.encode('utf-8').decode('unicode-escape'))
    original: 漢  χαν  хан

Original resource: https://blog.csdn.net/chuatony/article/details/72628868
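
To turn an already-escaped JSON string into a readable one, you can also parse it and re-serialize it with `ensure_ascii=False`. That way the JSON decoder interprets the escapes, rather than a codec that knows nothing about JSON syntax. Here is a minimal Python 3 sketch; `unescape_json` is a hypothetical helper name.

```python
import json


def unescape_json(json_text, **dump_kwargs):
    """Re-serialize JSON text so non-ASCII characters appear literally.

    Parsing first lets the JSON decoder, not the unicode-escape codec,
    interpret every escape sequence in the input.
    """
    return json.dumps(json.loads(json_text), ensure_ascii=False, **dump_kwargs)


escaped = json.dumps("ברי צקלה")   # ASCII-only, with \uXXXX escapes
readable = unescape_json(escaped)  # the same JSON value, letters kept literal
```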

User · Answer

As of Python 3.7 the following code works fine:

    from json import dumps
    result = {"symbol": "<non-ASCII symbol>"}
    json_string = dumps(result, sort_keys=True, indent=2, ensure_ascii=False)
    print(json_string)

Output:

    {
      "symbol": "<non-ASCII symbol>"
    }
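
Note that `ensure_ascii=False` only stops the escaping of non-ASCII characters. Characters that JSON requires to be escaped (quotes, backslashes, control characters) are still escaped, so the output stays valid JSON. Here is a minimal sketch to check this; the sample string is illustrative.

```python
import json

sample = 'ב\n"q"'  # a Hebrew letter, a newline, and double quotes

escaped = json.dumps(sample)                      # default ensure_ascii=True
literal = json.dumps(sample, ensure_ascii=False)  # non-ASCII kept as-is

# Both forms decode to the same Python string.
assert json.loads(escaped) == json.loads(literal) == sample
```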

User · Answer

Use the ensure_ascii=False switch to json.dumps(), then encode the value to UTF-8 manually:

    >>> json_string = json.dumps("ברי צקלה", ensure_ascii=False).encode('utf8')
    >>> json_string
    b'"\xd7\x91\xd7\xa8\xd7\x99 \xd7\xa6\xd7\xa7\xd7\x9c\xd7\x94"'
    >>> print(json_string.decode())
    "ברי צקלה"

If you are writing to a file, just use json.dump() and leave it to the file object to encode:

    with open('filename', 'w', encoding='utf8') as json_file:
        json.dump("ברי צקלה", json_file, ensure_ascii=False)

Caveats for Python 2

For Python 2, there are some more caveats to take into account. If you are writing this to a file, you can use io.open() instead of open() to produce a file object that encodes Unicode values for you as you write, then use json.dump() instead to write to that file:

    with io.open('filename', 'w', encoding='utf8') as json_file:
        json.dump(u"ברי צקלה", json_file, ensure_ascii=False)

Do note that there is a bug in the json module where the ensure_ascii=False flag can produce a mix of unicode and str objects. The workaround for Python 2 then is:

    with io.open('filename', 'w', encoding='utf8') as json_file:
        data = json.dumps(u"ברי צקלה", ensure_ascii=False)
        # unicode(data) auto-decodes data to unicode if str
        json_file.write(unicode(data))

In Python 2, when using byte strings (type str) encoded to UTF-8, make sure to also set the encoding keyword:

    >>> d = {1: "ברי צקלה", 2: u"ברי צקלה"}
    >>> d
    {1: '\xd7\x91\xd7\xa8\xd7\x99 \xd7\xa6\xd7\xa7\xd7\x9c\xd7\x94', 2: u'\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4'}

    >>> s = json.dumps(d, ensure_ascii=False, encoding='utf8')
    >>> s
    u'{"1": "\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4", "2": "\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4"}'
    >>> json.loads(s)['1']
    u'\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4'
    >>> json.loads(s)['2']
    u'\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4'
    >>> print json.loads(s)['1']
    ברי צקלה
    >>> print json.loads(s)['2']
    ברי צקלה
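
One quick way to confirm that the file-writing variant produces the same bytes as the `.encode('utf8')` example is to write with `json.dump` and read the file back in binary mode. Here is a minimal Python 3 sketch; the temporary file stands in for `'filename'`.

```python
import json
import os
import tempfile

text = "ברי צקלה"

fd, path = tempfile.mkstemp(suffix=".json")
os.close(fd)  # only the path is needed; it is reopened in text mode below

# Let the file object do the UTF-8 encoding.
with open(path, "w", encoding="utf8") as json_file:
    json.dump(text, json_file, ensure_ascii=False)

with open(path, "rb") as fh:
    file_bytes = fh.read()

# The in-memory route from the answer, for comparison.
in_memory = json.dumps(text, ensure_ascii=False).encode("utf8")
os.remove(path)
```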

User · Answer

If you are loading a JSON string from a file and the file contents are Arabic texts, then this will work.

Assume a file like arabic.json:

    {
      "key1": "<Arabic text>",
      "key2": "<Arabic text>"
    }

Get the Arabic contents from the arabic.json file:

    import json

    with open('arabic.json', encoding='utf-8') as f:
        # deserialise it
        json_data = json.load(f)
    # the with block closes f, so no f.close() is needed

    # JSON-formatted string
    json_data2 = json.dumps(json_data, ensure_ascii=False)

To use the JSON data in a Django template, follow the steps below:

    # If you have to get the JSON index in a Django template file,
    # then simply decode the encoded string.
    json.JSONDecoder().decode(json_data2)

Done! Now we can get the results as JSON index with Arabic values.
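
The steps above can be checked end to end: write a small `arabic.json`, load it, re-dump it with `ensure_ascii=False`, and decode it again. Here is a minimal Python 3 sketch. The sample values (`مرحبا`, `سلام`) and the temporary directory are illustrative assumptions. For a `str` argument, `json.JSONDecoder().decode(s)` gives the same result as `json.loads(s)`.

```python
import json
import os
import tempfile

# Hypothetical sample contents for arabic.json.
sample = {"key1": "مرحبا", "key2": "سلام"}
path = os.path.join(tempfile.mkdtemp(), "arabic.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(sample, f, ensure_ascii=False)

# Load it back; the with block closes the file.
with open(path, encoding="utf-8") as f:
    json_data = json.load(f)

# Re-serialize with the Arabic text kept literal.
json_data2 = json.dumps(json_data, ensure_ascii=False)

# Decode back to a dict (same result as json.loads(json_data2)).
decoded = json.JSONDecoder().decode(json_data2)
```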

User · Answer

To write to a file:

    import codecs
    import json

    with codecs.open('your_file.txt', 'w', encoding='utf-8') as f:
        json.dump({"message": "xin chào việt nam"}, f, ensure_ascii=False)

To print to stdout:

    import json
    print(json.dumps({"message": "xin chào việt nam"}, ensure_ascii=False))
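
On Python 3, the built-in `open` with `encoding='utf-8'` writes the same bytes as `codecs.open` here, so `codecs` is not strictly required. Here is a minimal sketch comparing the two; the file names are illustrative. One documented difference is that `codecs.open` always opens the file in binary mode and does no newline translation. That does not matter for this single-line output.

```python
import codecs
import json
import os
import tempfile

payload = {"message": "xin chào việt nam"}
tmpdir = tempfile.mkdtemp()
path_codecs = os.path.join(tmpdir, "codecs.txt")
path_builtin = os.path.join(tmpdir, "builtin.txt")

with codecs.open(path_codecs, "w", encoding="utf-8") as f:
    json.dump(payload, f, ensure_ascii=False)

with open(path_builtin, "w", encoding="utf-8") as f:
    json.dump(payload, f, ensure_ascii=False)

# Compare the raw bytes each approach produced.
with open(path_codecs, "rb") as a, open(path_builtin, "rb") as b:
    bytes_codecs, bytes_builtin = a.read(), b.read()
```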

User · Answer

Use codecs if possible:

    with codecs.open('file_path', 'a+', 'utf-8') as fp:
        fp.write(json.dumps(res, ensure_ascii=False))
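
One thing to watch with append mode: each `fp.write(json.dumps(...))` call appends a complete JSON document, and concatenated documents are not valid JSON as a whole. A common convention is one document per line (JSON Lines). It is swapped in here as an assumption; it is not part of the answer. Here is a minimal Python 3 sketch with hypothetical helper names `append_record` and `read_records`.

```python
import json
import os
import tempfile


def append_record(path, record):
    """Append one JSON document per line, with non-ASCII kept literal."""
    with open(path, "a", encoding="utf-8") as fp:
        fp.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_records(path):
    """Parse each non-empty line back into a Python object."""
    with open(path, encoding="utf-8") as fp:
        return [json.loads(line) for line in fp if line.strip()]


path = os.path.join(tempfile.mkdtemp(), "log.jsonl")
append_record(path, {"id": 1, "text": "ברי צקלה"})
append_record(path, {"id": 2, "text": "中文"})
records = read_records(path)
```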

User · Answer

The following is my understanding from reading the answers above and Googling:

    # -*- coding: utf-8 -*-
    r"""
    @update: 2017-01-09 14:44:39
    @explain: str, unicode, bytes in python2to3
        #python2 UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 7: ordinal not in range(128)
        #1. reload
        #   import importlib, sys
        #   importlib.reload(sys)
        #   sys.setdefaultencoding('utf-8')  # python3 doesn't have this attribute
        #   not recommended even in python2, see: http://stackoverflow.com/questions/3828723/why-should-we-not-use-sys-setdefaultencodingutf-8-in-a-py-script
        #2. overwrite /usr/lib/python2.7/sitecustomize.py or (sitecustomize.py and PYTHONPATH=".:$PYTHONPATH" python)
        #   too complex
        #3. control it on your own (best)
        #   ==> all strings must be unicode, as in python3 (u'xx'); bytes are b'xx' or u'xx'.encode('utf-8')
        #   (the separate unicode type disappeared in python3)
        #   see: http://blog.ernest.me/post/python-setdefaultencoding-unicode-bytes

        #how to save utf-8 texts in json.dumps as UTF8, not as \u escape sequence
        #http://stackoverflow.com/questions/18337407/saving-utf-8-texts-in-json-dumps-as-utf8-not-as-u-escape-sequence
    """

    from __future__ import print_function
    import json

    a = {"b": u"中文"}  # add u for python2 compatibility
    print('%r' % a)
    print('%r' % json.dumps(a))
    print('%r' % (json.dumps(a).encode('utf8')))
    a = {"b": u"中文"}
    print('%r' % json.dumps(a, ensure_ascii=False))
    print('%r' % (json.dumps(a, ensure_ascii=False).encode('utf8')))
    # print(a.encode('utf8'))  # AttributeError: 'dict' object has no attribute 'encode'
    print('')

    # python2: bytes == str; python3: bytes
    b = a['b'].encode('utf-8')
    print('%r' % b)
    print('%r' % b.decode('utf-8'))
    print('')

    # python2: unicode; python3: str == unicode
    c = b.decode('utf-8')
    print('%r' % c)
    print('%r' % c.encode('utf-8'))

    # Output under python2:
    # {'b': u'\u4e2d\u6587'}
    # '{"b": "\\u4e2d\\u6587"}'
    # '{"b": "\\u4e2d\\u6587"}'
    # u'{"b": "\u4e2d\u6587"}'
    # '{"b": "\xe4\xb8\xad\xe6\x96\x87"}'
    #
    # '\xe4\xb8\xad\xe6\x96\x87'
    # u'\u4e2d\u6587'
    #
    # u'\u4e2d\u6587'
    # '\xe4\xb8\xad\xe6\x96\x87'

    # Output under python3:
    # {'b': '中文'}
    # '{"b": "\\u4e2d\\u6587"}'
    # b'{"b": "\\u4e2d\\u6587"}'
    # '{"b": "中文"}'
    # b'{"b": "\xe4\xb8\xad\xe6\x96\x87"}'
    #
    # b'\xe4\xb8\xad\xe6\x96\x87'
    # '中文'
    #
    # '中文'
    # b'\xe4\xb8\xad\xe6\x96\x87'
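
On Python 3 alone, the printed results in the listing can be turned into checks. Here is a minimal sketch of the same conversions: `str` to JSON `str`, to UTF-8 `bytes`, and back. The variable names `escaped`, `literal` and `literal_bytes` are introduced here for illustration.

```python
import json

a = {"b": "中文"}

escaped = json.dumps(a)                      # ASCII-only, \u escapes
literal = json.dumps(a, ensure_ascii=False)  # characters kept literal
literal_bytes = literal.encode("utf8")       # UTF-8 bytes of the literal form

b = a["b"].encode("utf-8")  # str -> bytes
c = b.decode("utf-8")       # bytes -> str
```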

User · Answer

UPDATE: This is a wrong answer, but it's still useful to understand why it's wrong. See comments.

How about unicode-escape?

    >>> d = {1: "ברי צקלה", 2: u"ברי צקלה"}
    >>> json_str = json.dumps(d).decode('unicode-escape').encode('utf8')
    >>> print json_str
    {"1": "ברי צקלה", "2": "ברי צקלה"}
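
Here is one way this trick goes wrong; it is not necessarily the problem raised in the comments. The `unicode-escape` codec also undoes JSON's own escapes (`\"`, `\\`, `\n`), so a string containing a double quote comes out as invalid JSON. Here is a minimal Python 3 adaptation; the sample dict is illustrative.

```python
import json

d = {"quote": 'say "שלום"'}

# The trick from this answer, adapted to Python 3 str/bytes.
tricked = json.dumps(d).encode("ascii").decode("unicode-escape")

# The ensure_ascii=False route keeps JSON's own escapes intact.
proper = json.dumps(d, ensure_ascii=False)

try:
    json.loads(tricked)
    tricked_is_valid = True
except json.JSONDecodeError:
    tricked_is_valid = False
```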
