JSON and escaping characters

Question

I have a string that gets serialized to JSON in JavaScript and then deserialized in Java. If the string contains a degree symbol, I run into a problem. I could use some help figuring out who to blame: is it the SpiderMonkey 1.8 implementation (which has a JSON implementation built in)? Is it Google Gson? Is it me, for not doing something properly?

Here's what happens in JSDB:

    js>s='15\u00b0C'
    15°C
    js>JSON.stringify(s)
    "15°C"

I would have expected "15\u00b0C", which leads me to believe that SpiderMonkey's JSON implementation isn't doing the right thing... except that the JSON homepage's syntax description (is that the spec?) says that a char can be

    any-Unicode-character-
        except-"-or-\-or-
        control-character

so maybe it passes the string along as-is without encoding it as \u00b0, in which case I would think the problem is with the Gson library.

Can anyone help? I suppose my workaround is to use either a different JSON library or to escape strings manually after calling JSON.stringify(), but if this is a bug then I'd like to file a bug report.

User · Answer

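Hmm, well, here's a workaround anyway:

    // Wraps JSON.stringify; when emit_unicode is false, every character
    // from U+007F through U+FFFF is rewritten as a \uXXXX escape.
    function JSON_stringify(s, emit_unicode)
    {
       var json = JSON.stringify(s);
       return emit_unicode ? json : json.replace(/[\u007f-\uffff]/g,
          function(c) {
            // zero-pad the hex code unit to four digits
            return '\\u' + ('0000' + c.charCodeAt(0).toString(16)).slice(-4);
          }
       );
    }

Test case:

    js>s='15\u00b0C 3\u0111';
    15°C 3đ
    js>JSON_stringify(s, true)
    "15°C 3đ"
    js>JSON_stringify(s, false)
    "15\u00b0C 3\u0111"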

User · Answer

This is not a bug in either implementation. There is no requirement to escape U+00B0. To quote the RFC:

    2.5. Strings

    The representation of strings is similar to conventions used in the C family of programming languages. A string begins and ends with quotation marks. All Unicode characters may be placed within the quotation marks except for the characters that must be escaped: quotation mark, reverse solidus, and the control characters (U+0000 through U+001F).

    Any character may be escaped.

Escaping everything inflates the size of the data: all code points can be represented in four or fewer bytes in all Unicode transformation formats, whereas escaping them all makes them six or twelve bytes.

It is more likely that you have a text transcoding bug somewhere in your code, and escaping everything down to the ASCII subset merely masks the problem. It is a requirement of the JSON spec that all data use a Unicode encoding.
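To make the size argument concrete, here is a minimal Java sketch that counts UTF-8 bytes for a raw character and for its escaped form. The class name `EscapeSizeDemo` and the helper `utf8Length` are made up for illustration and are not part of any library.

```java
import java.nio.charset.StandardCharsets;

class EscapeSizeDemo {
    // Hypothetical helper: number of bytes the string occupies in UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Degree sign (U+00B0), written with a source-level escape so the
        // file itself stays pure ASCII.
        String degree = "\u00b0";
        // The same character as a JSON escape: backslash, 'u', four hex digits.
        String degreeEscaped = "\\u00b0";

        // A code point outside the BMP (U+1F600), held as a surrogate pair.
        String emoji = "\ud83d\ude00";
        // In JSON it needs two escapes, one per surrogate.
        String emojiEscaped = "\\ud83d\\ude00";

        System.out.println(utf8Length(degree));        // 2
        System.out.println(utf8Length(degreeEscaped)); // 6
        System.out.println(utf8Length(emoji));         // 4
        System.out.println(utf8Length(emojiEscaped));  // 12
    }
}
```

The raw forms never exceed four bytes, while the escaped forms cost six bytes per escape, which matches the "six or twelve bytes" figure above.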

User · Answer

This is SUPER late and probably not relevant anymore, but if anyone stumbles upon this answer, I believe I know the cause.

The JSON-encoded string is perfectly valid with the degree symbol in it, as the other answer mentions. The problem is most likely in the character encoding that you are reading and writing with. Depending on how you are using Gson, you are probably passing it a java.io.Reader instance. Any time you create a Reader from an InputStream, you need to specify the character encoding, or a java.nio.charset.Charset instance (it's usually best to use java.nio.charset.StandardCharsets.UTF_8). If you don't specify a Charset, Java will use your platform default encoding, which on Windows is usually CP-1252 (on Java versions before 18, which made UTF-8 the default).
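As a rough illustration of that failure mode, the sketch below decodes the same UTF-8 bytes twice, once with the correct charset and once with windows-1252. It uses only the JDK; the class name `ReaderCharsetDemo` and the helper `readAll` are hypothetical. With Gson, the Reader built here is what you would presumably hand to `fromJson`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ReaderCharsetDemo {
    // Hypothetical helper: reads the whole stream as text, decoding with an
    // explicitly chosen charset instead of the platform default.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // JSON text "15°C" as it would travel over the wire in UTF-8.
        byte[] wire = "\"15\u00b0C\"".getBytes(StandardCharsets.UTF_8);

        // Correct: decode with the charset the sender used.
        String good = readAll(new ByteArrayInputStream(wire), StandardCharsets.UTF_8);

        // Wrong: decode as windows-1252, as an old Windows default would.
        String bad = readAll(new ByteArrayInputStream(wire),
                Charset.forName("windows-1252"));

        System.out.println(good.equals("\"15\u00b0C\""));       // true
        // The two UTF-8 bytes C2 B0 turn into two separate characters.
        System.out.println(bad.equals("\"15\u00c2\u00b0C\""));  // true
    }
}
```

The second result contains a stray U+00C2 in front of the degree sign, which is the classic symptom of reading UTF-8 through a single-byte default encoding.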
