What does "Content-type: application/json; charset=utf-8" really mean?

Question

When I make a POST request with a JSON body to my REST service, I include "Content-type: application/json; charset=utf-8" in the message header. Without this header, I get an error from the service. I can also successfully use "Content-type: application/json" without the "; charset=utf-8" portion.

What exactly does "charset=utf-8" do? I know it specifies the character encoding, but the service works fine without it. Does this encoding limit the characters that can be in the message body?

User · Accepted Answer

The header just denotes what the content is and how it is encoded. It is not necessarily possible to deduce the type of the content from the content itself, i.e. you can't necessarily just look at the content and know what to do with it. That's what HTTP headers are for: they tell the recipient what kind of content they're (supposedly) dealing with.

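"Content-type: application/json; charset=utf-8" designates the content to be in JSON format, encoded in the UTF-8 character encoding. Designating the encoding is somewhat redundant for JSON, since UTF-8 is JSON's default encoding. The current JSON specification, RFC 8259, goes further: it requires UTF-8 for JSON exchanged between systems that are not part of a closed ecosystem, and it defines no charset parameter for application/json at all. So in this case the receiving server is apparently happy knowing that it's dealing with JSON and assumes UTF-8 by default. That's why it works with or without the charset parameter. The Content-Type header itself is still needed, which is why you get an error when you leave it out entirely.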
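A minimal Python sketch of that lenient behavior, assuming a server that uses the charset parameter when present and otherwise falls back to UTF-8. The names parse_content_type and read_json_body are hypothetical; the header parsing relies on the standard library's email.message.Message, which understands MIME-style parameters.

```python
import json
from email.message import Message
from typing import Optional, Tuple


def parse_content_type(header_value: str) -> Tuple[str, Optional[str]]:
    """Split a Content-Type value into (media type, charset or None)."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_type() lower-cases the media type; get_param() returns
    # the unquoted charset parameter, or None when there isn't one.
    return msg.get_content_type(), msg.get_param("charset")


def read_json_body(body: bytes, content_type: str) -> object:
    """Decode a JSON request body the way a lenient server might."""
    media_type, charset = parse_content_type(content_type)
    if media_type != "application/json":
        raise ValueError(f"unsupported media type: {media_type}")
    # Assumption: no charset parameter means UTF-8, JSON's default encoding.
    return json.loads(body.decode(charset or "utf-8"))


body = json.dumps({"name": "Zoë"}, ensure_ascii=False).encode("utf-8")
print(read_json_body(body, "application/json"))                 # {'name': 'Zoë'}
print(read_json_body(body, "application/json; charset=utf-8"))  # {'name': 'Zoë'}
```

Both header variants decode to the same object, which matches what the question observed.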

Does this encoding limit the characters that can be in the message body?

No. You can send anything you want in the header and the body, but if the two don't match, you may get wrong results. If you specify in the header that the content is UTF-8 encoded but you're actually sending Latin-1 encoded content, the receiver may produce garbage data, trying to interpret Latin-1 encoded data as UTF-8. Of course, if you specify that you're sending Latin-1 encoded data and you actually do so, then yes, you're limited to the 256 characters you can encode in Latin-1.
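Both cases can be reproduced with Python's built-in codecs alone. This sketch assumes a sender that labels its body as UTF-8 but actually encodes it as Latin-1; fits_in_latin1 is a hypothetical helper illustrating the 256-character limit.

```python
def fits_in_latin1(text: str) -> bool:
    """True if every character of `text` is one of Latin-1's 256 code points."""
    try:
        text.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True


# Mismatch: the header says UTF-8, but the bytes are really Latin-1.
latin1_bytes = "café".encode("latin-1")  # b'caf\xe9'
# A receiver trusting the header decodes as UTF-8. A lone 0xE9 is not valid
# UTF-8, so errors="replace" turns it into U+FFFD (strict decoding would raise).
garbled = latin1_bytes.decode("utf-8", errors="replace")

# Honest Latin-1 round-trips, but only within its 256 characters.
honest = latin1_bytes.decode("latin-1")

print(ascii(garbled))          # 'caf\ufffd'
print(honest)                  # café
print(fits_in_latin1("café"))  # True
print(fits_in_latin1("5 €"))   # False
```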

User · Answer

To substantiate @deceze's claim that the default JSON encoding is UTF-8, from IETF RFC 4627:

"JSON text SHALL be encoded in Unicode. The default encoding is UTF-8.

Since the first two characters of a JSON text will always be ASCII characters [RFC0020], it is possible to determine whether an octet stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking at the pattern of nulls in the first four octets.

00 00 00 xx  UTF-32BE
00 xx 00 xx  UTF-16BE
xx 00 00 00  UTF-32LE
xx 00 xx 00  UTF-16LE
xx xx xx xx  UTF-8"
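The null-pattern table translates almost directly into code. This is a sketch under the RFC 4627 assumption quoted above: a text without a byte order mark that starts with two ASCII characters (an object or array). detect_json_encoding is a hypothetical name; Python's own json.loads already performs a comparable check when it is handed bytes.

```python
import json


def detect_json_encoding(data: bytes) -> str:
    """Guess the encoding of a BOM-less JSON text from its first four octets."""
    if len(data) < 4:
        return "utf-8"  # too short to tell; use the default
    # True where an octet is 0x00, e.g. (True, True, True, False) = 00 00 00 xx
    nulls = tuple(b == 0 for b in data[:4])
    if nulls == (True, True, True, False):
        return "utf-32-be"  # 00 00 00 xx
    if nulls == (True, False, True, False):
        return "utf-16-be"  # 00 xx 00 xx
    if nulls == (False, True, True, True):
        return "utf-32-le"  # xx 00 00 00
    if nulls == (False, True, False, True):
        return "utf-16-le"  # xx 00 xx 00
    return "utf-8"          # xx xx xx xx


sample = '{"a": 1}'.encode("utf-16-le")
encoding = detect_json_encoding(sample)
print(encoding, json.loads(sample.decode(encoding)))  # utf-16-le {'a': 1}
```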

User · Answer

I completely agree with @deceze, but I want to expand on the "I get an error from the service" part of the question. This kind of error typically comes back as HTTP 415 Unsupported Media Type:

"The HTTP 415 Unsupported Media Type client error response code indicates that the server refuses to accept the request because the payload format is in an unsupported format. The format problem might be due to the request's indicated Content-Type or Content-Encoding, or as a result of inspecting the data directly."

In other words, the server rejects the request because of the content type it declares. We have to send the correct content type and accept the right content type: add "Content-Type: application/json" and "Accept: application/json". Otherwise, the client or server will assume its default.
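A sketch of both sides using only Python's standard library. build_json_post builds (without sending) a urllib request with explicit Content-Type and Accept headers; status_for is a hypothetical server-side check that answers 415 unless the declared media type is application/json. The URL is a placeholder. As one concrete case of "assuming the default", urllib itself sends POST data as application/x-www-form-urlencoded when no Content-Type is set.

```python
import json
import urllib.request
from email.message import Message
from typing import Optional


def build_json_post(url: str, payload: object) -> urllib.request.Request:
    """Build (but do not send) a POST request with explicit JSON headers."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json; charset=utf-8",
            "Accept": "application/json",
        },
    )


def status_for(content_type: Optional[str]) -> int:
    """Hypothetical server check: 415 unless the body is declared as JSON."""
    if content_type is None:
        return 415
    msg = Message()
    msg["Content-Type"] = content_type
    return 200 if msg.get_content_type() == "application/json" else 415


req = build_json_post("http://example.invalid/api", {"greeting": "hello"})
# urllib normalizes header names with str.capitalize(), hence "Content-type".
print(status_for(req.get_header("Content-type")))  # 200
print(status_for("text/plain"))                    # 415
```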

User · Answer

Note that IETF RFC 4627 has been superseded by IETF RFC 7158 (itself replaced shortly afterwards by RFC 7159, and today by RFC 8259). Section 8.1 no longer contains the text cited by @Drew earlier and instead says: "Implementations MUST NOT add a byte order mark to the beginning of a JSON text."
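A minimal sketch of following that rule in Python: emit JSON with the plain utf-8 codec, which never writes a BOM (unlike utf-8-sig), and on the receiving side strip a leading BOM if one arrives anyway. The later RFCs allow parsers to ignore a BOM rather than treat it as an error. encode_json and decode_json are hypothetical names.

```python
import json

UTF8_BOM = b"\xef\xbb\xbf"


def encode_json(obj: object) -> bytes:
    """Serialize to UTF-8 JSON without a byte order mark."""
    # The plain "utf-8" codec never prepends a BOM; "utf-8-sig" would.
    return json.dumps(obj, ensure_ascii=False).encode("utf-8")


def decode_json(data: bytes) -> object:
    """Parse UTF-8 JSON, ignoring a leading BOM if a sender added one."""
    if data.startswith(UTF8_BOM):
        data = data[len(UTF8_BOM):]
    return json.loads(data.decode("utf-8"))


print(encode_json({"a": 1}))                 # b'{"a": 1}'
print("{}".encode("utf-8-sig"))              # b'\xef\xbb\xbf{}'
print(decode_json(UTF8_BOM + b'{"a": 1}'))   # {'a': 1}
```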

User · Answer

Dart's http implementation decodes the bytes according to that "charset=utf-8", so I'm sure several implementations out there support it, to avoid the "latin-1" fallback charset when reading the bytes of the response. In my case, I completely lost the formatting of the response body string, so I had to either decode the bytes to UTF-8 manually or add that "charset" parameter to the Content-Type header of my server's API response.
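The effect can be sketched in Python. decode_body is a hypothetical client-side helper that honors the charset parameter when present and otherwise falls back to Latin-1, mirroring the fallback this answer describes; it is not Dart's actual code. Without the parameter, the two UTF-8 bytes of "ü" come out as two separate Latin-1 characters.

```python
from email.message import Message
from typing import Optional


def charset_of(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_param("charset")


def decode_body(body: bytes, content_type: str, fallback: str = "latin-1") -> str:
    """Decode a response body with the declared charset, else `fallback`."""
    return body.decode(charset_of(content_type) or fallback)


raw = '{"city": "Zürich"}'.encode("utf-8")
print(decode_body(raw, "application/json"))                 # {"city": "ZÃ¼rich"}
print(decode_body(raw, "application/json; charset=utf-8"))  # {"city": "Zürich"}
```

Decoding the raw bytes explicitly as UTF-8, as the answer suggests, amounts to calling decode_body with fallback="utf-8".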

User · Answer

I was using HttpClient and getting back a response header with a content type of application/json. I lost characters such as foreign-language text or symbols that use Unicode, since HttpClient defaults to ISO-8859-1. So, as mentioned by @WesternGun, be as explicit as possible to avoid any possible problem. There was no way to handle it through the request, because the server doesn't honor the requested charset header (method.setRequestHeader("accept-charset", "UTF-8")) for me, so I had to retrieve the response data as raw bytes and convert it into a String using UTF-8. So it is recommended to be explicit and not rely on a default value.
