I'm building a web service and have a node that accepts a POST to create a new resource. The resource expects one of two content-types - an XML format I'll be defining, or form-encoded variables.
The idea is that consuming applications can POST XML directly and benefit from better validation etc., but there's also an HTML interface that will POST the form-encoded stuff. Obviously the XML format has a charset declaration, but I can't see how I detect the form's charset just from looking at the POST.
A typical post to the form from Firefox looks like this:
POST /path HTTP/1.1
Host: www.myhostname.com
User-Agent: Mozilla/5.0 [...etc...]
Accept: text/html,application/xhtml+xml, [...etc...]
Accept-Language: en-gb,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Content-Type: application/x-www-form-urlencoded
Content-Length: 41
field1=value1&field2=value2&field3=value3
Which doesn't seem to contain any useful indication of the character set.
From what I can see, the application/x-www-form-urlencoded type is entirely defined in HTML, which just lays out the %-encoding rules, but doesn't say anything about what charset the data should be in.
Basically, is there any way of telling the character set if I don't know the character set the HTML originally presented was? Otherwise I'll have to try and guess the character set based on what chars are present, and that's always a bit iffy from what I can tell.
This question is related to
http
rest
http-headers
the default encoding of a HTTP POST is ISO-8859-1.
else you have to look at the Content-Type header that will then look like
Content-Type: application/x-www-form-urlencoded ; charset=UTF-8
You can maybe declare your form with
<form enctype="application/x-www-form-urlencoded;charset=UTF-8">
or
<form accept-charset="UTF-8">
to force the encoding.
Some references :
Try setting the charset on your Content-Type:
httpCon.setRequestProperty( "Content-Type", "multipart/form-data; charset=UTF-8; boundary=" + boundary );
The Charset used in the POST will match that of the Charset specified in the HTML hosting the form. Hence if your form is sent using UTF-8 encoding that is the encoding used for the posted content. The URL encoding is applied after the values are converted to the set of octets for the character encoding.
Source: Stackoverflow.com