[xml] What Content-Type value should I send for my XML sitemap?

As a rule of thumb, the safest bet towards making your document be treated properly by all web servers, proxies, and client browsers, is probably the following:

  1. Use the application/xml content type
  2. Include a character encoding in the content type, probably UTF-8
  3. Include a matching character encoding in the encoding attribute of the XML document itself.

In terms of the RFC 3023 spec, which some browsers fail to implement properly, the major difference in the content types is in how clients are supposed to treat the character encoding, as follows:

For application/xml, application/xml-dtd, application/xml-external-parsed-entity, or any one of the subtypes of application/xml such as application/atom+xml, application/rss+xml or application/rdf+xml, the character encoding is determined in this order:

  1. the encoding given in the charset parameter of the Content-Type HTTP header
  2. the encoding given in the encoding attribute of the XML declaration within the document,
  3. utf-8.

For text/xml, text/xml-external-parsed-entity, or a subtype like text/foo+xml, the encoding attribute of the XML declaration within the document is ignored, and the character encoding is:

  1. the encoding given in the charset parameter of the Content-Type HTTP header, or
  2. us-ascii.

Most parsers don't implement the spec; they ignore the HTTP Context-Type and just use the encoding in the document. With so many ill-formed documents out there, that's unlikely to change any time soon.