[xml] Meaning of - <?xml version="1.0" encoding="utf-8"?>

To understand the "encoding" attribute, you have to understand the difference between bytes and characters.

Think of bytes as numbers between 0 and 255, whereas characters are things like "a", "1" and "Ä". The set of all characters that are available is called a character set.

Each character is represented by a sequence of one or more bytes; however, the exact number and values of those bytes depend on the encoding used, and there are many different encodings.
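As a rough illustration (Python here is purely for demonstration, not part of XML itself), the same character turns into different byte sequences depending on which encoding you ask for:

    # The character "A" encoded with three different encodings.
    # The byte values you get back depend entirely on the encoding chosen.
    for encoding in ("ascii", "utf-8", "utf-16-be"):
        data = "A".encode(encoding)
        print(encoding, list(data))

    # ascii     [65]
    # utf-8     [65]
    # utf-16-be [0, 65]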

Most encodings are based on an old character set and encoding called ASCII, which uses a single byte per character (actually, only 7 bits) and contains 128 characters, including a lot of the common characters used in US English.

For example, here are the six ASCII characters represented by the byte values 60 to 65:

Extract of ASCII table (byte values 60 to 65)
+------+-----------+
| Byte | Character |
+------+-----------+
|  60  |     <     |
|  61  |     =     |
|  62  |     >     |
|  63  |     ?     |
|  64  |     @     |
|  65  |     A     |
+------+-----------+
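If you want to check the table above yourself, here is a small, purely illustrative Python snippet that decodes those byte values as ASCII:

    # Decode the byte values 60..65 using the ASCII encoding.
    print(bytes([60, 61, 62, 63, 64, 65]).decode("ascii"))
    # <=>?@A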

In the full ASCII set, the lowest value used is zero and the highest is 127 (both of which are non-printing control characters).

However, once you start needing more characters than basic ASCII provides (for example, letters with accents, currency symbols, graphic symbols, etc.), ASCII is not suitable and you need something more extensive: a larger character set, and a different encoding, since 128 values are not enough to fit all the characters in. Some encodings use a single byte per character (allowing at most 256 characters); others use several bytes per character, up to six in some encodings.
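For example (again a Python sketch, just to make the point), a character such as "Ä" cannot be encoded as ASCII at all, and different encodings need a different number of bytes for it:

    text = "Ä"

    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        print("not representable in ASCII")   # this branch is taken

    print(list(text.encode("iso-8859-1")))    # [196]      - one byte
    print(list(text.encode("utf-8")))         # [195, 132] - two bytes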

Over time a lot of encodings have been created. In the Windows world there is CP1252 (a close relative of ISO-8859-1), whereas Linux users tend to favour UTF-8. Java uses UTF-16 natively for its strings.
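To see how much these encodings differ, you can encode the same short piece of text with each of them and compare the results (a quick sketch, assuming Python's standard codec names):

    text = "Äpfel"
    for encoding in ("cp1252", "iso-8859-1", "utf-8", "utf-16"):
        data = text.encode(encoding)
        print(f"{encoding:11} {len(data):2} bytes  {list(data)}")

    # cp1252 and iso-8859-1 need 5 bytes, utf-8 needs 6,
    # and utf-16 needs 12 (including its byte-order mark).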

One sequence of byte values for a character in one encoding might stand for a completely different character in another encoding, or might even be invalid.

For example, in ISO 8859-1, â is represented by one byte of value 226, whereas in UTF-8 it is two bytes: 195, 162. However, in ISO 8859-1, the bytes 195, 162 would be two characters: Ã and ¢.
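You can reproduce that example directly (Python again, just for illustration):

    text = "â"
    print(list(text.encode("iso-8859-1")))            # [226]
    print(list(text.encode("utf-8")))                 # [195, 162]

    # Reading the UTF-8 bytes back with the wrong encoding gives mojibake:
    print(text.encode("utf-8").decode("iso-8859-1"))  # Ã¢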

Think of XML not as a sequence of characters but as a sequence of bytes.

Imagine the system receiving the XML sees the bytes 195, 162. How does it know what characters these are?
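For instance, here is what a tiny UTF-8 XML document actually looks like to the receiving system, as nothing but byte values (illustrative Python, with a made-up <word> element):

    doc = '<?xml version="1.0" encoding="utf-8"?><word>â</word>'
    data = doc.encode("utf-8")
    print(list(data))
    # The "â" shows up as the pair 195, 162 near the end, with nothing
    # in those two numbers by themselves to say which character they mean.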

In order for the system to interpret those bytes as actual characters (and so display them or convert them to another encoding), it needs to know the encoding used in the XML.

Since most common encodings are compatible with ASCII as far as basic alphabetic characters and symbols go, in these cases the declaration itself can get away with using only ASCII characters to say what encoding the rest of the document uses. In other cases (UTF-16, for example), the parser must first work out the encoding of the declaration itself; since it knows the declaration begins with <?xml, this is a lot easier to do.
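In practice you can watch a parser do exactly this. The sketch below is hedged: it uses Python's standard xml.etree.ElementTree and a made-up <word> element, and simply feeds the parser the same character in two different encodings. Because the declaration names the encoding, both documents come back as the same text:

    import xml.etree.ElementTree as ET

    latin1_doc = '<?xml version="1.0" encoding="iso-8859-1"?><word>â</word>'.encode("iso-8859-1")
    utf8_doc = '<?xml version="1.0" encoding="utf-8"?><word>â</word>'.encode("utf-8")

    print(ET.fromstring(latin1_doc).text)  # â
    print(ET.fromstring(utf8_doc).text)    # â

    # If the declaration lies about the encoding (say, declaring utf-8 but
    # actually sending the single ISO-8859-1 byte 226), the parser will
    # typically reject the document as not well-formed.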

Finally, the version attribute specifies the XML version, of which there are two at the moment (see XML versions on Wikipedia). There are slight differences between the versions, so an XML parser needs to know which one it is dealing with. In most cases (for English speakers anyway), version 1.0 is sufficient.