XML documents consist entirely of characters from the Unicode repertoire. Except for a small number of specifically excluded control characters, any character defined by Unicode may appear within the content of an XML document. The selection of characters which may appear within markup is somewhat more limited but still large.

XML includes facilities for identifying the encoding of the Unicode characters which make up the document, and for expressing characters which, for one reason or another, cannot be used directly.

The Unicode character set can be encoded into bytes for storage or transmission in a variety of different ways, called "encodings". Unicode itself defines encodings which cover the entire repertoire; well-known ones include UTF-8 and UTF-16.

There are many other text encodings which pre-date Unicode, such as ASCII and ISO/IEC 8859; their character repertoires in almost every case are subsets of the Unicode character set. XML allows the use of any of the Unicode-defined encodings, and any other encodings whose characters also appear in Unicode. XML also provides a mechanism whereby an XML processor can reliably, without any prior knowledge, determine which encoding is being used. Encodings other than UTF-8 and UTF-16 will not necessarily be recognized by every XML parser.
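
As a rough illustration of the points above, the following Python sketch serializes the same small document in UTF-8 and UTF-16, lets the parser work out the encoding from the bytes alone, and then uses numeric character references for a character that the chosen encoding cannot represent directly. The sample text, the element name `p`, and the helper `make_doc` are assumptions made for this example; the parser is the standard library's `xml.etree.ElementTree`, and other parsers may support a different set of encodings.

```python
import xml.etree.ElementTree as ET

# Sample text (an assumption for this sketch): "café €".
# Both U+00E9 and U+20AC lie outside ASCII.
text = "caf\u00e9 \u20ac"


def make_doc(encoding: str) -> bytes:
    """Hypothetical helper: build a tiny document whose declaration
    names the same encoding that is used to produce the bytes."""
    xml = f'<?xml version="1.0" encoding="{encoding}"?><p>{text}</p>'
    return xml.encode(encoding)


utf8_doc = make_doc("UTF-8")
# Python's "UTF-16" codec prepends a byte order mark (BOM).
utf16_doc = make_doc("UTF-16")

# The parser receives raw bytes and determines the encoding itself,
# using the BOM and the encoding declaration.
parsed_utf8 = ET.fromstring(utf8_doc).text
parsed_utf16 = ET.fromstring(utf16_doc).text

# ISO-8859-1 can encode U+00E9 directly but has no euro sign, so the
# euro sign is written as a numeric character reference instead.
latin1_doc = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>'
    "<p>caf\u00e9 &#x20AC;</p>"
).encode("iso-8859-1")
parsed_latin1 = ET.fromstring(latin1_doc).text

# All three byte sequences differ, yet they decode to the same characters.
print(parsed_utf8 == parsed_utf16 == parsed_latin1 == text)
```

The three serializations are different byte sequences, but a conforming parser should recover identical character data from each of them.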