- ISO 646
- ISO/IEC 8859
- GB 2312
Characters in GB2312 are arranged in a 94x94 grid. In the two-byte encoding, the value of the first byte is in the range 0xA1-0xF7 (161-247), while the value of the second byte is in the range 0xA1-0xFE (161-254).
To map a code point to bytes, split its four decimal digits into two pairs. Add 160 (0xA0) to the first pair (the 1000's and 100's digits, which give the row) to form the high byte, and add 160 (0xA0) to the second pair (the 10's and 1's digits, which give the cell) to form the low byte.
For example, take the GB2312 code point 4566 (外, "foreign"). The high byte comes from 45 (the 4500 part) and the low byte comes from 66 (the 0066 part). For the high byte, add 45 to 160, giving 205 or 0xCD. For the low byte, add 66 to 160, giving 226 or 0xE2. So the full encoding is 0xCDE2.
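The arithmetic above can be sketched in a few lines of Python. This is only an illustration: `gb2312_bytes` is a hypothetical helper name, the range check assumes the 94x94 grid described earlier, and the final decode relies on Python's built-in `gb2312` codec.

```python
def gb2312_bytes(code_point: int) -> bytes:
    """Convert a 4-digit GB2312 code point (row * 100 + cell) to its 2 bytes."""
    # Split 4566 into row 45 and cell 66.
    row, cell = divmod(code_point, 100)
    # Assumption: both coordinates must lie inside the 94x94 grid.
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError(f"code point {code_point} is outside the 94x94 grid")
    # Add 0xA0 (160) to each half to get the high and low bytes.
    return bytes([row + 0xA0, cell + 0xA0])


encoded = gb2312_bytes(4566)
print(encoded.hex().upper())     # CDE2
print(encoded.decode("gb2312"))  # the character at row 45, cell 66
```

Because the helper works on the decimal row and cell values, it never needs a lookup table; the codec is only used at the end to confirm which character the two bytes represent.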
- GBK
A character is encoded as 1 or 2 bytes. A byte in the range 00–7F is a single byte that means the same thing as it does in ASCII.
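As a rough illustration of this one-or-two-byte layout, the sketch below walks a GBK byte string and splits it into per-character chunks. It assumes well-formed input and treats any byte above 7F as the first byte of a 2-byte sequence; `split_gbk` is a hypothetical helper name, and the sample bytes come from Python's built-in `gbk` codec.

```python
def split_gbk(data: bytes) -> list[bytes]:
    """Split a well-formed GBK byte string into per-character byte sequences."""
    chars = []
    i = 0
    while i < len(data):
        # 00-7F: a single byte with its ASCII meaning.
        # Anything higher: assumed to be the first byte of a 2-byte character.
        width = 1 if data[i] <= 0x7F else 2
        chars.append(data[i:i + width])
        i += width
    return chars


encoded = "A外".encode("gbk")
print(encoded.hex())       # 41cde2
print(split_gbk(encoded))  # one chunk for "A", one 2-byte chunk for "外"
```

The sketch does not validate the byte ranges, so it should not be read as a GBK decoder, only as a picture of how the byte width is decided from the first byte.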
A byte with the high bit set indicates that it is the first of 2 bytes. Loosely speaking, the first byte is in the range 81–FE (that is, never 80 or FF), and the second byte is 40–FE for some areas and 80–FE for others.
- GB18030
Unicode
The Unicode Standard consists of a repertoire of characters, an encoding methodology, and a set of standard character encodings, among other components.
The Unicode Consortium, the nonprofit organization that coordinates Unicode's development, has the ambitious goal of eventually replacing existing character encoding schemes with Unicode and its standard Unicode Transformation Format (UTF) schemes.
Unicode can be implemented by different character encodings. The most commonly used encodings are UTF-8 (which uses 1 byte for all ASCII characters, which have the same code values as in the standard ASCII encoding, and up to 4 bytes for other characters), the now-obsolete UCS-2 (which uses 2 bytes for all characters, but does not include every character in the Unicode standard), and UTF-16 (which extends UCS-2, using 4 bytes to encode characters missing from UCS-2).
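The byte counts mentioned above can be checked with Python's built-in codecs. The following is a small sketch rather than a reference: `encoded_lengths` is a hypothetical helper, the sample code points are arbitrary picks, and `utf-16-be` is used so that no byte order mark is added to the count.

```python
def encoded_lengths(ch: str) -> tuple[int, int]:
    """Return (UTF-8 length, UTF-16 length) in bytes for one character."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-be"))


# An ASCII letter, a Latin-1 letter, a CJK ideograph, and a code point
# outside the 16-bit range that UCS-2 can represent.
samples = ["A", "\u00e9", "\u5916", "\U0001F600"]

for ch in samples:
    utf8_len, utf16_len = encoded_lengths(ch)
    print(f"U+{ord(ch):04X}  UTF-8: {utf8_len} byte(s)  UTF-16: {utf16_len} byte(s)")
```

The ASCII letter takes 1 byte in UTF-8 and 2 in UTF-16, while the last sample takes 4 bytes in both, which is the case UCS-2 cannot represent.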
ISO 10646
ISO 10646 and Unicode have an identical repertoire and identical code point numbers. The difference between them is that Unicode adds rules and specifications that are outside the scope of ISO 10646. ISO 10646 is a simple character map, an extension of previous standards like ISO 8859. In contrast, Unicode adds rules for collation, normalization of forms, and the bidirectional algorithm for right-to-left scripts such as Arabic and Hebrew.
- UTF-8
- UTF-16
- UCS-2
IANA character-sets
http://www.iana.org/assignments/character-sets
Character encoding (Wikipedia)
http://en.wikipedia.org/wiki/Charset