Monday, November 2, 2009

Character Encoding

  • ISO 646
-- ISO 7-bit coded character set for information interchange; an internationalized form of ASCII that allows national variants to substitute a few characters

  • ISO/IEC 8859
ISO/IEC 8859 is a joint ISO and IEC series of standards for 8-bit character encodings. The series of standards consists of numbered parts, such as ISO/IEC 8859-1, ISO/IEC 8859-2, etc.
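
As a small illustration of why the part number matters, the sketch below decodes the same byte value with two parts of the series. It uses Python's built-in codecs; the choice of the byte 0xE9 and of parts 1 and 5 is only an example, not something the standard singles out.

```python
# The same 8-bit value maps to different characters in different parts.
raw = bytes([0xE9])

latin1 = raw.decode("iso-8859-1")    # Part 1: Western European
cyrillic = raw.decode("iso-8859-5")  # Part 5: Cyrillic

print(f"ISO/IEC 8859-1: U+{ord(latin1):04X} {latin1}")
print(f"ISO/IEC 8859-5: U+{ord(cyrillic):04X} {cyrillic}")
```

A byte stream in one of these encodings therefore cannot be interpreted correctly unless the reader knows which part was used.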

  • GB 2312
GB2312 (1980) has been superseded by GBK and GB18030, which include additional characters.
Characters in GB2312 are arranged in a 94x94 grid. The first byte ranges from 0xA1 to 0xF7 (161-247), and the second byte ranges from 0xA1 to 0xFE (161-254).

To map a code point to bytes, take the thousands and hundreds digits of the code point as a two-digit number and add 160 (0xA0) to form the high byte, then take the tens and ones digits as a two-digit number and add 160 (0xA0) to form the low byte.

For example, take the GB2312 code point 4566 (the character 外, "foreign"). The high byte comes from 45 and the low byte comes from 66. For the high byte, add 45 to 160, giving 205 or 0xCD. For the low byte, add 66 to 160, giving 226 or 0xE2. So the full encoding is 0xCDE2.
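
The arithmetic above can be sketched in a few lines of Python. The function name `gb2312_bytes` is made up for this illustration, and the range check assumes the 94x94 grid described earlier.

```python
def gb2312_bytes(code_point: int) -> bytes:
    """Turn a four-digit GB2312 code point (e.g. 4566) into its two bytes."""
    # Split 4566 into 45 (thousands/hundreds) and 66 (tens/ones).
    high, low = divmod(code_point, 100)
    if not (1 <= high <= 94 and 1 <= low <= 94):
        raise ValueError(f"not a valid position in the 94x94 grid: {code_point}")
    # Add 160 (0xA0) to each half to get the byte values.
    return bytes([high + 0xA0, low + 0xA0])


encoded = gb2312_bytes(4566)
print(encoded.hex().upper())  # CDE2
```

If the result is passed to Python's own `gb2312` codec with `encoded.decode("gb2312")`, it should give back the character at that position.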

  • GBK
The GBK character set was defined in 1993 as an extension of GB2312-80. It also adds the characters of GB13000.1-93, placing them in code points that GB2312 leaves unused. Hence GBK is backward compatible with GB2312.

A character is encoded as 1 or 2 bytes. A byte in the range 00–7F is a single byte that means the same thing as it does in ASCII.

A byte with the high bit set indicates that it is the first of 2 bytes. Loosely speaking, the first byte is in the range 81–FE (that is, never 80 or FF), and the second byte is 40–FE for some areas and 80–FE for others.
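
A rough sketch of how a decoder could use these ranges to split a GBK byte stream into characters is shown below. The function name `split_gbk` is hypothetical, the check uses the loose first-byte and second-byte ranges from the text, and the exclusion of 7F as a second byte is an added assumption based on GBK's layout. It does not check whether a byte pair is actually assigned to a character.

```python
from typing import List


def split_gbk(data: bytes) -> List[bytes]:
    """Split GBK-encoded bytes into 1-byte and 2-byte character units."""
    chars = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead <= 0x7F:
            # Same meaning as in ASCII: a complete character on its own.
            chars.append(data[i:i + 1])
            i += 1
        elif 0x81 <= lead <= 0xFE:
            # High bit set: this is the first of two bytes.
            if i + 1 >= len(data):
                raise ValueError("truncated 2-byte character")
            trail = data[i + 1]
            # Assumption: 7F is excluded as a second byte.
            if not 0x40 <= trail <= 0xFE or trail == 0x7F:
                raise ValueError(f"invalid second byte 0x{trail:02X}")
            chars.append(data[i:i + 2])
            i += 2
        else:
            raise ValueError(f"invalid first byte 0x{lead:02X}")
    return chars


print(split_gbk(b"A\xcd\xe2B"))
```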

  • GB18030
GB18030 can be considered a Unicode Transformation Format that maintains compatibility with a legacy character set. GB18030 is a superset of ASCII and can represent the whole range of Unicode code points; in addition, it is also a superset of GB2312. GB18030 also maintains compatibility with Windows code page 936, sometimes known as GBK.
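
The compatibility claims can be spot-checked with Python's built-in `gb18030`, `gb2312` and `ascii` codecs. This is only a check on three sample characters chosen for illustration, not a proof of the superset relationship.

```python
ascii_text = "A"
han = "\u5916"          # a character that is present in GB2312
emoji = "\U0001F600"    # a code point outside the Basic Multilingual Plane

# ASCII text has the same bytes in GB18030.
ascii_same = ascii_text.encode("gb18030") == ascii_text.encode("ascii")

# A GB2312 character keeps its GB2312 bytes in GB18030.
gb2312_same = han.encode("gb18030") == han.encode("gb2312")

# A code point beyond the BMP still survives a round trip.
round_trip = emoji.encode("gb18030").decode("gb18030") == emoji

print(ascii_same, gb2312_same, round_trip)
```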

Unicode
The Unicode Standard consists of a character repertoire, an encoding methodology, and a set of standard character encodings, among other things.
The Unicode Consortium, the nonprofit organization that coordinates Unicode's development, has the ambitious goal of eventually replacing existing character encoding schemes with Unicode and its standard Unicode Transformation Format (UTF) schemes.
Unicode can be implemented by different character encodings. The most commonly used encodings are UTF-8 (which uses 1 byte for all ASCII characters, which have the same code values as in the standard ASCII encoding, and up to 4 bytes for other characters), the now-obsolete UCS-2 (which uses 2 bytes for all characters, but does not include every character in the Unicode standard), and UTF-16 (which extends UCS-2, using 4 bytes to encode characters missing from UCS-2).
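
To make the byte counts concrete, here is a short Python sketch that measures how many bytes a few sample characters take in UTF-8 and UTF-16. The helper name `encoded_lengths` and the sample characters are chosen for this illustration. The big-endian variant `utf-16-be` is used so that no byte order mark is counted.

```python
def encoded_lengths(ch: str) -> tuple:
    """Return (UTF-8 byte count, UTF-16 byte count) for one character."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-be"))


for ch in ["A", "\u20ac", "\U0001F600"]:
    utf8_len, utf16_len = encoded_lengths(ch)
    print(f"U+{ord(ch):04X}: UTF-8 {utf8_len} bytes, UTF-16 {utf16_len} bytes")
```

The last sample lies outside the range that UCS-2 can represent, which is why it needs 4 bytes in UTF-16.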

ISO 10646
ISO 10646 and Unicode have an identical repertoire and identical code point numbers. The difference between them is that Unicode adds rules and specifications that are outside the scope of ISO 10646. ISO 10646 is a simple character map, an extension of previous standards like ISO 8859. In contrast, Unicode adds rules for collation, normalization of forms, and the bidirectional algorithm for scripts.

  • UTF-8
The UTF-8 encoding is variable-width, using 1–4 bytes per character. Each byte has 0–4 leading 1 bits followed by a zero bit to indicate its type.
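
The leading-bit rule can be sketched as a small classifier. The function names and the labels it returns are invented for this example: zero leading 1 bits means a single-byte (ASCII) character, one means a continuation byte, and two to four mean the first byte of a sequence of that length.

```python
def leading_ones(byte: int) -> int:
    """Count the 1 bits at the top of an 8-bit value before the first 0 bit."""
    count = 0
    mask = 0x80
    while mask and (byte & mask):
        count += 1
        mask >>= 1
    return count


def utf8_byte_type(byte: int) -> str:
    """Classify one byte of a UTF-8 stream by its leading 1 bits."""
    ones = leading_ones(byte)
    if ones == 0:
        return "single"
    if ones == 1:
        return "continuation"
    if 2 <= ones <= 4:
        return f"lead-{ones}"
    return "invalid"


for b in "\u20ac".encode("utf-8"):
    print(f"{b:08b}", utf8_byte_type(b))
```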

  • UTF-16
For characters in the Basic Multilingual Plane (BMP) the resulting encoding is a single 16-bit word. For characters in the other planes, the encoding will result in a pair of 16-bit words, together called a surrogate pair.
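
The split into a surrogate pair follows the standard UTF-16 arithmetic: subtract 0x10000, then put the upper 10 bits into a word starting at 0xD800 and the lower 10 bits into a word starting at 0xDC00. A minimal sketch follows; the function name `to_utf16_units` is made up for this example.

```python
from typing import List


def to_utf16_units(code_point: int) -> List[int]:
    """Return the 16-bit words that encode one Unicode code point in UTF-16."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError(f"not an encodable code point: {code_point:#x}")
    if code_point < 0x10000:
        # BMP character: a single 16-bit word.
        return [code_point]
    offset = code_point - 0x10000           # 20 bits remain
    high = 0xD800 + (offset >> 10)          # upper 10 bits
    low = 0xDC00 + (offset & 0x3FF)         # lower 10 bits
    return [high, low]


print([f"{u:04X}" for u in to_utf16_units(0x1F600)])  # ['D83D', 'DE00']
```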

  • UCS-2
The UCS-2 encoding form is identical to that of UTF-16, except that it does not support surrogate pairs and therefore can only encode characters in the BMP range U+0000 through U+FFFF.

IANA character-sets
http://www.iana.org/assignments/character-sets

Character encoding (Wikipedia)
http://en.wikipedia.org/wiki/Charset

Sunday, October 18, 2009

SIP

RFC 2543 -> obsoleted by RFC 3261, 3262, 3263, 3264, 3265

RFC 3261: SIP: Session Initiation Protocol

Monday, August 24, 2009

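Introduction to Protection Ratings

IP is the international code used to designate protection ratings. An IP rating consists of two digits: the first digit indicates protection against dust, and the second indicates protection against water. The larger the digit, the better the protection.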

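Dust protection rating

Digit | Protection level | Definition
0 | No protection | No special protection
1 | Protects against objects larger than 50 mm | Prevents accidental contact of the human body with internal parts
2 | Protects against objects larger than 12 mm | Prevents fingers from touching internal parts
3 | Protects against objects larger than 2.5 mm | Prevents intrusion by tools, wires, or objects of limited size
4 | Protects against objects larger than 1.0 mm | Prevents intrusion by mosquitoes, flies, insects, or objects
5 | Cannot fully prevent dust ingress | The amount of dust that enters does not affect normal operation of the luminaire
6 | Dust tight | Completely prevents dust ingress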

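Water protection rating

Digit | Protection level | Definition
0 | No protection | No special protection
1 | Protects against dripping water | Protects against vertically falling water drops
2 | Protects against dripping water when tilted 15 degrees | Still protects against dripping water when the object is tilted 15 degrees
3 | Protects against spraying water | Protects against rain, or water sprayed from a direction less than 50 degrees from the vertical
4 | Protects against splashing water | Protects against water splashing from any direction
5 | Protects against heavy waves | Protects against water from heavy waves or jetted at high speed from a nozzle
6 | Protects against heavy waves | Continues to operate normally under specified time and water-pressure conditions
7 | Protects against immersion | Operates normally when submerged indefinitely under a specified water pressure
8 | Protects against the effects of submersion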
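
The two-digit scheme can be sketched as a pair of lookup tables. The function name `describe_ip` and the short English labels are chosen for this illustration; the labels paraphrase the two tables above and are not the official wording of the standard.

```python
# First digit: dust (solid object) protection, per the first table.
SOLID = {
    "0": "no protection",
    "1": "objects larger than 50 mm",
    "2": "objects larger than 12 mm",
    "3": "objects larger than 2.5 mm",
    "4": "objects larger than 1.0 mm",
    "5": "dust ingress limited, operation not affected",
    "6": "dust tight",
}

# Second digit: water protection, per the second table.
WATER = {
    "0": "no protection",
    "1": "dripping water",
    "2": "dripping water when tilted 15 degrees",
    "3": "spraying water",
    "4": "splashing water",
    "5": "heavy waves or water jets",
    "6": "heavy waves",
    "7": "immersion",
    "8": "submersion",
}


def describe_ip(code: str) -> tuple:
    """Return (dust protection, water protection) for a code such as 'IP65'."""
    code = code.strip().upper()
    if len(code) != 4 or not code.startswith("IP"):
        raise ValueError(f"expected 'IP' followed by two digits: {code!r}")
    solid, water = code[2], code[3]
    if solid not in SOLID or water not in WATER:
        raise ValueError(f"unknown digit in {code!r}")
    return SOLID[solid], WATER[water]


print(describe_ip("IP65"))
```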