51Testing Software Testing Forum

Title: [Help] Difference between UTF-8 and UTF-16

Author: hu9920    Time: 2010-5-6 14:34
Title: [Help] Difference between UTF-8 and UTF-16
If anyone knows, please explain. Thanks~
Author: wgx083    Time: 2010-5-11 15:45
UTF-8 is the byte-oriented encoding form of Unicode. For details of its definition, see Section 2.5 “Encoding Forms” and Section 3.9 “Unicode Encoding Forms” in the Unicode Standard. See, in particular, Table 3-6 UTF-8 Bit Distribution and Table 3-7 Well-formed UTF-8 Byte Sequences, which give succinct summaries of the encoding form. Make sure you refer to the latest version of the Unicode Standard, as the Unicode Technical Committee has tightened the definition of UTF-8 over time to more strictly enforce unique sequences and to prohibit encoding of certain invalid characters. There is an Internet RFC 3629 about UTF-8. UTF-8 is also defined in Annex D of ISO/IEC 10646.

UTF-16 uses a single 16-bit code unit to encode the most common 63K characters, and a pair of 16-bit code units, called surrogates, to encode the 1M less commonly used characters in Unicode. Originally, Unicode was designed as a pure 16-bit encoding, aimed at representing all modern scripts. (Ancient scripts were to be represented with private-use characters.) Over time, and especially after the addition of over 14,500 composite characters for compatibility with legacy sets, it became clear that 16 bits were not sufficient for the user community. Out of this arose UTF-16.

Source: http://unicode.org/faq/utf_bom.html#UTF16
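The surrogate mechanism described in the quote comes down to a little arithmetic. The Python sketch below is only an illustration (the helper names `to_surrogates` and `from_surrogates` are hypothetical, not library functions): it splits a supplementary code point into a high/low surrogate pair and back, shows where the "63K" and "1M" figures come from, and cross-checks one result against Python's built-in `utf-16-be` codec.

```python
from typing import Tuple

# Hypothetical helpers illustrating the UTF-16 surrogate-pair arithmetic.
BMP_SIZE = 0x10000                    # U+0000..U+FFFF: one 16-bit code unit each
SURROGATE_COUNT = 0xE000 - 0xD800     # U+D800..U+DFFF are reserved for surrogates
SUPPLEMENTARY = 0x110000 - 0x10000    # U+10000..U+10FFFF: need a surrogate pair


def to_surrogates(cp: int) -> Tuple[int, int]:
    """Split a supplementary code point into a (high, low) surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF are encoded as a pair")
    offset = cp - 0x10000              # a 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits go in the high surrogate
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits go in the low surrogate
    return high, low


def from_surrogates(high: int, low: int) -> int:
    """Recombine a (high, low) surrogate pair into a code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    print(BMP_SIZE - SURROGATE_COUNT)   # 63488: the "63K" single-unit characters
    print(SUPPLEMENTARY)                # 1048576: the "1M" pair-encoded characters
    high, low = to_surrogates(0x1F600)  # U+1F600 GRINNING FACE
    print(hex(high), hex(low))          # 0xd83d 0xde00
    # Cross-check with Python's built-in UTF-16 codec (big-endian, no BOM).
    assert "\U0001F600".encode("utf-16-be") == bytes.fromhex("d83dde00")
```

Because the offset above 0x10000 needs exactly 20 bits, each surrogate carries 10 of them, which is why the two surrogate ranges are each 1024 code points wide.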
Author: peterjulun    Time: 2010-5-12 18:03
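UTF-8: an 8-bit-unit encoding. ASCII is left unchanged; all other characters get a variable-length encoding of 1 to 4 bytes per character (at most 3 bytes for characters in the Basic Multilingual Plane, 4 bytes for supplementary characters). It is usually used as the external encoding (for storage and data exchange). Advantages:
* Independent of CPU byte order, so data can be exchanged between different platforms.
* High fault tolerance: if any single byte is damaged, at most the one character containing it is lost, with no cascading errors (by contrast, with GB encodings one bad byte can garble a whole line).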
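The UTF-8 properties above can be checked with a short sketch. `utf8_encode` below is a hypothetical helper written for illustration, assuming the bit distribution from Table 3-6 of the Unicode Standard; it is compared against Python's real `str.encode("utf-8")`. Note that characters outside the BMP, such as emoji, take 4 bytes. The last part overwrites one byte of a UTF-8 string and decodes it with `errors="replace"` to show that the damage stays local.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode scalar value using the UTF-8 bit distribution."""
    if 0xD800 <= cp <= 0xDFFF or not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:04X} is not a Unicode scalar value")
    if cp < 0x80:                          # 0xxxxxxx (ASCII unchanged)
        return bytes([cp])
    if cp < 0x800:                         # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:                       # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),      # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


if __name__ == "__main__":
    for ch in "Aé中😀":
        encoded = utf8_encode(ord(ch))
        assert encoded == ch.encode("utf-8")  # matches the real codec
        print(ch, len(encoded), encoded.hex())

    # Fault tolerance: overwrite the middle byte of "测" (E6 B5 8B).
    damaged = bytearray("测试abc".encode("utf-8"))
    damaged[1] = 0x41
    # Only the damaged character turns into garbage; "试abc" survives intact.
    print(bytes(damaged).decode("utf-8", errors="replace"))
```

The encoded bytes contain no byte-order question at all (each byte stands alone), and because lead bytes and continuation bytes have distinct bit patterns, a decoder can resynchronize at the next lead byte after an error.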

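UTF-16: a 16-bit-unit encoding. It is variable-length: one 16-bit code unit for characters in the Basic Multilingual Plane, two (a surrogate pair) for everything else. It covers code points 0 to 0x10FFFF (a surrogate pair carries a 20-bit offset above 0x10000), so it maps almost directly onto Unicode code points. Unlike UTF-8, it depends on CPU byte order, hence the UTF-16BE/UTF-16LE variants and the byte order mark. It is not the most compact form in general: it saves space over UTF-8 mainly for CJK-heavy text (2 bytes instead of 3 per character) but doubles the size of ASCII text. It is widely used as an in-memory (internal) encoding, for example in Java and the Windows API, while UTF-8 is the more common external encoding for network transmission.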
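A short Python sketch of the byte-order and size points above, using only the standard `utf-8`, `utf-16`, `utf-16-be` and `utf-16-le` codecs; the sample strings are arbitrary choices for illustration.

```python
import codecs

text = "中文ABC"

be = text.encode("utf-16-be")      # big-endian, no BOM: 4e 2d 65 87 00 41 ...
le = text.encode("utf-16-le")      # little-endian, no BOM: 2d 4e 87 65 41 00 ...
with_bom = text.encode("utf-16")   # platform byte order, prefixed with a BOM

# The same characters produce different byte sequences depending on byte order.
print(be.hex(), le.hex())
# The BOM at the start tells a reader which order was used.
print(with_bom[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))  # True
# Reading big-endian bytes as little-endian silently yields different characters.
print(be.decode("utf-16-le") == text)  # False

# Size in bytes as (UTF-8, UTF-16) depends on the content of the text.
sizes = {s: (len(s.encode("utf-8")), len(s.encode("utf-16-le")))
         for s in ("hello world", "中文测试", "😀")}
print(sizes)  # {'hello world': (11, 22), '中文测试': (12, 8), '😀': (4, 4)}
```

For the ASCII sample UTF-8 is half the size of UTF-16, for the Chinese sample UTF-16 is smaller, and the emoji costs 4 bytes in both.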
Author: peterjulun    Time: 2010-5-12 18:04
http://blog.csdn.net/qinysong/archive/2006/09/05/1179480.aspx has an article on character encodings.
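Author: andy_wangyt    Time: 2010-10-15 23:53
Learned something. I see these all the time but never thought about why.
Author: jiaruiqiang    Time: 2010-11-2 15:12
Here to learn.
Author: wyfyan    Time: 2010-11-19 20:41
Got it.
Author: ydqjlf    Time: 2011-1-27 10:47
Learned something.
Author: wangjf8711    Time: 2011-2-10 17:12
Learning.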
Author: jiazurongyu    Time: 2011-4-15 15:48
Author: zxsh007    Time: 2011-4-15 15:52
Shanghai: tester familiar with JUnit, good spoken English, 5+ years of experience, annual salary 200,000-300,000
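Senior tester, with a chance to become Tech Leader.
Requirements: software development experience and the ability to write automated test scripts. Performance-testing experience is a plus, as is JUnit experience (JUnit is a Java unit-testing framework in which tests are written as Java code). No manual testers.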


MSN:zxsh3598@hotmail.com
Author: ceshi521    Time: 2012-2-9 17:10
Learned something.




Welcome to 51Testing Software Testing Forum (http://bbs.51testing.com/) Powered by Discuz! X3.2