UTF-8 is the byte-oriented encoding form of Unicode. For details of its definition, see Section 2.5 “Encoding Forms” and Section 3.9 “Unicode Encoding Forms” in the Unicode Standard. See, in particular, Table 3-6 “UTF-8 Bit Distribution” and Table 3-7 “Well-Formed UTF-8 Byte Sequences”, which give succinct summaries of the encoding form. Make sure you refer to the latest version of the Unicode Standard, as the Unicode Technical Committee has tightened the definition of UTF-8 over time to more strictly enforce unique sequences and to prohibit the encoding of certain invalid characters. UTF-8 is also described in Internet RFC 3629 and defined in Annex D of ISO/IEC 10646.
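As a rough sketch of the bit distribution summarized in Table 3-6, the following Python function packs a single code point into UTF-8 bytes by hand. The name utf8_encode is ours for illustration only; production code should use a library routine such as Python’s built-in str.encode("utf-8") rather than this sketch.

    def utf8_encode(cp: int) -> bytes:
        # Illustrative sketch of Table 3-6; not a substitute for a real codec.
        if cp < 0:
            raise ValueError("negative code point")
        if cp <= 0x7F:                       # 1 byte:  0xxxxxxx
            return bytes([cp])
        if cp <= 0x7FF:                      # 2 bytes: 110xxxxx 10xxxxxx
            return bytes([0xC0 | (cp >> 6),
                          0x80 | (cp & 0x3F)])
        if 0xD800 <= cp <= 0xDFFF:           # surrogate code points are ill-formed in UTF-8
            raise ValueError("surrogate code points cannot be encoded")
        if cp <= 0xFFFF:                     # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            return bytes([0xE0 | (cp >> 12),
                          0x80 | ((cp >> 6) & 0x3F),
                          0x80 | (cp & 0x3F)])
        if cp <= 0x10FFFF:                   # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            return bytes([0xF0 | (cp >> 18),
                          0x80 | ((cp >> 12) & 0x3F),
                          0x80 | ((cp >> 6) & 0x3F),
                          0x80 | (cp & 0x3F)])
        raise ValueError("code point beyond U+10FFFF")

    # U+20AC EURO SIGN encodes as the three bytes E2 82 AC.
    assert utf8_encode(0x20AC) == "\u20ac".encode("utf-8")

Note how the range checks mirror the well-formedness rules of Table 3-7: each range of code points maps to exactly one byte sequence, and surrogate code points and values beyond U+10FFFF are rejected outright.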
UTF-16 uses a single 16-bit code unit to encode the most common 63K characters, and a pair of 16-bit code units, called surrogates, to encode the 1M less commonly used characters in Unicode.

Originally, Unicode was designed as a pure 16-bit encoding, aimed at representing all modern scripts. (Ancient scripts were to be represented with private-use characters.) Over time, and especially after the addition of over 14,500 composite characters for compatibility with legacy character sets, it became clear that 16 bits were not sufficient for the user community. Out of this arose UTF-16.
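To make the surrogate mechanism concrete, here is a minimal Python sketch of how a single code point becomes one or two UTF-16 code units: a supplementary code point has 0x10000 subtracted, and the remaining 20 bits are split across a lead surrogate (0xD800–0xDBFF) and a trail surrogate (0xDC00–0xDFFF). The name utf16_encode is our own; real code should use a library routine such as Python’s str.encode("utf-16-be").

    def utf16_encode(cp: int) -> list[int]:
        # Illustrative sketch of UTF-16 encoding; not a substitute for a real codec.
        if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError("not a Unicode scalar value")
        if cp <= 0xFFFF:                     # BMP character: one 16-bit code unit
            return [cp]
        v = cp - 0x10000                     # 20 bits left to distribute
        return [0xD800 | (v >> 10),          # lead (high) surrogate: top 10 bits
                0xDC00 | (v & 0x3FF)]        # trail (low) surrogate: bottom 10 bits

    # U+1D11E MUSICAL SYMBOL G CLEF encodes as the surrogate pair D834 DD1E.
    assert utf16_encode(0x1D11E) == [0xD834, 0xDD1E]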