I Love Python. I Learn Python. I Teach Python. I Am a Python.

2015/01/21

Programming with Unicode

Programming with Unicode



6.3. CJK: asian encodings

6.3.1. Chinese encodings

GBK is a family of Chinese charsets using multibyte encodings:
  • GB 2312 (1980): includes 6,763 Chinese characters
  • GBK (1993) (code page 936)
  • GB 18030 (2005, last revision in 2006)
  • HZ (1989) (HG-GZ-2312)
Other encodings: 
Big5 (大五碼, Big Five Encoding, 1984), 
cp950.

11.3. Python

Python supports Unicode since its version 2.0 released in october 2000. Byte and Unicode strings store their length, so it’s possible to embed nul byte/character.
Python can be compiled in two modes: narrow (UTF-16) and wide (UCS-4). sys.maxunicode constant is 0xFFFF in narrow build, and 0x10FFFF in wide build. Python is compiled in narrow mode on Windows, because wchar_t is also 16 bits on Windows and so it is possible to use Python Unicode strings as wchar_t* strings without any (expensive) conversion.
See also
Python Unicode HOWTO.

沒有留言:

張貼留言