Introduction
The GB 13000.1-1993 character set contains 20,902 Chinese characters.
This standard was proposed by the Ministry of Information Industry of the People's Republic of China.
This standard is administered by the Electronics Industry Standardization Institute of the Ministry of Information Industry.
This standard was drafted by the Electronics Industry Standardization Institute of the Ministry of Information Industry.
Source
To make it possible to process multiple languages and scripts at the same time, the coded character set working group of the International Organization for Standardization developed a new coded character set standard, ISO/IEC 10646. The standard was first published in 1993, when only its first part, ISO/IEC 10646-1:1993, was issued. China's corresponding national standard is GB 13000.1-93, "Information technology - Universal Multiple-Octet Coded Character Set (UCS) - Part 1: Architecture and Basic Multilingual Plane". The aim is to give every character of every script in the world a unified code, so that all of the world's text can be processed uniformly on computers.
Significance
GB 13000 establishes a new coding architecture. ISO/IEC 10646 is called a "multiple-octet" coded character set because it encodes each character in four octets (an octet being 8 bits). The four octets represent the group, plane, row, and cell, in that order.
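The four-octet structure can be illustrated with a short sketch. The function name `split_ucs4` is hypothetical, and the sketch assumes that the group octet uses only 7 bits (128 groups), so that valid code values are at most 0x7FFFFFFF.

```python
def split_ucs4(code: int) -> tuple[int, int, int, int]:
    """Split a four-octet code value into (group, plane, row, cell).

    Assumption: the group octet uses only 7 bits, so the largest
    valid code value is 0x7FFFFFFF.
    """
    if not 0 <= code <= 0x7FFFFFFF:
        raise ValueError("code value outside the 31-bit range")
    group = (code >> 24) & 0x7F  # most significant octet
    plane = (code >> 16) & 0xFF
    row = (code >> 8) & 0xFF
    cell = code & 0xFF           # least significant octet
    return group, plane, row, cell


# The Chinese character at code value 0x4E2D lies in group 0, plane 0.
print(split_ucs4(0x4E2D))  # (0, 0, 78, 45)
```

Here row 78 is 0x4E and cell 45 is 0x2D, which are the two low-order octets of the code value.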
The characters specified by GB 2312 are mainly Chinese characters, including 6,763 simplified characters. Because the number of Chinese characters is very large (roughly 100,000), China has gradually added six supplementary sets. The basic set and the second and fourth supplementary sets are simplified character sets; the first (that is, GB 12345), third, and fifth supplementary sets are traditional character sets. The basic set and the first supplementary set, the second and third, and the fourth and fifth each correspond one-to-one as simplified and traditional forms, except for a few characters whose simplified and traditional forms do not map one-to-one. The seventh supplementary set takes its characters from the CJK Unified Ideographs portion of GB 13000.1; these are Chinese characters used in Japan, Korea, and Taiwan. Together, the seven Chinese character sets contain about 49,000 characters (simplified and traditional forms are encoded separately).
With four octets per character, GB 13000 provides a total of 2,147,483,648 code positions (128 groups × 256 planes × 256 rows × 256 cells). What is currently implemented is plane 00 of group 00, called the Basic Multilingual Plane (BMP), which has 65,536 code positions. Because the first two octets of every character code in the BMP are 0 (group 00, plane 00, row XX, cell XX), BMP characters are currently handled as two octets by default.
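The arithmetic above can be checked in a few lines. This is only an illustrative sketch; the constant and function names are hypothetical.

```python
# Sizes of each level of the code space, as stated in the text.
GROUPS, PLANES, ROWS, CELLS = 128, 256, 256, 256

total_positions = GROUPS * PLANES * ROWS * CELLS
bmp_positions = ROWS * CELLS  # one plane: 256 rows of 256 cells


def bmp_two_octets(code: int) -> bytes:
    """Return the two-octet form (row, cell) of a BMP code value.

    The leading group and plane octets are both 0 in the BMP,
    so they can be dropped.
    """
    if not 0 <= code <= 0xFFFF:
        raise ValueError("code value is not in the Basic Multilingual Plane")
    return code.to_bytes(2, "big")


print(total_positions)                # 2147483648
print(bmp_positions)                  # 65536
print(bmp_two_octets(0x4E2D).hex())   # 4e2d
```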
Features
The code space is very large: it can encode many languages at once and ensures that multiple languages can be processed simultaneously;
As a unified encoding, it gives Latin letters and all other characters the same number of octets: four octets in general, and two octets within the Basic Multilingual Plane;
Note: a GB 1988 (ISO 646 / ASCII) character is converted directly by adding a high-order octet of 0x00.
Characters and glyphs are clearly distinguished: a character is the abstract entity that carries the text, while a glyph is its specific, visible graphic shape;
By applying unification rules to Chinese characters, the Chinese characters of the various countries and regions are unified. This meets each country's and region's actual need for encoded Chinese characters, while keeping the large number of Chinese characters from taking up Basic Multilingual Plane code positions needed to encode other scripts;
Because the world has so many scripts, not all of them can be encoded. For this reason, a special area is reserved in which users of the standard can place unencoded characters that they need for their own purposes.
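The note on GB 1988 (ISO 646 / ASCII) characters can be illustrated as follows. This is a sketch under two assumptions: octets are stored most significant first, and Python's built-in `utf-16-be` and `utf-32-be` codecs are used only as a cross-check, since for these characters they produce the same octets as the two-octet and four-octet forms. The function names are hypothetical.

```python
def ascii_to_two_octets(ch: str) -> bytes:
    """Extend a 7-bit character to the two-octet BMP form."""
    code = ord(ch)
    if code > 0x7F:
        raise ValueError("not a 7-bit GB 1988 / ASCII character")
    return bytes([0x00, code])  # high-order octet is 0x00


def ascii_to_four_octets(ch: str) -> bytes:
    """Extend a 7-bit character to the full four-octet form."""
    code = ord(ch)
    if code > 0x7F:
        raise ValueError("not a 7-bit GB 1988 / ASCII character")
    return bytes([0x00, 0x00, 0x00, code])  # group 0, plane 0, row 0


print(ascii_to_two_octets("A").hex())   # 0041
print(ascii_to_four_octets("A").hex())  # 00000041
```

The Latin letter "A" thus occupies the same number of octets as a Chinese character, which is the uniformity described in the list above.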
Of these, CJK Unified Ideographs and CJK Unified Ideographs Extension A bring together all the characters of GB 2312 and the first, third, fifth, and seventh supplementary sets, 27,484 in total. Kangxi Radicals and CJK Radicals Supplement contain a total of 369 Chinese radicals.
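The 27,484 figure is consistent with the sizes of the two ideograph ranges. The code point ranges below are not given in the text; they are assumed from the original allocations of the two blocks (U+4E00 to U+9FA5 for CJK Unified Ideographs, U+3400 to U+4DB5 for Extension A).

```python
# Assumed code point ranges (inclusive); not stated in the text above.
CJK_UNIFIED = (0x4E00, 0x9FA5)
CJK_EXTENSION_A = (0x3400, 0x4DB5)


def range_size(bounds: tuple[int, int]) -> int:
    """Number of code positions in an inclusive range."""
    first, last = bounds
    return last - first + 1


unified_count = range_size(CJK_UNIFIED)
extension_a_count = range_size(CJK_EXTENSION_A)

print(unified_count)                      # 20902
print(extension_a_count)                  # 6582
print(unified_count + extension_a_count)  # 27484
```

The first count matches the 20,902 Chinese characters of GB 13000.1-1993 given in the introduction.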
In addition, Uyghur, Kazakh, and Kyrgyz (written in Arabic-based scripts), Korean, Yi, Tibetan, and Mongolian are also included. The Dai script is expected to be added soon.