Han unification
Han unification is the process used by the authors of Unicode and the Universal Character Set to map multiple character sets of the CJK languages into a single set of unified characters. The Chinese characters are common to Chinese (where they are called hanzi), Japanese (where they are called kanji), and Korean (where they are called hanja). Modern Chinese, Japanese and Korean typefaces may draw a given Han character with somewhat different glyphs. In the formulation of Unicode, however, these differences were folded together: variants regarded as the same character were assigned a single code point, and the choice of glyph was left to the font. This unification is referred to as "Han unification", and the resulting character repertoire is sometimes referred to as Unihan.
An article by IBM has a good explanation of this issue [1]:
- The problem stems from the fact that Unicode encodes characters rather than "glyphs," which are the visual representations of the characters. There are four basic traditions for East Asian character shapes: traditional Chinese, simplified Chinese, Japanese, and Korean. While the Han root character may be the same for CJK languages, the glyphs in common use for the same characters may not be.
- For example, the traditional Chinese glyph for "grass" uses four strokes for the "grass" radical, whereas the simplified Chinese, Japanese, and Korean glyphs use three. But there is only one Unicode point for the grass character (草, U+8349) regardless of writing system. Another example is the ideograph for "one," which is different in Chinese, Japanese, and Korean. Many people think that the three versions should be encoded differently.
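Because Unicode encodes characters rather than glyphs, the encoded text itself carries no information about which regional form should be drawn. A minimal Python sketch using the standard `unicodedata` module illustrates this for the two characters named in the quotation; the helper `describe` is hypothetical and exists only for this illustration.

```python
import unicodedata

# Characters mentioned in the quoted IBM passage.
GRASS = "草"  # "grass"
ONE = "一"    # "one"


def describe(ch: str) -> tuple[str, str, bytes]:
    """Return the code point label, Unicode name and UTF-8 bytes of a character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), ch.encode("utf-8"))


if __name__ == "__main__":
    # The result is the same whether the text is Chinese, Japanese or Korean:
    # nothing here tells a renderer which glyph variant to use.
    for ch in (GRASS, ONE):
        print(ch, *describe(ch))
```

The output is identical whichever language the surrounding text is written in; selecting a Chinese, Japanese or Korean glyph is left to fonts and to language metadata outside the character data.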
The process of Han unification was very controversial, with most of the opposition coming from Japan. Opponents of Han unification state that it steamrollers over thousands of years of cultural tradition, misses many of the subtleties that are among the most important features of these languages, and renders serious literature and academic research in these languages impossible. Proponents state that the unified characters in the Unicode Basic Multilingual Plane (BMP) are "good enough" for almost all everyday uses of the languages that use these scripts, that Unicode 3.1 greatly extends this repertoire for academic and literary needs, and that other encodings are also available for specialist academic purposes. The concentration of opposition in Japan may be partly explained by the fact that, for Chinese users, Unicode was a considerable improvement over the chaotic state of earlier Chinese encodings.
Specialist character sets developed to address these perceived deficiencies, or regarded as not suffering from them, include:
- CNS character set
- CCCII character set
- Giga Character Set
- TRON
- UTF-2000
The following table contains identical characters in all three rows, but the first row is marked (via an HTML attribute) as Chinese, the second as Japanese, and the third as Korean. Ideally, your browser should therefore select fonts and glyphs that suit each language, so that the rows look slightly different. Check whether this actually happens in your browser:
| Chinese | 今 | 化 | 外 | 天 | 才 | 海 | 町 | 画 | 直 | 空 | 角 |
| Japanese | 今 | 化 | 外 | 天 | 才 | 海 | 町 | 画 | 直 | 空 | 角 |
| Korean | 今 | 化 | 外 | 天 | 才 | 海 | 町 | 画 | 直 | 空 | 角 |
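The article does not reproduce the markup behind this test. A minimal sketch of how such a table could be generated is shown below, assuming the HTML `lang` attribute is the marking used and assuming the tags `zh`, `ja` and `ko`; the function `build_comparison_table` is hypothetical.

```python
import html

# The eleven characters from the comparison table.
CHARACTERS = "今化外天才海町画直空角"

# Assumed BCP 47 language tags; the original only says the rows are marked
# as Chinese, Japanese and Korean, so the exact tags are an assumption.
LANGUAGES = [("Chinese", "zh"), ("Japanese", "ja"), ("Korean", "ko")]


def build_comparison_table(chars: str = CHARACTERS) -> str:
    """Return an HTML table whose rows differ only in their lang attribute."""
    rows = []
    for label, tag in LANGUAGES:
        cells = "".join(f"<td>{html.escape(ch)}</td>" for ch in chars)
        # The lang attribute on <tr> lets the browser pick a language-appropriate
        # font; the code points are identical in every row.
        rows.append(f'<tr lang="{tag}"><th>{label}</th>{cells}</tr>')
    return "<table>\n" + "\n".join(rows) + "\n</table>"


if __name__ == "__main__":
    print(build_comparison_table())
```

Since only the attribute differs between rows, any visible difference comes from the browser's font selection, not from the encoded characters.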
See also:
External links