The Han unification reference article from the English Wikipedia on 24-Apr-2004
(provided by Fixed Reference: snapshots of Wikipedia from wikipedia.org)

Han unification

The Han unification is the process used by the authors of Unicode and the Universal Character Set to map multiple character sets of the CJK languages into a single set of unified characters. Chinese characters are common to Chinese (where they are called hanzi), Japanese (where they are called kanji) and Korean (where they are called hanja). Modern Chinese, Japanese and Korean typefaces may render a given Han character with somewhat different glyphs. In the formulation of Unicode, however, such variants were folded together and given a single code point. This unification is referred to as "Han unification", and the resulting character repertoire is sometimes called Unihan.
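The effect of unification can be illustrated with Python's standard codecs. The following is a minimal sketch, assuming Python 3.9+ and its built-in gb2312, shift_jis and euc_kr codecs, which are chosen here only as representative Chinese, Japanese and Korean legacy character sets. The character 草 ("grass", U+8349) has a different byte sequence in each national set, but decoding any of them yields the same single Unicode code point. The function name unify is illustrative.

```python
# Sketch: one Han character, taken from three national legacy encodings,
# decodes to a single unified Unicode code point.
CHAR = "草"  # "grass", U+8349

# Python codec names for a Chinese, a Japanese and a Korean legacy encoding
# (an illustrative choice of representatives).
LEGACY_CODECS = {
    "Chinese (GB 2312)": "gb2312",
    "Japanese (Shift_JIS)": "shift_jis",
    "Korean (EUC-KR)": "euc_kr",
}


def unify(char: str) -> dict[str, tuple[bytes, str]]:
    """Encode char in each legacy set, then decode it back to Unicode."""
    result = {}
    for label, codec in LEGACY_CODECS.items():
        legacy_bytes = char.encode(codec)     # national byte sequence
        decoded = legacy_bytes.decode(codec)  # back to a Unicode string
        result[label] = (legacy_bytes, f"U+{ord(decoded):04X}")
    return result


for label, (raw, cp) in unify(CHAR).items():
    print(f"{label:22} {raw.hex(' '):8} -> {cp}")
```

The byte column differs from row to row, while the code point column is the same in every row: that shared code point is what Han unification provides.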

An article by IBM has a good explanation of this issue [1]:

The problem stems from the fact that Unicode encodes characters rather than "glyphs," which are the visual representations of the characters. There are four basic traditions for East Asian character shapes: traditional Chinese, simplified Chinese, Japanese, and Korean. While the Han root character may be the same for CJK languages, the glyphs in common use for the same characters may not be.

For example, the traditional Chinese glyph for "grass" uses four strokes for the "grass" radical, whereas the simplified Chinese, Japanese, and Korean glyphs use three. But there is only one Unicode point for the grass character (草, U+8349) regardless of writing system. Another example is the ideograph for "one," which is different in Chinese, Japanese, and Korean. Many people think that the three versions should be encoded differently.
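Because only the character is encoded, text tagged as Chinese, Japanese or Korean stores exactly the same data for "grass"; any difference in stroke count has to come from the font. A short Python sketch using the standard unicodedata module makes this visible. The samples dictionary and its language tags are hypothetical: the tag is metadata kept beside the text, not part of the encoded character.

```python
import unicodedata

GRASS = "\u8349"  # 草, written as an escape to show it is one code point


def describe(ch: str) -> str:
    """Return the code point, Unicode name and UTF-8 bytes of one character."""
    return (f"U+{ord(ch):04X} {unicodedata.name(ch)} "
            f"UTF-8: {ch.encode('utf-8').hex(' ').upper()}")


# Hypothetical samples tagged as Chinese, Japanese and Korean.
# The language tag does not change the stored character.
samples = {"zh": GRASS, "ja": GRASS, "ko": GRASS}

for lang, text in samples.items():
    # each line prints: <lang> U+8349 CJK UNIFIED IDEOGRAPH-8349 UTF-8: E8 8D 89
    print(lang, describe(text))
```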

A slight difference in how a character is rendered can be a serious problem. Beyond the nuisance of Japanese text looking like Chinese, a person's name may be displayed with what the reader regards as a different character: the same character as far as the encoding is concerned, but a different one in the eyes of the user. This rendering problem is often cited to criticize Westerners for not being aware of subtle distinctions.

The process of Han unification was very controversial, with most of the opposition coming from the Japanese. Opponents of Han unification argue that it steamrollers over thousands of years of cultural tradition, misses many of the subtleties that are among the most important features of these languages, and makes serious literary and academic work in these languages impossible. Proponents reply that the unified characters in the Unicode Basic Multilingual Plane (BMP) are "good enough" for almost all everyday uses of the languages that use these scripts, that Unicode 3.1 greatly extends this repertoire for academic and literary needs, and that other encodings are available for specialist academic purposes. The concentration of opposition among the Japanese may be partly explained by the fact that, for Chinese, Unicode was a vast improvement over the chaotic state of existing encodings.
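The Unicode 3.1 additions mentioned above include CJK Unified Ideographs Extension B, which starts at U+20000 and therefore lies outside the BMP, in plane 2. Software that assumes every character fits in a single 16-bit unit can mishandle such characters. A minimal Python sketch showing the plane and UTF-16 length of a BMP ideograph and of the first Extension B ideograph; the helper names plane and utf16_units are illustrative.

```python
def plane(ch: str) -> int:
    """Unicode plane of a character: 0 is the BMP, 2 the Supplementary Ideographic Plane."""
    return ord(ch) >> 16


def utf16_units(ch: str) -> int:
    """Number of 16-bit code units needed in UTF-16 (2 means a surrogate pair)."""
    return len(ch.encode("utf-16-le")) // 2


examples = {
    "grass": "\u8349",
    "first CJK Extension B ideograph": "\U00020000",
}

for label, ch in examples.items():
    print(f"U+{ord(ch):04X} plane {plane(ch)}, "
          f"{utf16_units(ch)} UTF-16 unit(s)  {label}")
```

The BMP character needs one UTF-16 unit, while the Extension B character needs a surrogate pair, which is why support for the extended repertoire depends on software handling characters beyond the BMP correctly.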

Specialist character sets developed to address these perceived deficiencies, or regarded as not suffering from them, include:

However, none of these alternative standards have been as widely adopted as Unicode.

Table of contents
1 Check your browser:
2 See also:
3 External links

Check your browser:

The following table contains identical characters in all three rows, but the first row is marked (via an HTML lang attribute) as Chinese, the second as Japanese and the third as Korean. Ideally, your browser should therefore select the fonts and glyphs best suited to each language. Check whether it actually does.
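A table of this kind can be produced by putting the same text in three cells that differ only in their lang attribute, using the language tags zh, ja and ko. A minimal Python sketch with the standard html module; the test character 草 and the function name language_test_table are arbitrary illustrative choices, and whether the rendered glyphs actually differ depends on the browser and the fonts installed.

```python
from html import escape

# Illustrative test character; any Han character with regional glyph
# variants can be substituted.
TEST_CHARS = "草"

# (row label, language tag) pairs for the three rows.
ROWS = [("Chinese", "zh"), ("Japanese", "ja"), ("Korean", "ko")]


def language_test_table(chars: str) -> str:
    """Build an HTML table repeating the same characters under three lang tags."""
    lines = ["<table>"]
    for label, lang in ROWS:
        lines.append(
            f"  <tr><th>{escape(label)}</th>"
            f'<td lang="{lang}">{escape(chars)}</td></tr>'
        )
    lines.append("</table>")
    return "\n".join(lines)


print(language_test_table(TEST_CHARS))
```

Only the lang attribute changes between rows; the characters themselves are byte-for-byte identical, so any visible difference comes from the browser's font selection.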

Chinese
Japanese
Korean

See also:

External links