The Japanese language and computers reference article from the English Wikipedia on 24-Apr-2004
(provided by Fixed Reference: snapshots of Wikipedia from wikipedia.org)

Japanese language and computers

Adapting computers to the Japanese language raises many issues, some unique to Japanese and others common to languages that use double-byte character encodings. Some problems relate to transliteration and romanization, some to character encoding, and some to the input of Japanese text.

Broadly, most of these issues concern either the presentation or the input of Japanese text.

There are several standard methods to encode characters for use on a computer, including JIS, Shift-JIS (SJIS), EUC, and Unicode. While mapping the set of kana is a simple matter, kanji have proven more difficult. Despite these efforts, no encoding scheme has become the de facto standard, and multiple encodings are still in use today. For example, most Japanese e-mail is sent in JIS encoding (ISO-2022-JP), while web pages are commonly in Shift-JIS. If a program fails to determine which encoding a text uses, the result is mojibake (misconverted characters), which leaves the Japanese text unreadable.
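The effect of a wrong encoding guess can be reproduced in a few lines. The following is a minimal sketch in Python, assuming the standard library codecs iso2022_jp, shift_jis, euc_jp and utf-8 as stand-ins for the encodings named above. The helper names encode_all and guess_encoding are illustrative only, and the trial-decoding heuristic is not how any particular mail or web client works.

```python
TEXT = "日本語"  # "Japanese language"

# Python codec names standing in for JIS, Shift-JIS, EUC and Unicode.
CODECS = ["iso2022_jp", "shift_jis", "euc_jp", "utf-8"]

# Order in which the naive guesser tries codecs (an assumption of this sketch).
GUESS_ORDER = ["iso2022_jp", "utf-8", "euc_jp", "shift_jis"]


def encode_all(text):
    """Return {codec name: encoded bytes} for every codec in CODECS."""
    return {name: text.encode(name) for name in CODECS}


def guess_encoding(data, candidates=GUESS_ORDER):
    """Naive guess: the first candidate that decodes without an error."""
    for name in candidates:
        try:
            data.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    return None


encoded = encode_all(TEXT)
for name, data in encoded.items():
    # The same three characters become different byte sequences.
    print(f"{name:12} {len(data):2} bytes  {data.hex()}")

# Reading Shift-JIS bytes as if they were EUC-JP produces mojibake.
mojibake = encoded["shift_jis"].decode("euc_jp", errors="replace")
print(mojibake)
print(guess_encoding(encoded["shift_jis"]))
```

Trial decoding of this kind is unreliable in general, because many byte sequences are valid in more than one encoding. Text that declares its encoding explicitly avoids the guess altogether.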

Because not all characters in use are included in a character set standard such as JIS, gaiji (外字, external characters) are sometimes used to supplement the character set. Gaiji may come in the form of external font packs, in which normal characters have been replaced with new characters or new characters have been added to unused character positions. However, gaiji are impractical on the Internet, since the font set must be transferred along with the text for the gaiji to display. As a result, such characters are either replaced with similar or simpler characters, or the text is written in a larger character set (such as Unicode), provided that set includes the character in question.
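The substitution approach can be sketched as follows. This is an illustrative Python example, not a description of any real product: the SUBSTITUTES table and the function names are hypothetical, and Python's shift_jis codec is assumed here as a stand-in for a JIS-based character set. The character U+20BB7, a variant of 吉 that occurs in names, lies outside that set but can be represented in Unicode.

```python
# U+20BB7 is a variant form of 吉 (U+5409); written as an escape so the
# source file does not depend on editor support for the character.
VARIANT_YOSHI = "\U00020BB7"

# Hypothetical table mapping unsupported characters to simpler ones.
SUBSTITUTES = {VARIANT_YOSHI: "吉"}


def can_encode(ch, encoding):
    """Return True if the character exists in the target encoding."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def simplify_for(text, encoding, substitutes=SUBSTITUTES):
    """Replace characters missing from `encoding` with simpler substitutes.

    Characters with no known substitute become "?".
    """
    return "".join(
        ch if can_encode(ch, encoding) else substitutes.get(ch, "?")
        for ch in text
    )


name = VARIANT_YOSHI + "野"
print(can_encode(VARIANT_YOSHI, "shift_jis"))  # missing from the smaller set
print(can_encode(VARIANT_YOSHI, "utf-8"))      # present in Unicode
print(simplify_for(name, "shift_jis"))
```

The substitution loses information, which is why moving to a larger character set is the alternative when the exact character matters.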

Inputting Japanese text on a computer is complicated because the Japanese writing system uses far more characters than a keyboard has keys. On modern computers, the user usually types the reading of the characters first, and an Input Method Editor then lets the user choose the correct characters from a list. The reading can be entered either through romanization (romaji nyuryoku) or through direct kana input (kana nyuryoku). Direct kana input is practically on the verge of extinction, although it is still widely supported.

There are two main systems for the romanization of Japanese, known as Kunrei-shiki and Hepburn. The Kunrei system is widely used in Japan for input on a roman keyboard, since it is slightly briefer and more systematic than the Hepburn system. Foreigners typically prefer the Hepburn system, however, because the Kunrei system does not correspond as well to the actual sounds of Japanese.

In kana input, each key on the keyboard corresponds directly to one kana. The kana are laid out according to either the Oyayubi shift system, which is now obsolete, or the JIS keyboard system.
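The two-step flow of romaji input, first converting the typed reading to kana and then offering candidates, can be sketched as below. This is a toy illustration in Python under stated assumptions: the ROMAJI_TO_KANA table covers only a handful of syllables, the CANDIDATES dictionary is a hypothetical three-word lexicon, and the greedy longest-match rule is a simplification rather than the algorithm of any actual Input Method Editor. The table accepts both Kunrei spellings (si, ti, tu, hu, zi) and Hepburn spellings (shi, chi, tsu, fu, ji) for the same kana.

```python
# Tiny illustrative table; a real input method covers every syllable.
ROMAJI_TO_KANA = {
    "a": "あ", "i": "い", "u": "う", "e": "え", "o": "お",
    "ka": "か", "ki": "き", "ku": "く", "ke": "け", "ko": "こ",
    "si": "し", "shi": "し",   # Kunrei / Hepburn
    "ti": "ち", "chi": "ち",
    "tu": "つ", "tsu": "つ",
    "hu": "ふ", "fu": "ふ",
    "zi": "じ", "ji": "じ",
    "n": "ん",
}

# Hypothetical lexicon: kana reading -> candidate spellings.
CANDIDATES = {
    "かんじ": ["漢字", "感じ", "幹事"],
}


def romaji_to_kana(romaji):
    """Convert romaji to hiragana by greedy longest match (3, 2, then 1 letters)."""
    out = []
    i = 0
    while i < len(romaji):
        for size in (3, 2, 1):
            chunk = romaji[i:i + size]
            if chunk in ROMAJI_TO_KANA:
                out.append(ROMAJI_TO_KANA[chunk])
                i += len(chunk)
                break
        else:
            raise ValueError(f"no kana for input at {romaji[i:]!r}")
    return "".join(out)


def candidates(romaji):
    """Return the choices a user would pick from; the bare kana comes last."""
    reading = romaji_to_kana(romaji)
    return CANDIDATES.get(reading, []) + [reading]


print(romaji_to_kana("kanzi"), romaji_to_kana("kanji"))  # same kana either way
print(candidates("kanji"))
```

Because several words share one reading, the final choice among the candidates is left to the user, which is the role the Input Method Editor's list plays.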

See also