Man page of eucTW

eucTW

Section: File Formats (5)
 

NAME

eucTW - A character encoding system (codeset) for Traditional Chinese  

DESCRIPTION

The Taiwanese EUC (Extended UNIX Code), or eucTW, codeset consists of the following character sets:

    ASCII
    CNS 11643 (Plane 1 to Plane 16)

Taiwanese EUC uses a combination of single-byte and 2-byte data to represent ASCII characters, symbols, and ideographic characters. Because the codeset spans many character planes, Taiwanese EUC uses different leading codes to designate the plane to which a character belongs.

ASCII characters are represented as single-byte, 7-bit data in Taiwanese EUC; that is, the most significant bit (MSB) of the byte that represents an ASCII character is always off. For more information, refer to ascii(5).
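The MSB rule gives a simple test for telling ASCII bytes apart from the bytes of other characters. A minimal sketch in Python follows; the function name `is_ascii_byte` is illustrative only and is not part of any system interface.

```python
def is_ascii_byte(b: int) -> bool:
    """Return True if the byte value b (0-255) has its most significant bit off."""
    if not 0 <= b <= 0xFF:
        raise ValueError("byte value out of range")
    return (b & 0x80) == 0


# 0x41 ('A') has the MSB off; 0xA4 has the MSB on, so it is not ASCII.
print(is_ascii_byte(0x41), is_ascii_byte(0xA4))
```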

Although the standard Taiwanese EUC codeset includes all characters defined by the CNS 11643-1992 standard, Digital's eucTW implementation currently supports the following:

    Characters defined in the first and second planes of CNS 11643
    The EDPC Recommended Character Set (refer to dechanyu(5) for more information)
    CNS 11643-1986 and DTSCS characters that have been remapped into the third and fourth character planes by the CNS 11643-1992 standard

Characters that were added to CNS 11643-1986 by the CNS 11643-1992 standard are not supported.

The characters that are defined in plane 1 and plane 2 of CNS 11643-1992 and that are the same as those defined in CNS 11643-1986 are as follows:


Character Plane   Character Type                     Number of Characters

1                 Special characters                 651
                  Control characters                 33
                  Frequently-used characters         5401
2                 Less frequently-used characters    7650

The characters defined in plane 3 and plane 4 of CNS 11643-1992 are as follows:


Character Plane   Character Type                                Number of Characters

3                 Rarely-used characters (EDPC Part I)          6148
4                 Used for residency system, ISO 2nd edition    7298
                  DIS 10646 Han characters, 171 EDPC Part II
                  characters

The characters that have been remapped into the third and fourth character planes of CNS 11643-1992 as specified by the EDPC are as follows:


EDPC Characters   Character Plane   Number of Characters

Part I            Plane 3           6148
Part II           Plane 4           171


 

Taiwanese EUC Encoding

Except for characters in the first plane of CNS 11643-1986, Taiwanese EUC uses a leading code, which consists of the 8-bit Single-Shift 2 (SS2) control character and an additional byte, to designate the plane to which a character belongs.

The position of a character on a plane is specified by two bytes. The first byte determines the character's row number and the second byte determines the character's column number. The MSB of both bytes is always on.
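As an illustration of the two-byte position, the sketch below recovers a row and column number from a byte pair. It assumes the usual EUC convention that each byte lies in the range 0xA1 to 0xFE and that rows and columns are numbered from 1 to 94; that numbering and the function name `position` are assumptions and are not stated in this reference page.

```python
from typing import Tuple


def position(b1: int, b2: int) -> Tuple[int, int]:
    """Map a two-byte code (each byte 0xA1-0xFE) to (row, column), both 1-94."""
    for b in (b1, b2):
        if not 0xA1 <= b <= 0xFE:
            raise ValueError(f"byte 0x{b:02X} is outside 0xA1-0xFE")
    # Subtracting 0xA0 clears the MSB and removes the 0x20 offset
    # of the underlying 7-bit code (assumed numbering, see lead-in).
    return b1 - 0xA0, b2 - 0xA0


print(position(0xA1, 0xA1))  # first position on a plane
print(position(0xFE, 0xFE))  # last position on a plane
```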

The following table shows the encoding of Taiwanese EUC characters:


CNS 11643-1986 Code Plane   Leading Code   Code Range

1                           [nil]          A1A1 - FEFE
2                           SS2 A2         A1A1 - FEFE
3                           SS2 A3         A1A1 - FEFE
4                           SS2 A4         A1A1 - FEFE
5                           SS2 A5         A1A1 - FEFE
6                           SS2 A6         A1A1 - FEFE
7                           SS2 A7         A1A1 - FEFE
8                           SS2 A8         A1A1 - FEFE
9                           SS2 A9         A1A1 - FEFE
10                          SS2 AA         A1A1 - FEFE
11                          SS2 AB         A1A1 - FEFE
12                          SS2 AC         A1A1 - FEFE
13                          SS2 AD         A1A1 - FEFE
14                          SS2 AE         A1A1 - FEFE
15                          SS2 AF         A1A1 - FEFE
16                          SS2 B0         A1A1 - FEFE
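The table above can be turned into a small parser that splits a eucTW byte string into ASCII characters and (plane, row, column) triples. This is a sketch under stated assumptions, not the operating system's implementation: the value 0x8E for SS2 is the standard 8-bit C1 code for Single-Shift 2 and is not given in this page, the 1-to-94 row and column numbering follows common EUC practice, and the function name `parse_euctw` is hypothetical.

```python
from typing import Iterator, Tuple, Union

SS2 = 0x8E  # assumption: standard 8-bit C1 code for Single-Shift 2

Token = Union[str, Tuple[int, int, int]]


def parse_euctw(data: bytes) -> Iterator[Token]:
    """Yield ASCII characters as str and other characters as (plane, row, column)."""
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:
            # MSB off: a single-byte ASCII character.
            yield chr(b)
            i += 1
            continue
        plane = 1  # no leading code means plane 1
        if b == SS2:
            if i + 1 >= n:
                raise ValueError("truncated leading code after SS2")
            selector = data[i + 1]
            # The table lists SS2 followed by A2..B0 for planes 2..16.
            if not 0xA2 <= selector <= 0xB0:
                raise ValueError(f"invalid plane byte 0x{selector:02X}")
            plane = selector - 0xA0
            i += 2
        if i + 1 >= n:
            raise ValueError("truncated two-byte code")
        b1, b2 = data[i], data[i + 1]
        if not (0xA1 <= b1 <= 0xFE and 0xA1 <= b2 <= 0xFE):
            raise ValueError(f"code 0x{b1:02X}{b2:02X} is outside A1A1 - FEFE")
        yield (plane, b1 - 0xA0, b2 - 0xA0)
        i += 2


sample = b"A" + bytes([0xA4, 0xA1]) + bytes([SS2, 0xA2, 0xA1, 0xA1])
print(list(parse_euctw(sample)))
```

The parser yields positions rather than characters because mapping a position to an actual glyph or Unicode value requires the CNS 11643 tables, which are outside the scope of this sketch.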


 

Codeset Conversion

The following codeset converter pairs are available for converting Traditional Chinese characters between eucTW and other encoding formats. Refer to iconv_intro(5) for an introduction to codeset conversion. For more information about the other codeset for which eucTW is the input or output, see the reference page specified in the list item.

big5_eucTW, eucTW_big5
    Converting from and to the Big-5 codeset: big5(5).
    Note that Big-5 encoding is equivalent to the Microsoft code-page format used on PCs for Traditional Chinese. You can therefore use this set of converters to convert Traditional Chinese text between the eucTW and PC code-page formats. For information about how the operating system supports PC code pages, see code_page(5).
dechanyu_eucTW, eucTW_dechanyu
    Converting from and to the DEC Hanyu codeset: dechanyu(5).
dechanzi_eucTW, eucTW_dechanzi
    Converting from and to the DEC Hanzi codeset: dechanzi(5).
sbig5_eucTW, eucTW_sbig5
    Converting from and to the Shift Big-5 codeset: sbig5(5).
telecode_eucTW, eucTW_telecode
    Converting from and to the Telecode codeset: telecode(5).
UCS-2_eucTW, eucTW_UCS-2
    Converting from and to UCS-2 format: Unicode(5).
UCS-4_eucTW, eucTW_UCS-4
    Converting from and to UCS-4 format: Unicode(5).
UTF-8_eucTW, eucTW_UTF-8
    Converting from and to UTF-8 format: Unicode(5).
 

Fonts for Taiwanese EUC

For both display devices and printers, the operating system supports Taiwanese EUC through internal conversion to DEC Hanyu code and use of DEC Hanyu fonts (see dechanyu(5)).

For general information on printing non-English text, refer to i18n_printing(5).
 

SEE ALSO

Commands: locale(1)

Others: ascii(5), big5(5), Chinese(5), code_page(5), dechanzi(5), iconv_intro(5), i18n_intro(5), i18n_printing(5), l10n_intro(5), sbig5(5), telecode(5), Unicode(5)


 


This document was created by man2html, using the manual pages.
Time: 02:43:10 GMT, October 02, 2010