code_page, cp437, cp737, cp775, cp850, cp852, cp855, cp857, cp860, cp861, cp862, cp863, cp865, cp866, cp869, cp874, cp932, cp936, cp949, cp950, cp1250, cp1251, cp1252, cp1253, cp1254, cp1255, cp1256, cp1257, cp1258, dingbats, symbol - Coded character sets that are used on Microsoft Windows and NT systems
Code pages are coded character sets that are used on Microsoft Windows, Windows 95, and Windows NT systems. Just as there are different UNIX codesets, there are different PC code pages, each of which encodes a particular set of characters.
A Tru64 UNIX system supplies one locale, en_US.cp850, that directly supports a PC code-page format (MS-DOS Latin 1). For all other locales, data in code-page format is supported only through codeset converters. These converters can be run directly by users or by software or applications that exchange data between PC and Tru64 UNIX systems. Fonts and other kinds of character support are available only for the native UNIX codeset to which a code page can be converted. See the i18n_intro(5) reference page for introductory information on locales and codesets. See the iconv_intro(5) reference page for an introduction to codeset conversion and the name format and location of codeset converters.
The following table lists and describes the code pages that have conversion support on a Tru64 UNIX system. An asterisk (*) follows the names of code pages that include support for the Euro currency sign (€).
|Code Page||Description|
|cp437||MS-DOS United States|
|cp775||Baltic languages (1)|
|cp850||MS-DOS Multilingual (Latin-1)|
|cp852||MS-DOS Slavic (Latin-2) (2)|
|cp863||MS-DOS Canadian French|
|cp865||MS-DOS Nordic languages|
|cp869||IBM Modern Greek|
|cp874 *||MS-DOS Thai|
|cp936||Chinese (People's Republic of China)|
|cp950||Chinese (Hong Kong)|
|cp1250 *||Windows Latin-2 (2)|
|cp1251 *||Windows Cyrillic (3)|
|cp1252 *||Windows Latin-1|
|cp1253 *||Windows Greek|
|cp1254 *||Windows Turkish|
|cp1255 *||Windows Hebrew|
|cp1256 *||Windows Arabic|
|cp1257 *||Windows Baltic (1)|
|cp1258 *||Windows Vietnamese|
|dingbats||Microsoft dingbat characters|
|symbol||Microsoft miscellaneous symbol characters|
(1) Baltic languages include Estonian, Latvian, and Lithuanian.
(2) Latin-2 languages include Albanian, Croatian, Czech, Faeroese, Hungarian, Polish, Romanian, Latin Serbian, Slovak, and Slovenian.
(3) Cyrillic languages include Byelorussian, Bulgarian, and Russian.
In all cases, a code page can be converted to and from the UCS-2, UCS-4, and UTF-8 codesets. In addition, some code pages can be converted directly to ISO codesets as shown in the following table, although some data loss may occur.
|Code Page||Can Be Converted Directly to:|
See Unicode(5) for information about UCS-2, UCS-4, and UTF-8. Reference pages for UNIX implementations of the ISO codesets have the name format iso8859-number(5).
For Traditional Chinese and Japanese, no codeset converter includes a code-page name in its own name, because existing UNIX codesets already provide identical character encoding. For Traditional Chinese, the character encoding in PC code-page format (cp950) is identical to that in the Big-5 (big5) codeset. For Japanese, the character encoding in PC code-page format (cp932) is identical to that in the Shift JIS (SJIS) codeset. Therefore, the codeset converters whose names include big5 and SJIS can be used to convert data into and out of PC code-page format for these languages.
Converting Korean text that starts out in PC code-page format (cp949) to the DEC Korean (deckorean) codeset may result in loss of data. Every Tru64 UNIX codeset equivalent of cp949 supports all the Hanja and miscellaneous characters that the code page supports. However, only the UCS-2, UCS-4, and UTF-8 codesets support the complete set of Hangul characters in cp949. The deckorean codeset supports only a subset of these Hangul characters. Therefore, converting data from cp949 to UCS-2, UCS-4, or UTF-8 loses no data. However, if the data is then converted from UCS-2, UCS-4, or UTF-8 to deckorean, the unsupported Hangul characters are lost.
The DEC Hanzi (dechanzi) codeset uses the same encoding format as the PC code page used for Simplified Chinese (cp936) but does not support all the characters supported by the code page. Therefore, you can use converters with dechanzi in the converter name to convert text to and from cp936 format, but the operation may result in some loss of data.
Functions: iconv(3), iconv_close(3), iconv_open(3)
Others: i18n_intro(5), iconv_intro(5), iso8859-1(5), iso8859-2(5), iso8859-4(5), iso8859-5(5), iso8859-7(5), iso8859-8(5), iso8859-15(5), Unicode(5)