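Originally, "code page" was IBM's term for the character set encodings supported by a computer's BIOS. The common operating systems of that era were command-line systems, and they displayed characters directly through the character-drawing routines provided by the BIOS (or through a set of glyphs embedded in the video card's character generator). These BIOS code pages are also known as OEM code pages. Graphical operating systems use their own character rendering engine and can support several different character set encodings; code pages of this kind are called ANSI code pages. In the early days IBM and Microsoft used numbers internally to label the different coded character sets, and different vendors use their own names for the same encoding. For example, UTF-8 is code page 1208 at IBM, code page 65001 at Microsoft, and code page 4110 at SAP.
Code page | Short name | Full name |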
37 | IBM037 | IBM EBCDIC (US-Canada) |
437 | IBM437 | OEM United States |
500 | IBM500 | IBM EBCDIC (International) |
708 | ASMO-708 | Arabic (ASMO 708) |
720 | DOS-720 | Arabic (DOS) |
737 | ibm737 | Greek (DOS) |
775 | ibm775 | Baltic (DOS) |
850 | ibm850 | Western European (DOS) |
852 | ibm852 | Central European (DOS) |
855 | IBM855 | OEM Cyrillic |
857 | ibm857 | Turkish (DOS) |
858 | IBM00858 | OEM Multilingual Latin I |
860 | IBM860 | Portuguese (DOS) |
861 | ibm861 | Icelandic (DOS) |
862 | DOS-862 | Hebrew (DOS) |
863 | IBM863 | French Canadian (DOS) |
864 | IBM864 | Arabic (864) |
865 | IBM865 | Nordic (DOS) |
866 | cp866 | Cyrillic (DOS) |
869 | ibm869 | Greek, Modern (DOS) |
870 | IBM870 | IBM EBCDIC (Multilingual Latin-2) |
874 | windows-874 | Thai (Windows) |
875 | cp875 | IBM EBCDIC (Greek Modern) |
932 | shift_jis | Japanese (Shift-JIS) |
936 | gb2312 | Chinese Simplified (GB2312) |
949 | ks_c_5601-1987 | Korean |
950 | big5 | Chinese Traditional (Big5) |
1026 | IBM1026 | IBM EBCDIC (Turkish Latin-5) |
1047 | IBM01047 | IBM Latin-1 |
1140 | IBM01140 | IBM EBCDIC (US-Canada-Euro) |
1141 | IBM01141 | IBM EBCDIC (Germany-Euro) |
1142 | IBM01142 | IBM EBCDIC (Denmark-Norway-Euro) |
1143 | IBM01143 | IBM EBCDIC (Finland-Sweden-Euro) |
1144 | IBM01144 | IBM EBCDIC (Italy-Euro) |
1145 | IBM01145 | IBM EBCDIC (Spain-Euro) |
1146 | IBM01146 | IBM EBCDIC (UK-Euro) |
1147 | IBM01147 | IBM EBCDIC (France-Euro) |
1148 | IBM01148 | IBM EBCDIC (International-Euro) |
1149 | IBM01149 | IBM EBCDIC (Icelandic-Euro) |
1200 | utf-16 | Unicode |
1201 | unicodeFFFE | Unicode (Big-Endian) |
1250 | windows-1250 | Central European (Windows) |
1251 | windows-1251 | Cyrillic (Windows) |
1252 | Windows-1252 | Western European (Windows) |
1253 | windows-1253 | Greek (Windows) |
1254 | windows-1254 | Turkish (Windows) |
1255 | windows-1255 | Hebrew (Windows) |
1256 | windows-1256 | Arabic (Windows) |
1257 | windows-1257 | Baltic (Windows) |
1258 | windows-1258 | Vietnamese (Windows) |
1361 | Johab | Korean (Johab) |
10000 | macintosh | Western European (Mac) |
10001 | x-mac-japanese | Japanese (Mac) |
10002 | x-mac-chinesetrad | Chinese Traditional (Mac) |
10003 | x-mac-korean | Korean (Mac) |
10004 | x-mac-arabic | Arabic (Mac) |
10005 | x-mac-hebrew | Hebrew (Mac) |
10006 | x-mac-greek | Greek (Mac) |
10007 | x-mac-cyrillic | Cyrillic (Mac) |
10008 | x-mac-chinesesimp | Chinese Simplified (Mac) |
10010 | x-mac-romanian | Romanian (Mac) |
10017 | x-mac-ukrainian | Ukrainian (Mac) |
10021 | x-mac-thai | Thai (Mac) |
10029 | x-mac-ce | Central European (Mac) |
10079 | x-mac-icelandic | Icelandic (Mac) |
10081 | x-mac-turkish | Turkish (Mac) |
10082 | x-mac-croatian | Croatian (Mac) |
20000 | x-Chinese-CNS | Chinese Traditional (CNS) |
20001 | x-cp20001 | TCA Taiwan |
20002 | x-Chinese-Eten | Chinese Traditional (Eten) |
20003 | x-cp20003 | IBM5550 Taiwan |
20004 | x-cp20004 | TeleText Taiwan |
20005 | x-cp20005 | Wang Taiwan |
20105 | x-IA5 | Western European (IA5) |
20106 | x-IA5-German | German (IA5) |
20107 | x-IA5-Swedish | Swedish (IA5) |
20108 | x-IA5-Norwegian | Norwegian (IA5) |
20127 | us-ascii | US-ASCII |
20261 | x-cp20261 | T.61 |
20269 | x-cp20269 | ISO-6937 |
20273 | IBM273 | IBM EBCDIC (Germany) |
20277 | IBM277 | IBM EBCDIC (Denmark-Norway) |
20278 | IBM278 | IBM EBCDIC (Finland-Sweden) |
20280 | IBM280 | IBM EBCDIC (Italy) |
20284 | IBM284 | IBM EBCDIC (Spain) |
20285 | IBM285 | IBM EBCDIC (UK) |
20290 | IBM290 | IBM EBCDIC (Japanese katakana) |
20297 | IBM297 | IBM EBCDIC (France) |
20420 | IBM420 | IBM EBCDIC (Arabic) |
20423 | IBM423 | IBM EBCDIC (Greek) |
20424 | IBM424 | IBM EBCDIC (Hebrew) |
20833 | x-EBCDIC-KoreanExtended | IBM EBCDIC (Korean Extended) |
20838 | IBM-Thai | IBM EBCDIC (Thai) |
20866 | koi8-r | Cyrillic (KOI8-R) |
20871 | IBM871 | IBM EBCDIC (Icelandic) |
20880 | IBM880 | IBM EBCDIC (Cyrillic Russian) |
20905 | IBM905 | IBM EBCDIC (Turkish) |
20924 | IBM00924 | IBM Latin-1 |
20932 | EUC-JP | Japanese (JIS 0208-1990 and 0212-1990) |
20936 | x-cp20936 | Chinese Simplified (GB2312-80) |
20949 | x-cp20949 | Korean Wansung |
21025 | cp1025 | IBM EBCDIC (Cyrillic Serbian-Bulgarian) |
21866 | koi8-u | Cyrillic (KOI8-U) |
28591 | iso-8859-1 | Western European (ISO) |
28592 | iso-8859-2 | Central European (ISO) |
28593 | iso-8859-3 | Latin 3 (ISO) |
28594 | iso-8859-4 | Baltic (ISO) |
28595 | iso-8859-5 | Cyrillic (ISO) |
28596 | iso-8859-6 | Arabic (ISO) |
28597 | iso-8859-7 | Greek (ISO) |
28598 | iso-8859-8 | Hebrew (ISO-Visual) |
28599 | iso-8859-9 | Turkish (ISO) |
28603 | iso-8859-13 | Estonian (ISO) |
28605 | iso-8859-15 | Latin 9 (ISO) |
29001 | x-Europa | Europa |
38598 | iso-8859-8-i | Hebrew (ISO-Logical) |
50220 | iso-2022-jp | Japanese (JIS) |
50221 | csISO2022JP | Japanese (JIS-Allow 1 byte Kana) |
50222 | iso-2022-jp | Japanese (JIS-Allow 1 byte Kana - SO/SI) |
50225 | iso-2022-kr | Korean (ISO) |
50227 | x-cp50227 | Chinese Simplified (ISO-2022) |
51932 | euc-jp | Japanese (EUC) |
51936 | EUC-CN | Chinese Simplified (EUC) |
51949 | euc-kr | Korean (EUC) |
52936 | hz-gb-2312 | Chinese Simplified (HZ) |
54936 | GB18030 | Chinese Simplified (GB18030) |
57002 | x-iscii-de | ISCII Devanagari |
57003 | x-iscii-be | ISCII Bengali |
57004 | x-iscii-ta | ISCII Tamil |
57005 | x-iscii-te | ISCII Telugu |
57006 | x-iscii-as | ISCII Assamese |
57007 | x-iscii-or | ISCII Oriya |
57008 | x-iscii-ka | ISCII Kannada |
57009 | x-iscii-ma | ISCII Malayalam |
57010 | x-iscii-gu | ISCII Gujarati |
57011 | x-iscii-pa | ISCII Punjabi |
65000 | utf-7 | Unicode (UTF-7) |
65001 | utf-8 | Unicode (UTF-8) |
65005 | utf-32 | Unicode (UTF-32) |
65006 | utf-32BE | Unicode (UTF-32 Big-Endian) |
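To make the idea of a code page concrete, here is a minimal Python sketch showing that the same byte value maps to a different character under each code page. The helper name `decode_with_code_page` and the small `CODE_PAGES` mapping are hypothetical and exist only for this illustration; the codec names `cp437`, `cp866`, and `cp1252` are Python's own aliases, assumed here to correspond to the table entries with the same numbers.

```python
import unicodedata

# Hypothetical mapping from a few code page numbers in the table above
# to the Python codec names assumed to implement them.
CODE_PAGES = {
    437: "cp437",    # OEM United States
    866: "cp866",    # Cyrillic (DOS)
    1252: "cp1252",  # Western European (Windows)
}


def decode_with_code_page(data: bytes, code_page: int) -> str:
    """Decode raw bytes with the codec registered for a code page number."""
    return data.decode(CODE_PAGES[code_page])


if __name__ == "__main__":
    raw = b"\xe9"  # one byte, interpreted three different ways
    for number in CODE_PAGES:
        ch = decode_with_code_page(raw, number)
        # Print the code point and its Unicode name instead of the glyph,
        # so the output does not depend on the console's own code page.
        print(number, f"U+{ord(ch):04X}", unicodedata.name(ch))
```

The bytes themselves carry no hint of which code page produced them, which is why a conversion tool has to be told the source encoding explicitly.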
> iconv
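Command options:
-f encoding : convert characters from the given encoding.
-t encoding : convert characters to the given encoding.
-l : list the known coded character sets.
-o file : write the output to the given file.
-c : omit invalid characters from the output.
-s : suppress warnings, but not error messages.
--verbose : print progress information.
The encoding names accepted by -f and -t are the ones printed by the -l option.
Category | Encodings |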
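The roles of -f, -t, and -c can be illustrated with a short Python sketch. It is only an analogue of what a call such as `iconv -f GBK -t UTF-8` does, not iconv itself: the function name `convert` and its `skip_invalid` parameter are hypothetical, and Python's `errors="ignore"` is assumed to be a reasonable stand-in for -c, although the two tools may differ in exactly which input they treat as invalid.

```python
def convert(data: bytes, from_enc: str, to_enc: str,
            skip_invalid: bool = False) -> bytes:
    """Re-encode bytes from one character encoding to another.

    from_enc plays the role of -f, to_enc the role of -t, and
    skip_invalid loosely mirrors -c by dropping anything that cannot
    be decoded or encoded instead of raising an error.
    """
    errors = "ignore" if skip_invalid else "strict"
    text = data.decode(from_enc, errors=errors)  # bytes -> Unicode text
    return text.encode(to_enc, errors=errors)    # Unicode text -> bytes


if __name__ == "__main__":
    gbk_bytes = "中文".encode("gbk")
    print(convert(gbk_bytes, "gbk", "utf-8"))

    # Without skip_invalid, a character missing from the target encoding
    # raises UnicodeEncodeError; with it, the character is dropped.
    print(convert("a中b".encode("utf-8"), "utf-8", "ascii", skip_invalid=True))
```

Going through Unicode text as the intermediate form keeps the sketch short: every supported source encoding only needs a decoder, and every target only needs an encoder.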
European languages | ASCII, ISO-8859-{1,2,3,4,5,7,9,10,13,14,15,16}, KOI8-R, KOI8-U, KOI8-RU, CP{1250,1251,1252,1253,1254,1257}, CP{850,866,1131}, Mac{Roman,CentralEurope,Iceland,Croatian,Romania}, Mac{Cyrillic,Ukraine,Greek,Turkish}, Macintosh |
Semitic languages | ISO-8859-{6,8}, CP{1255,1256}, CP862, Mac{Hebrew,Arabic} |
Japanese | EUC-JP, SHIFT_JIS, CP932, ISO-2022-JP, ISO-2022-JP-2, ISO-2022-JP-1, ISO-2022-JP-MS |
Chinese | EUC-CN, HZ, GBK, CP936, GB18030, EUC-TW, BIG5, CP950, BIG5-HKSCS, BIG5-HKSCS:2004, BIG5-HKSCS:2001, BIG5-HKSCS:1999, ISO-2022-CN, ISO-2022-CN-EXT |
Korean | EUC-KR, CP949, ISO-2022-KR, JOHAB |
Armenian | ARMSCII-8 |
Georgian | Georgian-Academy, Georgian-PS |
Tajik | KOI8-T |
Kazakh | PT154, RK1048 |
Thai | ISO-8859-11, TIS-620, CP874, MacThai |
Laotian | MuleLao-1, CP1133 |
Vietnamese | VISCII, TCVN, CP1258 |
Platform specifics | HP-ROMAN8, NEXTSTEP |
Full Unicode | UTF-8, UCS-2, UCS-2BE, UCS-2LE, UCS-4, UCS-4BE, UCS-4LE, UTF-16, UTF-16BE, UTF-16LE, UTF-32, UTF-32BE, UTF-32LE, UTF-7, C99, JAVA |
Full Unicode, in terms of uint16_t or uint32_t (with machine dependent endianness and alignment) | UCS-2-INTERNAL, UCS-4-INTERNAL |
Locale dependent, in terms of char or wchar_t | char, wchar_t. The empty encoding name "" is equivalent to "char": it denotes the locale dependent character encoding. |
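The Unicode entries in the list come in plain, BE, and LE variants. As a rough illustration of what those suffixes mean, the following Python sketch encodes one character with Python's own `utf-16-be`, `utf-16-le`, and `utf-16` codecs. This shows Python's behavior only; how a given iconv build treats the unsuffixed UTF-16 name (byte order, byte order mark) is not covered here and may differ. The helper name `encode_bytes` is hypothetical.

```python
import codecs


def encode_bytes(text: str, encoding: str) -> bytes:
    """Encode text and return the raw bytes, for side-by-side comparison."""
    return text.encode(encoding)


if __name__ == "__main__":
    for name in ("utf-16-be", "utf-16-le", "utf-16"):
        print(name, encode_bytes("A", name))

    # In Python, the unsuffixed codec prepends a byte order mark (BOM)
    # so that a reader can tell which byte order follows.
    bom = encode_bytes("A", "utf-16")[:2]
    print(bom in (codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE))
```

When exchanging files between systems, naming the byte order explicitly (BE or LE) avoids depending on how each tool handles the unsuffixed name.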