Easy Tutorial

HTML Character Set



To display an HTML page correctly, a browser must know which character set to use.

The early character set widely used on the World Wide Web was ASCII. ASCII supported numbers 0-9, the upper and lower case English alphabet, and some special characters.
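
As a rough illustration of ASCII's limits, the Python sketch below checks whether a piece of text can be stored in ASCII at all. The helper name fits_in_ascii and the sample strings are made up for this example.

```python
def fits_in_ascii(text: str) -> bool:
    """Return True if every character of text can be encoded as ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        # At least one character lies outside the 128 ASCII code points.
        return False


print(fits_in_ascii("Hello, World! 0-9"))  # plain English text
print(fits_in_ascii("Smörgåsbord"))        # contains non-ASCII letters
```

Text that fails this check needs a larger character set, which is what the rest of this page covers.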


Because many languages use characters that are not part of ASCII, ISO-8859-1 became the default character set in early browsers and in HTTP/1.1. Under the current HTML standard, pages are expected to be encoded as UTF-8.


A web page should declare its character set in a <meta> tag, for example <meta charset="UTF-8">. In HTML 4, the declaration was needed whenever the page used a character set other than ISO-8859-1.
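
To see why the declared character set matters, the following Python sketch decodes the same bytes twice, once with the wrong character set and once with the right one. The sample word is arbitrary, and the decoding calls only stand in for what a browser does internally.

```python
# Hypothetical page content; the word is only an example.
page_text = "café"

# The page is stored and sent as UTF-8 bytes.
raw_bytes = page_text.encode("utf-8")

# A reader that wrongly assumes ISO-8859-1 treats the two bytes of "é"
# as two separate characters.
wrong = raw_bytes.decode("iso-8859-1")

# A reader that is told the correct character set recovers the text.
right = raw_bytes.decode("utf-8")

print(wrong)  # prints cafÃ©
print(right)  # prints café
```

The garbled result in the first case is the kind of output a visitor sees when a page's character set is missing or declared incorrectly.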



ISO Character Sets

ISO character sets are standards defined by the International Organization for Standardization (ISO) for different alphabets/languages.

Below is a list of different character sets used around the world:

| Character Set | Description | Usage |
|---|---|---|
| ISO-8859-1 | Latin alphabet part 1 | North America, Western Europe, Latin America, Caribbean, Canada, Africa |
| ISO-8859-2 | Latin alphabet part 2 | Eastern Europe |
| ISO-8859-3 | Latin alphabet part 3 | Southern Europe, Esperanto, miscellaneous others |
| ISO-8859-4 | Latin alphabet part 4 | Scandinavian/Baltic (and others not included in ISO-8859-1) |
| ISO-8859-5 | Latin/Cyrillic part 5 | Languages using the Cyrillic alphabet, such as Bulgarian, Belarusian, Russian, and Macedonian |
| ISO-8859-6 | Latin/Arabic part 6 | Languages using the Arabic alphabet |
| ISO-8859-7 | Latin/Greek part 7 | Modern Greek, along with Greek-derived mathematical symbols |
| ISO-8859-8 | Latin/Hebrew part 8 | Languages using the Hebrew alphabet |
| ISO-8859-9 | Latin 5 part 9 | Turkish. Identical to ISO-8859-1 except that Turkish characters replace the Icelandic ones |
| ISO-8859-10 | Latin 6 | Northern European languages, including Sámi (Lapland), Nordic Germanic languages, and Inuit (Greenlandic) |
| ISO-8859-15 | Latin 9 (aka Latin 0) | Similar to ISO-8859-1, with the Euro symbol and some other characters replacing less commonly used ones |
| ISO-2022-JP | Latin/Japanese part 1 | Japanese |
| ISO-2022-JP-2 | Latin/Japanese part 2 | Japanese |
| ISO-2022-KR | Latin/Korean part 1 | Korean |
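
The differences between these character sets can be checked directly. The Python sketch below decodes the same byte value under several of the ISO-8859 parts. The two byte values were picked for illustration, and the variable names are not part of any standard.

```python
# Byte 0xA4 is one of the positions that ISO-8859-15 reassigned.
currency_byte = b"\xa4"
latin1_char = currency_byte.decode("iso-8859-1")   # generic currency sign
latin9_char = currency_byte.decode("iso-8859-15")  # Euro sign

# Byte 0xD0 is an Icelandic letter in ISO-8859-1 and a Turkish one in ISO-8859-9.
letter_byte = b"\xd0"
icelandic_char = letter_byte.decode("iso-8859-1")
turkish_char = letter_byte.decode("iso-8859-9")

results = [
    ("0xA4 as ISO-8859-1", latin1_char),
    ("0xA4 as ISO-8859-15", latin9_char),
    ("0xD0 as ISO-8859-1", icelandic_char),
    ("0xD0 as ISO-8859-9", turkish_char),
]
for label, ch in results:
    # Show each decoded character together with its Unicode code point.
    print(f"{label}: {ch} (U+{ord(ch):04X})")
```

Because one byte can stand for different characters depending on the set, text from two of these sets cannot be mixed reliably in a single page.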

Unicode Standard

Because the character sets listed above are limited in size and cannot be combined in multilingual text, the Unicode Consortium developed the Unicode Standard.

The Unicode Standard aims to cover all the characters, punctuation marks, and symbols of the world's writing systems.

Unicode enables processing, storage, and interchange of text data, regardless of the platform, program, or language.


Unicode Consortium

The Unicode Consortium developed the Unicode Standard. Its goal is to replace existing character sets with Unicode and its standard Unicode Transformation Formats (UTF).

The Unicode Standard has been successful and is implemented in XML, Java, ECMAScript (JavaScript), LDAP, CORBA 3.0, and WML. Unicode is also supported in many operating systems and all modern browsers.

The Unicode Consortium collaborates with leading standard development organizations such as ISO, W3C, and ECMA.

Unicode can be implemented by several different character encodings. The most commonly used encodings are UTF-8 and UTF-16:

| Character Set | Description |
|---|---|
| UTF-8 | Characters in UTF-8 can be from 1 to 4 bytes long. UTF-8 can represent any character in the Unicode standard. UTF-8 is backward compatible with ASCII. UTF-8 is the preferred encoding for web pages and email. |
| UTF-16 | The 16-bit Unicode Transformation Format is a variable-length encoding in which each character takes 2 or 4 bytes. It can encode the entire Unicode character repertoire. UTF-16 is mainly used in operating systems and environments such as Microsoft's Windows 2000/XP/2003/Vista/CE, Java, and .NET byte code environments. |
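
The byte lengths described above can be observed with a short Python sketch. The sample characters and the helper name encoded_lengths are chosen for this example only.

```python
# One sample character for each UTF-8 length: 1, 2, 3, and 4 bytes.
samples = ["A", "é", "€", "😀"]


def encoded_lengths(ch: str) -> tuple:
    """Return (UTF-8 byte count, UTF-16 byte count) for one character."""
    # "utf-16-le" is used so that no byte order mark is added to the count.
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))


for ch in samples:
    utf8_len, utf16_len = encoded_lengths(ch)
    print(f"U+{ord(ch):04X}: UTF-8 {utf8_len} bytes, UTF-16 {utf16_len} bytes")

# ASCII text produces identical bytes under ASCII and UTF-8,
# which is what "backward compatible with ASCII" means in practice.
ascii_compatible = "HTML".encode("ascii") == "HTML".encode("utf-8")
print(ascii_compatible)
```

For mostly English text, UTF-8 is the more compact of the two, which is one reason it suits web pages.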

Tip: The first 256 Unicode code points correspond to the 256 ISO-8859-1 characters.
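
This correspondence can be verified with a small Python sketch, which assumes that Python's built-in iso-8859-1 codec maps every byte value to a character:

```python
# Every possible single-byte value, 0 through 255.
all_bytes = bytes(range(256))

# Decode all of them as ISO-8859-1.
decoded = all_bytes.decode("iso-8859-1")

# Each decoded character's Unicode code point should equal its byte value.
matches = all(ord(ch) == byte for ch, byte in zip(decoded, all_bytes))
print(matches)
```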

Tip: All HTML 4 browsers support UTF-8, and all XHTML and XML processors support UTF-8 and UTF-16!
