Java: Unicode
Unicode is a standard that assigns a numeric code to every character.
The char and String types in Java store Unicode text, which allows truly
international programming.
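As a small illustration of what this means in practice, the sketch below builds a string from Unicode escapes in an ordinary string literal. The class name UnicodeLiterals, the method greekAlpha, and the sample word are made up for this example; the program prints numbers rather than the text itself so the result does not depend on the console's encoding.

```java
// Minimal sketch: any Unicode character can appear in a Java string literal,
// either typed directly or written as a backslash-u escape with four hex digits.
class UnicodeLiterals {
    // Hypothetical sample text: the Greek word "alpha" in Greek letters,
    // spelled with escapes so the source file itself stays pure ASCII.
    static String greekAlpha() {
        return "\u03b1\u03bb\u03c6\u03b1";
    }

    public static void main(String[] args) {
        String s = greekAlpha();
        System.out.println(s.length());         // number of chars in the string
        System.out.println((int) s.charAt(0));  // numeric code of the first char
    }
}
```

Each of the four Greek letters here fits in a single char, so the string's length matches the number of letters.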
About Unicode
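- A Java char is 16 bits (2 bytes). Unicode was originally a 16-bit code;
later versions added characters beyond that range, and Java stores each
of those as a pair of chars.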
- Most programming languages before Java (C/C++, Pascal, Basic, ...)
use 8-bit characters based on ASCII
(American Standard Code for Information Interchange).
ASCII only defines the first 128 values, and the
other 128 values are often used for various incompatible extensions.
- A 16-bit code allows 65,536 (2^16) possible characters.
About 50,000 of them had already been assigned
as of Unicode version 3.0.
- All of the world's major human languages can be represented
in Unicode (including Chinese, Japanese, and Korean).
- The first 128 characters of Unicode have the same values as
the equivalent ASCII characters, and the first 256 characters
are the same as ISO-8859-1 (Latin-1).
- You can learn more about Unicode (version 3.0 at the time of writing) at
www.unicode.org.
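Several of the points above can be checked directly from Java. The following is a minimal sketch, not a definitive test: the class UnicodeBasics and its helper methods isAscii and fromLatin1 are hypothetical names introduced here, while Character.SIZE, Character.toChars, String.codePointCount, and StandardCharsets.ISO_8859_1 are standard library members (the code point methods require Java 5 or later, and StandardCharsets requires Java 7 or later).

```java
import java.nio.charset.StandardCharsets;

class UnicodeBasics {
    // True when the char falls in the ASCII range, 0 through 127.
    static boolean isAscii(char c) {
        return c < 128;
    }

    // Decodes one ISO-8859-1 byte. The resulting char has the same numeric
    // value as the byte, because Latin-1 matches the first 256 Unicode values.
    static char fromLatin1(byte b) {
        return new String(new byte[] { b }, StandardCharsets.ISO_8859_1).charAt(0);
    }

    public static void main(String[] args) {
        System.out.println(Character.SIZE);                 // bits in a Java char
        System.out.println((int) 'A');                      // same value as in ASCII
        System.out.println(isAscii('A'));
        System.out.println((int) fromLatin1((byte) 0xE9));  // Latin-1 e with acute accent

        // A code point above U+FFFF does not fit in one char, so Java
        // stores it as two chars (a surrogate pair).
        String clef = new String(Character.toChars(0x1D11E));
        System.out.println(clef.length());                          // count of chars
        System.out.println(clef.codePointCount(0, clef.length()));  // count of characters
    }
}
```

The last two lines show why length() and the number of characters can differ: length() counts 16-bit chars, while codePointCount counts Unicode code points.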
Unicode Fonts
Although Java stores characters as Unicode, there are still some very
practical operating system problems in entering or displaying many Unicode
characters. Most fonts only display a very small subset of all Unicode characters,
typically only about 100 different characters.
You can download the Lucida Sans Unicode font from Microsoft,
which covers a much wider range of Unicode characters. This is not a small file, of course.
[Need to add URL and size].