
Terminology

A character is a minimal unit of text that has semantic value.

A character set is a collection of characters that might be used by multiple languages. For example, the Latin character set is used by English and most European languages, whereas the Greek character set is used only by the Greek language.

A coded character set is a character set where each character is assigned a unique number.

A code point is a value that can be used in a coded character set. In Java, a code point is stored in a 32-bit int, where the lower 21 bits represent a valid code point value and the upper 11 bits are 0.
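The bit layout described above can be checked with a minimal sketch. The class name CodePointBits and the helper upperBitsAreZero are hypothetical names chosen for illustration; Character.isValidCodePoint is the standard library check. Note that fitting in 21 bits is necessary but not sufficient, because the largest valid code point is U+10FFFF.

```java
class CodePointBits {
    // Mask covering the lower 21 bits of an int (hypothetical helper constant).
    static final int LOWER_21_BITS = 0x1FFFFF;

    // Returns true if the upper 11 bits of the int are all zero.
    static boolean upperBitsAreZero(int value) {
        return (value & ~LOWER_21_BITS) == 0;
    }

    public static void main(String[] args) {
        int euro = 0x20AC;          // the Euro sign
        int deseretLongI = 0x10400; // Deseret LONG I
        int tooLarge = 0x110000;    // one past the last valid code point

        System.out.println(upperBitsAreZero(euro));                    // true
        System.out.println(Character.isValidCodePoint(deseretLongI)); // true
        // Fits in 21 bits, yet lies outside the valid Unicode range.
        System.out.println(upperBitsAreZero(tooLarge));                // true
        System.out.println(Character.isValidCodePoint(tooLarge));      // false
    }
}
```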

A Unicode code unit is a 16-bit char value. For example, imagine a String that contains the letters "abc" followed by the Deseret LONG I, which is represented with two char values. That string contains four characters, four code points, but five code units.
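The difference between code units and code points can be observed directly. The following is a small sketch rather than a definitive recipe; the class name CodeUnitCount and its helper methods are hypothetical, and the Deseret LONG I is written as the two char values (a surrogate pair) that represent U+10400.

```java
class CodeUnitCount {
    // "abc" followed by U+10400, spelled out as its two char values.
    static final String TEXT = "abc\uD801\uDC00";

    // Number of 16-bit code units: this is what String.length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of code points across the whole string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        System.out.println(codeUnits(TEXT));  // 5
        System.out.println(codePoints(TEXT)); // 4
        // Reading at index 3 combines both char values into one code point.
        System.out.println(Integer.toHexString(TEXT.codePointAt(3))); // 10400
    }
}
```

Code that iterates with charAt visits five values here, so text that may contain supplementary characters is usually safer to walk by code point.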

To express a character in Unicode, the hexadecimal value is prefixed with the string U+. The valid code point range for the Unicode standard is U+0000 to U+10FFFF, inclusive. The code point value for the Latin character A is U+0041. The character €, which represents the Euro currency, has the code point value U+20AC. The first letter in the Deseret alphabet, the LONG I, has the code point value U+10400.
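As an illustration of the U+ notation, the sketch below converts between an int code point and its textual form. The class UPlusNotation and its format and parse methods are hypothetical helpers, not part of the Java API; they assume the common convention of padding to at least four hexadecimal digits.

```java
class UPlusNotation {
    // Formats a code point as "U+" followed by at least four uppercase hex digits.
    static String format(int codePoint) {
        return String.format("U+%04X", codePoint);
    }

    // Parses text such as "U+20AC" back into an int code point.
    static int parse(String notation) {
        if (!notation.startsWith("U+")) {
            throw new IllegalArgumentException("Expected a U+ prefix: " + notation);
        }
        int codePoint = Integer.parseInt(notation.substring(2), 16);
        if (!Character.isValidCodePoint(codePoint)) {
            throw new IllegalArgumentException("Outside U+0000..U+10FFFF: " + notation);
        }
        return codePoint;
    }

    public static void main(String[] args) {
        System.out.println(format('A'));      // U+0041
        System.out.println(format(0x20AC));   // U+20AC
        System.out.println(format(0x10400));  // U+10400
        System.out.println(parse("U+10400") == 0x10400); // true
    }
}
```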

The following table shows code point values for several characters:

Character	Unicode Code Point	Glyph
Latin A	U+0041	A
Latin sharp S	U+00DF	ß
Han for East	U+6771	東
Deseret, LONG I	U+10400	𐐀

As previously described, characters in the range U+10000 to U+10FFFF are called supplementary characters. The set of characters from U+0000 to U+FFFF is sometimes referred to as the Basic Multilingual Plane (BMP).
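The split between the BMP and the supplementary range can be sketched with standard Character methods. The class name PlaneCheck and the classify helper are hypothetical; the code points are the ones used in the table above.

```java
class PlaneCheck {
    // Labels a code point as "BMP", "supplementary", or "invalid".
    static String classify(int codePoint) {
        if (!Character.isValidCodePoint(codePoint)) {
            return "invalid";
        }
        return Character.isSupplementaryCodePoint(codePoint) ? "supplementary" : "BMP";
    }

    public static void main(String[] args) {
        int[] samples = {0x0041, 0x00DF, 0x6771, 0x10400};
        for (int cp : samples) {
            // charCount reports how many char values are needed: 1 for BMP, 2 otherwise.
            System.out.println(String.format("U+%04X", cp) + " " + classify(cp)
                    + " " + Character.charCount(cp));
        }
        // A supplementary code point expands to a pair of char values.
        char[] units = Character.toChars(0x10400);
        System.out.println(units.length); // 2
    }
}
```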

More terminology can be found in the Glossary of Unicode Terms, listed on the More Information page.


Your use of this page and all the material on pages under "The Java Tutorials" banner is subject to the Java SE Tutorial Copyright and License. Additionally, any example code contained in any of these Java Tutorials pages is licensed under the Code Sample License.
