\uFFFF, where FFFF is the hexadecimal value of the code point you want to match. For example, \u6771 matches the Han character for east.
Alternatively, you can specify a code point using Perl-style hex notation, \x{...}. For example:
String hexPattern = "\\x{" + Integer.toHexString(codePoint) + "}";
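As a minimal sketch of both notations, the class below matches the same character with each form. The class name CodePointMatch and the helper forCodePoint are hypothetical names chosen for illustration. In Java source, the backslash that introduces a regex escape has to be doubled inside a string literal.

```java
import java.util.regex.Pattern;

public class CodePointMatch {
    // Hypothetical helper: builds a pattern for one code point with the \x{...} notation.
    static Pattern forCodePoint(int codePoint) {
        return Pattern.compile("\\x{" + Integer.toHexString(codePoint) + "}");
    }

    public static void main(String[] args) {
        // The compiler turns this escape into the Han character for east.
        String east = "\u6771";

        // The doubled backslash hands the escape to the regex engine instead.
        System.out.println(Pattern.matches("\\u6771", east));             // prints true

        // The same character, matched through the hex notation.
        System.out.println(forCodePoint(0x6771).matcher(east).matches()); // prints true

        // U+1F600 lies outside the Basic Multilingual Plane, so the four-digit
        // form cannot express it, while the hex notation can.
        String grin = new String(Character.toChars(0x1F600));
        System.out.println(forCodePoint(0x1F600).matcher(grin).matches()); // prints true
    }
}
```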
You can match a single character belonging to a particular category with the expression \p{prop}. You can match a single character not belonging to a particular category with the expression \P{prop}.
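The difference between the lowercase and uppercase forms can be illustrated with a short sketch. The class PropertyNegation and its two helpers are hypothetical, and the property L (Unicode letters) is used only as a convenient example.

```java
public class PropertyNegation {
    // Hypothetical helper: deletes every character that does NOT have the property,
    // leaving only the characters that do.
    static String keepOnly(String input, String prop) {
        return input.replaceAll("\\P{" + prop + "}", "");
    }

    // Hypothetical helper: deletes every character that has the property.
    static String strip(String input, String prop) {
        return input.replaceAll("\\p{" + prop + "}", "");
    }

    public static void main(String[] args) {
        System.out.println(keepOnly("R2-D2", "L")); // prints RD
        System.out.println(strip("R2-D2", "L"));    // prints 2-2
    }
}
```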
The three supported property types are scripts, blocks, and a "general" category.
To match characters that belong to a specific script, you can use the script keyword, or the sc short form, for example, \p{script=Hiragana}.
Alternatively, you can prefix the script name with the string Is, such as \p{IsHiragana}. Valid script names supported by Pattern are those accepted by UnicodeScript.forName.
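The three spellings for a script can be compared side by side. This is only a sketch: the class ScriptMatch and the helper allInScript are hypothetical names, and the sample strings are written with Unicode escapes so the source does not depend on file encoding.

```java
import java.util.regex.Pattern;

public class ScriptMatch {
    // Hypothetical helper: true when every character of text belongs to the named script.
    static boolean allInScript(String text, String scriptName) {
        return Pattern.matches("\\p{script=" + scriptName + "}+", text);
    }

    public static void main(String[] args) {
        String hiragana = "\u3072\u3089\u304C\u306A"; // the word "hiragana" in Hiragana
        String katakana = "\u30AB\u30BF\u30AB\u30CA"; // the word "katakana" in Katakana

        System.out.println(allInScript(hiragana, "Hiragana")); // prints true
        System.out.println(allInScript(katakana, "Hiragana")); // prints false

        // The short form and the Is prefix select the same script.
        System.out.println(hiragana.matches("\\p{sc=Hiragana}+")); // prints true
        System.out.println(hiragana.matches("\\p{IsHiragana}+"));  // prints true

        // The name is resolved the same way as by Character.UnicodeScript.forName.
        System.out.println(Character.UnicodeScript.forName("Hiragana")
                == Character.UnicodeScript.of(0x3072));            // prints true
    }
}
```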
A block can be specified using the block keyword, or the blk short form, for example, \p{block=Mongolian}.
Alternatively, you can prefix the block name with the string In, such as \p{InMongolian}. Valid block names supported by Pattern are those accepted by UnicodeBlock.forName.
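Block matching follows the same shape. In the sketch below, the class BlockMatch and the helper inBlock are hypothetical, and U+1820 (MONGOLIAN LETTER A) serves as the sample character.

```java
import java.util.regex.Pattern;

public class BlockMatch {
    // Hypothetical helper: true when the code point lies in the named Unicode block.
    static boolean inBlock(int codePoint, String blockName) {
        String s = new String(Character.toChars(codePoint));
        return Pattern.matches("\\p{block=" + blockName + "}", s);
    }

    public static void main(String[] args) {
        System.out.println(inBlock(0x1820, "Mongolian")); // prints true
        System.out.println(inBlock('A', "Mongolian"));    // prints false

        // The short form and the In prefix select the same block.
        String letterA = "\u1820";
        System.out.println(Pattern.matches("\\p{blk=Mongolian}", letterA)); // prints true
        System.out.println(Pattern.matches("\\p{InMongolian}", letterA));   // prints true

        // The name is resolved the same way as by Character.UnicodeBlock.forName.
        System.out.println(Character.UnicodeBlock.forName("Mongolian")
                == Character.UnicodeBlock.of(0x1820));                      // prints true
    }
}
```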
Categories can be specified with the optional prefix Is. For example, IsL matches the category of Unicode letters.
Categories can also be specified by using the general_category keyword, or the short form gc. For example, an uppercase letter can be matched using general_category=Lu or gc=Lu. Supported categories are those of The Unicode Standard in the version specified by the Character class.
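The category forms can be exercised in the same way. The following is a sketch under the assumption that each form is wrapped in \p{...} like the other properties. The class CategoryMatch and the helper countUppercase are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CategoryMatch {
    // Hypothetical helper: counts the characters in the Lu (uppercase letter) category.
    static int countUppercase(String text) {
        Matcher m = Pattern.compile("\\p{gc=Lu}").matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // IsL covers all letters, regardless of case.
        System.out.println(Pattern.matches("\\p{IsL}", "a")); // prints true
        System.out.println(Pattern.matches("\\p{IsL}", "1")); // prints false

        // The long and short keywords are interchangeable.
        System.out.println(Pattern.matches("\\p{general_category=Lu}", "A")); // prints true
        System.out.println(Pattern.matches("\\p{gc=Lu}", "a"));               // prints false

        System.out.println(countUppercase("Hello World")); // prints 2
    }
}
```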