The syntax listed in 21.2.1 Patterns is modified as follows.
The following items are appended to 21.2.1.1 Static Semantics: Early Errors.
The following two abstract operations are appended to 21.2.2.8 Atom.
The algorithm uses values from the following tables, which associate supported Unicode property names and property aliases with their canonical property names.
Implementations must support the following non-binary Unicode properties and their property aliases:
Property name and aliases | Canonical property name |
---|---|
General_Category, gc | General_Category |
Script, sc | Script |
Script_Extensions, scx | Script_Extensions |
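As a rough illustration, the JavaScript snippet below assumes an engine that already implements property escapes (ECMAScript 2018 or later, with the `u` flag). It shows that a non-binary property is written as `\p{Name=Value}`, and that the full property name and its Unicode short alias (`gc`, `sc`, `scx`) select the same set of code points. The variable names are illustrative only.

```javascript
// Non-binary properties use the \p{Name=Value} form. The full property name
// and its alias select the same property.
const greekByName = /\p{Script=Greek}/u;
const greekByAlias = /\p{sc=Greek}/u;
const greekByExtensions = /\p{scx=Greek}/u;
const upperByName = /\p{General_Category=Uppercase_Letter}/u;
const upperByAlias = /\p{gc=Uppercase_Letter}/u;

console.log(greekByName.test("π"), greekByAlias.test("π"), greekByExtensions.test("π")); // true true true
console.log(upperByName.test("A"), upperByAlias.test("A"), upperByAlias.test("a"));      // true true false
```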
Additionally, implementations must support the following binary Unicode properties and their property aliases:
Property name and aliases | Canonical property name |
---|---|
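ASCII | ASCII |
ASCII_Hex_Digit, AHex | ASCII_Hex_Digit |
Alphabetic, Alpha | Alphabetic |
Any | Any |
Assigned | Assigned |
Bidi_Control, Bidi_C | Bidi_Control |
Bidi_Mirrored, Bidi_M | Bidi_Mirrored |
Case_Ignorable, CI | Case_Ignorable |
Cased | Cased |
Changes_When_Casefolded, CWCF | Changes_When_Casefolded |
Changes_When_Casemapped, CWCM | Changes_When_Casemapped |
Changes_When_Lowercased, CWL | Changes_When_Lowercased |
Changes_When_NFKC_Casefolded, CWKCF | Changes_When_NFKC_Casefolded |
Changes_When_Titlecased, CWT | Changes_When_Titlecased |
Changes_When_Uppercased, CWU | Changes_When_Uppercased |
Dash | Dash |
Default_Ignorable_Code_Point, DI | Default_Ignorable_Code_Point |
Deprecated, Dep | Deprecated |
Diacritic, Dia | Diacritic |
Emoji | Emoji |
Emoji_Component | Emoji_Component |
Emoji_Modifier | Emoji_Modifier |
Emoji_Modifier_Base | Emoji_Modifier_Base |
Emoji_Presentation | Emoji_Presentation |
Extender, Ext | Extender |
Grapheme_Base, Gr_Base | Grapheme_Base |
Grapheme_Extend, Gr_Ext | Grapheme_Extend |
Hex_Digit, Hex | Hex_Digit |
IDS_Binary_Operator, IDSB | IDS_Binary_Operator |
IDS_Trinary_Operator, IDST | IDS_Trinary_Operator |
ID_Continue, IDC | ID_Continue |
ID_Start, IDS | ID_Start |
Ideographic, Ideo | Ideographic |
Join_Control, Join_C | Join_Control |
Logical_Order_Exception, LOE | Logical_Order_Exception |
Lowercase, Lower | Lowercase |
Math | Math |
Noncharacter_Code_Point, NChar | Noncharacter_Code_Point |
Pattern_Syntax, Pat_Syn | Pattern_Syntax |
Pattern_White_Space, Pat_WS | Pattern_White_Space |
Quotation_Mark, QMark | Quotation_Mark |
Radical | Radical |
Regional_Indicator, RI | Regional_Indicator |
Sentence_Terminal, STerm | Sentence_Terminal |
Soft_Dotted, SD | Soft_Dotted |
Terminal_Punctuation, Term | Terminal_Punctuation |
Unified_Ideograph, UIdeo | Unified_Ideograph |
Uppercase, Upper | Uppercase |
Variation_Selector, VS | Variation_Selector |
White_Space, space | White_Space |
XID_Continue, XIDC | XID_Continue |
XID_Start, XIDS | XID_Start |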
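A similar sketch for binary properties, under the same assumption of an ES2018+ engine: a binary property is written alone inside `\p{...}`, and `AHex`, the Unicode short alias of `ASCII_Hex_Digit`, behaves exactly like the full name. The sample strings are arbitrary.

```javascript
// Binary properties appear alone inside \p{...}; their aliases work the same way.
const hexOnly = /^\p{ASCII_Hex_Digit}+$/u;
const hexOnlyByAlias = /^\p{AHex}+$/u;
const hasWhiteSpace = /\p{White_Space}/u;
const emojiPresentation = /\p{Emoji_Presentation}/u;

console.log(hexOnly.test("c0ffee"), hexOnlyByAlias.test("C0FFEE")); // true true
console.log(hexOnly.test("0xff"));                  // false ("x" is not a hex digit)
console.log(hasWhiteSpace.test("a\u00A0b"));        // true (U+00A0 NO-BREAK SPACE)
console.log(emojiPresentation.test("\u{1F600}"));   // true (U+1F600 GRINNING FACE)
```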
The abstract operation UnicodeMatchProperty takes a parameter p that is a
To ensure interoperability, implementations must not extend Unicode property support to the remaining properties.
Implementations must only recognize the property aliases listed in
Implementations must only recognize the property value aliases and canonical property value names listed in
For example, Script_Extensions (property name) and scx (property alias) are valid, but script_extensions or Scx aren’t.
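The strictness rule can be checked directly in a conforming engine (assumed here to be ES2018 or later). The helper `compiles` is hypothetical; it only reports whether `new RegExp(source, "u")` throws a `SyntaxError`.

```javascript
// Hypothetical helper: true if the engine accepts the pattern with the u flag,
// false if it rejects it with a SyntaxError. Other errors are rethrown.
function compiles(source) {
  try {
    new RegExp(source, "u");
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

console.log(compiles("\\p{Script_Extensions=Greek}")); // true  (property name)
console.log(compiles("\\p{scx=Greek}"));               // true  (property alias)
console.log(compiles("\\p{script_extensions=Greek}")); // false (wrong case)
console.log(compiles("\\p{Scx=Greek}"));               // false (wrong case)
```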
The listed properties form a superset of what UTS18 RL1.2 requires.
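One way to picture UnicodeMatchProperty, assuming its role is to map an accepted spelling to its canonical property name, is an exact-match lookup table. The `PROPERTY_NAMES` map and the `unicodeMatchProperty` function below are hypothetical and cover only a few rows of the tables; an unlisted spelling returns `undefined` here, whereas in the specification such a pattern is already rejected by the early-error rules.

```javascript
// Hypothetical subset of the property-name tables: each accepted spelling maps
// to its canonical property name. Lookup is exact; nothing is normalized.
const PROPERTY_NAMES = new Map([
  ["General_Category", "General_Category"],
  ["gc", "General_Category"],
  ["Script", "Script"],
  ["sc", "Script"],
  ["Script_Extensions", "Script_Extensions"],
  ["scx", "Script_Extensions"],
  ["ASCII_Hex_Digit", "ASCII_Hex_Digit"],
  ["AHex", "ASCII_Hex_Digit"],
]);

// Sketch of UnicodeMatchProperty: canonical name, or undefined if unlisted.
function unicodeMatchProperty(p) {
  return PROPERTY_NAMES.get(p);
}

console.log(unicodeMatchProperty("scx"));               // Script_Extensions
console.log(unicodeMatchProperty("Scx"));               // undefined
console.log(unicodeMatchProperty("script_extensions")); // undefined
```

A `Map` is used instead of a plain object so that inherited keys such as `constructor` can never be matched by accident, and `Map#get` compares strings exactly, which mirrors the case-sensitive, no-normalization rule.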
The algorithm uses values from the following tables, which associate canonical Unicode property names with their supported values and value aliases:
General_Category
Property value and aliases | Canonical property value |
---|---|
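Cased_Letter, LC | Cased_Letter |
Close_Punctuation, Pe | Close_Punctuation |
Connector_Punctuation, Pc | Connector_Punctuation |
Control, Cc, cntrl | Control |
Currency_Symbol, Sc | Currency_Symbol |
Dash_Punctuation, Pd | Dash_Punctuation |
Decimal_Number, Nd, digit | Decimal_Number |
Enclosing_Mark, Me | Enclosing_Mark |
Final_Punctuation, Pf | Final_Punctuation |
Format, Cf | Format |
Initial_Punctuation, Pi | Initial_Punctuation |
Letter, L | Letter |
Letter_Number, Nl | Letter_Number |
Line_Separator, Zl | Line_Separator |
Lowercase_Letter, Ll | Lowercase_Letter |
Mark, M, Combining_Mark | Mark |
Math_Symbol, Sm | Math_Symbol |
Modifier_Letter, Lm | Modifier_Letter |
Modifier_Symbol, Sk | Modifier_Symbol |
Nonspacing_Mark, Mn | Nonspacing_Mark |
Number, N | Number |
Open_Punctuation, Ps | Open_Punctuation |
Other, C | Other |
Other_Letter, Lo | Other_Letter |
Other_Number, No | Other_Number |
Other_Punctuation, Po | Other_Punctuation |
Other_Symbol, So | Other_Symbol |
Paragraph_Separator, Zp | Paragraph_Separator |
Private_Use, Co | Private_Use |
Punctuation, P, punct | Punctuation |
Separator, Z | Separator |
Space_Separator, Zs | Space_Separator |
Spacing_Mark, Mc | Spacing_Mark |
Surrogate, Cs | Surrogate |
Symbol, S | Symbol |
Titlecase_Letter, Lt | Titlecase_Letter |
Unassigned, Cn | Unassigned |
Uppercase_Letter, Lu | Uppercase_Letter |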
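Because a General_Category value may be written with its long name or its alias, and may also appear alone, the following JavaScript sketch (assuming an ES2018+ engine) checks that `\p{Lu}`, `\p{gc=Lu}`, and `\p{General_Category=Uppercase_Letter}` accept and reject the same sample characters. The sample set is arbitrary.

```javascript
// Three spellings of the same General_Category test: lone value alias,
// alias=alias, and full name=full value.
const forms = [/\p{Lu}/u, /\p{gc=Lu}/u, /\p{General_Category=Uppercase_Letter}/u];
const samples = ["A", "a", "Ω", "1"]; // Ω is U+03A9 GREEK CAPITAL LETTER OMEGA

for (const re of forms) {
  console.log(samples.map((ch) => re.test(ch))); // [ true, false, true, false ]
}
```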
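Script and Script_Extensions
Property value and aliases | Canonical property value |
---|---|
Adlam, Adlm | Adlam |
Ahom | Ahom |
Anatolian_Hieroglyphs, Hluw | Anatolian_Hieroglyphs |
Arabic, Arab | Arabic |
Armenian, Armn | Armenian |
Avestan, Avst | Avestan |
Balinese, Bali | Balinese |
Bamum, Bamu | Bamum |
Bassa_Vah, Bass | Bassa_Vah |
Batak, Batk | Batak |
Bengali, Beng | Bengali |
Bhaiksuki, Bhks | Bhaiksuki |
Bopomofo, Bopo | Bopomofo |
Brahmi, Brah | Brahmi |
Braille, Brai | Braille |
Buginese, Bugi | Buginese |
Buhid, Buhd | Buhid |
Canadian_Aboriginal, Cans | Canadian_Aboriginal |
Carian, Cari | Carian |
Caucasian_Albanian, Aghb | Caucasian_Albanian |
Chakma, Cakm | Chakma |
Cham | Cham |
Cherokee, Cher | Cherokee |
Common, Zyyy | Common |
Coptic, Copt, Qaac | Coptic |
Cuneiform, Xsux | Cuneiform |
Cypriot, Cprt | Cypriot |
Cyrillic, Cyrl | Cyrillic |
Deseret, Dsrt | Deseret |
Devanagari, Deva | Devanagari |
Duployan, Dupl | Duployan |
Egyptian_Hieroglyphs, Egyp | Egyptian_Hieroglyphs |
Elbasan, Elba | Elbasan |
Ethiopic, Ethi | Ethiopic |
Georgian, Geor | Georgian |
Glagolitic, Glag | Glagolitic |
Gothic, Goth | Gothic |
Grantha, Gran | Grantha |
Greek, Grek | Greek |
Gujarati, Gujr | Gujarati |
Gurmukhi, Guru | Gurmukhi |
Han, Hani | Han |
Hangul, Hang | Hangul |
Hanunoo, Hano | Hanunoo |
Hatran, Hatr | Hatran |
Hebrew, Hebr | Hebrew |
Hiragana, Hira | Hiragana |
Imperial_Aramaic, Armi | Imperial_Aramaic |
Inherited, Zinh, Qaai | Inherited |
Inscriptional_Pahlavi, Phli | Inscriptional_Pahlavi |
Inscriptional_Parthian, Prti | Inscriptional_Parthian |
Javanese, Java | Javanese |
Kaithi, Kthi | Kaithi |
Kannada, Knda | Kannada |
Katakana, Kana | Katakana |
Kayah_Li, Kali | Kayah_Li |
Kharoshthi, Khar | Kharoshthi |
Khmer, Khmr | Khmer |
Khojki, Khoj | Khojki |
Khudawadi, Sind | Khudawadi |
Lao, Laoo | Lao |
Latin, Latn | Latin |
Lepcha, Lepc | Lepcha |
Limbu, Limb | Limbu |
Linear_A, Lina | Linear_A |
Linear_B, Linb | Linear_B |
Lisu | Lisu |
Lycian, Lyci | Lycian |
Lydian, Lydi | Lydian |
Mahajani, Mahj | Mahajani |
Malayalam, Mlym | Malayalam |
Mandaic, Mand | Mandaic |
Manichaean, Mani | Manichaean |
Marchen, Marc | Marchen |
Masaram_Gondi, Gonm | Masaram_Gondi |
Meetei_Mayek, Mtei | Meetei_Mayek |
Mende_Kikakui, Mend | Mende_Kikakui |
Meroitic_Cursive, Merc | Meroitic_Cursive |
Meroitic_Hieroglyphs, Mero | Meroitic_Hieroglyphs |
Miao, Plrd | Miao |
Modi | Modi |
Mongolian, Mong | Mongolian |
Mro, Mroo | Mro |
Multani, Mult | Multani |
Myanmar, Mymr | Myanmar |
Nabataean, Nbat | Nabataean |
New_Tai_Lue, Talu | New_Tai_Lue |
Newa | Newa |
Nko, Nkoo | Nko |
Nushu, Nshu | Nushu |
Ogham, Ogam | Ogham |
Ol_Chiki, Olck | Ol_Chiki |
Old_Hungarian, Hung | Old_Hungarian |
Old_Italic, Ital | Old_Italic |
Old_North_Arabian, Narb | Old_North_Arabian |
Old_Permic, Perm | Old_Permic |
Old_Persian, Xpeo | Old_Persian |
Old_South_Arabian, Sarb | Old_South_Arabian |
Old_Turkic, Orkh | Old_Turkic |
Oriya, Orya | Oriya |
Osage, Osge | Osage |
Osmanya, Osma | Osmanya |
Pahawh_Hmong, Hmng | Pahawh_Hmong |
Palmyrene, Palm | Palmyrene |
Pau_Cin_Hau, Pauc | Pau_Cin_Hau |
Phags_Pa, Phag | Phags_Pa |
Phoenician, Phnx | Phoenician |
Psalter_Pahlavi, Phlp | Psalter_Pahlavi |
Rejang, Rjng | Rejang |
Runic, Runr | Runic |
Samaritan, Samr | Samaritan |
Saurashtra, Saur | Saurashtra |
Sharada, Shrd | Sharada |
Shavian, Shaw | Shavian |
Siddham, Sidd | Siddham |
SignWriting, Sgnw | SignWriting |
Sinhala, Sinh | Sinhala |
Sora_Sompeng, Sora | Sora_Sompeng |
Soyombo, Soyo | Soyombo |
Sundanese, Sund | Sundanese |
Syloti_Nagri, Sylo | Syloti_Nagri |
Syriac, Syrc | Syriac |
Tagalog, Tglg | Tagalog |
Tagbanwa, Tagb | Tagbanwa |
Tai_Le, Tale | Tai_Le |
Tai_Tham, Lana | Tai_Tham |
Tai_Viet, Tavt | Tai_Viet |
Takri, Takr | Takri |
Tamil, Taml | Tamil |
Tangut, Tang | Tangut |
Telugu, Telu | Telugu |
Thaana, Thaa | Thaana |
Thai | Thai |
Tibetan, Tibt | Tibetan |
Tifinagh, Tfng | Tifinagh |
Tirhuta, Tirh | Tirhuta |
Ugaritic, Ugar | Ugaritic |
Vai, Vaii | Vai |
Warang_Citi, Wara | Warang_Citi |
Yi, Yiii | Yi |
Zanabazar_Square, Zanb | Zanabazar_Square |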
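Script and Script_Extensions share this value table but can give different answers for the same code point. As an illustration in JavaScript (assuming an ES2018+ engine), U+30FC KATAKANA-HIRAGANA PROLONGED SOUND MARK has Script=Common in the Unicode Character Database, while its Script_Extensions value lists both Hiragana and Katakana.

```javascript
// U+30FC is shared by Hiragana and Katakana text, so its Script is Common,
// but its Script_Extensions names both kana scripts.
const mark = "\u30FC";
console.log(/\p{Script=Katakana}/u.test(mark));            // false
console.log(/\p{Script=Common}/u.test(mark));              // true
console.log(/\p{Script_Extensions=Katakana}/u.test(mark)); // true
console.log(/\p{scx=Hira}/u.test(mark));                   // true
```

In practice, Script_Extensions is usually the better choice when asking "can this character appear in text of script X?", while Script assigns each code point to exactly one value.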
The abstract operation UnicodeMatchPropertyValue takes two parameters p and v, each of which is a
Only the canonical property values and property value aliases listed in
For example, Xpeo and Old_Persian are valid Script_Extensions values, but xpeo and Old Persian aren’t.
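UnicodeMatchPropertyValue can be sketched in the same lookup style, assuming p has already been resolved to a canonical property name (for instance by UnicodeMatchProperty) and v is the value spelling taken from the pattern. The tables and the `unicodeMatchPropertyValue` function below are hypothetical and include only a handful of rows; Script and Script_Extensions point at one shared map, mirroring the combined value table.

```javascript
// Hypothetical subset of the value tables, keyed by canonical property name.
const SCRIPT_VALUES = new Map([
  ["Greek", "Greek"],
  ["Grek", "Greek"],
  ["Old_Persian", "Old_Persian"],
  ["Xpeo", "Old_Persian"],
]);
const PROPERTY_VALUES = new Map([
  ["General_Category", new Map([
    ["Uppercase_Letter", "Uppercase_Letter"],
    ["Lu", "Uppercase_Letter"],
  ])],
  ["Script", SCRIPT_VALUES],
  ["Script_Extensions", SCRIPT_VALUES],
]);

// Sketch: returns the canonical value, or undefined when the (p, v) pair is
// not listed (a conforming engine rejects such a pattern as a SyntaxError).
function unicodeMatchPropertyValue(p, v) {
  const values = PROPERTY_VALUES.get(p);
  return values === undefined ? undefined : values.get(v);
}

console.log(unicodeMatchPropertyValue("Script_Extensions", "Xpeo"));        // Old_Persian
console.log(unicodeMatchPropertyValue("Script_Extensions", "Old_Persian")); // Old_Persian
console.log(unicodeMatchPropertyValue("Script_Extensions", "xpeo"));        // undefined
console.log(unicodeMatchPropertyValue("Script_Extensions", "Old Persian")); // undefined
```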
This algorithm differs from the matching rules for symbolic values listed in UAX44: case, white space, U+002D (HYPHEN-MINUS), and U+005F (LOW LINE) are not ignored, and the Is prefix is not supported.
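These departures from UAX44 loose matching are observable in a conforming engine (assumed to be ES2018 or later). The helper `isAccepted` is hypothetical; it only reports whether compiling the pattern with the `u` flag throws a `SyntaxError`.

```javascript
// Hypothetical helper: true if the pattern compiles with the u flag.
function isAccepted(source) {
  try {
    new RegExp(source, "u");
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

const cases = [
  ["\\p{Script=Old_Persian}", true],  // canonical value
  ["\\p{Script=Xpeo}", true],         // value alias
  ["\\p{Script=old_persian}", false], // case is significant
  ["\\p{Script=Old Persian}", false], // white space is not ignored
  ["\\p{Script=Old-Persian}", false], // U+002D is not ignored
  ["\\p{Script=OldPersian}", false],  // U+005F cannot be dropped
  ["\\p{IsGreek}", false],            // no Is prefix
];

for (const [source, expected] of cases) {
  console.log(source, isAccepted(source), expected);
}
```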
The following is appended to the list of productions in 21.2.2.12 CharacterClassEscape.
The production
The production
The production
The production
"General_Category"
, LoneUnicodePropertyNameOrValue) is identical to a General_Category with value LoneUnicodePropertyNameOrValue.
The following is appended to the bibliography.
Script Property, available at <https://unicode.org/reports/tr24/>