The syntax listed in 21.2.1 Patterns is modified as follows.
The following items are appended to 21.2.1.1 Static Semantics: Early Errors.
The following two abstract operations are appended to 21.2.2.8 Atom.
The algorithm uses values from the following tables, which associate supported Unicode property names and property aliases and their canonical property names.
Implementations must support the following non-binary Unicode properties and their property aliases:
| Property name and aliases | Canonical property name |
|---|---|
| General_Category, gc | General_Category |
| Script, sc | Script |
| Script_Extensions, scx | Script_Extensions |
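As a quick check of the three non-binary properties, the TypeScript sketch below compiles `\p{name=value}` escapes that use both a canonical property name and its alias. It assumes a runtime whose `RegExp` already implements Unicode property escapes with the `u` flag. The helper `matches` is hypothetical. Patterns are built with `new RegExp` so that the engine, not the TypeScript compiler, parses them.

```typescript
// Hypothetical helper: compile the pattern with the "u" flag at run time
// (so the engine parses the escape) and test it against the input.
function matches(pattern: string, input: string): boolean {
  return new RegExp(pattern, "u").test(input);
}

// General_Category (alias gc): "A" is an Uppercase_Letter (Lu).
const gcLong = matches("^\\p{General_Category=Uppercase_Letter}$", "A");
const gcShort = matches("^\\p{gc=Lu}$", "A");

// Script (alias sc): U+03B1 GREEK SMALL LETTER ALPHA.
const scLong = matches("^\\p{Script=Greek}$", "\u03B1");
const scShort = matches("^\\p{sc=Grek}$", "\u03B1");

// Script_Extensions (alias scx): same character, same value.
const scxShort = matches("^\\p{scx=Greek}$", "\u03B1");

console.log(gcLong, gcShort, scLong, scShort, scxShort); // true true true true true
```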
Additionally, implementations must support the following binary Unicode properties and their property aliases:
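| Property name and aliases | Canonical property name |
|---|---|
| ASCII | ASCII |
| ASCII_Hex_Digit, AHex | ASCII_Hex_Digit |
| Alphabetic, Alpha | Alphabetic |
| Any | Any |
| Assigned | Assigned |
| Bidi_Control, Bidi_C | Bidi_Control |
| Bidi_Mirrored, Bidi_M | Bidi_Mirrored |
| Case_Ignorable, CI | Case_Ignorable |
| Cased | Cased |
| Changes_When_Casefolded, CWCF | Changes_When_Casefolded |
| Changes_When_Casemapped, CWCM | Changes_When_Casemapped |
| Changes_When_Lowercased, CWL | Changes_When_Lowercased |
| Changes_When_NFKC_Casefolded, CWKCF | Changes_When_NFKC_Casefolded |
| Changes_When_Titlecased, CWT | Changes_When_Titlecased |
| Changes_When_Uppercased, CWU | Changes_When_Uppercased |
| Dash | Dash |
| Default_Ignorable_Code_Point, DI | Default_Ignorable_Code_Point |
| Deprecated, Dep | Deprecated |
| Diacritic, Dia | Diacritic |
| Emoji | Emoji |
| Emoji_Component | Emoji_Component |
| Emoji_Modifier | Emoji_Modifier |
| Emoji_Modifier_Base | Emoji_Modifier_Base |
| Emoji_Presentation | Emoji_Presentation |
| Extender, Ext | Extender |
| Grapheme_Base, Gr_Base | Grapheme_Base |
| Grapheme_Extend, Gr_Ext | Grapheme_Extend |
| Hex_Digit, Hex | Hex_Digit |
| IDS_Binary_Operator, IDSB | IDS_Binary_Operator |
| IDS_Trinary_Operator, IDST | IDS_Trinary_Operator |
| ID_Continue, IDC | ID_Continue |
| ID_Start, IDS | ID_Start |
| Ideographic, Ideo | Ideographic |
| Join_Control, Join_C | Join_Control |
| Logical_Order_Exception, LOE | Logical_Order_Exception |
| Lowercase, Lower | Lowercase |
| Math | Math |
| Noncharacter_Code_Point, NChar | Noncharacter_Code_Point |
| Pattern_Syntax, Pat_Syn | Pattern_Syntax |
| Pattern_White_Space, Pat_WS | Pattern_White_Space |
| Quotation_Mark, QMark | Quotation_Mark |
| Radical | Radical |
| Regional_Indicator, RI | Regional_Indicator |
| Sentence_Terminal, STerm | Sentence_Terminal |
| Soft_Dotted, SD | Soft_Dotted |
| Terminal_Punctuation, Term | Terminal_Punctuation |
| Unified_Ideograph, UIdeo | Unified_Ideograph |
| Uppercase, Upper | Uppercase |
| Variation_Selector, VS | Variation_Selector |
| White_Space, space | White_Space |
| XID_Continue, XIDC | XID_Continue |
| XID_Start, XIDS | XID_Start |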
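Binary properties take no value. The following TypeScript sketch assumes a runtime that supports property escapes. It shows that a canonical name and its alias select the same code points, and that `\P{...}` selects the complement. The helper `matches` is hypothetical.

```typescript
// Hypothetical helper: compile with the "u" flag at run time and test.
function matches(pattern: string, input: string): boolean {
  return new RegExp(pattern, "u").test(input);
}

// Binary properties are written alone, with no "=value" part.
const alphaLong = matches("^\\p{Alphabetic}$", "x"); // canonical name
const alphaShort = matches("^\\p{Alpha}$", "x");     // property alias
// \P{...} matches code points that do NOT have the property.
const notAscii = matches("^\\P{ASCII}$", "\u00E9");  // U+00E9 is outside ASCII
// U+00A0 NO-BREAK SPACE has the White_Space property.
const nbsp = matches("^\\p{White_Space}$", "\u00A0");

console.log(alphaLong, alphaShort, notAscii, nbsp); // true true true true
```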
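The abstract operation UnicodeMatchProperty takes a parameter p that is a List of Unicode code points and returns the canonical property name that the tables above associate with p.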
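Implementations may recognize only the exact spellings listed in the tables, so UnicodeMatchProperty behaves like a case-sensitive table lookup. The sketch below models that reading in TypeScript with a few rows of the tables. The names `PROPERTY_NAMES` and `unicodeMatchProperty` are hypothetical, and the code is an illustration rather than the normative algorithm.

```typescript
// Hypothetical sketch: a handful of rows from the property-name tables,
// keyed by every accepted spelling (canonical name or alias).
const PROPERTY_NAMES: ReadonlyMap<string, string> = new Map<string, string>([
  ["General_Category", "General_Category"],
  ["gc", "General_Category"],
  ["Script", "Script"],
  ["sc", "Script"],
  ["Script_Extensions", "Script_Extensions"],
  ["scx", "Script_Extensions"],
  ["Alphabetic", "Alphabetic"],
  ["Alpha", "Alphabetic"],
]);

// Exact, case-sensitive lookup: no case folding and no removal of "_", "-"
// or spaces. Unknown spellings yield undefined here; a real engine instead
// rejects the whole pattern with a SyntaxError.
function unicodeMatchProperty(p: string): string | undefined {
  return PROPERTY_NAMES.get(p);
}

console.log(unicodeMatchProperty("scx")); // Script_Extensions
console.log(unicodeMatchProperty("Scx")); // undefined
```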
To ensure interoperability, implementations must not extend Unicode property support to properties that are not listed in the tables above.
Implementations must recognize only the property names and property aliases listed in the tables above.
Implementations must recognize only the property value aliases and canonical property value names listed in the property value tables that follow.
For example, Script_Extensions (property name) and scx (property alias) are valid, but script_extensions or Scx aren’t.
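These restrictions are observable from script. The sketch below assumes an engine that conforms to them and reports whether each pattern compiles. `compiles` is a hypothetical helper that treats a `SyntaxError` as rejection. `Line_Break` serves as an example of a Unicode property that is absent from the tables above.

```typescript
// Hypothetical helper: true if the engine accepts the pattern,
// false if construction throws a SyntaxError.
function compiles(pattern: string): boolean {
  try {
    new RegExp(pattern, "u");
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

const cases: Array<[string, boolean]> = [
  ["\\p{Script_Extensions=Greek}", true],  // canonical property name
  ["\\p{scx=Greek}", true],                // listed property alias
  ["\\p{script_extensions=Greek}", false], // case differs from the table
  ["\\p{Scx=Greek}", false],               // case differs from the table
  ["\\p{Line_Break=Alphabetic}", false],   // property not in the tables
];

for (const [pattern, expected] of cases) {
  console.log(pattern, compiles(pattern) === expected ? "ok" : "MISMATCH");
}
```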
The listed properties form a superset of what UTS18 RL1.2 requires.
The algorithm uses values from the following tables, which associate canonical Unicode property names and their supported values and value aliases:
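Values and value aliases for General_Category:
| Property value and aliases | Canonical property value |
|---|---|
| Cased_Letter, LC | Cased_Letter |
| Close_Punctuation, Pe | Close_Punctuation |
| Connector_Punctuation, Pc | Connector_Punctuation |
| Control, Cc, cntrl | Control |
| Currency_Symbol, Sc | Currency_Symbol |
| Dash_Punctuation, Pd | Dash_Punctuation |
| Decimal_Number, Nd, digit | Decimal_Number |
| Enclosing_Mark, Me | Enclosing_Mark |
| Final_Punctuation, Pf | Final_Punctuation |
| Format, Cf | Format |
| Initial_Punctuation, Pi | Initial_Punctuation |
| Letter, L | Letter |
| Letter_Number, Nl | Letter_Number |
| Line_Separator, Zl | Line_Separator |
| Lowercase_Letter, Ll | Lowercase_Letter |
| Mark, M, Combining_Mark | Mark |
| Math_Symbol, Sm | Math_Symbol |
| Modifier_Letter, Lm | Modifier_Letter |
| Modifier_Symbol, Sk | Modifier_Symbol |
| Nonspacing_Mark, Mn | Nonspacing_Mark |
| Number, N | Number |
| Open_Punctuation, Ps | Open_Punctuation |
| Other, C | Other |
| Other_Letter, Lo | Other_Letter |
| Other_Number, No | Other_Number |
| Other_Punctuation, Po | Other_Punctuation |
| Other_Symbol, So | Other_Symbol |
| Paragraph_Separator, Zp | Paragraph_Separator |
| Private_Use, Co | Private_Use |
| Punctuation, P, punct | Punctuation |
| Separator, Z | Separator |
| Space_Separator, Zs | Space_Separator |
| Spacing_Mark, Mc | Spacing_Mark |
| Surrogate, Cs | Surrogate |
| Symbol, S | Symbol |
| Titlecase_Letter, Lt | Titlecase_Letter |
| Unassigned, Cn | Unassigned |
| Uppercase_Letter, Lu | Uppercase_Letter |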
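The General_Category values above can be written alone or in `name=value` form. This TypeScript sketch assumes a conforming runtime and uses a hypothetical `matches` helper. The sample inputs are U+00DF (a Lowercase_Letter) and U+0663 ARABIC-INDIC DIGIT THREE (a Decimal_Number).

```typescript
// Hypothetical helper: compile with the "u" flag at run time and test.
function matches(pattern: string, input: string): boolean {
  return new RegExp(pattern, "u").test(input);
}

// A General_Category value may appear alone or after "gc=" / "General_Category=",
// and the long name and its alias are interchangeable.
const letterForms = ["\\p{L}", "\\p{Letter}", "\\p{gc=L}", "\\p{General_Category=Letter}"];
// U+00DF LATIN SMALL LETTER SHARP S is a Lowercase_Letter, hence also a Letter.
const allLetter = letterForms.every((form) => matches(`^${form}$`, "\u00DF"));
const notUpper = !matches("^\\p{Lu}$", "\u00DF");

// Decimal_Number (Nd) is not limited to ASCII 0-9.
const arabicIndicThree = matches("^\\p{gc=Nd}$", "\u0663");

console.log(allLetter, notUpper, arabicIndicThree); // true true true
```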
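Values and value aliases for Script and Script_Extensions:
| Property value and aliases | Canonical property value |
|---|---|
| Adlam, Adlm | Adlam |
| Ahom | Ahom |
| Anatolian_Hieroglyphs, Hluw | Anatolian_Hieroglyphs |
| Arabic, Arab | Arabic |
| Armenian, Armn | Armenian |
| Avestan, Avst | Avestan |
| Balinese, Bali | Balinese |
| Bamum, Bamu | Bamum |
| Bassa_Vah, Bass | Bassa_Vah |
| Batak, Batk | Batak |
| Bengali, Beng | Bengali |
| Bhaiksuki, Bhks | Bhaiksuki |
| Bopomofo, Bopo | Bopomofo |
| Brahmi, Brah | Brahmi |
| Braille, Brai | Braille |
| Buginese, Bugi | Buginese |
| Buhid, Buhd | Buhid |
| Canadian_Aboriginal, Cans | Canadian_Aboriginal |
| Carian, Cari | Carian |
| Caucasian_Albanian, Aghb | Caucasian_Albanian |
| Chakma, Cakm | Chakma |
| Cham | Cham |
| Cherokee, Cher | Cherokee |
| Common, Zyyy | Common |
| Coptic, Copt, Qaac | Coptic |
| Cuneiform, Xsux | Cuneiform |
| Cypriot, Cprt | Cypriot |
| Cyrillic, Cyrl | Cyrillic |
| Deseret, Dsrt | Deseret |
| Devanagari, Deva | Devanagari |
| Duployan, Dupl | Duployan |
| Egyptian_Hieroglyphs, Egyp | Egyptian_Hieroglyphs |
| Elbasan, Elba | Elbasan |
| Ethiopic, Ethi | Ethiopic |
| Georgian, Geor | Georgian |
| Glagolitic, Glag | Glagolitic |
| Gothic, Goth | Gothic |
| Grantha, Gran | Grantha |
| Greek, Grek | Greek |
| Gujarati, Gujr | Gujarati |
| Gurmukhi, Guru | Gurmukhi |
| Han, Hani | Han |
| Hangul, Hang | Hangul |
| Hanunoo, Hano | Hanunoo |
| Hatran, Hatr | Hatran |
| Hebrew, Hebr | Hebrew |
| Hiragana, Hira | Hiragana |
| Imperial_Aramaic, Armi | Imperial_Aramaic |
| Inherited, Zinh, Qaai | Inherited |
| Inscriptional_Pahlavi, Phli | Inscriptional_Pahlavi |
| Inscriptional_Parthian, Prti | Inscriptional_Parthian |
| Javanese, Java | Javanese |
| Kaithi, Kthi | Kaithi |
| Kannada, Knda | Kannada |
| Katakana, Kana | Katakana |
| Kayah_Li, Kali | Kayah_Li |
| Kharoshthi, Khar | Kharoshthi |
| Khmer, Khmr | Khmer |
| Khojki, Khoj | Khojki |
| Khudawadi, Sind | Khudawadi |
| Lao, Laoo | Lao |
| Latin, Latn | Latin |
| Lepcha, Lepc | Lepcha |
| Limbu, Limb | Limbu |
| Linear_A, Lina | Linear_A |
| Linear_B, Linb | Linear_B |
| Lisu | Lisu |
| Lycian, Lyci | Lycian |
| Lydian, Lydi | Lydian |
| Mahajani, Mahj | Mahajani |
| Malayalam, Mlym | Malayalam |
| Mandaic, Mand | Mandaic |
| Manichaean, Mani | Manichaean |
| Marchen, Marc | Marchen |
| Masaram_Gondi, Gonm | Masaram_Gondi |
| Meetei_Mayek, Mtei | Meetei_Mayek |
| Mende_Kikakui, Mend | Mende_Kikakui |
| Meroitic_Cursive, Merc | Meroitic_Cursive |
| Meroitic_Hieroglyphs, Mero | Meroitic_Hieroglyphs |
| Miao, Plrd | Miao |
| Modi | Modi |
| Mongolian, Mong | Mongolian |
| Mro, Mroo | Mro |
| Multani, Mult | Multani |
| Myanmar, Mymr | Myanmar |
| Nabataean, Nbat | Nabataean |
| New_Tai_Lue, Talu | New_Tai_Lue |
| Newa | Newa |
| Nko, Nkoo | Nko |
| Nushu, Nshu | Nushu |
| Ogham, Ogam | Ogham |
| Ol_Chiki, Olck | Ol_Chiki |
| Old_Hungarian, Hung | Old_Hungarian |
| Old_Italic, Ital | Old_Italic |
| Old_North_Arabian, Narb | Old_North_Arabian |
| Old_Permic, Perm | Old_Permic |
| Old_Persian, Xpeo | Old_Persian |
| Old_South_Arabian, Sarb | Old_South_Arabian |
| Old_Turkic, Orkh | Old_Turkic |
| Oriya, Orya | Oriya |
| Osage, Osge | Osage |
| Osmanya, Osma | Osmanya |
| Pahawh_Hmong, Hmng | Pahawh_Hmong |
| Palmyrene, Palm | Palmyrene |
| Pau_Cin_Hau, Pauc | Pau_Cin_Hau |
| Phags_Pa, Phag | Phags_Pa |
| Phoenician, Phnx | Phoenician |
| Psalter_Pahlavi, Phlp | Psalter_Pahlavi |
| Rejang, Rjng | Rejang |
| Runic, Runr | Runic |
| Samaritan, Samr | Samaritan |
| Saurashtra, Saur | Saurashtra |
| Sharada, Shrd | Sharada |
| Shavian, Shaw | Shavian |
| Siddham, Sidd | Siddham |
| SignWriting, Sgnw | SignWriting |
| Sinhala, Sinh | Sinhala |
| Sora_Sompeng, Sora | Sora_Sompeng |
| Soyombo, Soyo | Soyombo |
| Sundanese, Sund | Sundanese |
| Syloti_Nagri, Sylo | Syloti_Nagri |
| Syriac, Syrc | Syriac |
| Tagalog, Tglg | Tagalog |
| Tagbanwa, Tagb | Tagbanwa |
| Tai_Le, Tale | Tai_Le |
| Tai_Tham, Lana | Tai_Tham |
| Tai_Viet, Tavt | Tai_Viet |
| Takri, Takr | Takri |
| Tamil, Taml | Tamil |
| Tangut, Tang | Tangut |
| Telugu, Telu | Telugu |
| Thaana, Thaa | Thaana |
| Thai | Thai |
| Tibetan, Tibt | Tibetan |
| Tifinagh, Tfng | Tifinagh |
| Tirhuta, Tirh | Tirhuta |
| Ugaritic, Ugar | Ugaritic |
| Vai, Vaii | Vai |
| Warang_Citi, Wara | Warang_Citi |
| Yi, Yiii | Yi |
| Zanabazar_Square, Zanb | Zanabazar_Square |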
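Script and Script_Extensions share the value table above, but they can classify the same code point differently. The TypeScript sketch below assumes a conforming runtime and a hypothetical `matches` helper. Its sample input is U+30FC KATAKANA-HIRAGANA PROLONGED SOUND MARK, which has Script=Common while its Script_Extensions value lists Hiragana and Katakana.

```typescript
// Hypothetical helper: compile with the "u" flag at run time and test.
function matches(pattern: string, input: string): boolean {
  return new RegExp(pattern, "u").test(input);
}

// U+30FC is shared by Hiragana and Katakana text, so its Script is Common,
// while Script_Extensions records both scripts that use it.
const prolonged = "\u30FC";
const scKatakana = matches("^\\p{sc=Kana}$", prolonged);   // false
const scCommon = matches("^\\p{sc=Zyyy}$", prolonged);     // true
const scxKatakana = matches("^\\p{scx=Kana}$", prolonged); // true
const scxHiragana = matches("^\\p{scx=Hira}$", prolonged); // true

console.log(scKatakana, scCommon, scxKatakana, scxHiragana); // false true true true
```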
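The abstract operation UnicodeMatchPropertyValue takes two parameters p and v, each of which is a List of Unicode code points, and returns the canonical property value that the tables above associate with the value v of the property p.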
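UnicodeMatchPropertyValue can be read the same way as UnicodeMatchProperty: an exact lookup of v in the value table for property p. The TypeScript below is an illustrative sketch with a few table rows. `GC_VALUES`, `SCRIPT_VALUES`, `PROPERTY_VALUES` and `unicodeMatchPropertyValue` are hypothetical names and are not part of this specification.

```typescript
// Hypothetical sketch: a few rows of the value tables. Script and
// Script_Extensions share one value table, as they do above.
const GC_VALUES = new Map<string, string>([
  ["Letter", "Letter"],
  ["L", "Letter"],
  ["Lowercase_Letter", "Lowercase_Letter"],
  ["Ll", "Lowercase_Letter"],
]);
const SCRIPT_VALUES = new Map<string, string>([
  ["Greek", "Greek"],
  ["Grek", "Greek"],
  ["Old_Persian", "Old_Persian"],
  ["Xpeo", "Old_Persian"],
]);
// Keyed by canonical property name, i.e. the result of UnicodeMatchProperty.
const PROPERTY_VALUES = new Map<string, ReadonlyMap<string, string>>([
  ["General_Category", GC_VALUES],
  ["Script", SCRIPT_VALUES],
  ["Script_Extensions", SCRIPT_VALUES],
]);

// Exact, case-sensitive lookup of value v for canonical property p.
function unicodeMatchPropertyValue(p: string, v: string): string | undefined {
  return PROPERTY_VALUES.get(p)?.get(v);
}

console.log(unicodeMatchPropertyValue("Script_Extensions", "Xpeo")); // Old_Persian
console.log(unicodeMatchPropertyValue("Script_Extensions", "xpeo")); // undefined
```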
Only the canonical property values and property value aliases listed in the tables above are recognized.
For example, Xpeo and Old_Persian are valid Script_Extensions values, but xpeo and Old Persian aren’t.
This algorithm differs from the matching rules for symbolic values listed in UAX44: case, white space, U+002D (HYPHEN-MINUS), and U+005F (LOW LINE) are not ignored, and the Is prefix is not supported.
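To make the contrast with UAX44 concrete, the sketch below pairs an approximation of UAX44 loose matching (rule UAX44-LM3) with checks against an engine that follows the exact rules described here. `looseKey` and `compiles` are hypothetical helpers. `looseKey` is a simplification written only for comparison.

```typescript
// For contrast only: an approximation of UAX44 loose matching (UAX44-LM3),
// which ignores case, white space, "_", "-", and an initial "is" prefix.
// This is NOT how property values are matched here.
function looseKey(name: string): string {
  const folded = name.toLowerCase().replace(/[\s_-]/g, "");
  return folded.startsWith("is") ? folded.slice(2) : folded;
}

// Hypothetical helper: true if the engine accepts the pattern,
// false if construction throws a SyntaxError.
function compiles(pattern: string): boolean {
  try {
    new RegExp(pattern, "u");
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// Under loose matching these spellings collapse to the same key...
console.log(looseKey("Old Persian") === looseKey("Old_Persian")); // true
console.log(looseKey("IsLu") === looseKey("Lu"));                 // true

// ...but an engine that matches exactly accepts only the listed spellings.
console.log(compiles("\\p{scx=Old_Persian}")); // true
console.log(compiles("\\p{scx=Xpeo}"));        // true
console.log(compiles("\\p{scx=xpeo}"));        // false
console.log(compiles("\\p{scx=Old Persian}")); // false
console.log(compiles("\\p{IsLu}"));            // false
```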
The following is appended to the list of productions in 21.2.2.12 CharacterClassEscape.
The production
The production
The production
The production
"General_Category", LoneUnicodePropertyNameOrValue) is identical to a General_Category with value LoneUnicodePropertyNameOrValue.
The following is appended to the bibliography.
Unicode Standard Annex #24, Unicode Script Property, available at <https://unicode.org/reports/tr24/>