Stage 2 Draft / July 5, 2021

It is a Syntax Error if the List of Unicode code points that is SourceText of UnicodePropertyName is not identical to a List of Unicode code points that is a Unicode property name or property alias listed in the “Property name and aliases” column of Table 51.
It is a Syntax Error if the List of Unicode code points that is SourceText of UnicodePropertyValue is not identical to a List of Unicode code points that is a value or value alias for the Unicode property or property alias given by SourceText of UnicodePropertyName listed in the “Property value and aliases” column of the corresponding tables Table 53 or Table 54.

UnicodePropertyValueOrSequenceExpression::LoneUnicodePropertyNameOrValue

It is a Syntax Error if the List of Unicode code points that is SourceText of LoneUnicodePropertyNameOrValue is not identical to a List of Unicode code points that is a Unicode general category or general category alias listed in the “Property value and aliases” column of Table 53, nor a binary property or binary property alias listed in the “Property name and aliases” column of Table 52, nor a sequence property or sequence property alias listed in the “Property name and aliases” column of Table 1.
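This early error is an exact, case-sensitive membership test against three name lists. The TypeScript sketch below is a rough model of it. It assumes hypothetical, abbreviated copies of Table 52 and Table 53 (only Table 1 is reproduced in full), and `checkLoneNameOrValue` is an illustrative name, not an API of any engine.

```typescript
// Hypothetical, abbreviated stand-in for the "Property value and aliases"
// column of Table 53 (General_Category values and aliases).
const GENERAL_CATEGORY_VALUES: ReadonlySet<string> = new Set([
  "L", "Letter", "Lu", "Uppercase_Letter", "Nd", "Decimal_Number",
]);

// Hypothetical, abbreviated stand-in for the "Property name and aliases"
// column of Table 52 (binary properties and aliases).
const BINARY_PROPERTIES: ReadonlySet<string> = new Set([
  "Alphabetic", "Alpha", "ASCII", "Emoji",
]);

// The "Property name and aliases" column of Table 1, in full.
const SEQUENCE_PROPERTIES: ReadonlySet<string> = new Set([
  "Basic_Emoji",
  "RGI_Emoji_Modifier_Sequence",
  "RGI_Emoji_Tag_Sequence",
  "RGI_Emoji_ZWJ_Sequence",
  "RGI_Emoji",
]);

// Throws a SyntaxError unless sourceText is identical to an entry in one of
// the three lists. The comparison is exact: no case folding, no loose matching.
function checkLoneNameOrValue(sourceText: string): void {
  if (
    GENERAL_CATEGORY_VALUES.has(sourceText) ||
    BINARY_PROPERTIES.has(sourceText) ||
    SEQUENCE_PROPERTIES.has(sourceText)
  ) {
    return;
  }
  throw new SyntaxError(`Invalid Unicode property name or value: ${sourceText}`);
}
```

With these tables, `checkLoneNameOrValue("RGI_Emoji")` returns normally, while `checkLoneNameOrValue("rgi_emoji")` throws because the two code point lists are not identical.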

21.2.2.8.3 Runtime Semantics: UnicodeMatchProperty is changed as follows:

Implementations must support the Unicode property names and aliases listed in Table 51 ~~and~~, Table 52, and Table 1. To ensure interoperability, implementations must not support any other property names or aliases.

Table 1: Unicode sequence property aliases and their canonical property names

Property name and aliases	Canonical property name
`Basic_Emoji`	`Basic_Emoji`
`RGI_Emoji_Modifier_Sequence`	`RGI_Emoji_Modifier_Sequence`
`RGI_Emoji_Tag_Sequence`	`RGI_Emoji_Tag_Sequence`
`RGI_Emoji_ZWJ_Sequence`	`RGI_Emoji_ZWJ_Sequence`
`RGI_Emoji`	`RGI_Emoji`

21.2.2.1 Notation is changed as follows:

Furthermore, the descriptions below use the following internal data structures:

A CharSet is a mathematical set of characters, either code units or code points depending upon the state of the Unicode flag. “All characters” means either all code unit values or all code point values, also depending upon the state of the Unicode flag.
A SequenceSet is a mathematical set of sequences of code points.
A State is an ordered pair (endIndex, captures) where endIndex is an integer and captures is a List of NcapturingParens values. States are used to represent partial match states in the regular expression matching algorithms. The endIndex is one plus the index of the last input character matched so far by the pattern, while captures holds the results of capturing parentheses. The nth element of captures is either a List that represents the value obtained by the nth set of capturing parentheses or undefined if the nth set of capturing parentheses hasn't been reached yet. Due to backtracking, many States may be in use at any time during the matching process.
A MatchResult is either a State or the special token failure that indicates that the match failed.
A Continuation procedure is an internal closure (i.e. an internal procedure with some arguments already bound to values) that takes one State argument and returns a MatchResult result. If an internal closure references variables which are bound in the function that creates the closure, the closure uses the values that these variables had at the time the closure was created. The Continuation attempts to match the remaining portion (specified by the closure's already-bound arguments) of the pattern against Input, starting at the intermediate state given by its State argument. If the match succeeds, the Continuation returns the final State that it reached; if the match fails, the Continuation returns failure.
A Matcher procedure is an internal closure that takes two arguments — a State and a Continuation — and returns a MatchResult result. A Matcher attempts to match a middle subpattern (specified by the closure's already-bound arguments) of the pattern against Input, starting at the intermediate state given by its State argument. The Continuation argument should be a closure that matches the rest of the pattern. After matching the subpattern of a pattern to obtain a new State, the Matcher then calls Continuation on that new State to test if the rest of the pattern can match as well. If it can, the Matcher returns the State returned by Continuation; if not, the Matcher may try different choices at its choice points, repeatedly calling Continuation until it either succeeds or all possibilities have been exhausted.
An AssertionTester procedure is an internal closure that takes a State argument and returns a Boolean result. The assertion tester tests a specific condition (specified by the closure's already-bound arguments) against the current place in Input and returns true if the condition matched or false if not.
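The internal data structures above can be modeled as TypeScript types, with Continuation and Matcher as ordinary closures. This is a sketch under stated assumptions: Unicode mode only (so a CharSet holds code point values), a simplified forward-only character matcher with no case folding, and hypothetical names `characterSetMatcher` and `endOfInput` that are illustrative rather than taken from the specification.

```typescript
// A CharSet in Unicode mode: a set of code point values.
type CharSet = ReadonlySet<number>;
// A SequenceSet: sequences of code points (uniqueness is not enforced here).
type SequenceSet = ReadonlyArray<ReadonlyArray<number>>;
// A State: endIndex plus one captures slot per capturing group.
interface State {
  readonly endIndex: number;
  readonly captures: ReadonlyArray<ReadonlyArray<number> | undefined>;
}
type MatchResult = State | "failure";
type Continuation = (x: State) => MatchResult;
type Matcher = (x: State, c: Continuation) => MatchResult;
type AssertionTester = (x: State) => boolean;

// Simplified character-set Matcher: consumes one code point if it is in cs,
// then passes the new State to the Continuation for the rest of the pattern.
function characterSetMatcher(input: ReadonlyArray<number>, cs: CharSet): Matcher {
  return (x, c) => {
    const e = x.endIndex;
    if (e >= input.length || !cs.has(input[e])) return "failure";
    return c({ endIndex: e + 1, captures: x.captures });
  };
}

// An AssertionTester that holds only at the end of the input.
function endOfInput(input: ReadonlyArray<number>): AssertionTester {
  return (x) => x.endIndex === input.length;
}
```

Composing `a(start, (y) => b(y, finish))`, where `finish` accepts a State only when `endOfInput` holds, matches one character from each set in turn and succeeds only if the Continuation confirms that the whole input was consumed.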

21.2.2.12 CharacterClassEscape is changed as follows:

The production CharacterClassEscape::p{UnicodePropertyValueOrSequenceExpression} evaluates as follows:

1. Let v be the return value of UnicodePropertyValueOrSequenceExpression.
2. If v is a CharSet, then
   a. Return the CharSet containing all Unicode code points included in v.
3. Assert: v is a SequenceSet.
4. Return the Disjunction containing an Alternative for each of the Unicode code point sequences in v.
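One way to read “the Disjunction containing an Alternative for each of the Unicode code point sequences” is as a Matcher that tries each sequence in turn and falls through to the next one when the Continuation fails. The sketch below models that with a simplified State. The draft does not specify the order of the Alternatives; trying longer sequences first is an assumption made here so that an emoji ZWJ sequence is preferred over a shorter sequence that is its prefix. `sequenceMatcher` and `disjunctionOfSequences` are hypothetical names.

```typescript
// Simplified State: captures are not inspected in this sketch.
interface State {
  readonly endIndex: number;
  readonly captures: ReadonlyArray<unknown>;
}
type MatchResult = State | "failure";
type Continuation = (x: State) => MatchResult;
type Matcher = (x: State, c: Continuation) => MatchResult;

// One Alternative: matches an exact code point sequence at endIndex.
function sequenceMatcher(input: ReadonlyArray<number>, seq: ReadonlyArray<number>): Matcher {
  return (x, c) => {
    const e = x.endIndex;
    if (e + seq.length > input.length) return "failure";
    for (let i = 0; i < seq.length; i++) {
      if (input[e + i] !== seq[i]) return "failure";
    }
    return c({ endIndex: e + seq.length, captures: x.captures });
  };
}

// The Disjunction: try each Alternative; on failure, backtrack to the next.
function disjunctionOfSequences(
  input: ReadonlyArray<number>,
  sequences: ReadonlyArray<ReadonlyArray<number>>,
): Matcher {
  // Assumption: longest sequences first (order is not fixed by the draft).
  const alternatives = sequences
    .slice()
    .sort((p, q) => q.length - p.length)
    .map((s) => sequenceMatcher(input, s));
  return (x, c) => {
    for (const m of alternatives) {
      const r = m(x, c);
      if (r !== "failure") return r;
    }
    return "failure";
  };
}
```

For the input U+1F468 U+200D U+1F4BB (MAN, ZERO WIDTH JOINER, PERSONAL COMPUTER) and a hypothetical SequenceSet holding both that ZWJ sequence and the single code point U+1F468, the Matcher first consumes all three code points; if the Continuation rejects that State, backtracking falls through to the one-code-point Alternative.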

The production CharacterClassEscape::P{UnicodePropertyValueExpression} evaluates by returning the CharSet containing all Unicode code points not included in the CharSet returned by UnicodePropertyValueExpression.
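This step only ever complements a CharSet: the production uses UnicodePropertyValueExpression rather than the sequence-capable UnicodePropertyValueOrSequenceExpression, so no SequenceSet reaches it. A minimal sketch of the complement over U+0000 to U+10FFFF, assuming a hypothetical representation of a CharSet as sorted, non-overlapping, inclusive code point ranges:

```typescript
// Inclusive code point range [lo, hi].
type Range = readonly [number, number];
const MAX_CODE_POINT = 0x10ffff;

// Complement of a CharSet given as sorted, non-overlapping ranges.
function complement(ranges: ReadonlyArray<Range>): Range[] {
  const out: Range[] = [];
  let next = 0; // smallest code point not yet covered or emitted
  for (const [lo, hi] of ranges) {
    if (lo > next) out.push([next, lo - 1]); // gap before this range
    next = Math.max(next, hi + 1);
  }
  if (next <= MAX_CODE_POINT) out.push([next, MAX_CODE_POINT]); // tail gap
  return out;
}
```

For example, the complement of the single range U+0030..U+0039 (the ASCII digits) is the two ranges U+0000..U+002F and U+003A..U+10FFFF.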

The productions UnicodePropertyValueExpression::UnicodePropertyName=UnicodePropertyValue and UnicodePropertyValueOrSequenceExpression::UnicodePropertyName=UnicodePropertyValue evaluate as follows:

1. Let ps be SourceText of UnicodePropertyName.
2. Let p be ! UnicodeMatchProperty(ps).
3. Assert: p is a Unicode property name or property alias listed in the “Property name and aliases” column of Table 51.
4. Let vs be SourceText of UnicodePropertyValue.
5. Let v be ! UnicodeMatchPropertyValue(p, vs).
6. Return the CharSet containing all Unicode code points whose character database definition includes the property p with value v.
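The two lookups in this algorithm resolve aliases to canonical names before the character database is consulted. The sketch below models them with hypothetical, abbreviated stand-ins for Table 51 and for Tables 53 and 54; `unicodeMatchProperty` and `unicodeMatchPropertyValue` are illustrative functions, not engine APIs. In the algorithm the `!` prefix records that these calls cannot fail, because the early errors have already rejected unknown names; the sketch throws a SyntaxError to play the role of those early errors. The final step, collecting every code point with property p and value v, needs the Unicode Character Database and is not modeled.

```typescript
// Hypothetical, abbreviated stand-in for Table 51 (alias -> canonical name).
const PROPERTY_ALIASES = new Map<string, string>([
  ["General_Category", "General_Category"], ["gc", "General_Category"],
  ["Script", "Script"], ["sc", "Script"],
  ["Script_Extensions", "Script_Extensions"], ["scx", "Script_Extensions"],
]);

// Script and Script_Extensions share one value list (abbreviated).
const SCRIPT_VALUES = new Map<string, string>([
  ["Latin", "Latin"], ["Latn", "Latin"],
  ["Greek", "Greek"], ["Grek", "Greek"],
]);

// Hypothetical, abbreviated stand-in for Tables 53 and 54.
const VALUE_ALIASES = new Map<string, ReadonlyMap<string, string>>([
  ["General_Category", new Map<string, string>([
    ["Uppercase_Letter", "Uppercase_Letter"], ["Lu", "Uppercase_Letter"],
    ["Letter", "Letter"], ["L", "Letter"],
  ])],
  ["Script", SCRIPT_VALUES],
  ["Script_Extensions", SCRIPT_VALUES],
]);

// Exact lookup of a property name or alias; returns the canonical name.
function unicodeMatchProperty(ps: string): string {
  const p = PROPERTY_ALIASES.get(ps);
  if (p === undefined) throw new SyntaxError(`Unknown property: ${ps}`);
  return p;
}

// Exact lookup of a value or alias for an already-canonical property name.
function unicodeMatchPropertyValue(p: string, vs: string): string {
  const v = VALUE_ALIASES.get(p)?.get(vs);
  if (v === undefined) throw new SyntaxError(`Unknown value ${vs} for ${p}`);
  return v;
}
```

With these tables, `scx=Grek` resolves to the pair (`Script_Extensions`, `Greek`), while `General_Category=Latn` is rejected because `Latn` is a Script value, not a General_Category value.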

The production UnicodePropertyValueOrSequenceExpression::LoneUnicodePropertyNameOrValue evaluates as follows:

1. Let s be SourceText of LoneUnicodePropertyNameOrValue.
2. If ! UnicodeMatchPropertyValue("General_Category", s) is identical to a List of Unicode code points that is the name of a Unicode general category or general category alias listed in the “Property value and aliases” column of Table 53, then
   a. Return the CharSet containing all Unicode code points whose character database definition includes the property “General_Category” with value s.
3. If s is identical to a List of Unicode code points that is the name of a Unicode sequence property or sequence property alias listed in the “Property name and aliases” column of Table 1, then
   a. Return the SequenceSet containing each Unicode code point sequence included in the Unicode property s.
4. Let p be ! UnicodeMatchProperty(s).
5. Assert: p is a binary Unicode property or binary property alias listed in the “Property name and aliases” column of Table 52.
6. Return the CharSet containing all Unicode code points whose character database definition includes the property p with value “True”.
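The algorithm above is a three-way dispatch on the lone name: a General_Category value is tried first, then a sequence property from Table 1, then a binary property. Only a sequence property produces a SequenceSet. The sketch below models the dispatch with a discriminated union; the alias tables are hypothetical and abbreviated, canonical names are returned in place of the raw aliases, and `classifyLone` is an illustrative name only.

```typescript
// Hypothetical, abbreviated General_Category value aliases (alias -> canonical).
const GC_VALUE_ALIASES = new Map<string, string>([
  ["Lu", "Uppercase_Letter"], ["Uppercase_Letter", "Uppercase_Letter"],
  ["L", "Letter"], ["Letter", "Letter"],
]);

// Hypothetical, abbreviated binary property aliases (alias -> canonical).
const BINARY_PROPERTY_ALIASES = new Map<string, string>([
  ["Alpha", "Alphabetic"], ["Alphabetic", "Alphabetic"], ["Emoji", "Emoji"],
]);

// Table 1: each sequence property name is its own canonical name.
const SEQUENCE_PROPERTY_NAMES: ReadonlySet<string> = new Set([
  "Basic_Emoji", "RGI_Emoji_Modifier_Sequence", "RGI_Emoji_Tag_Sequence",
  "RGI_Emoji_ZWJ_Sequence", "RGI_Emoji",
]);

type LoneResult =
  | { kind: "generalCategory"; value: string } // yields a CharSet
  | { kind: "sequenceProperty"; name: string } // yields a SequenceSet
  | { kind: "binaryProperty"; name: string }; // CharSet where p is "True"

function classifyLone(s: string): LoneResult {
  const gc = GC_VALUE_ALIASES.get(s);
  if (gc !== undefined) return { kind: "generalCategory", value: gc };
  if (SEQUENCE_PROPERTY_NAMES.has(s)) return { kind: "sequenceProperty", name: s };
  const p = BINARY_PROPERTY_ALIASES.get(s);
  // Mirrors the Assert: early errors should already have rejected anything else.
  if (p === undefined) throw new Error(`Assertion failed: ${s} is not a binary property`);
  return { kind: "binaryProperty", name: p };
}
```

The order of the checks would only matter if a name appeared in more than one list; with these tables, `Lu`, `RGI_Emoji_ZWJ_Sequence`, and `Alpha` each fall into exactly one branch.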