22 Text Processing

22.1 String Objects

22.1.1 The String Constructor

The String constructor:

  • is %String%.
  • is the initial value of the "String" property of the global object.
  • creates and initializes a new String object when called as a constructor.
  • performs a type conversion when called as a function rather than as a constructor.
  • may be used as the value of an extends clause of a class definition. Subclass constructors that intend to inherit the specified String behaviour must include a super call to the String constructor to create and initialize the subclass instance with a [[StringData]] internal slot.

22.1.1.1 String ( value )

When String is called with argument value, the following steps are taken:

  1. If value is not present, let s be the empty String.
  2. Else,
    1. If NewTarget is undefined and Type(value) is Symbol, return SymbolDescriptiveString(value).
    2. Let s be ? ToString(value).
  3. If NewTarget is undefined, return s.
  4. Return StringCreate(s, ? GetPrototypeFromConstructor(NewTarget, "%String.prototype%")).
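
The following non-normative sketch illustrates the steps above by contrasting a function call with a constructor call. The variable names are illustrative only.

```javascript
// Called as a function (NewTarget is undefined): returns a primitive String value.
const asFunction = String(123);

// Called as a constructor: returns a String object wrapping the converted value.
const asObject = new String(123);

// With no argument, s is the empty String.
const noArgument = String();

// A Symbol is only accepted on the function-call path.
const described = String(Symbol("tag"));

// On the constructor path, ToString(value) throws a TypeError for a Symbol.
let constructorError = null;
try {
  new String(Symbol("tag"));
} catch (err) {
  constructorError = err;
}

console.log(typeof asFunction, typeof asObject); // string object
```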

22.1.2 Properties of the String Constructor

The String constructor:

  • has a [[Prototype]] internal slot whose value is %Function.prototype%.
  • has the following properties:

22.1.2.1 String.fromCharCode ( ...codeUnits )

The String.fromCharCode function may be called with any number of arguments which form the rest parameter codeUnits. The following steps are taken:

  1. Let length be the number of elements in codeUnits.
  2. Let elements be a new empty List.
  3. For each element next of codeUnits, do
    1. Let nextCU be ℝ(? ToUint16(next)).
    2. Append nextCU to the end of elements.
  4. Return the String value whose code units are the elements in the List elements. If codeUnits is empty, the empty String is returned.

The "length" property of the fromCharCode function is 1𝔽.
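
As a non-normative illustration, the sketch below shows how the ToUint16 conversion affects the result. The variable names are illustrative only.

```javascript
// Each argument is converted with ToUint16, so values wrap modulo 2^16.
const plain = String.fromCharCode(0x48, 0x69);    // code units for "H" and "i"
const wrapped = String.fromCharCode(0x10041);     // 0x10041 modulo 0x10000 is 0x41 ("A")
const pair = String.fromCharCode(0xd83d, 0xde00); // a surrogate pair given as two code units
const empty = String.fromCharCode();              // no arguments gives the empty String
```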

22.1.2.2 String.fromCodePoint ( ...codePoints )

The String.fromCodePoint function may be called with any number of arguments which form the rest parameter codePoints. The following steps are taken:

  1. Let result be the empty String.
  2. For each element next of codePoints, do
    1. Let nextCP be ? ToNumber(next).
    2. If IsIntegralNumber(nextCP) is false, throw a RangeError exception.
    3. If ℝ(nextCP) < 0 or ℝ(nextCP) > 0x10FFFF, throw a RangeError exception.
    4. Set result to the string-concatenation of result and UTF16EncodeCodePoint(ℝ(nextCP)).
  3. Assert: If codePoints is empty, then result is the empty String.
  4. Return result.

The "length" property of the fromCodePoint function is 1𝔽.
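
The following non-normative sketch exercises the range and integrality checks above. The helper throwsRangeError is hypothetical and exists only for this example.

```javascript
const grinning = String.fromCodePoint(0x1f600);          // one code point, two UTF-16 code units
const mixed = String.fromCodePoint(0x41, 0x1f600, 0x42); // "A", U+1F600, "B"
const none = String.fromCodePoint();                     // the empty String

// Hypothetical helper: reports whether calling fn throws a RangeError.
function throwsRangeError(fn) {
  try {
    fn();
    return false;
  } catch (err) {
    return err instanceof RangeError;
  }
}

const rejectsFraction = throwsRangeError(() => String.fromCodePoint(1.5));
const rejectsNegative = throwsRangeError(() => String.fromCodePoint(-1));
const rejectsTooLarge = throwsRangeError(() => String.fromCodePoint(0x110000));
```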

22.1.2.3 String.prototype

The initial value of String.prototype is the String prototype object.

This property has the attributes { [[Writable]]: false, [[Enumerable]]: false, [[Configurable]]: false }.

22.1.2.4 String.raw ( template, ...substitutions )

The String.raw function may be called with a variable number of arguments. The first argument is template and the remainder of the arguments form the List substitutions. The following steps are taken:

  1. Let numberOfSubstitutions be the number of elements in substitutions.
  2. Let cooked be ? ToObject(template).
  3. Let raw be ? ToObject(? Get(cooked, "raw")).
  4. Let literalSegments be ? LengthOfArrayLike(raw).
  5. If literalSegments ≤ 0, return the empty String.
  6. Let stringElements be a new empty List.
  7. Let nextIndex be 0.
  8. Repeat,
    1. Let nextKey be ! ToString(𝔽(nextIndex)).
    2. Let nextSeg be ? ToString(? Get(raw, nextKey)).
    3. Append the code unit elements of nextSeg to the end of stringElements.
    4. If nextIndex + 1 = literalSegments, then
      1. Return the String value whose code units are the elements in the List stringElements. If stringElements has no elements, the empty String is returned.
    5. If nextIndex < numberOfSubstitutions, let next be substitutions[nextIndex].
    6. Else, let next be the empty String.
    7. Let nextSub be ? ToString(next).
    8. Append the code unit elements of nextSub to the end of stringElements.
    9. Set nextIndex to nextIndex + 1.
Note

The raw function is intended for use as a tag function of a Tagged Template (13.3.11). When called as such, the first argument will be a well formed template object and the rest parameter will contain the substitution values.
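
As a non-normative illustration, the sketch below calls String.raw both as a tag function and directly with a hand-built object. The objects passed in the direct calls are assumptions made for the example, not well-formed template objects.

```javascript
// As a tag function: escape sequences in the literal segments are left unprocessed.
const tagged = String.raw`a\n${1 + 1}b`;

// Called directly: any object with an array-like "raw" property is accepted.
const direct = String.raw({ raw: ["x", "y", "z"] }, 1, 2);

// Missing substitutions are treated as the empty String; extra ones are ignored.
const tooFew = String.raw({ raw: ["x", "y", "z"] }, 1);
const tooMany = String.raw({ raw: ["x", "y"] }, 1, 2, 3);

// A "raw" value of length 0 yields the empty String.
const noSegments = String.raw({ raw: [] }, 1, 2);
```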

22.1.3 Properties of the String Prototype Object

The String prototype object:

  • is %String.prototype%.
  • is a String exotic object and has the internal methods specified for such objects.
  • has a [[StringData]] internal slot whose value is the empty String.
  • has a "length" property whose initial value is +0𝔽 and whose attributes are { [[Writable]]: false, [[Enumerable]]: false, [[Configurable]]: false }.
  • has a [[Prototype]] internal slot whose value is %Object.prototype%.

Unless explicitly stated otherwise, the methods of the String prototype object defined below are not generic and the this value passed to them must be either a String value or an object that has a [[StringData]] internal slot that has been initialized to a String value.

The abstract operation thisStringValue takes argument value. It performs the following steps when called:

  1. If Type(value) is String, return value.
  2. If Type(value) is Object and value has a [[StringData]] internal slot, then
    1. Let s be value.[[StringData]].
    2. Assert: Type(s) is String.
    3. Return s.
  3. Throw a TypeError exception.

22.1.3.1 String.prototype.at ( index )

  1. Let O be ? RequireObjectCoercible(this value).
  2. Let S be ? ToString(O).
  3. Let len be the length of S.
  4. Let relativeIndex be ? ToIntegerOrInfinity(index).
  5. If relativeIndex ≥ 0, then
    1. Let k be relativeIndex.
  6. Else,
    1. Let k be len + relativeIndex.
  7. If k < 0 or k ≥ len, return undefined.
  8. Return the substring of S from k to k + 1.
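
The following non-normative sketch walks through the relative-index handling above. The variable names are illustrative only.

```javascript
const word = "spec";

const first = word.at(0);       // "s"
const last = word.at(-1);       // "c": k is len + relativeIndex, that is 4 + (-1) = 3
const beyond = word.at(4);      // undefined: k is not less than len
const before = word.at(-5);     // undefined: k is negative
const truncated = word.at(1.9); // "p": ToIntegerOrInfinity truncates 1.9 to 1
```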

22.1.3.2 String.prototype.charAt ( pos )

Note 1

Returns a single element String containing the code unit at index pos within the String value resulting from converting this object to a String. If there is no element at that index, the result is the empty String. The result is a String value, not a String object.

If pos is an integral Number, then the result of x.charAt(pos) is equivalent to the result of x.substring(pos, pos + 1).

When the charAt method is called with one argument pos, the following steps are taken:

  1. Let O be ? RequireObjectCoercible(this value).
  2. Let S be ? ToString(O).
  3. Let position be ? ToIntegerOrInfinity(pos).
  4. Let size be the length of S.
  5. If position < 0 or position ≥ size, return the empty String.
  6. Return the substring of S from position to position + 1.
Note 2

The charAt function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.

22.1.3.3 String.prototype.charCodeAt ( pos )

Note 1

Returns a Number (a non-negative integral Number less than 2¹⁶) that is the numeric value of the code unit at index pos within the String resulting from converting this object to a String. If there is no element at that index, the result is NaN.

When the charCodeAt method is called with one argument pos, the following steps are taken:

  1. Let O be ? RequireObjectCoercible(this value).
  2. Let S be ? ToString(O).
  3. Let position be ? ToIntegerOrInfinity(pos).
  4. Let size be the length of S.
  5. If position < 0 or position ≥ size, return NaN.
  6. Return the Number value for the numeric value of the code unit at index position within the String S.
Note 2

The charCodeAt function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.

22.1.3.4 String.prototype.codePointAt ( pos )

Note 1

Returns a non-negative integral Number less than or equal to 0x10FFFF𝔽 that is the numeric value of the UTF-16 encoded code point (6.1.4) starting at the string element at index pos within the String resulting from converting this object to a String. If there is no element at that index, the result is undefined. If a valid UTF-16 surrogate pair does not begin at pos, the result is the code unit at pos.

When the codePointAt method is called with one argument pos, the following steps are taken:

  1. Let O be ? RequireObjectCoercible(this value).
  2. Let S be ? ToString(O).
  3. Let position be ? ToIntegerOrInfinity(pos).
  4. Let size be the length of S.
  5. If position < 0 or position ≥ size, return undefined.
  6. Let cp be CodePointAt(S, position).
  7. Return 𝔽(cp.[[CodePoint]]).
Note 2

The codePointAt function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
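
As a non-normative comparison, the sketch below applies charAt, charCodeAt, and codePointAt to a String containing a surrogate pair. The sample text is an assumption chosen for the example.

```javascript
const text = "a\u{1F600}b"; // code units: "a", high surrogate, low surrogate, "b"

// In range: the three methods describe a position in different forms.
const unit = text.charAt(0);         // "a"
const code = text.charCodeAt(0);     // 97
const point = text.codePointAt(1);   // 0x1F600, read from the surrogate pair at index 1
const lowOnly = text.codePointAt(2); // 0xDE00: no surrogate pair begins at index 2
const highUnit = text.charCodeAt(1); // 0xD83D: charCodeAt never combines surrogates

// Out of range: each method has its own "no element" result.
const missingChar = text.charAt(10);       // ""
const missingCode = text.charCodeAt(10);   // NaN
const missingPoint = text.codePointAt(10); // undefined

// Generic use: the this value is converted with ToString first.
const fromNumber = String.prototype.charAt.call(12345, 1); // "2"
```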

22.1.3.5 String.prototype.concat ( ...args )

Note 1

When the concat method is called it returns the String value consisting of the code units of the this value (converted to a String) followed by the code units of each of the arguments converted to a String. The result is a String value, not a String object.

When the concat method is called with zero or more arguments, the following steps are taken:

  1. Let O be ? RequireObjectCoercible(this value).
  2. Let S be ? ToString(O).
  3. Let R be S.
  4. For each element next of args, do
    1. Let nextString be ? ToString(next).
    2. Set R to the string-concatenation of R and nextString.
  5. Return R.

The "length" property of the concat method is 1𝔽.

Note 2

The concat function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
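
The following non-normative sketch shows the ToString conversion applied to each argument and to the this value. The variable names are illustrative only.

```javascript
// Every argument is converted with ToString before concatenation.
const joined = "a".concat(1, null, undefined, ["x", "y"]);

// With no arguments, the result is the converted this value.
const unchanged = "a".concat();

// Generic use: a Number this value is converted with ToString as well.
const fromNumbers = String.prototype.concat.call(1, 2);
```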

22.1.3.6 String.prototype.constructor

The initial value of String.prototype.constructor is %String%.

22.1.3.7 String.prototype.endsWith ( searchString [ , endPosition ] )

The following steps are taken:

  1. Let O be ? RequireObjectCoercible(this value).
  2. Let S be ? ToString(O).
  3. Let isRegExp be ? IsRegExp(searchString).
  4. If isRegExp is true, throw a TypeError exception.
  5. Let searchStr be ? ToString(searchString).
  6. Let len be the length of S.
  7. If endPosition is undefined, let pos be len; else let pos be ? ToIntegerOrInfinity(endPosition).
  8. Let end be the result of clamping pos between 0 and len.
  9. Let searchLength be the length of searchStr.
  10. If searchLength = 0, return true.
  11. Let start be end - searchLength.
  12. If start < 0, return false.
  13. Let substring be the substring of S from start to end.
  14. Return SameValueNonNumeric(substring, searchStr).
Note 1

Returns true if the sequence of code units of searchString converted to a String is the same as the corresponding code units of this object (converted to a String) starting at endPosition - length(this). Otherwise returns false.

Note 2

Throwing an exception if the first argument is a RegExp is specified in order to allow future editions to define extensions that allow such argument values.

Note 3

The endsWith function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
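
As a non-normative illustration, the sketch below exercises the clamping of endPosition and the RegExp check. The variable names are illustrative only.

```javascript
const s = "abcdef";

const plain = s.endsWith("def");            // true
const bounded = s.endsWith("cd", 4);        // true: the comparison is made against "abcd"
const emptySearch = s.endsWith("", 2);      // true: an empty search String always matches
const clampedLow = s.endsWith("a", -3);     // false: end clamps to 0, so start would be -1
const clampedHigh = s.endsWith("def", 100); // true: end clamps to the length, 6

// A RegExp argument is rejected with a TypeError.
let regexpError = null;
try {
  s.endsWith(/def/);
} catch (err) {
  regexpError = err;
}
```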

22.1.3.8 String.prototype.includes ( searchString [ , position ] )

The includes method takes two arguments, searchString and position, and performs the following steps:

  1. Let O be ? RequireObjectCoercible(this value).
  2. Let S be ? ToString(O).
  3. Let isRegExp be ? IsRegExp(searchString).
  4. If isRegExp is true, throw a TypeError exception.
  5. Let searchStr be ? ToString(searchString).
  6. Let pos be ? ToIntegerOrInfinity(position).
  7. Assert: If position is undefined, then pos is 0.
  8. Let len be the length of S.
  9. Let start be the result of clamping pos between 0 and len.
  10. Let index be StringIndexOf(S, searchStr, start).
  11. If index is not -1, return true.
  12. Return false.
Note 1

If searchString appears as a substring of the result of converting this object to a String, at one or more indices that are greater than or equal to position, this function returns true; otherwise, it returns false. If position is undefined, 0 is assumed, so as to search all of the String.

Note 2

Throwing an exception if the first argument is a RegExp is specified in order to allow future editions to define extensions that allow such argument values.

Note 3

The includes function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.

22.1.3.9 String.prototype.indexOf ( searchString [ , position ] )

Note 1

If searchString appears as a substring of the result of converting this object to a String, at one or more indices that are greater than or equal to position, then the smallest such index is returned; otherwise, -1𝔽 is returned. If position is undefined, +0𝔽 is assumed, so as to search all of the String.

The indexOf method takes two arguments, searchString and position, and performs the following steps:

  1. Let O be ? RequireObjectCoercible(this value).
  2. Let S be ? ToString(O).
  3. Let searchStr be ? ToString(searchString).
  4. Let pos be ? ToIntegerOrInfinity(position).
  5. Assert: If position is undefined, then pos is 0.
  6. Let len be the length of S.
  7. Let start be the result of clamping pos between 0 and len.
  8. Return 𝔽(StringIndexOf(S, searchStr, start)).
Note 2

The indexOf function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
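
The following non-normative sketch compares includes and indexOf on the same input, including how each treats a RegExp argument. The sample strings are assumptions chosen for the example.

```javascript
const s = "banana";

const firstAn = s.indexOf("an");         // 1
const secondAn = s.indexOf("an", 2);     // 3
const clamped = s.indexOf("an", -5);     // 1: a negative position clamps to 0
const absent = s.indexOf("x");           // -1
const emptyAtEnd = s.indexOf("", 100);   // 6: position clamps to the length
const hasNan = s.includes("nan");        // true
const hasBanLater = s.includes("ban", 1); // false: the search starts at index 1

// includes rejects a RegExp, whereas indexOf converts it with ToString ("/a/").
let includesError = null;
try {
  s.includes(/a/);
} catch (err) {
  includesError = err;
}
const regexpAsText = "x/a/y".indexOf(/a/); // 1
```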

22.1.3.10 String.prototype.lastIndexOf ( searchString [ , position ] )

Note 1

If searchString appears as a substring of the result of converting this object to a String at one or more indices that are smaller than or equal to position, then the greatest such index is returned; otherwise, -1𝔽 is returned. If position is undefined, the length of the String value is assumed, so as to search all of the String.

The lastIndexOf method takes two arguments, searchString and position, and performs the following steps:

  1. Let O be ? RequireObjectCoercible(this value).
  2. Let S be ? ToString(O).
  3. Let searchStr be ? ToString(searchString).
  4. Let numPos be ? ToNumber(position).
  5. Assert: If position is undefined, then numPos is NaN.
  6. If numPos is NaN, let pos be +∞; otherwise, let pos be ! ToIntegerOrInfinity(numPos).
  7. Let len be the length of S.
  8. Let start be the result of clamping pos between 0 and len.
  9. If searchStr is the empty String, return 𝔽(start).
  10. Let searchLen be the length of searchStr.
  11. For each non-negative integer i starting with start such that i ≤ len - searchLen, in descending order, do
    1. Let candidate be the substring of S from i to i + searchLen.
    2. If candidate is the same sequence of code units as searchStr, return 𝔽(i).
  12. Return -1𝔽.
Note 2

The lastIndexOf function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
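
As a non-normative illustration, the sketch below shows how lastIndexOf treats missing, NaN, negative, and oversized positions. The sample string is an assumption chosen for the example.

```javascript
const s = "canal"; // "a" occurs at indices 1 and 3

const fromEnd = s.lastIndexOf("a");           // 3: an undefined position searches the whole String
const nanPosition = s.lastIndexOf("a", NaN);  // 3: NaN is treated as +∞
const bounded = s.lastIndexOf("a", 2);        // 1: only indices up to 2 are considered
const noneBefore = s.lastIndexOf("a", 0);     // -1
const clampedLow = s.lastIndexOf("c", -5);    // 0: a negative position clamps to 0
const emptySearch = s.lastIndexOf("", 100);   // 5: position clamps to the length
const multiUnit = s.lastIndexOf("al");        // 3
```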

22.1.3.11 String.prototype.localeCompare ( that [ , reserved1 [ , reserved2 ] ] )

An ECMAScript implementation that includes the ECMA-402 Internationalization API must implement the localeCompare method as specified in the ECMA-402 specification. If an ECMAScript implementation does not include the ECMA-402 API the following specification of the localeCompare method is used.

When the localeCompare method is called with argument that, it returns a Number other than NaN representing the result of an implementation-defined locale-sensitive String comparison of the this value (converted to a String S) with that (converted to a String thatValue). The result is intended to correspond with a sort order of String values according to conventions of the host environment's current locale, and will be negative when S is ordered before thatValue, positive when S is ordered after thatValue, and zero in all other cases (representing no relative ordering between S and thatValue).

Before performing the comparisons, the following steps are performed to prepare the Strings:

  1. Let O be ? RequireObjectCoercible(this value).
  2. Let S be ? ToString(O).
  3. Let thatValue be ? ToString(that).

The meaning of the optional second and third parameters to this method is defined in the ECMA-402 specification; implementations that do not include ECMA-402 support must not assign any other interpretation to those parameter positions.

The actual return values are implementation-defined to permit encoding additional information in them, but this method, when considered as a function of two arguments, is required to be a consistent comparator defining a total ordering on the set of all Strings. This method is also required to recognize and honour canonical equivalence according to the Unicode Standard, including returning 0 when comparing distinguishable Strings that are canonically equivalent.

Note 1

The localeCompare method itself is not directly suitable as an argument to Array.prototype.sort because the latter requires a function of two arguments.

Note 2

This method may rely on whatever language- and/or locale-sensitive comparison functionality is available to the ECMAScript environment from the host environment, and is intended to compare according to the conventions of the host environment's current locale. However, regardless of comparison capabilities, this method must recognize and honour canonical equivalence according to the Unicode Standard—for example, the following comparisons must all return 0:

// Å ANGSTROM SIGN vs.
// Å LATIN CAPITAL LETTER A + COMBINING RING ABOVE
"\u212B".localeCompare("A\u030A")

// Ω OHM SIGN vs.
// Ω GREEK CAPITAL LETTER OMEGA
"\u2126".localeCompare("\u03A9")

// ṩ LATIN SMALL LETTER S WITH DOT BELOW AND DOT ABOVE vs.
// ṩ LATIN SMALL LETTER S + COMBINING DOT ABOVE + COMBINING DOT BELOW
"\u1E69".localeCompare("s\u0307\u0323")

// ḍ̇ LATIN SMALL LETTER D WITH DOT ABOVE + COMBINING DOT BELOW vs.
// ḍ̇ LATIN SMALL LETTER D WITH DOT BELOW + COMBINING DOT ABOVE
"\u1E0B\u0323".localeCompare("\u1E0D\u0307")

// 가 HANGUL CHOSEONG KIYEOK + HANGUL JUNGSEONG A
// 가 HANGUL SYLLABLE GA
"\u1100\u1161".localeCompare("\uAC00")

For a definition and discussion of canonical equivalence see the Unicode Standard, chapters 2 and 3, as well as Unicode Standard Annex #15, Unicode Normalization Forms and Unicode Technical Note #5, Canonical Equivalence in Applications. Also see Unicode Technical Standard #10, Unicode Collation Algorithm.

It is recommended that this method should not honour Unicode compatibility equivalents or compatibility decompositions as defined in the Unicode Standard, chapter 3, section 3.7.

Note 3

The localeCompare function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
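
The following non-normative sketch shows one way to use localeCompare with Array.prototype.sort, by wrapping it in a function of two arguments as Note 1 suggests. The word list is an assumption chosen for the example, and the exact return values are implementation-defined, so only their signs should be relied on.

```javascript
const words = ["pear", "apple", "fig"];

// Wrap localeCompare in a two-argument comparator before passing it to sort.
const sorted = [...words].sort((a, b) => a.localeCompare(b));

// Only the sign of the result is meaningful.
const order = Math.sign("apple".localeCompare("pear")); // negative: "apple" sorts first

// Canonically equivalent Strings are required to compare as equal.
const equivalent = "\u212B".localeCompare("A\u030A");
```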

22.1.3.12 String.prototype.match ( regexp )

When the match method is called with argument regexp, the following steps are taken:

  1. Let O be ? RequireObjectCoercible(this value).
  2. If regexp is neither undefined nor null, then
    1. Let matcher be ? GetMethod(regexp, @@match).
    2. If matcher is not undefined, then
      1. Return ? Call(matcher, regexp, « O »).
  3. Let S be ? ToString(O).
  4. Let rx be ? RegExpCreate(regexp, undefined).
  5. Return ? Invoke(rx, @@match, « S »).
Note

The match function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.

22.1.3.13 String.prototype.matchAll ( regexp )

Performs a regular expression match of the String representing the this value against regexp and returns an iterator. Each iteration result's value is an Array containing the results of the match, or null if the String did not match.

When the matchAll method is called, the following steps are taken:

  1. Let O be ? RequireObjectCoercible(this value).
  2. If regexp is neither undefined nor null, then
    1. Let isRegExp be ? IsRegExp(regexp).
    2. If isRegExp is true, then
      1. Let flags be ? Get(regexp, "flags").
      2. Perform ? RequireObjectCoercible(flags).
      3. If ? ToString(flags) does not contain "g", throw a TypeError exception.
    3. Let matcher be ? GetMethod(regexp, @@matchAll).
    4. If matcher is not undefined, then
      1. Return ? Call(matcher, regexp, « O »).
  3. Let S be ? ToString(O).
  4. Let rx be ? RegExpCreate(regexp, "g").
  5. Return ? Invoke(rx, @@matchAll, « S »).
Note 1

The matchAll function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.

Note 2

Similarly to String.prototype.split, String.prototype.matchAll is designed to typically act without mutating its inputs.
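
As a non-normative illustration, the sketch below contrasts match and matchAll, including the case where the argument is a String that is compiled into a pattern. The sample text is an assumption chosen for the example.

```javascript
const text = "a1b22";

// match with a non-global RegExp returns the first match and its index.
const firstMatch = text.match(/\d+/);

// A String argument has no @@match method, so it is compiled with RegExpCreate.
const fromString = text.match("\\d+");

// matchAll returns an iterator over all matches.
const all = [...text.matchAll(/\d+/g)].map((m) => m[0]);

// A String argument to matchAll is compiled with the "g" flag; "." matches any character here.
const dots = [..."a.b".matchAll(".")].length;

// A RegExp without the "g" flag is rejected with a TypeError.
let nonGlobalError = null;
try {
  text.matchAll(/\d+/);
} catch (err) {
  nonGlobalError = err;
}
```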

22.1.3.14 String.prototype.normalize ( [ form ] )

When the normalize method is called with one argument form, the following steps are taken:

  1. Let O be ? RequireObjectCoercible(this value).
  2. Let S be ? ToString(O).
  3. If form is undefined, let f be "NFC".
  4. Else, let f be ? ToString(form).
  5. If f is not one of "NFC", "NFD", "NFKC", or "NFKD", throw a RangeError exception.
  6. Let ns be the String value that is the result of normalizing S into the normalization form named by f as specified in https://unicode.org/reports/tr15/.
  7. Return ns.
Note

The normalize function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
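
The following non-normative sketch applies several normalization forms and shows that form names are case-sensitive. The sample characters are assumptions chosen for the example.

```javascript
const angstrom = "\u212B"; // ANGSTROM SIGN

const nfc = angstrom.normalize();        // the default form is "NFC"
const nfd = angstrom.normalize("NFD");   // "A" followed by COMBINING RING ABOVE
const nfkc = "\uFB01".normalize("NFKC"); // LATIN SMALL LIGATURE FI becomes "fi"

// Form names must match exactly; "nfc" is not one of the four accepted names.
let formError = null;
try {
  angstrom.normalize("nfc");
} catch (err) {
  formError = err;
}
```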

22.1.3.15 String.prototype.padEnd ( maxLength [ , fillString ] )

When the padEnd method is called, the following steps are taken:

  1. Let O be ? RequireObjectCoercible(this value).
  2. Return ? StringPad(O, maxLength, fillString, end).

22.1.3.16 String.prototype.padStart ( maxLength [ , fillString ] )

When the padStart method is called, the following steps are taken:

  1. Let O be ? RequireObjectCoercible(this value).
  2. Return ? StringPad(O, maxLength, fillString, start).

22.1.3.16.1 StringPad ( O, maxLength, fillString, placement )

The abstract operation StringPad takes arguments O (an ECMAScript language value), maxLength (an ECMAScript language value), fillString (an ECMAScript language value), and placement (start or end) and returns either a normal completion containing a String or an abrupt completion. It performs the following steps when called:

  1. Let S be ? ToString(O).
  2. Let intMaxLength be ℝ(? ToLength(maxLength)).
  3. Let stringLength be the length of S.
  4. If intMaxLength ≤ stringLength, return S.
  5. If fillString is undefined, let filler be the String value consisting solely of the code unit 0x0020 (SPACE).
  6. Else, let filler be ? ToString(fillString).
  7. If filler is the empty String, return S.
  8. Let fillLen be intMaxLength - stringLength.
  9. Let truncatedStringFiller be the String value consisting of repeated concatenations of filler truncated to length fillLen.
  10. If placement is start, return the string-concatenation of truncatedStringFiller and S.
  11. Else, return the string-concatenation of S and truncatedStringFiller.
Note 1

The argument maxLength will be clamped such that it can be no smaller than the length of S.

Note 2

The argument fillString defaults to " " (the String value consisting of the code unit 0x0020 SPACE).
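
As a non-normative illustration, the StringPad steps can be modelled roughly as follows. The function stringPadSketch is hypothetical; it uses String() and Number() as stand-ins for ToString and ToLength, so edge cases such as Symbol arguments may differ from the built-in padStart and padEnd methods.

```javascript
// A simplified model of StringPad, not a conforming implementation.
function stringPadSketch(value, maxLength, fillString, placement) {
  const s = String(value);
  // Approximate ToLength: truncate, treat NaN as 0, clamp to [0, 2^53 - 1].
  const requested = Math.trunc(Number(maxLength)) || 0;
  const intMaxLength = Math.min(Math.max(requested, 0), Number.MAX_SAFE_INTEGER);
  if (intMaxLength <= s.length) return s;
  const filler = fillString === undefined ? " " : String(fillString);
  if (filler === "") return s;
  const fillLen = intMaxLength - s.length;
  // Repeat the filler enough times to cover fillLen, then truncate to fillLen.
  const truncatedFiller = filler
    .repeat(Math.ceil(fillLen / filler.length))
    .slice(0, fillLen);
  return placement === "start" ? truncatedFiller + s : s + truncatedFiller;
}

const paddedStart = stringPadSketch("abc", 6, "12", "start"); // "121abc"
const paddedEnd = stringPadSketch("abc", 6, "12", "end");     // "abc121"
```

For these inputs the sketch agrees with "abc".padStart(6, "12") and "abc".padEnd(6, "12").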

22.1.3.16.2 ToZeroPaddedDecimalString ( n, minLength )

The abstract operation ToZeroPaddedDecimalString takes arguments n (a non-negative integer) and minLength (a non-negative integer) and returns a String. It performs the following steps when called:

  1. Let S be the String representation of n, formatted as a decimal number.
  2. Return ! StringPad(S, 𝔽(minLength), "0", start).
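
A non-normative sketch of this operation in terms of padStart follows. The function name is hypothetical, and n and minLength are assumed to be non-negative integers.

```javascript
// Hypothetical helper: formats n in decimal and left-pads it with "0" to minLength.
function toZeroPaddedDecimalString(n, minLength) {
  return String(n).padStart(minLength, "0");
}

const shortValue = toZeroPaddedDecimalString(7, 2);   // "07"
const longValue = toZeroPaddedDecimalString(2024, 2); // "2024": never truncated
```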

22.1.3.17 String.prototype.repeat ( count )

The following steps are taken:

  1. Let O be ? RequireObjectCoercible(this value).
  2. Let S be ? ToString(O).
  3. Let n be ? ToIntegerOrInfinity(count).
  4. If n < 0 or n is +∞, throw a RangeError exception.
  5. If n is 0, return the empty String.
  6. Return the String value that is made from n copies of S appended together.
Note 1

This method creates the String value consisting of the code units of the this value (converted to String) repeated count times.

Note 2

The repeat function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
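
The following non-normative sketch exercises the count handling above. The helper throwsRangeError is hypothetical and exists only for this example.

```javascript
const tripled = "ab".repeat(3);      // "ababab"
const none = "ab".repeat(0);         // the empty String
const truncated = "ab".repeat(2.9);  // "abab": ToIntegerOrInfinity truncates 2.9 to 2
const fromString = "ab".repeat("2"); // "abab": count is converted to a Number first

// Hypothetical helper: reports whether calling fn throws a RangeError.
function throwsRangeError(fn) {
  try {
    fn();
    return false;
  } catch (err) {
    return err instanceof RangeError;
  }
}

const rejectsNegative = throwsRangeError(() => "ab".repeat(-1));
const rejectsInfinity = throwsRangeError(() => "ab".repeat(Infinity));
```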

22.1.3.18 String.prototype.replace ( searchValue, replaceValue )

When the replace method is called with arguments searchValue and replaceValue, the following steps are taken:

  1. Let O be ? RequireObjectCoercible(this value).
  2. If searchValue is neither undefined nor null, then
    1. Let replacer be ? GetMethod(searchValue, @@replace).
    2. If replacer is not undefined, then
      1. Return ? Call(replacer, searchValue, « O, replaceValue »).
  3. Let string be ? ToString(O).
  4. Let searchString be ? ToString(searchValue).
  5. Let functionalReplace be IsCallable(replaceValue).
  6. If functionalReplace is false, then
    1. Set replaceValue to ? ToString(replaceValue).
  7. Let searchLength be the length of searchString.
  8. Let position be StringIndexOf(string, searchString, 0).
  9. If position is -1, return string.
  10. Let preceding be the substring of string from 0 to position.
  11. Let following be the substring of string from position + searchLength.
  12. If functionalReplace is true, then
    1. Let replacement be ? ToString(? Call(replaceValue, undefined, « searchString, 𝔽(position), string »)).
  13. Else,
    1. Assert: Type(replaceValue) is String.
    2. Let captures be a new empty List.
    3. Let replacement be ! GetSubstitution(searchString, string, position, captures, undefined, replaceValue).
  14. Return the string-concatenation of preceding, replacement, and following.
Note

The replace function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
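
As a non-normative illustration, the sketch below shows that a String searchValue replaces only the first occurrence, and records the arguments passed to a replacement function. The variable names are illustrative only.

```javascript
// Only the first occurrence of a String searchValue is replaced.
const once = "aaa".replace("a", "b");

// A replacement function receives the matched String, its position, and the whole String.
const calls = [];
const viaFunction = "x-y".replace("-", (matched, position, whole) => {
  calls.push([matched, position, whole]);
  return "+";
});

// With no match, the converted this value is returned unchanged.
const noMatch = "abc".replace("z", "!");

// Non-String arguments are converted with ToString.
const coerced = "a1".replace(1, 2);
const nullSearch = "null!".replace(null, "ok");
```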

22.1.3.18.1 GetSubstitution ( matched, str, position, captures, namedCaptures, replacementTemplate )

The abstract operation GetSubstitution takes arguments matched (a String), str (a String), position (a non-negative integer), captures (a possibly empty List, each of whose elements is a String or undefined), namedCaptures (an Object or undefined), and replacementTemplate (a String) and returns either a normal completion containing a String or an abrupt completion. For the purposes of this abstract operation, a decimal digit is a code unit in the range 0x0030 (DIGIT ZERO) to 0x0039 (DIGIT NINE) inclusive. It performs the following steps when called:

  1. Let stringLength be the number of code units in str.
  2. Assert: position ≤ stringLength.
  3. Let templateRemainder be replacementTemplate.
  4. Let result be the empty String.
  5. Repeat, while templateRemainder is not the empty String,
    1. NOTE: The following steps isolate ref (a prefix of templateRemainder), determine refReplacement (its replacement), and then append that replacement to result.
    2. If templateRemainder starts with "$$", then
      1. Let ref be "$$".
      2. Let refReplacement be "$".
    3. Else if templateRemainder starts with "$`", then
      1. Let ref be "$`".
      2. Let refReplacement be the substring of str from 0 to position.
    4. Else if templateRemainder starts with "$&", then
      1. Let ref be "$&".
      2. Let refReplacement be matched.
    5. Else if templateRemainder starts with "$'" (0x0024 (DOLLAR SIGN) followed by 0x0027 (APOSTROPHE)), then
      1. Let ref be "$'".
      2. Let matchLength be the number of code units in matched.
      3. Let tailPos be position + matchLength.
      4. Let refReplacement be the substring of str from min(tailPos, stringLength).
      5. NOTE: tailPos can exceed stringLength only if this abstract operation was invoked by a call to the intrinsic @@replace method of %RegExp.prototype% on an object whose "exec" property is not the intrinsic %RegExp.prototype.exec%.
    6. Else if templateRemainder starts with "$" followed by 1 or more decimal digits, then
      1. Let found be false.
      2. For each integer d of « 2, 1 », do
        1. If found is false and templateRemainder starts with "$" followed by d or more decimal digits, then
          1. Set found to true.
          2. Let ref be the substring of templateRemainder from 0 to 1 + d.
          3. Let digits be the substring of templateRemainder from 1 to 1 + d.
          4. Let index be ℝ(StringToNumber(digits)).
          5. Assert: 0 ≤ index ≤ 99.
          6. If index = 0, then
            1. Let refReplacement be ref.
          7. Else if index ≤ the number of elements in captures, then
            1. Let capture be captures[index - 1].
            2. If capture is undefined, then
              1. Let refReplacement be the empty String.
            3. Else,
              1. Let refReplacement be capture.
          8. Else,
            1. Let refReplacement be ref.
    7. Else if templateRemainder starts with "$<", then
      1. Let gtPos be StringIndexOf(templateRemainder, ">", 0).
      2. If gtPos = -1 or namedCaptures is undefined, then
        1. Let ref be "$<".
        2. Let refReplacement be ref.
      3. Else,
        1. Let ref be the substring of templateRemainder from 0 to gtPos + 1.
        2. Let groupName be the substring of templateRemainder from 2 to gtPos.
        3. Assert: Type(namedCaptures) is Object.
        4. Let capture be ? Get(namedCaptures, groupName).
        5. If capture is undefined, then
          1. Let refReplacement be the empty String.
        6. Else,
          1. Let refReplacement be ? ToString(capture).
    8. Else,
      1. Let ref be the substring of templateRemainder from 0 to 1.
      2. Let refReplacement be ref.
    9. Let refLength be the number of code units in ref.
    10. Set templateRemainder to the substring of templateRemainder from refLength.
    11. Set result to the string-concatenation of result and refReplacement.
  6. Return result.
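
The following non-normative sketch shows the replacement patterns handled by GetSubstitution, as observed through String.prototype.replace. The sample strings and patterns are assumptions chosen for the example.

```javascript
// "$$" inserts a literal dollar sign.
const dollar = "abc".replace("b", "$$");

// "$`", "$&", and "$'" insert the text before the match, the match, and the text after it.
const context = "abc".replace("b", "[$`|$&|$']");

// "$1" and "$2" insert numbered captures.
const swapped = "John Smith".replace(/(\w+)\s(\w+)/, "$2, $1");

// With a String searchValue there are no captures, so "$1" is left as literal text.
const noCaptures = "abc".replace("b", "$1");

// "$0" is never a capture reference and is left as literal text.
const zeroIndex = "abc".replace(/(b)/, "$0");

// "$<name>" inserts a named capture when the pattern defines named groups.
const named = "2020-01".replace(/(?<y>\d+)-(?<m>\d+)/, "$<m>/$<y>");

// Without named captures, "$<" is left as literal text.
const noNamed = "abc".replace("b", "$<x>");
```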

22.1.3.19 String.prototype.replaceAll ( searchValue, replaceValue )

When the replaceAll method is called with arguments searchValue and replaceValue, the following steps are taken:

  1. Let O be ? RequireObjectCoercible(this value).
  2. If searchValue is neither undefined nor null, then
    1. Let isRegExp be ? IsRegExp(searchValue).
    2. If isRegExp is true, then
      1. Let flags be ? Get(searchValue, "flags").
      2. Perform ? RequireObjectCoercible(flags).
      3. If ? ToString(flags) does not contain "g", throw a TypeError exception.
    3. Let replacer be ? GetMethod(searchValue, @@replace).
    4. If replacer is not undefined, then
      1. Return ? Call(replacer, searchValue, « O, replaceValue »).
  3. Let string be ? ToString(O).
  4. Let searchString be ? ToString(searchValue).
  5. Let functionalReplace be IsCallable(replaceValue).
  6. If functionalReplace is false, then
    1. Set replaceValue to ? ToString(replaceValue).
  7. Let searchLength be the length of searchString.
  8. Let advanceBy be max(1, searchLength).
  9. Let matchPositions be a new empty List.
  10. Let position be StringIndexOf(string, searchString, 0).
  11. Repeat, while position is not -1,
    1. Append position to the end of matchPositions.
    2. Set position to StringIndexOf(string, searchString, position + advanceBy).
  12. Let endOfLastMatch be 0.
  13. Let result be the empty String.
  14. For each element p of matchPositions, do
    1. Let preserved be the substring of string from endOfLastMatch to p.
    2. If functionalReplace is true, then
      1. Let replacement be ? ToString(? Call(replaceValue, undefined, « searchString, 𝔽(p), string »)).
    3. Else,
      1. Assert: Type(replaceValue) is String.
      2. Let captures be a new empty List.
      3. Let replacement be ! GetSubstitution(searchString, string, p, captures, undefined, replaceValue).
    4. Set result to the string-concatenation of result, preserved, and replacement.
    5. Set endOfLastMatch to p + searchLength.
  15. If endOfLastMatch < the length of string, then
    1. Set result to the string-concatenation of result and the substring of string from endOfLastMatch.
  16. Return result.
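
The following informal sketch is not part of the algorithm; it illustrates the observable behaviour of the steps above for String search values, function replacers, and a RegExp without the "g" flag. All variable names are illustrative only.

```javascript
// String search values: every occurrence is replaced, left to right.
const dashed = "a.b.c".replaceAll(".", "-");           // "a-b-c"

// Matches do not overlap: the search resumes after the end of each match.
const nonOverlapping = "aaaa".replaceAll("aa", "b");   // "bb"

// An empty search string matches before each code unit and at the end,
// because advanceBy is max(1, searchLength).
const spaced = "abc".replaceAll("", "_");              // "_a_b_c_"

// A function replacer is called with (searchString, position, string).
const positions = [];
const upper = "x-y-x".replaceAll("x", (match, position) => {
  positions.push(position);
  return match.toUpperCase();
});                                                    // "X-y-X", positions [0, 4]

// A RegExp search value without the "g" flag causes a TypeError.
let nonGlobalError = null;
try {
  "abc".replaceAll(/b/, "x");
} catch (err) {
  nonGlobalError = err;
}
```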

22.1.3.20 String.prototype.search ( regexp )

When the search method is called with argument regexp, the following steps are taken:

  1. Let O be ? RequireObjectCoercible(this value).
  2. If regexp is neither undefined nor null, then
    1. Let searcher be ? GetMethod(regexp, @@search).
    2. If searcher is not undefined, then
      1. Return ? Call(searcher, regexp, « O »).
  3. Let string be ? ToString(O).
  4. Let rx be ? RegExpCreate(regexp, undefined).
  5. Return ? Invoke(rx, @@search, « string »).
Note

The search function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
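
As an informal illustration (not part of this specification), the sketch below shows that a String argument is compiled into a pattern by RegExpCreate, so pattern syntax characters keep their special meaning, and that the this value is converted with ToString. Variable names are illustrative only.

```javascript
// A String argument is treated as pattern source, so "." matches any character.
const dotAsPattern = "a.b".search(".");       // 0
const dotEscaped = "a.b".search("\\.");       // 1

// A RegExp argument is dispatched through its @@search method.
const withRegExp = "hello world".search(/o\s/);  // 4
const noMatch = "hello".search(/z/);             // -1

// The method is generic: a Number this value is converted to "12345".
const generic = String.prototype.search.call(12345, /3/);  // 2
```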

22.1.3.21 String.prototype.slice ( start, end )

The slice method takes two arguments, start and end, and returns a substring of the result of converting this object to a String, starting from index start and running to, but not including, index end (or through the end of the String if end is undefined). If start is negative, it is treated as sourceLength + start where sourceLength is the length of the String. If end is negative, it is treated as sourceLength + end where sourceLength is the length of the String. The result is a String value, not a String object. The following steps are taken:

  1. Let O be ? RequireObjectCoercible(this value).
  2. Let S be ? ToString(O).
  3. Let len be the length of S.
  4. Let intStart be ? ToIntegerOrInfinity(start).
  5. If intStart is -∞, let from be 0.
  6. Else if intStart < 0, let from be max(len + intStart, 0).
  7. Else, let from be min(intStart, len).
  8. If end is undefined, let intEnd be len; else let intEnd be ? ToIntegerOrInfinity(end).
  9. If intEnd is -∞, let to be 0.
  10. Else if intEnd < 0, let to be max(len + intEnd, 0).
  11. Else, let to be min(intEnd, len).
  12. If from ≥ to, return the empty String.
  13. Return the substring of S from from to to.
Note

The slice function is intentionally generic; it does not require that its this value be a String object. Therefore it can be transferred to other kinds of objects for use as a method.
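
The following informal sketch illustrates how negative and out-of-range arguments are resolved by the steps above. It is not part of the algorithm, and the variable names are illustrative only.

```javascript
const s = "ECMAScript";                 // length 10

const fromFour = s.slice(4);            // "Script"
const lastSix = s.slice(-6);            // "Script" (from = 10 + -6 = 4)
const dropLastSix = s.slice(0, -6);     // "ECMA"   (to = 10 + -6 = 4)
const clampedStart = s.slice(-100, 3);  // "ECM"    (from = max(10 - 100, 0) = 0)
const clampedEnd = s.slice(2, Infinity); // "MAScript" (to = min(+∞, 10) = 10)

// Unlike substring, slice never swaps its arguments: from ≥ to yields "".
const empty = s.slice(4, 2);            // ""
```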

22.1.3.22 String.prototype.split ( separator, limit )

Returns an Array into which substrings of the result of converting this object to a String have been stored. The substrings are determined by searching from left to right for occurrences of separator; these occurrences are not part of any String in the returned array, but serve to divide up the String value. The value of separator may be a String of any length or it may be an object, such as a RegExp, that has a @@split method.

When the split method is called, the following steps are taken:

  1. Let O be ? RequireObjectCoercible(this value).
  2. If separator is neither undefined nor null, then
    1. Let splitter be ? GetMethod(separator, @@split).
    2. If splitter is not undefined, then
      1. Return ? Call(splitter, separator, « O, limit »).
  3. Let S be ? ToString(O).
  4. If limit is undefined, let lim be 2^32 - 1; else let lim be (? ToUint32(limit)).
  5. Let R be ? ToString(separator).
  6. If lim = 0, then
    1. Return CreateArrayFromList(« »).
  7. If separator is undefined, then
    1. Return CreateArrayFromList(« S »).
  8. Let separatorLength be the length of R.
  9. If separatorLength is 0, then
    1. Let head be the substring of S from 0 to lim.
    2. Let codeUnits be a List consisting of the sequence of code units that are the elements of head.
    3. Return CreateArrayFromList(codeUnits).
  10. If S is the empty String, return CreateArrayFromList(« S »).
  11. Let substrings be a new empty List.
  12. Let i be 0.
  13. Let j be StringIndexOf(S, R, 0).
  14. Repeat, while j is not -1,
    1. Let T be the substring of S from i to j.
    2. Append T as the last element of substrings.
    3. If the number of elements of substrings is lim, return CreateArrayFromList(substrings).
    4. Set i to j + separatorLength.
    5. Set j to StringIndexOf(S, R, i).
  15. Let T be the substring of S from i.
  16. Append T to substrings.
  17. Return CreateArrayFromList(substrings).
Note 1

The value of separator may be an empty String. In this case, separator does not match the empty substring at the beginning or end of the input String, nor does it match the empty substring at the end of the previous separator match. If separator is the empty String, the String is split up into individual code unit elements; the length of the result array equals the length of the String, and each substring contains one code unit.

If the this value is (or converts to) the empty String, the result depends on whether separator can match the empty String. If it can, the result array contains no elements. Otherwise, the result array contains one element, which is the empty String.

If separator is undefined, then the result array contains just one String, which is the this value (converted to a String). If limit is not undefined, then the output array is truncated so that it contains no more than limit elements.

Note 2

The split function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
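
As an informal illustration of the steps and notes above (not part of this specification), the sketch below exercises String separators, the limit argument, the undefined separator, and the empty separator. Variable names are illustrative only.

```javascript
const parts = "a,b,c".split(",");            // ["a", "b", "c"]
const limited = "a,b,c".split(",", 2);       // ["a", "b"]
const zeroLimit = "a,b,c".split(",", 0);     // []
const noSeparator = "a,b,c".split();         // ["a,b,c"]

// ToUint32(-1) is 2^32 - 1, so a negative limit does not truncate here.
const negativeLimit = "a,b".split(",", -1);  // ["a", "b"]

// Separators at the start or end produce empty substrings.
const edges = ",a,".split(",");              // ["", "a", ""]

// An empty separator splits into code units, not code points:
// U+1D11E becomes its two surrogate code units.
const codeUnits = "a\u{1D11E}".split("");    // 3 elements

// Empty input: the result depends on whether the separator is empty.
const emptyInput = "".split(",");            // [""]
const emptyBoth = "".split("");              // []
```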

22.1.3.23 String.prototype.startsWith ( searchString [ , position ] )

The following steps are taken:

  1. Let O be ? RequireObjectCoercible(this value).
  2. Let S be ? ToString(O).
  3. Let isRegExp be ? IsRegExp(searchString).
  4. If isRegExp is true, throw a TypeError exception.
  5. Let searchStr be ? ToString(searchString).
  6. Let len be the length of S.
  7. If position is undefined, let pos be 0; else let pos be ? ToIntegerOrInfinity(position).
  8. Let start be the result of clamping pos between 0 and len.
  9. Let searchLength be the length of searchStr.
  10. If searchLength = 0, return true.
  11. Let end be start + searchLength.
  12. If end > len, return false.
  13. Let substring be the substring of S from start to end.
  14. Return SameValueNonNumeric(substring, searchStr).
Note 1

This method returns true if the sequence of code units of searchString converted to a String is the same as the corresponding code units of this object (converted to a String) starting at index position. Otherwise returns false.

Note 2

Throwing an exception if the first argument is a RegExp is specified in order to allow future editions to define extensions that allow such argument values.

Note 3

The startsWith function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
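
The following informal sketch (not part of this specification) illustrates the position argument, the clamping in step 8, the empty search string, and the TypeError for a RegExp argument. Variable names are illustrative only.

```javascript
const s = "ECMAScript";

const atStart = s.startsWith("ECMA");          // true
const atFour = s.startsWith("Script", 4);      // true
const atFive = s.startsWith("Script", 5);      // false

// An empty search string always matches (step 10), even past the end.
const emptyPastEnd = "abc".startsWith("", 100);  // true

// position is clamped to len, so end exceeds len and the result is false.
const pastEnd = "abc".startsWith("c", 100);      // false

// A RegExp argument is rejected rather than converted to a String.
let regExpError = null;
try {
  s.startsWith(/ECMA/);
} catch (err) {
  regExpError = err;
}
```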

22.1.3.24 String.prototype.substring ( start, end )

The substring method takes two arguments, start and end, and returns a substring of the result of converting this object to a String, starting from index start and running to, but not including, index end of the String (or through the end of the String if end is undefined). The result is a String value, not a String object.

If either argument is NaN or negative, it is replaced with zero; if either argument is larger than the length of the String, it is replaced with the length of the String.

If start is larger than end, they are swapped.

The following steps are taken:

  1. Let O be ? RequireObjectCoercible(this value).
  2. Let S be ? ToString(O).
  3. Let len be the length of S.
  4. Let intStart be ? ToIntegerOrInfinity(start).
  5. If end is undefined, let intEnd be len; else let intEnd be ? ToIntegerOrInfinity(end).
  6. Let finalStart be the result of clamping intStart between 0 and len.
  7. Let finalEnd be the result of clamping intEnd between 0 and len.
  8. Let from be min(finalStart, finalEnd).
  9. Let to be max(finalStart, finalEnd).
  10. Return the substring of S from from to to.
Note

The substring function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
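
As an informal illustration (not part of this specification), the sketch below shows the clamping and swapping performed by the steps above, and contrasts the handling of a negative end with slice. Variable names are illustrative only.

```javascript
const s = "ECMAScript";                    // length 10

const swapped = s.substring(4, 0);         // "ECMA"   (arguments are swapped)
const negativeStart = s.substring(-3, 4);  // "ECMA"   (-3 is clamped to 0)
const nanStart = s.substring(NaN, 4);      // "ECMA"   (NaN converts to 0)
const largeEnd = s.substring(4, 100);      // "Script" (100 is clamped to 10)

// A negative end is clamped to 0 by substring but counted from the end by slice.
const substringNegativeEnd = s.substring(4, -1);  // "ECMA"
const sliceNegativeEnd = s.slice(4, -1);          // "Scrip"
```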

22.1.3.25 String.prototype.toLocaleLowerCase ( [ reserved1 [ , reserved2 ] ] )

An ECMAScript implementation that includes the ECMA-402 Internationalization API must implement the toLocaleLowerCase method as specified in the ECMA-402 specification. If an ECMAScript implementation does not include the ECMA-402 API the following specification of the toLocaleLowerCase method is used.

This function interprets a String value as a sequence of UTF-16 encoded code points, as described in 6.1.4.

This function works exactly the same as toLowerCase except that it is intended to yield a locale-sensitive result corresponding with conventions of the host environment's current locale. There will only be a difference in the few cases (such as Turkish) where the rules for that language conflict with the regular Unicode case mappings.

The meaning of the optional parameters to this method is defined in the ECMA-402 specification; implementations that do not include ECMA-402 support must not use those parameter positions for anything else.

Note

The toLocaleLowerCase function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.

22.1.3.26 String.prototype.toLocaleUpperCase ( [ reserved1 [ , reserved2 ] ] )

An ECMAScript implementation that includes the ECMA-402 Internationalization API must implement the toLocaleUpperCase method as specified in the ECMA-402 specification. If an ECMAScript implementation does not include the ECMA-402 API the following specification of the toLocaleUpperCase method is used.

This function interprets a String value as a sequence of UTF-16 encoded code points, as described in 6.1.4.

This function works exactly the same as toUpperCase except that it is intended to yield a locale-sensitive result corresponding with conventions of the host environment's current locale. There will only be a difference in the few cases (such as Turkish) where the rules for that language conflict with the regular Unicode case mappings.

The meaning of the optional parameters to this method is defined in the ECMA-402 specification; implementations that do not include ECMA-402 support must not use those parameter positions for anything else.

Note

The toLocaleUpperCase function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
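
The following informal sketch illustrates the Turkish case mentioned above. It assumes an implementation that includes ECMA-402 with Turkish locale data; passing the locale tag "tr" as the first argument is ECMA-402 behaviour, and an implementation without it would ignore the argument and return the locale-insensitive result. Variable names are illustrative only.

```javascript
// Locale-insensitive mapping, for comparison.
const lowerDefault = "I".toLowerCase();            // "i"

// With Turkish tailoring available, "I" maps to dotless "ı" (U+0131)
// and "i" maps to dotted "İ" (U+0130).
const lowerTurkish = "I".toLocaleLowerCase("tr");
const upperTurkish = "i".toLocaleUpperCase("tr");

// True only when the implementation applied the Turkish rules.
const turkishTailoringApplied =
  lowerTurkish === "\u0131" && upperTurkish === "\u0130";
```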

22.1.3.27 String.prototype.toLowerCase ( )

This function interprets a String value as a sequence of UTF-16 encoded code points, as described in 6.1.4. The following steps are taken:

  1. Let O be ? RequireObjectCoercible(this value).
  2. Let S be ? ToString(O).
  3. Let sText be StringToCodePoints(S).
  4. Let lowerText be the result of toLowercase(sText), according to the Unicode Default Case Conversion algorithm.
  5. Let L be CodePointsToString(lowerText).
  6. Return L.

The result must be derived according to the locale-insensitive case mappings in the Unicode Character Database (this explicitly includes not only the file UnicodeData.txt, but also all locale-insensitive mappings in the file SpecialCasing.txt that accompanies it).

Note 1

The case mapping of some code points may produce multiple code points. In this case the result String may not be the same length as the source String. Because both toUpperCase and toLowerCase have context-sensitive behaviour, the functions are not symmetrical. In other words, s.toUpperCase().toLowerCase() is not necessarily equal to s.toLowerCase().

Note 2

The toLowerCase function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
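
As an informal illustration of Note 1 (not part of this specification), the sketch below shows a mapping that changes the String length, the resulting asymmetry between toUpperCase and toLowerCase, and the context-sensitive mapping of GREEK CAPITAL LETTER SIGMA. Variable names are illustrative only.

```javascript
// U+00DF (ß) has no single-code-point uppercase form; SpecialCasing.txt maps it to "SS".
const sharpS = "\u00DF";
const upperSharpS = sharpS.toUpperCase();        // "SS"
const roundTrip = upperSharpS.toLowerCase();     // "ss"
const lowerSharpS = sharpS.toLowerCase();        // "ß", so roundTrip differs

// U+0130 (İ) lowercases to two code points: U+0069 U+0307.
const dottedI = "\u0130".toLowerCase();

// Sigma lowercases to final sigma (U+03C2) at the end of a word,
// and to U+03C3 when it stands alone.
const word = "\u039F\u0394\u039F\u03A3".toLowerCase();
const loneSigma = "\u03A3".toLowerCase();
```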

22.1.3.28 String.prototype.toString ( )

When the toString method is called, the following steps are taken:

  1. Return ? thisStringValue(this value).
Note

For a String object, the toString method happens to return the same thing as the valueOf method.
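
The following informal sketch (not part of this specification) illustrates that toString and valueOf return the same String value for a String object, and that, unlike most String.prototype methods, toString is not generic: thisStringValue throws a TypeError for a this value that is neither a String value nor a String object. Variable names are illustrative only.

```javascript
const boxed = new String("abc");

const viaToString = boxed.toString();   // "abc" (a String value, not an object)
const viaValueOf = boxed.valueOf();     // "abc"

// A Number this value is not converted; the call throws instead.
let notGenericError = null;
try {
  String.prototype.toString.call(123);
} catch (err) {
  notGenericError = err;
}
```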

22.1.3.29 String.prototype.toUpperCase ( )

This function interprets a String value as a sequence of UTF-16 encoded code points, as described in 6.1.4.

This function behaves in exactly the same way as String.prototype.toLowerCase, except that the String is mapped using the toUppercase algorithm of the Unicode Default Case Conversion.

Note

The toUpperCase function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.

22.1.3.30 String.prototype.trim ( )

This function interprets a String value as a sequence of UTF-16 encoded code points, as described in 6.1.4.

The following steps are taken:

  1. Let S be the this value.
  2. Return ? TrimString(S, start+end).
Note

The trim function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.

22.1.3.30.1 TrimString ( string, where )

The abstract operation TrimString takes arguments string (an ECMAScript language value) and where (start, end, or start+end) and returns either a normal completion containing a String or an abrupt completion. It interprets string as a sequence of UTF-16 encoded code points, as described in 6.1.4. It performs the following steps when called:

  1. Let str be ? RequireObjectCoercible(string).
  2. Let S be ? ToString(str).
  3. If where is start, let T be the String value that is a copy of S with leading white space removed.
  4. Else if where is end, let T be the String value that is a copy of S with trailing white space removed.
  5. Else,
    1. Assert: where is start+end.
    2. Let T be the String value that is a copy of S with both leading and trailing white space removed.
  6. Return T.

The definition of white space is the union of WhiteSpace and LineTerminator. When determining whether a Unicode code point is in Unicode general category “Space_Separator” (“Zs”), code unit sequences are interpreted as UTF-16 encoded code point sequences as specified in 6.1.4.

22.1.3.31 String.prototype.trimEnd ( )

This function interprets a String value as a sequence of UTF-16 encoded code points, as described in 6.1.4.

The following steps are taken:

  1. Let S be the this value.
  2. Return ? TrimString(S, end).
Note

The trimEnd function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.

22.1.3.32 String.prototype.trimStart ( )

This function interprets a String value as a sequence of UTF-16 encoded code points, as described in 6.1.4.

The following steps are taken:

  1. Let S be the this value.
  2. Return ? TrimString(S, start).
Note

The trimStart function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
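
As an informal illustration of TrimString and the three methods that call it (not part of this specification), the sketch below uses a String padded with WhiteSpace and LineTerminator code points. It also shows that U+200B (ZERO WIDTH SPACE) is not removed, since it is not in general category Zs. Variable names are illustrative only.

```javascript
// U+FEFF, TAB, SPACE and U+00A0 are WhiteSpace; U+2028 and LF are LineTerminators.
const padded = "\uFEFF\t value \u00A0\u2028\n";

const both = padded.trim();        // "value"
const start = padded.trimStart();  // "value \u00A0\u2028\n"
const end = padded.trimEnd();      // "\uFEFF\t value"

// U+200B is a format character (Cf), so it survives trimming.
const zeroWidth = "\u200Bx".trim();  // still 2 code units long

// TrimString applies RequireObjectCoercible, then ToString.
const fromNumber = String.prototype.trim.call(42);  // "42"
let undefinedError = null;
try {
  String.prototype.trim.call(undefined);
} catch (err) {
  undefinedError = err;
}
```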

22.1.3.33 String.prototype.valueOf ( )

When the valueOf method is called, the following steps are taken:

  1. Return ? thisStringValue(this value).

22.1.3.34 String.prototype [ @@iterator ] ( )

When the @@iterator method is called it returns an Iterator object (27.1.1.2) that iterates over the code points of a String value, returning each code point as a String value. The following steps are taken:

  1. Let O be ? RequireObjectCoercible(this value).
  2. Let s be ? ToString(O).
  3. Let closure be a new Abstract Closure with no parameters that captures s and performs the following steps when called:
    1. Let position be 0.
    2. Let len be the length of s.
    3. Repeat, while position < len,
      1. Let cp be CodePointAt(s, position).
      2. Let nextIndex be position + cp.[[CodeUnitCount]].
      3. Let resultString be the substring of s from position to nextIndex.
      4. Set position to nextIndex.
      5. Perform ? GeneratorYield(CreateIterResultObject(resultString, false)).
    4. Return undefined.
  4. Return CreateIteratorFromClosure(closure, "%StringIteratorPrototype%", %StringIteratorPrototype%).

The value of the "name" property of this function is "[Symbol.iterator]".
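
The following informal sketch (not part of this specification) illustrates that iteration proceeds by code point rather than by code unit, and that a lone surrogate is yielded as a one-code-unit String. Variable names are illustrative only.

```javascript
// "a", U+1D11E (a surrogate pair), a lone lead surrogate, "b".
const text = "a\u{1D11E}\uD800b";

const codeUnitCount = text.length;       // 5
const codePoints = [...text];            // spread uses String.prototype[@@iterator]
const codePointCount = codePoints.length;              // 4
const unitsPerElement = codePoints.map((p) => p.length);  // [1, 2, 1, 1]
```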

22.1.4 Properties of String Instances

String instances are String exotic objects and have the internal methods specified for such objects. String instances inherit properties from the String prototype object. String instances also have a [[StringData]] internal slot.

String instances have a "length" property, and a set of enumerable properties with integer-indexed names.

22.1.4.1 length

The number of elements in the String value represented by this String object.

Once a String object is initialized, this property is unchanging. It has the attributes { [[Writable]]: false, [[Enumerable]]: false, [[Configurable]]: false }.
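
As an informal illustration (not part of this specification), the sketch below inspects the "length" property and the integer-indexed properties of a String object. The helper assignLengthStrict is hypothetical and exists only to show that assignment to "length" throws in strict mode code.

```javascript
const boxed = new String("abc");

// { value: 3, writable: false, enumerable: false, configurable: false }
const lengthDescriptor = Object.getOwnPropertyDescriptor(boxed, "length");

// Only the integer-indexed properties are enumerable.
const enumerableKeys = Object.keys(boxed);   // ["0", "1", "2"]

// Hypothetical helper: returns the error thrown by the assignment, or null.
function assignLengthStrict(target) {
  "use strict";
  try {
    target.length = 10;
    return null;
  } catch (err) {
    return err;
  }
}
const assignmentError = assignLengthStrict(boxed);  // a TypeError; length stays 3
```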

22.1.5 String Iterator Objects

A String Iterator is an object that represents a specific iteration over some specific String instance object. There is no named constructor for String Iterator objects. Instead, String Iterator objects are created by calling certain methods of String instance objects.

22.1.5.1 The %StringIteratorPrototype% Object

The %StringIteratorPrototype% object:

  • has properties that are inherited by all String Iterator Objects.
  • is an ordinary object.
  • has a [[Prototype]] internal slot whose value is %IteratorPrototype%.
  • has the following properties:

22.1.5.1.1 %StringIteratorPrototype%.next ( )

  1. Return ? GeneratorResume(this value, empty, "%StringIteratorPrototype%").

22.1.5.1.2 %StringIteratorPrototype% [ @@toStringTag ]

The initial value of the @@toStringTag property is the String value "String Iterator".

This property has the attributes { [[Writable]]: false, [[Enumerable]]: false, [[Configurable]]: true }.
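
The following informal sketch (not part of this specification) shows one way to observe %StringIteratorPrototype% from ECMAScript code, since it has no global name. Variable names are illustrative only.

```javascript
const iterator = "hi"[Symbol.iterator]();

// The prototype of any String Iterator is %StringIteratorPrototype%.
const stringIteratorPrototype = Object.getPrototypeOf(iterator);
const tag = stringIteratorPrototype[Symbol.toStringTag];      // "String Iterator"
const described = Object.prototype.toString.call(iterator);   // "[object String Iterator]"

// Its [[Prototype]] is %IteratorPrototype%, shared with Array iterators.
const arrayIteratorPrototype = Object.getPrototypeOf([][Symbol.iterator]());
const sharesIteratorPrototype =
  Object.getPrototypeOf(stringIteratorPrototype) ===
  Object.getPrototypeOf(arrayIteratorPrototype);              // true

// next() resumes the underlying generator-like closure.
const first = iterator.next();   // { value: "h", done: false }
const second = iterator.next();  // { value: "i", done: false }
const third = iterator.next();   // { value: undefined, done: true }
```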

22.2 RegExp (Regular Expression) Objects

A RegExp object contains a regular expression and the associated flags.

Note

The form and functionality of regular expressions is modelled after the regular expression facility in the Perl 5 programming language.

22.2.1 Patterns

The RegExp constructor applies the following grammar to the input pattern String. An error occurs if the grammar cannot interpret the String as an expansion of Pattern.

Syntax

Pattern[UnicodeMode, N] ::
  Disjunction[?UnicodeMode, ?N]

Disjunction[UnicodeMode, N] ::
  Alternative[?UnicodeMode, ?N]
  Alternative[?UnicodeMode, ?N] | Disjunction[?UnicodeMode, ?N]

Alternative[UnicodeMode, N] ::
  [empty]
  Alternative[?UnicodeMode, ?N] Term[?UnicodeMode, ?N]

Term[UnicodeMode, N] ::
  Assertion[?UnicodeMode, ?N]
  Atom[?UnicodeMode, ?N]
  Atom[?UnicodeMode, ?N] Quantifier

Assertion[UnicodeMode, N] ::
  ^
  $
  \ b
  \ B
  ( ? = Disjunction[?UnicodeMode, ?N] )
  ( ? ! Disjunction[?UnicodeMode, ?N] )
  ( ? <= Disjunction[?UnicodeMode, ?N] )
  ( ? <! Disjunction[?UnicodeMode, ?N] )

Quantifier ::
  QuantifierPrefix
  QuantifierPrefix ?

QuantifierPrefix ::
  *
  +
  ?
  { DecimalDigits[~Sep] }
  { DecimalDigits[~Sep] , }
  { DecimalDigits[~Sep] , DecimalDigits[~Sep] }

Atom[UnicodeMode, N] ::
  PatternCharacter
  .
  \ AtomEscape[?UnicodeMode, ?N]
  CharacterClass[?UnicodeMode]
  ( GroupSpecifier[?UnicodeMode] Disjunction[?UnicodeMode, ?N] )
  ( ? : Disjunction[?UnicodeMode, ?N] )

SyntaxCharacter :: one of
  ^ $ \ . * + ? ( ) [ ] { } |

PatternCharacter ::
  SourceCharacter but not SyntaxCharacter

AtomEscape[UnicodeMode, N] ::
  DecimalEscape
  CharacterClassEscape[?UnicodeMode]
  CharacterEscape[?UnicodeMode]
  [+N] k GroupName[?UnicodeMode]

CharacterEscape[UnicodeMode] ::
  ControlEscape
  c ControlLetter
  0 [lookahead ∉ DecimalDigit]
  HexEscapeSequence
  RegExpUnicodeEscapeSequence[?UnicodeMode]
  IdentityEscape[?UnicodeMode]

ControlEscape :: one of
  f n r t v

ControlLetter :: one of
  a b c d e f g h i j k l m n o p q r s t u v w x y z A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

GroupSpecifier[UnicodeMode] ::
  [empty]
  ? GroupName[?UnicodeMode]

GroupName[UnicodeMode] ::
  < RegExpIdentifierName[?UnicodeMode] >

RegExpIdentifierName[UnicodeMode] ::
  RegExpIdentifierStart[?UnicodeMode]
  RegExpIdentifierName[?UnicodeMode] RegExpIdentifierPart[?UnicodeMode]

RegExpIdentifierStart[UnicodeMode] ::
  IdentifierStartChar
  \ RegExpUnicodeEscapeSequence[+UnicodeMode]
  [~UnicodeMode] UnicodeLeadSurrogate UnicodeTrailSurrogate

RegExpIdentifierPart[UnicodeMode] ::
  IdentifierPartChar
  \ RegExpUnicodeEscapeSequence[+UnicodeMode]
  [~UnicodeMode] UnicodeLeadSurrogate UnicodeTrailSurrogate

RegExpUnicodeEscapeSequence[UnicodeMode] ::
  [+UnicodeMode] u HexLeadSurrogate \u HexTrailSurrogate
  [+UnicodeMode] u HexLeadSurrogate
  [+UnicodeMode] u HexTrailSurrogate
  [+UnicodeMode] u HexNonSurrogate
  [~UnicodeMode] u Hex4Digits
  [+UnicodeMode] u{ CodePoint }

UnicodeLeadSurrogate ::
  any Unicode code point in the inclusive range 0xD800 to 0xDBFF

UnicodeTrailSurrogate ::
  any Unicode code point in the inclusive range 0xDC00 to 0xDFFF

Each \u HexTrailSurrogate for which the choice of associated u HexLeadSurrogate is ambiguous shall be associated with the nearest possible u HexLeadSurrogate that would otherwise have no corresponding \u HexTrailSurrogate.

HexLeadSurrogate ::
  Hex4Digits but only if the MV of Hex4Digits is in the inclusive range 0xD800 to 0xDBFF

HexTrailSurrogate ::
  Hex4Digits but only if the MV of Hex4Digits is in the inclusive range 0xDC00 to 0xDFFF

HexNonSurrogate ::
  Hex4Digits but only if the MV of Hex4Digits is not in the inclusive range 0xD800 to 0xDFFF

IdentityEscape[UnicodeMode] ::
  [+UnicodeMode] SyntaxCharacter
  [+UnicodeMode] /
  [~UnicodeMode] SourceCharacter but not UnicodeIDContinue

DecimalEscape ::
  NonZeroDigit DecimalDigits[~Sep]opt [lookahead ∉ DecimalDigit]

CharacterClassEscape[UnicodeMode] ::
  d
  D
  s
  S
  w
  W
  [+UnicodeMode] p{ UnicodePropertyValueExpression }
  [+UnicodeMode] P{ UnicodePropertyValueExpression }

UnicodePropertyValueExpression ::
  UnicodePropertyName = UnicodePropertyValue
  LoneUnicodePropertyNameOrValue

UnicodePropertyName ::
  UnicodePropertyNameCharacters

UnicodePropertyNameCharacters ::
  UnicodePropertyNameCharacter UnicodePropertyNameCharactersopt

UnicodePropertyValue ::
  UnicodePropertyValueCharacters

LoneUnicodePropertyNameOrValue ::
  UnicodePropertyValueCharacters

UnicodePropertyValueCharacters ::
  UnicodePropertyValueCharacter UnicodePropertyValueCharactersopt

UnicodePropertyValueCharacter ::
  UnicodePropertyNameCharacter
  DecimalDigit

UnicodePropertyNameCharacter ::
  ControlLetter
  _

CharacterClass[UnicodeMode] ::
  [ [lookahead ≠ ^] ClassRanges[?UnicodeMode] ]
  [ ^ ClassRanges[?UnicodeMode] ]

ClassRanges[UnicodeMode] ::
  [empty]
  NonemptyClassRanges[?UnicodeMode]

NonemptyClassRanges[UnicodeMode] ::
  ClassAtom[?UnicodeMode]
  ClassAtom[?UnicodeMode] NonemptyClassRangesNoDash[?UnicodeMode]
  ClassAtom[?UnicodeMode] - ClassAtom[?UnicodeMode] ClassRanges[?UnicodeMode]

NonemptyClassRangesNoDash[UnicodeMode] ::
  ClassAtom[?UnicodeMode]
  ClassAtomNoDash[?UnicodeMode] NonemptyClassRangesNoDash[?UnicodeMode]
  ClassAtomNoDash[?UnicodeMode] - ClassAtom[?UnicodeMode] ClassRanges[?UnicodeMode]

ClassAtom[UnicodeMode] ::
  -
  ClassAtomNoDash[?UnicodeMode]

ClassAtomNoDash[UnicodeMode] ::
  SourceCharacter but not one of \ or ] or -
  \ ClassEscape[?UnicodeMode]

ClassEscape[UnicodeMode] ::
  b
  [+UnicodeMode] -
  CharacterClassEscape[?UnicodeMode]
  CharacterEscape[?UnicodeMode]

Note

A number of productions in this section are given alternative definitions in section B.1.2.

22.2.1.1 Static Semantics: Early Errors

Note

This section is amended in B.1.2.1.

Pattern :: Disjunction QuantifierPrefix :: { DecimalDigits , DecimalDigits } AtomEscape :: k GroupName AtomEscape :: DecimalEscape NonemptyClassRanges :: ClassAtom - ClassAtom ClassRanges NonemptyClassRangesNoDash :: ClassAtomNoDash - ClassAtom ClassRanges RegExpIdentifierStart :: \ RegExpUnicodeEscapeSequence RegExpIdentifierStart :: UnicodeLeadSurrogate UnicodeTrailSurrogate RegExpIdentifierPart :: \ RegExpUnicodeEscapeSequence RegExpIdentifierPart :: UnicodeLeadSurrogate UnicodeTrailSurrogate UnicodePropertyValueExpression :: UnicodePropertyName = UnicodePropertyValue UnicodePropertyValueExpression :: LoneUnicodePropertyNameOrValue
  • It is a Syntax Error if the List of Unicode code points that is SourceText of LoneUnicodePropertyNameOrValue is not identical to a List of Unicode code points that is a Unicode general category or general category alias listed in the “Property value and aliases” column of Table 68, nor a binary property or binary property alias listed in the “Property name and aliases” column of Table 67.

22.2.1.2 Static Semantics: CapturingGroupNumber

The syntax-directed operation CapturingGroupNumber takes no arguments and returns a positive integer.

Note

This section is amended in B.1.2.1.

It is defined piecewise over the following productions:

DecimalEscape :: NonZeroDigit
  1. Return the MV of NonZeroDigit.
DecimalEscape :: NonZeroDigit DecimalDigits
  1. Let n be the number of code points in DecimalDigits.
  2. Return (the MV of NonZeroDigit × 10^n plus the MV of DecimalDigits).

The definitions of “the MV of NonZeroDigit” and “the MV of DecimalDigits” are in 12.8.3.

22.2.1.3 Static Semantics: IsCharacterClass

The syntax-directed operation IsCharacterClass takes no arguments and returns a Boolean.

Note

This section is amended in B.1.2.2.

It is defined piecewise over the following productions:

ClassAtom :: -
ClassAtomNoDash :: SourceCharacter but not one of \ or ] or -
ClassEscape :: b
ClassEscape :: -
ClassEscape :: CharacterEscape
  1. Return false.
ClassEscape :: CharacterClassEscape
  1. Return true.

22.2.1.4 Static Semantics: CharacterValue

The syntax-directed operation CharacterValue takes no arguments and returns a non-negative integer.

Note 1

This section is amended in B.1.2.3.

It is defined piecewise over the following productions:

ClassAtom :: -
  1. Return the numeric value of U+002D (HYPHEN-MINUS).
ClassAtomNoDash :: SourceCharacter but not one of \ or ] or -
  1. Let ch be the code point matched by SourceCharacter.
  2. Return the numeric value of ch.
ClassEscape :: b
  1. Return the numeric value of U+0008 (BACKSPACE).
ClassEscape :: -
  1. Return the numeric value of U+002D (HYPHEN-MINUS).
CharacterEscape :: ControlEscape
  1. Return the numeric value according to Table 65.
Table 65: ControlEscape Code Point Values
ControlEscape  Numeric Value  Code Point  Unicode Name          Symbol
t              9              U+0009      CHARACTER TABULATION  <HT>
n              10             U+000A      LINE FEED (LF)        <LF>
v              11             U+000B      LINE TABULATION       <VT>
f              12             U+000C      FORM FEED (FF)        <FF>
r              13             U+000D      CARRIAGE RETURN (CR)  <CR>
CharacterEscape :: c ControlLetter
  1. Let ch be the code point matched by ControlLetter.
  2. Let i be the numeric value of ch.
  3. Return the remainder of dividing i by 32.
CharacterEscape :: 0 [lookahead ∉ DecimalDigit]
  1. Return the numeric value of U+0000 (NULL).
Note 2

\0 represents the <NUL> character and cannot be followed by a decimal digit.

CharacterEscape :: HexEscapeSequence
  1. Return the MV of HexEscapeSequence.
RegExpUnicodeEscapeSequence :: u HexLeadSurrogate \u HexTrailSurrogate
  1. Let lead be the CharacterValue of HexLeadSurrogate.
  2. Let trail be the CharacterValue of HexTrailSurrogate.
  3. Let cp be UTF16SurrogatePairToCodePoint(lead, trail).
  4. Return the numeric value of cp.
RegExpUnicodeEscapeSequence :: u Hex4Digits
  1. Return the MV of Hex4Digits.
RegExpUnicodeEscapeSequence :: u{ CodePoint }
  1. Return the MV of CodePoint.
HexLeadSurrogate :: Hex4Digits
HexTrailSurrogate :: Hex4Digits
HexNonSurrogate :: Hex4Digits
  1. Return the MV of Hex4Digits.
CharacterEscape :: IdentityEscape
  1. Let ch be the code point matched by IdentityEscape.
  2. Return the numeric value of ch.
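
As an informal illustration of the CharacterValue rules above (not part of this specification), the sketch below checks several escapes against the code points they denote. Variable names are illustrative only.

```javascript
// \c ControlLetter: the code point of the letter modulo 32.
const controlValue = "J".codePointAt(0) % 32;        // 74 % 32 = 10 (LINE FEED)
const upperControl = /\cJ/.test("\n");               // true
const lowerControl = /\cj/.test("\n");               // true (106 % 32 = 10)

// ControlEscape values from Table 65.
const tab = /\t/.test("\u0009");                     // true
const verticalTab = /\v/.test("\u000B");             // true

// Inside a character class, \b denotes U+0008 (BACKSPACE).
const backspace = /[\b]/.test("\u0008");             // true

// \0 denotes U+0000, and \x41 is a HexEscapeSequence for "A".
const nul = /\0/.test("\u0000");                     // true
const hex = /\x41/.test("A");                        // true

// With the u flag, \u{...} and a \u surrogate pair denote one code point.
const braced = /^\u{1D11E}$/u.test("\u{1D11E}");     // true
const paired = /^\uD834\uDD1E$/u.test("\u{1D11E}");  // true
```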

22.2.1.5 Static Semantics: SourceText

The syntax-directed operation SourceText takes no arguments and returns a List of code points. It is defined piecewise over the following productions:

UnicodePropertyNameCharacters :: UnicodePropertyNameCharacter UnicodePropertyNameCharactersopt
UnicodePropertyValueCharacters :: UnicodePropertyValueCharacter UnicodePropertyValueCharactersopt
  1. Return the List, in source text order, of Unicode code points in the source text matched by this production.

22.2.1.6 Static Semantics: CapturingGroupName

The syntax-directed operation CapturingGroupName takes no arguments and returns a String. It is defined piecewise over the following productions:

RegExpIdentifierName :: RegExpIdentifierStart
RegExpIdentifierName :: RegExpIdentifierName RegExpIdentifierPart
  1. Let idTextUnescaped be RegExpIdentifierCodePoints of RegExpIdentifierName.
  2. Return CodePointsToString(idTextUnescaped).

22.2.1.7 Static Semantics: RegExpIdentifierCodePoints

The syntax-directed operation RegExpIdentifierCodePoints takes no arguments and returns a List of code points. It is defined piecewise over the following productions:

RegExpIdentifierName :: RegExpIdentifierStart
  1. Let cp be RegExpIdentifierCodePoint of RegExpIdentifierStart.
  2. Return « cp ».
RegExpIdentifierName :: RegExpIdentifierName RegExpIdentifierPart
  1. Let cps be RegExpIdentifierCodePoints of the derived RegExpIdentifierName.
  2. Let cp be RegExpIdentifierCodePoint of RegExpIdentifierPart.
  3. Return the list-concatenation of cps and « cp ».

22.2.1.8 Static Semantics: RegExpIdentifierCodePoint

The syntax-directed operation RegExpIdentifierCodePoint takes no arguments and returns a code point. It is defined piecewise over the following productions:

RegExpIdentifierStart :: IdentifierStartChar
  1. Return the code point matched by IdentifierStartChar.
RegExpIdentifierPart :: IdentifierPartChar
  1. Return the code point matched by IdentifierPartChar.
RegExpIdentifierStart :: \ RegExpUnicodeEscapeSequence
RegExpIdentifierPart :: \ RegExpUnicodeEscapeSequence
  1. Return the code point whose numeric value is the CharacterValue of RegExpUnicodeEscapeSequence.
RegExpIdentifierStart :: UnicodeLeadSurrogate UnicodeTrailSurrogate
RegExpIdentifierPart :: UnicodeLeadSurrogate UnicodeTrailSurrogate
  1. Let lead be the code unit whose numeric value is that of the code point matched by UnicodeLeadSurrogate.
  2. Let trail be the code unit whose numeric value is that of the code point matched by UnicodeTrailSurrogate.
  3. Return UTF16SurrogatePairToCodePoint(lead, trail).

22.2.2 Pattern Semantics

A regular expression pattern is converted into an Abstract Closure using the process described below. An implementation is encouraged to use more efficient algorithms than the ones listed below, as long as the results are the same. The Abstract Closure is used as the value of a RegExp object's [[RegExpMatcher]] internal slot.

A Pattern is either a BMP pattern or a Unicode pattern depending upon whether or not its associated flags contain a u. A BMP pattern matches against a String interpreted as consisting of a sequence of 16-bit values that are Unicode code points in the range of the Basic Multilingual Plane. A Unicode pattern matches against a String interpreted as consisting of Unicode code points encoded using UTF-16. In the context of describing the behaviour of a BMP pattern “character” means a single 16-bit Unicode BMP code point. In the context of describing the behaviour of a Unicode pattern “character” means a UTF-16 encoded code point (6.1.4). In either context, “character value” means the numeric value of the corresponding non-encoded code point.

The syntax and semantics of Pattern is defined as if the source text for the Pattern was a List of SourceCharacter values where each SourceCharacter corresponds to a Unicode code point. If a BMP pattern contains a non-BMP SourceCharacter the entire pattern is encoded using UTF-16 and the individual code units of that encoding are used as the elements of the List.

Note

For example, consider a pattern expressed in source text as the single non-BMP character U+1D11E (MUSICAL SYMBOL G CLEF). Interpreted as a Unicode pattern, it would be a single element (character) List consisting of the single code point 0x1D11E. However, interpreted as a BMP pattern, it is first UTF-16 encoded to produce a two element List consisting of the code units 0xD834 and 0xDD1E.

Patterns are passed to the RegExp constructor as ECMAScript String values in which non-BMP characters are UTF-16 encoded. For example, the single character MUSICAL SYMBOL G CLEF pattern, expressed as a String value, is a String of length 2 whose elements are the code units 0xD834 and 0xDD1E. So no further translation of the string would be necessary to process it as a BMP pattern consisting of two pattern characters. However, to process it as a Unicode pattern, UTF16SurrogatePairToCodePoint must be used in producing a List whose sole element is a single pattern character, the code point U+1D11E.

An implementation may not actually perform such translations to or from UTF-16, but the semantics of this specification requires that the result of pattern matching be as if such translations were performed.
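The difference between the two interpretations can be observed from script. The following is a small illustrative sketch (the variable name clef is an assumption introduced here) that matches the G CLEF example against patterns with and without the u flag:

```javascript
// U+1D11E MUSICAL SYMBOL G CLEF, stored as two UTF-16 code units.
const clef = "\u{1D11E}";

console.log(clef.length);                      // 2
console.log(clef.charCodeAt(0).toString(16));  // d834
console.log(clef.charCodeAt(1).toString(16));  // dd1e

// BMP pattern: Input is a List of two code units, so "." must match twice.
console.log(/^.$/.test(clef));   // false
console.log(/^..$/.test(clef));  // true

// Unicode pattern: Input is a List of one code point.
console.log(/^.$/u.test(clef));  // true

// A lone lead surrogate matches half of the pair only in a BMP pattern.
console.log(/\uD834/.test(clef));   // true
console.log(/\uD834/u.test(clef));  // false
```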

22.2.2.1 Notation

The descriptions below use the following aliases:

  • Input is a List whose elements are the characters of the String being matched by the regular expression pattern. Each character is either a code unit or a code point, depending upon the kind of pattern involved. The notation Input[n] means the nth character of Input, where n can range between 0 (inclusive) and InputLength (exclusive).
  • InputLength is the number of characters in Input.
  • NcapturingParens is the total number of left-capturing parentheses (i.e. the total number of Atom :: ( GroupSpecifier Disjunction ) Parse Nodes) in the pattern. A left-capturing parenthesis is any ( pattern character that is matched by the ( terminal of the Atom :: ( GroupSpecifier Disjunction ) production.
  • DotAll is true if the RegExp object's [[OriginalFlags]] internal slot contains "s" and otherwise is false.
  • IgnoreCase is true if the RegExp object's [[OriginalFlags]] internal slot contains "i" and otherwise is false.
  • Multiline is true if the RegExp object's [[OriginalFlags]] internal slot contains "m" and otherwise is false.
  • Unicode is true if the RegExp object's [[OriginalFlags]] internal slot contains "u" and otherwise is false.
  • WordCharacters is the mathematical set that is the union of all sixty-three characters in "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_" (letters, numbers, and U+005F (LOW LINE) in the Unicode Basic Latin block) and all characters c for which c is not in that set but Canonicalize(c) is. WordCharacters cannot contain more than sixty-three characters unless Unicode and IgnoreCase are both true.

Furthermore, the descriptions below use the following internal data structures:

  • A CharSet is a mathematical set of characters. When the Unicode flag is true, “all characters” means the CharSet containing all code point values; otherwise “all characters” means the CharSet containing all code unit values.
  • A Range is an ordered pair (startIndex, endIndex) that represents the range of characters included in a capture, where startIndex is an integer representing the start index (inclusive) of the range within Input, and endIndex is an integer representing the end index (exclusive) of the range within Input. For any Range, these indices must satisfy the invariant that startIndex ≤ endIndex.
  • A State is an ordered pair (endIndex, captures) where endIndex is an integer and captures is a List of NcapturingParens values. States are used to represent partial match states in the regular expression matching algorithms. The endIndex is one plus the index of the last input character matched so far by the pattern, while captures holds the results of capturing parentheses. The nth element of captures is either a Range representing the range of characters captured by the nth set of capturing parentheses, or undefined if the nth set of capturing parentheses hasn't been reached yet. Due to backtracking, many States may be in use at any time during the matching process.
  • A MatchResult is either a State or the special token failure that indicates that the match failed.
  • A Continuation is an Abstract Closure that takes one State argument and returns a MatchResult result. The Continuation attempts to match the remaining portion (specified by the closure's captured values) of the pattern against Input, starting at the intermediate state given by its State argument. If the match succeeds, the Continuation returns the final State that it reached; if the match fails, the Continuation returns failure.
  • A Matcher is an Abstract Closure that takes two arguments—a State and a Continuation—and returns a MatchResult result. A Matcher attempts to match a middle subpattern (specified by the closure's captured values) of the pattern against Input, starting at the intermediate state given by its State argument. The Continuation argument should be a closure that matches the rest of the pattern. After matching the subpattern of a pattern to obtain a new State, the Matcher then calls Continuation on that new State to test if the rest of the pattern can match as well. If it can, the Matcher returns the State returned by Continuation; if not, the Matcher may try different choices at its choice points, repeatedly calling Continuation until it either succeeds or all possibilities have been exhausted.
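
The State, Continuation, and Matcher structures above can be modelled directly with closures. The sketch below is illustrative only and is not how an implementation is required to work; the helper names char, seq, alt, and compilePattern are assumptions introduced here, and only forward matching of literal characters is covered.

```javascript
// The special token that a MatchResult may hold instead of a State.
const failure = Symbol("failure");

// Matcher for one literal character: consumes input[e] if it equals ch.
function char(input, ch) {
  return (x, c) => {
    const e = x.endIndex;
    if (e >= input.length || input[e] !== ch) return failure;
    return c({ endIndex: e + 1, captures: x.captures });
  };
}

// Sequence of two Matchers: the Continuation handed to m1 runs m2 and then c.
function seq(m1, m2) {
  return (x, c) => m1(x, (y) => m2(y, c));
}

// Choice point: try m1 with the whole sequel, and only then fall back to m2.
function alt(m1, m2) {
  return (x, c) => {
    const r = m1(x, c);
    return r !== failure ? r : m2(x, c);
  };
}

// Top level: start at index with a Continuation that accepts any State.
function compilePattern(m) {
  return (index) => m({ endIndex: index, captures: [] }, (y) => y);
}

const input = Array.from("abc");
const a = char(input, "a");
const ab = seq(char(input, "a"), char(input, "b"));

// Behaves like /a|ab/ applied at index 0: the left alternative wins.
const first = compilePattern(alt(a, ab))(0);
console.log(first.endIndex); // 1

// Behaves like /(?:a|ab)c/: "a" is tried first, the sequel fails, "ab" is tried.
const second = compilePattern(seq(alt(a, ab), char(input, "c")))(0);
console.log(second.endIndex); // 3
```

The second call shows why a Matcher receives a Continuation rather than returning immediately: the failure of the sequel is what causes the earlier choice point to be revisited.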

22.2.2.2 Runtime Semantics: CompilePattern

The syntax-directed operation CompilePattern takes no arguments and returns an Abstract Closure that takes a List of characters and a non-negative integer and returns a MatchResult. It is defined piecewise over the following productions:

Pattern :: Disjunction
  1. Let m be CompileSubpattern of Disjunction with argument forward.
  2. Return a new Abstract Closure with parameters (inputChars, index) that captures m and performs the following steps when called:
    1. Assert: inputChars is a List of characters.
    2. Assert: index is a non-negative integer which is ≤ the number of characters in inputChars.
    3. Let Input be inputChars. This alias will be used throughout the algorithms in 22.2.2.
    4. Let InputLength be the number of characters contained in Input. This alias will be used throughout the algorithms in 22.2.2.
    5. Let c be a new Continuation with parameters (y) that captures nothing and performs the following steps when called:
      1. Assert: y is a State.
      2. Return y.
    6. Let cap be a List of NcapturingParens undefined values, indexed 1 through NcapturingParens.
    7. Let x be the State (index, cap).
    8. Return m(x, c).
Note

A Pattern compiles to an Abstract Closure value. RegExpBuiltinExec can then apply this procedure to a List of characters and an offset within that List to determine whether the pattern would match starting at exactly that offset within the List, and, if it does match, what the values of the capturing parentheses would be. The algorithms in 22.2.2 are designed so that compiling a pattern may throw a SyntaxError exception; on the other hand, once the pattern is successfully compiled, applying the resulting Abstract Closure to find a match in a List of characters cannot throw an exception (except for any implementation-defined exceptions that can occur anywhere such as out-of-memory).
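
Both halves of this note can be observed from script. The sketch below assumes that the sticky (y) flag together with lastIndex is an acceptable way to ask whether a pattern matches at exactly one offset; the variable names are introduced here for illustration.

```javascript
// Compiling a malformed pattern throws a SyntaxError.
let compileError = null;
try {
  new RegExp("(a");
} catch (e) {
  compileError = e;
}
console.log(compileError instanceof SyntaxError); // true

// With the y flag, exec tests the pattern at exactly lastIndex.
const sticky = /b+/y;
sticky.lastIndex = 1;
const hit = sticky.exec("abbc");
console.log(hit[0], hit.index); // bb 1

// At offset 0 the same pattern fails instead of scanning ahead.
sticky.lastIndex = 0;
const miss = sticky.exec("abbc");
console.log(miss); // null
```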

22.2.2.3 Runtime Semantics: CompileSubpattern

The syntax-directed operation CompileSubpattern takes argument direction (forward or backward) and returns a Matcher.

Note 1

This section is amended in B.1.2.4.

It is defined piecewise over the following productions:

Disjunction :: Alternative | Disjunction
  1. Let m1 be CompileSubpattern of Alternative with argument direction.
  2. Let m2 be CompileSubpattern of Disjunction with argument direction.
  3. Return a new Matcher with parameters (x, c) that captures m1 and m2 and performs the following steps when called:
    1. Assert: x is a State.
    2. Assert: c is a Continuation.
    3. Let r be m1(x, c).
    4. If r is not failure, return r.
    5. Return m2(x, c).
Note 2

The | regular expression operator separates two alternatives. The pattern first tries to match the left Alternative (followed by the sequel of the regular expression); if it fails, it tries to match the right Disjunction (followed by the sequel of the regular expression). If the left Alternative, the right Disjunction, and the sequel all have choice points, all choices in the sequel are tried before moving on to the next choice in the left Alternative. If choices in the left Alternative are exhausted, the right Disjunction is tried instead of the left Alternative. Any capturing parentheses inside a portion of the pattern skipped by | produce undefined values instead of Strings. Thus, for example,

/a|ab/.exec("abc")

returns the result "a" and not "ab". Moreover,

/((a)|(ab))((c)|(bc))/.exec("abc")

returns the array

["abc", "a", "a", undefined, "bc", undefined, "bc"]

and not

["abc", "ab", undefined, "ab", "c", "c", undefined]

The order in which the two alternatives are tried is independent of the value of direction.

Alternative :: [empty]
  1. Return a new Matcher with parameters (x, c) that captures nothing and performs the following steps when called:
    1. Assert: x is a State.
    2. Assert: c is a Continuation.
    3. Return c(x).
Alternative :: Alternative Term
  1. Let m1 be CompileSubpattern of Alternative with argument direction.
  2. Let m2 be CompileSubpattern of Term with argument direction.
  3. If direction is forward, then
    1. Return a new Matcher with parameters (x, c) that captures m1 and m2 and performs the following steps when called:
      1. Assert: x is a State.
      2. Assert: c is a Continuation.
      3. Let d be a new Continuation with parameters (y) that captures c and m2 and performs the following steps when called:
        1. Assert: y is a State.
        2. Return m2(y, c).
      4. Return m1(x, d).
  4. Else,
    1. Assert: direction is backward.
    2. Return a new Matcher with parameters (x, c) that captures m1 and m2 and performs the following steps when called:
      1. Assert: x is a State.
      2. Assert: c is a Continuation.
      3. Let d be a new Continuation with parameters (y) that captures c and m1 and performs the following steps when called:
        1. Assert: y is a State.
        2. Return m1(y, c).
      4. Return m2(x, d).
Note 3

Consecutive Terms try to simultaneously match consecutive portions of Input. When direction is forward, if the left Alternative, the right Term, and the sequel of the regular expression all have choice points, all choices in the sequel are tried before moving on to the next choice in the right Term, and all choices in the right Term are tried before moving on to the next choice in the left Alternative. When direction is backward, the evaluation order of Alternative and Term is reversed.
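
The effect of direction on evaluation order is visible when two greedy groups compete for the same characters. In the following illustrative sketch (variable names are assumptions), the same pair of groups is matched forward and then inside a lookbehind, which compiles its Disjunction with direction backward:

```javascript
// Forward: the left group is matched first and takes as much as it can.
const fwd = /^(\d+)(\d+)/.exec("1053");
console.log(fwd[1], fwd[2]); // 105 3

// Backward (inside a lookbehind): the right group is matched first.
const bwd = /(?<=(\d+)(\d+))$/.exec("1053");
console.log(bwd[1], bwd[2]); // 1 053
```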

Term :: Assertion
  1. Return CompileAssertion of Assertion.
Note 4

The resulting Matcher is independent of direction.

Term :: Atom
  1. Return CompileAtom of Atom with argument direction.
Term :: Atom Quantifier
  1. Let m be CompileAtom of Atom with argument direction.
  2. Let q be CompileQuantifier of Quantifier.
  3. Assert: q.[[Min]] ≤ q.[[Max]].
  4. Let parenIndex be the number of left-capturing parentheses in the entire regular expression that occur to the left of this Term. This is the total number of Atom :: ( GroupSpecifier Disjunction ) Parse Nodes prior to or enclosing this Term.
  5. Let parenCount be the number of left-capturing parentheses in Atom. This is the total number of Atom :: ( GroupSpecifier Disjunction ) Parse Nodes enclosed by Atom.
  6. Return a new Matcher with parameters (x, c) that captures m, q, parenIndex, and parenCount and performs the following steps when called:
    1. Assert: x is a State.
    2. Assert: c is a Continuation.
    3. Return RepeatMatcher(m, q.[[Min]], q.[[Max]], q.[[Greedy]], x, c, parenIndex, parenCount).

22.2.2.3.1 RepeatMatcher ( m, min, max, greedy, x, c, parenIndex, parenCount )

The abstract operation RepeatMatcher takes arguments m (a Matcher), min (a non-negative integer), max (a non-negative integer or +∞), greedy (a Boolean), x (a State), c (a Continuation), parenIndex (a non-negative integer), and parenCount (a non-negative integer) and returns a MatchResult. It performs the following steps when called:

  1. If max = 0, return c(x).
  2. Let d be a new Continuation with parameters (y) that captures m, min, max, greedy, x, c, parenIndex, and parenCount and performs the following steps when called:
    1. Assert: y is a State.
    2. If min = 0 and y's endIndex = x's endIndex, return failure.
    3. If min = 0, let min2 be 0; otherwise let min2 be min - 1.
    4. If max is +∞, let max2 be +∞; otherwise let max2 be max - 1.
    5. Return RepeatMatcher(m, min2, max2, greedy, y, c, parenIndex, parenCount).
  3. Let cap be a copy of x's captures List.
  4. For each integer k such that parenIndex < k and k ≤ parenIndex + parenCount, set cap[k] to undefined.
  5. Let e be x's endIndex.
  6. Let xr be the State (e, cap).
  7. If min ≠ 0, return m(xr, d).
  8. If greedy is false, then
    1. Let z be c(x).
    2. If z is not failure, return z.
    3. Return m(xr, d).
  9. Let z be m(xr, d).
  10. If z is not failure, return z.
  11. Return c(x).
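
The steps above translate almost line for line into a recursive function. The following is a sketch under stated assumptions, not a normative implementation: States are modelled as plain objects, +∞ as Infinity, and the helpers lower, litA, and run are hypothetical names introduced here to exercise /a[a-z]{2,4}/ and its non-greedy form.

```javascript
const failure = Symbol("failure");

function repeatMatcher(m, min, max, greedy, x, c, parenIndex, parenCount) {
  if (max === 0) return c(x);
  const d = (y) => {
    // An iteration that consumed nothing is rejected once min is satisfied.
    if (min === 0 && y.endIndex === x.endIndex) return failure;
    const min2 = min === 0 ? 0 : min - 1;
    const max2 = max === Infinity ? Infinity : max - 1;
    return repeatMatcher(m, min2, max2, greedy, y, c, parenIndex, parenCount);
  };
  // Clear the captures that belong to the repeated Atom.
  const cap = x.captures.slice();
  for (let k = parenIndex + 1; k <= parenIndex + parenCount; k++) {
    cap[k] = undefined;
  }
  const xr = { endIndex: x.endIndex, captures: cap };
  if (min !== 0) return m(xr, d);
  if (!greedy) {
    const z = c(x);
    return z !== failure ? z : m(xr, d);
  }
  const z = m(xr, d);
  return z !== failure ? z : c(x);
}

const input = Array.from("abcdefghi");

// Matcher for [a-z].
const lower = (x, c) => {
  const ch = input[x.endIndex];
  if (ch === undefined || ch < "a" || ch > "z") return failure;
  return c({ endIndex: x.endIndex + 1, captures: x.captures });
};

// Matcher for the literal a.
const litA = (x, c) =>
  input[x.endIndex] === "a"
    ? c({ endIndex: x.endIndex + 1, captures: x.captures })
    : failure;

// Runs a followed by [a-z]{2,4} at index 0 and returns the matched text.
function run(greedy) {
  const start = { endIndex: 0, captures: [] };
  const identity = (z) => z;
  const r = litA(start, (y) =>
    repeatMatcher(lower, 2, 4, greedy, y, identity, 0, 0));
  return r === failure ? null : input.slice(0, r.endIndex).join("");
}

console.log(run(true));  // abcde
console.log(run(false)); // abc
```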
Note 1

An Atom followed by a Quantifier is repeated the number of times specified by the Quantifier. A Quantifier can be non-greedy, in which case the Atom pattern is repeated as few times as possible while still matching the sequel, or it can be greedy, in which case the Atom pattern is repeated as many times as possible while still matching the sequel. The Atom pattern is repeated rather than the input character sequence that it matches, so different repetitions of the Atom can match different input substrings.

Note 2

If the Atom and the sequel of the regular expression all have choice points, the Atom is first matched as many (or as few, if non-greedy) times as possible. All choices in the sequel are tried before moving on to the next choice in the last repetition of Atom. All choices in the last (nth) repetition of Atom are tried before moving on to the next choice in the next-to-last (n - 1)st repetition of Atom; at which point it may turn out that more or fewer repetitions of Atom are now possible; these are exhausted (again, starting with either as few or as many as possible) before moving on to the next choice in the (n - 1)st repetition of Atom and so on.

Compare

/a[a-z]{2,4}/.exec("abcdefghi")

which returns "abcde" with

/a[a-z]{2,4}?/.exec("abcdefghi")

which returns "abc".

Consider also

/(aa|aabaac|ba|b|c)*/.exec("aabaac")

which, by the choice point ordering above, returns the array

["aaba", "ba"]

and not any of:

["aabaac", "aabaac"]
["aabaac", "c"]

The above ordering of choice points can be used to write a regular expression that calculates the greatest common divisor of two numbers (represented in unary notation). The following example calculates the gcd of 10 and 15:

"aaaaaaaaaa,aaaaaaaaaaaaaaa".replace(/^(a+)\1*,\1+$/, "$1")

which returns the gcd in unary notation "aaaaa".

Note 3

Step 4 of the RepeatMatcher clears Atom's captures each time Atom is repeated. We can see its behaviour in the regular expression

/(z)((a+)?(b+)?(c))*/.exec("zaacbbbcac")

which returns the array

["zaacbbbcac", "z", "ac", "a", undefined, "c"]

and not

["zaacbbbcac", "z", "ac", "a", "bbb", "c"]

because each iteration of the outermost * clears all captured Strings contained in the quantified Atom, which in this case includes capture Strings numbered 2, 3, 4, and 5.

Note 4

Step 2.2 of the RepeatMatcher states that once the minimum number of repetitions has been satisfied, any more expansions of Atom that match the empty character sequence are not considered for further repetitions. This prevents the regular expression engine from falling into an infinite loop on patterns such as:

/(a*)*/.exec("b")

or the slightly more complicated:

/(a*)b\1+/.exec("baaaac")

which returns the array

["b", ""]

22.2.2.4 Runtime Semantics: CompileAssertion

The syntax-directed operation CompileAssertion takes no arguments and returns a Matcher.

Note 1

This section is amended in B.1.2.5.

It is defined piecewise over the following productions:

Assertion :: ^
  1. Return a new Matcher with parameters (x, c) that captures nothing and performs the following steps when called:
    1. Assert: x is a State.
    2. Assert: c is a Continuation.
    3. Let e be x's endIndex.
    4. If e = 0, or if Multiline is true and the character Input[e - 1] is one of LineTerminator, then
      1. Return c(x).
    5. Return failure.
Note 2

Even when the y flag is used with a pattern, ^ always matches only at the beginning of Input, or (if Multiline is true) at the beginning of a line.
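
As an illustration of this note, the sketch below (variable names are assumptions) sets lastIndex so that a sticky match is attempted in the middle of the input:

```javascript
// Sticky match attempted at index 1: ^ fails because 1 is not the start of Input.
const plain = /^b/y;
plain.lastIndex = 1;
const r1 = plain.exec("ab");
console.log(r1); // null

// With Multiline true, index 2 follows a LineTerminator, so ^ succeeds there.
const multi = /^b/my;
multi.lastIndex = 2;
const r2 = multi.exec("a\nb");
console.log(r2[0], r2.index); // b 2
```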

Assertion :: $
  1. Return a new Matcher with parameters (x, c) that captures nothing and performs the following steps when called:
    1. Assert: x is a State.
    2. Assert: c is a Continuation.
    3. Let e be x's endIndex.
    4. If e = InputLength, or if Multiline is true and the character Input[e] is one of LineTerminator, then
      1. Return c(x).
    5. Return failure.
Assertion :: \ b
  1. Return a new Matcher with parameters (x, c) that captures nothing and performs the following steps when called:
    1. Assert: x is a State.
    2. Assert: c is a Continuation.
    3. Let e be x's endIndex.
    4. Let a be IsWordChar(e - 1).
    5. Let b be IsWordChar(e).
    6. If a is true and b is false, or if a is false and b is true, return c(x).
    7. Return failure.
Assertion :: \ B
  1. Return a new Matcher with parameters (x, c) that captures nothing and performs the following steps when called:
    1. Assert: x is a State.
    2. Assert: c is a Continuation.
    3. Let e be x's endIndex.
    4. Let a be IsWordChar(e - 1).
    5. Let b be IsWordChar(e).
    6. If a is true and b is true, or if a is false and b is false, return c(x).
    7. Return failure.
Assertion :: ( ? = Disjunction )
  1. Let m be CompileSubpattern of Disjunction with argument forward.
  2. Return a new Matcher with parameters (x, c) that captures m and performs the following steps when called:
    1. Assert: x is a State.
    2. Assert: c is a Continuation.
    3. Let d be a new Continuation with parameters (y) that captures nothing and performs the following steps when called:
      1. Assert: y is a State.
      2. Return y.
    4. Let r be m(x, d).
    5. If r is failure, return failure.
    6. Let y be r's State.
    7. Let cap be y's captures List.
    8. Let xe be x's endIndex.
    9. Let z be the State (xe, cap).
    10. Return c(z).
Assertion :: ( ? ! Disjunction )
  1. Let m be CompileSubpattern of Disjunction with argument forward.
  2. Return a new Matcher with parameters (x, c) that captures m and performs the following steps when called:
    1. Assert: x is a State.
    2. Assert: c is a Continuation.
    3. Let d be a new Continuation with parameters (y) that captures nothing and performs the following steps when called:
      1. Assert: y is a State.
      2. Return y.
    4. Let r be m(x, d).
    5. If r is not failure, return failure.
    6. Return c(x).
Assertion :: ( ? <= Disjunction )
  1. Let m be CompileSubpattern of Disjunction with argument backward.
  2. Return a new Matcher with parameters (x, c) that captures m and performs the following steps when called:
    1. Assert: x is a State.
    2. Assert: c is a Continuation.
    3. Let d be a new Continuation with parameters (y) that captures nothing and performs the following steps when called:
      1. Assert: y is a State.
      2. Return y.
    4. Let r be m(x, d).
    5. If r is failure, return failure.
    6. Let y be r's State.
    7. Let cap be y's captures List.
    8. Let xe be x's endIndex.
    9. Let z be the State (xe, cap).
    10. Return c(z).
Assertion :: ( ? <! Disjunction )
  1. Let m be CompileSubpattern of Disjunction with argument backward.
  2. Return a new Matcher with parameters (x, c) that captures m and performs the following steps when called:
    1. Assert: x is a State.
    2. Assert: c is a Continuation.
    3. Let d be a new Continuation with parameters (y) that captures nothing and performs the following steps when called:
      1. Assert: y is a State.
      2. Return y.
    4. Let r be m(x, d).
    5. If r is not failure, return failure.
    6. Return c(x).
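
A few observable consequences of the four lookaround productions are sketched below. The inputs are arbitrary examples chosen here; in every case the assertion itself consumes no characters, because the State passed to c keeps x's endIndex.

```javascript
// Positive lookahead: b must follow, but is not part of the match.
const la = /a(?=b)/.exec("ab");
console.log(la[0]); // a

// Negative lookahead: the first a is rejected because b follows it.
const nla = /a(?!b)/.exec("abac");
console.log(nla.index); // 2

// Positive lookbehind: digits immediately preceded by $.
const lb = /(?<=\$)\d+/.exec("cost: $42");
console.log(lb[0]); // 42

// Negative lookbehind: a whole number not preceded by $.
const nlb = /(?<!\$)\b\d+/.exec("$42 and 17");
console.log(nlb[0], nlb.index); // 17 8
```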

22.2.2.4.1 IsWordChar ( e )

The abstract operation IsWordChar takes argument e (an integer) and returns a Boolean. It performs the following steps when called:

  1. If e = -1 or e is InputLength, return false.
  2. Let c be the character Input[e].
  3. If c is in WordCharacters, return true.
  4. Return false.
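
IsWordChar and the \b test built on it can be sketched as follows. This is an illustration that assumes IgnoreCase is false, so WordCharacters is exactly the sixty-three Basic Latin characters; the names WORD_CHARACTERS, isWordChar, and atWordBoundary are introduced here.

```javascript
const WORD_CHARACTERS = new Set(
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_"
);

// Mirrors IsWordChar(e): positions just outside Input are never word characters.
function isWordChar(input, e) {
  if (e === -1 || e === input.length) return false;
  return WORD_CHARACTERS.has(input[e]);
}

// Mirrors the test in Assertion :: \ b: exactly one neighbour is a word character.
function atWordBoundary(input, e) {
  return isWordChar(input, e - 1) !== isWordChar(input, e);
}

const input = Array.from("foo bar");
const boundaries = [];
for (let e = 0; e <= input.length; e++) {
  if (atWordBoundary(input, e)) boundaries.push(e);
}
console.log(boundaries); // [ 0, 3, 4, 7 ]

// The built-in assertion agrees on the same positions.
console.log("foo bar".replace(/\b/g, "|")); // |foo| |bar|
```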

22.2.2.5 Runtime Semantics: CompileQuantifier

The syntax-directed operation CompileQuantifier takes no arguments and returns a Record with fields [[Min]] (a non-negative integer), [[Max]] (a non-negative integer or +∞), and [[Greedy]] (a Boolean). It is defined piecewise over the following productions:

Quantifier :: QuantifierPrefix
  1. Let qp be CompileQuantifierPrefix of QuantifierPrefix.
  2. Return the Record { [[Min]]: qp.[[Min]], [[Max]]: qp.[[Max]], [[Greedy]]: true }.
Quantifier :: QuantifierPrefix ?
  1. Let qp be CompileQuantifierPrefix of QuantifierPrefix.
  2. Return the Record { [[Min]]: qp.[[Min]], [[Max]]: qp.[[Max]], [[Greedy]]: false }.

22.2.2.6 Runtime Semantics: CompileQuantifierPrefix

The syntax-directed operation CompileQuantifierPrefix takes no arguments and returns a Record with fields [[Min]] (a non-negative integer) and [[Max]] (a non-negative integer or +∞). It is defined piecewise over the following productions:

QuantifierPrefix :: *
  1. Return the Record { [[Min]]: 0, [[Max]]: +∞ }.
QuantifierPrefix :: +
  1. Return the Record { [[Min]]: 1, [[Max]]: +∞ }.
QuantifierPrefix :: ?
  1. Return the Record { [[Min]]: 0, [[Max]]: 1 }.
QuantifierPrefix :: { DecimalDigits }
  1. Let i be the MV of DecimalDigits (see 12.8.3).
  2. Return the Record { [[Min]]: i, [[Max]]: i }.
QuantifierPrefix :: { DecimalDigits , }
  1. Let i be the MV of DecimalDigits.
  2. Return the Record { [[Min]]: i, [[Max]]: +∞ }.
QuantifierPrefix :: { DecimalDigits , DecimalDigits }
  1. Let i be the MV of the first DecimalDigits.
  2. Let j be the MV of the second DecimalDigits.
  3. Return the Record { [[Min]]: i, [[Max]]: j }.
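
Taken together, CompileQuantifier and CompileQuantifierPrefix map the source text of a quantifier to a record of three values. The sketch below is a hypothetical string-based helper (compileQuantifier is not a specification operation, and it works on text rather than Parse Nodes), with +∞ modelled as Infinity:

```javascript
const SIMPLE_PREFIXES = new Map([
  ["*", [0, Infinity]],
  ["+", [1, Infinity]],
  ["?", [0, 1]],
]);

function compileQuantifier(src) {
  // A trailing "?" after a complete prefix makes the quantifier non-greedy.
  const lazy = src.length > 1 && src.endsWith("?");
  const prefix = lazy ? src.slice(0, -1) : src;
  let min;
  let max;
  if (SIMPLE_PREFIXES.has(prefix)) {
    [min, max] = SIMPLE_PREFIXES.get(prefix);
  } else {
    // Braced forms: {i}, {i,} and {i,j}.
    const m = /^\{(\d+)(?:(,)(\d*))?\}$/.exec(prefix);
    if (m === null) throw new SyntaxError(`not a quantifier: ${src}`);
    min = Number(m[1]);
    if (m[2] === undefined) max = min;
    else if (m[3] === "") max = Infinity;
    else max = Number(m[3]);
  }
  return { min, max, greedy: !lazy };
}

console.log(compileQuantifier("*"));      // { min: 0, max: Infinity, greedy: true }
console.log(compileQuantifier("{3}"));    // { min: 3, max: 3, greedy: true }
console.log(compileQuantifier("{2,}"));   // { min: 2, max: Infinity, greedy: true }
console.log(compileQuantifier("{2,4}?")); // { min: 2, max: 4, greedy: false }
```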

22.2.2.7 Runtime Semantics: CompileAtom

The syntax-directed operation CompileAtom takes argument direction (forward or backward) and returns a Matcher.

Note 1

This section is amended in B.1.2.6.

It is defined piecewise over the following productions:

Atom :: PatternCharacter
  1. Let ch be the character matched by PatternCharacter.
  2. Let A be a one-element CharSet containing the character ch.
  3. Return CharacterSetMatcher(A, false, direction).
Atom :: .
  1. Let A be the CharSet of all characters.
  2. If DotAll is not true, then
    1. Remove from A all characters corresponding to a code point on the right-hand side of the LineTerminator production.
  3. Return CharacterSetMatcher(A, false, direction).
Atom :: CharacterClass
  1. Let cc be CompileCharacterClass of CharacterClass.
  2. Return CharacterSetMatcher(cc.[[CharSet]], cc.[[Invert]], direction).
Atom :: ( GroupSpecifier Disjunction )
  1. Let m be CompileSubpattern of Disjunction with argument direction.
  2. Let parenIndex be the number of left-capturing parentheses in the entire regular expression that occur to the left of this Atom. This is the total number of Atom :: ( GroupSpecifier Disjunction ) Parse Nodes prior to or enclosing this Atom.
  3. Return a new Matcher with parameters (x, c) that captures direction, m, and parenIndex and performs the following steps when called:
    1. Assert: x is a State.
    2. Assert: c is a Continuation.
    3. Let d be a new Continuation with parameters (y) that captures x, c, direction, and parenIndex and performs the following steps when called:
      1. Assert: y is a State.
      2. Let cap be a copy of y's captures List.
      3. Let xe be x's endIndex.
      4. Let ye be y's endIndex.
      5. If direction is forward, then
        1. Assert: xe ≤ ye.
        2. Let r be the Range (xe, ye).
      6. Else,
        1. Assert: direction is backward.
        2. Assert: ye ≤ xe.
        3. Let r be the Range (ye, xe).
      7. Set cap[parenIndex + 1] to r.
      8. Let z be the State (ye, cap).
      9. Return c(z).
    4. Return m(x, d).
Atom :: ( ? : Disjunction )
  1. Return CompileSubpattern of Disjunction with argument direction.
AtomEscape :: DecimalEscape
  1. Let n be the CapturingGroupNumber of DecimalEscape.
  2. Assert: n ≤ NcapturingParens.
  3. Return BackreferenceMatcher(n, direction).
Note 2

An escape sequence of the form \ followed by a non-zero decimal number n matches the result of the nth set of capturing parentheses (22.2.2.1). It is an error if the regular expression has fewer than n capturing parentheses. If the regular expression has n or more capturing parentheses but the nth one is undefined because it has not captured anything, then the backreference always succeeds.

AtomEscape :: CharacterEscape
  1. Let cv be the CharacterValue of CharacterEscape.
  2. Let ch be the character whose character value is cv.
  3. Let A be a one-element CharSet containing the character ch.
  4. Return CharacterSetMatcher(A, false, direction).
AtomEscape :: CharacterClassEscape
  1. Let A be CompileToCharSet of CharacterClassEscape.
  2. Return CharacterSetMatcher(A, false, direction).
AtomEscape :: k GroupName
  1. Search the enclosing Pattern for an instance of a GroupSpecifier containing a RegExpIdentifierName which has a CapturingGroupName equal to the CapturingGroupName of the RegExpIdentifierName contained in GroupName.
  2. Assert: A unique such GroupSpecifier is found.
  3. Let parenIndex be the number of left-capturing parentheses in the entire regular expression that occur to the left of the located GroupSpecifier. This is the total number of Atom :: ( GroupSpecifier Disjunction ) Parse Nodes prior to or enclosing the located GroupSpecifier, including its immediately enclosing Atom.
  4. Return BackreferenceMatcher(parenIndex, direction).
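
Two of the productions above have effects that are easy to check from script: the treatment of LineTerminator by Atom :: . and the named backreference form AtomEscape :: k GroupName. The inputs and the group name q below are arbitrary choices made for this illustration.

```javascript
// Without DotAll, "." excludes LineTerminator characters.
console.log(/a.b/.test("a\nb"));  // false
// With the s flag, DotAll is true and nothing is removed from the CharSet.
console.log(/a.b/s.test("a\nb")); // true

// \k<q> refers back to whatever the group named q captured.
const quoted = /(?<q>['"]).*?\k<q>/.exec('say "hi" now');
console.log(quoted[0]);       // "hi"
console.log(quoted.groups.q); // "
```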

22.2.2.7.1 CharacterSetMatcher ( A, invert, direction )

The abstract operation CharacterSetMatcher takes arguments A (a CharSet), invert (a Boolean), and direction (forward or backward) and returns a Matcher. It performs the following steps when called:

  1. Return a new Matcher with parameters (x, c) that captures A, invert, and direction and performs the following steps when called:
    1. Assert: x is a State.
    2. Assert: c is a Continuation.
    3. Let e be x's endIndex.
    4. If direction is forward, let f be e + 1.
    5. Else, let f be e - 1.
    6. If f < 0 or f > InputLength, return failure.
    7. Let index be min(e, f).
    8. Let ch be the character Input[index].
    9. Let cc be Canonicalize(ch).
    10. If there exists a member a of A such that Canonicalize(a) is cc, let found be true. Otherwise, let found be false.
    11. If invert is false and found is false, return failure.
    12. If invert is true and found is true, return failure.
    13. Let cap be x's captures List.
    14. Let y be the State (f, cap).
    15. Return c(y).
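
The role of direction in CharacterSetMatcher can be seen in a small model. This is a sketch under assumptions: Input is passed explicitly as an array, the CharSet is a JavaScript Set, and Canonicalize defaults to the identity function, which corresponds to IgnoreCase being false.

```javascript
const failure = Symbol("failure");

function characterSetMatcher(input, A, invert, direction, canonicalize = (ch) => ch) {
  return (x, c) => {
    const e = x.endIndex;
    const f = direction === "forward" ? e + 1 : e - 1;
    if (f < 0 || f > input.length) return failure;
    // The character examined lies between e and f, whichever way we move.
    const ch = input[Math.min(e, f)];
    const cc = canonicalize(ch);
    const found = [...A].some((a) => canonicalize(a) === cc);
    // Fails when a plain set misses or an inverted set hits.
    if (found === invert) return failure;
    return c({ endIndex: f, captures: x.captures });
  };
}

const input = ["x", "y"];
const identity = (s) => s;
const at = (endIndex) => ({ endIndex, captures: [] });

const forwardX = characterSetMatcher(input, new Set(["x"]), false, "forward");
const backwardY = characterSetMatcher(input, new Set(["y"]), false, "backward");
const notX = characterSetMatcher(input, new Set(["x"]), true, "forward");

console.log(forwardX(at(0), identity).endIndex);  // 1
console.log(backwardY(at(2), identity).endIndex); // 1
console.log(backwardY(at(0), identity) === failure); // true
console.log(notX(at(0), identity) === failure);      // true
```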

22.2.2.7.2 BackreferenceMatcher ( n, direction )

The abstract operation BackreferenceMatcher takes arguments n (a positive integer) and direction (forward or backward) and returns a Matcher. It performs the following steps when called:

  1. Assert: n ≥ 1.
  2. Return a new Matcher with parameters (x, c) that captures n and direction and performs the following steps when called:
    1. Assert: x is a State.
    2. Assert: c is a Continuation.
    3. Let cap be x's captures List.
    4. Let r be cap[n].
    5. If r is undefined, return c(x).
    6. Let e be x's endIndex.
    7. Let rs be r's startIndex.
    8. Let re be r's endIndex.
    9. Let len be re - rs.
    10. If direction is forward, let f be e + len.
    11. Else, let f be e - len.
    12. If f < 0 or f > InputLength, return failure.
    13. Let g be min(e, f).
    14. If there exists an integer i between 0 (inclusive) and len (exclusive) such that Canonicalize(Input[rs + i]) is not the same character value as Canonicalize(Input[g + i]), return failure.
    15. Let y be the State (f, cap).
    16. Return c(y).
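
Two details of BackreferenceMatcher are worth checking from script: an undefined capture makes the backreference succeed without consuming input, and the comparison goes through Canonicalize, so it respects IgnoreCase. The patterns below are arbitrary illustrations.

```javascript
// Group 1 does not participate, so \1 succeeds and matches nothing.
const skipped = /(a)?\1b/.exec("b");
console.log(skipped[0], skipped[1]); // b undefined

// Without the i flag the repeated text must be identical.
console.log(/(ab)\1/.test("abAB"));  // false
// With the i flag both sides are canonicalized before comparison.
console.log(/(ab)\1/i.test("abAB")); // true
```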

22.2.2.7.3 Canonicalize ( ch )

The abstract operation Canonicalize takes argument ch (a character) and returns a character. It performs the following steps when called:

  1. If Unicode is true and IgnoreCase is true, then
    1. If the file CaseFolding.txt of the Unicode Character Database provides a simple or common case folding mapping for ch, return the result of applying that mapping to ch.
    2. Return ch.
  2. If IgnoreCase is false, return ch.
  3. Assert: ch is a UTF-16 code unit.
  4. Let cp be the code point whose numeric value is that of ch.
  5. Let u be the result of toUppercase(« cp »), according to the Unicode Default Case Conversion algorithm.
  6. Let uStr be CodePointsToString(u).
  7. If uStr does not consist of a single code unit, return ch.
  8. Let cu be uStr's single code unit element.
  9. If the numeric value of ch ≥ 128 and the numeric value of cu < 128, return ch.
  10. Return cu.
Note 1

Parentheses of the form ( Disjunction ) serve both to group the components of the Disjunction pattern together and to save the result of the match. The result can be used either in a backreference (\ followed by a non-zero decimal number), referenced in a replace String, or returned as part of an array from the regular expression matching Abstract Closure. To inhibit the capturing behaviour of parentheses, use the form (?: Disjunction ) instead.

Note 2

The form (?= Disjunction ) specifies a zero-width positive lookahead. In order for it to succeed, the pattern inside Disjunction must match at the current position, but the current position is not advanced before matching the sequel. If Disjunction can match at the current position in several ways, only the first one is tried. Unlike other regular expression operators, there is no backtracking into a (?= form (this unusual behaviour is inherited from Perl). This only matters when the Disjunction contains capturing parentheses and the sequel of the pattern contains backreferences to those captures.

For example,

/(?=(a+))/.exec("baaabac")

matches the empty String immediately after the first b and therefore returns the array:

["", "aaa"]

To illustrate the lack of backtracking into the lookahead, consider:

/(?=(a+))a*b\1/.exec("baaabac")

This expression returns

["aba", "a"]

and not:

["aaaba", "a"]
Note 3

The form (?! Disjunction ) specifies a zero-width negative lookahead. In order for it to succeed, the pattern inside Disjunction must fail to match at the current position. The current position is not advanced before matching the sequel. Disjunction can contain capturing parentheses, but backreferences to them only make sense from within Disjunction itself. Backreferences to these capturing parentheses from elsewhere in the pattern always return undefined because the negative lookahead must fail for the pattern to succeed. For example,

/(.*?)a(?!(a+)b\2c)\2(.*)/.exec("baaabaac")

looks for an a not immediately followed by some positive number n of a's, a b, another n a's (specified by the first \2) and a c. The second \2 is outside the negative lookahead, so it matches against undefined and therefore always succeeds. The whole expression returns the array:

["baaabaac", "ba", undefined, "abaac"]
Note 4

In case-insignificant matches when Unicode is true, all characters are implicitly case-folded using the simple mapping provided by the Unicode Standard immediately before they are compared. The simple mapping always maps to a single code point, so it does not map, for example, ß (U+00DF) to SS. It may however map a code point outside the Basic Latin range to a character within, for example, ſ (U+017F) to s. Such characters are not mapped if Unicode is false. This prevents Unicode code points such as U+017F and U+212A from matching regular expressions such as /[a-z]/i, but they will match /[a-z]/ui.
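
The non-Unicode branch of Canonicalize (steps 3 to 10) can be approximated in script. The sketch assumes that String.prototype.toUpperCase applies the same Unicode Default Case Conversion as toUppercase in step 5; canonicalizeLegacy is a hypothetical name, and its argument is expected to be a single code unit.

```javascript
// Canonicalize for IgnoreCase true and Unicode false.
function canonicalizeLegacy(ch) {
  const u = ch.toUpperCase();
  // A multi-unit result such as "SS" leaves the character unchanged.
  if (u.length !== 1) return ch;
  // A non-ASCII character is never mapped onto an ASCII one.
  if (ch.charCodeAt(0) >= 128 && u.charCodeAt(0) < 128) return ch;
  return u;
}

console.log(canonicalizeLegacy("a"));                   // A
console.log(canonicalizeLegacy("\u00DF") === "\u00DF"); // true
console.log(canonicalizeLegacy("\u017F") === "\u017F"); // true

// The built-in engine shows the same split between /i and /ui.
console.log(/[a-z]/i.test("\u017F"), /[a-z]/iu.test("\u017F")); // false true
console.log(/[a-z]/i.test("\u212A"), /[a-z]/iu.test("\u212A")); // false true
```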

22.2.2.8 Runtime Semantics: CompileCharacterClass

The syntax-directed operation CompileCharacterClass takes no arguments and returns a Record with fields [[CharSet]] (a CharSet) and [[Invert]] (a Boolean). It is defined piecewise over the following productions:

CharacterClass :: [ ClassRanges ]
  1. Let A be CompileToCharSet of ClassRanges.
  2. Return the Record { [[CharSet]]: A, [[Invert]]: false }.
CharacterClass :: [ ^ ClassRanges ]
  1. Let A be CompileToCharSet of ClassRanges.
  2. Return the Record { [[CharSet]]: A, [[Invert]]: true }.

22.2.2.9 Runtime Semantics: CompileToCharSet

The syntax-directed operation CompileToCharSet takes no arguments and returns a CharSet.

Note 1

This section is amended in B.1.2.7.

It is defined piecewise over the following productions:

ClassRanges :: [empty]
  1. Return the empty CharSet.
NonemptyClassRanges :: ClassAtom NonemptyClassRangesNoDash
  1. Let A be CompileToCharSet of ClassAtom.
  2. Let B be CompileToCharSet of NonemptyClassRangesNoDash.
  3. Return the union of CharSets A and B.
NonemptyClassRanges :: ClassAtom - ClassAtom ClassRanges
  1. Let A be CompileToCharSet of the first ClassAtom.
  2. Let B be CompileToCharSet of the second ClassAtom.
  3. Let C be CompileToCharSet of ClassRanges.
  4. Let D be CharacterRange(A, B).
  5. Return the union of D and C.
NonemptyClassRangesNoDash :: ClassAtomNoDash NonemptyClassRangesNoDash
  1. Let A be CompileToCharSet of ClassAtomNoDash.
  2. Let B be CompileToCharSet of NonemptyClassRangesNoDash.
  3. Return the union of CharSets A and B.
NonemptyClassRangesNoDash :: ClassAtomNoDash - ClassAtom ClassRanges
  1. Let A be CompileToCharSet of ClassAtomNoDash.
  2. Let B be CompileToCharSet of ClassAtom.
  3. Let C be CompileToCharSet of ClassRanges.
  4. Let D be CharacterRange(A, B).
  5. Return the union of D and C.
Note 2

ClassRanges can expand into a single ClassAtom and/or ranges of two ClassAtom separated by dashes. In the latter case the ClassRanges includes all characters between the first ClassAtom and the second ClassAtom, inclusive; an error occurs if either ClassAtom does not represent a single character (for example, if one is \w) or if the first ClassAtom's character value is greater than the second ClassAtom's character value.

Note 3

Even if the pattern ignores case, the case of the two ends of a range is significant in determining which characters belong to the range. Thus, for example, the pattern /[E-F]/i matches only the letters E, F, e, and f, while the pattern /[E-f]/i matches all uppercase and lowercase letters in the Unicode Basic Latin block as well as the symbols [, \, ], ^, _, and `.
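As an illustration of this note, the sketch below filters a few sample characters through both patterns. The anchors and the sample lists are choices made for the example, not part of the specification.

```javascript
const narrow = /^[E-F]$/i; // the range covers only U+0045 and U+0046
const wide = /^[E-f]$/i;   // the range runs from U+0045 to U+0066

const narrowHits = ["E", "F", "e", "f", "g", "^"].filter((ch) => narrow.test(ch));
const wideHits = ["A", "z", "[", "\\", "]", "^", "_", "`", "@", "1"].filter((ch) => wide.test(ch));

console.log(narrowHits); // [ 'E', 'F', 'e', 'f' ]
console.log(wideHits);   // [ 'A', 'z', '[', '\\', ']', '^', '_', '`' ]
```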

Note 4

A - character can be treated literally or it can denote a range. It is treated literally if it is the first or last character of ClassRanges, the beginning or end limit of a range specification, or immediately follows a range specification.
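The literal and range readings of - can be compared with a short sketch. The patterns below are illustrative; each is anchored so that it tests exactly one character.

```javascript
const leading = /^[-a]$/;       // "-" is the first character: literal
const trailing = /^[a-]$/;      // "-" is the last character: literal
const afterRange = /^[a-c-e]$/; // "-" immediately follows the range a-c: literal, then "e"

const results = {
  leadingDash: leading.test("-"),
  trailingDash: trailing.test("-"),
  afterRangeDash: afterRange.test("-"),
  afterRangeD: afterRange.test("d"), // "d" is not matched because c-e is not a range here
  afterRangeE: afterRange.test("e"),
};

console.log(results);
```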

ClassAtom :: -
  1. Return the CharSet containing the single character - U+002D (HYPHEN-MINUS).
ClassAtomNoDash :: SourceCharacter but not one of \ or ] or -
  1. Return the CharSet containing the character matched by SourceCharacter.
ClassEscape :: b
ClassEscape :: -
ClassEscape :: CharacterEscape
  1. Let cv be the CharacterValue of this ClassEscape.
  2. Let c be the character whose character value is cv.
  3. Return the CharSet containing the single character c.
Note 5

A ClassAtom can use any of the escape sequences that are allowed in the rest of the regular expression except for \b, \B, and backreferences. Inside a CharacterClass, \b means the backspace character, while \B and backreferences raise errors.
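A short sketch of these rules follows. The error cases use the u flag because, as noted above, this section is amended in Annex B, and web-compatible engines may therefore accept these escapes when the flag is absent. The helper throwsSyntaxError is a hypothetical name introduced for the example.

```javascript
// Inside a class, \b denotes U+0008 (BACKSPACE).
const backspaceInClass = /^[\b]$/.test("\u0008");
// Outside a class, \b is a word-boundary assertion.
const boundaryOutsideClass = /a\b/.test("a ");

// Hypothetical helper: reports whether constructing the pattern throws a SyntaxError.
function throwsSyntaxError(source, flags) {
  try {
    new RegExp(source, flags);
    return false;
  } catch (e) {
    return e instanceof SyntaxError;
  }
}

const classWithB = throwsSyntaxError("[\\B]", "u");
const classWithBackreference = throwsSyntaxError("(a)[\\1]", "u");

console.log(backspaceInClass, boundaryOutsideClass, classWithB, classWithBackreference);
// true true true true
```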

CharacterClassEscape :: d
  1. Return the ten-element CharSet containing the characters 0 through 9 inclusive.
CharacterClassEscape :: D
  1. Return the CharSet containing all characters not in the CharSet returned by CharacterClassEscape :: d .
CharacterClassEscape :: s
  1. Return the CharSet containing all characters corresponding to a code point on the right-hand side of the WhiteSpace or LineTerminator productions.
CharacterClassEscape :: S
  1. Return the CharSet containing all characters not in the CharSet returned by CharacterClassEscape :: s .
CharacterClassEscape :: w
  1. Return WordCharacters.
CharacterClassEscape :: W
  1. Return the CharSet containing all characters not in the CharSet returned by CharacterClassEscape :: w .
CharacterClassEscape :: p{ UnicodePropertyValueExpression }
  1. Return the CharSet containing all Unicode code points included in CompileToCharSet of UnicodePropertyValueExpression.
CharacterClassEscape :: P{ UnicodePropertyValueExpression }
  1. Return the CharSet containing all Unicode code points not included in CompileToCharSet of UnicodePropertyValueExpression.
UnicodePropertyValueExpression :: UnicodePropertyName = UnicodePropertyValue
  1. Let ps be SourceText of UnicodePropertyName.
  2. Let p be UnicodeMatchProperty(ps).
  3. Assert: p is a Unicode property name or property alias listed in the “Property name and aliases” column of Table 66.
  4. Let vs be SourceText of UnicodePropertyValue.
  5. Let v be UnicodeMatchPropertyValue(p, vs).
  6. Return the CharSet containing all Unicode code points whose character database definition includes the property p with value v.
UnicodePropertyValueExpression :: LoneUnicodePropertyNameOrValue
  1. Let s be SourceText of LoneUnicodePropertyNameOrValue.
  2. If UnicodeMatchPropertyValue(General_Category, s) is identical to a List of Unicode code points that is the name of a Unicode general category or general category alias listed in the “Property value and aliases” column of Table 68, then
    1. Return the CharSet containing all Unicode code points whose character database definition includes the property “General_Category” with value s.
  3. Let p be UnicodeMatchProperty(s).
  4. Assert: p is a binary Unicode property or binary property alias listed in the “Property name and aliases” column of Table 67.
  5. Return the CharSet containing all Unicode code points whose character database definition includes the property p with value “True”.
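The character class escapes defined above can be exercised as follows. This is a sketch with illustrative sample characters; the property escapes require the u flag.

```javascript
const asciiDigit = /^\d$/.test("7");
// U+0663 ARABIC-INDIC DIGIT THREE is a decimal number but is not one of 0 through 9.
const arabicIndicWithD = /^\d$/.test("\u0663");
const arabicIndicWithNd = /^\p{Nd}$/u.test("\u0663"); // lone General_Category value alias

const greekAlpha = /^\p{Script=Greek}$/u.test("\u03B1"); // property name = value form
const latinNotGreek = /^\P{Script=Greek}$/u.test("a");   // complemented form
const alphabetic = /^\p{Alphabetic}$/u.test("a");        // lone binary property

// \s includes the code points of the LineTerminator production, such as U+2028.
const lineSeparatorIsSpace = /^\s$/.test("\u2028");

console.log(asciiDigit, arabicIndicWithD, arabicIndicWithNd,
  greekAlpha, latinNotGreek, alphabetic, lineSeparatorIsSpace);
// true false true true true true true
```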

22.2.2.9.1 CharacterRange ( A, B )

The abstract operation CharacterRange takes arguments A (a CharSet) and B (a CharSet) and returns a CharSet. It performs the following steps when called:

  1. Assert: A and B each contain exactly one character.
  2. Let a be the one character in CharSet A.
  3. Let b be the one character in CharSet B.
  4. Let i be the character value of character a.
  5. Let j be the character value of character b.
  6. Assert: i ≤ j.
  7. Return the CharSet containing all characters with a character value greater than or equal to i and less than or equal to j.
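The steps above can be modelled in a few lines. This is only a sketch: it assumes a CharSet is represented as a Set of numeric character values, and it throws where the specification merely asserts. The name characterRange is introduced for the example.

```javascript
// Hypothetical model: a CharSet is a Set of numeric character values.
function characterRange(A, B) {
  if (A.size !== 1 || B.size !== 1) {
    throw new Error("A and B must each contain exactly one character");
  }
  const [i] = A; // character value of the one character in A
  const [j] = B; // character value of the one character in B
  if (i > j) {
    throw new Error("range out of order");
  }
  const result = new Set();
  for (let cv = i; cv <= j; cv++) {
    result.add(cv);
  }
  return result;
}

// The range E-H contains the four character values U+0045 through U+0048.
const range = characterRange(new Set([0x45]), new Set([0x48]));
console.log([...range].map((cv) => String.fromCodePoint(cv)).join("")); // "EFGH"
```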

22.2.2.9.2 UnicodeMatchProperty ( p )

The abstract operation UnicodeMatchProperty takes argument p (a List of Unicode code points) and returns a Unicode property name. It performs the following steps when called:

  1. Assert: p is a Unicode property name or property alias listed in the “Property name and aliases” column of Table 66 or Table 67.
  2. Let c be the canonical property name of p as given in the “Canonical property name” column of the corresponding row.
  3. Return the List of Unicode code points c.

Implementations must support the Unicode property names and aliases listed in Table 66 and Table 67. To ensure interoperability, implementations must not support any other property names or aliases.

Note 1

For example, Script_Extensions (property name) and scx (property alias) are valid, but script_extensions or Scx aren't.
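The exact-spelling requirement can be checked by attempting to construct patterns. In this sketch, compiles is a hypothetical helper and Greek is an arbitrary script value chosen for the example.

```javascript
// Hypothetical helper: true if the source is a valid pattern under the u flag.
function compiles(source) {
  try {
    new RegExp(source, "u");
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

const accepted = ["\\p{Script_Extensions=Greek}", "\\p{scx=Greek}"].map((s) => compiles(s));
const rejected = ["\\p{script_extensions=Greek}", "\\p{Scx=Greek}"].map((s) => compiles(s));

console.log(accepted, rejected); // [ true, true ] [ false, false ]
```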

Note 2

The listed properties form a superset of what UTS18 RL1.2 requires.

Table 66: Non-binary Unicode property aliases and their canonical property names
Property name and aliases Canonical property name
General_Category General_Category
gc
Script Script
sc
Script_Extensions Script_Extensions
scx
Table 67: Binary Unicode property aliases and their canonical property names
Property name and aliases Canonical property name
ASCII ASCII
ASCII_Hex_Digit ASCII_Hex_Digit
AHex
Alphabetic Alphabetic
Alpha
Any Any
Assigned Assigned
Bidi_Control Bidi_Control
Bidi_C
Bidi_Mirrored Bidi_Mirrored
Bidi_M
Case_Ignorable Case_Ignorable
CI
Cased Cased
Changes_When_Casefolded Changes_When_Casefolded
CWCF
Changes_When_Casemapped Changes_When_Casemapped
CWCM
Changes_When_Lowercased Changes_When_Lowercased
CWL
Changes_When_NFKC_Casefolded Changes_When_NFKC_Casefolded
CWKCF
Changes_When_Titlecased Changes_When_Titlecased
CWT
Changes_When_Uppercased Changes_When_Uppercased
CWU
Dash Dash
Default_Ignorable_Code_Point Default_Ignorable_Code_Point
DI
Deprecated Deprecated
Dep
Diacritic Diacritic
Dia
Emoji Emoji
Emoji_Component Emoji_Component
EComp
Emoji_Modifier Emoji_Modifier
EMod
Emoji_Modifier_Base Emoji_Modifier_Base
EBase
Emoji_Presentation Emoji_Presentation
EPres
Extended_Pictographic Extended_Pictographic
ExtPict
Extender Extender
Ext
Grapheme_Base Grapheme_Base
Gr_Base
Grapheme_Extend Grapheme_Extend
Gr_Ext
Hex_Digit Hex_Digit
Hex
IDS_Binary_Operator IDS_Binary_Operator
IDSB
IDS_Trinary_Operator IDS_Trinary_Operator
IDST
ID_Continue ID_Continue
IDC
ID_Start ID_Start
IDS
Ideographic Ideographic
Ideo
Join_Control Join_Control
Join_C
Logical_Order_Exception Logical_Order_Exception
LOE
Lowercase Lowercase
Lower
Math Math
Noncharacter_Code_Point Noncharacter_Code_Point
NChar
Pattern_Syntax Pattern_Syntax
Pat_Syn
Pattern_White_Space Pattern_White_Space
Pat_WS
Quotation_Mark Quotation_Mark
QMark
Radical Radical
Regional_Indicator Regional_Indicator
RI
Sentence_Terminal Sentence_Terminal
STerm
Soft_Dotted Soft_Dotted
SD
Terminal_Punctuation Terminal_Punctuation
Term
Unified_Ideograph Unified_Ideograph
UIdeo
Uppercase Uppercase
Upper
Variation_Selector Variation_Selector
VS
White_Space White_Space
space
XID_Continue XID_Continue
XIDC
XID_Start XID_Start
XIDS

22.2.2.9.3 UnicodeMatchPropertyValue ( p, v )

The abstract operation UnicodeMatchPropertyValue takes arguments p (a List of Unicode code points) and v (a List of Unicode code points) and returns a Unicode property value. It performs the following steps when called:

  1. Assert: p is a canonical, unaliased Unicode property name listed in the “Canonical property name” column of Table 66.
  2. Assert: v is a property value or property value alias for Unicode property p listed in the “Property value and aliases” column of Table 68 or Table 69.
  3. Let value be the canonical property value of v as given in the “Canonical property value” column of the corresponding row.
  4. Return the List of Unicode code points value.

Implementations must support the Unicode property value names and aliases listed in Table 68 and Table 69. To ensure interoperability, implementations must not support any other property value names or aliases.

Note 1

For example, Xpeo and Old_Persian are valid Script_Extensions values, but xpeo and Old Persian aren't.

Note 2

This algorithm differs from the matching rules for symbolic values listed in UAX44: case, white space, U+002D (HYPHEN-MINUS), and U+005F (LOW LINE) are not ignored, and the Is prefix is not supported.
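The strict matching of property values can be illustrated in the same way. This is a sketch; acceptsPattern is a hypothetical helper, and the rejected spellings are the loose forms that UAX44 matching would tolerate.

```javascript
// Hypothetical helper: true if the source is a valid pattern under the u flag.
function acceptsPattern(source) {
  try {
    new RegExp(source, "u");
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

const strictResults = {
  alias: acceptsPattern("\\p{scx=Xpeo}"),
  canonical: acceptsPattern("\\p{scx=Old_Persian}"),
  wrongCase: acceptsPattern("\\p{scx=xpeo}"),
  withSpace: acceptsPattern("\\p{scx=Old Persian}"),
  isPrefix: acceptsPattern("\\p{Script=IsGreek}"),
};

console.log(strictResults);
// { alias: true, canonical: true, wrongCase: false, withSpace: false, isPrefix: false }
```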

Note 3

The spellings of entries in these tables (including casing) were chosen to match the first occurrence of each property in the files PropertyAliases.txt and PropertyValueAliases.txt in the Unicode Character Database at the time each entry was added to this specification. However, because the precise spellings in those files are not guaranteed to be stable, implementations are required to follow this table rather than those files.

Table 68: Value aliases and canonical values for the Unicode property General_Category
Property value and aliases Canonical property value
Cased_Letter Cased_Letter
LC
Close_Punctuation Close_Punctuation
Pe
Connector_Punctuation Connector_Punctuation
Pc
Control Control
Cc
cntrl
Currency_Symbol Currency_Symbol
Sc
Dash_Punctuation Dash_Punctuation
Pd
Decimal_Number Decimal_Number
Nd
digit
Enclosing_Mark Enclosing_Mark
Me
Final_Punctuation Final_Punctuation
Pf
Format Format
Cf
Initial_Punctuation Initial_Punctuation
Pi
Letter Letter
L
Letter_Number Letter_Number
Nl
Line_Separator Line_Separator
Zl
Lowercase_Letter Lowercase_Letter
Ll
Mark Mark
M
Combining_Mark
Math_Symbol Math_Symbol
Sm
Modifier_Letter Modifier_Letter
Lm
Modifier_Symbol Modifier_Symbol
Sk
Nonspacing_Mark Nonspacing_Mark
Mn
Number Number
N
Open_Punctuation Open_Punctuation
Ps
Other Other
C
Other_Letter Other_Letter
Lo
Other_Number Other_Number
No
Other_Punctuation Other_Punctuation
Po
Other_Symbol Other_Symbol
So
Paragraph_Separator Paragraph_Separator
Zp
Private_Use Private_Use
Co
Punctuation Punctuation
P
punct
Separator Separator
Z
Space_Separator Space_Separator
Zs
Spacing_Mark Spacing_Mark
Mc
Surrogate Surrogate
Cs
Symbol Symbol
S
Titlecase_Letter Titlecase_Letter
Lt
Unassigned Unassigned
Cn
Uppercase_Letter Uppercase_Letter
Lu
Table 69: Value aliases and canonical values for the Unicode properties Script and Script_Extensions
Property value and aliases Canonical property value
Adlam Adlam
Adlm
Ahom Ahom
Anatolian_Hieroglyphs Anatolian_Hieroglyphs
Hluw
Arabic Arabic
Arab
Armenian Armenian
Armn
Avestan Avestan
Avst
Balinese Balinese
Bali
Bamum Bamum
Bamu
Bassa_Vah Bassa_Vah
Bass
Batak Batak
Batk
Bengali Bengali
Beng
Bhaiksuki Bhaiksuki
Bhks
Bopomofo Bopomofo
Bopo
Brahmi Brahmi
Brah
Braille Braille
Brai
Buginese Buginese
Bugi
Buhid Buhid
Buhd
Canadian_Aboriginal Canadian_Aboriginal
Cans
Carian Carian
Cari
Caucasian_Albanian Caucasian_Albanian
Aghb
Chakma Chakma
Cakm
Cham Cham
Chorasmian Chorasmian
Chrs
Cherokee Cherokee
Cher
Common Common
Zyyy
Coptic Coptic
Copt
Qaac
Cuneiform Cuneiform
Xsux
Cypriot Cypriot
Cprt
Cypro_Minoan Cypro_Minoan
Cpmn
Cyrillic Cyrillic
Cyrl
Deseret Deseret
Dsrt
Devanagari Devanagari
Deva
Dives_Akuru Dives_Akuru
Diak
Dogra Dogra
Dogr
Duployan Duployan
Dupl
Egyptian_Hieroglyphs Egyptian_Hieroglyphs
Egyp
Elbasan Elbasan
Elba
Elymaic Elymaic
Elym
Ethiopic Ethiopic
Ethi
Georgian Georgian
Geor
Glagolitic Glagolitic
Glag
Gothic Gothic
Goth
Grantha Grantha
Gran
Greek Greek
Grek
Gujarati Gujarati
Gujr
Gunjala_Gondi Gunjala_Gondi
Gong
Gurmukhi Gurmukhi
Guru
Han Han
Hani
Hangul Hangul
Hang
Hanifi_Rohingya Hanifi_Rohingya
Rohg
Hanunoo Hanunoo
Hano
Hatran Hatran
Hatr
Hebrew Hebrew
Hebr
Hiragana Hiragana
Hira
Imperial_Aramaic Imperial_Aramaic
Armi
Inherited Inherited
Zinh
Qaai
Inscriptional_Pahlavi Inscriptional_Pahlavi
Phli
Inscriptional_Parthian Inscriptional_Parthian
Prti
Javanese Javanese
Java
Kaithi Kaithi
Kthi
Kannada Kannada
Knda
Katakana Katakana
Kana
Kayah_Li Kayah_Li
Kali
Kharoshthi Kharoshthi
Khar
Khitan_Small_Script Khitan_Small_Script
Kits
Khmer Khmer
Khmr
Khojki Khojki
Khoj
Khudawadi Khudawadi
Sind
Lao Lao
Laoo
Latin Latin
Latn
Lepcha Lepcha
Lepc
Limbu Limbu
Limb
Linear_A Linear_A
Lina
Linear_B Linear_B
Linb
Lisu Lisu
Lycian Lycian
Lyci
Lydian Lydian
Lydi
Mahajani Mahajani
Mahj
Makasar Makasar
Maka
Malayalam Malayalam
Mlym
Mandaic Mandaic
Mand
Manichaean Manichaean
Mani
Marchen Marchen
Marc
Medefaidrin Medefaidrin
Medf
Masaram_Gondi Masaram_Gondi
Gonm
Meetei_Mayek Meetei_Mayek
Mtei
Mende_Kikakui Mende_Kikakui
Mend
Meroitic_Cursive Meroitic_Cursive
Merc
Meroitic_Hieroglyphs Meroitic_Hieroglyphs
Mero
Miao Miao
Plrd
Modi Modi
Mongolian Mongolian
Mong
Mro Mro
Mroo
Multani Multani
Mult
Myanmar Myanmar
Mymr
Nabataean Nabataean
Nbat
Nandinagari Nandinagari
Nand
New_Tai_Lue New_Tai_Lue
Talu
Newa Newa
Nko Nko
Nkoo
Nushu Nushu
Nshu
Nyiakeng_Puachue_Hmong Nyiakeng_Puachue_Hmong
Hmnp
Ogham Ogham
Ogam
Ol_Chiki Ol_Chiki
Olck
Old_Hungarian Old_Hungarian
Hung
Old_Italic Old_Italic
Ital
Old_North_Arabian Old_North_Arabian
Narb
Old_Permic Old_Permic
Perm
Old_Persian Old_Persian
Xpeo
Old_Sogdian Old_Sogdian
Sogo
Old_South_Arabian Old_South_Arabian
Sarb
Old_Turkic Old_Turkic
Orkh
Old_Uyghur Old_Uyghur
Ougr
Oriya Oriya
Orya
Osage Osage
Osge
Osmanya Osmanya
Osma
Pahawh_Hmong Pahawh_Hmong
Hmng
Palmyrene Palmyrene
Palm
Pau_Cin_Hau Pau_Cin_Hau
Pauc
Phags_Pa Phags_Pa
Phag
Phoenician Phoenician
Phnx
Psalter_Pahlavi Psalter_Pahlavi
Phlp
Rejang Rejang
Rjng
Runic Runic
Runr
Samaritan Samaritan
Samr
Saurashtra Saurashtra
Saur
Sharada Sharada
Shrd
Shavian Shavian
Shaw
Siddham Siddham
Sidd
SignWriting SignWriting
Sgnw
Sinhala Sinhala
Sinh
Sogdian Sogdian
Sogd
Sora_Sompeng Sora_Sompeng
Sora
Soyombo Soyombo
Soyo
Sundanese Sundanese
Sund
Syloti_Nagri Syloti_Nagri
Sylo
Syriac Syriac
Syrc
Tagalog Tagalog
Tglg
Tagbanwa Tagbanwa
Tagb
Tai_Le Tai_Le
Tale
Tai_Tham Tai_Tham
Lana
Tai_Viet Tai_Viet
Tavt
Takri Takri
Takr
Tamil Tamil
Taml
Tangsa Tangsa
Tnsa
Tangut Tangut
Tang
Telugu Telugu
Telu
Thaana Thaana
Thaa
Thai Thai
Tibetan Tibetan
Tibt
Tifinagh Tifinagh
Tfng
Tirhuta Tirhuta
Tirh
Toto Toto
Ugaritic Ugaritic
Ugar
Vai Vai
Vaii
Vithkuqi Vithkuqi
Vith
Wancho Wancho
Wcho
Warang_Citi Warang_Citi
Wara
Yezidi Yezidi
Yezi
Yi Yi
Yiii
Zanabazar_Square Zanabazar_Square
Zanb

22.2.3 The RegExp Constructor

The RegExp constructor:

  • is %RegExp%.
  • is the initial value of the "RegExp" property of the global object.
  • creates and initializes a new RegExp object when called as a function rather than as a constructor. Thus the function call RegExp(…) is equivalent to the object creation expression new RegExp(…) with the same arguments.
  • may be used as the value of an extends clause of a class definition. Subclass constructors that intend to inherit the specified RegExp behaviour must include a super call to the RegExp constructor to create and initialize subclass instances with the necessary internal slots.

22.2.3.1 RegExp ( pattern, flags )

The following steps are taken:

  1. Let patternIsRegExp be ? IsRegExp(pattern).
  2. If NewTarget is undefined, then
    1. Let newTarget be the active function object.
    2. If patternIsRegExp is true and flags is undefined, then
      1. Let patternConstructor be ? Get(pattern, "constructor").
      2. If SameValue(newTarget, patternConstructor) is true, return pattern.
  3. Else, let newTarget be NewTarget.
  4. If Type(pattern) is Object and pattern has a [[RegExpMatcher]] internal slot, then
    1. Let P be pattern.[[OriginalSource]].
    2. If flags is undefined, let F be pattern.[[OriginalFlags]].
    3. Else, let F be flags.
  5. Else if patternIsRegExp is true, then
    1. Let P be ? Get(pattern, "source").
    2. If flags is undefined, then
      1. Let F be ? Get(pattern, "flags").
    3. Else, let F be flags.
  6. Else,
    1. Let P be pattern.
    2. Let F be flags.
  7. Let O be ? RegExpAlloc(newTarget).
  8. Return ? RegExpInitialize(O, P, F).
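The branches of this algorithm can be observed from script code. The following sketch uses illustrative names; regExpLike is a hypothetical object that passes IsRegExp through its @@match property but has no [[RegExpMatcher]] internal slot.

```javascript
const original = /ab+c/i;

// Called as a function with a RegExp whose constructor is RegExp and no flags:
// step 2.2.2 returns the argument itself.
const sameObject = RegExp(original) === original;

// Called as a constructor: a new object is always allocated.
const freshObject = new RegExp(original) !== original;

// Explicit flags replace the original flags; the source is copied.
const copy = new RegExp(original, "g");

// Step 5: a RegExp-like object contributes its "source" and "flags" properties.
const regExpLike = { [Symbol.match]: true, source: "x+", flags: "y" };
const fromLike = new RegExp(regExpLike);

console.log(sameObject, freshObject, copy.source, copy.flags, fromLike.source, fromLike.flags);
// true true ab+c g x+ y
```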
Note

If pattern is supplied using a StringLiteral, the usual escape sequence substitutions are performed before the String is processed by RegExp. If pattern must contain an escape sequence to be recognized by RegExp, any U+005C (REVERSE SOLIDUS) code points must be escaped within the StringLiteral to prevent them being removed when the contents of the StringLiteral are formed.
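A brief sketch of this note follows. The digit pattern is an arbitrary example; String.raw is shown as one way to avoid doubling the backslash.

```javascript
// In a StringLiteral, "\d" is just "d", so the backslash never reaches RegExp.
const unescaped = new RegExp("\d+");
// Doubling the backslash preserves it in the String value.
const escaped = new RegExp("\\d+");
// A raw template keeps backslashes without doubling.
const raw = new RegExp(String.raw`\d+`);

console.log(unescaped.source, unescaped.test("123")); // d+ false
console.log(escaped.source, escaped.test("123"));     // \d+ true
console.log(raw.source === escaped.source);           // true
```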

22.2.3.2 Abstract Operations for the RegExp Constructor

22.2.3.2.1 RegExpAlloc ( newTarget )

The abstract operation RegExpAlloc takes argument newTarget and returns either a normal completion containing an Object or an abrupt completion. It performs the following steps when called:

  1. Let obj be ? OrdinaryCreateFromConstructor(newTarget, "%RegExp.prototype%", « [[RegExpMatcher]], [[OriginalSource]], [[OriginalFlags]] »).
  2. Perform ! DefinePropertyOrThrow(obj, "lastIndex", PropertyDescriptor { [[Writable]]: true, [[Enumerable]]: false, [[Configurable]]: false }).
  3. Return obj.
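The "lastIndex" property defined in step 2 is the only own property of a freshly created RegExp object, which the following sketch inspects. The value 0 shown here is set later, by RegExpInitialize.

```javascript
const sample = /a/g;
const descriptor = Object.getOwnPropertyDescriptor(sample, "lastIndex");
const ownKeys = Reflect.ownKeys(sample);

console.log(descriptor);
// { value: 0, writable: true, enumerable: false, configurable: false }
console.log(ownKeys); // [ 'lastIndex' ]
```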

22.2.3.2.2 RegExpInitialize ( obj, pattern, flags )

The abstract operation RegExpInitialize takes arguments obj (an Object), pattern (an ECMAScript language value), and flags (an ECMAScript language value) and returns either a normal completion containing an Object or an abrupt completion. It performs the following steps when called:

  1. If pattern is undefined, let P be the empty String.
  2. Else, let P be ? ToString(pattern).
  3. If flags is undefined, let F be the empty String.
  4. Else, let F be ? ToString(flags).
  5. If F contains any code unit other than "d", "g", "i", "m", "s", "u", or "y" or if it contains the same code unit more than once, throw a SyntaxError exception.
  6. If F contains "u", let u be true; else let u be false.
  7. If u is true, then
    1. Let patternText be StringToCodePoints(P).
  8. Else,
    1. Let patternText be the result of interpreting each of P's 16-bit elements as a Unicode BMP code point. UTF-16 decoding is not applied to the elements.
  9. Let parseResult be ParsePattern(patternText, u).
  10. If parseResult is a non-empty List of SyntaxError objects, throw a SyntaxError exception.
  11. Assert: parseResult is a Pattern Parse Node.
  12. Set obj.[[OriginalSource]] to P.
  13. Set obj.[[OriginalFlags]] to F.
  14. NOTE: The definitions of DotAll, IgnoreCase, Multiline, and Unicode in 22.2.2.1 refer to this value of obj.[[OriginalFlags]].
  15. Set obj.[[RegExpMatcher]] to CompilePattern of parseResult.
  16. Perform ? Set(obj, "lastIndex", +0𝔽, true).
  17. Return obj.
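Several of these steps have directly observable consequences. The sketch below uses a hypothetical helper, construct, that returns either the new object or the name of the thrown error; the flag "z" is chosen because it is not among the flags listed in step 5.

```javascript
// Hypothetical helper: returns the RegExp, or the error name if construction throws.
function construct(pattern, flags) {
  try {
    return new RegExp(pattern, flags);
  } catch (e) {
    return e.name;
  }
}

const duplicateFlag = construct("a", "gg"); // step 5: repeated code unit
const unknownFlag = construct("a", "z");    // step 5: code unit not in the list
const badPattern = construct("(", "");      // step 10: the pattern fails to parse
const coerced = construct(123, "g");        // step 2: ToString(pattern)

// Steps 7 and 8: "." consumes a whole code point only when the u flag is present.
const face = "\u{1F600}";
const dotWithoutU = new RegExp("^.$").test(face);
const dotWithU = new RegExp("^.$", "u").test(face);

console.log(duplicateFlag, unknownFlag, badPattern, coerced.source, coerced.lastIndex);
// SyntaxError SyntaxError SyntaxError 123 0
console.log(dotWithoutU, dotWithU); // false true
```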

22.2.3.2.3 Static Semantics: ParsePattern ( patternText, u )

The abstract operation ParsePattern takes arguments patternText (a sequence of Unicode code points) and u (a Boolean) and returns a Parse Node or a non-empty List of SyntaxError objects. It performs the following steps when called:

  1. If u is true, then
    1. Let parseResult be ParseText(patternText, Pattern[+UnicodeMode, +N]).
  2. Else,
    1. Let parseResult be ParseText(patternText, Pattern[~UnicodeMode, ~N]).
    2. If parseResult is a Parse Node and parseResult contains a GroupName, then
      1. Set parseResult to ParseText(patternText, Pattern[~UnicodeMode, +N]).
  3. Return parseResult.
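The two-pass parse in step 2 exists so that \k is only treated as a named backreference when the pattern actually contains a GroupName. The sketch below assumes a web-compatible engine that implements the Annex B grammar; under the main grammar alone, the first pattern would not be valid. The helper isSyntaxError is a hypothetical name.

```javascript
// No GroupName in the pattern: with the Annex B grammar, \k is an identity escape.
const legacy = new RegExp("\\k<a>");
const legacyMatch = legacy.test("k<a>");

// A GroupName is present: the pattern is reparsed with +N and \k<a> is a backreference.
const named = new RegExp("(?<a>x)\\k<a>");
const namedMatch = named.test("xx");

// Hypothetical helper: reports whether constructing the pattern throws a SyntaxError.
function isSyntaxError(source, flags) {
  try {
    new RegExp(source, flags);
    return false;
  } catch (e) {
    return e instanceof SyntaxError;
  }
}

const danglingName = isSyntaxError("(?<a>x)\\k<b>", ""); // no group named b
const unicodeMode = isSyntaxError("\\k<a>", "u");        // +N always applies with u

console.log(legacyMatch, namedMatch, danglingName, unicodeMode); // true true true true
```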

22.2.3.2.4 RegExpCreate ( P, F )

The abstract operation RegExpCreate takes arguments P and F and returns either a normal completion containing an Object or an abrupt completion. It performs the following steps when called:

  1. Let obj be ! RegExpAlloc(%RegExp%).
  2. Return ? RegExpInitialize(obj, P, F).

22.2.3.2.5 EscapeRegExpPattern ( P, F )

The abstract operation EscapeRegExpPattern takes arguments P and F and returns a String. It performs the following steps when called:

  1. Let S be a String in the form of a Pattern[~UnicodeMode] (Pattern[+UnicodeMode] if F contains "u") equivalent to P interpreted as UTF-16 encoded Unicode code points (6.1.4), in which certain code points are escaped as described below. S may or may not be identical to P; however, the Abstract Closure that would result from evaluating S as a Pattern[~UnicodeMode] (Pattern[+UnicodeMode] if F contains "u") must behave identically to the Abstract Closure given by the constructed object's [[RegExpMatcher]] internal slot. Multiple calls to this abstract operation using the same values for P and F must produce identical results.
  2. The code points / or any LineTerminator occurring in the pattern shall be escaped in S as necessary to ensure that the string-concatenation of "/", S, "/", and F can be parsed (in an appropriate lexical context) as a RegularExpressionLiteral that behaves identically to the constructed regular expression. For example, if P is "/", then S could be "\/" or "\u002F", among other possibilities, but not "/", because /// followed by F would be parsed as a SingleLineComment rather than a RegularExpressionLiteral. If P is the empty String, this specification can be met by letting S be "(?:)".
  3. Return S.
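Because the exact escaping is left to the implementation, the sketch below checks only the guarantees stated above rather than a particular spelling of the source text. The variable names are illustrative.

```javascript
const slash = new RegExp("/");
const newline = new RegExp("\n");
const empty = new RegExp("");

// The escaped source is never a bare "/" and never contains a raw line terminator,
// and recompiling it yields a pattern that matches the same input.
const slashRoundTrips = slash.source !== "/" && new RegExp(slash.source).test("/");
const newlineRoundTrips = !newline.source.includes("\n") && new RegExp(newline.source).test("\n");

// For the empty pattern the source must be non-empty (for example "(?:)"),
// since "//" would begin a comment.
const emptyIsNonEmpty = empty.source !== "";

console.log(slashRoundTrips, newlineRoundTrips, emptyIsNonEmpty); // true true true
```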

22.2.4 Properties of the RegExp Constructor

The RegExp constructor:

  • has a [[Prototype]] internal slot whose value is %Function.prototype%.
  • has the following properties:

22.2.4.1 RegExp.prototype

The initial value of RegExp.prototype is the RegExp prototype object.

This property has the attributes { [[Writable]]: false, [[Enumerable]]: false, [[Configurable]]: false }.

22.2.4.2 get RegExp [ @@species ]

RegExp[@@species] is an accessor property whose set accessor function is undefined. Its get accessor function performs the following steps:

  1. Return the this value.

The value of the "name" property of this function is "get [Symbol.species]".

Note

RegExp prototype methods normally use their this value's constructor to create a derived object. However, a subclass constructor may override that default behaviour by redefining its @@species property.
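One place where the derived object is observable is String.prototype.split, which creates a new splitter through the species constructor. The following sketch counts constructor calls; the class names and the counter are hypothetical and exist only for the example.

```javascript
let derivedConstructions = 0;

class Derived extends RegExp {
  constructor(pattern, flags) {
    super(pattern, flags);
    derivedConstructions++;
  }
}

// Redefining @@species makes prototype methods create plain RegExp objects instead.
class DerivedWithPlainSpecies extends Derived {
  static get [Symbol.species]() {
    return RegExp;
  }
}

const defaultSpecies = RegExp[Symbol.species] === RegExp;

const parts = "a-b".split(new Derived("-"));
const afterDefault = derivedConstructions; // explicit construction plus the splitter

derivedConstructions = 0;
"a-b".split(new DerivedWithPlainSpecies("-"));
const afterOverride = derivedConstructions; // explicit construction only

console.log(defaultSpecies, parts, afterDefault, afterOverride);
// true [ 'a', 'b' ] 2 1
```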

22.2.5 Properties of the RegExp Prototype Object

The RegExp prototype object:

  • is %RegExp.prototype%.
  • is an ordinary object.
  • is not a RegExp instance and does not have a [[RegExpMatcher]] internal slot or any of the other internal slots of RegExp instance objects.
  • has a [[Prototype]] internal slot whose value is %Object.prototype%.
Note

The RegExp prototype object does not have a "valueOf" property of its own; however, it inherits the "valueOf" property from the Object prototype object.
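These properties of the prototype object can be checked as follows. This is a sketch; the variable names are illustrative.

```javascript
const proto = RegExp.prototype;

// An ordinary object, not a RegExp instance.
const tag = Object.prototype.toString.call(proto);

// exec requires a [[RegExpMatcher]] internal slot, which the prototype lacks.
let execOnPrototype;
try {
  proto.exec("a");
  execOnPrototype = "no error";
} catch (e) {
  execOnPrototype = e.name;
}

// "valueOf" is inherited rather than own.
const hasOwnValueOf = Object.prototype.hasOwnProperty.call(proto, "valueOf");
const inheritedValueOf = proto.valueOf === Object.prototype.valueOf;

console.log(tag, execOnPrototype, hasOwnValueOf, inheritedValueOf);
// [object Object] TypeError false true
```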

22.2.5.1 RegExp.prototype.constructor

The initial value of RegExp.prototype.constructor is %RegExp%.

22.2.5.2 RegExp.prototype.exec ( string )

Performs a regular expression match of string against the regular expression and returns an Array containing the results of the match, or null if string did not match.

The String ToString(string) is searched for an occurrence of the regular expression pattern as follows:

  1. Let R be the this value.
  2. Perform ? RequireInternalSlot(R, [[RegExpMatcher]]).
  3. Let S be ? ToString(string).
  4. Return ? RegExpBuiltinExec(R, S).

22.2.5.2.1 RegExpExec ( R, S )

The abstract operation RegExpExec takes arguments R (an Object) and S (a String) and returns either a normal completion containing either an Object or null, or an abrupt completion. It performs the following steps when called:

  1. Let exec be ? Get(R, "exec").
  2. If IsCallable(exec) is true, then
    1. Let result be ? Call(exec, R, « S »).
    2. If Type(result) is neither Object nor Null, throw a TypeError exception.
    3. Return result.
  3. Perform ? RequireInternalSlot(R, [[RegExpMatcher]]).
  4. Return ? RegExpBuiltinExec(R, S).
Note

If a callable "exec" property is not found, this algorithm falls back to attempting to use the built-in RegExp matching algorithm. This provides compatible behaviour for code written for prior editions where most built-in algorithms that use regular expressions did not perform a dynamic property lookup of "exec".
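RegExp.prototype.test is one of the built-in methods that goes through RegExpExec, so the dynamic lookup of "exec" can be observed through it. The following sketch uses illustrative names.

```javascript
// A callable own "exec" is used instead of the built-in matcher.
const overridden = /a/;
overridden.exec = () => null;
const overriddenResult = overridden.test("a");

// A result that is neither an Object nor null is rejected.
const invalid = /a/;
invalid.exec = () => 1;
let invalidResult;
try {
  invalid.test("a");
  invalidResult = "no error";
} catch (e) {
  invalidResult = e.name;
}

// A non-callable "exec" falls back to the built-in algorithm.
const fallback = /a/;
fallback.exec = undefined;
const fallbackResult = fallback.test("a");

console.log(overriddenResult, invalidResult, fallbackResult); // false TypeError true
```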

22.2.5.2.2 RegExpBuiltinExec ( R, S )

The abstract operation RegExpBuiltinExec takes arguments R (an initialized RegExp instance) and S (a String) and returns either a normal completion containing either an Array exotic object or null, or an abrupt completion. It performs the following steps when called:

  1. Let length be the number of code units in S.
  2. Let lastIndex be ℝ(? ToLength(? Get(R, "lastIndex"))).
  3. Let flags be R.[[OriginalFlags]].
  4. If flags contains "g", let global be true; else let global be false.
  5. If flags contains "y", let sticky be true; else let sticky be false.
  6. If flags contains "d", let hasIndices be true; else let hasIndices be false.
  7. If global is false and sticky is false, set lastIndex to 0.
  8. Let matcher be R.[[RegExpMatcher]].
  9. If flags contains "u", let fullUnicode be true; else let fullUnicode be false.
  10. Let matchSucceeded be false.
  11. If fullUnicode is true, let input be StringToCodePoints(S). Otherwise, let input be a List whose elements are the code units that are the elements of S.
  12. NOTE: Each element of input is considered to be a character.
  13. Repeat, while matchSucceeded is false,
    1. If lastIndex > length, then
      1. If global is true or sticky is true, then
        1. Perform ? Set(R, "lastIndex", +0𝔽, true).
      2. Return null.
    2. Let inputIndex be the index into input of the character that was obtained from element lastIndex of S.
    3. Let r be matcher(input, inputIndex).
    4. If r is failure, then
      1. If sticky is true, then
        1. Perform ? Set(R, "lastIndex", +0𝔽, true).
        2. Return null.
      2. Set lastIndex to AdvanceStringIndex(S, lastIndex, fullUnicode).
    5. Else,
      1. Assert: r is a State.
      2. Set matchSucceeded to true.
  14. Let e be r's endIndex value.
  15. If fullUnicode is true, set e to GetStringIndex(S, e).
  16. If global is true or sticky is true, then
    1. Perform ? Set(R, "lastIndex", 𝔽(e), true).
  17. Let n be the number of elements in r's captures List. (This is the same value as 22.2.2.1's NcapturingParens.)
  18. Assert: n < 2³² - 1.
  19. Let A be ! ArrayCreate(n + 1).
  20. Assert: The mathematical value of A's "length" property is n + 1.
  21. Perform ! CreateDataPropertyOrThrow(A, "index", 𝔽(lastIndex)).
  22. Perform ! CreateDataPropertyOrThrow(A, "input", S).
  23. Let match be the Match Record { [[StartIndex]]: lastIndex, [[EndIndex]]: e }.
  24. Let indices be a new empty List.
  25. Let groupNames be a new empty List.
  26. Append match to indices.
  27. Let matchedSubstr be GetMatchString(S, match).
  28. Perform ! CreateDataPropertyOrThrow(A, "0", matchedSubstr).
  29. If R contains any GroupName, then
    1. Let groups be OrdinaryObjectCreate(null).
    2. Let hasGroups be true.
  30. Else,
    1. Let groups be undefined.
    2. Let hasGroups be false.
  31. Perform ! CreateDataPropertyOrThrow(A, "groups", groups).
  32. For each integer i such that i ≥ 1 and i ≤ n, in ascending order, do
    1. Let captureI be the ith element of r's captures List.
    2. If captureI is undefined, then
      1. Let capturedValue be undefined.
      2. Append undefined to indices.
    3. Else,
      1. Let captureStart be captureI's startIndex.
      2. Let captureEnd be captureI's endIndex.
      3. If fullUnicode is true, then
        1. Set captureStart to GetStringIndex(S, captureStart).
        2. Set captureEnd to GetStringIndex(S, captureEnd).
      4. Let capture be the Match Record { [[StartIndex]]: captureStart, [[EndIndex]]: captureEnd }.
      5. Let capturedValue be GetMatchString(S, capture).
      6. Append capture to indices.
    4. Perform ! CreateDataPropertyOrThrow(A, ! ToString(𝔽(i)), capturedValue).
    5. If the ith capture of R was defined with a GroupName, then
      1. Let s be the CapturingGroupName of the corresponding RegExpIdentifierName.
      2. Perform ! CreateDataPropertyOrThrow(groups, s, capturedValue).
      3. Append s to groupNames.
    6. Else,
      1. Append undefined to groupNames.
  33. If hasIndices is true, then
    1. Let indicesArray be MakeMatchIndicesIndexPairArray(S, indices, groupNames, hasGroups).
    2. Perform ! CreateDataPropertyOrThrow(A, "indices", indicesArray).
  34. Return A.
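The handling of "lastIndex", the sticky flag, and the "indices" property can be traced with a small example. This is a sketch; the pattern and input are arbitrary, and the d flag assumes an engine that supports match indices.

```javascript
const input = "a1 a2";
const globalRe = /a(?<digit>\d)/g;

const first = globalRe.exec(input);     // match at index 0
const afterFirst = globalRe.lastIndex;  // step 16 stores the end index
const second = globalRe.exec(input);    // search resumes from lastIndex
const afterSecond = globalRe.lastIndex;
const third = globalRe.exec(input);     // no further match
const afterThird = globalRe.lastIndex;  // reset to 0 on failure

// With the y flag, a failure at lastIndex is final: no later position is tried.
const sticky = /a/y;
sticky.lastIndex = 1;
const stickyHit = sticky.exec("ba");
sticky.lastIndex = 0;
const stickyMiss = sticky.exec("ba");

// With the d flag, step 33 adds start and end positions for the match and each capture.
const withIndices = /a(?<digit>\d)/d.exec("xa1");

console.log(first[0], first[1], first.index, first.groups.digit, afterFirst); // a1 1 0 1 2
console.log(second.index, afterSecond, third, afterThird);                    // 3 5 null 0
console.log(stickyHit.index, stickyMiss);                                     // 1 null
console.log(withIndices.indices[0], withIndices.indices.groups.digit);        // [ 1, 3 ] [ 2, 3 ]
```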

22.2.5.2.3 AdvanceStringIndex ( S, index, unicode )

The abstract operation AdvanceStringIndex takes arguments S (a String), index (a non-negative integer), and unicode (a Boolean) and returns an integer. It performs the following steps when called:

  1. Assert: index ≤ 2⁵³ - 1.
  2. If unicode is false, return index + 1.
  3. Let length be the number of code units in S.
  4. If index + 1 ≥ length, return index + 1.
  5. Let cp be CodePointAt(S, index).
  6. Return index + cp.[[CodeUnitCount]].
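The steps above translate almost directly into script code. The following is a sketch, not the engine's implementation: advanceStringIndex is a name introduced here, and String.prototype.codePointAt stands in for CodePointAt. The second half shows the same behaviour through empty matches of a global pattern.

```javascript
function advanceStringIndex(S, index, unicode) {
  if (!unicode) return index + 1;
  if (index + 1 >= S.length) return index + 1;
  const cp = S.codePointAt(index);
  // A code point above U+FFFF occupies two code units (a surrogate pair).
  return index + (cp > 0xffff ? 2 : 1);
}

const face = "\u{1F600}a"; // three code units: a surrogate pair followed by "a"
const steps = [
  advanceStringIndex(face, 0, false),
  advanceStringIndex(face, 0, true),
  advanceStringIndex(face, 2, true),
];

// Empty matches advance by code point with the u flag and by code unit without it.
const emptyMatchesUnicode = "\u{1F600}".match(/(?:)/gu).length;
const emptyMatchesCodeUnits = "\u{1F600}".match(/(?:)/g).length;

console.log(steps, emptyMatchesUnicode, emptyMatchesCodeUnits); // [ 1, 2, 3 ] 2 3
```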

22.2.5.2.4 GetStringIndex ( S, e )

The abstract operation GetStringIndex takes arguments S (a String) and e (a non-negative integer) and returns a non-negative integer. It performs the following steps when called:

  1. If S is the empty String, return 0.
  2. Let codepoints be StringToCodePoints(S).
  3. Let eUTF be the smallest index into S that corresponds to the character at element e of codepoints. If e is greater than or equal to the number of elements in codepoints, then eUTF is the number of code units in S.
  4. Return eUTF.
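A minimal sketch of this conversion from a code point index to a code unit index follows. It assumes that Array.from, which iterates a String by code point, is an adequate stand-in for StringToCodePoints; getStringIndex is a name introduced for the example.

```javascript
function getStringIndex(S, e) {
  if (S === "") return 0;
  const codePoints = Array.from(S); // one element per code point
  if (e >= codePoints.length) return S.length;
  let index = 0;
  for (let k = 0; k < e; k++) {
    index += codePoints[k].length; // 1 or 2 code units
  }
  return index;
}

const sampleText = "\u{1F600}a";
const indexes = [0, 1, 2].map((e) => getStringIndex(sampleText, e));

// The conversion is why "index" is reported in code units even with the u flag.
const reportedIndex = /a/u.exec(sampleText).index;

console.log(indexes, reportedIndex); // [ 0, 2, 3 ] 2
```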

22.2.5.2.5 Match Records

A Match Record is a Record value used to encapsulate the start and end indices of a regular expression match or capture.

Match Records have the fields listed in Table 70.

Table 70: Match Record Fields
Field Name | Value | Meaning
[[StartIndex]] | a non-negative integer | The number of code units from the start of a string at which the match begins (inclusive).
[[EndIndex]] | an integer ≥ [[StartIndex]] | The number of code units from the start of a string at which the match ends (exclusive).

22.2.5.2.6 GetMatchString ( S, match )

The abstract operation GetMatchString takes arguments S (a String) and match (a Match Record) and returns a String. It performs the following steps when called:

  1. Assert: match.[[StartIndex]] is a non-negative integer less than or equal to the length of S.
  2. Assert: match.[[EndIndex]] is an integer between match.[[StartIndex]] and the length of S, inclusive.
  3. Return the substring of S from match.[[StartIndex]] to match.[[EndIndex]].

22.2.5.2.7 GetMatchIndexPair ( S, match )

The abstract operation GetMatchIndexPair takes arguments S (a String) and match (a Match Record) and returns an Array. It performs the following steps when called:

  1. Assert: match.[[StartIndex]] is a non-negative integer less than or equal to the length of S.
  2. Assert: match.[[EndIndex]] is an integer between match.[[StartIndex]] and the length of S, inclusive.
  3. Return CreateArrayFromList(« 𝔽(match.[[StartIndex]]), 𝔽(match.[[EndIndex]]) »).

22.2.5.2.8 MakeMatchIndicesIndexPairArray ( S, indices, groupNames, hasGroups )

The abstract operation MakeMatchIndicesIndexPairArray takes arguments S (a String), indices (a List of either Match Records or undefined), groupNames (a List of either Strings or undefined), and hasGroups (a Boolean) and returns an Array. It performs the following steps when called:

  1. Let n be the number of elements in indices.
  2. Assert: n < 2^32 - 1.
  3. Assert: groupNames has n - 1 elements.
  4. NOTE: The groupNames List contains elements aligned with the indices List starting at indices[1].
  5. Let A be ! ArrayCreate(n).
  6. If hasGroups is true, then
    1. Let groups be OrdinaryObjectCreate(null).
  7. Else,
    1. Let groups be undefined.
  8. Perform ! CreateDataPropertyOrThrow(A, "groups", groups).
  9. For each integer i starting with 0 such that i < n, in ascending order, do
    1. Let matchIndices be indices[i].
    2. If matchIndices is not undefined, then
      1. Let matchIndexPair be GetMatchIndexPair(S, matchIndices).
    3. Else,
      1. Let matchIndexPair be undefined.
    4. Perform ! CreateDataPropertyOrThrow(A, ! ToString(𝔽(i)), matchIndexPair).
    5. If i > 0 and groupNames[i - 1] is not undefined, then
      1. Assert: groups is not undefined.
      2. Perform ! CreateDataPropertyOrThrow(groups, groupNames[i - 1], matchIndexPair).
  10. Return A.
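
The array built by these steps is observable as the "indices" property of a match result when the d flag is present. The following non-normative example, using an arbitrary pattern and input chosen for illustration, shows index pairs for the whole match, for each capture, and for a named group.

```javascript
const m = /(?<year>\d{4})-(\d{2})/d.exec("on 2024-05");

const whole = m.indices[0]; // [3, 10]
const first = m.indices[1]; // [3, 7]
const second = m.indices[2]; // [8, 10]
const byName = m.indices.groups.year; // [3, 7]

// A capture that did not participate yields undefined, not a pair.
const alt = /(a)|(b)/d.exec("b");
const unmatched = alt.indices[1]; // undefined
// Without named groups, the "groups" property of indices is undefined.
const noGroups = alt.indices.groups; // undefined
```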

22.2.5.3 get RegExp.prototype.dotAll

RegExp.prototype.dotAll is an accessor property whose set accessor function is undefined. Its get accessor function performs the following steps:

  1. Let R be the this value.
  2. Let cu be the code unit 0x0073 (LATIN SMALL LETTER S).
  3. Return ? RegExpHasFlag(R, cu).

22.2.5.3.1 RegExpHasFlag ( R, codeUnit )

The abstract operation RegExpHasFlag takes arguments R (an ECMAScript language value) and codeUnit (a code unit) and returns either a normal completion containing either a Boolean or undefined, or an abrupt completion. It performs the following steps when called:

  1. If Type(R) is not Object, throw a TypeError exception.
  2. If R does not have an [[OriginalFlags]] internal slot, then
    1. If SameValue(R, %RegExp.prototype%) is true, return undefined.
    2. Otherwise, throw a TypeError exception.
  3. Let flags be R.[[OriginalFlags]].
  4. If flags contains codeUnit, return true.
  5. Return false.
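
The three outcomes of RegExpHasFlag (a Boolean, undefined, or a TypeError) can be observed through any of the flag accessors. The following non-normative example extracts the dotAll getter so that it can be applied to arbitrary receivers.

```javascript
const dotAllGetter = Object.getOwnPropertyDescriptor(
  RegExp.prototype,
  "dotAll"
).get;

const withFlag = dotAllGetter.call(/a/s); // true
const withoutFlag = dotAllGetter.call(/a/); // false
// %RegExp.prototype% has no [[OriginalFlags]] slot, so undefined is returned.
const onPrototype = dotAllGetter.call(RegExp.prototype); // undefined

// Any other object without the slot causes a TypeError.
let threwTypeError = false;
try {
  dotAllGetter.call({});
} catch (e) {
  threwTypeError = e instanceof TypeError;
}
```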

22.2.5.4 get RegExp.prototype.flags

RegExp.prototype.flags is an accessor property whose set accessor function is undefined. Its get accessor function performs the following steps:

  1. Let R be the this value.
  2. If Type(R) is not Object, throw a TypeError exception.
  3. Let result be the empty String.
  4. Let hasIndices be ToBoolean(? Get(R, "hasIndices")).
  5. If hasIndices is true, append the code unit 0x0064 (LATIN SMALL LETTER D) as the last code unit of result.
  6. Let global be ToBoolean(? Get(R, "global")).
  7. If global is true, append the code unit 0x0067 (LATIN SMALL LETTER G) as the last code unit of result.
  8. Let ignoreCase be ToBoolean(? Get(R, "ignoreCase")).
  9. If ignoreCase is true, append the code unit 0x0069 (LATIN SMALL LETTER I) as the last code unit of result.
  10. Let multiline be ToBoolean(? Get(R, "multiline")).
  11. If multiline is true, append the code unit 0x006D (LATIN SMALL LETTER M) as the last code unit of result.
  12. Let dotAll be ToBoolean(? Get(R, "dotAll")).
  13. If dotAll is true, append the code unit 0x0073 (LATIN SMALL LETTER S) as the last code unit of result.
  14. Let unicode be ToBoolean(? Get(R, "unicode")).
  15. If unicode is true, append the code unit 0x0075 (LATIN SMALL LETTER U) as the last code unit of result.
  16. Let sticky be ToBoolean(? Get(R, "sticky")).
  17. If sticky is true, append the code unit 0x0079 (LATIN SMALL LETTER Y) as the last code unit of result.
  18. Return result.
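
Two consequences of these steps are that the result is always in a fixed order regardless of how the flags were supplied, and that the getter reads ordinary properties rather than internal slots. The following non-normative example illustrates both; the plain object used as a receiver is an arbitrary illustration.

```javascript
// Flags supplied in arbitrary order are reported in the order built above.
const ordered = new RegExp("a", "yusmigd").flags; // "dgimsuy"

// The getter is generic: it applies ToBoolean to whatever Get returns.
const flagsGetter = Object.getOwnPropertyDescriptor(
  RegExp.prototype,
  "flags"
).get;
const fromPlainObject = flagsGetter.call({
  global: 1,
  sticky: "yes",
  ignoreCase: 0,
}); // "gy"
```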

22.2.5.5 get RegExp.prototype.global

RegExp.prototype.global is an accessor property whose set accessor function is undefined. Its get accessor function performs the following steps:

  1. Let R be the this value.
  2. Let cu be the code unit 0x0067 (LATIN SMALL LETTER G).
  3. Return ? RegExpHasFlag(R, cu).

22.2.5.6 get RegExp.prototype.hasIndices

RegExp.prototype.hasIndices is an accessor property whose set accessor function is undefined. Its get accessor function performs the following steps:

  1. Let R be the this value.
  2. Let cu be the code unit 0x0064 (LATIN SMALL LETTER D).
  3. Return ? RegExpHasFlag(R, cu).

22.2.5.7 get RegExp.prototype.ignoreCase

RegExp.prototype.ignoreCase is an accessor property whose set accessor function is undefined. Its get accessor function performs the following steps:

  1. Let R be the this value.
  2. Let cu be the code unit 0x0069 (LATIN SMALL LETTER I).
  3. Return ? RegExpHasFlag(R, cu).

22.2.5.8 RegExp.prototype [ @@match ] ( string )

When the @@match method is called with argument string, the following steps are taken:

  1. Let rx be the this value.
  2. If Type(rx) is not Object, throw a TypeError exception.
  3. Let S be ? ToString(string).
  4. Let global be ToBoolean(? Get(rx, "global")).
  5. If global is false, then
    1. Return ? RegExpExec(rx, S).
  6. Else,
    1. Assert: global is true.
    2. Let fullUnicode be ToBoolean(? Get(rx, "unicode")).
    3. Perform ? Set(rx, "lastIndex", +0𝔽, true).
    4. Let A be ! ArrayCreate(0).
    5. Let n be 0.
    6. Repeat,
      1. Let result be ? RegExpExec(rx, S).
      2. If result is null, then
        1. If n = 0, return null.
        2. Return A.
      3. Else,
        1. Let matchStr be ? ToString(? Get(result, "0")).
        2. Perform ! CreateDataPropertyOrThrow(A, ! ToString(𝔽(n)), matchStr).
        3. If matchStr is the empty String, then
          1. Let thisIndex be ℝ(? ToLength(? Get(rx, "lastIndex"))).
          2. Let nextIndex be AdvanceStringIndex(S, thisIndex, fullUnicode).
          3. Perform ? Set(rx, "lastIndex", 𝔽(nextIndex), true).
        4. Set n to n + 1.

The value of the "name" property of this function is "[Symbol.match]".

Note

The @@match property is used by the IsRegExp abstract operation to identify objects that have the basic behaviour of regular expressions. The absence of a @@match property or the existence of such a property whose value does not Boolean coerce to true indicates that the object is not intended to be used as a regular expression object.
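
The following non-normative example illustrates the two branches of the algorithm above. The patterns and inputs are arbitrary illustrations.

```javascript
// Global: every matched substring is collected; captures are not included.
const globalMatches = /\d+/g[Symbol.match]("10 apples, 20 pears"); // ["10", "20"]

// Non-global: the result of a single RegExpExec call is returned as is.
const firstOnly = /\d+/[Symbol.match]("10 apples, 20 pears");

// Global with no match returns null rather than an empty array.
const noMatch = /z/g[Symbol.match]("abc"); // null

// Empty matches advance "lastIndex" by one so that the loop terminates.
const emptyMatches = /x*/g[Symbol.match]("ab"); // ["", "", ""]
```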

22.2.5.9 RegExp.prototype [ @@matchAll ] ( string )

When the @@matchAll method is called with argument string, the following steps are taken:

  1. Let R be the this value.
  2. If Type(R) is not Object, throw a TypeError exception.
  3. Let S be ? ToString(string).
  4. Let C be ? SpeciesConstructor(R, %RegExp%).
  5. Let flags be ? ToString(? Get(R, "flags")).
  6. Let matcher be ? Construct(C, « R, flags »).
  7. Let lastIndex be ? ToLength(? Get(R, "lastIndex")).
  8. Perform ? Set(matcher, "lastIndex", lastIndex, true).
  9. If flags contains "g", let global be true.
  10. Else, let global be false.
  11. If flags contains "u", let fullUnicode be true.
  12. Else, let fullUnicode be false.
  13. Return CreateRegExpStringIterator(matcher, S, global, fullUnicode).

The value of the "name" property of this function is "[Symbol.matchAll]".
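
Because the iterator operates on matcher, a newly constructed object, rather than on R itself, iteration starts from the "lastIndex" of R but does not modify it. The following non-normative example, with an arbitrary pattern and input, shows this.

```javascript
const re = /\d/g;
re.lastIndex = 2;

// Matching starts at index 2, copied from re to the internal matcher.
const found = [...re[Symbol.matchAll]("1a2b3")].map((m) => m[0]); // ["2", "3"]

// The original object is untouched by the iteration.
const lastIndexAfter = re.lastIndex; // 2
```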

22.2.5.10 get RegExp.prototype.multiline

RegExp.prototype.multiline is an accessor property whose set accessor function is undefined. Its get accessor function performs the following steps:

  1. Let R be the this value.
  2. Let cu be the code unit 0x006D (LATIN SMALL LETTER M).
  3. Return ? RegExpHasFlag(R, cu).

22.2.5.11 RegExp.prototype [ @@replace ] ( string, replaceValue )

When the @@replace method is called with arguments string and replaceValue, the following steps are taken:

  1. Let rx be the this value.
  2. If Type(rx) is not Object, throw a TypeError exception.
  3. Let S be ? ToString(string).
  4. Let lengthS be the number of code unit elements in S.
  5. Let functionalReplace be IsCallable(replaceValue).
  6. If functionalReplace is false, then
    1. Set replaceValue to ? ToString(replaceValue).
  7. Let global be ToBoolean(? Get(rx, "global")).
  8. If global is true, then
    1. Let fullUnicode be ToBoolean(? Get(rx, "unicode")).
    2. Perform ? Set(rx, "lastIndex", +0𝔽, true).
  9. Let results be a new empty List.
  10. Let done be false.
  11. Repeat, while done is false,
    1. Let result be ? RegExpExec(rx, S).
    2. If result is null, set done to true.
    3. Else,
      1. Append result to the end of results.
      2. If global is false, set done to true.
      3. Else,
        1. Let matchStr be ? ToString(? Get(result, "0")).
        2. If matchStr is the empty String, then
          1. Let thisIndex be ℝ(? ToLength(? Get(rx, "lastIndex"))).
          2. Let nextIndex be AdvanceStringIndex(S, thisIndex, fullUnicode).
          3. Perform ? Set(rx, "lastIndex", 𝔽(nextIndex), true).
  12. Let accumulatedResult be the empty String.
  13. Let nextSourcePosition be 0.
  14. For each element result of results, do
    1. Let resultLength be ? LengthOfArrayLike(result).
    2. Let nCaptures be max(resultLength - 1, 0).
    3. Let matched be ? ToString(? Get(result, "0")).
    4. Let matchLength be the number of code units in matched.
    5. Let position be ? ToIntegerOrInfinity(? Get(result, "index")).
    6. Set position to the result of clamping position between 0 and lengthS.
    7. Let n be 1.
    8. Let captures be a new empty List.
    9. Repeat, while n ≤ nCaptures,
      1. Let capN be ? Get(result, ! ToString(𝔽(n))).
      2. If capN is not undefined, then
        1. Set capN to ? ToString(capN).
      3. Append capN as the last element of captures.
      4. NOTE: When n = 1, the preceding step puts the first element into captures (at index 0). More generally, the nth capture (the characters captured by the nth set of capturing parentheses) is at captures[n - 1].
      5. Set n to n + 1.
    10. Let namedCaptures be ? Get(result, "groups").
    11. If functionalReplace is true, then
      1. Let replacerArgs be « matched ».
      2. Append in List order the elements of captures to the end of the List replacerArgs.
      3. Append 𝔽(position) and S to replacerArgs.
      4. If namedCaptures is not undefined, then
        1. Append namedCaptures as the last element of replacerArgs.
      5. Let replValue be ? Call(replaceValue, undefined, replacerArgs).
      6. Let replacement be ? ToString(replValue).
    12. Else,
      1. If namedCaptures is not undefined, then
        1. Set namedCaptures to ? ToObject(namedCaptures).
      2. Let replacement be ? GetSubstitution(matched, S, position, captures, namedCaptures, replaceValue).
    13. If position ≥ nextSourcePosition, then
      1. NOTE: position should not normally move backwards. If it does, it is an indication of an ill-behaving RegExp subclass or use of an access triggered side-effect to change the global flag or other characteristics of rx. In such cases, the corresponding substitution is ignored.
      2. Set accumulatedResult to the string-concatenation of accumulatedResult, the substring of S from nextSourcePosition to position, and replacement.
      3. Set nextSourcePosition to position + matchLength.
  15. If nextSourcePosition ≥ lengthS, return accumulatedResult.
  16. Return the string-concatenation of accumulatedResult and the substring of S from nextSourcePosition.

The value of the "name" property of this function is "[Symbol.replace]".
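
The construction of replacerArgs can be observed by recording the arguments passed to a replacer function. The following non-normative example uses an arbitrary pattern with one named and one optional unnamed capture.

```javascript
const calls = [];
const out = /(?<word>[a-z]+)(\d)?/g[Symbol.replace]("ab1 cd", (...args) => {
  // args: matched, each capture, position, the input String, then groups.
  calls.push(args);
  return args[0].toUpperCase();
});
// out is "AB1 CD"
// calls[0] is ["ab1", "ab", "1", 0, "ab1 cd", { word: "ab" }]
// calls[1] is ["cd", "cd", undefined, 4, "ab1 cd", { word: "cd" }]

// With a String replaceValue, GetSubstitution expands $n references.
const swapped = /(\w+) (\w+)/[Symbol.replace]("hello world", "$2 $1");
// swapped is "world hello"
```

A capture that did not participate in the match is passed as undefined and is not converted to a String.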

22.2.5.12 RegExp.prototype [ @@search ] ( string )

When the @@search method is called with argument string, the following steps are taken:

  1. Let rx be the this value.
  2. If Type(rx) is not Object, throw a TypeError exception.
  3. Let S be ? ToString(string).
  4. Let previousLastIndex be ? Get(rx, "lastIndex").
  5. If SameValue(previousLastIndex, +0𝔽) is false, then
    1. Perform ? Set(rx, "lastIndex", +0𝔽, true).
  6. Let result be ? RegExpExec(rx, S).
  7. Let currentLastIndex be ? Get(rx, "lastIndex").
  8. If SameValue(currentLastIndex, previousLastIndex) is false, then
    1. Perform ? Set(rx, "lastIndex", previousLastIndex, true).
  9. If result is null, return -1𝔽.
  10. Return ? Get(result, "index").

The value of the "name" property of this function is "[Symbol.search]".

Note

The "lastIndex" and "global" properties of this RegExp object are ignored when performing the search. The "lastIndex" property is left unchanged.
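
The following non-normative example shows that a search always starts at index 0 and that the previous "lastIndex" value is restored afterwards. The pattern and input are arbitrary illustrations.

```javascript
const re = /b/g;
re.lastIndex = 3;

// The search ignores lastIndex and finds the first "b" at index 1.
const position = re[Symbol.search]("abcabc"); // 1

// lastIndex is restored to its previous value.
const restored = re.lastIndex; // 3

// No match yields -1.
const missing = /z/[Symbol.search]("abcabc"); // -1
```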

22.2.5.13 get RegExp.prototype.source

RegExp.prototype.source is an accessor property whose set accessor function is undefined. Its get accessor function performs the following steps:

  1. Let R be the this value.
  2. If Type(R) is not Object, throw a TypeError exception.
  3. If R does not have an [[OriginalSource]] internal slot, then
    1. If SameValue(R, %RegExp.prototype%) is true, return "(?:)".
    2. Otherwise, throw a TypeError exception.
  4. Assert: R has an [[OriginalFlags]] internal slot.
  5. Let src be R.[[OriginalSource]].
  6. Let flags be R.[[OriginalFlags]].
  7. Return EscapeRegExpPattern(src, flags).
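
The following non-normative example shows the special case for %RegExp.prototype% and one observable effect of EscapeRegExpPattern. The exact escaped text is left to the implementation, so the example checks only that the returned source can be used to construct an equivalent pattern.

```javascript
// The prototype has no [[OriginalSource]] slot and reports "(?:)".
const onPrototype = RegExp.prototype.source; // "(?:)"

// An empty pattern is also reported as "(?:)" so that "/" + source + "/"
// does not begin a comment.
const emptyPattern = new RegExp("").source; // "(?:)"

// A "/" in the pattern is escaped as necessary; the source still round-trips.
const withSlash = new RegExp("a/b");
const roundTrips = new RegExp(withSlash.source).test("a/b"); // true
```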

22.2.5.14 RegExp.prototype [ @@split ] ( string, limit )

Note 1

Returns an Array into which substrings of the result of converting string to a String have been stored. The substrings are determined by searching from left to right for matches of the this value regular expression; these occurrences are not part of any String in the returned array, but serve to divide up the String value.

The this value may be an empty regular expression or a regular expression that can match an empty String. In this case, the regular expression does not match the empty substring at the beginning or end of the input String, nor does it match the empty substring at the end of the previous separator match. (For example, if the regular expression matches the empty String, the String is split up into individual code unit elements; the length of the result array equals the length of the String, and each substring contains one code unit.) Only the first match at a given index of the String is considered, even if backtracking could yield a non-empty substring match at that index. (For example, /a*?/[Symbol.split]("ab") evaluates to the array ["a", "b"], while /a*/[Symbol.split]("ab") evaluates to the array ["","b"].)

If string is (or converts to) the empty String, the result depends on whether the regular expression can match the empty String. If it can, the result array contains no elements. Otherwise, the result array contains one element, which is the empty String.

If the regular expression contains capturing parentheses, then each time separator is matched the results (including any undefined results) of the capturing parentheses are spliced into the output array. For example,

/<(\/)?([^<>]+)>/[Symbol.split]("A<B>bold</B>and<CODE>coded</CODE>")

evaluates to the array

["A", undefined, "B", "bold", "/", "B", "and", undefined, "CODE", "coded", "/", "CODE", ""]

If limit is not undefined, then the output array is truncated so that it contains no more than limit elements.

When the @@split method is called, the following steps are taken:

  1. Let rx be the this value.
  2. If Type(rx) is not Object, throw a TypeError exception.
  3. Let S be ? ToString(string).
  4. Let C be ? SpeciesConstructor(rx, %RegExp%).
  5. Let flags be ? ToString(? Get(rx, "flags")).
  6. If flags contains "u", let unicodeMatching be true.
  7. Else, let unicodeMatching be false.
  8. If flags contains "y", let newFlags be flags.
  9. Else, let newFlags be the string-concatenation of flags and "y".
  10. Let splitter be ? Construct(C, « rx, newFlags »).
  11. Let A be ! ArrayCreate(0).
  12. Let lengthA be 0.
  13. If limit is undefined, let lim be 2^32 - 1; else let lim be ℝ(? ToUint32(limit)).
  14. If lim is 0, return A.
  15. Let size be the length of S.
  16. If size is 0, then
    1. Let z be ? RegExpExec(splitter, S).
    2. If z is not null, return A.
    3. Perform ! CreateDataPropertyOrThrow(A, "0", S).
    4. Return A.
  17. Let p be 0.
  18. Let q be p.
  19. Repeat, while q < size,
    1. Perform ? Set(splitter, "lastIndex", 𝔽(q), true).
    2. Let z be ? RegExpExec(splitter, S).
    3. If z is null, set q to AdvanceStringIndex(S, q, unicodeMatching).
    4. Else,
      1. Let e be ℝ(? ToLength(? Get(splitter, "lastIndex"))).
      2. Set e to min(e, size).
      3. If e = p, set q to AdvanceStringIndex(S, q, unicodeMatching).
      4. Else,
        1. Let T be the substring of S from p to q.
        2. Perform ! CreateDataPropertyOrThrow(A, ! ToString(𝔽(lengthA)), T).
        3. Set lengthA to lengthA + 1.
        4. If lengthA = lim, return A.
        5. Set p to e.
        6. Let numberOfCaptures be ? LengthOfArrayLike(z).
        7. Set numberOfCaptures to max(numberOfCaptures - 1, 0).
        8. Let i be 1.
        9. Repeat, while i ≤ numberOfCaptures,
          1. Let nextCapture be ? Get(z, ! ToString(𝔽(i))).
          2. Perform ! CreateDataPropertyOrThrow(A, ! ToString(𝔽(lengthA)), nextCapture).
          3. Set i to i + 1.
          4. Set lengthA to lengthA + 1.
          5. If lengthA = lim, return A.
        10. Set q to p.
  20. Let T be the substring of S from p to size.
  21. Perform ! CreateDataPropertyOrThrow(A, ! ToString(𝔽(lengthA)), T).
  22. Return A.

The value of the "name" property of this function is "[Symbol.split]".

Note 2

The @@split method ignores the value of the "global" and "sticky" properties of this RegExp object.
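
The cases described in Note 1 and Note 2 can be checked directly. The following non-normative example repeats the expressions from Note 1 and adds illustrations of limit, of an empty input String, and of the sticky flag being ignored.

```javascript
// Only the first match at a given index is considered.
const lazy = /a*?/[Symbol.split]("ab"); // ["a", "b"]
const greedy = /a*/[Symbol.split]("ab"); // ["", "b"]

// Captures, including undefined ones, are spliced into the output.
const tags = /<(\/)?([^<>]+)>/[Symbol.split](
  "A<B>bold</B>and<CODE>coded</CODE>"
);

// limit truncates the output array.
const limited = /,/[Symbol.split]("a,b,c", 2); // ["a", "b"]

// Empty input: the result depends on whether the pattern matches "".
const emptyMatching = /x*/[Symbol.split](""); // []
const emptyNonMatching = /x/[Symbol.split](""); // [""]

// The "sticky" property of the original object does not change the result.
const sticky = /,/y[Symbol.split]("a,b"); // ["a", "b"]
```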

22.2.5.15 get RegExp.prototype.sticky

RegExp.prototype.sticky is an accessor property whose set accessor function is undefined. Its get accessor function performs the following steps:

  1. Let R be the this value.
  2. Let cu be the code unit 0x0079 (LATIN SMALL LETTER Y).
  3. Return ? RegExpHasFlag(R, cu).

22.2.5.16 RegExp.prototype.test ( S )

The following steps are taken:

  1. Let R be the this value.
  2. If Type(R) is not Object, throw a TypeError exception.
  3. Let string be ? ToString(S).
  4. Let match be ? RegExpExec(R, string).
  5. If match is not null, return true; else return false.
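
Because test delegates to RegExpExec, it reads and updates "lastIndex" when the global or sticky flag is present. The following non-normative example shows the resulting alternation on repeated calls and the ToString conversion of the argument.

```javascript
const re = /a/g;
const first = re.test("a"); // true; lastIndex is now 1
const second = re.test("a"); // false; lastIndex is reset to 0
const third = re.test("a"); // true again

// The argument is converted with ToString, so null becomes "null".
const coerced = /null/.test(null); // true
```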

22.2.5.17 RegExp.prototype.toString ( )

  1. Let R be the this value.
  2. If Type(R) is not Object, throw a TypeError exception.
  3. Let pattern be ? ToString(? Get(R, "source")).
  4. Let flags be ? ToString(? Get(R, "flags")).
  5. Let result be the string-concatenation of "/", pattern, "/", and flags.
  6. Return result.

Note

The returned String has the form of a RegularExpressionLiteral that evaluates to another RegExp object with the same behaviour as this object.
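
The following non-normative example shows the form of the result and that the method is generic: it reads the "source" and "flags" properties of any object. The plain object receiver is an arbitrary illustration.

```javascript
const literalForm = /a\/b/gi.toString(); // "/a\/b/gi"

// Only the "source" and "flags" properties of the receiver are consulted.
const generic = RegExp.prototype.toString.call({ source: "x", flags: "y" }); // "/x/y"
```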

22.2.5.18 get RegExp.prototype.unicode

RegExp.prototype.unicode is an accessor property whose set accessor function is undefined. Its get accessor function performs the following steps:

  1. Let R be the this value.
  2. Let cu be the code unit 0x0075 (LATIN SMALL LETTER U).
  3. Return ? RegExpHasFlag(R, cu).

22.2.6 Properties of RegExp Instances

RegExp instances are ordinary objects that inherit properties from the RegExp prototype object. RegExp instances have internal slots [[RegExpMatcher]], [[OriginalSource]], and [[OriginalFlags]]. The value of the [[RegExpMatcher]] internal slot is an Abstract Closure representation of the Pattern of the RegExp object.

Note

Prior to ECMAScript 2015, RegExp instances were specified as having the own data properties "source", "global", "ignoreCase", and "multiline". Those properties are now specified as accessor properties of RegExp.prototype.

RegExp instances also have the following property:

22.2.6.1 lastIndex

The value of the "lastIndex" property specifies the String index at which to start the next match. It is coerced to an integral Number when used (see 22.2.5.2.2). This property shall have the attributes { [[Writable]]: true, [[Enumerable]]: false, [[Configurable]]: false }.
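
The attributes and the coercion on use can be observed as follows. This is a non-normative example with an arbitrary pattern and input.

```javascript
const re = /a/g;
const desc = Object.getOwnPropertyDescriptor(re, "lastIndex");
// { value: 0, writable: true, enumerable: false, configurable: false }

// Any value may be stored; it is converted to an integer when a match starts.
re.lastIndex = "1";
const matchIndex = re.exec("aaa").index; // 1
const afterExec = re.lastIndex; // 2
```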

22.2.7 RegExp String Iterator Objects

A RegExp String Iterator is an object that represents a specific iteration over some specific String instance object, matching against some specific RegExp instance object. There is no named constructor for RegExp String Iterator objects. Instead, RegExp String Iterator objects are created by calling certain methods of RegExp instance objects.

22.2.7.1 CreateRegExpStringIterator ( R, S, global, fullUnicode )

The abstract operation CreateRegExpStringIterator takes arguments R (an Object), S (a String), global (a Boolean), and fullUnicode (a Boolean) and returns a Generator. It performs the following steps when called:

  1. Let closure be a new Abstract Closure with no parameters that captures R, S, global, and fullUnicode and performs the following steps when called:
    1. Repeat,
      1. Let match be ? RegExpExec(R, S).
      2. If match is null, return undefined.
      3. If global is false, then
        1. Perform ? GeneratorYield(CreateIterResultObject(match, false)).
        2. Return undefined.
      4. Let matchStr be ? ToString(? Get(match, "0")).
      5. If matchStr is the empty String, then
        1. Let thisIndex be ℝ(? ToLength(? Get(R, "lastIndex"))).
        2. Let nextIndex be AdvanceStringIndex(S, thisIndex, fullUnicode).
        3. Perform ? Set(R, "lastIndex", 𝔽(nextIndex), true).
      6. Perform ? GeneratorYield(CreateIterResultObject(match, false)).
  2. Return CreateIteratorFromClosure(closure, "%RegExpStringIteratorPrototype%", %RegExpStringIteratorPrototype%).
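
The closure above behaves much like a generator function. The following non-normative sketch uses the hypothetical name regExpStringIterator and makes simplifying assumptions: it calls R.exec directly instead of RegExpExec, it assumes "lastIndex" already holds a non-negative integer, and it inlines the effect of AdvanceStringIndex.

```javascript
// Hypothetical generator approximating the closure; not part of any API.
function* regExpStringIterator(R, S, global, fullUnicode) {
  while (true) {
    const match = R.exec(S);
    if (match === null) return;
    if (!global) {
      // A non-global iteration yields a single match and then completes.
      yield match;
      return;
    }
    if (String(match[0]) === "") {
      // An empty match must advance lastIndex, or the loop would not end.
      const i = R.lastIndex;
      const isPair =
        fullUnicode && i + 1 < S.length && S.codePointAt(i) > 0xffff;
      R.lastIndex = i + (isPair ? 2 : 1);
    }
    yield match;
  }
}

const sketch = [...regExpStringIterator(/x*/g, "ab", true, false)].map(
  (m) => m.index
); // [0, 1, 2]
const builtIn = [..."ab".matchAll(/x*/g)].map((m) => m.index); // [0, 1, 2]
```

Unlike @@matchAll, this sketch mutates the RegExp object it is given, because the cloning step belongs to the caller.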

22.2.7.2 The %RegExpStringIteratorPrototype% Object

The %RegExpStringIteratorPrototype% object:

  • has properties that are inherited by all RegExp String Iterator Objects.
  • is an ordinary object.
  • has a [[Prototype]] internal slot whose value is %IteratorPrototype%.
  • has the following properties:

22.2.7.2.1 %RegExpStringIteratorPrototype%.next ( )

  1. Return ? GeneratorResume(this value, empty, "%RegExpStringIteratorPrototype%").

22.2.7.2.2 %RegExpStringIteratorPrototype% [ @@toStringTag ]

The initial value of the @@toStringTag property is the String value "RegExp String Iterator".

This property has the attributes { [[Writable]]: false, [[Enumerable]]: false, [[Configurable]]: true }.
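
The prototype object has no global name, but it can be reached from any RegExp String Iterator. The following non-normative example obtains one through String.prototype.matchAll and inspects the properties described above.

```javascript
const iterator = "a".matchAll(/a/g);
const proto = Object.getPrototypeOf(iterator);

const tag = Object.prototype.toString.call(iterator);
// "[object RegExp String Iterator]"

const tagDesc = Object.getOwnPropertyDescriptor(proto, Symbol.toStringTag);
// { value: "RegExp String Iterator", writable: false,
//   enumerable: false, configurable: true }

// %IteratorPrototype% is reached here through an Array iterator.
const iteratorPrototype = Object.getPrototypeOf(
  Object.getPrototypeOf([][Symbol.iterator]())
);
const inheritsFromIteratorPrototype =
  Object.getPrototypeOf(proto) === iteratorPrototype; // true
```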