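The String constructor may be used as the value of an extends clause of a class definition. Subclass constructors that intend to inherit the specified String behaviour must include a super call to the String constructor to create and initialize the subclass instance with a [[StringData]] internal slot.
This function performs the following steps when called: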
The String
This function may be called with any number of arguments which form the rest parameter codeUnits.
It performs the following steps when called:
The
This function may be called with any number of arguments which form the rest parameter codePoints.
It performs the following steps when called:
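As a minimal sketch of the two rest-parameter functions described above (assumed here to be String.fromCharCode, which takes code units, and String.fromCodePoint, which takes code points; the variable names are illustrative only):

```javascript
// Each argument to fromCharCode is one UTF-16 code unit.
const fromUnits = String.fromCharCode(0x48, 0x69);

// Each argument to fromCodePoint is one code point; a non-BMP code point
// such as U+1F600 is stored as a surrogate pair (two code units).
const fromPoints = String.fromCodePoint(0x48, 0x69, 0x1F600);

// The same non-BMP character built directly from its two surrogates.
const viaSurrogates = String.fromCharCode(0xD83D, 0xDE00);

console.log(fromUnits, fromPoints.length, fromPoints.endsWith(viaSurrogates));
```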
The
The initial value of String.prototype is the String prototype object.
This property has the attributes { [[Writable]]: false, [[Enumerable]]: false, [[Configurable]]: false }.
This function may be called with a variable number of arguments. The first argument is template and the remainder of the arguments form the List substitutions.
It performs the following steps when called:
This function is intended for use as a tag function of a Tagged Template.
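The following sketch assumes the function described here is String.raw. It shows the tag-function use and, for comparison, a direct call with a hand-built template-like object; the variable names are illustrative only.

```javascript
const who = "world";

// As a tag function: escape sequences are kept as written (backslash + n).
const raw = String.raw`Hello\n${who}!`;

// An untagged template literal interprets \n as a line feed.
const cooked = `Hello\n${who}!`;

// Called directly: the first argument needs a "raw" property,
// and the remaining arguments are the substitutions.
const direct = String.raw({ raw: ["a", "b", "c"] }, 1, 2);

console.log(raw.length, cooked.length, direct);
```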
The String prototype object:
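Unless explicitly stated otherwise, the methods of the String prototype object defined below are not generic and the this value passed to them must be either a String value or an object that has a [[StringData]] internal slot that has been initialized to a String value.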
This method returns a single element String containing the code unit at index pos within the String value resulting from converting this object to a String. If there is no element at that index, the result is the empty String. The result is a String value, not a String object.
If pos is an integral Number, then the result of x.charAt(pos) is equivalent to the result of x.substring(pos, pos + 1).
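The equivalence between charAt and substring can be checked with a short sketch. The sample string and positions are arbitrary and include out-of-range values.

```javascript
const x = "abc";

// For each integral pos, compare charAt(pos) with substring(pos, pos + 1).
const pairs = [0, 1, 2, 3, -1].map((pos) => ({
  pos,
  viaCharAt: x.charAt(pos),
  viaSubstring: x.substring(pos, pos + 1),
}));

// Out-of-range positions give the empty String in both cases.
const allEqual = pairs.every((p) => p.viaCharAt === p.viaSubstring);
console.log(allEqual);
```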
This method performs the following steps when called:
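This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.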
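A brief illustration of what "generic" means in practice: the method can be invoked with a non-String this value, which is converted to a String first. The values below are arbitrary examples.

```javascript
// A Number as the this value is converted to "12345" before indexing.
const viaNumber = String.prototype.charAt.call(12345, 2);

// An Array is converted with its own toString, giving "x,y".
const viaArray = String.prototype.charAt.call(["x", "y"], 1);

console.log(viaNumber, viaArray);
```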
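This method returns a Number (a non-negative integral Number less than 2^16) that is the numeric value of the code unit at index pos within the String resulting from converting this object to a String. If there is no element at that index, the result is NaN.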
This method performs the following steps when called:
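This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.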
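This method returns a non-negative integral Number less than or equal to 0x10FFFF that is the numeric value of the UTF-16 encoded code point starting at the string element at index pos within the String resulting from converting this object to a String. If there is no element at that index, the result is undefined. If a valid UTF-16 surrogate pair does not begin at pos, the result is the code unit at pos.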
This method performs the following steps when called:
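This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.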
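When this method is called it returns the String value consisting of the code units of the this value (converted to a String) followed by the code units of each of the arguments converted to a String. The result is a String value, not a String object.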
This method performs the following steps when called:
The
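This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.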
The initial value of String.prototype.constructor is %String%.
This method performs the following steps when called:
This method returns
Throwing an exception if the first argument is a RegExp is specified in order to allow future editions to define extensions that allow such argument values.
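This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.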
This method performs the following steps when called:
If searchString appears as a
Throwing an exception if the first argument is a RegExp is specified in order to allow future editions to define extensions that allow such argument values.
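This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.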
If searchString appears as a
This method performs the following steps when called:
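This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.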
This method performs the following steps when called:
If searchString appears as a
This method performs the following steps when called:
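This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.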
An ECMAScript implementation that includes the ECMA-402 Internationalization API must implement this method as specified in the ECMA-402 specification. If an ECMAScript implementation does not include the ECMA-402 API the following specification of this method is used:
This method returns a Number other than
Before performing the comparisons, this method performs the following steps to prepare the Strings:
The meaning of the optional second and third parameters to this method is defined in the ECMA-402 specification; implementations that do not include ECMA-402 support must not assign any other interpretation to those parameter positions.
The actual return values are
This method itself is not directly suitable as an argument to Array.prototype.sort because the latter requires a function of two arguments.
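One common way around this, sketched below, is to wrap the call in a two-argument comparator. The word list is arbitrary; the second half shows that passing the method itself fails, because the comparator is then called with an undefined this value.

```javascript
const words = ["pear", "apple", "fig"];

// Wrap localeCompare in a function of two arguments.
const sorted = [...words].sort((a, b) => a.localeCompare(b));

// Passing the method directly: sort calls it with this === undefined,
// which cannot be converted to a String.
let directError = null;
try {
  [...words].sort(String.prototype.localeCompare);
} catch (err) {
  directError = err;
}

console.log(sorted, directError instanceof TypeError);
```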
This method may rely on whatever language- and/or locale-sensitive comparison functionality is available to the ECMAScript environment from the
// Å ANGSTROM SIGN vs.
// Å LATIN CAPITAL LETTER A + COMBINING RING ABOVE
"\u212B".localeCompare("A\u030A")
// Ω OHM SIGN vs.
// Ω GREEK CAPITAL LETTER OMEGA
"\u2126".localeCompare("\u03A9")
// ṩ LATIN SMALL LETTER S WITH DOT BELOW AND DOT ABOVE vs.
// ṩ LATIN SMALL LETTER S + COMBINING DOT ABOVE + COMBINING DOT BELOW
"\u1E69".localeCompare("s\u0307\u0323")
// ḍ̇ LATIN SMALL LETTER D WITH DOT ABOVE + COMBINING DOT BELOW vs.
// ḍ̇ LATIN SMALL LETTER D WITH DOT BELOW + COMBINING DOT ABOVE
"\u1E0B\u0323".localeCompare("\u1E0D\u0307")
// 가 HANGUL CHOSEONG KIYEOK + HANGUL JUNGSEONG A vs.
// 가 HANGUL SYLLABLE GA
"\u1100\u1161".localeCompare("\uAC00")
For a definition and discussion of canonical equivalence see the Unicode Standard, chapters 2 and 3, as well as Unicode Standard Annex #15, Unicode Normalization Forms and Unicode Technical Note #5, Canonical Equivalence in Applications. Also see Unicode Technical Standard #10, Unicode Collation Algorithm.
It is recommended that this method should not honour Unicode compatibility equivalents or compatibility decompositions as defined in the Unicode Standard, chapter 3, section 3.7.
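This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.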
This method performs the following steps when called:
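This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.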
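This method performs a regular expression match of the String representing the this value against regexp and returns an iterator. Each iteration result's value is an Array containing the results of the match, or null if the String did not match.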
It performs the following steps when called:
Similarly to String.prototype.split, String.prototype.matchAll is designed to typically act without mutating its inputs.
This method performs the following steps when called:
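This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.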
This method performs the following steps when called:
This method performs the following steps when called:
The abstract operation StringPaddingBuiltinsImpl takes arguments O (an
The abstract operation StringPad takes arguments S (a String), maxLength (a non-negative
The argument maxLength will be clamped such that it can be no smaller than the length of S.
The argument fillString defaults to " " (the String value consisting of the code unit 0x0020 SPACE).
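The clamping and default-fill behaviour can be observed through String.prototype.padStart, which is assumed here to be one of the methods built on this operation. The inputs are arbitrary.

```javascript
// fillString omitted: padding uses a single space (0x0020).
const padded = "abc".padStart(6);

// maxLength smaller than the length of S: the String is returned unchanged.
const unchanged = "abc".padStart(2, "*");

// A multi-unit fillString is repeated and truncated to fit.
const repeated = "7".padStart(5, "ab");

console.log(JSON.stringify(padded), unchanged, repeated);
```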
The abstract operation ToZeroPaddedDecimalString takes arguments n (a non-negative
This method performs the following steps when called:
This method creates the String value consisting of the code units of the this value (converted to String) repeated count times.
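This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.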
This method performs the following steps when called:
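This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.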
The abstract operation GetSubstitution takes arguments matched (a String), str (a String), position (a non-negative
This method performs the following steps when called:
This method performs the following steps when called:
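This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.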
This method returns a
It performs the following steps when called:
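This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.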
This method returns an Array into which substrings of the result of converting this object to a String have been stored. The substrings are determined by searching from left to right for occurrences of separator; these occurrences are not part of any String in the returned array, but serve to divide up the String value. The value of separator may be a String of any length or it may be an object, such as a RegExp, that has a @@split method.
It performs the following steps when called:
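The value of separator may be an empty String. In this case, separator does not match the empty substring at the beginning or end of the input String, nor does it match the empty substring at the end of the previous separator match. If separator is the empty String, the String is split up into individual code unit elements; the length of the result array equals the length of the String, and each substring contains one code unit.
If the this value is (or converts to) the empty String, the result depends on whether separator can match the empty String. If it can, the result array contains no elements. Otherwise, the result array contains one element, which is the empty String.
If separator is undefined, then the result array contains just one String, which is the this value (converted to a String). If limit is not undefined, then the output array is truncated so that it contains no more than limit elements.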
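The edge cases of split described above can be exercised with a few calls. This is a sketch using arbitrary inputs.

```javascript
// Empty separator: one element per code unit.
const chars = "abc".split("");

// Empty input, separator that cannot match the empty String: [""].
const emptyInput = "".split(",");

// Empty input, separator that can match the empty String: [].
const emptyBoth = "".split("");

// Undefined separator: the whole String as a single element.
const noSeparator = "a,b".split(undefined);

// A limit truncates the result array.
const limited = "a,b,c".split(",", 2);

console.log(chars, emptyInput, emptyBoth, noSeparator, limited);
```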
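This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.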
This method performs the following steps when called:
This method returns
Throwing an exception if the first argument is a RegExp is specified in order to allow future editions to define extensions that allow such argument values.
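This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.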
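This method returns a substring of the result of converting this object to a String, starting from index start and running to, but not including, index end of the String (or through the end of the String if end is undefined). The result is a String value, not a String object.
If either argument is NaN or negative, it is replaced with zero; if either argument is strictly greater than the length of the String, it is replaced with the length of the String.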
If start is strictly greater than end, they are swapped.
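A short sketch of the argument handling of substring, using an arbitrary sample string:

```javascript
const s = "abcdef";

// start > end: the arguments are swapped, so this equals s.substring(1, 4).
const swapped = s.substring(4, 1);

// Negative and NaN arguments are treated as 0.
const negativeStart = s.substring(-3, 2);
const nanEnd = s.substring(2, NaN);

// Arguments beyond the length are clamped to the length.
const clamped = s.substring(2, 100);

console.log(swapped, negativeStart, nanEnd, clamped);
```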
It performs the following steps when called:
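This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.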
An ECMAScript implementation that includes the ECMA-402 Internationalization API must implement this method as specified in the ECMA-402 specification. If an ECMAScript implementation does not include the ECMA-402 API the following specification of this method is used:
This method interprets a String value as a sequence of UTF-16 encoded code points.
It works exactly the same as toLowerCase except that it is intended to yield a locale-sensitive result corresponding with conventions of the host environment's current locale.
The meaning of the optional parameters to this method is defined in the ECMA-402 specification; implementations that do not include ECMA-402 support must not use those parameter positions for anything else.
This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
An ECMAScript implementation that includes the ECMA-402 Internationalization API must implement this method as specified in the ECMA-402 specification. If an ECMAScript implementation does not include the ECMA-402 API the following specification of this method is used:
This method interprets a String value as a sequence of UTF-16 encoded code points.
It works exactly the same as toUpperCase except that it is intended to yield a locale-sensitive result corresponding with conventions of the host environment's current locale.
The meaning of the optional parameters to this method is defined in the ECMA-402 specification; implementations that do not include ECMA-402 support must not use those parameter positions for anything else.
This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
This method interprets a String value as a sequence of UTF-16 encoded code points.
It performs the following steps when called:
The result must be derived according to the locale-insensitive case mappings in the Unicode Character Database (this explicitly includes not only the file UnicodeData.txt, but also all locale-insensitive mappings in the file SpecialCasing.txt that accompanies it).
The case mapping of some code points may produce multiple code points. In this case the result String may not be the same length as the source String. Because both toUpperCase and toLowerCase have context-sensitive behaviour, the methods are not symmetrical. In other words, s.toUpperCase().toLowerCase() is not necessarily equal to s.toLowerCase().
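A small sketch of the asymmetry, using U+00DF (ß) as the sample. This particular case comes from a one-to-many mapping rather than from context sensitivity, but it shows both the length change and the inequality.

```javascript
const s = "\u00DF"; // ß LATIN SMALL LETTER SHARP S

// Upper-casing expands one code point into two ("SS").
const upper = s.toUpperCase();

// Round-tripping through upper case does not restore the original.
const roundTrip = upper.toLowerCase();
const lower = s.toLowerCase();

console.log(upper.length, roundTrip, lower, roundTrip === lower);
```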
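This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.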
This method performs the following steps when called:
For a String object, this method happens to return the same thing as the valueOf method.
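For illustration, both methods unwrap a String object to the same primitive String value (the variable names are arbitrary):

```javascript
const boxed = new String("abc");

// Both calls return the primitive value stored in the String object.
const viaToString = boxed.toString();
const viaValueOf = boxed.valueOf();

console.log(typeof boxed, typeof viaToString, viaToString === viaValueOf);
```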
This method interprets a String value as a sequence of UTF-16 encoded code points.
It behaves in exactly the same way as String.prototype.toLowerCase, except that the String is mapped using the toUppercase algorithm of the Unicode Default Case Conversion.
This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
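This method returns a String representation of this object with all leading surrogates and trailing surrogates that are not part of a surrogate pair replaced with U+FFFD (REPLACEMENT CHARACTER).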
It performs the following steps when called:
This method interprets a String value as a sequence of UTF-16 encoded code points.
It performs the following steps when called:
This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
The abstract operation TrimString takes arguments string (an
The definition of white space is the union of WhiteSpace and LineTerminator.
This method interprets a String value as a sequence of UTF-16 encoded code points.
It performs the following steps when called:
This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
This method interprets a String value as a sequence of UTF-16 encoded code points.
It performs the following steps when called:
This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
This method performs the following steps when called:
The abstract operation ThisStringValue takes argument value (an
This method returns an Iterator object that iterates over the code points of a String value, returning each code point as a String value.
It performs the following steps when called:
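A minimal sketch of the difference between iterating a String and indexing it, using U+1D11E (a non-BMP character) as sample data:

```javascript
const text = "a\u{1D11E}b";

// The iterator yields one String per code point, so the surrogate pair
// for U+1D11E stays together as a single element.
const byCodePoint = [...text];

// The length property counts code units, so the same String has length 4.
console.log(byCodePoint.length, text.length, byCodePoint[1] === "\u{1D11E}");
```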
The value of the "name" property of this method is "[Symbol.iterator]".
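String instances are String exotic objects and have the internal methods specified for such objects. String instances inherit properties from the String prototype object. String instances also have a [[StringData]] internal slot.
String instances have a "length" property, and a set of enumerable properties with integer-indexed names.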
The number of elements in the String value represented by this String object.
Once a String object is initialized, this property is unchanging. It has the attributes { [[Writable]]: false, [[Enumerable]]: false, [[Configurable]]: false }.
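A String Iterator is an object that represents a specific iteration over some specific String instance object. There is not a named constructor for String Iterator objects. Instead, String iterator objects are created by calling certain methods of String instance objects.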
The %StringIteratorPrototype% object:
The initial value of the
This property has the attributes { [[Writable]]:
A RegExp object contains a regular expression and the associated flags.
The form and functionality of regular expressions is modelled after the regular expression facility in the Perl 5 programming language.
The RegExp
Each \u
u
u
\u
The first two lines here are equivalent to CharacterClass.
A number of productions in this section are given alternative definitions in section
This section is amended in
PropertyValueAliases.txt
.
PropertyValueAliases.txt
, nor a binary property or binary property alias listed in the “The abstract operation CountLeftCapturingParensWithin takes argument node (a (
pattern character that is matched by the (
terminal of the
This section is amended in
It performs the following steps when called:
The abstract operation CountLeftCapturingParensBefore takes argument node (a
This section is amended in
It performs the following steps when called:
The
This section is amended in
It is defined piecewise over the following productions:
The definitions of “the MV of
The
This section is amended in
It is defined piecewise over the following productions:
The
This section is amended in
It is defined piecewise over the following productions:
ControlEscape | Numeric Value | Code Point | Unicode Name | Symbol |
---|---|---|---|---|
t | 9 | U+0009 | CHARACTER TABULATION | <HT> |
n | 10 | U+000A | LINE FEED (LF) | <LF> |
v | 11 | U+000B | LINE TABULATION | <VT> |
f | 12 | U+000C | FORM FEED (FF) | <FF> |
r | 13 | U+000D | CARRIAGE RETURN (CR) | <CR> |
\0 represents the <NUL> character and cannot be followed by a decimal digit.
The
The abstract operation GroupSpecifiersThatMatch takes argument thisGroupName (a
The
The
The
A regular expression pattern is converted into an
A Pattern is a BMP pattern if its flags contain neither a u nor a v. Otherwise, it is a Unicode pattern. A BMP pattern matches against a String interpreted as consisting of a sequence of 16-bit values that are Unicode code points in the range of the Basic Multilingual Plane. A Unicode pattern matches against a String interpreted as consisting of Unicode code points encoded using UTF-16. In the context of describing the behaviour of a BMP pattern “character” means a single 16-bit Unicode BMP code point. In the context of describing the behaviour of a Unicode pattern “character” means a UTF-16 encoded code point.
The syntax and semantics of
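For example, consider a pattern expressed in source text as the single non-BMP character U+1D11E (MUSICAL SYMBOL G CLEF). Interpreted as a Unicode pattern, it would be a single element (character) List consisting of the single code point U+1D11E. However, interpreted as a BMP pattern, it is first UTF-16 encoded to produce a two element List consisting of the code units 0xD834 and 0xDD1E.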
Patterns are passed to the RegExp
An implementation may not actually perform such translations to or from UTF-16, but the semantics of this specification requires that the result of pattern matching be as if such translations were performed.
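The observable difference between the two kinds of pattern can be sketched with the G clef character from the example above. The patterns are illustrative only.

```javascript
const clef = "\u{1D11E}"; // MUSICAL SYMBOL G CLEF, two UTF-16 code units

// BMP pattern: "." matches one 16-bit code unit, so a single "." cannot
// span the whole String, while two of them can.
const bmpOneChar = /^.$/.test(clef);
const bmpTwoChars = /^..$/.test(clef);

// Unicode pattern: "." matches the whole code point.
const unicodeOneChar = /^.$/u.test(clef);

console.log(bmpOneChar, bmpTwoChars, unicodeOneChar);
```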
The descriptions below use the following internal data structures:
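A RegExp Record is a Record value used to store information about a RegExp that is needed during compilation and possibly during matching.
It has the following fields:
Field Name | Value | Meaning |
---|---|---|
[[IgnoreCase]] | a Boolean | indicates whether "i" appears in the RegExp's flags |
[[Multiline]] | a Boolean | indicates whether "m" appears in the RegExp's flags |
[[DotAll]] | a Boolean | indicates whether "s" appears in the RegExp's flags |
[[Unicode]] | a Boolean | indicates whether "u" appears in the RegExp's flags |
[[UnicodeSets]] | a Boolean | indicates whether "v" appears in the RegExp's flags |
[[CapturingGroupsCount]] | a non-negative integer | the number of left-capturing parentheses in the RegExp's pattern |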
The
A Pattern compiles to an
The
This section is amended in
It is defined piecewise over the following productions:
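The | regular expression operator separates two alternatives. The pattern first tries to match the left Alternative (followed by the sequel of the regular expression); if it fails, it tries to match the right Disjunction (followed by the sequel of the regular expression). If the left Alternative, the right Disjunction, and the sequel all have choice points, all choices in the sequel are tried before moving on to the next choice in the left Alternative. If choices in the left Alternative are exhausted, the right Disjunction is tried instead of the left Alternative. Any capturing parentheses inside a portion of the pattern skipped by | produce undefined values instead of Strings. Thus, for example,
/a|ab/.exec("abc")
returns the result "a" and not "ab". Moreover,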
/((a)|(ab))((c)|(bc))/.exec("abc")
returns the array
["abc", "a", "a", undefined, "bc", undefined, "bc"]
and not
["abc", "ab", undefined, "ab", "c", "c", undefined]
The order in which the two alternatives are tried is independent of the value of direction.
Consecutive
The resulting
The abstract operation RepeatMatcher takes arguments m (a
An
If the
Compare
/a[a-z]{2,4}/.exec("abcdefghi")
which returns "abcde" with
/a[a-z]{2,4}?/.exec("abcdefghi")
which returns "abc".
Consider also
/(aa|aabaac|ba|b|c)*/.exec("aabaac")
which, by the choice point ordering above, returns the array
["aaba", "ba"]
and not any of:
["aabaac", "aabaac"]
["aabaac", "c"]
The above ordering of choice points can be used to write a regular expression that calculates the greatest common divisor of two numbers (represented in unary notation). The following example calculates the gcd of 10 and 15:
"aaaaaaaaaa,aaaaaaaaaaaaaaa".replace(/^(a+)\1*,\1+$/, "$1")
which returns the gcd in unary notation "aaaaa".
Step
/(z)((a+)?(b+)?(c))*/.exec("zaacbbbcac")
which returns the array
["zaacbbbcac", "z", "ac", "a", undefined, "c"]
and not
["zaacbbbcac", "z", "ac", "a", "bbb", "c"]
because each iteration of the outermost * clears all captured Strings contained in the quantified Atom, which in this case includes capture Strings numbered 2, 3, 4, and 5.
Step
/(a*)*/.exec("b")
or the slightly more complicated:
/(a*)b\1+/.exec("baaaac")
which returns the array
["b", ""]
The abstract operation EmptyMatcher takes no arguments and returns a
The abstract operation MatchTwoAlternatives takes arguments m1 (a
The abstract operation MatchSequence takes arguments m1 (a
The
This section is amended in
It is defined piecewise over the following productions:
Even when the y flag is used with a pattern, ^ always matches only at the beginning of Input, or (if rer.[[Multiline]] is true) at the beginning of a line.
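The interaction of ^ with the y flag can be sketched as follows. The patterns and lastIndex values are chosen only for illustration.

```javascript
// Sticky matching starts at lastIndex, but ^ still means "beginning of input".
const sticky = /^b/y;
sticky.lastIndex = 1;
const stickyResult = sticky.test("ab"); // position 1 is not the start of input

// With the m flag, ^ also matches right after a line terminator.
const stickyMultiline = /^b/my;
stickyMultiline.lastIndex = 2;
const multilineResult = stickyMultiline.test("a\nb"); // position 2 starts a line

console.log(stickyResult, multilineResult);
```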
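The form (?= Disjunction ) specifies a zero-width positive lookahead. In order for it to succeed, the pattern inside Disjunction must match at the current position, but the current position is not advanced before matching the sequel. If Disjunction can match at the current position in several ways, only the first one is tried. Unlike other regular expression operators, there is no backtracking into a (?= form (this unusual behaviour is inherited from Perl). This only matters when the Disjunction contains capturing parentheses and the sequel of the pattern contains backreferences to those captures.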
For example,
/(?=(a+))/.exec("baaabac")
matches the empty String immediately after the first b and therefore returns the array:
["", "aaa"]
To illustrate the lack of backtracking into the lookahead, consider:
/(?=(a+))a*b\1/.exec("baaabac")
This expression returns
["aba", "a"]
and not:
["aaaba", "a"]
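The form (?! Disjunction ) specifies a zero-width negative lookahead. In order for it to succeed, the pattern inside Disjunction must fail to match at the current position. The current position is not advanced before matching the sequel. Disjunction can contain capturing parentheses, but backreferences to them only make sense from within Disjunction itself. Backreferences to these capturing parentheses from elsewhere in the pattern always return undefined because the negative lookahead must fail for the pattern to succeed. For example,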
/(.*?)a(?!(a+)b\2c)\2(.*)/.exec("baaabaac")
looks for an a not immediately followed by some positive number n of a's, a b, another n a's (specified by the first \2) and a c. The second \2 is outside the negative lookahead, so it matches against undefined and therefore always succeeds. The whole expression returns the array:
["baaabaac", "ba", undefined, "abaac"]
The abstract operation IsWordChar takes arguments rer (a
The
The
The
This section is amended in
It is defined piecewise over the following productions:
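Parentheses of the form ( Disjunction ) serve both to group the components of the Disjunction pattern together and to save the result of the match. The result can be used either in a backreference (\ followed by a non-zero decimal number), referenced in a replace String, or returned as part of an array from the regular expression matching Abstract Closure. To inhibit the capturing behaviour of parentheses, use the form (?: Disjunction ) instead.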
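An escape sequence of the form \ followed by a non-zero decimal number n matches the result of the nth set of capturing parentheses. It is an error if the regular expression has fewer than n capturing parentheses. If the regular expression has n or more capturing parentheses but the nth one is undefined because it has not captured anything, then the backreference always succeeds.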
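A sketch of backreference behaviour with an arbitrary pattern: a reference to a group that did not participate in the match succeeds, while a reference to a group that does not exist is rejected when the pattern is compiled with the u flag.

```javascript
// Group 1 is skipped because the "b" alternative matches; \1 then refers
// to an undefined capture and succeeds without consuming input.
const result = /(?:(a)|b)\1c/.exec("bc");

// With the u flag, referring to a second group that does not exist
// is reported as a SyntaxError at construction time.
let missingGroupError = null;
try {
  new RegExp("(a)\\2", "u");
} catch (err) {
  missingGroupError = err;
}

console.log(result[0], result[1], missingGroupError instanceof SyntaxError);
```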
The abstract operation CharacterSetMatcher takes arguments rer (a
The abstract operation BackreferenceMatcher takes arguments rer (a
The abstract operation Canonicalize takes arguments rer (a
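If the file CaseFolding.txt of the Unicode Character Database provides a simple or common case folding mapping for ch, return the result of applying that mapping to ch.
In case-insignificant matches when HasEitherUnicodeFlag(rer) is true, all characters are implicitly case-folded using the simple mapping provided by the Unicode Standard immediately before they are compared. The simple mapping always maps to a single code point, so it does not map, for example, ß (U+00DF LATIN SMALL LETTER SHARP S) to ss or SS. It may however map code points outside the Basic Latin block to code points within it; for example, ſ (U+017F LATIN SMALL LETTER LONG S) case-folds to s (U+0073 LATIN SMALL LETTER S) and K (U+212A KELVIN SIGN) case-folds to k (U+006B LATIN SMALL LETTER K). Strings containing those code points are matched by regular expressions such as /[a-z]/ui.
In case-insignificant matches when HasEitherUnicodeFlag(rer) is false, the mapping is based on the Unicode Default Case Conversion algorithm toUppercase rather than toCasefold, which results in some subtle differences. For example, Ω (U+2126 OHM SIGN) is mapped by toUppercase to itself but by toCasefold to ω (U+03C9 GREEK SMALL LETTER OMEGA) along with Ω (U+03A9 GREEK CAPITAL LETTER OMEGA), so "\u2126" is matched by /[ω]/ui and /[\u03A9]/ui but not by /[ω]/i or /[\u03A9]/i. Also, no code point outside the Basic Latin block is mapped to a code point within it, so strings such as "\u017F" and "\u212A" are not matched by /[a-z]/i.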
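The difference between case-insensitive matching with and without the u flag can be checked directly. The sketch below uses the code points named above, written as escapes to avoid any source-encoding ambiguity.

```javascript
const longS = "\u017F";  // LATIN SMALL LETTER LONG S
const kelvin = "\u212A"; // KELVIN SIGN
const ohm = "\u2126";    // OHM SIGN

// With u: simple case folding may map into the Basic Latin block.
const foldedLongS = /[a-z]/ui.test(longS);
const foldedKelvin = /[a-z]/ui.test(kelvin);
const foldedOhm = /[\u03C9]/ui.test(ohm);

// Without u: the toUppercase-based mapping never maps into Basic Latin,
// and OHM SIGN maps to itself.
const legacyLongS = /[a-z]/i.test(longS);
const legacyKelvin = /[a-z]/i.test(kelvin);
const legacyOhm = /[\u03C9]/i.test(ohm);

console.log(foldedLongS, foldedKelvin, foldedOhm, legacyLongS, legacyKelvin, legacyOhm);
```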
The
The
This section is amended in
It is defined piecewise over the following productions:
Even if the pattern ignores case, the case of the two ends of a range is significant in determining which characters belong to the range. Thus, for example, the pattern /[E-F]/i matches only the letters E, F, e, and f, while the pattern /[E-f]/i matches all uppercase and lowercase letters in the Unicode Basic Latin block as well as the symbols [, \, ], ^, _, and `.
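A few spot checks of the two ranges, as a sketch with arbitrarily chosen test characters:

```javascript
// [E-F] with i: only E, F, e, f.
const narrowLower = /[E-F]/i.test("e");
const narrowOther = /[E-F]/i.test("g");

// [E-f] with i: the range U+0045..U+0066 covers all Basic Latin letters
// after case mapping, plus the symbols between "Z" and "a".
const wideLetter = /[E-f]/i.test("z");
const wideCaret = /[E-f]/i.test("^");
const wideUnderscore = /[E-f]/i.test("_");
const wideDigit = /[E-f]/i.test("0");

console.log(narrowLower, narrowOther, wideLetter, wideCaret, wideUnderscore, wideDigit);
```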
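A - character can be treated literally or it can denote a range. It is treated literally if it is the first or last character of ClassContents, the beginning or end limit of a range specification, or immediately follows a range specification.
- U+002D (HYPHEN-MINUS).
A ClassAtom can use any of the escape sequences that are allowed in the rest of the regular expression except for \b, \B, and backreferences. Inside a CharacterClass, \b means the backspace character, while \B and backreferences raise errors. Using a backreference inside a ClassAtom causes an error.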
0, 1, 2, 3, 4, 5, 6, 7, 8, and 9.
General_Category, s) is a Unicode property value or property value alias for the General_Category (gc) property listed in PropertyValueAliases.txt, then
The result will often consist of two or more ranges. When UnicodeSets is
The abstract operation CharacterRange takes arguments A (a
The abstract operation HasEitherUnicodeFlag takes argument rer (a
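The abstract operation WordCharacters takes argument rer (a RegExp Record) and returns a CharSet. It returns a CharSet containing the characters considered “word characters” for the purposes of \b, \B, \w, and \W.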
It performs the following steps when called:
The abstract operation AllCharacters takes argument rer (a
The abstract operation MaybeSimpleCaseFolding takes arguments rer (a CaseFolding.txt
of the Unicode Character Database (each of which maps a single code point to another single code point) to map each
The abstract operation CharacterComplement takes arguments rer (a
The abstract operation UnicodeMatchProperty takes arguments rer (a
Implementations must support the Unicode property names and aliases listed in
For example, Script_Extensions (property name) and scx (property alias) are valid, but script_extensions or Scx aren't.
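The case sensitivity of property names can be probed by attempting to compile patterns. In this sketch, isValidUnicodePattern is a hypothetical helper introduced only for illustration, and Greek is an arbitrary script value; it assumes an engine with Unicode property escape support.

```javascript
// Hypothetical helper: reports whether a pattern compiles with the u flag.
function isValidUnicodePattern(source) {
  try {
    new RegExp(source, "u");
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err; // anything else is unexpected
  }
}

const canonicalName = isValidUnicodePattern("\\p{Script_Extensions=Greek}");
const aliasName = isValidUnicodePattern("\\p{scx=Greek}");
const lowerCasedName = isValidUnicodePattern("\\p{script_extensions=Greek}");
const miscasedAlias = isValidUnicodePattern("\\p{Scx=Greek}");

console.log(canonicalName, aliasName, lowerCasedName, miscasedAlias);
```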
The listed properties form a superset of what UTS18 RL1.2 requires.
The spellings of entries in these tables (including casing) match the spellings used in the file PropertyAliases.txt in the Unicode Character Database. The precise spellings in that file are guaranteed to be stable.
Property name and aliases | Canonical property name |
---|---|
General_Category, gc | General_Category |
Script, sc | Script |
Script_Extensions, scx | Script_Extensions |
Property name and aliases | Canonical property name |
---|---|
ASCII | ASCII |
ASCII_Hex_Digit, AHex | ASCII_Hex_Digit |
Alphabetic, Alpha | Alphabetic |
Any | Any |
Assigned | Assigned |
Bidi_Control, Bidi_C | Bidi_Control |
Bidi_Mirrored, Bidi_M | Bidi_Mirrored |
Case_Ignorable, CI | Case_Ignorable |
Cased | Cased |
Changes_When_Casefolded, CWCF | Changes_When_Casefolded |
Changes_When_Casemapped, CWCM | Changes_When_Casemapped |
Changes_When_Lowercased, CWL | Changes_When_Lowercased |
Changes_When_NFKC_Casefolded, CWKCF | Changes_When_NFKC_Casefolded |
Changes_When_Titlecased, CWT | Changes_When_Titlecased |
Changes_When_Uppercased, CWU | Changes_When_Uppercased |
Dash | Dash |
Default_Ignorable_Code_Point, DI | Default_Ignorable_Code_Point |
Deprecated, Dep | Deprecated |
Diacritic, Dia | Diacritic |
Emoji | Emoji |
Emoji_Component, EComp | Emoji_Component |
Emoji_Modifier, EMod | Emoji_Modifier |
Emoji_Modifier_Base, EBase | Emoji_Modifier_Base |
Emoji_Presentation, EPres | Emoji_Presentation |
Extended_Pictographic, ExtPict | Extended_Pictographic |
Extender, Ext | Extender |
Grapheme_Base, Gr_Base | Grapheme_Base |
Grapheme_Extend, Gr_Ext | Grapheme_Extend |
Hex_Digit, Hex | Hex_Digit |
IDS_Binary_Operator, IDSB | IDS_Binary_Operator |
IDS_Trinary_Operator, IDST | IDS_Trinary_Operator |
ID_Continue, IDC | ID_Continue |
ID_Start, IDS | ID_Start |
Ideographic, Ideo | Ideographic |
Join_Control, Join_C | Join_Control |
Logical_Order_Exception, LOE | Logical_Order_Exception |
Lowercase, Lower | Lowercase |
Math | Math |
Noncharacter_Code_Point, NChar | Noncharacter_Code_Point |
Pattern_Syntax, Pat_Syn | Pattern_Syntax |
Pattern_White_Space, Pat_WS | Pattern_White_Space |
Quotation_Mark, QMark | Quotation_Mark |
Radical | Radical |
Regional_Indicator, RI | Regional_Indicator |
Sentence_Terminal, STerm | Sentence_Terminal |
Soft_Dotted, SD | Soft_Dotted |
Terminal_Punctuation, Term | Terminal_Punctuation |
Unified_Ideograph, UIdeo | Unified_Ideograph |
Uppercase, Upper | Uppercase |
Variation_Selector, VS | Variation_Selector |
White_Space, space | White_Space |
XID_Continue, XIDC | XID_Continue |
XID_Start, XIDS | XID_Start |
Basic_Emoji |
Emoji_Keycap_Sequence |
RGI_Emoji_Modifier_Sequence |
RGI_Emoji_Flag_Sequence |
RGI_Emoji_Tag_Sequence |
RGI_Emoji_ZWJ_Sequence |
RGI_Emoji |
The abstract operation UnicodeMatchPropertyValue takes arguments p (
PropertyValueAliases.txt.
Implementations must support the Unicode property values and property value aliases listed in PropertyValueAliases.txt for the properties listed in
For example, Xpeo and Old_Persian are valid Script_Extensions values, but xpeo and Old Persian aren't.
This algorithm differs from the matching rules for symbolic values listed in UAX44: case, white space, U+002D (-), and U+005F (_) are not ignored, and the Is prefix is not supported.
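The strict matching of property values can be sketched the same way as for property names. Here compilesWithU is a hypothetical helper introduced for illustration, and the example assumes an engine whose Unicode data includes the Old_Persian script.

```javascript
// Hypothetical helper: true if the pattern compiles with the u flag.
function compilesWithU(source) {
  try {
    new RegExp(source, "u");
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

// Exact spellings from PropertyValueAliases.txt are accepted.
const shortValue = compilesWithU("\\p{Script_Extensions=Xpeo}");
const longValue = compilesWithU("\\p{Script_Extensions=Old_Persian}");

// Loose matching (case, spaces, or an Is prefix) is not applied.
const lowerCased = compilesWithU("\\p{Script_Extensions=xpeo}");
const withSpace = compilesWithU("\\p{Script_Extensions=Old Persian}");
const isPrefix = compilesWithU("\\p{IsLu}");
const plainCategory = compilesWithU("\\p{Lu}");

console.log(shortValue, longValue, lowerCased, withSpace, isPrefix, plainCategory);
```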
The
The abstract operation RegExpCreate takes arguments P (an
The abstract operation RegExpAlloc takes argument newTarget (a
The abstract operation RegExpInitialize takes arguments obj (an Object), pattern (an
The abstract operation ParsePattern takes arguments patternText (a sequence of Unicode code points), u (a Boolean), and v (a Boolean) and returns a
This section is amended in
It performs the following steps when called:
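The RegExp constructor may be used as the value of an extends clause of a class definition. Subclass constructors that intend to inherit the specified RegExp behaviour must include a super call to the RegExp constructor to create and initialize subclass instances with the necessary internal slots.
This function performs the following steps when called: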
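If pattern is supplied using a StringLiteral, the usual escape sequence substitutions are performed before the String is processed by RegExp. If pattern must contain an escape sequence to be recognized by RegExp, any U+005C (REVERSE SOLIDUS) code points must be escaped within the StringLiteral to prevent them being removed when the contents of the StringLiteral are formed.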
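A brief sketch of why backslashes need doubling when the pattern comes from a string literal (the patterns are arbitrary examples):

```javascript
// "\d" in a string literal is just "d": the backslash is consumed
// before RegExp ever sees the pattern.
const unescaped = new RegExp("\d+");

// "\\d" keeps one backslash in the String, so RegExp sees \d+.
const escaped = new RegExp("\\d+");

console.log(unescaped.source, escaped.source);
console.log(unescaped.test("123"), escaped.test("123"));
```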
The RegExp
The initial value of RegExp.prototype is the RegExp prototype object.
This property has the attributes { [[Writable]]: false, [[Enumerable]]: false, [[Configurable]]: false }.
RegExp[@@species] is an accessor property whose set accessor function is undefined.
The value of the
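RegExp prototype methods normally use their this value's constructor to create a derived object. However, a subclass constructor may over-ride that default behaviour by redefining its @@species property.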
The RegExp prototype object:
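The RegExp prototype object does not have a "valueOf" property of its own; however, it inherits the "valueOf" property from the Object prototype object.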
The initial value of RegExp.prototype.constructor is %RegExp%.
This method searches string for an occurrence of the regular expression pattern and returns an Array containing the results of the match, or null if string failed to match.
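The two possible outcomes of exec can be seen in a minimal sketch (pattern and input are arbitrary):

```javascript
// A successful match returns an Array with the matched text, the captures,
// and extra properties such as index.
const hit = /b(c)/.exec("abcd");

// A failed match returns null rather than an empty Array.
const miss = /x/.exec("abcd");

console.log(hit[0], hit[1], hit.index, miss);
```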
It performs the following steps when called:
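RegExp.prototype.dotAll is an accessor property whose set accessor function is undefined.
RegExp.prototype.flags is an accessor property whose set accessor function is undefined.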
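As a sketch of how these accessors behave from user code, the example below reads dotAll and flags from an arbitrary RegExp and inspects the flags property descriptor on the prototype.

```javascript
const re = /a/ysg;

// dotAll reflects the presence of the s flag.
const dotAll = re.dotAll;

// flags returns the flag characters in a fixed order,
// regardless of the order in which they were written.
const flags = re.flags;

// Both are accessor properties on the prototype, with no setter.
const descriptor = Object.getOwnPropertyDescriptor(RegExp.prototype, "flags");

console.log(dotAll, flags, typeof descriptor.get, descriptor.set);
```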