ECMAScript code is expressed using Unicode. ECMAScript source text is a sequence of code points. All Unicode code point values from U+0000 to U+10FFFF, including surrogate code points, may occur in source text where permitted by the ECMAScript grammars. The actual encodings used to store and interchange ECMAScript source text are not relevant to this specification. Regardless of the external source text encoding, a conforming ECMAScript implementation processes the source text as if it were an equivalent sequence of SourceCharacter values, each SourceCharacter being a Unicode code point.
The components of a combining character sequence are treated as individual Unicode code points even though a user might think of the whole sequence as a single character.
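The following is a minimal sketch of what treating a combining character sequence as individual code points means in practice. It illustrates the point with String values rather than raw source text, and the variable names are hypothetical.

```javascript
// "é" written as a base letter followed by a combining mark:
// U+0065 LATIN SMALL LETTER E + U+0301 COMBINING ACUTE ACCENT.
const decomposed = "e\u0301";
// The same user-perceived character as the single code point U+00E9.
const precomposed = "\u00E9";

// Spreading a string iterates it by code point.
const decomposedCodePoints = [...decomposed].map((ch) => ch.codePointAt(0));
const precomposedCodePoints = [...precomposed].map((ch) => ch.codePointAt(0));

console.log(decomposedCodePoints.length);  // 2
console.log(precomposedCodePoints.length); // 1
console.log(decomposed === precomposed);   // false
console.log(decomposed.normalize("NFC") === precomposed); // true
```

The two strings look identical when rendered but are different code point sequences; they only compare equal after explicit normalization.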
In string literals, regular expression literals, template literals and identifiers, any Unicode code point may also be expressed using Unicode escape sequences that explicitly express a code point's numeric value. Within a comment, such an escape sequence is effectively ignored as part of the comment.
ECMAScript differs from the Java programming language in the behaviour of Unicode escape sequences. In a Java program, if the Unicode escape sequence \u000A, for example, occurs within a single-line comment, it is interpreted as a line terminator (Unicode code point U+000A is LINE FEED (LF)) and therefore the next code point is not part of the comment. Similarly, if the Unicode escape sequence \u000A occurs within a string literal in a Java program, it is likewise interpreted as a line terminator, which is not allowed within a string literal; one must write \n instead of \u000A to cause a LINE FEED (LF) to be part of the String value of a string literal. In an ECMAScript program, a Unicode escape sequence occurring within a comment is never interpreted and therefore cannot contribute to termination of the comment. Similarly, a Unicode escape sequence occurring within a string literal in an ECMAScript program always contributes to the literal and is never interpreted as a line terminator or as a code point that might terminate the string literal.
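The ECMAScript behaviour described above can be observed directly. The sketch below uses hypothetical variable names and shows an escape inside a comment, inside a string literal, and inside an identifier.

```javascript
// An escape inside a comment is not interpreted: \u000A does not end this comment.
const afterComment = "still reached";

// Inside a string literal, the escape contributes U+000A to the String value
// and does not terminate the literal.
const viaUnicodeEscape = "a\u000Ab";
const viaSingleEscape = "a\nb";

console.log(viaUnicodeEscape === viaSingleEscape); // true
console.log(viaUnicodeEscape.length);              // 3
console.log(viaUnicodeEscape.codePointAt(1));      // 10

// In an identifier, an escape names the same code point as the character itself.
const \u0061bc = 1;
console.log(abc); // 1
```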
The abstract operation UTF16EncodeCodePoint takes argument cp (a Unicode code point). It performs the following steps when called:
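The steps themselves are not reproduced in this text. As a minimal sketch, assuming the standard UTF-16 encoding rule (one code unit for code points up to U+FFFF, a lead and trail surrogate otherwise), the operation could be modelled as follows. The function name and the thrown RangeError are assumptions of this sketch; an abstract operation would assert its precondition rather than throw.

```javascript
// Sketch of UTF16EncodeCodePoint: returns a String of one or two code units.
function utf16EncodeCodePoint(cp) {
  // Assumption: reject anything that is not a Unicode code point.
  if (!Number.isInteger(cp) || cp < 0 || cp > 0x10ffff) {
    throw new RangeError("cp must be an integer in the range 0 to 0x10FFFF");
  }
  // Code points up to U+FFFF fit in a single code unit.
  if (cp <= 0xffff) {
    return String.fromCharCode(cp);
  }
  // Otherwise split the 20-bit offset into a lead and a trail surrogate.
  const offset = cp - 0x10000;
  const cu1 = Math.floor(offset / 0x400) + 0xd800;
  const cu2 = (offset % 0x400) + 0xdc00;
  return String.fromCharCode(cu1, cu2);
}

console.log(utf16EncodeCodePoint(0x41) === "A");               // true
console.log(utf16EncodeCodePoint(0x1f600) === "\uD83D\uDE00"); // true
```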
The abstract operation CodePointsToString takes argument text (a sequence of Unicode code points). It converts text into a String value, as described in
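A rough sketch of this conversion, assuming the input is given as an array of integer code points, is shown below. The function name is hypothetical, and the built-in String.fromCodePoint stands in for the per-code-point UTF-16 encoding step.

```javascript
// Sketch of CodePointsToString: concatenate the UTF-16 encoding of each code point.
function codePointsToString(text) {
  let result = "";
  for (const cp of text) {
    // String.fromCodePoint yields one code unit for cp <= 0xFFFF, two otherwise.
    result += String.fromCodePoint(cp);
  }
  return result;
}

console.log(codePointsToString([0x68, 0x69, 0x1f600]) === "hi\uD83D\uDE00"); // true
console.log(codePointsToString([0x1f600]).length);                           // 2
```

A single code point above U+FFFF produces a String of length 2, because String length counts code units rather than code points.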
The abstract operation UTF16SurrogatePairToCodePoint takes arguments lead (a code unit) and trail (a code unit). Two code units that form a UTF-16 surrogate pair are converted to a code point.
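As a sketch under the standard UTF-16 decoding rule, the conversion of a surrogate pair can be written as the inverse of the encoding arithmetic. The function name and the range checks that throw are assumptions of this example, not part of the operation's definition.

```javascript
// Sketch of UTF16SurrogatePairToCodePoint: combine a lead and trail surrogate.
function utf16SurrogatePairToCodePoint(lead, trail) {
  // Assumption: validate the inputs instead of asserting them.
  if (lead < 0xd800 || lead > 0xdbff) {
    throw new RangeError("lead must be a leading surrogate (0xD800 to 0xDBFF)");
  }
  if (trail < 0xdc00 || trail > 0xdfff) {
    throw new RangeError("trail must be a trailing surrogate (0xDC00 to 0xDFFF)");
  }
  // Each surrogate carries 10 bits; the result is offset by 0x10000.
  return (lead - 0xd800) * 0x400 + (trail - 0xdc00) + 0x10000;
}

console.log(utf16SurrogatePairToCodePoint(0xd83d, 0xde00).toString(16)); // "1f600"
```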
The abstract operation CodePointAt takes arguments string (a String) and position (a non-negative integer).
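One way to model reading a single code point at a given code unit index is sketched below. The function name, the shape of the returned object (codePoint, codeUnitCount, isUnpairedSurrogate) and the thrown RangeError are choices made for this sketch and should be read as assumptions.

```javascript
// Sketch of CodePointAt: read one code point starting at code unit index `position`.
function codePointAt(string, position) {
  const size = string.length;
  if (!Number.isInteger(position) || position < 0 || position >= size) {
    throw new RangeError("position must be a valid code unit index");
  }
  const isLead = (cu) => cu >= 0xd800 && cu <= 0xdbff;
  const isTrail = (cu) => cu >= 0xdc00 && cu <= 0xdfff;

  const first = string.charCodeAt(position);
  // Ordinary code unit: it is the code point.
  if (!isLead(first) && !isTrail(first)) {
    return { codePoint: first, codeUnitCount: 1, isUnpairedSurrogate: false };
  }
  // A trailing surrogate on its own, or a lead at the very end, is unpaired.
  if (isTrail(first) || position + 1 === size) {
    return { codePoint: first, codeUnitCount: 1, isUnpairedSurrogate: true };
  }
  const second = string.charCodeAt(position + 1);
  if (!isTrail(second)) {
    return { codePoint: first, codeUnitCount: 1, isUnpairedSurrogate: true };
  }
  // A well-formed pair decodes to a supplementary code point.
  const cp = (first - 0xd800) * 0x400 + (second - 0xdc00) + 0x10000;
  return { codePoint: cp, codeUnitCount: 2, isUnpairedSurrogate: false };
}

console.log(codePointAt("a\uD83D\uDE00", 1).codePoint.toString(16)); // "1f600"
console.log(codePointAt("a\uD83D\uDE00", 2).isUnpairedSurrogate);    // true
```

Starting in the middle of a pair, as in the second call, yields the trailing surrogate reported as unpaired.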
The abstract operation StringToCodePoints takes argument string (a String). It returns the sequence of Unicode code points that results from interpreting string as UTF-16 encoded Unicode text as described in
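The interpretation of a String as UTF-16 text can be sketched as a loop over code unit positions. The function name is hypothetical, and the built-in String.prototype.codePointAt is used as a stand-in for reading one code point; it returns the surrogate value itself when a surrogate is unpaired.

```javascript
// Sketch of StringToCodePoints: decode a String into an array of code points.
function stringToCodePoints(string) {
  const codePoints = [];
  let position = 0;
  while (position < string.length) {
    const cp = string.codePointAt(position);
    codePoints.push(cp);
    // A supplementary code point consumed two code units, anything else one.
    position += cp > 0xffff ? 2 : 1;
  }
  return codePoints;
}

console.log(stringToCodePoints("a\uD83D\uDE00")); // [ 97, 128512 ]
console.log(stringToCodePoints("\uD83Dx"));       // [ 55357, 120 ]
```

Unpaired surrogates are kept as individual surrogate code points rather than being replaced or dropped.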
The abstract operation ParseText takes arguments sourceText (a sequence of Unicode code points) and goalSymbol (a nonterminal in one of the ECMAScript grammars). It performs the following steps when called:
Consider a text that has an
See also clause
There are four types of ECMAScript code:
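Eval code is the source text supplied to the built-in eval function. More precisely, if the parameter to the built-in eval function is a String, it is treated as an ECMAScript Script. The eval code for a particular invocation of eval is the global code portion of that Script.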
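A short illustration of the String condition on the argument to eval follows; the variable names are hypothetical. Only a String primitive is treated as source text, and any other value is returned unchanged.

```javascript
// A String argument is parsed and evaluated as source text.
const fromString = eval("var total = 1 + 2; total * 2");
console.log(fromString); // 6

// A non-String argument is not treated as code; it is returned as is.
const notAString = eval(42);
console.log(notAString); // 42

// A String object is not a String primitive, so it is also returned unchanged.
const boxed = new String("1 + 2");
console.log(eval(boxed) === boxed); // true
```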
Function code is source text that is parsed to supply the value of the [[ECMAScriptCode]] and [[FormalParameters]] internal slots (see
In addition, if the source text referred to above is parsed as:
then the source text matching the
Function code is generally provided as the bodies of Function Definitions (
A Directive Prologue is the longest sequence of
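A Use Strict Directive is an ExpressionStatement in a Directive Prologue whose StringLiteral is either of the exact code point sequences "use strict" or 'use strict'. A Use Strict Directive may not contain an EscapeSequence or LineContinuation.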
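The effect of a Use Strict Directive, and of near misses that do not qualify, can be sketched with functions built from source text. The Function constructor is used here because the functions it creates are strict only if their own body begins with the directive, independent of the surrounding code; the variable names are hypothetical. The check relies on the this value being undefined in a strict function called without a receiver.

```javascript
// Directive is the first statement of the body: the function code is strict.
const withDirective = new Function('"use strict"; return this === undefined;');

// The string appears after another statement, so it is outside the
// Directive Prologue and is only an expression statement.
const lateDirective = new Function('var x = 1; "use strict"; return this === undefined;');

// The literal contains an escape sequence, so it is not the exact
// code point sequence required of a Use Strict Directive.
const escapedDirective = new Function('"use\\u0020strict"; return this === undefined;');

console.log(withDirective());    // true
console.log(lateDirective());    // false
console.log(escapedDirective()); // false
```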
An ECMAScript syntactic unit may be processed using either unrestricted or strict mode syntax and semantics (
ECMAScript code that is not strict mode code is called non-strict code.
An ECMAScript implementation may support the evaluation of function