#524 — 6: Should use Unicode code point rather than Unicode character


Section 6, Source Text, and the rest of the specification should use the term "Unicode code point" rather than "Unicode character" or "Unicode scalar value" when referring to the general content of source text or text interpreted from UTF-16 code unit sequences.

As for "character", people have different ideas about what the term means, and redefining it, as ES5 did, would just add to the confusion.

"Unicode character" is not defined in the Unicode standard, as far as I can tell, but seems to be used in the sense of "code point assigned to abstract character" or possibly "designated code point". Under either definition, the term would exclude code points reserved for future assignment, for example characters added in Unicode 6.1 when an implementation is based on Unicode 5.1. Such a restriction would be a constant source of interoperability problems.

"Unicode scalar value" is defined in the Unicode standard as "Any Unicode code point except high-surrogate and low-surrogate code points." We cannot exclude surrogate code points from source code, as this would break compatibility with existing code.
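The difference between a code point and a scalar value can be sketched as follows. This is only an illustration, assuming an ES2015 or later engine (which postdates this report). The helper names `isSurrogate` and `isScalarValue` are hypothetical and not taken from any specification, and the unpaired surrogate is produced with an escape sequence rather than written directly in source text.

```javascript
// Hypothetical helpers for illustration; not defined by any specification.

// Surrogate code points occupy the range U+D800..U+DFFF.
function isSurrogate(codePoint) {
  return codePoint >= 0xD800 && codePoint <= 0xDFFF;
}

// A Unicode scalar value is any code point (U+0000..U+10FFFF)
// except the surrogate code points.
function isScalarValue(codePoint) {
  return Number.isInteger(codePoint) &&
    codePoint >= 0 && codePoint <= 0x10FFFF &&
    !isSurrogate(codePoint);
}

// A string holding an unpaired high surrogate. Text like this is a valid
// sequence of code points, but not a valid sequence of scalar values.
const lone = "\uD83D";
const loneCodePoint = lone.codePointAt(0);

console.log(isScalarValue(loneCodePoint)); // the unpaired surrogate is rejected
console.log(isScalarValue(0x1F600));       // a supplementary code point is accepted
```

If the specification were phrased in terms of scalar values, text such as `lone` above would fall outside its vocabulary, which is the compatibility concern raised here.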

"Unicode code point" and "UTF-16 code unit" are the terms we have to use most of the time.
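To make the two terms concrete, here is a minimal sketch, again assuming an ES2015 or later engine. The variable names are illustrative only. It counts the same text once in UTF-16 code units and once in code points.

```javascript
// "a" followed by U+1F600, a code point outside the Basic Multilingual Plane.
const text = "a\u{1F600}";

// String length and charCodeAt operate on UTF-16 code units.
const codeUnitCount = text.length;
const codeUnits = [];
for (let i = 0; i < text.length; i++) {
  codeUnits.push(text.charCodeAt(i));
}

// String iteration (used here via Array.from) walks code points,
// combining a surrogate pair into one supplementary code point.
const codePoints = Array.from(text, (ch) => ch.codePointAt(0));

console.log(codeUnitCount);     // number of UTF-16 code units
console.log(codePoints.length); // number of code points
```

The supplementary code point is stored as the surrogate pair 0xD83D 0xDE00, so the code unit count exceeds the code point count by one.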

The term "Unicode character" can be used when only assigned characters are meant, e.g., when referring to individual characters such as "comma" or "reverse solidus", or to the characters that can be used in identifiers.


fixed in rev20 editor's draft


fixed in rev20 draft, Oct. 28, 2013


Verified in rev 26 draft.