#967 — 15.5.4.25 codePointAt usability issue


The current definition of codePointAt gives these results:
out-of-bounds -> undefined
normal BMP char -> the codepoint
lead surrogate of a good pair -> the codepoint
trail surrogate of a good pair -> codeunit in [0xDC00:0xDFFF] !!ambiguous
bad trail surrogate -> codeunit in [0xDC00:0xDFFF]
bad lead surrogate -> codeunit in [0xD800:0xDBFF]
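
For illustration, the cases listed above can be reproduced with a few short strings. This is only a sketch: the sample strings and the `results` name are assumptions chosen for the example, using U+1F600, which UTF-16 encodes as the pair D83D DE00.

```javascript
// "a" followed by U+1F600, stored as the surrogate pair D83D DE00.
const good = "a\uD83D\uDE00";

const results = {
  outOfBounds: good.codePointAt(5),      // undefined
  bmp: good.codePointAt(0),              // 0x61
  leadOfPair: good.codePointAt(1),       // 0x1F600, the whole code point
  trailOfPair: good.codePointAt(2),      // 0xDE00, same as a lone trail
  loneTrail: "\uDE00x".codePointAt(0),   // 0xDE00
  loneLead: "\uD83Dx".codePointAt(0),    // 0xD83D
};
```

Note that `trailOfPair` and `loneTrail` are indistinguishable from the return value alone, which is the ambiguity flagged in the list.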

Note that a well-paired trail surrogate still produces a value even though the previous code unit has already "subsumed" it. So a caller that indexes down a string one position at a time has to discard the value returned for the well-paired trail surrogate.
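
As a rough sketch of what that means for a caller today, a loop over a string has to step past the trail surrogate itself whenever codePointAt has combined a pair. The function name `codePointsOf` is hypothetical and exists only for this example.

```javascript
// Collect the code points of a string, skipping the trail surrogate
// whenever codePointAt has already combined a well-formed pair.
function codePointsOf(str) {
  const out = [];
  for (let i = 0; i < str.length; ) {
    const cp = str.codePointAt(i);
    out.push(cp);
    // A value above 0xFFFF means two code units were consumed.
    i += cp > 0xFFFF ? 2 : 1;
  }
  return out;
}
```

Unpaired surrogates still come through as plain code unit values here, so this loop does not tell the caller whether the string was well formed.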

UTF-16 experts can write code to check these possibilities, but for general usability let's have:
undefined for the trail surrogate of a good pair, and
NaN for a bad surrogate.

Then codePointAt would do the work for the casual user and experts can probe the string with charCodeAt (or codeUnitAt if it exists) if they really want to know the situation of bad surrogates.

========================
If codePointAt is left unchanged, users will have to write messy code patterns like this:

// if the indexed position is part of a well-formed surrogate pair
// then result is either the entire code-point (for lead surrogates)
// or undefined (for trail surrogates)
// result is NaN for bad surrogates
// (result is always undefined for out-of-bounds position)

// str and pos are assumed to be defined by the caller
var cp = str.codePointAt(pos);
if (0xDC00 <= cp && cp <= 0xDFFF) {
    var cu = str.charCodeAt(pos - 1);
    if (0xD800 <= cu && cu <= 0xDBFF) {
        cp = undefined; // trail surrogate of good pair
    }
}
if (0xD800 <= cp && cp <= 0xDFFF) {
    cp = NaN; // bad surrogate
}
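
For anyone who wants to try the proposed results without a spec change, the pattern above could be wrapped in a helper along these lines. This is a sketch only: `proposedCodePointAt` is a made-up name, not an existing or proposed String method.

```javascript
// Hypothetical helper that mimics the results proposed in this issue.
function proposedCodePointAt(str, pos) {
  const cp = str.codePointAt(pos);
  if (0xDC00 <= cp && cp <= 0xDFFF) {
    // charCodeAt(-1) yields NaN, so pos === 0 falls through safely.
    const cu = str.charCodeAt(pos - 1);
    if (0xD800 <= cu && cu <= 0xDBFF) {
      return undefined; // trail surrogate of a good pair
    }
  }
  if (0xD800 <= cp && cp <= 0xDFFF) {
    return NaN; // bad (unpaired) surrogate
  }
  return cp; // BMP char, full code point, or undefined when out of bounds
}

const sample = "a\uD83D\uDE00";
const lead = proposedCodePointAt(sample, 1);        // 0x1F600
const trail = proposedCodePointAt(sample, 2);       // undefined
const lone = proposedCodePointAt("\uD83Dx", 0);     // NaN
```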


See discussion at
https://mail.mozilla.org/pipermail/es-discuss/2012-November/thread.html#26340


It's time to put ES6 to bed. Norbert made a good response to this proposal and nobody has further championed these changes within TC39, so at this point in time it doesn't look like we are going to make further ES6 changes in this area.

Proposals are being made for post-ES6 features (see https://github.com/tc39/ecma262), so you may want to consider re-proposing some of the additional String functions.