#407 — String.prototype.localeCompare spec inconsistent about canonical equivalence

From Markus Scherer's comments on internationalization support in ECMAScript edition 4 (applicable to all editions from ES3 to ES6 draft 8):

Edition 3 section String.prototype.localeCompare() has a very abbreviated explanation of canonical equivalence, and one part requires canonical equivalence while another only recommends it.

Proposal: To update the text for this function by largely referring to relevant sections of the Unicode Standard, without expanding the required semantics of the function. Specifically (changes with <del>deletion</del> and <ins>insertion</ins>):

In the second NOTE make the following changes: "This function is intended to rely on whatever language-sensitive comparison functionality is available to the ECMAScript environment from the host environment, and to compare according to the rules of the host environment's current locale. It is <del>strongly recommended</del> <ins>required</ins> that this function treat strings that are canonically equivalent according to the Unicode standard as identical <del>(in other words, compare the strings as if they had both been converted to Normalised Form C or D first)</del>. It is also recommended that this function not honour Unicode compatibility equivalences or decompositions. <ins>For a definition and discussion of canonical equivalence see the Unicode Standard, chapters 2 and 3, as well as <a href="">UAX #15</a> Unicode Normalization Forms and <a href="">UTN #5</a> Canonical Equivalence in Applications. See also <a href="">UTS #10</a> Unicode Collation Algorithm."</ins>

Change the following paragraph: "If no language-sensitive comparison at all is available from the host environment, this function may perform a <del>bitwise</del> <ins>canonical equivalence</ins> comparison."

fixed in rev20 editor's draft

fixed in rev20 draft, Oct. 28, 2013

Checked in rev 26 draft:

In the old version, the normative text required that implementations detect canonical equivalence, but the non-normative text only recommended that.

In the new version, the normative text added a loophole for implementations that don't have a locale-sensitive implementation available, in which case they don't have to detect canonical equivalence either, but now the non-normative text requires detection of canonical equivalence.

I believe the right solution is to revert the normative text to requiring detection of canonical equivalence. Note that canonical equivalence has nothing to do with language sensitivity - it's a core feature of the Unicode Standard that makes up for the fact that compatibility with legacy character encodings required Unicode to occasionally provide two different encodings for what humans would consider the same character.
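To make the distinction concrete, here is a small sketch in JavaScript. It assumes an engine that provides String.prototype.normalize (ES6) with full Unicode data; the variable names are illustrative only and do not come from the spec.

```javascript
// Two encodings of "é": precomposed U+00E9, and "e" followed by
// U+0301 COMBINING ACUTE ACCENT. The two are canonically equivalent.
const precomposed = "\u00E9";
const decomposed = "e\u0301";

// A code-unit (bitwise) comparison treats them as different strings.
const sameCodeUnits = precomposed === decomposed; // false

// Once both are normalized to the same form, they are identical,
// and no locale information is involved in getting there.
const sameAfterNFC =
  precomposed.normalize("NFC") === decomposed.normalize("NFC"); // true
const sameAfterNFD =
  precomposed.normalize("NFD") === decomposed.normalize("NFD"); // true

// If canonical equivalence is required, localeCompare should report
// these two strings as equal (a result of 0) in every locale.
const compared = precomposed.localeCompare(decomposed);
```

Nothing in the normalization step depends on the current locale, which is why a missing locale-sensitive comparison does not by itself justify skipping the equivalence check.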

fixed again in rev27 editor's draft

made canonically equivalent comparison mandatory and eliminated all normative mentions of host environment capabilities.

Rationale: since we require an implementation to provide String.prototype.normalize, it isn't really an extra burden to require this function to do canonically equivalent comparison.
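As a rough sketch of that rationale, a comparison that honours canonical equivalence without any locale data could be layered on String.prototype.normalize. The helper name canonicalCompare is hypothetical and not part of any draft. It orders strings by code units after normalization, so it is not a language-sensitive collation, only an illustration of the minimum behaviour.

```javascript
// Hypothetical helper (not from the spec): compares two strings by
// code units after normalizing both to NFC, so that canonically
// equivalent strings compare as equal. It is not locale-aware.
function canonicalCompare(a, b) {
  const x = String(a).normalize("NFC");
  const y = String(b).normalize("NFC");
  if (x === y) return 0;
  return x < y ? -1 : 1;
}

// Precomposed "é" versus "e" + combining acute accent: equal.
canonicalCompare("\u00E9", "e\u0301"); // 0

// NFC does not fold compatibility equivalences, so the "fi" ligature
// U+FB01 and the two-letter sequence "fi" remain distinct.
canonicalCompare("\uFB01", "fi"); // non-zero
```

Using NFC rather than NFKC matches the recommendation quoted earlier in this issue that the function not honour compatibility equivalences.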

fixed in rev27 draft

Very nice - thank you!

Verified in rev 28 draft.