#407 — String.prototype.localeCompare spec inconsistent about canonical equivalence

bug_id: 407
creation_ts: 2012-06-22 15:20:00 -0700
short_desc: String.prototype.localeCompare spec inconsistent about canonical equivalence
delta_ts: 2014-12-01 20:22:54 -0800
product: Draft for 6th Edition
component: technical issue
version: Rev 8: June 15, 2012 Draft
rep_platform: All
op_sys: All
bug_status: VERIFIED
resolution: FIXED
priority: Normal
bug_severity: normal
everconfirmed: true
reporter: Norbert
assigned_to: Allen Wirfs-Brock

commentid: 1045
comment_count: 0
who: Norbert
bug_when: 2012-06-22 15:20:12 -0700

From Markus Scherer's comments on internationalization support in ECMAScript edition 4 (applicable to all editions from ES3 to ES6 draft 8):

Edition 3 section 15.5.4.9 String.prototype.localeCompare() has a very abbreviated explanation of canonical equivalence, and one part requires canonical equivalence while another only recommends it.

Proposal: To update the text for this function by largely referring to relevant sections of the Unicode Standard, without expanding the required semantics of the function. Specifically (changes with <del>deletion</del> and <ins>insertion</ins>):

In the second NOTE make the following changes: "This function is intended to rely on whatever language-sensitive comparison functionality is available to the ECMAScript environment from the host environment, and to compare according to the ruls of the host environment's current locale. It is <del>strongly recommended</del> <ins>required</ins> that this function treat strings that are canonically equivalent according to the Unicode standard as identical <del>(in other words, compare the strings as if they had both been converted to Normalised Form C or D first)</del>. It is also recommended that this function not honour Unicode compatibility equivalences or decompositions. <ins>For a definition and discussion of canonical equivalence see the Unicode Standard, chapters 2 and 3, as well as <a href="http://www.unicode.org/reports/tr15/">UAX #15</a> Unicode Normalization Forms and <a href="http://www.unicode.org/notes/tn5/">UTN #5</a> Canonical Equivalence in Applications. See also <a href="http://www.unicode.org/reports/tr10/">UTS #10</a> Unicode Collation Algorithm."</ins>

Change the following paragraph: "If no language-sensitive comparison at all is available from the host environment, this function may perform a <del>bitwise</del> <ins>canonical equivalence</ins> comparison.

https://sites.google.com/site/markusicu/unicode/es/i18n-2003#Comparisons

commentid: 5910
comment_count: 1
who: Allen Wirfs-Brock
bug_when: 2013-10-18 15:49:53 -0700

fixed in rev20 editor's draft

commentid: 6109
comment_count: 2
who: Allen Wirfs-Brock
bug_when: 2013-10-29 09:45:52 -0700

fixed in rev20 draft, Oct. 28, 2013

commentid: 9433
comment_count: 3
who: Norbert
bug_when: 2014-07-20 22:32:47 -0700

Checked in rev 26 draft:

In the old version, the normative text required that implementations detect canonical equivalence, but the non-normative text only recommended that.

In the new version, the normative text added a loophole for implementations that don't have a locale-sensitive implementation available, in which case they don't have to detect canonical equivalence either, but now the non-normative text requires detection of canonical equivalence.

I believe the right solution is to revert the normative text to requiring detection of canonical equivalence. Note that canonical equivalence has nothing to do with language sensitivity - it's a core feature of the Unicode Standard that makes up for the fact that compatibility with legacy character encodings required Unicode to occasionally provide two different encodings for what humans would consider the same character.

commentid: 9723
comment_count: 4
who: Allen Wirfs-Brock
bug_when: 2014-08-07 16:27:02 -0700

fixed again in rev27 editor's draft

made canonically equivalent comparison mandatory and eliminated all normative mentions of host environment capabilities.

Rationale: since we require an implementation to provide String.prototype.normalize is isn't really an extra burden to require this function to do canonically equivalent comparison.

commentid: 9920
comment_count: 5
who: Allen Wirfs-Brock
bug_when: 2014-08-25 08:29:25 -0700

fixed in rev27 draft

commentid: 10695
comment_count: 6
who: Norbert
bug_when: 2014-12-01 20:22:54 -0800

Very nice - thank you!

Verified in rev 28 draft.

archives

#407 — String.prototype.localeCompare spec inconsistent about canonical equivalence