#206 — Locale sensitivity in toLowerCase/toUpperCase

bug_id: 206
creation_ts: 2011-09-20 11:54:00 -0700
short_desc: Locale sensitivity in toLowerCase/toUpperCase
delta_ts: 2014-07-20 17:23:30 -0700
product: Draft for 6th Edition
component: technical issue
version: Rev 18: September 5, 2013 Draft
rep_platform: All
op_sys: All
bug_status: VERIFIED
resolution: FIXED
priority: Normal
bug_severity: normal
blocked: 226
everconfirmed: true
reporter: Norbert
assigned_to: Allen Wirfs-Brock
cc: mathias

commentid: 453
comment_count: 0
who: Norbert
bug_when: 2011-09-20 11:54:10 -0700

The specification of String.prototype.toLowerCase in ES 5.1 (which is also referenced in String.prototype.toUpperCase) refers to the Unicode character database for case mappings, explicitly including "not only the UnicodeData.txt file, but also the SpecialCasings.txt file that accompanies it in Unicode 2.1.8 and later".

The SpecialCasing.txt file (called "SpecialCasings.txt" in the specification) includes not only a large number of locale-insensitive mappings, but also a few locale-sensitive mappings. In particular, the Turkish mappings for the Latin letters "I" and "i" (which map to "ı" (U+0131) and "İ" (U+0130) in Turkish) have been in the file since Unicode 2.1.8, while additional ones were added later.

The specification of String.prototype.toLocaleLowerCase in ES 5.1, however, seems to imply that String.prototype.toLowerCase should not use the locale-sensitive mappings: "This function works exactly the same as toLowerCase except that its result is intended to yield the correct result for the host environment's current locale, rather than a locale-independent result. There will only be a difference in the few cases (such as Turkish) where the rules for that language conflict with the regular Unicode case mappings."

Shouldn't the specification for String.prototype.toLowerCase explicitly exclude the locale-sensitive mappings in SpecialCasing.txt?
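
To make the distinction concrete, here is a minimal sketch of the behavior the two descriptions seem to intend. It assumes an engine that accepts a locale argument on toLocaleLowerCase/toLocaleUpperCase (an ECMA-402 addition that is not part of ES 5.1) and that supports the "tr" locale; the helper name turkishCaseDemo is hypothetical and used only for illustration.

```javascript
// Hypothetical helper: contrasts the locale-independent mappings with the
// Turkish ones. The "tr" argument is an assumption (ECMA-402, not ES 5.1).
function turkishCaseDemo() {
  return {
    // Locale-independent: the regular Unicode mappings for "I" and "i".
    lower: "I".toLowerCase(),
    upper: "i".toUpperCase(),
    // Locale-sensitive: under Turkish rules these are intended to yield
    // U+0131 (dotless i) and U+0130 (capital I with dot above).
    lowerTr: "I".toLocaleLowerCase("tr"),
    upperTr: "i".toLocaleUpperCase("tr"),
  };
}

const turkishResult = turkishCaseDemo();
console.log(turkishResult);
```

If toLowerCase applied the Turkish entries, the first two results would depend on the host locale, which is the ambiguity raised here.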

SpecialCasing.txt in Unicode 2.1.8:
http://www.unicode.org/Public/2.1-Update3/SpecialCasing-1.txt

SpecialCasing.txt in Unicode 2.1.9, which corrected the Turkish mapping for "I":
http://www.unicode.org/Public/2.1-Update4/SpecialCasing-2.txt

SpecialCasing.txt in Unicode 6.0:
http://www.unicode.org/Public/6.0.0/ucd/SpecialCasing.txt

commentid: 1298
comment_count: 1
who: Norbert
bug_when: 2012-07-12 12:06:16 -0700

As of ES6 draft July 2012, the algorithm now specifies use of the "language insensitive lower case equivalent", but the following paragraph still has no such restriction.

commentid: 5912
comment_count: 2
who: Allen Wirfs-Brock
bug_when: 2013-10-20 17:58:12 -0700

fixed in rev 20 editor's draft.

Paragraph now reads:

The result must be derived according to the locale-insensitive case mappings in the Unicode Character Database (this explicitly includes not only the UnicodeData.txt file, but also all locale-insensitive mappings in the SpecialCasings.txt file that accompanies it).
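
As an illustration of what "locale-insensitive mappings in SpecialCasing.txt" covers, the following sketch shows two unconditional entries from that file. It assumes an engine that implements the wording above; the variable names are hypothetical.

```javascript
// Two unconditional (locale-insensitive) SpecialCasing.txt entries.
// Assumes an engine that follows the revised paragraph.

// U+00DF LATIN SMALL LETTER SHARP S uppercases to two characters, "SS".
const sharpSUpper = "\u00DF".toUpperCase();

// U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE lowercases to
// U+0069 followed by U+0307 COMBINING DOT ABOVE.
const dottedILower = "\u0130".toLowerCase();

// Plain "I" still maps to "i" regardless of the host locale.
const plainILower = "I".toLowerCase();

console.log(sharpSUpper, dottedILower.length, plainILower);
```

Both mappings change the string length, which is why UnicodeData.txt alone (one-to-one mappings) is not sufficient.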

commentid: 6092
comment_count: 3
who: Allen Wirfs-Brock
bug_when: 2013-10-29 09:45:41 -0700

fixed in rev20 draft, Oct. 28, 2013

commentid: 9412
comment_count: 4
who: Norbert
bug_when: 2014-07-20 17:23:30 -0700

Verified in rev 26 draft.
