#206 — Locale sensitivity in toLowerCase/toUpperCase


The specification of String.prototype.toLowerCase in ES 5.1 (which is also referenced in String.prototype.toUpperCase) refers to the Unicode character database for case mappings, explicitly including "not only the UnicodeData.txt file, but also the SpecialCasings.txt file that accompanies it in Unicode 2.1.8 and later".

The SpecialCasing.txt file (called "SpecialCasings.txt" in the specification text) includes not only a large number of locale-insensitive mappings, but also a few locale-sensitive mappings. In particular, the Turkish mappings for the Latin letters "I" and "i" (which lowercase to "ı" (U+0131) and uppercase to "İ" (U+0130), respectively, in Turkish) have been in the file since Unicode 2.1.8, while additional locale-sensitive mappings, such as those for Lithuanian and Azeri, were added later.
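As a rough illustration of the locale-insensitive side of that file, the sketch below shows a few unconditional SpecialCasing.txt mappings in which one character maps to several. It assumes an engine that implements the full Unicode case mappings; the helper name codePoints is made up for this example and is not part of any specification.

```javascript
// Hypothetical helper: list the code points of a string as "U+XXXX".
function codePoints(s) {
  return Array.from(s, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0"));
}

// Unconditional, locale-insensitive mappings from SpecialCasing.txt.
// Each maps a single character to more than one character, which
// UnicodeData.txt alone cannot express.
const sharpS = "\u00DF".toUpperCase();      // LATIN SMALL LETTER SHARP S
const fiLigature = "\uFB01".toUpperCase();  // LATIN SMALL LIGATURE FI
const dottedI = "\u0130".toLowerCase();     // LATIN CAPITAL LETTER I WITH DOT ABOVE

console.log(sharpS, codePoints(sharpS));         // SS [ 'U+0053', 'U+0053' ]
console.log(fiLigature, codePoints(fiLigature)); // FI [ 'U+0046', 'U+0049' ]
console.log(codePoints(dottedI));                // [ 'U+0069', 'U+0307' ]
```

None of these results depend on the host locale, which is why the issue treats them separately from the Turkish entries.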

The specification of String.prototype.toLocaleLowerCase in ES 5.1, however, seems to imply that String.prototype.toLowerCase should not use the locale-sensitive mappings: "This function works exactly the same as toLowerCase except that its result is intended to yield the correct result for the host environment's current locale, rather than a locale-independent result. There will only be a difference in the few cases (such as Turkish) where the rules for that language conflict with the regular Unicode case mappings."
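The intended split between the two functions can be sketched as follows. This is only an illustration under assumptions: ES 5.1 itself uses the host environment's current locale, and passing "tr" explicitly relies on the locale argument defined by ECMA-402, which in turn needs an engine that ships Turkish locale data.

```javascript
// Locale-independent mappings: the same result regardless of the host locale.
const lowerI = "I".toLowerCase(); // "i" (U+0069)
const upperI = "i".toUpperCase(); // "I" (U+0049)

// Locale-sensitive mappings. The explicit "tr" argument is an assumption
// (ECMA-402); ES 5.1 only speaks of the host environment's current locale.
// With Turkish locale data available these are expected to be
// "ı" (U+0131) and "İ" (U+0130); without it, engines may fall back
// to the locale-independent results.
const trLowerI = "I".toLocaleLowerCase("tr");
const trUpperI = "i".toLocaleUpperCase("tr");

console.log(lowerI, upperI);
console.log(trLowerI, trUpperI);
```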

Shouldn't the specification for String.prototype.toLowerCase explicitly exclude the locale-sensitive mappings in SpecialCasing.txt?

SpecialCasing.txt in Unicode 2.1.8:
http://www.unicode.org/Public/2.1-Update3/SpecialCasing-1.txt

SpecialCasing.txt in Unicode 2.1.9, which corrected the Turkish mapping for "I":
http://www.unicode.org/Public/2.1-Update4/SpecialCasing-2.txt

SpecialCasing.txt in Unicode 6.0:
http://www.unicode.org/Public/6.0.0/ucd/SpecialCasing.txt


As of the July 2012 ES6 draft, the algorithm specifies use of the "language insensitive lower case equivalent", but the paragraph that follows the algorithm still has no such restriction.


Fixed in the rev 20 editor's draft.

Paragraph now reads:

The result must be derived according to the locale-insensitive case mappings in the Unicode Character Database (this explicitly includes not only the UnicodeData.txt file, but also all locale-insensitive mappings in the SpecialCasings.txt file that accompanies it).


Fixed in the rev 20 draft, Oct. 28, 2013.


Verified in rev 26 draft.