
#1456 — BestFitMatcher+ResolveLocale may produce invalid language tags


Let's suppose a BestFitMatcher algorithm returns the locale "zh-Hant-TW" for the input "und-TW-u-kn-true". Per the current specification, this will give the following result:

> js> new Intl.Collator("und-TW-u-kn-true",{localeMatcher: "best fit"}).resolvedOptions().locale
> "zh-Hant--u-kn-trueTW"


This happens because BestFitMatcher is specified to store the "index of the first Unicode locale extension sequence within the request locale language tag". For "und-TW-u-kn-true", the index is 8 (or 6?). Later, in ResolveLocale, this index is used to splice the supportedExtension variable into the found locale. But since the found locale and the originally requested locale can differ in structure and length, the splice may produce invalid language tags.
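To make the failure concrete, here is a minimal sketch of the splice as described above. The helper name `spliceExtension` is hypothetical and not spec text; it only assumes that ResolveLocale cuts the found locale at the stored index and puts supportedExtension in between.

```javascript
// Hypothetical model of the splice described in this report (not spec text).
// extensionIndex was computed against the *requested* locale, but it is
// applied here to the *found* locale, which is where things go wrong.
function spliceExtension(foundLocale, supportedExtension, extensionIndex) {
  const preExtension = foundLocale.substring(0, extensionIndex);
  const postExtension = foundLocale.substring(extensionIndex);
  return preExtension + supportedExtension + postExtension;
}

// Index 8 reproduces the tag quoted in the report.
const spliced = spliceExtension("zh-Hant-TW", "-u-kn-true", 8);
console.log(spliced); // "zh-Hant--u-kn-trueTW"

// Index 6 (the 0-based offset of "-u" in "und-TW-u-kn-true") is no better.
const splicedAtSix = spliceExtension("zh-Hant-TW", "-u-kn-true", 6);
console.log(splicedAtSix); // "zh-Han-u-kn-truet-TW"
```

Either index cuts "zh-Hant-TW" in the middle of its own subtags, so the outcome is invalid regardless of which reading of the index is correct.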


André

Do you have a recommendation for fixing this?


To be honest I don't think that step (step 16 in rev10) makes any sense (*).

I'd just replace it with:
---
16. If the number of code unit elements in supportedExtension is greater than 2, then
a. Let foundLocale be the String value produced by concatenating foundLocale and supportedExtension.
---


(*) Maybe the preExtension and postExtension variables were intended to preserve other extension or private use subtags in `foundLocale` ?
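A rough sketch of the replacement step proposed above, for illustration only. The function name `appendExtension` is made up; the length check mirrors "greater than 2", on the assumption that a bare "-u" means no keyword was supported.

```javascript
// Hypothetical sketch of the proposed step 16: append instead of splice.
function appendExtension(foundLocale, supportedExtension) {
  // "-u" alone is 2 code units, so anything longer carries at least one keyword.
  if (supportedExtension.length > 2) {
    return foundLocale + supportedExtension;
  }
  return foundLocale;
}

const appended = appendExtension("zh-Hant-TW", "-u-kn-true");
console.log(appended); // "zh-Hant-TW-u-kn-true"

const unchanged = appendExtension("zh-Hant-TW", "-u");
console.log(unchanged); // "zh-Hant-TW"
```

This only yields a well-formed tag if `foundLocale` carries no private use subtags, which is the open question in the footnote.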


(In reply to André Bargull from comment #2)


@Norbert

Can you provide any additional insight?


The point of step 12 in the 1.0 spec was to insert the supported part of the Unicode extension into a language tag that might have had other extensions or private use parts after the Unicode extension, e.g. if the original tag was zh-TW-u-kn-true-x-special and the implementation happened to support zh-u-kn-true-x-special.

You're right that the extensionIndex returned by BestFitMatcher must indicate the position where the Unicode locale extension would go in the locale returned, which may have a different structure than the successful requested locale.
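One possible way to satisfy both requirements, sketched under assumptions: compute the insertion point from the found locale itself, placing the Unicode extension before any private use sequence and appending otherwise. The helper name `insertUnicodeExtension` is hypothetical, and this is not claimed to be the wording of the eventual fix.

```javascript
// Hypothetical sketch: derive the insertion point from foundLocale rather
// than from the requested locale. Assumes foundLocale is well-formed and
// that "-x-" only ever starts its private use sequence.
function insertUnicodeExtension(foundLocale, supportedExtension) {
  if (supportedExtension.length <= 2) {
    return foundLocale; // nothing supported beyond the bare "-u"
  }
  const privateIndex = foundLocale.indexOf("-x-");
  if (privateIndex === -1) {
    return foundLocale + supportedExtension;
  }
  const preExtension = foundLocale.substring(0, privateIndex);
  const postExtension = foundLocale.substring(privateIndex);
  return preExtension + supportedExtension + postExtension;
}

const withPrivateUse = insertUnicodeExtension("zh-x-special", "-u-kn-true");
console.log(withPrivateUse); // "zh-u-kn-true-x-special"

const withoutPrivateUse = insertUnicodeExtension("zh-Hant-TW", "-u-kn-true");
console.log(withoutPrivateUse); // "zh-Hant-TW-u-kn-true"
```

The sketch does not try to order the Unicode extension relative to other singleton extensions in the found locale; it only keeps private use subtags last.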


https://github.com/tc39/ecma402/pull/74

Thanks, André, for the fix.