#1456 — BestFitMatcher+ResolveLocale may produce invalid language tags

bug_id: 1456
creation_ts: 2013-04-29 06:08:00 -0700
short_desc: BestFitMatcher+ResolveLocale may produce invalid language tags
delta_ts: 2016-02-29 09:11:37 -0800
product: Internationalization - ECMA-402
component: Specification
version: Edition 3.0 Drafts
rep_platform: All
op_sys: All
bug_status: RESOLVED
resolution: FIXED
priority: Normal
bug_severity: normal
everconfirmed: true
reporter: André Bargull
assigned_to: Rick Waldron
cc: ["caridy", "edf", "waldron.rick"]

commentid: 3686
comment_count: 0
who: André Bargull
bug_when: 2013-04-29 06:08:12 -0700

Let's suppose a BestFitMatcher algorithm returns the locale "zh-Hant-TW" for the input "und-TW-u-kn-true". Per the current specification, this will give the following result:

> js> new Intl.Collator("und-TW-u-kn-true",{localeMatcher: "best fit"}).resolvedOptions().locale
> "zh-Hant--u-kn-trueTW"

This is due to the fact that BestFitMatcher is specified to store the "index of the first Unicode locale extension sequence within the request locale language tag". For "und-TW-u-kn-true", the index is 8 (or 6?). Later in ResolveLocale, this index position is used to splice the supportedExtension variable into the found locale. But since the found locale and the originally requested locale can have a different structure/length, this may produce invalid language tags.

commentid: 12921
comment_count: 1
who: Rick Waldron
bug_when: 2015-02-18 12:28:14 -0800

André

Do you have a recommendation for fixing this?

commentid: 12948
comment_count: 2
who: André Bargull
bug_when: 2015-02-18 16:16:51 -0800

To be honest I don't think that step (step 16 in rev10) makes any sense (*).

I'd just replace it with:
---
16. If the number of code unit elements in supportedExtension is greater than 2, then
a. Let foundLocale be the String value produced by concatenating foundLocale and supportedExtension.
---

(*) Maybe the preExtension and postExtension variables were intended to preserve other extension or private use subtags in `foundLocale` ?

commentid: 13186
comment_count: 3
who: Rick Waldron
bug_when: 2015-02-20 07:02:50 -0800

(In reply to André Bargull from comment #2)
> To be honest I don't think that step (step 16 in rev10) makes any sense (*).
>
> I'd just replace it with:
> ---
> 16. If the number of code unit elements in supportedExtension is greater
> than 2, then
> a. Let foundLocale be the String value produced by concatenating
> foundLocale and supportedExtension.
> ---
>
>
> (*) Maybe the preExtension and postExtension variables were intended to
> preserve other extension or private use subtags in `foundLocale` ?

@Norbert

Can you provide any additional insight?

commentid: 13340
comment_count: 4
who: Norbert
bug_when: 2015-02-24 20:02:36 -0800

The point of step 12 in the 1.0 spec was to insert the supported part of the Unicode extension into a language tag that might have had other extensions or private use parts after the Unicode extension, e.g. if the original tag was zh-TW-u-kn-true-x-special and the implementation happened to support zh-u-kn-true-x-special.

You're right that the extensionIndex returned by BestFitMatcher must indicate the position where the Unicode locale extension would go in the locale returned, which may have a different structure than the successful requested locale.

commentid: 14932
comment_count: 5
who: Caridy Patiño
bug_when: 2016-02-29 09:11:37 -0800

https://github.com/tc39/ecma402/pull/74

thanks André for the fix.

archives

#1456 — BestFitMatcher+ResolveLocale may produce invalid language tags