
#1244 — 6.2.1: Definition of Unicode Locale Extension Sequences incorrect


Section 6.2.1 defines the term “Unicode locale extension sequence” as “any substring of a language tag that starts with a separator "-" and the singleton "u" and includes the maximum sequence of following non-singleton subtags and their preceding "-" separators.”

This definition doesn't agree with RFC 5646, which makes any subtag sequence starting with the subtag "x" a private use subtag sequence, in which the singleton "u" has no predefined meaning.

The incorrect definition leads to at least one inconsistency within ECMA-402: The BestAvailableLocale operation expects a “structurally valid and canonicalized BCP 47 language tag” in the locale argument; however, the LookupMatcher operation removes all Unicode locale extension sequences from the tags it passes to BestAvailableLocale. If the determination of Unicode locale extension sequences doesn't take private use sequences into consideration, then the tag "x-u-foo" gets reduced to "x", which is no longer well-formed.
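The problem can be reproduced with a minimal sketch of the definition as currently worded. The function name `naiveRemoveUnicodeExtensions` and the regular expression are assumptions made for illustration, not text from ECMA-402, and the input is assumed to be a structurally valid tag.

```typescript
// Hypothetical helper: strips every "-u-..." run of non-singleton subtags,
// following the Section 6.2.1 wording literally, with no regard for
// private use subtag sequences.
function naiveRemoveUnicodeExtensions(tag: string): string {
  // "-u" followed by one or more subtags of 2 to 8 alphanumeric characters.
  return tag.replace(/-u(?:-[0-9a-z]{2,8})+/gi, "");
}

// An ordinary Unicode locale extension is removed as intended.
const ordinary = naiveRemoveUnicodeExtensions("de-DE-u-co-phonebk");

// A private use tag loses everything after the "x" singleton.
const privateUse = naiveRemoveUnicodeExtensions("x-u-foo");
```

Under these assumptions `ordinary` is "de-DE", while `privateUse` is the bare "x" described above.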

The definition should be changed to “any substring of a language tag that is not part of a private use subtag sequence, starts with a separator "-" and the singleton "u", and includes the maximum sequence of following non-singleton subtags and their preceding "-" separators.”
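A minimal sketch of how the proposed wording might be applied: the private use subtag sequence is split off first, and only the remainder is searched for "-u-" sequences. The name `removeUnicodeExtensions` is hypothetical, and the sketch assumes a structurally valid tag, where the first "x" singleton always starts the private use sequence.

```typescript
// Hypothetical helper implementing the proposed definition: "-u-..." runs
// inside a private use subtag sequence are left untouched.
function removeUnicodeExtensions(tag: string): string {
  const lower = tag.toLowerCase();

  // Locate the start of the private use sequence, if any. A tag may consist
  // entirely of private use subtags ("x-...") or end with them ("...-x-...").
  let privateStart = tag.length;
  if (lower.startsWith("x-")) {
    privateStart = 0;
  } else {
    const index = lower.indexOf("-x-");
    if (index !== -1) {
      privateStart = index;
    }
  }

  const head = tag.slice(0, privateStart); // everything before private use
  const tail = tag.slice(privateStart);    // private use sequence, kept as is

  // Strip Unicode locale extension sequences from the non-private part only.
  return head.replace(/-u(?:-[0-9a-z]{2,8})+/gi, "") + tail;
}
```

With this sketch "x-u-foo" is returned unchanged, "de-DE-u-co-phonebk" becomes "de-DE", and "en-u-ca-gregory-x-u-foo" becomes "en-x-u-foo".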


Fixed in rev7


Verified in rev 10 draft.