The spec says "... must not contain digits as specified by the Unicode Standard" in a couple of places, but it's not clear what that means: Unicode defines several number-like properties, and the spec doesn't say which one is intended. Examples of possible meanings:
Numeric_Type = Decimal (the tightest definition; contains only plain decimal digits)
Numeric_Type = Digit | Decimal (includes dingbat, superscript, circled, parenthesized, etc. digits, together with digits from historical numbering systems that lack a zero)
Numeric_Type = Digit | Decimal | Numeric (includes anything with a numeric value, from vulgar fractions to non-decimal systems to the CJK ideograph for one thousand)
For examples of characters, see:
We need to make clear which of these we mean.
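A quick way to see how the three candidate meanings differ is Python's `unicodedata` module, whose `decimal()`, `digit()`, and `numeric()` functions correspond roughly to the three Numeric_Type tiers (the specific characters below are just illustrative picks, not from the spec):

```python
import unicodedata

# Numeric_Type = Decimal: plain digits used in decimal positional notation.
# unicodedata.decimal() succeeds only for these.
print(unicodedata.decimal('5'))    # ASCII DIGIT FIVE -> 5
print(unicodedata.decimal('٥'))    # ARABIC-INDIC DIGIT FIVE -> 5

# Numeric_Type = Digit: digit-shaped characters such as superscripts;
# unicodedata.digit() accepts them, but unicodedata.decimal() does not.
print(unicodedata.digit('²'))      # SUPERSCRIPT TWO -> 2
try:
    unicodedata.decimal('²')
except ValueError:
    print('SUPERSCRIPT TWO has no decimal value')

# Numeric_Type = Numeric: anything with a numeric value at all.
print(unicodedata.numeric('½'))    # VULGAR FRACTION ONE HALF -> 0.5
print(unicodedata.numeric('千'))   # CJK ideograph for one thousand
```

Each successive tier strictly contains the previous one, which is why the choice matters for how broad the "must not contain digits" restriction is.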
The intent here is to provide a testable guarantee about the digits occurring in formatted numeric strings. Such a guarantee can only be provided for numbering systems that strictly follow the algorithms in the spec, not for those that are implementation-dependent. Most of the numbering systems that follow the spec use digits in the General Category "Number, decimal digit" (Nd), so I'm changing the spec to disallow those digits in pattern strings. The "hanidec" numbering system uses characters in the General Category "Letter, other" (Lo) as digits, so it won't be covered by the guarantee. Also, a number of currency names and codes contain digits, so currency formats can't be covered by the guarantee either.
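The check implied by the new wording can be sketched as follows; `has_nd_digit` and the sample pattern string are hypothetical, shown only to illustrate why the guarantee covers Nd digits but not hanidec:

```python
import unicodedata

def has_nd_digit(s: str) -> bool:
    """True if s contains a character with General Category Nd
    ("Number, decimal digit")."""
    return any(unicodedata.category(ch) == 'Nd' for ch in s)

# A pattern free of Nd characters satisfies the guarantee
# ('{number} %' is an illustrative placeholder pattern, not from the spec):
print(has_nd_digit('{number} %'))   # False

# Latin and Arabic-Indic digits are Nd, so they would be caught:
print(has_nd_digit('abc٣'))         # True

# hanidec digits are "Letter, other" (Lo), so this check cannot see them,
# which is why hanidec falls outside the guarantee:
print(unicodedata.category('三'))   # 'Lo'
print(has_nd_digit('三'))           # False
```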