The spec says "... must not contain digits as specified by the Unicode Standard" in a couple of places, but it's not clear what that means: Unicode defines several number-like properties, and the spec doesn't say which one is intended. Examples of possible meanings:
Numeric_Type = Decimal (the tightest definition; contains only plain decimal digits)
Numeric_Type = Digit | Decimal (includes dingbat, superscript, circled, parenthesized, etc. digits, together with digits from historical numbering systems that lack a zero)
Numeric_Type = Digit | Decimal | Numeric (includes anything with a numeric value, from vulgar fractions to non-decimal systems to the CJK ideograph for one thousand)
For examples of characters, see:
We need to make clear which of these we mean.
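A quick way to see how the three candidate meanings differ is Python's `unicodedata` module, whose `decimal()`, `digit()`, and `numeric()` functions correspond roughly to the three Numeric_Type tiers (the specific characters below are just illustrative picks, not from the spec):

```python
import unicodedata

# Numeric_Type = Decimal: plain digits used in decimal positional notation.
# unicodedata.decimal() succeeds only for these.
print(unicodedata.decimal('5'))    # ASCII DIGIT FIVE -> 5
print(unicodedata.decimal('٥'))    # ARABIC-INDIC DIGIT FIVE -> 5

# Numeric_Type = Digit: digit-shaped characters such as superscripts;
# unicodedata.digit() accepts them, but unicodedata.decimal() does not.
print(unicodedata.digit('²'))      # SUPERSCRIPT TWO -> 2
try:
    unicodedata.decimal('²')
except ValueError:
    print('SUPERSCRIPT TWO has no decimal value')

# Numeric_Type = Numeric: anything with a numeric value at all.
print(unicodedata.numeric('½'))    # VULGAR FRACTION ONE HALF -> 0.5
print(unicodedata.numeric('千'))   # CJK ideograph for one thousand
```

Each successive tier strictly contains the previous one, which is why the choice matters for how broad the "must not contain digits" restriction is.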
The intent here is to provide a testable guarantee about the digits occurring in formatted numeric strings. Such a guarantee can only be provided for numbering systems that strictly follow the algorithms in the spec, not for those that are implementation-dependent. Most of the numbering systems that follow the spec use digits in the General Category "Number, decimal digit" (Nd), so I'm changing the spec to disallow those digits in pattern strings. The "hanidec" numbering system uses characters in the General Category "Letter, other" (Lo) as digits, so it won't be covered by the guarantee. Also, a number of currency names and codes contain digits, so currency formats can't be covered by the guarantee either.
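The check implied by the new wording can be sketched as follows; `has_nd_digit` and the sample pattern string are hypothetical, shown only to illustrate why the guarantee covers Nd digits but not hanidec:

```python
import unicodedata

def has_nd_digit(s: str) -> bool:
    """True if s contains a character with General Category Nd
    ("Number, decimal digit")."""
    return any(unicodedata.category(ch) == 'Nd' for ch in s)

# A pattern free of Nd characters satisfies the guarantee
# ('{number} %' is an illustrative placeholder pattern, not from the spec):
print(has_nd_digit('{number} %'))   # False

# Latin and Arabic-Indic digits are Nd, so they would be caught:
print(has_nd_digit('abc٣'))         # True

# hanidec digits are "Letter, other" (Lo), so this check cannot see them,
# which is why hanidec falls outside the guarantee:
print(unicodedata.category('三'))   # 'Lo'
print(has_nd_digit('三'))           # False
```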