#764 — Character properties

bug_id: 764
creation_ts: 2012-10-09 14:08:00 -0700
short_desc: Character properties
delta_ts: 2013-08-23 04:40:35 -0700
product: Internationalization - ECMA-402
component: Specification
version: Edition 2.0 proposals
rep_platform: All
op_sys: All
bug_status: CONFIRMED
priority: High
bug_severity: enhancement
everconfirmed: true
reporter: Nebojša Ćirić
assigned_to: Norbert
cc: ["mathias", "tomerm"]

commentid: 1899
comment_count: 0
who: Nebojša Ćirić
bug_when: 2012-10-09 14:08:28 -0700

Provide a way to:

1. Get the Unicode character properties for a given set of characters
2. Get the characters that have a specified property (e.g. script)
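
As an illustration only, the two operations above can be sketched with Python's standard `unicodedata` module. This is not the proposed Intl or RegExp API; the function names `general_categories` and `chars_with_category` are hypothetical, and the sketch covers General_Category only, because the standard library does not expose the Script property.

```python
import sys
import unicodedata


def general_categories(text):
    """Map each character in `text` to its Unicode General_Category code."""
    return {ch: unicodedata.category(ch) for ch in text}


def chars_with_category(category, start=0, stop=sys.maxunicode + 1):
    """Yield every character in [start, stop) whose General_Category is `category`.

    The optional range keeps the scan cheap; scanning all of Unicode works
    but visits more than a million code points.
    """
    for code_point in range(start, stop):
        ch = chr(code_point)
        if unicodedata.category(ch) == category:
            yield ch


# Operation 1: properties for a given set of characters.
print(general_categories("aA1"))

# Operation 2: characters that have a given property, limited to a small range.
print("".join(chars_with_category("Lu", 0x41, 0x60)))
```

The lookup returns short category codes such as `Lu` rather than long names such as `Uppercase_Letter`, which is one way to keep property names out of the data.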

We should omit the actual property names from the data set to save on size.

This would be exposed in RegExp, but if not (or if that takes too long) we could expose a low-level API under the Intl namespace.

commentid: 2991
comment_count: 1
who: Tomer Mahlin
bug_when: 2012-12-05 10:27:29 -0800

I believe this is the right place to provide support for the following items:

1. Verifying whether a specific character is strong RTL or strong LTR

2. Getting the range of characters that belong to a given language

3. Verifying whether a given language is RTL or LTR. This can be done by getting all characters for the given language and inspecting their strong directionality property.

4. Returning the natural base text direction for a given language. This is the same as 3, but with a different conceptual emphasis.

5. Identification of Unicode scripts, blocks, character properties / categories, etc. For example:
a. Unicode character properties: \p{L} or \p{Letter}
b. Unicode scripts: \p{Common} or \p{Arabic}
c. Unicode blocks: \p{InArabic}, \p{InSyriac}
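
As a rough sketch of items 1, 3 and 5a, again using Python's standard `unicodedata` module rather than any proposed Intl API: strong directionality can be read from a character's Bidi_Class (`L` is strong LTR; `R` and `AL` are strong RTL). The names `strong_direction`, `base_direction` and `is_letter` are hypothetical, and the majority rule in `base_direction` is an assumption made for illustration, not something specified in this bug.

```python
import unicodedata

# Bidi_Class values that the Unicode Bidirectional Algorithm treats as strong.
STRONG_LTR = {"L"}
STRONG_RTL = {"R", "AL"}


def strong_direction(ch):
    """Return "ltr" or "rtl" for a strong character, otherwise None."""
    bidi_class = unicodedata.bidirectional(ch)
    if bidi_class in STRONG_LTR:
        return "ltr"
    if bidi_class in STRONG_RTL:
        return "rtl"
    return None


def base_direction(chars):
    """Guess a direction for a set of characters by majority of strong ones.

    Assumption: `chars` is a sample of the characters used by a language.
    Returns None when the sample contains no strong characters.
    """
    counts = {"ltr": 0, "rtl": 0}
    for ch in chars:
        direction = strong_direction(ch)
        if direction is not None:
            counts[direction] += 1
    if counts["ltr"] == 0 and counts["rtl"] == 0:
        return None
    return "rtl" if counts["rtl"] > counts["ltr"] else "ltr"


def is_letter(ch):
    """Rough equivalent of \\p{L}: any General_Category starting with "L"."""
    return unicodedata.category(ch).startswith("L")


print(strong_direction("a"), strong_direction("\u05D0"), strong_direction("1"))
print(base_direction("\u05E9\u05DC\u05D5\u05DD"))
print(is_letter("\u0627"), is_letter("1"))
```

Items 2, 5b and 5c need script and block data, which this module does not provide, so they are left out of the sketch.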

commentid: 3022
comment_count: 2
who: Norbert
bug_when: 2012-12-17 16:46:15 -0800

At the 2012-12-14 internationalization meeting, Norbert was asked to write a
strawman.
