#764 — Character properties


Provide a way to:

1. Get the Unicode character properties of a given set of characters
2. Get the characters that have a specified property (e.g. script)

We should skip actual property names in the data set, to save on size.

Ideally this would be exposed through RegExp; if not (or if that takes too long), we could expose a low-level API under the Intl namespace.
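
As a rough illustration of the two operations above, the sketch below uses RegExp Unicode property escapes (`\p{Script=...}` with the `u` flag), which were added to the language in ECMAScript 2018, after this issue was filed. The helper names `scriptOf` and `charsWithScript` and the `CANDIDATE_SCRIPTS` list are hypothetical, chosen only for this example; they are not a proposed API.

```javascript
// Assumption: the engine supports \p{...} property escapes with the u flag.
// Hypothetical list of scripts to probe; a real API would cover all scripts.
const CANDIDATE_SCRIPTS = ["Latin", "Arabic", "Hebrew", "Cyrillic", "Greek"];

// Operation 1 (hypothetical helper): report the script of a single character,
// limited to the candidate list above.
function scriptOf(ch) {
  for (const script of CANDIDATE_SCRIPTS) {
    if (new RegExp(`^\\p{Script=${script}}$`, "u").test(ch)) {
      return script;
    }
  }
  return undefined; // none of the candidates matched (e.g. digits, punctuation)
}

// Operation 2 (hypothetical helper): collect the characters in the code point
// range [start, end] that have the given script.
function charsWithScript(script, start, end) {
  const re = new RegExp(`^\\p{Script=${script}}$`, "u");
  const result = [];
  for (let cp = start; cp <= end; cp++) {
    const ch = String.fromCodePoint(cp);
    if (re.test(ch)) result.push(ch);
  }
  return result;
}
```

Probing with regular expressions like this is only a workaround: it cannot enumerate property values that the caller does not already know, which is part of what a dedicated low-level API could offer.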


I believe this is the right place to provide support for the following items:

1. Verifying whether a specific character is strong RTL or strong LTR

2. Getting the range of characters that belong to a given language

3. Verifying whether a given language is RTL or LTR. This can be done by getting all characters for the language and inspecting their strong directionality property.

4. Returning the natural base text direction for a given language. This is the same as 3, but with a different conceptual emphasis.

5. Identification of Unicode scripts, blocks, character properties / categories, etc. For example:
a. Unicode character properties: \p{L} or \p{Letter}
b. Unicode scripts: \p{Common} or \p{Arabic}
c. Unicode blocks: \p{InArabic}, \p{InSyriac}
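
A minimal sketch of items 1 and 5, assuming an engine with ECMAScript 2018 property escapes. In ECMAScript the script form is written `\p{Script=Arabic}` rather than `\p{Arabic}`, and there is no escape for blocks, so the block check falls back to an explicit code point range. Bidi classes are not exposed either, so `isStrongRtlApprox` is only an approximation that treats letters in a few right-to-left scripts as strong RTL. All function names here are hypothetical.

```javascript
// Item 5a: general category, as in \p{L} or \p{Letter}.
const isLetter = (ch) => /^\p{L}$/u.test(ch);

// Item 5b: script membership, using the ECMAScript spelling \p{Script=Arabic}.
const isArabicScript = (ch) => /^\p{Script=Arabic}$/u.test(ch);

// Item 5c: blocks have no property escape in ECMAScript, so this checks the
// Arabic block range U+0600..U+06FF directly.
const inArabicBlock = (ch) => {
  const cp = ch.codePointAt(0);
  return cp >= 0x0600 && cp <= 0x06ff;
};

// Item 1, approximated: a letter in a right-to-left script.
// Assumption: only a handful of RTL scripts are listed, and the real
// Bidi_Class property is never consulted.
const RTL_SCRIPTS = /^[\p{Script=Arabic}\p{Script=Hebrew}\p{Script=Syriac}\p{Script=Thaana}]$/u;
const isStrongRtlApprox = (ch) => isLetter(ch) && RTL_SCRIPTS.test(ch);
```

For instance, `isStrongRtlApprox` accepts a Hebrew or Arabic letter and rejects both a Latin letter and an Arabic-Indic digit, since digits are not letters. Items 2 to 4 depend on a mapping from languages to characters, which this sketch does not attempt.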


At the 2012-12-14 internationalization meeting, Norbert was asked to write a
strawman.