#522 — Use Unicode character names consistently

bug_id: 522
creation_ts: 2012-07-12 12:29:00 -0700
short_desc: Use Unicode character names consistently
delta_ts: 2015-03-17 16:57:07 -0700
product: Draft for 6th Edition
component: editorial issue
version: Rev 9: July 8, 2012 Draft
rep_platform: All
op_sys: All
bug_status: RESOLVED
resolution: FIXED
priority: Normal
bug_severity: normal
everconfirmed: true
reporter: Norbert
assigned_to: Allen Wirfs-Brock
cc: mathias

commentid: 1300
comment_count: 0
who: Norbert
bug_when: 2012-07-12 12:29:47 -0700

The specification in many places references Unicode characters, sometimes using the names provided by the Unicode standard, but often using other names of unknown provenance. Sometimes, as in the Quote algorithm in 15.12.3, two different names are used for the same character (reverse solidus vs. backslash).

I'd suggest consistently using the Unicode character names, along with their code point value or UTF-16 code unit value, throughout the document.
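
As an illustration of the suggested convention, the following is a minimal sketch that pairs a character's code point with its Unicode name, using Python's standard `unicodedata` module. The helper name `label` and the `<unnamed>` fallback are assumptions made for this example and are not part of the specification or of this bug report.

```python
import unicodedata


def label(ch: str) -> str:
    """Return a 'U+XXXX NAME' label for a single character."""
    # unicodedata.name() has no formal name for some characters (for
    # example, control characters), so a fallback string is supplied.
    name = unicodedata.name(ch, "<unnamed>")
    return f"U+{ord(ch):04X} {name}"


print(label("\\"))  # U+005C REVERSE SOLIDUS
print(label("/"))   # U+002F SOLIDUS
print(label("_"))   # U+005F LOW LINE
```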

commentid: 9119
comment_count: 1
who: Allen Wirfs-Brock
bug_when: 2014-07-01 17:06:08 -0700

I think I have them all switched to the official Unicode names.

Fixed in rev26 editor's draft

(but we'll see if any others turn up).

commentid: 9336
comment_count: 2
who: Allen Wirfs-Brock
bug_when: 2014-07-19 17:30:14 -0700

fixed in rev26

commentid: 9629
comment_count: 3
who: Norbert
bug_when: 2014-07-27 21:22:37 -0700

Checked in rev 26 draft: Searching the document still finds one or more occurrences of:
- slash (which the Unicode Standard calls SOLIDUS)
- backslash (REVERSE SOLIDUS)
- quote (QUOTATION MARK or APOSTROPHE)
- open left bracket or opening left bracket (LEFT SQUARE BRACKET)
- closing right bracket or right bracket (RIGHT SQUARE BRACKET)
- underscore (LOW LINE)
- brace or curly brace (LEFT CURLY BRACKET or RIGHT CURLY BRACKET)
- BYTE ORDER MARK (ZERO WIDTH NO-BREAK SPACE)
- FORM FEED (FORM FEED (FF))
- LINE FEED (LINE FEED (LF))
- CARRIAGE RETURN (CARRIAGE RETURN (CR))
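
A search like the one described above could be sketched roughly as follows. The table covers only a few of the informal names from the list, and the names `INFORMAL`, `PATTERN`, and `find_informal` are hypothetical helpers for this example, not tooling that was actually used on the draft.

```python
import re

# Informal name -> Unicode name(s), copied from a subset of the list above.
INFORMAL = {
    "slash": "SOLIDUS",
    "backslash": "REVERSE SOLIDUS",
    "underscore": "LOW LINE",
    "brace": "LEFT CURLY BRACKET or RIGHT CURLY BRACKET",
    "curly brace": "LEFT CURLY BRACKET or RIGHT CURLY BRACKET",
}

# Longest names first so "curly brace" wins over "brace"; the word
# boundaries keep "slash" from matching inside "backslash".
PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(n) for n in sorted(INFORMAL, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def find_informal(text: str):
    """Return (offset, matched text, Unicode name) for each informal name."""
    return [
        (m.start(), m.group(1), INFORMAL[m.group(1).lower()])
        for m in PATTERN.finditer(text)
    ]


for hit in find_informal("Escape the backslash, slash, and curly brace."):
    print(hit)
```

A word-level search of this kind will also flag legitimate uses (for example, "quote" as a verb if it were added to the table), so each hit still needs a manual review.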

commentid: 10804
comment_count: 4
who: Allen Wirfs-Brock
bug_when: 2014-12-05 15:14:51 -0800

fixed again in rev29 editor's draft

commentid: 10898
comment_count: 5
who: Allen Wirfs-Brock
bug_when: 2014-12-07 14:35:06 -0800

fixed in rev29

commentid: 10930
comment_count: 6
who: Mathias Bynens
bug_when: 2014-12-08 03:46:40 -0800

In rev29, the code point value is still not consistently being used alongside the canonical symbol name.

Section 11, for example: s/SOLIDUS/U+002F SOLIDUS/

commentid: 10931
comment_count: 7
who: Mathias Bynens
bug_when: 2014-12-08 03:59:57 -0800

11.8.4: REVERSE SOLIDUS (\), CARRIAGE RETURN (CR), LINE SEPARATOR, PARAGRAPH SEPARATOR, and LINE FEED (LF).

commentid: 10938
comment_count: 8
who: Mathias Bynens
bug_when: 2014-12-08 04:59:12 -0800

13.4 LEFT CURLY BRACKET

21.2.3.1 REVERSE SOLIDUS

commentid: 10941
comment_count: 9
who: Mathias Bynens
bug_when: 2014-12-08 05:11:18 -0800

24.3.2

QUATION MARK (sic)
LEFT CURLY BRACKET
COMMA
RIGHT CURLY BRACKET
COLON
LEFT SQUARE BRACKET
RIGHT SQUARE BRACKET
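
Misspellings such as the one noted above can be caught mechanically by resolving each name against the Unicode character database. This is a minimal sketch using Python's standard `unicodedata.lookup`; the helper name `resolve` is an assumption made for this example.

```python
import unicodedata


def resolve(names):
    """Split names into {name: 'U+XXXX'} and a list of unknown names."""
    resolved, unknown = {}, []
    for name in names:
        try:
            ch = unicodedata.lookup(name)  # raises KeyError if not a known name
        except KeyError:
            unknown.append(name)
        else:
            resolved[name] = f"U+{ord(ch):04X}"
    return resolved, unknown


names = [
    "QUATION MARK",  # misspelled, as it appeared in 24.3.2
    "LEFT CURLY BRACKET",
    "COMMA",
    "RIGHT CURLY BRACKET",
    "COLON",
    "LEFT SQUARE BRACKET",
    "RIGHT SQUARE BRACKET",
]
resolved, unknown = resolve(names)
print(unknown)   # ['QUATION MARK']
print(resolved)
```

The resolved code points can then be placed alongside the names, in the `U+002F SOLIDUS` style requested in the earlier comment.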

commentid: 11010
comment_count: 10
who: Allen Wirfs-Brock
bug_when: 2014-12-11 10:56:15 -0800

fixed the QUOTATION MARK spelling in 24.3.2

commentid: 13749
comment_count: 11
who: Allen Wirfs-Brock
bug_when: 2015-03-16 11:59:08 -0700

fixed in rev36 editor's draft

or at least the ones listed in Comment 6 - Comment 9

commentid: 13826
comment_count: 12
who: Allen Wirfs-Brock
bug_when: 2015-03-17 16:57:07 -0700

in rev36
