#2368 — 21.2.3.3.2: Pattern interpretation doesn't account for code unit patterns


Step 8 of RegExpInitialise starts with "Parse P interpreted as UTF-16 encoded Unicode code points using the grammars in 21.2.1." This fails to mention that, as explained in 21.2.2, patterns can be interpreted as either code unit or code point patterns ("BMP" or "Unicode"; see bug 2367). It should mention this distinction, as well as the necessary conversion to a List of SourceCharacter values, which differs for code unit and code point patterns.


fixed in rev23 editor's draft.

Made things a bit more explicit with respect to these points.


fixed in rev23 draft


Looking at the rev 25 draft, I like the clean separation of code paths for BMP and Unicode. However, some of the details still need improvement:

- In step 9.a, P is not interpreted as a list of UTF-16 encoded code points, but as a list of UTF-16 code units individually interpreted as source characters.

- In step 10.b, the description of the list is hard to parse. How about "... List whose elements are the code points resulting from interpreting P as a sequence of UTF-16 encoded Unicode code points."?
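To illustrate the difference between the two interpretations, here is a minimal TypeScript sketch. The helper names toCodeUnitList and toCodePointList are hypothetical and do not appear in the draft; the sketch only assumes that a BMP pattern maps each UTF-16 code unit to one source character, while a Unicode pattern first decodes surrogate pairs into code points.

```typescript
// Hypothetical helpers for illustration; not taken from the specification text.

// BMP (code unit) pattern: every UTF-16 code unit is one source character.
function toCodeUnitList(pattern: string): number[] {
  const units: number[] = [];
  for (let i = 0; i < pattern.length; i++) {
    units.push(pattern.charCodeAt(i));
  }
  return units;
}

// Unicode (code point) pattern: a valid surrogate pair is decoded into a
// single code point; any other code unit is passed through unchanged.
function toCodePointList(pattern: string): number[] {
  const points: number[] = [];
  let i = 0;
  while (i < pattern.length) {
    const lead = pattern.charCodeAt(i);
    if (lead >= 0xd800 && lead <= 0xdbff && i + 1 < pattern.length) {
      const trail = pattern.charCodeAt(i + 1);
      if (trail >= 0xdc00 && trail <= 0xdfff) {
        points.push((lead - 0xd800) * 0x400 + (trail - 0xdc00) + 0x10000);
        i += 2;
        continue;
      }
    }
    points.push(lead);
    i += 1;
  }
  return points;
}

// U+1F600 (encoded as the surrogate pair D83D DE00) followed by "+".
const samplePattern = "\uD83D\uDE00+";
const asCodeUnits = toCodeUnitList(samplePattern);   // three source characters
const asCodePoints = toCodePointList(samplePattern); // two source characters
```

With the code unit interpretation, the "+" quantifier applies only to the trailing surrogate; with the code point interpretation, it applies to U+1F600 as a whole. This is why the two lists need to be described differently.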


fixed in rev27 editor's draft


fixed in rev27 draft


Small grammatical error resulting from the edits in step 10.b: "code points of resulting". Remove "of".


fixed in rev29 editor's draft


fixed in rev29


Verified in rev 32 draft.