#1137 — U+200B must be considered a whitespace character.


Tests S15.10.2.12_A1_T1 and S15.10.2.12_A2_T1 require that U+200B is not considered a whitespace character.

Section 7.2 (White Space) of the ES5.1 spec states the following:

"ECMAScript implementations must recognize all of the white space characters defined in Unicode 3.0. Later editions of the Unicode Standard may define other white space characters. ECMAScript implementations may recognise white space characters from later editions of the Unicode Standard."

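U+200B (Zero Width Space) belongs to category Cf (Format) in Unicode 4.0.1 and later. Before that, including in Unicode 3.0, it belonged to category Zs (Space Separator). Since the spec requires recognizing every white space character defined in Unicode 3.0, a conformant implementation should treat U+200B as a whitespace character.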

(Opera's Carakan is one implementation that follows the above requirement.)
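For anyone who wants to check how a given engine classifies U+200B, here is a minimal sketch. The helper name isRegExpWhiteSpace is hypothetical and is not taken from the test262 sources; it only probes the \s character class, which is what the S15.10.2.12 tests exercise. The result for U+200B depends on the engine, so no expected value is shown for it.

```javascript
// Hypothetical helper (not part of test262): reports whether this engine's
// \s character class matches the given BMP code point.
function isRegExpWhiteSpace(codePoint) {
  return /^\s$/.test(String.fromCharCode(codePoint));
}

// Sanity checks: U+0020 (SPACE) is white space, U+0041 ("A") is not.
console.log("U+0020:", isRegExpWhiteSpace(0x0020));
console.log("U+0041:", isRegExpWhiteSpace(0x0041));

// Engine-dependent: an engine following the Unicode 3.0 classification (Zs)
// matches U+200B, while one following Unicode 4.0.1 or later (Cf) does not.
console.log("U+200B:", isRegExpWhiteSpace(0x200B));
```

Running the same snippet in each shell (jsc, SpiderMonkey, V8, Carakan) would show which classification each engine follows.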


Created attachment 34
treat u200B ZWSP as whitespace

As noted earlier, this change fails with JSC, SpiderMonkey, and V8.

@sof, please review this patch and let me know if it is what you envisioned.


Thanks Trevor; yes, shifting U+200B from S15*A1_T1.js to S15*A2_T1.js is what I had in mind.


Unicode 4.0.1 was published in March 2004! Does anyone know if there are bugs logged against JSC, V8, and SpiderMonkey for this issue?

http://www.unicode.org/history/publicationdates.html


I suppose whether implementations pass or fail a test shouldn't be a consideration when it comes to correctness.

@Brian, how have situations like this been handled in the past? Should this land, leaving the implementers to deal with their new failures as they see fit?