#118 — Various tests assume U+0085 is a Unicode space separator (Zs), but it's not


U+0085 NEXT LINE is a control character, not a space character. Yet various tests expect it to be one:

15.5.4.20-3-2 String.prototype.trim - 'S' is a string with all WhiteSpace fail
15.5.4.20-3-3 String.prototype.trim - 'S' is a string with all union of WhiteSpace and LineTerminator fail
15.5.4.20-3-4 String.prototype.trim - 'S' is a string start with union of all LineTerminator and all WhiteSpace fail
15.5.4.20-3-5 String.prototype.trim - 'S' is a string end with union of all LineTerminator and all WhiteSpace fail
15.5.4.20-3-6 String.prototype.trim - 'S' is a string start with union of all LineTerminator and all WhiteSpace and end with union of all LineTerminator and all WhiteSpace fail


According to Table 2 (Whitespace Characters), an ES5 implementation has to contend with:
Other category "Zs" | Any other Unicode "space separator" | <USP>
from the Unicode Standard 3.0.


The question then is whether \u0085 (i.e., NEXT LINE) falls under <USP> in Unicode 3.0. My gut reaction was also no, until I saw the description under Table 3:
Only the characters in Table 3 are treated as line terminators. Other new line or line breaking characters are treated as white space but not as line terminators.

Isn't NEXT LINE considered to be a line breaking/newline character? This also seems to be the interpretation on Wikipedia (http://en.wikipedia.org/wiki/Newline):
ECMAScript[5] accepts LS and PS as line breaks, but considers U+0085 (NEL) white space, not a line break.
although I don't place much faith in Wikipedia:)
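
On the <USP> question above, a quick way to see which general category an engine assigns to U+0085 is to test it with Unicode property escapes. This is only a sketch: it assumes an engine that supports `\p{...}` in regular expressions (ES2018 or later), it reflects the Unicode version shipped with that engine rather than Unicode 3.0, and the helper names `isSpaceSeparator` and `isControl` are made up for illustration.

```javascript
// Assumption: the engine supports Unicode property escapes (ES2018+).
// Helper names are illustrative only; they are not part of any test suite.
const NEL = '\u0085'; // NEXT LINE

// True when the single character is in general category Zs (space separator).
const isSpaceSeparator = (ch) => /^\p{Zs}$/u.test(ch);

// True when the single character is in general category Cc (control).
const isControl = (ch) => /^\p{Cc}$/u.test(ch);

console.log(isSpaceSeparator(NEL));      // false: U+0085 is not in Zs
console.log(isControl(NEL));             // true: U+0085 is a control character
console.log(isSpaceSeparator('\u00A0')); // true: NO-BREAK SPACE is in Zs
```

If U+0085 is not in Zs, it can only count as ES5 white space if Table 2 lists it explicitly.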


Hi David,

I was just investigating this test failure on WebKit, and thought I'd weigh in with my 2 cents. I agree that the text under table 3 is a little confusing, but to quote Allen WB from an ES 5-discuss email of July 1st, "Always follow the algorithm if there is a conflict between an algorithm and descriptive text." ES5 white space seems to be clearly defined by section 7.2, "White Space", as "The ECMAScript white space characters are listed in Table 2." I cannot really see how section 7.2 can be seen as anything but normative, and if the descriptive text in a different section appears to contradict this, then I think it has to be disregarded.

The presence of form feed (U+000C) in Table 2 would also seem to be illustrative, since like U+0085 it too is a category [Cc] vertical spacing character. Form feed is considered white space in ES5, but this arises from the spec only because U+000C appears in Table 2. Conversely, since U+0085 is excluded from Table 2, it cannot be considered white space in ES5.

cheers,
G.
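
The Table 2 argument above can be sketched as a membership check. The code below is an illustration, not the spec algorithm or any test from the suite: the names `ES5_WHITE_SPACE`, `ES5_LINE_TERMINATORS`, `isES5TrimChar` and `trimPerTables` are hypothetical, the code points are transcribed from ES5 Tables 2 and 3, and the Zs check assumes an engine with Unicode property escapes (ES2018 or later) whose Unicode data may differ from Unicode 3.0.

```javascript
// Explicitly listed white space from ES5 Table 2 (TAB, VT, FF, SP, NBSP, BOM).
const ES5_WHITE_SPACE = ['\u0009', '\u000B', '\u000C', '\u0020', '\u00A0', '\uFEFF'];

// Line terminators from ES5 Table 3 (LF, CR, LS, PS).
const ES5_LINE_TERMINATORS = ['\u000A', '\u000D', '\u2028', '\u2029'];

// A character is stripped by trim if it is listed in either table
// or falls under <USP>, i.e. general category Zs.
function isES5TrimChar(ch) {
  return ES5_WHITE_SPACE.includes(ch) ||
    ES5_LINE_TERMINATORS.includes(ch) ||
    /^\p{Zs}$/u.test(ch);
}

// Hypothetical reference trim built only from the tables above.
function trimPerTables(s) {
  let start = 0;
  let end = s.length;
  while (start < end && isES5TrimChar(s[start])) start++;
  while (end > start && isES5TrimChar(s[end - 1])) end--;
  return s.slice(start, end);
}

console.log(trimPerTables('\u000Cabc\u000C') === 'abc');             // true: FF is in Table 2
console.log(trimPerTables('\u0085abc\u0085') === '\u0085abc\u0085'); // true: NEL is left alone
```

Under this reading, form feed is trimmed only because Table 2 lists it, while U+0085 survives trimming because neither table nor the Zs category covers it.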


Thanks for the clarification. I'll fix this in source control tomorrow morning, and file a spec bug to get the "Other new line or line breaking characters are treated as white space but not as line terminators" bit either removed or re-worded.


Changes have been committed to Mercurial. I'll close the bug once the website gets updated.


Website has been updated. The only usage of \u0085 in IE Test Center-based tests should now be 15.5.4.20-4-39.js which is valid, albeit not all that useful.