#406 — CreateHTML abstract operation needs to define escaping in attributes


The steps in B.2.2.2 which define the CreateHTML abstract operation for compatibility functions like String.prototype.anchor include the creation of strings representing the serialized form of HTML elements with attributes.

The steps for creating the serialized form of the attribute do not take into account the escaping of certain characters in attribute values, which at least some browsers implement.

In Chrome, at least the following characters are escaped: < > " '

For example, in Chrome:

"abc".anchor("123<>\"'789")

Produces:

"<a name="123&lt;&gt;&quot;&#039;789">abc</a>"

whereas the current draft spec would indicate that the output should be:

"<a name="123<>"'789">abc</a>"
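To make the difference concrete, the two serializations can be sketched as plain functions. This is only an illustration: serializeUnescaped and serializeChromeStyle are hypothetical names, not the spec steps or V8 internals, and the entity table is simply taken from the Chrome output quoted above.

```javascript
// Hypothetical helpers; neither is the actual spec text or V8 source.

// Draft-spec behavior: the attribute value is inserted verbatim.
function serializeUnescaped(text, tag, attribute, value) {
  return "<" + tag + " " + attribute + '="' + String(value) + '">' +
    String(text) + "</" + tag + ">";
}

// Entities observed in the Chrome output above.
const CHROME_ENTITIES = { "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#039;" };

// Chrome-like behavior: < > " ' are replaced before insertion.
function serializeChromeStyle(text, tag, attribute, value) {
  const escaped = String(value).replace(/[<>"']/g, (ch) => CHROME_ENTITIES[ch]);
  return serializeUnescaped(text, tag, attribute, escaped);
}

console.log(serializeUnescaped("abc", "a", "name", "123<>\"'789"));
console.log(serializeChromeStyle("abc", "a", "name", "123<>\"'789"));
```

The first call reproduces the draft-spec string and the second reproduces the Chrome string, so the only difference between the two is the replacement step.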


Firefox, IE, and Safari all appear to conform to the current ES6 spec and do not do this escaping. Chrome's behavior, while reasonable, seems like a deviation from the previous interoperable behavior. We can bring this up with TC39 and see if there is any interest in adopting the Chrome approach. However, as these functions are generally considered legacy features with marginal utility, I don't know how much interest there will be in making any changes.


Ah, sorry - I saw the addition to Annex B, tested the only browser I had handy and noticed the difference. Sounds like a low priority web compat bug for Chrome rather than a spec bug.


*** Bug 463 has been marked as a duplicate of this bug. ***


Escaping <, > and ' (like Chrome/V8 does) is pointless and doesn’t improve security, as the result appears in an HTML attribute value that is wrapped in double quotes anyway. It should not be specced, IMHO.

However, escaping " into &quot; (like Chrome/V8 has always done) seems like the right thing to do for security reasons. Not escaping it results in an XSS vector, e.g.:

''.link('"><script>alert("h4x")<\/script>'); // XSS vector in Firefox, Opera, and IE

Escaping " into &quot; doesn’t seem to introduce any compatibility problems, as Chrome/V8 has always escaped those four characters mentioned before. Furthermore, no code that relies on this could be found by grepping the web200904 data set. http://krijnhoetmer.nl/irc-logs/whatwg/20120620#l-567

I’d say it’s fair to assume the only Web content that relies on " not getting escaped by these functions are XSS vectors.
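A minimal sketch of why the " → &quot; replacement alone is enough to close this hole, using the payload from above. sketchLink is a hypothetical stand-in for a link()-like serializer, not any engine's implementation; its escapeQuotes flag toggles the proposed replacement.

```javascript
// Hypothetical sketch of a link()-like serializer; `escapeQuotes` toggles
// the proposed " -> &quot; replacement.
function sketchLink(text, href, escapeQuotes) {
  const value = escapeQuotes ? String(href).replace(/"/g, "&quot;") : String(href);
  return '<a href="' + value + '">' + String(text) + "</a>";
}

const payload = '"><script>alert("h4x")<\/script>';

// Without escaping, the first " in the payload closes the attribute, so the
// <script> element ends up outside the <a> start tag.
console.log(sketchLink("", payload, false));

// With escaping, no literal " remains in the value, so the payload cannot
// terminate the attribute; < and > stay inside the quoted value.
console.log(sketchLink("", payload, true));
```

In the escaped result the < and > characters are left untouched, which matches the argument that only the double quote matters inside a double-quoted attribute value.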

Some more info, cross-posted from bug 463:

Firefox/Spidermonkey is going to change its behavior to escape " as &quot; for the reasons mentioned above: https://bugzilla.mozilla.org/show_bug.cgi?id=352437 Opera/Carakan will change its behavior too, as soon as other browsers change (bug DSK-369206). The IE bug is here: https://connect.microsoft.com/IE/feedback/details/752391

FWIW, http://mathias.html5.org/specs/javascript/#escapeattributevalue requires escaping the ". Tests for this behavior can be found here: http://mathias.html5.org/tests/javascript/string/

Here’s a list of the methods that have this issue:

* String.prototype.anchor(name)
* String.prototype.fontcolor(color)
* String.prototype.fontsize(size)
* String.prototype.link(href)
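For anyone who wants to check a given engine, here is a rough probe of the four methods listed above. probeQuoteEscaping is a hypothetical helper name, and the check only looks for &quot; in the result; it says nothing about the other characters.

```javascript
// Report, per method, whether this engine escapes " in the attribute value.
// `probeQuoteEscaping` is a hypothetical helper, not part of any test suite.
function probeQuoteEscaping() {
  const methods = ["anchor", "fontcolor", "fontsize", "link"];
  const report = {};
  for (const name of methods) {
    // Call each method on "x" with a single double quote as the argument.
    const html = String.prototype[name].call("x", '"');
    report[name] = html.includes("&quot;");
  }
  return report;
}

console.log(probeQuoteEscaping());
```

The result is an object mapping each method name to true or false for the engine the snippet runs in.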


FWIW, here’s the bug about V8 needlessly escaping ', <, and >: http://code.google.com/p/v8/issues/detail?id=2217 A patch that removes the escaping and only leaves the " → &quot; escape is available.


(In reply to comment #4)
> Firefox/Spidermonkey is going to change its behavior to escape " as &quot; for
> the reasons mentioned above […]

Update: Firefox/Spidermonkey just landed this change. https://bugzilla.mozilla.org/show_bug.cgi?id=352437#c16


corrected in editor's draft


fixed in rev10, Sept. 27 2012 draft