#3417 — 18.2.6.1.2: refer to the Encoding Standard’s definition of UTF-8


> A formal description and implementation of UTF-8 is given in RFC 3629.
> In UTF-8, characters are encoded using sequences of 1 to 6 octets.

Why not refer to The Encoding Standard (https://encoding.spec.whatwg.org/#utf-8) rather than RFC 3629?

Then you can replace “sequences of 1 to 6 octets” with “sequences of 1 to 4 octets”, which matches Table 40.
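
As a quick sanity check of the 1-to-4-octet figure, here is a minimal Python sketch. It is not taken from the spec, the RFC, or the Encoding Standard; the helper name `utf8_length` and the `BOUNDARIES` list are made up for illustration. It encodes the first and last code point of each length class and reports how many octets each one takes. Surrogates (U+D800 to U+DFFF) are left out on purpose, since they have no UTF-8 encoding.

```python
# Hypothetical helper, for illustration only: number of octets in the
# UTF-8 encoding of a single Unicode code point.
def utf8_length(code_point: int) -> int:
    return len(chr(code_point).encode("utf-8"))

# First and last code point of each length class (surrogates excluded).
BOUNDARIES = [0x0000, 0x007F, 0x0080, 0x07FF, 0x0800, 0xFFFF, 0x10000, 0x10FFFF]

lengths = {cp: utf8_length(cp) for cp in BOUNDARIES}

for cp, n in lengths.items():
    print(f"U+{cp:04X}: {n} octet(s)")
```

No code point up to the maximum, U+10FFFF, needs more than 4 octets here; the 5- and 6-octet forms only covered values beyond that range.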


Note that the RFC has the same byte restriction. For error handling in particular, though, the Encoding Standard is a better reference, since it says exactly how many U+FFFD characters are produced when decoding.


This is all legacy ES specification language that isn't going to change for ES6. I'll move this bug to ES7 in case somebody wants to explore it in that context.

An issue is that we want to preserve the legacy behavior described by these ES algorithms (even if it is different from the Encoding Standard's definition). Before changing normative references or replacing the algorithms in this section with references to other algorithms, somebody would need to verify that there are no observable differences in the results produced.