#2071 — Refer to the latest available Unicode version rather than using a fixed version number


From http://javascript.spec.whatwg.org/#unicode-database-version:

For optimal interoperability, JavaScript implementations should use the latest available Unicode database to determine which ECMAScript characters are allowed in Identifiers and IdentifierNames, and which ECMAScript characters are whitespace characters.


The current draft doesn't specify a fixed Unicode version; it specifies a minimum version "or successor". Specifying a minimum version means:
- that implementations must support at least that version in order to be conformant,
- that the conformance test suite can fail implementations that don't support at least that version,
- that applications that use only the characters of the minimum version in identifiers are guaranteed to run on all implementations of a given ECMAScript edition, which a few years after publication of that edition means all implementations that applications need to run on.

If on the other hand the spec required the latest Unicode version, then
- implementations would become non-conformant each time a new version of Unicode introduces new identifier characters, some implementations for a short time, some for a long time;
- the conformance test suite would have to be revised each time a new version of Unicode introduces new identifier characters, and would then fail for implementations that previously were conformant;
- application developers would have to track which implementations support which Unicode versions, and serve different source code using different identifiers to different implementations.

I have a hard time imagining that any developer would actually serve different source code using different identifiers to different implementations - using the latest Unicode characters in identifiers just isn't important enough. So I think they're better served by the guarantee and stability provided by the current spec. Developers developing for a single ECMAScript implementation (e.g., Node or Windows 8 UI) can still take advantage of a newer Unicode version that their target environment supports.
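To illustrate what "tracking which implementations support which Unicode versions" would mean in practice, here is a minimal sketch of a runtime probe. It is not taken from the spec or from any implementation: `engineAcceptsIdentifier` is a hypothetical helper name, and the approach assumes the environment permits dynamic code creation through the `Function` constructor (a Content Security Policy may forbid it).

```javascript
// Hypothetical helper: reports whether the running engine parses `name`
// as an identifier. It compiles (but never calls) a tiny function body,
// so the only thing exercised is the engine's parser and its Unicode tables.
// Caveat: this is a probe for single candidate names, not a validator;
// input containing commas, semicolons, or other syntax can yield a
// misleading result.
function engineAcceptsIdentifier(name) {
  try {
    new Function("var " + name + ";");
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) {
      return false; // the parser rejected the name
    }
    throw e; // anything else (e.g. an eval restriction) is not a parse result
  }
}

// U+1E9E LATIN CAPITAL LETTER SHARP S was added in Unicode 5.1, so the
// result should be true on engines whose tables cover that version or later.
console.log(engineAcceptsIdentifier("\u1E9E"));
```

A probe like this tells a developer what one engine accepts, but it does not remove the need to ship different source to different engines, which is the burden described above.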


Norbert nicely described why we require a minimum version rather than the current version.

In addition, at the Sept 2013 TC39 meeting the consensus was that the ES5 recommendation about using Unicode 3.0 identifier characters to maximize interoperability was unnecessary (even if updated to a newer version) and should be deleted. https://github.com/rwaldron/tc39-notes/blob/master/es6/2013-09/sept-17.md#57-note-in-116-wrt-unicode-versions-update-to-unicode-51

The min. Unicode 5.1.0 requirement will remain, and the interoperability recommendation is gone.


(In reply to comment #1)
> The current draft doesn't specify a fixed Unicode version; it specifies a
> minimum version "or successor".

That’s fine!

> If on the other hand the spec required the latest Unicode version, then […]

Note that I used the verb “should”, intended to be interpreted as per RFC 2119. I didn’t suggest requiring the latest Unicode version, but rather recommending it, i.e. whenever a new Unicode version is released it would make sense to update the engine’s data tables — and the spec could encourage this IMHO. That’s what this issue is about (sorry for being unclear).

(In reply to comment #2)
> The min. Unicode 5.1.0 requirement will remain, and the interoperability
> recommendation is gone.

That’s good news.

The meeting notes don’t say anything about why Unicode v5.1.0 was chosen instead of a more recent version, though. Any idea?

To clarify, I’d like to see the spec recommend a minimum Unicode version (preferably one that is more recent than v5.1.0) while also recommending engines to update their data tables to newer Unicode versions as they are released (without it being a strict requirement).


(In reply to comment #3)
> The meeting notes don’t say anything about why Unicode v5.1.0 was chosen
> instead of a more recent version, though. Any idea?

Why I chose Unicode 5.1:
http://norbertlindenberg.com/2012/05/ecmascript-supplementary-characters/index.html#Unicode

As we get closer to finalizing the spec, it may be worthwhile asking Microsoft
- whether Explorer versions implementing ES 6 will rely on Windows or .NET for Unicode character properties or use their own tables for that information,
- whether Explorer versions implementing ES 6 will run on Windows 7.
It may be possible to increase the minimum Unicode version based on the answers.

> To clarify, I’d like to see the spec recommend a minimum Unicode version
> (preferably one that is more recent than v5.1.0) while also recommending
> engines to update their data tables to newer Unicode versions as they are
> released (without it being a strict requirement).

That might be a helpful recommendation for new implementations, similar to the recommendations we already provide to use the IANA time zone database and, for ECMA-402, CLDR.


(In reply to comment #4)
> As we get closer to finalizing the spec, it may be worthwhile asking Microsoft
> - whether Explorer versions implementing ES 6 will rely on Windows or .NET for
> Unicode character properties or use their own tables for that information,
> - whether Explorer versions implementing ES 6 will run on Windows 7.
> It may be possible to increase the minimum Unicode version based on the
> answers.

CCing Luke Hoban from Microsoft, who can hopefully answer that question.


Just found this thread and wanted to add some information. Windows 7 does indeed only support Unicode 5.1 (which may be the motivation for the current minimum required version). It’s safest to assume IE could continue depending on OS APIs for Unicode support, so the Unicode 5.1 minimum seems like the best choice.


It sounds to me like for this edition we should just stick with 5.1.


See also: https://github.com/tc39/ecma262/pull/300