Stage 1 Draft / May 20, 2026

Default Behaviours for some Intl APIs

Introduction

This proposal seeks to modify the ECMA-402 specification.

See the explainer for more information.

6 Identification of Locales, Currencies, Time Zones, Measurement Units, Numbering Systems, Collations, and Calendars

6.2.4 IsRootLocale ( locale )

The abstract operation IsRootLocale takes argument locale (a well-formed language tag String) and returns a Boolean. It determines whether locale represents the "und" locale. It performs the following steps when called:

  1. Assert: IsWellFormedLanguageTag(locale) is true.
  2. Let lowerLocale be the ASCII-lowercase of locale.
  3. Let baseName be GetLocaleBaseName(lowerLocale).
  4. Let language be GetLocaleLanguage(baseName).
  5. If language is "und", return true; else return false.
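
The steps above can be approximated in ECMAScript as the following minimal sketch. The name isRootLocale is hypothetical and not part of any API. The sketch assumes its input is already a well-formed language tag (step 1 is not checked) and takes the language subtag to be the first hyphen-delimited subtag of the tag.

```javascript
// Hypothetical helper mirroring IsRootLocale; assumes `locale` is well-formed.
function isRootLocale(locale) {
  // Well-formed language tags are ASCII-only, so toLowerCase() behaves as
  // ASCII-lowercasing for the inputs this sketch accepts.
  const lowerLocale = locale.toLowerCase();
  // For a well-formed tag, the language subtag is the first subtag.
  const language = lowerLocale.split("-")[0];
  return language === "und";
}

console.log(isRootLocale("und"));      // true
console.log(isRootLocale("UND-Latn")); // true
console.log(isRootLocale("en-US"));    // false
```

Extension subtags do not affect the result, so a tag such as "und-u-co-emoji" is also treated as the root locale by this sketch.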

10 Collator Objects

10.3.3.2 CompareStrings ( collator, x, y )

The implementation-defined abstract operation CompareStrings takes arguments collator (an Intl.Collator), x (a String), and y (a String) and returns a Number, but not NaN. The returned Number represents the result of an implementation-defined locale-sensitive String comparison of x with y. The result is intended to correspond with a sort order of String values according to the effective locale and collation options of collator, and will be negative when x is ordered before y, positive when x is ordered after y, and zero in all other cases (representing no relative ordering between x and y). String values must be interpreted as UTF-16 code unit sequences as described in ECMA-262, 6.1.4, and a surrogate pair (a code unit in the range 0xD800 to 0xDBFF followed by a code unit in the range 0xDC00 to 0xDFFF) within a string must be interpreted as the corresponding code point.

Behaviour as described below depends upon locale-sensitive identification of the sequence of collation elements for a string, in particular "base letters", and different base letters always compare as unequal (causing the strings containing them to also compare as unequal). Results of comparing variations of the same base letter with different case, diacritic marks, or potentially other aspects further depend upon collator.[[Sensitivity]] as follows:

Table 1: Effects of Collator Sensitivity

[[Sensitivity]] | Description | "a" vs. "á" | "a" vs. "A"
"base" | Characters with the same base letter do not compare as unequal, regardless of differences in case and/or diacritic marks. | equal | equal
"accent" | Characters with the same base letter compare as unequal only if they differ in accents and/or other diacritic marks, regardless of differences in case. | not equal | equal
"case" | Characters with the same base letter compare as unequal only if they differ in case, regardless of differences in accents and/or other diacritic marks. | equal | not equal
"variant" | Characters with the same base letter compare as unequal if they differ in case, diacritic marks, and/or potentially other differences. | not equal | not equal

Note 1
The mapping from input code points to base letters can include arbitrary contractions, expansions, and collisions, including those that apply special treatment to certain characters with diacritic marks. For example, in Swedish, "ö" is a base letter that differs from "o", and "v" and "w" are considered to be the same base letter. In Slovak, "ch" is a single base letter, and in English, "æ" is a sequence of base letters starting with "a" and ending with "e".
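
The effect of [[Sensitivity]] on the two comparisons shown in Table 1 can be observed through the sensitivity option of Intl.Collator. The following is an illustrative sketch only: the helper name sensitivityTable and the choice of the "en" locale are assumptions made for the example, and results for other locales can differ because base letters are locale-sensitive.

```javascript
// Hypothetical helper: evaluates the two comparisons of Table 1 for each
// sensitivity value. The default locale "en" is an illustrative assumption.
function sensitivityTable(locale = "en") {
  const describe = (result) => (result === 0 ? "equal" : "not equal");
  const rows = {};
  for (const sensitivity of ["base", "accent", "case", "variant"]) {
    const collator = new Intl.Collator(locale, { sensitivity });
    rows[sensitivity] = {
      accentPair: describe(collator.compare("a", "á")),
      casePair: describe(collator.compare("a", "A")),
    };
  }
  return rows;
}

console.table(sensitivityTable());
```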

If collator.[[IgnorePunctuation]] is true, then punctuation is ignored (e.g., strings that differ only in punctuation compare as equal).
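
As a brief illustration, the ignorePunctuation option of Intl.Collator controls this slot. The "en" locale and the sample strings are assumptions chosen for the example, and the classification of characters as punctuation is implementation-defined.

```javascript
// Two collators that differ only in [[IgnorePunctuation]].
const strictCollator = new Intl.Collator("en", { ignorePunctuation: false });
const lenientCollator = new Intl.Collator("en", { ignorePunctuation: true });

// The strings differ only by a hyphen.
const withPunctuation = strictCollator.compare("co-op", "coop");
const withoutPunctuation = lenientCollator.compare("co-op", "coop");

console.log(withPunctuation !== 0);    // the hyphen is significant
console.log(withoutPunctuation === 0); // the hyphen is ignored
```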

For the interpretation of options settable through locale extension keys, see Unicode Technical Standard #35 Part 1 Core, Section 3.6.1 Key and Type Definitions.

The actual return values are implementation-defined to permit encoding additional information in them, but this operation for any given collator, when considered as a function of x and y, is required to be a consistent comparator defining a total ordering on the set of all Strings. This operation is also required to recognize and honour canonical equivalence according to the Unicode Standard, including returning +0𝔽 when comparing distinguishable Strings that are canonically equivalent.
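
The canonical equivalence requirement can be illustrated with a precomposed and a decomposed spelling of the same character. This is a minimal sketch in which the "en" locale and the sample strings are assumptions chosen for the example.

```javascript
const collator = new Intl.Collator("en");

const precomposed = "\u00E9"; // "é" as a single code point
const decomposed = "e\u0301"; // "e" followed by COMBINING ACUTE ACCENT

// The Strings are distinguishable as code unit sequences...
console.log(precomposed === decomposed); // false
// ...but canonically equivalent, so the comparison yields zero.
console.log(collator.compare(precomposed, decomposed)); // 0

// Because the comparator is consistent, it can be passed to sort().
const sorted = ["z", decomposed, "a", precomposed].sort(collator.compare);
console.log(sorted[0], sorted[3]); // a z
```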

If IsRootLocale(collator.[[Locale]]) is true, the CompareStrings abstract operation must be implemented following Unicode Technical Standard #10: Unicode Collation Algorithm, using only tailorings for the effective collation options of collator as provided by the Common Locale Data Repository (available at https://cldr.unicode.org/), and no locale-specific tailorings.

Note 2
It is recommended that the CompareStrings abstract operation be implemented following Unicode Technical Standard #10: Unicode Collation Algorithm, using tailorings for the effective locale and collation options of collator. It is recommended that implementations use the tailorings provided by the Common Locale Data Repository (available at https://cldr.unicode.org/).
Note 3
Applications should not assume that the behaviour of the CompareStrings abstract operation for Collator instances with the same resolved options will remain the same for different versions of the same implementation.

19 Segmenter Objects

19.8.1 FindBoundary ( segmenter, string, startIndex, direction )

The abstract operation FindBoundary takes arguments segmenter (an Intl.Segmenter), string (a String), startIndex (a non-negative integer), and direction (before or after) and returns a non-negative integer. It finds a segmentation boundary between two code units in string in the specified direction from the code unit at index startIndex according to the locale and options of segmenter and returns the immediately following code unit index. It performs the following steps when called:

  1. Let len be the length of string.
  2. Assert: startIndex < len.
  3. Let locale be segmenter.[[Locale]].
  4. Let granularity be segmenter.[[SegmenterGranularity]].
  5. If direction is before, then
    1. Search string for the last segmentation boundary that is preceded by at most startIndex code units from the beginning, using locale locale and text element granularity granularity.
    2. If a boundary is found, return the count of code units in string preceding it.
    3. Return 0.
  6. Assert: direction is after.
  7. Search string for the first segmentation boundary that follows the code unit at index startIndex, using locale locale and text element granularity granularity.
  8. If a boundary is found, return the count of code units in string preceding it.
  9. Return len.
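
The steps above can be sketched on top of the public Intl.Segmenter API. This is an approximation under stated assumptions rather than the specified algorithm: the function name findBoundary is hypothetical, boundaries are taken to be the start index of every segment plus the length of the string, and direction is modelled as the strings "before" and "after".

```javascript
// Hypothetical helper approximating FindBoundary with Intl.Segmenter.
function findBoundary(segmenter, string, startIndex, direction) {
  const len = string.length;
  if (!(startIndex < len)) throw new RangeError("startIndex must be < len");

  // Collect boundaries: the start of each segment, then the end of the string.
  const boundaries = [];
  for (const { index } of segmenter.segment(string)) boundaries.push(index);
  boundaries.push(len);

  if (direction === "before") {
    // Last boundary preceded by at most startIndex code units, else 0.
    let result = 0;
    for (const b of boundaries) if (b <= startIndex) result = b;
    return result;
  }
  // direction is "after": first boundary following the code unit at startIndex.
  for (const b of boundaries) if (b > startIndex) return b;
  return len;
}

const words = new Intl.Segmenter("en", { granularity: "word" });
console.log(findBoundary(words, "Hello world", 7, "before")); // 6
console.log(findBoundary(words, "Hello world", 7, "after"));  // 11
```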

TODO: If IsRootLocale(segmenter.[[Locale]]) is true, ...

Note
Boundary determination is implementation-dependent, but general default algorithms are specified in Unicode Standard Annex #29. It is recommended that implementations use locale-sensitive tailorings such as those provided by the Common Locale Data Repository (available at https://cldr.unicode.org).

20 Locale Sensitive Functions of the ECMAScript Language Specification

20.1.2.1 TransformCase ( S, locales, targetCase )

The abstract operation TransformCase takes arguments S (a String), locales (an ECMAScript language value), and targetCase (lower or upper). It interprets S as a sequence of UTF-16 encoded code points, as described in ECMA-262, 6.1.4, and returns the result of ILD transformation into targetCase as a new String value. It performs the following steps when called:

  1. Let requestedLocales be ? CanonicalizeLocaleList(locales).
  2. If requestedLocales is not an empty List, then
    1. Let requestedLocale be requestedLocales[0].
  3. Else,
    1. Let requestedLocale be DefaultLocale().
  4. Let availableLocales be an Available Locales List which includes the language tags for which the Unicode Character Database contains language-sensitive case mappings. If the implementation supports additional locale-sensitive case mappings, availableLocales should also include their corresponding language tags.
  5. Let match be LookupMatchingLocaleByPrefix(availableLocales, « requestedLocale »).
  6. If match is not undefined, let locale be match.[[locale]]; else let locale be "und".
  7. Let codePoints be StringToCodePoints(S).
  8. If targetCase is lower, then
    1. If IsRootLocale(locale) is true, let newCodePoints be a List whose elements are the result of a lowercase transformation of codePoints according to the Unicode Default Case Conversion algorithm; else let newCodePoints be a List whose elements are the result of a lowercase transformation of codePoints according to an implementation-derived algorithm using locale or the Unicode Default Case Conversion algorithm.
  9. Else,
    1. Assert: targetCase is upper.
    2. If IsRootLocale(locale) is true, let newCodePoints be a List whose elements are the result of an uppercase transformation of codePoints according to the Unicode Default Case Conversion algorithm; else let newCodePoints be a List whose elements are the result of an uppercase transformation of codePoints according to an implementation-derived algorithm using locale or the Unicode Default Case Conversion algorithm.
  10. Return CodePointsToString(newCodePoints).

Code point mappings may be derived according to a tailored version of the Default Case Conversion Algorithms of the Unicode Standard. Implementations may use locale-sensitive tailoring defined in the file SpecialCasing.txt of the Unicode Character Database and/or CLDR and/or any other custom tailoring. Regardless of tailoring, a conforming implementation's case transformation algorithm must always yield the same result given the same input code points, locale, and target case.
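
The difference between default and tailored case conversion can be observed through String.prototype.toLocaleUpperCase and String.prototype.toLocaleLowerCase. The sketch below uses the Turkish tailoring from SpecialCasing.txt as an example. The sample strings and the results object are illustrative assumptions, and other tailorings are implementation-dependent.

```javascript
// Default versus Turkish-tailored mappings for the letter "i".
const results = {
  rootUpper: "i".toLocaleUpperCase("und"),  // "I": default case conversion
  englishUpper: "i".toLocaleUpperCase("en"), // "I": no tailoring applies
  turkishUpper: "i".toLocaleUpperCase("tr"), // "\u0130": capital I with dot above
  turkishLower: "I".toLocaleLowerCase("tr"), // "\u0131": dotless small i
};
console.log(results);

// Same code points, locale, and target case always give the same result.
console.log("i".toLocaleUpperCase("tr") === results.turkishUpper); // true
```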

Note
The case mapping of some code points may produce multiple code points, and therefore the result may not be the same length as the input. Because both toLocaleUpperCase and toLocaleLowerCase have context-sensitive behaviour, the functions are not symmetrical. In other words, s.toLocaleUpperCase().toLocaleLowerCase() is not necessarily equal to s.toLocaleLowerCase() and s.toLocaleLowerCase().toLocaleUpperCase() is not necessarily equal to s.toLocaleUpperCase().
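
Both effects described in this note can be seen with a few sample strings. This is an illustrative sketch in which an explicit "en" locale is passed (an assumption made so that the result does not depend on the default locale).

```javascript
// One code point can map to several: the result changes length.
const sharpS = "\u00DF"; // "ß"
const upper = sharpS.toLocaleUpperCase("en");
console.log(upper, upper.length); // SS 2

// The round trip is not symmetrical.
const roundTrip = upper.toLocaleLowerCase("en"); // "ss"
const direct = sharpS.toLocaleLowerCase("en");   // "ß"
console.log(roundTrip === direct); // false

// Context sensitivity: a word-final capital sigma lowercases to "ς".
const lowerSigma = "\u03A3\u0391\u03A3".toLocaleLowerCase("en"); // "σας"
console.log(lowerSigma);
```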

Copyright & Software License

Software License

All Software contained in this document ("Software") is protected by copyright and is being made available under the "BSD License", included below. This Software may be subject to third party rights (rights from parties other than Ecma International), including patent rights, and no licenses under such third party rights are granted under this license even if the third party concerned is a member of Ecma International. SEE THE ECMA CODE OF CONDUCT IN PATENT MATTERS AVAILABLE AT https://ecma-international.org/memento/codeofconduct.htm FOR INFORMATION REGARDING THE LICENSING OF PATENT CLAIMS THAT ARE REQUIRED TO IMPLEMENT ECMA INTERNATIONAL STANDARDS.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
  3. Neither the name of the authors nor Ecma International may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE ECMA INTERNATIONAL "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL ECMA INTERNATIONAL BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.