
#635 — 11.1: "[Lexical goal InputElementRegExp]" circularity


In 11.1 "Primary Expressions",
under "Syntax",
in the production for PrimaryExpression,
one of the many alternatives is:
[Lexical goal InputElementRegExp] RegularExpressionLiteral

According to 5.1.6, this means that the RegularExpressionLiteral token must be lexically recognized using the goal symbol InputElementRegExp. Everywhere else in the production, the absence of such a phrase indicates that the default lexical goal symbol (i.e., InputElementDiv) is used.

But this is circular. In order to know which lexical goal symbol to use to get the next token, you need to already know whether the next token is a RegularExpressionLiteral.

Formerly, the choice of lexical goal symbol was determined by "syntactic grammar context" (specifically, whether it permitted a leading division or division-assignment operator). So, in the context of "ready to parse a PrimaryExpression", the division operators would not be permitted, and so the next token would be obtained via the InputElementRegExp goal symbol. Note that this goal symbol would be chosen based solely on the (left-)context, with no knowledge of any input to the right of the current position. I don't think there's any reason to depart from that.
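
As a rough illustration of choosing the goal from left context alone, here is a minimal Python sketch. The names `choose_goal` and `next_token` and the two simplified patterns are assumptions made for this example; they are not taken from the specification.

```python
import re

# Simplified, hypothetical patterns; the real lexical grammar is much larger.
REGEXP_LITERAL = re.compile(r"/(?:[^/\\\n]|\\.)+/[a-z]*")
DIV_PUNCTUATOR = re.compile(r"/=?")

def choose_goal(division_permitted):
    """Pick the goal from the left context only (no lookahead at the input)."""
    return "InputElementDiv" if division_permitted else "InputElementRegExp"

def next_token(source, pos, goal):
    """Scan one token starting at pos under the given lexical goal."""
    pattern, kind = {
        "InputElementRegExp": (REGEXP_LITERAL, "RegularExpressionLiteral"),
        "InputElementDiv": (DIV_PUNCTUATOR, "DivPunctuator"),
    }[goal]
    match = pattern.match(source, pos)
    if match is None:
        raise SyntaxError(f"no {kind} at position {pos}")
    return (kind, match.group())
```

For the input `/ab/g`, scanning with `choose_goal(False)` gives one RegularExpressionLiteral covering the whole text, while scanning with `choose_goal(True)` gives a single DivPunctuator `/`. In both cases the goal is fixed before any input to the right is examined.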


corrected in editor's draft.

moved lexical goal annotation to
MemberExpression : PrimaryExpression

Also tweaked the use of the lexical goal annotation in the specification of TemplateStrings.


On a related note, section 7 says:
"The InputElementDiv goal symbol is the default goal symbol and is used
in those syntactic grammar contexts where a leading division (/) or
division-assignment (/=) operator is permitted."
But this is not entirely true any more. It is used in *some* of those contexts, but not all, since now InputElementQuasiTail must be used in some of those contexts.

For instance, consider the context after having processed this much input:
x = `foo${ a
Certainly, division and division-assignment operators are permitted in that context (e.g.
x = `foo${ a / 2 }`
is a valid continuation of the given prefix), but InputElementDiv should not be used, because if the continuation happens to be:
x = `foo${ a }`
InputElementDiv would recognize the '}' as RightBracePunctuator, which is not syntactically valid. Instead, InputElementQuasiTail should be used.
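
The difference can be sketched in a few lines of Python. This is only an illustration: `scan_right_brace` and the simplified continuation pattern are hypothetical, and the sketch models just the two outcomes discussed here.

```python
import re

# Hypothetical simplification: a template continuation is '}' followed by
# literal characters up to the closing backquote or the next '${'.
QUASI_CONTINUATION = re.compile(r"\}[^`$]*(`|\$\{)")

def scan_right_brace(source, pos, goal):
    """Tokenize the '}' at pos under the given lexical goal."""
    if source[pos] != "}":
        raise ValueError("expected '}' at pos")
    if goal == "InputElementQuasiTail":
        match = QUASI_CONTINUATION.match(source, pos)
        if match is None:
            raise SyntaxError("no QuasiMiddle or QuasiTail at pos")
        kind = "QuasiTail" if match.group(1) == "`" else "QuasiMiddle"
        return (kind, match.group())
    # Under InputElementDiv the brace is an ordinary punctuator.
    return ("RightBracePunctuator", "}")
```

Applied to the '}' in the second continuation above, the InputElementDiv goal produces a RightBracePunctuator, whereas InputElementQuasiTail produces a QuasiTail covering '}' and the closing backquote.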


To address the above points, I think you should:
(a) drop the "[Lexical goal]" notation, and
(b) change the paragraph in section 7 to (something like):

If the context allows RegularExpressionLiteral, use InputElementRegExp.

If the context allows QuasiMiddle or QuasiTail, use InputElementQuasiTail.

Otherwise, use InputElementDiv.

(And you can note that the first two possibilities are [or should be] mutually exclusive.)
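
A minimal Python sketch of that rule, assuming the parser can report the set of token kinds permitted in the current context (the function name `select_lexical_goal` and the set-based interface are hypothetical):

```python
def select_lexical_goal(allowed_tokens):
    """Choose the lexical goal from the token kinds the context permits."""
    allowed = set(allowed_tokens)
    allows_regexp = "RegularExpressionLiteral" in allowed
    allows_quasi = bool(allowed & {"QuasiMiddle", "QuasiTail"})
    if allows_regexp and allows_quasi:
        # The first two cases are expected to be mutually exclusive.
        raise ValueError("context allows both a regexp and a quasi continuation")
    if allows_regexp:
        return "InputElementRegExp"
    if allows_quasi:
        return "InputElementQuasiTail"
    return "InputElementDiv"
```

The decision depends only on what the context permits, never on the input that follows.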


(In reply to comment #1)
> moved lexical goal annotation to
> MemberExpression : PrimaryExpression

This doesn't eliminate the circularity, merely relocates it. (In order to know which lexical goal symbol to use to get the next token, you'd have to already know whether the following input is a PrimaryExpression [vs a FunctionExpression or MemberExpression].)


(In reply to comment #4)
> (In reply to comment #1)
> > moved lexical goal annotation to
> > MemberExpression : PrimaryExpression
>
> This doesn't eliminate the circularity, merely relocates it. (In order to know
> which lexical goal symbol to use to get the next token, you'd have to already
> know whether the following input is a PrimaryExpression [vs a
> FunctionExpression or MemberExpression].)

I don't think there is actually a circularity problem. The disagreement is probably because we have different understandings of the meaning of alternative lexical goal symbols. Let's look at your example:
x = `foo${ a / 2 }`
If the current context is
x = `foo${ a
then we must be deep in the expression grammar (we've just recognized "a" as a PrimaryExpression), and we used InputElementDiv to tokenize the next character, which would have been / recognized as a DivPunctuator. The expression parse would continue as expected, popping back to MultiplicativeExpression. But let's say that instead the next token was RightBracePunctuator. That token doesn't appear in the expression grammar at this position, so we have a complete Expression, and the parse would pop all the way back out of Expression to the production:
TemplateLiteral : TemplateHead Expression [Lexical goal InputElementTemplateTail] TemplateSpans

(note the StringTemplate grammar has changed in various ways since the last release draft)

This production says that to proceed past the Expression we need to look at the next token using the InputElementTemplateTail lexical goal symbol. So we retokenize starting at the current accepted input point (even though we had already tokenized using InputElementDiv). This time we get the TemplateTail }` as the next token and that will successfully match within TemplateSpans.
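
The retokenizing behaviour described here might be sketched as follows in Python. Both functions are hypothetical and cover only the '}' case from the example; they are not a description of any actual implementation.

```python
def scan(source, pos, goal):
    """Hypothetical one-token scanner covering only the '}' case."""
    if goal == "InputElementTemplateTail" and source.startswith("}`", pos):
        return ("TemplateTail", "}`")
    if goal == "InputElementDiv" and source.startswith("}", pos):
        return ("RightBracePunctuator", "}")
    raise SyntaxError(f"no token at {pos} under {goal}")

def next_token_with_retry(source, pos, acceptable_kinds, annotated_goal):
    """Scan with the default goal first; if the parser cannot use that
    token, rescan from the same position with the annotated goal."""
    scans = 1
    token = scan(source, pos, "InputElementDiv")
    if token[0] not in acceptable_kinds:
        token = scan(source, pos, annotated_goal)
        scans += 1
    return token, scans
```

In this sketch the '}' is scanned twice: once as a RightBracePunctuator that the expression grammar cannot use, and once more as a TemplateTail.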

Do you see any issues with this interpretation? What would need to be said in 5.1.6 to make this interpretation clearer?

I like the [Lexical goal] annotation because it explicitly identifies the contexts where alternative lexical goals must be used. Previously, that was left to the reader to figure out for themselves.


(In reply to comment #5)
>
> I don't think there is actually a circularity problem.
> ...
> But let's say that instead the next token was RightBracePunctuator.
> That token doesn't appear in the expression grammar at this position,
> so we have a complete Expression

If the lookahead token isn't valid in the current context, a conventional parser would report a syntax error.

> and the parse would pop all the way back out of Expression to the
> production:
> TemplateLiteral : TemplateHead Expression [Lexical goal
> InputElementTemplateTail] TemplateSpans

(Your wording ("pop back out to a production") suggests that you're imagining an LL(k) or recursive-descent parser, which isn't appropriate to the ECMAScript grammar.)

> This production says that to proceed past the Expression we need to look at
> the next token using the InputElementTemplateTail lexical goal symbol. So we
> retokenize starting at the current accepted input point (even though we had
> already tokenized using InputElementDiv).

Ah, well, retokenizing is certainly an odd thing to expect of a parser/lexer. And unnecessary, since you could have done the correct tokenization in the first place (i.e., using InputElementTemplateTail).

> This time we get the TemplateTail }` as the next token and that will
> successfully match within TemplateSpans.
>
> Do you see any issues with this interpretation?

In addition to the ones given above, there's the fact that the model of parsing/lexing that you have in mind is different from what's described in previous drafts/editions.

Moreover, you didn't actually address my claim of circularity. (My example wasn't there to show circularity, but to show the no-longer-correctness of a statement in clause 7.) Circularity arises when different right-hand sides of a production begin with different (explicit or implicit) "lexical goals" (as happens for PrimaryExpression): the correct choice of lexical goal symbol depends on what's next, but you can't know what's next until you tokenize it. Here, you can't "pop back to a production" where there's only one "next" lexical goal.

(At this point, you might suggest trying them both and going with whatever's valid. But that's another complication, and another step away from a conventional parsing model.)

And it's all unnecessary, when you could simply adopt my suggestion in comment #3.

> What would need to be said in 5.1.6 to make this interpretation clearer?

Well, you'd probably have to describe your parser/lexer model in more detail. Of course, I'm not suggesting you do so, I'm suggesting that you not use that model.

> I like the [Lexical goal] annotation because it explicitly identifies the
> contexts where alternative lexical goals must be used.

I can understand the attraction, but I think it's misguided. You appear to be equating "a syntactic context" with "a point in a production", but in general, a syntactic context corresponds to many points in many productions. (Think of a state in the LR automaton.)

> Previously, that was left to the reader to figure out for themselves.

I think it would suffice to add Notes in a couple of Syntax sections. E.g., in 11.1, remind the reader that in any context where RegularExpressionLiteral is allowed (and thus, where PrimaryExpression is allowed, etc.), the next token must be found using InputElementRegExp as the lexical goal symbol.

In summary, I believe that the introduction of "[Lexical goal]" notations unnecessarily complicates the parsing model needed to support/use them, and thus actually makes it harder for the reader to figure out.


fixed in rev10, Sept. 27 2012 draft


(Maybe it was fixed in rev10, but it isn't fixed in rev33.)

(A)
In 12.3 "Left-Hand-Side Expressions", the production for MemberExpression
has a lexical goal of InputElementRegExp at the start of its first RHS,
and (implicitly, according to 5.1.5) a lexical goal of InputElementDiv at the start of all its other RHSs. Clearly, this is a conflict as to which lexical goal to use in this context (the start of a MemberExpression).

And in 14.4 "Generator Function Definitions", consider the production for YieldExpression. After a 'yield' keyword, what lexical goal should be used to get the next token? RHS #2 says InputElementRegExp, RHS #1 says nothing, so presumably the default InputElementDiv should be used. Again, a conflict.

(Please note that I'm not saying there's something ambiguous or unparsable about the grammar itself. I'm just saying that the "Lexical goal" annotations don't make sense.)
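
To make the conflict concrete, here is a small Python sketch that compares the goals claimed by each RHS at a shared position. The list encoding (one entry per RHS, `None` meaning "no annotation") and the count of unannotated alternatives for MemberExpression are illustrative assumptions, not a transcription of the draft.

```python
DEFAULT_GOAL = "InputElementDiv"

def claimed_goals(annotations):
    """Return the set of lexical goals the alternatives claim at one position.

    annotations: one entry per RHS; None means no [Lexical goal] annotation,
    so the default goal applies.
    """
    return {goal or DEFAULT_GOAL for goal in annotations}

def has_conflict(annotations):
    """True when the alternatives disagree about the goal for the next token."""
    return len(claimed_goals(annotations)) > 1

# Start of MemberExpression: first RHS annotated, the others not.
member_expression_start = ["InputElementRegExp", None, None]
# After 'yield': RHS #1 says nothing, RHS #2 says InputElementRegExp.
after_yield_keyword = [None, "InputElementRegExp"]
```

Both positions claim InputElementRegExp and InputElementDiv at once, which is the conflict described above.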


(B)
In 11 "ECMAScript Language: Lexical Grammar", this sentence still appears:
The InputElementDiv goal symbol is the default goal symbol and is used
in those syntactic grammar contexts where a leading division (/) or
division-assignment (/=) operator is permitted.

This sentence is not true: there are contexts permitting division and
division-assignment in which InputElementDiv should not be used and
InputElementTemplateTail *must* be used.


My proposed solution to both of the above is still comment #3 (changing "Quasi" to "Template", of course).


Sold. Got rid of the [Lexical goal] annotation and updated the clause 11 language as suggested.

fixed in rev34 editor's draft


Thanks!

But mulling over the YieldExpression example, I think I've found a deeper problem, which I'll raise separately.


fixed in rev34