The parsing technique used by SMIE does not allow tokens to behave differently in different contexts. For most programming languages, this manifests itself by precedence conflicts when converting the BNF grammar.
Sometimes, those conflicts can be worked around by expressing the grammar slightly differently. For example, for Modula-2 it might seem natural to have a BNF grammar that looks like this:
... (inst ("IF" exp "THEN" insts "ELSE" insts "END") ("CASE" exp "OF" cases "END") ...) (cases (cases "|" cases) (caselabel ":" insts) ("ELSE" insts)) ...
But this will create conflicts for "ELSE"
: on the one hand, the
IF rule implies (among many other things) that "ELSE" = "END"
;
but on the other hand, since "ELSE"
appears within cases
,
which appears left of "END"
, we also have "ELSE" > "END"
.
We can solve the conflict either by using:
... (inst ("IF" exp "THEN" insts "ELSE" insts "END") ("CASE" exp "OF" cases "END") ("CASE" exp "OF" cases "ELSE" insts "END") ...) (cases (cases "|" cases) (caselabel ":" insts)) ...
or
... (inst ("IF" exp "THEN" else "END") ("CASE" exp "OF" cases "END") ...) (else (insts "ELSE" insts)) (cases (cases "|" cases) (caselabel ":" insts) (else)) ...
Reworking the grammar to try and solve conflicts has its downsides, tho, because SMIE assumes that the grammar reflects the logical structure of the code, so it is preferable to keep the BNF closer to the intended abstract syntax tree.
Other times, after careful consideration you may conclude that those
conflicts are not serious and simply resolve them via the
resolvers argument of smie-bnf->prec2
. Usually this is
because the grammar is simply ambiguous: the conflict does not affect
the set of programs described by the grammar, but only the way those
programs are parsed. This is typically the case for separators and
associative infix operators, where you want to add a resolver like
'((assoc "|"))
. Another case where this can happen is for the
classic dangling else problem, where you will use '((assoc
"else" "then"))
. It can also happen for cases where the conflict is
real and cannot really be resolved, but it is unlikely to pose a problem
in practice.
Finally, in many cases some conflicts will remain despite all efforts to
restructure the grammar. Do not despair: while the parser cannot be
made more clever, you can make the lexer as smart as you want. So, the
solution is then to look at the tokens involved in the conflict and to
split one of those tokens into 2 (or more) different tokens. E.g., if
the grammar needs to distinguish between two incompatible uses of the
token "begin"
, make the lexer return different tokens (say
"begin-fun"
and "begin-plain"
) depending on which kind of
"begin"
it finds. This pushes the work of distinguishing the
different cases to the lexer, which will thus have to look at the
surrounding text to find ad-hoc clues.