The concrete syntax is provided by the user in files of type `con'.
Since EBNF constructs are allowed in these files, they are first translated
into their strict BNF equivalents before being processed by Maptool
(see Using extended BNF to describe more complex rules). The abstract syntax is extracted from the
The remainder of this section will discuss how Maptool deduces the correspondence between the two syntaxes, the use of `.map' files to influence the mapping process, and some usage hints.
File of type `.map' can be provided by the user to influence the way in which certain rules are matched. The syntax of `.map' files can be found with other grammar description towards the end of this document (see Grammars for the Specification Files).
There are currently three ways in which the mapping can be affected. The first are symbolic equivalence classes, which group together symbols that have the same semantic meaning. The second method is to map specific rules. Using this method, concrete rules can be rewritten and/or reordered to match a specific abstract rule. The third method controls the elimination of literal chain rules.
Symbolic equivalence classes are used to group together symbols
appearing in the concrete syntax because the semantics of the symbols
are equivalent. As a result, a single symbol can be used to represent
all of the members of the symbolic equivalence class in the abstract
syntax. This representative symbol can either be one of the concrete
symbols or a new symbol altogether. Symbolic equivalence classes are
specified in files of type `map'. A series of symbolic
equivalences must be preceded by the keyword
MAPSYM Expr ::= Primary Factor .
Application of symbolic equivalence classes to rules in the concrete syntax is done before the matching process begins. Symbolic equivalence classes can only be created for symbols which are either all nonterminals or all terminals (see How to describe a context-free grammar). An error message will also be issued if a symbolic equivalence class specification includes abstract syntax symbols on the right hand side, since each abstract syntax symbol represents its own equivalence class.
For backward compatibility with previous releases of Eli, symbolic equivalence classes may also be specified in files of type `sym'.
Rule mapping allows users to rewrite a concrete rule for the purposes of matching it to a specific abstract rule. This is useful in cases where two syntactically different constructs are semantically equivalent. Consider the following expression language with bound identifiers:
Computation: LetExpr / WhereExpr . LetExpr: 'let' Definitions 'in' Expr . WhereExpr: Expr 'where' Definitions .
In this example,
MAPSYM BoundExpr ::= LetExpr WhereExpr .
The abstract rule that we can use to represent the two constructs is:
RULE: BoundExpr ::= Definitions Expr END;
Finally, we must use rule mapping specifications to rewrite the two concrete rules to match the abstract rule:
MAPRULE LetExpr: 'let' Definitions 'in' Expr < $1 $2 > . WhereExpr: Expr 'where' Definitions < $2 $1 > .
When rule matching proceeds, the concrete rule is seen in its rewritten form. An abstract syntax rule must exist in a `.lido' specification that corresponds to the rule mapping specification given. Note that the use of rule mapping makes it impossible to perform attribute computations during tree construction (see Constraints on grammar mapping).
The mapping process normally does not include literal chain rules in the
complete abstract syntax unless they appear in the user-supplied
abstract syntax (see Complete generated concrete and abstract syntaxes). Sometimes it is desirable to
preserve literal chain rules even if the user has not included them in
the abstract syntax. To force literal chain rules to be included in the
abstract syntax, use the
Care should be taken when using
Maptool begins by matching any
While the most obvious benefit to having Maptool deduce syntax fragments from one syntax and place them in the other is to reduce the amount of typing required, the more important advantage is the support it gives for incremental development. It allows the user to only specify those portions of the syntax with which they are concerned at the moment.
Chain rules have different behavior than other rules during the matching process. Descriptions for three different kinds of chain rules are given here to assist in the explanations given in the remainder of this section:
Based on the above definition for normal chain rules, we define coercions
between symbols. A symbol
The intermediate nonterminal symbols that are encountered as new concrete rules are added to the set may not appear on the right hand side of other concrete rules.
If Maptool doesn't find any concrete rules to match a
RULE: Program LISTOF Declaration | Statement END;
Maptool would generate the following:
Program: LST_Program . LST_Program: LST_Program Declaration . LST_Program: LST_Program Statement . LST_Program: .
This specifies zero or more occurrences of
There is one other important thing to note about the
This is the case when the rules matching the
Program: Program Declaration . Program: Program Statement . Program: .
As you can see, the root of the
Users should be aware that it is possible for the addition of this chain
rule to cause LALR(1) conflicts for the parsability of the concrete syntax
that do not appear in the absence of the
For example, consider the following abstract rules:
RULE: Declaration ::= IdDef Type END; RULE: IdDef ::= Identifier END;
The following concrete rule will match the first of the above abstract
rules, because of the coercion defined between
Declaration: Identifier Type .
The reason for doing this is to distinguish semantically between
It is possible for Maptool to detect multiple possible matching abstract rules for a single concrete rule. Maptool signals an error in this case that must be fixed by changing the grammar to disambiguate the contexts.
After rule matching is complete, unmatched concrete rules, except trivial chain rules and literal chain rules (see Chain rule definitions) are added to the abstract syntax. The reason for this is that trivial chain rules are meaningless in the abstract syntax and literal chain rules are only meaningful if they have attribute computations associated with them, in which case they would already have been specified as part of the abstract syntax.
Sometimes it is desirable to include literal chain rules in the abstract
syntax even when the user has not explicitly included them there. A
typical situation where this occurs is when generating output conforming
to the concrete syntax using the Idem tool (see Abstract Syntax Tree Unparsing). In this situation the output must contain all
literals hence the literal chain rules must be in the abstract syntax so
that Idem can generate output patterns for them. To preserve the
literal chain rules in the abstract syntax use the
Unmatched abstract rules are included in the concrete syntax except in the following instances:
Users can use the
The generation of the parsing grammar (the input to the parser) may be
In order to deal with this, Maptool must sometimes inject generated
chain rules into the parsing grammar to which tree building actions can
be attached. These injected chain rules may cause the parsing grammar
to exhibit LALR(1) conflicts. If so, an error will be reported to
indicate that the
In trying to resolve such a conflict, it is useful to use the
This section begins by describing typical patterns of syntax development. This is followed by two more specific examples of how to use the mapping techniques described in the previous sections.
When developing a translator for an existing language, the complete concrete syntax is typically already available. In these cases, it is advantageous to start with the complete concrete syntax and add symbolic equivalences and rule mapping specifications to suit the attribute computations as they are being developed.
On the other hand, when designing a new language, it is easier to start work by specifying attribute computations and adding concrete syntax rules as necessary to resolve issues of precedence, associativity, and other parsing ambiguities.
When errors relating to the syntax appear, it is strongly recommended that
the first course of action be to look at the complete generated versions of the
syntaxes by using the
The LIGA attribute grammar system allows users to specify that the first pass of computations are to be performed as the abstract syntax tree is being built. This is specified either by an option given in a LIGA control specification see Order Options of LIGA Control Language or by using an additional keyword in an attribute grammar computation see Computations of LIDO - Reference Manual.
Combining computations with tree construction, however, requires that the tree be constructed in strict left-to-right and bottom-to-top order. In the presence of more advanced grammar mappings, it is not possible to maintain this strict ordering. For this reason, Maptool generates the LIGA control directive:
ORDER: TREE COMPLETE ;
when it detects that one of these grammar mappings is required. The control directive indicates that the tree should be constructed completely before any computations take place.
The grammar mappings which cause Maptool to emit these directives are the use of chain rules in the abstract syntax that do not exist in the concrete syntax (see Matching remaining rules) and any use of rule mapping (see Specifying rule mappings). Aside from symbolic mappings (see Specifying symbolic equivalence classes) and the use of LISTOF constructs, the generated concrete and abstract syntaxes need to be identical in order to allow computations to take place during tree construction.
Literal terminals often distinguish phrases whose structures are identical
except for the particular literal terminal.
For example, in a normal arithmetic expression the phrase describing
addition and the phrase describing subtraction are identical except for
When phrases have identical structure except for one or more literals, the tree computations carried out at the nodes corresponding to those phrases are often identical except for some parameter that depends on the particular literal. It is then useful to abstract from the distinct literals, obtaining a single phrase with which to associate the computation and a set of phrases with which to associate the parameter evaluation. The key point here is that in many cases the computation will apply to a wide variety of translation problems, whereas the particular set of literals characterizes a single translation problem. By abstracting from the distinct literals, the computation can be reused.
To abstract from a specific literal, simply replace that literal with a nonterminal and add a production that derives the literal from that nonterminal. This added production represents the phrase with which the parameter evaluation would be associated. The computation for the phrase in which the literal was replaced by the nonterminal will now obtain the parameter value from the corresponding child, rather than evaluating it locally.
For an example of abstraction use, see Mapping expressions for overload resolution.
It is quite common for a single operator to have different meanings that
depend on the types of its operands.
For example, in Pascal the operator
In order to reuse the tree computation to resolve overloading,
abstract from the particular set of literals
that represent the operators of the language.
Then define equivalence classes in which every nonterminal representing an
expression is replaced by
Expr: Expr Binop Expr. Expr: Unop Expr. Expr: Identifier. Expr: Integer. ...
As an example of the process, consider a language with integer and Boolean expressions in the style of Pascal.
The literals that represent operators in this language are
Addop: '+' / '-' / 'or' . Mulop: '*' / '/' / 'div' / 'mod' / 'and' . Sign: '+' / '-' . Notop: 'not' .
These productions abstract from the literals, and embody the information about the precedence and association (all operators are left-associative) needed to determine the phrase structure.
Using these new nonterminals, define the phrase structure of an expression:
SimpleExpression: Sign Sum / Sum . Sum: Sum Addop Term / Term . Term: Term Mulop Factor / Factor . Factor: Notop Factor / Primary . Primary: Integer / Id / 'true' / 'false' / '(' SimpleExpression ')' .
All of the dyadic operators fall into the same equivalence class, which
should be represented by the symbol
MAPSYM Binop ::= Addop Mulop . Unop ::= Sign Notop . Expr ::= SimpleExpression Sum Term Factor Primary .