The default behavior of an Eli-generated lexical analyzer is to treat each
ASCII character as an entity distinct from all other ASCII characters.
This behavior is inappropriate for applications that do not distinguish
upper-case letters from lower-case letters in certain contexts.
For example, a Pascal compiler ignores the case of letters in identifiers
and keywords, but distinguishes them in strings.
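Thus Pascal identifiers that differ only in the case of their letters are
identical, but strings that differ only in case, such as
'mystring' and a variant with some letters capitalized, are different.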
Case insensitivity is reflected in the identity of character sequences.
In other words, two character sequences that differ only in the case of
their letters (for example, myid and a variant with some capital letters)
are considered to be identical if and only
if the generated processor is insensitive to the case of letters.
Two character sequences are identical as far as the remainder of the
processor is concerned if they have the same classification and their
values are equal (see Specifications).
Since the classification and value are determined by the token processor,
it is the token processor that must implement case insensitivity.
Two conditions must be met if a processor is to be insensitive to case:
1. A token processor must be available that maintains a table of
   character sequences in which all letters are of one case.
2. The specification of each case-insensitive character sequence
   must invoke such a token processor.
The token processor mkidn maintains a table of character sequences
and provides the same classification and value for
identical character sequences.
By default, mkidn treats upper-case letters and lower-case letters as
distinct characters.
This behavior is controlled by an exported variable, dofold
(see Unique Identifier Management of Library Reference Manual):
if dofold=0, character sequences are entered into the table exactly as they
are given to mkidn; otherwise all letters in the sequence are
converted to upper case before the sequence is entered into the table.
Although user-defined code could alter the value of dofold on the basis of
context, it is normally constant throughout the execution of the processor.
To generate a processor in which dofold=1, specify the parameter
+fold in the request
(see fold -- Make the Processor Case-Insensitive of Products and Parameters Reference Manual).
If this parameter is not specified in the request, Eli will produce a
processor in which dofold=0.
The value set by mkidn is the (unique) index of
the transformed character sequence in the table.
Thus, when dofold=1, retrieving the sequence later by means of that value
yields the original sequence with all lower-case letters replaced
by their upper-case equivalents.
Since literal symbols are recognized exactly as they stand in the grammar,
they are case sensitive by definition.
For example, if a grammar for Pascal contains the literal symbol
'begin' then the generated processor will recognize only the character
sequence begin as an instance of that literal symbol.
This behavior could be changed by redefining the literal symbol as a
nonliteral symbol, say BEGIN, and providing the following
specification in a type-`gla' file:
BEGIN: $[Bb][Ee][Gg][Ii][Nn] [mkidn]
If the number of literal symbols to be treated as case-insensitive is
large, this is a very tedious and error-prone approach.
It also distorts the grammar by converting literal terminal symbols
to non-literal terminal symbols.
To solve this problem, Eli allows the user to specify a set of literal
symbols that should be placed into the table used by mkidn, along with
their classification codes, at the time the generated lexical analyzer is
initialized. If the
+fold parameter is also specified, all lower-case letters in
these symbols will be replaced by their upper-case equivalents before the
symbol is placed into the table.
The desired behavior is then obtained by invoking mkidn after
recognizing the appropriate character sequence in the input text.
The set of literal symbols to be placed into the table is specified by
giving a sequence of regular expressions in a type-`gla' file, and
then deriving the
:kwd product from that file
(see kwd -- Recognize Specified Literals as Identifiers of Products and Parameters Reference Manual).
The regular expressions describe the form of the literal symbols in the
grammar, not the input character sequences to be recognized.
Suppose, for example, that a Pascal grammar specified all keywords as
literal symbols made up of lower-case letters:
'while' Expression 'do' Statement /
A type-`gla' file describing the form these symbols take in the
grammar would consist of a single regular expression describing
sequences of lower-case letters.
If the name of that file was `PascalKey.gla' then the user could tell
Eli to initialize mkidn's table with all of the keywords by
including, in a type-`specs' file, a line that derives the
:kwd product from `PascalKey.gla'.
In Pascal, keywords have the form of identifiers in the input text.
Therefore the canned description
PASCAL_IDENTIFIER suffices to
recognize both identifiers and keywords.
It invokes mkidn to obtain the classification
and value of the sequence recognized by the regular expression.
Since mkidn's table has been initialized with the character
sequences for the literal keyword symbols and their classifications,
keywords will be recognized as the appropriate literal symbols.
Note that the :kwd product and the
+fold parameter are independent of each other.
Thus, in order to make the generated lexical analyzer accept Pascal
keywords with arbitrary case, the user must both provide the :kwd
specification and derive with the +fold parameter.