The purpose of the lexical analyzer is to partition the input text, delivering a sequence of comments and basic symbols. Comments are character sequences to be ignored, while basic symbols are character sequences that correspond to terminal symbols of the grammar defining the phrase structure of the input (see Context-Free Grammars and Parsing of Syntactic Analysis).
A user must define the forms of comments and the forms of all basic symbols corresponding to non-literal terminal symbols of the grammar. Eli can deduce the form of a literal terminal symbol from the grammar specification.
The definition consists of one or more type-`gla' files. Each line of a type-`gla' file describes a set of character sequences. If a line begins with an identifier followed by a colon (:), then all of the character sequences described by the line are instances of the non-literal terminal symbol named by that identifier; otherwise they are comments.
Here is an example of a type-`gla' file:
HexInteger: $0[Xx][0-9A-Fa-f]+
  $! (auxEOL)
Identifier: C_IDENTIFIER
The first line of this specification uses a regular expression to define a hexadecimal integer as a zero, followed by the letter `x' or `X', followed by one or more hexadecimal digits. The second line does not begin with an identifier and a colon, so it describes comments: an exclamation point followed by the remainder of the line (which is consumed by the auxiliary scanner auxEOL). The third line defines identifiers by naming the canned description C_IDENTIFIER.
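The effect of the HexInteger pattern can be illustrated outside of Eli. Here is a minimal sketch in Python, assuming only that gla's `$'-prefixed notation denotes an ordinary regular expression; the names HEX_INTEGER and is_hex_integer are invented for this illustration:

```python
import re

# Translation of the gla pattern $0[Xx][0-9A-Fa-f]+ into Python
# regex notation (the leading `$' merely introduces a pattern in gla).
HEX_INTEGER = re.compile(r"0[Xx][0-9A-Fa-f]+")

def is_hex_integer(text):
    """Report whether the entire text is a hexadecimal integer:
    a zero, the letter x or X, then one or more hex digits."""
    return HEX_INTEGER.fullmatch(text) is not None

print(is_hex_integer("0x1A3f"))  # zero, x, hex digits: accepted
print(is_hex_integer("1A3f"))    # missing the leading 0x: rejected
```

Note that a generated lexical analyzer matches the longest prefix of the remaining input rather than testing a whole string, so this sketch only demonstrates the character sequences the pattern describes.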
Since certain lexical structures are common to many languages, Eli provides
a library of definitions that can be invoked simply by giving their names.
Chapter 1 defines the usage, form and content of specifications provided by the user as type-`gla' files. Those specifications may refer to canned descriptions, which are defined in Chapter 2. Chapter 3 presents the default processing of spaces, tabs and newlines and explains how to define other strategies. The treatment and meaning of literal terminal symbols are discussed in Chapter 4, and Chapter 5 explains how a generated lexical analyzer can be made insensitive to the case of letters. Complex lexical analysis problems may require modification of the behavior of the generated module; Chapter 6 discusses the possibilities.