Lexical Analysis

The purpose of the lexical analyzer is to partition the input text, delivering a sequence of comments and basic symbols. Comments are character sequences to be ignored, while basic symbols are character sequences that correspond to terminal symbols of the grammar defining the phrase structure of the input (see Context-Free Grammars and Parsing of Syntactic Analysis).

A user must define the forms of comments and the forms of all basic symbols corresponding to non-literal terminal symbols of the grammar. Eli can deduce the form of a literal terminal symbol from the grammar specification.

The definition consists of one or more type-`gla' files. Each line of a type-`gla' file describes a set of character sequences. If a line begins with an identifier followed by a colon (:), then all of the character sequences described by the line are instances of the non-literal terminal symbol named by that identifier; otherwise they are comments.

Here is an example of a type-`gla' file:

HexInteger:  $0[Xx][0-9A-Fa-f]+
             $!  (auxEOL)
Identifier:  C_IDENTIFIER

The first line of this specification uses a regular expression to define a hexadecimal integer as a zero, followed by the letter X (either upper or lower case) and one or more hexadecimal digits represented in the usual way. In the second line, one form of comment is defined by a regular expression and the name of a C routine. The C routine will be invoked when the regular expression has been matched. This approach allows the user to define character sequences operationally when a declarative definition is tedious or does not support appropriate error reporting.
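The two kinds of definition described above can be illustrated outside Eli. The following is a minimal sketch in Python, not Eli notation and not the code Eli generates. It assumes that the leading `$` in the type-`gla' line merely introduces the regular expression, and that a routine such as auxEOL consumes the remainder of the line once `!` has been matched. The names match_hex_integer and skip_line_comment are hypothetical.

```python
import re

# Python analogue of the expression 0[Xx][0-9A-Fa-f]+ from the first line.
# Assumption: the leading `$` of the gla notation only introduces the
# regular expression and is not part of the pattern.
HEX_INTEGER = re.compile(r"0[Xx][0-9A-Fa-f]+")


def match_hex_integer(text, pos=0):
    """Return the hexadecimal integer starting at pos, or None (declarative)."""
    m = HEX_INTEGER.match(text, pos)
    return m.group(0) if m else None


def skip_line_comment(text, pos=0):
    """Return the index just past a '!' comment starting at pos (operational).

    Assumption: this mimics what a routine like auxEOL might do after the
    expression `!` has matched; the real C routine may differ in detail.
    If no comment starts at pos, pos is returned unchanged.
    """
    if not text.startswith("!", pos):
        return pos
    end = text.find("\n", pos)
    # Consume through the newline, or to end of input if there is none.
    return len(text) if end == -1 else end + 1
```

The first function relies entirely on a pattern, while the second decides where the sequence ends by running code, which is the distinction the paragraph above draws between declarative and operational definitions.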

Since certain lexical structures are common to many languages, Eli provides a library of definitions that can be invoked simply by giving their names. C_IDENTIFIER, in the third line, is such an invocation. The effect of the third line is to define the form of the basic symbol Identifier as that of an identifier in C: a letter or underscore followed by some sequence of letters, digits and underscores.
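As an illustration of the form just described, here is a short Python sketch of an equivalent pattern. It is not the library definition behind C_IDENTIFIER; the pattern is restricted to ASCII letters, which is an assumption of this sketch, and the name is_c_identifier is hypothetical.

```python
import re

# A letter or underscore, followed by any sequence (possibly empty) of
# letters, digits and underscores. ASCII only: an assumption of this sketch.
C_IDENTIFIER_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_c_identifier(text):
    """Return True if the whole string has the form of a C identifier."""
    return C_IDENTIFIER_PATTERN.fullmatch(text) is not None
```

For example, is_c_identifier("_tmp1") holds, while is_c_identifier("1abc") does not, because a digit may not appear in the first position.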

Chapter 1 defines the usage, form and content of specifications provided by the user as type-`gla' files. Those specifications may refer to canned descriptions, which are defined in Chapter 2. Chapter 3 presents the default processing of spaces, tabs and newlines and explains how to define other strategies. The treatment and meaning of literal terminal symbols are discussed in Chapter 4, and Chapter 5 explains how a generated lexical analyzer can be made insensitive to the case of letters. Complex lexical analysis problems may require modification of the behavior of the generated module; Chapter 6 discusses the possibilities.