Title: Compilation of Imperative, Functional, Logical and Object Oriented Languages
Chapter 5: Compilation of Imperative, Functional, Logical and Object-Oriented Languages
Compilation of Imperative, Functional, Logical and Object-Oriented Languages
- The compilation of imperative, functional, logical and object-oriented languages differs in nature. We will explore all four categories:
- Compilation of Imperative Languages
- The P machine Architecture
- Compilation of Functional Languages
- Compilation of Logic Programming Languages
- Compilation of Object Oriented Languages
Compilation of Imperative Languages
- Imperative programming languages possess the following constructs and concepts, which can be mapped onto the constructs, concepts and instruction sequences of abstract or real computers.
- Variables are containers for data objects whose contents (values) may be changed during the execution of the program. The values are changed by the execution of statements such as assignments.
- Expressions are terms formed from constants, names and operators, which are evaluated during execution.
- Explicit specification of the control flow. The branch instruction goto, which exists in most imperative programming languages, can be compiled directly into the unconditional branch instruction of the target machine.
The P Machine Architecture
- The (abstract) P machine was developed to make the Zurich implementation of Pascal portable.
- Anyone wishing to implement Pascal on a real computer had only to write an interpreter for the instructions of this abstract machine.
- The Pascal compiler, written in Pascal and compiled into P-code, could then be run on the real computer.
- The P machine can be implemented using a stack, memory and so on.
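The idea of interpreting abstract-machine instructions can be illustrated with a small interpreter. The following Python sketch is a toy stack machine loosely modelled on P-code mnemonics (ldc, lod, sto, ujp); the exact instruction set and encoding here are simplifying assumptions, not the actual P machine.

```python
# Toy stack-machine interpreter in the spirit of the P machine.
# The instruction set is illustrative, not actual P-code.

def run(program, store_size=8):
    store = [0] * store_size   # main store holding variable values
    stack = []                 # evaluation stack
    pc = 0                     # program counter
    while pc < len(program):
        op, *args = program[pc]
        pc += 1
        if op == "ldc":        # load a constant onto the stack
            stack.append(args[0])
        elif op == "lod":      # load the variable at the given address
            stack.append(store[args[0]])
        elif op == "sto":      # store top of stack at the given address
            store[args[0]] = stack.pop()
        elif op == "add":      # add the two topmost stack values
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":      # multiply the two topmost stack values
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "ujp":      # unconditional jump (cf. goto)
            pc = args[0]
        else:
            raise ValueError(f"unknown instruction {op}")
    return store

# Compiled form of: a = 2; b = a * a + 1  (a at address 0, b at address 1)
code = [("ldc", 2), ("sto", 0),
        ("lod", 0), ("lod", 0), ("mul",), ("ldc", 1), ("add",), ("sto", 1)]
```

Running `run(code)` executes the compiled form of `a = 2; b = a * a + 1` and leaves 2 at address 0 and 5 at address 1.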
Compilation of Functional Languages
Functional programming languages originated with LISP. LaMa is also a functional programming language. Imperative languages have (at least) two worlds: the world of expressions and the world of statements. Expressions provide values; statements alter the state of variables or determine the flow of control. Functional languages contain only expressions, and the execution of a functional program involves the evaluation of the associated program expression, which defines the program result. Its evaluation may also involve the evaluation of many other expressions, if an expression calls other expressions via a function application; however, there is no explicit statement-defined control flow.
Compilation of Functional Languages Contd.
A variable in a functional program identifies an expression; unlike in imperative languages, it does not identify one or more storage locations. Its value cannot change as a result of the execution of the program; the only possibility is the reduction of the expression it identifies to its value. The MaMa machine architecture was introduced to compile the LaMa language.
Compilation of Logic Programming Languages
Three different terminologies are used in discussions of logic programs. When programming is involved, we speak of procedures, alternatives of procedures, calls, variables, and so on. When explaining the logical foundations, we use words such as variable, function and predicate symbols, terms, atomic formulae, and so on. Finally, terms such as literal, Horn clause, unification and resolution come from the mechanization of logic in automated theorem-proving procedures. The WiM machine architecture was introduced to compile the Prolog language.
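Unification, mentioned above as the operation borrowed from automated theorem proving, can be sketched compactly. The following Python code is an illustrative, simplified unifier: the term representation (variables as capitalized strings, compound terms as functor/argument tuples) is an assumption for this sketch, and the occurs check of a full implementation is omitted.

```python
# Simplified first-order unification sketch.
# Variables: strings starting with an uppercase letter.
# Compound terms: (functor, [arguments]) tuples; constants: lowercase strings.

def walk(t, subst):
    # Follow variable bindings until an unbound variable or non-variable term.
    while isinstance(t, str) and t[0].isupper() and t in subst:
        t = subst[t]
    return t

def unify(a, b, subst=None):
    # Return an extended substitution unifying a and b, or None on failure.
    subst = dict(subst or {})
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if isinstance(a, str) and a[0].isupper():
        subst[a] = b                       # bind variable a (no occurs check)
        return subst
    if isinstance(b, str) and b[0].isupper():
        subst[b] = a                       # bind variable b
        return subst
    if (isinstance(a, tuple) and isinstance(b, tuple)
            and a[0] == b[0] and len(a[1]) == len(b[1])):
        for x, y in zip(a[1], b[1]):       # unify arguments pairwise
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return None                            # functor clash: not unifiable
```

For example, unifying f(X, b) with f(a, Y) binds X to a and Y to b.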
Compilation of Object-Oriented Languages
Software systems are becoming increasingly complex and large. Thus, there is a growing need to make the development of such systems more efficient and more transparent. The ultimate objective is to construct software systems, like present-day hardware systems (e.g. cars, washing machines), from ready-made standard building blocks. Attempts to progress towards this objective cover the following areas (among others):
- Modularization
- Reusability of modules
- Extensibility of modules
- Abstraction
Object-oriented languages afford new possibilities in these areas. Thus, object orientation is viewed as an important paradigm in relation to the management of the complexity of software systems.
The Structure of Compilers
Compilers for high-level programming languages are large, complex software systems. The development of large software systems should always begin with the decomposition of the overall system into subsystems (modules) with a well-defined and understood functionality. The division used should also involve sensible interfaces between the modules. The compiler structure described in what follows is a conceptual structure; that is, it identifies the subtasks of the compilation of a source language into a target language and specifies possible interfaces between the modules implementing these subtasks. The real module structure of the compiler will be derived from this conceptual structure later.
The Structure of Compilers Contd.
The first coarse structuring of the compilation process is the division into an analysis phase and a synthesis phase. In the analysis phase, the syntactic structure and some of the semantic properties of the source program are computed. The semantic properties that can be computed by a compiler are called the static semantics. This includes all semantic information that can be determined solely from the program in question, without executing it with input data. The results of the analysis phase comprise either messages about syntactic or semantic errors in the program (that is, a rejection of the program) or an appropriate representation of the syntactic structure and the static semantic properties of the program. This phase is (ideally) independent of the properties of the target language and the target machine. The synthesis phase of a compiler takes this program representation and converts it (possibly in several steps) into an equivalent target program.
Compiler Subtasks
The compilation process decomposes into a sequence of sub-processes. Each sub-process receives a representation of the program and produces a further representation of a different type, or of the same type but with modified content. We shall now follow the sequence of sub-processes step by step to explain their tasks and the structure of the program representations.
Lexical Analysis
A module, usually called the SCANNER, carries out the lexical analysis of a source program. It reads the source program in from a file in the form of a character string and decomposes this character string into a sequence of lexical units of the programming language, called SYMBOLS. Typical lexical units include the standard representations of objects of type integer, real, char, boolean and string, together with identifiers, comments, punctuation symbols and single- or multiple-character operators such as =, <, >, <=, >=, (, ), and so on. The scanner can distinguish between sequences of space characters and/or line feeds, which only have a meaning as separators and can subsequently be ignored, and relevant sequences of such characters (e.g. within strings). The output of the scanner, if it does not encounter an error, is a representation of the source program as a sequence of symbols or encoded symbols.
Lexical Analysis Contd.
For example, for the source program
int a, b;
a = 2;
b = a * a + 1;
the representation is as follows:
- id(int) sep id(a) com sep id(b) sem sep (the sep includes a NL)
- id(a) eq int(2) sem sep (the sep includes a NL)
- id(b) eq id(a) mul id(a) add int(1) sem sep (the sep includes a NL)
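The scanner's decomposition step can be sketched with a regular-expression-based tokenizer. The token classes below mirror the symbols of the example (id, int, eq, mul, add, com, sem, sep), but the patterns themselves are illustrative assumptions for this sketch.

```python
import re

# Scanner sketch: split the source character string into a symbol sequence.
# First matching alternative wins, so numbers are tried before identifiers.
TOKEN_SPEC = [
    ("int", r"\d+"),            # integer constants
    ("id",  r"[A-Za-z_]\w*"),   # identifiers (keywords resolved later)
    ("eq",  r"="),
    ("add", r"\+"),
    ("mul", r"\*"),
    ("com", r","),
    ("sem", r";"),
    ("sep", r"\s+"),            # spaces and line feeds: separator symbols
]
TOKEN_RE = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

def scan(source):
    symbols = []
    for m in TOKEN_RE.finditer(source):
        kind, text = m.lastgroup, m.group()
        if kind in ("int", "id"):
            symbols.append(f"{kind}({text})")   # keep the matched text
        else:
            symbols.append(kind)                # symbol class suffices
    return symbols
```

Note that the reserved word int is still delivered as id(int); recognizing it as a keyword is left to the screener, as described below.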
Screening
- The task of the screener is to recognize the following symbols in the symbol string produced by the scanner:
- Symbols that have a special meaning in the programming language, for example, among the identifiers, the reserved symbols of the language such as int, float and so on
- Symbols that are irrelevant for the subsequent processing and will be eliminated, for example strings of space characters and line feeds, which have been used as separators between symbols, and comments
- Symbols that are not part of the program but directives to the compiler, for example the type of diagnosis to be performed, the type of compilation protocol desired, and so on
- In addition, the screener is often given the task of encoding the symbols of certain classes of symbols (such as the identifiers) in a unique way and replacing each occurrence of a symbol by its code
Screening Contd.
- Thus, for example, if all the occurrences of an identifier in a program are replaced by the same natural number, the character-string representation of the identifier need only be stored once, and the problem of having to store identifiers of different lengths is concentrated in a specialized part of the program
- In practice, the scanner and screener are usually combined into a single procedure (which is simply called the scanner)
- Conceptually, however, they should be separated, because the task of the scanner can be accomplished by a finite automaton, while that of the screener must necessarily be carried out by other means
- For our example, the representation after screening is as follows:
- int id(1) com id(2) sem
- id(1) eq int(2) sem
- id(2) eq id(1) mul id(1) add int(1) sem
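The screener steps just described — dropping separators, recognizing reserved words among the identifiers, and encoding each distinct identifier by a natural number — can be sketched as follows; the symbol format and the set of reserved words are illustrative assumptions matching the example above.

```python
# Screener sketch: operates on the symbol sequence produced by the scanner.
RESERVED = {"int", "float"}   # reserved symbols of the (example) language

def screen(symbols):
    codes, out = {}, []
    for sym in symbols:
        if sym == "sep":
            continue                       # separators are eliminated
        if sym.startswith("id("):
            name = sym[3:-1]
            if name in RESERVED:
                out.append(name)           # e.g. id(int) becomes the keyword int
            else:
                code = codes.setdefault(name, len(codes) + 1)
                out.append(f"id({code})")  # identifier replaced by its code
        else:
            out.append(sym)
    return out
```

Applied to the scanner output for "int a, b;", this yields exactly the first line of the representation above: int id(1) com id(2) sem.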
Syntax Analysis
- The syntax analysis should determine the structure of the program over and above the lexical structure
- It knows the structure of expressions, statements, declarations, and lists of these constructs, and attempts to recognize the structure of a program in the given symbol string
- The corresponding module, called the PARSER, must also be able to detect, locate, and diagnose errors in the syntactic structure; there is a wealth of methods for syntax analysis
- There are various equivalent forms of parser output
- In our conceptual compiler structure we use the syntax tree of the program as output
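A parser producing a syntax tree can be sketched, for expressions only, as a recursive-descent procedure. The grammar below is an illustrative fragment (sums and products over numbers and identifiers), not the full language, and the tuple-based tree representation is an assumption for this sketch.

```python
# Recursive-descent parser sketch for:
#   expr   -> term ('+' term)*
#   term   -> factor ('*' factor)*
#   factor -> NUM | ID
# Output: a syntax tree as nested tuples, plus the next token position.

def parse_expr(tokens, i=0):
    node, i = parse_term(tokens, i)
    while i < len(tokens) and tokens[i] == "+":
        right, i = parse_term(tokens, i + 1)
        node = ("+", node, right)          # left-associative sums
    return node, i

def parse_term(tokens, i):
    node, i = parse_factor(tokens, i)
    while i < len(tokens) and tokens[i] == "*":
        right, i = parse_factor(tokens, i + 1)
        node = ("*", node, right)          # '*' binds tighter than '+'
    return node, i

def parse_factor(tokens, i):
    tok = tokens[i]
    if tok.isdigit() or tok.isidentifier():
        return tok, i + 1
    raise SyntaxError(f"unexpected symbol {tok!r} at position {i}")
```

For the right side of b = a * a + 1, the parser builds the tree ("+", ("*", "a", "a"), "1"), reflecting the usual operator precedence.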
Semantic Analysis
- The task of semantic analysis is to determine those properties of programs, above and beyond the (context-free) syntactic properties, that can be computed using only the program text.
- These properties are often called static semantic properties, unlike dynamic properties, which are properties of programs that can only be determined when the compiled program is run.
- Thus, the two terms static and dynamic are associated with the two times, compile time and run time.
- The static semantic properties include:
- The type correctness or incorrectness of programs in strongly typed languages such as Pascal. A necessary condition for type correctness is that every identifier must be declared (implicitly or explicitly) and that there are no double declarations.
- The existence of a consistent type assignment to all functions of a program in (functional) languages with polymorphism. Here, a function whose type is only partially defined, for example using type variables, can be applied to combinations of arguments of different types and essentially does the same thing.
Semantic Analysis Contd.
- For example, for the first statement a = 2, one checks whether there is a variable name on the left side and whether the type of the right side matches that of the left side.
- These two questions are answered positively, since a is declared as a variable and, lexically, the character string 2 is recognized as a representation of an integer constant.
- In the second statement b = a * a + 1 the type of the right side has to be computed.
- This computation involves the types of the terminal operands (all integer) and rules that compute the type of a sum or a product from the types of the operands.
- Here, we note that the arithmetic operators in most programming languages are overloaded; that is, they stand for the operations they designate over both integer and real numbers, possibly even with different precision.
- In the type computation this overloading is eliminated.
- In our example, it is established that an integer multiplication and an integer addition are involved.
- Thus, the result of the whole expression is of type integer.
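The type computation just described can be sketched as a recursive function over the syntax tree. The two types and the rules for resolving the overloaded operators + and * are an illustrative simplification, not a full Pascal type checker.

```python
# Type computation sketch: resolves overloaded + and * from operand types.
# Expressions: Python ints/floats as constants, strings as identifiers,
# ('+' | '*', lhs, rhs) tuples as applications.

def type_of(expr, env):
    if isinstance(expr, int):
        return "integer"
    if isinstance(expr, float):
        return "real"
    if isinstance(expr, str):              # identifier: consult declarations
        return env[expr]
    op, left, right = expr
    lt, rt = type_of(left, env), type_of(right, env)
    if lt == rt == "integer":
        return "integer"                   # integer addition/multiplication
    if lt in ("integer", "real") and rt in ("integer", "real"):
        return "real"                      # overloading resolved to real
    raise TypeError(f"operator {op} not applicable to {lt} and {rt}")
```

For the right side of b = a * a + 1, with a declared as integer, the computation establishes type integer, exactly as argued above.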
Machine-independent Optimization
- This is an optional phase and does not exist in all compilers.
- It does not necessarily belong to either the analysis phase or the synthesis phase.
- However, it uses information computed by the semantic analysis and, unlike the subtasks of the synthesis phase, it is machine independent.
Address Assignment (Part of the Code Generator)
- The synthesis phase of the compilation begins with storage allocation and address assignment.
- This involves properties of the target machine such as the word length, the address length, the directly addressable units of the machine and the existence or non-existence of instructions giving efficient access to parts of directly addressable units.
Generation of the Target Program (Part of the Code Generator)
- The code generator generates the instructions of the target program.
- For this, it uses the addresses assigned in the previous step to address variables.
- However, the time efficiency of the target program can often be increased if it is possible to hold the values of variables and expressions in machine registers.
- Access to these is generally faster than access to memory locations.
- Since each machine has only a limited number of such registers, the code generator must use them to the greatest advantage to store frequently used values.
- This task is called register allocation.
Real Compiler Structures
- So far, we have considered a conceptual compiler structure.
- Its modular structure was characterized by the following properties.
- The compilation process is divided into a sequence of sub-processes.
- Each sub-process communicates with its successor without feedback; the information flows in one direction only.
- The intermediate representations of the source program can be described by mechanisms from the theory of formal languages, such as regular expressions, context-free grammars, attribute grammars, and so on.
- The distribution of tasks among sub-processes is in part based on the correspondence between the description mechanisms referred to above and automaton models, and is in part carried out pragmatically in order to split a complex task into two separate, more manageable subtasks.
Real Compiler Structures Contd.
- Why is this not already a good real compiler structure?
- In the design of a real compiler (one that is to be implemented), the structure is influenced by the complexity of the subtasks, the requirements on the compiler, and the constraints of the computer and the operating system.
- The idea of compiler generation led to the development of further description mechanisms and generation procedures; as discussed earlier, several subtasks of the conceptual compiler structure can be described (in part) by formal specifications, and for these subtasks generation procedures exist.