Title: Expression Trees
1Expression Trees
Zhengwei QI
Most slides from Eric Roberts (CS 106B, Stanford) and Shiuh-Sheng Yu
2Part I: Basic Concepts
3Where Are We Heading?
- In Lab 3, your task is to implement an
interpreter for the BASIC programming language.
In doing so, you will learn a little bit about
how interpreters and compilers work that will
almost certainly prove useful as you write your
own programs.
4Where Are We Heading?
- The goal for the next two courses is to give you
a sense of how a compiler can make sense of an
arithmetic expression like
y = 3 * (x + 1)
- To accomplish that goal, I will talk about
several ideas that initially might seem
unrelated:
- Tree structures
- Class hierarchies
- The recursive nature of expressions
- Taken together, these ideas will make it possible
to define data structures that represent
arithmetic expressions.
5Trees
- In the text, the first example I use to
illustrate tree structures is the royal family
tree of the House of Normandy (1066-1135).
- This example is useful for defining terminology:
- William I is the root of the tree.
- Adela is a child of William I and the parent of
Stephen.
- Robert, William II, Adela, and Henry I are
siblings.
- Henry II is a descendant of William I, Henry I,
and Matilda.
- William I is an ancestor of everyone else in this
tree.
6Family Trees
- Trees of the English royal family can get very
complicated. The tree at the right, for example,
shows the royal lineage through the time of the
Wars of the Roses, encompassing the Houses of
Plantagenet, York, and Lancaster.
- A tree diagram of this sort is very helpful, for
example, if you want to understand Shakespeare's
history plays.
7Family Trees
And here is my all-time favorite family tree from
J. R. R. Tolkien (The Lord of the Rings)
8Trees Are Everywhere
- But family trees are by no means the only kind.
Here, for example, is the first evolutionary
tree, which appears as a diagram in Darwin's
Notebook B.
[Portrait: Charles Darwin, around the time of the
voyage of the Beagle]
9Trees Are Everywhere
- The tree structure of evolution is even more
clearly shown in Ernst Haeckel's diagram of the
tree of life from 1866.
- The details of evolutionary tree diagrams have
changed markedly over the last 150 years, but
their general structure (starting with a single
root and branching to form new lineages) remains
constant.
10Representing Family Trees
- The first step in writing programs that work with
trees is to design a suitable data structure.
- In diagrammatic form, the goal is to transform a
tree diagram into an internal representation that
looks like this:
11A Recursive Definition for Trees
- When you think about trees with an eye toward
representing their internal structure, the
following definition is extremely useful:
- A tree is a pointer to a node.
- A node is a structure that contains some number
of trees, usually along with some additional
data.
- Although this definition is clearly circular, it
is not necessarily infinite, either because:
- Tree pointers can be NULL, indicating an empty
tree.
- Nodes can contain an empty list of children.
- In C++, programmers typically define a structure
or object type to represent a node and then use
an explicit pointer type to represent the tree.
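The definition above can be sketched in C++ as follows. This is only a minimal illustration: the names TreeNode, Tree, and countNodes are hypothetical and do not come from the lecture code. The empty tree is a null pointer (nullptr here, NULL in older code), and a node with no children simply has an empty vector; these are the two base cases that keep the circular definition finite.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical node type: some data plus a list of subtrees.
struct TreeNode {
    std::string label;
    std::vector<TreeNode *> children;   // each element is itself a tree
};

// A tree is simply a pointer to a node; nullptr is the empty tree.
typedef TreeNode *Tree;

// Counts the nodes in a tree by following the recursive definition.
int countNodes(Tree t) {
    if (t == nullptr) return 0;         // base case: empty tree
    int total = 1;                      // count this node
    for (std::size_t i = 0; i < t->children.size(); i++) {
        total += countNodes(t->children[i]);
    }
    return total;
}
```

Calling countNodes on a root with two childless children visits each node once; the loop body never runs for a leaf, so the recursion stops there.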
12Recursive trees definition
- Recursively, as a data type a tree is defined as
a value (of some data type, possibly empty),
together with a list of trees (possibly an empty
list), the subtrees of its children; symbolically:
t: v [t1, ..., tk]
13Mutual recursions definition
- A tree can be defined in terms of a forest (a
list of trees), where a tree consists of a value
and a forest (the subtrees of its children):
f: [t1, ..., tk]
t: v f
14Type theory's definition
As an ADT, the abstract tree type T with values
of some type E is defined, using the abstract
forest type F (list of trees), by the functions:
value: T → E
children: T → F
nil: () → F
node: E × F → T
with the axioms:
value(node(e, f)) = e
children(node(e, f)) = f
15The familytree.h Interface
/*
 * File: familytree.h
 * ------------------
 * This file is an interface to a simple class that
 * represents an individual person in a family tree.
 */

#ifndef _familytree_h
#define _familytree_h

#include <string>
#include "vector.h"

/*
 * Class: PersonNode
 * -----------------
 * This class defines the structure of an individual in
 * the family tree, which consists of a name and a vector
 * of children.
 */

class PersonNode {
public:
17The familytree.h Interface
/*
 * Constructor: PersonNode
 * Usage: PersonNode *person = new PersonNode(name);
 * -------------------------------------------------
 * This function constructs a new PersonNode with the
 * specified name. The newly constructed entry has no
 * children, but clients can add children by calling
 * the addChild method.
 */
   PersonNode(std::string name);

/*
 * Method: getName
 * Usage: string name = person->getName();
 * ---------------------------------------
 * Returns the name of the person.
 */
   std::string getName();

/*
 * Method: addChild
 * Usage: person->addChild(child);
 * -------------------------------
 * Adds child to the end of the list of children for
 * person, and makes person the parent of child.
 */
   void addChild(PersonNode *child);
18The familytree.cpp Implementation
/*
 * File: familytree.cpp
 * --------------------
 * This program implements the familytree.h interface.
 */

#include <string>
#include "vector.h"
#include "familytree.h"
using namespace std;

PersonNode::PersonNode(string name) {
   this->name = name;
   this->parent = NULL;
}

string PersonNode::getName() {
   return name;
}

void PersonNode::addChild(PersonNode *child) {
   children.add(child);
   child->parent = this;
}
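For readers without the course's "vector.h" library, here is a self-contained sketch of the same idea using std::vector. It is not the lecture's class: the accessors getParent and getChildCount are assumptions added only so the parent and child links can be inspected from outside.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Stand-alone variant of PersonNode built on the standard library.
class PersonNode {
public:
    explicit PersonNode(const std::string & name)
        : name(name), parent(nullptr) {}

    std::string getName() const { return name; }

    // Assumed accessors (not shown on the slides).
    PersonNode *getParent() const { return parent; }
    std::size_t getChildCount() const { return children.size(); }

    // Appends child to this person's children and sets its parent link.
    void addChild(PersonNode *child) {
        children.push_back(child);
        child->parent = this;
    }

private:
    std::string name;
    PersonNode *parent;
    std::vector<PersonNode *> children;
};
```

Building the chain William I, Adela, Stephen from the family tree example takes two addChild calls, after which Stephen's getParent() returns the Adela node and William I's parent is still null because he is the root.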
20Trees and Class Hierarchies
- For the most part, classes in object-oriented
languages such as C++ and Java are structured as
a tree.
- One of the defining characteristics of the
object-oriented paradigm is that classes form
hierarchies. Any class can be designated as a
subclass of some other class, which is called its
superclass. As noted on this week's section
handout, most class hierarchies are
tree-structured even though C++ permits more
complicated structures.
- A class represents a specialization of its
superclass. If you create an object that is an
instance of a class, that object is also an
instance of all other classes in the hierarchy
above it in the superclass chain.
- When you define a new class in C++, that class
automatically inherits the behavior of its
superclass.
21Hierarchical Data And Trees
- The element at the top of the hierarchy is the
root.
- Elements next in the hierarchy are the children
of the root.
- Elements next in the hierarchy are the
grandchildren of the root, and so on.
- Elements that have no children are leaves.
22Example Tree
23Leaves
[Figure: organization chart drawn as a tree, with
nodes President; VP1, VP2, VP3; Manager, Manager1,
Manager2, Manager; and Worker Bee]
24Parent, Grandparent, Siblings, Ancestors,
Descendants
[Figure: the same organization chart tree]
25Levels
26Caution
- Some texts start level numbers at 0 rather than
at 1.
- Root is at level 0.
- Its children are at level 1.
- The grandchildren of the root are at level 2.
- And so on.
27Height = Depth = Number Of Levels
28Node Degree = Number Of Children
[Figure: the example tree with each node labeled by
its degree: 3, 2, 1, 1, 0, 0, 1, 0, 0]
29Tree Degree = Max Node Degree
Degree of tree = 3.
30- Root: the top node in a tree.
- Parent: the converse notion of child.
- Siblings: nodes with the same parent.
- Descendant: a node reachable by repeatedly
proceeding from parent to child.
- Ancestor: a node reachable by repeatedly
proceeding from child to parent.
- Leaf: a node with no children.
- Internal node: a node with at least one child.
- External node: a node with no children.
- Degree: the number of subtrees of a node.
- Edge: a connection between one node and another.
- Path: a sequence of nodes and edges connecting a
node with a descendant.
- Level: the level of a node is 1 + the
number of connections between the node and the
root.
- Height: the height of a node is the number of
edges on the longest downward path between
that node and a leaf.
- Forest: a forest is a set of n ≥ 0 disjoint
trees.
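Several of these terms translate directly into short recursive functions. The sketch below assumes a hypothetical Node struct that stores only child pointers; the function names height, treeDegree, and countLeaves are illustrative. Height is measured in edges, so the number of levels (counting the root as level 1) is the height of the root plus one.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical node type holding only the child links.
struct Node {
    std::vector<Node *> children;
};

// Height of a node: edges on the longest downward path to a leaf.
int height(const Node *node) {
    int h = 0;                                   // a leaf has height 0
    for (std::size_t i = 0; i < node->children.size(); i++) {
        h = std::max(h, 1 + height(node->children[i]));
    }
    return h;
}

// Degree of a tree: the largest number of children of any node.
int treeDegree(const Node *node) {
    int d = static_cast<int>(node->children.size());
    for (std::size_t i = 0; i < node->children.size(); i++) {
        d = std::max(d, treeDegree(node->children[i]));
    }
    return d;
}

// Number of leaves (external nodes) in the tree rooted at node.
int countLeaves(const Node *node) {
    if (node->children.empty()) return 1;
    int leaves = 0;
    for (std::size_t i = 0; i < node->children.size(); i++) {
        leaves += countLeaves(node->children[i]);
    }
    return leaves;
}
```

All three functions assume a non-empty tree; a caller that allows empty trees would need to check for a null pointer first.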
31Part II Compiler and Interpreter
www.cs.fsu.edu/xyuan/cop4020
32Overview
- Compiler phases
- Lexical analysis
- Syntax analysis
- Semantic analysis
- Intermediate (machine-independent) code
generation
- Intermediate code optimization
- Target (machine-dependent) code generation
- Target code optimization
33A typical compilation process
Source program with macros
→ Preprocessor → Source program
→ Compiler → Target assembly program
→ Assembler → Relocatable machine code
→ Linker → Absolute machine code
Try g++ with the -v, -E, and -S flags on linprog.
34- What is a compiler?
- A program that reads a program written in one
language (source language) and translates it into
an equivalent program in another language (target
language).
- Two components:
- Understand the program (make sure it is correct)
- Rewrite the program in the target language.
- Traditionally, the source language is a high
level language and the target language is a low
level language (machine code).
[Diagram: Source program → compiler → Target
program, with error messages as a second output]
35Compilation Phases and Passes
- Compilation of a program proceeds through a fixed
series of phases.
- Each phase uses an (intermediate) form of the
program produced by an earlier phase.
- Subsequent phases operate on lower-level code
representations.
- Each phase may consist of a number of passes over
the program representation.
- Pascal, FORTRAN, and C are languages designed for
one-pass compilation, which explains the need for
function prototypes.
- Single-pass compilers need less memory to operate.
- Java and Ada are multi-pass.
36Compiler Front- and Back-end
Front end (analysis):
Source program (character stream)
→ Scanner (lexical analysis) → Tokens
→ Parser (syntax analysis) → Parse tree
→ Semantic Analysis and Intermediate Code Generation
→ Abstract syntax tree or other intermediate form
Back end (synthesis):
→ Machine-Independent Code Improvement
→ Modified intermediate form
→ Target Code Generation → Assembly or object code
→ Machine-Specific Code Improvement
→ Modified assembly or object code
37Scanner: Lexical Analysis
- Lexical analysis breaks up a program into tokens.
- Grouping characters into inseparable units
(tokens)
- Changing a stream of characters to a stream of
tokens

program gcd (input, output);
var i, j : integer;
begin
  read (i, j);
  while i <> j do
    if i > j then i := i - j else j := j - i;
  writeln (i)
end.

program  gcd  (  input  ,  output  )  ;
var  i  ,  j  :  integer  ;
begin  read  (  i  ,  j  )  ;
while  i  <>  j  do  if  i  >  j  then  i  :=  i  -  j
else  j  :=  j  -  i  ;
writeln  (  i  )  end  .
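The character-to-token step can be sketched as a small hand-written scanner. This is a simplified illustration, not the scanner of any real Pascal compiler: the function name tokenize is hypothetical, and comments, string literals, real numbers, and error reporting are deliberately left out.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Splits a Pascal-like source string into a list of token strings.
std::vector<std::string> tokenize(const std::string & src) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char ch = static_cast<unsigned char>(src[i]);
        if (std::isspace(ch)) {
            i++;                                   // whitespace separates tokens
        } else if (std::isalpha(ch)) {
            std::size_t start = i;                 // identifier or keyword
            while (i < src.size() &&
                   std::isalnum(static_cast<unsigned char>(src[i]))) i++;
            tokens.push_back(src.substr(start, i - start));
        } else if (std::isdigit(ch)) {
            std::size_t start = i;                 // integer literal
            while (i < src.size() &&
                   std::isdigit(static_cast<unsigned char>(src[i]))) i++;
            tokens.push_back(src.substr(start, i - start));
        } else {
            // Two-character operators such as := and <> form one token.
            std::string two = src.substr(i, 2);
            if (two == ":=" || two == "<>" || two == "<=" || two == ">=") {
                tokens.push_back(two);
                i += 2;
            } else {
                tokens.push_back(std::string(1, src[i]));
                i++;
            }
        }
    }
    return tokens;
}
```

Note that `writeln(i)end.` still yields six tokens even though it contains no spaces: token boundaries come from the character classes, not from whitespace.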
38Scanner: Lexical Analysis
- What kind of errors can be reported by the lexical
analyzer?
- A = b + @3
39Parser: Syntax Analysis
- Checks whether the token stream meets the
grammatical specification of the language and
generates the syntax tree.
- A syntax error is produced by the compiler when
the program does not meet the grammatical
specification.
- For a grammatically correct program, this phase
generates an internal representation that is easy
to manipulate in later phases.
- Typically a syntax tree (also called a parse
tree).
- A grammar of a programming language is typically
described by a context-free grammar, which also
defines the structure of the parse tree.
40Context-Free Grammars
- A context-free grammar defines the syntax of a
programming language.
- The syntax defines the syntactic categories for
language constructs:
- Statements
- Expressions
- Declarations
- Categories are subdivided into more detailed
categories.
- A Statement is a:
- For-statement
- If-statement
- Assignment

<statement> ::= <for-statement> | <if-statement> | <assignment>
<for-statement> ::= for ( <expression> ; <expression> ; <expression> ) <statement>
<assignment> ::= <identifier> := <expression>
41Example Micro Pascal
<Program> ::= program <id> ( <id> <More_ids> ) ; <Block> .
<Block> ::= <Variables> begin <Stmt> <More_Stmts> end
<More_ids> ::= , <id> <More_ids> | ε
<Variables> ::= var <id> <More_ids> : <Type> ; <More_Variables> | ε
<More_Variables> ::= <id> <More_ids> : <Type> ; <More_Variables> | ε
<Stmt> ::= <id> := <Exp>
         | if <Exp> then <Stmt> else <Stmt>
         | while <Exp> do <Stmt>
         | begin <Stmt> <More_Stmts> end
<Exp> ::= <num> | <id> | <Exp> + <Exp> | <Exp> - <Exp>
42Parsing examples
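- Pos = init + / rate * 60 → id1 = id2 + / id3 *
const → syntax error (exp ::= exp + exp cannot be
reduced).
- Pos = init + rate * 60 → id1 = id2 + id3 * const →
parse tree:
[Figure: parse tree with = at the root, id1 as its
left child, and on the right a + node whose children
are id2 and a * node over id3 and 60]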
43Semantic Analysis
- Semantic analysis is applied by a compiler to
discover the meaning of a program by analyzing
its parse tree or abstract syntax tree.
- A program without grammatical errors may not
always be a correct program.
- pos = init + rate * 60
- What if pos is a class while init and rate are
integers?
- This kind of error cannot be found by the parser.
- Semantic analysis finds this type of error and
ensures that the program has a meaning.
44Semantic Analysis
- Static semantic checks (done by the compiler) are
performed at compile time:
- Type checking
- Every variable is declared before use
- Identifiers are used in appropriate contexts
- Check subroutine call arguments
- Check labels
- Dynamic semantic checks are performed at run
time, and the compiler produces code that
performs these checks:
- Array subscript values are within bounds
- Arithmetic errors, e.g. division by zero
- Pointers are not dereferenced unless pointing to
a valid object
- A variable is used but hasn't been initialized
- When a check fails at run time, an exception is
raised.
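C++ does not insert most of these checks automatically, so the sketch below writes by hand the kind of code a compiler for a checked language would generate around an array access and a division. The helper names checkedIndex and checkedDivide are hypothetical; the exception types come from the standard header stdexcept.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Subscript check: raise an exception instead of reading out of bounds.
int checkedIndex(const std::vector<int> & a, std::size_t i) {
    if (i >= a.size()) {
        throw std::out_of_range("array subscript out of bounds");
    }
    return a[i];
}

// Arithmetic check: raise an exception instead of dividing by zero.
int checkedDivide(int num, int den) {
    if (den == 0) {
        throw std::domain_error("division by zero");
    }
    return num / den;
}
```

Each check costs a comparison and a branch on every use, which is the run-time overhead that dynamic semantic checking trades for safety.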
45Semantic Analysis and Strong Typing
- A language is strongly typed "if (type) errors
are always detected".
- Errors are either detected at compile time or at
run time.
- Examples of such errors are listed on the previous
slide.
- Languages that are strongly typed are Ada, Java,
ML, Haskell.
- Languages that are not strongly typed are
Fortran, Pascal, C/C++, Lisp.
- Strong typing makes a language safer and easier to
use, but potentially slower because of dynamic
semantic checks.
- In some languages, most (type) errors are
detected late at run time, which is detrimental to
reliability, e.g. early Basic, Lisp, Prolog, some
script languages.
46Code Generation and Intermediate Code Forms
- A typical intermediate form of code produced by
the semantic analyzer is an abstract syntax tree
(AST).
- The AST is annotated with useful information such
as pointers to the symbol table entry of
identifiers.
[Figure: example AST for the gcd program in Pascal]
47Code Generation and Intermediate Code Forms
- Other intermediate code forms
- Intermediate code is something that is both close
to the final machine code and easy to manipulate
(for optimization). One example is
three-address code:
  dst = op1 op op2
- The three-address code for the assignment
statement:
  temp1 = 60
  temp2 = id3 * temp1
  temp3 = id2 + temp2
  id1 = temp3
- Machine-independent intermediate code improvement:
  temp1 = id3 * 60.0
  id1 = id2 + temp1
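To make the dst = op1 op op2 form concrete, here is a minimal sketch that walks an expression tree bottom-up and emits one three-address instruction per operator node. The Expr struct and the emit function are hypothetical names for illustration. Unlike the unoptimized listing above, this simplified scheme uses constants directly as operands instead of first loading them into temporaries, so for id2 + id3 * 60 it produces a sequence close to the improved version.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical expression node: leaves hold a name or constant,
// interior nodes hold an operator symbol and two subtrees.
struct Expr {
    std::string text;
    Expr *left;     // nullptr for leaves
    Expr *right;    // nullptr for leaves
};

// Appends "dst = op1 op op2" lines for the subtree rooted at e and
// returns the name (identifier, constant, or temp) holding its value.
std::string emit(const Expr *e, std::vector<std::string> & code, int & nextTemp) {
    if (e->left == nullptr && e->right == nullptr) {
        return e->text;                       // a leaf is already an operand
    }
    std::string a = emit(e->left, code, nextTemp);
    std::string b = emit(e->right, code, nextTemp);
    std::ostringstream name;
    name << "temp" << nextTemp++;             // fresh temporary for this node
    code.push_back(name.str() + " = " + a + " " + e->text + " " + b);
    return name.str();
}
```

Starting with nextTemp at 1, the tree for id2 + id3 * 60 yields the two lines `temp1 = id3 * 60` and `temp2 = id2 + temp1`; the final assignment to id1 would copy from the returned name temp2.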
48Target Code Generation and Optimization
- From the machine-independent form, assembly or
object code is generated by the compiler:
  MOVF id3, R2
  MULF 60.0, R2
  MOVF id2, R1
  ADDF R2, R1
  MOVF R1, id1
- This machine-specific code is optimized to
exploit specific hardware features.
49Summary
- Compiler front-end: lexical analysis, syntax
analysis, semantic analysis
- Tasks: understanding the source code, making sure
the source code is written correctly
- Compiler back-end: intermediate code
generation/improvement, and machine code
generation/improvement
- Tasks: translating the program into a semantically
equivalent program (in a different language).
50Example
PowerPC Translation
9AC0:  lwz     r16,0(r4)    ; load value from memory
       add     r7,r7,r16    ; accumulate sum
       stw     r7,0(r5)     ; store to memory
       addic.  r5,r5,-1     ; decrement loop count, set cr0
       beq     cr0,pc+12    ; branch if loop exit
       bl      F000         ; branch and link to EM
       4FDC                 ; save source PC in link register
9AE4:  bl      F000         ; branch and link to EM
       51C8                 ; save source PC in link register
9C08:  stw     r7,0(r6)     ; store last value of edx
       xor     r7,r7,r7     ; clear edx
       bl      F000         ; branch and link to EM
       6200                 ; save source PC in link register
Emulation Manager (EM) steps shown in the figure:
- Translated basic block is executed
- Branch is taken to stub
- Stub BAL to Emulation Mgr.
- EM loads SPC from stub, using link
- EM hashes SPC and does lookup
- EM loads SPC from hash table and compares
- Branch to transfer code
- Load TPC from hash table
- Jump indirect to next translated block
- Continue execution
51Part III Expressions Tree
52Recursive Structure of Expressions
- In most programming languages, an expression is a
recursive structure that can contain
subexpressions.
- In the model I use in Chapter 17, every
expression falls into one of the following forms:
- An integer constant
- A variable name that holds an integer value
- Two expressions joined by an operator
- An expression enclosed in parentheses
- This structure can be represented in the form of
a grammar.
53Parse Trees
- When the C++ compiler looks at an expression, it
needs to understand what the expression means by
translating it into an internal form. This
process generally consists of two steps:
- Lexical analysis, in which the source text is
broken up into units called tokens.
- Parsing, in which the tokens are assembled into a
recursive structure called a parse tree that
embodies its structure.
54The Expression Class Hierarchy
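- Because expressions have more than one form, a
C++ class that represents expressions can be
represented most easily by a class hierarchy in
which each of the expression types is a separate
subclass, as shown in the following diagram:
[Diagram: class hierarchy with Expression as the
superclass and ConstantExp, IdentifierExp, and
CompoundExp as its subclasses]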
55Representing Inheritance in C++
- In contrast to Java, a subclass does not
automatically override the definition of a method
in its superclass. To permit such overriding,
the superclass must mark the prototype for that
method with the keyword virtual (the subclass
prototype conventionally repeats it).
- An abstract class is a class that doesn't
actually represent any objects but instead serves
only as a common superclass for concrete classes
that do correspond to objects. In C++, methods
for an abstract class that are always implemented
by the concrete subclasses are indicated by
including = 0 before the semicolon on the
prototype line.
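The effect of virtual and of = 0 can be seen in a small sketch. The class names Base and Derived and the helper viaBase are hypothetical and exist only for this illustration. The method describe is pure virtual, so Base is abstract; the method label is deliberately not virtual, so the subclass version hides it instead of overriding it.

```cpp
#include <string>

// Abstract class: describe() is pure virtual, so Base has no objects.
class Base {
public:
    virtual ~Base() {}
    virtual std::string describe() const = 0;      // subclasses must implement
    std::string label() const { return "Base"; }   // not virtual
};

class Derived : public Base {
public:
    virtual std::string describe() const { return "Derived"; }
    // Hides Base::label; calls through a Base reference still reach Base's.
    std::string label() const { return "Derived"; }
};

// Calls both methods through a reference to the superclass.
std::string viaBase(const Base & b) {
    return b.describe() + "/" + b.label();
}
```

Passing a Derived object to viaBase gives "Derived/Base": the virtual call is dispatched on the object's actual class, while the non-virtual call is resolved from the static type of the reference.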
56The exp.h Interface
/*
 * File: exp.h
 * -----------
 * This interface defines a class hierarchy for
 * expressions, which allows the client to represent
 * and manipulate simple binary expression trees.
 */

#ifndef _exp_h
#define _exp_h

#include <string>
#include "evalstate.h"

/*
 * Type: ExpressionType
 * --------------------
 * This enumerated type is used to differentiate the
 * three different expression types: CONSTANT,
 * IDENTIFIER, and COMPOUND.
 */

enum ExpressionType { CONSTANT, IDENTIFIER, COMPOUND };
58The exp.h Interface
/*
 * Class: Expression
 * -----------------
 * This class is used to represent a node in an
 * expression tree. Expression is an example of an
 * abstract class, which defines the structure and
 * behavior of a set of classes but has no objects of
 * its own. Any object must be one of its three
 * concrete subclasses:
 *
 *  1. ConstantExp   -- an integer constant
 *  2. IdentifierExp -- a string representing an identifier
 *  3. CompoundExp   -- two expressions combined by an operator
 *
 * The abstract class defines an interface common to
 * all Expression objects; each subclass provides its
 * own implementation of the common interface.
 */

class Expression {
public:
   Expression();
   virtual ~Expression();
   virtual int eval(EvalState & state) = 0;
   virtual std::string toString() = 0;
   virtual ExpressionType type() = 0;
};
59The exp.h Interface
/*
 * Class: ConstantExp
 * ------------------
 * This subclass represents a constant integer
 * expression.
 */

class ConstantExp: public Expression {
public:
   ConstantExp(int val);
   virtual int eval(EvalState & state);
   virtual std::string toString();
   virtual ExpressionType type();
   int getValue();
private:
   int value;
};
60The exp.h Interface
/*
 * Class: IdentifierExp
 * --------------------
 * This subclass represents an expression
 * corresponding to a variable.
 */

class IdentifierExp: public Expression {
public:
   IdentifierExp(std::string name);
   virtual int eval(EvalState & state);
   virtual std::string toString();
   virtual ExpressionType type();
   std::string getName();
private:
   std::string name;
};
61The evalstate.h Interface
/*
 * File: evalstate.h
 * -----------------
 * This interface exports the EvalState class, which
 * keeps track of information required by the
 * evaluator, such as the values of variables.
 */

#ifndef _evalstate_h
#define _evalstate_h

#include <string>
#include "map.h"

class EvalState {
public:
   EvalState();
   ~EvalState();
   void setValue(std::string var, int value);
   int getValue(std::string var);
   bool isDefined(std::string var);
private:
   Map<int> symbolTable;
};

#endif
62Code for the eval Method
int ConstantExp::eval(EvalState & state) {
   return value;
}

int IdentifierExp::eval(EvalState & state) {
   if (!state.isDefined(name)) error(name + " is undefined");
   return state.getValue(name);
}

int CompoundExp::eval(EvalState & state) {
   if (op == "=") {
      if (lhs->type() != IDENTIFIER) error("Illegal lhs in assignment");
      int val = rhs->eval(state);
      state.setValue(((IdentifierExp *) lhs)->getName(), val);
      return val;
   }
   int left = lhs->eval(state);
   int right = rhs->eval(state);
   if (op == "+") return left + right;
   if (op == "-") return left - right;
   if (op == "*") return left * right;
   if (op == "/") return left / right;
   if (op == "%") return left % right;
   error("Illegal operator in expression");
   return 0;
}
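The same recursive evaluation strategy can be tried out without the course libraries. The sketch below is a self-contained stand-in rather than the lecture's code: the class names Exp, Constant, Identifier, and Compound are hypothetical, a std::map replaces EvalState, exceptions replace the error function, and assignment is omitted to keep it short.

```cpp
#include <map>
#include <stdexcept>
#include <string>

typedef std::map<std::string, int> State;    // stand-in for EvalState

class Exp {
public:
    virtual ~Exp() {}
    virtual int eval(State & state) const = 0;
};

class Constant : public Exp {
public:
    explicit Constant(int value) : value(value) {}
    virtual int eval(State &) const { return value; }
private:
    int value;
};

class Identifier : public Exp {
public:
    explicit Identifier(const std::string & name) : name(name) {}
    virtual int eval(State & state) const {
        State::const_iterator it = state.find(name);
        if (it == state.end()) throw std::runtime_error(name + " is undefined");
        return it->second;
    }
private:
    std::string name;
};

class Compound : public Exp {
public:
    // Takes ownership of both subexpressions.
    Compound(char op, Exp *lhs, Exp *rhs) : op(op), lhs(lhs), rhs(rhs) {}
    Compound(const Compound &) = delete;               // avoid double deletes
    Compound & operator=(const Compound &) = delete;
    virtual ~Compound() { delete lhs; delete rhs; }
    virtual int eval(State & state) const {
        int left = lhs->eval(state);                   // evaluate subtrees first
        int right = rhs->eval(state);
        switch (op) {
            case '+': return left + right;
            case '-': return left - right;
            case '*': return left * right;
            case '/':
                if (right == 0) throw std::runtime_error("Division by 0");
                return left / right;
        }
        throw std::runtime_error("Illegal operator in expression");
    }
private:
    char op;
    Exp *lhs;
    Exp *rhs;
};
```

With x bound to 4 in the state, the tree for 3 * (x + 1) evaluates to 15: the Compound at the root evaluates its children recursively before applying its own operator.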
63Reference
64Next
- Parsing Strategies
- PAC Chapter 17
65The End