Expression Trees - PowerPoint PPT Presentation

Expression Trees. Zhengwei QI. Most slides from Eric Roberts, CS 106B, Stanford, and Shiuh-Sheng Yu.


1
Expression Trees
Zhengwei QI Most slides from Eric Roberts, CS
106B, Stanford, and Shiuh-Sheng Yu
2
Part I basic concepts
3
Where Are We Heading?
  • In Lab 3, your task is to implement an
    interpreter for the BASIC programming language.
    In doing so, you will learn a little bit about
    how interpreters and compilers work that will
    almost certainly prove useful as you write your
    own programs.

4
Where Are We Heading?
  • In doing so, you will learn a little bit about
    how interpreters and compilers work that will
    almost certainly prove useful as you write your
    own programs.
  • The goal for the next two courses is to give you
    a sense of how a compiler can make sense of an
    arithmetic expression like

y = 3 * (x + 1)
  • To accomplish that goal, I will talk about
    several ideas that initially might seem
    unrelated:
  • Tree structures
  • Class hierarchies
  • The recursive nature of expressions
  • Taken together, these ideas will make it possible
    to define data structures that represent
    arithmetic expressions.

5
Trees
  • In the text, the first example I use to
    illustrate tree structures is the royal family
    tree of the House of Normandy (1066-1135).
  • This example is useful for defining terminology:
  • William I is the root of the tree.
  • Adela is a child of William I and the parent of
    Stephen.
  • Robert, William II, Adela, and Henry I are
    siblings.
  • Henry II is a descendant of William I, Henry I,
    and Matilda.
  • William I is an ancestor of everyone else in this
    tree.

6
Family Trees
  • Trees of the English royal family can get very
    complicated. The tree at the right, for example,
    shows the royal lineage through the time of the
    War of the Roses encompassing the Houses of
    Plantagenet, York, and Lancaster.
  • A tree diagram of this sort is very helpful, for
    example, if you want to understand Shakespeare's
    history plays.

7
Family Trees
And here is my all-time favorite family tree from
J. R. R. Tolkien (The Lord of the Rings)
8
Trees Are Everywhere
  • But family trees are by no means the only kind.
    Here, for example, is the first evolutionary
    tree, which appears as a diagram in Darwin's
    Notebook B.

Charles Darwin (around the time of Voyage of the
Beagle)
9
Trees Are Everywhere
  • The tree structure of evolution is even more
    clearly shown in Ernst Haeckel's diagram of the
    tree of life from 1866.
  • The details of evolutionary tree diagrams have
    changed markedly over the last 150 years, but
    their general structure (starting with a single
    root and branching to form new lineages) remains
    constant.

10
Representing Family Trees
  • The first step in writing programs that work with
    trees is to design a suitable data structure.
  • In diagrammatic form, the goal is to transform a
    tree diagram into an internal representation that
    looks like this

11
A Recursive Definition for Trees
  • When you think about trees with an eye toward
    representing their internal structure, the
    following definition is extremely useful
  • A tree is a pointer to a node.
  • A node is a structure that contains some number
    of trees, usually along with some additional
    data.
  • Although this definition is clearly circular, it
    is not necessarily infinite, either because:
  • Tree pointers can be NULL, indicating an empty
    tree.
  • Nodes can contain an empty list of children.
  • In C++, programmers typically define a structure
    or object type to represent a node and then use
    an explicit pointer type to represent the tree.
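
A minimal sketch of this definition, assuming hypothetical names (`Node`, `Tree`, `countNodes`) that do not come from the slides: a tree is a pointer to a node, and a node holds some data plus a list of trees. The recursive function terminates for exactly the two reasons listed above.

```cpp
#include <string>
#include <vector>

struct Node;                 // forward declaration so Tree can refer to it
typedef Node *Tree;          // "a tree is a pointer to a node"

// "A node is a structure that contains some number of trees,
// usually along with some additional data."
struct Node {
    std::string data;            // the additional data (here, a label)
    std::vector<Tree> children;  // possibly empty list of subtrees
};

// Counts the nodes in a tree. The recursion stops because a tree
// pointer may be null (empty tree) and a node may have no children.
int countNodes(Tree t) {
    if (t == nullptr) return 0;
    int total = 1;
    for (Tree child : t->children) {
        total += countNodes(child);
    }
    return total;
}
```

For a root with two childless children, `countNodes` returns 3; for a null tree it returns 0.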

12
Recursive trees definition
  • Recursively, as a data type, a tree is defined as
    a value (of some data type, possibly empty)
    together with a list of trees (possibly an empty
    list), the subtrees of its children. Symbolically:

t: v [t1, ..., tk]
13
Mutually recursive definition
  • A tree can be defined in terms of a forest (a
    list of trees), where a tree consists of a value
    and a forest (the subtrees of its children):

f: [t1, ..., tk]
t: v f
14
Type theory's definition
As an ADT, the abstract tree type T with values
of some type E is defined, using the abstract
forest type F (list of trees), by the functions
  value: T → E
  children: T → F
  nil: () → F
  node: E × F → T
with the axioms
  value(node(e, f)) = e
  children(node(e, f)) = f
15
The familytree.h Interface
/*
 * File: familytree.h
 * ------------------
 * This file is an interface to a simple class that
 * represents an individual person in a family tree.
 */

#ifndef _familytree_h
#define _familytree_h

#include <string>
#include "vector.h"

/*
 * Class: PersonNode
 * -----------------
 * This class defines the structure of an individual in the
 * family tree, which consists of a name and a vector of
 * children.
 */

class PersonNode {
public:
17
The familytree.h Interface
/*
 * Constructor: PersonNode
 * Usage: PersonNode *person = new PersonNode(name);
 * -------------------------------------------------
 * This function constructs a new PersonNode with the
 * specified name. The newly constructed entry has no
 * children, but clients can add children by calling the
 * addChild method.
 */
   PersonNode(std::string name);

/*
 * Method: getName
 * Usage: string name = person->getName();
 * ---------------------------------------
 * Returns the name of the person.
 */
   std::string getName();

/*
 * Method: addChild
 * Usage: person->addChild(child);
 * -------------------------------
 * Adds child to the end of the list of children for person,
 * and makes person the parent of child.
 */
   void addChild(PersonNode *child);
18
The familytree.cpp Implementation
/*
 * File: familytree.cpp
 * --------------------
 * This program implements the familytree.h interface.
 */

#include <string>
#include "vector.h"
#include "familytree.h"
using namespace std;

PersonNode::PersonNode(string name) {
   this->name = name;
   this->parent = NULL;
}

string PersonNode::getName() {
   return name;
}

void PersonNode::addChild(PersonNode *child) {
   children.add(child);
   child->parent = this;
}
20
Trees and Class Hierarchies
  • For the most part, classes in object-oriented
    languages such as C++ and Java are structured as
    a tree.
  • One of the defining characteristics of the
    object-oriented paradigm is that classes form
    hierarchies. Any class can be designated as a
    subclass of some other class, which is called its
    superclass. As noted on this week's section
    handout, most class hierarchies are
    tree-structured even though C++ permits more
    complicated structures.
  • A class represents a specialization of its
    superclass. If you create an object that is an
    instance of a class, that object is also an
    instance of all other classes in the hierarchy
    above it in the superclass chain.
  • When you define a new class in C++, that class
    automatically inherits the behavior of its
    superclass.

21
Hierarchical Data And Trees
  • The element at the top of the hierarchy is the
    root.
  • Elements next in the hierarchy are the children
    of the root.
  • Elements next in the hierarchy are the
    grandchildren of the root, and so on.
  • Elements that have no children are leaves.

22
Example Tree
23
Leaves
[Figure: example tree with nodes President, VP1, VP2, VP3, Manager, Manager1, Manager2, Manager, Worker Bee]
24
Parent, Grandparent, Siblings, Ancestors,
Descendants
[Figure: example tree with nodes President, VP1, VP2, VP3, Manager, Manager1, Manager2, Manager, Worker Bee]
25
Levels
26
Caution
  • Some texts start level numbers at 0 rather than
    at 1.
  • Root is at level 0.
  • Its children are at level 1.
  • The grand children of the root are at level 2.
  • And so on.

27
height = depth = number of levels
28
Node Degree = Number Of Children
[Figure: the example tree with each node labeled by its degree: 3, 2, 1, 1, 0, 0, 1, 0, 0]
29
Tree Degree = Max Node Degree
Degree of tree = 3.
30
  • Root: the top node in a tree.
  • Parent: the converse notion of child.
  • Siblings: nodes with the same parent.
  • Descendant: a node reachable by repeatedly
    proceeding from parent to child.
  • Ancestor: a node reachable by repeatedly
    proceeding from child to parent.
  • Leaf: a node with no children.
  • Internal node: a node with at least one child.
  • External node: a node with no children.
  • Degree: the number of subtrees of a node.
  • Edge: a connection between one node and another.
  • Path: a sequence of nodes and edges connecting a
    node with a descendant.
  • Level: the level of a node is defined as 1 + the
    number of connections between the node and the
    root.
  • Height: the height of a node is the number of
    edges on the longest downward path between that
    node and a leaf.
  • Forest: a set of n ≥ 0 disjoint trees.
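
Several of these terms translate directly into short recursive functions. The sketch below assumes a hypothetical `TreeNode` type and function names that are not part of the slides. It measures height in edges, so a leaf has height 0.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical node type for illustration; not part of the slides.
struct TreeNode {
    std::vector<TreeNode *> children;
};

// Degree of a node: its number of children (subtrees).
std::size_t degree(const TreeNode *node) {
    return node->children.size();
}

// Degree of a tree: the maximum degree of any node in it.
std::size_t treeDegree(const TreeNode *root) {
    std::size_t best = degree(root);
    for (const TreeNode *child : root->children) {
        best = std::max(best, treeDegree(child));
    }
    return best;
}

// Height of a node: edges on the longest downward path to a leaf.
int height(const TreeNode *node) {
    int tallest = -1;                       // a leaf ends up with height 0
    for (const TreeNode *child : node->children) {
        tallest = std::max(tallest, height(child));
    }
    return tallest + 1;
}

// Number of leaves (external nodes) in the tree rooted at node.
int countLeaves(const TreeNode *node) {
    if (node->children.empty()) return 1;
    int leaves = 0;
    for (const TreeNode *child : node->children) {
        leaves += countLeaves(child);
    }
    return leaves;
}
```

When levels are numbered from 1, the number of levels in a tree is `height(root) + 1`.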

31
Part II Compiler and Interpreter
  • Prof. Xin Yuan

www.cs.fsu.edu/xyuan/cop4020
32
Overview
  • Compiler phases
  • Lexical analysis
  • Syntax analysis
  • Semantic analysis
  • Intermediate (machine-independent) code
    generation
  • Intermediate code optimization
  • Target (machine-dependent) code generation
  • Target code optimization

33
A typical compilation process
  Source program with macros
    -> Preprocessor
    -> Source program
    -> Compiler
    -> Target assembly program
    -> Assembler
    -> Relocatable machine code
    -> Linker
    -> Absolute machine code
  • Try g++ with the -v, -E, and -S flags on linprog.
34
  • What is a compiler?
  • A program that reads a program written in one
    language (source language) and translates it into
    an equivalent program in another language (target
    language).
  • Two components
  • Understand the program (make sure it is correct)
  • Rewrite the program in the target language.
  • Traditionally, the source language is a high
    level language and the target language is a low
    level language (machine code).

[Figure: Source program -> compiler -> Target program, with the compiler also producing error messages]
35
Compilation Phases and Passes
  • Compilation of a program proceeds through a fixed
    series of phases
  • Each phase uses an (intermediate) form of the
    program produced by an earlier phase
  • Subsequent phases operate on lower-level code
    representations
  • Each phase may consist of a number of passes over
    the program representation
  • Pascal, FORTRAN, and C were designed for
    one-pass compilation, which explains the need for
    function prototypes
  • Single-pass compilers need less memory to operate
  • Java and Ada are multi-pass

36
Compiler Front- and Back-end
Front end (analysis):
  Source program (character stream)
    -> Scanner (lexical analysis)
    -> Tokens
    -> Parser (syntax analysis)
    -> Parse tree
    -> Semantic Analysis and Intermediate Code Generation
    -> Abstract syntax tree or other intermediate form
Back end (synthesis):
  Abstract syntax tree or other intermediate form
    -> Machine-Independent Code Improvement
    -> Modified intermediate form
    -> Target Code Generation
    -> Assembly or object code
    -> Machine-Specific Code Improvement
    -> Modified assembly or object code
37
Scanner Lexical Analysis
  • Lexical analysis breaks up a program into tokens
  • Grouping characters into inseparable units
    (tokens)
  • Changing a stream of characters to a stream of
    tokens

program gcd (input, output);
var i, j : integer;
begin
  read (i, j);
  while i <> j do
    if i > j then i := i - j else j := j - i;
  writeln (i)
end.

program  gcd  (  input  ,  output  )  ;
var  i  ,  j  :  integer  ;
begin  read  (  i  ,  j  )  ;
while  i  <>  j  do  if  i  >  j
then  i  :=  i  -  j  else  j  :=  j  -  i  ;
writeln  (  i  )  end  .
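
The grouping step can be sketched in a few lines of C++. This is an illustrative scanner, not the one used in the course: the function name `tokenize` and the token rules (identifiers, integer literals, the two-character operators `:=` and `<>`, and every other non-space character as a one-character token) are assumptions chosen to match the Pascal fragment above.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Splits source text into tokens: identifiers/keywords, integer
// literals, the two-character operators := and <>, and any other
// single non-space character as its own token.
std::vector<std::string> tokenize(const std::string &source) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < source.size()) {
        unsigned char ch = source[i];
        if (std::isspace(ch)) {
            i++;                                  // whitespace only separates tokens
        } else if (std::isalpha(ch)) {
            std::size_t start = i;                // identifier or keyword
            while (i < source.size() && std::isalnum((unsigned char) source[i])) i++;
            tokens.push_back(source.substr(start, i - start));
        } else if (std::isdigit(ch)) {
            std::size_t start = i;                // integer literal
            while (i < source.size() && std::isdigit((unsigned char) source[i])) i++;
            tokens.push_back(source.substr(start, i - start));
        } else if (i + 1 < source.size() &&
                   ((ch == ':' && source[i + 1] == '=') ||
                    (ch == '<' && source[i + 1] == '>'))) {
            tokens.push_back(source.substr(i, 2)); // two-character operator
            i += 2;
        } else {
            tokens.push_back(std::string(1, source[i]));
            i++;
        }
    }
    return tokens;
}
```

A production scanner would also classify each token and reject characters that cannot begin any token; this sketch passes them through unchanged.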
38
Scanner Lexical Analysis
  • What kind of errors can be reported by lexical
    analyzer?
  • A = b + @3

39
Parser Syntax Analysis
  • Checks whether the token stream meets the
    grammatical specification of the language and
    generates the syntax tree.
  • A syntax error is produced by the compiler when
    the program does not meet the grammatical
    specification.
  • For a grammatically correct program, this phase
    generates an internal representation that is easy
    to manipulate in later phases
  • Typically a syntax tree (also called a parse
    tree).
  • The grammar of a programming language is typically
    described by a context-free grammar, which also
    defines the structure of the parse tree.

40
Context-Free Grammars
  • A context-free grammar defines the syntax of a
    programming language
  • The syntax defines the syntactic categories for
    language constructs
  • Statements
  • Expressions
  • Declarations
  • Categories are subdivided into more detailed
    categories
  • A Statement is a
  • For-statement
  • If-statement
  • Assignment

<statement> ::= <for-statement> | <if-statement> | <assignment>
<for-statement> ::= for ( <expression> ; <expression> ; <expression> ) <statement>
<assignment> ::= <identifier> := <expression>
41
Example Micro Pascal
<Program> ::= program <id> ( <id> <More_ids> ) ; <Block> .
<Block> ::= <Variables> begin <Stmt> <More_Stmts> end
<More_ids> ::= , <id> <More_ids> | ε
<Variables> ::= var <id> <More_ids> : <Type> ; <More_Variables> | ε
<More_Variables> ::= <id> <More_ids> : <Type> ; <More_Variables> | ε
<Stmt> ::= <id> := <Exp>
         | if <Exp> then <Stmt> else <Stmt>
         | while <Exp> do <Stmt>
         | begin <Stmt> <More_Stmts> end
<Exp> ::= <num> | <id> | <Exp> + <Exp> | <Exp> - <Exp>
42
Parsing examples
  • pos = init + / rate * 60 → id1 = id2 + / id3 *
    const → syntax error (exp ::= exp + exp cannot be
    reduced).
  • pos = init + rate * 60 → id1 = id2 + id3 * const →

        =
       / \
    id1   +
         / \
      id2   *
           / \
        id3   60
43
Semantic Analysis
  • Semantic analysis is applied by a compiler to
    discover the meaning of a program by analyzing
    its parse tree or abstract syntax tree.
  • A program without grammatical errors may not
    always be a correct program.
  • pos = init + rate * 60
  • What if pos is a class while init and rate are
    integers?
  • This kind of error cannot be found by the parser
  • Semantic analysis finds this type of error and
    ensures that the program has a meaning.

44
Semantic Analysis
  • Static semantic checks (done by the compiler) are
    performed at compile time
  • Type checking
  • Every variable is declared before used
  • Identifiers are used in appropriate contexts
  • Check subroutine call arguments
  • Check labels
  • Dynamic semantic checks are performed at run
    time, and the compiler produces code that
    performs these checks
  • Array subscript values are within bounds
  • Arithmetic errors, e.g. division by zero
  • Pointers are not dereferenced unless pointing to
    valid object
  • A variable is used but hasn't been initialized
  • When a check fails at run time, an exception is
    raised

45
Semantic Analysis and Strong Typing
  • A language is strongly typed "if (type) errors
    are always detected"
  • Errors are either detected at compile time or at
    run time
  • Examples of such errors are listed on previous
    slide
  • Languages that are strongly typed are Ada, Java,
    ML, Haskell
  • Languages that are not strongly typed are
    Fortran, Pascal, C/C++, Lisp
  • Strong typing makes a language safer and easier to
    use, but potentially slower because of dynamic
    semantic checks
  • In some languages, most (type) errors are
    detected late, at run time, which is detrimental to
    reliability, e.g. early Basic, Lisp, Prolog, some
    script languages

46
Code Generation and Intermediate Code Forms
  • A typical intermediate form of code produced by
    the semantic analyzer is an abstract syntax tree
    (AST)
  • The AST is annotated with useful information such
    as pointers to the symbol table entry of
    identifiers

Example AST for the gcd program in Pascal
47
Code Generation and Intermediate Code Forms
  • Other intermediate code forms
  • Intermediate code is something that is both close
    to the final machine code and easy to manipulate
    (for optimization). One example is the
    three-address code
  • dst = op1 op op2
  • The three-address code for the assignment
    statement
  • temp1 = 60
  • temp2 = id3 * temp1
  • temp3 = id2 + temp2
  • id1 = temp3
  • Machine-independent intermediate code improvement
  • temp1 = id3 * 60.0
  • id1 = id2 + temp1
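
The unoptimized sequence above can be produced by a post-order walk over the expression tree. The sketch below is one possible way to do that, not the generator used in the course: `Expr`, `leaf`, `binary`, and `CodeGen` are hypothetical names, constants are loaded into a fresh temporary, and identifiers are used directly as operands.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical expression node: a leaf (constant or identifier) when
// op is empty, otherwise a binary operator with two operands.
struct Expr {
    std::string op;                 // "+", "*", ... or "" for a leaf
    std::string text;               // leaf text, e.g. "id3" or "60"
    bool isConstant = false;        // true when the leaf is a literal
    std::shared_ptr<Expr> left, right;
};

std::shared_ptr<Expr> leaf(const std::string &text, bool isConstant) {
    auto e = std::make_shared<Expr>();
    e->text = text;
    e->isConstant = isConstant;
    return e;
}

std::shared_ptr<Expr> binary(const std::string &op,
                             std::shared_ptr<Expr> l,
                             std::shared_ptr<Expr> r) {
    auto e = std::make_shared<Expr>();
    e->op = op;
    e->left = l;
    e->right = r;
    return e;
}

class CodeGen {
public:
    std::vector<std::string> code;  // emitted three-address instructions

    // Emits code for e and returns the name that holds its value.
    std::string gen(const std::shared_ptr<Expr> &e) {
        if (e->op.empty()) {
            if (!e->isConstant) return e->text;   // identifiers are used directly
            std::string t = newTemp();
            code.push_back(t + " = " + e->text);  // load the constant
            return t;
        }
        std::string l = gen(e->left);             // operands first (post-order)
        std::string r = gen(e->right);
        std::string t = newTemp();
        code.push_back(t + " = " + l + " " + e->op + " " + r);
        return t;
    }

    // Emits code for the assignment target = e.
    void assign(const std::string &target, const std::shared_ptr<Expr> &e) {
        std::string result = gen(e);
        code.push_back(target + " = " + result);
    }

private:
    int counter = 0;
    std::string newTemp() { return "temp" + std::to_string(++counter); }
};
```

Calling `assign("id1", ...)` on the tree for `id2 + id3 * 60` yields the four instructions listed for the assignment statement; the improved two-instruction form would need a separate optimization pass that this sketch does not include.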

48
Target Code Generation and Optimization
  • From the machine-independent form assembly or
    object code is generated by the compiler
  • MOVF id3, R2
  • MULF 60.0, R2
  • MOVF id2, R1
  • ADDF R2, R1
  • MOVF R1, id1
  • This machine-specific code is optimized to
    exploit specific hardware features

49
Summary
  • Compiler front-end: lexical analysis, syntax
    analysis, semantic analysis
  • Tasks: understanding the source code, making sure
    the source code is written correctly
  • Compiler back-end: intermediate code
    generation/improvement, and machine code
    generation/improvement
  • Tasks: translating the program to a semantically
    equivalent program (in a different language).

50
Example
PowerPC Translation

9AC0:  lwz     r16,0(r4)     load value from memory
       add     r7,r7,r16     accumulate sum
       stw     r7,0(r5)      store to memory
       addic.  r5,r5,-1      decrement loop count, set cr0
       beq     cr0,pc+12     branch if loop exit
       bl      F000          branch and link to EM
       4FDC                  save source PC in link register
9AE4:  bl      F000          branch and link to EM
       51C8                  save source PC in link register
9C08:  stw     r7,0(r6)      store last value of edx
       xor     r7,r7,r7      clear edx
       bl      F000          branch and link to EM
       6200                  save source PC in link register

[Figure: control flow between the translated blocks and the Emulation Manager (EM), annotated with the following steps]
  • Translated basic block is executed
  • Branch is taken to stub
  • Stub BAL to Emulation Mgr.
  • EM loads SPC from stub, using link
  • EM hashes SPC and does lookup
  • EM loads SPC from hash tbl, compares
  • Branch to transfer code
  • Load TPC from hash table
  • Jump indirect to next translated block
  • Continue execution
51
Part III Expression Trees
52
Recursive Structure of Expressions
  • In most programming languages, an expression is a
    recursive structure that can contain
    subexpressions.
  • In the model I use in Chapter 17, every
    expression falls into one of the following forms
  • An integer constant
  • A variable name that holds an integer value
  • Two expressions joined by an operator
  • An expression enclosed in parentheses
  • This structure can be represented in the form of
    a grammar.

53
Parse Trees
  • When the C++ compiler looks at an expression, it
    needs to understand what the expression means by
    translating it into an internal form. This
    process generally consists of two steps:
  • Lexical analysis, in which the source text is
    broken up into units called tokens.
  • Parsing, in which the tokens are assembled into a
    recursive structure called a parse tree that
    embodies the structure of the expression.
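
As an illustration of the parsing step, here is a small recursive-descent parser, one common strategy and not necessarily the one used in the course. The names `ParseNode`, `Parser`, and `toString` are hypothetical, and the grammar (with the usual precedence of `*` and `/` over `+` and `-`) is an assumption. For brevity it reads characters directly instead of consuming a separate token stream.

```cpp
#include <cctype>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical parse-tree node: leaves hold a constant or a name;
// interior nodes hold an operator and two subtrees.
struct ParseNode {
    std::string text;
    std::unique_ptr<ParseNode> left, right;
};

class Parser {
public:
    explicit Parser(const std::string &source) : src(source), pos(0) {}

    // Parses the whole input as a single expression.
    std::unique_ptr<ParseNode> parse() {
        std::unique_ptr<ParseNode> tree = parseExpr();
        if (peek() != '\0') throw std::runtime_error("unexpected input");
        return tree;
    }

private:
    std::string src;
    std::size_t pos;

    // Skips spaces and returns the next character, or '\0' at the end.
    char peek() {
        while (pos < src.size() && std::isspace((unsigned char) src[pos])) pos++;
        return pos < src.size() ? src[pos] : '\0';
    }

    std::unique_ptr<ParseNode> makeNode(const std::string &text,
                                        std::unique_ptr<ParseNode> l,
                                        std::unique_ptr<ParseNode> r) {
        std::unique_ptr<ParseNode> node(new ParseNode);
        node->text = text;
        node->left = std::move(l);
        node->right = std::move(r);
        return node;
    }

    // expr -> term { ("+" | "-") term }
    std::unique_ptr<ParseNode> parseExpr() {
        std::unique_ptr<ParseNode> tree = parseTerm();
        while (peek() == '+' || peek() == '-') {
            std::string op(1, src[pos++]);
            std::unique_ptr<ParseNode> rhs = parseTerm();
            tree = makeNode(op, std::move(tree), std::move(rhs));
        }
        return tree;
    }

    // term -> factor { ("*" | "/") factor }
    std::unique_ptr<ParseNode> parseTerm() {
        std::unique_ptr<ParseNode> tree = parseFactor();
        while (peek() == '*' || peek() == '/') {
            std::string op(1, src[pos++]);
            std::unique_ptr<ParseNode> rhs = parseFactor();
            tree = makeNode(op, std::move(tree), std::move(rhs));
        }
        return tree;
    }

    // factor -> number | identifier | "(" expr ")"
    std::unique_ptr<ParseNode> parseFactor() {
        char ch = peek();
        if (ch == '(') {
            pos++;
            std::unique_ptr<ParseNode> tree = parseExpr();
            if (peek() != ')') throw std::runtime_error("missing )");
            pos++;
            return tree;
        }
        if (!std::isalnum((unsigned char) ch)) throw std::runtime_error("operand expected");
        std::size_t start = pos;
        while (pos < src.size() && std::isalnum((unsigned char) src[pos])) pos++;
        return makeNode(src.substr(start, pos - start), nullptr, nullptr);
    }
};

// Renders the tree with explicit parentheses so its shape is visible.
std::string toString(const ParseNode *node) {
    if (!node->left) return node->text;
    return "(" + toString(node->left.get()) + " " + node->text + " " +
           toString(node->right.get()) + ")";
}
```

Parsing `3 * (x + 1)` produces a tree that `toString` renders as `(3 * (x + 1))`; the parentheses in the source leave no node of their own because the tree shape already records the grouping.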

54
The Expression Class Hierarchy
  • Because expressions have more than one form, a
    C++ class that represents expressions can be
    represented most easily by a class hierarchy in
    which each of the expression types is a separate
    subclass, as shown in the following diagram

55
Representing Inheritance in C++
  • In contrast to Java, a C++ subclass does not
    automatically override the definition of a method
    in its superclass. To permit such overriding,
    the superclass must mark the prototype for that
    method with the keyword virtual; repeating the
    keyword in the subclass is optional.
  • An abstract class is a class that doesn't
    actually represent any objects but instead serves
    only as a common superclass for concrete classes
    that do correspond to objects. In C++, methods
    for an abstract class that are always implemented
    by the concrete subclasses are indicated by
    including = 0 before the semicolon on the
    prototype line.
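
A small sketch of both points, using hypothetical classes (`Shape`, `Triangle`) that are unrelated to the course code: a pure virtual method makes the class abstract, a virtual method is dispatched on the object's actual class, and a non-virtual method is resolved from the pointer's static type.

```cpp
#include <string>

// Abstract class: describe() is pure virtual ("= 0"), so no Shape
// object can be created directly. All names here are hypothetical.
class Shape {
public:
    virtual ~Shape() {}
    virtual std::string describe() = 0;        // subclasses must implement
    virtual int sides() { return 0; }          // subclasses may override
    std::string kind() { return "shape"; }     // not virtual
};

class Triangle : public Shape {
public:
    virtual std::string describe() { return "triangle"; }
    virtual int sides() { return 3; }
    std::string kind() { return "triangle"; }  // hides Shape::kind, no override
};
```

Through a `Shape *` that points at a `Triangle`, `describe()` and `sides()` run the `Triangle` versions, while `kind()` still runs the `Shape` version because it was never declared virtual.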

56
The exp.h Interface
/*
 * File: exp.h
 * -----------
 * This interface defines a class hierarchy for expressions,
 * which allows the client to represent and manipulate simple
 * binary expression trees.
 */

#ifndef _exp_h
#define _exp_h

#include <string>
#include "evalstate.h"

/*
 * Type: ExpressionType
 * --------------------
 * This enumerated type is used to differentiate the three
 * different expression types: CONSTANT, IDENTIFIER, and
 * COMPOUND.
 */

enum ExpressionType { CONSTANT, IDENTIFIER, COMPOUND };
58
The exp.h Interface
/*
 * Class: Expression
 * -----------------
 * This class is used to represent a node in an expression
 * tree. Expression is an example of an abstract class, which
 * defines the structure and behavior of a set of classes but
 * has no objects of its own. Any object must be one of its
 * three concrete subclasses:
 *
 *  1. ConstantExp   -- an integer constant
 *  2. IdentifierExp -- a string representing an identifier
 *  3. CompoundExp   -- two expressions combined by an operator
 *
 * The abstract class defines an interface common to all
 * Expression objects; each subclass provides its own
 * implementation of the common interface.
 */

class Expression {
public:
   Expression();
   virtual ~Expression();
   virtual int eval(EvalState & state) = 0;
   virtual std::string toString() = 0;
   virtual ExpressionType type() = 0;
};
59
The exp.h Interface
/*
 * Class: ConstantExp
 * ------------------
 * This subclass represents a constant integer expression.
 */

class ConstantExp: public Expression {
public:
   ConstantExp(int val);
   virtual int eval(EvalState & state);
   virtual std::string toString();
   virtual ExpressionType type();
   int getValue();
private:
   int value;
};
60
The exp.h Interface
/*
 * Class: IdentifierExp
 * --------------------
 * This subclass represents an expression corresponding to a
 * variable.
 */

class IdentifierExp: public Expression {
public:
   IdentifierExp(std::string name);
   virtual int eval(EvalState & state);
   virtual std::string toString();
   virtual ExpressionType type();
   std::string getName();
private:
   std::string name;
};
61
The evalstate.h Interface
/*
 * File: evalstate.h
 * -----------------
 * This interface exports the EvalState class, which keeps
 * track of information required by the evaluator, such as
 * the values of variables.
 */

#ifndef _evalstate_h
#define _evalstate_h

#include <string>
#include "map.h"

class EvalState {
public:
   EvalState();
   ~EvalState();
   void setValue(std::string var, int value);
   int getValue(std::string var);
   bool isDefined(std::string var);
private:
   Map<int> symbolTable;
};

#endif
62
Code for the eval Method
int ConstantExp::eval(EvalState & state) {
   return value;
}

int IdentifierExp::eval(EvalState & state) {
   if (!state.isDefined(name)) error(name + " is undefined");
   return state.getValue(name);
}

int CompoundExp::eval(EvalState & state) {
   /* Assignment evaluates only the right-hand side, then stores it. */
   if (op == "=") {
      if (lhs->type() != IDENTIFIER) {
         error("Illegal lhs in assignment");
      }
      int val = rhs->eval(state);
      state.setValue(((IdentifierExp *) lhs)->getName(), val);
      return val;
   }
   /* Every other operator evaluates both subtrees first. */
   int left = lhs->eval(state);
   int right = rhs->eval(state);
   if (op == "+") return left + right;
   if (op == "-") return left - right;
   if (op == "*") return left * right;
   if (op == "/") return left / right;
   if (op == "%") return left % right;
   error("Illegal operator in expression");
   return 0;
}
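
To see the evaluator working end to end without the course libraries, the following self-contained sketch mirrors the slide classes under some stated assumptions: `std::map` stands in for `Map`, exceptions stand in for `error()`, `dynamic_cast` replaces the type check plus cast, and the `CompoundExp` constructor signature is a guess, since the slides do not show it.

```cpp
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Stand-in for EvalState built on std::map instead of the Map class.
class EvalState {
public:
    void setValue(const std::string &var, int value) { symbolTable[var] = value; }
    int getValue(const std::string &var) { return symbolTable.at(var); }
    bool isDefined(const std::string &var) { return symbolTable.count(var) > 0; }
private:
    std::map<std::string, int> symbolTable;
};

class Expression {
public:
    virtual ~Expression() {}
    virtual int eval(EvalState &state) = 0;
};

class ConstantExp : public Expression {
public:
    explicit ConstantExp(int val) : value(val) {}
    virtual int eval(EvalState &) { return value; }
private:
    int value;
};

class IdentifierExp : public Expression {
public:
    explicit IdentifierExp(const std::string &id) : name(id) {}
    virtual int eval(EvalState &state) {
        if (!state.isDefined(name)) throw std::runtime_error(name + " is undefined");
        return state.getValue(name);
    }
    std::string getName() { return name; }
private:
    std::string name;
};

class CompoundExp : public Expression {
public:
    // Assumed constructor; the node takes ownership of both subtrees.
    CompoundExp(const std::string &oper, Expression *l, Expression *r)
        : op(oper), lhs(l), rhs(r) {}
    virtual int eval(EvalState &state) {
        if (op == "=") {
            // dynamic_cast replaces the type check plus C-style cast.
            IdentifierExp *target = dynamic_cast<IdentifierExp *>(lhs.get());
            if (target == nullptr) throw std::runtime_error("Illegal lhs in assignment");
            int val = rhs->eval(state);
            state.setValue(target->getName(), val);
            return val;
        }
        int left = lhs->eval(state);
        int right = rhs->eval(state);
        if (op == "+") return left + right;
        if (op == "-") return left - right;
        if (op == "*") return left * right;
        if (op == "/") return left / right;
        throw std::runtime_error("Illegal operator in expression");
    }
private:
    std::string op;
    std::unique_ptr<Expression> lhs, rhs;
};
```

Building the tree for `y = 3 * (x + 1)` with `x` set to 4 and calling `eval` returns 15 and stores 15 in `y`.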
63
Reference
  • PAC Chapter 17

64
Next
  • Parsing Strategies
  • PAC Chapter 17

65
The End