1
Chapter 5 Compilers
  • System Software
  • Chih-Shun Hsu

2
Basic Compiler Functions
  • Three steps in the compilation process: scanning,
    parsing, and code generation (a minimal sketch of
    this pipeline appears at the end of this slide)
  • The task of scanning the source statement,
    recognizing and classifying the various tokens,
    is known as lexical analysis
  • The part of the compiler that performs this
    analytic function is called the scanner
  • Each statement in the program must be recognized
    as some language construct
  • This process, which is called syntactic analysis
    or parsing, is performed by a part of the
    compiler that is called the parser
  • The last step in the basic translation process is
    the generation of object code
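  • A minimal sketch of this pipeline (scan, parse, and
    generate here are illustrative stand-ins, not the
    textbook's routines):

    def scan(source):
        return source.split()            # stand-in scanner: whitespace tokens

    def parse(tokens):
        return ("program", tokens)       # stand-in parser: trivial tree

    def generate(tree):
        return [f"CODE {t}" for t in tree[1]]   # stand-in code generator

    def compile_program(source):
        """The three basic phases, applied in order."""
        tokens = scan(source)            # lexical analysis
        tree = parse(tokens)             # syntactic analysis
        return generate(tree)            # code generation

    print(compile_program("READ ( VALUE )"))
    # ['CODE READ', 'CODE (', 'CODE VALUE', 'CODE )']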

3
Grammars(2/1)
  • A grammar for a programming language is a formal
    description of the syntax, or form, of programs
    and individual statements
  • A BNF (for Backus-Naur Form) grammar consists of
    a set of rules, each of which defines the syntax
    of some construct in the programming language
  • The symbol ::= can be read "is defined to be"
  • On the left of this symbol is the language
    construct being defined, and on the right is a
    description of the syntax being defined for it
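  • For example, the simplified Pascal grammar used in
    this chapter defines a READ statement by the rule
    <read> ::= READ ( <id-list> )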

4
Grammars(2/2)
  • Character strings enclosed between the angle
    brackets < and > are called nonterminal symbols
  • Entries not enclosed in angle brackets are
    terminal symbols of the grammar
  • When a rule offers several alternatives, they are
    separated by the symbol |
  • It is convenient to display the analysis of a
    source statement in terms of a grammar as a tree
  • This tree is usually called the parse tree, or
    syntax tree for the statement
  • If there is more than one possible parse tree for
    a given statement, the grammar is said to be
    ambiguous
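  • For example, a rule such as
    <exp> ::= <exp> + <exp> | id
    is ambiguous, because a string like id + id + id has
    two different parse trees (either + may be applied
    first); rewriting the rule as
    <exp> ::= <exp> + id | id
    removes the ambiguity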

5
Example of a Pascal Program
6
Simplified Pascal Grammar
7
Parse Tree(2/1)
8
Parse Tree(2/2)
9
Lexical Analysis(2/1)
  • Lexical analysis involves scanning the program to
    be compiled and recognizing the tokens that make
    up the source statements
  • An identifier might be defined by the rules
  • <ident> ::= <letter> | <ident><letter> | <ident><digit>
  • <letter> ::= A | B | C | D | ... | Z
  • <digit> ::= 0 | 1 | 2 | 3 | ... | 9
  • The output of the scanner consists of a sequence
    of tokens
  • The parser would be responsible for saving any
    tokens that it might require for later analysis
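  • A minimal scanner sketch in Python, assuming the
    identifier rules above plus a few illustrative token
    classes (the pattern set and names are assumptions,
    not the textbook's):

    import re

    TOKEN_PATTERNS = [
        ("IDENT",  r"[A-Z][A-Z0-9]*"),   # <ident> from the rules above
        ("INT",    r"[0-9]+"),
        ("ASSIGN", r":="),
        ("OP",     r"[-+*/():,;.]"),
    ]

    def scan(source):
        """Yield (token type, text) pairs; skip blanks; halt otherwise."""
        pos = 0
        while pos < len(source):
            if source[pos].isspace():
                pos += 1
                continue
            for ttype, pattern in TOKEN_PATTERNS:
                m = re.match(pattern, source[pos:])
                if m:
                    yield (ttype, m.group(0))
                    pos += len(m.group(0))
                    break
            else:
                raise ValueError(f"unrecognized character {source[pos]!r}")

    print(list(scan("VALUE := 15")))
    # [('IDENT', 'VALUE'), ('ASSIGN', ':='), ('INT', '15')]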

10
Lexical Analysis(2/2)
  • In addition to its primary function of
    recognizing tokens, the scanner is responsible
    for reading the lines of the source program as
    needed, and possibly for printing the source
    listing
  • The scanner must take into account any special
    format required of the source statements
  • The scanner must also incorporate knowledge about
    language-dependent items such as whether blanks
    function as delimiters for tokens or not
  • In FORTRAN, any keyword may also be used as an
    identifier and blanks are ignored in statements

11
Modeling Scanners as Finite Automata
  • A finite automaton consists of a finite set of
    states and a set of transitions from one state to
    another
  • States are represented by circles, and
    transitions by arrows from one state to another
  • Each arrow is labeled with a character or a set
    of characters that cause the specified transition
    to occur
  • The starting state has an arrow entering it
  • Final states are identified by double circles
  • Each of the finite automata was designed to
    recognize one particular type of token
  • If there is no entry in a column, there is no
    transition corresponding to that character, and
    the automaton halts
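  • A minimal sketch of a table-driven automaton for
    identifiers, corresponding to the tabular
    representation idea (state numbers and table layout
    are illustrative):

    # State 1 is the start state; state 2 is the final state.
    # A missing entry means there is no transition and the
    # automaton halts.
    TRANSITIONS = {
        (1, "letter"): 2,
        (2, "letter"): 2,
        (2, "digit"):  2,
    }
    FINAL_STATES = {2}

    def classify(ch):
        return "letter" if ch.isalpha() else "digit" if ch.isdigit() else "other"

    def recognize(text):
        """Accept the string if the automaton ends in a final state."""
        state = 1
        for ch in text:
            state = TRANSITIONS.get((state, classify(ch)))
            if state is None:
                return False         # no entry in the table: halt
        return state in FINAL_STATES

    print(recognize("SUM1"), recognize("1SUM"))   # True False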

12
Graphical Representation of a Finite Automaton
13
Finite Automaton to Recognize Tokens
14
Token Recognition using Algorithmic Code
15
Tabular Representation of Finite Automaton
16
Syntactic Analysis
  • During syntactic analysis, the source statements
    written by the programmer are recognized as
    language constructs described by the grammar
    being used
  • Parsing techniques are divided into two
    classes, bottom-up and top-down, according to the
    way in which the parse tree is constructed
  • Top-down methods begin with the rule of the
    grammar that specifies the goal of the analysis,
    and attempt to construct the tree so that the
    terminal nodes match the statements being
    analyzed
  • Bottom-up methods begin with the terminal nodes
    of the tree, and attempt to combine these into
    successively higher-level nodes until the root is
    reached

17
Operator-Precedence Parsing
  • The operator-precedence method is based on
    examining pairs of consecutive operators in the
    source program, and making decisions about which
    operation should be performed first
  • The first step in constructing an
    operator-precedence parser is to determine the
    precedence relations between the operators of the
    grammar
  • Here "operator" is taken to mean any terminal symbol
  • If there is no precedence relation between a pair
    of tokens, this means that these two tokens
    cannot appear together in any legal statement
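  • A minimal sketch of how a precedence matrix is
    consulted, using only + and * (a toy fragment of the
    full matrix shown on the next slide):

    # PREC[a][b] is the relation between consecutive operators a, b:
    # '<' means b has higher precedence, '>' means a does.
    PREC = {
        "+": {"+": ">", "*": "<"},
        "*": {"+": ">", "*": ">"},
    }

    def relation(left_op, right_op):
        return PREC[left_op][right_op]

    # In A + B * C the relation + < * says * is applied first;
    # in A * B + C the relation * > + says to reduce A * B first.
    print(relation("+", "*"), relation("*", "+"))   # < >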

18
Precedence Matrix
19
Operator-Precedence Parse of a READ Statement(2/1)
20
Operator-Precedence Parse of a READ Statement(2/2)
21
Left-to-Right Scan
  • The left-to-right scan is continued in each step
    only far enough to determine the next portion of
    the statement to be recognized, which is the
    first portion delimited by < and >
  • Once this portion has been determined, it is
    interpreted as a nonterminal, according to some
    rules of the grammar
  • This process continues until the complete
    statement is recognized
  • The parse tree is constructed from the terminal
    nodes up toward the root, hence the term
    bottom-up parsing

22
Shift-Reduce Parsing
  • Operator precedence was one of the earliest
    bottom-up parsing methods
  • The operator-precedence technique was developed
    into a more general method known as shift-reduce
    parsing
  • Shift-reduce parsers make use of a stack to store
    tokens that have not yet been recognized in terms
    of the grammar
  • The two main actions that can be taken are shift
    (push the current token onto the stack) and reduce
    (recognize symbols on top of the stack according
    to a rule of the grammar)
  • The most powerful shift-reduce parsing technique
    is called LR(k) (the integer k indicates the
    number of tokens following the current position
    that are considered in making parsing decisions)
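  • A minimal shift-reduce sketch with a single toy rule
    <exp> ::= id + id (the rule and token stream are
    illustrative):

    def parse(tokens):
        """Shift each token onto a stack; reduce whenever the
        top of the stack matches the right side of the rule."""
        stack = []
        for token in tokens:
            stack.append(token)                     # shift
            if stack[-3:] == ["id", "+", "id"]:
                del stack[-3:]                      # reduce: id + id
                stack.append("<exp>")               #   becomes <exp>
        return stack

    print(parse(["id", "+", "id"]))   # ['<exp>']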

23
Example of Shift-Reduce Parsing (3/1)
24
Example of Shift-Reduce Parsing (3/2)
25
Example of Shift-Reduce Parsing (3/3)
26
Recursive-Descent Parsing(2/1)
  • Recursive descent is a top-down method
  • A recursive-descent parser is made up of a
    procedure for each nonterminal symbol in the
    grammar
  • When a procedure is called, it attempts to find a
    substring of the input, beginning with the
    current token, that can be interpreted as the
    nonterminal with which the procedure is
    associated
  • It may call other procedures, or even call itself
    recursively, to search for other nonterminals

27
Recursive-Descent Parsing(2/2)
  • If a procedure finds the nonterminal that is its
    goal, it returns an indication of success to its
    caller; otherwise, it returns an indication of
    failure
  • For the recursive-descent technique, it must be
    possible to decide which alternative to use by
    examining the next input token
  • Top-down parsers cannot be directly used with a
    grammar that contains immediate left recursion
  • The parse tree is constructed beginning at the
    root, hence the term top-down parsing
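  • A minimal recursive-descent sketch for the rule
    <read> ::= READ ( <id-list> ) of the simplified
    grammar (the token handling is simplified; the
    textbook's procedures work on a token stream with
    one-token lookahead):

    def parse_read(tokens, pos):
        """Return the position after a READ statement starting
        at tokens[pos], or None as an indication of failure."""
        if pos < len(tokens) and tokens[pos] == "READ":
            pos += 1
            if pos < len(tokens) and tokens[pos] == "(":
                pos = parse_id_list(tokens, pos + 1)
                if pos is not None and pos < len(tokens) and tokens[pos] == ")":
                    return pos + 1
        return None

    def parse_id_list(tokens, pos):
        """<id-list> ::= id { , id } in the modified grammar."""
        if pos >= len(tokens) or not tokens[pos].isidentifier():
            return None
        pos += 1
        while pos + 1 < len(tokens) and tokens[pos] == ",":
            if not tokens[pos + 1].isidentifier():
                return None
            pos += 2
        return pos

    print(parse_read(["READ", "(", "VALUE", ")"], 0))   # 4 (success)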

28
Simplified Pascal Grammar Modified for
Recursive-Descent
29
Recursive-descent Parse of a READ statement(3/1)
30
Recursive-descent Parse of a READ statement(3/2)
31
Recursive-descent Parse of a READ statement(3/3)
32
Recursive-descent Parse of an Assignment
statement(5/1)
33
Recursive-descent Parse of an Assignment
statement(5/2)
34
Recursive-descent Parse of an Assignment
statement(5/3)
35
Recursive-descent Parse of an Assignment
statement(5/4)
36
Recursive-descent Parse of an Assignment
statement(5/5)
37
Code Generation
  • Semantic routines perform processing related to
    the meaning associated with the corresponding
    construct in the language
  • Code-generation routines are semantic routines
    that generate object code directly
  • Our code-generation routines make use of two data
    structures for working storage: a list and a
    stack
  • As each piece of object code is generated, we
    assume that a location counter LOCCTR is updated
    to reflect the next available address in the
    compiled program (exactly as it is in an
    assembler)
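  • A minimal sketch of this bookkeeping (the instruction
    format and 3-byte instruction length are illustrative):

    object_code = []
    LOCCTR = 0          # next available address in the compiled program

    def emit(instruction, length=3):
        """Append one generated instruction and advance the
        location counter, exactly as an assembler would."""
        global LOCCTR
        object_code.append((LOCCTR, instruction))
        LOCCTR += length

    emit("LDA  VALUE")
    emit("STA  SUM")
    print(object_code)   # [(0, 'LDA  VALUE'), (3, 'STA  SUM')]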

38
Code Generation for a READ Statement(2/1)
39
Code Generation for a READ Statement(2/2)
40
Code Generation for an Assignment Statement(9/1)
41
Code Generation for an Assignment Statement(9/2)
42
Code Generation for an Assignment Statement(9/3)
43
Code Generation for an Assignment Statement(9/4)
44
Code Generation for an Assignment Statement(9/5)
45
Code Generation for an Assignment Statement(9/6)
46
Code Generation for an Assignment Statement(9/7)
47
Code Generation for an Assignment Statement(9/8)
48
Code Generation for an Assignment Statement(9/9)
49
Other Code-Generation Routines(6/1)
50
Other Code-Generation Routines(6/2)
51
Other Code-Generation Routines(6/3)
52
Other Code-Generation Routines(6/4)
53
Other Code-Generation Routines(6/5)
54
Other Code-Generation Routines(6/6)
55
Object Code Generated for Program(3/1)
56
Object Code Generated for Program(3/2)
57
Object Code Generated for Program(3/3)
58
Machine-Dependent Compiler Features
  • The real machine dependencies of a compiler are
    related to the generation and optimization of the
    object code
  • In intermediate form, the syntax and semantics of
    the source statements have been completely
    analyzed, but the actual translation into machine
    code has not yet been performed
  • It is much easier to analyze and manipulate the
    intermediate form of the program for the purposes
    of code optimization

59
Intermediate Form of the Program
  • Quadruple: operation, op1, op2, result
  • Operation is some function to be performed by the
    object code, op1 and op2 are the operands for
    this operation, and result designates where the
    resulting value is to be placed
  • The quadruples can be rearranged to eliminate
    redundant load and store operations, and the
    intermediate results can be assigned to registers
    or to temporary variables to make their use as
    efficient as possible
  • After optimization has been performed, the
    modified quadruples are translated into machine
    code

60
Examples of Quadruples
  • SUM := SUM + VALUE is translated as
  • +, SUM, VALUE, i1
  • :=, i1, , SUM
  • VARIANCE := SUMSQ DIV 100 - MEAN * MEAN is translated as
  • DIV, SUMSQ, 100, i1
  • *, MEAN, MEAN, i2
  • -, i1, i2, i3
  • :=, i3, , VARIANCE
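  • A minimal sketch of how such quadruples might be
    produced, assuming intermediate results are named
    i1, i2, ... as in the examples above:

    quadruples = []
    temp_count = 0

    def gen_quad(operation, op1, op2=""):
        """Emit one quadruple and return the name of its result."""
        global temp_count
        temp_count += 1
        result = f"i{temp_count}"
        quadruples.append((operation, op1, op2, result))
        return result

    # SUM := SUM + VALUE
    t = gen_quad("+", "SUM", "VALUE")
    quadruples.append((":=", t, "", "SUM"))
    print(quadruples)
    # [('+', 'SUM', 'VALUE', 'i1'), (':=', 'i1', '', 'SUM')]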

61
Intermediate Code for the Program(2/1)
62
Intermediate Code for the Program(2/2)
63
Machine-Dependent Code Optimization(3/1)
  • First problem: the assignment and use of registers
  • Machine instructions that use registers as
    operands are usually faster than the
    corresponding instructions that refer to
    locations in memory
  • Select which register value to replace when it is
    necessary to assign a register for some other
    purpose
  • The value that will not be needed for the longest
    time is the one that should be replaced
  • If the register that is being reassigned contains
    the value of some variable already stored in
    memory, the value can simply be discarded
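  • A minimal sketch of the replacement rule just
    described, assuming we know the quadruple index at
    which each value is next used (infinity if never):

    import math

    def choose_victim(registers, next_use):
        """Pick the register whose value is not needed for the
        longest time; registers maps register -> value name."""
        return max(registers,
                   key=lambda r: next_use.get(registers[r], math.inf))

    regs = {"R1": "SUM", "R2": "VALUE", "R3": "MEAN"}
    next_use = {"SUM": 4, "VALUE": 2}      # MEAN is never used again
    print(choose_victim(regs, next_use))   # R3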

64
Machine-Dependent Code Optimization(3/2)
  • In making and using register assignments, a
    compiler must also consider the control flow of
    the program
  • A basic block is a sequence of quadruples with
    one entry point, which is at the beginning of the
    block, one exit point, which is at the end of the
    block, and no jumps within the block
  • More sophisticated code-optimization techniques
    can analyze a flow graph and perform register
    assignments that remain valid from one basic
    block to another
  • Another possibility for code optimization
    involves rearranging quadruples before machine
    code is generated
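  • A minimal basic-block sketch, assuming jump
    quadruples have operations starting with J and carry
    their target quadruple index in the result field
    (both conventions are assumptions):

    def basic_blocks(quads):
        """A leader is the first quadruple, any jump target, and
        any quadruple after a jump; each block runs from one
        leader up to the next."""
        leaders = {0}
        for i, (op, op1, op2, result) in enumerate(quads):
            if op.startswith("J"):
                leaders.add(int(result))       # target starts a block
                if i + 1 < len(quads):
                    leaders.add(i + 1)         # so does the fall-through
        cuts = sorted(leaders) + [len(quads)]
        return [quads[a:b] for a, b in zip(cuts, cuts[1:])]

    quads = [(":=", "1", "", "I"),
             ("JGT", "I", "10", "4"),
             ("+", "I", "1", "I"),
             ("J", "", "", "1"),
             (":=", "I", "", "N")]
    print(len(basic_blocks(quads)))   # 4 blocks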

65
Machine-Dependent Code Optimization(3/3)
  • Other possibilities for machine-dependent code
    optimization involve taking advantage of specific
    characteristics and instructions of the target
    machine
  • Special loop-control instructions or addressing
    modes that can be used to create more efficient
    object code
  • High-level machine instructions that can perform
    complicated functions such as calling procedure
    and manipulating data structures in a single
    operation
  • Consecutive instructions that involve different
    functional units can sometimes be executed at the
    same time

66
Basic Blocks and Flow Graph
67
Rearrangement of Quadruples for Code
Optimization(2/1)
68
Rearrangement of Quadruples for Code
Optimization(2/2)
69
Machine-Independent Compiler Features
  • Structured Variables
  • Machine-Independent Code Optimization
  • Storage Allocation
  • Block-Structured Languages

70
Structured Variables
  • Structured variables: arrays, records, strings,
    and sets
  • Row-major order: all array elements that have the
    same value of the first subscript are stored in
    contiguous locations
  • Column-major order: all elements that have the
    same value of the second subscript are stored
    together
  • In row-major order, the rightmost subscript
    varies most rapidly; in column-major order, the
    leftmost subscript varies most rapidly
  • Dynamic array: the compiler creates a descriptor
    (dope vector) for the array, storing the lower
    and upper bounds for each array subscript
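  • A minimal sketch of the address calculation for a
    row-major reference A[i, j], assuming a declaration
    A: ARRAY[l1..u1, l2..u2] with element size w (the
    symbols are the usual textbook ones):

    def row_major_address(base, i, j, l1, u1, l2, u2, w):
        """Rows are stored contiguously, so i selects a whole
        row and j selects an element within it."""
        row_length = u2 - l2 + 1
        return base + w * ((i - l1) * row_length + (j - l2))

    # B: ARRAY[0..3, 1..6], 3-byte elements: B[2, 5] is element
    # 2*6 + 4 = 16, so its address is 100 + 3*16 = 148.
    print(row_major_address(100, 2, 5, 0, 3, 1, 6, 3))   # 148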

71
Row-Major and Column-Major
72
Code Generation for Array References(2/1)
73
Code Generation for Array References(2/2)
74
Machine-Independent Code Optimization
  • Elimination of common subexpressions
  • Removal of loop invariants
  • Rewriting the source program
  • The substitution of a more efficient operation
    for a less efficient one
  • Reduction in strength of an operation
  • Folding: computations whose operand values are
    known at compilation time can be performed by the
    compiler
  • Loop unrolling: converting a loop into
    straight-line code (see the sketch below)
  • Loop jamming: merging the bodies of loops
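  • A minimal before/after sketch of loop unrolling (the
    loop body is illustrative):

    a = list(range(100))

    # Before: one loop-control test per element processed.
    total = 0
    for i in range(0, 100):
        total += a[i]

    # After unrolling by a factor of two: the same work with
    # half the loop-control overhead.
    total = 0
    for i in range(0, 100, 2):
        total += a[i]
        total += a[i + 1]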

75
Elimination of common subexpressions and removal
of loop invariants(4/1)
76
Elimination of common subexpressions and removal
of loop invariants(4/2)
77
Elimination of common subexpressions and removal
of loop invariants(4/3)
78
Elimination of common subexpressions and removal
of loop invariants(4/4)
79
Reduction in Strength of Operations(2/1)
80
Reduction in Strength of Operations(2/2)
81
Storage Allocation(2/1)
  • If procedures may be called recursively, static
    allocation cannot be used
  • Each procedure call creates an activation record
    that contains storage for all the variables used
    by the procedure
  • Each activation record is associated with a
    particular invocation of the procedure
  • An activation record is not deleted until a
    return has been made from the corresponding
    invocation
  • Activation records are typically allocated on a
    stack, with the current record at the top of the
    stack

82
Storage Allocation(2/2)
  • When automatic allocation is used, the compiler
    must generate code for references to variables
    using some sort of relative addressing
  • When automatic allocation is used, storage is
    assigned to all variables used by a procedure
    when the procedure is called
  • A large block of free storage called a heap is
    obtained from the operating system at the
    beginning of the program
  • Allocations of storage from the heap are managed
    by run-time procedures
  • A run-time garbage collection procedure scans the
    pointers in the program and reclaims areas from
    the heap that are no longer being used
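  • A minimal sketch of stack-based automatic allocation
    during recursive calls (the record contents are
    illustrative):

    stack = []

    def call_procedure(name, local_vars):
        """Each invocation pushes its own activation record."""
        stack.append({"procedure": name, "locals": dict(local_vars)})

    def return_from_procedure():
        """A record is deleted only when its invocation returns."""
        return stack.pop()

    call_procedure("FACT", {"N": 3})
    call_procedure("FACT", {"N": 2})    # recursive call: a second record
    print(len(stack))                   # 2 live activation records
    return_from_procedure()
    print(stack[-1]["locals"])          # {'N': 3}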

83
Recursive Invocation of a Procedure Using Static
Storage
84
Recursive Invocation of a Procedure Using
Automatic Storage Allocation(2/1)
85
Recursive Invocation of a Procedure Using
Automatic Storage Allocation(2/2)
86
Block-Structured Languages(2/1)
  • A block is a portion of a program that has the
    ability to declare its own identifiers
  • As the beginning of each new block is recognized,
    it is assigned the next block number in sequence
  • The compiler can construct a table that describes
    the block structure
  • The block-level entry gives the nesting depth of
    each block
  • When a reference to an identifier appears in the
    source program, the compiler must first check the
    symbol table for a definition of that identifier
    by the current block
  • If no such definition is found, the compiler
    looks for a definition by the block that
    surrounds the current one

87
Block-Structured Languages(2/2)
  • Most block-structured languages make use of
    automatic storage allocation
  • One common method for providing access to
    variables in surrounding blocks uses a data
    structure called a display
  • The display contains pointers to the most recent
    activation records for the current block and for
    all blocks that surround it in the source program
  • The compiler for a block-structured language must
    include code at the beginning of a block to
    initialize the display for that block
  • At the end of the block, it must include code to
    restore the previous display contents

88
Nesting of Blocks(2/1)
89
Nesting of Blocks(2/2)
90
Use of Display for Procedures(2/1)
91
Use of Display for Procedures(2/2)
92
Compiler Design Options
  • Division into Passes
  • Interpreters
  • P-Code Compilers
  • Compiler-Compilers

93
Division into Passes
  • A language that allows forward references to data
    items cannot be compiled in one pass
  • If speed of compilation is important, a one-pass
    design might be preferred
  • If programs are executed many times for each
    compilation, or if they process large amounts of
    data, then speed of execution becomes more
    important than speed of compilation; in that case
    we might prefer a multi-pass compiler design that
    can incorporate sophisticated code-optimization
    techniques
  • Multi-pass compilers are also used when the
    amount of memory, or other system resources, is
    severely limited

94
Interpreters
  • Interpreters execute a version of the source
    program directly, instead of translating it into
    machine code
  • An interpreter usually performs lexical and
    syntactic analysis functions, and then translates
    the source program into an internal form
  • The process of translating a source program into
    some internal form is simpler and faster than
    compiling it into machine code
  • Execution of the translated program by an
    interpreter is much slower than execution of the
    machine code produced by a compiler
  • The real advantage of an interpreter over a
    compiler is in the debugging facilities that can
    easily be provided
  • Interpreters are especially attractive in an
    educational environment (emphasis on learning and
    program testing)

95
P-Code Compilers
  • P-code compilers (also called bytecode compilers)
    are very similar in concept to interpreters
  • The source program is analyzed and converted into
    an intermediate form, which is then executed
    interpretively
  • With a P-code compiler, this intermediate form
    is the machine language for a hypothetical
    machine, often called a pseudo-machine
  • P-code object programs can be executed on any
    machine that has a P-code interpreter
  • The P-code object program is often much smaller
    than a corresponding machine code program would
    be
  • The interpretive execution of a P-code program
    may be much slower than the execution of the
    equivalent machine code
  • If execution speed is important, some P-code
    compilers support the use of machine-language
    subroutines

96
Translation and Execution Using a P-code Compiler
97
Compiler-Compilers
  • A compiler-compiler is a software tool that can
    be used to help in the task of compiler
    construction
  • The user provides a description of the language
    to be translated
  • This description may consist of a set of lexical
    rules for defining tokens and a grammar for the
    source language
  • Some compiler-compilers use this information to
    generate a scanner and a parser directly
  • Others create tables for use by standard
    table-driven scanning and parsing routines that
    are supplied by the compiler-compiler
  • The main advantage of using a compiler-compiler is
    ease of compiler construction and testing
  • The writer can therefore focus more attention on
    good code generation and optimization

98
Automated Compiler Construction using a
Compiler-Compiler
99
Implementation Examples
  • SunOS C Compiler
  • GNU NYU Ada Translator
  • Cray MPP FORTRAN Compiler
  • Java Compiler and Environment
  • The YACC Compiler-Compiler

100
SunOS C Compiler(3/1)
  • The SunOS C compiler runs on a variety of
    hardware platforms, including SPARC, x86, and
    PowerPC
  • The translation process begins with the execution
    of the C preprocessor, which performs file
    inclusion and macro processing
  • The output from the preprocessor goes to the C
    compiler itself
  • The preprocessor and compiler also accept source
    files that contain assembler language
    subprograms, and pass these on to the assembly
    phase
  • After preprocessing is complete, the actual
    process of program translation begins

101
SunOS C Compiler(3/2)
  • The lexical analysis of the program is performed
    during preprocessing
  • The compiler itself begins with syntactic
    analysis, followed by semantic analysis and code
    generation
  • SunOS itself is largely written in C
  • Four different levels of code optimization can be
    specified by the user when a program is compiled
  • When requested, the SunOS C compiler can insert
    special code into the object program to gather
    information about its execution
  • SunOS C can also generate information that
    supports the operation of debugging tools

102
SunOS C Compiler(3/3)
  • The O1 level does a minimal amount of local
    optimization at the assembler-language level
  • The O2 level provides basic local and global
    optimization, including register allocation and
    merging of basic blocks, as well as elimination of
    common subexpressions and removal of loop
    invariants
  • The O3 and O4 levels include optimizations that
    can improve execution speed, but usually produce
    a larger object program
  • O3 optimization performs loop unrolling
  • The O4 level automatically converts calls to
    user-written functions into in-line code

103
GNU NYU Ada Translator(2/1)
  • A compiler that is written in the language it
    compiles is often referred to as a self-compiler
  • One benefit of this approach is the ability to
    bootstrap improvements in language design and
    code generation
  • Syntax analysis in GNAT is performed by a
    hand-coded recursive descent parser
  • The syntax analysis phase also includes a
    sophisticated error recovery system
  • The semantic analyzer performs name and type
    resolution, and decorates the AST (abstract
    syntax tree) with various semantic attributes

104
GNU NYU Ada Translator(2/2)
  • After front-end processing is complete, the
    GNAT-to-GNU phase (Gigi) traverses the AST and
    calls generators that build corresponding GCC
    tree fragments
  • As each fragment is generated, the GCC back end
    performs the corresponding code-generation
    activity
  • Code generation itself is done using an
    intermediate representation that depends on the
    target machine
  • Configuration files describe the characteristics
    of the target machine

105
Overall Organization of GNAT
106
Cray MPP FORTRAN Compiler(2/1)
  • DIMENSION A(256)
  • CDIR$ SHARED A(:BLOCK)
  • The compiler directive SHARED specifies that the
    elements of the array are to be divided among the
    processing elements (PEs) that are assigned to
    execute the program
  • CDIR$ DOSHARED (I) ON A(I)
  • DO I = 1, 256
  •   A(I) = SQRT(A(I))
  • END DO
  • The compiler directive DOSHARED specifies that
    the iterations of the loop are to be divided
    among the available PEs

107
Cray MPP FORTRAN Compiler(2/2)
  • The compiler also implements low-level features
    that can be used if more detailed control of the
    processing is needed
  • HIIDX and LOWIDX return the highest and lowest
    subscript values for a given array on a specified
    PE
  • The function HOME returns the number of the PE on
    which a specified array element resides
  • The MPP FORTRAN compiler provides a number of
    tools that can be used to synchronize the parallel
    execution of programs
  • When a PE encounters a barrier, it stops
    execution and waits until all other PEs have also
    reached the barrier

108
Array Elements Distributed Among PEs
109
Java Compiler and Environment(2/1)
  • The Java language itself is derived from C and
    C++
  • Memory management is handled automatically, thus
    freeing the programmer from a complex and
    error-prone task
  • There are no procedures or functions in Java;
    classes and methods are used instead
  • Programmers are constrained to use a pure
    object-oriented style, rather than mixing the
    procedural and object-oriented approaches
  • Java provides built-in support for multiple
    threads of execution, which allows different parts
    of an application's code to be executed
    concurrently
  • The Java compiler follows the P-code approach

110
Java Compiler and Environment(2/2)
  • The compiler generates bytecodes: a high-level,
    machine-independent code for a hypothetical
    machine (the Java Virtual Machine), which is
    implemented on each target computer by an
    interpreter and run-time system
  • A Java program can be run, without modification
    and without recompiling, on any computer for which
    a Java interpreter exists
  • The bytecode approach allows easy integration of
    Java applications into the World Wide Web
  • The Java interpreter is designed to run as fast
    as possible, without needing to check the
    run-time environment
  • The automatic garbage collection system used to
    manage memory runs as a low-priority background
    thread
  • When high performance is needed, the Java
    bytecodes can be translated at execution time
    into machine code for the computer on which the
    application is running

111
The YACC Compiler-Compiler
  • YACC (Yet Another Compiler-Compiler) is a parser
    generator that is available on UNIX systems
  • LEX is a scanner generator that can be used to
    create scanners of the type required by YACC
  • The YACC parser generator accepts as input a
    grammar for the language being compiled and a set
    of actions corresponding to rules of the grammar
  • The YACC parser calls the semantic routines
    associated with each rule as the corresponding
    language construct is recognized
  • The parsers generated by YACC use a bottom-up
    parsing method called LALR(1), which is a slightly
    restricted form of shift-reduce parsing

112
Example of Input Specifications for LEX
113
Example of Input Specifications for YACC