CPSC 388 - PowerPoint PPT Presentation

About This Presentation
Title:

CPSC 388

Slides: 30
Provided by: ellenw4
Learn more at: http://cs.hiram.edu


1
Introduction
  • CPSC 388
  • Ellen Walker
  • Hiram College

2
Why Learn About Compilers?
  • Practical application of important computer
    science theory
  • Ties together computer architecture and
    programming
  • Useful tools for developing language interpreters
  • Not just programming languages!

3
Computer Languages
  • Machine language
  • Binary numbers stored in memory
  • Bits correspond directly to machine actions
  • Assembly language
  • A symbolic face for machine language
  • Line-for-line translation
  • High-level language (our goal!)
  • Closer to human expressions of problems, e.g.
    mathematical notation

4
Assembler vs. HLL
  • Assembler
  • Ldi r1, 2 -- put the value 2 in R1
  • Sto r1, x -- store that value in X
  • HLL
  • X = 2

5
Characteristics of HLLs
  • Easier to learn (and remember)
  • Machine independent
  • No knowledge of architecture needed
  • as long as there is a compiler for that machine!

6
Early Milestones
  • FORTRAN (Formula Translation)
  • IBM (John Backus) 1954-1957
  • First High-level language, and first compiler
  • Chomsky Hierarchy (1950s)
  • Formal description of natural language structure
  • Ranks languages according to the complexity of
    their grammar

7
Chomsky Hierarchy
  • Type 3: Regular languages
  • Too simple for programming languages
  • Good for tokens, e.g. numbers
  • Type 2: Context-free languages
  • Standard representation of programming languages
  • Type 1: Context-sensitive languages
  • Type 0: Unrestricted

8
Another View of the Hierarchy
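  • Nested diagram: regular languages (RL) inside
    context-free languages (CFL) inside
    context-sensitive languages (CSL)
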
9
Formal Language Automata Theory
  • Machines that recognize each language class
  • Turing Machine (computable languages)
  • Push-down Automaton (context-free languages)
  • Finite Automaton (regular languages)
  • Use machines to prove that a given language
    belongs to a class
  • Formally prove that a given language does not
    belong to a class
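
A minimal sketch of two of these machines, written as plain Python functions. The names accepts_number and accepts_balanced and the choice of example languages are illustrative assumptions, not part of the original slides. The first function is a finite automaton for simple numbers; the second needs a stack, which is what separates a push-down automaton from a finite one.

```python
DIGITS = "0123456789"


def accepts_number(text):
    """Finite automaton: accepts digits, optionally followed by '.' and digits."""
    # States: 0 = start, 1 = integer part, 2 = just saw '.', 3 = fraction part.
    state = 0
    for ch in text:
        if ch in DIGITS:
            state = {0: 1, 1: 1, 2: 3, 3: 3}[state]
        elif ch == "." and state == 1:
            state = 2
        else:
            return False  # no transition defined: reject
    return state in (1, 3)  # accepting states


def accepts_balanced(text):
    """Push-down automaton: accepts strings of balanced ( ) and [ ]."""
    closers = {")": "(", "]": "["}
    stack = []
    for ch in text:
        if ch in "([":
            stack.append(ch)  # push on an opening bracket
        elif ch in closers:
            if not stack or stack.pop() != closers[ch]:
                return False  # closing bracket with no matching opener
        else:
            return False
    return not stack  # accept only if every opener was closed
```

The number recognizer uses a fixed, finite set of states, while the bracket recognizer's stack can grow without bound. Arbitrarily deep nesting is the reason balanced brackets are context-free rather than regular.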

10
Practical Applications of Theory
  • Translate from grammar to formal machine
    description
  • Implement the formal machine to parse the
    language
  • Tools
  • Scanner Generator (RL / FA): LEX, FLEX
  • Parser Generator (CFL / PDA): YACC, Bison

11
Beyond Parsing
  • Code generation
  • Optimization
  • Techniques to mindlessly improve code
  • Usually after code generation
  • Rarely optimal, simply better

12
Phases of a Compiler
  • Scanner -> tokens
  • Parser -> syntax tree
  • Semantic Analyzer -> annotated tree
  • Source code optimizer -> intermediate code
  • Code generator -> target code
  • Target code optimizer -> better target code

13
Additional Tables
  • Symbol table
  • Tracks all variable names and other symbols that
    will have to be mapped to addresses later
  • Literal table
  • Tracks literals (such as numbers and strings)
    that will have to be stored along with the
    eventual program

14
Scanner
  • Read a stream of characters
  • Perform lexical analysis to generate tokens
  • Update symbol and literal tables as needed
  • Example
  • Input: a[j] = 4 + 2
  • Tokens: ID Lbrack ID Rbrack EQL NUM PLUS NUM
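
The scanning step can be sketched with Python's re module. This is a sketch under assumptions, not the course's scanner: the token names follow the slide, but the function name scan, the TOKEN_SPEC table, and the use of plain lists for the symbol and literal tables are illustrative choices.

```python
import re

# Token classes as regular expressions; names follow the slide's token list.
TOKEN_SPEC = [
    ("NUM", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("Lbrack", r"\["),
    ("Rbrack", r"\]"),
    ("EQL", r"="),
    ("PLUS", r"\+"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def scan(source):
    """Return (tokens, symbol table, literal table) for one line of input."""
    tokens, symbols, literals = [], [], []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind == "SKIP":
            continue  # whitespace produces no token
        if kind == "ID" and text not in symbols:
            symbols.append(text)  # update the symbol table
        if kind == "NUM" and text not in literals:
            literals.append(text)  # update the literal table
        tokens.append((kind, text))
    return tokens, symbols, literals
```

For a statement such as a[j] = 4 + 2, scan returns the token kinds ID Lbrack ID Rbrack EQL NUM PLUS NUM, with a and j in the symbol table and 4 and 2 in the literal table.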

15
Parser
  • Performs syntax analysis
  • Relates the sequence of tokens to the grammar
  • Builds a tree that represents this relationship,
    the parse tree

16
Partial Grammar
  • assign-expr -> expr = expr
  • array-expr -> ID [ expr ]
  • expr -> array-expr
  • expr -> expr + expr
  • expr -> ID
  • expr -> NUM
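
A hand-written recursive-descent parser for this partial grammar might look like the sketch below. Several things are assumptions rather than slide content: tokens arrive as (kind, text) pairs using the scanner slide's token names, trees are nested tuples, and the left-recursive addition rule is replaced by a loop, which is the usual workaround in recursive descent.

```python
def parse_assign(tokens):
    """Parse assign-expr -> expr EQL expr and return a nested-tuple tree."""
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def take(kind):
        nonlocal pos
        if peek() != kind:
            raise SyntaxError(f"expected {kind}, found {peek()}")
        text = tokens[pos][1]
        pos += 1
        return text

    def primary():
        if peek() == "NUM":
            return ("NUM", take("NUM"))
        name = take("ID")
        if peek() == "Lbrack":  # array-expr -> ID Lbrack expr Rbrack
            take("Lbrack")
            index = expr()
            take("Rbrack")
            return ("array-expr", ("ID", name), index)
        return ("ID", name)

    def expr():
        node = primary()
        while peek() == "PLUS":  # addition, grouped left to right
            take("PLUS")
            node = ("add-expr", node, primary())
        return node

    target = expr()
    take("EQL")
    value = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected trailing token {peek()}")
    return ("assign-expr", target, value)
```

Each nonterminal becomes one function, and the tuples it returns already omit punctuation tokens, so the result is closer to an abstract syntax tree than to a full parse tree.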

17
Example Parse
assign-expression
  expression
    array-expression
      ID
      [
      expression
        ID
      ]
  =
  expression
    add-expression
      expression
        NUM
      +
      expression
        NUM

18
Abstract Syntax Tree
assign-expression
  expression
    array-expression
      ID
      expression
        ID
  expression
    add-expression
      expression
        NUM
      expression
        NUM

19
Semantic Analyzer
  • Determine the meaning (not structure) of the
    program
  • This is compile-time or static semantics only
  • Example: a[j] = 4 + 2
  • a refers to an array location
  • a contains integers
  • j is an integer
  • j is in the range of the array (not checked in C)
  • Parse or Syntax tree is decorated with this
    information
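
The static checks listed above can be sketched as a recursive walk over the tree. Everything concrete here is an assumption for illustration: the nested-tuple tree shape, the type strings "int" and "array of int", and the function name check. For brevity the sketch returns each node's type instead of storing it on the tree.

```python
def check(node, symbols):
    """Return the static type of node; raise on an undeclared name or type error."""
    kind = node[0]
    if kind == "NUM":
        return "int"
    if kind == "ID":
        if node[1] not in symbols:
            raise NameError(f"undeclared identifier {node[1]}")
        return symbols[node[1]]
    if kind == "array-expr":
        array_type = check(node[1], symbols)
        index_type = check(node[2], symbols)
        if not array_type.startswith("array of "):
            raise TypeError("subscripted value is not an array")
        if index_type != "int":
            raise TypeError("array index must be an int")
        return array_type[len("array of "):]  # element type
    if kind == "add-expr":
        if check(node[1], symbols) != "int" or check(node[2], symbols) != "int":
            raise TypeError("operands of + must be ints")
        return "int"
    if kind == "assign-expr":
        target_type = check(node[1], symbols)
        value_type = check(node[2], symbols)
        if target_type != value_type:
            raise TypeError(f"cannot assign {value_type} to {target_type}")
        return target_type
    raise ValueError(f"unknown node kind {kind}")
```

With a symbol table such as {"a": "array of int", "j": "int"}, the example assignment checks as int. Whether j lies within the bounds of a is a runtime property, and this static pass does not attempt to check it.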

20
Source Code Optimizer
  • Simplify and improve the source code by applying
    rules
  • Constant folding: replace 4 + 2 by 6
  • Combine common sub-expressions
  • Reordering expressions (often prior to constant
    folding)
  • Etc.
  • Result: modified, decorated syntax tree or
    Intermediate Representation
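
Constant folding on a tree can be sketched in a few lines. The nested-tuple tree shape and the name fold are assumptions for illustration, and only addition is handled.

```python
def fold(node):
    """Return a copy of the tree with constant additions pre-computed."""
    kind = node[0]
    if kind in ("NUM", "ID"):
        return node  # leaves are already as simple as they can be
    children = [fold(child) for child in node[1:]]  # fold bottom-up
    if kind == "add-expr" and all(child[0] == "NUM" for child in children):
        total = int(children[0][1]) + int(children[1][1])
        return ("NUM", str(total))
    return (kind, *children)
```

Folding bottom-up means a nested sum such as (1 + 2) + 3 collapses in a single pass, because the inner sum is already a NUM by the time the outer node is examined.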

21
Code Generator
  • Generates code for the target machine
  • Example
  • MOV R0, j -- value of j into R0
  • MUL R0, 2 -- 2*j in R0 (int = 2 wds)
  • MOV R1, a -- value of a in R1
  • ADD R1, R0 -- a + 2*j in R1 (addr of a[j])
  • MOV *R1, 6 -- 6 into address in R1
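
A code generator for this one statement shape can be sketched as follows. The function name gen_array_store, the nested-tuple tree, and the *R1 spelling for "store through the address in R1" are assumptions. The instruction names mirror the slide's illustrative assembly rather than any real instruction set.

```python
WORD_SIZE = 2  # assumption from the slide: one int occupies 2 storage units


def gen_array_store(node):
    """Emit code for ("assign-expr", ("array-expr", ID, ID), NUM)."""
    _, target, value = node
    if target[0] != "array-expr" or target[2][0] != "ID" or value[0] != "NUM":
        raise NotImplementedError("sketch handles only a[j] = constant")
    array_name, index_name = target[1][1], target[2][1]
    return [
        f"MOV R0, {index_name}",  # value of the index into R0
        f"MUL R0, {WORD_SIZE}",   # scale the index by the element size
        f"MOV R1, {array_name}",  # base address of the array into R1
        "ADD R1, R0",             # R1 now holds the element's address
        f"MOV *R1, {value[1]}",   # store the constant at that address
    ]
```

The generator assumes the source optimizer has already folded the right-hand side to a single constant, which is why it accepts only a NUM as the value.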

22
Target Code Optimizer
  • Apply rules to improve machine code
  • Example
  • MOV R0, j
  • SHL R0 (shift to multiply by 2)
  • MOV a[R0], 6 (use a more complex machine
    instruction to replace simpler ones)
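
Both rewrites can be sketched as a peephole pass over a list of instruction strings. The instruction spellings, including *R1 for an indirect store and a[R0] for an indexed store, follow the slide's illustrative assembly and are assumptions. The sketch also assumes the address register is not used again afterwards, which a real optimizer would have to verify.

```python
import re


def peephole(code):
    """Apply two local rewrites to a list of instruction strings."""
    out = []
    i = 0
    while i < len(code):
        # Rule 1: multiplying by 2 becomes a left shift.
        mul = re.fullmatch(r"MUL (R\d+), 2", code[i])
        if mul:
            out.append(f"SHL {mul.group(1)}")
            i += 1
            continue
        # Rule 2: load base, add index, store indirect -> one indexed store.
        window = code[i:i + 3]
        if len(window) == 3:
            load = re.fullmatch(r"MOV (R\d+), ([A-Za-z_]\w*)", window[0])
            add = re.fullmatch(r"ADD (R\d+), (R\d+)", window[1])
            store = re.fullmatch(r"MOV \*(R\d+), (\d+)", window[2])
            if (load and add and store
                    and not re.fullmatch(r"R\d+", load.group(2))
                    and load.group(1) == add.group(1) == store.group(1)):
                out.append(f"MOV {load.group(2)}[{add.group(2)}], {store.group(2)}")
                i += 3
                continue
        out.append(code[i])  # no rule applies: copy the instruction through
        i += 1
    return out
```

Applied to the five-instruction sequence MOV R0, j / MUL R0, 2 / MOV R1, a / ADD R1, R0 / MOV *R1, 6, the pass returns the three instructions MOV R0, j, SHL R0 and MOV a[R0], 6.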

23
Major Data Structures
  • Tokens
  • Syntax Tree
  • Symbol Table
  • Literal Table
  • Intermediate Code
  • Temporary files

24
Structuring a Compiler
  • Analysis vs. Synthesis
  • Analysis: understanding the source code
  • Synthesis: generating the target code
  • Front end vs. Back end
  • Front end: parsing + intermediate code
    generation (target machine-independent)
  • Back end: target code generation
  • Optimization included in both parts

25
Multiple Passes
  • Each pass processes the source code once
  • One pass per phase
  • One pass for several phases
  • One pass for entire compilation
  • Language definition can preclude one-pass
    compilation

26
Runtime Environments
  • Static (e.g. FORTRAN)
  • No pointers, no dynamic allocation, no recursion
  • All memory allocation done prior to execution
  • Stack-based (e.g. C family)
  • Stack for nested allocation (call/return)
  • Heap for random allocation (new)
  • Fully dynamic (LISP)
  • Allocation is automatic (not in source code)
  • Garbage collection required

27
Error Handling
  • Each phase finds and handles its own types of
    errors
  • Scanning: errors like 1o1 (invalid ID)
  • Parsing: syntax errors
  • Semantic Analysis: type errors
  • Runtime errors handled by the runtime environment
  • Exception handling by programmer often allowed

28
Compiling the Compiler
  • Using machine language
  • Immediately executable, hard to write
  • Necessary for the first (FORTRAN) compiler
  • Using a language with an existing compiler and
    the same target machine
  • Using the language to be compiled (bootstrapping)

29
Bootstrapping
  • Write a quick-and-dirty compiler for a subset of
    the language (using machine language or another
    available HLL)
  • Write a complete compiler in the language subset
  • Compile the complete compiler using the
    quick-and-dirty compiler