# CPSC 388 - PowerPoint PPT Presentation

## CPSC 388


Slides: 30
Provided by: ellenw4
Transcript and Presenter's Notes

Title: CPSC 388

1
Introduction
• CPSC 388
• Ellen Walker
• Hiram College

2
• Practical application of important computer
science theory
• Ties together computer architecture and
programming
• Useful tools for developing language interpreters
• Not just programming languages!

3
Computer Languages
• Machine language
• Binary numbers stored in memory
• Bits correspond directly to machine actions
• Assembly language
• A symbolic face for machine language
• Line-for-line translation
• High-level language (our goal!)
• Closer to human expressions of problems, e.g.
mathematical notation

4
Assembler vs. HLL
• Assembler
• Ldi r1, 2 -- put the value 2 in R1
• Sto r1, x -- store that value in X
• HLL
• X = 2

5
Characteristics of HLLs
• Easier to learn (and remember)
• Machine independent
• No knowledge of architecture needed
• as long as there is a compiler for that machine!

6
Early Milestones
• FORTRAN (Formula Translation)
• IBM (John Backus) 1954-1957
• First High-level language, and first compiler
• Chomsky Hierarchy (1950s)
• Formal description of natural language structure
• Ranks languages according to the complexity of
their grammar

7
Chomsky Hierarchy
• Type 3: Regular languages
• Too simple for programming languages
• Good for tokens, e.g. numbers
• Type 2: Context-free languages
• Standard representation of programming languages
• Type 1: Context-sensitive languages
• Type 0: Unrestricted

8
Another View of the Hierarchy
• Nested language classes: RL ⊂ CFL ⊂ CSL

9
Formal Language and Automata Theory
• Machines that recognize each language class
• Turing Machine (computable languages)
• Push-down Automaton (context-free languages)
• Finite Automaton (regular languages)
• Use machines to prove that a given language
belongs to a class
• Formally prove that a given language does not
belong to a class
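
As an illustration of a finite automaton recognizing a regular language, here is a minimal Python sketch of a DFA for numbers such as `42` or `3.14`. The state names, the transition table, and the `accepts` helper are assumptions made for this example; they do not come from the slides.

```python
# Hypothetical DFA: digits, optionally followed by "." and more digits.
# All names here are illustrative assumptions.

def char_class(ch):
    """Map a character to the input class used by the transition table."""
    if "0" <= ch <= "9":
        return "digit"
    if ch == ".":
        return "dot"
    return "other"

# (state, input class) -> next state; a missing entry means "reject".
TRANSITIONS = {
    ("start", "digit"): "int",
    ("int", "digit"): "int",
    ("int", "dot"): "dot",
    ("dot", "digit"): "frac",
    ("frac", "digit"): "frac",
}
ACCEPTING = {"int", "frac"}

def accepts(text):
    """Run the DFA over text and report whether it ends in an accepting state."""
    state = "start"
    for ch in text:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:
            return False
    return state in ACCEPTING
```

The automaton needs only a current state and no stack or other memory, which is what keeps it within the regular languages.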

10
Practical Applications of Theory
• Translate from grammar to formal machine
description
• Implement the formal machine to parse the
language
• Tools
• Scanner Generator (RL / FA): LEX, FLEX
• Parser Generator (CFL / PDA): YACC, Bison

11
Beyond Parsing
• Code generation
• Optimization
• Techniques to mindlessly improve code
• Usually after code generation
• Rarely optimal, simply better

12
Phases of a Compiler
• Scanner -> tokens
• Parser -> syntax tree
• Semantic Analyzer -> annotated tree
• Source code optimizer -> intermediate code
• Code generator -> target code
• Target code optimizer -> better target code

13
Additional Tables
• Symbol table
• Tracks all variable names and other symbols that
will have to be mapped to addresses later
• Literal table
• Tracks literals (such as numbers and strings)
that will have to be stored along with the
eventual program
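
The two tables can be sketched in Python as follows. This is a minimal illustration only: the class names, the entry fields (`type`, `address`), and the `intern`/`lookup` methods are assumptions, not something the slides specify.

```python
# Hypothetical symbol and literal tables; field and method names are assumptions.

class SymbolTable:
    """Maps each name to one entry; later phases fill in type and address."""

    def __init__(self):
        self._entries = {}

    def intern(self, name):
        """Return the entry for name, creating an empty one on first sight."""
        return self._entries.setdefault(
            name, {"name": name, "type": None, "address": None}
        )

    def lookup(self, name):
        """Return the entry for name, or None if it was never seen."""
        return self._entries.get(name)


class LiteralTable:
    """Stores each distinct literal once and hands back its index."""

    def __init__(self):
        self._index = {}
        self.values = []

    def intern(self, value):
        # Key on (type, value) so that 1 and 1.0 stay distinct literals.
        key = (type(value), value)
        if key not in self._index:
            self._index[key] = len(self.values)
            self.values.append(value)
        return self._index[key]
```

Storing each literal once means the eventual program carries a single copy, however many times the source mentions it.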

14
Scanner
• Read a stream of characters
• Perform lexical analysis to generate tokens
• Update symbol and literal tables as needed
• Example
• Input: a[j] = 4 + 1
• Tokens: ID Lbrack ID Rbrack EQL NUM PLUS NUM
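
A scanner for this example could be sketched with Python's `re` module as below. The token names follow the slide; the `SKIP` and `ERROR` entries, the regular expressions, and the `scan` function are assumptions added for illustration, and the sketch leaves out the symbol and literal table updates.

```python
import re

# Token names ID, Lbrack, Rbrack, EQL, NUM, PLUS come from the slide.
# SKIP and ERROR, and every pattern below, are illustrative assumptions.
TOKEN_SPEC = [
    ("NUM", r"[0-9]+"),
    ("ID", r"[A-Za-z_][A-Za-z_0-9]*"),
    ("Lbrack", r"\["),
    ("Rbrack", r"\]"),
    ("EQL", r"="),
    ("PLUS", r"\+"),
    ("SKIP", r"\s+"),
    ("ERROR", r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def scan(source):
    """Turn a character stream into a list of (token kind, text) pairs."""
    tokens = []
    for match in MASTER.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "SKIP":
            continue  # whitespace separates tokens but is not a token itself
        if kind == "ERROR":
            raise SyntaxError(f"unexpected character {text!r} at position {match.start()}")
        tokens.append((kind, text))
    return tokens
```

Calling `scan("a[j] = 4 + 1")` yields the token kinds listed on the slide, each paired with its source text.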

15
Parser
• Performs syntax analysis
• Relates the sequence of tokens to the grammar
• Builds a tree that represents this relationship,
the parse tree

16
Partial Grammar
• assign-expr -> expr = expr
• array-expr -> ID [ expr ]
• expr -> array-expr
• expr -> expr + expr
• expr -> ID
• expr -> NUM
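
One way to parse this partial grammar by hand is recursive descent, sketched below in Python. Because the rule for `+` is left-recursive, the sketch swaps in a loop for it, which is a standard rewrite and not something the slides prescribe. The token format (a list of `(kind, text)` pairs), the tuple-based tree, and all function names are assumptions.

```python
# Hypothetical recursive-descent parser for the partial grammar.
# Input: list of (kind, text) pairs using the slide's token names.

def parse_assign(tokens):
    """Parse assign-expr -> expr = expr and return a nested-tuple tree."""
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else "EOF"

    def expect(kind):
        nonlocal pos
        if peek() != kind:
            raise SyntaxError(f"expected {kind}, found {peek()} at token {pos}")
        text = tokens[pos][1]
        pos += 1
        return text

    def primary():
        # expr -> NUM | ID | array-expr, where array-expr -> ID [ expr ]
        if peek() == "NUM":
            return ("NUM", int(expect("NUM")))
        name = expect("ID")
        if peek() == "Lbrack":
            expect("Lbrack")
            index = expr()
            expect("Rbrack")
            return ("array-expr", name, index)
        return ("ID", name)

    def expr():
        # expr -> expr + expr, handled as a left-associative loop
        node = primary()
        while peek() == "PLUS":
            expect("PLUS")
            node = ("+", node, primary())
        return node

    target = expr()
    expect("EQL")
    value = expr()
    if peek() != "EOF":
        raise SyntaxError(f"unexpected {peek()} after expression")
    return ("assign-expr", target, value)
```

A generated parser from YACC or Bison would accept the left-recursive rule directly; the loop is only needed for the hand-written top-down style.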

17
Example Parse
assign-expression
  expression
    array-expression
      ID
      [
      expression
        ID
      ]
  =
  expression
    expression
      NUM
    +
    expression
      NUM

18
Abstract Syntax Tree
assign-expression
  expression
    array-expression
      ID
      expression
        ID
  expression
    expression
      NUM
    expression
      NUM

19
Semantic Analyzer
• Determine the meaning (not structure) of the
program
• This is compile-time or static semantics only
• Example: a[j] = 4 + 1
• a refers to an array location
• a contains integers
• j is an integer
• j is in the range of the array (not checked in C)
• Parse or Syntax tree is decorated with this
information
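
The static checks listed above can be sketched as a recursive walk over the syntax tree. In this Python illustration the nested-tuple tree format, the `symbols` dictionary of declared types, and the `type_of` function are all assumptions. As in C, the sketch does not check that `j` is within the array's range.

```python
# Hypothetical static type checker over a nested-tuple syntax tree.
# symbols maps a name to "int" or to ("array", element_type).

def type_of(node, symbols):
    """Return the static type of node, raising on a semantic error."""
    kind = node[0]
    if kind == "NUM":
        return "int"
    if kind == "ID":
        if node[1] not in symbols:
            raise NameError(f"undeclared identifier {node[1]!r}")
        return symbols[node[1]]
    if kind == "array-expr":
        array_type = type_of(("ID", node[1]), symbols)
        if not (isinstance(array_type, tuple) and array_type[0] == "array"):
            raise TypeError(f"{node[1]!r} is not an array")
        if type_of(node[2], symbols) != "int":
            raise TypeError("array index must be an integer")
        return array_type[1]  # the element type
    if kind == "+":
        if type_of(node[1], symbols) != "int" or type_of(node[2], symbols) != "int":
            raise TypeError("operands of + must be integers")
        return "int"
    if kind == "assign-expr":
        left, right = type_of(node[1], symbols), type_of(node[2], symbols)
        if left != right:
            raise TypeError(f"cannot assign {right} to {left}")
        return left
    raise ValueError(f"unknown node kind {kind!r}")
```

A fuller analyzer would record each computed type on the tree node itself, which is the "decoration" the slide refers to; this sketch only returns it.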

20
Source Code Optimizer
• Simplify and improve the source code by applying
rules
• Constant folding: replace 4 + 2 by 6
• Combine common sub-expressions
• Reordering expressions (often prior to constant
folding)
• Etc.
• Result: modified, decorated syntax tree or Intermediate Representation
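
Constant folding can be sketched as a bottom-up rewrite of the syntax tree. The Python below is a minimal illustration; the nested-tuple tree format and the `fold` function are assumptions, and only the `+` operator is handled.

```python
# Hypothetical constant folder over a nested-tuple syntax tree.

def fold(node):
    """Return a copy of the tree with constant additions evaluated."""
    kind = node[0]
    if kind == "+":
        left, right = fold(node[1]), fold(node[2])
        if left[0] == "NUM" and right[0] == "NUM":
            # Both operands are known at compile time: compute the sum now.
            return ("NUM", left[1] + right[1])
        return ("+", left, right)
    if kind == "assign-expr":
        return ("assign-expr", fold(node[1]), fold(node[2]))
    if kind == "array-expr":
        return ("array-expr", node[1], fold(node[2]))
    return node  # NUM and ID leaves are already as simple as they get
```

Folding children before the parent lets nested constants such as `(4 + 2) + 1` collapse in a single pass.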

21
Code Generator
• Generates code for the target machine
• Example
• MOV R0, j -- value of j into R0
• MUL R0, 2 -- 2*j in R0 (int = 2 words)
• MOV R1, &a -- address of a into R1
• ADD R1, R0 -- address of a[j] in R1
• MOV *R1, 6 -- 6 into address in R1

22
Target Code Optimizer
• Apply rules to improve machine code
• Example
• MOV R0, j
• SHL R0 (shift to multiply by 2)
• MOV &a[R0], 6 (use a more complex machine instruction to replace simpler ones)
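
The first rule in this example, replacing a multiply by 2 with a shift, can be sketched as a peephole pass over the instruction list. In the Python below, the tuple encoding of instructions and the `peephole` function are assumptions. The second rule, merging instructions into one indexed store, is not shown.

```python
# Hypothetical peephole pass: instructions are tuples like ("MUL", "R0", "2").

def peephole(instructions):
    """Replace each multiply-by-2 with a one-bit left shift."""
    optimized = []
    for instr in instructions:
        if len(instr) == 3 and instr[0] == "MUL" and instr[2] == "2":
            optimized.append(("SHL", instr[1]))  # shifting left once doubles the value
        else:
            optimized.append(instr)
    return optimized
```

For example, `peephole([("MOV", "R0", "j"), ("MUL", "R0", "2")])` leaves the `MOV` untouched and turns the `MUL` into `("SHL", "R0")`.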

23
Major Data Structures
• Tokens
• Syntax Tree
• Symbol Table
• Literal Table
• Intermediate Code
• Temporary files

24
Structuring a Compiler
• Analysis vs. Synthesis
• Analysis: understanding the source code
• Synthesis: generating the target code
• Front end vs. Back end
• Front end: parsing and intermediate code generation (target machine-independent)
• Back end: target code generation
• Optimization included in both parts

25
Multiple Passes
• Each pass processes the source code once
• One pass per phase
• One pass for several phases
• One pass for entire compilation
• Language definition can preclude one-pass
compilation

26
Runtime Environments
• Static (e.g. FORTRAN)
• No pointers, no dynamic allocation, no recursion
• All memory allocation done prior to execution
• Stack-based (e.g. C family)
• Stack for nested allocation (call/return)
• Heap for random allocation (new)
• Fully dynamic (LISP)
• Allocation is automatic (not in source code)
• Garbage collection required

27
Error Handling
• Each phase finds and handles its own types of
errors
• Scanning: errors like 1o1 (invalid ID)
• Parsing: syntax errors
• Semantic Analysis: type errors
• Runtime errors handled by the runtime environment
• Exception handling by programmer often allowed

28
Compiling the Compiler
• Using machine language
• Immediately executable, hard to write
• Necessary for the first (FORTRAN) compiler
• Using a language with an existing compiler and
the same target machine
• Using the language to be compiled (bootstrapping)

29
Bootstrapping
• Write a quick-and-dirty compiler for a subset of the language (using machine language or another available HLL)
• Write a complete compiler in the language subset
• Compile the complete compiler using the quick-and-dirty compiler