CSE Translation of Programming Languages AKA: Compilers - PowerPoint PPT Presentation

1
CSE Translation of Programming Languages AKA: Compilers
  • Charles B. Owen (Instructor)
  • 1138 E. B., 353-6488
  • Ken Horne (TA and grading)
  • Classroom 1225 Engineering Building

2
Introduction
  • Introduction to the class: structure, rules, etc.
  • Getting started: Why are we here? What does a
    compiler do?
3
Course Objectives
  • Understand the processes, algorithms, and
    mathematics of programming language translation
  • Programming methods, algorithms, data structures,
    mathematics, etc.

4
Why Are We Here?
5
Well, of course
  • It's valuable to know how compilers work
  • You can write more efficient code
  • You can debug better
  • You can impress your friends
  • More

6
It's not just compilers
  • The ideas in this course are useful for:
  • Expression evaluation in programs
  • Adding scripting language features
  • Parsing multimedia file formats (like MP3 or
    MPEG)
  • Creating network protocols
  • User interface design
  • Computer aided design
  • Hardware design
  • More

7
Course Structure
  • See the syllabus
  • http://www.cse.msu.edu/cse450
  • MW Lectures
  • Attendance is expected

8
Course Materials
  • Textbooks
  • Compilers: Principles, Techniques, and Tools (2nd
    Edition), Aho, Lam, Sethi, and Ullman, 2006,
    ISBN-13 978-0321486813.
  • lex and yacc, Brown, Levine, and Mason, 1995,
    ISBN-13 978-1565920002.
  • WWW
  • http://www.cse.msu.edu/cse450
  • And on angel (angel.msu.edu)

9
Course Structure
  • Exams
  • Midterm exam
  • Final exam
  • Assignments
  • 6 programming assignments (planned)
  • Toe-tippers

Notice: Bring a red pen to class
10
Policies
  • Reading
  • Read the chapters
  • Attendance
  • Will take care of itself
  • Other
  • Syllabus is online

11
In case you are interested...
  • Projects will build as a sequence
  • We'll have some group projects
  • I'll try to show the use of these techniques
    beyond basic compilers

12
Reading and First Programming Assignment
  • I suggest
  • Read chapter 1 in text.
  • Start reading chapter 2
  • Project 1
  • Will begin next week

13
How are languages implemented?
  • Compilers
  • Translate a language to some other form
  • Might be machine language, or could be a
    different language, byte-codes, or something else

Source program → Compiler → Target program
  • Interpreters
  • Directly execute the programming language
  • Sort of like what you do when you hand-execute a
    program

Source program + Input → Interpreter → Output
Note: A language is neither interpreted nor compiled.
The implementation is what determines this
distinction. Some languages lend themselves better to
one method or the other.
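To make the distinction concrete, here is a minimal sketch in Python (chosen only for illustration). It assumes a hypothetical toy expression language whose programs are nested tuples such as ("+", "speed", ("*", 10, "time")). The same program is either evaluated directly by interpret() or translated by compile_expr() into Python source text that is run later; both function names are made up for this example.

```python
from typing import Union

# Toy expression language (an assumption for this sketch): numbers,
# variable names, and binary "+" / "*" written as nested tuples.
Expr = Union[int, float, str, tuple]


def interpret(expr: Expr, env: dict) -> float:
    """Interpreter: walk the tree and compute the value immediately."""
    if isinstance(expr, (int, float)):
        return expr
    if isinstance(expr, str):
        return env[expr]                      # current value of a variable
    op, left, right = expr
    a, b = interpret(left, env), interpret(right, env)
    return a + b if op == "+" else a * b


def compile_expr(expr: Expr) -> str:
    """Compiler: translate the tree into another language (Python source
    text here) without running it; the output is executed later."""
    if isinstance(expr, (int, float)):
        return repr(expr)
    if isinstance(expr, str):
        return f"env[{expr!r}]"
    op, left, right = expr
    return f"({compile_expr(left)} {op} {compile_expr(right)})"


program = ("+", "speed", ("*", 10, "time"))   # speed + 10 * time
env = {"speed": 3.0, "time": 2.0}

print(interpret(program, env))     # interpreted: runs right away
target = compile_expr(program)     # compiled: translated once...
print(target)
print(eval(target, {"env": env}))  # ...and executed afterwards
```

The interpreter redoes its tree walk on every run; the compiler's translation happens once, and the target text can then be run many times.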
14
Compiler Examples
  • C
  • Usually compiles directly to machine language

C program → Compiler → Machine language
  • Java
  • Compiles to an intermediate code (byte-codes)
    that CPUs don't normally execute directly.
  • This is then interpreted by a Java Virtual
    Machine (JVM).

Java program → Compiler → Byte-codes
Java byte-codes can also be compiled to machine
language.
15
Interpreter Examples
  • MATLAB
  • You can just type in the statements and they
    execute right away.

MATLAB program + Input → Interpreter → Output
  • Byte codes
  • Common Language Runtime
  • Java Virtual Machine
  • The most basic implementation is an interpreter

Byte codes + Input → JVM → Output
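The most basic byte-code runtime is a loop that fetches one instruction at a time and executes it. The sketch below assumes a made-up stack-based instruction set (push, load, store, add, mul) that only loosely resembles the JVM's operand-stack design; these are not real JVM opcodes, and run() is a hypothetical name.

```python
# Hypothetical stack-machine opcodes, loosely in the spirit of the JVM's
# operand stack; these are NOT real JVM instructions.
def run(bytecode: list, env: dict) -> dict:
    stack = []
    pc = 0                                  # index of the next instruction
    while pc < len(bytecode):
        op, *args = bytecode[pc]            # fetch and decode
        if op == "push":                    # push a constant
            stack.append(args[0])
        elif op == "load":                  # push a variable's value
            stack.append(env[args[0]])
        elif op == "store":                 # pop the top value into a variable
            env[args[0]] = stack.pop()
        elif op in ("add", "mul"):          # pop two operands, push the result
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right if op == "add" else left * right)
        else:
            raise ValueError(f"unknown opcode {op!r}")
        pc += 1                             # move on to the next instruction
    return env


# Byte-codes for: speed = speed + 10 * time
code = [
    ("load", "speed"),
    ("push", 10),
    ("load", "time"),
    ("mul",),
    ("add",),
    ("store", "speed"),
]
print(run(code, {"speed": 3.0, "time": 2.0}))
```

A real JVM has far more instruction types, and production JVMs also compile frequently executed byte-codes to machine language rather than interpreting them every time.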
16
What are some consequences of each?
  • Compilers?

Source program → Compiler → Target program
  • Interpreters?

Source program + Input → Interpreter → Output
TT
17
What are some consequences of each?
  • Compilers?
  • Can be as slow as necessary
  • Can spend time optimizing code
  • Sees the program in its entirety

Source program → Compiler → Target program
  • Interpreters?
  • Can be interactive
  • Fastest time from start to execution

Source program + Input → Interpreter → Output
A mixture of both is common: interpret dynamic
statements and for rapid startup, then compile for
better performance later.
18
History of High-Level Languages
  • 1954 IBM 704
  • All programming in assembly
  • Programming costs exceeded hardware costs!

TEXT FEED COMPOSITION n20 TAPE v213
TAPE v240 TAPE n20 v140 TAPE n2 v3 0 n0
4 n1 0 v0 4 43) v215 v0 1 v215 v213 /
v215 v218 v213 - v215 n1 0 42) v(20n1)
v(240n1) x v213 n1 n1 1 -gt 42, n1 0
  • Solution: Speedcoding
  • An interpreted computer language
  • Simple language to express floating point
    calculations

Speedcoding was written for the IBM 701, which had
no floating-point hardware; floating point was
implemented in the Speedcoding interpreter.
19
Enter John Backus
  • Idea: Translate programs to assembly

      READ INPUT TAPE 5, 501, IA, IB, IC
  501 FORMAT (3I5)
C IA, IB, AND IC MAY NOT BE NEGATIVE
      IF (IA) 777, 777, 701
  701 IF (IB) 777, 777, 702
  702 IF (IC) 777, 777, 703
  703 IF (IA+IB-IC) 777,777,704
  704 IF (IA+IC-IB) 777,777,705
  705 IF (IB+IC-IA) 777,777,799
  777 STOP 1
C USING HERON'S FORMULA WE CALCULATE THE
C AREA OF THE TRIANGLE
  799 S = FLOATF (IA + IB + IC) / 2.0
      AREA = SQRT( S * (S - FLOATF(IA)) * (S - FLOATF(IB)) *
     +     (S - FLOATF(IC)))
      WRITE OUTPUT TAPE 6, 601, IA, IB, IC, AREA
  601 FORMAT (4H A= ,I5,5H  B= ,I5,5H  C= ,I5,8H  AREA= ,F10.2,
     +        13H SQUARE UNITS)
      STOP
      END
  • Result: Fortran I
  • 1954-1957
  • By 1958, 50% of all software is in Fortran!

He invented many of the basic techniques we'll
use in this course!
20
Structure of a Compiler
character stream
  → Lexical Analysis → token stream
  → Parsing → syntax tree
  → Semantic Analysis → syntax tree
  → Intermediate Code Generator → intermediate code
  → Optimization → intermediate code
  → Code Generation → target machine code
Front End: lexical analysis through intermediate code
generation. Back End: optimization and code generation.
All phases share the Symbol Table.
These steps are often done in phases or passes. This
structure is very common. Each step will be a set of
algorithms we'll explore.
21
Lexical Analysis
character stream → Lexical Analysis → token stream
Lexical analysis reads the character stream and converts it
into a stream of tokens. A sequence of characters, called a
lexeme, becomes a token. We're recognizing substrings that
are meaningful.
What is meaningful about this?
speed = speed + 10 * time
22
Lexemes for this string
speed = speed + 10 * time
We'll convert each of these into a token of the
form <name, value>. Sometimes the value will be
omitted. speed becomes <id, 1>, where id means
this is an identifier (a symbol) and 1 is its
location in the symbol table. 10 becomes
<constant, 10> (or just <10> in your textbook).
Symbol Table: 1: speed, 2: time
Sort of like recognizing the words in a sentence.
23
Lexemes for this string
speed = speed + 10 * time
  ↓ Lexical Analysis
<id, 1> <=> <id, 1> <+> <10> <*> <id, 2>
Symbol Table: 1: speed, 2: time
The tool lex creates lexical analyzers.
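lex generates a lexical analyzer from a list of regular-expression patterns. The sketch below imitates that idea by hand with Python's re module; it is not lex itself. The token classes (number, id, op) and the function name tokenize are assumptions for this example. Identifiers are entered into a dictionary acting as the symbol table, so speed and time get indexes 1 and 2 as on the slide.

```python
import re

# Token classes as regular expressions (assumed for this example); the
# first alternative that matches at the current position wins.
TOKEN_SPEC = [
    ("number", r"\d+(?:\.\d+)?"),
    ("id", r"[A-Za-z_]\w*"),
    ("op", r"[=+\-*/]"),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text: str, symtab: dict) -> list:
    """Convert a character stream into <name, value> tokens (as tuples).
    Identifiers are entered in symtab and represented by their 1-based index."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        kind, lexeme = m.lastgroup, m.group()
        if kind == "id":
            index = symtab.setdefault(lexeme, len(symtab) + 1)
            tokens.append(("id", index))
        elif kind == "number":
            value = float(lexeme) if "." in lexeme else int(lexeme)
            tokens.append(("constant", value))
        elif kind == "op":
            tokens.append((lexeme,))   # value omitted, like <=> or <+>
        pos = m.end()                  # "skip" (whitespace) produces no token
    return tokens


symtab = {}
print(tokenize("speed = speed + 10 * time", symtab))
print(symtab)
```

Because repeated identifiers reuse their existing symbol-table entry, both occurrences of speed become ("id", 1).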
24
Lexical Analysis
The lexemes and their tokens will be determined
by the language.
sing func count rest prin pick "99
bottles " "no bottles " "1 bottle " count
"bottles " min 4 count 2 print
rest
REBOL
def bottles
  (@bottles.zero? ? "no more" : @bottles).to_s << " bottle" << ("s" unless @bottles == 1).to_s
end
RUBY
Things that become lexemes: punctuation,
symbols, keywords, constants, etc.
TT
25
Syntax Analysis
token stream → Parsing → syntax tree
Parsing converts the token stream into a syntax tree.
In a syntax tree, the nodes are operations and
the children are the arguments to the operation.
What are the operations and arguments here?
<id, 1> <=> <id, 1> <+> <10> <*> <id, 2>
Sort of like diagramming a sentence in English
class.
26
Syntax Trees
<id, 1> <=> <id, 1> <+> <10> <*> <id, 2>
Here's an operation for sure:
<=>
  <id, 1>
  <id, 1> <+> <10> <*> <id, 2>
27
A complete syntax tree
<id, 1> <=> <id, 1> <+> <10> <*> <id, 2>
  ↓ Parsing
<=>
  <id, 1>
  <+>
    <id, 1>
    <*>
      <10>
      <id, 2>
Symbol Table: 1: speed, 2: time
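One common way to build this tree is a recursive-descent parser, with one method per grammar rule. The grammar in the comments is an assumption that covers only this example (assignment, +, and *); giving * its own rule below + is what makes 10 * time group together before the addition. Tokens are assumed to be tuples such as ("id", 1) and ("+",), and the Parser class is hypothetical.

```python
# Grammar (an assumption for this sketch):
#   assign -> id "=" expr
#   expr   -> term { "+" term }
#   term   -> factor { "*" factor }
#   factor -> id | constant
class Parser:
    def __init__(self, tokens: list):
        self.tokens = tokens
        self.pos = 0                       # index of the next unread token

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

    def expect(self, name: str):
        tok = self.advance()
        if tok[0] != name:
            raise SyntaxError(f"expected {name!r}, got {tok!r}")
        return tok

    def assign(self):
        target = self.expect("id")
        self.expect("=")
        tree = ("=", target, self.expr())
        if self.peek() is not None:
            raise SyntaxError(f"extra input at {self.peek()!r}")
        return tree

    def expr(self):
        node = self.term()
        while self.peek() == ("+",):
            self.advance()
            node = ("+", node, self.term())    # left-associative
        return node

    def term(self):
        node = self.factor()
        while self.peek() == ("*",):
            self.advance()
            node = ("*", node, self.factor())
        return node

    def factor(self):
        tok = self.advance()
        if tok[0] in ("id", "constant"):
            return tok                         # leaves are the tokens themselves
        raise SyntaxError(f"unexpected token {tok!r}")


tokens = [("id", 1), ("=",), ("id", 1), ("+",), ("constant", 10), ("*",), ("id", 2)]
tree = Parser(tokens).assign()
print(tree)
```

Each interior tuple is an operation node whose remaining elements are its children, matching the tree drawn on the slide.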
28
What about this code?
" bottle" << ("s" unless @bottles == 1).to_s
<bottle> <insertion> <(> <s> <unless> <@>
<id, 1> <==> <1> <)> <.> <id, 2>
TT
29
Semantic Analysis
syntax tree → Semantic Analysis → syntax tree
  • Semantics are the meaning of the programming
    language.
  • Now we're going to analyze our syntax tree to see
    if it is, or can be converted into, a tree that is
    semantically meaningful.
  • Common checks
  • Valid arguments
  • Type checking

<=>
  <id, 1>
  <+>
    <id, 1>
    <*>
      <10>
      <id, 2>
Symbol Table: 1: speed, 2: time
How were the types determined? Do we have any
type issues here?
30
Silly English analogies for semantic analysis
  • "Jack said Bob is an idiot." Who does "idiot" refer
    to?
  • "The rain in Spain stays mainly in the plain."
    Where does it rain? Where is that soggy plain?
  • "Jack left her homework at home." This is a
    type mismatch (Jack's a guy).
31
Type Coercion
We modify the syntax tree to fix semantic issues
that are fixable. What if they are not fixable?
What's an example of something not fixable?
<=>
  <id, 1>
  <+>
    <id, 1>
    <*>
      <inttofloat>   (coercion)
        <10>
      <id, 2>
Symbol Table: 1: speed, 2: time
How were the types determined? Do we have any
type issues here?
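A type checker can insert this coercion automatically while walking the tree bottom-up. This sketch assumes the symbol table records that speed and time are floats (the slides don't show declarations) and that int and float are the only types; check() and SYMTAB are hypothetical names. It treats assigning a float to an int variable as an unfixable error, which is only one possible answer to the question above; real languages differ here.

```python
# Assumption: the symbol table records declared types, and both speed (1)
# and time (2) are floating-point variables.
SYMTAB = {1: ("speed", "float"), 2: ("time", "float")}


def check(node):
    """Return (tree, type), inserting ("inttofloat", ...) coercion nodes."""
    kind = node[0]
    if kind == "id":
        return node, SYMTAB[node[1]][1]          # type comes from the symbol table
    if kind == "constant":
        return node, "int" if isinstance(node[1], int) else "float"
    op, left, right = node                       # "=", "+", or "*"
    left, ltype = check(left)
    right, rtype = check(right)
    if ltype == rtype:
        return (op, left, right), ltype
    if op == "=":
        if ltype == "float":                     # int value into float variable
            return (op, left, ("inttofloat", right)), "float"
        raise TypeError("cannot assign float to int")  # treated as not fixable
    # Mixed int/float arithmetic: widen whichever side is the int.
    if ltype == "int":
        left = ("inttofloat", left)
    else:
        right = ("inttofloat", right)
    return (op, left, right), "float"


tree = ("=", ("id", 1), ("+", ("id", 1), ("*", ("constant", 10), ("id", 2))))
coerced, result_type = check(tree)
print(coerced)
```

Only the <10> leaf gets wrapped, because it is the one int operand in an otherwise float expression.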
32
Semantic Analysis
<=>
  <id, 1>
  <+>
    <id, 1>
    <*>
      <10>
      <id, 2>
  ↓ Semantic Analysis
<=>
  <id, 1>
  <+>
    <id, 1>
    <*>
      <inttofloat>
        <10>
      <id, 2>
TT
33
Intermediate Code Generator
syntax tree → Intermediate Code Generator → intermediate code
Most compilers convert the syntax tree into some
intermediate code. This is then subject to
optimization and conversion to the final machine
code. Why an intermediate code?
34
Intermediate Code Generator
syntax tree → Intermediate Code Generator → intermediate code
Most compilers convert the syntax tree into some
intermediate code. This is then subject to
optimization and conversion to the final machine
code. Why an intermediate code?
  • Intermediate code is usually more general and
    easier to optimize.
  • Many compilers have the same back end for
    multiple front ends.

gcc compiles both C and C++ to the same
intermediate code, then uses a common back end
for both.
35
Intermediate code example
<=>
  <id, 1>
  <+>
    <id, 1>
    <*>
      <inttofloat>
        <10>
      <id, 2>
t1 = inttofloat(10)
t2 = t1 * id2
t3 = id1 + t2
id1 = t3
Each operation became a line of intermediate
code. The t values are temporary variables.
The textbook refers to this as three-address
code. Each operation has up to 3 operands (some
have fewer). Can you see the three operands in
each of these statements?
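Generating three-address code is a post-order walk of the tree: generate code for the children first, then emit one instruction for the node into a fresh temporary. A minimal sketch, assuming the tree is stored as nested tuples such as ("*", ("inttofloat", ("constant", 10)), ("id", 2)) and using the hypothetical function name gen_tac; on this example it produces the same four instructions as the slide.

```python
import itertools


def gen_tac(tree) -> list:
    """Emit three-address code for ("=", ("id", n), expr) using a post-order
    walk: children first, then one instruction per interior node."""
    code = []
    temps = itertools.count(1)                 # supplies t1, t2, t3, ...

    def addr(node) -> str:
        """Return the address (name or constant) that holds node's value."""
        kind = node[0]
        if kind == "id":
            return f"id{node[1]}"
        if kind == "constant":
            return str(node[1])
        if kind == "inttofloat":
            arg = addr(node[1])
            temp = f"t{next(temps)}"
            code.append(f"{temp} = inttofloat({arg})")
            return temp
        op, left, right = node
        a, b = addr(left), addr(right)         # evaluate operands first
        temp = f"t{next(temps)}"
        code.append(f"{temp} = {a} {op} {b}")
        return temp

    _, target, value = tree
    result = addr(value)
    code.append(f"{addr(target)} = {result}")
    return code


tree = ("=", ("id", 1),
        ("+", ("id", 1), ("*", ("inttofloat", ("constant", 10)), ("id", 2))))
for line in gen_tac(tree):
    print(line)
```

Allocating the temporary only after both children are done is what numbers the temporaries in execution order.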
36
Intermediate code example
<=>
  <id, 1>
  <+>
    <id, 1>
    <*>
      <inttofloat>
        <10>
      <id, 2>
t1 = inttofloat(10)
t2 = t1 * id2
t3 = id1 + t2
id1 = t3
t2 = t1 * id2: Operands are t2, t1, id2. Think of
this like an assembly instruction: mult t1, id2, t2.
t1 = inttofloat(10): Operands are t1, 10.
This is designed as an easy-to-understand
assembly language.
TT
37
Optimization
intermediate code → Optimization → intermediate code
t1 = inttofloat(10)
t2 = t1 * id2
t3 = id1 + t2
id1 = t3
Optimization: Making the code more
efficient. Any optimization ideas here?
38
Optimization
t1 = inttofloat(10)
t2 = t1 * id2
t3 = id1 + t2
id1 = t3
  ↓ Optimization
t2 = 10.0 * id2
id1 = id1 + t2
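The two improvements on this slide can be sketched as simple passes over the instruction list: fold inttofloat(10) into the constant 10.0 at compile time and substitute it where t1 was used, and merge t3 = id1 + t2; id1 = t3 into one instruction. This assumes a hypothetical tuple encoding (dest, op, arg1, arg2) and that each temporary is used only once, right after it is defined. That holds here but not in general; a real optimizer would check uses with data-flow analysis.

```python
# Instructions are (dest, op, arg1, arg2) tuples (a hypothetical encoding);
# op "copy" means dest = arg1. Temporaries are named t1, t2, ... and, as an
# assumption of this sketch, each is used exactly once, right after it is defined.
def optimize(code: list) -> list:
    constants = {}                  # temporaries known to hold a constant
    out = []
    for dest, op, a, b in code:
        a = constants.get(a, a)     # constant propagation: use 10.0, not t1
        b = constants.get(b, b)
        if op == "inttofloat" and a.isdigit():
            constants[dest] = str(float(a))      # fold inttofloat(10) -> "10.0"
            continue                             # no instruction needed
        if op == "copy" and a.startswith("t") and out and out[-1][0] == a:
            _, prev_op, pa, pb = out.pop()       # t3 = id1 + t2; id1 = t3
            out.append((dest, prev_op, pa, pb))  # becomes id1 = id1 + t2
            continue
        out.append((dest, op, a, b))
    return out


tac = [
    ("t1", "inttofloat", "10", None),
    ("t2", "*", "t1", "id2"),
    ("t3", "+", "id1", "t2"),
    ("id1", "copy", "t3", None),
]
print(optimize(tac))
```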
39
Code Generation
intermediate code → Code Generation → target machine code
Translate the intermediate code into target code.
t2 = 10.0 * id2
id1 = id1 + t2
  ↓ Code Generation
LDF R2, id2
MULF R2, R2, #10.0
LDF R1, id1
ADDF R1, R1, R2
STF id1, R1
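A naive code generator can map each three-address instruction onto the slide's pseudo-instructions (LDF, MULF, ADDF, STF): load variables into registers, write constants as # immediates, keep temporaries in registers, and store named variables back to memory. This sketch assumes a hypothetical (dest, op, arg1, arg2) tuple encoding and temporaries named t1, t2, ...; codegen() is a made-up name. It numbers registers in the order it needs them, so it uses R1 and R2 the other way round from the slide while emitting the same instruction sequence.

```python
OPCODES = {"+": "ADDF", "*": "MULF"}     # float add / multiply, as on the slide


def codegen(code: list) -> list:
    """Translate (dest, op, arg1, arg2) instructions into pseudo-assembly."""
    asm = []
    regs = {}                            # temporary name -> register holding it
    count = 0

    def operand(x: str) -> str:
        nonlocal count
        if x[0].isdigit():               # numeric constant -> immediate operand
            return f"#{x}"
        if x in regs:                    # temporary already in a register
            return regs[x]
        count += 1                       # variable in memory: load it first
        reg = f"R{count}"
        asm.append(f"LDF {reg}, {x}")
        return reg

    for dest, op, a, b in code:
        ra, rb = operand(a), operand(b)
        if ra.startswith("#"):           # + and * commute: put the register first
            ra, rb = rb, ra
        asm.append(f"{OPCODES[op]} {ra}, {ra}, {rb}")   # result overwrites ra
        if dest.startswith("t"):
            regs[dest] = ra              # keep temporaries in registers
        else:
            asm.append(f"STF {dest}, {ra}")              # store variables to memory
    return asm


optimized = [("t2", "*", "10.0", "id2"), ("id1", "+", "id1", "t2")]
print("\n".join(codegen(optimized)))
```

Overwriting the first source register is safe here only because each temporary is used once; a real code generator needs register allocation that tracks which values are still live.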
40
Other issues
Symbol tables are heavily used. You need very
efficient data structures. Any ideas? In what ways
might we access the symbol table? Optimization
is a major area and may be done after final code
generation as well. Compilers are large, complex
pieces of software and a major task for software
engineers.
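One common answer to "Any ideas?" is a hash table, with a stack of tables to handle nested scopes: declarations go into the innermost scope, and lookups search outward. A minimal sketch using Python dicts (which are hash tables); the SymbolTable class and its method names are hypothetical.

```python
class SymbolTable:
    """Scoped symbol table: a stack of hash tables (Python dicts).
    Lookups search from the innermost scope outward."""

    def __init__(self):
        self.scopes = [{}]      # start with the global scope
        self.next_index = 1     # index used as the token value, e.g. <id, 1>

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        if len(self.scopes) == 1:
            raise RuntimeError("cannot exit the global scope")
        self.scopes.pop()

    def insert(self, name: str, type_: str) -> int:
        scope = self.scopes[-1]
        if name in scope:
            raise KeyError(f"{name!r} already declared in this scope")
        scope[name] = {"index": self.next_index, "type": type_}
        self.next_index += 1
        return scope[name]["index"]

    def lookup(self, name: str):
        for scope in reversed(self.scopes):   # innermost scope first
            if name in scope:
                return scope[name]
        return None                           # undeclared


st = SymbolTable()
st.insert("speed", "float")
st.insert("time", "float")
st.enter_scope()
st.insert("time", "int")        # shadows the outer time
print(st.lookup("time"))
st.exit_scope()
print(st.lookup("time"))
```

Each dict probe takes expected constant time, so a lookup costs roughly one probe per enclosing scope.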