1 Course Notes for CS1621: Structure of Programming Languages, Part A
By John C. Ramirez
Department of Computer Science, University of Pittsburgh
2 - These notes are intended for use by students in CS1621 at the University of Pittsburgh and no one else
- These notes are provided free of charge and may not be sold in any shape or form
- Material from these notes is obtained from various sources, including, but not limited to, the textbooks
- Concepts of Programming Languages, Seventh Edition, by Robert W. Sebesta (Addison Wesley)
- Programming Languages: Design and Implementation, Fourth Edition, by Terrence W. Pratt and Marvin V. Zelkowitz (Prentice Hall)
- Compilers: Principles, Techniques, and Tools, by Aho, Sethi and Ullman (Addison Wesley)
3 Goals of Course
- To survey the various programming languages, their purposes and their histories
- Why do we have so many languages?
- How did these languages develop?
- Are some languages better than others for some things?
- To examine methods for describing language syntax and semantics
4 Goals of Course
- Syntax indicates structure of program code
- How can language designer specify this?
- How can programmer learn language?
- How can compiler recognize this?
- Lexical analysis
- Parsing (syntax analysis)
- Brief discussion of parsing techniques
- Semantics indicate meaning of the code
- What code will actually do
- Can we effectively do this in a formal way?
- Static semantics
- Dynamic semantics
5 Goals of Course
- To examine some language features and constructs and how they are used and implemented in various languages
- Variables and constants
- Types, binding and type checking
- Scope and lifetime
- Data Types
- Primitive types
- Array types
- Structured data types
6 Goals of Course
- Pointer (reference) types
- Assignment statements and expressions
- Operators, precedence and associativity
- Type coercions and conversions
- Boolean expressions and short-circuit evaluation
- Control statements
- Selection
- Iteration
- Unconditional branching (goto)
7 Goals of Course
- Process abstraction: procedures and functions
- Parameters and parameter-passing
- Generic subprograms
- Nonlocal environments and side-effects
- Implementing subprograms
- Subprograms in static-scoped languages
- Subprograms in dynamic-scoped languages
8 Goals of Course
- Data abstraction and abstract data types
- Object-oriented programming
- Design issues
- Implementations in various object-oriented languages
- Concurrency
- Concurrency issues
- Subprogram level concurrency
- Implementations in various languages
- Statement level concurrency
9 Goals of Course
- Exception handling
- Issues and implementations
- IF TIME PERMITS
- Functional programming languages
- Logic programming languages
10 Language Development Issues
- Why do we have high-level programming languages?
- Machine code is too difficult for us to read, understand and debug
- Machine code is not portable between architectures
- Why do we have many high-level programming languages?
- Different people and companies developed them
11 Language Development Issues
- Different languages are either designed to or happen to meet different programming needs
- Scientific applications
- FORTRAN
- Business applications
- COBOL
- AI
- LISP, Scheme (Prolog)
- Systems programming
- C
- Web programming
- Perl, PHP, JavaScript
- General purpose
- C++, Ada, Java
12 Language Development Issues
- Programming language qualities and evaluation criteria
- Readability
- How much can a non-author understand the logic of code just by reading it?
- Is the code clear and unambiguous to the reader?
- These are often subjective, but sometimes it is fairly obvious
- Examples of features that help readability
- Comments
- Long identifier names
- Named constants
13 Language Development Issues
- Clearly understood control statements
- Language orthogonality
- Simple features combine in a consistent way
- But it can go too far, as explained in the text about Algol 68
- Writability
- Not dissimilar to readability
- How easy is it for a programmer to use the language effectively?
- Can depend on the domain in which it is being used
- Ex: LISP is very writable for AI applications but would not be so good for systems programming
- Also somewhat subjective
14 Language Development Issues
- Examples of features that help writability
- Clearly understood control statements
- Subprograms
- Also orthogonality
- Reliability
- Two different ideas of reliability
- Programs are less susceptible to logic errors
- Ex: Assignment vs. comparison in C
- See assign.cpp and assign.java
- Programs have the ability to recover from exceptional situations
- Exception handling; we will discuss this more later
15 Language Design Issues
- Many factors influence language design
- Architecture
- Most languages were designed for single-processor von Neumann type computers
- CPU to execute instructions
- Data and instructions stored in main memory
- General language approach
- Imperative languages
- Fit well with von Neumann computers
- Focus is on variables, assignment, selection and iteration
- Examples: FORTRAN, Pascal, C, Ada, C++, Java
16 Language Design Issues
- Imperative language evolution
- Simple straight-line code
- Top-down design and process abstraction
- Data abstraction and ADTs
- Object-oriented programming
- Some consider object-oriented languages not to be imperative, but most modern OO languages have imperative roots (ex. C++, Java)
- Functional languages
- Focus is on function and procedure calls
- Mimics mathematical functions
- Less emphasis on variables and assignment
- In strictest form has no iteration at all; recursion is used instead
- Examples: LISP, Scheme
- See example
17 Language Design Issues
- Logic programming languages
- Symbolic logic used to express propositions, rules and inferences
- Programs are in a sense theorems
- User enters a proposition and the system uses the programmer's rules and propositions in an attempt to prove it
- Typical outputs
- Yes: the proposition can be established by the program
- No: the proposition cannot be established by the program
- Example: Prolog (see example program)
- Cost
- What are the overall costs associated with a given language?
- How does the design affect that cost?
18 Language Design Issues
- Training programmers
- How easy is it to learn?
- Writing programs
- Is the language a good fit for the task?
- Compiling programs
- How long does it take to compile programs?
- This is not as important now as it once was
- Executing programs
- How long does the program take to run?
- Often there is a trade-off here
- Ex: Java is slower than C++ but it has many run-time features (array bounds checking, security manager) that are lacking in C++
19 Language Implementation Issues
- How is HLL code processed and executed on the computer?
- Compilation
- Source code is converted by the compiler into binary code that is directly executable by the computer
- The compilation process can be broken into 4 separate steps
- Lexical Analysis
- Breaks up code into lexical units, or tokens
- Examples of tokens: reserved words, identifiers, punctuation
- Feeds the tokens into the syntax analyzer
20 Language Implementation Issues
- Syntax Analysis
- Tokens are parsed and examined for correct syntactic structure, based on the rules for the language
- Programmer syntax errors are detected in this phase
- Semantic Analysis / Intermediate Code Generation
- Declaration and type errors are checked here
- The intermediate code generated is similar to assembly code
- Optimizations can be done here as well, for example
- Unnecessary statements eliminated
- Statements moved out of loops if possible
- Recursion removed if possible
21 Language Implementation Issues
- Code Generation
- Intermediate code is converted into executable code
- Code is also linked with libraries if necessary
- Note that steps 1) and 2) are independent of the architecture and depend only upon the language (the Front End)
- Step 3) is somewhat dependent upon the architecture, since, for example, optimizations will depend upon the machine used
- Step 4) is clearly dependent upon the architecture (the Back End)
22 Language Implementation Issues
- Interpreting
- Program is executed in software, by an interpreter
- Source-level instructions are executed by a virtual machine
- Allows for robust run-time error checking and debugging
- Penalty is speed of execution
- Examples: some LISP implementations, Unix shell scripts and Web server scripts
23 Language Implementation Issues
- Hybrid
- First 3 phases of compilation are done, and intermediate code is generated
- Intermediate code is interpreted
- Faster than pure interpretation, since the intermediate codes are simpler and easier to interpret than the source code
- Still much slower than compilation
- Examples: Java and Perl
- However, Java now uses JIT compilation also
- Method code is compiled as it is called, so if it is called again it will be faster
24 Brief, Incomplete PL History
- Early 50s
- Early HLLs started to emerge
- FORTRAN
- Stands for FORmula TRANslating system
- Developed by a team led by John Backus at IBM for the IBM 704 machine
- Successful in part because of support by IBM
- Designed for scientific applications
- The root of the imperative language tree
25 Brief PL History
- Lacked many features that we now take for granted in programming languages
- Conditional loops
- Statement blocks
- Recursive abilities
- Many of these features were added in future versions of FORTRAN
- FORTRAN II, FORTRAN IV, FORTRAN 77, FORTRAN 90
- Had some interesting features that are now obsolete
- COMMON, EQUIVALENCE, GOTO
- We may discuss what these are later
26 Brief PL History
- Late 50s
- COBOL
- COmmon Business Oriented Language
- Developed by US DoD
- Separated data and procedure divisions
- But didn't allow functions or parameters
- Still widely used, due in part to the large cost of rewriting software from scratch
- Big companies would rather maintain COBOL programs than rewrite them in a different language
27 Brief PL History
- LISP
- LISt Processing
- Developed by John McCarthy of MIT
- Functional language
- Good for symbolic manipulation, list processing
- Had recursion and conditional expressions
- Not in original FORTRAN
- At one time used extensively for AI
- Today the most widely used version, COMMON LISP, has included some imperative features
28 Brief PL History
- ALGOL
- ALGOL 58 and then ALGOL 60, both designed by an international committee
- Goals for the language
- Syntax should be similar to mathematical notation and readable
- Should be usable for algorithms in publications
- Should be compilable into machine code
- Included some interesting features
- Pass by value and pass by name (wacky!) parameters
- Recursion (first in an imperative language)
- Dynamic arrays
- Block structure and local variables
29 Brief PL History
- Introduced Backus-Naur Form (BNF) as a way to describe the language syntax
- Still commonly used today, but not well-accepted at the time
- Never widely used, but influenced virtually all imperative languages after it
30 Brief PL History
- Late 60s
- Simula 67
- Designed for simulation applications
- Introduced some interesting features
- Classes for data abstraction
- Coroutines for re-entrant subprograms
- ALGOL 68
- Emphasized orthogonality and user-defined data types
- Not widely used
31 Brief PL History
- 70s
- Pascal
- Developed by Niklaus Wirth
- No major innovations, but due to its simplicity and emphasis on good programming style, became widely used for teaching
- C
- Developed by Dennis Ritchie to help implement the Unix operating system
- Has a great deal of flexibility, esp. with types
- Incomplete type checking
32 Brief PL History
- Void pointers
- Coerces many types
- Many programmers (esp. systems programmers) love it
- Language purists hate it
- Easy to miss logic errors
- Prolog
- Logic programming
- We discussed it a bit already
- Still used somewhat, mostly in AI
- May discuss in more detail later
33 Brief PL History
- 80s
- Ada
- Developed over a number of years by DoD
- Goal was to have one language for all DoD applications
- Especially for embedded systems
- Contains some important features
- Data encapsulation with packages
- Generic packages and subprograms
- Exception handling
- Tasks for concurrent execution
- We will discuss some of these later
34 Brief PL History
- Very large language; difficult to program reliably, even though reliability was one of its goals!
- Early compilers were slow and error-prone
- Did not have the widespread general use that was hoped
- Eventually the government stopped requiring it for DoD applications
- Use faded after this
- Not used widely anymore
- Ada 95 added object-oriented features
- Still wasn't used much, especially with the advent of Java and other OO languages
35 Brief PL History
- Smalltalk
- Designed and developed by Alan Kay
- Concepts developed in the 60s, but the language did not come to fruition until 1980
- Designed to be used on a desktop computer, 15 years before desktop computers existed
- First true object-oriented language
- Language syntax is geared toward objects
- Messages passed between objects
- Methods are invoked as responses to messages
- Always dynamically bound
- All classes are subclasses of Object
- Also included a software development environment
- Had a large impact on future OOLs, esp. Java
36 Brief PL History
- C++
- Developed largely by Bjarne Stroustrup as an extension to C
- Backward compatible
- Added object-oriented features and some additional typing features to improve C
- Very powerful and very flexible language
- But still has reliability problems
- Ex: no array bounds checking
- Ex: dynamic memory allocation
- Widely used and likely to be used for a while longer
37 Brief PL History
- Perl
- Developed by Larry Wall
- Takes features from C as well as the scripting languages awk, sed and sh
- Some features
- Regular expression handling
- Associative arrays
- Implicit data typing
- Originally used for data extraction and report generation
- Evolved into the archetypal Web scripting language
- Has many proponents and detractors
38 Brief PL History
- 90s
- Java
- Interestingly enough, just like Ada, Java was originally developed to be used in embedded systems
- Developed at Sun by a team headed by James Gosling
- Syntax borrows heavily from C++
- But many features (flaws?) of C++ have been eliminated
- No explicit pointers or pointer arithmetic
- Array bounds checking
- Garbage collection to reclaim dynamic memory
39 Brief PL History
- The object model of Java actually more resembles that of Smalltalk than that of C++
- All variables are references
- Class hierarchy begins with Object
- Dynamic binding of method names to operations by default
- But not as pure in its OO features as Smalltalk, due to its imperative control structures
- Interpreted for portability and security
- Also JIT compilation now
- Growing in popularity, largely due to its use on Web pages
40 Brief PL History
- 00's (aughts? oughts? naughts?)
- See http://www.randomhouse.com/wotd/index.pperl?date=19990803
- C#
- Main roots in C++ and Java, with some other influences as well
- Used with the MS .NET programming environment
- Some improvements and some deprovements compared to Java
- Likely to succeed given MS support
41 Program Syntax
- Recall the job of the syntax analyzer
- Groups (parses) tokens (fed in from the lexical analyzer) into meaningful phrases
- Determines if the syntactic structure of the token stream is legal based on the rules of the language
- Let's look at this in more detail
- How does the compiler know what is legal and what is not?
- How does it detect errors?
42 Program Syntax
- To answer these questions we must look at programming language syntax in a more formal way
- Language
- A set of strings of lexemes from some alphabet
- Lexemes are the lowest-level syntactic elements
- Lexemes are made up of characters, as defined by the character set for the language
43 Program Syntax
- Lexemes are categorized into different tokens and processed by the lexical analyzer
- Ex:
- Lexemes: if, (, width, <, height, ), {, cout, <<, width, <<, endl, ;, }
- Tokens: iftok, lpar, idtok, lt, idtok, rpar, lbrace, idtok, llt, idtok, llt, idtok, semi, rbrace
- Note that some tokens correspond to single lexemes (ex. iftok) whereas some correspond to many (ex. idtok)

if (width < height) { cout << width << endl; }
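The lexeme-to-token mapping above is exactly what a table-driven scanner produces. Below is a minimal sketch of such a scanner (the token names follow the slide, but the regular-expression table and function are my own illustration, not the course's lexer):

```python
import re

# one regex alternative per token class; order matters ("if" is tried
# before general identifiers, "<<" before "<"), mirroring the slide
SPEC = [
    ("iftok",  r"if\b"),
    ("idtok",  r"[A-Za-z_]\w*"),
    ("llt",    r"<<"),
    ("lt",     r"<"),
    ("lpar",   r"\("),
    ("rpar",   r"\)"),
    ("lbrace", r"\{"),
    ("rbrace", r"\}"),
    ("semi",   r";"),
    ("skip",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in SPEC))

def tokenize(code):
    """Return (token, lexeme) pairs, discarding whitespace."""
    return [(m.lastgroup, m.group()) for m in MASTER.finditer(code)
            if m.lastgroup != "skip"]
```

Running it on the statement above yields the token stream from the slide: iftok, lpar, idtok, lt, idtok, rpar, lbrace, idtok, llt, idtok, llt, idtok, semi, rbrace.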
44 Program Syntax
- How do we formally define a language?
- Assume we have a language, L, defined over an alphabet, Σ
- 2 related techniques
- Recognition
- An algorithm or mechanism, R, will process any given string, S, of lexemes and correctly determine if S is within L or not
- Not used for enumeration of all strings in L
- Used by the parser portion of the compiler
45 Program Syntax
- Generation
- Produces valid sentences of L
- Not as useful as recognition for compilation, since the valid sentences could be arbitrary
- More useful in understanding language syntax, since it shows how the sentences are formed
- A recognizer only says whether a sentence is valid or not; using one to explore the language is more of a trial-and-error technique
46 Program Syntax
- So recognizers are what compilers need, but generators are what programmers need to understand the language
- Luckily there are systematic ways to create recognizers from generators
- Thus the programmer reads the generator to understand the language, and a recognizer is created from the generator for the compiler
47 Language Generators
- Grammar
- A mechanism (or set of rules) by which a language is generated
- Defined by the following
- A set of non-terminal symbols, N
- Do not actually appear in strings
- A set of terminal symbols, T
- Appear in strings
- A set of productions, P
- Rules used in string generation
- A starting symbol, S
48 Language Generators
- Noam Chomsky described four classes of grammars (used to generate four classes of languages): the Chomsky Hierarchy
- Unrestricted
- Context-sensitive
- Context-free
- Regular
- More info on unrestricted and context-sensitive grammars in a theory course
- The last two will be useful to us
49 Language Generators
- Regular Grammars
- Productions must be of the form
- <non> → <ter><non> | <ter>
- where <non> is a nonterminal, <ter> is a terminal, and | represents "or"
- Can be modeled by a Finite-State Automaton (FSA)
- Also equivalent to Regular Expressions
- Provide a model for building lexical analyzers
50 Language Generators
- Have the following properties (among others)
- Can generate strings of the form α^n, where α is a finite sequence and n is an integer
- Pattern recognition
- Can count to a finite number
- Ex: a^n, n ≤ 85
- But we need at least 86 states to do this
- Cannot count to an arbitrary number
- Note that a^n for any n (i.e. 0 or more occurrences) is easy; we do not have to count
- Important to realize that the number of states is finite; we cannot recognize patterns with an arbitrary number of possibilities
51 Language Generators
- Example: Regular grammar to recognize Pascal identifiers (assume no caps)
- N = {Id, X}  T = {a..z, 0..9}  S = Id
- P:
- Id → aX | bX | ... | zX | a | b | ... | z
- X → aX | ... | zX | 0X | ... | 9X | a | ... | z | 0 | ... | 9
- Consider the equivalent FSA
- [FSA diagram omitted: start state Id with transitions on a..z to an accepting state, which loops on a..z and 0..9]
52 Language Generators
- Example: Regular grammar to generate a binary string containing an odd number of 1s
- N = {A, B}  T = {0, 1}  S = A  P:
- A → 0A | 1B | 1
- B → 0B | 1A | 0
- Example: Regular grammars CANNOT generate strings of the form a^n b^n
- The grammar needs some way to count the number of a's and b's to make sure they are the same
- Any regular grammar (or FSA) has a finite number, say k, of different states
- If n > k, not possible
53 Language Generators
- If we could add a memory of some sort we could get this to work
- Context-free Grammars
- Can be modeled by a Push-Down Automaton (PDA)
- FSA with an added push-down stack
- Productions are of the form
- <non> → α, where <non> is a nonterminal and α is any sequence of terminals and nonterminals
- Note the RHS is more flexible now
54 Language Generators
- So how to generate a^n b^n? Let a = 0, b = 1
- N = {A}  T = {0, 1}  S = A  P:
- A → 0A1 | 01
- Note that now we can have a terminal after the nonterminal as well as before
- Can also have multiple nonterminals in a single production
- Example: Grammar to generate sets of balanced parentheses
- N = {A}  T = {(, )}  S = A  P:
- A → AA | (A) | ()
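The balanced-parentheses language needs exactly the "memory" a PDA adds. Since only one kind of stack symbol is ever pushed here, a counter can stand in for the stack in this recognizer sketch (mine, not the course's):

```python
def balanced(s):
    """Recognize the language of A -> AA | (A) | ().
    depth plays the role of the PDA's push-down stack: '(' pushes,
    ')' pops; only one stack symbol is needed, so a counter suffices."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:              # a pop with an empty stack
                return False
        else:
            return False               # only parentheses are in the alphabet
    return depth == 0 and len(s) > 0   # the grammar generates non-empty strings
```

An FSA cannot do this, because the nesting depth (the counter) can grow without bound.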
55 Language Generators
- Context-free grammars are also equivalent to BNF grammars
- Developed by Backus and modified by Naur
- Used initially to describe Algol 60
- Given a (BNF) grammar, we can derive any string in the language from the start symbol and the productions
- A common way to derive strings is using a leftmost derivation
- Always replace the leftmost nonterminal first
- Complete when no nonterminals remain
56 Language Generators
- Example: Leftmost derivation of nested parens (()(()))
- A ⇒ (A)
-   ⇒ (AA)
-   ⇒ (()A)
-   ⇒ (()(A))
-   ⇒ (()(()))
- We can view this derivation as a tree, called a parse tree for the string
57 Language Generators
- [Parse tree for (()(())) omitted: root A → ( A ), inner A → A A, first child A → ( ), second child A → ( A ) with its A → ( )]
58 Language Generators
- If, for a given grammar, a string can be derived by two or more different parse trees, the grammar is ambiguous
- Some languages are inherently ambiguous
- All grammars that generate such a language are ambiguous
- Many other languages are not themselves ambiguous, but can be generated by ambiguous grammars
- It is generally better for use with compilers if a grammar is unambiguous
- Semantics are often based on syntactic form
59 Language Generators
- Ambiguous grammar example: Generate strings of the form 0^n 1^m, where n, m ≥ 1
- N = {A, B, C}  T = {0, 1}  S = A  P:
- A → BC | 0A1
- B → 0B | 0
- C → 1C | 1
- Consider the string 00011
- [Two parse trees omitted: one derives 00011 via A → BC directly; the other via A → 0A1 with the inner A → BC]
60 Language Generators
- We can easily make this grammar unambiguous
- Remove the production A → 0A1
- Note that nonterminal B can generate an arbitrary number of 0s and nonterminal C can generate an arbitrary number of 1s
- Now only one parse tree
- [Parse tree omitted: A → BC, with B deriving 000 and C deriving 11]
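The ambiguity claim can be checked by brute force for toy grammars. The sketch below (my own, not part of the notes; it assumes no ε-productions, so sentential forms never shrink) counts distinct leftmost derivations, which correspond one-to-one with parse trees, by expanding the leftmost nonterminal every possible way and pruning forms whose terminal prefix no longer matches the target:

```python
def count_parse_trees(grammar, start, target):
    """Count leftmost derivations of target (one per parse tree).
    grammar maps a nonterminal to its list of right-hand sides (strings);
    any symbol not in grammar is a terminal. No ε-productions allowed,
    which makes pruning by length safe."""
    def count(form):
        # locate the leftmost nonterminal
        for i, sym in enumerate(form):
            if sym in grammar:
                break
        else:                      # no nonterminal left: a complete string
            return 1 if form == target else 0
        # prune: terminal prefix must match and the form must not be too long
        if form[:i] != target[:i] or len(form) > len(target):
            return 0
        return sum(count(form[:i] + rhs + form[i + 1:])
                   for rhs in grammar[sym])
    return count(start)

ambiguous = {"A": ["BC", "0A1"], "B": ["0B", "0"], "C": ["1C", "1"]}
unambiguous = {"A": ["BC"], "B": ["0B", "0"], "C": ["1C", "1"]}
```

For the string 00011 the first grammar admits two parse trees and the second exactly one, matching the slides.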
61 Language Generators
- Let's look at a few more examples
- Grammar to generate WW^R, W ∈ {0,1}*
- N = {A}  T = {0, 1}  S = A  P:
- A → 0A0 | 1A1 | 00 | 11
- Grammar to generate strings in {0,1}* of the form WX such that |W| = |X| but W ≠ X
- This one is a little trickier
- How do we approach this problem?
- We need to guarantee two things
- Overall string length is even
- At least one bit differs in the two halves
62 Language Generators
- See board
- OK, now how do we make a grammar to do this?
- Make every (even-length) string the result of two odd-length strings appended to each other
- Assume the odd-length strings are Ol and Or
- Make sure that either
- Ol has a 1 in the middle and Or has a 0 in the middle, or
- Ol has a 0 in the middle and Or has a 1 in the middle
- Productions:
- S → AB | BA
- A → 0A0 | 1A1 | 1A0 | 0A1 | 1
- B → 0B0 | 1B1 | 1B0 | 0B1 | 0
63 Language Generators
- Let's look at an example more relevant to programming languages
- Grammar to generate simple assignment statements in a C-like language (diff. from the one in the text)
- <assig stmt> → <var> = <arith expr>
- <arith expr> → <term> | <arith expr> + <term> | <arith expr> - <term>
- <term> → <primary> | <term> * <primary> | <term> / <primary>
- <primary> → <var> | <num> | ( <arith expr> )
- <var> → <id> | <id> [ <subscript list> ]
- <subscript list> → <arith expr> | <subscript list> , <arith expr>
64 Language Generators
- Parse tree for X = (A[2] + Y) * 20
- [Parse tree omitted: <assig stmt> → <var> = <arith expr>; the <arith expr> is a <term> * <primary> whose <primary> is the <num> 20, and whose <term> derives ( <arith expr> ) containing <var>[2] + <var>]
65 Language Generators
- Wow, that seems like a very complicated parse tree to generate such a short statement
- Extra non-terminals are often necessary to remove ambiguity
- Extra non-terminals are often necessary to create precedence
- Precedence in the previous grammar has * and / higher than + and -
- They appear lower in the parse tree
- What about associativity?
- Left-recursive productions give left associativity
- Right-recursive productions give right associativity
66 Language Generators
- But context-free grammars cannot generate everything
- Ex: Strings of the form WW in {0,1}*
- Cannot guarantee that an arbitrary string is the same on both sides
- Compare to WW^R
- These we can generate from the middle and build out in each direction
- For WW we would need separate productions for each side, and we cannot coordinate the two with a context-free grammar
- Need Context-Sensitive in this case
67 Language Generators
- Let's look at one more grammar example
- Grammar to generate all postfix expressions involving the binary operators + and -. Assume <id> is predefined and corresponds to any variable name
- Ex: v w + x y - z + -
- How do we approach this problem?
- Terminals: easy
- Nonterminals/Start: require some thought
- Productions: require a lot of thought
68 Language Generators
- T = {<id>, +, -}
- N = {A}
- S = A
- P:
- A → AA+ | AA- | <id>
- Show the parse tree for the previous example
- Is this grammar LL(1)?
- We will discuss what this means soon
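Membership in this postfix language can be checked by simulating evaluation with an operand stack; only the stack depth matters. A sketch (my own, not from the notes):

```python
def valid_postfix(tokens):
    """Check a token list against the grammar A -> AA+ | AA- | <id>.
    Any token other than + or - is treated as an <id>."""
    depth = 0                      # current operand-stack depth
    for tok in tokens:
        if tok in ("+", "-"):
            if depth < 2:          # a binary operator needs two operands
                return False
            depth -= 1             # pop two operands, push one result
        else:
            depth += 1             # an <id> pushes one operand
    return depth == 1              # exactly one complete expression remains
```

The example expression from the previous slide passes: valid_postfix("v w + x y - z + -".split()) is True.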
69 Parsers
- OK, we can generate languages, but how do we recognize them?
- We need to convert our generators into recognizers, or parsers
- We know that a context-free grammar corresponds to a Push-Down Automaton (PDA)
- However, the PDA may be non-deterministic
- As we saw in examples, to create a parse tree we sometimes have to guess at a substitution
70 Parsers
- May have to guess a few times before we get the correct answer
- This does not lend itself to programming language parsing
- We'd like the parser to never have to guess
- To eliminate guessing, we must restrict the PDAs to deterministic PDAs, which restricts the grammars that we can use
- Must be unambiguous
- Some other, less obvious restrictions, depending upon the parsing technique used
71 Parsers
- There are two general categories of parsers
- Bottom-up parsers
- Can parse any language generated by a deterministic PDA
- Build the parse trees from the leaves up back to the root as the tokens are processed
- At each step, a substring that matches the right-hand side of a production is substituted with the left side of the production
- Reduces the input string all the way back to the start symbol for the grammar
- Also called shift-reduce parsing
72 Parsers
- Correspond to LR(k) grammars
- Left-to-right processing of the string
- Rightmost derivation of the parse tree (in reverse)
- k symbols of lookahead required
- LR parsers are difficult to write by hand, but can be produced systematically by programs such as YACC (Yet Another Compiler Compiler)
- Primary variations of LR grammars/parsers
- SLR (Simple LR)
- LALR (Look Ahead LR)
- LR: most general but also most complicated to implement
- We'll leave the details to CS 1622
73 Parsers
- Top-down parsers
- Build the parse trees from the root down as the tokens are processed
- Also called predictive parsers, or LL parsers
- Left-to-right processing of the string
- Leftmost derivation of the parse tree
- The LL(1) that we saw before means we can parse with only one token of lookahead
- More restrictive than LR parsers: there are grammars generated by deterministic PDAs that are not LL grammars (i.e. cannot be parsed by an LL parser)
- Some restrictions on the productions allowed
- Cannot handle left recursion; we'll see why shortly
74 Parsers
- Implementing a top-down parser
- One technique is Recursive Descent
- Can think of each production as a function
- As the string of tokens is parsed, terminal symbols are consumed/processed and non-terminal symbols generate function calls
- Now we can see why left-recursive productions cannot be handled
- From Example 3.4
- <expr> → <expr> + <term>
- Recursion will continue indefinitely without consuming any symbols
75 Parsers
- Luckily, in most cases a grammar with left recursion can be converted into one with only right recursion
- Recursive Descent parsers can be written by hand, or generated
- Think of a program that processes the grammar by creating a function shell for each non-terminal
- Then the details of each function are filled in based upon the various right-hand sides the non-terminal generates
- See example
76 LL(1) Grammars
- So how can we tell if a grammar is LL(1)?
- Given the current non-terminal (or left side of a production) and the next terminal, we must be able to uniquely determine the right side of the production to follow
- Remember that a non-terminal can have multiple productions
- As we previously mentioned, the grammar must not be left recursive
- However, not having left recursion is necessary but not sufficient for an LL(1) grammar
77 LL(1) Grammars
- Ex:
- A → aX | aY
- We cannot determine which right side to follow without more information than just "a"
- How can we process a grammar to determine if this situation occurs?
- Calculate the First set for each RHS of the productions
- The First set of a sequence of symbols, S, is the set of terminals that begin the strings derived from S
- Given multiple RHS for nonterminal N
- N → α1 | α2
- If First(α1) and First(α2) intersect, the grammar is not LL(1)
78 LL(1) Grammars
- So how do we calculate First() sets?
- The algorithm is given in Aho (see Slide 2)
- Consider symbol X
- If X is a terminal, First(X) = {X}
- If X → ε is a production, add ε to First(X)
- If X is a nonterminal and X → Y1Y2...Yk is a production
- Add a to First(X) if, for some i, a is in First(Yi) and ε is in all of First(Y1) ... First(Yi-1)
- Add ε to First(X) if ε is in First(Yj) for all j = 1, 2, ..., k
- To calculate First(X1X2...Xn) for some sequence X1X2...Xn
- Add the non-ε symbols of First(X1)
- If ε is in First(X1), add the non-ε symbols of First(X2)
- ...
- If ε is in all First(Xi), add ε
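The fixed-point flavor of this algorithm comes through in code. A sketch (mine, loosely following the Aho et al. formulation; ε is spelled EPS, and a production's RHS is a list of symbols, with [] for an ε-production):

```python
EPS = "ε"

def first_sets(grammar):
    """Compute First for every nonterminal by iterating to a fixed point.
    grammar: nonterminal -> list of right-hand sides (lists of symbols).
    Any symbol that is not a key of grammar is a terminal."""
    first = {nt: set() for nt in grammar}

    def first_of_seq(seq):
        """First set of a sequence X1 X2 ... Xn, per the rules above."""
        result = set()
        for sym in seq:
            sym_first = first[sym] if sym in grammar else {sym}
            result |= sym_first - {EPS}
            if EPS not in sym_first:
                break
        else:              # every symbol could derive ε (or seq is empty)
            result.add(EPS)
        return result

    changed = True
    while changed:         # repeat until no First set grows
        changed = False
        for nt, prods in grammar.items():
            for rhs in prods:
                new = first_of_seq(rhs)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first
```

For the first exercise grammar on the next slide this gives First(A) = {a, b, c} and First(B) = {a, b}; since two of B's RHSs both begin with a, that grammar is not LL(1).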
79 LL(1) Grammars
- A → aB | b | cBB
- B → aB | bA | aBb
- A → aB | CD | E | ε
- B → b
- C → cA | ε
- D → dA
- E → dB
80 Semantics
- Semantics indicate the meaning of a program
- What do the symbols just parsed actually say to do?
- Two different kinds of semantics
- Static Semantics
- Almost an extension of program syntax
- Deals with structure more than meaning, but at a meta level
- Handles structural details that are difficult or impossible to handle with the parser
- Ex: Has variable X been declared prior to its use?
- Ex: Do variable types match?
81 Semantics
- Dynamic Semantics (often just called semantics)
- What does the syntax mean?
- Ex: Control statements
- Ex: Parameter passing
- The programmer needs to know the meaning of statements before he/she can use the language effectively
82 Semantics
- Static Semantics
- One technique for determining/checking static semantics is Attribute Grammars
- Start with a context-free grammar, and add to it
- Attributes (for the grammar symbols)
- Indicate some properties of the symbols
- Attribute computation functions (semantic functions)
- Allow attributes to be determined
- Predicate functions
- Indicate the static semantic rules
83 Semantics
- Attributes are made up of synthesized attributes and inherited attributes
- Synthesized Attributes
- Formed using attributes of grammar symbols lower in the parse tree
- Ex: The result type of an expression is synthesized from the types of the subexpressions
- Inherited Attributes
- Formed using attributes of grammar symbols higher in the parse tree
- Ex: The type of the RHS of an assignment is expected to match that of the LHS; the type is inherited from the type of the LHS variable
84 Semantics
- Semantic Functions
- Indicate how attributes are derived, based on the static semantics of the language
- Ex: A = B + C
- Assume A, B and C can be integers or floats
- If B and C are both integers, the RHS result type is integer; otherwise it is float
- Predicate functions
- Test attributes of symbols processed to see if they match those defined by the language
- Ex: A = B + C
- If the RHS type attribute is not equal to the LHS type attribute, error (in some languages)
85 Semantics
- Detailed Example in text
- Grammar Rules
- 1) <assign> → <var> = <expr>
- 2) <expr> → <var> + <var>
- 3) <expr> → <var>
- 4) <var> → A | B | C
- Attributes
- actual_type: actual type of the <var> or <expr> in question (synthesized, but for a <var> we say this is an intrinsic attribute)
- expected_type: associated with <expr>, indicating the type that it SHOULD be; inherited from the actual_type of the <var>
86Semantics
- Semantic functions
- Parallel to syntax rules of the grammar
- See Ex. 3.6 in text
- 1) <assign> → <var> = <expr>
- <expr>.expected_type ← <var>.actual_type
- 2) <expr> → <var>2 + <var>3
- <expr>.actual_type ← if (<var>2.actual_type = int) and
(<var>3.actual_type = int) then int
- else real
- end if
- 3) <expr> → <var>
- <expr>.actual_type ← <var>.actual_type
- 4) <var> → A | B | C
- <var>.actual_type ← look-up(<var>.string)
- Predicate functions
- Only one needed here: do the types match?
- <expr>.actual_type == <expr>.expected_type
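The attribute flow above can be sketched in Python; this is a minimal illustration of the text's rules, where the symbol table contents and the node layout are assumptions for the example, not part of the grammar:

```python
# Sketch of attribute evaluation for the grammar:
#   <assign> -> <var> = <expr>
#   <expr>   -> <var> + <var> | <var>
#   <var>    -> A | B | C
# The declared types below are a hypothetical symbol table.

symbol_table = {"A": "real", "B": "int", "C": "int"}

def look_up(name):
    # intrinsic attribute: a <var>'s actual_type comes from the symbol table
    return symbol_table[name]

def expr_actual_type(operands):
    # synthesized attribute: computed from the children's actual_type values
    types = [look_up(v) for v in operands]
    if len(types) == 1:
        return types[0]
    return "int" if all(t == "int" for t in types) else "real"

def check_assign(lhs, rhs_operands):
    # inherited attribute: expected_type of <expr> flows down from the LHS <var>
    expected_type = look_up(lhs)
    actual_type = expr_actual_type(rhs_operands)
    # predicate function: the static-semantic rule actual_type == expected_type
    return actual_type == expected_type

print(check_assign("A", ["B", "C"]))  # B + C is int, A is real -> False
```

With these declarations, A = B + C fails the predicate (int vs. real), while B = B + C passes.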
87Semantics
- Decorated parse tree for A = B + C (attributes in brackets):

  <assign>
  ├── <var> A        [actual_type]
  ├── =
  └── <expr>         [expected_type, actual_type]
      ├── <var>2 B   [actual_type]
      ├── +
      └── <var>3 C   [actual_type]
88Semantics
- Attribute grammars are useful, but they are not
typically used in their pure form for full-scale
languages; doing so makes the grammars more complicated
and compilers more difficult to generate
89Semantics
- Dynamic Semantics (often just called semantics)
- Clearly vital to the understanding of the language
- In early languages they were simply informal,
like manual pages
- Efforts have been made in later years to formalize
semantics, just as syntax has been formalized
- But semantics tend to be more complex and less
precisely defined, and so more difficult to formalize
90Semantics
- Some techniques have gained support, however
- Operational semantics
- Define meaning by the result of execution on a
primitive machine, examining the state of the
machine before and after execution
- Axiomatic semantics
- Preconditions and postconditions define the meaning
of statements
- Used in conjunction with proofs of program
correctness
- Denotational semantics
- Map syntactic constructs into mathematical
objects that model their meaning
- Quite rigorous and complex
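The axiomatic style can be sketched with executable assertions; this is an illustrative Hoare-triple example, not from the text, with the assertions standing in for the pre- and postconditions:

```python
# Axiomatic-semantics sketch: the Hoare triple
#   {x = n}   x := x + 1   {x = n + 1}
# Assertions play the role of precondition and postcondition.

def increment(x):
    n = x              # record the precondition state: {x = n}
    x = x + 1          # the statement whose meaning is being defined
    assert x == n + 1  # postcondition: {x = n + 1}
    return x

print(increment(5))  # 6
```

A proof of correctness shows the postcondition holds for every state satisfying the precondition; the assertion merely checks it for one run.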
91Identifiers, Reserved Words and Keywords
- Identifier
- String of characters used to name an entity
within a program
- Most languages have similar rules for ids, but
not always
- C and Java are case-sensitive, while Ada is not
- Can be a good thing: mixing case allows for
longer, more readable names, à la Java's
NoninvertibleTransformException
- Can be a bad thing: should that first "i" be upper
or lower case?
92Identifiers, Reserved Words and Keywords
- C, Ada and Java allow underscores, while
standard Pascal does not
- FORTRAN originally allowed only 6 chars
- Reserved word
- Name whose definition is part of the syntax of
the language
- Cannot be used by the programmer in any other way
- Most newer languages have reserved words
- Make parsing easier, since each reserved word
will be a different token
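The "different token" point can be sketched with a toy scanner; the reserved-word set and token names here are illustrative assumptions, not from any particular language:

```python
# Sketch: a scanner can classify reserved words as distinct token kinds
# simply by checking a fixed set. The word list is a small made-up subset.

RESERVED = {"if", "then", "else", "end", "while"}

def tokenize(source):
    tokens = []
    for word in source.split():
        # each reserved word becomes its own token kind; everything
        # else is a generic identifier token
        kind = word.upper() if word in RESERVED else "IDENT"
        tokens.append((kind, word))
    return tokens

print(tokenize("if count then limit"))
# [('IF', 'if'), ('IDENT', 'count'), ('THEN', 'then'), ('IDENT', 'limit')]
```

Because the parser sees IF and THEN as distinct tokens rather than identifiers, it never has to guess from context whether `if` names a variable.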
93Identifiers, Reserved Words and Keywords
- Ex: end if in Ada
- Interesting extension topic
- If we extend a language and add new reserved
words, we may make some old programs
syntactically incorrect
- Ex: a C subprogram using class as an id will not
compile with a C++ compiler
- Ex: an Ada 83 program using abstract as an id will
not compile with an Ada 95 compiler
- Keywords
- To some, keyword ≡ reserved word
- Ex: C, Java
94Identifiers, Reserved Words and Keywords
- To others, there is a difference
- Keywords are only special in certain contexts
- Can be redefined in other contexts
- Ex: FORTRAN keywords may be redefined
- Predefined Identifiers
- Identifiers defined by the language implementers,
which may be redefined
- cin, cout in C++
- real, integer in Pascal
- predefined classes in Java
95Identifiers, Reserved Words and Keywords
- Programmer may wish to redefine for a specific
application
- Ex: change a Java interface to include an extra
method
- Problem: the predefined version no longer applies, so
program segments that depend on it are invalid
- Better to extend a class or compose a new class
than to redefine a predefined class
- Ex: the Comparable interface can be implemented as we
see fit by a new class
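The same idea can be sketched in Python, where a new class supplies its own comparison methods (the analogue of implementing Comparable); the Account class and its ordering are hypothetical examples:

```python
# Rather than redefining a predefined class, write a new class that
# implements comparison "as we see fit" -- analogous to a Java class
# implementing the Comparable interface.

import functools

@functools.total_ordering
class Account:
    def __init__(self, balance):
        self.balance = balance

    def __eq__(self, other):
        # equality chosen by the new class: compare balances
        return self.balance == other.balance

    def __lt__(self, other):
        # ordering chosen by the new class: smaller balance first
        return self.balance < other.balance

print(Account(10) < Account(20))  # True
```

Predefined machinery such as sorting then works with the new class unchanged, with no need to alter anything predefined.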
96Variables
- Simple (naïve) definition: a name for a memory
location
- In fact, it is really much more
- Six attributes:
- Name
- Address
- Value
- Type
- Lifetime
- Scope
97Variables
- Name
- Identifier
- In most languages the same name may be used for
different variables, as long as there is no
ambiguity
- Some exceptions