Course Notes for CS1621 Structure of Programming Languages Part A by John C. Ramirez, Department of Computer Science - PowerPoint PPT Presentation



1
Course Notes for CS1621 Structure of Programming
Languages, Part A, by John C. Ramirez, Department of
Computer Science, University of Pittsburgh
2
  • These notes are intended for use by students in
    CS1621 at the University of Pittsburgh and no one
    else
  • These notes are provided free of charge and may
    not be sold in any shape or form
  • Material from these notes is obtained from
    various sources, including, but not limited to,
    the textbooks
  • Concepts of Programming Languages, Seventh
    Edition, by Robert W. Sebesta (Addison Wesley)
  • Programming Languages, Design and Implementation,
    Fourth Edition, by Terrence W. Pratt and Marvin
    V. Zelkowitz (Prentice Hall)
  • Compilers: Principles, Techniques, and Tools, by
    Aho, Sethi and Ullman (Addison Wesley)

3
Goals of Course
  • To survey the various programming languages,
    their purposes and their histories
  • Why do we have so many languages?
  • How did these languages develop?
  • Are some languages better than others for some
    things?
  • To examine methods for describing language syntax
    and semantics

4
Goals of Course
  • Syntax indicates structure of program code
  • How can language designer specify this?
  • How can programmer learn language?
  • How can compiler recognize this?
  • Lexical analysis
  • Parsing (syntax analysis)
  • Brief discussion of parsing techniques
  • Semantics indicate meaning of the code
  • What code will actually do
  • Can we effectively do this in a formal way?
  • Static semantics
  • Dynamic semantics

5
Goals of Course
  • To examine some language features and constructs
    and how they are used and implemented in various
    languages
  • Variables and constants
  • Types, binding and type checking
  • Scope and lifetime
  • Data Types
  • Primitive types
  • Array types
  • Structured data types

6
Goals of Course
  • Pointer (reference) types
  • Assignment statements and expressions
  • Operators, precedence and associativity
  • Type coercions and conversions
  • Boolean expressions and short-circuit evaluation
  • Control statements
  • Selection
  • Iteration
  • Unconditional branching goto

7
Goals of Course
  • Process abstraction procedures and functions
  • Parameters and parameter-passing
  • Generic subprograms
  • Nonlocal environments and side-effects
  • Implementing subprograms
  • Subprograms in static-scoped languages
  • Subprograms in dynamic-scoped languages

8
Goals of course
  • Data abstraction and abstract data types
  • Object-oriented programming
  • Design issues
  • Implementations in various object-oriented
    languages
  • Concurrency
  • Concurrency issues
  • Subprogram level concurrency
  • Implementations in various languages
  • Statement level concurrency

9
Goals of Course
  • Exception handling
  • Issues and implementations
  • IF TIME PERMITS
  • Functional programming languages
  • Logic programming languages

10
Language Development Issues
  • Why do we have high-level programming languages?
  • Machine code is too difficult for us to read,
    understand and debug
  • Machine code is not portable between
    architectures
  • Why do we have many high-level programming
    languages?
  • Different people and companies developed them

11
Language Development Issues
  • Different languages are either designed to or
    happen to meet different programming needs
  • Scientific applications
  • FORTRAN
  • Business applications
  • COBOL
  • AI
  • LISP, Scheme (and Prolog)
  • Systems programming
  • C
  • Web programming
  • Perl, PHP, JavaScript
  • General purpose
  • C, Ada, Java

12
Language Development Issues
  • Programming language qualities and evaluation
    criteria
  • Readability
  • How much can non-author understand logic of code
    just by reading it?
  • Is code clear and unambiguous to reader?
  • These are often subjective, but sometimes it is
    fairly obvious
  • Examples of features that help readability
  • Comments
  • Long identifier names
  • Named constants

13
Language Development Issues
  • Clearly understood control statements
  • Language orthogonality
  • Simple features combine in a consistent way
  • But it can go too far, as explained in the text
    about Algol 68
  • Writability
  • Not dissimilar to readability
  • How easy is it for programmer to use language
    effectively?
  • Can depend on the domain in which it is being
    used
  • Ex LISP is very writable for AI applications but
    would not be so good for systems programming
  • Also somewhat subjective

14
Language Development Issues
  • Examples of features that help writability
  • Clearly understood control statements
  • Subprograms
  • Also orthogonality
  • Reliability
  • Two different ideas of reliability
  • Programs are less susceptible to logic errors
  • Ex Assignment vs. comparison in C
  • See assign.cpp and assign.java
  • Programs have the ability to recover from
    exceptional situations
  • Exception handling; we will discuss this more later
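The assignment-vs-comparison pitfall above can be made concrete. In C, `if (x = 0)` compiles because assignment is an expression, so a typo for `==` becomes a silent logic error; Java and Python reject the same typo at compile time. A minimal sketch in Python (the code strings here are illustrative, not taken from assign.cpp or assign.java):

```python
# In C, "if (x = 0)" compiles: assignment yields a value, so a typo
# for "if (x == 0)" silently becomes a logic error. Python's grammar
# does not allow plain assignment in a condition, so the same typo is
# caught before the program ever runs.
src_comparison = "if x == 0:\n    pass"
src_assignment = "if x = 0:\n    pass"   # the classic C typo

compile(src_comparison, "<demo>", "exec")    # compiles fine

try:
    compile(src_assignment, "<demo>", "exec")
    caught = False
except SyntaxError:
    caught = True                            # the typo is rejected

print(caught)   # prints True
```

This is one concrete sense in which a language design choice improves reliability: a whole class of logic errors becomes a class of compile-time errors.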

15
Language Design Issues
  • Many factors influence language design
  • Architecture
  • Most languages were designed for single processor
    von Neumann type computers
  • CPU to execute instructions
  • Data and instructions stored in main memory
  • General language approach
  • Imperative languages
  • Fit well with von Neumann computers
  • Focus is on variables, assignment, selection and
    iteration
  • Examples FORTRAN, Pascal, C, Ada, C++, Java

16
Language Design Issues
  • Imperative language evolution
  • Simple straight-line code
  • Top-down design and process abstraction
  • Data abstraction and ADTs
  • Object-oriented programming
  • Some consider object-oriented languages not to be
    imperative, but most modern oo languages have
    imperative roots (ex. C++, Java)
  • Functional languages
  • Focus is on function and procedure calls
  • Mimics mathematical functions
  • Less emphasis on variables and assignment
  • In strictest form has no iteration at all;
    recursion is used instead
  • Examples LISP, Scheme
  • See example

17
Language Design Issues
  • Logic programming languages
  • Symbolic logic used to express propositions,
    rules and inferences
  • Programs are in a sense theorems
  • User enters a proposition and the system uses
    the programmer's rules and propositions in an
    attempt to prove it
  • Typical outputs
  • Yes: proposition can be established by program
  • No: proposition cannot be established by program
  • Example Prolog (see example program)
  • Cost
  • What are the overall costs associated with a
    given language?
  • How does the design affect that cost?

18
Language Design Issues
  • Training programmers
  • How easy is it to learn?
  • Writing programs
  • Is language a good fit for the task?
  • Compiling programs
  • How long does it take to compile programs?
  • This is not as important now as it once was
  • Executing programs
  • How long does program take to run?
  • Often there is a trade-off here
  • Ex Java is slower than C++ but it has many
    run-time features (array bounds checking,
    security manager) that are lacking in C++

19
Language Implementation Issues
  • How is HLL code processed and executed on the
    computer?
  • Compilation
  • Source code is converted by the compiler into
    binary code that is directly executable by the
    computer
  • Compilation process can be broken into 4 separate
    steps
  • Lexical Analysis
  • Breaks up code into lexical units, or tokens
  • Examples of tokens reserved words, identifiers,
    punctuation
  • Feeds the tokens into the syntax analyzer

20
Language Implementation Issues
  • Syntax Analysis
  • Tokens are parsed and examined for correct
    syntactic structure, based on the rules for the
    language
  • Programmer syntax errors are detected in this
    phase
  • Semantic Analysis/Intermediate Code Generation
  • Declaration and type errors are checked here
  • Intermediate code generated is similar to
    assembly code
  • Optimizations can be done here as well, for
    example
  • Unnecessary statements eliminated
  • Statements moved out of loops if possible
  • Recursion removed if possible

21
Language Implementation Issues
  • Code Generation
  • Intermediate code is converted into executable
    code
  • Code is also linked with libraries if necessary
  • Note that steps 1) and 2) are independent of the
    architecture; they depend only upon the language
    (the Front End)
  • Step 3) is somewhat dependent upon the
    architecture, since, for example, optimizations
    will depend upon the machine used
  • Step 4) is clearly dependent upon the
    architecture (the Back End)

22
Language Implementation Issues
  • Interpreting
  • Program is executed in software, by an
    interpreter
  • Source level instructions are executed by a
    virtual machine
  • Allows for robust run-time error checking and
    debugging
  • Penalty is speed of execution
  • Example Some LISP implementations, Unix shell
    scripts and Web server scripts

23
Language Implementation Issues
  • Hybrid
  • First 3 phases of compilation are done, and
    intermediate code is generated
  • Intermediate code is interpreted
  • Faster than pure interpretation, since the
    intermediate codes are simpler and easier to
    interpret than the source codes
  • Still much slower than compilation
  • Examples Java and Perl
  • However, now Java uses JIT Compilation also
  • Method code is compiled as it is called, so if
    it is called again it will be faster

24
Brief, Incomplete PL History
  • Early 50s
  • Early HLLs started to emerge
  • FORTRAN
  • Stands for FORmula TRANslating system
  • Developed by a team led by John Backus at IBM for
    the IBM 704 machine
  • Successful in part because of support by IBM
  • Designed for scientific applications
  • The root of the imperative language tree

25
Brief PL History
  • Lacked many features that we now take for
    granted in programming languages
  • Conditional loops
  • Statement blocks
  • Recursive abilities
  • Many of these features were added in future
    versions of FORTRAN
  • FORTRAN II, FORTRAN IV, FORTRAN 77, FORTRAN 90
  • Had some interesting features that are now
    obsolete
  • COMMON, EQUIVALENCE, GOTO
  • We may discuss what these are later

26
Brief PL History
  • Late 50s
  • COBOL
  • COmmon Business Oriented Language
  • Developed by US DoD
  • Separated data and procedure divisions
  • But didn't allow functions or parameters
  • Still widely used, due in part to the large cost
    of rewriting software from scratch
  • Big companies would rather maintain COBOL
    programs than rewrite them in a different language

27
Brief PL History
  • LISP
  • LISt Processing
  • Developed by John McCarthy of MIT
  • Functional language
  • Good for symbolic manipulation, list processing
  • Had recursion and conditional expressions
  • Not in original FORTRAN
  • At one time used extensively for AI
  • Today the most widely used version, COMMON LISP,
    includes some imperative features

28
Brief PL History
  • ALGOL
  • ALGOL 58 and then ALGOL 60 both designed by
    international committee
  • Goals for the language
  • Syntax should be similar to mathematical notation
    and readable
  • Should be usable for algorithms in publications
  • Should be compilable into machine code
  • Included some interesting features
  • Pass by value and pass by name (wacky!)
    parameters
  • Recursion (first in an imperative language)
  • Dynamic arrays
  • Block structure and local variables

29
Brief PL History
  • Introduced Backus-Naur Form (BNF) as a way to
    describe the language syntax
  • Still commonly used today, but not well-accepted
    at the time
  • Never widely used, but influenced virtually all
    imperative languages after it

30
Brief PL History
  • Late 60s
  • Simula 67
  • Designed for simulation applications
  • Introduced some interesting features
  • Classes for data abstraction
  • Coroutines for re-entrant subprograms
  • ALGOL 68
  • Emphasized orthogonality and user-defined data
    types
  • Not widely used

31
Brief PL History
  • 70s
  • Pascal
  • Developed by Niklaus Wirth
  • No major innovations, but due to its simplicity
    and emphasis of good programming style, became
    widely used for teaching
  • C
  • Developed by Dennis Ritchie to help implement the
    Unix operating system
  • Has a great deal of flexibility, esp. with types
  • Incomplete type checking

32
Brief PL History
  • Void pointers
  • Coerces many types
  • Many programmers (esp. systems programmers) love
    it
  • Language purists hate it
  • Easy to miss logic errors
  • Prolog
  • Logic programming
  • We discussed it a bit already
  • Still used somewhat, mostly in AI
  • May discuss in more detail later

33
Brief PL History
  • 80s
  • Ada
  • Developed over a number of years by DoD
  • Goal was to have one language for all DoD
    applications
  • Especially for embedded systems
  • Contains some important features
  • Data encapsulation with packages
  • Generic packages and subprograms
  • Exception handling
  • Tasks for concurrent execution
  • We will discuss some of these later

34
Brief PL History
  • Very large language; difficult to program
    reliably, even though reliability was one of its
    goals!
  • Early compilers were slow and error-prone
  • Did not have the widespread general use that was
    hoped
  • Eventually the government stopped requiring it
    for DoD applications
  • Use faded after this
  • Not used widely anymore
  • Ada 95 added object-oriented features
  • Still wasn't used much, especially with the
    advent of Java and other OO languages

35
Brief PL History
  • Smalltalk
  • Designed and developed by Alan Kay
  • Concepts developed in 60s, but language did not
    come to fruition until 1980
  • Designed to be used on a desktop computer 15
    years before desktop computers existed
  • First true object-oriented language
  • Language syntax is geared toward objects
  • messages passed between objects
  • methods are invoked as responses to messages
  • Always dynamically bound
  • All classes are subclasses of Object
  • Also included software devel. environment
  • Had large impact on future OOLs, esp. Java

36
Brief PL History
  • C++
  • Developed largely by Bjarne Stroustrup as an
    extension to C
  • Backward compatible
  • Added object-oriented features and some
    additional typing features to improve C
  • Very powerful and very flexible language
  • But still has reliability problems
  • Ex. no array bounds checking
  • Ex. dynamic memory allocation
  • Widely used and likely to be used for a while
    longer

37
Brief PL History
  • Perl
  • Developed by Larry Wall
  • Takes features from C as well as scripting
    languages awk, sed and sh
  • Some features
  • Regular expression handling
  • Associative arrays
  • Implicit data typing
  • Originally used for data extraction and report
    generation
  • Evolved into the archetypal Web scripting
    language
  • Has many proponents and detractors

38
Brief PL History
  • 90s
  • Java
  • Interestingly enough, just like Ada, Java was
    originally developed to be used in embedded
    systems
  • Developed at Sun by a team headed by James
    Gosling
  • Syntax borrows heavily from C++
  • But many features (flaws?) of C++ have been
    eliminated
  • No explicit pointers or pointer arithmetic
  • Array bounds checking
  • Garbage collection to reclaim dynamic memory

39
Brief PL History
  • Object model of Java actually more resembles
    that of Smalltalk than that of C++
  • All variables are references
  • Class hierarchy begins with Object
  • Dynamic binding of method names to operations by
    default
  • But not as pure in its OO features as Smalltalk,
    due to its imperative control structures
  • Interpreted for portability and security
  • Also JIT compilation now
  • Growing in popularity, largely due to its use on
    Web pages

40
Brief PL History
  • 00's (aughts? oughts? naughts?)
  • See http://www.randomhouse.com/wotd/index.pperl?date=19990803
  • C#
  • Main roots in C++ and Java with some other
    influences as well
  • Used with the MS .NET programming environment
  • Some improvements and some deprovements compared
    to Java
  • Likely to succeed given MS support

41
Program Syntax
  • Recall job of syntax analyzer
  • Groups (parses) tokens (fed in from lexical
    analyzer) into meaningful phrases
  • Determines if syntactic structure of token stream
    is legal based on rules of the language
  • Let's look at this in more detail
  • How does compiler know what is legal and what
    is not?
  • How does it detect errors?

42
Program Syntax
  • To answer these questions we must look at
    programming language syntax in a more formal way
  • Language
  • Set of strings of lexemes from some alphabet
  • Lexemes are the lowest level syntactic elements
  • Lexemes are made up of characters, as defined by
    the character set for the language

43
Program Syntax
  • Lexemes are categorized into different tokens and
    processed by the lexical analyzer
  • Ex
  • Lexemes if, (, width, <, height, ), {, cout,
    <<, width, <<, endl, ;, }
  • Tokens iftok, lpar, idtok, lt, idtok, rpar,
    lbrace, idtok, llt, idtok, llt, idtok, semi,
    rbrace
  • Note that some tokens correspond to single
    lexemes (ex. iftok) whereas some correspond to
    many (ex. idtok)

if (width < height) { cout << width << endl; }
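A sketch of the lexical-analysis step for a statement like this one. The token names come from the slide; the regex-based scanner, and the trick of listing keyword and two-character-operator patterns before the general identifier pattern, are my own illustration:

```python
import re

# Minimal scanner sketch: classify each lexeme into one of the token
# classes named on the slide (iftok, lpar, idtok, llt, ...).
# Order matters: "if" must be tried before idtok, "<<" before "<".
TOKEN_SPEC = [
    ("iftok",  r"if\b"),
    ("llt",    r"<<"),          # stream-insertion operator
    ("lt",     r"<"),
    ("lpar",   r"\("),
    ("rpar",   r"\)"),
    ("lbrace", r"\{"),
    ("rbrace", r"\}"),
    ("semi",   r";"),
    ("idtok",  r"[A-Za-z_]\w*"),
    ("skip",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(code):
    # Each regex match carries the name of the pattern that won,
    # giving us (token class, lexeme) pairs; whitespace is discarded.
    return [(m.lastgroup, m.group()) for m in MASTER.finditer(code)
            if m.lastgroup != "skip"]

tokens = tokenize("if (width < height) { cout << width << endl; }")
print([name for name, _ in tokens])
```

Running this reproduces the slide's token stream: one iftok, but idtok for each of width, height, cout, and endl.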
44
Program Syntax
  • How do we formally define a language?
  • Assume we have a language, L, defined over an
    alphabet, Σ.
  • 2 related techniques
  • Recognition
  • An algorithm or mechanism, R, will process any
    given string, S, of lexemes and correctly
    determine if S is within L or not
  • Not used for enumeration of all strings in L
  • Used by parser portion of compiler

45
Program Syntax
  • Generation
  • Produces valid sentences of L
  • Not as useful as recognition for compilation,
    since the valid sentences could be arbitrary
  • More useful in understanding language syntax,
    since it shows how the sentences are formed
  • Recognizer only says if a sentence is valid or
    not; using it to understand the language is more
    of a trial and error technique

46
Program Syntax
  • So recognizers are what compilers need, but
    generators are what programmers need to
    understand language
  • Luckily there are systematic ways to create
    recognizers from generators
  • Thus the programmer reads the generator to
    understand the language, and a recognizer is
    created from the generator for the compiler

47
Language Generators
  • Grammar
  • A mechanism (or set of rules) by which a language
    is generated
  • Defined by the following
  • A set of non-terminal symbols, N
  • Do not actually appear in strings
  • A set of terminal symbols, T
  • Appear in strings
  • A set of productions, P
  • Rules used in string generation
  • A starting symbol, S

48
Language Generators
  • Noam Chomsky described four classes of grammars
    (used to generate four classes of languages):
    the Chomsky Hierarchy
  • Unrestricted
  • Context-sensitive
  • Context-free
  • Regular
  • More info on unrestricted and context-sensitive
    grammars in a theory course
  • The last two will be useful to us

49
Language Generators
  • Regular Grammars
  • Productions must be of the form
  • <non> → <ter><non> | <ter>
  • where <non> is a nonterminal, <ter> is a
    terminal, and | represents "or"
  • Can be modeled by a Finite-State Automaton (FSA)
  • Also equivalent to Regular Expressions
  • Provide a model for building lexical analyzers

50
Language Generators
  • Have following properties (among others)
  • Can generate strings of the form α^n, where α
    is a finite sequence and n is an integer
  • Pattern recognition
  • Can count to a finite number
  • Ex. a^n, n = 85
  • But we need at least 86 states to do this
  • Cannot count to an arbitrary number
  • Note that a^n for any n (i.e. 0 or more
    occurrences) is easy; we do not have to count
  • Important to realize that the number of states
    is finite; we cannot recognize patterns with an
    arbitrary number of possibilities

51
Language Generators
  • Example Regular grammar to recognize Pascal
    identifiers (assume no caps)
  • N = {Id, X}  T = {a..z, 0..9}  S = Id
  • P:
  • Id → aX | bX | ... | zX | a | b | ... | z
  • X → aX | ... | zX | 0X | ... | 9X | a | ... | z
    | 0 | ... | 9
  • Consider equiv. FSA

[FSA diagram: start state Id; a letter a..z leads to an accepting
state that loops on letters a..z and digits 0..9]
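This regular grammar is equivalent to the regular expression `[a-z][a-z0-9]*`: one lowercase letter, then any mix of lowercase letters and digits. A quick sketch (the variable name `ident` is mine):

```python
import re

# The identifier grammar above, written as an equivalent regular
# expression: one lowercase letter, then letters/digits. \Z anchors
# the match at the end of the string so trailing junk is rejected.
ident = re.compile(r"[a-z][a-z0-9]*\Z")

print(bool(ident.match("width")))   # valid identifier
print(bool(ident.match("x9")))      # valid: digit after a letter
print(bool(ident.match("9x")))      # invalid: starts with a digit
```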
52
Language Generators
  • Example Regular grammar to generate a binary
    string containing an odd number of 1s
  • N = {A, B}  T = {0, 1}  S = A  P:
  • A → 0A | 1B | 1
  • B → 0B | 1A | 0
  • Example Regular grammars CANNOT generate strings
    of the form a^n b^n
  • Grammar needs some way to count the number of
    a's and b's to make sure they are the same
  • Any regular grammar (or FSA) has a finite
    number, say k, of different states
  • If n > k, this is not possible
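The odd-number-of-1s grammar corresponds to a two-state FSA, which can be simulated directly. A sketch (state names follow the grammar's nonterminals: A = even count of 1s so far, B = odd):

```python
# Simulating the grammar's FSA: start in state A ("even number of 1s
# so far"); state B ("odd") is accepting. A 0 leaves the state alone;
# a 1 toggles between A and B.
def odd_ones(bits):
    state = "A"
    for b in bits:
        if b == "1":
            state = "B" if state == "A" else "A"
        elif b != "0":
            return False          # not a binary string at all
    return state == "B"           # accept iff we end in the odd state

print(odd_ones("0100"))   # one 1: accepted
print(odd_ones("11"))     # two 1s: rejected
```

Note the machine never counts; it only remembers one bit of state (parity), which is exactly what a finite number of states can do.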

53
Language Generators
  • If we could add a memory of some sort we could
    get this to work
  • Context-free Grammars
  • Can be modeled by a Push-Down Automaton (PDA)
  • FSA with added push-down stack
  • Productions are of the form
  • <non> → α, where <non> is a nonterminal and α
    is any sequence of terminals and nonterminals
  • Note the rhs is more flexible now

54
Language Generators
  • So how to generate a^n b^n? Let a = 0, b = 1
  • N = {A}  T = {0, 1}  S = A  P:
  • A → 0A1 | 01
  • Note that now we can have a terminal after the
    nonterminal as well as before
  • Can also have multiple nonterminals in a single
    production
  • Example Grammar to generate sets of balanced
    parentheses
  • N = {A}  T = {(, )}  S = A  P:
  • A → AA | (A) | ()
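Both languages on this slide can be recognized with a stack (in these cases just a counter), sketching why a PDA's added memory succeeds where an FSA fails. The function names and structure are my own illustration:

```python
# Stack-based recognizers: the stack is the "memory of some sort"
# that lets a context-free language count where an FSA cannot.

def is_anbn(s):
    """Accept 0^n 1^n, n >= 1 (the language of A -> 0A1 | 01)."""
    stack = []
    seen_one = False
    for c in s:
        if c == "0" and not seen_one:
            stack.append(c)           # push one symbol per leading 0
        elif c == "1" and stack:
            seen_one = True
            stack.pop()               # pop one symbol per 1
        else:
            return False              # wrong symbol or wrong order
    return seen_one and not stack     # every 0 matched, n >= 1

def balanced(s):
    """Accept nonempty balanced parens (A -> AA | (A) | ())."""
    depth = 0                         # stack height, tracked as a count
    for c in s:
        if c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
            if depth < 0:             # a ) with nothing open
                return False
        else:
            return False
    return s != "" and depth == 0

print(is_anbn("0011"), balanced("(()(()))"))
```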

55
Language Generators
  • Context-free grammars are also equivalent to BNF
    grammars
  • Developed by Backus and modified by Naur
  • Used initially to describe Algol 60
  • Given a (BNF) grammar, we can derive any string
    in the language from the start symbol and the
    productions
  • A common way to derive strings is using a
    leftmost derivation
  • Always replace leftmost nonterminal first
  • Complete when no nonterminals remain

56
Language Generators
  • Example Leftmost derivation of nested parens
    (()(()))
  • A ⇒ (A)
  •   ⇒ (AA)
  •   ⇒ (()A)
  •   ⇒ (()(A))
  •   ⇒ (()(()))
  • We can view this derivation as a tree, called a
    parse tree for the string

57
Language Generators
  • Parse tree for (()(()))

[Parse tree: root A expands to ( A ); that A expands to AA; the first
child A expands to ( ), and the second expands to ( A ), whose inner
A expands to ( )]
58
Language Generators
  • If, for a given grammar, a string can be derived
    by two or more different parse trees, the grammar
    is ambiguous
  • Some languages are inherently ambiguous
  • All grammars that generate that language are
    ambiguous
  • Many other languages are not themselves
    ambiguous, but can be generated by ambiguous
    grammars
  • It is generally better for use with compilers if
    a grammar is unambiguous
  • Semantics are often based on syntactic form

59
Language Generators
  • Ambiguous grammar example Generate strings of
    the form 0^n 1^m, where n, m ≥ 1
  • N = {A, B, C}  T = {0, 1}  S = A  P:
  • A → BC | 0A1
  • B → 0B | 0
  • C → 1C | 1
  • Consider the string 00011

[Two parse trees for 00011: one applies A → 0A1 first, with the inner
A → BC deriving 001; the other applies A → BC directly, with B
deriving 000 and C deriving 11]
60
Language Generators
  • We can easily make this grammar unambiguous
  • Remove production A ? 0A1
  • Note that nonterminal B can generate an arbitrary
    number of 0s and nonterminal C can generate an
    arbitrary number of 1s
  • Now only one parse tree

[Single parse tree for 00011: A → BC, with B deriving 000 and C
deriving 11]
61
Language Generators
  • Let's look at a few more examples
  • Grammar to generate WW^R, W ∈ {0,1}*
  • N = {A}  T = {0, 1}  S = A  P: ?
  • Grammar to generate strings in {0,1}* of the
    form WX such that |W| = |X| but W ≠ X
  • This one is a little trickier
  • How to approach this problem?
  • We need to guarantee two things
  • Overall string length is even
  • At least one bit differs in the two halves

A → 0A0 | 1A1 | 00 | 11
62
Language Generators
  • See board
  • Ok, now how do we make a grammar to do this?
  • Make every string (even length) the result of two
    odd-length strings appended to each other
  • Assume odd-length strings are Ol and Or
  • Make sure that either
  • Ol has a 1 in the middle and Or has a 0 in the
    middle or
  • Ol has a 0 in the middle and Or has a 1 in the
    middle
  • Productions

In → AB | BA    A → 0A0 | 1A1 | 1A0 | 0A1 | 1
B → 0B0 | 1B1 | 1B0 | 0B1 | 0
63
Language Generators
  • Let's look at an example more relevant to
    programming languages
  • Grammar to generate simple assignment statements
    in a C-like language (diff. from one in text)
  • <assig stmt> → <var> = <arith expr>
  • <arith expr> → <term> | <arith expr> + <term> |
    <arith expr> - <term>
  • <term> → <primary> | <term> * <primary> |
    <term> / <primary>
  • <primary> → <var> | <num> | (<arith expr>)
  • <var> → <id> | <id>[<subscript list>]
  • <subscript list> → <arith expr> | <subscript
    list>, <arith expr>

64
Language Generators
  • Parse tree for X = (A[2] + Y) * 20

[Parse tree: <assig stmt> → <var> = <arith expr>; the <var> is <id> X;
the <arith expr> derives <term> * <primary> with <primary> → <num> 20;
the <term> derives ( <arith expr> ), inside which <arith expr> + <term>
combines the subscripted variable A[2] and the variable Y]
65
Language Generators
  • Wow, that seems like a very complicated parse
    tree to generate such a short statement
  • Extra non-terminals are often necessary to
    remove ambiguity
  • Extra non-terminals are often necessary to
    create precedence
  • Precedence in previous grammar has * and /
    higher than + and -
  • They would be lower in the parse tree
  • (LOWER above is correct)
  • What about associativity?
  • Left recursive productions: left associativity
  • Right recursive productions: right associativity

66
Language Generators
  • But Context-free grammars cannot generate
    everything
  • Ex Strings of the form WW in {0,1}*
  • Cannot guarantee that arbitrary string is the
    same on both sides
  • Compare to WWR
  • These we can generate from the middle and build
    out in each direction
  • For WW we would need separate productions for
    each side, and we cannot coordinate the two with
    a context-free grammar
  • Need Context-Sensitive in this case

67
Language Generators
  • Let's look at one more grammar example
  • Grammar to generate all postfix expressions
    involving binary operators + and -. Assume <id>
    is predefined and corresponds to any variable
    name
  • Ex v w + x y - + z -
  • How do we approach this problem?
  • Terminals easy
  • Nonterminals/Start require some thought
  • Productions require a lot of thought

68
Language Generators
  • T = {<id>, +, -}
  • N = {A}
  • S = A
  • P:
  • A → AA+ | AA- | <id>
  • Show parse tree for previous example
  • Is this grammar LL(1)?
  • We will discuss what this means soon
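A useful property of this grammar: a token string is derivable exactly when, scanning left to right, the running count of available operands never drops below what each operator needs and ends at exactly one. That gives a simple recognizer, sketched here (the id set v, w, x, y, z is assumed just for illustration):

```python
# A string over ids and {+, -} is a valid postfix expression per the
# grammar A -> AA+ | AA- | <id> iff, scanning left to right, each
# operator finds two operands on the stack and exactly one value
# remains at the end. We only need the stack's depth, not its contents.
def is_postfix(tokens, ids={"v", "w", "x", "y", "z"}):
    depth = 0
    for t in tokens:
        if t in ids:
            depth += 1                # an operand pushes one value
        elif t in ("+", "-"):
            if depth < 2:
                return False          # operator needs two operands
            depth -= 1                # pops two, pushes the result
        else:
            return False              # unknown token
    return depth == 1

print(is_postfix("v w + x y - + z -".split()))   # prints True
print(is_postfix("v w + +".split()))             # prints False
```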

69
Parsers
  • Ok, we can generate languages, but how to
    recognize them?
  • We need to convert our generators into
    recognizers, or parsers
  • We know that a Context-free grammar corresponds
    to a Push-Down Automaton (PDA)
  • However, the PDA may be non-deterministic
  • As we saw in examples, to create a parse tree we
    sometimes have to guess at a substitution

70
Parsers
  • May have to guess a few times before we get the
    correct answer
  • This does not lend itself to programming language
    parsing
  • We'd like the parser to never have to guess
  • To eliminate guessing, we must restrict the PDAs
    to deterministic PDAs, which restricts the
    grammars that we can use
  • Must be unambiguous
  • Some other, less obvious restrictions, depending
    upon parsing technique used

71
Parsers
  • There are two general categories of parsers
  • Bottom-up parsers
  • Can parse any language generated by a
    Deterministic PDA
  • Build the parse trees from the leaves up back to
    the root as the tokens are processed
  • At each step, a substring that matches the
    right-hand side of a production is substituted
    with the left side of the production
  • Reduces input string all the way back to the
    start symbol for the grammar
  • Also called shift-reduce parsing

72
Parsers
  • Correspond to LR(k) grammars
  • Left to right processing of string
  • Rightmost derivation of parse tree (in reverse)
  • k symbols lookahead required
  • LR parsers are difficult to write by hand, but
    can be produced systematically by programs such
    as YACC (Yet Another Compiler Compiler).
  • Primary variations of LR grammars/parsers
  • SLR (Simple LR)
  • LALR (Look Ahead LR)
  • LR is the most general but also the most
    complicated to implement
  • We'll leave details to CS 1622

73
Parsers
  • Top-down parsers
  • Build the parse trees from the root down as the
    tokens are processed
  • Also called predictive parsers, or LL parsers
  • Left-to-right processing of string
  • Leftmost derivation of parse tree
  • The LL(1) that we saw before means we can parse
    with only one token lookahead
  • More restrictive than LR parsers; there are
    languages recognized by Deterministic PDAs that
    have no LL grammars (i.e. cannot be parsed by an
    LL parser)
  • Some restrictions on productions allowed
  • Cannot handle left-recursion; we'll see why
    shortly

74
Parsers
  • Implementing a top-down parser
  • One technique is Recursive Descent
  • Can think of each production as a function
  • As string of tokens is parsed, terminal symbols
    are consumed/processed and non-terminal symbols
    generate function calls
  • Now we can see why left-recursive productions
    cannot be handled
  • From Example 3.4
  • <expr> → <expr> + <term>
  • Recursion will continue indefinitely without
    consuming any symbols

75
Parsers
  • Luckily, in most cases a grammar with left
    recursion can be converted into one with only
    right-recursion
  • Recursive Descent parsers can be written by hand,
    or generated
  • Think of a program that processes the grammar by
    creating a function shell for each non-terminal
  • Then details of function are filled in based upon
    the various right-hand sides the non-terminal
    generates
  • See example
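As an illustration of the idea (a sketch, not the course's actual example): a hand-written recursive-descent parser-evaluator for a fragment of the earlier expression grammar, with the left recursion rewritten as iteration so each nonterminal becomes a function that consumes tokens:

```python
# Recursive-descent sketch for a fragment of the expression grammar,
# left recursion rewritten as iteration:
#   <expr>    -> <term>    { (+|-) <term> }
#   <term>    -> <primary> { (*|/) <primary> }
#   <primary> -> <num> | ( <expr> )
# Each nonterminal is a function; terminals are consumed from `tokens`.
def parse(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {peek()!r}")
        pos += 1

    def expr():
        v = term()
        while peek() in ("+", "-"):          # iteration = left assoc.
            op = peek(); eat(op)
            v = v + term() if op == "+" else v - term()
        return v

    def term():
        v = primary()
        while peek() in ("*", "/"):
            op = peek(); eat(op)
            v = v * primary() if op == "*" else v / primary()
        return v

    def primary():
        nonlocal pos
        t = peek()
        if t == "(":
            eat("("); v = expr(); eat(")")
            return v
        if t is not None and t.isdigit():
            pos += 1
            return int(t)
        raise SyntaxError(f"unexpected token {t!r}")

    v = expr()
    if peek() is not None:
        raise SyntaxError("trailing input")
    return v

print(parse("( 2 + 3 ) * 4".split()))   # prints 20
```

Note how precedence falls out of the call structure: `term` sits below `expr`, so `*` and `/` bind tighter, and the while-loops give left associativity without any left-recursive call.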

76
LL(1) Grammars
  • So how can we tell if a grammar is LL(1)?
  • Given the current non-terminal (or left side of a
    production) and the next terminal we must be able
    to uniquely determine the right side of the
    production to follow
  • Remember that a non-terminal can have multiple
    productions
  • As we previously mentioned, the grammar must not
    be left recursive
  • However, not having left recursion is necessary
    but not sufficient for an LL(1) grammar

77
LL(1) Grammars
  • Ex
  • A → aX | aY
  • We cannot determine which right side to follow
    without more information than just "a"
  • How can we process a grammar to determine if this
    situation occurs?
  • Calculate the First set for each RHS of
    productions
  • First set of a sequence of symbols, S, is the set
    of terminals that begin the strings derived from
    S
  • Given multiple RHS for nonterminal N
  • N → α1 | α2 | ...
  • If First(α1) and First(α2) intersect, the
    grammar is not LL(1)

78
LL(1) Grammars
  • So how do we calculate First() sets?
  • Algorithm is given in Aho (see Slide 2)
  • Consider symbol X
  • If X is a terminal, First(X) = {X}
  • If X ? ε is a production, add ε to First(X)
  • If X is a nonterminal and X ? Y1Y2…Yk is a
    production
  • Add a to First(X) if, for some i, a is in
    First(Yi) and ε is in all of First(Y1) …
    First(Yi-1)
  • Add ε to First(X) if ε is in First(Yj) for all
    j = 1, 2, …, k
  • To calculate First(X1X2…Xn) for some sequence
    X1X2…Xn
  • Add the non-ε symbols of First(X1)
  • If ε is in First(X1), add the non-ε symbols of
    First(X2), and so on
  • If ε is in all First(Xi), add ε
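A minimal sketch of this algorithm in Python (the grammar encoding as a dict of RHS tuples is my own assumption; the empty string stands for ε):

```python
# Fixed-point computation of First sets, following the rules on this slide.
# A grammar maps each nonterminal to a list of right-hand sides (tuples of
# symbols); a symbol not appearing as a key is treated as a terminal.

EPS = ""  # epsilon

def first_sets(grammar):
    first = {nt: set() for nt in grammar}

    def first_of_seq(seq):
        # First(X1 X2 ... Xn) per the slide's sequence rule
        result = set()
        for sym in seq:
            f = first[sym] if sym in grammar else {sym}  # terminal a: First(a) = {a}
            result |= f - {EPS}
            if EPS not in f:
                return result       # sym cannot vanish; stop here
        result.add(EPS)             # every symbol can derive epsilon
        return result

    changed = True
    while changed:                  # iterate until no set grows
        changed = False
        for nt, rhss in grammar.items():
            for rhs in rhss:
                new = first_of_seq(rhs) if rhs else {EPS}
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

# First exercise grammar below: A -> aB | b | cBB, B -> aB | bA | aBb
g = {"A": [("a", "B"), ("b",), ("c", "B", "B")],
     "B": [("a", "B"), ("b", "A"), ("a", "B", "b")]}
f = first_sets(g)
print(sorted(f["A"]))  # ['a', 'b', 'c']
```

For this grammar, the three right-hand sides of A have pairwise-disjoint First sets ({a}, {b}, {c}), but B's alternatives aB and aBb both have First set {a}, so the grammar is not LL(1).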

79
LL(1) Grammars
  • A ? aB | b | cBB
  • B ? aB | bA | aBb
  • A ? aB | CD | E | ε
  • B ? b
  • C ? cA | ε
  • D ? dA
  • E ? dB

80
Semantics
  • Semantics indicates the meaning of a program
  • What do the symbols just parsed actually say to
    do?
  • Two different kinds of semantics
  • Static Semantics
  • Almost an extension of program syntax
  • Deals with structure more than meaning, but at a
    meta level
  • Handles structural details that are difficult or
    impossible to handle with the parser
  • Ex Has variable X been declared prior to its
    use?
  • Ex Do variable types match?

81
Semantics
  • Dynamic Semantics (often just called semantics)
  • What does the syntax mean?
  • Ex Control statements
  • Ex Parameter passing
  • Programmer needs to know meaning of statements
    before he/she can use language effectively

82
Semantics
  • Static Semantics
  • One technique for determining/checking static
    semantics is Attribute Grammars
  • Start with a context-free grammar, and add to it
  • Attributes (for the grammar symbols)
  • Indicate some properties of the symbols
  • Attribute computation functions (semantic
    functions)
  • Allow attributes to be determined
  • Predicate functions
  • Indicate the static semantic rules

83
Semantics
  • Attributes made up of synthesized attributes
    and inherited attributes
  • Synthesized Attributes
  • Formed using attributes of grammar symbols lower
    in the parse tree
  • Ex Result type of an expression is synthesized
    from the types of the subexpressions
  • Inherited Attributes
  • Formed using attributes of grammar symbols higher
    in the parse tree
  • Ex Type of RHS of an assignment is expected to
    match that of the LHS; the type is inherited
    from the type of the LHS variable

84
Semantics
  • Semantic Functions
  • Indicate how attributes are derived, based on the
    static semantics of the language
  • Ex A = B + C
  • Assume A, B and C can be integers or floats
  • If B and C are both integers, RHS result type is
    integer, otherwise it is float
  • Predicate functions
  • Test attributes of symbols processed to see if
    they match those defined by language
  • Ex A = B + C
  • If RHS type attribute is not equal to LHS type
    attribute, error (in some languages)

85
Semantics
  • Detailed Example in text
  • Grammar Rules
  • 1) ltassigngt ? ltvargt = ltexprgt
  • 2) ltexprgt ? ltvargt + ltvargt
  • 3) ltexprgt ? ltvargt
  • 4) ltvargt ? A | B | C
  • Attributes
  • actual_type: the actual type of ltvargt or
    ltexprgt in question (synthesized, but for a
    ltvargt we say this is an intrinsic attribute)
  • expected_type: associated with ltexprgt,
    indicating the type that it SHOULD be; inherited
    from the actual_type of ltvargt

86
Semantics
  • Semantic functions
  • Parallel to syntax rules of the grammar
  • See Ex. 3.6 in text
  • 1) ltassigngt ? ltvargt = ltexprgt
  • ltexprgt.expected_type ? ltvargt.actual_type
  • 2) ltexprgt ? ltvargt2 + ltvargt3
  • ltexprgt.actual_type ? if (ltvargt2.actual_type =
    int) and
  • (ltvargt3.actual_type = int) then int
  • else real
  • end if
  • 3) ltexprgt ? ltvargt
  • ltexprgt.actual_type ? ltvargt.actual_type
  • 4) ltvargt ? A | B | C
  • ltvargt.actual_type ? look-up(ltvargt.string)
  • Predicate functions
  • Only one needed here: do the types match?
  • ltexprgt.actual_type == ltexprgt.expected_type
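A tiny sketch of how a checker might evaluate these attributes for assignments of the form X = Y + Z or X = Y; the symbol table contents and function names here are hypothetical:

```python
# Hypothetical evaluator for the attribute grammar on this slide.

symbol_table = {"A": "real", "B": "int", "C": "int"}  # assumed declarations

def look_up(name):
    # <var>.actual_type is intrinsic: it comes from the symbol table
    return symbol_table[name]

def check_assign(lhs, rhs_vars):
    # Rule 1: <expr>.expected_type <- <var>.actual_type (inherited from LHS)
    expected_type = look_up(lhs)
    # Rules 2/3: synthesize <expr>.actual_type from the operand types
    if len(rhs_vars) == 2:
        t2, t3 = look_up(rhs_vars[0]), look_up(rhs_vars[1])
        actual_type = "int" if (t2 == "int" and t3 == "int") else "real"
    else:
        actual_type = look_up(rhs_vars[0])
    # Predicate: <expr>.actual_type == <expr>.expected_type
    return actual_type == expected_type

print(check_assign("A", ["B", "C"]))  # B + C is int, A is real -> False
```

With these declarations the assignment A = B + C fails the predicate, illustrating how the static-semantic check rejects a type mismatch the context-free parser cannot see.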

87
Semantics
  • Ex A = B + C

  [Parse tree for the assignment: ltassigngt at the
   root, with children ltvargt (A) and ltexprgt;
   ltexprgt has children ltvargt2 (B) and ltvargt3
   (C). Each ltvargt carries an actual_type
   attribute; ltexprgt carries actual_type and an
   inherited expected_type]
88
Semantics
  • Attribute grammars are useful, but they are not
    typically used in their pure form for full-scale
    languages, since that would make the grammars
    more complicated and compilers more difficult to
    generate

89
Semantics
  • Dynamic Semantics (semantics)
  • Clearly vital to the understanding of the
    language
  • In early languages, semantics were specified
    only informally, as in manual pages
  • Efforts have been made in later years to
    formalize semantics, just as syntax has been
    formalized
  • But semantics tend to be more complex and less
    precisely defined
  • More difficult to formalize

90
Semantics
  • Some techniques have gained support however
  • Operational Semantics
  • Define meaning by result of execution on a
    primitive machine, examining the state of the
    machine before and after the execution
  • Axiomatic Semantics
  • Preconditions and postconditions define meaning
    of statements
  • Used in conjunction with proofs of program
    correctness
  • Denotational Semantics
  • Map syntactic constructs into mathematical
    objects that model their meaning
  • Quite rigorous and complex

91
Identifiers, Reserved Words and Keywords
  • Identifier
  • String of characters used to name an entity
    within a program
  • Most languages have similar rules for ids, but
    not always
  • C and Java are case-sensitive, while Ada is not
  • Can be a good thing: mixing case allows for
    longer, more readable names, a la Java's
    NoninvertibleTransformException
  • Can be a bad thing: should that first i be upper
    or lower case?

92
Identifiers, Reserved Words and Keywords
  • C, Ada and Java allow underscores, while
    standard Pascal does not
  • FORTRAN originally allowed only 6 chars
  • Reserved Word
  • Name whose definition is part of the syntax of
    the language
  • Cannot be used by programmer in any other way
  • Most newer languages have reserved words
  • Make parsing easier, since each reserved word
    will be a different token

93
Identifiers, Reserved Words and Keywords
  • Ex end if in Ada
  • Interesting extension topic
  • If we extend a language and add new reserved
    words, we may make some old programs
    syntactically incorrect
  • Ex C subprogram using class as an id will not
    compile with a C++ compiler
  • Ex Ada 83 program using abstract as an id will
    not compile with an Ada 95 compiler
  • Keywords
  • To some, keyword = reserved word
  • Ex C, Java

94
Identifiers, Reserved Words and Keywords
  • To others, there is a difference
  • Keywords are only special in certain contexts
  • Can be redefined in other contexts
  • Ex FORTRAN keywords may be redefined
  • Predefined Identifiers
  • Identifiers defined by the language implementers,
    which may be redefined
  • cin, cout in C++
  • real, integer in Pascal
  • predefined classes in Java

95
Identifiers, Reserved Words and Keywords
  • Programmer may wish to redefine for a specific
    application
  • Ex Change a Java interface to include an extra
    method
  • Problem: predefined version no longer applies,
    so program segments that depend on it are invalid
  • Better to extend a class or compose a new class
    than to redefine a predefined class
  • Ex Comparable interface can be implemented as we
    see fit by a new class

96
Variables
  • Simple (naïve) definition: a name for a memory
    location
  • In fact, it is really much more
  • Six attributes
  • Name
  • Address
  • Value
  • Type
  • Lifetime
  • Scope

97
Variables
  • Name
  • Identifier
  • In most languages the same name may be used for
    different variables, as long as there is no
    ambiguity
  • Some exceptions
Write a Comment
User Comments (0)
About PowerShow.com