Language Translation Issues - PowerPoint PPT Presentation

About This Presentation
Title:

Language Translation Issues


Slides: 45
Provided by: UFO

Transcript and Presenter's Notes

Title: Language Translation Issues


1
Language Translation Issues
2
Text References
  • Sections 3.1 to 3.3

3
Content
  • Programming language syntax
  • Stages in translation
  • Formal translation models

4
Syntax
  • Syntax, which is defined as the arrangement of
    words as elements in a sentence to show their
    relationship, describes the sequence of symbols
    that make up valid programs.
  • Syntax
  • provides significant information needed for
    understanding a program and
  • provides much-needed information toward the
    translation of a source program into an object
    program.

5
Syntax
  • Syntax alone is insufficient to specify
    unambiguously the meaning of a statement.

X = 2.45 + 3.67
Was X declared? Was X declared as type
real? Integer or real addition?
6
Semantics
  • We need more than just syntactic structures for
    the full description of a programming language.
  • Semantics is concerned with
  • the use of declarations, operations, sequence
    control, and referencing environments

7
General Syntactic Criteria
  • Readability
  • understandable without any separate documentation
  • Enhanced by such language features as
  • natural statement formats
  • structured statements
  • liberal use of keywords and noise words,
  • provision for embedded comments,
  • unrestricted length identifiers
  • mnemonic operator symbols
  • free-field formats, and
  • complete data declarations
  • Enhanced if syntactic differences reflect
    underlying semantic differences

8
General Syntactic Criteria
  • Writeability
  • Often in conflict with readability
  • Default rules reduce redundancy when information
    can be inferred from the context
  • E.g., FORTRAN's implicit typing
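As a minimal sketch of a default rule that trades redundancy for writeability, the function below applies FORTRAN's traditional implicit-typing convention: an undeclared variable whose name begins with I through N is INTEGER, and any other undeclared variable is REAL. The function name and the `declared` dictionary are hypothetical illustrations, not part of any real compiler.

```python
from typing import Dict, Optional

def implicit_type(name: str, declared: Optional[Dict[str, str]] = None) -> str:
    """Return the type a FORTRAN-style compiler would give `name` (sketch).

    An explicit declaration wins; otherwise the first letter decides:
    I through N means INTEGER, anything else means REAL.
    """
    declared = declared or {}
    if name.upper() in declared:
        return declared[name.upper()]
    return "INTEGER" if name[0].upper() in "IJKLMN" else "REAL"

print(implicit_type("COUNT"))                        # REAL
print(implicit_type("KOUNT"))                        # INTEGER
print(implicit_type("COUNT", {"COUNT": "INTEGER"}))  # INTEGER
```

The writer saves a declaration, but a reader must know the rule to tell that COUNT is REAL, which is the conflict with readability noted above.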

9
General Syntactic Criteria
  • Ease of verifiability
  • Ease of translation
  • Lack of ambiguity

if e1 then if e2 then s1 else s2
(ambiguous: the else may pair with either the inner or the outer if)
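The statement above has two parse trees, and they are not just different on paper: they execute differently. The sketch below uses a hypothetical tuple representation of statements, builds both readings, and runs them with e1 false. Many languages, including C, Java, and Pascal, resolve the ambiguity by attaching the else to the nearest if.

```python
def run(stmt, env, trace):
    """Execute a tiny statement tree, appending executed actions to `trace`."""
    kind = stmt[0]
    if kind == "action":                 # ("action", name)
        trace.append(stmt[1])
    elif kind == "if":                   # ("if", cond, then_stmt, else_stmt_or_None)
        _, cond, then_stmt, else_stmt = stmt
        if env[cond]:
            run(then_stmt, env, trace)
        elif else_stmt is not None:
            run(else_stmt, env, trace)
    return trace

s1, s2 = ("action", "s1"), ("action", "s2")

# Reading 1: else belongs to the inner if: if e1 then (if e2 then s1 else s2)
inner = ("if", "e1", ("if", "e2", s1, s2), None)
# Reading 2: else belongs to the outer if: if e1 then (if e2 then s1) else s2
outer = ("if", "e1", ("if", "e2", s1, None), s2)

env = {"e1": False, "e2": True}
print(run(inner, env, []))   # []
print(run(outer, env, []))   # ['s2']
```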
10
Syntactic Elements of a Language
  • Character set
  • Identifiers
  • Operator symbols
  • Keywords and reserved words
  • Noise words
  • Comments
  • Blanks (spaces)
  • Delimiters and brackets
  • Free- and fixed-field formats
  • Expressions
  • Statements

11
Overall Program-Subprogram Structure
  • Separate subprogram definitions
  • Each is defined as a separate syntactic unit.
  • Compiled separately and linked at load time
  • Separate data definitions
  • The class mechanism in OO languages
  • Nested subprogram definitions
  • Pascal
  • Separate interface definitions
  • Data descriptions separated from executable
    statements
  • Unseparated subprogram definitions

12
Stages in Translation
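Figure: stages in translation
  • Source program recognition phases
  • Lexical analysis: source program → lexical tokens
  • Syntactic analysis: lexical tokens → parse tree
  • Semantic analysis: parse tree → intermediate code
  • Object code generation phases
  • Optimization: intermediate code → optimized
    intermediate code
  • Code generation: optimized intermediate code →
    object code
  • Linking: object code + object code from other
    compilations → executable code
  • The symbol table and other tables are shared by
    these phases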
13
Lexical Analysis
  • Group the source program, a long undifferentiated
    sequence of symbols, into its elementary
    constituents
  • identifiers, delimiters, operator symbols,
    numbers, keywords, noise words, blanks, comments,
    etc.
  • The basic model used to design lexical analyzers
    is the finite-state automaton.

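DO 10 I = 1,5    (a DO loop header)
DO 10 I = 1.5    (an assignment, because FORTRAN ignores blanks)
DO10I = 1.5      (the same assignment as the lexical analyzer sees it)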
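A hand-written scanner is a direct encoding of such a finite-state automaton: each branch below corresponds to a state that keeps consuming characters while they fit the current token class. This is a minimal sketch for a hypothetical toy language; the token names, keyword set, and operator set are assumptions, and unlike FORTRAN it treats blanks as separators.

```python
KEYWORDS = {"if", "then", "else"}   # hypothetical keyword set
OPERATORS = set("+-*/=()")          # hypothetical operator and delimiter set

def tokenize(src: str) -> list:
    """Split src into (kind, text) tokens."""
    tokens, i = [], 0
    while i < len(src):
        ch = src[i]
        if ch.isspace():                       # blanks separate tokens
            i += 1
        elif ch.isalpha():                     # identifier-or-keyword state
            j = i
            while j < len(src) and src[j].isalnum():
                j += 1
            word = src[i:j]
            tokens.append(("KEYWORD" if word in KEYWORDS else "IDENT", word))
            i = j
        elif ch.isdigit():                     # integer-literal state
            j = i
            while j < len(src) and src[j].isdigit():
                j += 1
            tokens.append(("NUMBER", src[i:j]))
            i = j
        elif ch in OPERATORS:
            tokens.append(("OP", ch))
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    return tokens

print(tokenize("if x1 then y = 42"))
# [('KEYWORD', 'if'), ('IDENT', 'x1'), ('KEYWORD', 'then'),
#  ('IDENT', 'y'), ('OP', '='), ('NUMBER', '42')]
```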
14
Syntactic Analysis (Parsing)
  • Larger program structures are identified
  • statements,
  • declarations,
  • expressions

15
Semantic Analysis
  • The bridge between analysis and synthesis
  • Common functions of semantic analyzers
  • Symbol-table maintenance
  • Insertion of implicit information
  • Error detection
  • Macro processing and compile-time operations
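Two of these functions, symbol-table maintenance and error detection, can be sketched together. The class below is a hypothetical, minimal symbol table with nested scopes; it records declarations and reports the use of an undeclared identifier, which answers the "Was X declared?" kind of question that syntax alone cannot.

```python
class SymbolTable:
    """Minimal scoped symbol table (hypothetical; real compilers store far more)."""

    def __init__(self):
        self.scopes = [{}]                # innermost scope is last

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, typ):
        scope = self.scopes[-1]
        if name in scope:
            raise NameError(f"{name} already declared in this scope")
        scope[name] = typ

    def lookup(self, name):
        # Search from the innermost scope outward.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise NameError(f"{name} used but not declared")

table = SymbolTable()
table.declare("X", "real")
table.enter_scope()
table.declare("I", "integer")
print(table.lookup("X"))   # real (found in the enclosing scope)
table.exit_scope()
# table.lookup("I") would now raise NameError: I is out of scope
```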

16
Synthesis of the Object Program
  • Optimization
  • Code generation
  • Linking and loading

17
Formal Translation Models
  • The formal definition of the syntax of a
    programming language is called a grammar.
  • The two classes of grammars useful in compiler
    technology include
  • the BNF grammar (or context-free grammar) and
  • the regular grammar.

18
BNF Grammars
  • A sentence may be a simple declarative sentence
    or a simple interrogative sentence.

<sentence> ::= <declarative> | <interrogative>
<declarative> ::= <subject> <verb> <object>.
<subject> ::= <article> <noun>
<interrogative> ::= <auxiliary verb> <subject> <predicate>?
Backus-Naur form
19
Syntax BNF Grammars
  • A BNF grammar is composed of a finite set of BNF
    grammar rules, which define a language, in our
    case, a programming language.
  • Because syntax is concerned only with form rather
    than meaning, a language, considered
    syntactically, consists of a set of syntactically
    correct programs.

The home / ran / the girl.
Syntactically correct, but does not make
sense under nominal interpretations of these
words.
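Because a BNF grammar constrains only form, enumerating the sentences it generates yields nonsense along with sensible sentences. The sketch below enumerates every declarative sentence for <declarative> ::= <subject> <verb> <object>, assuming that <object> has the same form as <subject> and using a tiny hypothetical lexicon taken from the example above.

```python
from itertools import product

# Hypothetical lexicon built from the example sentence above.
ARTICLES = ["the"]
NOUNS = ["girl", "home"]
VERBS = ["ran"]

def subjects():
    # <subject> ::= <article> <noun>
    return [f"{a} {n}" for a, n in product(ARTICLES, NOUNS)]

def declaratives():
    # <declarative> ::= <subject> <verb> <object> .
    # Assumption: <object> has the same form as <subject>.
    return [f"{s} {v} {o}." for s, v, o in product(subjects(), VERBS, subjects())]

for sentence in declaratives():
    print(sentence)
# the girl ran the girl.
# the girl ran the home.
# the home ran the girl.
# the home ran the home.
```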
20
Syntax BNF Grammars
  • Natural language is inadequate for describing the
    syntax rules of a programming language precisely.
  • A formal, mathematical set of rules avoids the
    problems of using natural language.

<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
<conditional statement> ::=
      if <Boolean expression> then <statement> else <statement>
    | if <Boolean expression> then <statement>
<unsigned integer> ::= <digit> | <unsigned integer> <digit>
21
Parse Trees
  • Given a grammar, we can repeatedly apply single
    replacement rules to generate strings in our language.
  • For example, all balanced parentheses can be
    generated by the grammar

S ::= SS | (S) | ()
S ⇒ (S) ⇒ (SS) ⇒ (()S) ⇒ (()())
(()()) is derived from S; each intermediate string is a
sentential form.
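Membership in this language can be decided directly from the three rules: a string is derivable from S if it is (), if it is ( followed by something derivable from S followed by ), or if it splits into two nonempty pieces that are each derivable from S. The memoized recursion below is a sketch of that idea for illustration only; it is far slower than the parsers used in practice.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def derives_from_S(s: str) -> bool:
    """True if the grammar S ::= SS | (S) | () can derive the string s."""
    if s == "()":                                            # S ::= ()
        return True
    if len(s) >= 4 and s[0] == "(" and s[-1] == ")" and derives_from_S(s[1:-1]):
        return True                                          # S ::= (S)
    # S ::= SS: try every split into two pieces of length >= 2
    return any(derives_from_S(s[:k]) and derives_from_S(s[k:])
               for k in range(2, len(s) - 1))

print(derives_from_S("(()())"))   # True
print(derives_from_S("(()"))      # False
```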
22
Parse Trees
  • To determine if a given string represents a
    syntactically valid program in the language
    defined by a BNF grammar, we must use the
    grammar rules to construct a syntactic analysis
    or parse of the string.
  • If the string can be successfully parsed, then it
    is in the language.
  • If no way can be found of parsing the string with
    the given grammar rules, then the string is not
    in the language.

Grammar for simple assignment statements
Parse tree for an assignment statement
23
<assignment statement> ::= <variable> = <arithmetic expression>
<arithmetic expression> ::= <term>
    | <arithmetic expression> + <term>
    | <arithmetic expression> - <term>
<term> ::= <primary> | <term> * <primary> | <term> / <primary>
<primary> ::= <variable> | <number> | (<arithmetic expression>)
<variable> ::= <identifier> | <identifier> [<subscript list>]
<subscript list> ::= <arithmetic expression>
    | <subscript list>, <arithmetic expression>
Grammar for simple assignment statements
24
Parse tree for an assignment statement (figure; its leaves
include the identifiers W, Y, U, and V)
25
Ambiguity
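G1: S ::= SS | 0 | 1
G2: T ::= 0T | 1T | 0 | 1
Both grammars generate every nonempty string of 0s and 1s,
but G1 is ambiguous (000 has two parse trees) and G2 is not.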
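One way to see the difference is to count parse trees. The sketch below counts, for a given string, how many distinct parse trees each grammar assigns to it; G1 gives more than one for any string of three or more symbols, while G2 always gives exactly one. The function names are illustrative.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def trees_g1(s: str) -> int:
    """Number of parse trees for s under G1: S ::= SS | 0 | 1."""
    count = 1 if s in ("0", "1") else 0          # S ::= 0 | 1
    for k in range(1, len(s)):                   # S ::= SS, split after k symbols
        count += trees_g1(s[:k]) * trees_g1(s[k:])
    return count

def trees_g2(s: str) -> int:
    """Number of parse trees for s under G2: T ::= 0T | 1T | 0 | 1."""
    if not s or s[0] not in "01":
        return 0
    return 1 if len(s) == 1 else trees_g2(s[1:])  # the first symbol is always peeled off

for w in ["0", "01", "010", "0101"]:
    print(w, trees_g1(w), trees_g2(w))
# 0 1 1
# 01 1 1
# 010 2 1
# 0101 5 1
```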
26
Extension to BNF
27
Syntax Charts
28
Finite-State Automata
  • The lexical analysis phase of the compiler breaks
    down the source program into a stream of tokens,
    such as identifiers, integers, IF, etc.
  • A simple model, called a finite-state automaton
    (FSA), recognizes such tokens.

Figure: FSA to recognize an odd number of 1s, traced on
the input 100101
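A deterministic FSA can be simulated with a transition table. In the sketch below the two states are named `even` and `odd` (hypothetical names standing in for the states in the figure); reading a 1 switches state, reading a 0 leaves it unchanged, and the string is accepted if the machine ends in `odd`.

```python
TRANSITIONS = {
    ("even", "0"): "even", ("even", "1"): "odd",
    ("odd", "0"): "odd",   ("odd", "1"): "even",
}
START, ACCEPTING = "even", {"odd"}

def accepts(word: str) -> bool:
    state = START
    for symbol in word:
        key = (state, symbol)
        if key not in TRANSITIONS:        # symbol outside the alphabet {0, 1}
            return False
        state = TRANSITIONS[key]
    return state in ACCEPTING

print(accepts("100101"))   # True: the input contains three 1s
```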
29
Finite-State Automata
Figure: FSA to recognize optionally signed integers (arcs
labeled with a sign or a digit)
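The same approach handles optionally signed integers. This sketch assumes three states: `start`, `sign` (after a leading + or -), and `digits` (the only accepting state); the state names are hypothetical.

```python
DIGITS = set("0123456789")

def next_state(state: str, ch: str):
    """Return the next state, or None if the FSA has no transition."""
    if ch in DIGITS and state in ("start", "sign", "digits"):
        return "digits"
    if ch in ("+", "-") and state == "start":
        return "sign"
    return None

def is_signed_integer(text: str) -> bool:
    state = "start"
    for ch in text:
        state = next_state(state, ch)
        if state is None:                 # no arc: reject immediately
            return False
    return state == "digits"

print(is_signed_integer("-42"))   # True
print(is_signed_integer("4-2"))   # False
```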
30
Finite-State Automata
  • Deterministic FSA
  • For each state, and for each input symbol, we
    have a unique transition to the same or different
    state.
  • Nondeterministic FSA
  • A state may have multiple arcs with the same
    label, so there is a choice of which way to go
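A nondeterministic FSA can still be simulated deterministically by tracking the set of all states it could be in. The hypothetical NFA below accepts binary strings that end in 01; in state q0 the input 0 has two arcs (stay in q0, or guess that the final 01 has started), which is exactly the kind of choice described above.

```python
# Hypothetical NFA: accepts binary strings ending in "01".
NFA = {
    ("q0", "0"): {"q0", "q1"},   # nondeterministic choice on 0
    ("q0", "1"): {"q0"},
    ("q1", "1"): {"q2"},
}
START, FINAL = "q0", {"q2"}

def nfa_accepts(word: str) -> bool:
    current = {START}
    for symbol in word:
        # Follow every arc from every state the machine might be in.
        current = set().union(*(NFA.get((s, symbol), set()) for s in current))
    return bool(current & FINAL)

print(nfa_accepts("1101"))   # True
print(nfa_accepts("0110"))   # False
```

Tracking sets of states is the idea behind the subset construction, which converts any nondeterministic FSA into an equivalent deterministic one.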

31
Regular Grammars
  • A regular grammar is a special case of a BNF
    grammar; the languages it defines are exactly
    those recognized by FSAs.
  • Form

<nonterminal> ::= <terminal><nonterminal> | <terminal>
A ::= 0A | 1A | 0
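The equivalence with FSAs is easy to see in one direction: each rule A ::= tB becomes an arc labeled t from state A to state B, and each rule A ::= t becomes an arc labeled t from A to an extra final state. The sketch below performs that conversion for A ::= 0A | 1A | 0 (nonempty binary strings ending in 0) and runs the resulting, possibly nondeterministic, automaton. The rule encoding and the `FINAL` state name are assumptions.

```python
FINAL = "<final>"   # hypothetical extra accepting state

def grammar_to_fsa(rules):
    """rules: list of (lhs, terminal, rhs_nonterminal_or_None)."""
    arcs = {}
    for lhs, terminal, rhs in rules:
        target = rhs if rhs is not None else FINAL   # A ::= t goes to FINAL
        arcs.setdefault((lhs, terminal), set()).add(target)
    return arcs

def run_fsa(arcs, start, word):
    states = {start}
    for symbol in word:
        states = {t for s in states for t in arcs.get((s, symbol), set())}
    return FINAL in states

# A ::= 0A | 1A | 0
rules = [("A", "0", "A"), ("A", "1", "A"), ("A", "0", None)]
arcs = grammar_to_fsa(rules)
print(run_fsa(arcs, "A", "1100"))   # True: ends in 0
print(run_fsa(arcs, "A", "1101"))   # False
```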
32
Regular Expressions
  • A third form of language definition that is
    equivalent to the FSA and regular grammar
  • Defined recursively as
  • Individual terminal symbols are regular
    expressions.
  • If a and b are regular expressions, then so are
    a | b, ab, (a), and a*.
  • Nothing else is a regular expression.
  • We can use regular expressions to represent any
    language defined by a regular grammar or FSA,
    although converting an FSA to a regular
    expression is not always obvious.
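The recursive definition translates directly into a data type with one case per clause (parentheses need no case of their own, since nesting is the grouping). The sketch below adds two helper cases, `Empty` (matches nothing) and `Eps` (matches only the empty string), and decides matching with Brzozowski derivatives, a technique not covered in these slides; all class and function names are hypothetical.

```python
from dataclasses import dataclass

class Re:
    """Base class for regular expressions."""

@dataclass(frozen=True)
class Empty(Re):      # matches no string (helper)
    pass

@dataclass(frozen=True)
class Eps(Re):        # matches only the empty string (helper)
    pass

@dataclass(frozen=True)
class Sym(Re):        # a terminal symbol
    c: str

@dataclass(frozen=True)
class Alt(Re):        # a | b
    a: Re
    b: Re

@dataclass(frozen=True)
class Cat(Re):        # ab
    a: Re
    b: Re

@dataclass(frozen=True)
class Star(Re):       # a*
    a: Re

def nullable(r: Re) -> bool:
    """Does r match the empty string?"""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, Alt):
        return nullable(r.a) or nullable(r.b)
    if isinstance(r, Cat):
        return nullable(r.a) and nullable(r.b)
    return False                      # Empty, Sym

def alt(a: Re, b: Re) -> Re:          # Alt with simplification
    if isinstance(a, Empty):
        return b
    if isinstance(b, Empty):
        return a
    return a if a == b else Alt(a, b)

def cat(a: Re, b: Re) -> Re:          # Cat with simplification
    if isinstance(a, Empty) or isinstance(b, Empty):
        return Empty()
    if isinstance(a, Eps):
        return b
    if isinstance(b, Eps):
        return a
    return Cat(a, b)

def deriv(r: Re, c: str) -> Re:
    """Expression matching every w such that c followed by w matches r."""
    if isinstance(r, Sym):
        return Eps() if r.c == c else Empty()
    if isinstance(r, Alt):
        return alt(deriv(r.a, c), deriv(r.b, c))
    if isinstance(r, Cat):
        first = cat(deriv(r.a, c), r.b)
        return alt(first, deriv(r.b, c)) if nullable(r.a) else first
    if isinstance(r, Star):
        return cat(deriv(r.a, c), r)
    return Empty()                    # Empty, Eps

def matches(r: Re, s: str) -> bool:
    for c in s:
        r = deriv(r, c)
    return nullable(r)

bit = Alt(Sym("0"), Sym("1"))
contains_01 = Cat(Star(bit), Cat(Sym("0"), Cat(Sym("1"), Star(bit))))  # (0|1)*01(0|1)*
print(matches(contains_01, "1101"))   # True
print(matches(contains_01, "110"))    # False
```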

33
Regular Expressions
34
Regular Expressions
Figure: converting (0|1)*01(0|1)* to an FSA
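One deterministic FSA for this expression needs three states: `start` (no progress), `saw0` (the last symbol was a 0), and `found` (01 has appeared; accept everything from here). The state names are hypothetical. The sketch cross-checks the hand-built automaton against Python's `re.fullmatch` on every binary string up to length 8.

```python
import re
from itertools import product

DFA = {
    ("start", "0"): "saw0",  ("start", "1"): "start",
    ("saw0", "0"): "saw0",   ("saw0", "1"): "found",
    ("found", "0"): "found", ("found", "1"): "found",
}

def dfa_accepts(word: str) -> bool:
    state = "start"
    for symbol in word:
        state = DFA[(state, symbol)]
    return state == "found"

PATTERN = re.compile(r"(0|1)*01(0|1)*")

for n in range(9):
    for bits in product("01", repeat=n):
        word = "".join(bits)
        assert dfa_accepts(word) == bool(PATTERN.fullmatch(word)), word
print("DFA agrees with the regular expression on all strings up to length 8")
```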
35
Pushdown Automata
  • A pushdown automaton (PDA) is equivalent in power
    to the BNF grammar.
  • A PDA is an abstract machine model similar to the
    FSA.
  • It has a finite set of states.
  • In addition, it has a pushdown stack.

36
Pushdown Automata
  • Moves of the PDA
  • An input symbol is read and the top symbol on the
    stack is read (and removed).
  • Based upon both inputs, the machine enters a new
    state and writes zero or more symbols onto the
    pushdown stack.
  • A string is accepted if the stack is empty once
    the input has been read. (Alternatively,
    acceptance can be defined as the PDA ending in a
    final state. Both models can be shown to be
    equivalent.)

37
Pushdown Automata
  • PDAs are more powerful than FSAs, as the
    recognition of a^n b^n (n a's followed by n b's)
    shows.
  • This language cannot be recognized by an FSA but
    is easily recognized by a PDA.
  • Simply stack the initial a symbols, and for each
    b, pop an a off the stack.
  • If the end of input is reached at the same time
    that the stack becomes empty, the string is
    accepted.
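The stacking strategy just described can be sketched as follows. The sketch adds an explicit state recording whether b's have started, so that strings such as abab are rejected; the state names are hypothetical, and the sketch accepts the empty string, that is, it treats n ≥ 0.

```python
def accepts_anbn(word: str) -> bool:
    """PDA-style recognizer for a^n b^n (n >= 0): push each a, pop one a per b."""
    stack = []
    state = "reading_a"                  # switches to "reading_b" at the first b
    for symbol in word:
        if symbol == "a" and state == "reading_a":
            stack.append("a")            # push
        elif symbol == "b" and stack:
            state = "reading_b"
            stack.pop()                  # pop one a for this b
        else:
            return False                 # a after b, too many b's, or a foreign symbol
    return not stack                     # accept if the stack is empty at end of input

print(accepts_anbn("aabb"))   # True
print(accepts_anbn("abab"))   # False
```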

38
Efficient Parsing Algorithms
  • From Chomsky's work, each type of formal grammar
    is closely related to a type of automaton,
  • a simple abstract machine that is usually defined
    to be
  • capable of reading an input tape containing a
    sequence of characters and
  • producing an output tape containing another
    sequence of characters.

39
Efficient Parsing Algorithms
  • Problem
  • Because a BNF grammar may be ambiguous, the
    automaton must be nondeterministic
  • For programming language translation, a more
    restricted automaton that never has to guess,
    called a deterministic automaton, is needed.
  • For important classes of unambiguous BNF
    grammars, straightforward parsing techniques have
    been discovered.
  • Recursive descent parser
  • LR grammars (handled by left-to-right parsing
    algorithms) describe all languages recognized by
    deterministic pushdown automata.
  • LR(1), SLR, LALR

40
Recursive Descent Parsing
  • Basic idea, for example
  • <arithmetic expression> ::= <term> {[+ | -] <term>}*
  • This states that we first recognize a <term>, and
    then, as long as the next symbol is either + or -,
    we recognize another <term>.
  • Assumption
  • the variable nextchar always contains the first
    character of the respective nonterminal and
  • the function getchar reads in a character, then
  • we may directly rewrite the above extended BNF
    rule as the following recursive procedure
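The procedure itself is not in the transcript. The following is a sketch in the same spirit, not the original code: it keeps the `nextchar`/`getchar` conventions described above, but assumes single-letter identifiers as primaries, <term> ::= <primary> {[* | /] <primary>}*, and <primary> ::= letter | (<arithmetic expression>). It returns a fully parenthesized string so the grouping chosen by the parser is visible.

```python
class Parser:
    """Recursive descent parser for simple arithmetic expressions (sketch)."""

    def __init__(self, text: str):
        self.chars = iter(text.replace(" ", "") + "$")   # '$' marks end of input
        self.nextchar = ""
        self.getchar()

    def getchar(self):
        # Read the next character into nextchar.
        self.nextchar = next(self.chars)

    def arithmetic_expression(self) -> str:
        # <arithmetic expression> ::= <term> {[+ | -] <term>}*
        result = self.term()
        while self.nextchar in ("+", "-"):
            op = self.nextchar
            self.getchar()
            result = f"({result}{op}{self.term()})"
        return result

    def term(self) -> str:
        # <term> ::= <primary> {[* | /] <primary>}*   (assumed)
        result = self.primary()
        while self.nextchar in ("*", "/"):
            op = self.nextchar
            self.getchar()
            result = f"({result}{op}{self.primary()})"
        return result

    def primary(self) -> str:
        # <primary> ::= letter | ( <arithmetic expression> )   (assumed)
        if self.nextchar.isalpha():
            name = self.nextchar
            self.getchar()
            return name
        if self.nextchar == "(":
            self.getchar()
            inner = self.arithmetic_expression()
            if self.nextchar != ")":
                raise SyntaxError("expected )")
            self.getchar()
            return inner
        raise SyntaxError(f"unexpected {self.nextchar!r}")

def parse(text: str) -> str:
    p = Parser(text)
    tree = p.arithmetic_expression()
    if p.nextchar != "$":
        raise SyntaxError(f"unexpected {p.nextchar!r}")
    return tree

print(parse("a-b-c"))     # ((a-b)-c)
print(parse("a+b*c"))     # (a+(b*c))
print(parse("(a+b)*c"))   # ((a+b)*c)
```

The `while` loop is what makes the iterative rule left-associative: a-b-c groups as (a-b)-c without the left recursion that a recursive descent parser cannot handle directly.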

41
(No Transcript)
42
Semantic Modeling
  • The following slides, which were selected from
    Sebesta's teaching material, describe
    operational semantics.

43
Imperative or Operational Models
  • Operational Semantics
  • Describe the meaning of a program by executing
    its statements on a machine, either simulated or
    actual. The change in the state of the machine
    (memory, registers, etc.) defines the meaning of
    the statement
  • To use operational semantics for a high-level
    language, a virtual machine is needed
  • A hardware pure interpreter would be too
    expensive
  • A software pure interpreter also has problems
  • The detailed characteristics of the particular
    computer would make actions difficult to
    understand
  • Such a semantic definition would be
    machine-dependent

44
Imperative or Operational Models
  • A better alternative: a complete computer
    simulation
  • The process
  • Build a translator (translates source code to the
    machine code of an idealized computer)
  • Build a simulator for the idealized computer
  • Evaluation of operational semantics
  • Good if used informally
  • Extremely complex if used formally (e.g., VDL)
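The two-step process above can be illustrated in miniature. The sketch assumes a hypothetical idealized stack machine with PUSH, LOAD, STORE, and OP instructions: `translate` turns a list of assignments into machine code, and `simulate` runs that code, so the meaning of the program is the change it makes to memory.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def compile_expr(expr, code):
    """Append stack-machine code that leaves the value of expr on the stack."""
    if isinstance(expr, int):
        code.append(("PUSH", expr))
    elif isinstance(expr, str):
        code.append(("LOAD", expr))
    else:                                   # (op, left, right)
        op, left, right = expr
        compile_expr(left, code)
        compile_expr(right, code)
        code.append(("OP", op))

def translate(program):
    """Translate a list of (variable, expression) assignments."""
    code = []
    for var, expr in program:
        compile_expr(expr, code)
        code.append(("STORE", var))
    return code

def simulate(code, memory):
    """Run code on the idealized machine; return the final memory state."""
    memory, stack = dict(memory), []
    for instr, *arg in code:
        if instr == "PUSH":
            stack.append(arg[0])
        elif instr == "LOAD":
            stack.append(memory[arg[0]])
        elif instr == "STORE":
            memory[arg[0]] = stack.pop()
        elif instr == "OP":
            right, left = stack.pop(), stack.pop()
            stack.append(OPS[arg[0]](left, right))
    return memory

program = [("x", ("+", "y", 1)), ("y", ("*", "x", 3))]
print(simulate(translate(program), {"y": 5}))   # {'y': 18, 'x': 6}
```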