Title: Yacc
1 Yacc
BNF grammar (example.y) -> YACC -> example.tab.c -> C compiler / linker (together with other modules) -> Executable
2 Yacc: what is it?
Yacc is a tool for automatically generating a
parser from a grammar written in a yacc
specification (.y file). The grammars accepted
are LALR(1) grammars with disambiguating rules. A
grammar specifies a set of production rules,
which define a language. A production rule
specifies a sequence of symbols that may replace
a nonterminal; the sentences that can be derived
this way are the ones legal in the language.
3 Structure of Yacc
- Usually Lex and Yacc work together
- The parser calls yylex() to get the next token
- To call the parser, the function yyparse() is invoked
4 How the parser works
- The parser produced by Yacc consists of a finite state machine with a stack
- A move of the parser is done as follows
  - It calls yylex to obtain the next token when needed
  - Using the current state and the lookahead token, the parser decides on its next action (shift, reduce, accept or error) and carries it out
5 Skeleton of a yacc specification (.y file)
- declarations
- %%
- rules
- %%
- user code
- Rules: <production> { action }
- Grammar: type 2 (context-free) productions
- Action: C code that specifies what to do when a production is reduced
6 Skeleton of a yacc specification (.y file)
%{
< C global variables, prototypes, comments >
%}
DEFINITION SECTION
%%
PRODUCTION RULES SECTION
%%
< C auxiliary subroutines >
- %{ ... %}: this part will be embedded into the generated .c file
- Definition section: contains token declarations. Tokens are recognized in the lexer.
- Production rules section: defines how to understand the input language, and what actions to take for each sentence.
- C auxiliary subroutines: any user code. For example, a main function to call the parser function yyparse()
7 Structure of yacc file
- Definition section: declarations of tokens, type of values used on the parser stack
- Rules section: list of grammar rules with semantic routines
- User code
8 The declaration section
- Terminals and nonterminals
  - %token symbol
  - %type symbol
- Operator precedence and operator associativity
  - %nonassoc symbol
  - %left symbol
  - %right symbol
- Axiom
  - %start symbol
9 The declaration section: terminals
- They are returned by the yylex() function, which is called by yyparse()
- They become #define constants in the generated file
- They are numbered starting from 257, but a concrete number can be associated with a token
  - %token T_Key 345
- Terminals that consist of a single character can be used directly (they are implicit). The corresponding tokens have values < 257
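To make the two kinds of token codes concrete, here is a minimal hand-written stand-in for yylex(). It is only a sketch: the code 258 for NUMBER matches the generated header shown later in these slides, while the input buffer and the set_input() helper are assumptions made so the example is self-contained.

```c
#include <assert.h>
#include <ctype.h>

/* Named token code; yacc would normally #define this in the generated
   header. The value 258 is taken from the header shown in these slides. */
enum { NUMBER = 258 };

static const char *input = "";   /* hypothetical input buffer */
int yylval;                      /* semantic value of the last token */

/* Point the scanner at a new string (helper for this sketch only). */
void set_input(const char *s) { input = s; }

/* Hand-written stand-in for the lex-generated yylex():
   returns 0 at end of input, NUMBER for a run of digits (value in
   yylval), and the character code itself for any other character. */
int yylex(void)
{
    while (*input == ' ' || *input == '\t')
        input++;
    if (*input == '\0')
        return 0;
    if (isdigit((unsigned char)*input)) {
        yylval = 0;
        while (isdigit((unsigned char)*input))
            yylval = yylval * 10 + (*input++ - '0');
        return NUMBER;
    }
    return (unsigned char)*input++;
}
```

Single-character tokens such as '+' are returned as their own character code, which is why they never collide with the named tokens numbered above the character range.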
10 The declaration section: examples
%{
#include <stdio.h>
%}
%token NUMBER PLUS MINUS MUL DIV L_PAR R_PAR
%start expr
11 The declaration section: examples
%{
#include "expressions_tab.h"
%}
digit [0-9]
%%
[ \t]+ ;
{digit}+ { yylval = atoi(yytext); return NUMBER; }
"+" return PLUS;
"-" return MINUS;
"*" return MUL;
"/" return DIV;
"(" return L_PAR;
")" return R_PAR;
. printf("token erroneous\n");
12 The declaration section: examples
YACC
. . .
%token NUMBER '+' '-' '*' '/' '(' ')'
. . .
Lex
. . .
digit [0-9]
%%
[ \t]+ ;
{digit}+ { yylval = atoi(yytext); return NUMBER; }
"+" return '+';
"-" return '-';
"*" return '*';
"/" return '/';
"(" return '(';
")" return ')';
. . .
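13 Flex/Yacc communication
- yacc -d file.y: file.y -> file.tab.c, file.tab.h (header)
- lex file.l: file.l -> lex.yy.c (file.l includes the header file.tab.h)
- cc file.tab.c -c: file.tab.c -> file.tab.o
- cc lex.yy.c -c: lex.yy.c -> lex.yy.o
- gcc lex.yy.o file.tab.o -o calc: lex.yy.o, file.tab.o -> calc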
14 Lex/Yacc: lex file
%{
#include "expressions.tab.h"
%}
digit [0-9]
%option noyywrap
%%
[ \t]+ ;
{digit}+ { yylval = atoi(yytext);
           /* printf("lex: %s, %d\n", yytext, yylval); */
           return NUMBER; }
"+" return PLUS;
"-" return MINUS;
. printf("token erroneous\n");
- expressions.tab.h: generated by Yacc
- no main()
15 Flex/Yacc communication
expressions.tab.h
#ifndef YYSTYPE
#define YYSTYPE int
#endif
#define NUMBER 258
#define PLUS 259
#define MINUS 260
#define MUL 261
#define DIV 262
#define L_PAR 263
#define R_PAR 264
16 The Production Rules Section
production : symbol1 symbol2 { action }
           | symbol3 symbol4 { action }
           ;
production : symbol1 symbol2 { action }
           ;
17 Semantic values
statement : expression { printf("%g\n", $1); }
          ;
expression : expression '+' expression { $$ = $1 + $3; }
           | expression '-' expression { $$ = $1 - $3; }
           | NUMBER { $$ = $1; }
           ;
According to these two productions, 5 + 4 - 3 + 2
is parsed into
[parse tree figure]
18 Defining Values
expr : expr '+' term { $$ = $1 + $3; }
     | term { $$ = $1; }
     ;
term : term '*' factor { $$ = $1 * $3; }
     | factor { $$ = $1; }
     ;
factor : '(' expr ')' { $$ = $2; }
       | ID
       | NUM
       ;
19 Defining Values
- $1: the value of the first symbol on the right-hand side (in expr : expr '+' term, the value of expr)
20 Defining Values
- $2: the value of the second symbol on the right-hand side (in factor : '(' expr ')', the value of expr)
21 Defining Values
- $3: the value of the third symbol on the right-hand side (in expr : expr '+' term, the value of term)
- Default: $$ = $1
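How $1, $2, $3 and $$ relate to the parser's value stack can be sketched as follows. This is an illustration under assumptions, not Yacc's generated code: the rule identifiers and the reduce() function are hypothetical, and values are plain int.

```c
#include <assert.h>

/* Hypothetical rule identifiers for this sketch. */
enum rule { R_ADD, R_MUL, R_PAREN, R_UNIT };

/* Apply the action of rule r to value stack vs, whose top element is
   vs[top]; vs[0] is unused so that top - len is never negative.
   Returns the new top index. */
int reduce(int *vs, int top, enum rule r)
{
    int len = (r == R_UNIT) ? 1 : 3;  /* number of right-hand-side symbols */
    int *rhs = &vs[top - len];        /* rhs[1] is $1, rhs[2] is $2, ... */
    int result;                       /* plays the role of $$ */

    switch (r) {
    case R_ADD:   result = rhs[1] + rhs[3]; break; /* $$ = $1 + $3 */
    case R_MUL:   result = rhs[1] * rhs[3]; break; /* $$ = $1 * $3 */
    case R_PAREN: result = rhs[2];          break; /* $$ = $2 */
    default:      result = rhs[1];          break; /* default: $$ = $1 */
    }
    top = top - len + 1;              /* pop the right-hand side, push $$ */
    vs[top] = result;
    return top;
}
```

The values of the right-hand-side symbols sit on top of the stack in order, so a reduction replaces them with the single value computed for the left-hand side.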
22 The declaration section
- Support for arbitrary value types
  - %union {
  -   int intval;
  -   char *str;
  - }
23 The declaration section
- Use of union
- terminal declaration
  - %token <intval> NATURAL
- non terminal declaration
  - %type <type> NO_TERMINAL
- in productions
  - expr : NAT '+' NAT { $$ = $<intval>1 + $<intval>3; }
- In the lex file
  - -?{digit}+ { yylval.intval = atoi(yytext);
  -   return INTEGER; }
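On the C side, a %union declaration becomes the type of yylval, and the scanner fills in the member that matches the token it returns. The sketch below assumes this mapping; the token codes and the make_natural() and make_name() helpers are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <string.h>

/* The kind of type yacc derives from
   %union { int intval; char *str; } */
typedef union {
    int   intval;
    char *str;
} YYSTYPE;

YYSTYPE yylval;                       /* shared between lexer and parser */

enum { NATURAL = 258, NAME = 259 };   /* hypothetical token codes */

/* Scanner-side helper: store an integer value and return its token. */
int make_natural(int n)
{
    yylval.intval = n;
    return NATURAL;
}

/* Scanner-side helper: store a string value and return its token. */
int make_name(char *s)
{
    yylval.str = s;
    return NAME;
}
```

Because the members share storage, the parser must read the same member the scanner wrote, which is what the <intval> tags on %token and %type declare.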
24 Ambiguity
- By default yacc does the following
  - s/r: chooses shift over reduce
  - r/r: reduces the production that appears first
- Better to solve the conflicts by setting precedence
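The way declared precedence and associativity settle a shift/reduce conflict can be sketched as a small decision function. This is an illustration with hypothetical names (resolve_sr, struct prec); level 0 is used here to mean that no precedence was declared.

```c
#include <assert.h>

enum assoc  { ASSOC_NONE, ASSOC_LEFT, ASSOC_RIGHT, ASSOC_NONASSOC };
enum action { ACT_SHIFT, ACT_REDUCE, ACT_ERROR };

/* Precedence of a rule or token; level 0 means none was declared. */
struct prec { int level; enum assoc assoc; };

/* Decide a shift/reduce conflict between a rule ready to be reduced
   and the lookahead token. */
enum action resolve_sr(struct prec rule, struct prec token)
{
    if (rule.level == 0 || token.level == 0)
        return ACT_SHIFT;              /* no precedence: default is shift */
    if (token.level > rule.level)
        return ACT_SHIFT;              /* token binds tighter */
    if (token.level < rule.level)
        return ACT_REDUCE;             /* rule binds tighter */
    switch (token.assoc) {             /* same level: use associativity */
    case ASSOC_LEFT:     return ACT_REDUCE;
    case ASSOC_RIGHT:    return ACT_SHIFT;
    case ASSOC_NONASSOC: return ACT_ERROR;
    default:             return ACT_SHIFT;
    }
}
```

With '+' declared %left before '*', the input a + b * c shifts '*' after a + b, while a * b + c reduces a * b first.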
25 Error recovery
- Yacc detects errors
- To report errors, a function needs to be implemented
  - int yyerror(char *s) { fprintf(stderr, "%s\n", s); return 0; }
- Panic mode recovery
  - E : IF '(' cond ')'
  -   | IF '(' error ')' { yyerror("condition missing"); }
26 Error recovery
- After detecting an error, the parser will scan ahead looking for three legal tokens before it reports further errors. yyerrok resets the parser to its normal mode
- yyclearin allows the token that caused the error to be discarded
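The three-token rule can be pictured as a counter that suppresses further error reports until enough tokens have been shifted. The sketch below is an assumption-laden illustration of that idea, not the generated parser's code; struct recovery, on_error(), on_shift() and errok() are hypothetical names.

```c
#include <assert.h>

/* Hypothetical recovery state for this sketch. */
struct recovery { int pending; };  /* shifts still needed to leave recovery */

/* Called when a syntax error is detected. Returns 1 if the error should
   be reported, 0 if it is suppressed because recovery is in progress. */
int on_error(struct recovery *r)
{
    int report = (r->pending == 0);
    r->pending = 3;                /* require three good tokens again */
    return report;
}

/* Called after each successful shift of a token. */
void on_shift(struct recovery *r)
{
    if (r->pending > 0)
        r->pending--;
}

/* Effect of yyerrok: return to normal mode immediately. */
void errok(struct recovery *r)
{
    r->pending = 0;
}
```

Suppressing reports during recovery avoids a cascade of messages caused by a single mistake in the input.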