Finite State Machinery I - PowerPoint PPT Presentation

Provided by: michael307

Transcript and Presenter's Notes


1
Finite State Machinery - I
  • Fundamentals
  • Recognisers and Transducers

2
Reference Outline
  • Websites
  • Xerox: www.xrce.xerox.com/research/mltt/fst/
  • Groningen: grid.let.rug.nl/vannoord/FSA/fsa.html
  • AT&T: www.research.att.com/sw/tools/fsm
  • Books/Collections
  • Karttunen & Oflazer (2000)
  • Jurafsky & Martin (2000)
  • Hopcroft and Ullman (1979)
  • Roche and Schabes (1997)
  • Classic Articles
  • Kaplan and Kay (1994)
  • Koskenniemi (1983)
  • Johnson (1972)
  • Tools
  • Van Noord et al.
  • Mohri et al.
  • Daciuk.
  • Karttunen & Beesley

3
Acknowledgements to
  • Lauri Karttunen, Ken Beesley and colleagues at
    Xerox.
  • Most materials in this tutorial are from their
    website.
  • Forthcoming book: Finite State Morphology: Xerox
    Tools and Techniques.

4
FS Motivation
  • Chomsky hierarchy of language classes based on
    classes of descriptive notation, and also on
    associated classes of machine.
  • Chomsky (1957) dismissed FS grammars, and
    associated machinery, as fundamentally inadequate
    for the description of NL.

5
Embedding
  • The basic problem is not that sentences can grow
    to arbitrary length; it is that the description
    of a syntactic constituent may embed any other
    constituent, including the sentence itself.
  • The dog bit the cat.
  • The dog that the man saw bit the cat.
  • The dog that the man that the horse kicked saw
    bit the cat.
  • etc
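
The embedding pattern in these examples can be sketched in a few lines of Python. This is only an illustration: the function name embed and the (noun phrase, verb) pair representation are assumptions made here, not part of the original slides. Each extra relative clause adds a noun phrase on the left and a verb on the right, and the verbs come out in the reverse order of their subjects, which is the nested dependency a finite-state device cannot track to unbounded depth.

```python
def embed(pairs):
    """Build a centre-embedded sentence.

    pairs is a list of (noun phrase, verb) tuples, outermost clause first.
    Both the name and the representation are illustrative assumptions.
    """
    # Noun phrases are introduced left to right ...
    nouns = "".join(f" that {np}" for np, _ in pairs)
    # ... but their verbs are closed off in the opposite order (nesting).
    verbs = "".join(f" {verb}" for _, verb in reversed(pairs))
    return f"The dog{nouns}{verbs} bit the cat."


if __name__ == "__main__":
    print(embed([]))
    print(embed([("the man", "saw")]))
    print(embed([("the man", "saw"), ("the horse", "kicked")]))
```

The three calls reproduce the three example sentences on this slide.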

6
On the other hand ...
  • Plenty of language just ain't like that.
  • Words
  • Orthographic spelling.
  • Phonological spelling.
  • Morphology.
  • Fixed expression types (e.g. dates).
  • Gross constituent structures (e.g. the big, bad,
    blue wolf).

7
Recent Application Areas for FS Technology Include
  • POS Tagging
  • Spell Checking
  • Information Extraction
  • Speech Recognition
  • Text to Speech
  • Spoken Dialogue
  • Parsing

8
Recognition of Italian Words
  • The coke machine recognises words in the coke
    machine language.
  • The following machine recognises two words in
    Italian.
  • Recognition mechanism is language independent.

[Figure: network diagram; surviving arc labels include I, N, Q, U, E]
9
The Process of Analysis
  • Start in the initial state and at the first
    symbol of the word.
  • If there is an arc labelled with that symbol, the
    machine transitions to the next state, and the
    symbol is consumed.
  • The process continues with successive symbols
    until .....

10
The Process of Analysis
  • One or more of these conditions holds:
  • A. A final state is reached.
  • B. All symbols are consumed.
  • C. There are no transitions out of a state for
    the current symbol.
  • If both A and B hold, analysis succeeds and the
    word is recognised.
  • Otherwise recognition fails.
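
The recognition procedure described on the last two slides can be sketched as follows. This is a minimal illustration, not the Xerox implementation: the function names build_network and recognise are made up here, and the two-word lexicon (CASA, CINQUE) is an assumption, since the slide's actual diagram did not survive in this transcript.

```python
def build_network(words):
    """Build a deterministic network (a trie) from a list of words.

    State 0 is the initial state; the state reached at the end of each
    word is marked final. The word list is an assumed example lexicon.
    """
    transitions = {}   # (state, symbol) -> next state
    finals = set()
    next_state = 1
    for word in words:
        state = 0
        for symbol in word:
            key = (state, symbol)
            if key not in transitions:
                transitions[key] = next_state
                next_state += 1
            state = transitions[key]
        finals.add(state)
    return transitions, finals


def recognise(network, word):
    """Return True if the network accepts the word, else False."""
    transitions, finals = network
    state = 0                                # start in the initial state
    for symbol in word:                      # one symbol at a time
        key = (state, symbol)
        if key not in transitions:           # condition C: no arc, so fail
            return False
        state = transitions[key]             # follow the arc, consume symbol
    # Condition B holds here (all symbols consumed); accept only if
    # condition A also holds (the current state is final).
    return state in finals


if __name__ == "__main__":
    net = build_network(["CASA", "CINQUE"])
    for w in ["CASA", "CINQUE", "CINQUANTA", "CAS"]:
        print(w, recognise(net, w))
```

Nothing in recognise refers to Italian: the language lives entirely in the transition table, which is the sense in which the mechanism is language independent.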

11
Success and Failure
[Figure: network diagram; surviving arc labels include C, A, S, A, I, N, Q, U, E, L, E, E, N, T]
LE CASA CINQUANTA LENTEMENTE
12
Transducers
  • Recognisers either accept or reject a word.
  • Although this is useful, networks can actually
    return more substantial information.
  • This is achieved by providing networks with the
    ability to write as well as to read.

13
Basic Transducer
  • Each transition of a transducer is labelled with
    a pair of symbols rather than with a single
    symbol.
  • Analysis proceeds as before, except that input
    symbols are matched against the lower-side
    symbols on transitions.
  • If analysis succeeds, return the string of
    upper-side symbols on the path to the final state
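
A minimal sketch of this idea, assuming a deterministic network with no epsilon arcs. The state numbering, the ARCS table and the function name analyse are hypothetical; the symbol pairs follow the CASE/CASA example used later in this tutorial.

```python
# Each arc carries a pair of symbols. The table is keyed on the
# lower-side (surface) symbol because analysis reads the lower side.
# (state, lower symbol) -> (upper symbol, next state)
ARCS = {
    (0, "C"): ("C", 1),
    (1, "A"): ("A", 2),
    (2, "S"): ("S", 3),
    (3, "E"): ("A", 4),   # surface E is paired with lexical A
}
FINALS = {4}


def analyse(surface):
    """Map a surface string to its upper-side string, or None on failure."""
    state = 0
    upper = []
    for symbol in surface:
        arc = ARCS.get((state, symbol))
        if arc is None:            # no arc matches the current input symbol
            return None
        upper_symbol, state = arc
        upper.append(upper_symbol) # write the upper-side symbol
    return "".join(upper) if state in FINALS else None


if __name__ == "__main__":
    print(analyse("CASE"))
    print(analyse("CASA"))
```

The only change from a plain recogniser is that each successful step also writes a symbol, so a successful run returns a string instead of a bare yes.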

14
Confusing Terminology
  • Lower side = surface side.
  • Upper side = "deep" side.
  • Analysis proceeds from lower to upper.
  • Synthesis (generation) proceeds from upper to
    lower.

15
Lexical Transducers
  • In common parlance, a transducer is a device
    which converts one form of energy into another,
    e.g. a microphone converts from sound to
    electrical signals.
  • Next we look at lexical transducers which convert
    one string of symbols into another.

16
Lexical Transducer Example
lexical string: C A S A
surface string: C A S E
  • Input: CASE
  • Output: CASA

17
Morphological Analysis
[Figure: transducer network; surviving arc labels include C, O, N, T, A, R and e]
  • Input: CONTO
  • Output: CONTARE V 1P SG

18
Remarks
  • e stands for "epsilon". During analysis, epsilon
    transitions are taken freely without consuming
    any input.
  • Note also single symbols with multi-character
    print names (e.g. SG).
  • The order of these symbols, and the choice of
    infinitive as baseform, is determined by
    linguists.
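
Both remarks can be illustrated by writing one path through a transducer as a list of (upper, lower) symbol pairs. The alignment below is an assumption made for illustration, since the original diagram did not survive in this transcript; the helper name side is also hypothetical.

```python
EPSILON = "e"   # the slides' notation; upper-case "E" is an ordinary symbol

# One path, as (upper, lower) pairs. "1P" and "SG" are single symbols
# whose print names happen to be two characters long.
PATH = [
    ("C", "C"), ("O", "O"), ("N", "N"), ("T", "T"), ("A", "O"),
    ("R", EPSILON), ("E", EPSILON), ("V", EPSILON),
    ("1P", EPSILON), ("SG", EPSILON),
]


def side(path, index):
    """Read one side of a path (0 = upper, 1 = lower), skipping epsilons."""
    return [pair[index] for pair in path if pair[index] != EPSILON]


if __name__ == "__main__":
    print(" ".join(side(PATH, 0)))   # upper (lexical) side
    print("".join(side(PATH, 1)))    # lower (surface) side
```

Storing symbols as list elements rather than as characters of a string is what lets SG count as one symbol.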

19
Exercise
  • The word "conto" in Italian is also a masculine
    noun meaning (a) story and (b) bank account
  • Draw the corresponding two-level networks.
  • How can the different meanings be incorporated
    into the same network?

20
Conto N SG
[Figure: two-level network; surviving arc labels include C, O, N, T, A, N, SG and e]
  • Input: CONTO
  • Output: CONTO N SG

21
Synthesis
  • Transducers are reversible. This means that the
    same network can be used to perform the inverse
    transduction.
  • The process of synthesis is the inverse of
    analysis.

22
The Process of Synthesis
  • Start at the start state and at the beginning of
    the input string.
  • Match the input symbols against the upper-side
    symbols of the arcs, consuming symbols until a
    final state is reached.
  • If successful, return the string of lower-side
    symbols (else nothing).
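
The synthesis steps above can be sketched as a small depth-first search. The linear network, its symbol alignment and the function name synthesise are assumptions made for illustration, and the sketch further assumes the network contains no loops made only of epsilon arcs on the upper side.

```python
EPSILON = "e"   # the slides' notation for the empty symbol

# Hypothetical linear network: state i --(upper, lower)--> state i + 1.
PAIRS = [
    ("C", "C"), ("O", "O"), ("N", "N"), ("T", "T"), ("A", "O"),
    ("R", EPSILON), ("E", EPSILON), ("V", EPSILON),
    ("1P", EPSILON), ("SG", EPSILON),
]
ARCS = {i: [(up, lo, i + 1)] for i, (up, lo) in enumerate(PAIRS)}
FINALS = {len(PAIRS)}


def synthesise(arcs, finals, upper_symbols, state=0, pos=0, out=()):
    """Match upper-side symbols; return the lower-side string or None."""
    if pos == len(upper_symbols) and state in finals:
        return "".join(out)
    for up, lo, nxt in arcs.get(state, []):
        # Epsilons on the lower side are ignored on output.
        emitted = out if lo == EPSILON else out + (lo,)
        if up == EPSILON:
            # An epsilon on the upper side consumes no input.
            result = synthesise(arcs, finals, upper_symbols, nxt, pos, emitted)
        elif pos < len(upper_symbols) and upper_symbols[pos] == up:
            result = synthesise(arcs, finals, upper_symbols, nxt, pos + 1, emitted)
        else:
            continue
        if result is not None:
            return result
    return None


if __name__ == "__main__":
    lexical = ["C", "O", "N", "T", "A", "R", "E", "V", "1P", "SG"]
    print(synthesise(ARCS, FINALS, lexical))
```

Swapping the roles of up and lo in the loop would turn the same search into analysis, which is why a single network serves both directions.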

23
Morphological Synthesis
[Figure: transducer network; surviving arc labels include C, O, N, T, A, R and e]
  • Input: CONTARE V 1P SG
  • Output: CONTO
  • N.B. e symbols are ignored on output.

24
Analysis and Synthesis
  • Upper Side Language (Lexical Strings).
  • Lower Side Language (Surface Strings).
  • Transducer maps between the two.
  • However large the lexical transducer may become,
    analysis and synthesis are performed by the same
    language-independent matching techniques.
