Overview of Language Model Classes and Release Progress

Provided by: jennifer194
1
Overview of Language Model Classes and Release Progress
[Figure: diagram relating the XML, ABNF, IHD, BNF, and JSGF formats]
Daniel May
Intelligent Electronic Systems
Human and Systems Engineering
Department of Electrical and Computer Engineering
2
  • Overview
  • Language Model Classes
  • LanguageModelIHD: explanation of the IHD->BNF and
    BNF->IHD conversions
  • LanguageModelABNF: explanation and example of the
    ABNF->BNF conversion algorithm
  • LanguageModelBNF: explanation of the graph
    minimization algorithm
  • LanguageModelXML and LanguageModelJSGF
  • Network utilities: isip_network_builder,
    isip_network_converter
  • Release Progress
  • Outstanding Issues
  • Plan
  • Deadline

3
  • Class LanguageModelIHD
  • What is Normalized BNF?
  • Normalized BNF consists only of the following
    three rule forms:
  • 1. (RULE_NAME) -> (TERMINAL),(NON_TERMINAL)
  • 2. (RULE_NAME) -> (NON_TERMINAL)
  • 3. (RULE_NAME) -> (EPSILON)
  • IHD -> BNF
  • Straightforward conversion process
  • Each IHD arc is converted to a normalized BNF
    rule
  • Example

[Figure: example IHD graph and the corresponding BNF rules]
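
As a rough illustration of the arc-by-arc conversion, the sketch below assumes a toy digraph (a node-to-symbol map, weighted arcs, and explicit start and end nodes). The rule names `S`, `R<node>`, and `END`, and the helper names `ihd_to_bnf` and `format_rule`, are hypothetical and are not taken from the ISIP classes.

```python
from typing import NamedTuple, Optional

EPSILON = "EPSILON"

class Rule(NamedTuple):
    name: str                # (RULE_NAME)
    terminal: Optional[str]  # (TERMINAL); None for rule forms 2 and 3
    target: str              # (NON_TERMINAL) or EPSILON
    weight: float = 0.0      # weight on the concatenation token, if any

def ihd_to_bnf(symbols, arcs, starts, ends):
    """Turn a toy digraph into normalized BNF rules (assumed layout).

    symbols: {node_id: terminal}, arcs: [(src, dst, weight)],
    starts/ends: node ids where a path may begin or end.
    """
    # Form 2: the start rule refers to the rule of each entry node.
    rules = [Rule("S", None, f"R{n}") for n in sorted(starts)]
    # Form 1: one rule per arc; the comma carries the arc weight.
    for src, dst, weight in arcs:
        rules.append(Rule(f"R{src}", symbols[src], f"R{dst}", weight))
    # End nodes emit their terminal and hand over to an epsilon rule.
    for n in sorted(ends):
        rules.append(Rule(f"R{n}", symbols[n], "END"))
    # Form 3: the single terminating rule.
    rules.append(Rule("END", None, EPSILON))
    return rules

def format_rule(rule):
    if rule.terminal is None:
        return f"{rule.name} -> {rule.target}"
    return f"{rule.name} -> {rule.terminal},{rule.target}"

symbols = {0: "A", 1: "B"}
arcs = [(0, 1, 0.5), (1, 1, 0.2)]
for rule in ihd_to_bnf(symbols, arcs, starts={0}, ends={1}):
    print(format_rule(rule))
```

Each (rule name, terminal) pair in the output corresponds to exactly one node of the input graph, which is what makes the reverse conversion possible.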
4
  • Class LanguageModelIHD
  • BNF -> IHD
  • Straightforward conversion process
  • Simply the reverse of the IHD -> BNF process
  • Unique nodes are identified by unique instances of
    (RULE_NAME) -> (TERMINAL)
  • Concatenation tokens (,) correspond to arcs and
    are weighted
  • Example

[Figure: example BNF rules and the corresponding IHD graph]
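
The reverse direction can be sketched in the same spirit. The tuple layout `(name, terminal, target, weight)` and the function name `bnf_to_ihd` are assumptions made for this example only; the sketch recovers one node per unique (rule name, terminal) pair and one weighted arc per concatenation.

```python
def bnf_to_ihd(rules):
    """Recover a toy digraph from normalized BNF rules.

    rules: iterable of (name, terminal, target, weight) tuples, where
    terminal is None for the non-terminal-only and epsilon rule forms.
    Returns ({node_id: terminal}, [(src, dst, weight)]).
    """
    # One node per unique (RULE_NAME, TERMINAL) combination.
    nodes = {}
    for name, terminal, _target, _weight in rules:
        if terminal is not None and (name, terminal) not in nodes:
            nodes[(name, terminal)] = len(nodes)

    # Index nodes by rule name so targets can be resolved.
    by_name = {}
    for (name, _terminal), node_id in nodes.items():
        by_name.setdefault(name, []).append(node_id)

    # Each concatenation becomes a weighted arc to the target's nodes.
    arcs = []
    for name, terminal, target, weight in rules:
        if terminal is None:
            continue
        src = nodes[(name, terminal)]
        for dst in by_name.get(target, []):
            arcs.append((src, dst, weight))

    symbols = {node_id: term for (_name, term), node_id in nodes.items()}
    return symbols, arcs

rules = [
    ("S", None, "R0", 0.0),
    ("R0", "A", "R1", 0.5),
    ("R1", "B", "R1", 0.2),
    ("R1", "B", "END", 0.0),
    ("END", None, "EPSILON", 0.0),
]
symbols, arcs = bnf_to_ihd(rules)
```

Targets that never emit a terminal (such as the hypothetical `END` rule) produce no arc, so the terminating rule simply marks the node as an end node.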
5
  • Class LanguageModelABNF
  • ABNF -> BNF
  • Complicated!
  • Accomplished using a recursive algorithm that
    extracts sets of right symbols and left
    symbols and builds a set of normalized BNF
    rules.
  • A set of right and left symbols is found when a
    concatenation, Kleene star (*), or Kleene plus
    (+) is encountered.
  • If n left symbols and m right symbols are found,
    n x m BNF rules are created.
  • ABNF rules are processed one at a time
  • We iterate over the tokens in each rule from
    left to right and look for concatenation, Kleene
    star, and Kleene plus tokens.
  • When one of these tokens is encountered, the
    recursive methods findLeftSymbols() and
    findRightSymbols() are called. Each returns a
    set of symbols.
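
The n x m rule count is just a cross product of the two symbol sets. A minimal sketch follows, assuming each symbol is a terminal that maps to a rule named `R_<terminal>`; that naming scheme and the function `build_rules` are hypothetical.

```python
from itertools import product

def build_rules(left_symbols, right_symbols):
    """Create one normalized rule for every (left, right) pair.

    Each rule reads: R_<left> -> <left>,R_<right>
    """
    rules = []
    for left, right in product(sorted(left_symbols), sorted(right_symbols)):
        rules.append((f"R_{left}", left, f"R_{right}"))
    return rules

# One left symbol and three right symbols give 1 x 3 = 3 rules.
for name, terminal, target in build_rules({"A"}, {"B", "C", "D"}):
    print(f"{name} -> {terminal},{target}")
```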

6
  • Class LanguageModelABNF
  • Example
  • We must first construct a set of nodes using
    unique combinations of
    (RULE_NAME) -> (TERMINAL)

[Figure: IHD graph, ABNF grammar, and the resulting nodes]
7
  • Class LanguageModelABNF
  • Example

[Figure: IHD graph, ABNF grammar, and BNF rules]

This rule contains no tokens of interest, so we
move on to the next rule.
8
  • Class LanguageModelABNF
  • Example

[Figure: IHD graph, ABNF grammar, and BNF rules]

As we iterate from left to right, we encounter a
concatenation token. The findLeftSymbols method
returns A.
9
  • Class LanguageModelABNF
  • Example

[Figure: IHD graph, ABNF grammar, and BNF rules]

When findRightSymbols is called, we encounter a
Kleene star.
10
  • Class LanguageModelABNF
  • Example

[Figure: IHD graph, ABNF grammar, and BNF rules]

The findRightSymbols method must be called on the
token following the next concatenation at this
nesting level.
11
  • Class LanguageModelABNF
  • Example

[Figure: IHD graph, ABNF grammar, and BNF rules]

Next, findRightSymbols is called on the token
following the Kleene star. In this case, it's an
opening parenthesis.
12
  • Class LanguageModelABNF
  • Example

[Figure: IHD graph, ABNF grammar, and BNF rules]

For an opening parenthesis, we call
findRightSymbols on the token following it.
13
  • Class LanguageModelABNF
  • Example

[Figure: IHD graph, ABNF grammar, and BNF rules]

We also look for alternation tokens, and call
findRightSymbols on the tokens following them.
14
  • Class LanguageModelABNF
  • Example

[Figure: IHD graph, ABNF grammar, and BNF rules]

The Kleene plus is ignored since it isn't
currently relevant, and findRightSymbols is
called on the open parenthesis.
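
The right-symbol search walked through on the preceding slides can be approximated as follows. This is only a sketch: it assumes the ABNF expression has already been parsed into nested tuples (`sym`, `cat`, `alt`, `star`, `plus`) rather than scanned token by token as the slides describe, and the example expression is hypothetical since the slide figures are not available.

```python
def nullable(expr):
    """True if the expression can match nothing (so we must look past it)."""
    kind, body = expr
    if kind == "sym":
        return False
    if kind == "star":
        return True
    if kind == "plus":
        return nullable(body)
    if kind == "alt":
        return any(nullable(e) for e in body)
    if kind == "cat":
        return all(nullable(e) for e in body)
    raise ValueError(f"unknown node kind: {kind}")

def find_right_symbols(expr):
    """Collect every symbol that can appear first in the expression."""
    kind, body = expr
    if kind == "sym":
        return {body}
    if kind in ("star", "plus"):
        # The repetition operator itself does not change what comes first.
        return find_right_symbols(body)
    if kind == "alt":
        # Every alternative contributes its own first symbols.
        return set().union(*(find_right_symbols(e) for e in body))
    if kind == "cat":
        found = set()
        for item in body:
            found |= find_right_symbols(item)
            if not nullable(item):
                break  # a mandatory item hides everything after it
        return found
    raise ValueError(f"unknown node kind: {kind}")

# Hypothetical tail of a rule:  B* (C | (D | E)+)
tail = ("cat", [
    ("star", ("sym", "B")),
    ("alt", [("sym", "C"), ("plus", ("alt", [("sym", "D"), ("sym", "E")]))]),
])
right = find_right_symbols(tail)
```

Because the starred item can be skipped, the search continues to the next item at the same nesting level, which mirrors the step described for the token following the next concatenation.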
15
  • Class LanguageModelABNF
  • Example

[Figure: IHD graph, ABNF grammar, and BNF rules]

Now we can construct a set of BNF rules from the
right and left symbols.
16
  • Class LanguageModelABNF
  • Example

[Figure: IHD graph, ABNF grammar, and BNF rules]

The next token of interest is a Kleene star. For
these, we want a self loop on all rule segments
following alternations.
17
  • Class LanguageModelABNF
  • Example

[Figure: IHD graph, ABNF grammar, and BNF rules]

Since the following token is an open parenthesis,
we find all rule segments separated by
alternation tokens.
18
  • Class LanguageModelABNF
  • Example

[Figure: IHD graph, ABNF grammar, and BNF rules]

A different set of rules is created for each
segment.
19-22
  • Class LanguageModelABNF
  • Example

[Figure: IHD graph, ABNF grammar, and BNF rules]

findRightSymbols is called on the first token of
each segment, and findLeftSymbols is called on
the last.
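
One way to picture the loop rules for a starred group is sketched below. It assumes each segment is a flat list of terminals, so the right symbol is simply the first element and the left symbol the last. It also links the end of every segment back to the start of every segment, which is the usual closure semantics for a Kleene star and may not match exactly how the original implementation groups its rules. The function name `star_loop_pairs` is hypothetical.

```python
def star_loop_pairs(segments):
    """Return (left_symbol, right_symbol) pairs that loop a starred group.

    segments: list of non-empty lists of terminals, one list per
    alternative inside the starred parentheses.
    """
    firsts = [segment[0] for segment in segments]   # right symbols
    lasts = [segment[-1] for segment in segments]   # left symbols
    return sorted({(last, first) for last in lasts for first in firsts})

# Hypothetical group: (C | D E)*
pairs = star_loop_pairs([["C"], ["D", "E"]])
```

Each returned pair would then become one normalized BNF rule, in the same way as the pairs found at a concatenation.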
23-24
  • Class LanguageModelABNF
  • Example

[Figure: IHD graph, ABNF grammar, and BNF rules]

The next token of interest is another
concatenation. Again, we find a set of right and
left symbols and build rules.
25
  • Class LanguageModelABNF
  • Example

[Figure: IHD graph, ABNF grammar, and BNF rules]

The next token of interest is another
concatenation, but this time, the right symbol is
a non-terminal.
26
  • Class LanguageModelABNF
  • Example

[Figure: IHD graph, ABNF grammar, and BNF rules]

When findRightSymbols is called on a
non-terminal, it is called in turn on the first
token of the referenced rule.
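
Following a non-terminal into the rule it references might look like the sketch below. It assumes a grammar stored as a dictionary from rule name to a flat token list, and it adds a guard against left-recursive references that the slides do not mention. The name `resolve_right_symbols` is hypothetical.

```python
def resolve_right_symbols(token, grammar, _seen=None):
    """Return the terminals that can start `token`.

    grammar: {rule_name: [token, ...]}; any token that is not a key
    of the dictionary is treated as a terminal.
    """
    seen = set() if _seen is None else _seen
    if token not in grammar:
        return {token}        # terminal: it is its own right symbol
    if token in seen:
        return set()          # already being expanded: avoid looping
    seen.add(token)
    # Non-terminal: recurse on the first token of the referenced rule.
    return resolve_right_symbols(grammar[token][0], grammar, seen)

grammar = {
    "S": ["A", "T"],
    "T": ["U", "B"],
    "U": ["C"],
}
first_of_t = resolve_right_symbols("T", grammar)
```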
27
  • Class LanguageModelABNF
  • Example

[Figure: IHD graph, ABNF grammar, and BNF rules]

BNF start rules are found by calling
findRightSymbols on the first token of the ABNF
start rules.
28
  • Class LanguageModelABNF
  • Weights
  • ABNF does not have a mechanism for defining
    weights on arcs because ABNF has no knowledge of
    arcs. Arcs are just implied by the grammar
    representation.
  • When converting from IHD to any other format
    that uses ABNF as an intermediate, weights are
    included on the open parenthesis tokens preceding
    non-terminal and terminal symbols.
  • In some cases, the ABNF rules must be
    restructured to support weights. This will only
    be the case if the source of the grammar is not
    ISIP internal.
  • Testing
  • The ABNF -> BNF algorithm has been thoroughly
    tested on ABNF grammars derived from XML, but
    more testing is needed on arbitrary ABNF
    grammars.

29
  • Class LanguageModelBNF
  • Graph Minimization
  • Converting from XML introduces redundancy.
    Although the resulting graphs are equivalent to
    the originals, they're much larger and nearly
    impossible to interpret visually.
  • The minimize method in LanguageModelBNF can be
    used to remove redundancy once the language model
    is in BNF representation.
  • The algorithm iterates over all rule pairs and
    determines whether or not the rules can be merged
    into a single rule.
  • Rules can be merged if the non-terminals of both
    rules reference the same terminal and if the
    weights on the concatenation tokens are the same.
    When two rules are merged, all other rules that
    reference them must be updated.
  • Example
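
The pairwise merge can be sketched on a graph view of the rules. The merge test used here is one possible reading of the condition above: two nodes are merged when they carry the same terminal and have identical outgoing arcs, weights included. The data layout and the names `can_merge` and `minimize` are assumptions made for illustration, not the LanguageModelBNF API.

```python
def can_merge(a, b, symbols, arcs):
    """Same terminal and identical weighted outgoing arcs."""
    out_a = {(dst, w) for src, dst, w in arcs if src == a}
    out_b = {(dst, w) for src, dst, w in arcs if src == b}
    return symbols[a] == symbols[b] and out_a == out_b

def find_mergeable_pair(symbols, arcs):
    ids = sorted(symbols)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if can_merge(a, b, symbols, arcs):
                return a, b
    return None

def minimize(symbols, arcs):
    """Repeatedly merge equivalent nodes until no pair qualifies.

    symbols: {node_id: terminal}, arcs: iterable of (src, dst, weight).
    """
    symbols = dict(symbols)
    arcs = set(arcs)
    while True:
        pair = find_mergeable_pair(symbols, arcs)
        if pair is None:
            return symbols, arcs
        keep, drop = pair
        # Update every arc that referenced the dropped node.
        arcs = {
            (keep if src == drop else src, keep if dst == drop else dst, w)
            for src, dst, w in arcs
        }
        del symbols[drop]

symbols = {0: "A", 1: "B", 2: "B", 3: "C"}
arcs = {(0, 1, 0.5), (0, 2, 0.5), (1, 3, 1.0), (2, 3, 1.0)}
small_symbols, small_arcs = minimize(symbols, arcs)
```

In this toy graph the two `B` nodes collapse into one, leaving a single path from `A` through `B` to `C`.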

30
  • Class LanguageModelBNF
  • Example
  • Testing
  • So far, the minimization algorithm has been
    tested only by visually comparing the original
    graph with the resulting graph and verifying
    that they are equivalent.
  • The isip_lm_tester tool will be able to test it
    more thoroughly once the language model parsing
    capability is complete.

31
  • Classes LanguageModelXML and LanguageModelJSGF
  • LanguageModelXML
  • Wesley has completed this class and checked it
    in. Minor changes are made every once in a
    while, but overall, the conversions from BNF to
    XML and XML to ABNF are working fine.
  • LanguageModelJSGF
  • This class will be implemented similarly to
    LanguageModelXML.
  • The underlying JSGF representation is ABNF.
  • JSGF parsing algorithms already exist, but
    currently, the JSGF tokens are converted directly
    to IHD.
  • This was supposed to be finished several weeks
    ago, but issues regarding ABNF to BNF conversion
    and graph minimization have caused delays.

32
  • Other Language Model Related Utilities
  • isip_network_converter
  • Changes have been made to incorporate XML, BNF,
    and ABNF.
  • A minimize option has been added that invokes the
    minimization routine when the language model is
    in BNF representation.
  • isip_network_builder
  • The changes that will allow isip_network_builder
    to save in other formats are pending.
  • isip_lm_tester
  • Won is in the process of adding parsing
    capability to this tool. Currently, the tool can
    only generate random transcriptions.
  • Soon, it will be able to parse transcriptions and
    verify that they are valid given a particular
    language model.
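
The two capabilities described for the tester, generating random transcriptions and checking whether a transcription is valid, can be illustrated on a toy graph. This is not the isip_lm_tester implementation: the graph layout, the unweighted arcs, and the function names `generate` and `accepts` are all assumptions for the example.

```python
import random

def generate(symbols, arcs, starts, ends, rng, max_len=20):
    """Random walk from a start node, emitting one symbol per node.

    A walk cut off by max_len may stop on a non-end node, in which
    case the result is not a valid transcription.
    """
    node = rng.choice(sorted(starts))
    words = [symbols[node]]
    while len(words) < max_len:
        nexts = sorted(dst for src, dst in arcs if src == node)
        can_stop = node in ends
        if not nexts or (can_stop and rng.random() < 0.5):
            break
        node = rng.choice(nexts)
        words.append(symbols[node])
    return words

def accepts(words, symbols, arcs, starts, ends):
    """Check a transcription by tracking every node it could be in."""
    if not words:
        return False
    current = {n for n in starts if symbols[n] == words[0]}
    for word in words[1:]:
        current = {
            dst for src, dst in arcs
            if src in current and symbols[dst] == word
        }
    return bool(current & set(ends))

# Toy model: "A" followed by one or more "B".
symbols = {0: "A", 1: "B"}
arcs = [(0, 1), (1, 1)]
starts, ends = {0}, {1}

sample = generate(symbols, arcs, starts, ends, random.Random(0))
is_valid = accepts(sample, symbols, arcs, starts, ends)
```

Tracking a set of candidate nodes rather than a single node lets the check handle graphs in which several nodes carry the same symbol.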

33
  • Release Progress
  • Outstanding Issues
  • LanguageModelJSGF (Daniel)
  • Diagnose methods and documentation (Daniel,
    Seungchan, Ted)
  • isip_lm_tester parsing capability (Won)
  • isip_transform and isip_transform_builder
    (Sridhar)
  • Varmint backlog (Everyone)
  • Schedule/Deadline
  • March 10: All code and documentation will be
    completed, tested, and checked in (code freeze).
  • After March 10, we will begin running regression
    and code integrity tests.
  • March 31: Release date