Learning to Transform Natural to Formal Language

Slides: 28
Provided by: csc61
Learn more at: http://csc.lsu.edu

1
Learning to Transform Natural to Formal Language
Rohit J. Kate, Yuk Wah Wong, and Raymond J. Mooney
  • Presented by Ping Zhang

2
Overview
  • Background
  • SILT
  • CLANG and GEOQUERY
  • Semantic Parsing using Transformation rules
  • String-based learning
  • Tree-based learning
  • Experiments
  • Future work
  • Conclusion

3
Natural Language Processing (NLP)
  • Natural language = human language
  • e.g., English
  • The reason to process NL
  • To provide a much more user-friendly interface
  • Problems
  • NL is too complex.
  • NL has many ambiguities.
  • So far, NL cannot be used to program a
    computer.

4
Classification of Language
  • Traditional classification (Chomsky hierarchy)
  • Regular grammar
  • Context-free grammar (formal language)
  • Context-sensitive grammar
  • Unrestricted grammar (natural language)
  • Currently, all programming languages are less
    flexible than context-sensitive languages.
  • For example, C is a restricted
    context-sensitive language.

5
An Approach to Processing NL
  • Map a natural language to a formal query or
    command language.
  • Therefore, NL interfaces to complex computing and
    AI systems can be more easily developed.

[Diagram: English is mapped to a formal language, which is handled by a compiler or interpreter]

6
Grammar Terms
  • Grammar
  • G = (N, T, S, P)
  • N: finite set of non-terminal symbols
  • T: finite set of terminal symbols
  • S: starting non-terminal symbol, S ∈ N
  • P: finite set of productions
  • Production: x → y
  • For example,
  • Noun → computer
  • AssignmentStatement → i = 10
  • Statements → Statement Statements
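
As a rough illustration of the 4-tuple G = (N, T, S, P), the sketch below stores a grammar as a small Python data class and checks its basic consistency. The class name, the field names, the `is_well_formed` helper, and the toy grammar are all assumptions made for illustration. For simplicity each production has a single non-terminal on its left-hand side, which is more restrictive than the general x → y form.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    nonterminals: frozenset  # N
    terminals: frozenset     # T
    start: str               # S
    productions: tuple       # P, as (lhs, rhs) pairs; rhs is a tuple of symbols

    def is_well_formed(self) -> bool:
        """Check S in N, N and T disjoint, and every production uses known symbols."""
        symbols = self.nonterminals | self.terminals
        return (
            self.start in self.nonterminals
            and not (self.nonterminals & self.terminals)
            and all(
                lhs in self.nonterminals and all(s in symbols for s in rhs)
                for lhs, rhs in self.productions
            )
        )


# Toy grammar built around the slide's examples. The "=" token and the
# second and third productions are assumptions added to complete the grammar.
toy = Grammar(
    nonterminals=frozenset({"Statements", "Statement", "AssignmentStatement"}),
    terminals=frozenset({"i", "=", "10"}),
    start="Statements",
    productions=(
        ("Statements", ("Statement", "Statements")),
        ("Statements", ("Statement",)),
        ("Statement", ("AssignmentStatement",)),
        ("AssignmentStatement", ("i", "=", "10")),
    ),
)

print(toy.is_well_formed())
```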

7
SILT
  • SILT: Semantic Interpretation by Learning
    Transformations
  • Transformation rules map substrings in NL
    sentences, or subtrees in their corresponding
    syntactic parse trees, to subtrees of the
    formal-language parse tree.
  • SILT learns transformation rules from training
    data: pairs of NL sentences and manually
    translated formal-language statements.
  • Two target formal languages
  • CLANG
  • GEOQUERY

8
CLANG
  • A formal language used in coaching robotic soccer
    in the RoboCup Coach Competition.
  • CLANG grammar consists of 37 non-terminals and
    133 productions.
  • All tactics and behaviors are expressed in terms
    of if-then rules.
  • An example:
    ((bpos (penalty-area our))
     (do (player-except our 4) (pos (half our))))
  • "If the ball is in our penalty area, all our
    players except player 4 should stay in our half."
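
Because CLANG expressions are parenthesized, the example rule can be read into a nested list that mirrors its parse tree. The following is a minimal sketch of a generic S-expression reader, not the parser used by SILT or RoboCup. The function name `parse_sexpr` is hypothetical, and unbalanced input is handled only crudely.

```python
def parse_sexpr(text):
    """Parse a parenthesized expression into nested Python lists of strings."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while tokens[pos] != ")":
                items.append(read())
            pos += 1  # consume the closing parenthesis
            return items
        if tok == ")":
            raise ValueError("unexpected ')'")
        return tok

    tree = read()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the expression")
    return tree


rule = "((bpos (penalty-area our)) (do (player-except our 4) (pos (half our))))"
condition, directive = parse_sexpr(rule)
print(condition)
print(directive)
```

Here the first element of the parsed rule is the condition and the second is the directive, matching the if-then reading of the rule.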

9
GEOQUERY
  • A database query language for a small database of
    U.S. geography.
  • The database contains about 800 facts.
  • Based on Prolog, augmented with meta-predicates.
  • An example:
    answer(A, count(B, (city(B), loc(B, C),
                        const(C, countryid(usa))), A))
  • "How many cities are there in the US?"
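
To make the meaning of the example query concrete, the sketch below evaluates the same count over a tiny hand-made fact set. The facts are purely illustrative and are not taken from the GEOQUERY database. The names `CITIES`, `LOC`, and `count_cities_in` are hypothetical, and `loc` is simplified to a direct city-to-country relation.

```python
# Illustrative toy facts only; the real database has about 800 facts.
CITIES = {"austin", "boston", "toronto"}
LOC = {("austin", "usa"), ("boston", "usa"), ("toronto", "canada")}


def count_cities_in(country):
    """Mirror count(B, (city(B), loc(B, C), const(C, countryid(country))), A)."""
    return sum(1 for b in CITIES if (b, country) in LOC)


answer = count_cities_in("usa")
print(answer)
```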

10
Two methods
  • String-based transformation learning
  • Directly maps substrings of the NL sentences to
    subtrees of the formal-language parse tree.
  • Tree-based transformation learning
  • Maps subtrees to subtrees between the two
    languages.
  • Assumes that a syntactic parser, and hence a
    syntactic parse tree for each NL sentence, is
    provided.

11
Semantic Parsing
  • Pattern matching
  • Patterns found in NL ↔ templates based on
    productions
  • NL phrases ↔ formal expressions
  • Rule representation for the two methods

Pattern:  TEAM UNUM has the ball
Template: CONDITION → (bowner TEAM UNUM)

12
Examples of Parsing
  1. If our player 4 has the ball, our player 4
    should shoot.
  2. If TEAM UNUM has the ball, TEAM UNUM should
    ACTION.
    [TEAM: our; UNUM: 4; TEAM: our; UNUM: 4;
    ACTION: (shoot)]
  3. If CONDITION, TEAM UNUM should ACTION.
    [CONDITION: (bowner our 4); TEAM: our; UNUM: 4;
    ACTION: (shoot)]
  4. If CONDITION, DIRECTIVE.
    [CONDITION: (bowner our 4);
    DIRECTIVE: (do our 4 (shoot))]
  5. RULE
    [RULE: ((bowner our 4) (do our 4 (shoot)))]
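
The five-step derivation above can be imitated with a small rewriting loop. This is only a sketch under simplifying assumptions: the rule list is hand-written rather than learned, patterns must match contiguous tokens exactly, and the names `RULES`, `apply_rule`, and `parse` are hypothetical. Upper-case pattern items match already-recognized non-terminals, and `{0}`, `{1}`, ... in a template are filled with their meanings in order.

```python
def tokenize(sentence):
    return sentence.lower().replace(",", " ,").replace(".", " .").split()


# (pattern, non-terminal produced, template for its meaning)
RULES = [
    (["our"], "TEAM", "our"),
    (["player", "4"], "UNUM", "4"),
    (["shoot"], "ACTION", "(shoot)"),
    (["TEAM", "UNUM", "has", "the", "ball"], "CONDITION", "(bowner {0} {1})"),
    (["TEAM", "UNUM", "should", "ACTION"], "DIRECTIVE", "(do {0} {1} {2})"),
    (["if", "CONDITION", ",", "DIRECTIVE", "."], "RULE", "({0} {1})"),
]


def matches(item, token):
    # A token is either a plain word or a (non-terminal, meaning) pair.
    if item.isupper():
        return isinstance(token, tuple) and token[0] == item
    return token == item


def apply_rule(tokens, rule):
    """Rewrite the first span matching the rule's pattern, if there is one."""
    pattern, lhs, template = rule
    n = len(pattern)
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        if all(matches(p, t) for p, t in zip(pattern, window)):
            args = [t[1] for t in window if isinstance(t, tuple)]
            new_token = (lhs, template.format(*args))
            return tokens[:i] + [new_token] + tokens[i + n:], True
    return tokens, False


def parse(sentence):
    tokens = tokenize(sentence)
    changed = True
    while changed:  # keep sweeping over the rules until nothing applies
        changed = False
        for rule in RULES:
            tokens, applied = apply_rule(tokens, rule)
            changed = changed or applied
    return tokens


result = parse("If our player 4 has the ball, our player 4 should shoot.")
print(result)
```

When the whole sentence collapses into a single RULE token, its meaning string is the formal-language translation.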

13
Variations of Rule Representation
  • SILT allows patterns to skip some words or nodes.
  • if CONDITION, <1> DIRECTIVE.  (the gap <1> skips
    at most one word, e.g., "then")
  • To deal with non-compositionality:
  • SILT allows constraints to be applied.
  • "in REGION" matches CONDITION → (bpos REGION)
    if "in REGION" follows "the ball <1>".
  • SILT allows templates with multiple
    productions.
  • TEAM player UNUM has the ball in REGION
    CONDITION → (and (bowner TEAM UNUM) (bpos
    REGION))
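
The word-gap notation (a marker such as <1>, meaning "skip at most one word") can be sketched as a small recursive matcher. This is an illustrative assumption about how such gaps might be checked, not SILT's actual matcher. The function name `match_with_gaps` is hypothetical, and the pattern is required to match the whole token sequence.

```python
import re


def match_with_gaps(pattern, words):
    """Return True if pattern matches all of words; '<k>' skips at most k words."""

    def go(pi, wi):
        if pi == len(pattern):
            return wi == len(words)
        gap = re.fullmatch(r"<(\d+)>", pattern[pi])
        if gap:
            max_skip = int(gap.group(1))
            # Try skipping 0, 1, ..., max_skip words at this position.
            return any(
                go(pi + 1, wi + skip)
                for skip in range(max_skip + 1)
                if wi + skip <= len(words)
            )
        return wi < len(words) and words[wi] == pattern[pi] and go(pi + 1, wi + 1)

    return go(0, 0)


pattern = "if CONDITION , <1> DIRECTIVE .".split()
print(match_with_gaps(pattern, "if CONDITION , then DIRECTIVE .".split()))
print(match_with_gaps(pattern, "if CONDITION , DIRECTIVE .".split()))
print(match_with_gaps(pattern, "if CONDITION , and then DIRECTIVE .".split()))
```

The first two calls succeed because the gap absorbs either "then" or nothing, while the third fails because two words would have to be skipped.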

14
Learning Transformation Rules

15
Issues of SILT Learning
  • Non-compositionality
  • Rule cooperation
  • Rules are learned in order.
  • Therefore, an over-general ancestor rule leads to
    a group of over-general child rules, and no other
    rule can cooperate with rules of that kind.
  • Two approaches can address this:
  • Find the single best rule for all competing
    productions in each iteration.
  • Over-generate rules, then find a subset that can
    cooperate.

16
FindBestRule() For String-based Learning
  • Input: a set Π of productions in the formal
    grammar; sets of positive examples P_p and
    negative examples N_p for each p in Π
  • Output: the best rule BR
  • Algorithm:
  • R = Ø
  • For each production p ∈ Π:
      • Let R_p be the maximally-specific rules
        derived from P_p.
      • Repeat k = 1000 times:
          • Choose r1, r2 ∈ R_p at random.
          • g = GENERALIZE(r1, r2, p)
          • Add g to R_p.
      • R = R ∪ R_p
  • BR = argmax r ∈ R goodness(r)
  • Remove positive examples covered by BR from P_p.
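
The loop structure of FindBestRule() can be sketched in Python as follows. Several parts are stand-ins chosen for illustration and are assumptions, not the authors' definitions: `generalize` is a toy that keeps the shared words, `covers` is a simple in-order word check, the maximally specific rules are taken to be the positive examples themselves, and `goodness` uses pos²/(pos + neg) because the transcript does not preserve the slide's formula.

```python
import random


def generalize(r1, r2):
    """Toy GENERALIZE: the words of r1 that also occur in r2, in r1's order."""
    words2 = set(r2.split())
    return " ".join(w for w in r1.split() if w in words2)


def covers(rule, example):
    """A rule covers an example if its words occur in it in order (gaps allowed)."""
    remaining = iter(example.split())
    return all(word in remaining for word in rule.split())


def goodness(rule, pos_examples, neg_examples):
    """Assumed measure pos^2 / (pos + neg); not taken from the transcript."""
    pos = sum(covers(rule, e) for e in pos_examples)
    neg = sum(covers(rule, e) for e in neg_examples)
    return pos * pos / (pos + neg) if pos + neg else 0.0


def find_best_rule(productions, positives, negatives, k=1000, seed=0):
    rng = random.Random(seed)
    candidates = []                                   # R
    for p in productions:
        rules_p = list(dict.fromkeys(positives[p]))   # R_p, duplicates removed
        for _ in range(k):
            if len(rules_p) < 2:
                break
            r1, r2 = rng.sample(rules_p, 2)
            g = generalize(r1, r2)
            if g and g not in rules_p:
                rules_p.append(g)
        candidates.extend((p, r) for r in rules_p)    # R = R ∪ R_p
    if not candidates:
        return None
    best = max(
        candidates,
        key=lambda pr: goodness(pr[1], positives[pr[0]], negatives[pr[0]]),
    )
    p, rule = best
    # Remove the positive examples that the best rule already covers.
    positives[p] = [e for e in positives[p] if not covers(rule, e)]
    return best


# Hypothetical toy data for a single production.
p = "REGION -> (penalty-area TEAM)"
positives = {p: ["TEAM 's penalty box", "TEAM penalty area"]}
negatives = {p: ["TEAM goal area", "TEAM midfield"]}
best = find_best_rule([p], positives, negatives)
print(best)
```

In this toy run the generalized pattern covers both positive examples and neither negative one, so it scores higher than either maximally specific rule.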

17
FindBestRule() Cont.
  • Goodness (r)
  • GENERALIZE
  • r1, r2: two transformation rules based on the
    same production
  • For example
  • p: REGION → (penalty-area TEAM)
  • Pattern 1: TEAM 's penalty box
  • Pattern 2: TEAM penalty area
  • Generalization: TEAM <1> penalty
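
One way to reproduce the generalization in this example is to take the longest common subsequence of the two patterns and insert a gap marker wherever words were skipped between two kept words. The sketch below does exactly that. It is an assumption about how GENERALIZE could work for strings, not a statement of the authors' implementation, and the function names are hypothetical.

```python
def lcs_alignment(a, b):
    """Return index pairs (i, j) of one longest common subsequence of a and b."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = 1 + dp[i + 1][j + 1]
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    pairs, i, j = [], 0, 0
    while i < m and j < n:
        if a[i] == b[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def generalize(pattern1, pattern2):
    """Keep the common words; mark skipped words between them as '<k>' gaps."""
    a, b = pattern1.split(), pattern2.split()
    out, prev = [], None
    for i, j in lcs_alignment(a, b):
        if prev is not None:
            gap = max(i - prev[0] - 1, j - prev[1] - 1)
            if gap > 0:
                out.append(f"<{gap}>")
        out.append(a[i])
        prev = (i, j)
    return " ".join(out)


g = generalize("TEAM 's penalty box", "TEAM penalty area")
print(g)
```

Words that differ at the end of the two patterns ("box" and "area") are simply dropped in this sketch, which matches the slide's result.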

18
Tree-based Learning
  • Uses a similar FindBestRule() algorithm
  • GENERALIZE
  • Find the largest common subgraphs of two rules.
  • For example
  • p: REGION → (penalty-area TEAM)
  • [Figure: tree patterns for Pattern 1 and
    Pattern 2, and their generalization]
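
Since the figure for this example is not available, the sketch below illustrates the idea on invented trees. It finds the largest common ordered, top-down subtree of two trees, which is a simplification of the "largest common subgraph" named on the slide. The tree encoding (nested tuples of the form `(label, child, ...)`), the syntactic labels, and the function names are all assumptions for illustration.

```python
def size(tree):
    return 1 + sum(size(child) for child in tree[1:])


def nodes(tree):
    yield tree
    for child in tree[1:]:
        yield from nodes(child)


def common_at(a, b):
    """Largest common ordered subtree rooted at both a and b, or None."""
    if a[0] != b[0]:
        return None
    ka, kb = a[1:], b[1:]
    # best[i][j]: best list of common child subtrees using ka[i:] and kb[j:]
    best = [[[] for _ in range(len(kb) + 1)] for _ in range(len(ka) + 1)]
    for i in range(len(ka) - 1, -1, -1):
        for j in range(len(kb) - 1, -1, -1):
            options = [best[i + 1][j], best[i][j + 1]]
            shared = common_at(ka[i], kb[j])
            if shared is not None:
                options.append([shared] + best[i + 1][j + 1])
            best[i][j] = max(options, key=lambda ts: sum(size(t) for t in ts))
    return (a[0], *best[0][0])


def generalize_trees(t1, t2):
    """Try every pair of nodes as roots and keep the largest common subtree."""
    candidates = [
        shared
        for a in nodes(t1)
        for b in nodes(t2)
        if (shared := common_at(a, b)) is not None
    ]
    return max(candidates, key=size, default=None)


# Hypothetical parse trees for "TEAM 's penalty box" and "TEAM penalty area".
tree1 = ("NP", ("TEAM",), ("POS", ("'s",)), ("N", ("penalty",)), ("N", ("box",)))
tree2 = ("NP", ("TEAM",), ("N", ("penalty",)), ("N", ("area",)))
general = generalize_trees(tree1, tree2)
print(general)
```

In the result, the last child keeps only its label `N` because the words under it differ between the two trees.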

19
Experiment
  • CLANG
  • 300 instructions selected randomly from the log
    files of the 2003 RoboCup Coach Competition.
  • Each formal instruction was translated into
    English by a human.
  • The average length of an NL sentence is 22.52
    words.
  • GEOQUERY
  • 250 questions were collected from undergraduate
    students.
  • All English queries were manually translated
    into the formal query language.
  • The average length of an NL sentence is 6.87
    words.

20
Result for CLANG

21
Result for CLANG (Cont.)

22
Result for GEOQUERY

23
Result for GEOQUERY (Cont.)

24
Time Consumption
Time consumption in minutes.

25
Future Work
  • Though improved, SILT still lacks the robustness
    of statistical parsing.
  • The hard-matching symbolic rules of SILT are
    sometimes too brittle.
  • A more unified implementation of tree-based SILT
    would allow direct comparison and evaluation of
    the benefit of using initial syntactic parsers.

26
Conclusion
  • A novel approach, SILT, can learn transformation
    rules that map NL sentences into a formal
    language.
  • It shows better overall performance than previous
    approaches.
  • NLP still has a long way to go.

27

Thank you!
Questions or comments?