An Introduction to Using Semgrex - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
An Introduction to Using Semgrex
Chloé Kiddon
2
What is Semgrex?
  • A Java utility (in JavaNLP) for identifying
    patterns in Stanford JavaNLP SemanticGraph
    structures
  • Much like Tregex, which does this for tree
    structures (Levy & Andrew 2006) and is based on
    tgrep-2 style syntax and functionality. (These
    slides are adapted from the structure of theirs.)
  • Applied the same way you use regular expressions
    to find patterns in strings

Ex.
{tag:/VB.*/} >dobj ({} >amod {lemma:red})
3
Semgrex Overview
  • SemgrexPatterns are composed of nodes,
    representing IndexedFeatureLabels, and relations
    between them, representing edges in a
    SemanticGraph
  • SemgrexMatchers can be used on a single
    SemanticGraph OR on two SemanticGraphs and an
    Alignment between them
  • Ex. an RTE problem has the hypothesis graph, the
    text graph, and the alignment from the hypothesis
    graph's IndexedFeatureLabels to the text graph's
    IndexedFeatureLabels
  • SemgrexPatterns return matches for
    IndexedFeatureLabels in a SemanticGraph

4
Syntax - Nodes
  • Nodes are represented as {attr1:value1;attr2:value2}
  • Attributes are regular strings; values can be
    strings or regular expressions marked by /s
  • {lemma:run;pos:/VB.*/} => any verb form of the
    word run
  • {} is any node in the graph
  • {$} is any root in the graph
  • {#} is the empty word (IndexedFeatureLabel.NO_WORD)
  • Comes up when working with alignments
  • Descriptions can be negated with !
  • !{lemma:boy} => any word that isn't boy
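
The node syntax above can be mimicked with a small stand-alone sketch. This is not Semgrex code: ToyNodePattern and its methods are hypothetical names, a node is modelled as a plain attribute map, and the sketch assumes that a /regex/ value must match the whole attribute value.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

/** Toy stand-in for a node description such as {lemma:run;pos:/VB.*/}; not the Semgrex implementation. */
class ToyNodePattern {
    private final Map<String, Pattern> constraints = new HashMap<>();
    private boolean negated;

    /** Adds one attribute constraint; a value wrapped in /slashes/ is a regex, anything else is literal. */
    ToyNodePattern with(String attr, String value) {
        boolean isRegex = value.length() >= 2 && value.startsWith("/") && value.endsWith("/");
        String body = isRegex ? value.substring(1, value.length() - 1) : Pattern.quote(value);
        constraints.put(attr, Pattern.compile(body));
        return this;
    }

    /** Mirrors the ! prefix on a node description. */
    ToyNodePattern negate() {
        negated = !negated;
        return this;
    }

    /** With no constraints this accepts every node, like {}. */
    boolean matches(Map<String, String> node) {
        boolean allHold = true;
        for (Map.Entry<String, Pattern> c : constraints.entrySet()) {
            String actual = node.get(c.getKey());
            // Assumption: the whole attribute value must match, not just a substring.
            if (actual == null || !c.getValue().matcher(actual).matches()) {
                allHold = false;
                break;
            }
        }
        return negated != allHold;
    }

    /** Convenience builder for a node with a lemma and a part-of-speech tag. */
    static Map<String, String> node(String lemma, String pos) {
        Map<String, String> n = new HashMap<>();
        n.put("lemma", lemma);
        n.put("pos", pos);
        return n;
    }
}
```

For instance, new ToyNodePattern().with("lemma", "run").with("pos", "/VB.*/") accepts node("run", "VBD") and rejects node("run", "NN").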

5
Grouping Nodes
  • Perhaps you want a node that is either a word with
    an ner TIME tag, or the lemma when. The node
    {ner:TIME;lemma:when} does not accomplish this OR
    operation
  • Can use brackets [ ] and | (or &) to specify these
    groupings
  • [{lemma:locate} | {ner:LOCATION}]
  • A node that is either a word with a lemma
    locate or a word with LOCATION ner
  • Can also be negated by putting a ! in front
  • By default, & takes precedence over |, but & has
    no reason to be used

6
Syntax - Relations
  • Relationships between nodes can be specified
  • Relations in Semgrex have two parts: the relation
    symbol and the relation type, e.g. <nsubj
  • A <reln B: A is the dependent of a reln
    relation with B
  • A >reln B: A is the governor of a reln relation
    with B
  • A <<reln B: There is some node in a dep->gov
    chain from A that is the dependent of a reln
    relation with B
  • A >>reln B: There is some node in a gov->dep
    chain from A that is the governor of a reln
    relation with B
  • A @ B: A is aligned to B through an Alignment
    object
  • Relation types can be regular strings or regular
    expressions encased by /s
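
To make the governor, dependent, and chain relations concrete, here is a minimal sketch over a toy edge list. ToyDepGraph and its method names are hypothetical and have nothing to do with SemanticGraph's real API; the chain check assumes that A itself counts as the first node of the chain.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy dependency graph: a list of labeled edges, governor --reln--> dependent. */
class ToyDepGraph {
    static final class Edge {
        final String gov, reln, dep;
        Edge(String gov, String reln, String dep) {
            this.gov = gov;
            this.reln = reln;
            this.dep = dep;
        }
    }

    private final List<Edge> edges = new ArrayList<>();

    ToyDepGraph add(String gov, String reln, String dep) {
        edges.add(new Edge(gov, reln, dep));
        return this;
    }

    /** A >reln B: A is the governor of a reln edge to B. */
    boolean governs(String a, String reln, String b) {
        for (Edge e : edges) {
            if (e.gov.equals(a) && e.reln.equals(reln) && e.dep.equals(b)) return true;
        }
        return false;
    }

    /** A <reln B: A is the dependent of a reln edge from B. */
    boolean dependsOn(String a, String reln, String b) {
        return governs(b, reln, a);
    }

    /** A >>reln B: some node on a gov->dep chain starting at A (A included, by assumption) governs B via reln. */
    boolean governsViaChain(String a, String reln, String b) {
        Set<String> seen = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.push(a);
        while (!todo.isEmpty()) {
            String cur = todo.pop();
            if (!seen.add(cur)) continue;          // already visited
            if (governs(cur, reln, b)) return true;
            for (Edge e : edges) {
                if (e.gov.equals(cur)) todo.push(e.dep);  // walk down to dependents
            }
        }
        return false;
    }
}
```

With edges ate -nsubj-> boy, ate -dobj-> apple, and apple -amod-> red, the direct check governs("ate", "amod", "red") fails while governsViaChain("ate", "amod", "red") succeeds, which is the difference between > and >>.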

7
Building complex expressions
  • Relations can be strung together for "and"
  • All relations are relative to the first node in
    the string
  • {} >nsubj {} >dobj {}
  • A node that is the governor of both an nsubj
    relation and a dobj relation
  • & symbol is optional: {} >nsubj {} & >dobj {}
  • Nodes can be grouped w/ parentheses
  • {pos:NN} @ ({} <nsubj {})
  • A noun that is aligned to a node that is the
    dependent of an nsubj relation
  • Not the same as {pos:NN} @ {} <nsubj {}

8
Other Operators on Relations
  • Operators can be combined via "or" with |
  • Ex: {} <agent {} | <nsubj {}
  • A node that is either an agent or an nsubj in the
    graph
  • Like with nodes, & takes precedence over |
  • Ex: {} <agent {} | <nsubj {} & >amod {lemma:red}
  • An agent node OR a subject modified by the word
    red
  • Equivalent operators are left-associative
  • Any relation can be negated with the ! prefix
  • Ex: {tag:/VB.*/} !@ {tag:/VB.*/}
  • A verb that is not aligned to another verb
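
The precedence rule can be checked with plain booleans, since Java's && also binds tighter than ||. The sketch below is only an analogy: the class and parameter names are made up, and each boolean stands for "this relation test succeeded on the node".

```java
/** Boolean analogy for "{} <agent {} | <nsubj {} & >amod {lemma:red}". */
class RelationPrecedence {
    /** Default reading: isAgent OR (isNsubj AND hasRedAmod). */
    static boolean defaultReading(boolean isAgent, boolean isNsubj, boolean hasRedAmod) {
        // No parentheses needed: && is evaluated before ||, just as & binds before |.
        return isAgent || isNsubj && hasRedAmod;
    }

    /** The other grouping, which would need explicit grouping in the pattern. */
    static boolean groupedReading(boolean isAgent, boolean isNsubj, boolean hasRedAmod) {
        return (isAgent || isNsubj) && hasRedAmod;
    }
}
```

An agent with no red modifier satisfies defaultReading(true, false, false) but not groupedReading(true, false, false), so the two readings really do select different nodes.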

9
Other Operators on Relations
  • When the pattern is matched on a pair of graphs
    and their alignment, the default search point is
    the graph that the alignments are from
  • To override this, place a @ at the beginning of
    the pattern
  • Ex: for a hypGraph, txtGraph and alignment
    hyp->txt
  • {ner:LOCATION} @ {}
  • Represents all LOCATION nodes in the hypGraph
    aligned to nodes in the txtGraph
  • @ {ner:LOCATION} @ {}
  • Represents all LOCATION nodes in the txtGraph
    that are aligned to nodes in the hypGraph

10
Grouping relations
  • To specify operation order, use [ and ]
  • Ex: {tag:nn} [<prep_in {} | <prep_on {}] @ {#}
  • A noun that is the dependent of either a prep_in
    or prep_on relation and is aligned to NO_WORD
  • Grouped relations can be negated
  • Just put ! before the [

11
Named Relations
  • Suppose we want to find two nodes connected by
    any relation which have a pair of nodes aligned
    to them with the same relation
  • Name relations with =
  • The first occurrence of a named relation in a
    pattern is the one that is stored as the relation
  • ({} >/.*subj|agent/=reln {}) @ ({} >=reln {})
  • We can retrieve the string form of the relation
    found in the graph later by using that name

12
Named Nodes
  • We can name nodes as well as relations
  • Name nodes with =, and if the node matches, we can
    retrieve the node by that name
  • Ex: {} <nsubj {}=verb
  • Verb with subject found by this pattern is stored
    by the name verb
  • The first occurrence of a named node in the pattern
    is the one stored under that name. All others
    must be equal to that first one
  • Ex. {} >nsubj {}=subject @ ({} >nsubj ({} @
    {}=subject))
  • Finds a node that is both the governor of an
    nsubj relation to a node called subject and
    aligned to a node that is the governor of an
    nsubj relation to a node aligned to the node
    labeled as subject
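
The "first occurrence binds, later occurrences must be equal" rule behaves much like a named group and its backreference in java.util.regex. The sketch below is only that analogy on strings, with a made-up class name and sentence; it does not use Semgrex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** String analogy: the first (?<subject>...) binds the name, \k<subject> must then equal it. */
class NamedGroupAnalogy {
    private static final Pattern SAME_SUBJECT =
        Pattern.compile("(?<subject>\\w+) ate; \\k<subject> slept");

    /** Returns the text bound to "subject", or null when the two mentions differ. */
    static String sharedSubject(String text) {
        Matcher m = SAME_SUBJECT.matcher(text);
        return m.matches() ? m.group("subject") : null;
    }
}
```

Here sharedSubject("boy ate; boy slept") yields the bound name's value, while "boy ate; girl slept" fails to match because the second mention is not equal to the first.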

13
Optional Relations to Nodes
  • Sometimes we want to try to match a
    sub-expression to retrieve named nodes if they
    exist, but still match if sub-expression fails.
  • Use optional relation prefix ?
  • Ex: {} >/nsubj|agent/ {}=subject ?>/.*obj/
    {}=object
  • Matches nodes that are governors of nsubj or
    agent relations
  • If the node also is the governor of some sort of
    object relation, then we can retrieve the object
    using the key object
  • If there is no object, the expression will still
    match
  • Cannot be combined with negation
  • Can be used in front of bracketed relations: ?[ ]
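
The required-versus-optional behaviour can be sketched over a toy edge list. OptionalRelationSketch, Edge, and match are hypothetical names, and the sketch simply takes the first subject and first object it sees rather than enumerating every match the way a real matcher would.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy version of "{} >/nsubj|agent/ {}=subject ?>/.*obj/ {}=object" for one candidate head. */
class OptionalRelationSketch {
    static final class Edge {
        final String gov, reln, dep;
        Edge(String gov, String reln, String dep) {
            this.gov = gov;
            this.reln = reln;
            this.dep = dep;
        }
    }

    /** Returns the named-node bindings for head, or null when the required subject relation is missing. */
    static Map<String, String> match(List<Edge> edges, String head) {
        String subject = null;
        String object = null;
        for (Edge e : edges) {
            if (!e.gov.equals(head)) continue;
            if (subject == null && e.reln.matches("nsubj|agent")) subject = e.dep;
            if (object == null && e.reln.matches(".*obj")) object = e.dep;
        }
        if (subject == null) return null;             // required relation failed: no match
        Map<String, String> bindings = new LinkedHashMap<>();
        bindings.put("subject", subject);
        if (object != null) bindings.put("object", object);  // optional: bound only when present
        return bindings;
    }
}
```

A head with a subject but no object still matches and simply has no "object" key in the result, which is the point of the ? prefix.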

14
Use of Semgrex classes
  • Semgrex usage is like java.util.regex
  • Two ways of calling the matcher: for a single
    SemanticGraph
  • or for two SemanticGraphs and an Alignment
    between them

String s = "{} >nsubj {}=subject @ ({} >nsubj ({} @ {}=subject))";
SemgrexPattern p = SemgrexPattern.compile(s);
// Either match against a single graph:
SemgrexMatcher m = p.matcher(graph);
// or against two graphs and the alignment between them:
// SemgrexMatcher m = p.matcher(hypGraph, alignment, txtGraph);
while (m.find()) {
  System.out.println(m.getMatch().word());
}
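
For comparison, this is the java.util.regex workflow that the Semgrex classes are modelled on: compile a pattern once, build a matcher for an input, then loop on find(). The class name, method name, and the word/TAG input format are assumptions made for this illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** The compile / matcher / find() loop on strings, for comparison with SemgrexPattern and SemgrexMatcher. */
class RegexAnalogy {
    // A word followed by a verb tag such as VB, VBD, or VBZ, in "word/TAG" text.
    private static final Pattern VERB = Pattern.compile("(\\w+)/VB[A-Z]?");

    /** Collects the word of every verb-tagged token, in order of appearance. */
    static List<String> verbWords(String tagged) {
        Matcher m = VERB.matcher(tagged);   // like p.matcher(graph)
        List<String> words = new ArrayList<>();
        while (m.find()) {                  // like while (m.find())
            words.add(m.group(1));          // like m.getMatch().word()
        }
        return words;
    }
}
```

Calling verbWords("boy/NN ate/VBD apple/NN was/VBZ") collects "ate" and "was".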
15
Use of Semgrex classes
  • Named nodes are retrieved w/ getNode()
  • Named relations are retrieved w/ getRelnString()

IndexedFeatureLabel subj = m.getNode("subject");
String subjReln = m.getRelnString("subjReln");
16
A Real Code Example - Before
private void checkCopula(Problem problem, SemanticGraph hypGraph, SemanticGraph txtGraph) {
  IndexedFeatureLabel root = hypGraph.getFirstRoot();
  IndexedFeatureLabel subj = hypGraph.getChildWithReln(root, "nsubj");
  if (subj != null) {
    IndexedFeatureLabel alignedRoot = problem.getTxtWord(root);
    if (alignedRoot != IndexedFeatureLabel.NO_WORD) {
      IndexedFeatureLabel appos = txtGraph.getChildWithReln(alignedRoot, "appos");
      List<IndexedFeatureLabel> appositionList;
      try {
        appositionList = txtGraph.getChildrenWithReln(problem.getTxtWord(subj), "nn");
      } catch (IllegalArgumentException e) {
        appositionList = new ArrayList<IndexedFeatureLabel>();
      }
      if (appos != null) {
        if (problem.getTxtWord(subj).equals(appos)) {
          problem.addFeature(this, Feature.APPOSITION_MATCH,
              "apposition in text between " + root.word() + " and " + subj.word());
        } else

17
A Real Code Example - After
private void checkCopula(Problem problem, SemanticGraph hypGraph, SemanticGraph txtGraph) {
  IndexedFeatureLabel root = hypGraph.getFirstRoot();
  if (checkAttributiveStructure(hypGraph) !checkAttributiveStructure(txtGraph)) {
    if (VERBOSE) System.err.println("in check copula");
    SemgrexPattern copulaPat = SemgrexPattern.compile(
        "({}=subj <nsubj ({}=root @ {}=alignedRoot)) @ ({} >nn {}=alignedRoot <appos {}=alignedRoot)");
    SemgrexMatcher copulaMatcher =
        copulaPat.matcher(hypGraph, problem.getAlignment(), txtGraph);
    if (copulaMatcher.find()) {
      problem.addFeature(this, Feature.APPOSITION_MATCH,
          "apposition in text between " + copulaMatcher.getNode("root").word()
          + " and " + copulaMatcher.getNode("subj").word());
    } else {
      problem.addFeature(this, Feature.APPOSITION_MISMATCH,
          "no apposition in text between " + copulaMatcher.getNode("root").word()
          + " and " + copulaMatcher.getNode("subj").word());
    }
  }
}

18
For More Help
  • There is a JUnit test in the Semgrex package
    called SemgrexPatternTest that can be used to
    test patterns for validity and view what their
    parses are
  • If you find a bug (i.e. a pattern that should
    work but doesn't) or need more help, email
    chloe@cs.stanford.edu

19
Thanks!