OntoNotes: A Unified Relational Semantic Representation

1
OntoNotes: A Unified Relational Semantic
Representation
Sameer Pradhan, Eduard Hovy, Mitchell Marcus,
Martha Palmer, Lance Ramshaw, and Ralph
Weischedel
http://www.bbn.com/ontonotes
2
Outline
  • An integrated relational database representation
  • Enforces consistency across the different
    annotations
  • Supports integrated models that can combine
    evidence from different layers
  • Some practical issues
  • Sensitivity to changes in layers
  • Adding a new layer to the data
  • A few lessons learned

3
Problems with Multiple Layers of Annotation
  • Not previously available
  • A number of these layers have not been available
    in significant quantity before
  • Word Sense
  • Coreference
  • Not previously integrated
  • Each layer encoded separately as individual
    files, requiring supporting documentation for
    interpretation
  • Not previously completely consistent
  • Mismatches between Treebank and PropBank
  • Not previously user friendly
  • Raw text format

4
Unified Representation
  • Provide a bare-bones representation, independent
    of the individual layers' semantics, that can:
  • Efficiently capture intra- and inter- layer
    semantics
  • Maintain component independence (facilitate
    collaboration)
  • Provide mechanism for flexible integration (for
    an application)
  • Integrate information at the required level of
    granularity
  • Data storage as close as possible to an
    application backend
  • Adaptable in face of incremental representational
    changes
  • API extremely accessible (you don't need to be a
    hacker to use it)
  • Ability to easily perform cross-layer queries
  • Easily extensible
  • Capable of maintaining version information,
    ideally at different possible levels

Relational Database
Object Oriented API
5
Relational Representation
6
Example Database Representation of Syntax
  • Treebank tokens (stored in the Token table)
    provide the common base
  • The Tree table stores the recursive tree nodes,
    each with its span
  • Subsidiary tables define the sets of function
    tags, phrase types, etc.
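
The layout described above can be sketched with an in-memory SQLite database. This is only an illustration under stated assumptions: the slide names the Token and Tree tables, but every column name, the `phrase_type` table name, and the `words_under` helper are hypothetical and not taken from the actual OntoNotes schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE phrase_type (id TEXT PRIMARY KEY);  -- subsidiary type table (assumed name)
CREATE TABLE token (                             -- common base for all layers
    id          INTEGER PRIMARY KEY,
    sentence_id INTEGER NOT NULL,
    token_index INTEGER NOT NULL,
    word        TEXT NOT NULL
);
CREATE TABLE tree (                              -- recursive tree nodes with spans
    id          INTEGER PRIMARY KEY,
    parent_id   INTEGER REFERENCES tree(id),     -- NULL for the root node
    phrase_type TEXT REFERENCES phrase_type(id),
    sentence_id INTEGER NOT NULL,
    start_token INTEGER NOT NULL,
    end_token   INTEGER NOT NULL
);
""")

conn.executemany("INSERT INTO phrase_type VALUES (?)", [("NP",), ("PP",)])
conn.executemany(
    "INSERT INTO token (sentence_id, token_index, word) VALUES (?, ?, ?)",
    [(0, 0, "troops"), (0, 1, "in"), (0, 2, "central"), (0, 3, "Europe")],
)
# (NP (NP troops) (PP in (NP central Europe)))
conn.executemany(
    "INSERT INTO tree VALUES (?, ?, ?, ?, ?, ?)",
    [(1, None, "NP", 0, 0, 3), (2, 1, "NP", 0, 0, 0),
     (3, 1, "PP", 0, 1, 3), (4, 3, "NP", 0, 2, 3)],
)

def words_under(conn, node_id):
    """Return the words covered by a tree node, using its token span."""
    rows = conn.execute(
        """SELECT token.word
           FROM tree JOIN token
             ON token.sentence_id = tree.sentence_id
            AND token.token_index BETWEEN tree.start_token AND tree.end_token
           WHERE tree.id = ?
           ORDER BY token.token_index""",
        (node_id,),
    ).fetchall()
    return [row[0] for row in rows]

pp_words = words_under(conn, 3)
```

Because every node stores its span against the shared token table, any other layer that also stores token spans can be joined to the tree without going through the tree's recursive structure.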

7
Object Oriented API
8
Using the API: Importing the modules
9
Using the API: Creating Skeleton Objects
10
Using the API: Creating Full-fledged Objects (I)
11
Using the API: Creating Full-fledged Objects (II)
12
Using the API: Writing to the database
13
Using the API: Reading from the Database
14
Data Loading Life-cycle
Database
15
OntoNotes Data: Current and Future
OntoNotes 1.0
OntoNotes 2.0
OntoNotes 3.0
NW BN BC
Eng 300
Chi 250
Ara
16
Advantages of an Integrated Representation
  • Clean, consistent layers
  • Resolve the inconsistencies and problems that
    this reveals
  • Well defined relationships
  • Database schema defines the merged structure
    efficiently
  • Extract individual views
  • Treebank, PropBank, etc.
  • SQL queries can extract examples based on
    multiple layers or define new views
  • Python Object-oriented API allows for
    programmatic access to tables and queries

17
Example of Database Query Function
What is the distribution of named entities that
are ARG0s of the predicate say?
    # a_proposition_bank, a_tree_document, a_tree_id, a_cursor and ne_hash are
    # defined by surrounding code that is not shown on the slide. a_cursor is
    # assumed to return rows that can be indexed by column name.
    for a_proposition in a_proposition_bank:
        if a_proposition.lemma != "say":
            continue
        arg_in_p_query = "select * from argument where proposition_id = '%s'" % (a_proposition.id)
        a_cursor.execute(arg_in_p_query)
        argument_rows = a_cursor.fetchall()
        for a_argument_row in argument_rows:
            a_argument_id = a_argument_row["id"]
            a_argument_type = a_argument_row["type"]
            if a_argument_type != "ARG0":
                continue
            n_in_arg_q = "select * from argument_node where argument_id = '%s'" % (a_argument_id)
            a_cursor.execute(n_in_arg_q)
            argument_node_rows = a_cursor.fetchall()
            for a_argument_node_row in argument_node_rows:
                a_node_id = a_argument_node_row["node_id"]
                # named entities that coincide with the argument node itself
                a_ne_node_query = "select * from name_entity where subtree_id = '%s'" % (a_node_id)
                a_cursor.execute(a_ne_node_query)
                ne_rows = a_cursor.fetchall()
                for a_ne_row in ne_rows:
                    a_ne_type = a_ne_row["type"]
                    ne_hash[a_ne_type] = ne_hash.get(a_ne_type, 0) + 1
                # named entities on any subtree below the argument node
                a_tree = a_tree_document.get_tree(a_tree_id)
                a_node = a_tree.get_subtree(a_node_id)
                for a_child in a_node.subtrees():
                    a_ne_subtree_query = "select * from name_entity where subtree_id = '%s'" % (a_child.id)
                    a_cursor.execute(a_ne_subtree_query)
                    ne_subtree_rows = a_cursor.fetchall()
                    for a_ne_subtree_row in ne_subtree_rows:
                        a_subtree_ne_type = a_ne_subtree_row["type"]
                        ne_hash[a_subtree_ne_type] = ne_hash.get(a_subtree_ne_type, 0) + 1
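
As a point of comparison, the part of this question that concerns named entities sitting exactly on the ARG0 node could also be asked in one SQL join. The sketch below is a self-contained toy, not the OntoNotes database: the table names `argument`, `argument_node` and `name_entity` and the columns used in the query above come from the slide, while the `proposition` table, its `lemma` column, and all rows are assumptions made up for illustration. It does not cover entities on subtrees below the argument node, which the loop above handles through the tree objects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE proposition   (id TEXT, lemma TEXT);          -- assumed table
CREATE TABLE argument      (id TEXT, type TEXT, proposition_id TEXT);
CREATE TABLE argument_node (argument_id TEXT, node_id TEXT);
CREATE TABLE name_entity   (subtree_id TEXT, type TEXT);
""")

# Invented toy rows: three "say" propositions and one "buy" proposition.
conn.executemany("INSERT INTO proposition VALUES (?, ?)",
                 [("p1", "say"), ("p2", "say"), ("p3", "buy"), ("p4", "say")])
conn.executemany("INSERT INTO argument VALUES (?, ?, ?)",
                 [("a1", "ARG0", "p1"), ("a2", "ARG1", "p1"),
                  ("a3", "ARG0", "p2"), ("a4", "ARG0", "p3"),
                  ("a5", "ARG0", "p4")])
conn.executemany("INSERT INTO argument_node VALUES (?, ?)",
                 [("a1", "n1"), ("a2", "n2"), ("a3", "n3"),
                  ("a4", "n4"), ("a5", "n5")])
conn.executemany("INSERT INTO name_entity VALUES (?, ?)",
                 [("n1", "PERSON"), ("n2", "ORG"), ("n3", "ORG"),
                  ("n4", "PERSON"), ("n5", "PERSON")])

query = """
SELECT ne.type, COUNT(*)
FROM proposition AS p
JOIN argument      AS a  ON a.proposition_id = p.id
JOIN argument_node AS an ON an.argument_id = a.id
JOIN name_entity   AS ne ON ne.subtree_id = an.node_id
WHERE p.lemma = ? AND a.type = ?
GROUP BY ne.type
"""
# Parameter binding replaces the '%s' string interpolation used on the slide.
ne_counts = dict(conn.execute(query, ("say", "ARG0")).fetchall())
```

The join keeps the whole computation inside the database, while the loop version is easier to extend with logic that needs the tree objects.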
18
Reconciling Treebank and PropBank
  • We found several mismatches between syntax and
    propositions
  • Sometimes PropBank was right
  • Sometimes Treebank was right
  • Guidelines modified to bring the two in line
  • Now each argument points to a single node in the
    tree
  • Secondary connections are made using Treebank
    trace chains
  • Almost no discontinuous arguments
  • Non-trace connections are explicitly identified
  • This greater consistency will make it easier to
    train models that predict argument structure
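
One way to picture the "each argument points to a single node" constraint is as a span check: an argument is consistent with the tree if some node covers exactly the same tokens. The function below is a minimal sketch under that assumption. The name `align_arguments`, the dictionary inputs, and the node and argument identifiers are all hypothetical, and this is not the reconciliation procedure the project actually used.

```python
def align_arguments(node_spans, argument_spans):
    """Match each argument span to a tree node with the identical span.

    node_spans:     dict of node id -> (start_token, end_token)
    argument_spans: dict of argument id -> (start_token, end_token)
    Returns (aligned, mismatched): a dict of argument id -> node id, and a
    list of argument ids whose span corresponds to no single node.
    """
    nodes_by_span = {}
    for node_id, span in node_spans.items():
        nodes_by_span.setdefault(span, []).append(node_id)

    aligned, mismatched = {}, []
    for arg_id, span in argument_spans.items():
        candidates = nodes_by_span.get(span)
        if candidates:
            aligned[arg_id] = candidates[0]   # first node listed with that span
        else:
            mismatched.append(arg_id)
    return aligned, mismatched


# Tokens 0..8: major reductions and realignments of troops in central Europe
node_spans = {
    "NP-top": (0, 8), "NP-head": (0, 3), "PP-of": (4, 5),
    "NP-troops": (5, 5), "PP-in": (6, 8), "NP-loc": (7, 8),
}
argument_spans = {"ARG1": (5, 5), "ARGM-LOC": (6, 8), "ARG-X": (4, 8)}
aligned, mismatched = align_arguments(node_spans, argument_spans)
```

Here the invented "ARG-X" span covers two sibling phrases and therefore no single node, which is the kind of case a consistency check would surface for review.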

19
Sensitivity to Changes: PropBank changes
[Figure: parse tree (nodes S, NP, PP; tags JJ, NNS, CC, IN, NNP) over the phrase]
... major reductions and realignments of troops
in central Europe ...
20
Sensitivity to Changes: Treebank changes
  • If a node is deleted, remove the associated
    annotation
  • If any node has a change in its children or
    parent node, update the associated annotation
  • Print the new PropBank
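
The two rules above can be sketched as a small function. This is an illustration only: the function name, the dictionary representation of the tree, and the node and annotation labels are assumptions, not the project's code.

```python
def propagate_tree_changes(old_nodes, new_nodes, annotations):
    """Apply the two Treebank-change rules to annotations anchored on nodes.

    old_nodes / new_nodes: dict of node id -> (parent id, tuple of child ids)
    annotations:           dict of node id -> annotation label
    Returns (kept, needs_update): the annotations whose node still exists, and
    the sorted ids of surviving nodes whose parent or children changed.
    """
    kept, needs_update = {}, []
    for node_id, annotation in annotations.items():
        if node_id not in new_nodes:
            continue                      # rule 1: node deleted, drop annotation
        kept[node_id] = annotation
        if old_nodes.get(node_id) != new_nodes[node_id]:
            needs_update.append(node_id)  # rule 2: parent or children changed
    return kept, sorted(needs_update)


old_nodes = {
    "NP-1": ("S", ("NP-2", "PP-1", "PP-2")),
    "NP-2": ("NP-1", ()),
    "PP-1": ("NP-1", ()),
    "PP-2": ("NP-1", ()),
}
new_nodes = {                             # PP-2 was removed in the revised tree
    "NP-1": ("S", ("NP-2", "PP-1")),
    "NP-2": ("NP-1", ()),
    "PP-1": ("NP-1", ()),
}
annotations = {"NP-1": "rel", "NP-2": "ARG1", "PP-2": "ARGM-LOC"}
kept, needs_update = propagate_tree_changes(old_nodes, new_nodes, annotations)
```

In this toy case the annotation on the deleted PP-2 is dropped, and NP-1 is flagged because its list of children changed.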

[Figure: parse tree (nodes S, NP, PP; tags JJ, NNS, CC, IN, NNP) over the phrase]
... major reductions and realignments of troops
in central Europe ...
21
Adding a new layer
  • What information do you want to capture?
  • Define relationship with the required layer
  • Design tables
  • Superimpose on existing machinery with respect to
    the anchor
  • Create a class in the corpora package
  • Define a few specific functions
  • Create object from original annotation (Text
    Reader)
  • Write object to database (DB Writer)
  • Create object from database (DB Reader)
  • Write database to original format (Text Writer)
  • Pretty print function (Pretty Printer)
  • Write at least one alignment function at the
    level where the enrichment is required, or even
    multiple levels
  • Enrich Treebank/Document/
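
The five functions listed above can be sketched for a made-up layer that labels token spans. Everything here is hypothetical: the `ToyAnnotation` class, its one-line text format, the `toy_annotation` table, and the method names stand in for the text reader, DB writer, DB reader, text writer, and pretty printer, and are not part of the OntoNotes corpora package.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class ToyAnnotation:
    sentence_id: int
    start_token: int
    end_token: int
    label: str

    @classmethod
    def from_text(cls, line):
        """Text Reader: parse 'sentence_id start end label'."""
        sentence_id, start, end, label = line.split()
        return cls(int(sentence_id), int(start), int(end), label)

    def write_to_db(self, conn):
        """DB Writer: store the annotation, anchored on token offsets."""
        conn.execute(
            "INSERT INTO toy_annotation VALUES (?, ?, ?, ?)",
            (self.sentence_id, self.start_token, self.end_token, self.label),
        )

    @classmethod
    def from_db(cls, conn, sentence_id):
        """DB Reader: rebuild the objects for one sentence."""
        rows = conn.execute(
            "SELECT sentence_id, start_token, end_token, label "
            "FROM toy_annotation WHERE sentence_id = ? ORDER BY start_token",
            (sentence_id,),
        )
        return [cls(*row) for row in rows]

    def to_text(self):
        """Text Writer: emit the original one-line format."""
        return f"{self.sentence_id} {self.start_token} {self.end_token} {self.label}"

    def pretty(self, tokens):
        """Pretty Printer: show the label next to the words it covers."""
        span = " ".join(tokens[self.start_token:self.end_token + 1])
        return f"[{self.label}] {span}"


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE toy_annotation "
    "(sentence_id INTEGER, start_token INTEGER, end_token INTEGER, label TEXT)"
)
original = ToyAnnotation.from_text("0 2 3 LOC")
original.write_to_db(conn)
restored = ToyAnnotation.from_db(conn, 0)
pretty_line = restored[0].pretty(["troops", "in", "central", "Europe"])
```

The round trip from text to database and back is a useful first test for a new layer, since it exercises four of the five functions at once.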

22
A Few Errors Found
  • Missing co-indices in Trees (found during
    loading)
  • Invalid sense numbers (while checking against
    repository)
  • Multiple sense definitions (in the repository)
  • Validation errors in schemas
  • Dead pointers in ontology
  • Multiple coreference chain memberships
  • Missing/Invalid predicate/argument pointers
  • Invalid PB/TB merges
  • Filename/Content mismatches
  • Pinyin/Unicode inconsistencies
  • Varying sentence breaks
  • SLINK Errors
  • Inconsistent TB Empty specifications in the merge
    process
  • Typos (found through Type Tables)
  • ...
  • And a few annotation errors

23
Some Interesting Problems Addressed
  • Word sense annotation transferred from old
    Treebank to new Treebank
  • Coreference annotation transferred to new
    Treebank
  • Treebank/PropBank with or without NMLs reside in
    harmony
  • Various levels of data quality identified in the
    database
  • Varying styles of marking traces normalized
  • Language specific idiosyncrasies in inventories
    and frames normalized
  • Data generated for annotation
  • Eventive nouns
  • Coreference

24
A Few Lessons Learned
  • Each layer should
  • abide by a minimum dependency principle
  • adhere to a well defined schema
  • Try to maintain consistency across representation
    of similar components
  • Use a centralized, version controlled repository
  • Need for single-point, push-button loading
    philosophy

25
Conclusion
  • Many annotation layers available, integrated
    using a relational schema
  • An extensible, relational/object-oriented
    architecture available to the community
  • Easily Accessible
  • Through Python API
  • SQL queries
  • OntoNotes Release 2.0 available from LDC

26
Backup
27
Syntax Layer
  • Identifies meaningful phrases in the text
  • Lays out the structure of how they are related

Concerns about the pace of the Vienna talks --
which are aimed at the destruction of some
100,000 weapons , as well as major reductions and
realignments of troops in central Europe -- also
are being registered at the Pentagon .
28
Propositional Structure
  • Tells who did what to whom
  • For both verbs and nouns

Concerns about the pace of the Vienna talks --
which are aimed at the destruction of some
100,000 weapons , as well as major reductions and
realignments of troops in central Europe -- also
are being registered at the Pentagon .
29
Predicate Frames
  • Predicate frames define the meanings of the
    numbered arguments

Concerns about the pace of the Vienna talks --
which are aimed at the destruction of some
100,000 weapons , as well as major reductions and
realignments of troops in central Europe -- also
are being registered at the Pentagon .
reduction: reduce.01 (Make less)
Argument  Role            Filler
ARG0      Agent           -
ARG1      Thing falling   the troops
ARG2      Amount fallen   major
ARG3      Starting point  -
ARG4      Ending point    -
30
Word Sense and Ontology
  • Meanings of nouns and verbs are specified using a
    catalog of possible senses
  • All the senses are annotatable at 90%
    inter-annotator agreement (ITA)
  • Ontology links (currently being added) capture
    similarities between related senses of different
    words

Concerns about the pace of the Vienna talks --
which are aimed at the destruction of some
100,000 weapons , as well as major reductions and
realignments of troops in central Europe -- also
are being registered at the Pentagon .
  1. Enter into an official record
  1. Wish, purpose or intend to achieve something

31
Coreference
  • Identifies different mentions of the same entity
    within a document; in particular, links definite
    referring noun phrases and pronouns to their
    antecedents
  • Two types tagged: Identity and Attributive

Concerns about the pace of the Vienna talks --
which are aimed at the destruction of some
100,000 weapons , as well as major reductions and
realignments of troops in central Europe -- also
are being registered at the Pentagon .