The Cross-Lingual Reuse and Extension of Knowledge Resources in Ontological Semantics

1
The Cross-Lingual Reuse and Extension of
Knowledge Resources in Ontological Semantics
  • Marjorie McShane, Sergei Nirenburg, Stephen
    Beale, Margalit Zabludowski
  • The Institute for Language and Information
    Technologies (ILIT)
  • University of Maryland Baltimore County

2
Plan of the Talk
  • Background
      • the OntoSem environment
      • the OntoSem processors and their parameterization
        potential
      • language-independent and language-dependent
        knowledge resources
  • Focus of the talk: the benefits and challenges of
    using our current English lexicon to seed the
    building of lexicons for other languages
  • For comparison (one of many possible): the
    European Union's SIMPLE lexicon-building project

3
What is OntoSem?
  • Ontological Semantic (OntoSem) text processing
    takes as input open text and returns text-meaning
    representations (TMRs), which are structured
    representations of its meaning (Nirenburg and
    Raskin, Ontological Semantics, MIT Press,
    forthcoming).
  • TMRs are the basis for all other applications:
    MT, QA, Knowledge Extraction
  • OntoSem is a practical, non-toy system that
    concentrates on encoding and interpreting text
    meaning (in contrast with most stochastic methods
    that focus on surface strings)

4
OntoSem Processors
  • Processors include:
      • pre-processor (tokenization, POS tagging,
        morphological analysis, etc.)
      • syntactic analyzer
      • semantic analyzer
  • All processors are parameterizable; i.e., basic
    functionality can apply to any language, but
    certain language-specific information must be
    recorded
  • Recent experiments: porting the analyzers to Arabic
    and Persian (CRL) was successful in a small experiment

5
Language Independent Knowledge Sources
  • the TMR language
  • an ontology, currently containing 5500 concepts
    designed to support many-to-one lexical mappings
  • a fact repository, which is a database of
    real-world facts that represent instances of
    ontological concepts
  • Brief descriptions of each follow.

6
Static Resources 1: The TMR Language
  • He asked the UN to authorize the war.
  • REQUEST-ACTION-69
        AGENT             HUMAN-72
        THEME             ACCEPT-70
        BENEFICIARY       ORGANIZATION-71
        SOURCE-ROOT-WORD  ask
        TIME              (< (FIND-ANCHOR-TIME))
    ACCEPT-70
        THEME             WAR-73
        THEME-OF          REQUEST-ACTION-69
        SOURCE-ROOT-WORD  authorize
    ORGANIZATION-71
        HAS-NAME          United-Nations
        BENEFICIARY-OF    REQUEST-ACTION-69
        SOURCE-ROOT-WORD  UN
    HUMAN-72
        HAS-NAME          Colin Powell
        AGENT-OF          REQUEST-ACTION-69
        SOURCE-ROOT-WORD  he (reference resolution has been carried out)
    WAR-73
        THEME-OF          ACCEPT-70
        SOURCE-ROOT-WORD  war
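
The frame structure of this TMR can be sketched in Python as a dictionary of frames. The frame and property names are copied from the slide; the data layout and the `missing_inverses` helper are illustrative assumptions, not OntoSem's actual storage format, and the TIME slot is left out for brevity. The helper checks one property visible in the example: every link between frames (AGENT, THEME, BENEFICIARY) has a matching "-OF" back-link on the filler frame.

```python
# A TMR as a dictionary: instance name -> list of (property, filler) pairs.
# Names come from the slide; the representation itself is an assumption.
TMR = {
    "REQUEST-ACTION-69": [
        ("AGENT", "HUMAN-72"),
        ("THEME", "ACCEPT-70"),
        ("BENEFICIARY", "ORGANIZATION-71"),
        ("SOURCE-ROOT-WORD", "ask"),
    ],
    "ACCEPT-70": [
        ("THEME", "WAR-73"),
        ("THEME-OF", "REQUEST-ACTION-69"),
        ("SOURCE-ROOT-WORD", "authorize"),
    ],
    "ORGANIZATION-71": [
        ("HAS-NAME", "United-Nations"),
        ("BENEFICIARY-OF", "REQUEST-ACTION-69"),
        ("SOURCE-ROOT-WORD", "UN"),
    ],
    "HUMAN-72": [
        ("HAS-NAME", "Colin Powell"),
        ("AGENT-OF", "REQUEST-ACTION-69"),
        ("SOURCE-ROOT-WORD", "he"),
    ],
    "WAR-73": [
        ("THEME-OF", "ACCEPT-70"),
        ("SOURCE-ROOT-WORD", "war"),
    ],
}


def missing_inverses(tmr):
    """Return (frame, prop, filler) links whose filler lacks the '<prop>-OF' back-link."""
    missing = []
    for frame, slots in tmr.items():
        for prop, filler in slots:
            # Only links between frames are checked; inverse links are skipped.
            if filler in tmr and not prop.endswith("-OF"):
                if (prop + "-OF", frame) not in tmr[filler]:
                    missing.append((frame, prop, filler))
    return missing


print(missing_inverses(TMR))  # []
```

Keeping both directions of each link explicit, as the slide does, makes it cheap to start from any instance (for example HUMAN-72) and find the events it participates in.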

7
Static Resources 2: The Ontology
8
Ontology (cont.): why our ontology is not just a
word net. It contains many properties and their
values (an average of 16 per concept), scripts,
etc.
  • BIOLOGICAL-WEAPON
      MADE-OF MICRO-ORGANISM
      INSTRUMENT-OF DESTROY, ANIMAL-DISEASE, PLANT-DISEASE
  • GUN
      INSTRUMENT-OF DISCHARGE, HUNTING-EVENT
      PRODUCED-BY GUNSMITH
      MADE-OF METAL (PLASTIC, WOOD)
  • COLON-CANCER
      LOCATION COLON
      ESTABLISHED-BY COLONOSCOPY, SIGMOIDOSCOPY
      HAS-SYMPTOM CONSTIPATION
      REMEDIED-BY DRUG, PERFORM-SURGERY
      EXPERIENCER ANIMAL

9
Static Resources 3: Fact Repository
10
Language Dependent Knowledge Sources
  • Lexicons and onomastica (lexicons of proper
    names) are language dependent
  • For lexicons, however, the semantic
    representation is largely transferable across
    languages, and the syntactic patterns that
    realize a given semantic meaning often are as
    well; this is the source of the cross-linguistic
    portability we'll explore here

11
Example of a Basic Lexicon Entry
  • watch
      watch-v1
        synonyms observe
        anno
          definition "to observe, look at"
          example "He's watching the demolition team."
        syn-struc
          subject var1
          v var0
          directobject var2
        sem-struc
          VOLUNTARY-VISUAL-EVENT
            agent var1
            theme var2
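
How the variables tie the syn-struc to the sem-struc can be sketched as follows. This is a minimal illustration under stated assumptions: the dictionary layout, the `instantiate` function, and the placeholder instances `HUMAN-1` and `TEAM-2` are hypothetical, and the real OntoSem analyzer does considerably more (disambiguation, constraint checking).

```python
# Hypothetical in-memory form of the watch-v1 entry; field names follow the slide.
WATCH_V1 = {
    "synonyms": ["observe"],
    "syn-struc": {"subject": "var1", "v": "var0", "directobject": "var2"},
    "sem-struc": {"concept": "VOLUNTARY-VISUAL-EVENT",
                  "agent": "var1", "theme": "var2"},
}


def instantiate(entry, parse):
    """Fill the sem-struc case roles from a parse.

    `parse` maps syn-struc function names (e.g. "subject") to the meaning
    already built for the constituent that fills that function.
    """
    # Bind each variable to whatever filled its syntactic slot.
    bindings = {var: parse[func]
                for func, var in entry["syn-struc"].items() if func in parse}
    sem = entry["sem-struc"]
    frame = {"concept": sem["concept"]}
    for role, var in sem.items():
        if role != "concept" and var in bindings:
            frame[role] = bindings[var]
    return frame


# "He's watching the demolition team." with made-up placeholder instances.
frame = instantiate(WATCH_V1, {"subject": "HUMAN-1", "v": "watch",
                               "directobject": "TEAM-2"})
print(frame)
# {'concept': 'VOLUNTARY-VISUAL-EVENT', 'agent': 'HUMAN-1', 'theme': 'TEAM-2'}
```

Because only the syn-struc mentions syntactic functions, the sem-struc half of the entry is the part that can be carried over to another language unchanged.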

12
Methods of Expressing Word / Phrase Meaning in
OntoSem
  • 1. mapping directly to an ontological concept (dog
    maps to DOG)
  • 2. mapping to an ontological concept with
    modification by properties, e.g.,
      • Zionist maps to POLITICAL-ROLE
          AGENT-OF SUPPORT
            THEME Israel
      • asphalt (v.) maps to COVER
          INSTRUMENT ASPHALT
      • recall (v., as in "They recalled the high chairs")
        maps to RETURN-OBJECT
          THEME ARTIFACT, INGESTIBLE, MATERIAL
          CAUSED-BY FOR-PROFIT-CORPORATION

13
Methods of Expressing Meaning (Cont.)
  • 3. using modality or aspect
      (might-aux1
        (def "expresses the possibility of something happening - epistemic .5")
        (ex "he might come over")
        (syn-struc
          ((subject ((root var1) (cat n)))
           (root var0) (cat v)
           (inf-cl ((root var2) (cat v)))))
        (sem-struc
          (var2
            (epistemic .5)
            (agent (value var1))))
        (meaning-procedure
          (fix-case-role (value var1) (value var2))))

14
Methods of Expressing Meaning (Cont.)
  • 4. using our non-ontological methods of
    expressing time, sets, etc.
      (yesterday-adv1
        (def "the day before the speech time")
        (ex "he admitted that yesterday")
        (syn-struc
          ((root var1) (cat v)
           (mods ((root var0) (cat adv) (type pre-verb-post-clause)))))
        (sem-struc
          (var1
            (time (combine-time (find-anchor-time) (day 1) before)))))
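
The time expression in this sem-struc reads as "one day before the anchor time". A rough Python analogue is sketched below, assuming the anchor (speech) time has already been resolved to a calendar date. The Python `combine_time` function is a stand-in named after the slide's `combine-time`; the actual OntoSem procedure presumably handles richer time expressions than whole days.

```python
from datetime import date, timedelta


def combine_time(anchor, days, direction):
    """Shift the anchor date by `days` in the given direction ('before' or 'after')."""
    if direction not in ("before", "after"):
        raise ValueError(f"unknown direction: {direction!r}")
    delta = timedelta(days=days)
    return anchor - delta if direction == "before" else anchor + delta


# Assumed anchor: the date on which the text was uttered or written.
anchor = date(2003, 3, 1)
print(combine_time(anchor, 1, "before"))  # 2003-02-28
```

Expressing "yesterday" relative to an anchor rather than as a concept is what keeps the entry valid for any text, whatever its date.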

15
Methods of Expressing Meaning (Cont.)
  • 5. calling a meaning procedure
      (she-pro1
        (def "the pronoun 'she'")
        (ex "she kicked the can.")
        (syn-struc
          ((root var0) (cat n) (type pro)))
        (sem-struc
          (animal))
        (meaning-procedure
          (trigger-reference
            (person third) (number sing) (gender female)
            (same-clause .1) (preceding-clause .7)
            (pre-preceding-clause .5) (preceding-sent .5)
            (sentence-minus-2 .2) (sentence-minus-3 .1)
            (para-break .5) (repeat-collocation .7)
            (synonym-collocation .6) (agent-theme .8)
            (pp-embedded .2) (function-match .7) (coord .7))))

16
Why Port Semantic Representations?
  • Once a semantic representation of a word sense
    has been created (along with the concurrent
    extension, if necessary, of other resources), it
    not only can but should be used to represent the
    same sense in any language. Why?
  • Time (we want to save it!)
  • Paraphrase (weapons of mass destruction: a)
    weapons that can kill more than people; b)
    nuclear and/or bio weapons)
  • Options for Resource Development (e.g., one can
    ontologize a fine-grained notion or describe its
    properties in the lexicon; either is fine)

17
A Driving Principle: The Principle of Practical
Effability
  • What can be expressed in one language can be
    expressed in every language, so the sem-strucs,
    by definition, must be portable (apart from their
    variables)

18
What Can Be Involved in Editing a Word/Phrase
Sense for a New L (on the example of Polish)
  • No modification required, just a new translation:
    dog > pies. This may rely on global syntactic
    rules for parameterization; e.g., subject in
    English maps to Nominative case in Polish as a
    default, so a basic transitive frame in English
    can generally map to a basic transitive frame in
    Polish
  • Manual syntactic modification required; e.g., an
    object in Polish can have quirky case-marking; an
    xcomp in English might be realized as a comp in
    Polish; a category that is optional in English
    might be required in Polish or vice versa; etc.
  • Linking of variables might be different in
    different languages
  • Semantic distinctions in one language that are
    missing in another (e.g., the English hand/arm
    distinction is missing in many languages)
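
The first two cases above can be sketched as a small porting routine: the sem-struc is copied as is, the syn-struc is re-parameterized by default rules, and an acquirer can override the defaults where Polish is quirky. Everything here is a hypothetical illustration: the entry layout, the `port_sense` function, and the `DEFAULT_CASE` table are assumptions, and only the subject-to-nominative default is stated on the slide (the accusative default for direct objects is added for the example).

```python
import copy

# Default English-to-Polish parameterization. Only subject -> nominative comes
# from the slide; directobject -> accusative is an assumed default.
DEFAULT_CASE = {"subject": "nominative", "directobject": "accusative"}

ENGLISH_WATCH = {
    "head": "watch",
    "syn-struc": {"subject": "var1", "v": "var0", "directobject": "var2"},
    "sem-struc": {"concept": "VOLUNTARY-VISUAL-EVENT",
                  "agent": "var1", "theme": "var2"},
}


def port_sense(entry, translation, case_overrides=None):
    """Copy an entry for a new language: keep sem-struc, re-parameterize syn-struc."""
    ported = copy.deepcopy(entry)
    ported["head"] = translation
    # Manual overrides (quirky case-marking) take precedence over the defaults.
    cases = {**DEFAULT_CASE, **(case_overrides or {})}
    ported["syn-struc"] = {
        func: ({"var": var, "case": cases[func]} if func in cases else {"var": var})
        for func, var in entry["syn-struc"].items()
    }
    return ported


polish = port_sense(ENGLISH_WATCH, "obserwować")
print(polish["syn-struc"]["subject"])                      # {'var': 'var1', 'case': 'nominative'}
print(polish["sem-struc"] == ENGLISH_WATCH["sem-struc"])   # True
```

A verb whose object takes a non-default case would be ported with, for example, `case_overrides={"directobject": "genitive"}`; differences in variable linking or in semantic distinctions still need manual editing and are not covered by this sketch.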

19
From Porting Senses to Porting a Whole Lexicon
  • Porting senses is fairly straightforward: one
    can provide as many translations of a given sense
    as needed using the synonyms field, or a new
    entry if there are any syntactic differences or
    semantic nuances to be captured; translations can
    be words or phrases of any complexity
  • However, porting a whole lexicon introduces
    difficulties of a more organizational nature, to
    which we turn now

20
Organizational Issues
  • Leave the base lexicon as is or attempt to
    improve its quality while building L2 (e.g., add
    more distinguishing properties and values)?
  • Expand lexicons simultaneously (e.g., add more
    senses to English words and their corresponding
    L2 equivalents; add more words in general)?
  • Be driven by correspondences in head words or
    simply by sem-struc meanings? (E.g., all English
    senses of table will be in one head entry; should
    all senses of all L2 translations of table be
    handled at once during L2 acquisition?)
  • To what extent should the regular acquisition
    process, including ontology supplementation, be
    carried out on L2?
  • Automate? If so, how and how much?

21
Insight from an Experiment
  • We attempted to do a fast port of part of the
    English lexicon to Polish to determine time
    savings, problems, automation potential
  • The experiment was carried out by one
    bilingual English and Polish speaker working for
    about a week.
  • A portion of the lexicon was ported,
    problems/successes were reported.

22
As Regards Automation
  • If an on-line bilingual English-L2 lexicon
    has one sense for a given word form, it is
    reasonable to assume identity (big time savings
    for technical terms, real-world objects (frying
    pan), etc.). Presenting results in a quickly
    inspectable format would help.
  • If there is a many-to-one correspondence, we would
    need an interface to really exploit time savings;
    otherwise, it is somewhat difficult to keep track
    of senses.
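
The split described above can be sketched as a triage step over a bilingual word list: entries with a single translation are proposed automatically for quick inspection, and the rest are queued for an acquirer. The `triage` function and the sample data are illustrative assumptions, not part of the experiment reported here.

```python
def triage(bilingual):
    """Split word forms into auto-portable and manual-review groups.

    bilingual: word form -> list of L2 senses/translations, as found in an
    on-line bilingual lexicon (assumed input format).
    """
    auto, manual = {}, []
    for word, translations in sorted(bilingual.items()):
        if len(translations) == 1:
            # One sense only: assume identity and propose the translation.
            auto[word] = translations[0]
        else:
            # Several (or no) senses: a person has to match senses by hand.
            manual.append(word)
    return auto, manual


# Illustrative English-Polish sample, not data from the experiment.
sample = {
    "frying pan": ["patelnia"],
    "table": ["stół", "tabela"],
    "dog": ["pies"],
}
auto, manual = triage(sample)
print(auto)    # {'dog': 'pies', 'frying pan': 'patelnia'}
print(manual)  # ['table']
```

The automatically proposed pairs still need a human glance, which is why a quickly inspectable presentation matters.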

23
As Regards Content of The English and L2 Lexicons
  • How the lexicons would best develop
    simultaneously depends in large part upon the
    capabilities of acquirers, their training, etc.
    Driving English acquisition from the L2 side is
    perfectly fine, since the sem-struc is language
    independent.
  • Need to divide tasks according to their
    difficulty: a relatively untrained informant
    could do simple nouns, whereas a trained
    informant is necessary for polysemous verbs,
    phrasals, etc.
  • Time savings clearly depend upon organization of
    efforts: getting bogged down in simultaneous
    development of multiple lexicons and the ontology
    is a real risk

24
For Comparison: The SIMPLE Project
  • Goal: develop 10K-sense semantic lexicons for 12
    European Union languages (the earlier PAROLE
    project developed 20K-sense morphological and
    syntactic ones)
  • Each lexicon is built separately; the word list
    for each L is based on corpus evidence for that L
  • Each L must cover a given inventory of high-level
    concepts in EuroWordNet to ensure some overlap
  • Each L uses the same inventory of template
    types, which indicates which types of properties
    should be described for different types of words

25
SIMPLE Aims
  • Apparently, translation is the main aim, since
    semantic description is shallow (wouldn't support
    reasoning)
  • Semantic description is limited to:
      • mapping to EuroWordNet concepts (which are
        iconic, not descriptive; few properties are used)
      • using a slightly expanded version of qualia,
        which are properties that support reasoning about
        generative properties of words (Pustejovsky, The
        Generative Lexicon)
  • Qualia represent just the generative corner of
    lexical description; they do not provide breadth
    of description

26
Generalized vs. Application-Specific Resources
  • "A dichotomy at stake here is the one between
    generality of a LR (lexical resource) vs.
    usefulness for applications. In principle, only
    when we know the actual specific use we intend to
    do of a LR can we build the very best LR for
    that use, but this has proved to be too expensive
    and not realistic. In practice, however, there
    exists a large core of information that can be
    shared by many applicative uses, and this leads
    to the concept of generic LR, which is at the
    basis for the EAGLES initiative and of the
    PAROLE/SIMPLE projects, to be then enhanced and
    tuned with other means" (Syntactic/semantic
    lexicons for the European languages: towards a
    standardised infrastructure, Calzolari 1999: 42)
  • However, rebuilding resources for every
    application is not cost effective either, which
    brings us back to the approach we're taking in
    OntoSem