Comments on Guillaume Pitel (PowerPoint presentation)

Slides: 20
Provided by: gerhardf


1
Comments on Guillaume Pitel: Using bilingual LSA
for FrameNet annotation of French text from
generic resources
  • Gerd Fliedner
  • Computational Linguistics
  • Saarland University

2
Comments/Thoughts
  • Useful approach, as it can potentially speed up
    and support annotation, and thus the building of
    new FrameNets.
  • Uses only a few resources, and is therefore
    extendable to other language pairs (in principle).
  • So far, first experiments only.

3
Multilingual FrameNets
  • Having a FrameNet for as many languages as
    possible would be nice.
  • There are numerous monolingual and cross-lingual
    applications.
  • BUT: building a FrameNet is knowledge- and
    labour-intensive work, and thus expensive;
    funding may be a problem.

4
Bootstrapping Multilingual FNs
  • (Re-)use as much knowledge from existing
    FrameNets as possible.
  • Ease the task of annotators by making useful
    suggestions.
  • Use automatic methods for knowledge acquisition.
[Figure labels: "LSA", "Swamp of Language"]
5
More than one strand of hair may be needed
  • By the way: Change_hair_configuration is not yet
    in FN.

6
FR.FrameNet
  • In FR.FrameNet, several methods have been
    explored that could reduce the time and cost of
    building new FrameNets.
  • Tasks explored:
    • Lexical Unit (Frame Evoking Element) transfer
    • Identifying Frame Elements
    • Disambiguating LU-Frame assignment

7
Lexical Unit Transfer
  • Can be seen as the task of finding and
    disambiguating translation pairs (links to
    Machine Translation, lexicography).
  • Extract disambiguated translations from an existing
    cluster-based dictionary.
  • Some manual annotation is required, but this is a
    relatively fast and simple way of acquiring a solid
    core lexicon.
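The transfer step can be pictured with a small sketch. Everything in it is hypothetical: the dictionary layout (English lemma.pos → named sense clusters → French translations), the cluster names, and the `MANUAL_SENSE_CHOICE` table, which stands in for the manual annotation step. It is not the actual FR.FrameNet resource or procedure, only an illustration of "disambiguate once per (LU, frame), then copy the translations of the chosen cluster".

```python
# Hypothetical cluster-based bilingual dictionary: each English lemma.pos maps to
# named sense clusters (names made up), each holding French translations.
DICTIONARY = {
    "drink.v": {
        "ingest_liquid": ["boire"],
        "toast": ["trinquer"],
    },
    "say.v": {
        "state": ["dire", "déclarer", "affirmer"],
        "recite": ["réciter"],
    },
}

# English LUs with the frame they evoke in Berkeley FrameNet.
ENGLISH_LUS = [("drink.v", "Ingestion"), ("say.v", "Statement")]

# Hypothetical manual step: an annotator links each (LU, frame) pair to the one
# dictionary sense cluster that matches the frame.
MANUAL_SENSE_CHOICE = {
    ("drink.v", "Ingestion"): "ingest_liquid",
    ("say.v", "Statement"): "state",
}


def transfer_lexical_units(english_lus, dictionary, sense_choice):
    """Return candidate French LUs as (lemma.pos, frame) pairs."""
    french_lus = []
    for lu, frame in english_lus:
        cluster = sense_choice.get((lu, frame))
        if cluster is None:
            continue  # not disambiguated yet: leave it for the annotators
        pos = lu.rsplit(".", 1)[1]
        for translation in dictionary.get(lu, {}).get(cluster, []):
            french_lus.append((f"{translation}.{pos}", frame))
    return french_lus


for french_lu, frame in transfer_lexical_units(ENGLISH_LUS, DICTIONARY, MANUAL_SENSE_CHOICE):
    print(frame, french_lu)
```

Translations in the unchosen clusters (here "trinquer", "réciter") are never proposed for the frame, which is where the manual disambiguation pays off.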

8
Manual Filtering
  • Is frame information currently used for
    disambiguation?
  • How is the manual annotation done? Sounds like
    rules of thumb. Guidelines?
  • How is it evaluated?

9
Resources needed
  • Lexical unit transfer:
    • English FrameNet ?
    • Large-coverage bilingual dictionary
      (source→target language, ideally
      sense-disambiguated) ?
    • Corpus in target language ?
    • (Some) manual annotation ?
  • (Read: ? OK, ? may be a problem for small
    languages, ? may be a problem for small projects)

10
Lexical Unit Transfer: Other Possibilities
  • Using human-readable resources:
    • Use existing dictionaries.
    • Problem: disambiguation.
  • Using machine-readable resources:
    • Use EuroWordNet or similar.
    • Problem again: disambiguation.
  • Use parallel corpora:
    • Padó & Lapata, AAAI-05

11
Identify Frame Elements
  • Core idea: the same semantic restrictions/preferences
    should apply to Frame Elements in source and
    target language.
  • How can these semantic preferences be learned?
  • First step: learn cross-lingual semantic
    similarity.
  • Second step: identify Frame Elements in one
    language and transfer them.

12
Bilingual Infomap/Latent Semantic Analysis (LSA)
  • Originally used for cross-lingual information
    retrieval.
  • Use a bilingual, parallel core corpus.
  • Parallel documents/paragraphs/… are put together
    and count as one text.
  • Build a vector space.
  • Monolingual and cross-lingual similarities will
    'fall out'.
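A minimal sketch of the bilingual LSA idea, under several assumptions: numpy is available, the four-paragraph parallel corpus is made up, raw counts are used (real setups normally apply a weighting such as log-entropy or tf-idf), and a plain truncated SVD stands in for the Infomap tooling. As on the slide, each aligned English/French pair counts as one text, so words that co-occur across the pair end up close in the reduced space. The `en:`/`fr:` prefixes and `k=2` are illustrative choices.

```python
import numpy as np

# Hypothetical toy parallel corpus: each tuple is one aligned English/French paragraph.
PARALLEL = [
    ("the cat drinks milk", "le chat boit du lait"),
    ("the dog drinks water", "le chien boit de l eau"),
    ("the cat sleeps", "le chat dort"),
    ("the dog sleeps", "le chien dort"),
]


def build_bilingual_matrix(pairs):
    """Term-by-document count matrix in which each aligned pair is ONE document.

    Tokens get a language prefix ("en:", "fr:") so the two vocabularies stay apart.
    """
    docs = [["en:" + w for w in en.split()] + ["fr:" + w for w in fr.split()]
            for en, fr in pairs]
    vocab = sorted({t for d in docs for t in d})
    index = {t: i for i, t in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for tok in doc:
            counts[index[tok], j] += 1
    return vocab, index, counts


def lsa_term_vectors(counts, k):
    """Truncated SVD: keep the k largest singular values; rows are term vectors."""
    u, s, _vt = np.linalg.svd(counts, full_matrices=False)
    return u[:, :k] * s[:k]


def cosine(a, b):
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    return float(a @ b / (na * nb)) if na > 0 and nb > 0 else 0.0


def nearest(word, vectors, vocab, index, prefix="fr:", n=3):
    """Rank terms with the given language prefix by cosine similarity to `word`."""
    q = vectors[index[word]]
    scored = [(cosine(q, vectors[i]), t) for i, t in enumerate(vocab) if t.startswith(prefix)]
    scored.sort(reverse=True)
    return [t for _score, t in scored[:n]]


vocab, index, counts = build_bilingual_matrix(PARALLEL)
vectors = lsa_term_vectors(counts, k=2)
print(nearest("en:cat", vectors, vocab, index))               # cross-lingual neighbours
print(nearest("en:cat", vectors, vocab, index, prefix="en:"))  # monolingual neighbours
```

Because "cat" and "chat" occur in exactly the same aligned paragraphs, their count rows are identical and their cosine in the reduced space is 1. On a corpus this small the choice of `k` is purely illustrative.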

13
Identify and transfer Frame Elements
  • Use the Berkeley FrameNet corpus (English) as
    training corpus: Frame Elements (content words +
    POS) from annotated examples are used as the
    starting point.
  • Use the semantic space (generated by LSA) to find
    good (hopefully semantically related) translation
    candidates for the words making up a Frame Element.
  • To identify a French Frame Element: find the
    closest vector.
  • Several good examples, some less good ones.
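The "find closest vector" step amounts to a nearest-neighbour lookup. In this sketch the three-dimensional vectors, the English fillers, and the French words are all hypothetical (real vectors would come from the bilingual LSA space and have far more dimensions); the FE names are those of FrameNet's Ingestion frame. A French word simply inherits the Frame Element of the most similar English filler.

```python
import math

# Hypothetical LSA vectors for English Frame Element fillers taken from annotated
# examples of the Ingestion frame.
ENGLISH_FILLERS = [
    ("milk", "Ingestibles", (0.90, 0.10, 0.00)),
    ("bread", "Ingestibles", (0.10, 0.90, 0.00)),
    ("cat", "Ingestor", (0.00, 0.10, 0.99)),
    ("dog", "Ingestor", (0.05, 0.00, 1.00)),
]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def closest_frame_element(french_vector, fillers):
    """Label a French word with the FE of the closest English filler vector."""
    word, fe, vec = max(fillers, key=lambda f: cosine(french_vector, f[2]))
    return fe, word, cosine(french_vector, vec)


print(closest_frame_element((0.92, 0.08, 0.05), ENGLISH_FILLERS))  # French "lait"
print(closest_frame_element((0.00, 0.12, 0.98), ENGLISH_FILLERS))  # French "chat"
```

A single nearest neighbour is sensitive to noisy fillers, which is one way to read the "some less good ones" results.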

14
Add Clustering
  • Inspection of the data shows that a Frame Element
    may have semantically different fillers.
  • Thus, clustering of LSA vectors seems promising.
  • Identifying French Frame Elements: instead of
    finding the closest vector, check whether the word
    vector belongs to one of the clusters.
  • Problems: identifying the optimal number of
    clusters, sparse data, …
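To see why clustering helps when one Frame Element has semantically different fillers, the sketch below runs a plain k-means over hypothetical filler vectors of Ingestion.Ingestibles (drinks vs. solid food) and accepts a candidate if it is close to any cluster centroid. The vectors, the deterministic first-k initialisation, and the 0.9 cosine threshold are all assumptions made for illustration. It is not the clustering method used in the talk.

```python
import math

# Hypothetical LSA vectors for fillers of one FE that form two semantic groups.
FILLERS = {
    "milk": (0.90, 0.10, 0.00),
    "bread": (0.10, 0.90, 0.00),
    "water": (0.95, 0.05, 0.10),
    "cheese": (0.05, 0.95, 0.10),
}


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def kmeans(points, k, iterations=20):
    """Plain Euclidean k-means, initialised with the first k points for determinism."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return centroids


def belongs_to_fe(vector, centroids, threshold=0.9):
    """Accept a word if it is close enough to ANY cluster centroid of the FE."""
    return any(cosine(vector, c) >= threshold for c in centroids)


centroids = kmeans(list(FILLERS.values()), k=2)
single_centroid = kmeans(list(FILLERS.values()), k=1)

lait = (0.92, 0.08, 0.05)  # hypothetical vector for French "lait"
chat = (0.00, 0.10, 0.99)  # hypothetical vector for French "chat"
print(belongs_to_fe(lait, centroids), belongs_to_fe(lait, single_centroid))
print(belongs_to_fe(chat, centroids))
```

With a single prototype the mean lies between drinks and food, so "lait" falls below the threshold; with two clusters it is accepted. Picking `k` and the threshold is exactly the open problem listed on the slide.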

15
Resources Needed
  • Frame Element identification/transfer:
    • English FrameNet ?
    • Parallel corpus in source/target language ?
    • Additional corpora in both languages ?
    • Corpus in target language ?
    • (Tagger in source/target language ?)
    • (Not so little) manual annotation ?
  • (Read: ? OK, ? may be a problem for small
    languages, ? may be a problem for small projects)

16
Use information from WordNet?
  • For French:
    • Use (Euro)WordNet alternatively/in addition.
    • Use EuroWordNet links (translations).
    • Use WordNet to expand queries.
    • Use similarity measures such as Jiang & Conrath 97.
  • For other languages that do not have a WordNet: ???
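For reference, the Jiang & Conrath (1997) measure is a distance built from information content: dist(c1, c2) = IC(c1) + IC(c2) − 2·IC(lcs(c1, c2)), where IC(c) = −log p(c) and lcs is the lowest common subsumer. The sketch uses a made-up tree-shaped taxonomy and made-up probabilities. A real setup would take hypernyms and corpus-derived probabilities from (Euro)WordNet; NLTK's WordNet interface, for instance, offers a similarity variant as `jcn_similarity`.

```python
import math

# Hypothetical toy noun taxonomy (child -> parent).
PARENT = {
    "cat": "feline", "feline": "animal",
    "dog": "canine", "canine": "animal",
    "milk": "beverage", "beverage": "food",
    "animal": "entity", "food": "entity",
}

# Hypothetical concept probabilities p(c); in practice these are estimated from
# corpus counts propagated up the hierarchy, so p(entity) = 1.
PROB = {
    "entity": 1.0, "animal": 0.4, "food": 0.3, "beverage": 0.1,
    "feline": 0.05, "canine": 0.06, "cat": 0.04, "dog": 0.05, "milk": 0.03,
}


def information_content(concept):
    return -math.log(PROB[concept])


def ancestors(concept):
    """The concept itself followed by its hypernyms up to the root."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def lowest_common_subsumer(c1, c2):
    # In a tree, the first ancestor of c1 that also subsumes c2 is the LCS.
    others = set(ancestors(c2))
    return next(a for a in ancestors(c1) if a in others)


def jiang_conrath_distance(c1, c2):
    lcs = lowest_common_subsumer(c1, c2)
    return (information_content(c1) + information_content(c2)
            - 2 * information_content(lcs))


print(jiang_conrath_distance("cat", "dog"))   # share the informative subsumer "animal"
print(jiang_conrath_distance("cat", "milk"))  # share only the root "entity"
```

WordNet allows multiple inheritance, so a real implementation has to choose the most informative of several common subsumers; the tree assumption here sidesteps that.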

17
Syntax
  • Certain Frame Elements are semantically totally
    heterogeneous, but syntactically (relatively)
    easy to identify.
  • For example: Statement.Message (Engl. say that
    X, Fr. dire que X).
  • Problem: semantic transfer can be learned using
    LSA, syntactic transfer (that→que) cannot.
  • Could (partially) parsed parallel corpora be used
    to learn syntactic transfer? Can syntactic and
    semantic Frame Element identification be
    combined? Alternatively: can syntactic Frame
    Elements be recognised and left to annotators
    altogether?
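The Statement.Message case shows how a syntactically marked Frame Element can be found without any semantics. The sketch below uses crude, hypothetical surface patterns (a few forms of "say"/"dire" followed by the complementizer "that" or "que"/"qu'") as a stand-in for the (partial) parser the slide asks about; the pattern table and function name are illustrative only.

```python
import re

# Hypothetical surface patterns for Statement.Message: the clause introduced by
# the complementizer after a form of the speech verb. A parser would be more robust.
MESSAGE_PATTERNS = {
    "en": re.compile(r"\b(?:say|says|said|saying)\s+that\s+(?P<message>.+?)[.!?]?$"),
    "fr": re.compile(r"\b(?:dire|dit|dis|disent|disait)\s+(?:que\s+|qu')(?P<message>.+?)[.!?]?$"),
}


def find_message(lang, sentence):
    """Return the candidate Message span, or None if no pattern matches."""
    match = MESSAGE_PATTERNS[lang].search(sentence.strip())
    return match.group("message") if match else None


print(find_message("en", "She said that the cat was asleep."))
print(find_message("fr", "Elle a dit que le chat dormait."))
print(find_message("fr", "Il dit qu'il pleut."))
```

The two pattern entries are, in effect, a hand-written that→que transfer rule; learning such correspondences from parsed parallel text is the open question on the slide.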

18
Frame Element Preferences
  • Knowing more (explicitly) about Frame Elements
    would be very helpful for:
    • automatic Frame/Frame Element assignment,
    • manual annotation/guidelines,
    • transfer to other languages.
  • Encoding preferences as links within FrameNet.
  • Encoding preferences as links to external
    resources (WordNet? SUMO/MILO?), cf. work by
    Aljoscha Burchardt.
  • Cf. yesterday's talk by Michael Ellsworth.

19
Conclusions
  • (Some) more research is required.
  • Optimising the annotation process is probably very
    important, e.g.:
    • Use several cycles (start with more certain
      cases, re-train with the additional data, …).
    • Integrate different strategies, e.g. syntax and
      semantics.
  • Which decisions can be made automatically? Can
    suggestions be made? How good are they? Recall
    vs. precision optimisation.
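The "several cycles" suggestion can be sketched as a simple self-training loop. The nearest-centroid classifier, the use of cosine similarity as the confidence score, the seed vectors, and the French candidates with their vectors are all hypothetical. Each cycle labels only the candidates that clear the threshold, adds them to the training data, and rebuilds the centroids; whatever never clears the threshold is left for human annotators.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroid(vectors):
    return [sum(xs) / len(vectors) for xs in zip(*vectors)]


def annotate_in_cycles(seed, candidates, threshold, max_cycles=10):
    """Self-training sketch: returns {word: (fe, cycle)} and the unlabelled leftovers."""
    training = {fe: list(vectors) for fe, vectors in seed.items()}
    pending = dict(candidates)
    assigned = {}
    for cycle in range(1, max_cycles + 1):
        centroids = {fe: centroid(vs) for fe, vs in training.items()}
        confident = {}
        for word, vec in pending.items():
            fe, score = max(((fe, cosine(vec, c)) for fe, c in centroids.items()),
                            key=lambda pair: pair[1])
            if score >= threshold:
                confident[word] = fe
        if not confident:
            break  # nothing certain enough left: hand the rest to annotators
        for word, fe in confident.items():
            training[fe].append(pending.pop(word))  # re-train with the new data
            assigned[word] = (fe, cycle)
    return assigned, sorted(pending)


SEED = {"Ingestor": [(0.0, 0.1, 1.0)], "Ingestibles": [(1.0, 0.1, 0.0)]}
CANDIDATES = {
    "chat": (0.05, 0.10, 0.98),
    "lait": (0.90, 0.30, 0.00),
    "yaourt": (0.80, 0.45, 0.00),
    "fromage": (0.75, 0.60, 0.00),
    "demain": (0.30, 0.00, 0.30),
}

assigned, leftovers = annotate_in_cycles(SEED, CANDIDATES, threshold=0.9)
print(assigned)
print(leftovers)
```

Here "fromage" is only labelled in the second cycle, after "lait" and "yaourt" have pulled the Ingestibles centroid towards it. The threshold is the recall vs. precision knob: raising it leaves more cases to the annotators, lowering it labels more automatically at a higher risk of errors.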