1
CPSC 503 Computational Linguistics
  • Word-Sense Disambiguation
  • Information Retrieval
  • Lecture 18
  • Giuseppe Carenini

2
Semantics Summary
  • What meaning is and how to represent it
  • How to map sentences into their meaning
  • Meaning of individual words
  • Tasks
  • Information Extraction
  • Word Sense Disambiguation
  • Information Retrieval

3
Today 24/3
  • Word-Sense Disambiguation
  • Machine Learning Approaches
  • Information Retrieval (ad hoc)

4
Supervised ML Approaches to WSD
5
Training Data Example
((word, context) → sense)i
  • ..after the soup she had bass with a big salad
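A minimal illustration (made-up data, not the course corpus) of what such labeled training instances might look like in code:

```python
# Hedged sketch: labeled WSD training data as (word, context, sense) triples.
# The sense labels "fish" / "music" are illustrative, not WordNet sense IDs.
training_data = [
    ("bass", "..after the soup she had bass with a big salad", "fish"),
    ("bass", "an electric guitar and bass player stand off to one side", "music"),
]

for word, context, sense in training_data:
    print(f"({word!r}, {context!r}) -> {sense!r}")
```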

6
WordNet Bass: music vs. fish
  • The noun "bass" has 8 senses in WordNet:
  • bass - (the lowest part of the musical range)
  • bass, bass part - (the lowest part in polyphonic
    music)
  • bass, basso - (an adult male singer with …)
  • sea bass, bass - (flesh of lean-fleshed saltwater
    fish of the family Serranidae)
  • freshwater bass, bass - (any of various North
    American lean-fleshed …)
  • bass, bass voice, basso - (the lowest adult male
    singing voice)
  • bass - (the member with the lowest range of a
    family of musical instruments)
  • bass - (nontechnical name for any of numerous
    edible marine and freshwater spiny-finned
    fishes)

7
Representations for Context
  • GOAL: an informative characterization of the window
    of text surrounding the target word
  • TASK: select relevant linguistic information and
    encode it as a feature vector

8
Relevant Linguistic Information(1)
  • Collocational info about the words that appear
    in specific positions to the right and left of
    the target word

Typically words and their POS:
[word in position −n, POS in position −n, …, word in position +n, POS in position +n]
Assume a window of +/− 2 around the target (see the extraction sketch below)
  • Example text (WSJ)
  • An electric guitar and bass player stand off to
    one side not really part of the scene,

guitar, NN, and, CJC, player, NN, stand, VVB
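A minimal sketch of how such collocational features could be extracted (it assumes the sentence is already POS-tagged; the tags other than NN, CJC, VVB are guesses for illustration):

```python
# Collocational features: words and POS tags at fixed offsets around the target.
def collocational_features(tagged_tokens, target_index, window=2):
    """Return [word-2, POS-2, word-1, POS-1, word+1, POS+1, word+2, POS+2]."""
    features = []
    for offset in range(-window, window + 1):
        if offset == 0:
            continue  # skip the target word itself
        i = target_index + offset
        word, pos = tagged_tokens[i] if 0 <= i < len(tagged_tokens) else ("<PAD>", "<PAD>")
        features.extend([word, pos])
    return features

# POS tags other than NN, CJC, VVB are assumptions for illustration.
tagged = [("an", "AT0"), ("electric", "AJ0"), ("guitar", "NN"), ("and", "CJC"),
          ("bass", "NN"), ("player", "NN"), ("stand", "VVB"), ("off", "AVP")]
print(collocational_features(tagged, target_index=4))
# ['guitar', 'NN', 'and', 'CJC', 'player', 'NN', 'stand', 'VVB']
```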
9
Relevant Linguistic Information(2)
  • Co-occurrence info about the words that occur
    anywhere in the window regardless of position
  • Find the k content words that most frequently
    co-occur with the target in the corpus (for bass:
    fishing, big, sound, player, fly, …, guitar, band)

Vector for one case: [c(fishing), c(big), c(sound), c(player), c(fly), …, c(guitar), c(band)] (see the sketch below)
  • Example text (WSJ)
  • An electric guitar and bass player stand off to
    one side not really part of the scene,

0,0,0,1,0,0,0,0,0,0,1,0
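A small sketch of how that binary co-occurrence vector could be built; the vocabulary is padded with assumed words to reach k = 12, matching the length of the example vector:

```python
# k content words that most frequently co-occur with "bass" in the corpus;
# the middle five entries are placeholders assumed here for illustration.
cooccurrence_vocab = ["fishing", "big", "sound", "player", "fly",
                      "rod", "pound", "double", "runs", "playing",
                      "guitar", "band"]

def cooccurrence_vector(window_words, vocab=cooccurrence_vocab):
    window = {w.lower() for w in window_words}
    return [1 if term in window else 0 for term in vocab]

context = ("An electric guitar and bass player stand off to one side "
           "not really part of the scene").split()
print(cooccurrence_vector(context))   # [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0]
```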
10
ML for Classifiers
  • Training Data
  • Co-occurrence
  • Collocational
  • Naïve Bayes
  • Decision lists
  • Decision trees
  • Neural nets
  • Support vector machines
  • Nearest neighbor methods

[Diagram: training data → machine learning algorithm → classifier]
11
Naïve Bayes
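The slide's formula did not survive transcription; the standard Naïve Bayes decision rule for WSD, which the slide presumably showed in some notation, picks the sense that maximizes the sense prior times the probability of the observed context features:

    \hat{s} = \mathrm{argmax}_{s \in S} \; P(s) \prod_{j=1}^{n} P(f_j \mid s)

where S is the set of senses of the target word, f_1, …, f_n are the context features (collocational and/or co-occurrence, as above), and the probabilities are estimated from the labeled training data, typically with smoothing.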
12
Naïve Bayes Evaluation
  • Experiment comparing different classifiers
    [Mooney 96]
  • Naïve Bayes and Neural Network achieved the highest
    performance
  • 73% accuracy in assigning one of six senses to line

13
Bootstrapping
  • What if you don't have enough data to train a
    system?

14
Bootstrapping: how to pick the seeds
  • Hand-labeling
  • Likely correct
  • Likely to be prototypical
  • One sense per collocation: search for words or
    phrases strongly associated with the target senses,
    then label automatically.
  • E.g., bass:
  • play is strongly associated with the music sense,
    whereas fish is strongly associated with the fish sense

15
Unsupervised Methods [Schütze 98]
[Diagram: training data (word vector)1 … (word vector)n → machine learning (clustering) → K clusters ci]
16
Agglomerative Clustering
  • Assign each instance to its own cluster
  • Repeat
  • Merge the two clusters that are most similar
  • Until (the specified number of clusters is reached)
  • If there are too many training instances → use random
    sampling (a minimal sketch of the loop follows below)
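A naive Python sketch of the loop just described; the similarity function, the average-link merging criterion, and k are assumptions for illustration:

```python
# Naive agglomerative clustering: repeatedly merge the most similar pair
# of clusters until only k clusters remain (O(n^3); fine for small data).
def agglomerative_cluster(instances, similarity, k):
    clusters = [[x] for x in instances]          # each instance starts in its own cluster
    while len(clusters) > k:
        best_pair, best_sim = None, float("-inf")
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # average-link similarity between clusters i and j
                sim = (sum(similarity(a, b) for a in clusters[i] for b in clusters[j])
                       / (len(clusters[i]) * len(clusters[j])))
                if sim > best_sim:
                    best_pair, best_sim = (i, j), sim
        i, j = best_pair
        clusters[i].extend(clusters.pop(j))      # merge the most similar pair
    return clusters
```

For WSD the instances would be context word-vectors and the similarity would typically be the cosine measure; each resulting cluster is then interpreted as one sense.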

17
Problems
  • Given these general ML approaches, how many
    classifiers do I need to perform WSD robustly?
  • One for each ambiguous word in the language
  • How do you decide what set of tags/labels/senses
    to use for a given word?
  • Depends on the application

18
Recent Work on WSD
  • Word Sense Disambiguation Recent Successes and
    Future Directions
  • A SIGLEX/SENSEVAL Workshop at ACL 2002 University
    of Pennsylvania

19
Today 24/3
  • Word-Sense Disambiguation
  • Machine Learning Approaches
  • Information Retrieval (ad hoc)

20
Information Retrieval
  • Retrieving relevant documents from document
    repositories
  • Sub-Areas
  • Ad hoc retrieval (Query → List of documents)
  • Text Categorization (Document → Category)
  • E.g., BusinessNews (OIL, ACQ, …)
  • Filtering (special case of TC, with 2 categories:
    relevant/non-relevant)

21
Information Retrieval
  • Bag-of-words assumption: in modern IR, the
    meaning of a document is captured by analyzing
    (counting) the words that occur in it (a toy
    illustration follows below)
  • Efficiency
  • Works in practice
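An illustrative (toy) example of the bag-of-words view: a document reduces to word counts, discarding order.

```python
from collections import Counter

# Bag of words: only which words occur, and how often, is kept.
doc = "the cat sat on the mat and the dog sat on the cat"
bag = Counter(doc.split())
print(bag.most_common(3))   # [('the', 4), ('cat', 2), ('sat', 2)]
```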

Tobias Scheffer and Stefan Wrobel. Text classification beyond the bag-of-words representation. In Proceedings of the ICML Workshop on Text Learning, 2002.
22
IR Terminology
  • Documents
  • Any contiguous bunch of text (E.g. News article,
    Web page, paragraph)
  • Collection
  • A bunch of documents
  • Terms
  • Words that occur in a collection (but terms may also
    include common phrases, e.g., car insurance)
  • Query
  • Terms that express an information need

23
Term Selection and Creation
  • Stop list? A list of frequent, largely
    content-free words that are not considered (of,
    the, a, to, etc.)
  • Stemming? Are terms stems or words?
  • E.g., are dog and dogs separate terms, or are they
    collapsed to dog?
  • Phrases? Include the most frequent bigrams as
    phrases (a toy term-extraction sketch follows below)
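A toy sketch of these choices; the stop list and the plural-stripping "stemmer" are deliberately simplistic assumptions, not the course's actual settings:

```python
# Toy term extraction: lowercase, drop stop words, crude plural stemming.
STOP_WORDS = {"of", "the", "a", "to", "and", "in", "on"}

def simple_stem(word):
    # stand-in for a real stemmer (e.g. Porter): strip a trailing 's' only
    return word[:-1] if word.endswith("s") and len(word) > 3 else word

def extract_terms(text):
    tokens = [t.lower() for t in text.split()]
    return [simple_stem(t) for t in tokens if t not in STOP_WORDS]

print(extract_terms("The dogs ran to the park"))   # ['dog', 'ran', 'park']
```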

24
Ad hoc Ranked Retrieval
[Diagram: query → documents in the collection ranked by relevance: d1, d2, …, dM]
What should a term weight ti express?
25
First approximation: bit vector
  • ti = 1 if the corresponding word type occurs in
    the document (ti = 0 otherwise); see the toy
    example below

Is this a satisfying solution?
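For concreteness, a tiny sketch of the bit-vector representation (the vocabulary and document are made up):

```python
# One dimension per term in the collection vocabulary; 1 if the term occurs.
vocabulary = ["bass", "fish", "guitar", "insurance", "player", "salad"]

def bit_vector(doc_terms, vocab=vocabulary):
    present = set(doc_terms)
    return [1 if term in present else 0 for term in vocab]

print(bit_vector(["bass", "player", "guitar"]))   # [1, 0, 1, 0, 1, 0]
```

It is not very satisfying: it ignores how often a term occurs and how discriminative the term is, which motivates the weighting on the next slide.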
26
Better Term Weighting
  • Local weight: how important is this term to the
    meaning of this document?
  • Global weight: how well does this term
    discriminate among the documents in the collection?

The more documents a term occurs in, the less
important it is
  • SOLUTION: combine local and global weights (the
    standard tf-idf combination is sketched below)
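The slide's weighting formula was not transcribed; a standard way to combine the two weights, consistent with the local/global description above, is tf-idf:

    w_{i,j} = tf_{i,j} \cdot \log \frac{N}{df_i}

where tf_{i,j} is how often term i occurs in document j (the local weight), df_i is the number of documents containing term i, and N is the total number of documents in the collection, so the log factor (inverse document frequency) shrinks the weight of terms that occur in many documents.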

27
New Similarity: the cosine measure (the normalized dot
product of the query and document vectors)
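The formula itself did not survive the transcript; the standard normalized form, which the slide presumably showed, is:

    sim(q, d) = \frac{\sum_i q_i d_i}{\sqrt{\sum_i q_i^2} \, \sqrt{\sum_i d_i^2}}

i.e. the dot product of the query and document weight vectors divided by their lengths, so that long documents are not favoured merely for containing more terms.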
28
Ad Hoc Retrieval Summary
  • Given a user's query, find all the documents that
    contain any of the terms in the query

Why only those documents?
  • Convert the query to a vector
  • Compute the cosine between the query vector and
    all the candidate document vectors, and sort by score
    (a minimal ranking sketch follows below)
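A minimal end-to-end sketch of that ranking step; the term weights, document vectors, and identifiers here are illustrative assumptions:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank(query_vec, doc_vecs):
    # score every candidate document and sort best-first
    return sorted(((cosine(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()),
                  reverse=True)

docs = {"d1": [2, 0, 1], "d2": [0, 3, 0], "d3": [1, 1, 1]}
print(rank([1, 0, 1], docs))   # d1 ranks highest for this query
```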

29
IR Evaluation (1)
  • What do we want?
  • We want documents relevant to the query to be
    near the top of the list

[Ranked list: d1, d2, …, dM]
  • Use a test collection where you have
  • A set of documents
  • A set of queries
  • A set of relevance judgments that tell you which
    documents are relevant to each query

30
IR Evaluation (2)
  • Can we use Precision and Recall?
  • Precision = relevant docs returned / docs returned
  • Recall = relevant docs returned / total relevant docs
  • Not directly...
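As an illustration with made-up numbers: if a system returns 10 documents, 6 of which are relevant, and the collection contains 20 relevant documents in total, then Precision = 6/10 = 0.6 while Recall = 6/20 = 0.3. Both values depend on where we cut off the ranked list, which is why they cannot be applied directly to a ranking (hence the plots on the next slide).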

31
Precision and Recall Plots
[Figure: precision-recall plot, precision (0 to 1) on the y-axis vs. recall (0 to 1) on the x-axis, annotated with the effect of a higher cut-off]
32
IR Current Research
  • TREC (Text Retrieval Conference)
  • large document sets for testing
  • uniform scoring systems
  • Different Tracks
  • Interactive Track: studying user interaction with
    text retrieval systems
  • Question Answering Track
  • Web Track
  • Terabyte Track
  • ...

33
Next Time
  • Discourse and Dialogue: Chp. 18 and 19