Transcript and Presenter's Notes

Title: Word Association


1
Word Association
  • Vasileios Hatzivassiloglou
  • University of Texas at Dallas

2
Why use log likelihood?
  • Likelihood ratios can be very large/small
    depending on the evidence
  • Computational precision factors
  • −2 log λ is asymptotically χ²-distributed when H1
    is a subspace and H2 the entire space
  • Convergence is faster than with Pearson's
    chi-square statistic
  • Thus likelihood ratio tests work with more
    extreme distributions
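  In the standard setup, λ is the ratio of the maximum likelihood attainable
  under H1 to that attainable under H2, and −2 log λ is approximately
  χ²-distributed with degrees of freedom equal to the difference in the
  number of free parameters between H2 and H1.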

3
Chi-square vs. log-likelihood
  • Chi-square assumptions often violated for rare
    words
  • Experimental study: 32,000 words of Swiss
    financial text
  • Bigrams scored by both χ² and log-likelihood
  • Huge values of χ² for rare words
  • 2,682 of 2,693 bigrams violate the approximation
    assumptions
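  A minimal sketch of how both statistics can be computed for a 2×2 bigram
  table (the counts and function name here are illustrative, not taken from
  the study above):

    import math

    def chi_square_and_g2(a, b, c, d):
        # a = count(w1, w2), b = count(w1, not w2),
        # c = count(not w1, w2), d = count(not w1, not w2)
        n = a + b + c + d
        observed = [a, b, c, d]
        expected = [(a + b) * (a + c) / n, (a + b) * (b + d) / n,
                    (c + d) * (a + c) / n, (c + d) * (b + d) / n]
        chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
        # G2 = -2 log(lambda); terms with a zero observed count contribute 0
        g2 = 2 * sum(o * math.log(o / e)
                     for o, e in zip(observed, expected) if o > 0)
        return chi2, g2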

4
Fisher's exact test
  • Based on expanding the possible configurations of
    a 2×2 contingency table
  • If the cell totals are a, b, c, and d, their
    probability under independence is
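  With n = a + b + c + d and the row and column totals held fixed, this
  probability has the standard form
    P = (a + b)! (c + d)! (a + c)! (b + d)! / (n! a! b! c! d!)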

5
Hypergeometric distribution
  • This is the distribution applicable to
    contingency tables
  • Similar to the multinomial but
  • without replacement
  • Fisher's P can be calculated exactly
  • But this is expensive for large counts
  • Fortunately, that is when other tests (e.g.,
    chi-square) are suitable
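  For illustration, SciPy ships an implementation of this test; a minimal
  sketch with made-up counts:

    from scipy.stats import fisher_exact

    # 2x2 table: rows = word 1 present/absent, columns = word 2 present/absent
    table = [[8, 2],
             [5, 40]]                      # hypothetical counts
    odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
    print(odds_ratio, p_value)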

6
Word association
  • A central concept in computational linguistics,
    and more specifically in lexical semantics
  • Syntagmatic association
  • Two words occur together (e.g., strong tea)
  • Paradigmatic association
  • Two words occur in similar contexts, i.e., with
    the same other words (e.g., doctor and nurse)
  • Can be based on syntagmatic association with
    those other words

7
Measures of association
  • We examine measures of syntagmatic association
  • These involve the probabilities (marginal and
    joint) of just the two words involved

8
Desirable properties
  • An interpretable measure
  • defined minimum and maximum
  • Measure is not particularly sensitive to rare
    events
  • Robustness in estimation
  • Relationship to a probabilistic model

9
The Dice measure
  • Defined as
  • Only the probabilities of x and y occurring
    appear in the formula explicitly
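  The standard definition in terms of the joint and marginal probabilities
  of the two words is
    D(x, y) = 2 p(x, y) / (p(x) + p(y))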

10
Dice and conditional probabilities
  • It is the harmonic mean of p(x|y) and p(y|x)
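  The one-line derivation, starting from the harmonic mean:
    2 / (1/p(x|y) + 1/p(y|x)) = 2 / (p(y)/p(x, y) + p(x)/p(x, y))
                              = 2 p(x, y) / (p(x) + p(y)) = D(x, y)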

11
Mutual information
  • Recall that joint entropy relates to entropy and
    conditional entropy
  • H(X, Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)
  • The difference H(X) − H(X|Y) is called the mutual
    information I(X;Y) between X and Y
  • It is a symmetric measure since
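    I(X;Y) = H(X) − H(X|Y) = H(Y) − H(Y|X) = H(X) + H(Y) − H(X, Y),
  which follows directly from the chain rule for entropy quoted above.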

12
Mutual information and independence
  • If X and Y are independent, then
  • H(X) = H(X|Y), so I(X;Y) = H(X) − H(X|Y) = 0
  • It can be shown that I(X;Y) = 0 only if X and Y
    are independent
  • Thus, mutual information can be thought of as a
    measure of independence

13
Mutual information and dependence
  • If X and Y are perfectly dependent, then
  • H(X|Y) = H(X|X) = 0, so I(X;Y) = H(X) − H(X|Y)
    = H(X) − 0 = H(X)
  • Thus mutual information's maximum value depends
    on the entropy of X and Y

14
Formula for mutual information
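  In terms of the joint and marginal distributions, the standard formula is
    I(X;Y) = Σx Σy p(x, y) log [ p(x, y) / (p(x) p(y)) ]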
15
Pointwise mutual information
  • Sometimes just one term from the above sum is
    used as a measure of association
  • When this is used in the literature, it is most
    often referred to as mutual information
  • To distinguish it from the information-theoretic
    MI, others have called it specific mutual
    information or pointwise mutual information
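  For a specific pair of values x and y, that term (taken without its
  p(x, y) weight) is
    SI(x, y) = log [ p(x, y) / (p(x) p(y)) ]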

16
Relationship between MI and SI
  • Specific mutual information is only one component
    of mutual information
  • It is the unweighted contribution of a particular
    pair of values x and y of the corresponding
    random variables X and Y
  • It assigns prominence to the occurrence of the
    words (versus their non-occurrence at an
    opportunity to do so)
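  Written out, mutual information is the probability-weighted sum of these
  pointwise terms over all value pairs, covering both occurrence and
  non-occurrence of each word:
    I(X;Y) = Σx Σy p(x, y) · SI(x, y)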

17
SI and independence
  • If X and Y are perfectly independent
  • If X and Y are perfectly associated
  • As p(x) decreases, SI increases
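  The two cases, written out:
    independence:  p(x, y) = p(x) p(y), so SI(x, y) = log 1 = 0
    perfect association:  p(x, y) = p(x) = p(y), so
      SI(x, y) = log [1 / p(x)] = −log p(x),
  which is why SI grows as p(x) shrinks.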

18
SI and conditional probabilities
  • We can relate SI(x,y) to p(x) and p(x|y) (or p(y)
    and p(y|x))
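  Dividing through by either marginal gives
    SI(x, y) = log [ p(x, y) / (p(x) p(y)) ] = log [ p(x|y) / p(x) ]
             = log [ p(y|x) / p(y) ]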

19
Properties of mutual information
  • For both MI and SI
  • The measure is symmetric in X and Y
  • The measure has a known value (0) if the words
    are independent
  • There is no bound for dependent words
  • The actual value depends on both the marginal and
    the conditional probabilities
  • With fixed p(x|y), SI will increase by reducing
    p(x)

20
Dice and independence
  • If x and y are perfectly independent
  • D(x,y) has no special value (except for being the
    harmonic mean of p(x) and p(y))
  • If x and y are perfectly associated
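  The two cases, written out:
    independence:  D(x, y) = 2 p(x) p(y) / (p(x) + p(y)),
      the harmonic mean of p(x) and p(y)
    perfect association:  p(x, y) = p(x) = p(y), so
      D(x, y) = 2 p(x) / (2 p(x)) = 1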

21
Properties of Dice
  • The measure is symmetric in X and Y
  • It has a known maximum value (1), attained when X
    and Y are perfectly correlated in the positive
    direction
  • It has a known minimum value
  • 0, attained when the variables are perfectly
    correlated in the negative direction
  • It only depends on the conditional probabilities,
    not the marginals
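  A minimal sketch that estimates both Dice and SI from raw corpus counts
  (the function name and example counts are illustrative):

    import math

    def dice_and_si(count_xy, count_x, count_y, n):
        # maximum-likelihood probability estimates from counts
        p_xy = count_xy / n
        p_x = count_x / n
        p_y = count_y / n
        dice = 2 * p_xy / (p_x + p_y)
        si = math.log(p_xy / (p_x * p_y))   # specific (pointwise) mutual information
        return dice, si

    # e.g., dice_and_si(count_xy=30, count_x=50, count_y=60, n=10000)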

22
Reading
  • Section 2.2.3 on mutual information in
    information theory
  • Section 5.4 on pointwise mutual information
  • Introduction to Section 8.5 on semantic
    similarity
  • Section 8.5.1 (up to top of page 300) on measures
    of similarity