Transcript and Presenter's Notes

Title: Word Association


1
Word Association
  • Vasileios Hatzivassiloglou
  • University of Texas at Dallas

2
Why use log likelihood?
  • Likelihood ratios can be very large/small
    depending on the evidence
  • Computational precision factors
  • −2 log λ is asymptotically χ²-distributed when H1
    is a subspace and H2 the entire space
  • Convergence is faster than with Pearson's
    chi-square statistic
  • Thus likelihood ratio tests work with more
    extreme distributions
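  In the standard setup, λ is the ratio of the maximum likelihood attainable
  under H1 to that attainable under H2, and −2 log λ is approximately
  χ²-distributed with degrees of freedom equal to the difference in the
  number of free parameters between H2 and H1.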

3
Chi-square vs. log-likelihood
  • Chi-square assumptions often violated for rare
    words
  • Experimental study: 32,000 words of Swiss
    financial text
  • Bigrams scored by both χ² and log-likelihood
  • Huge values of χ² for rare words
  • 2,682 of 2,693 bigrams violate the approximation
    assumptions
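  A minimal sketch of how both statistics can be computed for a 2×2 bigram
  table (the counts and function name here are illustrative, not taken from
  the study above):

    import math

    def chi_square_and_g2(a, b, c, d):
        # a = count(w1, w2), b = count(w1, not w2),
        # c = count(not w1, w2), d = count(not w1, not w2)
        n = a + b + c + d
        observed = [a, b, c, d]
        expected = [(a + b) * (a + c) / n, (a + b) * (b + d) / n,
                    (c + d) * (a + c) / n, (c + d) * (b + d) / n]
        chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
        # G2 = -2 log(lambda); terms with a zero observed count contribute 0
        g2 = 2 * sum(o * math.log(o / e)
                     for o, e in zip(observed, expected) if o > 0)
        return chi2, g2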

4
Fisher's exact test
  • Based on expanding the possible configurations of
    a 2×2 contingency table
  • If the cell totals are a, b, c, and d, their
    probability under independence is
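  With n = a + b + c + d and the row and column totals held fixed, this
  probability has the standard form
    P = (a + b)! (c + d)! (a + c)! (b + d)! / (n! a! b! c! d!)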

5
Hypergeometric distribution
  • This is the distribution applicable to
    contingency tables
  • Similar to the multinomial but
  • without replacement
  • Fisher's P can be calculated exactly
  • But this is expensive for large counts
  • Fortunately, that is when other tests (e.g.,
    chi-square) are suitable
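  For illustration, SciPy ships an implementation of this test; a minimal
  sketch with made-up counts:

    from scipy.stats import fisher_exact

    # 2x2 table: rows = word 1 present/absent, columns = word 2 present/absent
    table = [[8, 2],
             [5, 40]]                      # hypothetical counts
    odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
    print(odds_ratio, p_value)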

6
Word association
  • A central concept in computational linguistics,
    and more specifically in lexical semantics
  • Syntagmatic association
  • Two words occur together (e.g., strong tea)
  • Paradigmatic association
  • Two words occur in similar contexts, i.e., with
    the same other words (e.g., doctor and nurse)
  • Can be based on syntagmatic association with
    those other words

7
Measures of association
  • We examine measures of syntagmatic association
  • These involve the probabilities (marginal and
    joint) of just the two words involved

8
Desirable properties
  • An interpretable measure
  • defined minimum and maximum
  • Measure is not particularly sensitive to rare
    events
  • Robustness in estimation
  • Relationship to a probabilistic model

9
The Dice measure
  • Defined as
  • Only the probabilities of x and y occurring
    appear in the formula explicitly
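  The standard definition in terms of the joint and marginal probabilities
  of the two words is
    D(x, y) = 2 p(x, y) / (p(x) + p(y))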

10
Dice and conditional probabilities
  • It is the harmonic mean of p(x|y) and p(y|x)
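  The one-line derivation, starting from the harmonic mean:
    2 / (1/p(x|y) + 1/p(y|x)) = 2 / (p(y)/p(x, y) + p(x)/p(x, y))
                              = 2 p(x, y) / (p(x) + p(y)) = D(x, y)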

11
Mutual information
  • Recall that joint entropy relates to entropy and
    conditional entropy
  • H(X, Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)
  • The difference H(X) − H(X|Y) is called the mutual
    information I(X;Y) between X and Y
  • It is a symmetric measure since
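    I(X;Y) = H(X) − H(X|Y) = H(Y) − H(Y|X) = H(X) + H(Y) − H(X, Y),
  which follows directly from the chain rule for entropy quoted above.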

12
Mutual information and independence
  • If X and Y are independent, then
  • H(X) = H(X|Y), so I(X;Y) = H(X) − H(X|Y) = 0
  • It can be shown that I(X;Y) = 0 only if X and Y
    are independent
  • Thus, mutual information can be thought of as a
    measure of independence

13
Mutual information and dependence
  • If X and Y are perfectly dependent, then
  • H(X|Y) = H(X|X) = 0, so I(X;Y) = H(X) − H(X|Y)
    = H(X) − 0 = H(X)
  • Thus mutual information's maximum value depends
    on the entropy of X and Y

14
Formula for mutual information
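  In terms of the joint and marginal distributions, the standard formula is
    I(X;Y) = Σx Σy p(x, y) log [ p(x, y) / (p(x) p(y)) ]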
15
Pointwise mutual information
  • Sometimes just one term from the above sum is
    used as a measure of association
  • When this is used in the literature, it is most
    often referred to as mutual information
  • To distinguish it from the information-theoretic
    MI, others have called it specific mutual
    information or pointwise mutual information
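  For a specific pair of values x and y, that term (taken without its
  p(x, y) weight) is
    SI(x, y) = log [ p(x, y) / (p(x) p(y)) ]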

16
Relationship between MI and SI
  • Specific mutual information is only one component
    of mutual information
  • It is the unweighted contribution of a particular
    pair of values x and y of the corresponding
    random variables X and Y
  • It assigns prominence to the occurrence of the
    words (versus their non-occurrence at an
    opportunity to do so)
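  Written out, mutual information is the probability-weighted sum of these
  pointwise terms over all value pairs, covering both occurrence and
  non-occurrence of each word:
    I(X;Y) = Σx Σy p(x, y) · SI(x, y)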

17
SI and independence
  • If X and Y are perfectly independent
  • If X and Y are perfectly associated
  • As p(x) decreases, SI increases
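  The two cases, written out:
    independence:  p(x, y) = p(x) p(y), so SI(x, y) = log 1 = 0
    perfect association:  p(x, y) = p(x) = p(y), so
      SI(x, y) = log [1 / p(x)] = −log p(x),
  which is why SI grows as p(x) shrinks.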

18
SI and conditional probabilities
  • We can relate SI(x,y) to p(x) and p(x|y) (or p(y)
    and p(y|x))
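  Dividing through by either marginal gives
    SI(x, y) = log [ p(x, y) / (p(x) p(y)) ] = log [ p(x|y) / p(x) ]
             = log [ p(y|x) / p(y) ]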

19
Properties of mutual information
  • For both MI and SI
  • The measure is symmetric in X and Y
  • The measure has a known value (0) if the words
    are independent
  • There is no bound for dependent words
  • The actual value depends on both the marginal and
    the conditional probabilities
  • With fixed p(x|y), SI will increase by reducing
    p(x)

20
Dice and independence
  • If x and y are perfectly independent
  • D(x,y) has no special value (except for being the
    harmonic mean of p(x) and p(y))
  • If x and y are perfectly associated
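  The two cases, written out:
    independence:  D(x, y) = 2 p(x) p(y) / (p(x) + p(y)),
      the harmonic mean of p(x) and p(y)
    perfect association:  p(x, y) = p(x) = p(y), so
      D(x, y) = 2 p(x) / (2 p(x)) = 1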

21
Properties of Dice
  • The measure is symmetric in X and Y
  • It has a known maximum value (1), attained when X
    and Y are perfectly correlated in the positive
    direction
  • It has a known minimum value
  • 0, attained when the variables are perfectly
    correlated in the negative direction
  • It only depends on the conditional probabilities,
    not the marginals
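  A minimal sketch that estimates both Dice and SI from raw corpus counts
  (the function name and example counts are illustrative):

    import math

    def dice_and_si(count_xy, count_x, count_y, n):
        # maximum-likelihood probability estimates from counts
        p_xy = count_xy / n
        p_x = count_x / n
        p_y = count_y / n
        dice = 2 * p_xy / (p_x + p_y)
        si = math.log(p_xy / (p_x * p_y))   # specific (pointwise) mutual information
        return dice, si

    # e.g., dice_and_si(count_xy=30, count_x=50, count_y=60, n=10000)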

22
Reading
  • Section 2.2.3 on mutual information in
    information theory
  • Section 5.4 on pointwise mutual information
  • Introduction to Section 8.5 on semantic
    similarity
  • Section 8.5.1 (up to top of page 300) on measures
    of similarity