CS626/449: Speech, NLP and the Web / Topics in AI Programming (Lecture 9: Resnik) - PowerPoint PPT Presentation

Slides: 24
Provided by: avishe
Transcript and Presenter's Notes

1
CS626/449: Speech, NLP and the Web / Topics in AI
Programming (Lecture 9: Resnik's measure of
word similarity; coverage of Jiang and Conrath,
1997)
  • Pushpak Bhattacharyya, CSE Dept., IIT Bombay

2
Path-length-based similarity between house and
lock
  • House has 12 senses

[Figure: has-part links for sense 1 of house, involving study, wall, door, doorway and lock]
3
Properties that a path-length-based measure
should satisfy
  • Zero property
  • Self distance is 0: d(A, A) = 0
  • Symmetric property
  • d(A, B) = d(B, A)
  • Positive property
  • d is always non-negative
  • Triangle inequality
  • d(A, C) ≤ d(A, B) + d(B, C)
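The four properties can be checked mechanically for a shortest-path distance. The sketch below is only an illustration: the edge list is a hypothetical has-part graph loosely based on the words in the previous slide's figure (the figure's exact links are not recoverable), and the function names are made up for this example.

```python
from collections import deque
from itertools import product

# Hypothetical undirected has-part graph; the real figure's links may differ.
EDGES = [("house", "study"), ("house", "wall"), ("house", "door"),
         ("door", "doorway"), ("door", "lock")]


def build_graph(edges):
    """Build an undirected adjacency map from a list of edges."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def path_length(graph, source, target):
    """Number of edges on a shortest path, found by breadth-first search."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return float("inf")  # target not reachable


def satisfies_metric_properties(graph):
    """Check zero, symmetric, positive and triangle properties on all nodes."""
    nodes = list(graph)
    d = {(a, b): path_length(graph, a, b) for a, b in product(nodes, repeat=2)}
    zero = all(d[a, a] == 0 for a in nodes)
    symmetric = all(d[a, b] == d[b, a] for a, b in d)
    non_negative = all(value >= 0 for value in d.values())
    triangle = all(d[a, c] <= d[a, b] + d[b, c]
                   for a, b, c in product(nodes, repeat=3))
    return zero and symmetric and non_negative and triangle


GRAPH = build_graph(EDGES)
print(path_length(GRAPH, "house", "lock"))
print(satisfies_metric_properties(GRAPH))
```

On this toy graph the distance between house and lock is the number of has-part links on the shortest route between them.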

4
Motivating Resnik's measure through the hypernymy
(is-a) hierarchy
  • Sense 1
  • lock -- (a fastener fitted to a door or drawer to keep it firmly closed)
  • => fastener, fastening, holdfast, fixing -- (restraint that attaches to something or holds something in place)
  • => restraint, constraint -- (a device that retards something's motion; "the car did not have proper restraints fitted")
  • => device -- (an instrumentality invented for a particular purpose; "the device is small enough to wear on your wrist"; "a device intended to conserve water")
  • => instrumentality, instrumentation -- (an artifact (or system of artifacts) that is instrumental in accomplishing some end)
  • => artifact, artefact -- (a man-made object taken as a whole)
  • => whole, unit -- (an assemblage of parts that is regarded as a single entity; "how big is that part compared to the whole?"; "the team is a unit")
  • => object, physical object -- (a tangible and visible entity; an entity that can cast a shadow; "it was full of rackets, balls and other objects")
  • => physical entity -- (an entity that has physical existence)
  • => entity -- (that which is perceived or known or inferred to have its own distinct existence (living or nonliving))

5
House: sense 1
  • house -- (a dwelling that serves as living quarters for one or more families; "he has a house on Cape Cod"; "she felt she had to get out of the house")
  • => dwelling, home, domicile, abode, habitation, dwelling house -- (housing that someone is living in; "he built a modest dwelling near the pond"; "they raise money to provide homes for the homeless")
  • => housing, lodging, living accommodations -- (structures collectively in which people are housed)
  • => structure, construction -- (a thing constructed; a complex entity constructed of many parts; "the structure consisted of a series of arches"; "she wore her hair in an amazing construction of whirls and ribbons")
  • => artifact, artefact -- (a man-made object taken as a whole)
  • => whole, unit -- (an assemblage of parts that is regarded as a single entity; "how big is that part compared to the whole?"; "the team is a unit")
  • => object, physical object -- (a tangible and visible entity; an entity that can cast a shadow; "it was full of rackets, balls and other objects")
  • => physical entity -- (an entity that has physical existence)
  • => entity -- (that which is perceived or known or inferred to have its own distinct existence (living or nonliving))

Overlap with the lock chain: from artifact upward
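The overlap between the two hypernym chains can be found programmatically. The sketch below takes the chains for lock (sense 1) and house (sense 1) exactly as listed on the slides, using only the first word of each synset as a label, and returns the most specific concept they share. It assumes each concept has a single hypernym chain, which is a simplification; the function names are made up for this example.

```python
# Hypernym chains from the slides, most specific concept first.
LOCK_CHAIN = ["lock", "fastener", "restraint", "device", "instrumentality",
              "artifact", "whole", "object", "physical entity", "entity"]
HOUSE_CHAIN = ["house", "dwelling", "housing", "structure",
               "artifact", "whole", "object", "physical entity", "entity"]


def lowest_common_subsumer(chain_a, chain_b):
    """Return the most specific concept that appears in both chains."""
    ancestors_b = set(chain_b)
    for node in chain_a:
        if node in ancestors_b:
            return node
    return None  # the chains share no concept


def path_length_via(chain_a, chain_b, subsumer):
    """Count is-a edges from each concept up to the shared subsumer."""
    return chain_a.index(subsumer) + chain_b.index(subsumer)


lcs = lowest_common_subsumer(LOCK_CHAIN, HOUSE_CHAIN)
print(lcs, path_length_via(LOCK_CHAIN, HOUSE_CHAIN, lcs))
```

Here the shared subsumer is artifact, five is-a links above lock and four above house.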
6
House: sense 2
  • Sense 2
  • house -- (an official assembly having legislative powers; "a bicameral legislature has two houses")
  • => legislature, legislative assembly, legislative, general assembly, law-makers -- (persons who make or amend or repeal laws)
  • => assembly -- (a group of persons gathered together for a common purpose)
  • => gathering, assemblage -- (a group of persons together in one place)
  • => social group -- (people sharing some social relation)
  • => group, grouping -- (any number of entities (members) considered as a unit)
  • => abstraction -- (a general concept formed by extracting common features from specific examples)
  • => abstract entity -- (an entity that exists only abstractly)
  • => entity -- (that which is perceived or known or inferred to have its own distinct existence (living or nonliving))

7
House: sense 11
  • Sense 11
  • sign of the zodiac, star sign, sign, mansion, house, planetary house -- ((astrology) one of 12 equal areas into which the zodiac is divided)
  • => region, part -- (the extended spatial location of something; "the farming regions of France"; "religions in all parts of the world"; "regions of outer space")
  • => location -- (a point or extent in space)
  • => object, physical object -- (a tangible and visible entity; an entity that can cast a shadow; "it was full of rackets, balls and other objects")
  • => physical entity -- (an entity that has physical existence)
  • => entity -- (that which is perceived or known or inferred to have its own distinct existence (living or nonliving))

Overlap with the lock chain: from object upward
8
Measures of Semantic Relatedness: Resnik
  • The Resnik measure
  • An information-content-based relatedness measure
  • Concepts with higher information content (IC) are
    specific to particular topics; concepts with lower
    IC are more general
  • Carving fork: high IC; entity: low IC
  • The idea is that two concepts are semantically
    related in proportion to the amount of information
    they share

9
Sense-marked corpora: SemCor
  • <s snum=3>
  • <wf cmd=ignore pos=PRP>He</wf>
  • <wf cmd=done pos=VB lemma=succeed wnsn=2 lexsn=2:41:01::>succeeds</wf>
  • <wf cmd=done rdf=person pos=NNP lemma=person wnsn=1 lexsn=1:03:00:: pn=person>Buck_Shaw</wf>
  • <punc>,</punc>
  • <wf cmd=ignore pos=WP>who</wf>
  • <wf cmd=done pos=VB lemma=retire wnsn=1 lexsn=2:41:01::>retired</wf>
  • <wf cmd=ignore pos=IN>at</wf>
  • <wf cmd=ignore pos=DT>the</wf>
  • <wf cmd=done pos=NN lemma=end wnsn=2 lexsn=1:28:00::>end</wf>
  • <wf cmd=ignore pos=IN>of</wf>
  • <wf cmd=done pos=JJ lemma=last wnsn=1 lexsn=5:00:00:past:00>last</wf>
  • <wf cmd=done pos=NN lemma=season wnsn=1 lexsn=1:28:02::>season</wf>
  • <punc>.</punc>
  • </s>
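A sense-tagged corpus of this kind is what supplies the concept frequencies used later for information content. The sketch below is a minimal illustration only: it assumes the markup has the form `<wf attr=value ...>token</wf>` with unquoted attributes, uses a shortened sample adapted from the slide, and counts sense-tagged nouns with a regular expression rather than a proper SGML parser. The names `SAMPLE` and `sense_tagged_tokens` are made up for this example.

```python
import re
from collections import Counter

# Shortened sample adapted from the slide; attribute syntax is an assumption.
SAMPLE = """<s snum=3>
<wf cmd=ignore pos=PRP>He</wf>
<wf cmd=done pos=VB lemma=succeed wnsn=2 lexsn=2:41:01::>succeeds</wf>
<wf cmd=ignore pos=DT>the</wf>
<wf cmd=done pos=NN lemma=end wnsn=2 lexsn=1:28:00::>end</wf>
<wf cmd=done pos=NN lemma=season wnsn=1 lexsn=1:28:02::>season</wf>
<punc>.</punc>
</s>"""

WF_PATTERN = re.compile(r"<wf\s+([^>]*)>([^<]*)</wf>")
ATTR_PATTERN = re.compile(r"(\w+)=(\S+)")


def sense_tagged_tokens(text):
    """Yield (lemma, pos, sense number, surface form) for tagged words only."""
    for attrs_text, surface in WF_PATTERN.findall(text):
        attrs = dict(ATTR_PATTERN.findall(attrs_text))
        if attrs.get("cmd") == "done" and "lemma" in attrs and "wnsn" in attrs:
            # The sense number is kept as a string; no numeric form is assumed.
            yield attrs["lemma"], attrs.get("pos"), attrs["wnsn"], surface


# Count how often each (noun lemma, sense number) pair occurs.
noun_sense_counts = Counter(
    (lemma, sense)
    for lemma, pos, sense, _ in sense_tagged_tokens(SAMPLE)
    if pos == "NN"
)
print(noun_sense_counts)
```

Words marked `cmd=ignore` carry no sense tag and are skipped, so only content words contribute to the counts.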

10
Measures of Semantic Relatedness
  • Considers the position of nouns in the is-a hierarchy
  • Semantic relatedness (SR) is determined by the
    information content of the lowest common concept
    that subsumes both concepts
  • For example, nickel and dime are subsumed by coin;
    nickel and credit card by medium of exchange
  • P(c) is the probability of encountering an instance of concept c
  • If a is-a b, then P(a) ≤ P(b)
  • Information content is calculated by the formula:
  • IC(concept) = -log P(concept)
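To make the formula concrete, the sketch below computes IC = -log P(c) on a tiny hierarchy built from the slide's coin example. Everything numeric here is made up for illustration: the counts are invented, coin is attached directly under medium of exchange as a simplification, and base-2 logarithms are used so that the results come out as whole bits. Each occurrence of a concept is also counted for all of its ancestors, which is what makes P(a) ≤ P(b) hold whenever a is-a b.

```python
import math

# Hypothetical is-a links (child -> parent) and invented corpus counts.
PARENT = {
    "nickel": "coin",
    "dime": "coin",
    "coin": "medium of exchange",
    "credit card": "medium of exchange",
}
COUNTS = {"nickel": 2, "dime": 2, "coin": 4,
          "credit card": 4, "medium of exchange": 4}


def cumulative_counts(parent, counts):
    """Credit each observed occurrence to the concept and all its ancestors."""
    totals = dict.fromkeys(counts, 0)
    for concept, count in counts.items():
        node = concept
        while node is not None:
            totals[node] = totals.get(node, 0) + count
            node = parent.get(node)
    return totals


def information_content(parent, counts):
    """IC(c) = -log2 P(c), written as log2(N / freq(c)) to avoid -0.0."""
    totals = cumulative_counts(parent, counts)
    n = sum(counts.values())
    return {c: math.log2(n / freq) for c, freq in totals.items() if freq > 0}


IC = information_content(PARENT, COUNTS)
for concept, value in IC.items():
    print(concept, value)
```

With these invented counts the root concept gets IC 0 and the leaves get the highest values, matching the "carving fork high, entity low" intuition.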

11
Measures of Semantic Relatedness
  • Thus relatedness is given by:
  • sim_res(c1, c2) = IC(LCS(c1, c2))
  • Does not consider the information content of the
    concepts themselves, nor the path length
  • One problem is that many concept pairs may have
    the same subsumer and therefore the same score
  • May give high scores on the basis of inappropriate
    word senses, e.g. tobacco and horse
  • Newer measures include Jiang-Conrath, Lin and
    Leacock-Chodorow

12
In case of multiple senses
where sen(w) denotes the set of possible senses
for word w.
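The formula on this slide did not survive extraction. A form consistent with the definition of sen(w) above, and with the way word similarity is usually derived from concept similarity, takes the maximum over all sense pairs; treat the notation below as a reconstruction, not the slide's exact wording.

```latex
\mathrm{sim}(w_1, w_2) =
  \max_{c_1 \in \mathrm{sen}(w_1),\; c_2 \in \mathrm{sen}(w_2)}
  \mathrm{sim}(c_1, c_2)
```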
13
Relevant formulae
classes(w) is the number of senses the word w has
words(c) is the set of words subsumed (directly
or indirectly) by the class c
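The formulae themselves are missing from the transcript. One common way to put the two definitions above together, sketched here as an assumption, divides each word's corpus frequency equally among its senses when accumulating the frequency of a class, and then normalises by the total number N of observed nouns. The symbols freq, n_classes and N are introduced for this sketch.

```latex
\mathrm{freq}(c) = \sum_{w \in \mathrm{words}(c)}
  \frac{\mathrm{freq}(w)}{n_{\mathrm{classes}}(w)},
\qquad
P(c) = \frac{\mathrm{freq}(c)}{N},
\qquad
\mathrm{IC}(c) = -\log P(c)
```

Here n_classes(w) stands for the number of senses of w, as defined on the slide.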
14
Example of Resnik similarity in action
15
Structural characteristics of a hierarchical network
  • Local network density (the number of child links
    that span out from a parent node)
  • In the plant/flora section of WordNet, the
    hierarchy is very dense
  • Depth of a node in the hierarchy
  • distance shrinks as one descends the hierarchy,
    since differentiation is based on finer and finer
    details
  • Type of link
  • Strength of an edge (link): corpus statistics have
    a role to play; theoretical soundness and
    computational efficiency are needed

16
Link strength: probability and IC theoretic
  • The strength of a child link is proportional to
    the conditional probability of encountering an
    instance of the child concept ci given an
    instance of its parent concept p
  • P(ci | p)
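The conditional probability can be connected to information content in one step. Because every instance of the child concept is also an instance of its parent, the joint probability equals P(ci). Taking the negative logarithm as the link strength (written LS here, a label chosen for this sketch) gives a difference of IC values:

```latex
P(c_i \mid p) = \frac{P(c_i \cap p)}{P(p)} = \frac{P(c_i)}{P(p)},
\qquad
\mathrm{LS}(c_i, p) = -\log P(c_i \mid p) = \mathrm{IC}(c_i) - \mathrm{IC}(p)
```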

17
Link strength
Intuition
Formulation
Actual formula
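The formulas on this slide are missing from the transcript. As a hedged illustration of the idea, the sketch below uses only the simplest case, in which the weight of an edge is the IC difference between child and parent with no density, depth or link-type factors. Summing these weights along the path between two concepts then gives IC(c1) + IC(c2) - 2 IC(LCS(c1, c2)), the form usually called the Jiang-Conrath distance. The hierarchy and IC values are hypothetical and the helper names are made up.

```python
# Hypothetical is-a links (child -> parent) and invented IC values in bits.
PARENT = {
    "nickel": "coin",
    "dime": "coin",
    "coin": "medium of exchange",
    "credit card": "medium of exchange",
}
IC = {"nickel": 3.0, "dime": 3.0, "coin": 1.0,
      "credit card": 2.0, "medium of exchange": 0.0}


def ancestors(concept):
    """The concept followed by its hypernyms, most specific first."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def lcs(c1, c2):
    """Lowest common subsumer of two concepts in a single-parent hierarchy."""
    others = set(ancestors(c2))
    return next(node for node in ancestors(c1) if node in others)


def link_strength(child):
    """Simplified edge weight: IC(child) - IC(parent)."""
    return IC[child] - IC[PARENT[child]]


def summed_links(concept, subsumer):
    """Add up edge weights on the way from a concept up to a subsumer."""
    total, node = 0.0, concept
    while node != subsumer:
        total += link_strength(node)
        node = PARENT[node]
    return total


def jc_distance(c1, c2):
    """Simplified Jiang-Conrath distance via the closed form."""
    return IC[c1] + IC[c2] - 2 * IC[lcs(c1, c2)]


pair = ("nickel", "credit card")
top = lcs(*pair)
print(jc_distance(*pair), summed_links(pair[0], top) + summed_links(pair[1], top))
```

Both numbers printed are the same, which shows that the closed form is just the sum of link strengths along the path through the lowest common subsumer.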
18
What does all this buy us?
19
Correlations
20
PageRank
  • Developed by Larry Page and Sergey Brin
  • A link analysis algorithm that assigns a numerical
    weighting to each document in a hyperlinked set
  • Measures the relative importance of a page within the set
  • A link to a page is a vote of support that
    increases the rank of that page
  • It is a probability distribution representing the
    likelihood that a person randomly clicking on links
    will end up on a specific page

21
PageRank-based algorithm
  • Assume a universe of 4 pages: A, B, C and D
  • The initial value of every page is 0.25
  • Now suppose B, C and D link only to A
  • The rank of A is given by: PR(A) = PR(B) + PR(C) + PR(D) = 0.75
  • If the pages also link elsewhere, the rank of A is: PR(A) = PR(B)/L(B) + PR(C)/L(C) + PR(D)/L(D)
  • L(B) is the number of outbound links from B
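The two cases above can be reproduced with a few lines of code. This is a sketch of a single update step only, with no damping. The first link table follows the slide (B, C and D link only to A); the second is a hypothetical variation in which B and D also link to other pages, chosen just to show the division by L(V).

```python
# Case from the slide: B, C and D each link only to A.
LINKS_ONLY_TO_A = {"A": [], "B": ["A"], "C": ["A"], "D": ["A"]}

# Hypothetical variation: B and D also link to other pages.
LINKS_MIXED = {"A": [], "B": ["A", "C"], "C": ["A"], "D": ["A", "B", "C"]}

# Every page starts with the same value, 1/4.
INITIAL_RANK = dict.fromkeys(["A", "B", "C", "D"], 0.25)


def one_step_rank(page, links, rank):
    """Sum PR(V) / L(V) over every page V that links to the given page."""
    return sum(rank[v] / len(outgoing)
               for v, outgoing in links.items()
               if page in outgoing)


print(one_step_rank("A", LINKS_ONLY_TO_A, INITIAL_RANK))
print(one_step_rank("A", LINKS_MIXED, INITIAL_RANK))
```

In the second case A receives half of B's value, all of C's and a third of D's, so its rank after one step is lower than in the first case.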

22
PageRank-based algorithm (contd.)
  • The PageRank of U depends on the rank of each page V
    linking to U, divided by the number of links from V
  • PageRank is given by the general formula:
    PR(U) = Σ PR(V) / L(V)
  • The sum is taken over the pages V that link to U
  • Thus the PageRank values of all pages in the corpus
    sum to 1

23
PageRank-based algorithm (contd.)
  • Damping factor: the imaginary surfer will stop
    clicking on links after some time
  • d is the probability that the user will continue clicking
  • The damping factor is taken to be 0.85 here
  • The new PageRank formula using this is:
  • To get the actual rank of a page, this formula has
    to be iterated many times
  • Problem of dangling links
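The damped formula did not survive in the transcript, so the sketch below rests on stated assumptions rather than on the slide. It uses the normalised form PR(U) = (1 - d)/N + d Σ PR(V)/L(V), which keeps the ranks summing to 1, and it handles dangling pages (pages with no outbound links) by spreading their rank evenly over all pages, which is one common remedy among several. The link table is hypothetical.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate the damped PageRank update a fixed number of times.

    links maps each page to the list of pages it links to; every linked
    page must also appear as a key.
    """
    pages = list(links)
    n = len(pages)
    rank = dict.fromkeys(pages, 1.0 / n)
    for _ in range(iterations):
        # Rank held by pages without outbound links is shared by everyone.
        dangling = sum(rank[p] for p in pages if not links[p])
        new_rank = {}
        for page in pages:
            incoming = sum(rank[v] / len(links[v])
                           for v in pages if page in links[v])
            new_rank[page] = (1 - d) / n + d * (incoming + dangling / n)
        rank = new_rank
    return rank


# Hypothetical four-page web; A has no outbound links (a dangling page).
LINKS = {"A": [], "B": ["A", "C"], "C": ["A"], "D": ["A", "B", "C"]}

ranks = pagerank(LINKS)
for page, value in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(value, 4))
```

Without the dangling-page correction, the rank that flows into A would leak out of the system at every iteration and the values would no longer form a probability distribution.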