1
TagSense: Marrying Folksonomy and Ontology
By Zixin Wu
Advisor: Amit P. Sheth
Committee: John A. Miller, Prashant Doshi
2
Outline
  • Background and Motivation
  • Approach Overview
  • Tag Normalization
  • Sense Indexing
  • Utilizing ontologies
  • Semantic Search and Ranking
  • Implementation and Evaluations
  • Conclusions
  • Demo

4
Folksonomy
Web page and photos from Flickr.com
Web page from del.icio.us
5
Folksonomy Definitions
  • The behavior of massive tagging in a social
    context and its product: tags for Web resources.
    It is collaborative metadata extraction and
    annotation.
  • (from Thomas Vander Wal) Folksonomy is the
    result of personal free tagging of information
    and objects (anything with a URL) for one's own
    retrieval. The tagging is done in a social
    environment (usually shared and open to others).
    [1]
  • (from Tom Gruber) the emergent labeling of lots
    of things by people in a social context. [2]

6
Features of Folksonomy
  • Makes metadata extraction from multimedia Web
    resources easier.
  • Extracts information from the perspective of the
    information consumer, e.g. tagging the house in
    a photo but not the dog in it.
  • Popular tags prevail, and the tags for a Web
    resource converge over time.

7
The Long Tail
8
Power Law Distribution of Tags [3]
9
Folksonomy Triad [4,5]
  • The person tagging
  • The Web resource being tagged
  • The tag(s) being used on that Web resource
  • We can use two of the elements to find the third
    element,
  • e.g. find persons with similar interests by
    comparing the Web resources they tagged and the
    tags they used

10
Motivation Scenarios: Ambiguous Words
Search for "turkey"
Search for "apple"
11
Disambiguation
  • What people usually do: add more keywords for
    disambiguation
  • Trade-off between precision and recall rates

12
Motivation Scenarios: Background Knowledge
  • Task: find photos about cities in Europe
  • Solution 1: search "city Europe"
  • Solution 2: try the names of cities in Europe
    one by one
  • Could be improved if the system knows
  • which term/concept is a city
  • which city is in Europe

13
Significant Drawbacks of Folksonomy
  • Keyword ambiguity
  • Lack of background knowledge

14
Ontology
  • Ontology is an important term in Knowledge
    Representation and the key enabler of the
    Semantic Web
  • A formal specification of a conceptualization [6]
  • Ontologies state knowledge explicitly by using
    URIs and relationships, e.g. Paris
    is_located_in Europe
  • Current specifications: RDF(S) [7,8], OWL [9],
    etc.

15
Semantic Annotation
Figure from [10]
16
Multiple Ontologies
  • One ontology cannot always be comprehensive
    enough
  • Ontologies may be incompatible
  • If multiple ontologies are used, we need to
    select and rank ontologies for each query

17
Objectives
  • Shorten the time and effort of information
    retrieval in a folksonomy
  • Improve recall rates by considering synonyms and
    enabling semantic search
  • Improve result ranking by putting the most
    appropriate items at the top of query results

19
Approach Overview
  • Do not add any burden to our users: they should
    be able to use only tags to describe and search
    Web resources
  • Do not expect our users to have a Semantic Web
    background
  • Utilize ontologies as background knowledge in
    information retrieval

20
Approach Overview
Folksonomy
Ontologies
21
Some Terms
  • Web resource: anything with a URL
  • Label: one or more keywords, e.g. "air ticket"
  • Tag: a label applied to a Web resource. Two
    different tags may have the same label
  • Sense cluster (or cluster): a group of tags with
    similar meanings. Ideally, a cluster corresponds
    to one meaning, but oftentimes a meaning is
    represented by multiple clusters together
  • Semantic annotation: associating a cluster with
    ontological concepts

22
Approach Overview
(Diagram: tags grouped into sense clusters, which are
matched to concepts in Ontology 1 and Ontology 2. A
dot is a tag, a blue circle is a sense cluster, a
yellow circle is an actual meaning.)
24
Data Cleanup: Dirty Tags
  • "bird" and "birds"
  • "ebook" and "e-book"; "air-ticket", "airticket",
    and "air ticket"
  • "freephotos" should be "free, photos"
  • "travelagent" should be "travel agent"
  • "sculture" should be "sculpture"
  • "@pub-travel"
  • "Europe2005"

25
Tag Normalization
  • Check 2 online dictionaries: Webster.com and
    Dict.cn
  • Webster.com: stemming and misspelling
  • swimming -> swim, dogs -> dog
  • sculture -> sculpture
  • Dict.cn: more words and compound words
  • "ibm" is not in Webster, but is in Dict.cn
  • "open source"
  • Try to split tags
  • freephotos -> free and photos
  • Ignore pure numbers, such as 2005, 07_01_2005
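The normalization steps above can be sketched as follows. This is a minimal sketch: the online dictionary lookups (Webster.com, Dict.cn) are replaced by a small in-memory word list, and the function names (`normalize_tag`, `split_compound`) and the greedy longest-prefix splitting strategy are assumptions, not the system's actual implementation.

```python
import re

# Tiny stand-in for the online dictionaries the slides query; the real
# system would issue web lookups to Webster.com and Dict.cn instead.
KNOWN_WORDS = {"free", "photo", "photos", "travel", "agent", "swim",
               "dog", "ticket", "air", "ibm", "open", "source"}
STEMS = {"photos": "photo", "dogs": "dog", "swimming": "swim"}

def split_compound(tag):
    """Greedily split a run-together tag (e.g. 'freephotos') into known words."""
    words, rest = [], tag
    while rest:
        for end in range(len(rest), 0, -1):  # prefer the longest known prefix
            if rest[:end] in KNOWN_WORDS:
                words.append(rest[:end])
                rest = rest[end:]
                break
        else:
            return None  # some prefix is unknown: give up on splitting
    return words

def normalize_tag(tag):
    """Return a list of normalized keywords for a raw tag, or [] to ignore it."""
    tag = tag.lower().strip()
    tag = re.sub(r"[-_]", "", tag)      # 'air-ticket' -> 'airticket'
    if not tag or re.fullmatch(r"\d+", tag):
        return []                        # ignore pure numbers like '2005'
    if tag in KNOWN_WORDS or tag in STEMS:
        return [STEMS.get(tag, tag)]     # stem, e.g. 'dogs' -> 'dog'
    return [STEMS.get(w, w) for w in (split_compound(tag) or [tag])]
```

Spelling repair ("sculture" -> "sculpture") is omitted here, since it depends on the dictionary's suggestion feature.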

26
Outline
  • Background and Motivation
  • Approach Overview
  • Tag Normalization
  • Sense Indexing
  • Utilizing ontologies
  • Semantic Search and Ranking
  • Implementation and Evaluations
  • Conclusions
  • Demo

27
Sense Indexing
(Diagram: keywords map to senses — e.g. the keyword
"ticket" can mean an access permit or a citation for an
offender; the keyword "fine" can mean a citation for an
offender or "good".)
28
Sense Indexing
  • The mappings between keywords and senses are n:m
  • Index Web resources by senses instead of
    keywords: put tags with similar meanings into
    the same cluster
  • Need to disambiguate each tag when indexing
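A minimal sketch of such a sense index, assuming hand-assigned sense ids (the real system derives them from the disambiguation phases described later); the class and method names are illustrative, not the system's API:

```python
from collections import defaultdict

class SenseIndex:
    """Index Web resources by sense clusters instead of raw keywords.
    A label can map to several senses and a sense can cover several
    labels (the n:m mapping the slides describe)."""
    def __init__(self):
        self.label_to_senses = defaultdict(set)    # label -> sense ids
        self.sense_to_resources = defaultdict(set) # sense id -> resources

    def add(self, resource, label, sense_id):
        self.label_to_senses[label].add(sense_id)
        self.sense_to_resources[sense_id].add(resource)

    def search(self, label, sense_id=None):
        """All resources for a label, optionally restricted to one sense."""
        senses = self.label_to_senses.get(label, set())
        if sense_id is not None:
            senses = senses & {sense_id}
        if not senses:
            return set()
        return set().union(*(self.sense_to_resources[s] for s in senses))
```

Searching "turkey" restricted to a bird sense then returns only the photos indexed under that sense, not the country photos.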

29
Differences from Word Sense Disambiguation [11-15]
  • No sentence: no sentence structure, no
    part-of-speech analysis.
  • The order of the labels in a Web resource is not
    necessarily relevant.
  • Produced in a social context: a significant
    number of terms are not in lexicons, and terms
    change more frequently. That means we need to
    create senses for those terms.
  • Relatively less noise.

30
Why Clustering (1)
  • Since we will match the clusters to ontological
    concepts, why not annotate each tag?
  • Some terms are not in any ontology
  • By aggregating the contexts of the tags in the
    same cluster, we know which contexts are
    important and which are noise (especially in a
    narrow folksonomy)

(Diagram: two "apple" tags share context tags such as
"powerbook" and "mac"; aggregation separates this
important context from one-off noise such as "ajax",
"web", "design", "light", "paint", "long".)
31
Why Clustering (2)
  • We get more context for semantic annotation

(Diagram: a single "Athens" tag with context
"University" is ambiguous; the aggregated cluster
context adds "Georgia" or "Greece", which
disambiguates it.)
32
Synonyms
  • It seems impossible to automatically detect
    synonyms based ONLY on the context of tags
  • Reason: contexts being similar enough does not
    imply synonymy
  • Solution: use WordNet's [16] synsets as synonym
    lists

33
Polysemy
  • Cluster tags that have the same label (or
    synonyms) into sense clusters based on the
    similarity of their contexts.

34
Context of Tags
  • Context of a tag T:
  • other tags that co-occur with T in a Web
    resource
  • the co-occurrence frequencies
  • e.g. User1: turkey, istanbul, mosque; User2:
    turkey, istanbul, tour
  • In a narrow folksonomy, all co-occurrence
    frequencies are 1

(Diagram: "turkey" co-occurs with "istanbul" twice and
with "mosque" and "tour" once each.)
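Building these co-occurrence contexts is straightforward; here is a small sketch where each Web resource is represented as a set of (already normalized) tags, using the slides' own two-user example:

```python
from collections import Counter, defaultdict

def build_contexts(resources):
    """Context of each tag: the other tags it co-occurs with in a Web
    resource, with co-occurrence frequencies. `resources` is a list of
    tag sets, one per Web resource."""
    context = defaultdict(Counter)
    for tags in resources:
        for t in tags:
            for other in tags:
                if other != t:
                    context[t][other] += 1
    return context

resources = [{"turkey", "istanbul", "mosque"},   # User1's photo
             {"turkey", "istanbul", "tour"}]     # User2's photo
ctx = build_contexts(resources)
# ctx["turkey"] == Counter({"istanbul": 2, "mosque": 1, "tour": 1})
```

In a narrow folksonomy each resource is tagged once, so every count stays at 1 unless tags co-occur across several resources, as "turkey"/"istanbul" do here.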
35
Relatedness of Tags
  • Basic idea: TF-IDF

(Diagram: the co-occurrence counts are normalized into
TF values — e.g. TF(turkey, istanbul) = 2/4,
TF(turkey, mosque) = 1/4 — and then multiplied by IDF.)
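The TF part can be sketched directly from the example above. The slides only name TF-IDF without giving the exact IDF formula, so the smoothed IDF below (which damps tags that co-occur with many different tags) is an assumption:

```python
import math
from collections import Counter, defaultdict

def relatedness(t1, t2, context, n_tags):
    """TF-IDF-style relatedness of tag t1 to t2 (asymmetric).
    TF is t2's share of t1's co-occurrences, matching the slides'
    TF(turkey, istanbul) = 2/4; the IDF variant is an assumption."""
    total = sum(context[t1].values())
    if total == 0:
        return 0.0
    tf = context[t1][t2] / total
    idf = math.log(1 + n_tags / len(context[t2])) if context[t2] else 0.0
    return tf * idf

# Rebuild the two-resource example from the previous slide.
context = defaultdict(Counter)
for tags in [{"turkey", "istanbul", "mosque"}, {"turkey", "istanbul", "tour"}]:
    for t in tags:
        for o in tags:
            if o != t:
                context[t][o] += 1

r_ist = relatedness("turkey", "istanbul", context, n_tags=len(context))
r_mos = relatedness("turkey", "mosque", context, n_tags=len(context))
# the double co-occurrence with istanbul outweighs its broader context,
# so r_ist > r_mos
```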
36
Context of a Cluster
  • Other clusters whose tags connect (co-occur) to
    the tags in this cluster
  • The co-occurrence frequency of two clusters is
    the aggregation of the co-occurrence frequencies
    of the tags in the clusters

(Diagram: three clusters with aggregated cluster-level
co-occurrence frequencies 2, 5, and 3.)
37
Relatedness of Clusters
  • The same calculation as the relatedness of tags

38
Important Context of a Cluster
(Diagram: a cluster's context clusters ordered by
relatedness; the top bands form Important Context
Levels 1, 2, and 3.)
39
Motivation for Building Senses
  • In order to search for photos about the turkey
    bird, some people use "bird" besides "turkey",
    some use "animal", some use "food", "wild", etc.
  • Can we collect all these tags, and then use them
    to build a sense?
  • The clue for recognizing these tags is that they
    co-occur with each other more often than with
    other tags (which are also the context of
    "turkey")

40
Tag Disambiguation Process
  • Put all tags with the same label (or synonyms)
    into one cluster.
  • Do the following 3 phases to build senses.

41
Tag Disambiguation Phase 1
  • Identify Important Context Level 1
  • Create an undirected weighted graph called the
    Context Graph
  • Each node in the graph is a cluster in
    Important Context Level 1
  • The weight of an edge is the relatedness of the
    two clusters (relatedness is asymmetric; we take
    the larger one)
  • Apply a threshold to the edges of the Context
    Graph, so that the graph splits into one or more
    disconnected components
  • Create a sense corresponding to each component,
    and use the clusters in the component as the
    context of the corresponding sense
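Phase 1 can be sketched as thresholding followed by connected-component search. This is a sketch under assumptions: relatedness scores are supplied as a plain dict, and the function name `build_senses` is illustrative:

```python
from collections import defaultdict

def build_senses(clusters, relatedness, threshold):
    """Phase 1 sketch: keep Context Graph edges whose relatedness (the
    larger of the two asymmetric directions) exceeds `threshold`; each
    connected component of the remaining graph becomes the context of
    one new sense. `relatedness` maps (cluster_a, cluster_b) -> score."""
    adj = defaultdict(set)
    for (a, b), r in relatedness.items():
        w = max(r, relatedness.get((b, a), 0.0))  # symmetrize
        if w > threshold:
            adj[a].add(b)
            adj[b].add(a)
    senses, seen = [], set()
    for c in clusters:
        if c in seen:
            continue
        component, stack = set(), [c]       # depth-first search
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adj[node] - component)
        seen |= component
        senses.append(component)
    return senses
```

For the "turkey" example, strongly related travel clusters (istanbul, mosque, tour) end up in one component, and bird-related clusters (bird, animal) in another, yielding two senses.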

42
Tag Disambiguation Phase 1
We are disambiguating "turkey", so the cluster
"turkey" is hidden for better illustration.
43
Tag Disambiguation Phase 2
  • The purpose of this phase is to find senses
    missed in Phase 1 because they are not used
    often in the dataset
  • Identify Important Context Level 2
  • For each cluster in Important Context Level 2,
    find the most related sense built in Phase 1
    (also requiring the relatedness to be above a
    threshold)
  • If there is such a sense, merge the cluster
    being considered into that sense's context
  • Otherwise, build a new sense and use the cluster
    as its context

44
Tag Disambiguation Phase 2
The red clusters are newly discovered in Phase 2
45
Tag Disambiguation Phase 3
  • Identify Important Context Level 3
  • Similar to Phase 2, but do not create any new
    senses; just enrich the context of the senses
    built in Phases 1 and 2

46
Tag Disambiguation Process - continued
  • Compare each tag under consideration with the
    senses. Select the best-matched sense and assign
    the tag to it.
  • Do steps 2 and 3 again when the number of tags
    under consideration has increased by a certain
    percentage.

47
Tag Disambiguation Process
(Diagram: a tag x with label "turkey" is compared with
each sense y via context tags such as "istanbul" and
"turkish", producing MatchScore(x, y).)
49
Utilizing Ontologies
  • Match each cluster to ontological concepts where
    appropriate
  • But there are no named relationships between
    tags
  • That means we cannot compare by the names of
    relationships
  • We will need the relatedness of ontological
    concepts
  • We will also need the similarity of ontological
    concepts in semantic search

50
Relatedness of Ontological Concepts
  • Basic idea: TF-IDF
  • 0 for any pair of concepts without a
    relationship
  • TF-IDF(c1, c2) = TF(c1, c2) × IDF(c1)

51
Relatedness of Ontological Concepts
  • TF (c1 to c2)
  • Issue the query "c1 c2" to the Yahoo! search
    engine
  • Get the hit count h
  • Issue the query "cx c2" for each concept cx
    connected to c2
  • Get the hit counts hx
  • TF(c1, c2) = h / Σhx
  • IDF(c)
  • Issue the query "c" to the Yahoo! search engine
  • Get the hit count h
  • Yahoo!'s current index size: 20 billion pages
  • IDF(c) = -log(h / 20 billion)
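The hit-count computation can be sketched as below. The `hit_count` callable is a stand-in for the Yahoo! search API (an assumption; the real system queries the live engine), and the base-10 logarithm is assumed from the slides' worked Information Content example:

```python
import math

INDEX_SIZE = 20_000_000_000  # Yahoo!'s index size as stated in the slides

def tf(c1, c2, neighbors_of_c2, hit_count):
    """TF of concept c1 toward c2: hits for the query 'c1 c2' divided by
    the total hits of 'cx c2' over every concept cx connected to c2."""
    total = sum(hit_count(f"{cx} {c2}") for cx in neighbors_of_c2)
    return hit_count(f"{c1} {c2}") / total if total else 0.0

def idf(c, hit_count):
    """IDF(c) = -log(hits(c) / index size)."""
    return -math.log10(hit_count(c) / INDEX_SIZE)

def tf_idf(c1, c2, neighbors_of_c2, hit_count):
    """Relatedness of c1 to c2; 0 when the concepts are unconnected."""
    return tf(c1, c2, neighbors_of_c2, hit_count) * idf(c1, hit_count)
```

In tests one can supply `hit_count` as a lookup over canned counts, which also makes the measure reproducible despite a changing web index.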

52
Similarity of Ontological Concepts
  • First, consider only the taxonomy in the
    ontology
  • Information Content [17]: IC(c) = -log(prob(c))
  • Sim(c1, c2) = 2 × IC(common ancestor) /
    (IC(c1) + IC(c2)) [18]

Information Content example (Sedan and Coupe are
subclasses of Car; probability = hit-count sum / 20
billion pages):

  Concept   Hit count   Sum      Probability   IC
  Car       1040 M      1174 M   0.0587        2.23
  Sedan     58 M        58 M     0.0029        2.54
  Coupe     76 M        76 M     0.0038        2.42

Sim(Sedan, Coupe) = 2 × 2.23 / (2.54 + 2.42) ≈ 0.899
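Lin's measure [18] over these Information Content values can be sketched as follows; the probabilities below are illustrative inputs (hit count / index size), not the slides' exact figures, and the base-10 logarithm is assumed from the worked example:

```python
import math

def information_content(prob):
    """IC(c) = -log(prob(c)), per Resnik [17]."""
    return -math.log10(prob)

def lin_similarity(p1, p2, p_lca):
    """Lin's measure: Sim(c1, c2) = 2*IC(lca) / (IC(c1) + IC(c2)),
    where lca is the closest common ancestor of c1 and c2 in the
    taxonomy (Car, for Sedan and Coupe)."""
    ic1, ic2 = information_content(p1), information_content(p2)
    return 2 * information_content(p_lca) / (ic1 + ic2)

# Illustrative probabilities for Sedan, Coupe, and their ancestor Car.
sim = lin_similarity(p1=0.0029, p2=0.0038, p_lca=0.0587)
```

The measure is 1 when both concepts coincide with the ancestor and falls toward 0 as the shared ancestor becomes more generic (lower IC).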
53
Similarity of Ontological Concepts
  • Also consider other types of relationships, by
    using the Jaccard similarity coefficient

(Diagram: Athens is_located_in Georgia and Atlanta
is_located_in Georgia; the shared relationship raises
their similarity.)
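A minimal sketch of this, representing each concept by its set of (relationship, target) edges; the `has_feature` edges are made up for illustration (the slides only show `is_located_in`):

```python
def jaccard_similarity(rels1, rels2):
    """Jaccard coefficient over the (relationship, target) edges of two
    concepts: |shared edges| / |all edges|."""
    if not rels1 and not rels2:
        return 0.0
    return len(rels1 & rels2) / len(rels1 | rels2)

# Hypothetical edge sets; only is_located_in -> Georgia comes from the slide.
athens = {("is_located_in", "Georgia"), ("has_feature", "university")}
atlanta = {("is_located_in", "Georgia"), ("has_feature", "airport")}
sim = jaccard_similarity(athens, atlanta)  # 1 shared edge out of 3 total
```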
54
Matching Clusters to Ontologies
  • Compare the important context of a cluster with
    the context (concepts) of an ontological concept
  • Sum up the relatedness of the matched context
    clusters
  • Select the ontological concept with the best
    matching score that is also above a threshold

55
Matching Clusters to Ontologies
  • A context cluster x is considered matched to a
    context concept y if
  • they have the same label (or a synonym), or
  • x is matched to another concept y', and the
    relatedness (or similarity) of y' to y is above
    a threshold, or
  • the relatedness to x of another cluster x'
    (which is matched to y) is above a threshold

56
Matching Clusters to Ontologies - example
(Diagram: three cases of matching the context cluster
"bird" of the cluster "turkey" to the context concept
"animal" of a candidate concept — via a semantic
annotation of "bird" to the concept "bird" together
with Rel(bird, animal) > threshold or
Sim(bird, animal) > threshold on the ontology side, or
via cluster-side relatedness Rel(bird, animal) >
threshold to a cluster annotated with "animal".)
58
Semantic Search [19,20]
  • Search by ontological relationships
  • Currently we only consider subclass and type
    relationships
  • Map to the corresponding clusters through the
    semantic annotations
  • Expand the corresponding clusters by including
    other clusters with the same label, because some
    clusters that should have a semantic annotation
    may lack one

59
Semantic Search
(Diagram: city concepts such as Seoul, Ottawa, and
Madrid in the geography and politics domain ontologies
are mapped to the corresponding sense clusters.)
60
Most-Desired Senses Ranking
  • We need to rank the candidate clusters
  • The system shows one photo for each candidate
    cluster
  • The user selects the best photo from the samples
  • The system ranks the other clusters based on the
    selection

61
Most-Desired Senses Ranking
  • The basic idea is finding shortest paths from a
    single source in a graph
  • Put a constant energy on the source cluster, and
    distribute the energy to the other clusters
  • The weight of an edge is the similarity of the
    two clusters
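One way to read this single-source energy spreading is as a max-product variant of Dijkstra's algorithm: each cluster receives the best product of similarities along any path from the selected cluster. That reading, and the function name `spread_energy`, are assumptions — the slides only say "shortest paths from a single source":

```python
import heapq
from collections import defaultdict

def spread_energy(source, edges):
    """Start with energy 1.0 on the cluster the user picked and give every
    reachable cluster the maximum product of edge similarities (each in
    (0, 1]) along any path — a Dijkstra-style best-first search.
    `edges` maps (cluster_a, cluster_b) -> similarity."""
    adj = defaultdict(list)
    for (a, b), sim in edges.items():
        adj[a].append((b, sim))
        adj[b].append((a, sim))
    energy = {source: 1.0}
    heap = [(-1.0, source)]                  # max-heap via negated energy
    while heap:
        e, node = heapq.heappop(heap)
        e = -e
        if e < energy.get(node, 0.0):
            continue                          # stale heap entry
        for nxt, sim in adj[node]:
            if e * sim > energy.get(nxt, 0.0):
                energy[nxt] = e * sim
                heapq.heappush(heap, (-energy[nxt], nxt))
    return energy
```

Multiplying similarities makes a short chain of strong links beat a direct weak link, which matches the intuition of ranking the clusters most similar to the user's choice first.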

62
Geography Domain Ontology / Politics Domain Ontology
(Diagram: a constant energy of 1 is placed on the
cluster the user selected (a "Madrid" cluster) and
spreads along similarity-weighted edges; the other
clusters — Seoul, Ottawa, and the remaining Madrid
clusters — receive scores such as 0.46, 0.37, 0.32,
0.24, 0.22, 0.19, 0.09, 0.08, and 0.05.)
63
Cluster Similarity
  • If the semantic annotations of two clusters
    refer to the same ontology, use the similarity
    of the corresponding ontological concepts.
  • Otherwise, calculate the cluster similarity from
    the contexts of the two clusters.

64
Cluster Similarity by Context
  • A modified version of Dice similarity
  • Let's say we are comparing cluster1 and cluster2
  • Compare only the important contexts of cluster1
    and cluster2
  • Calculate the percentage of overlapping context
  • Decide whether a context cluster c1 of cluster1
    and a context cluster c2 of cluster2 match in
    the same way as in matching clusters to
    ontologies
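A sketch of such a modified Dice measure, with the matching predicate passed in as a function (the exact modification the system uses is not spelled out in the slides, so this formulation is an assumption):

```python
def dice_similarity(ctx1, ctx2, matches):
    """Modified Dice similarity over the important contexts of two
    clusters. `matches(a, b)` decides whether two context clusters match
    (same label or synonym, or relatedness/similarity above a threshold,
    as in matching clusters to ontologies). With exact equality this
    reduces to plain Dice: 2|A ∩ B| / (|A| + |B|)."""
    if not ctx1 or not ctx2:
        return 0.0
    m1 = sum(1 for a in ctx1 if any(matches(a, b) for b in ctx2))
    m2 = sum(1 for b in ctx2 if any(matches(a, b) for a in ctx1))
    return (m1 + m2) / (len(ctx1) + len(ctx2))

# Hypothetical important contexts for two "Athens" clusters.
ctx_athens_greece = {"university", "greece", "acropolis"}
ctx_athens_georgia = {"university", "georgia", "football"}
sim = dice_similarity(ctx_athens_greece, ctx_athens_georgia,
                      lambda a, b: a == b)  # only "university" overlaps
```

A fuzzier `matches` (e.g. relatedness above a threshold) raises the score when contexts are semantically close without sharing labels.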

65
Ontology Ranking [21,23]
  • Ontologies come from a repository
  • If multiple ontologies are used for a query, we
    need to give a weight to each ontology
  • The ontology with the higher weight has more
    power to decide the similarity/relatedness of
    two ontological concepts
  • Rank ontologies using the 4 most recent queries
    of the same user

66
Ontology Ranking
  • Centrality Measure [22]

(Diagram: the centrality of a concept C combines its
depth D(c) below the root "Thing" and the height H(c)
of its subtree.)
67
Ontology Ranking
  • Density Measure [22]

68
Ontology Ranking
70
System overview
(Architecture diagram. Modules: Tag Cleanup, Sense
Indexing, Ontology Mapping, Ontology Measuring,
Ontology Ranking, Semantic Query, and the Search
Engine. Data: Photos with Tags, Sense Index,
Ontologies, Ontology Mapping, Ontology Measures,
Ontology Ranks, Query History, Queries, and the Query
Result.)
71
Evaluation Measures
  • Compare with Google Desktop on the same datasets
  • How much time a user has to spend in order to
    find the required photos
  • How many mouse clicks a user has to make in
    order to find the required photos
  • How many different queries a user has to issue
    in order to find the required photos. The user
    may change the query at any time if he feels it
    necessary.

72
Evaluations (1)
  • Experiment set 1: disambiguation
  • Datasets:
  • 500 photos with the tag "apple"
  • 500 photos with the tag "turkey"

73
Use Case 1
  • Task 1: find 50 photos about Apple electronic
    products



74
Use Case 2
  • Task 2: find 30 photos about the fruit apple

75
Use Case 3
  • Task 3: find 50 photos about the country Turkey

76
Use Case 4
  • Task 4: find 10 photos about turkey birds

77
Evaluations (2)
  • Experiment set 2: semantic search
  • Datasets:
  • about 300 photos for each of the following tags:
    Beijing, Madrid, Ottawa, Rome, Seoul, Tokyo,
    Baltimore, New York, Pittsburgh, Washington
    D.C., Amsterdam, Florence, Venice, Athens
    (Greece), Athens (Georgia)
  • Ontologies:
  • an ontology in the travel domain (partially from
    Realtravel.com)
  • a modified AKTiveSA [24] project ontology in the
    geography domain
  • an ontology in the politics domain (partially
    from SWETO [25])

78
Use Case 5
  • Task 5: find up to 5 photos for 5 cities in
    Europe

79
Evaluation
  • The Most-Desired Senses Ranking approach may add
    time overhead for selecting the most wanted
    photo sense
  • Changing a query adds time overhead for thinking
    and typing
  • Overall, users spent significantly less time and
    effort finding the information they wanted

81
Conclusions
  • We proposed an approach to combining
    folksonomies and ontologies
  • Index Web resources by senses, into sense
    clusters
  • Match sense clusters to ontological concepts
  • Semantic search based on ontological
    relationships
  • The Most-Desired Senses Ranking approach
  • Ranking of multiple ontologies
  • Evaluation: users spent significantly less time
    and effort finding the information they want

82
Demo
83
Questions and comments
84
References (1)
  • [1] Wal, T.V. Folksonomy Coinage and Definition.
    2004. Available from
    http://vanderwal.net/folksonomy.html.
  • [2] Gruber, T. Ontology of Folksonomy: A Mash-up
    of Apples and Oranges. International Journal on
    Semantic Web and Information Systems, 2007.
    3(1).
  • [3] Halpin, H., V. Robu, and H. Shepherd. The
    Complex Dynamics of Collaborative Tagging. In
    WWW '07: Proceedings of the 16th International
    Conference on World Wide Web. 2007, ACM.
  • [4] Wal, T.V. Folksonomy Definition and
    Wikipedia. 2005. Available from
    http://www.vanderwal.net/random/entrysel.php.
  • [5] Mika, P. Ontologies are us: A unified model
    of social networks and semantics. Journal of Web
    Semantics, 2007. 5(1): p. 5-15.
  • [6] Gruber, T.R. A Translation Approach to
    Portable Ontology Specifications. Knowledge
    Acquisition, 1993. 5(2): p. 199-220.
  • [7] Resource Description Framework (RDF).
    Available from http://www.w3.org/RDF/.
  • [8] RDF Vocabulary Description Language 1.0: RDF
    Schema. 2004. Available from
    http://www.w3.org/TR/rdf-schema/.

85
References (2)
  • [9] McGuinness, D.L. and F. van Harmelen. OWL
    Web Ontology Language. 2004. Available from
    http://www.w3.org/TR/owl-features/.
  • [10] Kiryakov, A., et al. Semantic annotation,
    indexing, and retrieval. Web Semantics: Science,
    Services and Agents on the World Wide Web, 2004.
    2(1): p. 49-79.
  • [11] Ide, N. and J. Véronis. Word sense
    disambiguation: The state of the art.
    Computational Linguistics, 1998. 1(24): p. 1-40.
  • [12] Wilks, Y. and M. Stevenson. Sense Tagging:
    Semantic Tagging with a Lexicon. In the SIGLEX
    Workshop "Tagging Text with Lexical Semantics:
    What, why and how?" 1997. Washington, D.C.
  • [13] Diab, M. and P. Resnik. An Unsupervised
    Method for Word Sense Tagging using Parallel
    Corpora. In the 40th Annual Meeting of the
    Association for Computational Linguistics. 2002.
    Philadelphia, Pennsylvania.
  • [14] Molina, A., et al. Word Sense
    Disambiguation using Statistical Models and
    WordNet. In the 3rd International Conference on
    Language Resources and Evaluation. 2002. Las
    Palmas de Gran Canaria, Spain.
  • [15] Banerjee, S. and B.P. Mullick. Word Sense
    Disambiguation and WordNet Technology. Literary
    and Linguistic Computing, 2007. 22(1): p. 1-15.
  • [16] Fellbaum, C. WordNet: An Electronic Lexical
    Database. 1998, The MIT Press.

86
References (3)
  • [17] Resnik, P. Semantic Similarity in a
    Taxonomy: An Information-Based Measure and its
    Application to Problems of Ambiguity in Natural
    Language. Journal of Artificial Intelligence
    Research, 1999. 11: p. 95-130.
  • [18] Lin, D. An Information-Theoretic Definition
    of Similarity. In International Conference on
    Machine Learning (ICML). 1998, Madison,
    Wisconsin, USA.
  • [19] Sheth, A., et al. Managing Semantic Content
    for the Web. IEEE Internet Computing, 2002.
    6(4): p. 80-87.
  • [20] Guha, R., R. McCool, and E. Miller.
    Semantic search. In the 12th International
    Conference on World Wide Web. 2003.
  • [21] Arumugam, M., A. Sheth, and I.B. Arpinar.
    Towards Peer-to-Peer Semantic Web: A Distributed
    Environment for Sharing Semantic Knowledge on
    the Web. In International Workshop on Real World
    RDF and Semantic Web Applications. 2002, Hawaii,
    USA.
  • [22] Alani, H. and C. Brewster. Ontology ranking
    based on the analysis of concept structures. In
    the 3rd International Conference on Knowledge
    Capture. 2005.
  • [23] Zhang, Y., W. Vasconcelos, and D. Sleeman.
    OntoSearch: An Ontology Search Engine. In the
    Twenty-fourth SGAI International Conference on
    Innovative Techniques and Applications of
    Artificial Intelligence. 2004. Cambridge, UK.
  • [24] AKTiveSA. Available from
    http://sa.aktivespace.org/.
  • [25] Aleman-Meza, B., et al. SWETO: Large-Scale
    Semantic Web Test-bed. In 16th Int'l Conf.
    Software Eng. & Knowledge Eng., Workshop on
    Ontology in Action, Knowledge Systems Inst.
    2004.