On the Enrichment of a RDF Repository of City Points of Interest based on Social Data

1
On the Enrichment of a RDF Repository of City
Points of Interest based on Social Data
  • Zied Sellami, Gianluca Quercini, Chantal
    Reynaud
  • IASI Team, Université Paris-Sud 11, France
  • {sellami, reynaud}@lri.fr
  • E3S Team, Supélec, France
  • gianluca.quercini@supelec.fr

WOD2013, Paris, 3 June 2013
2
  • Outline
  • Introduction and Related Issues
  • Reconciliation of POI Data and Social Data
  • Enrichment based on Opinion Mining
  • Experiments and Results
  • Conclusion and Future Work

3
  • Introduction and Related Issues
  • Points of interest (POIs): geographic locations
  • Restaurants, museums, hotels, theatres,
    landmarks, etc.
  • Formalized as an RDF repository in the context of
    the DataBridges project (Quercini et al., 2012)
  • A POI is described by facets (or attributes):
    name, type, category, address, longitude and
    latitude
  • Example of a POI: the Louvre Museum

4
  • Introduction and Related Issues
  • POIs are automatically obtained by extracting
    data from Google Fusion Tables (GFT)
    (Quercini et al., 2012)
  • Some extracted POIs contain few attributes
  • Some extracted POIs have imprecise attributes
    (incomplete address, imprecise geographic
    location)
  • Extracted POIs lack valuable information (user
    reviews, official website, e-mail, etc.)
  • Enrich and correct POIs
  • Additional elements: phone number, e-mail,
    official website
  • Useful information for potential visitors (good
    and bad aspects of the place)
  • What can be used to enrich POIs?
  • Social networking systems (social data)

5
  • Matching POIs Across Social Networks
  • Accessing and searching social Web pages
    concerning a POI
  • Yelp (http://www.yelp.com/)
  • Social networking site for finding and
    reviewing POIs
  • Foursquare (https://foursquare.com/)
  • Application combining geolocation and social
    guidance
  • Both offer a similar search method
  • Input: name and geographic position
  • Output: list of Web pages of POIs related to the
    geographic position and to the words in the
    query
  • Filtering the list to keep only relevant Web
    pages

6
  • Matching POIs Across Social Networks
  • Selecting the appropriate Web Pages for a POI
  • Computing a similarity value
  • Several parameters can be used
  • Name
  • Address
  • Category
  • Longitude and Latitude
  • Definition of a similarity formula

7
  • Matching POIs Across Social Networks: Similarity
    Measure
  • Two parameters used: name, and longitude and
    latitude
  • Different social data → different ways of
    describing categories
  • Eiffel Tower: Monument, Garden, etc. vs.
    Landmarks, Historical Building
  • Uncontrolled social data → address strings
    incomplete or wrong
  • O Pelicano (Portugal), Restaurante O Requinte
    (Portugal), etc.
  • String techniques for name pruning and name
    comparison
  • Stemming with the Porter algorithm, stop-word lists
  • Levenshtein distance and Jaccard distance
  • Filtering results by geographic proximity
  • Computing the geographic distance between the POI
    and the Web page from longitude and latitude
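The geographic filtering step needs a distance between the POI and the entity described on a Web page, computed from their longitudes and latitudes. The slides do not say which distance formula was used; the Java sketch below assumes the standard haversine great-circle distance on a spherical Earth with a mean radius of 6,371 km. The class name `GeoDistance` is hypothetical.

```java
// Hypothetical helper (assumption: haversine formula, spherical Earth).
final class GeoDistance {
    // Mean Earth radius in meters.
    private static final double EARTH_RADIUS_METERS = 6_371_000.0;

    private GeoDistance() {}

    /** Great-circle distance in meters between two (latitude, longitude) points in degrees. */
    static double haversineMeters(double lat1, double lon1, double lat2, double lon2) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double dPhi = Math.toRadians(lat2 - lat1);
        double dLambda = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dPhi / 2) * Math.sin(dPhi / 2)
                + Math.cos(phi1) * Math.cos(phi2)
                * Math.sin(dLambda / 2) * Math.sin(dLambda / 2);
        double c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
        return EARTH_RADIUS_METERS * c;
    }
}
```

Two points one degree of latitude apart come out at roughly 111 km, and the result is in meters so it can be compared directly with a threshold such as distmax.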

8
  • Matching POIs Across Social Networks: Similarity
    Measure
  • Similarity measure
  • WP(x).name: name of an entity x in a Web page
  • p.name: name of a POI
  • Combination of Levenshtein and Jaccard
  • Boosts the similarity score between names that
    use the same words, even in a different order
  • Example: Museum of Louvre / Louvre Museum
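The exact similarity formula is not recoverable from this transcript. The Java sketch below only illustrates the listed ingredients under stated assumptions: names are pruned (lowercased, split on non-alphanumeric characters, stop words removed; Porter stemming is omitted for brevity), Levenshtein distance is normalized to a similarity in [0, 1], and Jaccard is computed on the token sets. Taking the maximum of the two scores is a hypothetical combination chosen to show the word-order effect, not the authors' formula; the class name and stop-word list are also assumptions.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical name-similarity sketch; NOT the authors' formula.
final class NameSimilarity {
    // Assumed stop-word list; the slides also mention Porter stemming, omitted here.
    private static final Set<String> STOP_WORDS = new HashSet<>(
            Arrays.asList("the", "of", "a", "an", "and", "de", "du", "la", "le", "les"));

    private NameSimilarity() {}

    /** Name pruning: lowercase, split on non-alphanumeric characters, drop stop words. */
    static List<String> prune(String name) {
        return Arrays.stream(name.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
                .filter(token -> !token.isEmpty() && !STOP_WORDS.contains(token))
                .collect(Collectors.toList());
    }

    /** Levenshtein edit distance computed with two rolling rows. */
    static int levenshtein(String s, String t) {
        int[] prev = new int[t.length() + 1];
        int[] curr = new int[t.length() + 1];
        for (int j = 0; j <= t.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= s.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= t.length(); j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[t.length()];
    }

    /** Levenshtein distance normalized to a similarity in [0, 1]. */
    static double levenshteinSimilarity(String s, String t) {
        int maxLen = Math.max(s.length(), t.length());
        return maxLen == 0 ? 1.0 : 1.0 - (double) levenshtein(s, t) / maxLen;
    }

    /** Jaccard index of two token sets: size of intersection over size of union. */
    static double jaccard(List<String> a, List<String> b) {
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        if (union.isEmpty()) {
            return 1.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        return (double) intersection.size() / union.size();
    }

    /** Hypothetical combination: the larger of the Levenshtein and Jaccard scores. */
    static double combined(String poiName, String pageName) {
        List<String> a = prune(poiName);
        List<String> b = prune(pageName);
        double lev = levenshteinSimilarity(String.join(" ", a), String.join(" ", b));
        return Math.max(lev, jaccard(a, b));
    }
}
```

With this sketch, `combined("Museum of Louvre", "Louvre Museum")` returns 1.0: after pruning, both names reduce to the token set {museum, louvre}, so the Jaccard score is maximal even though the Levenshtein score of the reordered strings is below 1.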

9
  • Matching POIs Across Social Networks: Filtering
    Measure
  • Filtering measure
  • d1 and d2: name similarity thresholds
  • distmax: distance threshold
  • Threshold values fixed experimentally
  • d1 = 0.9 and d2 = 0.7
  • distmax = 1000 meters
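The slide gives the thresholds but not how they are combined. A minimal Java sketch, assuming one plausible reading: a name similarity of at least d1 is accepted regardless of distance, while a similarity of at least d2 is accepted only if the entity lies within distmax of the POI. Both this rule and the class name `MatchFilter` are assumptions; the inputs would come from the name comparison and geographic distance steps of the previous slides.

```java
// Hypothetical filtering rule; only the threshold values come from the slides.
final class MatchFilter {
    static final double D1 = 0.9;                // strong name similarity threshold
    static final double D2 = 0.7;                // weaker name similarity threshold
    static final double DIST_MAX_METERS = 1000.0; // maximum POI / Web page distance

    private MatchFilter() {}

    /** Accepts a candidate Web page entity for a POI under the assumed rule. */
    static boolean accept(double nameSimilarity, double distanceMeters) {
        if (nameSimilarity >= D1) {
            return true; // very similar names are kept on their own
        }
        // fairly similar names must also be geographically close
        return nameSimilarity >= D2 && distanceMeters <= DIST_MAX_METERS;
    }
}
```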

10
  • Opinion Mining
  • Evaluation of the POI from reviews and comments
  • Rating: Good, Very Good, Bad, Very Bad, etc.
  • Useful information for a potential visitor
  • What is interesting? (food, ambiance, place,
    etc.)
  • What should be avoided? (drinks, people, etc.)
  • Go further than conventional sentiment analysis
  • Tweet classification (positive, negative or
    undetermined) (Pak and Paroubek, 2010)
  • http://smm.streamcrab.com/
  • http://www.sentiment140.com/
  • Linguistic approach for opinion mining

11
  • Opinion Mining: Principle
  • Identification of positive and negative
    expressions
  • Using verbs and adjectives (Chesley et al., 2006)
    (Moghaddam and Popowich, 2010) (Li et al., 2012)
  • Example: great food, not good place, I like the
    place, etc.
  • Generating a lexicon of positive and negative
    verbs and adjectives
  • Processing a lexicon of positive and negative
    words with TreeTagger
  • http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html
  • Positive adjectives (1467) / Negative
    adjectives (1609)
  • Positive verbs (421) / Negative verbs (1243)
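The simplest use of the positive and negative word lists is to count lexicon hits in a review. The Java sketch below assumes the lexicon has already been loaded into two sets of lowercase words (a tiny hypothetical sample stands in for the full lists) and that the review is already tokenized into plain words; it treats a directly preceding "not" as a polarity flip, a simplification of the (NOT) patterns on the next slide. It does not use TreeTagger part-of-speech tags, and the class name is hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Minimal lexicon-based polarity counter (sketch; no part-of-speech tagging).
final class PolarityLexicon {
    private final Set<String> positive;
    private final Set<String> negative;

    /** Both sets are expected to contain lowercase words from the lexicon. */
    PolarityLexicon(Set<String> positive, Set<String> negative) {
        this.positive = new HashSet<>(positive);
        this.negative = new HashSet<>(negative);
    }

    /** Returns {positiveCount, negativeCount}; a directly preceding "not" flips polarity. */
    int[] count(List<String> tokens) {
        int pos = 0;
        int neg = 0;
        for (int i = 0; i < tokens.size(); i++) {
            String word = tokens.get(i).toLowerCase(Locale.ROOT);
            boolean negated = i > 0 && tokens.get(i - 1).equalsIgnoreCase("not");
            if (positive.contains(word)) {
                if (negated) { neg++; } else { pos++; }
            } else if (negative.contains(word)) {
                if (negated) { pos++; } else { neg++; }
            }
        }
        return new int[] {pos, neg};
    }
}
```

For the tokens of "Great food but not good place and slow elevator", with "great" and "good" positive and "slow" negative, this sketch counts one positive word and two negative ones (the negated "good" and "slow").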

12
  • Opinion Mining: Phrase Extraction
  • Definition of lexico-syntactic patterns to
    identify relevant expressions
  • Expressions describing objects
  • (NOT) ADJ OBJECT (great food, not interesting
    place, etc.)
  • OBJECT BE ADJ (sandwich is good, restaurant is
    nice, etc.)
  • Expressions describing sentiments or advice
  • IT'S ADJ (it's interesting, it's happy, etc.)
  • I FEEL OR SUGGEST OBJECT (I like this place, I
    advise you to test the hotel, etc.)
  • I FEEL (NOT) ADJ (I feel happy, I feel very
    hungry, etc.)
  • Implementation with Java regular expressions
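The slides state that the patterns were implemented with Java regular expressions but do not show them. As a hedged illustration, the sketch below encodes the two object patterns, (NOT) ADJ OBJECT and OBJECT BE ADJ, with `java.util.regex`. It assumes a small hypothetical adjective list matched on raw text; the actual system works from TreeTagger output, so a faithful version would match part-of-speech tags rather than a fixed word list. Matches are normalized to the "adjective object" form seen in the results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch of two lexico-syntactic patterns; not the authors' regexes.
final class PhrasePatterns {
    // Hypothetical adjective list standing in for the lexicon / POS tags.
    private static final String ADJ =
            "(?:great|good|nice|interesting|beautiful|crowded|slow)";

    // (NOT) ADJ OBJECT, e.g. "great food", "not interesting place"
    private static final Pattern NOT_ADJ_OBJECT = Pattern.compile(
            "\\b(?:(not)\\s+)?(" + ADJ + ")\\s+(\\w+)", Pattern.CASE_INSENSITIVE);

    // OBJECT BE ADJ, e.g. "sandwich is good", "restaurant was nice"
    private static final Pattern OBJECT_BE_ADJ = Pattern.compile(
            "\\b(\\w+)\\s+(?:is|are|was|were)\\s+(" + ADJ + ")\\b", Pattern.CASE_INSENSITIVE);

    private PhrasePatterns() {}

    /** Returns matched phrases normalized to "[not] adjective object", lowercased. */
    static List<String> extract(String sentence) {
        List<String> phrases = new ArrayList<>();
        Matcher m = NOT_ADJ_OBJECT.matcher(sentence);
        while (m.find()) {
            String negation = m.group(1) != null ? "not " : "";
            phrases.add((negation + m.group(2) + " " + m.group(3)).toLowerCase(Locale.ROOT));
        }
        m = OBJECT_BE_ADJ.matcher(sentence);
        while (m.find()) {
            // Reorder "object is adj" into "adj object".
            phrases.add((m.group(2) + " " + m.group(1)).toLowerCase(Locale.ROOT));
        }
        return phrases;
    }
}
```

For example, `extract("Great food, not interesting place.")` yields `[great food, not interesting place]`, and `extract("The sandwich is good.")` yields `[good sandwich]`.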

13
  • Repository Enrichment: Rating of a POI
  • Rating measure
  • Scale for rating a POI

Scale from -10 to 10 with the labels Very bad, Bad, Medium, Undetermined, Fairly Good and Very Good; tick marks at -10, -6.6, -3.3, 0, 3.3, 6.6 and 10
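The transcript preserves only the rating scale, not the rating measure itself. Purely as an illustration of how expression counts can be mapped onto a [-10, 10] scale, the Java sketch below assumes the hypothetical formula 10 × (pos − neg) / (pos + neg), returning 0 (undetermined) when no polarized expression was found. This is not the authors' measure, and the class name is hypothetical.

```java
// Hypothetical mapping of polarity counts onto the [-10, 10] rating scale.
final class RatingScore {
    private RatingScore() {}

    /** Returns 10 * (positive - negative) / (positive + negative), or 0 if both are zero. */
    static double score(int positive, int negative) {
        if (positive < 0 || negative < 0) {
            throw new IllegalArgumentException("counts must be non-negative");
        }
        int total = positive + negative;
        return total == 0 ? 0.0 : 10.0 * (positive - negative) / total;
    }
}
```

Under this assumption, three positive and one negative expression give a score of 5.0, only positive expressions give 10, and only negative ones give -10.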
14
  • Repository Enrichment: Identifying Useful
    Information
  • General assessment
  • Expressions describing sentiments
  • Expressions describing objects that refer to the
    place, the name of the POI or one of its
    categories
  • Tips
  • Expressions describing advice
  • Specific ideas
  • Expressions describing objects other than the
    place, the name or a category of the POI
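The three groups above amount to a simple decision rule. The Java sketch assumes each extracted expression is already tagged with the pattern family that produced it (sentiment, advice or object description) and, for object descriptions, with its object noun; `poiTerms` holds "place", the POI name tokens and its categories. These types and names are hypothetical, not taken from the authors' tool.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical decision rule for sorting extracted expressions into three groups.
final class ExpressionClassifier {
    enum Group { GENERAL_ASSESSMENT, TIP, SPECIFIC_IDEA }

    /** Pattern family that produced an expression (assumed tagging). */
    enum ExpressionType { SENTIMENT, ADVICE, OBJECT_DESCRIPTION }

    private ExpressionClassifier() {}

    /**
     * @param type     pattern family of the expression
     * @param object   object noun of an object description (may be null otherwise)
     * @param poiTerms lowercase terms referring to the POI: "place", name tokens, categories
     */
    static Group classify(ExpressionType type, String object, Set<String> poiTerms) {
        switch (type) {
            case ADVICE:
                return Group.TIP;
            case SENTIMENT:
                return Group.GENERAL_ASSESSMENT;
            default:
                // Object descriptions: about the POI itself, or about something else?
                boolean aboutPoi = object != null
                        && poiTerms.contains(object.toLowerCase(Locale.ROOT));
                return aboutPoi ? Group.GENERAL_ASSESSMENT : Group.SPECIFIC_IDEA;
        }
    }
}
```

With `poiTerms` = {place, louvre, museum}, "good museum" falls under general assessment, "strange marble sculpture" (object: sculpture) under specific ideas, and an advice such as "visit basement" under tips.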

15
  • Evaluation of the Similarity Measure
  • Dataset: 600 POIs compared with Foursquare data
  • Comparing our formula with Levenshtein and
    Jaccard
  • The combination of Levenshtein and Jaccard
    improves similarity precision
  • Our formula and Levenshtein have the same F-measure
  • Precision is the more important parameter

Formula | Precision | Recall | F-measure
Name, Levenshtein | 0.84 | 0.68 | 0.75
Name, Jaccard | 0.85 | 0.66 | 0.74
Our formula | 0.86 | 0.66 | 0.75
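The F-measure values in the table are consistent with the standard F1 score, the harmonic mean of precision P and recall R: F = 2PR / (P + R). A short Java helper makes it easy to check the reported figures; the class name `Metrics` is illustrative.

```java
// F1 score: harmonic mean of precision and recall.
final class Metrics {
    private Metrics() {}

    /** Returns 2PR / (P + R), or 0 when both precision and recall are 0. */
    static double fMeasure(double precision, double recall) {
        double sum = precision + recall;
        return sum == 0 ? 0.0 : 2 * precision * recall / sum;
    }
}
```

For instance, `fMeasure(0.86, 0.66)` is about 0.747, which rounds to the 0.75 reported for our formula.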
16
  • Evaluation of the Opinion Mining Approach
  • 40 Yelp reviews of the Louvre Museum and the Eiffel Tower
  • Louvre Museum rating: Very Good (7.23)
  • Eiffel Tower rating: Very Good (8.58)

Louvre Museum
  • General assessment
    Positive: magnificent place, beautiful place, good museum, prestigious museum
    Negative: crowded place, hard museum, uncomfortable museum
  • Tips: go basement, visit basement, not use pyramid entrance
  • Specific ideas
    Positive: contemporary art, contemporary sculpture, original decor, real mummy
    Negative: sketchy people, strange marble sculpture, massive crowd, grumpy folk

Eiffel Tower
  • General assessment
    Positive: great place, funny place, beautiful monument
    Negative: none
  • Tips: go top
  • Specific ideas
    Positive: good view, panoramic view, light show
    Negative: slow elevator, crazy line, illegal Eiffel tower souvenir
17
  • Evaluation of the Opinion Mining Approach
  • Comparison with sentiment140 (statistical
    approach)
  • Analysis of 20 tweets concerning the Louvre Museum
    and 14 tweets concerning the Eiffel Tower

Polarity of Louvre Museum tweets | sentiment140 | Our approach
Positive | 13 | 10
Negative | 2 | 0
Undetermined | 5 | 10
  • The results are not contradictory
  • Our approach identified 3 sentiments that were
    not identified by sentiment140

Polarity of Eiffel Tower tweets | sentiment140 | Our approach
Positive | 11 | 10
Negative | 1 | 1
Undetermined | 2 | 3
  • 2 tweets analyzed differently
  • Our approach identified the correct polarity

18
  • Conclusion
  • Original approach for POI data enrichment
  • Definition of a similarity formula to compare POI
    data
  • Linguistic approach to analyze reviews and
    comments
  • Complete tool implemented in Java
  • Experiments show promising results
  • About 86% similarity precision
  • Linguistic approach able to identify precisely
    the positive and negative aspects of the POI

19
  • Future Work
  • Similarity measure optimisation
  • Comparing the Web pages selected for a POI
  • Filtering positive and negative expressions
  • Using metrics such as frequency
  • Learning new positive and negative verbs and
    adjectives
  • Using SentiWordNet (Baccianella et al., 2010)
  • Using adverbs in the opinion mining approach
    (Benamara et al., 2007)
  • "Very good food" is stronger than "good food"

20
  • Thank you!

21
  • The Enrichment System

22
  • Experiments and Results: Similarity Formula
  • Comparing our formula with different parameter
    combinations
  • Best precision (0.94) but low recall (0.25) with
    the first formula
  • Exact string match
  • The combination of name and categories gives the
    best F-measure
  • Categories in Foursquare are similar to our data
  • Our formula gives better precision

Formula | Precision | Recall | F-measure
Name, address and categories | 0.94 | 0.25 | 0.40
Name and address | 0.88 | 0.15 | 0.25
Name and categories | 0.82 | 0.71 | 0.76
Our formula | 0.86 | 0.66 | 0.75