A Platform for Personal Information Integration: Xin (Luna) Dong, Alon Halevy

1
A Platform for Personal Information Integration
Xin (Luna) Dong, Alon Halevy
lunadong, alon_at_cs.washington.uga.edu
University of Washington
  • Paper Review by Delroy Cameron, 2/23/2006

2
Introduction
  • Digital Information Explosion
  • Search
  • WWW Search (e.g. Google, Yahoo)
  • Personal Search (Desktop data)
  • SEMEX (SEMantic EXplorer)
  • Goals of Personal Information Management (PIM)
  • Uses Semantic Associations
  • Browsing and Querying
  • Automatic Creation/Detection of Association

3
1. Browsing by Semantic Association
  • SEMEX Technique
  • Requires a Logical View instead of a hierarchical one
  • Objects and relations between those Objects
  • e.g. Person, Book, AuthoredBy, AttachedTo
  • Instantiation of Logical View
  • Association Database or Personal Information
    space
  • Current Problem
  • Keyword Based Search
  • Data stored in Directory Hierarchies on PCs
  • Traversing directory trees to find relationships takes time
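
The logical view described on this slide can be illustrated with a small sketch. It is not SEMEX's actual storage layer: the class name `AssociationStore`, the identifiers such as `paper:semex`, and the `inverse:` prefix are assumptions made for illustration. It only shows how objects linked by named relations can be browsed without walking a directory tree.

```python
from collections import defaultdict


class AssociationStore:
    """Toy in-memory association database of (subject, relation, object) triples."""

    def __init__(self):
        self.outgoing = defaultdict(set)  # subject -> {(relation, object)}
        self.incoming = defaultdict(set)  # object  -> {(relation, subject)}

    def add(self, subject, relation, obj):
        self.outgoing[subject].add((relation, obj))
        self.incoming[obj].add((relation, subject))

    def neighbors(self, instance):
        """Return everything one association away from `instance`, in either direction."""
        forward = set(self.outgoing.get(instance, ()))
        backward = {("inverse:" + rel, other)
                    for rel, other in self.incoming.get(instance, ())}
        return forward | backward


# Hypothetical instances using the relation names from the slide.
store = AssociationStore()
store.add("paper:semex", "AuthoredBy", "person:dong")
store.add("paper:semex", "AuthoredBy", "person:halevy")
store.add("paper:semex", "AttachedTo", "email:42")

# Starting from a person, one lookup reaches the paper; a second reaches the email.
print(store.neighbors("person:dong"))
print(store.neighbors("paper:semex"))
```

Indexing each triple in both directions is a design choice made here so that browsing works from either end of an association.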

4
SEMEX Data Association Techniques
  • Association by File type
  • Simple Case
  • Email Clients
  • e.g. senders and recipients
  • More Complex Case
  • e.g. AuthorOf
  • Associations by LaTeX Types and PPT
  • Association using External Source
  • e.g. List of all Graduate Students in a
    University
  • Association by Integration
  • Multiple sources with simpler associations
  • Spreadsheets, WWW
  • Reconcile References
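
The simple case above (senders and recipients from an email client) might be sketched as follows, using only Python's standard `email` package. The function name `email_associations`, the relation names `SenderOf` and `RecipientOf`, and the sample message are assumptions for illustration, not part of SEMEX.

```python
from email import message_from_string
from email.utils import getaddresses


def email_associations(raw_message, message_id):
    """Extract (person, relation, message) triples from the headers of one email."""
    msg = message_from_string(raw_message)
    triples = []
    for _name, addr in getaddresses(msg.get_all("From", [])):
        triples.append((addr.lower(), "SenderOf", message_id))
    recipients = msg.get_all("To", []) + msg.get_all("Cc", [])
    for _name, addr in getaddresses(recipients):
        triples.append((addr.lower(), "RecipientOf", message_id))
    return triples


# Made-up message for illustration only.
RAW = (
    "From: Ann Lee <ann@example.org>\n"
    "To: Bob Ray <bob@example.org>, carl@example.org\n"
    "Subject: draft\n"
    "\n"
    "See attached.\n"
)

for triple in email_associations(RAW, "msg1"):
    print(triple)
```

The more complex cases on the slide (AuthorOf from LaTeX or PPT files, external sources) would need format-specific extractors and are not covered by this sketch.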

5
2. Automatic Creation/Detection of Associations
  • Data Integration using SEMEX
  • Import data into the user's personal information
    space
  • From the WWW
  • From local files
  • Formatting Data
  • Scraping from Files or Web pages
  • Form Associations
  • External sources and the user's personal Domain
    Model
  • Import Data
  • Reconcile references
  • Analyze data for pattern matching
  • Derive new associations

6
3. PIM Challenges
  • Handling Long-lived/Evolving Data
  • Data consistency, seamless updating
  • Reference Reconciliation
  • Schema Mapping
  • Right Granularity for personal Data
  • Keep models simple; users are not technically savvy
  • User-oriented design vs. Database Design
  • Let the system fit the user's habitat
  • Do not force user activities into a Database
    environment
  • Combining structured/unstructured data
  • Seamless conversion, transparent to the user

7
4. SEMEX Architecture
  • Domain Model
  • Ontology of personal information, with objects
    and their associations
  • Data Repository
  • Association database or Personal Information
    Space
  • Reference Reconciliation
  • Ontology of personal information
  • Associations and Instances
  • Simple: already stored
  • Extracted: from rich objects, e.g. PowerPoint
  • External Sources
  • Defined similar to views in a database

9
SEMEX Architecture
10
SEMEX Interface
11
4.1 Browsing and Querying
  • SEMEX Keyword Search
  • All documents mentioning the keyword
  • Returns Heterogeneous data
  • From many different Classes
  • SEMEX Selection Queries
  • Specify a Class and values for given Attributes
  • SEMEX Association Queries
  • Conjunctive queries over triples: a pair of
    objects and their relation
  • Returns links much like web browsing
  • e.g. Search all Bernstein publications
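
An association query of the kind described above (a conjunction of triples) can be sketched as follows. This is a minimal illustration, not SEMEX's query engine: the `?`-prefixed variable syntax, the function names, and the sample data (including the venue names) are assumptions.

```python
def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`; return None on failure."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):                 # variable term
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:                      # constant term
            return None
    return extended


def conjunctive_query(patterns, triples):
    """Return every variable binding that satisfies all triple patterns."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Made-up instances for illustration only.
TRIPLES = [
    ("pub1", "AuthoredBy", "Bernstein"),
    ("pub2", "AuthoredBy", "Bernstein"),
    ("pub2", "AuthoredBy", "Halevy"),
    ("pub3", "AuthoredBy", "Halevy"),
    ("pub1", "PublishedIn", "VLDB"),
    ("pub2", "PublishedIn", "SIGMOD"),
]

# "All Bernstein publications, with their venues."
results = conjunctive_query(
    [("?p", "AuthoredBy", "Bernstein"), ("?p", "PublishedIn", "?venue")],
    TRIPLES,
)
print(results)
```

A selection query is the special case of a single pattern, and each binding in the result can be rendered as a link to the matching instance.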

12
5. Reference Reconciliation
  • Mesh External Data to the user's Domain Model
  • e.g. "Mike Carey" and "M. Carey" refer to the same
    person
  • Previous Techniques
  • Reconciling tuple references in DB Table
  • Assume References have same attributes
  • Each attribute has a single value
  • Challenges
  • Heterogeneous Data, different set of attributes
  • References have many attributes
  • Each attribute may have multiple values

13
Reference Reconciliation contd
  • SEMEX Approach
  • Important Definitions
  • Reference
  • One or several representations of an object
  • Class
  • May have several Keys
  • A Key is a set of Attributes that uniquely
    define an object in a class
  • e.g. Person Class Keys: email; fname and lname
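
The notion of a key can be made concrete with a short sketch. It assumes one reading of the example above, namely that the Person class has two keys, `email` alone and the pair `fname`, `lname`; the names `PERSON_KEYS` and `shares_key` are hypothetical.

```python
# Assumed keys for the Person class: each key is a tuple of attribute names.
PERSON_KEYS = [("email",), ("fname", "lname")]


def shares_key(ref_a, ref_b, keys=PERSON_KEYS):
    """True if both references carry identical, non-missing values for some key."""
    for key in keys:
        if all(ref_a.get(attr) is not None and ref_a.get(attr) == ref_b.get(attr)
               for attr in key):
            return True
    return False


full = {"fname": "Mike", "lname": "Carey", "email": "mc@example.org"}
no_email = {"fname": "Mike", "lname": "Carey"}
abbreviated = {"fname": "M.", "lname": "Carey"}

print(shares_key(full, no_email))         # agree on the (fname, lname) key
print(shares_key(no_email, abbreviated))  # no key on which they agree
```

Exact key agreement misses pairs such as "Mike Carey" and "M. Carey", which is why key matching alone is not enough for reconciliation.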

14
Reference Reconciliation contd
  • SEMEX Approach
  • Pairwise Decisions
  • Enrich references when they match with others
  • Enriched references contain more information about
    the domain object
  • Support more sophisticated matching decisions
  • An Enriched reference contains a set of values
  • e.g. multiple spellings for last name
  • References may be grouped
  • e.g. Publications with multiple references to
    each of its authors
  • Group so that all references point to a single
    author
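
Enrichment as described above can be sketched by representing each reference as a mapping from attribute name to a set of values, so that merging two matched references is a per-attribute set union. The function name `enrich` and the sample values are assumptions for illustration.

```python
def enrich(ref_a, ref_b):
    """Merge two matched references; every attribute keeps the union of its values."""
    merged = {}
    for attr in ref_a.keys() | ref_b.keys():
        merged[attr] = set(ref_a.get(attr, ())) | set(ref_b.get(attr, ()))
    return merged


# Made-up references for illustration only.
from_email = {"lname": {"Carey"}, "email": {"mc@example.org"}}
from_bibtex = {"lname": {"Carrey"}, "fname": {"Mike"}}

enriched = enrich(from_email, from_bibtex)
print(enriched)
```

The enriched reference now carries both spellings of the last name, so a later comparison against a third reference can succeed on either one.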

15
Reference Reconciliation contd
  • Algorithm
  • Step 1: Based on Shared Keys
  • Merge input references that agree on a key value
  • e.g. Person: email, fname, lname
  • Step 2: Based on String similarity
  • Use edit distance to measure similarity
  • Independent heuristics that use known formats
  • e.g. phone, email
  • Step 3: Applying Global Knowledge
  • Time Series Comparison
  • Collects references judged to be similar, gets
    their time stamps, and merges them if there is
    little or no overlap
  • Step 4: Search Engine Analysis
  • Feeds text into Google and compares the top hits
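
Steps 1 and 2 of the algorithm above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the 0.8 threshold, the length-normalized similarity, the union-find clustering, and all function and attribute names are choices made here. Steps 3 and 4 (time series comparison and search engine analysis) are not covered.

```python
def edit_distance(a, b):
    """Levenshtein distance via the standard row-by-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute ca by cb
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Edit distance rescaled to [0, 1]; 1.0 means identical (an assumed scoring)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a.lower(), b.lower()) / longest


def reconcile(refs, key_attrs=("email",), name_attr="name", threshold=0.8):
    """Cluster reference indices: shared key values first, then similar names."""
    parent = list(range(len(refs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    # Step 1: merge references that share a key value.
    first_seen = {}
    for i, ref in enumerate(refs):
        for attr in key_attrs:
            value = ref.get(attr)
            if value is None:
                continue
            if (attr, value) in first_seen:
                union(i, first_seen[(attr, value)])
            else:
                first_seen[(attr, value)] = i

    # Step 2: merge references whose names are similar enough.
    for i in range(len(refs)):
        for j in range(i + 1, len(refs)):
            if similarity(refs[i][name_attr], refs[j][name_attr]) >= threshold:
                union(i, j)

    clusters = {}
    for i in range(len(refs)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Made-up references for illustration only.
REFS = [
    {"name": "Mike Carey", "email": "carey@example.org"},
    {"name": "M. Carey", "email": "carey@example.org"},
    {"name": "Mike Carrey"},
    {"name": "Alon Halevy"},
]
print(reconcile(REFS))
```

In this example "M. Carey" joins "Mike Carey" only through the shared email address, while "Mike Carrey" joins through string similarity, which shows why the two steps complement each other.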

16
Objects from Multiple Classes
17
Case Study
18
  • Preliminary Experiments

19
Reference Reconciliation contd
  • Objects in Multiple Classes
  • Reconcile each class in isolation
  • Create a Dependency Graph
  • A Node for each candidate pair of references
  • Each node has similarity score (0 to 1)
  • An edge between two nodes means that when the
    similarity of one is re-computed, the similarity
    of the other must be reconsidered
  • Using the Dependency Graph
  • If c1 and c2 are merged, pa1 and pa2 also merge
  • pe1 , pe2, pe3, and pe4 may merge
  • i1 and i2 merge

20
Reference Reconciliation contd
  • Evolving Objects
  • A publication may change its authors, title, etc.
  • Blurs the line between modeling it as a single
    reference and as multiple references
  • Distinguish granularity
  • Coarse-grained objects are compiled from
    fine-grained ones based on certain similarities

21
Automatic Integration
  • Integrate External Sources
  • Approach
  • Import instances and associations from external
    source into personal information space
  • Mark imported data as Temporary or Permanent
  • Pose query to find intersection of data
  • Export this intersected data to the spreadsheet
  • Previous Work
  • Schema Mapping
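
The approach on this slide (import, mark as temporary, query the intersection, export) can be sketched as follows. The record layout, the function names, the source labels, and the sample names are assumptions for illustration; in particular, matching by exact name stands in for real reference reconciliation.

```python
import csv
import io


def import_instances(space, names, source, temporary=True):
    """Add external instances to the space, tagged with source and lifetime."""
    for name in names:
        space.append({"name": name, "source": source, "temporary": temporary})


def intersect(space, source_a, source_b):
    """Names that occur under both sources (exact match, for simplicity)."""
    names_a = {rec["name"] for rec in space if rec["source"] == source_a}
    names_b = {rec["name"] for rec in space if rec["source"] == source_b}
    return sorted(names_a & names_b)


def export_csv(names):
    """Write the result as a one-column spreadsheet in CSV form."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["name"])
    for name in names:
        writer.writerow([name])
    return buffer.getvalue()


def drop_temporary(space):
    """Discard imported data that was marked temporary."""
    return [rec for rec in space if not rec["temporary"]]


# Made-up data: the user's own contacts plus an external list of graduate students.
space = []
import_instances(space, ["Ann Lee", "Bob Ray"], source="contacts", temporary=False)
import_instances(space, ["Ann Lee", "Cy Poe"], source="grad_students")

sheet = export_csv(intersect(space, "contacts", "grad_students"))
print(sheet)
space = drop_temporary(space)
```

Marking imported records as temporary lets the external list be used for one query and then discarded without polluting the personal information space.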

22
Browsing Associations with Semex
LUNA DONG
Different references to the same person
AuthorOfArticles
MentionedIn
SenderOfEmails
RecipientOfEmails
Reference reconciliation identifies all
references to the same real-world object
Coauthors
23
Who is Working on Semex?
Keyword Search Returns Associated Instances
Search Semex
3 Conferences for publishing Semex papers
105 Images in Semex papers
2398 Messages 2 Presentations 65 Articles
15 Persons working on Semex (though they are not
named Semex)
24
How do I Get to Know this Person?
Semex Provides Lineage Information
Susan Dumais
Latest Lineage
Shortest Lineage
User: Do I know this paper of Susan Dumais?
Semex: Yes, you once cited it.
The last time we mentioned Susan Dumais was in an
email
Earliest Lineage
I got to know Susan Dumais by citing her paper
25
  • Thank You!