From Personal Desktops to Personal Dataspaces: A Report on Building the iMeMex Personal Dataspace Management System

1
From Personal Desktops to Personal Dataspaces: A
Report on Building the iMeMex Personal Dataspace
Management System
  • Jens Dittrich, Lukas Blunschi,
    Markus Färber, Olivier Girard, Shant
    Karakashian, Marcos Vaz Salles
  • BTW 2007

2
A World of Data Silos
  • > 80% of data outside of relational databases
  • Documents, spreadsheets, presentations
  • Web pages
  • Email, instant messages, news feeds
  • Images, audio, video
  • Specialized systems for many of the data types
    (filesystems, web/email servers, DBMSs)
  • Lack of unified services over ALL the data

3
Dataspace
  • The complete set of information (documents,
    emails, images, etc.) belonging to one
    organization or task
  • Examples
  • Personal dataspaces: your messages, your family
    photos
  • Enterprise dataspaces: all information about a
    key customer
  • Scientific dataspaces: all information about one
    given research project
  • Includes a set of data sources and relationships
    among pieces of information in the sources

4
Dataspace Management System
  • New system abstraction
  • A hybrid of
  • Search Engine
  • Database Management System
  • Information Integration System
  • Data Sharing System
  • Offers services on ALL the data
  • Keyword and structural search to start with
    (baseline)
  • Provides pay-as-you-go information integration
  • Model data relationships and their evolution
  • However, does not acquire full control of data
  • System does not own the data

5
Projects on Dataspaces
  • Vision Paper on Dataspaces
  • Mike Franklin (UC Berkeley), Alon Halevy (U Wash
    / Google), David Maier (U Portland).
  • From Databases to Dataspaces: A New Abstraction
    for Information Management. SIGMOD Record,
    December 2005.
  • ETH Zürich iMeMex
  • UC Berkeley (Shawn Jeffery) and Google (Alon
    Halevy)
  • U Portland (David Maier)
  • Purdue U (Nehme, Elke Rundensteiner, et al.)

6
Our Focus: Personal Dataspaces
  • Great applications, but information
    integration is done by the user
[Figure: layered diagram with the labels User, Applications, PDSMS, and Data Sources (Email Server, Web Server, PC, iPod)]

7
So far...
  • Vision: Dataspaces (VLDB 2005, SIGIR PIM 2006)
  • To come...
  • Data model: single framework for different types
    of data (VLDB 2006)
  • System architecture: mediation / warehousing
    (CIDR 2007, BTW 2007)
  • Pay-as-you-go information integration (ongoing
    work)

8
Characteristics of Personal Data
  • Non-schematic
  • Heterogeneous collections, no formally defined
    schema
  • Several possible serializations
  • Hundreds of file formats, different encodings
  • Contains arbitrary graphs
  • References within documents (LaTeX/Word),
    filesystem links
  • Distributed among different data sources
  • Filesystem, email servers, web servers,
    databases, iPod
  • Infinite
  • RSS, ATOM, email streams

9
Data Model Options
[Figure: comparison of data model options. Recoverable labels: specific schema; view mechanism; extensions XLink/XPointer, ActiveXML, document streams, relational streams, XML streams]

10
Data Models for Personal Information
[Figure: data models arranged by abstraction level, from lower to higher]

11
iDM: iMeMex Data Model
  • Our approach: bring the data model closer to
    personal information, not the other way around
  • Supports
  • Unstructured, semi-structured and structured
    data, e.g., files and folders, XML, relations
  • Clear separation of logical and physical
    representation of data
  • Arbitrary directed graph structures, e.g.,
    section references in LaTeX documents, links in
    filesystems, etc.
  • Lazily computed data, e.g., ActiveXML (Abiteboul
    et al.)
  • Infinite data, e.g., media and data streams

See VLDB 2006

12
iDM: Lazily Computed Graph
  • Nodes and edges are lazily computed
  • Each node is a Resource View

13
iDM: Lazily Computed Graph
  • Behind the scenes, obtaining the content may:
  • Read a file on the filesystem
  • Access a page on the web
  • Fetch the data from an index structure
  • Behind the scenes, obtaining the group may:
  • Get the children of a folder in the filesystem
  • Look up an edge replica
  • Obtain the sections of a document

14
How to Implement iDM: Architectural Perspective
  • Complex operators (query algebra)
  • Indexes and replicas access (warehousing)
  • Data source access (mediation)
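
One way to picture how these three layers might stack is sketched below in Python. The slide only names the layers; how they interact here (the warehousing layer answering from a local replica and falling back to the data source on a miss) is an assumption made for illustration. The class names DataSource and ReplicaStore and the function keyword_search are hypothetical and are not taken from iMeMex.

```python
from typing import Dict, List


class DataSource:
    """Mediation layer: answers every request by going to the source itself."""

    def __init__(self, items: Dict[str, str]) -> None:
        self._items = dict(items)
        self.reads = 0  # counts how often the source was actually accessed

    def fetch(self, key: str) -> str:
        self.reads += 1
        return self._items[key]


class ReplicaStore:
    """Warehousing layer: keeps local copies so later reads skip the source."""

    def __init__(self, source: DataSource) -> None:
        self._source = source
        self._replicas: Dict[str, str] = {}

    def fetch(self, key: str) -> str:
        if key not in self._replicas:
            # Miss: fall back to the mediation layer and keep a replica.
            self._replicas[key] = self._source.fetch(key)
        return self._replicas[key]


def keyword_search(store: ReplicaStore, keys: List[str], term: str) -> List[str]:
    """Operator layer: a toy keyword filter built on top of the access layers."""
    needle = term.lower()
    return [k for k in keys if needle in store.fetch(k).lower()]


source = DataSource({
    "mail/1": "Meeting about dataspaces",
    "fs/a.txt": "Shopping list",
})
store = ReplicaStore(source)
keys = ["mail/1", "fs/a.txt"]

hits = keyword_search(store, keys, "dataspaces")  # reads both items from the source
again = keyword_search(store, keys, "shopping")   # served entirely from replicas
```

The operator only talks to the layer directly below it, so the same query code would work whether an item comes from a replica or from the original source.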

15
Further Research Challenges in Dataspace
Management Systems
  • Pay-as-you-go information integration
  • Model relationships in the dataspace
  • Examples: semantic equivalences, lineage
    relationships
  • Distributed Dataspaces
  • Query language specification (iQL)

16
iMeMex Prototype Implementation
  • iMeMex Prototype
  • 780 classes
  • 70,900 LOC
  • Java-based: supported on Linux, Mac and Windows
  • OSGi-based: everything is a plug-in (52
    bundles)
  • Open-source (Apache 2.0): http://www.imemex.org
  • Team
  • Advisor
  • Two Ph.D. students
  • Three M.Sc. students
  • Thirteen Semester Project students

17
Conclusions
  • Dataspace Management Systems are a new system
    abstraction
  • iMeMex is among the first implementations of this
    new breed of systems; our focus: Personal
    Dataspaces
  • Dataspace Management Systems call for
  • New data model
  • New system architecture
  • New capabilities for pay-as-you-go information
    integration
  • More information: http://www.imemex.org

18
Questions? Thanks in Advance for your Feedback!

19
Backup Slides
20
Personal Dataspaces Literature
  • Dittrich, Vaz Salles, Kossmann, Blunschi. iMeMex:
    Escapes from the Personal Information Jungle
    (Demo Paper). VLDB, September 2005.
  • Dittrich, Vaz Salles. iDM: A Unified and
    Versatile Data Model for Personal Dataspace
    Management. VLDB, September 2006.
  • Dittrich. iMeMex: A Platform for Personal
    Dataspace Management. SIGIR PIM, August 2006.
  • Blunschi, Dittrich, Girard, Karakashian, Vaz
    Salles. A Dataspace Odyssey: The iMeMex Personal
    Dataspace Management System (Demo Paper). CIDR,
    January 2007.
  • Dittrich, Blunschi, Färber, Girard, Karakashian,
    Vaz Salles. From Personal Desktops to Personal
    Dataspaces: A Report on Building the iMeMex
    Personal Dataspace Management System. BTW, March
    2007.