Title: From Personal Desktops to Personal Dataspaces: A Report on Building the iMeMex Personal Dataspace Ma
1. From Personal Desktops to Personal Dataspaces: A Report on Building the iMeMex Personal Dataspace Management System
- Jens Dittrich, Lukas Blunschi, Markus Färber, Olivier Girard, Shant Karakashian, Marcos Vaz Salles
- BTW 2007
2. A World of Data Silos
- More than 80% of data is outside of relational databases
- Documents, spreadsheets, presentations
- Web pages
- Email, instant messages, news feeds
- Images, audio, video
- Specialized systems for many of the data types (filesystems, web/email servers, DBMSs)
- Lack of unified services over ALL the data
3. Dataspace
- The complete set of information (documents, emails, images, etc.) belonging to one organization or task
- Examples
  - Personal dataspaces: your messages, your family photos
  - Enterprise dataspaces: all information about a key customer
  - Scientific dataspaces: all information about one given research project
- Includes a set of data sources and relationships among pieces of information in the sources
4. Dataspace Management System
- New system abstraction
- A hybrid of
  - Search Engine
  - Database Management System
  - Information Integration System
  - Data Sharing System
- Offers services on ALL the data
  - Keyword and structural search to start with (baseline)
- Provides pay-as-you-go information integration
  - Model data relationships and their evolution
- However, does not acquire full control of data
  - System does not own the data
5. Projects on Dataspaces
- Vision Paper on Dataspaces
  - Mike Franklin (UC Berkeley), Alon Halevy (U Wash / Google), David Maier (U Portland). From Databases to Dataspaces: A New Abstraction for Information Management. SIGMOD Record, December 2005.
- ETH Zürich: iMeMex
- UC Berkeley (Shawn Jeffrey) and Google (Alon Halevy)
- U Portland (David Maier)
- Purdue U (Nehme, Elke Rundensteiner, et al.)
6. Our Focus: Personal Dataspaces
- Great applications, but information integration is done by the user
- [Figure: layered diagram, top to bottom: User, Applications, PDSMS, Data Sources (Email Server, Web Server, PC, iPod)]
7. So far...
- Vision: Dataspaces (VLDB 2005, SIGIR PIM 2006)
- To come...
  - Data model: single framework for different types of data (VLDB 2006)
  - System architecture: mediation / warehousing (CIDR 2007, BTW 2007)
  - Pay-as-you-go information integration (ongoing work)
8. Characteristics of Personal Data
- Non-schematic
  - Heterogeneous collections, no formally defined schema
- Several possible serializations
  - Hundreds of file formats, different encodings
- Contains arbitrary graphs
  - References within documents (LaTeX/Word), filesystem links
- Distributed among different data sources
  - Filesystem, email servers, web servers, databases, iPod
- Infinite
  - RSS, ATOM, email streams
9. Data Model Options
- Extension: XLink/XPointer
- Specific schema
- View mechanism
- Extension: ActiveXML
- Extension: document streams
- Extension: relational streams
- Extension: XML streams
10. Data Models for Personal Information
- [Figure: data models arranged by abstraction level, from lower to higher]
11. iDM: iMeMex Data Model
- Our approach: get the data model closer to personal information, not the other way around
- Supports
  - Unstructured, semi-structured and structured data, e.g., files and folders, XML, relations
  - Clear separation of logical and physical representation of data
  - Arbitrary directed graph structures, e.g., section references in LaTeX documents, links in filesystems, etc.
  - Lazily computed data, e.g., ActiveXML (Abiteboul et al.)
  - Infinite data, e.g., media and data streams
- See VLDB 2006
12. iDM: Lazily Computed Graph
- Nodes and edges are lazily computed
- Each node is a Resource View
13. iDM: Lazily Computed Graph
- Behind the scenes, obtaining the content may
  - Read a file on the filesystem
  - Access a page on the web
  - Fetch the data from an index structure
- Behind the scenes, obtaining the group may
  - Get the children of a folder in the filesystem
  - Look up an edge replica
  - Obtain the sections of a document
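
The lazy evaluation described on this slide can be illustrated with a minimal Python sketch. It is not the iMeMex API: the class name `ResourceView` is borrowed from the slides, but its methods (`content`, `group`) and the helpers `file_view` and `folder_view` are hypothetical names chosen for illustration. The sketch assumes each node holds two callbacks that are only invoked when the content or the group is actually requested.

```python
import os


class ResourceView:
    """Hypothetical graph node whose content and group are computed lazily."""

    def __init__(self, name, content_fn=None, group_fn=None):
        self.name = name
        # Callbacks are stored, not called: nothing is read at construction time.
        self._content_fn = content_fn or (lambda: "")
        self._group_fn = group_fn or (lambda: iter(()))

    def content(self):
        # Computed on demand, e.g. by reading a file or querying an index.
        return self._content_fn()

    def group(self):
        # Computed on demand; yields the related (child) resource views.
        return iter(self._group_fn())


def file_view(path):
    """Assumed helper: a view whose content is read from a file when requested."""
    def read():
        with open(path, encoding="utf-8", errors="replace") as fh:
            return fh.read()
    return ResourceView(os.path.basename(path), content_fn=read)


def folder_view(path):
    """Assumed helper: a view whose group is the folder's children, listed when requested."""
    def children():
        for entry in sorted(os.listdir(path)):
            full = os.path.join(path, entry)
            yield folder_view(full) if os.path.isdir(full) else file_view(full)
    return ResourceView(os.path.basename(path), group_fn=children)
```

Calling `folder_view(some_dir)` touches nothing on disk; the directory is listed only when `group()` is iterated, and each file is opened only when its `content()` is requested. Because `group()` returns an iterator, the same shape could also wrap an unbounded source such as a feed.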
14. How to Implement iDM: Architectural Perspective
- Complex operators (query algebra)
- Indexes and replicas access (warehousing)
- Data source access (mediation)
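
One way to picture how the three layers could fit together is the Python sketch below. It is an illustration under stated assumptions, not the iMeMex implementation: `DictSource`, `KeywordIndex` and `keyword_search` are hypothetical names, sources are plain in-memory dictionaries, and the only operator is a keyword search. The operator answers from the index where a source has been indexed (warehousing) and falls back to scanning the source directly otherwise (mediation).

```python
from collections import defaultdict


class DictSource:
    """Hypothetical data source: maps item ids to text (stands in for a filesystem or mailbox)."""

    def __init__(self, name, items):
        self.name = name
        self.items = items

    def scan(self):
        # Mediation layer: read items directly from the source.
        for item_id, text in self.items.items():
            yield (self.name, item_id), text


class KeywordIndex:
    """Hypothetical warehousing layer: an inverted index over indexed sources."""

    def __init__(self):
        self._postings = defaultdict(set)
        self.indexed_sources = set()

    def add_source(self, source):
        for key, text in source.scan():
            for token in text.lower().split():
                self._postings[token].add(key)
        self.indexed_sources.add(source.name)

    def lookup(self, keyword):
        return set(self._postings.get(keyword.lower(), ()))


def keyword_search(keyword, sources, index):
    """Operator layer: use the index where available, scan the source otherwise."""
    wanted = keyword.lower()
    names = {s.name for s in sources}
    # Indexed hits, restricted to the sources the caller asked about.
    results = {key for key in index.lookup(wanted) if key[0] in names}
    for source in sources:
        if source.name in index.indexed_sources:
            continue  # already answered from the index
        for key, text in source.scan():
            if wanted in text.lower().split():
                results.add(key)
    return results
```

For example, with a `mail` source that has been added to the index and an `fs` source that has not, `keyword_search("dataspaces", [mail, fs], index)` combines index hits from `mail` with a direct scan of `fs`. The caller does not need to know which path served each result.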
15. Further Research Challenges in Dataspace Management Systems
- Pay-as-you-go information integration
  - Model relationships in the dataspace
  - Examples: semantic equivalences, lineage relationships
- Distributed Dataspaces
- Query language specification (iQL)
16. iMeMex Prototype Implementation
- iMeMex Prototype
  - 780 classes
  - 70,900 LOC
  - Java-based: supported on Linux, Mac and Windows
  - OSGi-based: Everything is a Plug-in (52 bundles)
  - Open-source (Apache 2.0): http://www.imemex.org
- Team
  - Advisor
  - Two Ph.D. students
  - Three M.Sc. students
  - Thirteen Semester Project students
17. Conclusions
- Dataspace Management Systems are a new system abstraction
- iMeMex is among the first implementations of this new breed of systems; our focus: Personal Dataspaces
- Dataspace Management Systems call for
  - New data model
  - New system architecture
  - New capabilities for pay-as-you-go information integration
- More information: http://www.imemex.org
18. Questions? Thanks in advance for your feedback!
19. Backup Slides
20. Personal Dataspaces Literature
- Dittrich, Vaz Salles, Kossmann, Blunschi. iMeMex: Escapes from the Personal Information Jungle (Demo Paper). VLDB, September 2005.
- Dittrich, Vaz Salles. iDM: A Unified and Versatile Data Model for Personal Dataspace Management. VLDB, September 2006.
- Dittrich. iMeMex: A Platform for Personal Dataspace Management. SIGIR PIM, August 2006.
- Blunschi, Dittrich, Girard, Karakashian, Vaz Salles. A Dataspace Odyssey: The iMeMex Personal Dataspace Management System (Demo Paper). CIDR, January 2007.
- Dittrich, Blunschi, Färber, Girard, Karakashian, Vaz Salles. From Personal Desktops to Personal Dataspaces: A Report on Building the iMeMex Personal Dataspace Management System. BTW, March 2007.