The Debye Environment for Web Data Management

1
The Debye Environment for Web Data Management
  • Julia Erdman
  • SE521

2
The Web
  • Huge reservoir of data, including digital
    libraries and online stores
  • Data containers
  • Identifiable through visual clues
  • Structural variations and irregularities

3
What is Debye?
  • Extract web data from its original sources
  • Logically represent it in a format that allows
    further manipulation
  • Use of extended tables supports solutions to
    several Web data management problems

4
Debye: what it does
  • Example: query amazon.com for Paul McCartney
  • How do users keep track of updates?
  • Revisit
  • Requery
  • Record data
  • Compare with old data

5
Debye: what it does
  • Debye
  • Extract target data from pages
  • Store data in nested tables

6
Debye: what it does

7
Debye: what it does

8
Debye: what it does
  • Once the tables are built
  • They can be queried
  • Store data in relational database

9
Debye: how it works
  • GUI that allows users to cut and paste values
    from a target page
  • GUI generates an object-extraction pattern (OE
    pattern)
  • OE pattern is fed to a generic extractor
  • The extractor outputs the extracted objects in
    XML (the Debye textual object repository, DTOR)
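
The final step above can be sketched as follows. This is only an illustration: the slides do not show the DTOR format, so the tag names `dtor` and `object`, the helper `objects_to_xml`, and the sample values are all assumptions.

```python
import xml.etree.ElementTree as ET

def objects_to_xml(objects, root_tag="dtor"):
    """Serialize a list of flat dicts into an XML string.

    Tag names are illustrative assumptions, not the real DTOR schema.
    """
    root = ET.Element(root_tag)
    for obj in objects:
        node = ET.SubElement(root, "object")
        for attribute, value in obj.items():
            # One child element per extracted attribute.
            child = ET.SubElement(node, attribute)
            child.text = str(value)
    return ET.tostring(root, encoding="unicode")

# Made-up extraction result for the amazon.com example query.
extracted = [{"Title": "Flaming Pie", "Artist": "Paul McCartney", "AudioType": "CD"}]
xml_text = objects_to_xml(extracted)
```

Parsing `xml_text` back with `ET.fromstring` recovers the same attribute values, which is the property a textual repository needs.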

10
Debye: how it works
  • There is also a user-independent example
    generator
  • Compares objects in the data repository with
    objects on a web page
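
A rough sketch of that comparison, assuming plain substring matching. The slides do not describe how the real generator matches objects, so `find_examples` and its matching rule are hypothetical.

```python
def find_examples(repository_objects, page_text):
    """Return stored objects whose atomic values all occur in the page text.

    Substring matching is an assumption made for illustration only.
    """
    matches = []
    for obj in repository_objects:
        if all(str(value) in page_text for value in obj.values()):
            matches.append(obj)
    return matches

# Made-up repository contents and page text.
repository = [
    {"Title": "Ram", "Artist": "Paul McCartney"},
    {"Title": "Some Other Title", "Artist": "Someone Else"},
]
page = "<b>Ram</b> by <i>Paul McCartney</i>"
examples = find_examples(repository, page)
```

Objects found this way could serve as examples for a new page without the user marking them by hand.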

11
Debye: how it works

12
Debye: how it works
  • Nested tables
  • Use well-known query operations
  • Easy to store the data in relational databases

13
Debye data model
  • ProductList (Store: atom, Info (Title: atom,
    Artist: atom, AudioType: atom) (Title: atom,
    Authors: atom, BookType: atom) (Item: atom,
    Bid: atom, Time: atom))
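
One way to read this scheme is that `Info` holds inner rows of any of three shapes (audio, book, auction). The sketch below encodes that reading in Python; the dict and list layout, the `conforms` helper, and the sample row are assumptions for illustration, not Debye's internal representation.

```python
ATOM = "atom"

# The scheme from the slide: Info rows may follow any one of three variants.
product_list_scheme = {
    "Store": ATOM,
    "Info": [
        {"Title": ATOM, "Artist": ATOM, "AudioType": ATOM},
        {"Title": ATOM, "Authors": ATOM, "BookType": ATOM},
        {"Item": ATOM, "Bid": ATOM, "Time": ATOM},
    ],
}

def conforms(row, scheme):
    """Check a row against a scheme (atoms are strings, nested columns are lists)."""
    if set(row) != set(scheme):
        return False
    for column, column_type in scheme.items():
        value = row[column]
        if column_type == ATOM:
            if not isinstance(value, str):
                return False
        else:
            # Nested column: every inner row must match at least one variant.
            if not isinstance(value, list):
                return False
            for inner in value:
                if not any(conforms(inner, variant) for variant in column_type):
                    return False
    return True

# Made-up row mixing two of the variants under one store.
row = {
    "Store": "amazon.com",
    "Info": [
        {"Title": "Flaming Pie", "Artist": "Paul McCartney", "AudioType": "CD"},
        {"Item": "Signed poster", "Bid": "10.00", "Time": "2 days"},
    ],
}
is_valid = conforms(row, product_list_scheme)
```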

14
ProductList (Store: atom, Info (Title: atom,
Artist: atom, AudioType: atom) (Title: atom,
Authors: atom, BookType: atom) (Item: atom,
Bid: atom, Time: atom))
15
Debye data extraction
  • GUI
  • Helps users specify objects
  • User marks pieces of data from the source and
    copies them to columns of a table
  • User can insert, remove, rename, group, and split
    columns using the GUI

16
Debye data extraction
17
Debye OE Pattern Generation
  • Patterns generated based on nested tables
    assembled by the user
  • Implicitly informs Debye of the objects'
    syntactic context and structure through examples
  • OE pattern generation
  • Target objects' structure, in the form of a table
    scheme
  • Textual surroundings: markups, symbols, keywords
  • Precisely, an OE pattern is a pair of these two
    components
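
As a rough illustration of such a pair, the sketch below derives a table scheme and the textual surroundings of each value from one user example. The class `OEPattern`, the fixed-width context window, and the sample page are all assumptions; the slides do not specify how Debye represents or learns the surroundings.

```python
from dataclasses import dataclass

@dataclass
class OEPattern:
    table_scheme: dict  # attribute name -> type, here always "atom"
    contexts: dict      # attribute name -> (text before, text after)

def context_of(page, value, width=4):
    """Return up to `width` characters on each side of the first occurrence."""
    start = page.index(value)
    end = start + len(value)
    return page[max(0, start - width):start], page[end:end + width]

def generate_pattern(page, example):
    """Build a pattern from one example row marked by the user."""
    scheme = {attribute: "atom" for attribute in example}
    contexts = {attribute: context_of(page, value) for attribute, value in example.items()}
    return OEPattern(scheme, contexts)

# Made-up page fragment and user example.
page = "<b>Ram</b> by <i>Paul McCartney</i>"
pattern = generate_pattern(page, {"Title": "Ram", "Artist": "Paul McCartney"})
```

Here the title ends up delimited by `<b>` and `</b>`, which is the kind of markup clue the slide calls textual surroundings.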

18
Debye extraction strategy
  • The extractor reads and parses the rules from the
    generated OE pattern
  • Given an OE pattern and a set of pages as input,
    the extractor extracts data in a bottom-up
    procedure
  • Atomic components extracted first
  • Then a complete object is assembled
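
The bottom-up order can be sketched in two steps: find atoms first, then group them into objects. The regular expressions and the grouping rule (a repeated attribute starts a new object) are simplifying assumptions, not the extractor's actual algorithm.

```python
import re

def extract_atoms(page, atom_patterns):
    """Step 1: find every atomic value, tagged with its position and attribute."""
    atoms = []
    for attribute, pattern in atom_patterns.items():
        for match in re.finditer(pattern, page):
            atoms.append((match.start(), attribute, match.group(1)))
    return sorted(atoms)  # page order

def assemble_objects(atoms):
    """Step 2: group atoms in page order; a repeated attribute starts a new object."""
    objects, current = [], {}
    for _, attribute, value in atoms:
        if attribute in current:
            objects.append(current)
            current = {}
        current[attribute] = value
    if current:
        objects.append(current)
    return objects

# Made-up page fragment and hypothetical per-attribute patterns.
page = "<b>Flaming Pie</b> by <i>Paul McCartney</i> <b>Ram</b> by <i>Paul McCartney</i>"
patterns = {"Title": r"<b>(.*?)</b>", "Artist": r"<i>(.*?)</i>"}
objects = assemble_objects(extract_atoms(page, patterns))
```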

20
Debye query interface
  • Selection
  • Projection
  • Nest
  • Unnest
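
The four operations can be sketched over nested tables represented as lists of dicts. The representation and the function signatures are assumptions for illustration; the slides do not show the query interface itself.

```python
def select(rows, predicate):
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]

def project(rows, columns):
    """Projection: keep only the named columns."""
    return [{c: row[c] for c in columns} for row in rows]

def nest(rows, group_by, nested_name):
    """Nest: group flat rows by one column; the other columns become inner rows."""
    groups = {}
    for row in rows:
        inner = {c: v for c, v in row.items() if c != group_by}
        groups.setdefault(row[group_by], []).append(inner)
    return [{group_by: key, nested_name: inner_rows} for key, inner_rows in groups.items()]

def unnest(rows, nested_name):
    """Unnest: flatten one nested column back into one row per inner row."""
    flat = []
    for row in rows:
        outer = {c: v for c, v in row.items() if c != nested_name}
        for inner in row[nested_name]:
            flat.append({**outer, **inner})
    return flat

# Made-up flat table.
flat = [
    {"Store": "A", "Title": "x"},
    {"Store": "A", "Title": "y"},
    {"Store": "B", "Title": "z"},
]
nested = nest(flat, "Store", "Info")
```

In this sketch `unnest(nested, "Info")` returns the original flat rows, so nest and unnest act as inverses on this data.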

22
Debye data storage manager
  • Store Web data in relational databases
  • Mappings
  • Map-Table
  • Creates a relation for every distinct table
    scheme it finds in the target data repository
  • Map-Column
  • Creates relations for columns with atom lists
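
A minimal sketch of the two mappings using SQLite. The table names, column names, and sample rows are hypothetical; the slides give only the one-line descriptions above, so this shows the general idea rather than Debye's actual mapping rules.

```python
import sqlite3

def store_books(conn, books):
    # Map-Table analogue: one relation for the (hypothetical) book scheme.
    conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT)")
    # Map-Column analogue: a separate relation for the atom-list column.
    conn.execute(
        "CREATE TABLE book_authors (book_id INTEGER REFERENCES book(id), author TEXT)"
    )
    for book in books:
        cursor = conn.execute("INSERT INTO book (title) VALUES (?)", (book["Title"],))
        book_id = cursor.lastrowid
        for author in book["Authors"]:
            conn.execute(
                "INSERT INTO book_authors (book_id, author) VALUES (?, ?)",
                (book_id, author),
            )
    conn.commit()

# Placeholder data, not taken from any real source.
conn = sqlite3.connect(":memory:")
store_books(conn, [{"Title": "Example Book", "Authors": ["Author A", "Author B"]}])
joined = conn.execute(
    "SELECT book.title, book_authors.author FROM book "
    "JOIN book_authors ON book.id = book_authors.book_id "
    "ORDER BY book_authors.author"
).fetchall()
```

Joining the two relations on the key rebuilds the original nested row, which is what makes the flat storage lossless in this sketch.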

24
Questions?? Comments??
25
  • Laender, A.H.F., et al. "The Debye environment
    for Web data management." IEEE Internet Computing,
    vol. 6, no. 4, July-Aug. 2002, pp. 60-69.