DBxplorer: A System for Keyword-Based Search over Relational Databases - PowerPoint PPT Presentation

About This Presentation
Title:

DBxplorer: A System for Keyword-Based Search over Relational Databases

Description:

Physical schema needs to be explored ... COROLLA. CAMRY. ACCORD. CRV. 2. 4. 2. 2. 2. 2. 5. Overview of DBxplorer. Supports Conjunctive Queries ... – PowerPoint PPT presentation

Number of Views:66
Avg rating:3.0/5.0
Slides: 26
Provided by: bpc7
Learn more at: https://ranger.uta.edu
Category:

less

Transcript and Presenter's Notes

Title: DBxplorer: A System for Keyword-Based Search over Relational Databases


1
DBxplorer A System for Keyword-Based Search
over Relational Databases Paper By Sanjay
Agrawal, Surajit Chaudhuri, Gautam Das
Presented By Bhushan Chaudhari University of
Texas at Arlington
2
Contents
  • Typical Key-word Search
  • Overview of DBxplorer
  • Processing Component for producing Symbol Table
  • Storing Symbol table in databases
  • Compaction Algorithms
  • Search Component
  • Finding Matches using graphs
  • Generalized Matches
  • Experiments and Statistics

3
Key-word Search
  • General Scenario
  • Finding specified keywords in same or different
    tables or completely different schemas
  • Physical schema needs to be explored
  • Existing indexes need to be leveraged by using
    proper data structures
  • Data structure gt Structure to be followed by
    symbol tables

4
Levels of Granularity
  • Column level granularity
  • Row level granularity

2
4
TOYOTA
COROLLA
2
TOYOTA
CAMRY
2
HONDA
ACCORD
2
HONDA
CRV
2
5
Overview of DBxplorer
  • Supports Conjunctive Queries
  • Implementation using MSSQL Server2000 and IIS Web
    Server
  • ODBC Interface for database Connection
  • Uses the functionality of relational engines very
    well

6
Overview of DBxplorer (Contd ..)
  • Publish
  • Determine the tables to be published
  • all_tab, all_tab_user
  • Table relations in form of graphs
  • Columns to be published
  • all_tab_columns, user_tab_columns
  • select table_name from user_tab_columns
  • where column_name desired_name
  • Building Symbol Tables

7
Overview of DBxplorer (Contd ..)
  • Search
  • Symbol table is looked to identify the tables,
    columns and rows containing keywords
  • Join trees (set of tables which are related)
  • Query is constructed for each join tree and
    tuples containing all keywords are found

8
Design Alternatives for Symbol Table
  • Location Granularity
  • Column level and cell level
  • Why there is no row level granularity?
  • Hard to implement
  • SQL queries work w.r.t. columns

9
Factors affected due to granularity
  • Space and time requirements
  • Searching time
  • Time required to build
  • Using in-built operators like distinct
    accumulating all values inside column becomes
    easy
  • Most of the typical database systems use Hash
    Value indexes which are good for equality searches

10
Factors affected due to granularity (Contd..)
  • Search Performance
  • Typically depends upon presence of indices
  • Ease of maintenance
  • One time creation
  • Pub-Col Pub-Cell
  • Insert Y/N Y
  • Update Y/N Y
  • Delete N Y
  • Which type of symbol table should we use?

11
Storing Symbol Tables
  • Pub-Col Representation
  • Key-word -gt Column Id
  • Hash Value -gt Column Id
  • Types of Compression Algorithms Used
  • Foreign Key Compression (FK-Comp)
  • General Compression Technique (CP-Comp)

12
Building Compression table
13
Compression Algorithm
14
Storing Symbol Tables (Contd..)
  • Pub-Cell Representation
  • Hash Value -gt Cell ID
  • Hash Value -gt Cell ID List
  • Retrieval of all locations for a key-word is
    achieved by looking up a single row from pub-cell
    symbol table
  • No Compression
  • Pre-computation is complex
  • Inverted lists can be implemented using this
    table

15
Finding Matches for Keyword Search
  • Each join tree is mapped to a SQL Query and
    selects those rows that contain all keywords.
  • Ranking is based upon no. of joins (Quite
    similar to ranking upon proximity of words in
    documents)

16
Search Algorithm
17
Supporting Generalized Matches
  • Where T.C like STRING
  • Microsoft SQL Server (Most Recent version)
  • Where CONTAINS(C,String)
  • Traditional databases use B Trees indices
  • Pub-Prefix Representation
  • Enables token searches having form WHERE T.C LIKE
    PK
  • Symbol table entry (Hash(k), T.C, P)
  • Efficiency depends on length of prefix
  • Length of prefix also affects symbol table size
    and build time.
  • Stemming

18
Experiments Symbol Table Granularity
  • Symbol Table Size

19
Experiments Symbol Table Granularity
  • Publishing Time

20
Experiments Symbol Table Granularity
  • Search Performance

21
Experiments Scalability of Pub-Col
  • Data Size and Distribution

22
Experiments Scalability of Pub-Col
  • Number of Keywords in Search

23
Experiments Scalability of Pub-Col
  • Effectiveness of Compression Techniques

24
Experiments Scalability of Pub-Col
  • Effectiveness of Pub-Prefix Method

25
References
  • DBxplorer A System for Keyword Based Search over
    Relational Databases ICDE 2002
  • By Sanjay Agrawal, Surajit Chaudhuri, Gautam
    Das
  • DBXplorer Enabling Keyword Search over
    Relational Databases. (Demo), SIGMOD Conference
    2002 627
  • By Sanjay Agrawal, Surajit Chaudhuri, Gautam
    Das
Write a Comment
User Comments (0)
About PowerShow.com