Title: DBxplorer: A System for Keyword-Based Search over Relational Databases
1 DBxplorer A System for Keyword-Based Search
over Relational Databases Paper By Sanjay
Agrawal, Surajit Chaudhuri, Gautam Das
Presented By Bhushan Chaudhari University of
Texas at Arlington
2Contents
- Typical Key-word Search
- Overview of DBxplorer
- Processing Component for producing Symbol Table
- Storing Symbol table in databases
- Compaction Algorithms
- Search Component
- Finding Matches using graphs
- Generalized Matches
- Experiments and Statistics
3Key-word Search
- General Scenario
- Finding specified keywords in same or different
tables or completely different schemas - Physical schema needs to be explored
- Existing indexes need to be leveraged by using
proper data structures - Data structure gt Structure to be followed by
symbol tables
4Levels of Granularity
- Column level granularity
- Row level granularity
-
2
4
TOYOTA
COROLLA
2
TOYOTA
CAMRY
2
HONDA
ACCORD
2
HONDA
CRV
2
5Overview of DBxplorer
- Supports Conjunctive Queries
- Implementation using MSSQL Server2000 and IIS Web
Server - ODBC Interface for database Connection
- Uses the functionality of relational engines very
well
6Overview of DBxplorer (Contd ..)
- Publish
- Determine the tables to be published
- all_tab, all_tab_user
- Table relations in form of graphs
- Columns to be published
- all_tab_columns, user_tab_columns
- select table_name from user_tab_columns
- where column_name desired_name
- Building Symbol Tables
7Overview of DBxplorer (Contd ..)
- Search
- Symbol table is looked to identify the tables,
columns and rows containing keywords - Join trees (set of tables which are related)
- Query is constructed for each join tree and
tuples containing all keywords are found
8Design Alternatives for Symbol Table
- Location Granularity
- Column level and cell level
- Why there is no row level granularity?
- Hard to implement
- SQL queries work w.r.t. columns
9Factors affected due to granularity
- Space and time requirements
- Searching time
- Time required to build
- Using in-built operators like distinct
accumulating all values inside column becomes
easy - Most of the typical database systems use Hash
Value indexes which are good for equality searches
10Factors affected due to granularity (Contd..)
- Search Performance
- Typically depends upon presence of indices
- Ease of maintenance
- One time creation
- Pub-Col Pub-Cell
- Insert Y/N Y
- Update Y/N Y
- Delete N Y
- Which type of symbol table should we use?
11Storing Symbol Tables
- Pub-Col Representation
- Key-word -gt Column Id
- Hash Value -gt Column Id
- Types of Compression Algorithms Used
- Foreign Key Compression (FK-Comp)
- General Compression Technique (CP-Comp)
12Building Compression table
13Compression Algorithm
14Storing Symbol Tables (Contd..)
- Pub-Cell Representation
- Hash Value -gt Cell ID
- Hash Value -gt Cell ID List
- Retrieval of all locations for a key-word is
achieved by looking up a single row from pub-cell
symbol table - No Compression
- Pre-computation is complex
- Inverted lists can be implemented using this
table
15Finding Matches for Keyword Search
- Each join tree is mapped to a SQL Query and
selects those rows that contain all keywords. - Ranking is based upon no. of joins (Quite
similar to ranking upon proximity of words in
documents)
16Search Algorithm
17Supporting Generalized Matches
- Where T.C like STRING
- Microsoft SQL Server (Most Recent version)
- Where CONTAINS(C,String)
- Traditional databases use B Trees indices
- Pub-Prefix Representation
- Enables token searches having form WHERE T.C LIKE
PK - Symbol table entry (Hash(k), T.C, P)
- Efficiency depends on length of prefix
- Length of prefix also affects symbol table size
and build time. - Stemming
18Experiments Symbol Table Granularity
19Experiments Symbol Table Granularity
20Experiments Symbol Table Granularity
21Experiments Scalability of Pub-Col
- Data Size and Distribution
22Experiments Scalability of Pub-Col
- Number of Keywords in Search
23Experiments Scalability of Pub-Col
- Effectiveness of Compression Techniques
24Experiments Scalability of Pub-Col
- Effectiveness of Pub-Prefix Method
25References
- DBxplorer A System for Keyword Based Search over
Relational Databases ICDE 2002 - By Sanjay Agrawal, Surajit Chaudhuri, Gautam
Das - DBXplorer Enabling Keyword Search over
Relational Databases. (Demo), SIGMOD Conference
2002 627 - By Sanjay Agrawal, Surajit Chaudhuri, Gautam
Das