Distributed Information Retrieval - PowerPoint PPT Presentation

1
Distributed Information Retrieval
  • Server Ranking for Distributed Text Retrieval
    Systems on the Internet
  • B. Yuwono and D. Lee
  • Siemens TREC-4 Report: Further Experiments with
    Database Merging, E. Voorhees

Brian Shaw CS 5604
2
Issue: Merging for Effective Results
  • multiple brokers (take search queries), multiple
    collection servers
  • broker must select appropriate collection servers
    and merge results

3
Server Ranking: Overview
  • Problem: the cost (including users' time) and
    processing power of broadcasting to all servers
  • Solution: the broker ranks collection servers by
    a goodness score, broadcasts the query to at most
    s (sigma) collection servers (a preset number or
    a scoring threshold), and merges the results
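
The selection step above can be sketched as follows. This is a minimal illustration, assuming goodness scores have already been computed; the function name `select_servers` and its parameters `max_servers` and `threshold` are hypothetical and do not come from the paper.

```python
def select_servers(goodness, max_servers=None, threshold=None):
    """Rank servers by goodness and keep at most `max_servers` of them,
    optionally dropping any whose score falls below `threshold`.

    goodness: dict mapping server id -> goodness score (higher is better).
    Both cut-offs are illustrative stand-ins for the slide's
    "preset number or scoring threshold".
    """
    ranked = sorted(goodness.items(), key=lambda kv: kv[1], reverse=True)
    if threshold is not None:
        ranked = [(sid, g) for sid, g in ranked if g >= threshold]
    if max_servers is not None:
        ranked = ranked[:max_servers]
    return [sid for sid, _ in ranked]


if __name__ == "__main__":
    scores = {"a": 0.2, "b": 0.9, "c": 0.5}
    print(select_servers(scores, max_servers=2))
    print(select_servers(scores, threshold=0.6))
```

The broker would then send the query only to the returned servers instead of broadcasting it to all of them.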

1- Server Ranking for Distributed Text Retrieval
on the Internet
4
Server Ranking: Server Selection
  • Relies solely on Document Frequency (DF) data;
    all collection servers must report changes to
    the broker
  • Cue Validity Variance (CVV): the goodness score is
    based on an estimate of how well term j
    distinguishes one collection server from another;
    it is not an indication of the quantity or
    quality of relevance
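
The slides do not give the CVV formula, so the sketch below rests on an assumed formulation: the cue validity of term j for collection i compares the term's document density in i with its density in all other collections, CVV is the variance of that value across collections, and a collection's goodness is the CVV-weighted sum of its document frequencies for the query terms. The function and argument names are hypothetical.

```python
from statistics import pvariance


def cvv_goodness(query_terms, df, sizes):
    """Score each collection for a query from document-frequency data only.

    df[i][t]  : number of documents in collection i containing term t
    sizes[i]  : total number of documents in collection i
    Returns a dict collection -> goodness (assumed CVV-weighted DF sum).
    """
    collections = list(sizes)
    total_docs = sum(sizes.values())
    cvv = {}
    for t in query_terms:
        total_df = sum(df[i].get(t, 0) for i in collections)
        cue_validities = []
        for i in collections:
            own = df[i].get(t, 0) / sizes[i]          # density inside i
            rest_docs = total_docs - sizes[i]
            rest = (total_df - df[i].get(t, 0)) / rest_docs if rest_docs else 0.0
            cue_validities.append(own / (own + rest) if own + rest > 0 else 0.0)
        # A term spread evenly over all collections has zero variance
        # and therefore does not help tell the servers apart.
        cvv[t] = pvariance(cue_validities)
    return {i: sum(cvv[t] * df[i].get(t, 0) for t in query_terms)
            for i in collections}


if __name__ == "__main__":
    df = {"A": {"ir": 50, "the": 100}, "B": {"ir": 1, "the": 100}}
    sizes = {"A": 100, "B": 100}
    print(cvv_goodness(["ir", "the"], df, sizes))
```

In this toy data the term "the" occurs equally often in both collections, so only "ir" contributes to the scores, which matches the point that CVV measures how well a term discriminates between servers.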

5
Server Ranking: Merging
  • Assumption 1: the best document in collection i
    is as relevant as the best document in
    collection k
  • A collection server containing a few but highly
    relevant documents will contribute to the final
    list.
  • Assumption 2: the distance between two
    consecutive document ranks is inversely
    proportional to the goodness score
  • Relative goodness scores are roughly proportional
    to the number of documents contributed to the
    final list.
  • Final ranking is a combination of goodness score
    and local rankings.
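
One way to turn the two assumptions into a merge procedure is sketched below. It assumes every collection's top document gets a global score of 1.0 (Assumption 1) and each later local rank loses a step of `scale / goodness` (Assumption 2). The `scale` constant, the tie-breaking rule, and the function name are illustrative choices, not details taken from the paper.

```python
def merge_by_goodness(local_rankings, goodness, scale=0.1):
    """Merge per-collection ranked lists into one global list.

    local_rankings[i] : document ids in the order collection i returned them
    goodness[i]       : goodness score of collection i (must be > 0)
    scale             : hypothetical constant setting the rank-to-score step
    """
    scored = []
    for coll, docs in local_rankings.items():
        step = scale / goodness[coll]      # larger goodness -> smaller step
        for rank, doc in enumerate(docs):  # rank 0 is the local best
            score = 1.0 - rank * step
            scored.append((-score, -goodness[coll], doc))
    # Highest score first; ties go to the collection with higher goodness.
    scored.sort(key=lambda item: (item[0], item[1]))
    return [doc for _, _, doc in scored]


if __name__ == "__main__":
    rankings = {"A": ["a1", "a2", "a3"], "B": ["b1", "b2"]}
    print(merge_by_goodness(rankings, {"A": 3.0, "B": 1.0}, scale=0.3))
```

Because the step shrinks as goodness grows, a high-goodness collection places more of its documents near the top of the final list, while a low-goodness collection still contributes its best document.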

6
Experiments: Overview
  • Problem: the broker has no access to meta-data
    from isolated collection servers
  • Solution: choose collection server(s) based on
    results from previous training queries

2- Further Experiments with Database Merging
7
Experiments: Server Selection, Two Approaches
  • Query Clustering (QC): cluster training queries
    (based on the number of same documents retrieved)
    and calculate a centroid vector for each cluster;
    compare the query vector to the centroid vectors
    and assign a weight to each collection
  • Modeling Relevant Document Distributions (MRDD):
    find the M most similar training queries and
    assign weights to collections based on the
    training runs' relevant document distributions
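
A simplified sketch of the MRDD idea follows. It assumes queries are sparse term-weight vectors compared by cosine similarity, and that a collection's weight is its share of the relevant documents retrieved for the M nearest training queries. The original method models the distribution of relevant documents in more detail; the names and the data layout here are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mrdd_weights(query_vec, training, m):
    """Weight collections using the m most similar training queries.

    training: list of (query_vector, {collection: relevant_docs_retrieved})
    Returns {collection: weight}, with weights summing to 1 when any
    relevant documents were seen, else an empty dict.
    """
    nearest = sorted(training,
                     key=lambda item: cosine(query_vec, item[0]),
                     reverse=True)[:m]
    totals = {}
    for _, relevant_counts in nearest:
        for coll, count in relevant_counts.items():
            totals[coll] = totals.get(coll, 0) + count
    grand_total = sum(totals.values())
    return {c: n / grand_total for c, n in totals.items()} if grand_total else {}


if __name__ == "__main__":
    training = [
        ({"ir": 1.0}, {"A": 8, "B": 2}),
        ({"cooking": 1.0}, {"A": 0, "B": 10}),
    ]
    print(mrdd_weights({"ir": 1.0, "search": 1.0}, training, m=1))
```

Query Clustering would use the same similarity comparison, but against cluster centroid vectors rather than individual training queries.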

8
Experiments: Merging
  • N documents are retrieved from each server, as
    determined by the weights
  • Final ranking is a random process: roll a C-faced
    die that is biased by the number of documents
    still to be picked from each of the C collections
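
The biased-die merge can be sketched as below. The sketch assumes each collection has already returned its N documents in local rank order; the function name `die_merge` and the optional `rng` argument (used only to make runs reproducible) are illustrative.

```python
import random
from collections import deque


def die_merge(ranked_lists, rng=None):
    """Interleave per-collection result lists with a biased die.

    ranked_lists: {collection: [doc ids in local rank order]}
    At each step a collection is drawn with probability proportional to
    the number of its documents not yet placed, and its next-best
    document is appended to the final ranking.
    """
    rng = rng or random.Random()
    queues = {c: deque(docs) for c, docs in ranked_lists.items() if docs}
    merged = []
    while queues:
        names = list(queues)
        weights = [len(queues[c]) for c in names]   # documents still to pick
        pick = rng.choices(names, weights=weights, k=1)[0]
        merged.append(queues[pick].popleft())
        if not queues[pick]:
            del queues[pick]                         # this face is used up
    return merged


if __name__ == "__main__":
    lists = {"A": ["a1", "a2", "a3"], "B": ["b1"]}
    print(die_merge(lists, random.Random(0)))
```

The order within each collection is always preserved; only the interleaving between collections is random, so collections with more documents left tend to appear earlier.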

9
Comparison
                             1-Server Ranking                2-Experiments
Broker's Knowledge           Shared Document Frequency Data  Training Query Results
Collection Server Selection  CVV Goodness Scoring            Comparison to Training Queries
Merging                      Goodness Score and Local Rank   Random
10
Conclusions
  • The server ranking method proposed by Yuwono and
    Lee is an effective way to minimize operating
    costs (such as time) in an environment where
    brokers and collection servers can share document
    frequency data.
  • The isolated merging strategies proposed by
    Voorhees are an effective way to choose a
    collection server where no meta-information is
    shared between the broker and collection server.