Parallel Text Searching on a Beowulf Cluster using SRW

1
Parallel Text Searching on a Beowulf Cluster
using SRW
  • Ralph LeVan
  • OCLC Research

2
Goal
  • Demonstrate 100 searches/second on our 50 million
    record WorldCat database residing on a small
    Beowulf Cluster

3
Beowulf Cluster
  • 24 nodes
  • 2 2.8 GHz Xeon CPUs
  • 4 GB of memory
  • 80 GB of disk on 23 application nodes
  • 130 GB of disk on root node

4
Database
  • 50 million records
  • 69 partitions (about 700,000 records each)
  • 3 partitions per application node
  • Partitioned by popularity
  • Searched using OCLC Research's Open Source Gwen
    and Pears toolkits
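The partition figures above can be checked with a few lines of arithmetic. This is only a sketch: the three constants come from the slide, and the derived values assume the records are spread evenly across partitions, which the slide does not state (partitioning is by popularity).

```python
# Figures taken from the slide; everything below them is derived arithmetic.
TOTAL_RECORDS = 50_000_000
PARTITIONS = 69
PARTITIONS_PER_NODE = 3

# Assumption: partitions are of equal size.
application_nodes = PARTITIONS // PARTITIONS_PER_NODE
records_per_partition = TOTAL_RECORDS / PARTITIONS
records_per_node = records_per_partition * PARTITIONS_PER_NODE

print(f"application nodes:     {application_nodes}")
print(f"records per partition: {records_per_partition:,.0f}")
print(f"records per node:      {records_per_node:,.0f}")
```

The result lines up with the cluster description: 69 partitions at 3 per node fill exactly the 23 application nodes.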

5
Architecture
  • 1 Tomcat on each application node
  • 3 SRW/U databases configured for each Tomcat
  • 1 client application on the root node
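To make the fan-out concrete, here is a minimal sketch of the 69 search targets the root-node client has to address in this layout. The host names (node01 to node23), the port, the /SRW/search/ path and the database names are hypothetical placeholders, not taken from the slides. The query parameters are the standard SRU 1.1 searchRetrieve parameters.

```python
from urllib.parse import urlencode

APPLICATION_NODES = 23      # from the slides
DATABASES_PER_TOMCAT = 3    # from the slides


def sru_search_url(host, database, cql_query):
    """Build an SRU searchRetrieve URL that asks only for the hit count."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "maximumRecords": 0,  # hit count only, no records in the response
    }
    # Hypothetical port and path; adjust to the real Tomcat deployment.
    return f"http://{host}:8080/SRW/search/{database}?{urlencode(params)}"


# Hypothetical naming scheme: node01..node23, each with part1..part3.
targets = [
    (f"node{n:02d}", f"part{p}")
    for n in range(1, APPLICATION_NODES + 1)
    for p in range(1, DATABASES_PER_TOMCAT + 1)
]
urls = [sru_search_url(host, db, 'dc.title = "beowulf"') for host, db in targets]

print(len(urls))
print(urls[0])
```

Every search issued by the client turns into 69 requests and 69 responses, all handled on the root node.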

6
Trial 1
  • SRW client searching 69 databases
  • Result
  • 2 searches/second (437ms/search)
  • Ganglia Cluster Report shows the root node
    glowing red and the application nodes a peaceful
    blue

7
Trial 2
  • SRU client with scanned response searching 69
    databases
  • Result
  • 25 searches/second (40ms/search)
  • Ganglia Cluster Report still shows the root node
    glowing red and the application nodes a peaceful
    blue
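The slides do not define "scanned response". One plausible reading, assumed here, is that the client pulls the hit count out of the raw response text instead of building a full XML tree. The sketch below contrasts the two approaches on a made-up, trimmed response. The function names are illustrative only.

```python
import re
import xml.etree.ElementTree as ET

SRW_NS = "http://www.loc.gov/zing/srw/"

# Made-up sample response, trimmed to the fields used here.
SAMPLE_RESPONSE = (
    f'<srw:searchRetrieveResponse xmlns:srw="{SRW_NS}">'
    "<srw:version>1.1</srw:version>"
    "<srw:numberOfRecords>1234</srw:numberOfRecords>"
    "</srw:searchRetrieveResponse>"
)

# Matches the element with or without a namespace prefix.
_COUNT = re.compile(r"<(?:\w+:)?numberOfRecords>(\d+)</")


def hits_by_scanning(response_text):
    """Scan the raw text for the hit count; no XML tree is built."""
    match = _COUNT.search(response_text)
    return int(match.group(1)) if match else 0


def hits_by_parsing(response_text):
    """Parse the whole response into a tree, then look up the hit count."""
    root = ET.fromstring(response_text)
    node = root.find(f"{{{SRW_NS}}}numberOfRecords")
    return int(node.text) if node is not None else 0


print(hits_by_scanning(SAMPLE_RESPONSE), hits_by_parsing(SAMPLE_RESPONSE))
```

Scanning does less work per response, which matters when a single client has to digest 69 responses per search.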

8
Trial 3
  • SRW client with hand built XML and scanned
    response searching 69 databases
  • Result
  • 21 searches/second (46ms/search)
  • Ganglia Cluster Report still shows the root node
    glowing red and the application nodes a peaceful
    blue
  • SRW dropped
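A hand-built SRW request, as opposed to one produced by a SOAP toolkit, could look roughly like the sketch below. The envelope follows the SOAP 1.1 and SRW 1.1 namespaces. The exact request the presenter sent is not shown in the slides, so the element selection and the helper name build_request are assumptions.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

# The request is assembled as a plain string; only the query text varies.
SOAP_TEMPLATE = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<SOAP-ENV:Envelope'
    ' xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">'
    "<SOAP-ENV:Body>"
    '<srw:searchRetrieveRequest xmlns:srw="http://www.loc.gov/zing/srw/">'
    "<srw:version>1.1</srw:version>"
    "<srw:query>{query}</srw:query>"
    "<srw:maximumRecords>0</srw:maximumRecords>"
    "</srw:searchRetrieveRequest>"
    "</SOAP-ENV:Body>"
    "</SOAP-ENV:Envelope>"
)


def build_request(cql_query):
    """Fill the CQL query into the template, escaping XML special characters."""
    return SOAP_TEMPLATE.format(query=escape(cql_query))


body = build_request('dc.title = "war & peace"')
# Parsing the result confirms that the hand-built string is well-formed XML.
ET.fromstring(body.encode("utf-8"))
print(body)
```

Building the request this way skips the toolkit's object serialization on the client, which is the side the Ganglia reports flagged as overloaded.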

9
Rearchitecture
  • Problem: Ganglia reports indicate that the client
    is the bottleneck
  • Solution: Put a 3-way federator on each Tomcat (a
    virtual database for the client) and have the
    client search 23 databases instead of 69
10
Result
  • SRU client: 71 searches/second (14 ms/search)
  • Hand-built SRW client: 33 searches/second (30 ms/search)
  • Original SRW client: 6 searches/second (164 ms/search)
  • Ganglia cluster report still shows root node red,
    but application nodes are now green and yellow
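The throughput and latency figures reported so far are consistent with a client that issues one search at a time, so that searches/second is simply 1000 divided by the milliseconds per search. The sketch below recomputes the rates from the reported latencies. It assumes that the unlabeled "164" for the original SRW client is in milliseconds, and the labels are shorthand added here.

```python
# (label, reported searches/second, reported ms/search), as given on the slides.
# Assumption: the unlabeled 164 for the original SRW client is milliseconds.
RESULTS = [
    ("Trial 1, SRW client, 69 databases", 2, 437),
    ("Trial 2, SRU client, 69 databases", 25, 40),
    ("Trial 3, hand-built SRW, 69 databases", 21, 46),
    ("SRU client, 23 databases", 71, 14),
    ("Hand-built SRW client, 23 databases", 33, 30),
    ("Original SRW client, 23 databases", 6, 164),
]


def rate_from_latency(ms_per_search):
    """Searches/second for a client that runs one search at a time."""
    return 1000.0 / ms_per_search


for label, reported_rate, ms in RESULTS:
    derived = rate_from_latency(ms)
    print(f"{label:40s} {derived:6.1f}/s derived, {reported_rate}/s reported")
```

Each derived rate truncates to the reported whole number, which suggests the slides rounded down and that the client in these trials was not overlapping searches.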

11
Rearchitecture
  • Create a virtual 23-way database that federates
    searches across the 23 virtual 3-way databases
  • Put one of these on each Tomcat
  • Create a new client that sends searches on
    threads to each available 23-way database

12
Result
  • With 23 threads, 172 searches/second
  • Average response time of 170ms
  • The Ganglia report showed all nodes running red