Datacentric computing with Netezza Architecture

Provided by: www2Pitts
1
Data-centric computing with Netezza Architecture
  • DISC reading group, September 24, 2007

2
High Level Points
  • Supercomputer use model today: compile, submit, wait
  • Does a poor job of taking advantage of human
    insight available in interactive models
  • Large datasets can be interactively processed
    using Netezza

3
What is Netezza?
  • Essentially a big, fast SQL database

4
What is Netezza?
  • Frontend provides SQL interface
  • Backend is a large rack of specialized blades

5
Custom Backend Blades
  • Commodity CPU, NIC, disk
  • Custom FPGA replaces disk interface
  • Can do basic filtering in hardware, i.e., stream
    processing before data hits main memory

6
Division of Data
  • Database distributed across multiple (100) SPUs
  • Each SPU controls, manages its slice of DB
  • No info on data management, replication, etc.

7
Division of Labor
  • SPU FPGA handles basic filtering tasks
  • SPU CPU handles record-level processing:
    filtering, parsing, projecting, logging, etc.
  • SPU CPU handles most operations on intermediate
    results: sorts, joins, aggregates
  • Frontend CPU handles remaining operations
  • >>> Processing close to disk
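
The division of data and labor on slides 6 and 7 can be sketched as a three-stage pipeline. This is only a toy model in plain Python, not Netezza code: the names `distribute`, `fpga_filter`, `spu_cpu_aggregate` and `frontend_merge` are hypothetical, the SPU count is shrunk to 4, and the modulo distribution is an assumption, since the slides note that the paper gives no details on data placement.

```python
from collections import Counter

NUM_SPUS = 4  # hypothetical; the slides mention on the order of 100 SPUs


def distribute(rows, num_spus=NUM_SPUS):
    """Assign each row to one SPU slice (modulo on the id is an assumed scheme)."""
    slices = [[] for _ in range(num_spus)]
    for row in rows:
        slices[row["id"] % num_spus].append(row)
    return slices


def fpga_filter(slice_rows, predicate):
    """Stage 1: stream filter, applied before rows reach main memory."""
    return (row for row in slice_rows if predicate(row))


def spu_cpu_aggregate(filtered_rows):
    """Stage 2: per-SPU projection and partial aggregation."""
    counts = Counter()
    for row in filtered_rows:
        counts[row["venue"]] += 1
    return counts


def frontend_merge(partials):
    """Stage 3: the frontend combines the partial results from every SPU."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)


def run_query(rows, predicate):
    partials = [
        spu_cpu_aggregate(fpga_filter(spu_slice, predicate))
        for spu_slice in distribute(rows)
    ]
    return frontend_merge(partials)


rows = [
    {"id": 1, "year": 2005, "venue": "A"},
    {"id": 2, "year": 2006, "venue": "A"},
    {"id": 3, "year": 2007, "venue": "B"},
    {"id": 4, "year": 2006, "venue": "B"},
    {"id": 5, "year": 2007, "venue": "A"},
    {"id": 6, "year": 2004, "venue": "B"},
]
# Roughly: SELECT venue, COUNT(*) FROM rows WHERE year >= 2006 GROUP BY venue
result = run_query(rows, lambda r: r["year"] >= 2006)
```

Only the rows that survive the filter are ever touched by the later stages, which is the point of doing the filtering close to disk.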

8
What can this be used for?
  • Paper gives 3 examples:
  • Citation graph processing
  • Search for particular structure in electrical
    netlist
  • Word meaning disambiguation through search of
    ontology

9
Citation graph example
  • Look through large, sparse graph (16 million
    nodes, 388 million edges)
  • Find both strong (direct edge) and weak couplings
    (e.g., two papers cite the same work)
  • Essentially same code for workstation and Netezza;
    no need to expose parallel architecture
  • Workstation: DNF (did not finish); 80-100x speedup
    on smaller tests
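
As a rough illustration of the two kinds of coupling, the sketch below uses Python's built-in `sqlite3` module on a five-edge toy graph. The table name `cites` and its columns `src` and `dst` are assumptions made for this example; the slides do not show the paper's schema or queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per citation edge (src cites dst).
conn.execute("CREATE TABLE cites (src INTEGER, dst INTEGER)")
conn.executemany(
    "INSERT INTO cites VALUES (?, ?)",
    [(1, 3), (2, 3), (2, 4), (5, 4), (1, 2)],
)

# Strong coupling: a direct edge between two papers.
strong = conn.execute(
    "SELECT src, dst FROM cites ORDER BY src, dst"
).fetchall()

# Weak coupling: two different papers that cite the same work.
# The self-join pairs up edges that share a destination.
weak = conn.execute(
    """
    SELECT a.src, b.src, COUNT(*) AS shared
    FROM cites AS a
    JOIN cites AS b ON a.dst = b.dst AND a.src < b.src
    GROUP BY a.src, b.src
    ORDER BY a.src, b.src
    """
).fetchall()
```

The query never mentions how the table is split across machines, which matches the slide's remark that the parallel architecture does not need to be exposed.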

10
IC netlist example
  • Flattened netlist of 3.5 million transistors, 10
    million wires
  • Search for AND structure

11
IC example results
  • Combinatorial explosion makes directly joining
    all possibilities for each element impossible
  • Can constrain better using fanouts of signals
    internal to the circuit
  • Individual SQL queries for finding possible
    matches for the individual transistors took under
    10 seconds
  • Found all uses of the AND macro, as well as many
    other (1300) identical structures generated
    through other means
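
The idea of matching transistors with joins and then pruning by fanout can be sketched on a toy netlist. This is a simplified stand-in, not the paper's query: it searches for a CMOS inverter (the output stage of an AND built as NAND plus inverter) rather than a full AND, and the `transistors` table, its columns, and the net names `vdd` and `gnd` are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per transistor with the nets on its terminals.
conn.execute(
    "CREATE TABLE transistors "
    "(id INTEGER, kind TEXT, gate TEXT, source TEXT, drain TEXT)"
)
conn.executemany(
    "INSERT INTO transistors VALUES (?, ?, ?, ?, ?)",
    [
        (1, "p", "n1", "vdd", "out"),   # inverter driven by internal net n1
        (2, "n", "n1", "gnd", "out"),
        (3, "p", "a", "vdd", "n1"),     # 2-input NAND producing n1
        (4, "p", "b", "vdd", "n1"),
        (5, "n", "a", "x", "n1"),
        (6, "n", "b", "gnd", "x"),
        (7, "p", "out", "vdd", "y"),    # a second inverter driven by out
        (8, "n", "out", "gnd", "y"),
        (9, "n", "out", "gnd", "z"),    # extra load on out
    ],
)

# Candidate inverters: a PMOS/NMOS pair sharing gate and drain nets.
inverters = conn.execute(
    """
    SELECT pm.id, nm.id, pm.gate, pm.drain
    FROM transistors AS pm
    JOIN transistors AS nm ON pm.gate = nm.gate AND pm.drain = nm.drain
    WHERE pm.kind = 'p' AND nm.kind = 'n'
      AND pm.source = 'vdd' AND nm.source = 'gnd'
    ORDER BY pm.id
    """
).fetchall()

# Fanout constraint: keep only inverters whose input net drives exactly
# the two gates of that inverter, as a net internal to a macro would.
internal = conn.execute(
    """
    WITH fanout AS (
        SELECT gate AS net, COUNT(*) AS loads
        FROM transistors
        GROUP BY gate
    )
    SELECT pm.id, nm.id, pm.gate, pm.drain
    FROM transistors AS pm
    JOIN transistors AS nm ON pm.gate = nm.gate AND pm.drain = nm.drain
    JOIN fanout AS f ON f.net = pm.gate
    WHERE pm.kind = 'p' AND nm.kind = 'n'
      AND pm.source = 'vdd' AND nm.source = 'gnd'
      AND f.loads = 2
    ORDER BY pm.id
    """
).fetchall()
```

Here the fanout condition drops the inverter fed by `out`, which has three loads, and keeps the one fed by the internal net `n1`. On a real netlist the same kind of constraint shrinks the candidate set before any larger join is attempted.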

12
Ontology example
  • Expand out all possible interpretations of a
    phrase
  • Ontology specifies lexical elements, IS-A
    relations, concepts, and constraints on concepts
  • Goal is to search the space, expand concepts to
    find all matches to given phrase

13
Ontology results
  • Partially unfolded ontology
  • Greatly expands database size, but reduces
    iterations / recursions
  • Recoded ontology triples as integers
  • 5.58 sec. vs. 262 sec.
  • Can pipeline multiple queries
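
The two preprocessing steps named above, integer recoding and partial unfolding, might look like the following in plain Python. The helper names `encode` and `unfold`, the `is_a` predicate string, and the hop limit are assumptions for this sketch; the slides do not say how far the ontology was unfolded or how the recoding was done.

```python
def encode(triples):
    """Replace every term in (subject, predicate, object) triples with an integer id."""
    ids = {}

    def get_id(term):
        return ids.setdefault(term, len(ids))

    encoded = [(get_id(s), get_id(p), get_id(o)) for s, p, o in triples]
    return encoded, ids


def unfold(edges, max_hops):
    """Precompute IS-A pairs reachable in at most max_hops steps.

    Stores more rows up front so that later lookups need fewer
    iterations or recursive expansions.
    """
    known = set(edges)
    frontier = set(edges)
    for _ in range(max_hops - 1):
        # Extend every newly found pair by one more IS-A edge.
        frontier = {
            (a, d) for (a, b) in frontier for (c, d) in edges if b == c
        } - known
        if not frontier:
            break
        known |= frontier
    return known


triples = [
    ("beagle", "is_a", "dog"),
    ("dog", "is_a", "mammal"),
    ("mammal", "is_a", "animal"),
]
encoded, ids = encode(triples)
is_a_edges = [(s, o) for s, p, o in encoded if p == ids["is_a"]]

partial = unfold(is_a_edges, max_hops=2)  # partially unfolded
full = unfold(is_a_edges, max_hops=3)     # fully unfolded for this toy chain
```

With `max_hops=2` the pair (beagle, animal) is still missing and would need one more join at query time; with `max_hops=3` it is stored directly. That is the size versus iteration trade-off the slide describes.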

14
Issues
  • Works if you can reduce your problem to SQL
    queries
  • All of the problems were based on graph expansion
    / exploration; how about other domains?
  • Issues of database partitioning? How does
    arbitrary slicing across 108 blades affect
    performance / scalability, esp. for non-sparse
    problems?
  • Strawman comparison to workstation-class machine:
    how does a traditional DB server / storage
    cluster compare?