The State of the Art in Distributed Query Processing

Learn more at: http://web.cs.wpi.edu

1
The State of the Art in Distributed Query
Processing
  • by Donald Kossmann
  • Presented by Chris Gianfrancesco

2
Introduction
  • Distributed database technology is becoming an
    increasingly attractive enhancement to many
    database systems
  • Cost and scalability
  • Software integration
  • Legacy systems
  • New applications
  • Market forces

3
Introduction
  • Topics covered in this paper
  • Basics of distributed query processing
  • Client-server distributed DB models
  • Heterogeneous distributed DB models
  • Data placement techniques
  • Other distributed architectures

4
Client-Server Database Systems
  • Relationships between distributed nodes take a
    client-server form
  • Client makes requests of the servers, usually
    the source of queries
  • Server responds to client requests, usually the
    source of data
  • System architectures: peer-to-peer, strict
    client-server, middleware/multitier

5
Architectures: Peer-to-Peer
  • All nodes are equivalent
  • Each can be either a client or server on demand
    (can store data and/or make requests)
  • Example: SHORE system

6
Architectures: Strict Client-Server
  • Client or server status is pre-defined and can
    never change
  • Clients supply queries, servers supply data
  • Most common architecture in commercial DBMSs

7
Architectures: Middleware/Multitier
  • Multiple levels of client-server interaction
  • Nodes act as clients to those below them and
    servers to those above
  • Examples: SAP R/3, web servers with DB backends

8
Architectures: Evaluation
  • Peer-to-Peer
      • Simplest setup
      • Equal load sharing
  • Strict Client-Server
      • Specialization
      • Administration for servers only
  • Middleware/Multitier
      • Functionality integration
      • Scalability

9
Client-Server Query Processing
  • Queries initiated at clients, data stored at
    servers
  • Where do we execute the query?
  • Query shipping: move the query down to the data
  • Data shipping: move the data up to the query
  • Hybrid shipping: combination of both

10
Query Shipping
  • SQL query code is sent down to the server
  • Server parses and evaluates query, returns result
  • Used in DB2, Oracle, MS SQL Server

11
Data Shipping
  • Client parses query and requests data from server
  • Server provides data, then client executes query
  • Data can be cached at client (main memory or disk)

12
Hybrid Shipping
  • Mix-and-match data shipping and query shipping
  • Query parts can be executed at any level
    according to query plan
  • Data is cached when beneficial
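
The three shipping policies can be compared in a toy simulation. This is a minimal sketch, not code from the paper: the `Server` and `Client` classes, the `rows_sent` counter, and the rule "use the cache if one exists" for the hybrid case are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Server:
    """Hypothetical server holding one table as a list of dict rows."""
    rows: list
    rows_sent: int = 0  # rows shipped over the (simulated) network

    def run_query(self, predicate: Callable) -> list:
        # Query shipping: the server evaluates the predicate itself.
        result = [r for r in self.rows if predicate(r)]
        self.rows_sent += len(result)
        return result

    def fetch_all(self) -> list:
        # Data shipping: the server only hands out raw data.
        self.rows_sent += len(self.rows)
        return list(self.rows)


@dataclass
class Client:
    server: Server
    cache: Optional[list] = None  # client-side copy of the table, if any

    def query_shipping(self, predicate: Callable) -> list:
        return self.server.run_query(predicate)

    def data_shipping(self, predicate: Callable) -> list:
        if self.cache is None:  # fetch once, then reuse the cached copy
            self.cache = self.server.fetch_all()
        return [r for r in self.cache if predicate(r)]

    def hybrid_shipping(self, predicate: Callable) -> list:
        # Assumed policy: evaluate locally if cached, else push to the server.
        if self.cache is not None:
            return [r for r in self.cache if predicate(r)]
        return self.server.run_query(predicate)
```

With a selective predicate, query shipping moves only the matching rows, while data shipping moves the whole table once and then answers later queries from the cache without further traffic.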

13
Evaluation
  • Query Shipping
      • Reliant on server performance
      • Scales poorly with increasing client load
  • Data Shipping
      • Good scalability
      • High communication costs
  • Hybrid
      • Potential to outperform other options
      • More complex optimizations

14
Hybrid Shipping: Observations
  • Some observations of optimal performance using
    hybrid shipping
  • Preference to not use a client cache: if network
    transfer cost < client access cost
  • Shipping down cached data: if it is in client main
    memory and execution is at the server
  • Multiple small updates: maintain at client and
    post to server only when necessary
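
Two of these observations can be sketched in a few lines. The function name `read_from`, the class `ClientUpdateBuffer`, and the single-number cost model are hypothetical; the slides give no concrete cost units or interfaces.

```python
def read_from(network_transfer_cost: float, client_access_cost: float) -> str:
    """Pick where to read data from; both costs are in the same assumed unit."""
    if network_transfer_cost < client_access_cost:
        return "server"        # cheaper to bypass the client cache
    return "client_cache"


class ClientUpdateBuffer:
    """Keeps small updates at the client and posts them to the server in one batch."""

    def __init__(self, post_to_server):
        self._post = post_to_server  # callable taking a list of (key, value) updates
        self._pending = []

    def update(self, key, value):
        self._pending.append((key, value))  # no network traffic yet

    def flush(self):
        # Called only when necessary, e.g. at commit time.
        if self._pending:
            self._post(self._pending)
            self._pending = []
```

Buffering turns many small messages into one larger one, at the price of the server seeing the updates late.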

15
Query Optimization
  • Query plans must also specify where the query
    pieces are executed
  • Data shipping: all execution done at client
  • Query shipping: all execution done at server
  • Hybrid: choice can be made for each operator
  • Results display to user is always at client

16
Distributed Query Plans
  • Each operator is annotated with a logical site of
    execution, so plans are shareable
  • "client" means an operator is executed at the
    client where the query is issued
  • "server" means:
      • for scan operators, execute at a location that
        has the necessary data
      • for updates, execute at all locations with the
        relevant data
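
A minimal sketch of how logical annotations might be bound to physical sites when a plan is run. The `Op` type, the `catalog` mapping from table to the servers holding it, and the choice of the first listed server for scans are assumptions for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    kind: str   # "scan", "update", "join", "display", ...
    site: str   # logical annotation: "client" or "server"
    table: str = ""  # used by scans and updates only
    children: list = field(default_factory=list)


def resolve(op: Op, client: str, catalog: dict) -> list:
    """Translate one logical annotation into a list of physical sites."""
    if op.site == "client":
        return [client]
    if op.site == "server" and op.kind == "scan":
        return [catalog[op.table][0]]   # any site holding the data; first one here
    if op.site == "server" and op.kind == "update":
        return list(catalog[op.table])  # every site holding the data
    raise ValueError(f"no binding rule for {op.kind} at {op.site}")


def bind_plan(op: Op, client: str, catalog: dict) -> list:
    """Post-order walk returning (kind, table, physical sites) per operator."""
    bound = []
    for child in op.children:
        bound.extend(bind_plan(child, client, catalog))
    bound.append((op.kind, op.table, resolve(op, client, catalog)))
    return bound
```

Because the plan stores only "client" and "server", the same `Op` tree can be bound for any client that issues the query, which is what makes such plans shareable.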

17
Query Optimization: Where?
  • Should optimization occur at the client or the
    server?
  • At client: less load on servers, better
    scalability
  • At server: more information about system
    statistics, especially server loads
  • Potential solution: primary parsing and query
    rewriting at client, further optimization at
    server

18
Query Optimization: Statistics
  • Even when optimization is done at a server, that
    server does not usually have full knowledge of
    the system
  • System can either
      • Guess the status of other servers: less
        accuracy, less cost
      • Ask other servers their status: fully accurate,
        additional communication costs
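
The guess-or-ask tradeoff can be sketched as a small cache of last-known loads. `LoadView`, its `default` guess of 0.5, and the `messages` counter are hypothetical names and values, not part of any system described in the slides.

```python
class LoadView:
    """One server's view of the other servers' loads (illustrative only)."""

    def __init__(self, ask_server):
        self._ask = ask_server   # callable: server name -> current load
        self._last_known = {}
        self.messages = 0        # communication cost paid so far

    def guess(self, server: str, default: float = 0.5) -> float:
        # Free, but possibly stale or simply the default.
        return self._last_known.get(server, default)

    def ask(self, server: str) -> float:
        # Current value, but costs one message exchange.
        self.messages += 1
        load = self._ask(server)
        self._last_known[server] = load
        return load
```

A value returned by `guess` is only as fresh as the last `ask`, so the optimizer trades plan quality against message count.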

19
Query Optimization: When?
  • Tradeoff of accuracy vs. cost
  • Traditional-style: optimize once, store plan
      • No support for changing DB conditions
      • No incurred cost for query execution
  • Plan sets: optimize for possible scenarios
      • Generate a few query plans for different
        conditions
      • Choose plans based on runtime statistics
  • On-the-fly: observe intermediate results
      • Re-optimize query if different from expectations
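
The plan-set and on-the-fly strategies can each be sketched as a single decision function. The cost functions, the `server_load` statistic, and the `tolerance` factor of 2 are assumptions chosen for illustration.

```python
def choose_plan(plan_set, runtime_stats):
    """plan_set: list of (plan_name, cost_fn); pick the cheapest under current stats."""
    return min(plan_set, key=lambda plan: plan[1](runtime_stats))[0]


def should_reoptimize(estimated_rows, observed_rows, tolerance=2.0):
    """On-the-fly check: is the intermediate result far from what was expected?"""
    ratio = max(observed_rows, 1) / max(estimated_rows, 1)
    return ratio > tolerance or ratio < 1 / tolerance


# A plan set prepared at compile time for two scenarios (made-up cost models).
plan_set = [
    ("join_at_server", lambda stats: 10 + 100 * stats["server_load"]),
    ("join_at_client", lambda stats: 40),
]
```

With a lightly loaded server, `choose_plan(plan_set, {"server_load": 0.1})` selects the server-side join; under heavy load the client-side plan becomes cheaper under these made-up cost functions.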

20
Query Optimization: Two-Step
  • Compile-time: generate join order, etc.
  • Runtime: perform site selection
  • Reasonable cost at each end
  • Responds well to changing server loads
  • Fully utilizes client data caching
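
The runtime half of two-step optimization might look like the sketch below. The rule used here (scan at the client if the table is cached there, otherwise at the least-loaded server holding it) and all names are assumptions, not the paper's algorithm.

```python
def select_sites(compiled_plan, server_load, client_cache, replicas):
    """Runtime step: place each scan of a plan whose join order is already fixed.

    compiled_plan: table names in the order chosen at compile time
    server_load:   server name -> current load
    client_cache:  set of tables cached at the client
    replicas:      table name -> servers holding a copy
    """
    placement = {}
    for table in compiled_plan:
        if table in client_cache:
            placement[table] = "client"
        else:
            placement[table] = min(replicas[table], key=lambda s: server_load[s])
    return placement
```

The compile-time step never has to be repeated: only this cheap placement pass reruns as loads and cache contents change.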

21
Two-Step Optimization: Downside
  • Optimal plan is generated traditional-style
  • Site selection is performed
  • True optimal plan was missed
  • Optimal was missed because first optimization
    step was done with no knowledge of the system

22
Query Execution Techniques
  • Standard fare: row blocking, multithreading when
    possible
  • Issues: transactions with both updates and
    retrieval queries using hybrid shipping
      • We want to wait to propagate updates for
        efficiency's sake
      • Other option: perform query before update and
        temporarily pad results
23
  • Questions?
  • Comments?