1
22. Parallel, Distributed Access
  • Parallel vs Distributed DBMSs
  • Parallel DBMSs
  • Measuring success: scalability
  • Hardware Architectures
  • Types of parallelism, partitioning
  • Parallelizing operators: scan, select, sort,
    aggregate, join
  • Distributed DBMSs
  • Classifying distributed DBMSs
  • Heterogeneous & homogeneous, client-server &
    collaborating servers, horizontal & vertical
    fragmentation
  • Distributed catalog management
  • Distributed query processing
  • General queries, joins, optimization
  • Replication
  • Synchronous: voting, read-any write-all
  • Asynchronous: Peer-to-Peer, Primary Site (Capture
    and Apply)
  • Distributed Locking, Deadlock Detection,
    Transactions

2
Parallel vs. Distributed DBMSs
  • DBMS may be spread among the nodes of a network
  • Parallel DBMS
  • Nodes are under a central control
  • Typically nodes are in one location
  • Motivated by performance
  • Distributed DBMS
  • Nodes are autonomous, may run different DBMSs
  • Nodes are physically distributed
  • Motivated by need for remote access to data in
    spite of network failures

3
Motivation for Parallel DBMSs: What if one CPU is
not enough?
  • For a large web application, one CPU is never
    enough
  • A large web application may need to scan a 1TB
    file or service 100s of customers at one time.
  • At 50 MB/second, scanning a 1TB file takes over 5
    hours
  • Using 1000 CPUs scanning 1000 disks, the scan
    takes 20 seconds
  • Similarly, 100s of customers accessing a database
    can lead to response times measured in minutes
  • Solution: Parallel DBMSs

4
DBMS: The Success Story
  • DBMSs and Information Retrieval are the most
    (only?) successful applications of parallelism.
  • Every major DBMS vendor has some parallel server
  • Large DBMS backends, for web applications, are
    all parallelized.
  • Large IR applications are parallelized
  • Reasons for DBMS success
  • Relational Algebra has few operators
  • Just parallelize each one and compose them
  • Bulk processing (partition parallelism).
  • Natural pipelining in operator trees.

5
How do we measure the success of parallel
algorithms and architectures?
  • Scalability
  • There are two kinds of scalability
  • Speedup: when the number of nodes grows by a
    factor of n, efficiency improves by a factor of n
  • Scaleup: when the number of nodes and the size of
    the database both grow by a factor of n, efficiency
    does not change (a sketch of both metrics follows)
  • Efficiency can be measured in terms of
  • Elapsed time, or average time per transaction
  • Number of transactions per second (throughput)
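A minimal sketch, in Python, of how the two metrics might be computed from measured elapsed times; the timing numbers are purely illustrative assumptions.

    def speedup(t_one_node, t_n_nodes):
        # Ideal speedup on n nodes is n: same job, 1/n of the elapsed time.
        return t_one_node / t_n_nodes

    def scaleup(t_job_one_node, t_n_times_job_n_nodes):
        # Ideal scaleup is 1.0: an n-times-larger job on n nodes takes the
        # same elapsed time as the original job on one node.
        return t_job_one_node / t_n_times_job_n_nodes

    print(speedup(200.0, 25.0))    # 8.0 on 8 nodes, i.e. linear speedup
    print(scaleup(200.0, 210.0))   # ~0.95, close to ideal scaleup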

6
Architecture Issue: Shared What?
(Figure: the three shared-what architectures.)
  • Shared memory (e.g., SUN, many others): easy to
    program, expensive to build, difficult to scale
    beyond roughly 30 processors
  • Shared disk: clusters
  • Shared nothing (e.g., NCR Teradata, IBM SP): hard
    to program, cheap to build, easy to scale to
    1000s of processors
7
How would you parallelize this plan?
  • Assume Sailors, Reservations are already sorted
    on sid

(Plan diagram: a sort-merge join of Sailors and
Reservations on sid, with an on-the-fly selection
σ rank<5.)
8
Different Types of DBMS Parallelism
  • Pipeline parallelism (inter-operator)
  • each operator may run concurrently on a different
    node (may exploit pipelining or bushy plans)
  • Partition parallelism (intra-operator)
  • multiple nodes work to compute a given operation
    (scan, sort, join)
  • We'll focus on intra-operator parallelism
  • Challenge: how to partition the data!

9
Three Types of Data Partitioning
  • Range partitioning (tuples assigned to nodes by
    key ranges A...E, F...J, K...N, O...S, T...Z):
    good for equijoins, range queries, and group-by
  • Hash partitioning (tuples assigned by a hash of
    the key): good for equijoins and group-by
  • Round-robin partitioning (tuples assigned to
    nodes in turn): good for spreading load
Shared-disk and shared-memory systems are less
sensitive to partitioning; shared-nothing benefits
more from "good" partitioning (a sketch of the three
schemes follows).
10
How to Parallelize the Scan Operator
  • Given a partition, what is the parallel version
    of the scan operator?
  • Scan each partition, then merge
  • What kind of partitioning will make the scan
    scalable?
  • Equal-sized partitions (sketch below)
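A minimal sketch, assuming the relation is already split into equal-sized partitions, one per node; each "node" scans its partition and the results are merged by concatenation.

    from concurrent.futures import ThreadPoolExecutor

    def scan_partition(partition, predicate=lambda t: True):
        # Each node scans only its own partition.
        return [t for t in partition if predicate(t)]

    def parallel_scan(partitions, predicate=lambda t: True):
        with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
            pieces = pool.map(lambda p: scan_partition(p, predicate), partitions)
        # Merge step: concatenate the per-node results.
        return [t for piece in pieces for t in piece]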

11
Parallelizing Selection
  • What is the parallel version of the selection
    operator?
  • Range partition, range selection
  • Hash partition, range selection
  • Hash partition, equality selection
  • Round Robin partition
  • Which of the above is scalable for speedup? (the
    sketch below counts the nodes each case touches)
  • Range partition, range selection
  • Hash partition, range selection
  • Hash partition, equality selection
  • Round Robin partition
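One way to compare the cases is to count how many nodes a selection must touch under each partitioning scheme; a minimal sketch (splitters and hash function are illustrative assumptions).

    import bisect

    def nodes_for_range_sel_range_part(lo, hi, splitters):
        # Range partitioning: only nodes whose ranges overlap [lo, hi].
        first = bisect.bisect_right(splitters, lo)
        last = bisect.bisect_right(splitters, hi)
        return list(range(first, last + 1))

    def nodes_for_equality_sel_hash_part(key, n_nodes):
        # Hash partitioning: an equality selection touches exactly one node.
        return [hash(key) % n_nodes]

    def nodes_for_any_sel_round_robin(n_nodes):
        # Round robin: any node may hold matching tuples, so all must scan.
        return list(range(n_nodes))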

12
Parallel Sorting: Two Algorithms
  • Distributive sort: range partition the data, then
    sort each range (sketch below)
  • Is it scalable? Only if the partitions are of
    equal size
  • How to choose partitioning vector?
  • Parallel External Sort-Merge: sort the data at
    each node, then merge
  • Problem: how to merge in parallel
  • One solution: range partition the data
  • Is it scalable?
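A minimal sketch of distributive sort, assuming an equal-sized range partition; the splitter vector is an illustrative assumption.

    import bisect
    from concurrent.futures import ThreadPoolExecutor

    def distributive_sort(tuples, splitters):
        # Step 1: range partition the data across len(splitters)+1 nodes.
        partitions = [[] for _ in range(len(splitters) + 1)]
        for t in tuples:
            partitions[bisect.bisect_right(splitters, t)].append(t)
        # Step 2: each node sorts its own range independently.
        with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
            sorted_parts = list(pool.map(sorted, partitions))
        # Concatenating the sorted ranges yields a globally sorted result.
        return [t for part in sorted_parts for t in part]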

13
Parallel Sorting, ctd.
  • How to choose the partitioning vector? Sampling
    (sketch below)
  • What if sampling is not possible?
  • For example, if the input is another process, such
    as another operator in a query plan
  • In this case we can use external sort-merge and
    set up a tree of merging steps
  • See US Patent 5,852,826
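A minimal sketch of choosing the partitioning vector by sampling, as suggested above; the sample size is an illustrative assumption.

    import random

    def choose_splitters(tuples, n_nodes, sample_size=1000):
        # Sort a random sample and take evenly spaced quantiles as the
        # n_nodes - 1 range splitters.
        sample = sorted(random.sample(tuples, min(sample_size, len(tuples))))
        return [sample[(i * len(sample)) // n_nodes] for i in range(1, n_nodes)]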

14
Parallel Aggregates
  • Problem: compute an aggregate function in
    parallel
  • For each aggregate function, we need a
    decomposition (sketch below)
  • sum(S) = sum( sum(s(i)) )
  • count(S) = sum( count(s(i)) )
  • avg(S) = ( Σ sum(s(i)) ) / ( Σ count(s(i)) )
  • Is it scalable?
  • Sometimes it's not so simple: median()
  • Similar decompositions apply to grouped aggregates
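A minimal sketch of the decomposition: each node returns a partial (sum, count) pair and the coordinator combines them; the three-node partials below are illustrative.

    def local_aggregate(partition):
        # Each node computes its partial sum and count.
        return sum(partition), len(partition)

    def global_aggregates(partials):
        total = sum(s for s, _ in partials)
        count = sum(c for _, c in partials)
        return {"sum": total, "count": count, "avg": total / count}

    print(global_aggregates([(10, 2), (30, 3), (20, 5)]))
    # {'sum': 60, 'count': 10, 'avg': 6.0}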

15
Parallel Hash Join
Phase 1
  • Recall Phase 1 of Hash Join
  • Do Phase 1 in parallel, for R and then for S
  • Perform the partitioning at each node
  • Send hash bucket i's output to node i, bucket j's
    to node j, etc.
  • Phase 2 is simpler
  • Perform the join of Ri and Si at each node i
    (sketch below)
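A minimal sketch of the two phases, assuming the join key is the first field of each tuple and that bucket i is "sent" to node i via hash(key) % n_nodes.

    from collections import defaultdict

    def phase1_repartition(local_partitions, n_nodes, key=lambda t: t[0]):
        # Phase 1: every node hash-partitions its local tuples; bucket i
        # is sent to node i (modeled here as a list of buckets).
        buckets = [[] for _ in range(n_nodes)]
        for partition in local_partitions:
            for t in partition:
                buckets[hash(key(t)) % n_nodes].append(t)
        return buckets

    def phase2_local_join(r_bucket, s_bucket, key=lambda t: t[0]):
        # Phase 2: node i joins R_i with S_i using an in-memory hash table.
        table = defaultdict(list)
        for r in r_bucket:
            table[key(r)].append(r)
        return [(r, s) for s in s_bucket for r in table.get(key(s), [])]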

16
Dataflow Network for Join A ⋈ B
(Diagram: tuples flow through split and merge
operators between Node I and Node J.)
  • Good use of split/merge makes it easier to build
    parallel versions of sequential join code.

17
Complex Parallel Query Plans
  • Complex Queries: Inter-Operator parallelism
  • Pipelining between operators
  • Note that sort and phase 1 of hash join block the
    pipeline!
  • Bushy Trees

(Diagram: a bushy plan with subtrees at Sites 1-4
and Sites 5-8, combined across Sites 1-8.)
18
Distributed Databases
  • Data is stored at several sites, each managed by
    a DBMS that can run independently.
  • Distributed Data Independence: users should not
    have to know where data is located (extends the
    Physical and Logical Data Independence
    principles).
  • Distributed Transaction ACIDity: users should be
    able to write atomic and durable Xacts accessing
    multiple sites

19
Types of Distributed Databases
  • Homogeneous: every site runs the same type of
    DBMS.
  • Heterogeneous: different sites run different
    DBMSs (different RDBMSs or even non-relational
    DBMSs).

(Diagram: a gateway in front of DBMS1, DBMS2, and
DBMS3.)
20
Network Architectures
  • Definitions
  • Client: requests a service
  • Server: provides a service
  • Client-server architecture: one of each
  • Fat client: applications run from the client
  • Thin client: applications run from the server
  • Business applications use thin clients
  • Fat clients are too hard to manage
  • Personal applications use fat clients
  • Does anyone use Google Docs?
  • 3-tier architecture. Middle tier runs
    middleware.
  • Middleware includes business logic and DBMS
    coordination
  • DBMS back end may involve multiple collaborating
    servers

21
Availability
  • Definition: the fraction of time the system can
    answer queries
  • Sometimes only local queries can be answered
  • What detracts from availability?
  • Node crashes
  • Network crashes
  • During a network (perhaps partial) crash, the
    DBMS at each node should continue to operate.

22
Storing Data: Fragmentation
  • Horizontal Fragmentation
  • Usually disjoint
  • Example
  • Motivation: it is more efficient to place data
    where the queries are, if possible
  • Vertical Fragmentation
  • Remember Normalization?
  • Not as common in DDBMSs as horizontal
    fragmentation

23
Storing Data - Replication
  • Disadvantages (redundancy!)
  • Wasted space
  • Possibly inconsistent data values
  • Advantages
  • Increased availability
  • Faster query evaluation
  • How do we keep track of all replicas of data?

24
Distributed Catalog Management
  • Catalog contains schema, authorization,
    statistics, and location of each
    relation/replica/fragment
  • Problem: where to store the catalog?
  • Example: I want to query sales. Where do I find
    it? Location info is in the catalog, but where
    is the catalog?
  • Solutions
  • Store the entire catalog at every site
  • Catalog updates are too expensive
  • Store the entire catalog at a single master site
  • Single point of failure, performance bottleneck
  • Store catalog info for all replicas and fragments
    of a relation at the birthsite of each relation
  • How do I find the birthsite of a relation?

25
Distributed Catalog Management (ctd.)
  • Intergalactic standard solution
  • Name each relation with <local-name, birth-site>
  • E.g., <sales, Portland01>
  • Keep the catalog of a relation at its birthplace
    (lookup sketch below)
  • How many network I/Os are required to update a
    catalog?
  • If a data item's location is changed, does the
    birth-site change? How many network I/Os are
    required?
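A minimal sketch of the naming convention: the global name <local-name, birth-site> never changes, so a lookup always starts at the birth site's catalog, which records where the relation's replicas and fragments currently live. The catalog contents here are illustrative assumptions.

    catalogs = {
        # Each site's catalog covers the relations born at that site.
        "Portland01": {"sales": ["Tokyo02", "Paris03"]},
        "Tokyo02": {},
    }

    def locate(global_name):
        local_name, birth_site = global_name        # e.g. ("sales", "Portland01")
        return catalogs[birth_site][local_name]     # one lookup at the birth site

    print(locate(("sales", "Portland01")))   # ['Tokyo02', 'Paris03']
    # Moving the data updates only the birth-site catalog; the name is unchanged.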

26
Distributed Queries
SELECT AVG(S.age) FROM Sailors S WHERE S.rating >
3 AND S.rating < 7
  • Horizontally Fragmented: tuples with rating < 5
    at Shanghai, rating >= 5 at Tokyo.
  • Must compute SUM(age), COUNT(age) at both sites.
  • If the WHERE clause contained just S.rating > 6,
    only one site would be needed.
  • Vertically Fragmented: sid and rating at
    Shanghai, sname and age at Tokyo, tid at both.
  • How to evaluate the query?
  • Replicated: Sailors copies at both sites.
  • Choice of site based on local costs, network
    costs.

27
Distributed Joins
(Setup: Sailors, 500 pages, stored at LONDON;
Reserves, 1000 pages, stored at PARIS.)
SELECT * FROM Sailors S, Reserves R WHERE S.sid =
Reserves.sid AND rank = 5
  • Fetch as Needed, Page NL, Sailors as outer
  • Cost: 500 D + 500 * 1000 * (D + S) / 10
  • The /10 is because of the rank = 5 condition
  • D is the cost to read/write a page; S is the cost
    to ship a page.
  • S is very large, so the cost is essentially
    50,000 S
  • Ship to One Site: ship Reserves to London.
  • Cost: 1000 S + 4500 D (sort-merge join cost =
    3 * (500 + 1000))
  • Essentially 1000 S (both cost figures are spelled
    out below)
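The two cost figures, spelled out; D and S are the per-page read/write and shipping costs from the slide, and the relative values used here are illustrative.

    D, S = 1.0, 10.0   # illustrative; in practice S >> D

    # Fetch as needed (page NL, Sailors as outer; rank = 5 keeps ~1/10):
    fetch_as_needed = 500 * D + 500 * 1000 * (D + S) / 10   # ~ 50,000 S for large S

    # Ship Reserves to London, then sort-merge join locally:
    ship_to_one_site = 1000 * S + 3 * (500 + 1000) * D      # ~ 1,000 S for large S

    print(fetch_as_needed, ship_to_one_site)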

28
Semijoin Technique
  • At London, project σ rank=5(Sailors) onto the join
    column sid and ship the result to Paris. Cost: 5 S
  • (divide by 10 for the rank = 5 selection, and by 10
    again for projecting onto the join column)
  • At Paris, join the Sailors projection with
    Reserves.
  • The result is called the reduction of Reserves
    with respect to Sailors.
  • Ship the reduction of Reserves to London.
    Cost: 100 S
  • At London, join Sailors with the reduction of
    Reserves.
  • Total cost: 105 S
  • Idea: trade the cost of computing and shipping
    the projection for the cost of shipping the full
    Reserves relation.

29
Bloomjoin
  • At London
  • Create a 64K bit vector V (one 8K page)
  • Select h, a hash function from sids to [0, 64K-1]
  • If there's a sailor with rank = 5, set
    V(h(her sid)) = true
  • Ship V to Paris. Cost: 1 S
  • At Paris
  • If a reservation has V(h(its sid)) = true, ship it
    to London
  • Cost: 100 S
  • Perhaps a bit more if hash function is not
    effective, but 64K is a pretty big bit vector
  • At London
  • Join Sailors with the reduced Reservations.
  • Total cost: 101 S (sketch below).
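A minimal sketch of the Bloomjoin reduction; Python's built-in hash stands in for the real hash function (it is stable only within one process, so a real system would use a fixed hash).

    BITS = 64 * 1024   # a 64K-bit vector, about one 8K page

    def build_bit_vector(sailors, rank=5):
        # At London: set one bit for every qualifying sailor's sid.
        v = [False] * BITS
        for s in sailors:
            if s["rank"] == rank:
                v[hash(s["sid"]) % BITS] = True
        return v   # ship this (cost ~ 1 S) instead of the sids themselves

    def reduce_reserves(reserves, v):
        # At Paris: keep only reservations whose sid bit is set; this is all
        # the matches plus a few false positives from hash collisions.
        return [r for r in reserves if v[hash(r["sid"]) % BITS]]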

30
Distributed Query Optimization
  • Cost-based approach: consider plans, pick the
    cheapest; similar to centralized optimization.
  • Difference 1: communication costs must be
    considered.
  • Difference 2: local site autonomy must be
    respected.
  • Difference 3: new distributed join methods.
  • Query site constructs global plan, with suggested
    local plans describing processing at each site.
  • If a site can improve on the suggested local plan,
    it is free to do so.

31
Updating Distributed Data
  • Synchronous Replication: all copies of a modified
    relation (fragment) must be updated before the
    modifying Xact commits.
  • Data distribution is made transparent to users.
  • Asynchronous Replication: copies of a modified
    relation are only periodically updated; different
    copies may get out of sync in the meantime.
  • Users must be aware of data distribution.

32
Synchronous Replication
  • Read-any Write-all: writes are slower and reads
    are faster, relative to Voting.
  • The most common approach to synchronous
    replication.
  • Voting: an Xact must write a majority of copies to
    modify an object, and must read enough copies to
    be sure of seeing at least one most recent copy.
  • E.g., with 10 copies: 7 written for an update, 4
    copies read (quorum check sketched below).
  • Each copy has a version number.
  • Voting is usually not attractive because reads are
    common.
  • Choice of technique determines which locks to set.
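A minimal sketch of why 7 writes and 4 reads work for 10 copies: the write quorum must be a majority, and the read and write quorums must overlap so every read sees at least one most recent copy.

    def quorums_ok(n_copies, write_quorum, read_quorum):
        majority_writes = write_quorum > n_copies / 2             # two writes overlap
        reads_see_latest = read_quorum + write_quorum > n_copies  # read overlaps last write
        return majority_writes and reads_see_latest

    print(quorums_ok(10, 7, 4))   # True: the example above
    print(quorums_ok(10, 6, 4))   # False: a read could miss the latest write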

33
Cost of Synchronous Replication
  • Before an update Xact can commit, it must obtain
    locks on all modified copies.
  • Sends lock requests to remote sites, and while
    waiting for the response, holds on to other
    locks!
  • If sites or links fail, Xact cannot commit until
    they are back up.
  • Even if there is no failure, committing must
    follow an expensive commit protocol with many
    msgs.
  • So the alternative of asynchronous replication is
    becoming widely used.

34
Asynchronous Replication
  • Allows modifying Xact to commit before all copies
    have been changed (and readers nonetheless look
    at just one copy).
  • Users must be aware of which copy they are
    reading, and that copies may be out-of-sync for
    short periods of time.
  • Two approaches: Primary Site and Peer-to-Peer
    replication.
  • The difference lies in how many copies are
    updatable ("master" copies).

35
Peer-to-Peer Replication
  • More than one of the copies of an object can be a
    master in this approach.
  • Changes to a master copy must be propagated to
    other copies somehow.
  • If two master copies are changed in a conflicting
    manner, this must be resolved (e.g., Site 1: Joe's
    age changed to 35; Site 2: to 36)
  • Best used when conflicts do not arise
  • E.g., Each master site owns a disjoint fragment.

36
Primary Site Replication
  • Exactly one copy of a relation is designated the
    primary or master copy. Replicas at other sites
    cannot be directly updated.
  • The primary copy is published.
  • Other sites subscribe to (fragments of) this
    relation; these are secondary copies.
  • Main issue: how are changes to the primary copy
    propagated to the secondary copies?
  • Done in two steps: first, capture changes made by
    committed Xacts; then apply these changes.

37
Implementing the Capture Step
  • Log-Based Capture: the log (kept for recovery) is
    used to generate a Change Data Table (CDT).
  • If this is done when the log tail is written to
    disk, we must somehow remove changes due to
    subsequently aborted Xacts.
  • Procedural Capture: a procedure that is
    automatically invoked does the capture;
    typically, it just takes a snapshot.
  • Log-Based Capture is better (cheaper, faster) but
    relies on proprietary log details.

38
Implementing the Apply Step
  • The Apply process at the secondary site
    periodically obtains (a snapshot or) changes to
    the CDT table from the primary site, and updates
    the copy.
  • Period can be timer-based or user/application
    defined.
  • Replica can be a view over the modified relation!
  • If so, the replication consists of incrementally
    updating the materialized view as the relation
    changes.
  • Log-Based Capture plus continuous Apply minimizes
    delay in propagating changes.
  • Procedural Capture plus application-driven Apply
    is the most flexible way to process changes.

39
Distributed Locking
  • How do we manage locks for objects across many
    sites?
  • Centralized: one site does all locking.
  • Vulnerable to single-site failure.
  • Primary Copy: all locking for an object is done at
    the primary copy site for that object.
  • Reading requires access to the locking site as
    well as the site where the object (copy/fragment)
    is stored.
  • Fully Distributed: locking for a copy is done at
    the site where the copy/fragment is stored.
  • Locks at all or many sites while writing an
    object.

40
Distributed Deadlock Detection
  • Each site maintains a local waits-for graph.
  • A global deadlock might exist even if the local
    graphs contain no cycles

(Diagram: waits-for graphs for T1 and T2 at SITE A
and SITE B; neither local graph has a cycle, but the
combined GLOBAL graph does.)
  • Three solutions: Centralized (send all local
    graphs to one site; sketched below); Hierarchical
    (organize sites into a hierarchy and send local
    graphs to the parent in the hierarchy); Timeout
    (abort an Xact if it waits too long).
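A minimal sketch of the centralized solution: union the local waits-for graphs at one site and test for a cycle; the graph representation and the example edges are illustrative assumptions.

    def global_waits_for(local_graphs):
        # Union of the local edges, e.g. Site A: {"T1": {"T2"}}, Site B: {"T2": {"T1"}}.
        combined = {}
        for g in local_graphs:
            for waiter, holders in g.items():
                combined.setdefault(waiter, set()).update(holders)
        return combined

    def has_cycle(graph):
        visited, on_stack = set(), set()
        def dfs(node):
            visited.add(node); on_stack.add(node)
            for nxt in graph.get(node, ()):
                if nxt in on_stack or (nxt not in visited and dfs(nxt)):
                    return True
            on_stack.discard(node)
            return False
        return any(dfs(n) for n in graph if n not in visited)

    print(has_cycle(global_waits_for([{"T1": {"T2"}}, {"T2": {"T1"}}])))  # True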

41
Distributed Transactions
  • Problem: Atomicity and Durability in DDBMSs
  • Example
  • A + 100 at Paris, B - 100 at London
  • Each is called a subtransaction.
  • The query originates at a third site called the
    coordinator.
  • Who keeps the log? Each site.
  • How does each subtransaction know when to commit?

42
2-Phase Commit (2PC) Protocol
  • Coordinator sends prepare message to each
    subtransaction
  • Each subtransaction responds with yes or no
  • Yes means it has done everything but write the
    end record
  • Coordinator sends commit or abort to all
    subtransactions
  • Subtransactions send ack messages to the
    coordinator (coordinator sketch below)
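A minimal sketch of the coordinator's side of the protocol as described above; the subtransaction objects and their prepare/commit/abort/ack methods are hypothetical stand-ins for the real messages.

    def two_phase_commit(coordinator_log, subtransactions):
        # Phase 1: ask every subtransaction to prepare; each votes yes or no.
        votes = [sub.prepare() for sub in subtransactions]

        # Phase 2: commit only if every vote was yes, otherwise abort everywhere.
        decision = "commit" if all(v == "yes" for v in votes) else "abort"
        coordinator_log.append(decision)   # record the decision before acting on it

        for sub in subtransactions:
            if decision == "commit":
                sub.commit()
            else:
                sub.abort()

        # Collect acks, then write the end record for this transaction.
        for sub in subtransactions:
            sub.ack()
        coordinator_log.append("end")
        return decision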

43
2PC Issues
  • What if
  • only one subtransaction says no?
  • All subtransactions say yes, then one cannot
    commit?
  • All subtransactions say yes, then one crashes?

44
Learning Objectives
  • Definitions: parallel and distributed DBMSs,
    scalability, the 3 parallel architectures, pipeline
    and partition parallelism, the 3 types of data
    partitioning and their uses
  • How to parallelize operators and determine their
    scalability: scan, select, sort, aggregate, hash
    join
  • Definitions of distributed data independence and
    ACIDity, network architectures, availability,
    fragmentation, replication
  • How is replication managed?
  • How are distributed queries processed?
  • How is distributed data updated?
  • How are distributed locks and deadlocks managed?
  • How are distributed transactions managed?