CS556: Distributed Systems
1
CS-556 Distributed Systems
Fault Tolerance (II)
  • Manolis Marazakis
  • maraz@csd.uoc.gr

2
Dependability Basic Concepts
  • Availability
  • Reliability
  • Safety
  • Maintainability

Fault → Error → Failure
  • Faults
  • Transient
  • Intermittent
  • Permanent

3
A 2-node cluster

4
Shared-disks vs Shared-nothing
  • Shared-disks
  • Dual hosting for the storage devices
  • SCSI, NAS, SAN
  • Access is arbitrated by external software that
    runs on both servers
  • Shared-nothing
  • Replication schemes
  • Requires more effort to recover a server
  • More suitable for WAN
  • Requires a functional network and a functional
    host on the other side to ensure that the writes
    actually succeed
  • Danger of inconsistency after a failover

5
Failover Management Software
  • Key components of the system must be monitored
  • H/W is generally the easiest part to monitor
  • Relatively easy tests
  • Relatively few different varieties of H/W
    components
  • How to monitor the health of an application?
  • Examine the system's process table
  • No guarantee that the app. is running properly!
  • Query the application itself
  • checking for accurate, timely responses
  • For some apps, the query is easy (e.g. a DBMS)
  • Make sure the check is end-to-end (see the probe
    sketch after this list)
  • e.g. DBMS: s/w + disk + network
  • For others, this is hard!
  • Web server → web page access
  • File server → file access
  • Custom s/w → ???
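A minimal sketch, in Python, of what an end-to-end web-server probe could look like: it fetches a page over HTTP rather than merely checking that a process exists. The /health path, port, and timeout below are illustrative assumptions, not part of the original slides.

```python
import socket

def check_web_server(host: str, port: int = 80, timeout: float = 2.0) -> bool:
    """End-to-end probe: fetch a page instead of just pinging the host.

    A live TCP port alone proves little; a valid HTTP response shows
    that the server process is actually answering requests.
    """
    request = b"GET /health HTTP/1.0\r\nHost: " + host.encode() + b"\r\n\r\n"
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(request)
            status_line = sock.recv(1024).split(b"\r\n", 1)[0]
        return status_line.startswith(b"HTTP/1.") and b"200" in status_line
    except OSError:
        return False  # unreachable, refused, or timed out: suspect the service
```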

6
Active-Passive Configuration (I)
  • Both servers are connected to a set of
    dual-hosted disks.
  • These disks are divided between 2 separate
    controllers / disk arrays.
  • The data is mirrored from one controller to the
    other.
  • A particular disk or filesystem can only be
    accessed by one server at a time.
  • Ownership conflicts are arbitrated by the
    clustering software.
  • Both servers are connected to the same public
    network, and share a single IP address
  • which is migrated by the FMS from one server to
    the other as part of the failover.

7
Active-Passive Configuration (II)


8
Active-Passive Configuration (III)
  • Cost
  • 2 hosts are reserved to perform the work of one.
  • One host sits largely idle most of the time,
    consuming electricity, administrative effort,
    data center space, cooling, and other limited and
    expensive resources.
  • However, active-passive configurations are going
    to be the most highly available ones over time.
  • Since there are no unnecessary processes running
    on the second host, there are fewer opportunities
    for an error to cause the system to fail.

9
Active-Active Configuration (I)
  • Each host acts as the standby for its partner in
    the cluster, while still delivering its own
    critical services.
  • When one server fails, its partner takes over
    for it and begins to deliver both sets of
    critical services
  • until the failed server can be repaired and
    returned to service.
  • The servers must be truly independent of each
    other

10
Active-Active Configuration (II)


11
Service Group Failover (I)
Capability for multiple service groups that ran
together on one server to fail over to separate
machines when that first server fails

12
Service Group Failover (II)
  • Service Group: a set containing one or more IP
    addresses, one or more disks or volumes, and one
    or more critical processes (a minimal sketch
    follows this list)
  • A service group is the unit that fails over from
    one server to another within a cluster.
  • For service groups to maintain their relevance
    and value, they must be totally independent of
    each other.
  • If, because of external requirements, two service
    groups must fail over together, then they are, in
    reality, a single group.
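A minimal sketch of the service-group idea in Python. The group name and resources below are hypothetical; a real FMS would also model dependencies and monitors.

```python
from dataclasses import dataclass, field

@dataclass
class ServiceGroup:
    """The unit of failover: IPs, storage, and processes move together."""
    name: str
    ip_addresses: list = field(default_factory=list)
    volumes: list = field(default_factory=list)
    processes: list = field(default_factory=list)

# Hypothetical database group: everything it needs fails over as one unit.
db_group = ServiceGroup(
    name="db1",
    ip_addresses=["192.0.2.10"],
    volumes=["/dev/vg_db/lv_data"],
    processes=["dbserver"],
)
```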

13
N-to-1 clusters (I)
A single standby node for the whole cluster -
This node can see all disks.
After recovery of a failed node, we must fail its
services back to it, freeing up the one node to
take over for another set of service groups.
4-to-1 SCSI cluster
14
N-to-1 clusters (II)

The hosts are all identically attached to the
storage.
SAN-based 6-to-1 cluster
15
N-plus-1 clusters
1 dedicated stand-by node

After recovery, no failover is needed from
standby to recovered node
- Over time, the layout of hosts & services will
not match the original layout within the
cluster. - As long as all of the cluster members
have similar performance capabilities, and they
can see all of the required disks, it does not
actually matter which host runs the service.
As clusters begin to grow, it's possible that a
single standby node will not be adequate
SAN-based 6-to-1 cluster
16
Failure Models
17
Failure detectors
  • Not necessarily reliable!
  • "P is here" message every T sec, assuming a max.
    message transmission delay D (a detector sketch
    follows this list)
  • Categorization of processes (hints)
  • suspected vs unsuspected
  • A process may be functioning correctly on the
    other side of a partitioned network
  • or it could be slow to respond to probes
  • Reliable detection
  • unsuspected vs failed (crashed)
  • Feasible only in synchronous systems
  • It is possible to give different responses to
    different processes
  • different comm. conditions
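A minimal sketch of such an unreliable detector, assuming a "P is here" heartbeat every T seconds and a maximum transmission delay D. A process that has been silent longer than T + D is only *suspected*, since it may be slow or partitioned rather than crashed.

```python
import time

class HeartbeatDetector:
    """Unreliable failure detector based on periodic 'P is here' messages."""

    def __init__(self, T: float, D: float):
        self.deadline = T + D   # max tolerated silence before suspicion
        self.last_heard = {}    # process id -> time of last heartbeat

    def heartbeat(self, process):
        """Record a 'P is here' message from `process`."""
        self.last_heard[process] = time.monotonic()

    def status(self, process):
        """Return a hint: 'suspected' or 'unsuspected' (never 'failed')."""
        last = self.last_heard.get(process)
        if last is None or time.monotonic() - last > self.deadline:
            return "suspected"      # maybe crashed, maybe slow or partitioned
        return "unsuspected"
```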

18
Failure Masking by Redundancy (I)
  • Hide the occurrence of failures from other
    processes, by redundancy
  • Information
  • Extra bits to allow recovery
  • Time
  • Transactions to allow abort/redo
  • Particularly suited for transient or intermittent
    faults
  • Physical
  • Extra equipment to tolerate loss/malfunction of
    some components
  • or redundant s/w processes
  • Voter circuitry
  • Voters are components too → they may themselves
    fail!

19
Failure Masking by Redundancy (II)
  • Triple modular redundancy (TMR); a voter sketch
    follows
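A minimal TMR voter sketch: any single faulty module is outvoted, but, as the slide notes, the voter itself is a component that can fail unless it is replicated too.

```python
from collections import Counter

def tmr_vote(a, b, c):
    """Return the majority of three replicated module outputs.

    Masks one faulty module; with two or more failures there may be
    no majority, and the voter can only signal that it has no answer.
    """
    (value, count), = Counter([a, b, c]).most_common(1)
    if count >= 2:
        return value
    raise RuntimeError("no majority: more than one module failed")
```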

20
Flat vs Hierarchical Groups (I)
Process resilience by replicating processes into
groups
Group membership protocols
21
Flat vs Hierarchical Groups (II)
  • Flat groups
  • Symmetrical (no special roles)
  • No single point of failure
  • Complex operation protocols (e.g. voting)
  • Hierarchical groups
  • Coordinator is a single point of failure
  • Group membership
  • group server
  • distributed management
  • E.g. reliable multicast
  • Detection of failed processes?
  • Join/leave must be synchronous
  • with data messages!
  • How to rebuild a group after a major failure?

22
Failure Masking & Replication
  • Having a group of identical processes allows us
    to mask one or more faulty processes
  • Primary-backup protocols
  • Hierarchical organization
  • Election among backups to select a new primary
  • Replicated-write protocols
  • Flat process groups
  • Active replication
  • Quorum protocols
  • K-fault tolerant system (see the sizing sketch
    after this list)
  • Fail-silent processes → group size ≥ k + 1
  • Byzantine failures → group size ≥ 2k + 1
  • Assuming that processes do not team up!!
  • (independent failures)
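The two sizing rules can be captured in a one-line helper; a sketch, assuming independent failures as the slide requires.

```python
def required_group_size(k: int, byzantine: bool = False) -> int:
    """Minimum group size that masks k faulty processes.

    Fail-silent: k + 1 (any single survivor can answer).
    Byzantine:   2k + 1 (k liars are outvoted by k + 1 correct replies).
    """
    return 2 * k + 1 if byzantine else k + 1

assert required_group_size(1) == 2
assert required_group_size(1, byzantine=True) == 3
```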

23
Coordination/Agreement
  • A set of processes must collaborate
  • or agree with one or more processes
  • without fixed master/slave relationships
  • failure assumptions & failure detectors
  • Problems
  • mutual exclusion
  • election
  • multicast
  • reliability & ordering semantics
  • consensus
  • Byzantine agreement

24
Problems of Agreement
  • A set of processes need to agree on a value
    (decision), after one or more processes have
    proposed what that value (decision) should be
  • Examples
  • mutual exclusion, election, transactions
  • Processes may be correct, crashed, or they may
    exhibit arbitrary (Byzantine) failures
  • Messages are exchanged on a one-to-one basis,
    and they are not signed

25
Two Agreement Problems
  • Consensus problem: every process i proposes a
    value vi while in the undecided state. Process i
    exchanges messages until it makes a decision di
    and moves to the decided state.
  • Termination: all correct processes must make a
    decision
  • Agreement: same decision for all correct
    processes
  • Integrity: if all correct processes proposed the
    same value, any correct process decides that
    value
  • Byzantine generals problem: a commander
    process i orders value v.
  • The lieutenant processes must agree on what the
    commander ordered.
  • Processes may be faulty
  • providing wrong or contradictory messages
  • Integrity requirement:
  • A distinguished process decides a value for
    others to agree upon
  • A solution only exists if N > 3f, where f is the
    number of faulty processes

26
Consensus for 3 processes
27
The Two-Army Problem
  • How can two perfect processes reach agreement
    about 1 bit of information?
  • over an unreliable comm. channel
  • Red army: 5,000 troops
  • Blue armies 1 & 2: 3,000 troops each
  • How can the blue armies reach agreement on when
    to attack?
  • Their only means of communication is by sending
    messengers
  • that may be captured by the enemy!
  • No solution!
  • Proof by contradiction: assume a solution exists
    using a minimum number of messages. The last
    message might be lost, yet agreement must still
    hold without it, so that message was
    unnecessary, contradicting minimality.

28
Consensus: No-Failures Case
majority(v1, ..., vN) returns the most frequently
occurring value - returns ⊥ if no majority
exists
Consensus via reliable multicast
For ordered values, min/max could be used instead
of majority (a sketch of majority() follows)
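A possible rendering of majority() in Python, with None standing in for ⊥; the tie-handling detail is an assumption.

```python
from collections import Counter

def majority(*values):
    """Most frequently occurring value, or None (for ⊥) on a tie."""
    ranked = Counter(values).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # no unique majority value exists
    return ranked[0][0]

assert majority(1, 1, 2) == 1
assert majority(1, 2) is None
```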
In general, if failures can occur, it is not 100%
certain that consensus can be reached in finite
time!
Terminating Reliable Broadcast (TRB): a single
process multicasts a msg, and all
correct processes must agree on that msg -
Even if the sender crashes, all correct processes
must deliver a special msg (Server-Fault)
29
Relation among problems
A problem B reduces to a problem A if there is an
algorithm which transforms any algorithm for A
into an algorithm for B.
Synchronous systems: TRB is equivalent to
Consensus
Asynchronous systems: Consensus reduces to
TRB, but not vice versa!
Asynchronous systems with crash failures:
Atomic Multicast is equivalent to Consensus
30
Consensus in synchronous systems
Duration of a round: max. delay of B-multicast
Up to f faulty processes
Dolev & Strong, 1983: any algorithm to reach
consensus despite up to f failures requires (f +
1) rounds (a round-based sketch follows).
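A crash-tolerant, round-based sketch of this idea. The exchange() primitive below, which B-multicasts this process's value set and returns the sets received from the others in the same round, is a hypothetical stand-in for the real communication layer.

```python
def synchronous_consensus(my_value, f, exchange):
    """Round-based consensus for a synchronous system with <= f crashes.

    Runs f + 1 rounds, matching the Dolev/Strong bound: with at most f
    crashes, at least one round is crash-free, after which all correct
    processes hold the same value set (and it never changes again).
    """
    known = {my_value}
    for _ in range(f + 1):
        for received in exchange(known):   # value sets from the other processes
            known |= received              # accumulate every value seen so far
    return min(known)                      # deterministic common choice
```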
31
Byzantine agreement (synchronous)
[Figure: three processes, one faulty; relayed
messages of the form "3 says '1 says u'"]
Nothing can be done to improve a correct
process's knowledge beyond the first stage -
It cannot tell which process is faulty.
Lamport et al, 1982: no solution for N = 3, f = 1
Pease et al, 1982: no solution for N ≤ 3f
(assuming private comm. channels)
32
Agreement in Faulty Systems (I)
  • The Byzantine generals problem for 3 loyal
    generals and 1 traitor
  • The generals announce their troop strengths
  • The vectors that each general assembles based on
    (a)
  • The vectors that each general receives in step 3.

Consensus by generals 1, 2, 4 → (1, 2, UNKNOWN, 4)
33
Agreement in Faulty Systems (II)
No majority!
  • The same as in the previous slide, except now
    with 2 loyal generals and one traitor.

34
Byzantine agreement for N > 3f
Example with N = 4, f = 1 - 1st round: Commander
sends a value to each lieutenant - 2nd round:
Each of the lieutenants sends the value it has
received to each of its peers.
- A lieutenant receives a total of (N - 2) + 1
values, of which (N - 2) are correct. -
By majority(), the correct lieutenants compute
the same value (a sketch follows).
In general, O(N^(f+1)) msgs
O(N^2) for signed msgs
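A sketch of one correct lieutenant's decision step in the N = 4, f = 1 example; function and parameter names are illustrative.

```python
from collections import Counter

def lieutenant_decision(from_commander, from_peers):
    """Decide after round 2: majority over (N - 2) + 1 = 3 received values.

    If the commander is correct, at least N - 2 = 2 values match his order;
    if the commander is the traitor, all lieutenants relay faithfully and
    see the same multiset, so majority() gives each one the same answer.
    """
    values = sorted([from_commander] + list(from_peers))  # deterministic tie-break
    (value, _), = Counter(values).most_common(1)
    return value

# A traitorous commander sends mixed orders; correct lieutenants still agree:
assert lieutenant_decision("attack", ["attack", "retreat"]) == "attack"
```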
35
Impossibility of (deterministic) consensus in
asynchronous systems
M.J. Fischer, N. Lynch, and M. Paterson,
"Impossibility of distributed consensus with one
faulty process", J. ACM, 32(2), pp. 374-382,
1985.
A crashed process cannot be distinguished from a
slow one. - Not even with a 100% reliable
comm. network!
There is always a chance that some continuation
of the processes' execution avoids consensus
being reached.
No guarantee of consensus, but Prob(consensus)
> 0
Solutions: based on randomization, or on
(unreliable) failure detectors, or on fault
masking