HQ Replication: Efficient Quorum Agreement for Reliable Distributed Systems

1
HQ Replication: Efficient Quorum Agreement for Reliable Distributed Systems
  • James Cowling¹, Daniel Myers¹, Barbara Liskov¹
  • Rodrigo Rodrigues², Liuba Shrira³
  • ¹MIT CSAIL
  • ²INESC-ID and Instituto Superior Técnico
  • ³Brandeis University

2
Byzantine Fault Tolerance
  • Reliable client-server distributed systems
  • Server replicated across group of replica
    machines
  • General operations
  • Bounded number f of Byzantine replicas
  • Must ensure correct system state
  • Consistent ordering of client operations

3
State of the Art
  • Approaches
  • State Machine Replication (BFT)
  • 3f+1 replicas
  • Byzantine Quorums (Q/U)
  • 5f+1 replicas
  • Increased performance
  • Degradation when writes contend

4
Contributions
  • Low overhead Byzantine Fault Tolerance
  • Performance of Byzantine Quorums without 5f+1
    replicas or contention degradation
  • Hybrid Quorum scheme for Byzantine Fault
    Tolerance
  • Quorum approach in normal-case
  • Use Byzantine agreement to resolve write
    contention

5
Outline
  • Current Approaches
  • HQ Replication
  • BFT Improvements
  • Performance Evaluation
  • Conclusions

6
State Machine Replication
  • BFT (Castro and Liskov, TOCS '02)
  • Operations ordered by primary
  • Agreed upon by replicas

7
Byzantine Quorums
  • Q/U (Abd-El-Malek et al., SOSP '05)
  • Client controlled protocol
  • Replicas order operations independently
  • Optimistic
  • Best case one-phase protocol
  • Worst case unbounded
  • Randomized backoff

8
Advantages/Disadvantages
  • BFT
  • Good
  • 3f+1 replicas
  • Bounded number of phases
  • Bad
  • Higher latency
  • Quadratic communication
  • Q/U
  • Good
  • Best-case performance
  • One-phase write
  • Low replica load
  • Bad
  • 5f+1 replicas
  • Degraded performance when writes contend

9
Outline
  • Current Approaches
  • HQ Replication
  • Normal-case Protocol
  • Contention Resolution
  • BFT Improvements
  • Performance Evaluation
  • Conclusions

10
HQ Replication
  • 3f+1 replicas
  • Supports general operations
  • No all-to-all communication in normal-case
  • BFT used to resolve contention

11
HQ Replication
  • One-phase read
  • Two-phase write

12
System Architecture
13
High-level Write Protocol
  • Two-phase write protocol
  • Phase 1
  • Client obtains timestamp grant from each replica
  • Phase 2
  • Client forms a certificate from 2f+1 matching
    grants
  • Sends the certificate to the replicas to complete
    the write (see the client-side sketch below)
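
A minimal sketch of this client-side flow, assuming hypothetical replica
stubs exposing write_1/write_2 RPC methods; the names and message shapes
are illustrative, not the protocol's actual wire format.

from collections import Counter

def hq_write(replicas, f, client_id, object_id, op_hash, op):
    # Phase 1: request a timestamp grant from every replica.
    grants = [r.write_1(client_id, object_id, op_hash) for r in replicas]

    # The write can proceed only if 2f+1 replicas granted the same
    # (object id, sequence number, operation hash) triple.
    key = lambda g: (g.object_id, g.seq, g.op_hash)
    (best, count), = Counter(map(key, grants)).most_common(1)
    if count < 2 * f + 1:
        # Conflicting or stale grants: write back or request contention
        # resolution (not shown in this sketch).
        raise RuntimeError("no quorum of matching grants")

    # Phase 2: the 2f+1 matching grants form a certificate; each replica
    # executes the operation on receiving it and returns the result.
    cert = [g for g in grants if key(g) == best][:2 * f + 1]
    return [r.write_2(cert, op) for r in replicas]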

14
Grants
  • Promise to execute operation at given sequence
    number
  • Assuming agreement from quorum
  • Grant
  • Client ID
  • Object ID
  • Hash over requested operation
  • Sequence Number (timestamp)
  • Replica signature

15
Certificates
  • Certificate
  • Quorum (2f+1) of matching grants (see the sketch
    below)
  • Proves quorum of replicas agree to ordering of
    operation
  • Uniquely identify client, operation and
    sequential ordering
  • Existence of certificate precludes existence of
    conflicting certificate
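
A minimal sketch of these two structures, assuming the grant fields from
the previous slide; the type and helper names are illustrative.

from dataclasses import dataclass

@dataclass(frozen=True)
class Grant:
    client_id: int
    object_id: int
    op_hash: bytes    # hash over the requested operation
    seq: int          # sequence number (timestamp)
    replica_id: int
    signature: bytes  # replica's signature (or MAC-based authenticator)

def forms_certificate(grants, f):
    # A certificate is a quorum of 2f+1 grants, from distinct replicas,
    # that agree on the client, object, operation, and sequence number
    # (signature checks are assumed to have been done already).
    groups = {}
    for g in grants:
        key = (g.client_id, g.object_id, g.op_hash, g.seq)
        groups.setdefault(key, set()).add(g.replica_id)
    return any(len(ids) >= 2 * f + 1 for ids in groups.values())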

16
Replica State
  • Multiple independent objects
  • State per-object
  • Certificate supporting most recent write
  • Operation status
  • Active
  • Write in progress, outstanding grant
  • Quiescent
  • No current write operation

17
Write Phase 1
  • Client sends write request to replicas
  • If quiescent, the replica issues a new grant to the client
  • If active, the replica returns the currently
    outstanding grant (see the replica-side sketch below)
  • Several Possibilities
  • All grants match
  • Grants for different client
  • Grants conflict
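
A minimal sketch of the replica side of phase 1, reusing the Grant
dataclass from the certificate sketch above; the per-object state and
naming are illustrative.

from dataclasses import dataclass
from typing import Optional

@dataclass
class ObjectState:
    seq: int = 0                        # highest sequence number granted
    grant: Optional["Grant"] = None     # outstanding grant if active, else None (quiescent)

def write_1(replica_id, obj, client_id, object_id, op_hash, sign):
    if obj.grant is not None:
        # Active: a write is already in progress at this timestamp, so
        # return the outstanding grant; the client will either write it
        # back or request contention resolution.
        return obj.grant
    # Quiescent: grant the next sequence number to this client.
    obj.seq += 1
    obj.grant = Grant(client_id, object_id, op_hash, obj.seq, replica_id,
                      sign((client_id, object_id, op_hash, obj.seq)))
    return obj.grant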

18-27
Isolated Write (figure sequence)
  • Client 1 sends Write A to all three replicas
  • Each replica returns its grant ⟨1,1,A⟩ (client 1,
    timestamp 1, operation A)
  • Matching grants: the client proceeds to the phase 2 write
  • Client sends Cert {G1,G2,G3} to the replicas
  • Each replica executes A and returns Result A
  • Write complete
28-41
Incomplete Write (figure sequence)
  • Client 1 sends Write A; each replica returns Grant ⟨1,1,A⟩
  • Client 1 is slow or failed and never completes phase 2
  • Client 2 sends Write B; the replicas are active, so each
    returns its current grant ⟨1,1,A⟩
  • The grants are for a different client, so client 2 performs
    a writeback: it sends Cert {G1,G2,G3} together with Write B
  • Each replica executes A, then returns Grant ⟨2,2,B⟩ for client 2
  • Matching grants: client 2 proceeds to the phase 2 write
42-56
Write Contention (figure sequence)
  • Client 1 sends Write A while client 2 concurrently sends Write B
  • Replicas 1 and 2 return Grant ⟨1,1,A⟩; replica 3 returns
    Grant ⟨2,1,B⟩ for the same timestamp
  • The grants conflict, so the client requests resolution from
    the replicas
  • Contention resolution orders the operations: the replicas
    obtain a certificate and execute A, then B
  • Result A is returned to client 1 and Result B to client 2
57
Contention Resolution
  • BFT module used to resolve contention
  • Establish sequential order on contending ops
  • On receiving resolve request
  • Freeze local object state
  • Send state to primary
  • Primary runs BFT on combined state
  • Replicas execute contending operations

58
Read Protocol
  • Client sends read request to replicas
  • Replica returns current object state
  • Supported by previous write certificate
  • Read completes if a quorum of responses match (see
    the sketch below)
  • A writeback is used to retry if the responses are
    inconsistent
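
A minimal sketch of the client-side check on read replies, assuming each
reply carries the object state (hashable, e.g. a digest) plus its
supporting write certificate; a None result would trigger the writeback
path.

from collections import Counter

def read_result(replies, f):
    # replies: list of (state, certificate) pairs, one per responding replica.
    counts = Counter(state for state, _cert in replies)
    state, count = counts.most_common(1)[0]
    return state if count >= 2 * f + 1 else None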

59
Additional Details
  • Read protocol
  • State transfer
  • Multi-object transactions
  • Performance enhancements

60
Performance Enhancements
  • Preferred quorums
  • Core protocol run by only 2f+1 replicas
  • Symmetric-key cryptography
  • Authenticators instead of signatures (sketched below)
  • Collection of 3f+1 MACs
  • ⟨m_i,1, m_i,2, …, m_i,n⟩
  • Lower CPU overhead
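
A minimal sketch of an authenticator as a vector of MACs, one per replica,
computed with pairwise symmetric keys; key distribution is assumed and
HMAC-SHA256 stands in for whichever MAC the implementation actually uses.

import hmac, hashlib

def make_authenticator(message, pairwise_keys):
    # One MAC per replica, in replica-index order: ⟨m_1, m_2, …, m_n⟩.
    return [hmac.new(k, message, hashlib.sha256).digest() for k in pairwise_keys]

def verify_my_entry(message, my_key, my_index, authenticator):
    # Each replica checks only its own entry in the vector.
    expected = hmac.new(my_key, message, hashlib.sha256).digest()
    return hmac.compare_digest(expected, authenticator[my_index])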

61
BFT Improvements
  • Preferred quorums
  • Reduces degree of quadratic communication
  • Single MAC per message
  • Significant improvements over authenticators

62
Outline
  • Current Approaches
  • HQ Replication
  • BFT Improvements
  • Performance Evaluation
  • Analysis
  • Experiments
  • Conclusions

63
Non-Contention Message Overhead
Messages sent/received at each replica per write
request
64
Non-Contention Bandwidth Use
Total bandwidth at each replica per write request
65
Experimental Setup
  • HQ and BFT prototypes deployed on Emulab
  • Up to 16 replicas (f=5), 200 clients (4 per
    machine)
  • New BFT codebase
  • Implement counter service
  • Negligible operation payload
  • Multiple objects
  • Private non-contention objects
  • Shared contention object

66
Non-contention Throughput
Maximum operation throughput
67-68
Resilience to Contention
Throughput degradation with increasing
write-contention
69
BFT Batching
  • BFT allows batching at primary
  • Greatly reduces internal protocol communication
  • Increased delay

[Figure: BFT message flow between the client, the primary, and replicas
1-3: Request, Pre-Prepare, Prepare, Commit, Reply; the Pre-Prepare,
Prepare, and Commit phases run once per batch]
70
Batched Performance
Effect of BFT batching on maximum write throughput
71
Recommendations
  • Use Q/U when
  • Latency critical
  • Contention low
  • 5f+1 replicas acceptable
  • Use HQ when
  • Low latency important
  • Moderate contention
  • Use BFT when
  • Contention high
  • Throughput more important than latency

72
Conclusions
  • First Byzantine Quorum protocol with 3f+1
    replicas
  • Supports general operations
  • Resilient to Byzantine clients
  • Introduced Hybrid technique
  • Resolve contention without performance
    degradation
  • Applicable to general quorum systems
  • Found optimized BFT to perform well under high
    load

73
  • Questions?

74
Further Details
  • HQ Replication: Properties and Optimizations
  • James Cowling, Daniel Myers, Barbara Liskov,
    Rodrigo Rodrigues, and Liuba Shrira. Technical
    memo in preparation, MIT Computer Science and
    Artificial Intelligence Laboratory, Cambridge,
    Massachusetts, 2006.
  • Contact
  • cowling@csail.mit.edu
  • http://people.csail.mit.edu/cowling/

75
Write-back Operation
  • Write certificate paired with a subsequent
    request
  • Used to ensure progress with slow replicas or
    clients
  • Completes phase 2 for a slow client
  • Advances state of slow replicas
  • Replica processes write phase 2 based on
    certificate, then the paired request

77
Backups
78
Slow Replicas
  • Some grants in quorum have old timestamp
  • Perform writeback to slow replicas, using
    certificate provided with highest grant
  • Brings replicas up to date and solicits new grants

79
Why 3f1?
  • 3f+1 replicas
  • f of which can be faulty
  • 2f+1 must agree on any ordering
  • f of these may be Byzantine
  • The remaining f may be slow
  • At most 2f replicas can respond with old system
    state, but never 2f+1 (see the arithmetic below)
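
The quorum-intersection arithmetic behind these bullets: any two quorums
of size 2f+1 drawn from 3f+1 replicas overlap in at least
(2f+1) + (2f+1) - (3f+1) = f+1 replicas. Since at most f replicas are
faulty, at least one non-faulty replica lies in both quorums, so two
conflicting certificates can never both be formed.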

80
  • Won't HQ have a higher rate of contention, since
    it is two-phase (higher latency) than Q/U?
  • No: the contention window extends only from when
    the first replica receives the phase 1 request
    until the last replica receives it. It is therefore
    independent of the two-phase structure, and is
    actually smaller than in Q/U