Title: Byzantine Techniques II
1. Byzantine Techniques II
- Justin W. Hart
- CS 614
- 12/01/2005
 
2. Papers
- BAR Fault Tolerance for Cooperative Services. Amitanand S. Aiyer et al. (SOSP 2005)
- Fault-Scalable Byzantine Fault-Tolerant Services. Michael Abd-El-Malek et al. (SOSP 2005)
3. BAR Fault Tolerance for Cooperative Services
- BAR Model
- General Three-Level Architecture
- BAR-B
 
4. Motivation
- A general approach to constructing cooperative services that span multiple administrative domains (MADs)
5. Why is this difficult?
- Nodes are under the control of multiple administrators
- Broken nodes exhibit Byzantine behaviors
 - Misconfigured, or configured with malicious intent
- Selfish nodes exhibit rational behaviors
 - They alter the protocol to increase local utility
 
6. Other models?
- Byzantine models account for Byzantine behavior, but do not handle rational behavior
- Rational models account for rational behavior, but may break with Byzantine behavior
7. BAR Model
- Byzantine
 - Behave arbitrarily or maliciously
- Altruistic
 - Execute the proposed program, whether it benefits them or not
- Rational
 - Deviate from the proposed program for purposes of local benefit
8. BART: BAR Tolerant
- It's a cruel world
 - At most (n-2)/3 nodes in the system are Byzantine
 - The rest are rational
 
9. Two classes of protocols
- Incentive-Compatible Byzantine Fault Tolerant (IC-BFT)
 - Guarantees a set of safety and liveness properties
 - It is in the best interest of rational nodes to follow the protocol exactly
- Byzantine Altruistic Rational Tolerant (BART)
 - Guarantees a set of safety and liveness properties despite the presence of rational nodes
- IC-BFT is a subset of BART
 
10. An important concept
- It isn't enough for a protocol to survive drills of a handful of attacks; it must provably provide its guarantees.
11. A flavor of things to come
- The protocol builds on Practical Byzantine Fault Tolerance in order to combat Byzantine behavior
- The protocol uses game-theoretic concepts in order to combat rational behavior
12. A taste of Nash Equilibrium (the game of chicken)

              Swerve     Go Straight
Swerve        0, 0       -1, 1
Go Straight   1, -1      -100, -100
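The payoff matrix above can be checked mechanically. This is a small sketch (mine, not from the paper) that brute-forces the pure-strategy Nash equilibria of chicken; the action names and payoff values mirror the table.

```python
# payoff[(row, col)] = (row player's payoff, column player's payoff)
ACTIONS = ["swerve", "straight"]
PAYOFF = {
    ("swerve", "swerve"): (0, 0),
    ("swerve", "straight"): (-1, 1),
    ("straight", "swerve"): (1, -1),
    ("straight", "straight"): (-100, -100),
}

def is_nash(row, col):
    """Nash: neither player can improve by unilaterally switching actions."""
    r_pay, c_pay = PAYOFF[(row, col)]
    best_row = all(PAYOFF[(alt, col)][0] <= r_pay for alt in ACTIONS)
    best_col = all(PAYOFF[(row, alt)][1] <= c_pay for alt in ACTIONS)
    return best_row and best_col

equilibria = [(r, c) for r in ACTIONS for c in ACTIONS if is_nash(r, c)]
```

The two asymmetric outcomes (one driver swerves, the other goes straight) are equilibria; mutual "go straight" is not, which is why BAR can make protocol compliance the rational choice by shaping payoffs.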
13. ...and the nodes are starving!
- Nodes require access to a state machine in order to complete their objectives
- The protocol contains methods for punishing rational nodes, including denying them access to the state machine
14. An expensive notion of identity
- Identity is established through cryptographic keys assigned through a trusted authority
- Prevents Sybil attacks
- Bounds the number of Byzantine nodes
- Gives rational nodes reason to consider the long-term consequences of their actions
- Gives real-world grounding to identity
15. Assumptions about rational nodes
- Receive long-term benefit from staying in the protocol
- Conservative when computing the impact of Byzantine nodes on their utility
- If the protocol provides a Nash equilibrium, then all rational nodes will follow it
- Rational nodes do not collude; colluding nodes are classified as Byzantine
16. Byzantine nodes
- Byzantine fault model
 - Strong adversary
 - Adversary can coordinate collusion attacks
 
17. Important concepts
- Promptness principle
- Proof of Misbehavior (POM)
- Cost balancing
 
18. Promptness principle
- If a rational node gains no benefit from delaying a message, it will send it as soon as possible
19. Proof of Misbehavior (POM)
- A self-contained, cryptographic proof of wrongdoing
- Holds nodes accountable for their actions
 
20. Example of a POM
- Node A requests that Node B store a chunk
- Node B replies that it has stored the chunk
- Later, Node A requests that chunk back
- Node B sends back random garbage (it hadn't stored the chunk) and a signature
- Because Node A stored a hash of the chunk, it can demonstrate misbehavior on the part of Node B
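The example above can be sketched in a few lines. This is an illustrative reconstruction, not the paper's code: an HMAC stands in for B's digital signature, and all names are mine. The point is that the signed garbage plus A's prior hash commitment is checkable by any third party.

```python
import hashlib
import hmac

B_KEY = b"node-b-signing-key"  # illustrative stand-in for B's private key

def sign(key, data):
    """Stand-in for a digital signature (real systems would use public keys)."""
    return hmac.new(key, data, hashlib.sha256).digest()

def commit(chunk):
    """Node A records a hash of the chunk before handing it to Node B."""
    return hashlib.sha256(chunk).digest()

def make_pom(commitment, reply, signature):
    """A POM: B's signed reply fails A's hash check, so anyone can verify the lie."""
    authentic = hmac.compare_digest(sign(B_KEY, reply), signature)
    honest = hashlib.sha256(reply).digest() == commitment
    if authentic and not honest:
        return {"commitment": commitment, "reply": reply, "sig": signature}
    return None  # no misbehavior provable

chunk = b"backup data"
commitment = commit(chunk)
garbage = b"random garbage"
pom = make_pom(commitment, garbage, sign(B_KEY, garbage))  # B lied: POM produced
ok = make_pom(commitment, chunk, sign(B_KEY, chunk))       # B honest: no POM
```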
21. ...but it's a bit more complicated than that!
- This corresponds to a rather simple behavior to combat: aggressively Byzantine behavior.
22. Passive-aggressive behaviors
- Harder cases than aggressively Byzantine behavior
- A malicious Node A could merely lie about misbehavior on the part of Node B
- A node could exploit non-determinism in order to shirk work
23. Cost Balancing
- If two behaviors have the same cost, there is no reason to choose the wrong one
24. Three-Level Architecture
25. Level 1
- Unilaterally deny service to nodes that fail to deliver messages
 - Tit-for-tat
- Balance costs
 - No incentive to make the wrong choice
- Penance
 - Unilaterally impose extra work on nodes with untimely responses
26. Level 2
- Failure to respond to a request by the state machine will generate a POM from a quorum of nodes in the state machine
27. Level 3
- Makes use of reliable work assignment
- Needs only to provide sufficient information to identify valid request/response pairs
  28Nuts and Bolts
29. Level 1
- Ensure long-term benefit to participants
 - The RSM rotates the leadership role among participants
 - Participants want to stay in the system in order to control the RSM and complete their protocols
- Limit non-determinism
 - Self-interested nodes could hide behind non-determinism to shirk work
 - Use Terminating Reliable Broadcast (TRB), rather than consensus
 - In TRB, only the sender can propose a value
 - Other nodes can only adopt this value, or choose a default value
30. Level 1
- Mitigate the effects of residual non-determinism
 - Cost balancing
 - The protocol-preferred choice is no more expensive than any other
- Encourage timeliness
 - Nodes can inflict sanctions on untimely messages
- Enforce predictable communication patterns
 - Nodes must have participated at every step in order to have the opportunity to issue a command
31. Terminating Reliable Broadcast
32. 3f+2 nodes, rather than 3f+1
- Suppose a sender s is slow
- The same group of nodes now wants to determine that s is slow
- A new leader is elected
- Every node but s wants a timely conclusion to this, in order to get its turn to propose a value to the state machine
- s is not allowed to participate in this quorum
 
33. TRB provides a few guarantees
- They differ during periods of synchrony and periods of asynchrony
34. In synchrony
- Termination
 - Every non-Byzantine process delivers exactly one message
- Agreement
 - If one non-Byzantine process delivers a message m, then all non-Byzantine processes eventually deliver m
35. In asynchrony
- Integrity
 - If a non-Byzantine process delivers m, then the sender sent m
- Non-Triviality
 - If the sender is non-Byzantine and sends m, then the sender eventually delivers m
36. Message Queue
- Enforces predictable communication patterns
- Bubbles
 - A simple retaliation policy
 - Node A's message queue is filled with messages that it intends to send to Node B
 - This message queue is interleaved with bubbles
 - Bubbles contain predicates indicating messages expected from B
 - No message except one matching the expected predicate from B can fill the bubble
 - No messages in A's queue will go to B until B fills the bubble
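The bubble mechanism above can be sketched as a small queue class. This is an illustrative interpretation (class and method names are mine, not the paper's): messages ahead of the first unfilled bubble flow to B; everything behind it is held hostage until B sends a reply matching the bubble's predicate.

```python
from collections import deque

class BubbleQueue:
    """Node A's outgoing queue to Node B, interleaved with 'bubbles'."""

    def __init__(self):
        self.queue = deque()  # entries: ("msg", payload) or ("bubble", predicate)

    def send(self, msg):
        self.queue.append(("msg", msg))

    def expect(self, predicate):
        """Insert a bubble: a predicate describing the reply expected from B."""
        self.queue.append(("bubble", predicate))

    def deliverable(self):
        """Only messages ahead of the first unfilled bubble may go to B."""
        out = []
        for kind, item in self.queue:
            if kind == "bubble":
                break
            out.append(item)
        return out

    def receive(self, msg):
        """A message from B fills the front bubble only if it matches."""
        if self.queue and self.queue[0][0] == "bubble" and self.queue[0][1](msg):
            self.queue.popleft()
            return True
        return False  # non-matching messages cannot fill the bubble

q = BubbleQueue()
q.expect(lambda m: m == "ack-1")  # retaliation: nothing goes out until B acks
q.send("turn-2")
blocked = q.deliverable()         # bubble unfilled, so nothing is deliverable
q.receive("ack-1")
unblocked = q.deliverable()       # bubble filled, "turn-2" may now be sent
```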
37. Balanced Messages
- We've already discussed this quite a bit
- We ensure this at this level of the protocol
- This is where we get our gigantic timeout message
 
38. Penance
- Untimely vector
 - Tracks a node's perception of the responsiveness of other nodes
 - When a node becomes the sender, it includes its untimely vector with the message
39. Penance
- All nodes but the sender receive penance messages from each node
- Because of bubbles, each untimely node must send a penance message back in order to continue using the system
 - This provides a penalty to those nodes
- The sender is excluded from this process, because it may be motivated to lie in its untimely vector in order to avoid the work of transmitting penance messages
40. Timeouts and Garbage Collection
- Set-turn timeout
 - Timeout to take leadership away from the sender
 - Initially 10 seconds in this implementation, in order to overcome all expected network delays
 - Can only be changed by the sender
- max_response_time
 - Time at which a node is removed from the system, its messages discarded, and its resources garbage collected
 - Set to 1 week or 1 month in the prototypes
41. Global Punishment
- Badlists
 - Transform local suspicion into POMs
 - Suspicion is recorded in a local node's badlist
 - The sender includes its badlist with its message
 - If, over time, recipients see a node in f + 1 different senders' badlists, then they, too, consider that node to be faulty
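The f + 1 threshold above can be sketched directly. This is illustrative code (mine, not the paper's): with at most f Byzantine nodes, f + 1 distinct accusers guarantee that at least one non-Byzantine node suspects the target, so local suspicion safely becomes global.

```python
from collections import defaultdict

class BadlistTracker:
    """Aggregates badlists received from senders into global fault judgments."""

    def __init__(self, f):
        self.f = f                          # max number of Byzantine nodes
        self.accusers = defaultdict(set)    # suspect -> set of accusing senders

    def record(self, sender, badlist):
        for suspect in badlist:
            self.accusers[suspect].add(sender)

    def is_faulty(self, node):
        # f + 1 distinct accusers: at least one of them must be non-Byzantine
        return len(self.accusers[node]) >= self.f + 1

t = BadlistTracker(f=2)
t.record("s1", ["x"])
t.record("s2", ["x"])
before = t.is_faulty("x")   # only 2 accusers: could all be Byzantine slander
t.record("s3", ["x"])
after = t.is_faulty("x")    # 3 = f + 1 accusers: "x" is now treated as faulty
```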
42. Proof
- The real proofs do not appear in this paper; they appear in the technical report
43. ...but here's a bit
- Theorem 1: The TRB protocol satisfies Termination, Agreement, Integrity, and Non-Triviality
44. ...and a bit more
- Theorem 2: No node has a unilateral incentive to deviate from the protocol
- Lemma 1: No rational node r benefits from delaying sending the set-turn message
 - Follows from penance
- Lemma 2: No rational node r benefits from sending the set-turn message early
 - Sending early could result in a senderTO being sent (this protocol uses synchronized clocks, and all messages are cryptographically signed)
45. ...and the rest that's mentioned in the paper
- Lemma 3: No rational node r benefits from sending a malformed set-turn message
 - The set-turn message only contains the turn number. Because of this, doing so reduces to either sending early (dealt with in Lemma 2) or sending late (dealt with in Lemma 1)
46. Level 2
- State machine replication is sufficient to support a backup service, but the overhead is unacceptable
 - 100 participants x 100 MB backed up = 10 GB of drive space per node
- Assign work to individual nodes, using erasure codes to provide low-overhead fault-tolerant storage
47. Guaranteed Response
- Direct communication is insufficient when nodes can behave rationally
- We introduce a witness that overhears the conversation
 - This eliminates ambiguity
- Messages are routed through this intermediary
 
48. Guaranteed Response
49. Guaranteed Response
- Node A sends a request to Node B through the witness
- The witness stores the request, and enters the RequestReceived state
- Node B sends a response to Node A through the witness
- The witness stores the response, and enters the ResponseReceived state
50. Guaranteed Response
- Deviation from this protocol will cause the witness to notice either the timeout from Node B or lying on the part of Node A
51. Implementation
- The system must remain incentive-compatible
 - Communication with the witness node is not in the form of actual message sending; it is in the form of a command to the RSM
- Theorem 3: If the witness node enters the RequestReceived state for some work w assigned to rational node b, then b will execute w
 - Holds if sufficient sanctions exist to motivate b to do this
52. State limiting
- State is limited by limiting the number of slots (nodes with which a node can communicate) available to a node
 - Applies a limit to the memory overhead
 - Limits the rate at which requests are inserted into the system
 - Forces nodes to acknowledge responses to requests
 - Nodes want their slots back
 
53. Optimization through Credible Threats
54. Optimization through Credible Threats
- Returns to game theory
- The protocol is optimized so nodes can communicate directly: a fast path is added
- Nodes register vows with the witness
 - If the recipient does not respond, nodes proceed to the unoptimized case
 - Analogous to a driver in chicken throwing their steering wheel out the window
55. Periodic Work Protocol
- The witness checks that periodic tasks, such as system maintenance, are performed
- It is expected that, with a certain frequency, each node in the system will perform such a task
- Failure to perform one will generate a POM from the witness
56. Authoritative Time Service
- Maintains authoritative time
- Binds messages sent to that time
- The guaranteed response protocol relies on this for generating NoResponses
57. Authoritative Time Service
- Each submission to the state machine contains the timestamp of the proposer
- The timestamp is taken to be the maximum of the median of the timestamps of the previous f + 1 decisions and the previous authoritative time
- If no decision is decided, then the timestamp is the previous authoritative time
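One reading of the rule above, sketched in Python. This is my interpretation for illustration, not the paper's code: the new authoritative time is the median of the proposers' timestamps over the previous f + 1 decisions, clamped so it never moves backwards.

```python
from statistics import median

def authoritative_time(prev_auth_time, recent_timestamps, f):
    """recent_timestamps: proposer timestamps from the previous f + 1 decisions.

    Taking the median means up to f Byzantine proposers cannot drag the clock
    arbitrarily far forward or backward; the max() keeps time monotonic.
    """
    if not recent_timestamps:
        return prev_auth_time  # no decision decided: time stays put
    assert len(recent_timestamps) == f + 1
    return max(prev_auth_time, median(recent_timestamps))

t1 = authoritative_time(100, [98, 105, 110], f=2)  # median 105 beats 100
t2 = authoritative_time(100, [], f=2)              # no decisions: stays at 100
t3 = authoritative_time(100, [1, 2, 99], f=2)      # median 2 < 100: clamped
```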
58. Level 3: BAR-B
- BAR-B is a cooperative backup system
- Three operations
 - Store
 - Retrieve
 - Audit
 
59. Storage
- Nodes break files up into chunks
- Chunks are encrypted
- Chunks are stored on remote nodes
- Remote nodes send signed receipts and store StoreInfos
60. Retrieval
- A node storing a chunk can respond to a request for a chunk with:
 - The chunk
 - A demonstration that the chunk's lease has expired
 - A more recent StoreInfo
61. Auditing
- Receipts constitute audit records
- Nodes exchange receipts in order to verify compliance with storage quotas
62. Erasure Coding
- Erasure coding is used to keep storage overhead reasonable
 - Backing up 1 GB of data requires 1.3 GB of remote storage
- Keeping this ratio reasonable is crucial to motivate self-interested nodes to participate
  63Request-Response pattern
64. Retrieve
- The originator sends a Receipt for the StoreInfo to be retrieved
- The storage node can send:
 - A RetrieveConfirm, containing the data and the receipt
 - A RetrieveDeny, containing a receipt and a proof of why it denies
 - Anything else generates a POM
 
65. Store
- The originator sends a StoreInfo to be stored
- The storage node can send:
 - A receipt
 - A StoreReject, demonstrating that the node has reached its storage commitment
 - Anything else generates a POM
 
66. Audit
- Three phases
 - The auditor requests both the OwnList and StoreList from the auditee
 - Does this for random nodes in the system
 - The lists are checked for inconsistencies
 - Inconsistencies result in a POM
 
67. Time constraints
- Data is stored for 30 days
 - After this, it is garbage collected
- Nodes must renew their leases on stored chunks prior to this expiration, in order to keep them in the system
68. Sanctions
- The periodic work protocol forces the generation of POMs or special NoPOMs
 - POMs and NoPOMs are balanced
- POMs evict nodes from the system
 
69. Recovery
- Nodes must be able to recover after failures
- Chained membership certificates are used in order to allow them to retrieve their old chunks
 - Use of a certificate later in the chain is regarded as a new node entering the system
 - The old node is regarded as dead
 - The new node is allowed to view the old node's chunks
70. Recovery
- This forces nodes to redistribute the chunks that were stored on the dead node
- The length of chains is limited, in order to prevent nodes from shirking work by using a certificate later in the chain
71. Guarantees
- Data on BAR-B can be retrieved within the lease period
- No POM can be gathered against a node that does not deviate from the protocol
- No node can store more than its quota
- A time window is available for nodes with catastrophic failures to recover
72. Evaluation
- Performance is inferior to protocols that do not make these guarantees, but acceptable
73. Impact of additional nodes
74. Impact of rotating leadership
75. Impact of fast path optimization
76. Fault-Scalable Byzantine Fault-Tolerant Services
- The Query/Update (Q/U) protocol
 - An optimistic quorum-based protocol
 - Better throughput and fault-scalability than Replicated State Machines
 - Introduces the preferred quorum as an optimization on quorum protocols
77. Motivation
- There is a compelling need for services and distributed data structures to be efficient and fault-tolerant
- In Byzantine fault-tolerant systems, performance drops off sharply as more faults are tolerated
78. Fault Scalability
- A fault-scalable service is one in which performance degrades gracefully as more server faults are tolerated
79. Operations-based interface
- Provides an interface similar to RSMs
- Exports interfaces comprised of deterministic methods
- Queries
 - Do not modify data
- Updates
 - Modify data
- Multi-object updates
 - Allow a set of objects to be updated together
 
80. Properties
- Operates correctly under an asynchronous model
- Queries and updates are strictly serializable
- In benign executions, they are obstruction-free
- The cost is an increase in the number of required servers: 5b + 1 servers, rather than 3b + 1
81. Optimism
- Servers store a version history of objects
- Updates are non-destructive to the objects
- Logical timestamps are based on the contents of the update and the object state upon which the update is conditioned
82. Speedups
- Preferred quorums, rather than random quorums
 - Addressed later
- Efficient cryptographic techniques
 - Addressed later
 
83. Efficiency and Scalability
84. Efficiency
- Most failure-atomic protocols require at least a two-phase commit
 - Prepare
 - Commit
- The optimistic approach does not need a prepare phase
 - This introduces the need for clients to repair inconsistent objects
- The optimistic approach also obviates the need for locking!
85. Versioning Servers
- To allow for this, versioning servers are employed
- Each update creates a new version on the server
- Updates contain information about the version to be updated
 - If no update has been committed since that version, the update goes through unimpeded
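The conditioned update above is essentially optimistic concurrency control. This is a minimal sketch (names and structure are mine, not the Q/U paper's code): the server appends versions non-destructively, and an update commits only if no newer version landed since the one it was conditioned on.

```python
class VersioningServer:
    """Stores an append-only version history of one object."""

    def __init__(self, initial):
        self.versions = [(0, initial)]  # list of (version number, object state)

    def latest(self):
        return self.versions[-1]

    def update(self, conditioned_on, new_state):
        """Commit only if conditioned_on is still the latest version."""
        current_version, _ = self.latest()
        if conditioned_on != current_version:
            return None  # stale condition: caller must re-read and retry (repair)
        self.versions.append((current_version + 1, new_state))
        return current_version + 1

server = VersioningServer("v0")
ver, _ = server.latest()
first = server.update(ver, "v1")   # conditioned on version 0, commits as version 1
stale = server.update(ver, "v1b")  # also conditioned on version 0: now rejected
```

No locks are taken: a losing writer simply learns its condition was stale and retries against the newer version, which is the single-phase optimism the slides describe.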
86. Throughput-scalability
- Additional servers, beyond those necessary to provide the desired fault tolerance, can provide additional throughput
87. Scaleup pitfall?
- Encourage the use of fine-grained objects, which reduce per-object contention
- If the majority of accesses touch individual objects, or few objects, then the scaleup pitfall can be avoided
 - In the example applications, this holds
88. No need to partition
- Other systems achieve throughput-scalability by partitioning services
- This is unnecessary in this system
 
  89The Query/Update Protocol 
90. System model
- Asynchronous timing
- Clients and servers may be Byzantine faulty
- Clients and servers are assumed to be computationally bounded, assuring the effectiveness of cryptography
- The failure model is a hybrid failure model
 - Benign
 - Malevolent
 - Faulty
 
91. System model
- Extends the definition of a "fail-prone system" given by Malkhi and Reiter
92. System model
- Point-to-point authenticated channels exist between all clients and servers
 - Infrastructure deploys symmetric keys on all channels
- Channels are assumed unreliable
 - ...but, of course, they can be made reliable
 
93. Overview
- Clients update objects by issuing requests, stamped with object versions, to versioning servers
- Versioning servers evaluate these requests
 - If the request is over an out-of-date version, the client's version is corrected and the request reissued
 - If an out-of-date server is required to reach a quorum, it retrieves an object history from a group of other servers
 - If the version matches the server's version, of course, the request is executed
- Everything else is a variation upon this theme
 
94. Overview
- Queries are read-only methods
- Updates modify an object
- Exported methods take arguments and return answers
- Clients perform operations by issuing requests to a quorum
- A server receives a request; if it accepts it, it invokes a method
- Each update creates a new object version
 
95. Overview
- The object version is kept, with its logical timestamp, in a version history called the replica history
- Servers return replica histories in response to requests
- Clients store replica histories in their object history set (OHS), an array of replica histories indexed by server
96. Overview
- Timestamps in these histories are candidates for future operations
- Candidates are classified in order to determine which object version a method should be executed upon
97. Overview
- In non-optimistic operation, a client may need to perform a repair
 - Addressed later
- To perform an operation, a client first retrieves an object history set. The client's operation is conditioned on this set, which is transmitted with the operation.
98. Overview
- The client sends this operation to a quorum of servers
- To promote efficiency, the client sends the request to a preferred quorum
 - Addressed later
- Single-phase operation hinges on the availability of a preferred quorum, and on concurrency-free access
99. Overview
- Before executing a request, servers first validate its integrity
- This is important: servers do not communicate object histories directly to each other, so the client's data must be validated
- Servers use authenticators to do this: lists of HMACs that prevent malevolent nodes from fabricating replica histories
- Servers cull replica histories that they cannot validate from the conditioned-on OHS
100. Overview (the last bit)
- Servers validate that they do not have a higher timestamp in their local replica histories
 - Failing this, the client repairs
 - Passing this, the method is executed and the new timestamp created
- Timestamps are crafted such that they always increase in value
101. Preferred Quorums
- Traditional quorum systems use random quorums in order to distribute load, but this means that servers frequently need to be synced
- Preferred quorums access the servers with the most up-to-date data, assuring that syncs happen less often
102. Preferred Quorums
- If a preferred quorum cannot be met, clients probe for additional servers to add to the quorum
- Authenticators make it impossible to forge object histories to benign servers
- The new host syncs with b + 1 host servers, in order to validate that the data is correct
- In the prototype, probing selects servers such that the load is distributed, using a method parameterized on object ID and server ID
103. Concurrency and Repair
- Concurrent access to an object may fail
- Two repair operations
- Barrier
 - Barrier candidates have no data associated with them, and so are safe to select during periods of contention
 - Advances the logical clock so as to prevent earlier timestamps from completing
- Copy
 - Copies the latest object data past the barrier, so it can be acted upon
104. Concurrency and Repair
- Clients may repeatedly barrier each other; to combat this, an exponential backoff strategy is enforced
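The backoff strategy above can be sketched in a few lines. The parameters here are illustrative, not from the paper: after each failed attempt a client waits an exponentially growing, randomly jittered interval before retrying, so contending clients stop barriering each other indefinitely.

```python
import random

def backoff_delay(attempt, base=0.05, cap=5.0, rng=random.random):
    """Exponential backoff with full jitter, capped at `cap` seconds.

    attempt 0 waits up to `base`, attempt 1 up to 2*base, and so on; the
    random jitter desynchronizes clients that collided at the same instant.
    """
    upper = min(cap, base * (2 ** attempt))
    return rng() * upper

# Pin the rng to 1.0 so the upper bounds themselves are visible.
delays = [backoff_delay(a, rng=lambda: 1.0) for a in range(8)]
```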
105. Classification and Constraints
- Based on partial observations of the global system state, an operation may be:
- Complete
- Repairable
 - Can be repaired using the barrier-and-copy strategy
- Incomplete
 
106. Multi-Object Updates
- In this case, servers lock their local copies; if they approve the OHS, the update goes through
- If not, a multi-object repair protocol is run
- In this case, repair depends on the ability to establish all objects in the set
- Objects in the set are only repairable if all are repairable; otherwise, objects in the set that would be repairable are reclassified as incomplete
107. An example of all of this
108. Implementation details
109. Cached object history set
- Clients cache object history sets during execution, and execute updates without first querying
- If the request fails because of an out-of-date OHS, the server returns an up-to-date OHS with the failure
110. Optimistic query execution
- If a client has not accessed an object recently, it is still possible to complete in a single phase
- Servers execute the query on the latest object version that they store; clients then evaluate the result normally
111. Inline repair
- Does not require a barrier and copy
- Repairs the candidate in place, obviating the need for a round trip
- Only possible in cases where there is no contention
112. Handling repeated requests
- Mechanisms may cause requests to be repeated
- In order to shortcut other checks, the timestamp is checked first
113. Retry and backoff policies
- Update-update contention requires retry, and backoff to avoid livelock
- Update-query contention does not; the query can be repaired in place
114. Object syncing
- Only one server needs to send the entire object version state
- The others send hashes
- The syncing server then calculates the hash and compares it against all of the others
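The sync check above is a simple hash comparison. This is an illustrative sketch (mine, not the paper's code): one server ships the full state, the other responders ship only digests, and the syncing server accepts the state only when its own hash of the full state matches every digest received.

```python
import hashlib

def obj_hash(state: bytes) -> bytes:
    """Digest of an object version's full state."""
    return hashlib.sha256(state).digest()

def sync(full_state: bytes, hashes: list) -> bool:
    """Accept the full state only if every other server's hash agrees.

    This keeps sync bandwidth low: N-1 servers send 32-byte digests
    instead of the whole object version state.
    """
    h = obj_hash(full_state)
    return all(h == other for other in hashes)

state = b"object version 7"
good = sync(state, [obj_hash(state), obj_hash(state)])      # consistent replicas
bad = sync(state, [obj_hash(state), obj_hash(b"forged")])   # mismatch detected
```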
115. Other speedups
- Authenticators
 - Authenticators use HMACs rather than digital signatures
- Compact timestamps
 - Hashes are used rather than full object histories in timestamps, using a collision-resistant hash
- Compact replica histories
 - Replica histories are pruned based on the conditioned-on timestamp after updates
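The HMAC authenticator idea above can be sketched as follows. This is a hedged illustration, not the paper's implementation: a server appends one HMAC per server, each under a pairwise shared symmetric key, so any server can later verify a replica history relayed by an untrusted client without the cost of digital signatures.

```python
import hashlib
import hmac

N = 3  # number of servers (illustrative)

# Pairwise symmetric keys: keys[i][j] is shared by servers i and j only.
keys = {i: {j: f"k-{min(i, j)}-{max(i, j)}".encode() for j in range(N)}
        for i in range(N)}

def authenticator(author, history: bytes):
    """List of HMACs over the replica history, one entry per server."""
    return [hmac.new(keys[author][j], history, hashlib.sha256).digest()
            for j in range(N)]

def verify(verifier, author, history: bytes, auth) -> bool:
    """Each verifier checks only its own entry in the authenticator."""
    expected = hmac.new(keys[verifier][author], history, hashlib.sha256).digest()
    return hmac.compare_digest(expected, auth[verifier])

history = b"replica history of object 42"
auth = authenticator(0, history)
valid = verify(2, 0, history, auth)                  # untampered: accepted
forged = verify(2, 0, b"fabricated history", auth)   # client altered it: rejected
```

The trade-off versus signatures is that an HMAC entry only convinces the one server holding that shared key, which is why an authenticator carries one entry per server.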
116. Malevolent components
- The astute among you may have noticed the possibility of a DoS attack: refusing to perform exponential backoff
 - Servers could rate-limit clients
- Clients could also issue updates to a subset of a quorum, forcing incomplete updates
 - Lazy verification can be used to verify the correctness of client operations in the background
 - The amount of unverified work by a client can then be limited
117. Correctness
- Operations are strictly serializable
- To understand why, consider the conditioned-on chain
 - All operations chain back to the initial candidate, and a total order is imposed on all established operations
- Operations occur atomically, including those spanning multiple objects
- If no operations span multiple objects, then correct operations that complete are also linearizable
118. Tests
- Tests were performed on a rack of 76 Intel Pentium 4 2.8 GHz machines
- Implemented an increment method and an NFSv3 metadata service
119. Fault Scalability
120. More fault-scalability
121. Isolated vs. Contending
122. NFSv3 metadata
123. References
- Text and images have been borrowed directly from both papers.