The Specification-Consistent Coordination Model (SCCM) and its applications to Byzantine Failures

Provided by: csCornell

Learn more at: http://www.cs.cornell.edu

1
The Specification-Consistent Coordination Model
(SCCM) and its applications to Byzantine
Failures
2
The Byzantine Failure Problem

In a large multi-processor, internal breakdowns
are expected to be common events.
Most of these breakdowns will result in complex
behavior where the processor will return
incorrect output.
Given the source code of the program, we want to
be able to detect and recover from such failures.
SCCM may be a way to do this.

3
The Goals of SCCM

SCCM was originally intended as an aid to
programmers.
It divides the programming task into two stages:
A Specification, which rigorously defines the
algorithm.
A Coordination, which defines the actual
imperative program and associates it with the
Specification to ensure correctness.
SCCM employs a runtime checker to ensure that the
imperative program being executed matches the
specification.

4
SCCM Specification

SCCM defines the algorithm via a functional
language.
For every piece of information that arises during
the algorithm's lifetime, there is a function
with a particular argument value to identify it.
Example: the sum of all the numbers in array A
input A
sum(i) = sum(i-1) + A[i]
output sum(A.length)
Each intermediate value is identified by sum(i).

5
SCCM Coordination

An imperative version of this program:
var ImpSum = 0
Array A = [1, 2, 3, 4, 5, 6]
for i = 1 to A.length
  ImpSum = ImpSum + A[i]
Output ImpSum
We can ensure that this program matches the
Specification by associating with every value of
ImpSum a corresponding identifier sum(i).

6
Named Values

SCCM works by naming each piece of mutable
storage with some f(x) from the specification.
It maintains correctness by ensuring that when
the imperative program overwrites values, it
transforms their names in a way consistent with
the specification.
Because all values have names and names may only
be transformed in consistent ways, SCCM ensures
that the implementation's control flow is the
same as in the specification.

7
Summation's Named Values

In the command ImpSum = ImpSum + A[i], we would use
the definition of sum() to transform ImpSum's
name from sum(i) to sum(i+1).
sum(): sum(i) = sum(i-1) + A[i]
ImpSum's values and names
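
The renaming of ImpSum can be sketched in Python. This is only an illustration, not SCCM syntax: the `Named` class, the `step_sum` helper, and the base case sum(0) = 0 are assumptions introduced here. The helper indexes the rule as sum(i-1) to sum(i), which is the same transformation the slide describes as sum(i) to sum(i+1).

```python
from dataclasses import dataclass


@dataclass
class Named:
    """A value paired with its specification name, e.g. ("sum", 3)."""
    value: int
    name: tuple


def step_sum(acc: Named, a: list, i: int) -> Named:
    """Apply sum(i) = sum(i-1) + A[i], checking the accumulator's name first."""
    if acc.name != ("sum", i - 1):
        raise ValueError(f"expected sum({i - 1}), found {acc.name}")
    # The slides index A from 1, Python lists from 0.
    return Named(acc.value + a[i - 1], ("sum", i))


A = [1, 2, 3, 4, 5, 6]
imp_sum = Named(0, ("sum", 0))  # assumed base case: sum(0) = 0
trace = [imp_sum]               # every (value, name) pair ImpSum passes through
for i in range(1, len(A) + 1):
    imp_sum = step_sum(imp_sum, A, i)
    trace.append(imp_sum)
```

After the loop, `trace` holds each value ImpSum took together with the sum(i) name it carried at that point, and a step applied out of order raises instead of silently producing a wrong total.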

8
Fibonacci Sequence

The algorithm specification for the Fibonacci
Sequence is simple:
fib(0) = 1
fib(1) = 1
fib(i) = fib(i-1) + fib(i-2)
An implementation will have to name each of its
values with some fib(i) and only use the above
rule to transform values.

9
Fibonacci Specification

The source code of the specification of the
Fibonacci Sequence algorithm.

10
Fibonacci Coordination
11
The Consistency Link

Full Application: A = fib(0)
Here, fib() is actually called with 0 as the
argument and its return value, 1, is assigned to
A. fib(0) is now A's name.
Fetch Application: C = fib(i) <- A
The value and name of A are copied into C. SCCM
makes sure that before the copy, A's name is
fib(i).

12
The Consistency Link 2

Step Application:
B = fib(i+2)(fib.l1 <- A,
fib.l2 <- C)
fib() is executed to obtain the value of B.
Rather than wastefully calling fib(i+1) and
fib(i) recursively, SCCM pulls those values from
A and C.
It ensures that A's name is fib(i+1) and C's
name is fib(i).
Thus, B gets its value and SCCM ensures that
proper control flow was maintained.
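
A rough Python sketch of the three kinds of application follows. It is not SCCM code: `Cell`, `NameMismatch`, the three helper functions, and the loop in `fib_coordination` are all hypothetical, since the coordination source from the slides is not reproduced in this transcript.

```python
class NameMismatch(Exception):
    """Raised when a cell does not carry the name the coordination claims."""


class Cell:
    """A piece of mutable storage holding a value and its specification name."""
    def __init__(self):
        self.value = None
        self.name = None


def fib_spec(i):
    """The specification from the slides, with fib(0) = fib(1) = 1."""
    return 1 if i < 2 else fib_spec(i - 1) + fib_spec(i - 2)


def full_application(dst, i):
    """Actually evaluate fib(i) and name dst accordingly."""
    dst.value, dst.name = fib_spec(i), ("fib", i)


def fetch_application(dst, src, i):
    """Copy src into dst, but only if src is currently named fib(i)."""
    if src.name != ("fib", i):
        raise NameMismatch(f"expected fib({i}), found {src.name}")
    dst.value, dst.name = src.value, src.name


def step_application(dst, i, l1, l2):
    """dst = fib(i+2), pulling fib(i+1) from l1 and fib(i) from l2."""
    if l1.name != ("fib", i + 1) or l2.name != ("fib", i):
        raise NameMismatch(f"bad operands {l1.name}, {l2.name} for fib({i + 2})")
    dst.value, dst.name = l1.value + l2.value, ("fib", i + 2)


def fib_coordination(n):
    """Assumed imperative loop computing fib(n) for n >= 1 with three cells."""
    A, B, C = Cell(), Cell(), Cell()
    full_application(C, 0)              # C = fib(0)
    full_application(A, 1)              # A = fib(1)
    for i in range(n - 1):
        step_application(B, i, A, C)    # B = fib(i+2)
        fetch_application(C, A, i + 1)  # C = fib(i+1) <- A
        fetch_application(A, B, i + 2)  # A = fib(i+2) <- B
    return A


result = fib_coordination(5)
```

Only the two Full Applications recurse into the specification; every later value comes from a checked Step or Fetch, so the loop does linear work while each cell still carries a verified fib(i) name.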

13
Potential Coding Errors

Errors in the imperative program are caught.
Example: Setting loop bounds to (0, n) rather than
(0, n-1) results in fib(n+1) being output rather
than fib(n). SCCM detects this error.
In general, it is hard to make errors in both the
specification and the coordination that match
each other.
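
The loop-bound example can be mimicked in a few lines of Python. This is a sketch under assumptions: `run_fib` and `checked_output` are hypothetical names, and the check is modelled simply as comparing the name carried by the output value against the name fib(n) that the program claims to output.

```python
def run_fib(loop_bounds):
    """Run the imperative loop over inclusive bounds; return (value, name)."""
    lo, hi = loop_bounds
    cur, nxt, index = 1, 1, 0  # cur is named fib(0); nxt holds fib(1)
    for _ in range(lo, hi + 1):
        # One step of fib(i) = fib(i-1) + fib(i-2); the name advances with it.
        cur, nxt, index = nxt, cur + nxt, index + 1
    return cur, ("fib", index)


def checked_output(n, loop_bounds):
    """Output the result only if it is really named fib(n)."""
    value, name = run_fib(loop_bounds)
    if name != ("fib", n):
        raise ValueError(f"program outputs {name}, but fib({n}) was specified")
    return value


ok = checked_output(5, (0, 4))  # correct bounds (0, n-1)
```

Calling `checked_output(5, (0, 5))` raises instead of returning, because the extra iteration leaves the value named fib(6).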

14
SCCM Message Passing

SCCM allows us to create parallel programs via
message passing.
We can send and receive SCCM named values, with
SCCM ensuring global adherence to the
specification.
Both the Send and the Receive are checked.

15
SCCM Send

Sample Send:
send n() <- N to <destination>
endsend
The value of N, named n(), is sent out.
We can send out single elements or lists of
elements.
SCCM makes sure that the values sent out actually
have the names the Send command claims them to
have.

16
SCCM Receive

Sample Receive
recv
match n() N
endrecv
The value of N that was sent on the prior slide
is received by the target processor.
N's value must be named n(), just like in the
send.
All receives (as far as I can tell) are
Receive-Any's.
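
The two checks can be sketched in Python as follows. Everything here is hypothetical: the `deque` stands in for the network, a named value is modelled as a `(value, name)` pair, and `send`/`recv` are illustrative helpers rather than SCCM primitives.

```python
from collections import deque


class NameMismatch(Exception):
    """Raised when a sent or received value does not carry the claimed name."""


def send(claimed_name, named_value, channel):
    """Check that the value really has the claimed name, then send it."""
    value, actual_name = named_value
    if actual_name != claimed_name:
        raise NameMismatch(f"send claims {claimed_name}, value is {actual_name}")
    channel.append((claimed_name, value))


def recv(expected_name, channel):
    """Take the next message and accept it only under the expected name."""
    name, value = channel.popleft()
    if name != expected_name:
        raise NameMismatch(f"expected {expected_name}, received {name}")
    return (value, name)


channel = deque()           # stand-in for the link to <destination>
N = (42, ("n",))            # the value 42, named n()
send(("n",), N, channel)
received = recv(("n",), channel)
```

Because the name travels with the value, the receiver can confirm it got what the specification says it should, not merely that something arrived.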

17
Another Send Example

Sample send command:
send a(i, 2i) <- A(i, i)
for i in (1,3)
to <destination>
endsend
The contents of three diagonal elements of A are
sent to the destination processor, named a(1,2),
a(2,4), and a(3,6).
SCCM checks that those are indeed the names in
those diagonal elements.

18
Another Receive Example

Sample receive command
recv
check a(i, 2i) B(i)
match for i:int in (s,t)
endrecv
The diagonal elements of A are now received.
Their names must be the same, but they may be
saved into some other structure at the target
processor (like the array B).
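
As an illustration of the indexed case, the Python sketch below sends three diagonal elements under the names a(i, 2i) and stores them into a different structure on arrival. The dictionary layout and the helpers `send_diagonal` and `recv_into_b` are assumptions made for this example, not SCCM constructs.

```python
def send_diagonal(A, lo, hi):
    """send a(i, 2i) <- A(i, i) for i in (lo, hi): check each name, then emit."""
    message = []
    for i in range(lo, hi + 1):
        value, name = A[(i, i)]
        claimed = ("a", i, 2 * i)
        if name != claimed:
            raise ValueError(f"A({i},{i}) is named {name}, not {claimed}")
        message.append((claimed, value))
    return message


def recv_into_b(message, s, t):
    """Store each received a(i, 2i) into B(i) for i in (s, t), keeping its name."""
    expected = {("a", i, 2 * i): i for i in range(s, t + 1)}
    B = {}
    for name, value in message:
        if name not in expected:
            raise ValueError(f"unexpected name {name}")
        B[expected[name]] = (value, name)
    return B


# Diagonal cells of A hold (value, name) pairs; the values are made up.
A = {(i, i): (10 * i, ("a", i, 2 * i)) for i in range(1, 4)}
B = recv_into_b(send_diagonal(A, 1, 3), 1, 3)
```

The storage location changes from A(i, i) to B(i), but the name a(i, 2i) is what both sides check, so the link to the specification survives the transfer.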

19
SCCM Performance

When the same problem is implemented in C, SCCM,
and SML:
SCCM is usually 6-9 times slower than C because of
all the runtime checking overhead.
SCCM is 50% faster than SML, because SCCM
produces imperative programs that do not have
SML's functional overheads.

20
Is SCCM useful for Programmers?

The amount of time one spends writing an SCCM
program is much greater than for a normal program.
Arguably, this is less than the amount of time
spent on debugging, but writing a specification
for a large system would be very hard.
Most programmers would find it hard to express
their algorithms in purely functional notation.
Programs in SCCM are several times longer than
their equivalents in C.
Example: Bubble Sort.

21
SCCM for Byzantine Failures

SCCM effectively captures a program's control
flow.
The price for the programmer is having to write a
more complex program that is several times
longer.
We are trying to design compiler techniques that
can verify whether a processor has faithfully
executed a program.
Since that work would fall to the compiler rather
than the programmer, the added difficulty is not a
concern for our purposes.

22
SCCM for Byzantine Failures

We may be able to annotate a program so that
after execution it can prove to us that it
transformed all of its data according to the
original source code.
SCCM can be thought of as a system for creating
problem-specific type systems. Can we create a
Linear-Algebra specific type system?
Can Model Checking help us determine a program's
legal set of data transformations?

23
Related Fields I. Certification Trails

A Certification Trail is a trail of information a
program leaves behind, describing its work.
After the first program completes, a second
program can use this trail to perform the same
computation much more quickly.
Thus, the certification trail for a program acts
much like a checksum or parity bit for data.
Little overhead is required.
Problem: Currently this approach requires mostly
manual work. No techniques exist for compilers to
generate certification trails.
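
Sorting is a common way to illustrate the idea, sketched below in Python. The example is not from the slides: the trail is assumed to be the permutation the first program applied, and both function names are hypothetical.

```python
def sort_with_trail(data):
    """First program: sort normally and leave the permutation as a trail."""
    order = sorted(range(len(data)), key=data.__getitem__)
    return [data[k] for k in order], order


def sort_using_trail(data, trail):
    """Second program: rebuild the output in linear time, distrusting the trail."""
    n = len(data)
    if len(trail) != n:
        raise ValueError("trail has the wrong length")
    seen = [False] * n
    out = []
    for k in trail:
        if not isinstance(k, int) or not 0 <= k < n or seen[k]:
            raise ValueError("trail is not a permutation")
        seen[k] = True
        out.append(data[k])
    if any(out[j] > out[j + 1] for j in range(n - 1)):
        raise ValueError("trail does not yield sorted output")
    return out


data = [3, 1, 2]
first, trail = sort_with_trail(data)
second = sort_using_trail(data, trail)
agree = first == second  # a mismatch would indicate a fault in one of the runs
```

The second run never trusts the trail blindly: a corrupted trail is either rejected or leads to an output that disagrees with the first run.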

24
Related Fields II. Result Checking

A subfield of CS Theory dealing with ways to
probabilistically verify the correctness of an
algorithm's output.
Related to Interactive Proofs.
Problems:
Though the focus is on checkers that are
asymptotically faster than the actual algorithm,
most solutions are too inefficient to be used in
practice.
There is no general methodology for generating
checkers for problems and most checkers in
existence are for obscure and specialized
problems.
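
One well-known result checker, not mentioned in the slides, is Freivalds' algorithm for verifying a matrix product. The Python sketch below assumes square matrices given as lists of lists; the function name and the `rounds` and `rng` parameters are illustrative choices.

```python
import random


def freivalds(A, B, C, rounds=20, rng=random):
    """Probabilistically check that A * B == C for n x n matrices.

    Each round multiplies by a random 0/1 vector r and compares A(Br) with Cr,
    which costs O(n^2) instead of the O(n^3) of recomputing the product.
    A wrong C survives a single round with probability at most 1/2.
    """
    n = len(A)
    for _ in range(rounds):
        r = [rng.randint(0, 1) for _ in range(n)]
        Br = [sum(B[i][j] * r[j] for j in range(n)) for i in range(n)]
        ABr = [sum(A[i][j] * Br[j] for j in range(n)) for i in range(n)]
        Cr = [sum(C[i][j] * r[j] for j in range(n)) for i in range(n)]
        if ABr != Cr:
            return False  # definitely wrong
    return True  # correct with probability at least 1 - 2**(-rounds)


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
good = [[19, 22], [43, 50]]
accepted = freivalds(A, B, good)
```

A correct product is always accepted; an incorrect one is rejected except with probability at most 2^-rounds, which is the sense in which such checkers are probabilistic.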

25
Related Fields III. Replication

Run the same program on multiple computers.
Compare their output to protect against corruption.
The only available solution to Byzantine
Failures.
Very resource inefficient. Most replication-based
approaches require 3 times as many resources as
unprotected systems.
ED4I: run the same program twice with different
data to detect permanent and transient faults.
BFS: replicated services. Processors vote on
results. Resilient to f faults by using 3f+1
replicas.
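
The voting arithmetic can be sketched in Python. This is a deliberately simplified model, not the BFS protocol: it assumes the replies from all replicas are already collected in one list, and it accepts a result once at least f+1 replicas agree on it, since at most f of them can be faulty. Both function names are hypothetical.

```python
from collections import Counter


def replicas_needed(f):
    """Number of replicas used to tolerate f Byzantine faults: 3f + 1."""
    return 3 * f + 1


def vote(replies, f):
    """Return the value backed by at least f + 1 replicas, or None if none is."""
    if len(replies) < replicas_needed(f):
        raise ValueError(f"need {replicas_needed(f)} replies to tolerate {f} faults")
    value, count = Counter(replies).most_common(1)[0]
    # f + 1 matching replies must include at least one from a correct replica.
    return value if count >= f + 1 else None


decided = vote([7, 7, 7, 9], f=1)  # one faulty replica answers 9
```

With f = 1 this takes four replicas to mask a single fault, which illustrates why the slide calls replication resource inefficient.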