Title: RAMBO: A Reconfigurable Atomic Memory Service for Dynamic Networks Nancy Lynch, MIT Alex Shvartsman, U. Conn.
DISC 2002, October 29, 2002
Goal
- An algorithm to implement atomic read/write shared memory in a dynamic network setting.
  - Participants may join, leave, or fail during computation.
  - Mobile networks, peer-to-peer networks.
  - High availability, low latency.
  - Atomicity for all patterns of asynchrony and change.
  - Good performance under reasonable limits on asynchrony and change.
- Applications
  - Battle data for teams of soldiers in a military operation.
  - Game data for players in a multiplayer game.
Approach: Dynamic Quorums
- Objects are replicated at several network locations.
- To accommodate small, transient changes:
  - Uses quorum configurations: members, read-quorums, write-quorums.
  - Maintains atomicity during stable situations.
  - Allows concurrency.
- To handle larger, more permanent changes:
  - Reconfigure.
  - Maintains atomicity across configuration changes.
  - Any configuration can be installed at any time.
  - Reconfigure concurrently with reads/writes; no heavyweight view change.
RAMBO
- RAMBO: Reconfigurable Atomic Memory for Basic Objects (dynamic atomic read/write shared memory).
- Global service specification.
- Algorithm:
  - Reads and writes objects.
  - Chooses new configurations, notifies members.
  - Identifies and garbage-collects obsolete configurations.
  - All concurrently.
RAMBO algorithm structure
- Main algorithm + reconfiguration service.
- Loosely coupled.
- Recon service:
  - Provides the main algorithm with a consistent sequence of configurations.
- Main algorithm:
  - Handles reading and writing.
  - Receives and disseminates new configuration information; no formal installation.
  - Garbage-collects old configurations.
  - Reads/writes may use several configurations.
Main algorithm: Reads/writes
- Uses a two-phase strategy [Attiya, Bar-Noy, Dolev 96]:
  - Phase 1: Collect object values from read-quorums of active configurations.
  - Phase 2: Propagate the latest value to write-quorums of active configurations.
- Operations may execute concurrently.
- Quorum intersection properties guarantee atomicity.
- Our communication mechanism:
  - Background gossiping.
  - Terminate by a fixed-point condition, involving a quorum from each active configuration.
Removing old configurations
- The main algorithm removes old configurations by garbage-collecting them in the background.
- Two-phase garbage-collection procedure:
  - First phase:
    - Inform a write-quorum of the old configuration about the new configuration.
    - Collect object values from a read-quorum of the old configuration.
  - Second phase:
    - Propagate the latest value to a write-quorum of the new configuration.
- Garbage collection runs concurrently with reads/writes.
- Implemented using gossiping and fixed points.
Implementation of Recon
- Uses distributed consensus to determine successive configurations 1, 2, 3, ...
- Members of the old configuration propose a new configuration.
- Proposals are reconciled using consensus.
- Consensus is a heavyweight mechanism, but:
  - It is used only for reconfigurations, which are infrequent.
  - It does not delay read/write operations.
Implementation of consensus
- Use a version of the Paxos algorithm [Lamport 89, 98, 02].
- Agreement and validity are guaranteed absolutely.
- Termination is guaranteed if/when the underlying system stabilizes.
Models and analysis
- I/O automaton models.
- Prove atomicity for arbitrary patterns of asynchrony and change.
- Analyze performance conditionally, based on failure and timing assumptions.
- Reads and writes take time at most 8d (d = bound on message delay), under reasonable steady-state assumptions.
Other approaches
- Use consensus to agree on a total ordering of operations [Lamport 89].
  - Not resilient to transient failures.
  - Termination of reads/writes depends on termination of consensus.
- Totally-ordered broadcast over group communication [Amir, Dolev, Melliar-Smith, Moser 94], [Keidar, Dolev 96].
  - View formation takes a long time and delays reads/writes.
  - One change may trigger view formation.
- Dynamic quorums over group communication [De Prisco et al. 99].
  - A new view must satisfy intersection requirements.
- Single reconfigurer [Lynch, Shvartsman 97], [Englert, Shvartsman 00].
Outline of talk
- 1. Introduction
- 2. Reconfigurable Atomic Memory (RAMBO) specification
- 3. Reconfiguration service (Recon) specification
- 4. Implementation of RAMBO using Recon
- 5. Proof of atomicity
- 6. Implementation of Recon
- 7. Conditional performance results
- 8. Conclusions
2. RAMBO Service Specification
- I: infinite set of participant locations.
- X: set of objects.
- C: set of configuration identifiers.
- External actions, for each i and x:
  - Inputs: join_{x,i}, read_{x,i}, write(v)_{x,i}, recon(c,c')_{x,i}
  - Outputs: join-ack_{x,i}, read-ack(v)_{x,i}, ..., report(c)_{x,i}
  - Joins are ignored in this talk.
- Behavior:
  - Assuming basic well-formedness conditions, RAMBO guarantees atomicity.
  - Liveness is replaced by latency bounds.
Atomicity
- Also known as linearizability.
- Definition: Each operation appears to occur at some point between its invocation and its response.
- Sufficient condition: For each object x, all the read and write operations for x can be partially ordered by ≺, so that:
  - ≺ is consistent with the order of invocations and responses: there are no operations such that π1 completes before π2 starts, yet π2 ≺ π1.
  - All write operations are ordered with respect to each other and with respect to all the reads.
  - Every read returns the value of the last write preceding it in ≺.
Implementing RAMBO
- Composition of a separate service for each x.
- RAMBO (for x) uses a separate Recon service (for x).
3. Recon Service Specification
- External actions, for each i:
  - Inputs: recon(c,c')_i
  - Outputs: recon-ack_i, report(c)_i, new-config(c,k)_i
  - And some joining actions (ignored here).
- Behavior: assuming well-formedness, Recon produces consistent configuration identifiers at participating locations:
  - Agreement: Two configurations are never assigned to the same k.
  - Validity: Any announced new-config was previously requested by someone.
  - No duplication: No configuration is assigned to more than one k.
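The three Recon safety properties are easiest to see as checks over a trace of new-config events. The sketch below is illustrative only. It assumes a hypothetical trace format of (location, config_id, k) tuples and a plain set of requested configurations, and it ignores timing, so "previously requested" becomes "requested at some point".

```python
from typing import Dict, List, Set, Tuple

# Hypothetical trace format: one tuple (location, config_id, k) per
# new-config(c, k)_i event observed at some location i.
NewConfigEvent = Tuple[str, str, int]


def check_recon_trace(requested: Set[str],
                      events: List[NewConfigEvent]) -> List[str]:
    """Return human-readable violations of Agreement, Validity, No duplication."""
    violations: List[str] = []
    config_at: Dict[int, str] = {}  # index k -> first config announced for k
    index_of: Dict[str, int] = {}   # config -> first index k it was announced at
    for _loc, c, k in events:
        # Agreement: two different configs are never assigned to the same k.
        if k in config_at and config_at[k] != c:
            violations.append(f"agreement: k={k} got {config_at[k]} and {c}")
        config_at.setdefault(k, c)
        # Validity: an announced config must have been requested by someone.
        if c not in requested:
            violations.append(f"validity: {c} was never requested")
        # No duplication: a config is not assigned to more than one k.
        if c in index_of and index_of[c] != k:
            violations.append(f"duplication: {c} at k={index_of[c]} and k={k}")
        index_of.setdefault(c, k)
    return violations


ok = [("i", "c1", 1), ("j", "c1", 1), ("j", "c2", 2)]
bad = [("i", "c1", 1), ("j", "c2", 1)]
print(check_recon_trace({"c1", "c2"}, ok))   # []
print(check_recon_trace({"c1", "c2"}, bad))  # one agreement violation
```

Several locations may report the same (c, k) pair; only conflicting assignments count as violations.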
4. Implementing RAMBO using Recon
- Recon:
  - Chooses configurations.
  - Tells members of the previous and new configurations.
  - Informs Reader-Writer components (new-config).
- Reader-Writer:
  - Conducts read and write operations.
  - Two-phase, quorum-based algorithm.
  - Uses all current configurations.
  - Garbage-collects obsolete configurations.
Static Reader-Writer protocol
- Quorum configuration for I:
  - read-quorums, write-quorums: two collections of subsets of I.
  - For any R in read-quorums and W in write-quorums, R ∩ W ≠ ∅.
- Replicate the object x at all locations in I.
- At each i in I, keep:
  - value
  - tag, consisting of (sequence number, location)
- Read and Write use two phases:
  - Phase 1: Read (value, tag) from a read-quorum.
  - Phase 2: Write (value, tag) to a write-quorum.
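The intersection requirement R ∩ W ≠ ∅ can be checked mechanically. The sketch below is a minimal illustration. Majority quorums (the `majorities` helper) are just one common choice and are not prescribed by the talk, and the function names are hypothetical.

```python
from itertools import combinations, product
from typing import FrozenSet, Iterable, List


def majorities(members: Iterable[str]) -> List[FrozenSet[str]]:
    """All minimal strict-majority subsets of members (one common quorum choice)."""
    ms = sorted(members)
    need = len(ms) // 2 + 1
    return [frozenset(s) for s in combinations(ms, need)]


def is_quorum_config(members: Iterable[str],
                     read_quorums: Iterable[Iterable[str]],
                     write_quorums: Iterable[Iterable[str]]) -> bool:
    """Check that quorums are subsets of members and every R meets every W."""
    ms = frozenset(members)
    rqs = [frozenset(r) for r in read_quorums]
    wqs = [frozenset(w) for w in write_quorums]
    if not all(q <= ms for q in rqs + wqs):
        return False
    # The intersection requirement: R ∩ W ≠ ∅ for every pair (R, W).
    return all(r & w for r, w in product(rqs, wqs))


abc = ["a", "b", "c"]
print(is_quorum_config(abc, majorities(abc), majorities(abc)))  # True
print(is_quorum_config(abc, [{"a"}], [{"b"}]))                   # False
```

Read-one/write-all (singleton read-quorums, one write-quorum containing everybody) also passes this check. Two disjoint singletons do not.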
Static Reader-Writer protocol (continued)
- Write at location i:
  - Phase 1:
    - Read (value, tag) from a read-quorum.
    - Determine the largest sequence number among the tags that are read.
    - Choose new-tag = (larger sequence number, i).
  - Phase 2:
    - Propagate (new-value, new-tag) to a write-quorum.
- Read at location i:
  - Phase 1:
    - Read (value, tag) from a read-quorum.
    - Determine the (value, tag) pair with the largest tag among those read.
  - Phase 2:
    - Propagate this (value, tag) to a write-quorum.
    - Return value.
- Highly concurrent.
- Quorum intersection implies atomicity.
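The two-phase read and write can be sketched as a single-process simulation. This is not a distributed implementation. Replicas live in one dict, "waiting for a quorum" is replaced by picking the first quorum with no crashed member, and the class name `StaticRegister` and the `crashed` set are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Tag = Tuple[int, str]  # (sequence number, location); tuples compare lexicographically


class StaticRegister:
    """In-process simulation of the two-phase static quorum protocol."""

    def __init__(self, members: Iterable[str],
                 read_quorums: Iterable[Iterable[str]],
                 write_quorums: Iterable[Iterable[str]],
                 initial: object = None) -> None:
        # Every member stores a (value, tag) replica of the object.
        self.replicas: Dict[str, Tuple[object, Tag]] = {
            m: (initial, (0, "")) for m in members}
        self.read_quorums = [frozenset(q) for q in read_quorums]
        self.write_quorums = [frozenset(q) for q in write_quorums]
        self.crashed: Set[str] = set()

    def _live(self, quorums: List[FrozenSet[str]]) -> FrozenSet[str]:
        # Stand-in for "wait for replies": first quorum with no crashed member.
        for q in quorums:
            if not q & self.crashed:
                return q
        raise RuntimeError("no live quorum")

    def _query(self) -> Tuple[object, Tag]:
        # Phase 1: read (value, tag) from a read-quorum and keep the largest tag.
        q = self._live(self.read_quorums)
        return max((self.replicas[j] for j in q), key=lambda vt: vt[1])

    def _propagate(self, value: object, tag: Tag) -> None:
        # Phase 2: write (value, tag) to a write-quorum; replicas keep newer tags.
        for j in self._live(self.write_quorums):
            if tag > self.replicas[j][1]:
                self.replicas[j] = (value, tag)

    def write(self, i: str, value: object) -> None:
        _, (seq, _) = self._query()
        self._propagate(value, (seq + 1, i))  # new-tag = (larger seq number, i)

    def read(self) -> object:
        value, tag = self._query()
        self._propagate(value, tag)  # write back before returning
        return value


members = ["a", "b", "c"]
maj = [frozenset(s) for s in combinations(members, 2)]
reg = StaticRegister(members, maj, maj)
reg.write("a", 10)
reg.crashed = {"a"}
print(reg.read())  # 10, served by the surviving quorum {b, c}
```

The read's write-back phase ensures that a later read cannot return an older value, even if the write reached only part of the system.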
Extend to dynamic setting
- Any member of the current configuration can propose a new configuration.
- Recon produces consistent configurations.
- Reader-Writer processes run the two-phase static quorum-based algorithm, using all current configurations.
  - Uses gossip and fixed-point tests.
- When Recon provides a new configuration, Reader-Writer doesn't abort reads/writes in progress, but does extra work to access the additional processes needed for the new quorums.
Configurations and Config Maps
- Configuration c:
  - members(c): owners of the data in configuration c
  - read-quorums(c)
  - write-quorums(c)
- Configuration map cm:
  - Sequence of configurations cm(k).
  - Each entry can be defined (c), undefined (⊥), or garbage-collected (±).
[Figure: a configuration map, with regions labeled GC'd, Defined, Mixed, and Undefined]
Configuration maps
[Figure: successive configuration maps. Starting from c0 followed by undefined (⊥) entries, configurations c1, c2, ..., ck become defined (possibly leaving undefined gaps), while a prefix (c0, then c1, then c2) is garbage-collected.]
Reader-Writer state
- world
- value, tag
- cmap
- pnum1: counts phases of locally-initiated operations.
- pnum2: records the latest known phase numbers for all locations.
- op-record: keeps track of the status of a current locally-initiated read/write operation.
  - Includes op.cmap, consisting of consecutive configurations.
- gc-record: keeps track of the status of a current locally-initiated garbage-collection operation.
Reader-Writer protocol
- One kind of message, gossiped nondeterministically.
- Message ⟨W, v, t, cm, ns, nr⟩ from i to j, where:
  - W is i's world
  - v, t are i's value and tag
  - cm is i's cmap
  - ns is i's phase number, pnum1
  - nr is the latest phase number i knows for j, pnum2(j)
- (ns, nr) is used to identify fresh messages.
- Key actions are taken when enough information has been gathered (fixed point).
When ⟨W, v, t, cm, ns, nr⟩ arrives from j
- world ← world ∪ W
- if t > tag then (value, tag) ← (v, t)
- cmap ← update(cmap, cm)
  - Updates cmap with newer information in cm.
- pnum2(j) ← max(pnum2(j), ns)
- gc-record: if the message is fresh, record the sender.
- op-record: if the message is fresh:
  - Record the sender.
  - Extend op.cmap with newly-discovered configurations.
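The receive step can be sketched as a Python handler. Several details are assumptions and not taken from the slides:
- `update` treats each cmap entry as ordered ⊥ < defined < ±.
- A message counts as "fresh" when nr ≥ op_pnum, where `op_pnum` is a hypothetical field holding the phase number at which the current operation phase began.
- gc-record handling is omitted.

`RWState` and `on_receive` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

GC = "±"                 # marker for a garbage-collected cmap entry
Tag = Tuple[int, str]    # (sequence number, location)
CMap = Dict[int, str]    # index k -> config id; a missing key means undefined (⊥)


def update(cmap: CMap, cm: CMap) -> CMap:
    """Merge cm into cmap, assuming the per-entry order ⊥ < defined < ±."""
    out = dict(cmap)
    for k, c in cm.items():
        if c == GC or k not in out:
            out[k] = c
    return out


@dataclass
class RWState:
    world: Set[str] = field(default_factory=set)
    value: object = None
    tag: Tag = (0, "")
    cmap: CMap = field(default_factory=lambda: {0: "c0"})
    pnum2: Dict[str, int] = field(default_factory=dict)
    op_pnum: int = 0  # hypothetical: pnum1 when the current phase began
    op_cmap: CMap = field(default_factory=lambda: {0: "c0"})
    op_senders: Set[str] = field(default_factory=set)


def on_receive(s: RWState, j: str, W: Set[str], v: object, t: Tag,
               cm: CMap, ns: int, nr: int) -> None:
    s.world |= W                       # world <- world ∪ W
    if t > s.tag:                      # adopt a newer (value, tag)
        s.value, s.tag = v, t
    s.cmap = update(s.cmap, cm)
    s.pnum2[j] = max(s.pnum2.get(j, 0), ns)
    # Assumed freshness test: j has already heard of our current phase.
    if nr >= s.op_pnum:
        s.op_senders.add(j)
        # Extend op.cmap with consecutive, newly-discovered defined configs.
        k = max(s.op_cmap) + 1
        while k in s.cmap and s.cmap[k] != GC:
            s.op_cmap[k] = s.cmap[k]
            k += 1


s = RWState()
on_receive(s, "j", {"j"}, 7, (1, "j"), {0: "c0", 1: "c1"}, ns=3, nr=0)
print(s.value, s.tag, s.op_cmap)  # 7 (1, 'j') {0: 'c0', 1: 'c1'}
```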
Processing reads and writes
- Reads and writes perform Query and Propagation phases using known configurations, stored in op.cmap.
- Query phase: obtains fresh value, tag, and cmap information from read-quorums.
- Propagation phase: propagates the up-to-date (value, tag) to write-quorums; obtains fresh cmap information from write-quorums.
- Both phases: extend op.cmap with newly-discovered configurations; new configurations are also used in the phase.
- Each phase ends with a fixed point, after hearing from quorums of all the configurations currently in op.cmap.
Garbage collection
- A process can try to GC configuration k when its cmap shows every configuration before k already garbage-collected (±) and both c_k and c_{k+1} defined.
- Phase 1:
  - Informs a write-quorum of c_k about c_{k+1}.
  - Collects the latest (value, tag) from a read-quorum of c_k.
- Phase 2:
  - Propagates (value, tag) to a write-quorum of c_{k+1}.
- Set cmap(k) to ±.
- GC operates concurrently with reads and writes.
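The GC procedure can be sketched as a synchronous, single-process function. Unlike the gossip-based original, it does not wait on fixed points; it just uses the first listed quorum of each configuration. The `can_gc` precondition, the `known_cmaps` per-location dict, and both function names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

GC = "±"
Tag = Tuple[int, str]
Configs = Dict[str, Dict[str, List[FrozenSet[str]]]]


def can_gc(cmap: Dict[int, str], k: int) -> bool:
    """Assumed precondition: entries before k are GC'd, c_k and c_{k+1} defined."""
    return (all(cmap.get(m) == GC for m in range(k))
            and cmap.get(k) not in (None, GC)
            and cmap.get(k + 1) not in (None, GC))


def garbage_collect(k: int, cmap: Dict[int, str], configs: Configs,
                    replicas: Dict[str, Tuple[object, Tag]],
                    known_cmaps: Dict[str, Dict[int, str]]) -> Tag:
    if not can_gc(cmap, k):
        raise ValueError(f"cannot GC configuration {k} yet")
    old, new = configs[cmap[k]], configs[cmap[k + 1]]
    # Phase 1: inform a write-quorum of c_k about c_{k+1} ...
    for j in old["write"][0]:
        known_cmaps[j][k + 1] = cmap[k + 1]
    # ... and collect the latest (value, tag) from a read-quorum of c_k.
    value, tag = max((replicas[j] for j in old["read"][0]), key=lambda vt: vt[1])
    # Phase 2: propagate (value, tag) to a write-quorum of c_{k+1}.
    for j in new["write"][0]:
        if tag > replicas[j][1]:
            replicas[j] = (value, tag)
    cmap[k] = GC  # set cmap(k) to ±
    return tag


q0, q1 = [frozenset({"a", "b"})], [frozenset({"d", "e"})]
configs = {"c0": {"read": q0, "write": q0}, "c1": {"read": q1, "write": q1}}
replicas = {"a": (5, (1, "a")), "b": (None, (0, "")),
            "d": (None, (0, "")), "e": (None, (0, ""))}
known = {j: {0: "c0"} for j in replicas}
cmap = {0: "c0", 1: "c1"}
print(garbage_collect(0, cmap, configs, replicas, known), cmap)
```

After the call, the newest value lives in a write-quorum of c1. This is why reads and writes may safely stop consulting c0.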
5. Proof of Atomicity
- Atomicity holds for:
  - arbitrary patterns of asynchrony,
  - arbitrary crash failures and message loss,
  - arbitrary joins.
- Proof: construct a partial order ≺ of read and write operations satisfying:
  - ≺ is consistent with the order of invocations and responses.
  - All write operations are ordered with respect to each other and with respect to all the reads.
  - Every read returns the value of the last write preceding it in ≺.
- Let ≺ be the lexicographic order on the operations' tags, and order a write with tag t before all reads with tag t.
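The order ≺ defined above can be written directly as a comparison on tags. This sketch uses a hypothetical `Op` record and also shows one total order that extends ≺, obtained by sorting on (tag, write-before-read).

```python
from typing import List, NamedTuple, Tuple

Tag = Tuple[int, str]


class Op(NamedTuple):
    name: str
    kind: str  # "read" or "write"
    tag: Tag   # tag associated with the operation's value


def precedes(a: Op, b: Op) -> bool:
    """a ≺ b: smaller tag first; with equal tags, the write precedes the reads."""
    if a.tag != b.tag:
        return a.tag < b.tag
    return a.kind == "write" and b.kind == "read"


def linearize(ops: List[Op]) -> List[Op]:
    """One total order extending ≺ (ties among equal-tag reads keep input order)."""
    return sorted(ops, key=lambda op: (op.tag, 0 if op.kind == "write" else 1))


ops = [Op("r1", "read", (1, "a")), Op("w2", "write", (2, "b")),
       Op("w1", "write", (1, "a")), Op("r0", "read", (0, ""))]
print([op.name for op in linearize(ops)])  # ['r0', 'w1', 'r1', 'w2']
```

Two reads with the same tag are left unordered by ≺, which is why ≺ is only a partial order.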
Showing consistency
- Lemma 1: Tags of GC operations are nondecreasing with respect to the configuration index.
  - Proof: GC is done sequentially.
- Lemma 2: If the first GC of configuration k completes before a read/write operation π begins, then the tag of the GC is less than or equal to the tag of π (< if π is a write).
- Lemma 3: If π1 and π2 are two read/write operations and π1 completes before π2 begins, then the tag of π1 is less than or equal to the tag of π2 (< if π2 is a write).
Proof of Lemma 3
- Assume π1 and π2 are two read/write operations and π1 completes before π2 begins.
- Each phase uses consecutive configurations.
- Case 1: prop-cmap(π1) and query-cmap(π2) share a configuration c.
  - Quorum intersection for c yields the tag inequality.
- Case 2: All configurations in prop-cmap(π1) are less than all those in query-cmap(π2).
  - The tag inequality follows from a chain of tag inequalities, following a chain of GC operations for the intervening configurations. Uses Lemmas 1 and 2.
- Case 3: All configurations in prop-cmap(π1) are greater than all those in query-cmap(π2).
  - Impossible.
6. Implementing Recon
- The Recon algorithm uses (static) consensus services to determine configurations 1, 2, 3, ...
- Cons(k,c): used to determine configuration k, if configuration k-1 is c.
- Consensus is used only for reconfigurations; it does not delay read and write operations.
[Figure: the Recon component, with its recon / recon-ack interface, built on Consensus services and the network (Net)]
Implementing Recon (continued)
- Simple: no atomicity issues.
- Members of the old configuration may propose a new configuration; proposals are reconciled using consensus.
- recon(c,c'): request for reconfiguration from c to c'.
  - If c is the (k-1)st configuration (and is current), then send an init message to the members of c and invoke Cons(k,c) with initial value c'.
- Receipt of an init message: participate in consensus.
- decide(c'): tell Reader-Writer the new configuration; send a config message to members of c'.
- Receipt of a config message: tell Reader-Writer the new configuration.
- Consensus is implemented using the Paxos Synod algorithm.
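The Recon control flow can be sketched with consensus replaced by a deliberately trivial stand-in. The stand-in `FirstProposalConsensus` is NOT Paxos: it decides the first value proposed, which works only because everything runs in one process. Other simplifications: `ReconNode` and the shared instance dict are hypothetical, init messages are omitted, and `on_decide` serves both as the decide handler and as the config-message handler.

```python
from typing import Dict, List, Optional, Tuple


class FirstProposalConsensus:
    """Stand-in for one Cons(k, c) instance; NOT Paxos. In this single-process
    simulation it simply decides the first value proposed."""

    def __init__(self) -> None:
        self.decision: Optional[str] = None

    def propose(self, v: str) -> str:
        if self.decision is None:
            self.decision = v
        return self.decision


class ReconNode:
    """Hypothetical Recon component at one location."""

    def __init__(self, c0: str,
                 consensus: Dict[Tuple[int, str], FirstProposalConsensus]) -> None:
        self.cmap: Dict[int, str] = {0: c0}
        self.consensus = consensus                   # shared, keyed by (k, c)
        self.reported: List[Tuple[str, int]] = []   # new-config(c, k) outputs

    def recon(self, c: str, c_new: str) -> None:
        k = max(self.cmap) + 1
        if self.cmap[k - 1] != c:
            return  # c is not the current, (k-1)st configuration
        inst = self.consensus.setdefault((k, c), FirstProposalConsensus())
        self.on_decide(k, inst.propose(c_new))

    def on_decide(self, k: int, c: str) -> None:
        # Also serves as the handler for a received config message here.
        if k not in self.cmap:
            self.cmap[k] = c
            self.reported.append((c, k))  # tell Reader-Writer: new-config(c, k)


shared: Dict[Tuple[int, str], FirstProposalConsensus] = {}
a, b = ReconNode("c0", shared), ReconNode("c0", shared)
a.recon("c0", "c1")   # a proposes c1 for index 1
b.recon("c0", "c2")   # b's competing proposal loses; b learns c1
print(a.reported, b.reported)  # [('c1', 1)] [('c1', 1)]
```

Both locations end up with the same configuration for index 1, illustrating how competing proposals are reconciled.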
7. Latency Analysis
- Consider a subset of timed executions, in which:
  - Gossip occurs:
    - Periodically, and
    - At certain key times:
      - At the beginning of an operation phase.
      - Just after receiving a message from someone with a new phase number.
      - Just after certain join and reconfiguration events.
  - Local steps are performed immediately.
  - Message delivery is reliable, with bounded delay.
  - Consensus services have normal timing.
Additional assumptions
- e-Configuration-viability, for time parameter e:
  - A read-quorum and a write-quorum of configuration k remain alive until at least time e after configuration k+1 is installed (decided upon by all non-failed members of configuration k).
- e-Reconfiguration-spacing:
  - recon(c,·)_i occurs at least e time after report(c)_i.
- e-Join-connectivity:
  - If i and j join by time t, then they learn about each other by time t + e.
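Reconfiguration spacing is a simple predicate on timed traces. The sketch assumes a hypothetical event format of (time, kind, location, c), where kind is "report" or "recon" and c is the old configuration named in recon(c,·). It also assumes that a recon with no earlier report(c) at the same location violates the condition.

```python
from typing import Dict, List, Tuple

# Hypothetical timed trace: (time, kind, location, c), with kind "report"
# for report(c)_i and "recon" for recon(c, ·)_i.
Event = Tuple[float, str, str, str]


def satisfies_reconfig_spacing(events: List[Event], e: float) -> bool:
    report_time: Dict[Tuple[str, str], float] = {}
    for t, kind, loc, c in sorted(events, key=lambda ev: ev[0]):
        if kind == "report":
            report_time.setdefault((loc, c), t)
        elif kind == "recon":
            r = report_time.get((loc, c))
            # recon(c, ·)_i must come at least e after report(c)_i.
            if r is None or t < r + e:
                return False
    return True


trace = [(0.0, "report", "i", "c0"), (5.0, "recon", "i", "c0")]
print(satisfies_reconfig_spacing(trace, 3.0))  # True
print(satisfies_reconfig_spacing(trace, 6.0))  # False
```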
Latency results
- Reconfiguration:
  - 13d, if recon(c,c')_i occurs and no members of c subsequently fail.
- Garbage collection of c_k by process i:
  - 4d, if process i, a read-quorum and a write-quorum of c_k, and a write-quorum of c_{k+1} do not fail.
- Read or write operation by process i in a stable system:
  - 4d, if no reconfigurations occur and process i's cmap is up-to-date.
- Learning about configurations:
  - If i and j are old enough and don't fail, then information from i is conveyed to j within time 2d.
Latency results (continued)
- Garbage collection, in executions with 6d-reconfiguration-spacing and 5d-configuration-viability:
  - If report(c) occurs at i and i does not fail, then any non-failed process that is old enough learns about c and garbage-collects any older configuration within time 6d.
- Read and write operations, in executions with 12d-reconfiguration-spacing and 11d-configuration-viability:
  - 8d, for an operation managed by a process that is old enough and does not fail.
8. Conclusions
- RAMBO algorithm:
  - Composed of the R/W algorithm, the Recon service, and Consensus.
  - Atomicity in all executions.
- Good latency bounds:
  - For reading, writing, and garbage collection.
  - Under assumptions about timing, joins, failures, and rate of reconfiguration.
Algorithmic innovations
- Dynamic configurations:
  - Members can be changed dynamically.
  - Any current member may request reconfiguration.
  - Arbitrary configurations can be installed; no intersection requirements.
- Loosely-coupled reconfiguration:
  - Concurrent reading, writing, and reconfiguration.
  - Reads/writes can use several configurations and can complete during reconfiguration.
- Efficient steady state:
  - Assuming bounded delays, infrequent reconfiguration, and periodic gossip, read and write operations complete in time 8d.
Comparison with other approaches
- Using consensus to agree on a total ordering of operations:
  - We use consensus only for the configurations.
  - Consensus termination impacts only reconfiguration latency, not read and write latency.
- Group communication:
  - Our reads/writes work during new view establishment.
- Dynamic quorum configurations over group communication:
  - We allow arbitrary new configurations, with no intersection requirements.
- Single-reconfigurer approaches:
  - We allow multiple reconfigurers.
  - We uncouple the introduction of new configurations from the garbage collection of old configurations.
Current and future work
- LAN implementation [Musial, Shvartsman].
- More analysis:
  - Normal behavior starting from some point.
  - Tradeoff between configuration viability and GC rate.
- Algorithmic improvements and additions:
  - Concurrent garbage collection [Gilbert].
  - Reducing communication.
  - Better join protocol, explicit leave protocol.
  - Early return of read values.
  - Backup strategies for when configuration viability fails.
  - Choosing good configurations.
- Extensions to other data types?