RAMBO: A Reconfigurable Atomic Memory Service for Dynamic Networks Nancy Lynch, MIT Alex Shvartsman, U. Conn. - PowerPoint PPT Presentation

1
RAMBO: A Reconfigurable Atomic Memory Service for Dynamic Networks
Nancy Lynch, MIT
Alex Shvartsman, U. Conn.
DISC 2002, October 29, 2002
2
Goal
  • An algorithm to implement atomic read/write shared memory in a dynamic network setting.
    • Participants may join, leave, or fail during computation.
    • Mobile networks, peer-to-peer networks.
  • High availability, low latency.
  • Atomicity for all patterns of asynchrony and change.
  • Good performance under reasonable limits on asynchrony and change.
  • Applications:
    • Battle data for teams of soldiers in a military operation.
    • Game data for players in a multiplayer game.

3
Approach: Dynamic Quorums
  • Objects are replicated at several network locations.
  • To accommodate small, transient changes:
    • Uses quorum configurations: members, read-quorums, write-quorums.
    • Maintains atomicity during stable situations.
    • Allows concurrency.
  • To handle larger, more permanent changes:
    • Reconfigure.
    • Maintains atomicity across configuration changes.
    • Any configuration can be installed at any time.
    • Reconfigure concurrently with reads/writes; no heavyweight view change.

4
RAMBO
  • RAMBO: Reconfigurable Atomic Memory for Basic Objects (dynamic atomic read/write shared memory).
  • Global service specification.
  • Algorithm:
    • Reads and writes objects.
    • Chooses new configurations, notifies members.
    • Identifies and garbage-collects obsolete configurations.
    • All concurrently.

5
RAMBO algorithm structure
  • Main algorithm + reconfiguration service.
  • Loosely coupled.
  • Recon service:
    • Provides the main algorithm with a consistent sequence of configurations.
  • Main algorithm:
    • Handles reading and writing.
    • Receives and disseminates new configuration information; no formal installation.
    • Garbage-collects old configurations.
    • Reads/writes may use several configurations.
6
Main algorithm: Reads/writes
  • Uses a two-phase strategy [Attiya, Bar-Noy, Dolev 96]:
    • Phase 1: Collect object values from read-quorums of active configurations.
    • Phase 2: Propagate the latest value to write-quorums of active configurations.
  • Operations may execute concurrently.
  • Quorum intersection properties guarantee atomicity.
  • Our communication mechanism:
    • Background gossiping.
    • Terminate by a fixed-point condition involving a quorum from each active configuration.

7
Removing old configurations
  • The main algorithm removes old configurations by garbage-collecting them in the background.
  • Two-phase garbage-collection procedure:
    • First phase:
      • Inform a write-quorum of the old configuration about the new configuration.
      • Collect object values from a read-quorum of the old configuration.
    • Second phase:
      • Propagate the latest value to a write-quorum of the new configuration.
  • Garbage-collection runs concurrently with reads/writes.
  • Implemented using gossiping and fixed points.

8
Implementation of Recon
  • Uses distributed consensus to determine successive configurations 1, 2, 3, …
  • Members of the old configuration propose a new configuration.
  • Proposals are reconciled using consensus.
  • Consensus is a heavyweight mechanism, but:
    • It is used only for reconfigurations, which are infrequent.
    • It does not delay read/write operations.

9
Implementation of consensus
  • Uses a version of the Paxos algorithm [Lamport 89, 98, 02].
  • Agreement and validity are guaranteed absolutely.
  • Termination is guaranteed if/when the underlying system stabilizes.

10
Models and analysis
  • I/O automaton models.
  • Prove atomicity for arbitrary patterns of
    asynchrony and change.
  • Analyze performance conditionally, based on
    failure and timing assumptions.
  • Reads and writes take time at most 8d, under
    reasonable steady-state assumptions.

11
Other approaches
  • Use consensus to agree on a total ordering of operations [Lamport 89]:
    • Not resilient to transient failures.
    • Termination of reads/writes depends on termination of consensus.
  • Totally-ordered broadcast over group communication [Amir, Dolev, Melliar-Smith, Moser 94; Keidar, Dolev 96]:
    • View formation takes a long time and delays reads/writes.
    • One change may trigger view formation.
  • Dynamic quorums over group communication (GC) [De Prisco et al. 99]:
    • A new view must satisfy intersection requirements.
  • Single reconfigurer [Lynch, Shvartsman 97; Englert, Shvartsman 00].

12
Outline of talk
  • 1. Introduction ✓
  • 2. Reconfigurable Atomic Memory (RAMBO) specification
  • 3. Reconfiguration service (Recon) specification
  • 4. Implementation of RAMBO using Recon
  • 5. Proof of atomicity
  • 6. Implementation of Recon
  • 7. Conditional performance results
  • 8. Conclusions

13
2. RAMBO Service Specification
  • I: infinite set of participant locations.
  • X: set of objects.
  • C: configuration identifiers.
  • External actions for each i and x:
    • Inputs: join_{x,i}, read_{x,i}, write(v)_{x,i}, recon(c,c')_{x,i}
    • Outputs: join-ack_{x,i}, read-ack(v)_{x,i}, …, report(c)_{x,i}
    • Joins are ignored in this talk.
  • Behavior:
    • Assuming basic well-formedness conditions, RAMBO guarantees atomicity.
    • Liveness is replaced by latency bounds.

14
Atomicity
  • AKA linearizability.
  • Definition: Each operation appears to occur at some point between its invocation and response.
  • Sufficient condition: For each object x, all the read and write operations for x can be partially ordered by ≺, so that:
    • ≺ is consistent with the order of invocations and responses: there are no operations such that π1 completes before π2 starts, yet π2 ≺ π1.
    • All write operations are ordered with respect to each other and with respect to all the reads.
    • Every read returns the value of the last write preceding it in ≺.
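As a minimal sketch (Python is an illustrative choice; the talk contains no code), the first condition can be checked mechanically for one object x. The `Op` record and `respects_real_time` are hypothetical names. The check takes a candidate total order, i.e. one linear extension of ≺, and rejects it if some operation is placed after another even though it completed before that other operation started.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Op:
    name: str
    invoke: float   # invocation time
    respond: float  # response time


def respects_real_time(order: List[Op]) -> bool:
    """Condition 1 of the sufficient condition, for one object x.

    `order` is a candidate total order (one linear extension of ≺).
    Reject it if some later-ordered op completed before an earlier-ordered op started.
    """
    for i, later in enumerate(order):
        for earlier in order[:i]:
            # earlier ≺ later, yet later finished before earlier was invoked
            if later.respond < earlier.invoke:
                return False
    return True


w = Op("write", invoke=0.0, respond=1.0)
r = Op("read", invoke=2.0, respond=3.0)
print(respects_real_time([w, r]))  # True
print(respects_real_time([r, w]))  # False: w completed before r started
```

Overlapping operations may be ordered either way; only operations that are strictly sequential in real time constrain the order.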

15
Implementing RAMBO
  • Composition of a separate service for each x.
  • RAMBO (for x) uses a separate Recon service (for x).

16
3. Recon Service Specification
  • External actions for each i:
    • Inputs: recon(c,c')_i
    • Outputs: recon-ack_i, report(c)_i, new-config(c,k)_i
    • And some joining actions (ignored here).
  • Behavior: assuming well-formedness, Recon produces consistent configuration identifiers at participating locations:
    • Agreement: Two configurations are never assigned to the same k.
    • Validity: Any announced new-config was previously requested by someone.
    • No duplication: No configuration is assigned to more than one k.
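The three Recon properties can be read as checks over a trace. The sketch below is an illustration under assumptions, not part of the specification: events are hypothetical tuples, `("recon", c_new)` for a recon(c,c') input (only the requested target c' matters here) and `("new-config", c, k)` for an output at any location, and the initial configuration is treated as already requested.

```python
from typing import Dict, Iterable


def check_recon_trace(events: Iterable[tuple], initial: str = "c0") -> bool:
    """Check Agreement, Validity and No duplication over a trace of hypothetical events:
    ("recon", c_new) for a recon(c, c_new) input, ("new-config", c, k) for an output."""
    requested = {initial}           # assumption: the initial configuration needs no request
    by_index: Dict[int, str] = {}   # k -> configuration announced for k
    by_config: Dict[str, int] = {}  # configuration -> k it was announced for
    for ev in events:
        if ev[0] == "recon":
            requested.add(ev[1])
        elif ev[0] == "new-config":
            _, c, k = ev
            if by_index.setdefault(k, c) != c:   # Agreement
                return False
            if by_config.setdefault(c, k) != k:  # No duplication
                return False
            if c not in requested:               # Validity
                return False
    return True
```

The same announcement repeated at several locations passes; only conflicting or unrequested announcements fail.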

17
4. Implementing RAMBO using Recon
  • Recon:
    • Chooses configurations.
    • Tells members of the previous and new configurations.
    • Informs Reader-Writer components (new-config).
  • Reader-Writer:
    • Conducts read and write operations.
    • Two-phase quorum-based algorithm.
    • Uses all current configurations.
    • Garbage-collects obsolete configurations.

18
Static Reader-Writer protocol
  • Quorum configuration for I:
    • read-quorums, write-quorums: two collections of subsets of I.
    • For any R in read-quorums and W in write-quorums, R ∩ W ≠ ∅.
  • Replicate the object x at all locations in I.
  • At each i in I, keep:
    • value
    • tag, consisting of (sequence number, location)
  • Read and Write use two phases:
    • Phase 1: Read (value, tag) from a read-quorum.
    • Phase 2: Write (value, tag) to a write-quorum.
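The intersection requirement R ∩ W ≠ ∅ is easy to test directly. A small Python sketch, with majorities used as a hypothetical example quorum system (the function names are illustrative, not from the talk):

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List


def majorities(members: Iterable[str]) -> List[FrozenSet[str]]:
    """All subsets holding a strict majority of `members` (an example quorum system)."""
    ms = sorted(members)
    need = len(ms) // 2 + 1
    return [frozenset(s) for r in range(need, len(ms) + 1) for s in combinations(ms, r)]


def is_quorum_configuration(read_quorums, write_quorums) -> bool:
    """R ∩ W ≠ ∅ for every read-quorum R and every write-quorum W."""
    return all(R & W for R in read_quorums for W in write_quorums)


maj = majorities({"a", "b", "c"})
print(len(maj))                           # 4: ab, ac, bc, abc
print(is_quorum_configuration(maj, maj))  # True
# Read-one/write-all also qualifies: read-quorums and write-quorums may differ.
print(is_quorum_configuration([frozenset("a"), frozenset("b"), frozenset("c")], [frozenset("abc")]))  # True
```

Only read/write pairs must intersect; two read-quorums may be disjoint, which is what lets read-one/write-all pass.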

19
Static Reader-Writer protocol
  • Write at location i:
    • Phase 1:
      • Read (value, tag) from a read-quorum.
      • Determine the largest sequence number among the tags that are read.
      • Choose new-tag = (larger sequence number, i).
    • Phase 2:
      • Propagate (new-value, new-tag) to a write-quorum.
  • Read at location i:
    • Phase 1:
      • Read (value, tag) from a read-quorum.
      • Determine the (value, tag) pair with the largest tag among those read.
    • Phase 2:
      • Propagate this (value, tag) to a write-quorum.
      • Return value.
  • Highly concurrent.
  • Quorum intersection implies atomicity.
20
Extend to dynamic setting
  • Any member of the current configuration can propose a new configuration.
  • Recon produces consistent configurations.
  • Reader-Writer processes run the two-phase static quorum-based algorithm, using all current configurations.
    • Uses gossip and fixed-point tests.
  • When Recon provides a new configuration, Reader-Writer doesn't abort reads/writes in progress, but does extra work to access the additional processes needed for the new quorums.

21
Configurations and Config Maps
  • Configuration c:
    • members(c): owners of the data in configuration c
    • read-quorums(c)
    • write-quorums(c)
  • Configuration map cm:
    • Sequence of configurations cm(k).
    • Each entry can be defined, undefined (⊥), or garbage-collected (±).

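[Figure: a configuration map, left to right: a garbage-collected (GC'd) prefix, then defined entries, then a mixed region of defined and undefined (⊥) entries, then undefined entries]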
22
Configuration maps
[Figure: successive configuration maps. Initially only c0 is defined and every later entry is ⊥; then c1, c2, and eventually ck become defined; then c0, c1, and c2 are garbage-collected in turn, leaving c3 as the oldest defined entry]
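A possible encoding of a configuration map, as an assumption for illustration: a Python list in which `None` stands for ⊥, a hypothetical marker string `"±"` stands for a garbage-collected entry, and any other string names a defined configuration. `is_usable` mirrors the shape in the first figure: a GC'd prefix, then a defined entry, and no GC'd entry after that.

```python
from typing import List, Optional

GC = "±"               # hypothetical marker for a garbage-collected entry
Entry = Optional[str]  # None = undefined (⊥); any other string = a defined configuration


def is_defined(e: Entry) -> bool:
    return e is not None and e != GC


def is_usable(cmap: List[Entry]) -> bool:
    """GC'd prefix, then a defined entry, then no further GC'd entries.
    Entries past the end of the list are implicitly undefined (⊥)."""
    k = 0
    while k < len(cmap) and cmap[k] == GC:
        k += 1
    if k == len(cmap) or not is_defined(cmap[k]):
        return False
    return GC not in cmap[k:]


def active_configs(cmap: List[Entry]) -> List[str]:
    """Defined, not yet garbage-collected configurations."""
    return [e for e in cmap if is_defined(e)]


# Rows patterned on the figure; "ck" stands for some later index.
print(is_usable(["c0"]))                                # True
print(is_usable(["c0", "c1", "c2", None, None, "ck"]))  # True
print(is_usable([GC, GC, "c2", None, "ck"]))            # True
print(is_usable([GC, None, "c2"]))                      # False
print(active_configs([GC, GC, "c2", None, "ck"]))       # ['c2', 'ck']
```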
23
Reader-Writer state
  • world
  • value, tag
  • cmap
  • pnum1: counts phases of locally-initiated operations.
  • pnum2: records the latest known phase numbers for all locations.
  • op-record: keeps track of the status of a current locally-initiated read/write operation.
    • Includes op.cmap, consisting of consecutive configurations.
  • gc-record: keeps track of the status of a current locally-initiated garbage-collection operation.

24
Reader-Writer protocol
  • One kind of message, gossiped nondeterministically.
  • Message ⟨W, v, t, cm, ns, nr⟩ from i to j, where:
    • W is i's world.
    • v, t are i's value and tag.
    • cm is i's cmap.
    • ns is i's phase number, pnum1.
    • nr is the latest phase number i knows for j, pnum2(j).
  • (ns, nr) are used to identify fresh messages.
  • Key actions are taken when enough information has been gathered (fixed point).

25
When ⟨W, v, t, cm, ns, nr⟩ arrives from j:
  • world ← world ∪ W
  • if t > tag then (value, tag) ← (v, t)
  • cmap ← update(cmap, cm)
    • Updates cmap with newer information in cm.
  • pnum2(j) ← max(pnum2(j), ns)
  • gc-record: if the message is fresh, record the sender.
  • op-record: if the message is fresh:
    • Record the sender.
    • Extend op.cmap with newly-discovered configurations.
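The receive step can be sketched in Python as follows. This is a simplified illustration under stated assumptions: cmaps use a list encoding (`None` for ⊥, a hypothetical `"±"` marker for GC'd), `update` is taken to be an entrywise merge in which GC'd dominates defined and defined dominates undefined, and a message counts as fresh when its nr is at least the phase number recorded when the current phase started (the hypothetical field `op_pnum`). The gc-record and the extension of op.cmap are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

GC = "±"  # hypothetical marker for a garbage-collected entry; None stands for ⊥


def update(cmap: List[Optional[str]], cm: List[Optional[str]]) -> List[Optional[str]]:
    """Entrywise merge: GC'd beats defined, defined beats undefined.
    (Recon's Agreement means two defined entries at the same k are equal.)"""
    n = max(len(cmap), len(cm))
    a = cmap + [None] * (n - len(cmap))
    b = cm + [None] * (n - len(cm))

    def rank(e: Optional[str]) -> int:
        return 2 if e == GC else 0 if e is None else 1

    return [x if rank(x) >= rank(y) else y for x, y in zip(a, b)]


@dataclass
class RWState:
    world: Set[str] = field(default_factory=set)
    value: object = None
    tag: Tuple[int, str] = (0, "")
    cmap: List[Optional[str]] = field(default_factory=list)
    pnum2: Dict[str, int] = field(default_factory=dict)
    op_pnum: int = 0                               # hypothetical: pnum1 when the phase started
    op_acc: Set[str] = field(default_factory=set)  # senders of fresh messages this phase


def on_receive(s: RWState, j: str, W: Set[str], v: object, t: Tuple[int, str],
               cm: List[Optional[str]], ns: int, nr: int) -> None:
    s.world |= W
    if t > s.tag:
        s.value, s.tag = v, t
    s.cmap = update(s.cmap, cm)
    s.pnum2[j] = max(s.pnum2.get(j, 0), ns)
    if nr >= s.op_pnum:  # fresh: j has already seen this phase's number
        s.op_acc.add(j)
```

A stale message still contributes its value, tag, and cmap; it just does not count toward the current phase's quorum.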

26
Processing reads and writes
  • Reads and writes perform Query and Propagation phases using known configurations, stored in op.cmap.
    • Query phase: obtains fresh value, tag, and cmap information from read-quorums.
    • Propagation phase: propagates the up-to-date (value, tag) to write-quorums; obtains fresh cmap information from write-quorums.
  • Both phases extend op.cmap with newly-discovered configurations; new configurations are also used in the phase.
  • Each phase ends with a fixed point, after hearing from quorums of all the configurations currently in op.cmap.
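The fixed-point test can be sketched as a single predicate. In this Python illustration, `phase_done`, the `quorums` dictionary, and `acc` (the set of locations heard from via fresh messages) are hypothetical names.

```python
from typing import Dict, FrozenSet, List, Set


def phase_done(op_cmap: List[str], quorums: Dict[str, List[FrozenSet[str]]], acc: Set[str]) -> bool:
    """Fixed-point test: for every configuration c in op.cmap, some quorum of c
    (read-quorums in a Query phase, write-quorums in a Propagation phase)
    lies entirely inside `acc`, the set of locations heard from via fresh messages."""
    return all(any(q <= acc for q in quorums[c]) for c in op_cmap)


quorums = {
    "c1": [frozenset("ab"), frozenset("bc"), frozenset("ac")],
    "c2": [frozenset("de"), frozenset("ef"), frozenset("df")],
}
print(phase_done(["c1"], quorums, {"a", "b"}))             # True
print(phase_done(["c1", "c2"], quorums, {"a", "b", "d"}))  # False: no quorum of c2 yet
```

If op.cmap grows during a phase, the same predicate is simply re-evaluated over the longer list, so a newly discovered configuration must also contribute a quorum before the phase can end.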

27
Garbage collection
  • A process can try to GC config k when its cmap looks like ±, …, ±, c_k, c_{k+1}, …: every entry below k is garbage-collected and entries k and k+1 are defined.
  • Phase 1:
    • Informs a write-quorum of c_k about c_{k+1}.
    • Collects the latest (value, tag) from a read-quorum of c_k.
  • Phase 2:
    • Propagates (value, tag) to a write-quorum of c_{k+1}.
    • Sets cmap(k) to ±.
  • GC operates concurrently with reads and writes.
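A single-process Python sketch of the two GC phases, assuming in-memory replicas and quorums passed in explicitly. `Replica.known_configs`, `can_gc`, and `garbage_collect` are hypothetical names, and `"±"` is a hypothetical marker for a GC'd cmap entry.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

GC = "±"  # hypothetical marker for a garbage-collected cmap entry; None stands for ⊥


@dataclass
class Replica:
    value: object = None
    tag: Tuple[int, str] = (0, "")
    known_configs: Set[str] = field(default_factory=set)


def is_defined(e: Optional[str]) -> bool:
    return e is not None and e != GC


def can_gc(cmap: List[Optional[str]], k: int) -> bool:
    """Entries below k are GC'd; entries k and k+1 are defined."""
    return (k + 1 < len(cmap) and all(e == GC for e in cmap[:k])
            and is_defined(cmap[k]) and is_defined(cmap[k + 1]))


def garbage_collect(cmap: List[Optional[str]], k: int, replicas: Dict[str, Replica],
                    read_q_k: FrozenSet[str], write_q_k: FrozenSet[str],
                    write_q_next: FrozenSet[str]) -> None:
    if not can_gc(cmap, k):
        raise ValueError(f"cannot garbage-collect configuration {k}")
    new_config = cmap[k + 1]
    # Phase 1: inform a write-quorum of c_k about c_{k+1} ...
    for j in write_q_k:
        replicas[j].known_configs.add(new_config)
    # ... and collect the latest (value, tag) from a read-quorum of c_k.
    latest = max((replicas[j] for j in read_q_k), key=lambda rep: rep.tag)
    # Phase 2: propagate (value, tag) to a write-quorum of c_{k+1}.
    for j in write_q_next:
        rep = replicas[j]
        if latest.tag > rep.tag:
            rep.value, rep.tag = latest.value, latest.tag
    cmap[k] = GC


reps = {x: Replica() for x in "abcdef"}
reps["a"].value, reps["a"].tag = "v", (3, "a")
cmap = ["c0", "c1"]
garbage_collect(cmap, 0, reps, frozenset("ab"), frozenset("bc"), frozenset("de"))
print(cmap, reps["d"].value)  # ['±', 'c1'] v
```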

28
5. Proof of Atomicity
  • Atomicity holds for:
    • arbitrary patterns of asynchrony,
    • arbitrary crash failures and message loss,
    • arbitrary joins.
  • Proof: Construct a partial order ≺ of read and write operations satisfying:
    • ≺ is consistent with the order of invocations and responses.
    • All write operations are ordered with respect to each other and with respect to all the reads.
    • Every read returns the value of the last write preceding it in ≺.
  • Let ≺ be the lexicographic order on the operations' tags, and order a write with tag t before all reads with tag t.
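The construction of ≺ from tags can be illustrated in Python. Operations are hypothetical `(kind, tag, value)` triples, and the initial value is assumed to carry tag (0, ""). Reads sharing a tag are unordered in ≺, so sorting simply picks one linear extension.

```python
from typing import List, Tuple

Op = Tuple[str, Tuple[int, str], object]  # (kind, tag, value); tag = (sequence number, location)


def tag_order(ops: List[Op]) -> List[Op]:
    """Lexicographic order on tags; a write with tag t precedes all reads with tag t.
    Reads sharing a tag are unordered in ≺; sorting just picks one linear extension."""
    return sorted(ops, key=lambda op: (op[1], 0 if op[0] == "write" else 1))


def reads_return_last_write(order: List[Op], initial: object = None) -> bool:
    last = initial
    for kind, _, value in order:
        if kind == "write":
            last = value
        elif value != last:
            return False
    return True


ops = [("read", (1, "a"), "x"), ("write", (2, "b"), "y"), ("write", (1, "a"), "x"),
       ("read", (0, ""), None), ("read", (2, "b"), "y")]
order = tag_order(ops)
print([kind for kind, _, _ in order])  # ['read', 'write', 'read', 'write', 'read']
print(reads_return_last_write(order))  # True
```

Writes get distinct tags because each tag includes the writer's location, so the write order is total; the real-time condition is what Lemmas 1 to 3 establish.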

29
Showing consistency
  • Lemma 1: Tags of GC operations are nondecreasing with respect to the configuration index.
    • Proof: GC is done sequentially.
  • Lemma 2: If the first GC of config k completes before a read/write operation π begins, then the tag of the GC is less than or equal to the tag of π (strictly less if π is a write).
  • Lemma 3: If π1 and π2 are two read/write operations and π1 completes before π2 begins, then the tag of π1 is less than or equal to the tag of π2 (strictly less if π2 is a write).

30
Proof of Lemma 3
  • Assume π1 and π2 are two read/write operations and π1 completes before π2 begins.
  • Each phase uses consecutive configurations.
  • Case 1: prop-cmap(π1) and query-cmap(π2) share a configuration c.
    • Quorum intersection for c yields the tag inequality.
  • Case 2: All configs in prop-cmap(π1) are less than all those in query-cmap(π2).
    • The tag inequality follows from a chain of tag inequalities, following a chain of GC operations for the intervening configurations. Uses Lemmas 1 and 2.
  • Case 3: All configs in prop-cmap(π1) are greater than all those in query-cmap(π2).
    • Impossible.

31
6. Implementing Recon
  • The Recon algorithm uses (static) consensus services to determine configurations 1, 2, 3, …
  • Cons(k,c): used to determine config k, if config k-1 is c.
  • Consensus is used only for reconfigurations and does not delay read and write operations.

[Diagram: Recon, with its recon/recon-ack interface, layered over the Consensus and Net services]
32
Implementing Recon
  • Simple: no atomicity issues.
  • Members of the old configuration may propose a new configuration; proposals are reconciled using consensus.
  • recon(c,c'): request for reconfiguration from c to c'.
    • If c is the (k-1)st configuration (and is current), then send an init message to members and invoke Cons(k,c) with initial value c'.
  • Receipt of an init message: participate in consensus.
  • decide(c'): tell Reader-Writer the new configuration; send a config message to members of c'.
  • Receipt of a config message: tell Reader-Writer the new configuration.
  • Consensus is implemented using the Paxos Synod algorithm.
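A single-process Python sketch of the Recon steps. It swaps in a deliberately trivial stand-in for consensus, where the first proposal to each Cons(k, c) instance is decided. This is not Paxos; it only mimics agreement and validity inside one process. The member lists, class names, and event tuples are hypothetical, init messages are not modelled, and a second call naming the same c models a concurrent request from a process that has not yet heard the decision.

```python
from typing import Dict, List, Optional, Tuple


class StubConsensus:
    """Stand-in for one Cons(k, c) instance: the first proposal is decided.
    NOT Paxos: it only mimics agreement and validity inside one process."""

    def __init__(self) -> None:
        self.decision: Optional[str] = None

    def propose(self, value: str) -> str:
        if self.decision is None:
            self.decision = value
        return self.decision


class Recon:
    def __init__(self, c0: str, members: Dict[str, List[str]]) -> None:
        self.configs: List[str] = [c0]                           # configs[k] = k-th configuration
        self.members = members                                   # hypothetical: config -> its members
        self.cons: Dict[Tuple[int, str], StubConsensus] = {}     # one instance per Cons(k, c)
        self.new_config_events: List[Tuple[str, str, int]] = []  # (location, c', k)

    def recon(self, i: str, c: str, c_new: str) -> Optional[str]:
        """recon(c, c') at i: only a member of c may propose; returns the decided config."""
        if c not in self.configs or i not in self.members.get(c, []):
            return None
        k = self.configs.index(c) + 1              # c is configuration k-1
        decided = self.cons.setdefault((k, c), StubConsensus()).propose(c_new)
        if k == len(self.configs):                 # first decision for index k
            self.configs.append(decided)
            for j in self.members[decided]:        # config message to members of c'
                self.new_config_events.append((j, decided, k))
        return decided


r = Recon("c0", {"c0": ["a", "b"], "c1": ["b", "c"], "c2": ["c", "d"]})
print(r.recon("a", "c0", "c1"))  # 'c1'
print(r.recon("b", "c0", "c2"))  # 'c1': a competing proposal for index 1 loses
print(r.recon("b", "c1", "c2"))  # 'c2'
print(r.configs)                 # ['c0', 'c1', 'c2']
```

Keying each consensus instance by (k, c) reflects the slide's Cons(k,c): a proposal only competes with other proposals made against the same predecessor configuration.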

33
7. Latency Analysis
  • Consider a subset of timed executions:
    • Gossip occurs:
      • Periodically, and
      • At certain key times:
        • At the beginning of an operation phase.
        • Just after receiving a message from someone with a new phase number.
        • Just after certain join and reconfiguration events.
    • Local steps are performed immediately.
    • Reliable message delivery, bounded delay.
    • Normal timing for consensus services.

34
Additional assumptions
  • e-Configuration-viability, for time parameter e:
    • A read-quorum and a write-quorum of configuration k remain alive until at least time e after configuration k+1 is installed (decided upon by all non-failed members of configuration k).
  • e-Reconfiguration-spacing:
    • recon(c,*)_i occurs at least time e after report(c)_i.
  • e-Join-connectivity:
    • If i and j join by time t, then they learn about each other by time t+e.
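e-Reconfiguration-spacing can be checked on a timed trace. In this Python sketch the event tuples are hypothetical, and a recon(c,*)_i with no earlier report(c)_i is counted as a violation (an assumption made for the illustration).

```python
from typing import Iterable, Tuple

Event = Tuple[float, str, str, str]  # (time, kind, location i, configuration c); kind is "report" or "recon"


def reconfiguration_spaced(events: Iterable[Event], e: float) -> bool:
    """e-Reconfiguration-spacing: each recon(c, *)_i happens at least e after report(c)_i."""
    reported = {}
    # Process events in time order; at equal times, reports are handled first.
    for t, kind, i, c in sorted(events, key=lambda ev: (ev[0], ev[1] != "report")):
        if kind == "report":
            reported.setdefault((i, c), t)
        elif kind == "recon" and ((i, c) not in reported or t < reported[(i, c)] + e):
            return False
    return True


trace = [(0.0, "report", "a", "c0"), (5.0, "recon", "a", "c0")]
print(reconfiguration_spaced(trace, 5.0))  # True
print(reconfiguration_spaced(trace, 6.0))  # False
```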

35
Latency results
  • Reconfiguration:
    • 13d, if recon(c,c')_i occurs and no members of c subsequently fail.
  • Garbage-collection of c_k by process i:
    • 4d, if process i, a read-quorum and a write-quorum of c_k, and a write-quorum of c_{k+1} do not fail.
  • Read or write operation by process i in a stable system:
    • 4d, if no reconfigurations occur and process i's cmap is up-to-date.
  • Learning about configurations:
    • If i and j are old enough and don't fail, then information from i is conveyed to j within time 2d.
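One plausible reading of the 4d bound for a read or write in a stable system, under these assumptions: a process gossips at the start of each phase, a recipient replies immediately on seeing a new phase number (as in the gossip rules of the latency analysis), and every message arrives within d. Each phase then costs one message delay out and one back, and an operation has two phases. This is a sketch of the accounting, not the talk's proof.

```latex
T_{\text{phase}} \;\le\; \underbrace{d}_{\text{gossip to quorum}} + \underbrace{d}_{\text{immediate reply}} \;=\; 2d,
\qquad
T_{\text{read/write}} \;=\; T_{\text{query}} + T_{\text{prop}} \;\le\; 2d + 2d \;=\; 4d.
```

The same two-phase accounting is consistent with the 4d bound stated for garbage-collection.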

36
Latency results
  • Garbage-collection, in executions with 6d-reconfiguration-spacing and 5d-configuration-viability:
    • If report(c) occurs at i and i does not fail, then any non-failed process that is old enough learns about c and garbage-collects any older configuration within time 6d.
  • Read and write operations, in executions with 12d-reconfiguration-spacing and 11d-configuration-viability:
    • 8d, for an operation managed by a process that is old enough and does not fail.

37
8. Conclusions
  • RAMBO algorithm:
    • Composed of the R/W algorithm, the Recon service, and Consensus.
  • Atomicity in all executions.
  • Good latency bounds:
    • For reading, writing, and garbage-collection.
    • Under assumptions about timing, joins, failures, and rate of reconfiguration.

38
Algorithmic innovations
  • Dynamic configurations:
    • Members can be changed dynamically.
    • Any current member may request reconfiguration.
    • Arbitrary configurations can be installed; no intersection requirements.
  • Loosely-coupled reconfiguration:
    • Concurrent reading, writing, and reconfiguration.
    • Reads/writes can use several configurations and can complete during reconfiguration.
  • Efficient steady state:
    • Assuming bounded delays, infrequent reconfiguration, and periodic gossip, read and write operations complete in time 8d.

39
Comparison with other approaches
  • Using consensus to agree on a total ordering of operations:
    • We use consensus only for the configurations.
    • Consensus termination impacts only reconfiguration latency, not read and write latency.
  • Group communication:
    • Our reads/writes work during new view establishment.
  • Dynamic quorum configurations over group communication (GC):
    • We allow arbitrary new configurations, with no intersection requirements.
  • Single-reconfigurer approaches:
    • We allow multiple reconfigurers.
    • We uncouple the introduction of new configurations from the garbage-collection of old configurations.

40
Current and future work
  • LAN implementation [Musial, Shvartsman].
  • More analysis:
    • Normal behavior starting from some point.
    • Tradeoff between configuration-viability and garbage-collection rate.
  • Algorithmic improvements and additions:
    • Concurrent garbage-collection [Gilbert].
    • Reducing communication.
    • Better join protocol, explicit leave protocol.
    • Early return of read values.
    • Backup strategies for when configuration-viability fails.
    • Choosing good configurations.
    • Extensions to other data types?