Nondeterminism - PowerPoint PPT Presentation

Provided by: ron80

1
Nondeterminism
  • caused by all kinds of input:
  • normal input from keyboard, file, network, ...
  • certain system calls such as random(), date(),
    ...
  • for message-passing systems: the data received
  • executions are therefore not repeatable, which makes
    cyclic debugging difficult
  • equivalent re-executions are possible using
    record/replay techniques:
  • information about an execution is traced in a
    non-intrusive way and saved in small trace files
  • the information is used to force equivalent
    re-executions
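
The record/replay idea above can be illustrated with a minimal sketch. The `ReplayableSource` wrapper and the `program` function are hypothetical names introduced here for illustration; they are not part of the presented system. The wrapper records the results of a nondeterministic call during the first run and returns the same values, in the same order, during replay.

```python
import random


class ReplayableSource:
    """Hypothetical wrapper: records a nondeterministic call or replays a trace."""

    def __init__(self, func, trace=None):
        self.func = func
        self.replaying = trace is not None
        self.trace = list(trace) if self.replaying else []
        self.position = 0

    def __call__(self):
        if self.replaying:
            # Replay phase: impose the recorded value instead of calling func.
            value = self.trace[self.position]
            self.position += 1
            return value
        # Record phase: perform the real call and trace its result.
        value = self.func()
        self.trace.append(value)
        return value


def program(source):
    """Hypothetical program whose result depends on nondeterministic input."""
    return [round(source() * 100) for _ in range(5)]


recorder = ReplayableSource(random.random)
first_run = program(recorder)

replayer = ReplayableSource(random.random, trace=recorder.trace)
second_run = program(replayer)  # equivalent re-execution of first_run
```

Because the values themselves are stored, this sketch corresponds to tracing contents; it says nothing about how intrusive or how large a real trace would be.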

2
Record/replay techniques
  • contents-driven replay:
  • record phase: message data is traced
  • replay phase: message data is imposed on the
    receive and test functions
  • control-driven replay:
  • record phase: the order of the messages is traced
  • replay phase: the same order is forced
  • goal: low overhead, small trace files,
    equivalence
  • nondeterminism we deal with:
  • promiscuous receive operations
  • test operations

3
Promiscuous receive operations
  • Used to receive a message from an unspecified
    source
  • Examples: MPI_Recv, MPI_Irecv, MPI_Probe,
    MPI_Iprobe with MPI_ANY_SOURCE or MPI_ANY_TAG
    (MPI) and pvm_recv, pvm_nrecv, pvm_precv,
    pvm_probe with tid or msgtag set to -1 (PVM)

[Diagram: two Snd(2) operations and two Rcv(?) operations]
4
  • MPI_ANY_TAG and msgtag = -1 (PVM) pose no problems, as
    the message channels are FIFO
  • solution for MPI_ANY_SOURCE and tid = -1: log the
    identity of the sender.
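
The sender-logging solution can be sketched as a small simulation. This is not MPI or PVM code: the `Receiver` class, the `ANY_SOURCE` constant, and the `run` function are assumptions made for illustration, and `random.choice` stands in for unpredictable network timing. During recording, each promiscuous receive logs only the sender it matched; during replay, the wildcard is replaced by the logged sender, and the FIFO channel guarantees that the same message is delivered.

```python
import random
from collections import deque

ANY_SOURCE = -1  # stand-in for MPI_ANY_SOURCE or tid = -1 (assumption)


class Receiver:
    """Hypothetical receiver with one FIFO channel per sender."""

    def __init__(self, trace=None):
        self.channels = {}
        self.replay_trace = deque(trace) if trace is not None else None
        self.recorded = []

    def deliver(self, sender, payload):
        self.channels.setdefault(sender, deque()).append(payload)

    def recv(self, source=ANY_SOURCE):
        if source == ANY_SOURCE:
            if self.replay_trace is not None:
                # Replay phase: force the sender that was logged.
                source = self.replay_trace.popleft()
            else:
                # Record phase: any sender with a pending message may match.
                ready = [s for s, q in self.channels.items() if q]
                source = random.choice(ready)
                self.recorded.append(source)  # log the sender only, not the data
        return source, self.channels[source].popleft()


def run(trace=None):
    rx = Receiver(trace)
    for i in range(3):
        rx.deliver(0, f"a{i}")
        rx.deliver(1, f"b{i}")
    received = [rx.recv() for _ in range(6)]
    return received, rx.recorded


recorded_run, sender_trace = run()
replayed_run, _ = run(trace=sender_trace)  # same receive order as recorded_run
```

The trace holds one sender identity per promiscuous receive and no message data, which is what keeps a control-driven trace small.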

5
Test operations
  • Test operations such as MPI_Test and MPI_Iprobe (MPI)
    and pvm_nrecv (PVM) are used in combination with
    nonblocking operations to test for the completion of
    the operation

[Diagram: Snd(2); IRcv(?); Test() → FALSE; Test() → FALSE; Test() → TRUE]
6
  • Solution:
  • record phase: log the number of test operations
    (n), up to and including the first successful one
  • replay phase: count the test operations as they
    are executed
  • for the first n-1 test operations, return FALSE
  • for the last (n-th) operation, force an MPI_Wait,
    MPI_Probe, or MPI_Recv
  • for a series of unsuccessful test operations, log
    a number that is larger than the number of test
    operations
  • To deal with promiscuous nonblocking operations:
    attach a sequence number to each operation and
    sort the trace file before replaying.
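
The counting scheme for test operations can be sketched as follows. All names here (`Request`, `RecordingTester`, `ReplayingTester`, `poll`) are hypothetical, and `Request` is a toy stand-in for a nonblocking receive whose completion time would depend on timing in a real run. The sketch covers only the counting of tests for a single request, not the sequence numbering and sorting of the trace.

```python
class Request:
    """Hypothetical nonblocking receive; completes on the given poll."""

    def __init__(self, ready_on_poll):
        self.ready_on_poll = ready_on_poll  # timing-dependent in a real run
        self.polls = 0
        self.done = False

    def test(self):  # stand-in for MPI_Test
        self.polls += 1
        self.done = self.done or self.polls >= self.ready_on_poll
        return self.done

    def wait(self):  # stand-in for a blocking MPI_Wait
        self.done = True


class RecordingTester:
    def __init__(self, request):
        self.request = request
        self.count = 0
        self.succeeded = False

    def test(self):
        self.count += 1
        self.succeeded = self.request.test()
        return self.succeeded

    def logged_n(self):
        # n is the index of the successful test; for a series of unsuccessful
        # tests, log a number larger than the number of tests performed.
        return self.count if self.succeeded else self.count + 1


class ReplayingTester:
    def __init__(self, request, n):
        self.request, self.n, self.count = request, n, 0

    def test(self):
        self.count += 1
        if self.count < self.n:
            return False      # the first n-1 tests fail, as recorded
        self.request.wait()   # n-th test: force completion with a blocking call
        return True


def poll(tester, max_polls):
    results = []
    while len(results) < max_polls:
        results.append(tester.test())
        if results[-1]:
            break
    return results


recorder = RecordingTester(Request(ready_on_poll=3))
recorded = poll(recorder, max_polls=10)

# During replay the message arrives earlier, yet the outcomes are the same.
replayer = ReplayingTester(Request(ready_on_poll=1), recorder.logged_n())
replayed = poll(replayer, max_polls=10)
```

Logging a value larger than the number of tests means the replay counter never reaches n, so every replayed test returns FALSE without a special case.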

7
Implementation
  • The method was implemented for Athapascan, a
    runtime system for parallel systems composed of
    shared memory multiprocessor nodes. The
    communication uses shared memory and MPI (LAM).
  • ROLT (Reconstruction of Lamport Timestamps)
    is used to deal with nondeterminism caused by
    shared memory accesses (synchronization
    operations).
  • Athapascan uses a guard (a mutex) to execute MPI
    functions, so these functions are replayed in the
    correct order
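
Why a guard makes call order replayable can be shown with a simplified sketch. This is not ROLT and not Athapascan code: it logs the plain order in which threads pass the guard rather than reconstructing Lamport timestamps, and `ReplayGuard` and `run` are hypothetical names. It also assumes that the replayed run makes exactly the calls that were recorded.

```python
import threading


class ReplayGuard:
    """Hypothetical guard: serializes calls and records or enforces their order."""

    def __init__(self, trace=None):
        self.cond = threading.Condition()
        self.replay = list(trace) if trace is not None else None
        self.recorded = []
        self.next_index = 0

    def call(self, thread_id, func):
        with self.cond:  # the guard: one guarded call at a time
            if self.replay is not None:
                # Replay phase: block until the trace says it is our turn.
                self.cond.wait_for(
                    lambda: self.replay[self.next_index] == thread_id)
            else:
                # Record phase: log who entered the guard.
                self.recorded.append(thread_id)
            result = func()
            self.next_index += 1
            self.cond.notify_all()
            return result


def run(trace=None):
    guard = ReplayGuard(trace)
    log = []

    def worker(tid):
        for _ in range(2):
            guard.call(tid, lambda: log.append(tid))

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(3)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log, guard.recorded


recorded_log, recorded_order = run()
replayed_log, _ = run(trace=recorded_order)  # same call order as recorded_log
```

Since every guarded call passes through a single lock, replaying the lock acquisitions in the recorded order is enough to replay the guarded calls in that order.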

8
Evaluation
  • The implementation was tested on two PCs running
    Linux, connected by 100 Mbps Ethernet.
  • The overhead is small in time and space
  • Average overhead:
  • record phase: 0.86
  • replay phase: 3.0