Transcript and Presenter's Notes

Title: Dynamite-G


1
Dynamite-G
  • Peter Sloot

2
Topics
  • Load balancing by task migration in
    message-passing applications
  • Checkpoint and restart mechanisms
  • Migrating tasks
  • Performance issues
  • Current status

3
Why cluster and grid computing?
  • Clusters and grids are increasingly interesting:
  • more workstations
  • higher performance per workstation
  • faster interconnecting networks
  • price/performance competitive with MPP
  • enormous unused capacity
  • cyclic availability

4
Issues
  • Clusters are inherently inhomogeneous
  • intrinsic differences in performance, memory,
    bandwidth
  • dynamically changing background load
  • ownership of nodes
  • Grids add:
  • differences in administration
  • disjoint file systems
  • security etc.

5
Goals of Dynamite
  • Utilise unused cycles
  • Support parallel applications (PVM/MPI)
  • Respect ownership
  • Dynamic load redistribution at task level
  • Code level transparency
  • User level implementation

6
Task allocation domains
                          Static task load         Dynamic task load
  Static resource load    Static task allocation   Predictable reallocation
  Dynamic resource load   Dynamical reallocation   Dynamical reallocation
7
Why migrate
  • Performance of parallel program usually dictated
    by slowest task
  • Task resource requirements and available
    resources both vary dynamically
  • Therefore, optimal task allocation changes
  • Gain must exceed cost of migration
  • Resources used by long-running programs may be
    reclaimed by owner

8
Checkpointing/restoring infrastructure
  • User level
  • Implemented in the Linux ELF dynamic loader v1.9.9
  • can run arbitrary code before the application
    starts running
  • wrapping function calls
  • straightforward support for shared libraries
  • only need to re-link with a different loader
    (special linker option; see the sketch below)
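A user-level sketch of the call wrapping, not Dynamite's actual loader code: it resolves libc's open() via dlsym(RTLD_NEXT, ...) and leaves a hook where the checkpointer would record the file. real_open and the recording hook are illustrative names; Dynamite does the equivalent inside its modified dynamic loader.

```c
/* Interpose open(): remember what the application opened, then
 * delegate to the real libc implementation. */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <fcntl.h>
#include <stdarg.h>
#include <sys/types.h>

static int (*real_open)(const char *, int, ...);

int open(const char *path, int flags, ...)
{
    mode_t mode = 0;

    if (!real_open)                       /* resolve libc's open once */
        real_open = (int (*)(const char *, int, ...))
                        dlsym(RTLD_NEXT, "open");

    if (flags & O_CREAT) {                /* mode is only passed with O_CREAT */
        va_list ap;
        va_start(ap, flags);
        mode = (mode_t)va_arg(ap, int);
        va_end(ap);
    }

    int fd = real_open(path, flags, mode);
    /* ... record (fd, path, flags) here for the checkpointer ... */
    return fd;
}
```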

9
Checkpointing
  • signal received
  • register/signal-mask state saved using sigsetjmp
    (sketched below)
  • process address space (text, data, heap, stack,
    shared libraries) dumped to a checkpoint file
  • Checkpoint file is a standalone ELF executable
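A minimal sketch of that sequence, assuming SIGUSR1 as the checkpoint signal; dump_segments() is a hypothetical stand-in for writing the address space (e.g. walked from /proc/self/maps) into the ELF checkpoint file.

```c
/* Checkpoint path: the signal arrives, sigsetjmp captures the
 * registers and signal mask, then the address space is dumped. */
#include <setjmp.h>
#include <signal.h>

sigjmp_buf ckpt_env;                 /* survives in the dumped data segment */

static void dump_segments(const char *file)
{
    (void)file;  /* elided: write text, data, heap, stack and shared
                    libraries into a standalone ELF executable */
}

static void ckpt_handler(int sig)
{
    (void)sig;
    if (sigsetjmp(ckpt_env, 1) == 0) {
        dump_segments("checkpoint");   /* first return: take checkpoint */
    } else {
        /* second return: reached after restore via siglongjmp; falling
         * out of the handler resumes the application code */
    }
}

int main(void)
{
    signal(SIGUSR1, ckpt_handler);
    /* ... application ... */
    return 0;
}
```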

10
Restoring
  • OS kernel loads text and data segments, invokes
    dynamic loader
  • Dynamic loader
  • recognises checkpoint file (special sections)
  • restores heap, shared libraries and stack; jumps
    to the signal handler (siglongjmp; see the sketch below)
  • Process returns from signal handler to
    application code
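The restart side, under the same assumptions: restore_heap_and_stack() is hypothetical, and ckpt_env is the buffer filled by sigsetjmp in the checkpoint sketch above.

```c
/* Restart path, as run by the modified dynamic loader after the
 * kernel has mapped text and data from the checkpoint executable. */
#include <setjmp.h>

extern sigjmp_buf ckpt_env;          /* restored along with the data segment */

static void restore_heap_and_stack(void)
{
    /* elided: remap heap, shared libraries and the stack from the
       special sections of the checkpoint file */
}

void restore_after_load(void)
{
    restore_heap_and_stack();
    siglongjmp(ckpt_env, 1);         /* second return from sigsetjmp */
}
```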

11
Handling kernel context
  • Kernel context not automatically preserved
  • open files, pipes, sockets, shared memory
  • Open files important, call wrapping used (open,
    close, creat, ...)
  • Shared file system a prerequisite
  • Method allows shut-down of source node

12
Open files
  • Relevant file operations are monitored
  • primarily open, close, creat
  • Obtain file position before migration, close file
  • Reopen and reposition file after migration (see
    the sketch below)
  • no mirror or proxy needed on old host
  • fcntl and ioctl calls are not monitored
  • rarely used and very complex
  • functionality therefore incomplete
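A sketch of the bookkeeping this implies; the table, its size and the function names are made up, and a shared file system is assumed so the recorded path stays valid on the target node.

```c
/* Track files opened through the wrapped calls; save offsets before
 * migration, reopen and reposition after restart. */
#include <fcntl.h>
#include <unistd.h>

struct tracked_file {
    int   fd;
    char  path[256];      /* filled in by the open() wrapper */
    int   flags;
    off_t offset;         /* captured just before migration */
    int   in_use;
};

static struct tracked_file files[64];

void save_file_state(void)            /* called before the checkpoint */
{
    for (int i = 0; i < 64; i++)
        if (files[i].in_use) {
            files[i].offset = lseek(files[i].fd, 0, SEEK_CUR);
            close(files[i].fd);
        }
}

void restore_file_state(void)         /* called after restart */
{
    for (int i = 0; i < 64; i++)
        if (files[i].in_use) {
            files[i].fd = open(files[i].path, files[i].flags & ~O_CREAT);
            lseek(files[i].fd, files[i].offset, SEEK_SET);
        }
}
```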

13
Location independent addressing
  • Standard PVM node identifier encoded in task
    identifier
  • (e.g. t80001 = task 1 running on node 8); used
    when routing messages between tasks
  • Dynamite approach:
  • task identifier stays the same after migration
  • routing tables maintained in all PVM daemons (see
    the sketch below)
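An illustrative encoding; the field width is made up for the example (real PVM packs its tids differently), but it shows why the tid can stay constant across migration while only the daemons' routing tables change.

```c
/* Node number in the high bits, per-node task number in the low bits:
 * 0x80001 prints as "t80001", i.e. task 1 on node 8. */
#include <stdio.h>

#define NODE_SHIFT 16

static int tid_node(int tid) { return tid >> NODE_SHIFT; }
static int tid_task(int tid) { return tid & ((1 << NODE_SHIFT) - 1); }

int main(void)
{
    int tid = (8 << NODE_SHIFT) | 1;
    printf("t%x = node %d, task %d\n", tid, tid_node(tid), tid_task(tid));
    /* after migration the tid is unchanged; the routing tables in the
       PVM daemons map it to the task's current node */
    return 0;
}
```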

14
Dynamite Initial State
Two PVM tasks communicating through a network of
daemons. Migrate task 2 to node B.
15
Prepare for Migration
Create new context for task 2. Tell PVM daemon B
to expect messages for task 2. Update routing
tables in daemons (first B, then A, later C).
16
Checkpointing
(Figure: nodes A, B and C, each running a PVM daemon; a new context
for task 2 is prepared on node B. Task boxes show the layers:
program, PVM, checkpointer.)
Send checkpoint signal to task 2. Flush connections. Checkpoint
task to disk.
17
Restart Execution
(Figure: the new PVM task 2 now runs on node B, alongside PVM task 1
on node A and the daemons on nodes A, B and C.)
Restart checkpointed task 2 on node B. Resume communications.
Re-open and re-position files.
18
Connection flushing
(Sequence diagram: the source PVMD, migrating task, remote task and
remote PVMD over time. SIGURG and a TC_MOVED message notify the
remote task; SIGUSR1 triggers the migrating task; TC_EOC messages
are exchanged on every connection, each connection ends in
close()/EOF, the task answers TM_MIG to its daemon, and the
checkpoint is taken.)
19
Connection flushing
  • all tasks are notified with SIGURG and TC_MOVED
    message
  • migrating task M sends TC_EOC messages via all
    direct connections
  • tasks reply to TC_EOC messages to M
  • direct connections are closed
  • source PVM daemon sends TC_EOC message to M
  • migrating task M replies to the daemon with TM_MIG
  • task-daemon connection is closed (see the sketch
    below)
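The same protocol from the migrating task's point of view; send_msg, recv_until, the tag values and the connection arrays are hypothetical helpers, not the PVM API.

```c
/* Flush all connections so no message is in flight at checkpoint. */
#include <unistd.h>

enum { TC_EOC = 1, TM_MIG = 2 };          /* illustrative tag values */

extern int  n_direct, direct_conns[], daemon_conn;
extern void send_msg(int fd, int tag);
extern void recv_until(int fd, int tag);  /* drain until given tag arrives */

void flush_connections(void)
{
    /* daemons have already broadcast TC_MOVED; peers got SIGURG */
    for (int i = 0; i < n_direct; i++)
        send_msg(direct_conns[i], TC_EOC);      /* announce end of connection */

    for (int i = 0; i < n_direct; i++) {
        recv_until(direct_conns[i], TC_EOC);    /* wait for the peer's reply */
        close(direct_conns[i]);                 /* direct connection done */
    }

    recv_until(daemon_conn, TC_EOC);            /* source daemon flushed too */
    send_msg(daemon_conn, TM_MIG);              /* ready to migrate */
    close(daemon_conn);
    /* safe to checkpoint now: no messages in flight */
}
```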

20
Special considerations
  • Critical sections
  • signal blocking and unblocking (sketched below)
  • Blocking calls
  • modifications to low-level mxfer function
  • Out-of-order fragments and messages
  • message forwarding and sequencing
  • Messages partially sent on migration
  • if via direct connections, re-send entirely
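A sketch of the critical-section handling named above, again assuming SIGUSR1 as the checkpoint signal; run_critical is an illustrative name.

```c
/* Hold off the checkpoint signal across a critical section, so a
 * migration cannot interrupt the PVM library mid-operation. */
#include <signal.h>

void run_critical(void (*body)(void))
{
    sigset_t block, old;

    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    sigprocmask(SIG_BLOCK, &block, &old);   /* enter critical section */

    body();                                 /* e.g. the low-level mxfer */

    sigprocmask(SIG_SETMASK, &old, NULL);   /* a pending checkpoint signal
                                               is delivered here */
}
```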

21
Performance
  • Migration speed largely dependent on the speed of
    the shared file system
  • and that depends mostly on the network
  • NFS over 100 Mbps Ethernet
  • 0.4 s < T_mig < 15 s for 2 MB < image size < 64 MB
  • Communication speed reduced due to added overhead
  • 25% for 1-byte direct messages
  • 2% for 100 KB indirect messages

22
Migration (Linux)
23
Ping-pong experiment (Linux)
24
Migration decider
(Diagram: the migration decider reads a configuration file, receives
load data from the master monitor, and instructs the PVM daemons.)
25
Decider
  • Cost of configuration derived from weighted sum
    of
  • average CPU load
  • average memory load
  • migrations
  • Use of maximum instead of average optional
  • accounts for interdependence of tasks
  • Branch and bound search
  • Upper bound on search time (cost function sketched
    below)
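A sketch of such a cost function; the weights, names and signature are made up for the example.

```c
/* Cost of one candidate configuration: weighted sum of per-node CPU
 * load, memory load, and the number of migrations it implies. */
double config_cost(const double *cpu, const double *mem, int nodes,
                   int migrations, int use_max,
                   double w_cpu, double w_mem, double w_mig)
{
    double cpu_term = 0.0, mem_term = 0.0;

    for (int i = 0; i < nodes; i++) {
        if (use_max) {                 /* max accounts for interdependent tasks */
            if (cpu[i] > cpu_term) cpu_term = cpu[i];
            if (mem[i] > mem_term) mem_term = mem[i];
        } else {                       /* plain average */
            cpu_term += cpu[i] / nodes;
            mem_term += mem[i] / nodes;
        }
    }
    return w_cpu * cpu_term + w_mem * mem_term + w_mig * migrations;
}
```

A branch-and-bound search over configurations can then prune any partial assignment whose accumulated cost already exceeds the best complete configuration found so far.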

26
Three environments
  • The progress of a test program
  • Undisturbed (PVM)
  • Disturbed and migrated (DPVM/migration)
  • Disturbed but not migrated (DPVM/load)

27
NAS CG Benchmark
28
3 tasks in an FE code
29
Status
  • Checkpointer operational under
  • Solaris 2.5.1 and higher (UltraSPARC, 32-bit)
  • Linux/i386 2.0 and 2.2 (libc5 and glibc 2.0)
  • PVM 3.3.x applications supported, tested on
  • Pam-Crash (ESI) - car crash simulations
  • CEM3D (ESI) - electromagnetics code
  • Grail (UvA) - large, simple FEM code
  • NAS parallel benchmarks

30
Dynamite and the Grid
  • Critical analysis of usefulness nowadays
  • Popular computing platform: Beowulf clusters
  • Typical cluster management strategy: space
    sharing
  • Checkpointing multiple tasks or even the whole
    parallel application is quite useful for fault
    tolerance or cross-cluster migration
  • File access presents complex problems
  • Dynamic resource requests to the grid

31
Road to Dynamite-G
  • Study and solve issues for cross-cluster
    migration
  • No shared file system
  • Authentication
  • Basic infrastructure stays the same; we only use
    some of the Grid services (remote file access,
    monitoring, scheduling)
  • Full integration with Globus (job submission, job
    management, security)
  • Globus is a moving target

32
Cross-cluster checkpointing
(Figure: as in the single-cluster case, but a helper task on node B
receives the checkpoint stream for task 2.)
Send checkpoint signal to task 2. Flush connections, close files.
Checkpoint task to disk via helper task.
33
Socket- and file-based migration in a single
cluster
34
Nodes in two different clusters
35
Performance of socket migration
  • Target file format retained
  • Usually transfer to local disk (/tmp) most
    efficient
  • For migration to local disk no network link is
    crossed more than once
  • Performance depends on network speed, local disk
    speed and memory (cache) of target machine
  • Performance compares well to original mechanism
    (checkpoint to file on file server)
  • Consider making this mechanism the standard, also
    for in-cluster migration

36
Issues for file access
  • Moving open files with tasks appears the least
    complicated solution, but
  • Tasks may open and close files; required files
    unknowable at time of migration
  • Tasks may share a file
  • Files need to be returned after task completion
  • Connect to proxy file server on source cluster
  • Security issues
  • Performance

37
Some other open issues
  • Checkpointing and restarting entire programs
  • Saving communication context
  • Checkpoint-and-stay-alive
  • Cross-cluster migration (target cluster known)
  • Monitoring and scheduling
  • Migration cost vs. performance gain
  • Migrating tasks vs. migrating entire programs
  • Grid
  • When to start looking for a new cluster
  • How best to use available mechanisms

38
Full integration with Globus
  • Upgrade our checkpointer
  • The existing Grid-enabled implementation of MPI,
    MPICH-G2, does not use the "ch_p4" communication
    device; it uses its own "globus2" device
  • Start from scratch?
  • Support most of the fancy features of MPICH-G2,
    such as heterogeneity?

39
Conclusions
  • Migration of tasks allows
  • optimal task allocation in dynamic environment
  • freeing of nodes
  • Dynamite addresses the problem of migrating tasks
    in parallel programs
  • dynamically linked programs with open files
  • direct and indirect PVM connections
  • MPI expected in near future
  • scheduler needs further work
  • Slight performance penalties in communication and
    migration
  • The road to Dynamite-G is long, but appears
    worthwhile

40
Collaborations
  • UvA
  • Kamil Iskra
  • Dick van Albada
  • ESI
  • Jan Clinckemaillie, Henri Luzet
  • Genias
  • Ad Emmen
  • Univ. Indonesia, Jakarta
  • Chan Bassaruddin, Judhi Santoso
  • IT Bandung
  • Bobby Nazief, Oerip Santoso
  • Univ. of Mining and Metallurgy, Krakow
  • Marian Bubak, Darius Zbik
  • Univ. Wisconsin
  • Miron Livny
  • Mississippi State Univ.
  • Ioana Banicescu