Title: The Tool Daemon Protocol: Defining the Interface Between Tools and Process Management Systems
1 The Tool Daemon Protocol: Defining the Interface Between Tools and Process Management Systems
- Paradyn Group
- Condor Group
- paradyn,condor_at_cs.wisc.edu
- Computer Sciences Department
- University of Wisconsin
- Madison, Wisconsin 53705
- USA
- Ana Cortés, Miquel A. Senar
- miquelangel.senar,ana.cortes_at_uab.es
- Departament d'Informàtica
- Universitat Autònoma de Barcelona
- Barcelona, Spain
Presented by Philip C. Roth, pcroth_at_cs.wisc.edu
2 The Current Situation
- Consider a job submitted to a process management system (e.g., Condor, PBS, Globus, MPICH's MPD). The process manager:
  - starts the job's processes
  - sets up file I/O
  - sets up standard I/O
  - monitors process status
  - controls the job
[Figure: the process manager daemon monitors and controls two application processes.]
3 The Current Situation
- Next, consider a tool wanting to monitor the job. The tool:
  - also may want to start the processes (or attach to them)
  - also needs to monitor process status
  - also may want to control the job
  - also may want access to file I/O or standard I/O
  - needs to communicate with its front-end
[Figure: the process manager daemon monitors and controls two application processes; question marks show that the tool daemon's relationship to them is undefined.]
4 The Current Situation
[Figure: process manager daemon, tool daemon, and two application processes; the process manager daemon's monitor/control link is shown, and the tool daemon's links are marked "?".]
5 The Current Situation
- Process managers are many and varied
  - E.g., IBM POE, SGI Origin MPI, and MPICH all work differently
- Some process managers have support for specific tools
  - E.g., MPICH support for the TotalView debugger
- Heading for an m × n combination of m process managers and n tools
Bottom line: need a standard interface for process managers and tools to coexist
The Tool Daemon Protocol (TDP)
6 TDP: The Tool Daemon Protocol
- Defines an API between process management system and tool processes for:
  - Creating processes
  - Controlling processes
  - Sharing information between processes
- Pilot implementation: trying out ideas to see what works
7 TDP Job Startup Sequence
- Tool submits job request to process management system
[Figure: the tool front-end on the local host sends a "Create job" request to the process manager daemon on the execution host.]
8 TDP Job Startup Sequence
- Process manager creates the application process, leaving it suspended (pause on exec)
[Figure: the process manager daemon and the suspended application process on the execution host; the tool front-end on the local host.]
9 TDP Job Startup Sequence
- PM daemon creates tool daemon process (if necessary)
[Figure: the execution host now runs the process manager daemon, the application process, and the tool daemon, with a link labeled TDP; the tool front-end remains on the local host.]
10 TDP Job Startup Sequence
- PM daemon passes information to tool daemon (e.g., process PID, front-end host/port, standard I/O host/port)
[Figure: the process manager daemon passes the PID and host/port pairs to the tool daemon on the execution host.]
11 TDP Job Startup Sequence
- Tool daemon examines the application process (e.g., parses symbols, discovers static call graph)
[Figure: the tool daemon examines the application process on the execution host.]
12 TDP Job Startup Sequence
- App process is allowed to run
[Figure: the same hosts and processes as in the previous steps, with the application process now running.]
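The "create suspended, then allow to run" steps of the startup sequence can be sketched on a POSIX system as follows. This is only an approximation under stated assumptions: the child stops itself with SIGSTOP just before exec, which is not the same as a true pause-on-exec (real process managers and tools may rely on ptrace or other OS-specific facilities), and the function names create_paused and resume_and_wait are hypothetical, not part of TDP.

```python
import os
import signal


def create_paused(argv):
    """Fork a child that stops itself just before exec; return its PID.

    Hypothetical helper: approximates "pause on exec" by stopping the
    child before the new program image is loaded.
    """
    pid = os.fork()
    if pid == 0:
        # Child: stop here so a tool daemon could attach or inspect first.
        os.kill(os.getpid(), signal.SIGSTOP)
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # reached only if exec failed
    # Parent: block until the child reports that it has stopped.
    _, status = os.waitpid(pid, os.WUNTRACED)
    if not os.WIFSTOPPED(status):
        raise RuntimeError("child did not stop before exec")
    return pid


def resume_and_wait(pid):
    """Let the paused process run and return its exit code."""
    os.kill(pid, signal.SIGCONT)
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)  # negative value: killed by a signal
```

In this sketch, the window between create_paused and resume_and_wait is where the tool daemon would be created, handed the PID, and given time to examine the process.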
13 TDP Pilot Implementation
- Goals
  - To try out TDP ideas and see what makes sense in a real environment
  - To collect informed suggestions for a standard
- The software
  - Two well-established packages at U. Wisconsin-Madison
    - Paradyn performance tool
    - Condor resource management system
14 Challenges
- Process startup
- Notification of exited processes
- Inter-process communication
  - Mechanism
  - Identification of information to be transferred
  - Asynchronous notifications
- Private networks and firewalls
  - Tool daemon communicating to front-end
  - Application process sending standard I/O
15 Challenge: Process Startup
- Most functionality already in place, but not in the right place
  - Need to refactor process startup logic between process manager daemon and tool daemon
- Control handoff (process manager daemon to tool daemon) difficult under some OSs
  - E.g., Linux: two scheduling race conditions between application process and tool daemon
16 Challenge: Exit Process Notification
- Want the starter to be aware if the app or tool daemon process exits
- Process exit notification (e.g., SIGCHLD to the parent under UNIX/Linux)
[Figure: the starter is the parent of both paradynd and the app; each delivers SIGCHLD to the starter.]
17 Challenge: Exit Process Notification
- Parental relationships may change when tool daemon attaches
  - E.g., Linux: daemon process becomes app process's parent
  - On app process termination, SIGCHLD sent to paradynd, NOT to the Condor starter
[Figure: the starter is the parent of paradynd, and paradynd is the parent of the app; the app's SIGCHLD goes to paradynd.]
18 Challenge: Exit Process Notification
- SIGCHLD delivered to Condor starter only if paradynd calls wait()
- Condor must trust monitoring daemon or poll the application process state
[Figure: starter, paradynd, and app, with a SIGCHLD arrow toward the starter.]
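The polling alternative mentioned above can be sketched as follows. This is an illustrative sketch for POSIX systems, not Condor's actual mechanism; the helper names process_exists and poll_until_exit are hypothetical. It relies on the standard behavior that sending signal 0 checks for a process without delivering anything. Note that a zombie (exited but not yet reaped by its parent) still counts as existing under this check.

```python
import os
import time


def process_exists(pid):
    """Return True if a process with this PID exists.

    Signal 0 performs the existence and permission checks only;
    no signal is actually delivered.
    """
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True   # exists, but belongs to another user
    return True


def poll_until_exit(pid, interval=0.05, timeout=5.0):
    """Poll until the process disappears.

    Returns True once the process is gone, or False if the timeout
    expires first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not process_exists(pid):
            return True
        time.sleep(interval)
    return False
```

Polling trades timeliness for independence: the starter no longer depends on the tool daemon calling wait(), but it learns of the exit only at the next poll and does not obtain the exit status this way.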
19 Challenge: Information Transfer
- Attribute Space
  - (name, value) pairs shared between processes
  - Mainly, intra-host sharing between process manager daemon and tool daemon
  - Also tool front-end, daemon sharing
    - E.g., application PIDs for front end
  - Basic idea from MPICH
  - Not a Linda tuple space
  - Not a global shared environment space
20 Attribute Space (Execution Host)
[Figure: the process manager daemon calls tdp_put(PID, 2473), tdp_put(FE_host, cham.cs.wisc.edu), and tdp_put(FE_port, 7331). The attribute space then holds PID = 2473, FE_host = cham.cs.wisc.edu, FE_port = 7331. The application process and tool daemon are also shown.]
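21 Attribute Space (Execution Host)
[Figure: the attribute space holds PID = 2473, FE_host = cham.cs.wisc.edu, FE_port = 7331. The tool daemon retrieves them with tdp_get(PID), tdp_get(FE_host), and tdp_get(FE_port). The process manager daemon and application process are also shown.]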
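The put/get exchange on the two attribute-space slides can be modeled in a few lines. This is a minimal in-memory sketch, not the TDP library: the class AttributeSpace and its methods are hypothetical stand-ins for tdp_put and tdp_get, it works only between threads of one process, and the blocking behavior of get (waiting until a name has been put) is an assumption rather than something the slides specify.

```python
import threading


class AttributeSpace:
    """In-memory (name, value) store; a stand-in for the TDP attribute space."""

    def __init__(self):
        self._attrs = {}
        self._cond = threading.Condition()

    def put(self, name, value):
        """Store a value and wake any reader waiting for it."""
        with self._cond:
            self._attrs[name] = value
            self._cond.notify_all()

    def get(self, name, timeout=None):
        """Return the value for name, waiting until it has been put.

        Raises KeyError if the timeout expires first (assumed behavior).
        """
        with self._cond:
            found = self._cond.wait_for(lambda: name in self._attrs, timeout)
            if not found:
                raise KeyError(name)
            return self._attrs[name]


# The process manager daemon's side, using the values from the slides.
space = AttributeSpace()
space.put("PID", 2473)
space.put("FE_host", "cham.cs.wisc.edu")
space.put("FE_port", 7331)

# The tool daemon's side.
pid = space.get("PID")
front_end = (space.get("FE_host"), space.get("FE_port"))
```

A blocking get is one way to let the tool daemon start before the process manager daemon has published everything; a real implementation would also need an inter-process transport, which this sketch leaves out.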
22 Challenge: Asynchronous Notification
- Uses attribute space
- In process interested in event notification, register action
  - tdp_register_notify(handle, event, action)
- In event-generating process, deliver event to attribute space
  - tdp_put(event, value)
- Value available in action function
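The register-then-put pattern above can be illustrated with a small callback registry. This is a hedged sketch, not the TDP API: the class NotifyingAttributeSpace is hypothetical, the object itself plays the role of the handle argument, and actions run synchronously inside put, whereas a real implementation would deliver them asynchronously across processes.

```python
class NotifyingAttributeSpace:
    """Attribute store that runs registered actions when an event is put."""

    def __init__(self):
        self._attrs = {}
        self._actions = {}  # event name -> list of callables

    def register_notify(self, event, action):
        """Register action(event, value) to run whenever event is put."""
        self._actions.setdefault(event, []).append(action)

    def put(self, event, value):
        """Store the value, then hand it to every action registered for event."""
        self._attrs[event] = value
        for action in self._actions.get(event, []):
            action(event, value)


# Interested process: register an action for a hypothetical "app_exited" event.
received = []
space = NotifyingAttributeSpace()
space.register_notify("app_exited", lambda event, value: received.append((event, value)))

# Event-generating process: deliver the event with its value.
space.put("app_exited", 0)
```

The value passed to put arrives as the action's second argument, matching the slide's point that the value is available in the action function.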
23 Challenge: Firewalls and Private Nets
[Figure: the process manager daemon, tool daemon, and application process run on the remote host behind a firewall; the connection to the tool front-end on the local host is blocked (marked X).]
24 Challenge: Firewalls and Private Nets
[Figure: the same hosts and firewall, with a comm proxy added so that traffic between the tool daemon and the tool front-end can cross the firewall.]
25 Status
- Pilot implementation nearly complete
  - Paradyn with jobs submitted to Condor
  - Linux 2.4
  - Create process model
  - Condor vanilla and MPI universes
  - Remaining work: library packaging, documentation
- Periodic planning meetings
  - MPICH (Butler, Gropp, Lusk)
  - Etnus (Cownie, Delsignore)
  - Globus (Kesselman)
  - HP/Compaq (Balle)
  - Pallas (Vampir group)
  - Paradyn (Miller)
  - Condor (Livny)
  - U. Barcelona (Cortés, Senar)
  - TUM (Wismüller)
  - U. Vienna (Fahringer)
  - U. Tennessee (Moore)
26 The Path Forward
- Identify necessary information exchange between principals
- Complete design, implement attribute space as standalone package
- Get other tool builders, process management system builders involved
  - Integrate TDP ideas into their systems to see what works
27 Summary
- TDP standardizes the interface between process management systems and tools
  - API for tools and management systems
  - Support libraries
  - Distributed attribute space
- Avoids the propagation of tool- and process manager-specific interfaces
- Pilot implementation nearly complete
28 TDP: The Tool Daemon Protocol
- These are the early stages of this important effort; we want your participation!
- Draft report in progress; available for review and comments soon
- Web: http://www.cs.wisc.edu/tdp
- Email: tdp_at_cs.wisc.edu