Title: Making Parallel Processing on Clusters Efficient, Transparent and Easy for Programmers
1 Making Parallel Processing on Clusters Efficient, Transparent and Easy for Programmers
- Andrzej M. Goscinski
- School of Computing and Mathematics
- Deakin University
- Joint work with Michael Hobbs, Jackie Silcock and Justin Rough
2 Overview and Aims
- Basic issues and solutions
- Parallel processing: user expectations, clusters, phases
- Parallelism management
- Transparency
- Communication paradigms
- What to do?
- Related systems
- Cluster Execution environments
- Middleware
- Cluster operating systems
- GENESIS
- Architecture
- Services for parallelism management and transparency
- GENESIS programming interface
- Message passing
- DSM
- Primitives
- Easy to Use and Program Environment
- Performance Study
- Summary and Future Work
3 Parallel Processing: User Expectations
- Affordable
- Supercomputers for a poor man
- Performance
- Good performance
- Ease of Use
- Free from creation and placement concerns
- Transparency
- Unaware of location of processes
- Ease of Programming
- Choice and easy use of communication paradigm
4 Parallel Processing: Clusters
- Advantages
- Cheap to build: commodity PCs, networks
- Widely available
- Idle during weekends
- Low utilization during working hours
- Disadvantages
- Poor and difficult to use software (operating systems and runtime systems)
- User unfriendly
- Distribution of resources (CPUs and peripherals)
- Clusters are an ideal platform for the execution of parallel applications
- Many institutions (universities, banks, industries) move toward homogeneous non-dedicated clusters
5 Parallel Processing: Phases
- Three distinct phases
- Initialization
- Execution
- Termination
- Researchers and manufacturers mainly concentrate on execution to achieve the best performance
- Ease of use of parallel systems and programmers' time are neglected
- Application developers are discouraged as they have to program many activities that are of an operating system nature
6 Parallelism Management
- Present operating systems that manage clusters are not built to support parallel processing
- Reason: these operating systems do not provide services to manage parallelism
- Parallelism management is the management of parallel processes and computational resources
- Achieve high performance
- Use computational resources efficiently
- Make programming and use of parallel systems easy
7 Parallelism Management
- Parallelism management in parallel programming tools, Distributed Shared Memory and enhanced operating system environments
- has been neglected
- left to the application developers
- Application developers must deal
- not only with parallel application development
- but also with the problems of initiation and control for the execution on the cluster
- Transparency and reliability (SSI) have been neglected: users do not see a cluster as a single powerful computer
8 Services for Parallelism Management on Clusters
- Services for parallelism management and transparency
- Establishment of a virtual machine
- Mapping of processes to computers
- Parallel processes instantiation
- Data (including shared) distribution
- Initialisation of synchronization variables
- Coordination of parallel processes
- Dynamic load balancing
9 Transparency
- Users should see a cluster as a single powerful computer
- Dimensions of parallel processing transparency
- Location transparency
- Process relation transparency
- Execution transparency
- Device transparency
10 Communication Paradigms
- Two communication paradigms
- Message Passing (MP): explicit communication between processes of a parallel application
- Fast
- Difficult to use for programmers
- Distributed Shared Memory (DSM): implicit communication between processes of a parallel application through shared memory objects
- Easy to use
- Demonstrates reduced performance
- Claim: operating environments that offer MP and DSM should be provided as a part of a cluster operating system as they manage system resources
11 What to do?
- Affordable
- Clusters
- Performance
- Introduce special services
- Ease of Use
- Parallelism management
- Transparency
- Operating systems
- Ease of Programming
- Message passing and DSM
- Development of cluster operating systems supporting parallel processing
- Services of cluster operating systems
- Distributed services for transparent communication and management of basic system resources
- Services for parallelism management and transparency
12 Related Systems: Message Passing Systems
- PVM
- A set of cooperating server processes and specialized libraries that support process communication, execution and synchronization
- A virtual machine must be set up by the user
- Provides transparent process creation and termination
- MPI
- Objective is to standardize and coordinate the direction of various message passing applications, tools and environments
- Provides limited process management functions to support parallel processing
- HARNESS
- Does not provide transparency
- Programmers are forced to specify computers and map processes to these computers
- Load imbalance is neglected
13 Related Systems: DSM Systems
- Research concentrates mainly on improving performance
- Ease of use has been neglected
- Munin
- Programmers must label different variables according to the consistency protocol they require
- The initialisation stage requires the application developer to define the number of computers to be used
- Programmers must create a thread on each computer, initialise shared data and create synchronization variables
- TreadMarks
- The application developer has a substantial input into initialisation of DSM processes
- Full transparency is not provided
14 Related Systems: Execution Environments
- The PVM, MPI and DSM approach of running on top of an operating system can be improved by enhancing the operating system itself to support parallel processing
- Beowulf
- Exploits distributed process space to manage parallel processes
- Processes can be started on a remote computer after a logon operation into that computer has completed successfully
- It addresses neither resource allocation nor load balancing
- Transparent process migration is not provided
15 Related Systems: Execution Environments
- NOW
- Combines specialized libraries and server processes with enhancement to the kernel
- Enhancement: scheduling and communication kernel modules; GLUnix to provide network wide process, file and VM management
- Parallelism management service: process initialisation on any cluster computer, support for semi-transparent start of parallel processes on multiple nodes (how to select nodes?), barriers, MPI
- MOSIX
- Provides enhanced and transparent communication and scheduling within the kernel
- Employs PVM to provide parallelism support (initial placement)
- Process migration: transparently migrates processes
- Provides dynamic load balancing and data collection
- Remote communication is handled through the originating computer
16 Related Systems: Summary
- All systems but MOSIX are based on middleware: there has been no attempt to develop a comprehensive operating system to support parallel processing on clusters
- The solutions are performance driven: little work has been done on making them programmer friendly
- Problems from parallel processing point of view
- Processes are created one at a time although primitives provided enable the user to create multiple processes
- These systems (with the exception of MOSIX) do not provide complete transparency
- Virtual machine is not set up automatically
- These systems do not provide load balancing
17 Cluster Execution Environments
- Execution environments that support parallel processing on clusters can be developed using
- Middleware approach at the application level
- Underware at the kernel level
18 Middleware
19 Middleware - summary
- Middleware allows programmers
- to develop parallel applications (PVM, MPI)
- to execute parallel applications on clusters (Beowulf)
- to employ shared memory based programming (Munin)
- to achieve good execution performance
- to take advantage of portability
- Middleware
- does not offer complete transparency
- reduces potential execution performance (services are duplicated)
- forces programmers to be involved in many time consuming and error prone activities that are of an operating system nature
- Conclusion: to provide parallelism management, offer transparency, and make programming and use of a system easy, develop the needed services at the operating system level
20 Cluster operating systems
- A cluster is a special kind of distributed system
- A cluster operating system supporting parallel processing should
- possess the features of a distributed operating system to deal with distributed resources and their management and hide distribution
- exploit additional services to manage parallelism for applications and offer complete transparency
- provide an enhanced programming environment
- Three logical levels of a cluster operating system
- Basic distributed operating system
- Parallelism management and transparency system
- Programming environment
21 Logical architecture of a cluster operating system
22 GENESIS Cluster Operating System
- Proof of concept
- Client-server model, microkernel approach and object based approach (all entities have names)
- All basic resources (processor, main memory, network, interprocess communication, files) are managed by relevant servers
- IPC: message passing services
- basic communication paradigm
- cornerstone of the architecture
- provided by IPC Manager and local IPC component of microkernel
- IPC placement and relationship with other services designed to achieve high performance and transparency
- DSM provided by Space (memory) and IPC Managers
23 The GENESIS Architecture
24 GENESIS Services for Parallelism Management and Transparency
- Basic services that provide parallelism management and offer transparency
- Establishment of a virtual machine
- Process creation
- Process duplication
- Process migration
- Global scheduling
25 Establishment of a Virtual Machine
- Resource Discovery Server supports adaptive establishment of a virtual machine
- Resource Discovery Server
- Identifies
- Idle and lightly loaded computers
- Computer resources, e.g., processor model, memory size
- Computational load and available memory
- Communication patterns for each process
- Passes information to the Global Scheduling Server per
- Process
- Server
- Averaged over an entire cluster
- Virtual machine changes dynamically
- Some computers become overloaded or out of order
- Some computers become idle
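A minimal sketch of how a resource discovery service might select the computers that form the virtual machine. The `Computer` record, the load scale and the `LIGHT_LOAD` threshold are illustrative assumptions, not GENESIS data structures.

```python
from dataclasses import dataclass

# Hypothetical threshold: a computer counts as lightly loaded at or below this load.
LIGHT_LOAD = 0.25

@dataclass
class Computer:
    name: str
    load: float           # fraction of CPU in use, 0.0 (idle) to 1.0
    free_memory_mb: int
    alive: bool = True    # False models a computer that is out of order

def establish_virtual_machine(cluster, needed_memory_mb):
    """Return idle or lightly loaded computers, least loaded first."""
    candidates = [
        c for c in cluster
        if c.alive and c.load <= LIGHT_LOAD and c.free_memory_mb >= needed_memory_mb
    ]
    return sorted(candidates, key=lambda c: c.load)

cluster = [
    Computer("ws1", 0.05, 3),
    Computer("ws2", 0.90, 3),               # overloaded, excluded
    Computer("ws3", 0.20, 2),
    Computer("ws4", 0.00, 4, alive=False),  # out of order, excluded
]
vm = establish_virtual_machine(cluster, needed_memory_mb=2)
print([c.name for c in vm])  # ['ws1', 'ws3']
```

Calling the function again with fresh load figures gives a different result, which is one simple way to model a virtual machine that changes dynamically.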
26 Process Creation
- Requirements
- Multiple process creation: to create many instances of a process on a single or over many computers
- Scalability: must be scalable to many computers
- Complete transparency: must hide the location of all resources and processes
- Three forms of process creation
- Single
- Multiple
- Group
- Creation is invoked when the Execution Manager receives a process create request from a parent process
- Execution Manager notifies Global Scheduler
- Global Scheduler sends the location on which the process should be created
- Execution Manager on the selected computer manages process creation
27 Process Creation: Single and Multiple Services
- Single process creation service
- Similar to the services found in traditional systems supporting parallel processing
- Requires executable image to be downloaded from disk for each parallel process to be created
- Multiple process creation service
- Supports the concurrent instantiation of a number of processes on a given computer through one creation call
- When many computers are involved in multiple process creation, each computer is addressed in a sequential manner
- Executable image of a parallel child process must be downloaded separately for each computer involved: a scalability problem
28 Process Creation: Group
- Group process creation combines multiple process creation and group communication
- Group process creation service
- allows multiple processes to be created concurrently on many computers
- Single executable is downloaded from a file server using group communication
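The scalability difference between the three forms of process creation can be illustrated by counting how often the executable image leaves the file server. This is a toy model under the assumptions stated in the comments; the function name and its parameters are hypothetical and do not correspond to GENESIS primitives.

```python
def image_transfers(form, processes_per_computer, computers):
    """Count executable-image downloads from the file server (toy model)."""
    if form == "single":
        # One download for every process that is created.
        return processes_per_computer * computers
    if form == "multiple":
        # One download per computer; computers are addressed sequentially.
        return computers
    if form == "group":
        # One download delivered to all computers by group communication.
        return 1
    raise ValueError(f"unknown creation form: {form}")

for form in ("single", "multiple", "group"):
    print(form, image_transfers(form, processes_per_computer=2, computers=12))
# single 24
# multiple 12
# group 1
```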
29 Group Process Creation: Behavior
30 Process Duplication: Single Local and Remote
- Parallel processes are instantiated on selected computers by employing process duplication supported by process migration
- Three forms of process duplication
- Single local and remote
- Multiple local and remote
- Group remote
- Single local and remote process duplication
- Duplication is invoked when the Execution Manager receives a twin request from a parent process
- Execution Manager notifies Global Scheduler
- Global Scheduler sends a location on which the twin should be placed
- If this computer is remote, process migration is employed
31 Process Duplication: Multiple Local and Remote
- Multiple local and remote process duplication is an enhancement of single process duplication
- Duplication is invoked when the Execution Manager receives a multiple duplication request from a parent process
- Execution Manager notifies Global Scheduler
- Global Scheduler sends a location on which the twin should be placed
- If computer is local
- Process Manager and Space Manager are requested to duplicate multiple copies of process entries and memory spaces
- If computer is remote
- the parent process is migrated to this destination
- multiple copies of the parent process are duplicated
- the parent process on the remote computer is killed
- Child processes should be duplicated on many computers
- Remote process duplication is performed for each selected computer
32 Process Duplication: Group Remote
- When more than one remote computer is involved in process duplication the overall performance decreases
- Decrease is caused by migrating a parent process to each remote computer sequentially
- Performance is improved by employing group process migration
- Process Managers and Execution Managers each join a relevant group and use group communication
- The parent process is concurrently migrated to all selected remote computers involved in process duplication
33 Group Remote Process Duplication: Behavior
34 Process Migration
- Designed to separate policy from mechanism
- Process Migration Manager acts as the coordinator for migration of various resources that combine to form a process
- Migration of resources (memory, process entries, buffers) is carried out by the Space, Process and IPC Managers, respectively
- Two forms of process migration: single and group
- Single process migration
- Global Scheduler specifies which process to migrate and to which computer
- Local Manager requests its remote peer to prepare for a process
- Local Migration Manager requests Space, Process and IPC Managers to migrate respective resources
- Remote Manager informs its local peer of successful migration
- Local Manager requests Space, Process and IPC Managers to delete the respective resources of the migrated process
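The single process migration steps above can be sketched as a small simulation. The `Manager` and `Node` classes and their method names are illustrative assumptions rather than GENESIS interfaces; the point is only the order of the steps and that the source copy is deleted after the destination has confirmed success.

```python
class Manager:
    """Toy resource manager holding one kind of per-process resource."""
    def __init__(self, kind):
        self.kind = kind
        self.resources = {}          # process id -> resource

    def migrate(self, pid, remote_peer):
        # Copy the resource to the peer manager on the destination computer.
        remote_peer.resources[pid] = self.resources[pid]

    def delete(self, pid):
        del self.resources[pid]

class Node:
    """One computer with its Space, Process and IPC managers."""
    def __init__(self, name):
        self.name = name
        self.managers = {k: Manager(k) for k in ("space", "process", "ipc")}
        self.expected = set()

    def prepare(self, pid):
        self.expected.add(pid)

    def confirm(self, pid):
        # Success means every manager now holds the process's resource.
        return pid in self.expected and all(
            pid in m.resources for m in self.managers.values())

def migrate_process(pid, local, remote):
    remote.prepare(pid)                        # remote peer prepares for the process
    for kind, mgr in local.managers.items():   # managers migrate their resources
        mgr.migrate(pid, remote.managers[kind])
    if not remote.confirm(pid):                # remote peer reports the outcome
        raise RuntimeError("migration failed; source copy kept")
    for mgr in local.managers.values():        # source resources are deleted last
        mgr.delete(pid)

a, b = Node("A"), Node("B")
a.managers["space"].resources[7] = "memory pages"
a.managers["process"].resources[7] = "process entry"
a.managers["ipc"].resources[7] = "message buffers"
migrate_process(7, a, b)
print(b.managers["space"].resources[7])   # memory pages
```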
35 Process Migration: Behavior
36 Group Process Migration
- Enhancement of the single process migration
- Modifies the single communication between the peer Migration Managers, Process Managers, Space Managers and IPC Managers to that of group communication
- Global Scheduler specifies which process to migrate and to which computers
- Each server migrates its respective resources to multiple destination computers in a single message using group communication
- Parent process is duplicated on each remote computer
- At the end of successful migration the parent process on each remote computer is killed
37 Global Scheduling
- Makes policy decisions of which processes should be mapped to which computers
- Input provided by the Resource Discovery Manager
- Relies on mechanisms of
- Single, multiple and group process creation and duplication services
- Single and group process migration
- The server combines services of
- Static allocation at the initial stage of parallel processing
- Dynamic load balancing to react to load fluctuations
- Currently, the Global Scheduler is implemented as a centralized server
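A minimal sketch of the two policy decisions named above, assuming load is a simple count of runnable processes per computer. The function names, the one-unit-per-process load model and the `threshold` value are assumptions made for illustration; the slides do not describe the actual GENESIS policy.

```python
def static_allocation(loads, n_processes):
    """Map each new process to the currently least loaded computer."""
    loads = dict(loads)                 # work on a copy of the reported loads
    placement = []
    for _ in range(n_processes):
        target = min(loads, key=loads.get)
        placement.append(target)
        loads[target] += 1              # assume each process adds one unit of load
    return placement

def rebalance(loads, threshold=2):
    """Return (source, destination) for one migration, or None if balanced."""
    busiest = max(loads, key=loads.get)
    idlest = min(loads, key=loads.get)
    if loads[busiest] - loads[idlest] >= threshold:
        return busiest, idlest
    return None

loads = {"ws1": 0, "ws2": 3, "ws3": 1}
print(static_allocation(loads, 3))   # ['ws1', 'ws1', 'ws3']
print(rebalance(loads))              # ('ws2', 'ws1')
```

The split mirrors the policy and mechanism separation in the slides: these functions only decide where processes should go, while creation, duplication and migration services would carry the decision out.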
38 GENESIS Programming Interface
- Designed and developed to provide both communication paradigms
- Message passing
- Shared memory
39 Message Passing
- Basic Message Passing
- Exploits basic interprocess communication concepts
- Transparent and reliable local and remote IPC
- Integral component of GENESIS
- Offers standard message passing and RPC primitives
- GENESIS PVM
- PVM added to provide a well known parallelism programming tool
- Ported from the UNIX based PVM
- Implemented within a library in GENESIS
- Mapping of the standard PVM services onto the GENESIS services
- Performance improvement of PVM on GENESIS
- No additional classic PVM server processes required
- Direct interprocess communication model instead of the default model
- Load balancing provided
40 Architecture of PVM on Unix
41 Architecture of PVM on GENESIS
42 Distributed Shared Memory
- DSM is an integral component of the operating system
- Since DSM is a memory management function the DSM system is integrated into the Space Manager
- Shared memory used as though it were physically shared
- Easy to use shared memory
- Low overhead, improved performance
- Two consistency models supported
- Sequential: implemented using invalidation model
- Release: implemented using write-update model
- Synchronization and coordination of processes
- Semaphores: owned by Space Manager on particular computer
- Gaining ownership is distributed and mutually exclusive
- Barriers: used for coordination; their management is centralized
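To contrast the invalidation and write-update approaches named above, the following toy model replicates one shared page on several computers and counts the messages each approach sends. It is a sketch under simplifying assumptions (one page, whole-page writes, a single owner) and deliberately does not model the timing rules of sequential or release consistency; the `Page` class is hypothetical.

```python
class Page:
    """One shared page replicated on several computers (toy model)."""
    def __init__(self, nodes, value=0):
        self.copies = {n: value for n in nodes}   # node -> cached value
        self.valid = {n: True for n in nodes}     # node -> is the copy usable?
        self.owner = nodes[0]                     # node holding the latest value
        self.messages = 0

    def write_invalidate(self, writer, value):
        for n in self.copies:
            if n != writer and self.valid[n]:
                self.valid[n] = False             # one invalidation message
                self.messages += 1
        self.copies[writer] = value
        self.valid[writer] = True
        self.owner = writer

    def write_update(self, writer, value):
        for n in self.copies:
            if n != writer:
                self.messages += 1                # one update message with the data
            self.copies[n] = value
            self.valid[n] = True
        self.owner = writer

    def read(self, reader):
        if not self.valid[reader]:
            self.copies[reader] = self.copies[self.owner]   # fetch a fresh copy
            self.valid[reader] = True
            self.messages += 1
        return self.copies[reader]

inv = Page(["A", "B", "C"])
inv.write_invalidate("A", 1)         # 2 invalidations
inv.write_invalidate("A", 2)         # no valid remote copies left: 0 messages
print(inv.read("B"), inv.messages)   # 2 3

upd = Page(["A", "B", "C"])
upd.write_update("A", 1)             # 2 updates
upd.write_update("A", 2)             # 2 updates
print(upd.read("B"), upd.messages)   # 2 4
```

In this model invalidation is cheaper when one computer writes repeatedly between remote reads, while write-update avoids the fetch on every later read.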
43 Distributed Shared Memory
44 GENESIS Primitives: Execution
- Two groups of primitives
- to support execution services
- for the provision of communication and
coordination services
45 GENESIS Primitives: Communication and Coordination
46 Easy to Use and Program Environment
- GENESIS system
- Provides an efficient and transparent environment for execution of parallel applications
- Offers transparency
- Relieves programmers from activities such as
- Selection of computers for a virtual machine for the given application
- Setting up a virtual machine
- Mapping processes to virtual machine
- Process instantiation using process creation and duplication supported by process migration
- Load balancing
47 Easy to Use and Program Environment
- In the GENESIS system
- Location of the remote computer(s) of the cluster is selected automatically by Global Scheduler
- Users do not know process location
- Programming of parallel applications has been made easy by providing
- Message passing: standard and PVM
- Distributed Shared Memory
- Powerful primitives implement sequences of operations and provide transparency
- process_ncreate(GROUP_CREATE, n, child_prog)
- Process instantiation using process creation and duplication supported by process migration
- Load balancing
48 Performance of Standard Parallel Applications
- GENESIS System
- 13 Sun3/50 Workstations
- 12 Computation, 1 File Server
- 10 Mbit/sec shared Ethernet
- Influence of process instantiation on execution performance
- GENESIS PVM vs. Unix PVM
- Standard parallel applications
- Successive Over Relaxation
- Quicksort
- Traveling Salesman Problem
49 Influence of Process Instantiation on Execution Performance: Parallel Simulation (5, 25, 50 Second Workload)
- Simulation: amount of work relates to the overall execution time
- Two parameters
- Work load (5, 25, 50 Seconds)
- Number of workstations (1..12)
- Global scheduler migration
- Speedups for comp proc
50 GENESIS PVM vs. Unix PVM: IPC Latency
- Support for IPC provided by the PVM server in Unix was substituted with GENESIS operating system mechanisms
- To measure the time saved by removing the server, a simple PVM application that exchanges messages (1 kbyte to 100 kbytes) was used
- Round-trip time (including data packing and unpacking) was measured
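A round-trip measurement of this kind can be sketched with standard Python sockets. This is not the PVM test program from the study: it uses a local `socket.socketpair()` and an echo thread as stand-ins for two communicating processes, and `struct` packing as a stand-in for PVM data packing, so the absolute times it reports say nothing about GENESIS or Unix PVM.

```python
import socket
import struct
import threading
import time

def recv_exact(sock, n):
    """Receive exactly n bytes or raise if the peer closes early."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)

def echo_server(sock, sizes):
    # Receive each whole message first, then send it back unchanged.
    for size in sizes:
        sock.sendall(recv_exact(sock, size))

def round_trip_times(sizes):
    """Time pack + send + receive + unpack for each message size in bytes.

    Sizes must be multiples of 4 because messages are arrays of 32-bit ints.
    """
    a, b = socket.socketpair()
    server = threading.Thread(target=echo_server, args=(b, sizes))
    server.start()
    times = {}
    for size in sizes:
        values = list(range(size // 4))
        fmt = f"<{len(values)}i"                  # little-endian 32-bit integers
        start = time.perf_counter()
        payload = struct.pack(fmt, *values)       # packing
        a.sendall(payload)
        reply = recv_exact(a, len(payload))
        unpacked = struct.unpack(fmt, reply)      # unpacking
        times[size] = time.perf_counter() - start
        assert list(unpacked) == values           # the echo must be intact
    server.join()
    a.close()
    b.close()
    return times

for size, seconds in round_trip_times([1024, 10 * 1024, 100 * 1024]).items():
    print(f"{size:7d} bytes: {seconds * 1e6:10.1f} microseconds")
```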
51 GENESIS PVM vs. Unix PVM: Speedup
- The application used to study the influence of process instantiation (amount of work relates to the overall execution time) was studied
- Parameters
- Number of workstations
- GENESIS with and without load balancing
52 Successive Over Relaxation
- Parallel applications developed based on algorithms of Rice University
- Rice: superior cluster hardware (DECstation-5000/240, fast ATM net)
- For 8 computers, array size: Rice 512 x 2048 elements with 101 iterations; GENESIS 128 x 128 elements with 10 iterations
- DSM: TreadMarks 6.3, GENESIS 4.4
- PVM: Rice 6.91, GENESIS 5.1
53 Quicksort
- Parallel applications developed based on algorithms of Rice University
- Rice: superior cluster hardware (DECstation-5000/240, fast ATM net)
- For 8 computers, array size: Rice 256 x 1024 integers; GENESIS 256 x 256 integers
- DSM: TreadMarks 5.3, GENESIS 2.5
- PVM: Rice 6.79, GENESIS 6.07
54 Traveling Salesman Problem
- Parallel applications developed based on algorithms of Rice University
- Rice: superior cluster hardware (DECstation-5000/240, fast ATM net)
- For 8 computers: 18 city tour with the minimum threshold set to 13 cities
- DSM: TreadMarks 4.74, GENESIS 6.33
- PVM: Rice 5.63, GENESIS 5.94
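Assuming the figures on the three application slides are speedups measured on 8 computers (the slides do not label them), parallel efficiency follows as speedup divided by the number of computers. The short script below only restates the reported numbers in that form; because the problem sizes and hardware differ between the Rice and GENESIS runs, the rows are not a like-for-like comparison.

```python
COMPUTERS = 8

# Figures as reported on the slides, assumed here to be speedups on 8 computers.
reported = {
    "SOR":       {"DSM": {"TreadMarks": 6.3,  "GENESIS": 4.4},
                  "PVM": {"Rice": 6.91, "GENESIS": 5.1}},
    "Quicksort": {"DSM": {"TreadMarks": 5.3,  "GENESIS": 2.5},
                  "PVM": {"Rice": 6.79, "GENESIS": 6.07}},
    "TSP":       {"DSM": {"TreadMarks": 4.74, "GENESIS": 6.33},
                  "PVM": {"Rice": 5.63, "GENESIS": 5.94}},
}

def efficiency(speedup, computers=COMPUTERS):
    """Parallel efficiency: fraction of ideal linear speedup achieved."""
    return speedup / computers

for app, paradigms in reported.items():
    for paradigm, systems in paradigms.items():
        for system, speedup in systems.items():
            print(f"{app:9} {paradigm} {system:10} "
                  f"speedup {speedup:5.2f}  efficiency {efficiency(speedup):.2f}")
```

For example, a speedup of 4.4 on 8 computers corresponds to an efficiency of 0.55.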
55 Summary
- Nondedicated clusters are commonly available
- Force application developers to program operating system operations
- Do not offer transparency
- Application developers need a computer system that
- Processes applications efficiently
- Uses cluster resources well
- Allows the cluster to be seen as a single powerful computer rather than as a set of connected computers
- Proposal: employ a cluster operating system
- Design cluster operating system with three logical levels
- Distributed operating system
- Parallelism management and transparency system
- Programming environment
56 Summary
- GENESIS designed and developed as a proof of concept
- GENESIS is a system that satisfies user requirements
- GENESIS approach is unique
- Offers both message passing (MP and PVM) and DSM environment
- Services providing parallelism management are integral components of an operating system
- Provides a comprehensive environment to transparently manage system resources
- Programmers do not have to be involved in parallelism management
- Use of the cluster has been made easy
- Complete transparency is offered
- Good performance results have been achieved
57 Future Work
- Port GENESIS to an Intel like platform
- Use virtual memory to support DSM
- Offer reliable parallel computing services on clusters by employing
- Reliable group communication
- Checkpointing to offer fault tolerance