1

The new MONARC Simulation Framework
  • Iosif Legrand
  • California Institute of Technology

2
The GOALS of the Simulation Framework
  • The aim of this work is to continue and improve
    the development of the MONARC simulation framework:
  • To perform realistic simulation and modelling of
    large-scale distributed computing systems,
    customised for specific HEP applications.
  • To offer a dynamic and flexible simulation
    environment to be used as a design tool for
    large distributed systems.
  • To provide a design framework to evaluate the
    performance of a range of possible computer
    systems, as measured by their ability to deliver
    the requested data to the physicists in the
    required time, and to optimise the cost.

3
A Global View for Modelling

[Diagram: monitoring of REAL systems and testbeds feeds measured parameters into the modelling environment, in a continuous loop]
4
Design Considerations
  • This simulation framework is not intended to
    be a detailed simulator of basic components such
    as operating systems, database servers or
    routers.
  • Instead, based on realistic mathematical models
    and parameters measured on test-bed systems for
    all the basic components, it aims to correctly
    describe the performance and limitations of large
    distributed systems with complex interactions.

5
Simulation Engine

6
Design Considerations of the Simulation Engine
  • A process-oriented approach to discrete event
    simulation is well suited to describing
    concurrently running programs.
  • Active objects (each having an execution thread,
    a program counter, a stack...) provide an easy
    way to map the structure of a set of distributed
    running programs into the simulation environment.
  • The simulation engine supports an interrupt
    scheme.
  • This allows correct and efficient simulation of
    concurrent processes with very different time
    scales, by using a DES approach with a continuous
    process flow between events.
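The core of such an engine can be sketched as a time-ordered event queue over which active objects schedule callbacks. This is a minimal hypothetical sketch, not the actual MONARC engine; all class and method names here are invented for illustration:

```python
import heapq

class Engine:
    """Minimal process-oriented DES core: a time-ordered event queue.
    (Hypothetical sketch, not the actual MONARC engine.)"""
    def __init__(self):
        self.now = 0.0      # simulation clock
        self._queue = []    # heap of (time, seq, callback)
        self._seq = 0       # tie-breaker for events at the same time
    def schedule(self, delay, callback):
        self._seq += 1
        heapq.heappush(self._queue, (self.now + delay, self._seq, callback))
    def run(self):
        while self._queue:
            self.now, _, callback = heapq.heappop(self._queue)
            callback()

# Two "active objects" (here reduced to callbacks) scheduled at different times.
log = []
eng = Engine()
eng.schedule(1.0, lambda: log.append(("job A done", eng.now)))
eng.schedule(0.5, lambda: log.append(("job B done", eng.now)))
eng.run()   # events fire in time order: B at t=0.5, then A at t=1.0
```

In the real framework each active object carries its own thread and stack; the queue-of-callbacks form above only illustrates the event ordering between processes.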

7
Tests of the Engine
Processing a TOTAL of 100 000 simple jobs on
1, 10, 100, 1 000, 2 000, 4 000 and 10 000 CPUs,
using the same number of parallel threads.
More tests: http://monalisa.cacr.caltech.edu/MONARC/
8
Basic Components
9
Basic Components
  • These basic components are capable of simulating
    the core functionality of general distributed
    computing systems. They are built on top of the
    simulation engine and make efficient use of the
    interrupt functionality of the active objects.
  • These components should be considered the base
    classes from which specific components can be
    derived and constructed.

10
Basic Components
  • Computing Nodes
  • Network Links and Routers, IO protocols
  • Data Containers
  • Servers
    • Data Base Servers
    • File Servers (FTP, NFS)
  • Jobs
    • Processing Jobs
    • FTP Jobs
  • Scripts / Graph execution schemes
  • Basic Scheduler
  • Activities (a time sequence of jobs)

11
Multitasking Processing Model

Concurrent running tasks share resources (CPU,
memory, I/O). Interrupt-driven scheme: for each
new task, or when one task is finished, an
interrupt is generated and all processing times
are recomputed.
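The recompute-on-interrupt idea can be sketched with a hypothetical helper that assumes equal CPU sharing among all running tasks; at every arrival or completion event the remaining times are recalculated:

```python
def shared_cpu_finish_times(jobs, cpu_power=1.0):
    """Interrupt-driven multitasking model (hypothetical sketch).
    jobs: list of (arrival_time, work). All running tasks share the CPU
    equally; at every arrival or completion the rates are recomputed."""
    pending = sorted(enumerate(jobs), key=lambda x: x[1][0])
    remaining, finish, t = {}, {}, 0.0
    while pending or remaining:
        share = cpu_power / len(remaining) if remaining else 0.0
        next_arrival = pending[0][1][0] if pending else float("inf")
        next_done = t + min(remaining.values()) / share if remaining else float("inf")
        if next_arrival <= next_done:
            # interrupt: a new task arrives
            dt = next_arrival - t
            for j in remaining:
                remaining[j] -= share * dt
            t = next_arrival
            idx, (_, work) = pending.pop(0)
            remaining[idx] = work
        else:
            # interrupt: the task with least remaining work finishes
            dt = next_done - t
            for j in remaining:
                remaining[j] -= share * dt
            t = next_done
            for j in [j for j, w in remaining.items() if w <= 1e-9]:
                finish[j] = t
                del remaining[j]
    return [finish[i] for i in range(len(jobs))]
```

For example, two unit-work jobs arriving together on a unit-power CPU each finish at t = 2, since each receives half the CPU throughout.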
12
LAN/WAN Simulation Model
[Diagram: Nodes on a LAN connect through links and routers to Internet connections and other sites]
Interrupt-driven simulation: for each new message an interrupt is created, and for all the active transfers the speed and the estimated time to complete the transfer are recalculated.
Continuous flow between events! An efficient and realistic way to simulate concurrent transfers having different sizes / protocols.
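The recalculation step can be illustrated with a small hypothetical helper (the function and its fair-share policy are assumptions for illustration): on every interrupt, each active transfer gets an equal share of the link and a fresh estimated completion time:

```python
def recompute_transfers(link_bw, transfers, now):
    """On an interrupt (a transfer starting or finishing), give each active
    transfer an equal share of the link and re-estimate its end time.
    transfers: {name: megabytes_left}; link_bw in MB/s.
    Returns {name: (share_MBps, estimated_end_time)}."""
    share = link_bw / len(transfers)
    return {name: (share, now + left / share) for name, left in transfers.items()}

# A 100 MB transfer alone on a 10 MB/s link is estimated to end at t = 10 s.
est = recompute_transfers(10.0, {"a": 100.0}, now=0.0)
# At t = 2 s a 40 MB transfer starts; "a" has 80 MB left, so both drop to 5 MB/s.
est2 = recompute_transfers(10.0, {"a": 80.0, "b": 40.0}, now=2.0)
```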
13
Output of the simulation
[Diagram: nodes and routers inside the simulation engine publish results through output listener filters to graphics clients, a DB, and log files / Excel]
Any component in the system can generate generic results objects. Any client can subscribe with a filter and will receive only the results it is interested in. This is a VERY SIMILAR structure to MonALISA; we will soon integrate the output of the simulation framework into MonALISA.
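The subscribe-with-a-filter scheme reduces to a small publish/subscribe bus. The class and method names below are hypothetical, not the framework's actual API:

```python
class ResultsBus:
    """Hypothetical sketch of the output scheme: any component publishes
    generic result objects; clients subscribe with a filter predicate and
    receive only the results that match it."""
    def __init__(self):
        self._subscribers = []
    def subscribe(self, predicate, callback):
        self._subscribers.append((predicate, callback))
    def publish(self, result):
        for predicate, callback in self._subscribers:
            if predicate(result):
                callback(result)

# A graphics client listens for CPU results, a log writer for network results.
bus = ResultsBus()
cpu_log, net_log = [], []
bus.subscribe(lambda r: r["type"] == "cpu", cpu_log.append)
bus.subscribe(lambda r: r["type"] == "network", net_log.append)
bus.publish({"type": "cpu", "node": "A", "load": 0.7})
bus.publish({"type": "network", "link": "A-B", "mbps": 12.0})
```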
14
Specific Components
15
Specific Components
  • These components should be derived from the
    basic components and must implement their specific
    characteristics and the way they operate.
  • Major parts:
    • Data Model
    • Data Flow Diagrams for Production and
      especially for Analysis Jobs
    • Scheduling / pre-allocation policies
    • Data Replication Strategies

16
Data Model
  • Generic Data Container:
    • Size
    • Event Type
    • Event Range
    • Access Count
    • Instance

[Diagram: a META DATA Catalog and Replication Catalog map a container to its instances (network files, databases, custom data servers such as an FTP server node, DB server or NFS server), with export / import between sites]
17
Data Model (2)
[Diagram: a Data Processing JOB issues a Data Request; the META DATA Catalog and Replication Catalog resolve it to a Data Container, selecting among the available options, and the JOB receives a list of IO transactions]
18
Data Flow Diagrams for JOBS
Input and output is a collection of data, described by type and range. A process is described by name.
A fine-granularity decomposition of processes which can be executed independently, and of the way they communicate, can be very useful for optimization and parallel execution!

[Diagram: a graph of processing steps (Processing 1 to 4, one stage executed 10x), each consuming inputs and producing outputs that feed the next steps]
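Such a job graph can be executed in dependency order. A sketch using Python's standard graphlib, with a hypothetical dependency map mirroring the diagram (each step lists the steps whose outputs it consumes):

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Hypothetical dependencies: Processing 2 and 3 read the output of
# Processing 1 and could run in parallel; Processing 4 needs both.
deps = {
    "Processing 2": {"Processing 1"},
    "Processing 3": {"Processing 1"},
    "Processing 4": {"Processing 2", "Processing 3"},
}
order = list(TopologicalSorter(deps).static_order())
```

A scheduler would run steps with no unmet dependencies concurrently; `static_order` gives one valid sequential ordering.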
19
Job Scheduling Centralized Scheme
[Diagram: per-site Job Schedulers at Site A and Site B forward jobs to a GLOBAL Job Scheduler, implemented as a dynamically loadable module]
20
Job Scheduling: Distributed Scheme ("market model")

[Diagram: the Job Scheduler at one site sends a Request to the schedulers at Site A and Site B, receives a COST estimate from each, and makes the placement DECISION locally]
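The market-model decision can be sketched as follows; the cost functions here are entirely hypothetical (e.g. queue length plus a data-transfer penalty), chosen only to show the quote-and-decide flow:

```python
def choose_site(job, sites):
    """Market-model sketch: each site quotes a cost for running the job;
    the requesting scheduler picks the cheapest quote."""
    quotes = {name: estimate(job) for name, estimate in sites.items()}
    best = min(quotes, key=quotes.get)
    return best, quotes

# Hypothetical quotes: Site A holds the data locally but has a queue;
# Site B is idle but must transfer the input data.
sites = {
    "Site A": lambda job: 5.0 + 0.00 * job["data_mb"],
    "Site B": lambda job: 1.0 + 0.01 * job["data_mb"],
}
best, quotes = choose_site({"data_mb": 2000.0}, sites)
# For a 2000 MB input, Site A's quote (5.0) beats Site B's (21.0).
```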
21
Computing Models
22
Activities Arrival Patterns

A flexible mechanism to define the stochastic process of how users perform data processing tasks.
Activity tasks are dynamically loaded, threaded objects, controlled by the simulation scheduling mechanism.
Physics Activities Injecting Jobs: each Activity thread generates data processing jobs. These dynamic objects are used to model the users' behavior.
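One simple way to model such an Activity is a Poisson arrival process. This is a hypothetical sketch of the stochastic part only; in the framework itself, Activities are dynamically loaded threaded objects:

```python
import random

def activity_arrivals(rate_per_hour, hours, seed=1):
    """Job submission times for one Activity, modelled as a Poisson
    process: exponential inter-arrival gaps with the given mean rate.
    (Hypothetical sketch; the arrival pattern is an assumption.)"""
    rng = random.Random(seed)   # seeded for reproducible runs
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate_per_hour)
        if t >= hours:
            return times
        times.append(t)

# e.g. analysis jobs submitted at ~10/hour over an 8-hour working day
submissions = activity_arrivals(rate_per_hour=10.0, hours=8.0)
```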
23
Regional Centre Model
  • Complex Composite Object

[Diagram: simplified topology of the Centres, shown as five interconnected sites A to E]
24
Monitoring
25
Real Need for Flexible Monitoring Systems
  • It is important to measure and monitor the key
    applications in a well-defined test environment,
    and to extract the parameters we need for
    modeling.
  • Monitor the farms used today, try to understand
    how they work, and simulate such systems.
  • This requires a flexible monitoring system able
    to dynamically add new parameters and provide
    access to historical data.
  • Interfacing monitoring tools to obtain the
    parameters we need for simulations in a nearly
    automatic way.
  • MonALISA was designed and developed based on the
    experience with these simulation problems.

26
Input for the Data Models
  • We need information related to all the possible
    data types, their expected sizes and
    distributions.
  • Which mechanisms for data access will be used for
    activities like production and analysis:
    • Flat files and FTP-like transfer to the local
      disk
    • Network file system
    • Data base access (batch queries with
      independent threads)
    • ROOT-like file system, Client / Server
    • Web Services
  • To simulate access to hot-spot data in the
    system, we need a range of probabilities for such
    activities.

27
Input for how jobs are executed
  • How is the parallel decomposition of a job done?
    A scheduler using a job description language?
  • Master / slaves model (parallel ROOT)?
  • Centralized or distributed job scheduler?
  • What types of policies should we consider for
    inter-site job scheduling?
  • Which data should be replicated?
  • What are the predefined data replication
    policies?
  • Should we consider dynamic replication / caching
    for (selected) data which are used more
    frequently?

28
Status
  • The engine was tested (performance and quality)
    on several platforms and is working well.
  • We developed all the basic components (CPU,
    Servers, DB, Routers, network links, Jobs, IO
    Jobs) and are now testing/debugging them.
  • A quite flexible output scheme for the simulation
    is now included.
  • Examples made with specific components for
    production and analysis are being tested.
  • A quite general model for the data catalog and
    data replication is under development; it will
    be integrated soon.

29
Still to be done
  • Continue the testing of the Basic Components and
    network servers, and start modeling real farms,
    Web Services, and peer-to-peer systems.
  • Improve the documentation.
  • Improve the graphical output, interface with
    MonALISA, and create a service to extract
    simulation parameters from real systems.
  • Gather information from the current computing
    systems and possible future architectures, and
    start building the Specific Components /
    Computing Models scenarios.
  • Include Risk Analysis in the system.
  • Develop and evaluate different scheduling and
    replication strategies.

30
Summary
  • Modelling and understanding current systems,
    their performance and limitations, is essential
    for the design of large-scale distributed
    processing systems. This will require continuous
    iteration between modelling and monitoring.
  • Simulation and modelling tools must provide the
    functionality to help design complex systems and
    to evaluate different strategies and algorithms
    for the decision-making units and the data flow
    management.

http://monalisa.cacr.caltech.edu/MONARC/