1

The new MONARC Simulation Framework
  • Iosif Legrand
  • California Institute of Technology

2
The GOALS of the Simulation Framework
  • The aim of this work is to continue and improve
    the development of the MONARC simulation framework:
  • To perform realistic simulation and modelling of
    large-scale distributed computing systems,
    customised for specific HEP applications.
  • To offer a dynamic and flexible simulation
    environment to be used as a design tool for
    large distributed systems.
  • To provide a design framework to evaluate the
    performance of a range of possible computer
    systems, as measured by their ability to deliver
    the requested data to the physicists in the
    required time, and to optimise the cost.

3
A Global View for Modelling

[Diagram: monitoring of REAL systems and testbeds feeds measured parameters into the modelling environment, in a continuous loop]
4
Design Considerations
  • This simulation framework is not intended to
    be a detailed simulator of basic components such
    as operating systems, database servers or
    routers.
  • Instead, based on realistic mathematical models
    and parameters measured on test-bed systems for
    all the basic components, it aims to correctly
    describe the performance and limitations of large
    distributed systems with complex interactions.

5
Simulation Engine

6
Design Considerations of the Simulation Engine
  • A process-oriented approach to discrete event
    simulation is well suited to describing
    concurrently running programs.
  • Active objects (each having an execution thread,
    a program counter, a stack...) provide an easy
    way to map the structure of a set of distributed
    running programs into the simulation environment.
  • The simulation engine supports an interrupt
    scheme.
  • This allows correct and efficient simulation of
    concurrent processes with very different time
    scales, by using a DES approach with a continuous
    process flow between events.
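The core of such an engine can be sketched as a time-ordered event queue over which active objects schedule callbacks. This is a minimal hypothetical sketch, not the actual MONARC engine; all class and method names here are invented for illustration:

```python
import heapq

class Engine:
    """Minimal process-oriented DES core: a time-ordered event queue.
    (Hypothetical sketch, not the actual MONARC engine.)"""
    def __init__(self):
        self.now = 0.0      # simulation clock
        self._queue = []    # heap of (time, seq, callback)
        self._seq = 0       # tie-breaker for events at the same time
    def schedule(self, delay, callback):
        self._seq += 1
        heapq.heappush(self._queue, (self.now + delay, self._seq, callback))
    def run(self):
        while self._queue:
            self.now, _, callback = heapq.heappop(self._queue)
            callback()

# Two "active objects" (here reduced to callbacks) scheduled at different times.
log = []
eng = Engine()
eng.schedule(1.0, lambda: log.append(("job A done", eng.now)))
eng.schedule(0.5, lambda: log.append(("job B done", eng.now)))
eng.run()   # events fire in time order: B at t=0.5, then A at t=1.0
```

In the real framework each active object carries its own thread and stack; the queue-of-callbacks form above only illustrates the event ordering between processes.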

7
Tests of the Engine
Processing a TOTAL of 100 000 simple jobs on
1, 10, 100, 1 000, 2 000, 4 000 and 10 000 CPUs,
using the same number of parallel threads.
More tests: http://monalisa.cacr.caltech.edu/MONARC/
8
Basic Components
9
Basic Components
  • These basic components are capable of simulating
    the core functionality of general distributed
    computing systems. They are built on top of the
    simulation engine and make efficient use of the
    interrupt functionality of the active objects.
  • These components should be considered the base
    classes from which specific components can be
    derived and constructed.

10
Basic Components
  • Computing Nodes
  • Network Links and Routers, IO protocols
  • Data Containers
  • Servers
    • Data Base Servers
    • File Servers (FTP, NFS)
  • Jobs
    • Processing Jobs
    • FTP Jobs
  • Scripts / Graph execution schemes
  • Basic Scheduler
  • Activities (a time sequence of jobs)

11
Multitasking Processing Model

Concurrent running tasks share resources (CPU,
memory, I/O). Interrupt-driven scheme: for each
new task, or when one task is finished, an
interrupt is generated and all processing times
are recomputed.
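The recompute-on-interrupt idea can be sketched with a hypothetical helper that assumes equal CPU sharing among all running tasks; at every arrival or completion event the remaining times are recalculated:

```python
def shared_cpu_finish_times(jobs, cpu_power=1.0):
    """Interrupt-driven multitasking model (hypothetical sketch).
    jobs: list of (arrival_time, work). All running tasks share the CPU
    equally; at every arrival or completion the rates are recomputed."""
    pending = sorted(enumerate(jobs), key=lambda x: x[1][0])
    remaining, finish, t = {}, {}, 0.0
    while pending or remaining:
        share = cpu_power / len(remaining) if remaining else 0.0
        next_arrival = pending[0][1][0] if pending else float("inf")
        next_done = t + min(remaining.values()) / share if remaining else float("inf")
        if next_arrival <= next_done:
            # interrupt: a new task arrives
            dt = next_arrival - t
            for j in remaining:
                remaining[j] -= share * dt
            t = next_arrival
            idx, (_, work) = pending.pop(0)
            remaining[idx] = work
        else:
            # interrupt: the task with least remaining work finishes
            dt = next_done - t
            for j in remaining:
                remaining[j] -= share * dt
            t = next_done
            for j in [j for j, w in remaining.items() if w <= 1e-9]:
                finish[j] = t
                del remaining[j]
    return [finish[i] for i in range(len(jobs))]
```

For example, two unit-work jobs arriving together on a unit-power CPU each finish at t = 2, since each receives half the CPU throughout.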
12
LAN/WAN Simulation Model
[Diagram: Nodes on a LAN connect through links and routers to Internet connections and other sites]
Interrupt-driven simulation: for each new message an interrupt is created, and for all the active transfers the speed and the estimated time to complete the transfer are recalculated.
Continuous flow between events! An efficient and realistic way to simulate concurrent transfers having different sizes / protocols.
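The recalculation step can be illustrated with a small hypothetical helper (the function and its fair-share policy are assumptions for illustration): on every interrupt, each active transfer gets an equal share of the link and a fresh estimated completion time:

```python
def recompute_transfers(link_bw, transfers, now):
    """On an interrupt (a transfer starting or finishing), give each active
    transfer an equal share of the link and re-estimate its end time.
    transfers: {name: megabytes_left}; link_bw in MB/s.
    Returns {name: (share_MBps, estimated_end_time)}."""
    share = link_bw / len(transfers)
    return {name: (share, now + left / share) for name, left in transfers.items()}

# A 100 MB transfer alone on a 10 MB/s link is estimated to end at t = 10 s.
est = recompute_transfers(10.0, {"a": 100.0}, now=0.0)
# At t = 2 s a 40 MB transfer starts; "a" has 80 MB left, so both drop to 5 MB/s.
est2 = recompute_transfers(10.0, {"a": 80.0, "b": 40.0}, now=2.0)
```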
13
Output of the simulation
[Diagram: nodes and routers inside the simulation engine publish results through output listener filters to graphics clients, a DB, and log files / Excel]
Any component in the system can generate generic results objects. Any client can subscribe with a filter and will receive only the results it is interested in. This is a VERY SIMILAR structure to MonALISA; we will soon integrate the output of the simulation framework into MonALISA.
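The subscribe-with-a-filter scheme reduces to a small publish/subscribe bus. The class and method names below are hypothetical, not the framework's actual API:

```python
class ResultsBus:
    """Hypothetical sketch of the output scheme: any component publishes
    generic result objects; clients subscribe with a filter predicate and
    receive only the results that match it."""
    def __init__(self):
        self._subscribers = []
    def subscribe(self, predicate, callback):
        self._subscribers.append((predicate, callback))
    def publish(self, result):
        for predicate, callback in self._subscribers:
            if predicate(result):
                callback(result)

# A graphics client listens for CPU results, a log writer for network results.
bus = ResultsBus()
cpu_log, net_log = [], []
bus.subscribe(lambda r: r["type"] == "cpu", cpu_log.append)
bus.subscribe(lambda r: r["type"] == "network", net_log.append)
bus.publish({"type": "cpu", "node": "A", "load": 0.7})
bus.publish({"type": "network", "link": "A-B", "mbps": 12.0})
```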
14
Specific Components
15
Specific Components
  • These components should be derived from the
    basic components and must implement their specific
    characteristics and the way they operate.
  • Major parts:
    • Data Model
    • Data Flow Diagrams for Production and
      especially for Analysis Jobs
    • Scheduling / pre-allocation policies
    • Data Replication Strategies

16
Data Model
  • Generic Data Container:
    • Size
    • Event Type
    • Event Range
    • Access Count
    • Instance

[Diagram: a META DATA Catalog and Replication Catalog map a container to its instances (network files, databases, custom data servers such as an FTP server node, DB server or NFS server), with export / import between sites]
17
Data Model (2)
[Diagram: a Data Processing JOB issues a Data Request; the META DATA Catalog and Replication Catalog resolve it to a Data Container, selecting among the available options, and the JOB receives a list of IO transactions]
18
Data Flow Diagrams for JOBS
Input and output is a collection of data, described by type and range. A process is described by name.
A fine-granularity decomposition of processes which can be executed independently, and of the way they communicate, can be very useful for optimization and parallel execution!

[Diagram: a graph of processing steps (Processing 1 to 4, one stage executed 10x), each consuming inputs and producing outputs that feed the next steps]
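Such a job graph can be executed in dependency order. A sketch using Python's standard graphlib, with a hypothetical dependency map mirroring the diagram (each step lists the steps whose outputs it consumes):

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Hypothetical dependencies: Processing 2 and 3 read the output of
# Processing 1 and could run in parallel; Processing 4 needs both.
deps = {
    "Processing 2": {"Processing 1"},
    "Processing 3": {"Processing 1"},
    "Processing 4": {"Processing 2", "Processing 3"},
}
order = list(TopologicalSorter(deps).static_order())
```

A scheduler would run steps with no unmet dependencies concurrently; `static_order` gives one valid sequential ordering.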
19
Job Scheduling Centralized Scheme
[Diagram: per-site Job Schedulers at Site A and Site B forward jobs to a GLOBAL Job Scheduler, implemented as a dynamically loadable module]
20
Job Scheduling: Distributed Scheme ("market model")

[Diagram: the Job Scheduler at one site sends a Request to the schedulers at Site A and Site B, receives a COST estimate from each, and makes the placement DECISION locally]
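The market-model decision can be sketched as follows; the cost functions here are entirely hypothetical (e.g. queue length plus a data-transfer penalty), chosen only to show the quote-and-decide flow:

```python
def choose_site(job, sites):
    """Market-model sketch: each site quotes a cost for running the job;
    the requesting scheduler picks the cheapest quote."""
    quotes = {name: estimate(job) for name, estimate in sites.items()}
    best = min(quotes, key=quotes.get)
    return best, quotes

# Hypothetical quotes: Site A holds the data locally but has a queue;
# Site B is idle but must transfer the input data.
sites = {
    "Site A": lambda job: 5.0 + 0.00 * job["data_mb"],
    "Site B": lambda job: 1.0 + 0.01 * job["data_mb"],
}
best, quotes = choose_site({"data_mb": 2000.0}, sites)
# For a 2000 MB input, Site A's quote (5.0) beats Site B's (21.0).
```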
21
Computing Models
22
Activities Arrival Patterns

A flexible mechanism to define the stochastic process of how users perform data processing tasks.
Activity tasks are dynamically loaded, threaded objects, controlled by the simulation scheduling mechanism.
Physics Activities Injecting Jobs: each Activity thread generates data processing jobs. These dynamic objects are used to model the users' behavior.
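One simple way to model such an Activity is a Poisson arrival process. This is a hypothetical sketch of the stochastic part only; in the framework itself, Activities are dynamically loaded threaded objects:

```python
import random

def activity_arrivals(rate_per_hour, hours, seed=1):
    """Job submission times for one Activity, modelled as a Poisson
    process: exponential inter-arrival gaps with the given mean rate.
    (Hypothetical sketch; the arrival pattern is an assumption.)"""
    rng = random.Random(seed)   # seeded for reproducible runs
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate_per_hour)
        if t >= hours:
            return times
        times.append(t)

# e.g. analysis jobs submitted at ~10/hour over an 8-hour working day
submissions = activity_arrivals(rate_per_hour=10.0, hours=8.0)
```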
23
Regional Centre Model
  • Complex Composite Object

[Diagram: simplified topology of the Centres, shown as five interconnected sites A to E]
24
Monitoring
25
Real Need for Flexible Monitoring Systems
  • It is important to measure and monitor the key
    applications in a well-defined test environment,
    and to extract the parameters we need for
    modeling.
  • Monitor the farms used today, try to understand
    how they work, and simulate such systems.
  • This requires a flexible monitoring system able
    to dynamically add new parameters and provide
    access to historical data.
  • Interfacing monitoring tools to obtain the
    parameters we need for simulations in a nearly
    automatic way.
  • MonALISA was designed and developed based on the
    experience with these simulation problems.

26
Input for the Data Models
  • We need information related to all the possible
    data types, their expected sizes and
    distributions.
  • Which mechanisms for data access will be used for
    activities like production and analysis:
    • Flat files and FTP-like transfer to the local
      disk
    • Network file system
    • Data base access (batch queries with
      independent threads)
    • ROOT-like file system, Client / Server
    • Web Services
  • To simulate access to hot-spot data in the
    system, we need a range of probabilities for such
    activities.

27
Input for how jobs are executed
  • How is the parallel decomposition of a job done?
    A scheduler using a job description language?
  • Master / slaves model (parallel ROOT)?
  • Centralized or distributed job scheduler?
  • What types of policies should we consider for
    inter-site job scheduling?
  • Which data should be replicated?
  • What are the predefined data replication
    policies?
  • Should we consider dynamic replication / caching
    for (selected) data which are used more
    frequently?

28
Status
  • The engine was tested (performance and quality)
    on several platforms and is working well.
  • We developed all the basic components (CPU,
    Servers, DB, Routers, network links, Jobs, IO
    Jobs) and are now testing/debugging them.
  • A quite flexible output scheme for the simulation
    is now included.
  • Examples made with specific components for
    production and analysis are being tested.
  • A quite general model for the data catalog and
    data replication is under development; it will
    be integrated soon.

29
Still to be done
  • Continue the testing of the Basic Components and
    network servers, and start modeling real farms,
    Web Services, and peer-to-peer systems.
  • Improve the documentation.
  • Improve the graphical output, interface with
    MonALISA, and create a service to extract
    simulation parameters from real systems.
  • Gather information from the current computing
    systems and possible future architectures, and
    start building the Specific Components /
    Computing Models scenarios.
  • Include Risk Analysis in the system.
  • Develop and evaluate different scheduling and
    replication strategies.

30
Summary
  • Modelling and understanding current systems,
    their performance and limitations, is essential
    for the design of large-scale distributed
    processing systems. This will require continuous
    iteration between modelling and monitoring.
  • Simulation and modelling tools must provide the
    functionality to help design complex systems and
    to evaluate different strategies and algorithms
    for the decision-making units and the data flow
    management.

http://monalisa.cacr.caltech.edu/MONARC/