Workflow management within DIET

1
Workflow management within DIET
  • Raphaël Bolze
  • LIP ENS Lyon, CNRS, INRIA Rhône-Alpes
  • GRAAL project
  • http://graal.ens-lyon.fr

2
Introduction
  • Distributed Interactive Engineering Toolbox
  • RPC and grid computing: GridRPC
  • DIET goals
  • DIET environment architecture
  • Request management
  • Research topics features
  • DIET and workflow management
  • Needs
  • Language
  • Architectures
  • Scheduling propose
  • Target applications
  • PipeAlign
  • Docking
  • Robinson
  • Cosmology
  • Current works

3
Distributed Interactive Engineering Toolbox

4
RPC and Grid Computing: GridRPC
  • One simple idea
  • One simple (and efficient) paradigm for grid
    computing offering (or leasing) computational
    power and/or storage capacity through the
    Internet
  • One simple solution implementing the RPC
    programming model over the Grid
  • Using resources accessible through the network
  • Mixed parallelism model (data-parallel model at
    server level and task parallelism between
    servers)
  • Features needed
  • Load-balancing (resource localization and
    performance evaluation, scheduling),
  • Data and replica management,
  • Security,
  • Fault-tolerance,
  • Interoperability with other systems,
  • Design of a standard interface
  • within the GGF/OGF (GridRPC WG, C. Lee)
  • www.ogf.org, forge.gridforum.org/projects/gridrpc-wg
  • Existing implementations: GridSolve, Ninf, DIET,
    XtremWeb

5
RPC and Grid Computing: GridRPC
(Diagram: Client, AGENT(s), servers S1 to S4, and the request Op(C, A, B))
6
DIET's Goals
  • Our goals
  • To develop a toolbox for the deployment of
    environments using the Application Service
    Provider (ASP) paradigm with different
    applications
  • Use as much as possible public domain and
    standard software
  • To obtain a high performance and scalable
    environment
  • Implement and validate our more theoretical
    results
  • Scheduling for heterogeneous platforms, data
    (re)distribution and replication, performance
    evaluation, algorithmics for heterogeneous and
    distributed platforms,
  • Based on CORBA, NWS, LDAP, and our own software
    developments
  • CoRI for performance evaluation,
  • FAST
  • CoRI-easy
  • LogService for monitoring,
  • VizDIET for the visualization,
  • GoDIET for the deployment
  • Several applications in different fields
    (simulation, bioinformatics, cosmology)
  • Release 2.1 available on the web
  • Release 2.2 coming soon

http://graal.ens-lyon.fr/DIET/
7
DIET Environment
CLIENT
8
DIET Architecture
(Diagram: Client, Master Agent (MA), Local Agents (LA), Server Daemons (SeD))
9
Requests Management
estimate() predExecTime()
10
Research Topics
  • Scheduling
  • Distributed scheduling
  • Plug-in schedulers
  • Data-management
  • Scheduling of computation requests and links with
    data-management
  • Replication, data prefetching
  • Deployment
  • Mapping components on available (selected)
    resources
  • Software platform deployment with or without
    dynamic connections between components
  • Performance evaluation
  • Application modeling
  • Dynamic information about the platform (network,
    clusters)
  • Fault Tolerance
  • Failure Detection
  • Application recovery

11
Scheduling
12
DIET Scheduling
  • SeD level
  • Performance estimation function
  • Estimation metric vector (estVector_t) - dynamic
    collection of performance estimation values
  • Performance measures available through DIET
  • FAST-NWS performance metrics
  • Time elapsed since the last execution
  • CoRI (Collector of Resource Information)
  • Developer defined values
  • Standard estimation tags for accessing the
    fields of an estVector_t
  • EST_FREEMEM
  • EST_TCOMP
  • EST_TIMESINCELASTSOLVE
  • EST_FREECPU
  • Aggregation Methods
  • Define how SeD responses are sorted; the method
    is associated with the service and defined at
    SeD level
  • Tunable comparison/aggregation routines for
    scheduling
  • Priority Scheduler
  • Performs pairwise server estimation comparisons
    returning a sorted list of server responses
  • Can minimize or maximize based on SeD estimations
    and taking into consideration the order in which
    the request for those performance estimations was
    specified at SeD level.
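
The priority scheduler described above can be pictured with a minimal C++ sketch. It is an illustration under stated assumptions, not DIET's implementation: `EstVector`, `ServerResponse`, `Priority`, `rankedBefore` and `sortResponses` are hypothetical names, the real `estVector_t` and its accessors are not reproduced, and every response is assumed to report every tag that appears in the priority list. Tags are compared in the order they were declared, each either minimized or maximized, and a later tag only breaks ties left by earlier ones.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for an estimation vector: tag name -> value.
// DIET's real estVector_t and its accessor functions are not used here.
using EstVector = std::map<std::string, double>;

struct ServerResponse {
    std::string server;  // SeD name (illustrative)
    EstVector est;       // estimations reported by that SeD
};

// One priority entry: which tag to compare and in which direction.
struct Priority {
    std::string tag;
    bool maximize;  // true: larger is better; false: smaller is better
};

// Returns true when `a` should be ranked before `b`.
// Priorities are examined in declaration order; a later priority is
// consulted only when all earlier ones tie.
bool rankedBefore(const ServerResponse& a, const ServerResponse& b,
                  const std::vector<Priority>& priorities) {
    for (const Priority& p : priorities) {
        const double va = a.est.at(p.tag);
        const double vb = b.est.at(p.tag);
        if (va == vb) continue;
        return p.maximize ? va > vb : va < vb;
    }
    return false;  // equal on every priority
}

// Pairwise comparisons drive a stable sort, yielding the sorted list of
// server responses.
std::vector<ServerResponse> sortResponses(
        std::vector<ServerResponse> responses,
        const std::vector<Priority>& priorities) {
    std::stable_sort(responses.begin(), responses.end(),
                     [&](const ServerResponse& a, const ServerResponse& b) {
                         return rankedBefore(a, b, priorities);
                     });
    return responses;
}
```

For instance, declaring the priorities as "maximize EST_FREECPU, then minimize EST_TCOMP" puts the server with the most free CPU first and uses the computation time estimate only among servers that report the same free CPU.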

13
DIET Scheduling
  • Collector of Resource Information (CoRI)
  • CoRI-Easy provides basic measurements of the
    environment
  • CoRI Manager manages the use of different
    collectors

Other Collectors like Ganglia
FAST Software
14
Data management
15
Data/replica management
  • Two needs
  • Keep the data in place to reduce the overhead of
    communications between clients and servers
  • Replicate data whenever possible
  • Two approaches for DIET
  • DTM (LIFC, Besançon)
  • Hierarchy similar to DIET's
  • Distributed data manager
  • Redistribution between servers
  • JuxMem (Paris, Rennes)
  • P2P data cache
  • Work done within the GridRPC Working Group (OGF)
  • Relations with workflow management

16
Data management with DTM within DIET
  • Persistence at the server level
  • To avoid useless data transfers
  • Intermediate results
  • Between clients and servers
  • Between servers
  • transparent for the client
  • Data Manager/Loc Manager
  • Hierarchy mapped on the DIET one
  • modularity
  • Proposition to the Grid-RPC WG (OGF)
  • Data handles
  • Persistence flag
  • Data management functions

17
JUXMEM
PARIS project, IRISA, France
  • A peer-to-peer architecture for a data-sharing
    service in memory
  • Persistence and data coherency mechanism
  • Transparent data localization
  • Toolbox for the development of P2P applications
  • Set of protocols
  • One peer
  • Unique ID
  • Several communication protocols (TCP, HTTP, ...)

(Diagram: peers with their Peer IDs, TCP/IP and HTTP links, and firewalls)
18
Deployment and visualization
19
(No Transcript)
20
VizDIET
21
Workflow management

22
Workflow Management: needs?
  • Workflow representation
  • Directed Acyclic Graph (DAG)
  • Each vertex is a task
  • Each directed edge represents communication
    between tasks
  • Questions
  • Ordering problem ?
  • Mapping problem ?
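
The ordering question on this slide can be made concrete with a small C++ sketch. It assumes tasks are numbered from 0 and that each task stores the list of tasks that depend on it; the `Workflow` type and `topologicalOrder` function are hypothetical names chosen for illustration, not part of DIET. The function uses Kahn's algorithm to produce one valid execution order. It says nothing about the mapping question, that is, which server runs each task.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// A workflow as a DAG: tasks are numbered 0..n-1 and successors[u]
// lists the tasks that consume the output of task u.
struct Workflow {
    std::vector<std::vector<std::size_t>> successors;
};

// Kahn's algorithm. Returns the tasks in an order where every task
// appears after all of its predecessors. If the graph contains a cycle,
// the result holds fewer than n tasks.
std::vector<std::size_t> topologicalOrder(const Workflow& wf) {
    const std::size_t n = wf.successors.size();
    std::vector<std::size_t> pending(n, 0);  // unfinished predecessors per task
    for (const auto& succ : wf.successors)
        for (std::size_t v : succ) ++pending[v];

    std::queue<std::size_t> ready;  // tasks whose predecessors are all done
    for (std::size_t u = 0; u < n; ++u)
        if (pending[u] == 0) ready.push(u);

    std::vector<std::size_t> order;
    while (!ready.empty()) {
        const std::size_t u = ready.front();
        ready.pop();
        order.push_back(u);
        for (std::size_t v : wf.successors[u])
            if (--pending[v] == 0) ready.push(v);
    }
    return order;
}
```

A caller can detect an invalid (cyclic) description by checking whether the returned order is shorter than the number of tasks.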

23
Workflow Management goals
  • Goals
  • Build and execute workflow
  • Use different heuristic methods to solve
    scheduling problems
  • Extensibility to address multi-workflow
    submission and large grid platforms
  • Manage heterogeneity and variability of
    environment

24
Workflow Management: existing languages?
  • Workflow languages
  • No standard (XML, scripts)
  • Examples
  • Condor DAGMan script
  • Pegasus DAX (XML)
  • Taverna XScufl (XML)
  • 2 levels of description
  • Abstract application description
  • Concrete execution description

25
Workflow Management
  • Workflow description in DIET
  • XML format
  • DIET profile: problem (id), parameters (in,
    inout, out)
  • Description of tasks and data dependencies
  • <!-- NORMD 2 -->
  • <node id="normd2" path="normd">
  • <in name="in_file" type="DIET_FILE"
    source="rascal1#out_file" />
  • <out name="normd_value" type="DIET_FLOAT" />
  • <out name="srv_time" type="DIET_DOUBLE" />
  • <prec id="rascal1" />
  • </node>
  • <!-- LEON 1 -->
  • <node id="leon1" path="leon">
  • <arg name="protein_name" type="DIET_STRING"
    value="P07942" />

26
Workflow Management architecture
  • 2 Architectures
  • Meta scheduler in the client side
  • Meta scheduler distributed in the client and in
    the MA-DAG

27
Workflow Management Meta scheduler client
  • Architecture 1
  • Meta scheduler in the client side

(Diagram: Client, MA, three LAs, five SeDs)
28
Workflow management Meta scheduler client
  • Disadvantages
  • No coordination between the different clients
  • Depends on client capability
  • Benefits
  • More flexible for evolution
  • Client can use his own algorithm.
  • More scalable, depends on client capability.

29
Workflow management
  • Architecture 2
  • Meta scheduler distributed in the client and in
    the MA-DAG

(Diagram: MA DAG, Client, MA, three LAs, five SeDs)
30
Workflow management - Meta scheduler
  • Base Scheduler
  • No ranking, respect the topological order of the
    DAG
  • HEFT heuristic
  • Flexibility
  • Architecture 1
  • Client can have his own schedule
  • No need to rebuild the platform
  • Architecture 2
  • Schedulers are defined at compile time.
  • The platform must be rebuilt if someone decides
    to change the scheduler.
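
For readers unfamiliar with the HEFT heuristic named on this slide, here is a hedged C++ sketch of its ranking phase only. In HEFT, each task receives an upward rank equal to its average computation cost plus the largest value, over its successors, of the average communication cost plus the successor's upward rank; tasks are then considered in decreasing rank. The types `Edge` and `TaskGraph` and the functions `upwardRank` and `heftPriorityOrder` are hypothetical names, costs are assumed non-negative, the graph is assumed acyclic, and the processor-selection phase (earliest finish time) is deliberately left out. This is not DIET's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

struct Edge {
    std::size_t to;   // successor task
    double commCost;  // average communication cost on this edge
};

struct TaskGraph {
    std::vector<double> compCost;         // average computation cost per task
    std::vector<std::vector<Edge>> succ;  // outgoing edges per task
};

// Upward rank of task u: its own cost plus the most expensive
// (communication + rank) path through any successor. Memoized in `rank`,
// where a negative value means "not computed yet".
double upwardRank(const TaskGraph& g, std::size_t u,
                  std::vector<double>& rank) {
    if (rank[u] >= 0.0) return rank[u];
    double best = 0.0;
    for (const Edge& e : g.succ[u])
        best = std::max(best, e.commCost + upwardRank(g, e.to, rank));
    rank[u] = g.compCost[u] + best;
    return rank[u];
}

// Task indices sorted by decreasing upward rank: the order in which the
// ranking phase hands tasks to the processor-selection phase.
std::vector<std::size_t> heftPriorityOrder(const TaskGraph& g) {
    const std::size_t n = g.compCost.size();
    std::vector<double> rank(n, -1.0);
    for (std::size_t u = 0; u < n; ++u) upwardRank(g, u, rank);

    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) {
                         return rank[a] > rank[b];
                     });
    return order;
}
```

By contrast, the base scheduler on this slide performs no such ranking and simply follows a topological order of the DAG.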

Abstract Workflow Scheduler
virtual void execute()
virtual void reSchedule()
User-defined Scheduler
virtual void execute()
virtual void reSchedule()
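
The class diagram on this slide suggests an abstract scheduler with two virtual methods that a user-defined scheduler overrides. A minimal C++ sketch of that pattern follows. The class names `AbstractWorkflowScheduler` and `LoggingScheduler` and the `runWorkflow` driver are hypothetical, the real DIET class names and signatures may differ, and the comments on what each method does are assumptions drawn from the method names alone.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical interface mirroring the two virtual methods on the slide.
class AbstractWorkflowScheduler {
public:
    virtual ~AbstractWorkflowScheduler() = default;
    virtual void execute() = 0;     // assumed: run the workflow
    virtual void reSchedule() = 0;  // assumed: revise the current plan
};

// Illustrative user-defined scheduler: it only records which methods
// were called, standing in for a real scheduling policy.
class LoggingScheduler : public AbstractWorkflowScheduler {
public:
    void execute() override { log.push_back("execute"); }
    void reSchedule() override { log.push_back("reSchedule"); }
    std::vector<std::string> log;
};

// Driver code that depends only on the abstract interface, so a
// different scheduler can be supplied without changing the driver.
void runWorkflow(AbstractWorkflowScheduler& scheduler, bool platformChanged) {
    if (platformChanged) scheduler.reSchedule();
    scheduler.execute();
}
```

This is the flexibility the slide attributes to Architecture 1: the client supplies its own subclass and nothing on the platform side has to be rebuilt.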
31
Target applications

32
Docking Application
  • Detection of protein-protein and protein-DNA
    interactions.
  • Screening a database containing thousands of
    proteins for functional sites involved in binding
    to other proteins, DNA or ligand targets.

33
PipeAlign Application
  • The sequence-to-function relationship can be
    understood through the analysis of conserved
    patterns and evolution of protein organization
    mainly based on amino acid sequence comparisons
    in the context of the multiple alignments.

(Workflow diagram, tasks: blastall, ballast, filtering, clustalw, rascal, normd, normd, leon, normd)
34
Robinson application
  • This application annotates human genes according
    to their expression in neurological or muscular
    tissues, and also according to the expression of
    their homologs in other species.

35
Cosmology application
  • Simulates the evolution of dark matter particles
    over time in order to compare it with real
    observations.

Centre de Recherche en Astronomie de Lyon
36
Current Work
37
Multi-Workflow
  • Deal with multiple workflow submissions
  • On-line scheduling, different submission times
  • Implement fair scheduling strategies
  • Implement specific scheduling heuristics
  • Distribute the workflow management

38
Multi-Workflow
  • Simulations
  • Real experiments on Grid5000

39
Conclusion
  • DIET
  • Workflow enabled
  • Data management: DTM, JuxMem
  • Performance information: CoRI, FAST
  • Plugin schedulers
  • Multi-Applications

40
Questions ?
http://graal.ens-lyon.fr