Title: Creating and Managing Distributed Scientific Workflows: Techniques and Tools
1Creating and Managing Distributed Scientific Workflows: Techniques and Tools
- Omer F. Rana
- School of Computer Science and Welsh eScience
Centre - Cardiff University, UK
- o.f.rana_at_cs.cardiff.ac.uk
2Thanks to
- David Walker, Ian Taylor, Matthew Shields, Lican Huang, Ali Shaikh Ali at Cardiff
- Bertram Ludaescher at UC Davis
- Cecilia Gomes at UNL-Lisbon
- John Domingue at Open University
- Steve McGough, John Darlington at Imperial
College
Some material contained in this tutorial has been
obtained from the individuals mentioned above.
3Overview
- Introduction to Workflow Techniques
- Constructing and Managing Workflow
- Application Example: Distributed Data Mining using FAEHIM
- Adaptive Workflows
- Workflow-related research themes
4Usage Problem Solving Environments
5Problem Solving Environments
- An envir
- Visual Composition
- Data Flow or Control Flow
- Language based
- Functional Languages
- Abstraction based
- Petri nets, Process (Composition) Algebras
- IRIS Explorer, ADL, Gateway/WebFlow, ARCADE, Mathematica, MatLab etc
- Good survey by Grid Computing Environments group -- see http://www.gridforum.org/
6Workflow
Adapted From Aleksander Slominski
- 70s: Skip Ellis and Gary Nutt (OfficeTalk)
- Xerox PARC Office Automation Systems
- to reduce the complexity of the user's interface to the office information system, control the flow of information, and enhance the overall efficiency of the office. (Ellis, Nutt 1980)
- Representation, Specification, and Automation of Office Procedures (Michael D. Zisman, PhD Thesis, University of Pennsylvania, Wharton School of Business, 1977)
- Often seen as a technique to automate existing processes
- Very popular in the business world
- Over 20 years gap
- Availability of Computer Networks
- Workflow (Business Process) was integral part of
applications
Ellis, C. A., Nutt, G. J.: Office Information Systems and Computer Science. In ACM Computing Surveys, 12 (1980) 1, pp. 27-60.
7Historical Perspective
From Aleksander Slominski
- 65-75: Decompose Applications
- Data And Code Separated
- 75-85: Database Management
- DBMS Used To Share Data
- 85-95: User Interface Management
- UIMS: User Interface Separated
- 95-05: Workflow Management
- Isolate Business Process
- Emerging standards such as those based on the
Service Oriented Architecture
Workflow Management Aalst, van Hee
8Workflow
From Aleksander Slominski
- The automation of a business process, in whole
or part, where documents, information or tasks
are passed from one participant to another to be
processed, according to a set of procedural rules
- Workflow Management Coalition (WfMC)
9WFMS And WF Engine
From Aleksander Slominski
- Workflow Management System (WFMS)
- A system that defines, creates and manages the execution of workflows through the use of software, running on one or more workflow engines, which is able to interpret the process definition, interact with workflow participants and, where required, invoke the use of IT tools and applications.
- Workflow Engine
- A software service or "engine" that provides the run time execution environment for a process instance.
10Workflow Levels
From Aleksander Slominski
- Inside domain
- One unit/organization/Virtual Organization
- Level Up Above
- Multiple Virtual Organizations
- Global Model More dynamic
- Global Model
- Global Process
- Peer-to-Peer
11Categories Of Workflows
From Aleksander Slominski
Scientific
Business Value ?
Repetition ?
Production Workflows (Leymann, Roller)
12Workflow Lifecycle
From Aleksander Slominski
- Design
- Typical workflow is graph oriented
- Language: how expressive is the workflow language?
- GUI: Visual Service Composition Environment
- Deployment
- Workflow Description is sent to Workflow Engine
- Possibly validated and compiled
- Execution
- Workflow Engine enacts Workflow Description
- Monitoring
- Events reflected from workflow and services execution
- Refinement
13Control Flow vs. Data Flow
- Control Flow
- Managed via use of specialist control constructs (conditions may be simple conjunction/disjunction, or more complex operators)
- Unit/component execution managed through these control constructs
- Types
- Transition only
- Switch, flow, while, etc
- Data Flow
- Execution managed via transmission of data
14Dealing with Loops and Conditionals
- Often difficult to achieve; often ignored
- Conditional
- Specified as control-blocks
- Implemented through the use of scripts
- Loops
- Specified as meta-blocks: blocks implemented over sub-workflows
- Implemented through the use of scripts
- Must be supported in the Enactment Engine
15Loops 2
KEPLER
Triana
- Triana and Kepler use specialist Loop components
- Components can be explicit
- Implemented as hidden command
16Loops 3
Init(), Iteration(), isExitLoop(Object data)
(Allows for user defined objects to specify loop exit condition)
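The three-method loop interface named on this slide can be sketched as follows. This is a minimal illustration, not the Triana or Kepler implementation: the class names, the `run` driver and the `max_iterations` guard are assumptions added here, and the loop body stands in for a sub-workflow.

```python
from typing import Any, Callable


class LoopComponent:
    """Hypothetical loop unit: the body callable stands in for a sub-workflow."""

    def __init__(self, body: Callable[[Any], Any], max_iterations: int = 1000):
        self.body = body
        self.max_iterations = max_iterations  # safety guard, an assumption
        self.count = 0

    def init(self) -> None:
        # Reset loop state before the first iteration.
        self.count = 0

    def iteration(self, data: Any) -> Any:
        # Run the sub-workflow once on the current data.
        self.count += 1
        return self.body(data)

    def is_exit_loop(self, data: Any) -> bool:
        # User-defined exit condition, evaluated on the current data object.
        raise NotImplementedError

    def run(self, data: Any) -> Any:
        self.init()
        while not self.is_exit_loop(data) and self.count < self.max_iterations:
            data = self.iteration(data)
        return data


class UntilBelow(LoopComponent):
    """Exit once the data value drops below a threshold."""

    def __init__(self, body, threshold: float):
        super().__init__(body)
        self.threshold = threshold

    def is_exit_loop(self, data: Any) -> bool:
        return data < self.threshold


halver = UntilBelow(lambda x: x / 2, threshold=1.0)
result = halver.run(10.0)
```

Because the exit test receives the data object itself, the same loop unit can be reused with any user-defined type that the condition knows how to inspect.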
17Workflow System Architecture
Composition and Modelling/Analysis
Enactment/Mapping
Execution
18Workflow System Architecture
Composition and Modelling/Analysis
Enactment/Mapping
Execution
Information Service
Data Migration
User Interaction
User Services
Planning Engine
Checkpointing
Scheduling
19Workflow Taxonomy
Workflow System
Workflow design And specification
Scheduling and Enactment
Operational Attributes
Data Management
Component/Service Discovery
structure
composition
Model/spec
20Workflow Composition
Composition
Automated
User Directed
Planner
Graph-based
Language-based
Templates
Petri Net
DAG
Design Patterns
Logic
Markup
UML
Functional
Sub-workflows
Process Calculi
Process Calculi
User defined
Factory
scripting
21Process Markup Languages (http://www.ebpml.org/status.htm)
22BPEL4WS some definitions
- Business Protocol
- Mutually visible message exchange behaviour of each of the parties involved in the protocol without revealing internal behaviour
- Business Process
- Executable Behaviour of a participant (actor) in a business interaction
- Abstract: couple interface definitions with behavioural specifications
- Service Interface
- Set of operations that can be invoked on an actor (participant)
23Abstract vs. Executable
- abstract processes
- public behaviour
- define business protocols
- hide things that do not affect partner
- constrain only the message exchange
- what the possible replies are, not why one is chosen
- executable processes
- private behaviour
- fully define behaviour
- portable between compliant environments
- WS-Choreography
- Defines abstract behaviour
- BPEL_A hides parts that exist in BPEL_B
From Peter Furniss, Choreology
24Main pieces of a BPEL document
- Communication - offering and using web-services
- inbound and outbound
- Partners
- who we do things with
- Variables
- what is communicated
- Correlation sets
- instance identity - how do we know who we are
- Activities
- what is done
- Handlers
- coping with the (slightly) unexpected
- CompensationHandler
From Peter Furniss, Choreology
25BPEL Process
Containers play important part in data exchange,
and can be mapped to each other
From http://www.ebpml.org/bpel4ws.htm
26Communication (1)
- incoming
- <receive partnerLink="purchasing" portType="lns:purchaseOrderPT" operation="sendPurchaseOrder" variable="PO"></receive>
- from whoever is the other role in purchasing
- involving the portType and operation given
- keep the inbound message in our variable PO
- can have other stuff in the body e.g. correlation sets
- used for
- "in" of our wsdl in-out: incoming synchronous request
- "in" of our wsdl in: incoming one-way
- includes semantic reply to an outgoing one-way
From Peter Furniss, Choreology
27Communication (2)
- reply to a synchronous receive
- <reply partnerLink="purchasing" portType="lns:purchaseOrderPT" operation="sendPurchaseOrder" variable="Invoice"/>
- back to whoever is the other role in purchasing
- involving the portType and operation given
- load the outbound message from our variable Invoice
From Peter Furniss, Choreology
28Communication (3)
- using an asynchronous (wsdl in) service
- <invoke partnerLink="invoicing" portType="lns:computePricePT" operation="initiatePriceCalculation" inputVariable="PO"></invoke>
- invoke on whoever is the other role in invoicing
- involving the portType and operation given
- load the outbound message from our variable PO
From Peter Furniss, Choreology
29Communication (4)
- using a synchronous (wsdl in-out) service
- <invoke partnerLink="shipping" portType="lns:shippingPT" operation="requestShipping" inputVariable="shippingRequest" outputVariable="shippingInfo"></invoke>
- to whoever is the other role in shipping
- involving the portType and operation given
- load the outbound message from our variable shippingRequest
- store inbound message in our variable shippingInfo
From Peter Furniss, Choreology
30partnerLinkType
- defines a type of relationship or conversation
- gives role name to a portType
- can tie two opposite portTypes together, as asynchronous conversation pair
- <plnk:partnerLinkType name="shippingLT">
    <plnk:role name="shippingService">
      <plnk:portType name="pos:shippingPT"/>
    </plnk:role>
    <plnk:role name="shippingRequester">
      <plnk:portType name="pos:shippingCallbackPT"/>
    </plnk:role>
  </plnk:partnerLinkType>
From Peter Furniss, Choreology
31partnerLinks
- named instance of a partnerLinkType
- could be multiple partnerLinks of same type
- states which is me and which is him in a conversation
- <partnerLink name="shipping" partnerLinkType="lns:shippingLT" myRole="shippingRequester" partnerRole="shippingService"/>
From Peter Furniss, Choreology
32variables
- typed
- WSDL message
- XML schema simple types
- XML schema element
- manipulation
- set from inbound, to outbound messages
- assign from/to
- other variables
- parts and properties of variables of type wsdl message
- partnerLink endpoints
- simple expressions
- literals
- XPath expressions to get inside complex variables
From Peter Furniss, Choreology
33correlation sets
- one process definition may have lots of instances
- which message is for/from which instance?
- don't require/rely on environment or carrier protocol to identify
- define which fields of the messages distinguish the instance
- e.g. purchase order number, username, their taskid
- fields to be used are declared as properties of a message variable
- properties defined as bits of variable using XPath (or XQuery)
- if there is context id or request id field use that
- this may require bending the wsdl
From Peter Furniss, Choreology
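The routing idea behind correlation sets can be sketched in a few lines. This is only an illustration of the principle, not BPEL engine code: messages are plain dictionaries rather than WSDL messages, properties are read by key rather than by XPath, and the names `CorrelatingEngine`, `ProcessInstance` and the field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessInstance:
    key: tuple
    inbox: list = field(default_factory=list)


class CorrelatingEngine:
    """Routes each message to the process instance its fields identify."""

    def __init__(self, properties):
        # Names of the message fields that make up the correlation set.
        self.properties = list(properties)
        self.instances = {}

    def correlation_key(self, message: dict) -> tuple:
        # In BPEL the properties would be extracted with XPath;
        # here they are simply looked up by name.
        return tuple(message[name] for name in self.properties)

    def deliver(self, message: dict) -> ProcessInstance:
        key = self.correlation_key(message)
        instance = self.instances.get(key)
        if instance is None:
            # First message with this key starts a new instance.
            instance = ProcessInstance(key)
            self.instances[key] = instance
        instance.inbox.append(message)
        return instance


engine = CorrelatingEngine(["orderNumber", "username"])
engine.deliver({"orderNumber": 17, "username": "ann", "body": "purchase order"})
engine.deliver({"orderNumber": 42, "username": "bob", "body": "purchase order"})
engine.deliver({"orderNumber": 17, "username": "ann", "body": "shipping notice"})
```

The engine never consults the transport layer: the message content alone decides which instance receives it, which is the point made in the bullets above.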
34activities
- structured activities: can contain other activities
- <sequence>: one after the other
- <flow>: in parallel
- <pick>: choose by inbound message
- <switch>: choose by expression evaluation
- <while>: iteration
- <scope>: nest, with declarations and handlers, synchronize
- communication
- <invoke>: send msg to partner, possibly receive response
- <receive>: accept msg from partner
- <reply>: send msg to partner as response to <receive>
- other
- <assign>: manipulate variables
- <wait>: for duration / until time
- <terminate>: end the process
- <compensate>: run compensation handler of inner scope
- <throw>: exit with fault to outer scope
- <empty>: do nothing
From Peter Furniss, Choreology
35BPEL Activity
From http://www.ebpml.org/bpel4ws.htm
36links
- support Directed Activity Graph style
- activities can be source and target of links
- activity with target links does not run till source completes normally
- links can cross structured activity boundaries
- Why links AND structure?
- BPEL is merge of two specifications and approaches
From Peter Furniss, Choreology
37handlers
- handlers are declared for process or scope
- watching for the lifetime of the scope
- eventHandler
- onMessage: a <receive> that could happen any time
- onAlarm: time dependent
- faultHandler
- catches <throw>n or generated fault
- scope exits abnormally
- watching after the scope has exited normally
- compensationHandler
From Peter Furniss, Choreology
38compensationHandler
- compensationHandler is installed when scope ends
- triggered from faultHandler or compensationHandler of enclosing scope
- uninstalled when process ends
- BUT
- whole process can have a compensationHandler
- triggered by unspecified means
- uninstalled when unspecified
From Peter Furniss, Choreology
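The install-then-trigger lifecycle of compensation handlers can be sketched as follows. This is a simplified model, not BPEL: each step plays the role of an inner scope, the `Scope` class and the booking example are hypothetical, and running the handlers in reverse order of completion is the convention assumed here.

```python
class Scope:
    """Enclosing scope that records compensation handlers of completed steps."""

    def __init__(self):
        self.installed = []  # (name, compensation) of steps that ended normally
        self.log = []

    def run(self, steps):
        try:
            for name, action, compensation in steps:
                action()
                self.log.append("done:" + name)
                # The handler is installed only once the step has completed.
                self.installed.append((name, compensation))
        except Exception as fault:
            # Fault handler of the enclosing scope triggers compensation,
            # most recently completed step first (an assumption of this sketch).
            self.log.append("fault:" + str(fault))
            while self.installed:
                name, compensation = self.installed.pop()
                compensation()
                self.log.append("compensated:" + name)
            return False
        return True


bookings = set()


def decline():
    raise RuntimeError("card declined")


scope = Scope()
ok = scope.run([
    ("flight", lambda: bookings.add("flight"), lambda: bookings.discard("flight")),
    ("hotel", lambda: bookings.add("hotel"), lambda: bookings.discard("hotel")),
    ("payment", decline, lambda: None),
])
```

The failed payment step never installs a handler, so only the two completed bookings are undone.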
39XPDL from WfMC
- One of the oldest workflow languages
- Transition-based (with guarded transitions)
- Activities are related to form a control flow
- Activity: unit of work, which will be processed by a combination of resource and/or computer application
- Activity can be manual (user) or automatic (application)
- Automatic activity: invoke or receive
- Supports subflow
- In/Out parameters for data exchange
- No support for faults or exception handling
- Participant types
- Resource set, Resource, Organisational Unit, Role, Human or System
- No support for long running processes
40Data Type Definitions
- Standard types String and Number
- User defined types
- Type match undertaken manually
- Namespaces and XML Schema
- Identify reference to a namespace via a URI
41Enactment Strategies I
- Centralised Enactor
- Single graph coordinated through a centralised enactor
- The enactor manages execution of components in some sequence
- Distributed Enactors
- Graph divided into sub-graphs and handed to different enactors
- Each enactor responsible for executing local graph
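A centralised enactor can be sketched as a single loop that walks the workflow graph in dependency order. This is a minimal illustration under stated assumptions: the graph is a dictionary mapping each node to its predecessors, predecessor results are passed to a component in sorted-name order, and the function name `enact` and the example components are hypothetical. It uses `graphlib.TopologicalSorter` from the Python standard library (3.9 or later).

```python
from graphlib import TopologicalSorter


def enact(graph, components):
    """Run every component once, after all of its predecessors.

    graph:      node name -> set of predecessor node names
    components: node name -> callable taking the predecessors' results
    """
    results = {}
    order = []
    for node in TopologicalSorter(graph).static_order():
        # Predecessor outputs are passed in sorted-name order (an assumption).
        args = [results[p] for p in sorted(graph.get(node, ()))]
        results[node] = components[node](*args)
        order.append(node)
    return results, order


graph = {
    "load": set(),
    "scale": {"load"},
    "offset": {"load"},
    "sum": {"offset", "scale"},
}
components = {
    "load": lambda: 3,
    "scale": lambda x: x * 2,
    "offset": lambda x: x + 1,
    "sum": lambda a, b: a + b,
}
results, order = enact(graph, components)
```

A distributed variant would hand sub-graphs of `graph` to separate enactors, each running the same loop over its local nodes and exchanging only the results that cross sub-graph boundaries.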
42Enactment Strategies II
- Event-based
- Each component on completion generates an event
- Use of publish-subscribe mechanism
- Each component also activated through the generation of an event
- Can have multiple event types
- Blackboard/Shared memory
- Component/Enactor writes to a shared space
- Monitored by components/enactor
- Blocks on availability of particular data items in shared space
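The event-based strategy can be sketched with a small in-process publish-subscribe bus. This is an illustration only: the `EventBus` class, the `component` helper and the event type names are hypothetical, and a real system would deliver events across the network rather than through a local queue.

```python
from collections import defaultdict, deque


class EventBus:
    """Minimal publish-subscribe mechanism with a FIFO event queue."""

    def __init__(self):
        self.subscribers = defaultdict(list)
        self.queue = deque()

    def subscribe(self, event_type, handler):
        self.subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        self.queue.append((event_type, payload))

    def run(self):
        # Dispatch until no component has anything left to announce.
        while self.queue:
            event_type, payload = self.queue.popleft()
            for handler in self.subscribers[event_type]:
                handler(payload)


def component(bus, listens_to, emits, func):
    """Activate func on one event type and publish its result as another."""
    def handler(payload):
        bus.publish(emits, func(payload))
    bus.subscribe(listens_to, handler)


bus = EventBus()
trace = []
component(bus, "data.ready", "data.filtered", lambda xs: [x for x in xs if x > 0])
component(bus, "data.filtered", "data.summed", sum)
bus.subscribe("data.summed", trace.append)

bus.publish("data.ready", [3, -1, 4])
bus.run()
```

No component refers to another by name; the workflow structure exists only in the event types each one listens to and emits.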
43Enactment for Automated Composition
- Enactment engine enlists use of other components
- Discovery Service
- Planning Engine
- Enactment is goal-oriented
- Define requirement, rather than components
- Conflict detection support
- Mechanisms to choose between alternatives (constraints)
- Difficult to do in practice
44Workflow ( Enactment)
From Aleksander Slominski
Launch, configure And control
Orchestration Service
Workflow Engine
Workflow Instance
Workflow Instance
Workflow Instance
Resource layer: 1000s of PCs -> massive supercomputers and data sources
Network
45Scientific Workflows
- What makes it different (how it is applied)?
- Support for large data flows
- Need to do parameterized execution of large number of jobs
- Need to monitor and control workflow execution including ad-hoc changes
- Need to execute in dynamic environment where resources are not known a priori and may need to adapt to changes
- Hierarchical execution with sub-workflows created and destroyed when necessary
- Science Domain specific requirements.
- Triana
- Taverna/SCUFL
- GridAnt
- Condor DAG
- CoG DAG
- SWFL
- BioOpera
- BPEL4WS
- OASIS WSBPEL
- YAWL
- GSFL
- etc
- Origin (?)
- Problem Solving Environments
- (MatLab, Mathematica, SciRun, NetSolve, Ninf,
Nimrod etc)
http://www.nesc.ac.uk/action/esi/contribution.cfm?Title=303
http://www.extreme.indiana.edu/swf-survey/
Problems with Predictability
Workflow World
46Enactment Engines
- Employ a variety of techniques for enactment
- Some integrated with a Portal; others based on a command line interface (some also provide a scripting language)
- Generally for constructing graphs; others also support execution of components within a graph
- Support for third-party services
- Monitoring, Registry, etc
47Why Wrap Legacy Codes as Components?
- Pre-existing codes, mostly in C or Fortran
- Generally domain-specific
- Hard to re-use in other applications
- They are still useful
- They are often large, complex monoliths with little structure.
- Support Re-use
- Support Remote Execution
- Support Remote Discovery
- Support Remote Data Input/Output
Re-write? - try convincing App Scientists
48Wrapping Approaches
Similar name in DBs, but different approach
- Wrapping executables - As-Is Approach
- No source available (or provided)
- Maintain execution environment
- Wrapping Source - Source-Update Approach
- Some source provided (generally I/O)
- Executable can relinquish some control
- Data type conversions
- Source split Wrapping - Unit-Mapping Approach
- Split source into units -- wrap units
- Maintain unit execution environment, overall manager
- Application Supported Wrapping - App-Wrap
- Steering support
- Data management support
49Wrapping Approaches
- Provide Isolation between existing code, in its present form, and need to re-use and execute code remotely
- Enable properties of code to be specified (in terms, perhaps of its interface), to enable a discovery mechanism to utilise in, say, a particular application.
- Sustain performance, correctness of results, ownership, and availability
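The As-Is approach, where no source is available and the executable runs unchanged, can be sketched as a thin wrapper around a subprocess. This is an assumption-laden illustration: the `WrappedExecutable` class and its `description` metadata are hypothetical, and a one-line Python script stands in for the legacy C or Fortran binary so that the example is runnable anywhere.

```python
import subprocess
import sys


class WrappedExecutable:
    """Treats a legacy executable as a component: text in, text out."""

    def __init__(self, command, description):
        self.command = list(command)
        # Interface metadata a discovery service could index (hypothetical).
        self.description = description

    def run(self, stdin_text="", timeout=30):
        # The executable keeps its own execution environment; the wrapper
        # only feeds standard input and collects standard output.
        completed = subprocess.run(
            self.command,
            input=stdin_text,
            capture_output=True,
            text=True,
            timeout=timeout,
            check=True,
        )
        return completed.stdout


# Stand-in for a legacy code: sums the numbers it reads from standard input.
legacy_stub = [
    sys.executable,
    "-c",
    "import sys; print(sum(float(t) for t in sys.stdin.read().split()))",
]
summer = WrappedExecutable(
    legacy_stub,
    description={"name": "summer", "input": "numbers", "output": "float"},
)
output = summer.run("1 2 3.5")
```

The other approaches on the slide trade this isolation for finer control: updating I/O source lets the wrapper convert data types directly, and splitting the source into units lets each unit be wrapped and scheduled separately.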
50Automating Wrapping
- Time consuming and error prone process
- Automate the implementation of interfaces to access code
- via a system wide data model
- Automate interactions between wrapped components
- via a discovery service - registry to a more complicated lookup service
- Can have
- same interface, different implementation
51Component Model and Extensions
Existing Code
52Component Model and Extensions
Existing Code
53Component Model and Extensions
<pse-def>
  <preface>
    <name alt="MD1" id="MD01">MDComponent</name>
    <pse-type>Molecular Dynamics</pse-type>
    <component-directory>/home/scmlm1/wgen/Component</component-directory>
    <legacy-code>/home/scmlm1/md/moldyn</legacy-code>
    <ORB-Compiler>idl2java</ORB-Compiler>
    <processors>8</processors>
    <host-name>sapphire.cs.cf.ac.uk</host-name>
  </preface>
  <outports>
    <outportnum>6</outportnum>
    <outport id="1">int</outport>
    <outport id="2">float</outport>
    <outport id="3">float</outport>
    <outport id="4">float</outport>
    <outport id="5">float</outport>
    <outport id="6">float</outport>
    <href name="file:/home/scmlm1/wgen/Component/output.data" value="output" />
  </outports>
</ports>
XML Data Model
Existing Code
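A component descriptor of this kind can be read with a standard XML parser. The sketch below is an illustration, not the original tooling: the descriptor is a trimmed, well-formed variant of the fragment on the slide, the enclosing `<ports>` element is a guess at the elided structure, and the `load_component` function is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed variant of the slide's fragment; the <ports> wrapper
# and the closing </pse-def> are assumptions about the elided parts.
DESCRIPTOR = """
<pse-def>
  <preface>
    <name alt="MD1" id="MD01">MDComponent</name>
    <pse-type>Molecular Dynamics</pse-type>
    <legacy-code>/home/scmlm1/md/moldyn</legacy-code>
    <processors>8</processors>
    <host-name>sapphire.cs.cf.ac.uk</host-name>
  </preface>
  <ports>
    <outports>
      <outportnum>6</outportnum>
      <outport id="1">int</outport>
      <outport id="2">float</outport>
      <outport id="3">float</outport>
      <outport id="4">float</outport>
      <outport id="5">float</outport>
      <outport id="6">float</outport>
    </outports>
  </ports>
</pse-def>
"""


def load_component(xml_text):
    """Extract the fields a wrapper generator or registry would need."""
    root = ET.fromstring(xml_text)
    preface = root.find("preface")
    outports = root.find("ports/outports")
    types = [port.text.strip() for port in outports.findall("outport")]
    declared = int(outports.findtext("outportnum").strip())
    if declared != len(types):
        raise ValueError("outportnum does not match the number of outport entries")
    return {
        "name": preface.findtext("name").strip(),
        "id": preface.find("name").get("id"),
        "executable": preface.findtext("legacy-code").strip(),
        "processors": int(preface.findtext("processors")),
        "host": preface.findtext("host-name").strip(),
        "out_types": types,
    }


component = load_component(DESCRIPTOR)
```

Checking the declared port count against the listed ports is one example of the validation a system-wide data model makes possible before any code is wrapped.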
54Component Model and Extensions
Existing Code
External Control Input (for Steering)
55Component Model and Extensions
Data Manager
Existing Code
Runtime support
56Component Model and Extensions
Data Manager
Existing Code
Runtime support
Execution Rules
57Promoter Identification Workflow
Source Matt Coleman (LLNL)
58Source NIH BIRN (Jeffrey Grethe, UCSD)
59Ecology GARP Analysis Pipeline for Invasive
Species Prediction
Source NSF SEEK (Deana Pennington et al., UNM)
60http://www.gridlab.org/
http://www.trianacode.org/
61GridLab Implementation
http://www.trianacode.org/
GAP Interface
GAT
Gridlab Services
JXTAServe
P2PS
WSPeer
JXTA
Sockets
Web Services
OGSA Services
62Java GAT Prototype
GAP (Java Prototype)
- Advertising
- Discovery
- Communication
OGSA (planned)
Jxta
Web Services
P2PS
And more..
Jxtaserve
GSI Enabled
NS-2
Job Submission (GRMS)
- Generic Job Submission
- Virtual filename data access
- Set of generic Java interfaces
- high level abstractions to Grid services
- Factory design dynamic pluggable services
Data Management
GridLab GAT (www.gridlab.org)
63Triana Architecture
- Plug-in Applications
- flexible: apps can use Triana in various ways, as a
- GUI
- remote control GUI
- or in full inc. GAP/GAT
Triana Engine
TCS
Command Service Control
3rd Party Application
Triana TaskGraph Reader
Triana Command Reader
Triana TaskGraph Writer
Triana Command Writer
XML Reader
WSFL Reader
TCom Reader
Other Reader
XML Writer
Other Writer
TCom Writer
Other Writer
3rd Party Application
Interactive
Applications Insert Points
Interactive/Offline
Communication Channels
64Triana Distributed Work-flow
Triana Service Engine
Triana Service Engine
Action Commands
Workflow, e.g. BPEL4WS
Network
- Distributed Triana Work-flow
- flexible distribution based around Triana Groups
- HPC and Pipelined distribution
Triana Controlling Service (TCS)
Triana Service Engine
Triana Engine
Other Engine
Triana Gateway
65Distributing Triana Taskgraphs
- Mapping tasks or groups of tasks to resources
- Two stages
- Taskgraph annotation, XML definition for each task or group of tasks
- extended to specify resources and message channels
- Data distribution, annotated sub-sections of taskgraph passed to resources
66Custom Distribution
- Distribution units are standard Triana tools,
enabling users to create their own custom
distributions
67Remote Deployment
- User can distribute any task or group of tasks (sub-workflow)
- Using the GAP Interface, Triana automatically launches a remote service providing that sub-workflow.
- Input, Output and Control Pipes are connected using the current GAP binding (e.g. JXTA Pipes)
68Deploying and Connecting To Remote Services
- Running services are automatically discovered via the GAP Interface, and appear in the tool tree
- User can drag remote services onto the workspace and connect cables to them like standard tools (except the cables represent actual JXTA/P2PS pipes)
Remote Services
69Web Service Discovery 1
- Triana allows users to query UDDI repositories
- Alternatively, users can import services directly
from WSDL
70Web Service Discovery 2
- Discovered/Imported Web Services are converted into Triana tools
- (service name -> tool name)
- (input message parts -> in nodes)
- (output message parts -> out nodes)
- etc
- Web Service tools are displayed in the user's Tool Tree (alongside local tools)
71Connecting Workflows
- Web Service tools can be dropped onto the user's workspace and connected like local tools
- A workflow can contain both local and Web Service tools
72Complex Data Types
- Users can build their own interface for creating/mediating between complex types
- Alternatively, Triana can dynamically generate an interface from the WSDL2Java generated bean class
73GEMSS Maxillo-facial Surgery Simulation
74GEO 600 Inspiral Search
- Background
- Compact binary stars orbiting each other in a close orbit
- among the most powerful sources of gravitational waves
- As the orbital radius decreases a characteristic chirp waveform is produced - amplitude and frequency increase with time until eventually the two bodies merge together
- Computing
- Need 10 Gigaflops to keep up with real time data (modest search..)
- Data: 8kHz in 24-bit resolution (stored in 4 bytes) -> Signal contained within 1 kHz: 2000 samples/second
- divided into chunks of 15 minutes in duration (i.e. 900 seconds): 8MB
- Algorithm
- Data is transmitted to a node
- Node initialises i.e. generates its templates (around 10000)
- fast correlates its templates with data
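The chunk size and the per-node template search can be illustrated with a toy calculation. This sketch makes several assumptions: each of the 2000 samples per second is taken to occupy 4 bytes, the correlation is a direct time-domain sliding product rather than the fast (FFT-based) correlation a real search would use, and the function names and tiny templates are hypothetical.

```python
SAMPLE_RATE_HZ = 2000      # signal band limited to 1 kHz
CHUNK_SECONDS = 15 * 60    # 900 seconds per chunk
BYTES_PER_SAMPLE = 4       # 24-bit samples stored in 4 bytes (assumed here too)

# 2000 * 900 * 4 bytes, i.e. about 7.2 million bytes per chunk,
# the same order as the figure quoted on the slide.
chunk_bytes = SAMPLE_RATE_HZ * CHUNK_SECONDS * BYTES_PER_SAMPLE


def correlate(data, template):
    """Sliding inner product of the template against the data."""
    n = len(template)
    return [
        sum(d * t for d, t in zip(data[i:i + n], template))
        for i in range(len(data) - n + 1)
    ]


def best_match(data, templates):
    """Return (template name, offset, score) of the strongest correlation."""
    best = None
    for name, template in templates.items():
        scores = correlate(data, template)
        peak = max(scores)
        if best is None or peak > best[2]:
            best = (name, scores.index(peak), peak)
    return best


data = [0, 0, 1, 2, 3, 0, 0]
templates = {"rise": [1, 2, 3], "fall": [3, 2, 1]}
match = best_match(data, templates)
```

With around 10000 templates per node, the inner loop over templates is what dominates the cost, which is why each node generates its templates once at initialisation and reuses them for every chunk it receives.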
75Coalescing Binary Search
76Triana Prototype
GEO 600 Coalescing Binary Search
77Coalescing Binary Scenario
Controller
Email, SMS notification
Logical File Name
GW Data Distributed Storage
GAT (GRMS, Adaptive)
GW Data
- Submit Job
- Optimised Mapping
GAT (Data Management)
CB Search
Gridlab Test-bed
78Discovery Net Workflow
- Workflow Construction
- Integrate information resources/software applications cross-domain
- Warehousing workflows for scientists
- Manage discovery processes within an organisation
- Construct an enterprise process knowledge bank
- Deployment workflow to scientists
- Turn workflows into reusable applications/services
79An Integrative Analysis Example
80The KEPLER/Ptolemy II GUI (Vergil)
Directors define the component interaction
execution semantics
Large, polymorphic component (Actors) and Directors libraries (drag and drop)
81Actor-Oriented Design
What flows through an object is sequential
control (cf. CCA, MPI)
class name
data
methods
call
return
What flows through an object is a stream of data
tokens (in SWFs/KEPLER also references!!)
- Actor/Dataflow orientation
actor name
data (state)
parameters
Input data
Output data
ports
Source Edward Lee et al. http://ptolemy.eecs.berkeley.edu/
82Object-Oriented vs.Actor-Oriented Interfaces
Object Oriented
OO interface gives procedures that have to be invoked in an order not specified as part of the interface definition.
AO interface definition says "Give me text and I'll give you speech"
Source Edward Lee et al. http://ptolemy.eecs.berkeley.edu/
83Ptolemy II Actor-Oriented Modeling
- Director acts as an enactor
- In this instance, interaction semantics are not maintained within a component
- This is equivalent to having a centralised enactor
- Different directors for different modeling and execution needs
- Hence, a variety of directors can operate on the same components
- Better abstraction, modeling, component reuse
84Behavioral Polymorphism in Ptolemy
These polymorphic methods implement the
communication semantics of a domain in Ptolemy
II. The receiver instance used in communication
is supplied by the director, not by the
component. (cf. CCA, WS-??, GBPL4??, !)
IOPort
Behavioral polymorphism is the idea that
components can be defined to operate with
multiple models of computation and multiple
middleware frameworks.
consumer
producer
actor
actor
Receiver
Source Edward Lee et al. http://ptolemy.eecs.berkeley.edu/
85Component Composition Interaction
- Components linked via ports
- Dataflow (and msg/ctl-flow)
- Where is the component interaction semantics defined??
- each component is its own director!
- But still useful for special applications, e.g. parallel programs (MPI, ...)
Source GRIST/SC4DEVO workshop, July 2004, Caltech
86Data/Control-Flow Spectrum
message passing, control flow
clean data(ctl)-flow
special tokens flow
- Data (tokens) flow
- (almost) no other side effects
- WYSIWYG (usually)
- References flow
- token reference type may be http-get, ftp-get, hsi put
- generic handling still possible
- Application specific tokens flow
- e.g. current Nimrod job management in Resurgence
- invisible contract between components
- Director is unaware of what's going on (sounds familiar? -)
- Specific message passing protocols (e.g., CSP, MPI)
- for systems of tightly coupled components
87Domains and Directors Semantics for Component
Interaction
- CI: Push/pull component interaction
- CSP: concurrent threads with rendezvous
- CT: continuous-time modeling
- DE: discrete-event systems
- DDE: distributed discrete events
- FSM: finite state machines
- DT: discrete time (cycle driven)
- Giotto: synchronous periodic
- GR: 2-D and 3-D graphics
- PN: process networks
- SDF: synchronous dataflow
- SR: synchronous/reactive
- TM: timed multitasking
For (finer-grained) concurrent jobs!?
For (coarse grained) Scientific Workflows!
Source Edward Lee et al. http://ptolemy.eecs.berkeley.edu/
88Polymorphic Actor Components Working Across Data
Types and Domains
- Actor Data Polymorphism
- Add numbers (int, float, double, Complex)
- Add strings (concatenation)
- Add complex types (arrays, records, matrices)
- Add user-defined types
- Actor Behavioral Polymorphism
- In dataflow, add when all connected inputs have data
- In a time-triggered model, add when the clock ticks
- In discrete-event, add when any connected input has data, and add in zero time
- In process networks, execute an infinite loop in a thread that blocks when reading empty inputs
- In CSP, execute an infinite loop that performs rendezvous on input or output
- In push/pull, ports are push or pull (declared or inferred) and behave accordingly
- In real-time CORBA, priorities are associated with ports and a dispatcher determines when to add
By not choosing among these when defining the
component, we get a huge increment in component
re-usability. But how do we ensure that the
component will work in all these circumstances?
Source Edward Lee et al. http://ptolemy.eecs.berkeley.edu/
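Both kinds of polymorphism can be shown with one small actor and two directors. This is a sketch of the idea only and does not use the Ptolemy II API: the class names, the `step` method and the representation of ports as lists of queued tokens are all assumptions made for illustration.

```python
class AddActor:
    """Adds the tokens it is handed; it knows nothing about scheduling."""

    def fire(self, tokens):
        # Data polymorphism: works for any token type that supports "+".
        total = tokens[0]
        for token in tokens[1:]:
            total = total + token
        return total


class DataflowDirector:
    """Fires the actor only when every input port holds a token."""

    def step(self, actor, ports):
        if all(ports.values()):
            return actor.fire([queue.pop(0) for queue in ports.values()])
        return None  # not ready: nothing is consumed


class DiscreteEventDirector:
    """Fires the actor as soon as any input port holds a token."""

    def step(self, actor, ports):
        ready = [queue.pop(0) for queue in ports.values() if queue]
        return actor.fire(ready) if ready else None


adder = AddActor()
waiting = DataflowDirector().step(adder, {"a": [1], "b": []})
numbers = DataflowDirector().step(adder, {"a": [1], "b": [2.5]})
strings = DataflowDirector().step(adder, {"a": ["ab"], "b": ["cd"]})
eager = DiscreteEventDirector().step(adder, {"a": ["x"], "b": []})
```

The same `AddActor` instance behaves differently under each director because the firing rule lives in the director, which is the separation the slide describes.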
89Directors and Combining Different Component
Interaction Semantics
- Possible app. in SWF
- time-series aware
- parameter-sweep aware
- XY aware
- execution models
Source Edward Lee et al. http://ptolemy.eecs.berkeley.edu/ptolemyII/
90Web Services ? Actors (WS Harvester)
- ? Minute-made (MM) WS-based application integration
- Similarly MM workflow design sharing w/o implemented components
91KEPLER Actors
92FAEHIM
- Use of Web Services composition with distributed services
- Wrap third party services (Mathematica, GNUPlot)
- WEKA Service template
- Triana Workflow
- Services provided by third parties
- WSDL interfaces (avoid use of specialist languages unless really necessary)
- SOAP-based message exchange
- Use of attachments
- Access to local and remote data sets
- Support for data streaming
- Wrapping of existing algorithms (important requirement)
http://users.cs.cf.ac.uk/Ali.Shaikhali/faehim/
93FAEHIM Architecture
98Demo 1
99Demo 2
100Inside the FAEHIM Toolbox
101Usage Overview
Dataset
Classify
Weka Algorithm
URL
ClassifyR
102Classifier
- J48 Classifier
- Class for generating an (un)pruned C4.5 decision tree. For more information, see Ross Quinlan (1993). C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers, San Mateo, CA.
- Operations
- classify( ): Input: DataHandler dataset, String attributeName; output: DataHandler decisionTree
- classifyRemoteDataset( ): Input: String url, String attributeName; output: DataHandler decisionTree
103Clustering Support
- Cobweb Web Service
- Operations
- cluster( ): Input: DataHandler dataset; output: String result
- clusterRemoteInstance( ): Input: String datasetURL; output: String result
- clusterByPercentage( ): Input: DataHandler dataset, int percentage; output: String result
104Graph Plotting Service
- Plotting Web Service
- Operations
- plot3D( ): Input: DataHandler data, String plotType; output: DataHandler graph
- getPlotTypes( ): Input: null; output: String plotTypes
105Registry Usage
Execution Resource
UDDI
Registry of Algorithms
Algorithm 1
Algorithm 1
Algorithm 2
Web Service
106Classifier ... 2
- Classifier Template
- This Web service implements a complete list of classifiers, i.e. trees, rules, functions etc.
- Operations
- classifyInstance(): Input: DataHandler dataset, String classifierName, String options, String attributeName; output: String result
- classifyRemoteInstance(): Input: String datasetURL, String classifierName, String options, String attributeName; output: String result
- getClassifiers( ): Input: null; output: String listOfClassifiers
- getOptions(): Input: String classifierName; output: String listOfApplicableOptions
107Parallel Execution
Resource Allocation Manager
UDDI
Registry of Algorithms
Algorithm 1
Algorithm 1
Algorithm 2
Algorithm 1
Algorithm 1
Algorithm 1
Algorithm 1
Execution Resource
Execution Resource
Execution Resource
Execution Resource
108Distributed Workflow
Community
Performance Info.
WF Enactor
Service Provider
Manager
Registry
109Workflow Optimisation
- Types of workflow optimisation
- Through service selection
- Through workflow re-ordering
- Through exploitation of parallelism
- When is optimisation performed?
- At design time (early binding)
- Upon submission (intermediate binding)
- At runtime (late binding)
110Workflow Partitioning (Pegasus)
- Full Graph vs Partial Graph Scheduling
- Schedule
- Total workflow Graph
- Sub-graph
- Each node
111Service Binding Models
- Late binding of abstract service to concrete service instance means
- We use up-to-date information to decide which service to use when there are multiple semantically equivalent services
- We are less likely to try to use a service that is unavailable.
112Late Binding Case
- Search registry for all services that are consistent with abstract service description.
- Select optimal service based on current information, e.g. host load, etc.
- Execute this service.
- Does not take into account time to transfer inputs to the service.
- In early and late binding cases we can optimise overall workflow.
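The search and select steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the registry is a plain list, `ServiceInstance` and its fields are hypothetical, and "optimal" is taken to mean lowest current host load.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceInstance:
    name: str
    implements: str    # abstract service description this instance satisfies
    host_load: float   # current load, lower is better (assumed metric)
    available: bool


def find_candidates(registry, abstract_service):
    """Search step: available instances consistent with the abstract service."""
    return [s for s in registry
            if s.implements == abstract_service and s.available]


def select_service(registry, abstract_service):
    """Select step: pick the candidate with the lowest current host load."""
    candidates = find_candidates(registry, abstract_service)
    if not candidates:
        raise LookupError(f"no available service implements {abstract_service!r}")
    return min(candidates, key=lambda s: s.host_load)


registry = [
    ServiceInstance("classifier-a", "Classifier", 0.90, True),
    ServiceInstance("classifier-b", "Classifier", 0.20, True),
    ServiceInstance("classifier-c", "Classifier", 0.05, False),
    ServiceInstance("cluster-a", "Clusterer", 0.10, True),
]
chosen = select_service(registry, "Classifier")
print(chosen.name)
```

The selection key here ignores input transfer time, which matches the limitation noted on the slide; an estimate of transfer cost could be added to the key if such information were available.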
113WOSE Architecture
Work at Cardiff has focused on implementing a
late binding model for dynamic service discovery,
based on a generic service proxy, and service
discovery and optimisation services.
114Service Discovery Issues
- Service discovery and optimisation is based on service metadata.
- Could store in a database.
- Could obtain by interrogating service.
115Optimisation by Re-Ordering
- Work at Imperial has looked at static optimisation
- Optimise the runtime execution of workflow before it is executed
- Achieves the goal through
- Re-ordering of components
- Addition of components
- Substitution of components
- Pruning of the workflow
- Performance and workflow aware Scheduling
- Runtime Optimisation
- through monitoring, check-pointing and migration
116Component Manipulation
- Re-ordering: Workflows (often composed from composite workflows) may contain non-optimal ordering of components
- Use re-ordering to improve performance
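One way to picture re-ordering is with a toy cost model. The sketch below assumes that the components commute (any order gives the same result), that each has a known cost per item and a selectivity (fraction of items it passes on), and that a brute-force search over orderings is affordable. None of these assumptions come from the original work.

```python
from itertools import permutations


def pipeline_cost(order, n_items):
    """Total cost of running the components in the given order."""
    total = 0.0
    for _name, cost_per_item, selectivity in order:
        total += n_items * cost_per_item
        n_items *= selectivity   # fewer items reach the next component
    return total


def best_order(components, n_items):
    """Try every ordering and keep the cheapest one."""
    return min(permutations(components),
               key=lambda order: pipeline_cost(order, n_items))


# (name, cost per item, selectivity): hypothetical figures
components = [("transform", 10.0, 1.0), ("filter", 1.0, 0.25)]
best = best_order(components, 1000)
print([name for name, _, _ in best], pipeline_cost(best, 1000))
```

Running the cheap filter first means the expensive transform sees only a quarter of the items, which is the kind of gain re-ordering aims for.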
117Component Addition
- Addition: For a component requiring a specific format of data as input, a transformer component could be added to achieve the desired format.
- Allows more optimal components to be used together
[Diagram: C 1 (output in LP format), LP to MPS, C 2 (input required in MPS format)]
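A minimal sketch of component addition, assuming a linear pipeline and a lookup table of known transformers. The `Component` type and the `TRANSFORMERS` table are hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Component:
    name: str
    input_format: Optional[str]
    output_format: Optional[str]


# Known format converters, keyed by (from, to); assumed for illustration.
TRANSFORMERS = {("LP", "MPS"): Component("LP to MPS", "LP", "MPS")}


def add_transformers(pipeline):
    """Insert a transformer wherever adjacent formats do not match."""
    result = []
    for comp in pipeline:
        if result:
            produced = result[-1].output_format
            needed = comp.input_format
            if produced != needed:
                if (produced, needed) not in TRANSFORMERS:
                    raise ValueError(f"no transformer from {produced} to {needed}")
                result.append(TRANSFORMERS[(produced, needed)])
        result.append(comp)
    return result


c1 = Component("C1", None, "LP")
c2 = Component("C2", "MPS", None)
print([c.name for c in add_transformers([c1, c2])])
```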
118Component Substitution
- Substitution
- A Jacobi Iteration linear solver replaced by Conjugate Gradient linear solver according to the output of the Discretizer (FEM)
- Based on observing the meta-data associated with previous components
[Diagram: FEM, A (sparse and diagonally dominant), b, JI linear solver]
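Substitution driven by meta-data might look like the sketch below. The selection rule is an assumption chosen for illustration (Conjugate Gradient when the matrix is reported symmetric positive definite, Jacobi when it is reported diagonally dominant); the metadata keys and component names are hypothetical.

```python
LINEAR_SOLVERS = {"JacobiIteration", "ConjugateGradient"}


def preferred_solver(matrix_meta, current):
    """Pick a solver from the meta-data attached to the discretizer output."""
    if matrix_meta.get("symmetric") and matrix_meta.get("positive_definite"):
        return "ConjugateGradient"
    if matrix_meta.get("diagonally_dominant"):
        return "JacobiIteration"
    return current   # no rule applies: keep what the workflow already uses


def substitute_solver(workflow, matrix_meta):
    """Replace any linear solver in the workflow by the preferred one."""
    return [preferred_solver(matrix_meta, c) if c in LINEAR_SOLVERS else c
            for c in workflow]


workflow = ["FEM", "JacobiIteration"]
meta = {"sparse": True, "symmetric": True, "positive_definite": True}
print(substitute_solver(workflow, meta))
```

The rule table is the part that would need domain knowledge; the substitution mechanics stay the same whatever rules are used.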
119Pruning
- Workflow Pruning
- Workflows may contain unused components, especially when composed from other sub-workflows
- Remove redundant components
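Pruning can be sketched as a reachability check: keep only the components that contribute, directly or indirectly, to a requested output. The dictionary encoding of the workflow (component name mapped to the components it depends on) is an assumption made for this example.

```python
def prune(workflow, outputs):
    """Keep only components that some requested output depends on."""
    needed = set()
    stack = list(outputs)
    while stack:
        component = stack.pop()
        if component not in needed:
            needed.add(component)
            stack.extend(workflow.get(component, []))
    return {c: deps for c, deps in workflow.items() if c in needed}


# component -> components whose results it consumes (hypothetical workflow)
workflow = {
    "load": [],
    "clean": ["load"],
    "plot": ["clean"],      # result is never used by the requested output
    "stats": ["clean"],
    "report": ["stats"],
}
print(sorted(prune(workflow, ["report"])))
```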
120Performance Aware Scheduling
[Diagram labels: Globus Resources; Performance Repository; Globus Launcher; JSDL; Query; Component Repository; Single Resource Launcher; Scheduler; Request Reservation; SGE Resources; Reservation Launcher; Reservation Service; Negotiate Reservation (WS-Agreement)]
121Execution Pipeline
122Workflow Patterns
- Identify and reuse common idioms in some scientific domain and across different scientific domains.
- An idiom captures common knowledge and experience and describes how a similar set of experiments are to be set up and managed.
From Cecilia Gomes
123Usage
- To allow computational scientists and developers to capture design patterns that express common usage of software infrastructure within scientific domains
- To provide a software engineering tool that supports
- application configuration,
- execution control, and
- reconfiguration of software services
From Cecilia Gomes
124Approach
- Patterns are divided in two categories for flexibility
- Co-ordination (Behavioural) patterns
- Capture interactions between software sub-systems
- Structural patterns
- Capture connectivity between particular types of
Grid software/hardware components
From Cecilia Gomes
125Approach
- Patterns as first class entities at design, execution, and reconfiguration times
- Pattern templates are manipulated through Pattern Operators
- Structural operators
- Behavioural operators
From Cecilia Gomes
126Structural Pattern Templates
- Encode component connectivity, e.g. Pipeline, Ring, Star, Façade, Adapter, Proxy.
From Cecilia Gomes
127Structural Operators
- Manipulate structural patterns keeping their structural constraints.
- Examples
- Increase, Decrease,
- Extend, Reduce,
- Embed, Extract,
- Group,
- Rename/Reshape,
From Cecilia Gomes
129Increase Structural Operator
Pattern
Result Pattern
From Cecilia Gomes
130Extend Structural Operator
Pattern
Result Pattern
From Cecilia Gomes
131Behavioural Pattern Templates
- Capture temporal or (data/control) flow dependencies between components.
- Examples
- Client/Server,
- Master/Slave,
- Streaming,
- Service Adapter,
- Service Migration,
- Broker Service
- Service Aggregator/Decomposer,
From Cecilia Gomes
132Behavioural Operators
- Act over the temporal or flow dependencies for execution control and reconfiguration.
- Examples
- Start, Terminate,
- Log,
- Stop, Resume,
- Restart, Limit,
- Repeat,
From Cecilia Gomes
133Pattern Operators - example
- main
- P1P2
- section1
- Rename(P1,P2)
- Replicate(P2,3)
- Owner(P2, scmofr)
- section2
- Start(P2)
- Log(P2)
- Limit(30,P2)
- section1 section2
From Cecilia Gomes
134Workflow Planning/Adaptation
- Goal-oriented
- Abstract → Concrete workflow translation
- May utilise a number of different infrastructure services (Pegasus)
- Level of automation can vary
- Find components
- Find sub-workflows
- Find infrastructure services
- Publish output data at specific locations
135Chimera is developed at ANL By I. Foster, M.
Wilde, and J. Voeckler
From Ewa Deelman
136HTN Planning (Activity Composition)
HTN Planning: use of Methods (task decomposition) and Operators (task execution)
- Introduce activities to achieve preconditions
- Resolve interactions between conditions and effects
- Handle constraints (e.g. world state, resource, spatial, etc.)
From Austin Tate (Edinburgh)
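The split between methods and operators can be illustrated with a heavily simplified planner. This sketch is not the planner referred to on the slide: it has no backtracking, treats state as a set of facts, and uses a hypothetical data-analysis domain.

```python
def plan(task, state, methods, operators):
    """Return (primitive steps, resulting state) for a task.

    operators: name -> (preconditions, facts added)
    methods:   name -> list of (preconditions, subtasks) alternatives
    """
    if task in operators:
        pre, add = operators[task]
        if not pre <= state:
            raise ValueError(f"precondition of {task!r} not met")
        return [task], state | add
    for pre, subtasks in methods.get(task, []):
        if pre <= state:                      # first applicable method wins
            steps, current = [], state
            for sub in subtasks:
                sub_steps, current = plan(sub, current, methods, operators)
                steps += sub_steps
            return steps, current
    raise ValueError(f"no applicable method for {task!r}")


operators = {
    "stage_data": (frozenset(), frozenset({"data_local"})),
    "run_classifier": (frozenset({"data_local"}), frozenset({"result"})),
}
methods = {
    "analyse_data": [
        (frozenset({"data_local"}), ["run_classifier"]),
        (frozenset(), ["stage_data", "run_classifier"]),
    ],
}
steps, final_state = plan("analyse_data", frozenset(), methods, operators)
print(steps)
```

When the data is not yet local, the second method introduces the staging activity so that the classifier's precondition is achieved.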
137HTN Planning (Initial Plan Stated as Goals)
Initial Plan can be any combination of Activities
and Constraints
From Austin Tate (Edinburgh)
138Composer Enactor
From Austin Tate (Edinburgh)
139Product Model Refinement Step Using <I-N-C-A> Framework
From Austin Tate (Edinburgh)
140BDI Planning
- Situated, so: actions, percepts, time
- Fire engine
- See nearby fires and road conditions, hear messages from other agents, hear civilian calls for help.
- Move, squirt, tell (broadcast), say, plan route (internal)
[Diagram: Percepts, Choose an action a ∈ As, Action]
From Michael Winikoff, RMIT
141BDI Planning 2
- Reactive, so: events (significant occurrence)
- New fire, fire extinguished, fire urgent, help requested
- Proactive, so: goals
- Put out fire, discover fire, assist, coordinate
From Michael Winikoff, RMIT
142BDI Planning 3
- Implementation uses plans and beliefs
- Cache for means, and world information respectively
- Beliefs: Map (incl. fires, buildings), fire assignment and priority
- Plans: Put out fire, roam,
Percepts
Events
Beliefs
Goals
Actions
Action
Plans
From Michael Winikoff, RMIT
143BDI agents (based on AgentSpeak(L))
- Chosen plan added to intention stack (can be
either an event (posted) or action (executed))
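The percept, event, belief, plan and intention-stack cycle from the fire engine example can be sketched as below. This is a toy loop, not AgentSpeak(L): the class, the single "put out fire" plan, and the default "roam" behaviour are assumptions made for the illustration.

```python
class FireEngineAgent:
    """Toy BDI-style agent: beliefs, events, and a stack of intentions."""

    def __init__(self):
        self.beliefs = {"fires": set()}
        self.intentions = []   # stack; the last element is executed next

    def perceive(self, percepts):
        """Update beliefs and return events (significant occurrences)."""
        events = []
        for fire in percepts.get("fires", ()):
            if fire not in self.beliefs["fires"]:
                self.beliefs["fires"].add(fire)
                events.append(("new_fire", fire))
        return events

    def deliberate(self, events):
        """Plan 'put out fire': move to the location, then squirt."""
        for kind, location in events:
            if kind == "new_fire":
                # Pushed in reverse so that "move" ends up on top.
                self.intentions.append(("squirt", location))
                self.intentions.append(("move", location))

    def act(self):
        if not self.intentions:
            return ("roam", None)          # default behaviour
        action = self.intentions.pop()
        if action[0] == "squirt":
            self.beliefs["fires"].discard(action[1])   # execution updates beliefs
        return action

    def step(self, percepts):
        self.deliberate(self.perceive(percepts))
        return self.act()


agent = FireEngineAgent()
print(agent.step({"fires": ["A"]}))
print(agent.step({}))
print(agent.step({}))
```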
144BDI-based Enactor
- Enactor can maintain local plan library
- update of plan library as new conditions are detected
- Useful in a dynamic environment (Grid) -- as agents are goal directed
- Execution of a plan leads to update of beliefs
- useful mechanism to adapt agent behaviour in a Grid context
- Potentially useful to allow detection of plan conflicts
- Traditional approach
- number of tasks fixed, resources identical
- fixed number of resources, tasks pre-defined
- Delegate scheduling priorities to each resource
and task agent (no central schedulers)
145Planning as Model Checking
- Planning based on
- Non-determinism: cannot predict interactions with external processes
- Partial Observability: can only observe external interactions (as BPEL) not internal status
- Extended Goals: behaviour of the process is important, and not just the final goal
- Conditional Preferences: may require multiple conditions to hold for goal to be satisfied
- Given current state, evaluate possible likely states (may require an exhaustive checking of possibilities)
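The exhaustive check over possible states can be illustrated with a small non-deterministic transition system. This sketch only covers non-determinism (it ignores partial observability and extended goals), and the states, actions and transition table are hypothetical.

```python
def execute(states, action, transitions):
    """All states that may follow from applying `action` in any of `states`.

    Returns None if the action is not applicable in some possible state.
    """
    following = set()
    for state in states:
        if (state, action) not in transitions:
            return None
        following |= transitions[(state, action)]
    return following


def plan_always_reaches(initial, plan, transitions, goal):
    """True if every possible outcome of the plan ends in a goal state."""
    states = {initial}
    for action in plan:
        states = execute(states, action, transitions)
        if states is None:
            return False
    return states <= goal


# Invoking an external service may succeed or time out (non-deterministic).
transitions = {
    ("start", "invoke"): {"done", "timeout"},
    ("timeout", "retry"): {"done"},
    ("done", "retry"): {"done"},
}
print(plan_always_reaches("start", ["invoke"], transitions, {"done"}))
print(plan_always_reaches("start", ["invoke", "retry"], transitions, {"done"}))
```

The first plan can end in a timeout, so it is rejected; the second covers both outcomes of the invocation.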
146Planning
Context captures state
147Web Services Modelling Ontology (WSMO)
- Use of Semantic Web Services to aid automated composition
- Given a goal, identify how services could be composed to achieve the goal
- Specifies a complete set of infrastructure that is necessary to achieve this
- Provides three main components
- Web Services Modelling Ontology
- Web Services Modelling Language
- Execution Environment
From John Domingue, Open University
148WSMO Working Groups
A Conceptual Model for SWS
A Formal Language for WSMO
Execution Environment for WSMO
A Rule-based Language for SWS
From John Domingue, Open University
149WSMO Top Level Notions
Goals: Objectives that a client wants to achieve by using Web Services
Ontologies: Provide the formally specified terminology of the information used by all other components
Web Services:
- Semantic description of Web Services
- Capability (functional)
- Interfaces (usage)
Mediators: Connectors between components with mediation facilities for handling heterogeneities
From John Domingue, Open University
150Conceptual Model (WSMX-O)
151Non-Functional Properties List
Dublin Core Metadata: Contributor, Coverage, Creator, Description, Format, Identifier, Language, Publisher, Relation, Rights, Source, Subject, Title, Type
Quality of Service: Accuracy, NetworkRelatedQoS, Performance, Reliability, Robustness, Scalability, Security, Transactional, Trust
Other: Financial, Owner, TypeOfMatch, Version
Service Descriptions make use of this
From John Domingue, Open University
153Ontology Specification
- Non functional properties (see before)
- Imported Ontologies: importing existing ontologies where no heterogeneities arise
- Used mediators: OO Mediators (ontology import with terminology mismatch handling)
- Ontology Elements
- Concepts: set of concepts that belong to the ontology, incl.
- Attributes: set of attributes that belong to a concept
- Relations: define interrelations between several concepts
- Functions: special type of relation (unary range = return value)
- Instances: set of instances that belong to the represented ontology
- Axioms: axiomatic expressions in ontology (logical statement)
From John Domingue, Open University
155WSMO Web Service Description
Non-functional Properties: DC, QoS, Version, financial
- complete item description
- quality aspects
- Web Service Management
Capability: functional description
- Advertising of Web Service
- Support for WS Discovery
Service Interfaces: Choreography
- client-service interaction interface for consuming WS
- External Visible Behavior
- Communication Structure
- Grounding
Service Interfaces: Orchestration
- realization of functionality by aggregating other Web Services
- functional decomposition
- WS composition
Web Service Implementation (not of interest in Web Service Description)
From John Domingue, Open University
156Capability Specification
- Non functional properties
- Imported Ontologies
- Used mediators
- OO Mediator: importing ontologies with mismatch resolution
- WG Mediator: link to a Goal for which the service is not usable a priori
- Pre-conditions: what a Web Service expects in order to be able to provide its service. They define conditions over the input.
- Assumptions: conditions on the state of the world that have to hold before the Web Service can be executed
- Post-conditions: describe the result of the Web Service in relation to the input, and conditions on it
- Effects: conditions on the state of the world that hold after execution of the Web Service (i.e. changes in the state of the world)
From John Domingue, Open University
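The four parts of a capability (pre-conditions, assumptions, post-conditions, effects) can be pictured as checks wrapped around a service call. WSMO expresses these as logical axioms over ontologies; the sketch below substitutes plain Python predicates, and the `Capability` class, the `classify` service and the credit-based world state are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Capability:
    precondition: Callable[[dict], bool]         # condition over the input
    assumption: Callable[[dict], bool]           # world state before execution
    postcondition: Callable[[dict, dict], bool]  # result in relation to the input
    effect: Callable[[dict, dict], bool]         # world state before vs. after


def invoke_checked(capability, service, inputs, world):
    """Run the service only if its capability description is respected."""
    if not capability.precondition(inputs):
        raise ValueError("pre-condition violated")
    if not capability.assumption(world):
        raise ValueError("assumption violated")
    result, new_world = service(inputs, world)
    if not capability.postcondition(inputs, result):
        raise RuntimeError("post-condition violated")
    if not capability.effect(world, new_world):
        raise RuntimeError("effect violated")
    return result, new_world


def classify(inputs, world):
    """Hypothetical pay-per-use classification service."""
    result = {"dataset": inputs["dataset_url"], "label": "yes"}
    return result, {**world, "credits": world["credits"] - 1}


capability = Capability(
    precondition=lambda i: bool(i.get("dataset_url")),
    assumption=lambda w: w.get("credits", 0) > 0,
    postcondition=lambda i, r: r["dataset"] == i["dataset_url"] and "label" in r,
    effect=lambda before, after: after["credits"] == before["credits"] - 1,
)
result, world = invoke_checked(
    capability, classify, {"dataset_url": "http://example.org/d.arff"}, {"credits": 2}
)
print(result["label"], world)
```

Pre-conditions and post-conditions talk about the information exchanged, while assumptions and effects talk about the world; keeping them as separate fields mirrors that distinction.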
157Service Interface Description Model
- Vocabulary Ω
- ontology schema(s) used in service interface description
- usage for information interchange: in, out, shared, controlled
- States ω(Ω)
- a stable status in the information space
- defined by attribute values of ontology instances
- Guarded Transition GT(ω)
- state transition
- general structure: if (condition) then (action)
- different for Choreography and Orchestration
From John Domingue, Open University
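The "if (condition) then (action)" structure of guarded transitions can be sketched as a small rule loop over a state of attribute values. This is a simplification under stated assumptions: the state is a plain dictionary rather than ontology instances, only the first applicable rule fires per step, and the example choreography is hypothetical.

```python
def run(state, guarded_transitions, max_steps=100):
    """Fire the first transition whose guard holds until none applies."""
    for _ in range(max_steps):
        for condition, action in guarded_transitions:
            if condition(state):
                state = action(state)
                break
        else:
            return state      # no guard holds: a stable state is reached
    raise RuntimeError("no stable state within max_steps")


# Each rule is a (condition, action) pair over the state's attribute values.
choreography = [
    (lambda s: s["request"] is not None and s["result"] is None,
     lambda s: {**s, "result": f"classified {s['request']}"}),
    (lambda s: s["result"] is not None and not s["delivered"],
     lambda s: {**s, "delivered": True}),
]
final = run({"request": "iris.arff", "result": None, "delivered": False},
            choreography)
print(final)
```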