Creating and Managing Distributed Scientific Workflows: Techniques and Tools - PowerPoint PPT Presentation

About This Presentation
Title:

Creating and Managing Distributed Scientific Workflows: Techniques and Tools

Description:

David Walker, Ian Taylor, Matthew Shields, Lican Huang, Ali Shaikh Ali at Cardiff ... YAWL. GSFL ... etc. Origin (?): Problem Solving Environments ... – PowerPoint PPT presentation

Slides: 173
Provided by: COM136

Transcript and Presenter's Notes

Title: Creating and Managing Distributed Scientific Workflows: Techniques and Tools


1
Creating and Managing Distributed Scientific
Workflows: Techniques and Tools
  • Omer F. Rana
  • School of Computer Science and Welsh eScience
    Centre
  • Cardiff University, UK
  • o.f.rana_at_cs.cardiff.ac.uk

2
Thanks to
  • David Walker, Ian Taylor, Matthew Shields, Lican
    Huang, Ali Shaikh Ali at Cardiff
  • Bertram Ludaescher at UC Davis
  • Cecilia Gomes at UNL-Lisbon
  • John Domingue at Open University
  • Steve McGough, John Darlington at Imperial
    College

Some material contained in this tutorial has been
obtained from the individuals mentioned above.
3
Overview
  • Introduction to Workflow Techniques
  • Constructing and Managing Workflow
  • Application Example: Distributed Data Mining
    using FAEHIM
  • Adaptive Workflows
  • Workflow-related research themes

4
Usage: Problem Solving Environments
5
Problem Solving Environments
  • An envir
  • Visual Composition
  • Data Flow or Control Flow
  • Language based
  • Functional Languages
  • Abstraction based
  • Petri nets, Process (Composition) Algebras
  • IRIS Explorer, ADL, Gateway/WebFlow, ARCADE,
    Mathematica, MatLab etc
  • - Good survey by Grid Computing Environments
    group -- see http://www.gridforum.org/

6
Workflow
Adapted From Aleksander Slominski
  • 70s: Skip Ellis and Gary Nutt (OfficeTalk)
  • Xerox PARC Office Automation Systems
  • "to reduce the complexity of the user's interface
    to the office information system, control the
    flow of information, and enhance the overall
    efficiency of the office." (Ellis, Nutt 1980)
  • Representation, Specification, and Automation of
    Office Procedures (Michael D. Zisman, PhD
    Thesis, University of Pennsylvania, Wharton School
    of Business, 1977)
  • Often seen as a technique to automate existing
    processes
  • Very popular in the business world
  • Over 20 years gap
  • Availability of Computer Networks
  • Workflow (Business Process) was integral part of
    applications

Ellis, C. A., Nutt, G. J.: Office Information
Systems and Computer Science. In: ACM Computing
Surveys, 12 (1980) 1, pp. 27-60.
7
Historical Perspective
From Aleksander Slominski
  • 65-75: Decompose Applications
  • Data and Code Separated
  • 75-85: Database Management
  • DBMS Used To Share Data
  • 85-95: User Interface Management
  • UIMS: User Interface Separated
  • 95-05: Workflow Management
  • Isolate Business Process
  • Emerging standards such as those based on the
    Service Oriented Architecture

Source: Workflow Management (van der Aalst, van Hee)
8
Workflow
From Aleksander Slominski
  • The automation of a business process, in whole
    or part, where documents, information or tasks
    are passed from one participant to another to be
    processed, according to a set of procedural rules
  • Workflow Management Coalition (WfMC)

9
WFMS And WF Engine
From Aleksander Slominski
  • Workflow Management System (WFMS)
  • A system that defines, creates and manages the
    execution of workflows through the use of
    software, running on one or more workflow
    engines, which is able to interpret the process
    definition, interact with workflow participants
    and, where required, invoke the use of IT tools
    and applications.
  • Workflow Engine
  • A software service or "engine" that provides the
    run time execution environment for a process
    instance.

10
Workflow Levels
From Aleksander Slominski
  • Inside domain
  • One unit/organization/Virtual Organization
  • Level Up Above
  • Multiple Virtual Organizations
  • Global Model More dynamic
  • Global Model
  • Global Process
  • Peer-to-Peer

11
Categories Of Workflows
From Aleksander Slominski
Scientific
Business Value ?
Repetition ?
Production Workflows (Leymann, Roller)
12
Workflow Lifecycle
From Aleksander Slominski
  • Design
  • Typical workflow is graph oriented
  • Language: how expressive is the workflow language?
  • GUI: Visual Service Composition Environment
  • Deployment
  • Workflow Description is sent to Workflow Engine
  • Possibly validated and compiled
  • Execution
  • Workflow Engine enacts Workflow Description
  • Monitoring
  • Events reported from workflow and service
    execution
  • Refinement

13
Control Flow vs. Data Flow
  • Control Flow
  • Managed via use of specialist control constructs
    (conditions may be simple conjunction/disjunction,
    or more complex operators)
  • Unit/component execution managed through these
    control constructs
  • Types
  • Transition only
  • Switch, flow, while, etc
  • Data Flow
  • Execution managed via transmission of data
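
The data-flow case above can be sketched in a few lines. This is only an illustration under stated assumptions: `run_dataflow`, the unit names and the wiring format are hypothetical and are not taken from any tool named in this tutorial. A unit fires as soon as data has arrived on all of its inputs; there are no explicit control constructs.

```python
from collections import deque


def run_dataflow(units, wires, initial):
    """Fire each unit once data has arrived on all of its inputs.

    units:   name -> (number_of_inputs, function)
    wires:   name -> list of (target_name, target_input_index)
    initial: list of (target_name, target_input_index, value) seed tokens
    """
    inbox = {name: {} for name in units}    # data received so far, per unit
    results = {}
    queue = deque(initial)
    while queue:
        target, index, value = queue.popleft()
        inbox[target][index] = value
        n_inputs, func = units[target]
        if len(inbox[target]) == n_inputs:   # all inputs present: fire
            args = [inbox[target][i] for i in range(n_inputs)]
            out = func(*args)
            results[target] = out
            inbox[target] = {}
            for nxt, nxt_index in wires.get(target, []):
                queue.append((nxt, nxt_index, out))   # transmit data downstream
    return results


# Hypothetical three-unit graph: two producers feeding one adder.
units = {
    "scale": (1, lambda x: x * 2),
    "offset": (1, lambda x: x + 1),
    "add": (2, lambda a, b: a + b),
}
wires = {"scale": [("add", 0)], "offset": [("add", 1)]}
results = run_dataflow(units, wires, [("scale", 0, 5), ("offset", 0, 5)])
```

In a control-flow system the same graph would instead need an explicit construct (sequence, flow, switch) stating when `add` may run.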

14
Dealing with Loops and Conditionals
  • Often difficult to achieve; often ignored
  • Conditional
  • Specified as control-blocks
  • Implemented through the use of scripts
  • Loops
  • Specified as meta-blocks: blocks implemented
    over sub-workflows
  • Implemented through the use of scripts
  • Must be supported in the Enactment Engine

15
Loops 2
KEPLER
Triana
  • In Triana and Kepler: use of specialist Loop
    components
  • Components can be explicit
  • Implemented as hidden command

16
Loops 3
Init() Iteration() isExitLoop(Object
data) (Allows for user defined objects to
specify loop exit condition)
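
A minimal sketch of a loop component built around the three hooks named on this slide. The Python class and the driver function are assumptions made for illustration; they are not the Triana or Kepler API, and the halving loop body is a stand-in for a sub-workflow.

```python
class LoopUnit:
    """Hypothetical loop component modelled on Init / Iteration / isExitLoop."""

    def init(self):
        self.count = 0

    def iteration(self, data):
        # One pass over the loop body (stand-in for a sub-workflow).
        self.count += 1
        return data / 2

    def is_exit_loop(self, data):
        # User-defined exit condition evaluated on the current data object.
        return data < 1


def run_loop(unit, data):
    """Driver an enactment engine might use: init once, iterate until exit."""
    unit.init()
    while not unit.is_exit_loop(data):
        data = unit.iteration(data)
    return data


loop = LoopUnit()
final = run_loop(loop, 20)
```

Because the exit test receives the data object itself, the user decides what "done" means without the engine needing to understand the data.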
17
Workflow System Architecture
Composition and Modelling/Analysis
Enactment/Mapping
Execution
18
Workflow System Architecture
Composition and Modelling/Analysis
Enactment/Mapping
Execution
Information Service
Data Migration
User Interaction
User Services
Planning Engine
Checkpointing
Scheduling
19
Workflow Taxonomy
Workflow System
Workflow design And specification
Scheduling and Enactment
Operational Attributes
Data Management
Component/Service Discovery
structure
composition
Model/spec
20
Workflow Composition
Composition
Automated
User Directed
Planner
Graph-based
Language-based
Templates
Petri Net
DAG
Design Patterns
Logic
Markup
UML
Functional
Sub-workflows
Process Calculi
Process Calculi
User defined
Factory
scripting
21
Process Markup Languages (http://www.ebpml.org/status.htm)
22
BPEL4WS some definitions
  • Business Protocol
  • Mutually visible message exchange behaviour of
    each of the parties involved in the protocol
    without revealing internal behaviour
  • Business Process
  • Executable Behaviour of a participant (actor) in
    a business interaction
  • Abstract: couples interface definitions with
    behavioural specifications
  • Service Interface
  • Set of operations that can be invoked on an actor
    (participant)

23
Abstract vs. Executable
  • abstract processes
  • - public behaviour
  • define business protocols
  • hide things that do not affect partner
  • constrain only the message exchange
  • what the possible replies are, not why one is
    chosen
  • executable processes
  • private behaviour
  • fully define behaviour
  • portable between compliant
    environments
  • WS-Choreography
  • Defines abstract behaviour
  • BPEL_A hides parts that exist in BPEL_B

From Peter Furniss, Choreology
24
Main pieces of a BPEL document
  • Communication - offering and using web-services
  • inbound and outbound
  • Partners
  • who we do things with
  • Variables
  • what is communicated
  • Correlation sets
  • instance identity - how do we know who we are
  • Activities
  • what is done
  • Handlers
  • coping with the (slightly) unexpected
  • CompensationHandler

From Peter Furniss, Choreology
25
BPEL Process
Containers play important part in data exchange,
and can be mapped to each other
From http://www.ebpml.org/bpel4ws.htm
26
Communication (1)
  • incoming
  • <receive partnerLink="purchasing"
    portType="lns:purchaseOrderPT"
    operation="sendPurchaseOrder"
    variable="PO"></receive>
  • from whoever is the other role in purchasing
  • involving the portType and operation given
  • keep the inbound message in our variable PO
  • can have other stuff in the body e.g.
    correlation sets
  • used for
  • the "in" of our WSDL in-out: incoming synchronous
    request
  • the "in" of our WSDL in: incoming one-way
  • includes semantic reply to an outgoing one-way

From Peter Furniss, Choreology
27
Communication (2)
  • reply to a synchronous receive
  • <reply partnerLink="purchasing"
    portType="lns:purchaseOrderPT"
    operation="sendPurchaseOrder"
    variable="Invoice"/>
  • back to whoever is the other role in purchasing
  • involving the portType and operation given
  • load the outbound message from our variable
    Invoice

From Peter Furniss, Choreology
28
Communication (3)
  • using an asynchronous (wsdl in) service
  • <invoke partnerLink="invoicing"
    portType="lns:computePricePT"
    operation="initiatePriceCalculation"
    inputVariable="PO"></invoke>
  • invoke on whoever is the other role in
    invoicing
  • involving the portType and operation given
  • load the outbound message from our variable PO

From Peter Furniss, Choreology
29
Communication (4)
  • using a synchronous (wsdl in-out) service
  • <invoke partnerLink="shipping"
    portType="lns:shippingPT" operation="requestShipping"
    inputVariable="shippingRequest"
    outputVariable="shippingInfo"></invoke>
  • to whoever is the other role in shipping
  • involving the portType and operation given
  • load the outbound message from our variable
    shippingRequest
  • store inbound message in our variable
    shippingInfo

From Peter Furniss, Choreology
30
partnerLinkType
  • defines a type of relationship or conversation
  • gives role name to a portType
  • can tie two opposite portTypes together, as
    asynchronous conversation pair
  • <plnk:partnerLinkType name="shippingLT">
      <plnk:role name="shippingService">
        <plnk:portType name="pos:shippingPT"/>
      </plnk:role>
      <plnk:role name="shippingRequester">
        <plnk:portType name="pos:shippingCallbackPT"/>
      </plnk:role>
    </plnk:partnerLinkType>

From Peter Furniss, Choreology
31
partnerLinks
  • named instance of a partnerLinkType
  • could be multiple partnerLinks of same type
  • states which is me and which is him in a
    conversation
  • <partnerLink name="shipping"
    partnerLinkType="lns:shippingLT"
    myRole="shippingRequester"
    partnerRole="shippingService"/>

From Peter Furniss, Choreology
32
variables
  • typed
  • WSDL message
  • XML schema simple types
  • XML schema element
  • manipulation
  • set from inbound, to outbound messages
  • assign from/to
  • other variables
  • parts and properties of variables of type WSDL
    message
  • partnerLink endpoints
  • simple expressions
  • literals
  • XPath expressions to get inside complex variables

From Peter Furniss, Choreology
33
correlation sets
  • one process definition may have lots of instances
  • which message is for/from which instance ?
  • don't require/rely on environment or carrier
    protocol to identify
  • define which fields of the messages distinguish
    the instance
  • e.g. purchase order number, username, their
    taskid
  • fields to be used are declared as properties of a
    message variable
  • properties defined as bits of variable using
    XPath (or XQuery)
  • if there is a context id or request id field, use
    that
  • this may require bending the wsdl

From Peter Furniss, Choreology
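
The routing idea behind correlation sets can be sketched as follows. This is an illustration only: `CorrelatingEngine`, `ProcessInstance` and the `orderNumber` field are hypothetical names, and a real BPEL engine declares the fields as message properties rather than dictionary keys.

```python
class ProcessInstance:
    def __init__(self, key):
        self.key = key
        self.messages = []


class CorrelatingEngine:
    """Routes messages to instances using fields of the message itself,
    not anything supplied by the environment or carrier protocol."""

    def __init__(self, correlation_fields):
        self.correlation_fields = correlation_fields
        self.instances = {}

    def deliver(self, message):
        # Build the correlation key from the declared message fields.
        key = tuple(message[f] for f in self.correlation_fields)
        # The first message carrying a new key starts a new instance.
        if key not in self.instances:
            self.instances[key] = ProcessInstance(key)
        instance = self.instances[key]
        instance.messages.append(message)
        return instance


engine = CorrelatingEngine(["orderNumber"])
a = engine.deliver({"orderNumber": "PO-1", "type": "purchaseOrder"})
b = engine.deliver({"orderNumber": "PO-2", "type": "purchaseOrder"})
c = engine.deliver({"orderNumber": "PO-1", "type": "shippingInfo"})
```

The later `shippingInfo` message reaches the instance that was created by the first `PO-1` purchase order, which is the behaviour the slide describes.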
34
activities
  • structured activities can contain other activities
  • <sequence> one after the other
  • <flow> in parallel
  • <pick> choose by inbound message
  • <switch> choose by expression evaluation
  • <while> iteration
  • <scope> nest, with declarations and handlers, synchronize
  • communication
  • <invoke> send msg to partner, possibly receive response
  • <receive> accept msg from partner
  • <reply> send msg to partner as response to <receive>
  • other
  • <assign> manipulate variables
  • <wait> for duration / until time
  • <terminate> end the process
  • <compensate> run compensation handler of inner scope
  • <throw> exit with fault to outer scope
  • <empty> do nothing

From Peter Furniss, Choreology
35
BPEL Activity
From http://www.ebpml.org/bpel4ws.htm
36
links
  • support Directed Activity Graph style
  • activities can be source and target of links
  • activity with target links does not run till
    source completes normally
  • links can cross structured activity boundaries
  • Why links AND structure ?
  • BPEL is a merge of two specifications and
    approaches

From Peter Furniss, Choreology
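
The link rule above (a target does not run until its sources have completed) amounts to executing activities in a topological order of the link graph. The sketch below uses Python's standard `graphlib` module; `run_with_links` and the activity names are hypothetical, and fault handling is deliberately not modelled.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_with_links(activities, links):
    """Run activities so that every link target waits for its sources.

    activities: name -> callable
    links:      list of (source, target) pairs
    """
    sorter = TopologicalSorter({name: set() for name in activities})
    for source, target in links:
        sorter.add(target, source)        # target depends on source
    order = []
    for name in sorter.static_order():    # raises CycleError on cyclic links
        activities[name]()
        order.append(name)
    return order


log = []
activities = {
    "getQuote": lambda: log.append("getQuote"),
    "checkCredit": lambda: log.append("checkCredit"),
    "placeOrder": lambda: log.append("placeOrder"),
}
order = run_with_links(
    activities,
    [("getQuote", "placeOrder"), ("checkCredit", "placeOrder")],
)
```

The two source activities have no link between them, so their relative order is left open; only `placeOrder` is constrained to run last.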
37
handlers
  • handlers are declared for process or scope
  • watching for the lifetime of the scope
  • eventHandler
  • onMessage: a <receive> that could happen any
    time
  • onAlarm: time dependent
  • faultHandler
  • catches <throw>n or generated fault
  • scope exits abnormally
  • watching after the scope has exited normally
  • compensationHandler

From Peter Furniss, Choreology
38
compensationHandler
  • compensationHandler is installed when scope ends
  • triggered from faultHandler or compensationHandler
    of enclosing scope
  • uninstalled when process ends
  • BUT
  • whole process can have a compensationHandler
  • triggered by unspecified means
  • uninstalled when unspecified

From Peter Furniss, Choreology
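
A minimal sketch of the install-then-trigger behaviour described above, assuming a flat list of scopes inside one enclosing scope. `Scope` and `run_process` are hypothetical names; running the handlers in reverse order of completion is a choice made for this sketch.

```python
class Scope:
    """Hypothetical scope: a unit of work plus its compensation action."""

    def __init__(self, name, work, compensate):
        self.name = name
        self.work = work
        self.compensate = compensate


def run_process(scopes):
    installed = []                       # handlers of scopes that ended normally
    try:
        for scope in scopes:
            scope.work()
            installed.append(scope)      # installed only when the scope ends
    except Exception:
        # Fault handler of the enclosing scope triggers compensation.
        for scope in reversed(installed):
            scope.compensate()
        return False
    return True                          # handlers are discarded at process end


def fail():
    raise RuntimeError("shipping unavailable")


log = []
ok = run_process([
    Scope("reserve", lambda: log.append("reserve"), lambda: log.append("undo reserve")),
    Scope("charge", lambda: log.append("charge"), lambda: log.append("undo charge")),
    Scope("ship", fail, lambda: log.append("undo ship")),
])
```

The failed `ship` scope never installs its handler, so only the two completed scopes are compensated.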
39
XPDL from WfMC
  • One of the oldest workflow languages
  • Transition-based (with guarded transitions)
  • Activities are related to form a control flow
  • Activity: unit of work, which will be processed
    by a combination of resource and/or computer
    application
  • Activity can be manual (user) or automatic
    (application)
  • Automatic activity: invoke or receive
  • Supports subflow
  • In/Out parameters for data exchange
  • No support for faults or exception handling
  • Participant types
  • Resource set, Resource, Organisational Unit,
    Role, Human or System
  • No support for long running processes
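
Transition-based control flow with guarded transitions can be sketched as below. This is not XPDL syntax; `run_transitions`, the activity names and the `amount` field are assumptions chosen for illustration.

```python
def run_transitions(activities, transitions, start, data):
    """Run an activity, then follow the first transition whose guard holds.

    activities:  name -> function(data) -> data
    transitions: list of (source, target, guard), guard(data) -> bool
    """
    current, trace = start, []
    while current is not None:
        data = activities[current](data)
        trace.append(current)
        current = next(
            (target for source, target, guard in transitions
             if source == current and guard(data)),
            None,                         # no guard holds: the flow ends
        )
    return trace, data


activities = {
    "receive": lambda d: d,
    "approve": lambda d: {**d, "status": "approved"},
    "review": lambda d: {**d, "status": "manual review"},
}
transitions = [
    ("receive", "approve", lambda d: d["amount"] <= 1000),
    ("receive", "review", lambda d: d["amount"] > 1000),
]
trace, result = run_transitions(activities, transitions, "receive", {"amount": 250})
```

Here the control flow lives entirely in the transition list; the activities themselves know nothing about what follows them.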

40
Data Type Definitions
  • Standard types: String and Number
  • User defined types
  • Type match undertaken manually
  • Namespaces and XML Schema
  • Identify reference to a namespace via a URI

41
Enactment Strategies I
  • Centralised Enactor
  • Single graph coordinated through a centralised
    enactor
  • The enactor manages execution of components in
    some sequence
  • Distributed Enactors
  • Graph divided into sub-graphs and handed to
    different enactors
  • Each enactor responsible for executing local
    graph

42
Enactment Strategies II
  • Event-based
  • Each component on completion generates an event
  • Use of publish-subscribe mechanism
  • Each component also activated through the
    generation of an event
  • Can have multiple event types
  • Blackboard/Shared memory
  • Component/Enactor writes to a shared space
  • Monitored by components/enactor
  • Blocks on availability of particular data items
    in shared space
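
The event-based strategy can be sketched with a tiny in-process publish-subscribe bus. `EventBus`, `component` and the `<name>.done` event naming are assumptions made for this sketch; a distributed enactor would use a real messaging layer instead.

```python
from collections import defaultdict


class EventBus:
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self.subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        for handler in list(self.subscribers[event_type]):
            handler(payload)


def component(bus, name, listens_to, func, trace):
    """Register a component that runs on one event and emits '<name>.done'."""
    def on_event(payload):
        result = func(payload)
        trace.append(name)
        bus.publish(name + ".done", result)   # completion event activates successors
    bus.subscribe(listens_to, on_event)


bus, trace, outputs = EventBus(), [], []
component(bus, "load", "start", lambda _: [3, 1, 2], trace)
component(bus, "sort", "load.done", sorted, trace)
bus.subscribe("sort.done", outputs.append)
bus.publish("start", None)
```

No central enactor decides the order: each component is activated by the completion event of its predecessor.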

43
Enactment for Automated Composition
  • Enactment engine enlists use of other components
  • Discovery Service
  • Planning Engine
  • Enactment is goal-oriented
  • Define requirement, rather than components
  • Conflict detection support
  • Mechanisms to choose between alternatives
    (constraints)
  • Difficult to do in practice

44
Workflow (Enactment)
From Aleksander Slominski
Launch, configure And control
Orchestration Service
Workflow Engine
Workflow Instance
Workflow Instance
Workflow Instance
Resource layer: 1000s of PCs -> massive
supercomputers and data sources
Network
45
Scientific Workflows
  • What makes it different (how it is applied)?
  • Support for large data flows
  • Need to do parameterized execution of large
    number of jobs
  • Need to monitor and control workflow execution
    including ad-hoc changes
  • Need to execute in a dynamic environment where
    resources are not known a priori and may need to
    adapt to changes
  • Hierarchical execution with sub-workflows created
    and destroyed when necessary
  • Science Domain specific requirements.
  • Triana
  • Taverna/SCUFL
  • GridAnt
  • Condor DAG
  • CoG DAG
  • SWFL
  • BioOpera
  • BPEL4WS
  • OASIS WSBPEL
  • YAWL
  • GSFL
  • etc
  • Origin (?)
  • Problem Solving Environments
  • (MatLab, Mathematica, SciRun, NetSolve, Ninf,
    Nimrod etc)

http://www.nesc.ac.uk/action/esi/contribution.cfm?Title=303
http://www.extreme.indiana.edu/swf-survey/
Problems with Predictability
Workflow World
46
Enactment Engines
  • Employ a variety of techniques for enactment
  • Some are integrated with a Portal; others are based
    on a command line interface (some also provide a
    scripting language)
  • Generally for constructing graphs; others also
    support execution of components within a graph
  • Support for third-party services
  • Monitoring, Registry, etc

47
Why Wrap Legacy Codes as Components?
  • Pre-existing codes, mostly in C or Fortran
  • Generally domain-specific
  • Hard to re-use in other applications
  • They are still useful
  • They are often large, complex monoliths with
    little structure.
  • Support Re-use
  • Support Remote Execution
  • Support Remote Discovery
  • Support Remote Data Input/Output

Re-write? - try convincing App Scientists
49
Wrapping Approaches
Similar name in DBs, but different approach
  • Wrapping executables - As-Is Approach
  • No source available (or provided)
  • Maintain execution environment
  • Wrapping Source - Source-Update Approach
  • Some source provided (generally I/O)
  • Executable can relinquish some control
  • Data type conversions
  • Source split Wrapping - Unit-Mapping Approach
  • Split source into units -- wrap units
  • Maintain unit execution environment overall
    manager
  • Application Supported Wrapping - App-Wrap
  • Steering support
  • Data management support
  • Provide isolation between existing code, in its
    present form, and the need to re-use and execute
    code remotely
  • Enable properties of code to be specified (in
    terms, perhaps, of its interface), to enable a
    discovery mechanism to utilise it in, say, a
    particular application.
  • Sustain performance, correctness of results,
    ownership, and availability
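
The As-Is approach listed above can be sketched as a thin wrapper around an unmodified executable. Everything named here is an assumption for illustration: `ExecutableWrapper` is hypothetical, and the "legacy code" is a one-line stand-in program started through the current Python interpreter rather than a real C or Fortran binary.

```python
import subprocess
import sys


class ExecutableWrapper:
    """As-Is wrapping: treat the legacy code as an opaque executable."""

    def __init__(self, command, description):
        self.command = command            # how to start the unmodified executable
        self.description = description    # interface metadata a registry could index

    def run(self, input_text, timeout=30):
        # Feed input on stdin and capture stdout; the executable is not changed.
        completed = subprocess.run(
            self.command, input=input_text, capture_output=True,
            text=True, timeout=timeout, check=True,
        )
        return completed.stdout


# Stand-in for a legacy binary: sums the numbers it reads from stdin.
legacy = [sys.executable, "-c",
          "import sys; print(sum(float(x) for x in sys.stdin.read().split()))"]
wrapper = ExecutableWrapper(
    legacy, {"name": "sum", "inputs": ["float list"], "outputs": ["float"]}
)
output = wrapper.run("1.5 2.5 4")
```

The `description` dictionary hints at how properties of the code could be published for discovery while the code itself stays isolated behind `run`.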

50
Automating Wrapping
  • Time consuming and error prone process
  • Automate the implementation of interfaces to
    access code
  • via a system wide data model
  • Automate interactions between wrapped components
  • via a discovery service - registry to a more
    complicated lookup service
  • Can have
  • same interface, different implementation

51
Component Model and Extensions
Existing Code
52
Component Model and Extensions
Existing Code
53
Component Model and Extensions
<pse-def>
  <preface>
    <name alt="MD1" id="MD01">MDComponent</name>
    <pse-type>Molecular Dynamics</pse-type>
    <component-directory>/home/scmlm1/wgen/Component</component-directory>
    <legacy-code>/home/scmlm1/md/moldyn</legacy-code>
    <ORB-Compiler>idl2java</ORB-Compiler>
    <processors>8</processors>
    <host-name>sapphire.cs.cf.ac.uk</host-name>
  </preface>
  <outports>
    <outportnum>6</outportnum>
    <outport id="1">int</outport>
    <outport id="2">float</outport>
    <outport id="3">float</outport>
    <outport id="4">float</outport>
    <outport id="5">float</outport>
    <outport id="6">float</outport>
    <href name="file:/home/scmlm1/wgen/Component/output.data" value="output" />
  </outports>
</ports>
XML Data Model
Existing Code
54
Component Model and Extensions
Existing Code
External Control Input (for Steering)
55
Component Model and Extensions
Data Manager
Existing Code
Runtime support
56
Component Model and Extensions
Data Manager
Existing Code
Runtime support
Execution Rules
57
Promoter Identification Workflow
Source: Matt Coleman (LLNL)
58
Source: NIH BIRN (Jeffrey Grethe, UCSD)
59
Ecology: GARP Analysis Pipeline for Invasive
Species Prediction
Source: NSF SEEK (Deana Pennington et al., UNM)
60
http://www.gridlab.org/
http://www.trianacode.org/
61
GridLab Implementation
http://www.trianacode.org/
GAP Interface
GAT
Gridlab Services
JXTAServe
P2PS
WSPeer
JXTA
Sockets
Web Services
OGSA Services
62
Java GAT Prototype
GAP (Java Prototype)
  • Advertising
  • Discovery
  • Communication

OGSA (planned)
Jxta
Web Services
P2PS
And more..
Jxtaserve
GSI Enabled
NS-2
Job Submission (GRMS)
  • Generic Job Submission
  • Virtual filename data access
  • Set of generic Java interfaces
  • high level abstractions to Grid services
  • Factory design dynamic pluggable services

Data Management
GridLab GAT (www.gridlab.org)
63
Triana Architecture
  • Plug-in Applications
  • flexible apps can use Triana in various ways,
    as a
  • GUI
  • remote control GUI
  • or in full inc. GAP/GAT

Triana Engine
TCS
Command Service Control
3rd Party Application
Triana TaskGraph Reader
Triana Command Reader
Triana TaskGraph Writer
Triana Command Writer
XML Reader
WSFL Reader
TCom Reader
Other Reader
XML Writer
Other Writer
TCom Writer
Other Writer
3rd Party Application


Interactive
Applications Insert Points
Interactive/Offline
Communication Channels
64
Triana Distributed Work-flow
Triana Service Engine
Triana Service Engine
Action Commands
Workflow, e.g. BPEL4WS
Network
  • Distributed Triana Work-flow
  • flexible distribution based around Triana
    Groups
  • HPC and Pipelined distribution

Triana Controlling Service (TCS)
Triana Service Engine
Triana Engine
Other Engine
Triana Gateway
65
Distributing Triana Taskgraphs
  • Mapping tasks or groups of tasks to resources
  • Two stages
  • Taskgraph annotation, XML definition for each
    task or group of tasks
  • extended to specify resources and message
    channels
  • Data distribution, annotated sub-sections of
    taskgraph passed to resources

66
Custom Distribution
  • Distribution units are standard Triana tools,
    enabling users to create their own custom
    distributions

67
Remote Deployment
  • User can distribute any task or group of tasks
    (sub-workflow)
  • Using the GAP Interface, Triana automatically
    launches a remote service providing that
    sub-workflow.
  • Input, Output and Control Pipes are connected
    using the current GAP binding (e.g. JXTA Pipes)

68
Deploying and Connecting To Remote Services
  • Running services are automatically discovered via
    the GAP Interface, and appear in the tool tree
  • User can drag remote services onto the workspace
    and connect cables to them like standard tools
    (except the cables represent actual JXTA/P2PS
    pipes)

Remote Services
69
Web Service Discovery 1
  • Triana allows users to query UDDI repositories
  • Alternatively, users can import services directly
    from WSDL

70
Web Service Discovery 2
  • Discovered/Imported Web Services are converted
    into Triana tools
  • (service name -> tool name)
  • (input message parts -> in nodes)
  • (output message parts -> out nodes)
  • etc
  • Web Service tools are displayed in the user's
    Tool Tree (alongside local tools)

71
Connecting Workflows
  • Web Service tools can be dropped onto the user's
    workspace and connected like local tools
  • A workflow can contain both local and Web Service
    tools

72
Complex Data Types
  • Users can build their own interface for
    creating/mediating between complex types
  • Alternatively, Triana can dynamically generate an
    interface from the WSDL2Java generated bean class

73
GEMSS Maxillo-facial Surgery Simulation
74
GEO 600 Inspiral Search
  • Background
  • Compact binary stars orbiting each other in a
    close orbit
  • among the most powerful sources of gravitational
    waves
  • As the orbital radius decreases a characteristic
    chirp waveform is produced - amplitude and
    frequency increase with time until eventually the
    two bodies merge together
  • Computing
  • Need 10 Gigaflops to keep up with real time data
    (modest search..)
  • Data: 8 kHz at 24-bit resolution (stored in 4
    bytes) -> signal contained within 1 kHz: 2000
    samples/second
  • divided into chunks of 15 minutes in duration
    (i.e. 900 seconds): 8MB
  • Algorithm
  • Data is transmitted to a node
  • Node initialises i.e. generates its templates
    (around 10000)
  • fast correlates its templates with data
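
A back-of-envelope check of the chunk size quoted above, using only the figures on the slide. Sampling a signal confined to 1 kHz at 2000 samples/second is the Nyquist rate; the variable names are ours.

```python
SAMPLE_RATE_HZ = 2000        # signal restricted to the 1 kHz band
BYTES_PER_SAMPLE = 4         # 24-bit samples stored in 4 bytes
CHUNK_SECONDS = 15 * 60      # 15-minute chunks

chunk_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHUNK_SECONDS
chunk_megabytes = chunk_bytes / 1e6
```

The product is 7,200,000 bytes, that is about 7.2 MB per chunk, which is of the same order as the 8MB figure given on the slide.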

75
Coalescing Binary Search
76
Triana Prototype
GEO 600 Coalescing Binary Search
77
Coalescing Binary Scenario
Controller
Email, SMS notification
Logical File Name
GW Data Distributed Storage
GAT (GRMS, Adaptive)
GW Data
  • Submit Job
  • Optimised Mapping

GAT (Data Management)
CB Search
Gridlab Test-bed
78
Discovery Net Workflow
  • Workflow Construction
  • Integrate information resources/software
    applications cross-domain
  • Warehousing workflows for scientists
  • Manage discovery processes within an organisation
  • Construct an enterprise process knowledge bank
  • Deployment workflow to scientists
  • Turn workflows into reusable applications/services

79
An Integrative Analysis Example
80
The KEPLER/Ptolemy II GUI (Vergil)
Directors define the component interaction
execution semantics
Large, polymorphic component (Actor) and
Director libraries (drag & drop)
81
Actor-Oriented Design
  • Object orientation

What flows through an object is sequential
control (cf. CCA, MPI)
class name
data
methods
call
return
What flows through an object is a stream of data
tokens (in SWFs/KEPLER also references!!)
  • Actor/Dataflow orientation

actor name
data (state)
parameters
Input data
Output data
ports
Source: Edward Lee et al. http://ptolemy.eecs.berkeley.edu/
82
Object-Oriented vs. Actor-Oriented Interfaces
  • Actor/Dataflow
  • Oriented

Object Oriented
OO interface gives procedures that have to be
invoked in an order not specified as part of the
interface definition.
AO interface definition says "Give me text and
I'll give you speech"
Source: Edward Lee et al. http://ptolemy.eecs.berkeley.edu/
83
Ptolemy II Actor-Oriented Modeling
  • Director acts as an enactor
  • In this instance, interaction semantics are not
    maintained within a component
  • This is equivalent to having a centralised
    enactor
  • Different directors for different modeling and
    execution needs
  • Hence, a variety of directors can operate on the
    same components
  • Better abstraction, modeling, component reuse,

84
Behavioral Polymorphism in Ptolemy
These polymorphic methods implement the
communication semantics of a domain in Ptolemy
II. The receiver instance used in communication
is supplied by the director, not by the
component. (cf. CCA, WS-??, GBPL4??, !)
IOPort
Behavioral polymorphism is the idea that
components can be defined to operate with
multiple models of computation and multiple
middleware frameworks.
consumer
producer
actor
actor
Receiver
Source: Edward Lee et al. http://ptolemy.eecs.berkeley.edu/
85
Component Composition Interaction
  • Components linked via ports
  • Dataflow (and msg/ctl-flow)
  • Where is the component interaction semantics
    defined??
  • each component is its own director!
  • But still useful for special applications, e.g.
    parallel programs (MPI, )

Source: GRIST/SC4DEVO workshop, July 2004, Caltech
86
Data/Control-Flow Spectrum
message passing, control flow
clean data(ctl)-flow
special tokens flow
  • Data (tokens) flow
  • (almost) no other side effects
  • WYSIWYG (usually)
  • References flow
  • token reference type may be http-get,
    ftp-get, hsi put
  • generic handling still possible
  • Application specific tokens flow
  • e.g. current Nimrod job management in Resurgence
  • invisible contract between components
  • Director is unaware of what's going on (sounds
    familiar? :-)
  • Specific message passing protocols (e.g., CSP,
    MPI)
  • for systems of tightly coupled components

87
Domains and Directors: Semantics for Component
Interaction
  • CI: Push/pull component interaction
  • CSP: concurrent threads with rendezvous
  • CT: continuous-time modeling
  • DE: discrete-event systems
  • DDE: distributed discrete events
  • FSM: finite state machines
  • DT: discrete time (cycle driven)
  • Giotto: synchronous periodic
  • GR: 2-D and 3-D graphics
  • PN: process networks
  • SDF: synchronous dataflow
  • SR: synchronous/reactive
  • TM: timed multitasking

For (finer-grained) concurrent jobs!?
For (coarse grained) Scientific Workflows!
Source: Edward Lee et al. http://ptolemy.eecs.berkeley.edu/
88
Polymorphic Actor Components Working Across Data
Types and Domains
  • Actor Data Polymorphism
  • Add numbers (int, float, double, Complex)
  • Add strings (concatenation)
  • Add complex types (arrays, records, matrices)
  • Add user-defined types
  • Actor Behavioral Polymorphism
  • In dataflow, add when all connected inputs have
    data
  • In a time-triggered model, add when the clock
    ticks
  • In discrete-event, add when any connected input
    has data, and add in zero time
  • In process networks, execute an infinite loop in
    a thread that blocks when reading empty inputs
  • In CSP, execute an infinite loop that performs
    rendezvous on input or output
  • In push/pull, ports are push or pull (declared or
    inferred) and behave accordingly
  • In real-time CORBA, priorities are associated
    with ports and a dispatcher determines when to
    add

By not choosing among these when defining the
component, we get a huge increment in component
re-usability. But how do we ensure that the
component will work in all these circumstances?
Source: Edward Lee et al. http://ptolemy.eecs.berkeley.edu/
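
A much simplified sketch of the two kinds of polymorphism on this slide. The classes below are hypothetical and are not the Ptolemy II API (where the director supplies receivers to ports); they only show one `Adder` being reused under two firing rules.

```python
class Adder:
    """Actor: knows how to add, not when to fire (the director decides that)."""

    def fire(self, values):
        total = values[0]
        for v in values[1:]:
            total = total + v    # data polymorphism: numbers, strings, lists
        return total


class DataflowDirector:
    """Dataflow rule: add only when all connected inputs have data."""

    def step(self, actor, inputs):
        if all(v is not None for v in inputs):
            return actor.fire(inputs)
        return None


class DiscreteEventDirector:
    """Discrete-event rule: add when any connected input has data."""

    def step(self, actor, inputs):
        present = [v for v in inputs if v is not None]
        return actor.fire(present) if present else None


adder = Adder()
dataflow_result = DataflowDirector().step(adder, [1, None])
event_result = DiscreteEventDirector().step(adder, [1, None])
string_result = DataflowDirector().step(adder, ["ab", "cd"])
```

With one input missing, the dataflow director holds the actor back while the discrete-event director fires it; the actor code is identical in both cases.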
89
Directors and Combining Different Component
Interaction Semantics
  • Possible app. in SWF
  • time-series aware
  • parameter-sweep aware
  • XY aware
  • execution models

Source: Edward Lee et al. http://ptolemy.eecs.berkeley.edu/ptolemyII/
90
Web Services -> Actors (WS Harvester)
  • -> Minute-made (MM) WS-based application
    integration
  • Similarly, MM workflow design and sharing w/o
    implemented components

91
KEPLER Actors
92
FAEHIM
  • Use of Web Services composition with
    distributed services
  • Wrap third party services (Mathematica, GNUPlot)
  • WEKA Service template
  • Triana Workflow
  • Services provided by third parties
  • WSDL interfaces (avoid use of specialist
    languages unless really necessary)
  • SOAP-based message exchange
  • Use of attachments
  • Access to local and remote data sets
  • Support for data streaming
  • Wrapping of existing algorithms (important
    requirement)

http://users.cs.cf.ac.uk/Ali.Shaikhali/faehim/
93
FAEHIM Architecture
98
Demo 1
99
Demo 2
100
Inside the FAEHIM Toolbox
101
Usage Overview
Dataset
Classify
Weka Algorithm
URL
ClassifyR
102
Classifier
  • J48 Classifier
  • Class for generating an (un)pruned C4.5 decision
    tree. For more information, see Ross Quinlan
    (1993). C4.5: Programs for Machine Learning,
    Morgan Kaufmann Publishers, San Mateo, CA.
  • Operations
  • classify( ) Input: DataHandler dataset, String
    attributeName; Output: DataHandler decisionTree
  • classifyRemoteDataset( ) Input: String url,
    String attributeName; Output: DataHandler
    decisionTree

103
Clustering Support
  • Cobweb Web Service
  • Operations
  • cluster( ): Input: DataHandler dataset; Output:
    String result
  • clusterRemoteInstance( ): Input: String
    datasetURL; Output: String result
  • clusterByPercentage( ): Input: DataHandler
    dataset, int percentage; Output: String result

104
Graph Plotting Service
  • Plotting Web Service
  • Operations
  • plot3D( ): Input: DataHandler data, String
    plotType; Output: DataHandler graph
  • getPlotTypes( ): Input: null; Output: String
    plotTypes

105
Registry Usage
Execution Resource
UDDI
  • Discover
  • Select
  • Invoke

Registry of Algorithms
Algorithm 1
Algorithm 1
Algorithm 2
Web Service
106
Classifier ... 2
  • Classifier Template
  • This Web service implements a complete list of
    classifiers, i.e. trees, rules, functions etc.
  • Operations
  • classifyInstance( ): Input: DataHandler dataset,
    String classifierName, String options, String
    attributeName; Output: String result
  • classifyRemoteInstance( ): Input: String
    datasetURL, String classifierName, String
    options, String attributeName; Output: String
    result
  • getClassifiers( ): Input: null; Output: String
    listOfClassifiers
  • getOptions( ): Input: String classifierName;
    Output: String listOfApplicableOptions
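The template idea on this slide (one service, many classifiers selected by name) can be sketched as a dispatch table. This is only an illustrative sketch: the Python function names mirror the slide's operations, but `majority_class`, the `CLASSIFIERS` and `OPTIONS` tables, and the row format are hypothetical stand-ins, not the FAEHIM or WEKA implementation.

```python
from collections import Counter
from typing import Callable, Dict, List

def majority_class(rows: List[dict], attribute_name: str, options: str = "") -> str:
    """Toy classifier (assumption): report the most frequent class label."""
    counts = Counter(row[attribute_name] for row in rows)
    label, _ = counts.most_common(1)[0]
    return f"majority={label}"

# Hypothetical tables: classifier name -> implementation / applicable options.
CLASSIFIERS: Dict[str, Callable[..., str]] = {"MajorityClass": majority_class}
OPTIONS: Dict[str, List[str]] = {"MajorityClass": []}

def get_classifiers() -> List[str]:
    """Counterpart of getClassifiers(): list the available classifier names."""
    return sorted(CLASSIFIERS)

def get_options(classifier_name: str) -> List[str]:
    """Counterpart of getOptions(): options applicable to one classifier."""
    return OPTIONS[classifier_name]

def classify_instance(rows: List[dict], classifier_name: str,
                      options: str, attribute_name: str) -> str:
    """Counterpart of classifyInstance(): dispatch on the classifier name."""
    return CLASSIFIERS[classifier_name](rows, attribute_name, options)
```

A client would typically call `get_classifiers()` first, then `get_options()` for the chosen name, and only then `classify_instance()`.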
107
Parallel Execution
Resource Allocation Manager
UDDI
  • Discover
  • Select
  • Invoke

Registry of Algorithms
Algorithm 1
Algorithm 1
Algorithm 2
Algorithm 1
Algorithm 1
Algorithm 1
Algorithm 1
Execution Resource
Execution Resource
Execution Resource
Execution Resource
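A minimal sketch of the idea in this diagram, assuming that parallel execution means running several instances of the same algorithm on partitions of one dataset. The worker threads stand in for execution resources; `split` and `run_parallel` are hypothetical helper names, and the real Resource Allocation Manager is not modelled.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

def split(dataset: Sequence, parts: int) -> List[Sequence]:
    """Split a dataset into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(len(dataset), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(dataset[start:end])
        start = end
    return chunks

def run_parallel(algorithm: Callable[[Sequence], int],
                 dataset: Sequence, resources: int) -> List[int]:
    """Run one instance of `algorithm` per partition, one thread per resource."""
    with ThreadPoolExecutor(max_workers=resources) as pool:
        # pool.map returns results in partition order.
        return list(pool.map(algorithm, split(dataset, resources)))
```

For example, `run_parallel(sum, list(range(10)), 4)` computes four partial sums that a later workflow step could combine.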
108
Distributed Workflow
Community
Performance Info.
WF Enactor
Service Provider
Manager
Registry
109
Workflow Optimisation
  • Types of workflow optimisation
  • Through service selection
  • Through workflow re-ordering
  • Through exploitation of parallelism
  • When is optimisation performed?
  • At design time (early binding)
  • Upon submission (intermediate binding)
  • At runtime (late binding)

110
Workflow Partitioning (Pegasus)
  • Full Graph vs Partial Graph Scheduling
  • Schedule
  • Total workflow Graph
  • Sub-graph
  • Each node

111
Service Binding Models
  • Late binding of abstract service to concrete
    service instance means
  • We use up-to-date information to decide which
    service to use when there are multiple
    semantically equivalent services
  • We are less likely to try to use a service that
    is unavailable.

112
Late Binding Case
  • Search registry for all services that are
    consistent with abstract service description.
  • Select optimal service based on current
    information, e.g. host load.
  • Execute this service.
  • Doesn't take into account time to transfer inputs
    to the service.
  • In early and late binding cases we can optimise
    overall workflow.
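The search, select and execute steps above can be sketched as follows. This is an assumption-laden illustration: `ServiceInstance`, its fields and `bind_late` are hypothetical names, the registry is an in-memory list, and host load is the only selection criterion.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class ServiceInstance:
    name: str
    abstract_type: str            # abstract service this instance implements
    host_load: float              # hypothetical metric, refreshed at runtime
    available: bool
    invoke: Callable[[str], str]  # the concrete operation

def bind_late(registry: List[ServiceInstance], abstract_type: str) -> ServiceInstance:
    """Pick, at call time, the least-loaded available matching instance."""
    candidates = [s for s in registry
                  if s.abstract_type == abstract_type and s.available]
    if not candidates:
        raise LookupError(f"no available service for {abstract_type!r}")
    return min(candidates, key=lambda s: s.host_load)
```

Because the choice is made immediately before invocation, a change in load or availability between two workflow steps changes which instance is used. As the slide notes, this greedy choice ignores the cost of moving inputs to the selected service.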

113
WOSE Architecture
Work at Cardiff has focused on implementing a
late binding model for dynamic service discovery,
based on a generic service proxy, and service
discovery and optimisation services.
114
Service Discovery Issues
  • Service discovery and optimisation is based on
    service metadata.
  • Could store in a database.
  • Could obtain by interrogating service.

115
Optimisation by Re-Ordering
  • Work at Imperial has looked at static
    optimisation
  • Optimise the runtime execution of workflow before
    it is executed
  • Achieves the goal through
  • Re-ordering of components
  • Addition of components
  • Substitution of components
  • Pruning of the workflow
  • Performance and workflow aware Scheduling
  • Runtime Optimisation
  • through monitoring, check-pointing and migration

116
Component Manipulation
  • Re-ordering: Workflows (often composed from
    composite workflows) may contain non-optimal
    ordering of components
  • Use re-ordering to improve performance

117
Component Addition
  • Addition: For a component requiring a specific
    format of data as input, a transformer component
    could be added to achieve the desired format.
  • Allows more optimal components to be used
    together

Input required in MPS format
Output in LP format
C 1
LP to MPS
C 2
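The LP to MPS example can be sketched as a pass over a linear pipeline that inserts a transformer wherever adjacent formats disagree. This is a simplified sketch: the `Component` class, the format strings and the `transformers` lookup table are hypothetical, and real workflows are graphs rather than lists.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class Component:
    name: str
    input_format: str
    output_format: str

def add_transformers(pipeline: List[Component],
                     transformers: Dict[Tuple[str, str], Component]) -> List[Component]:
    """Insert a transformer between neighbours whose formats do not match."""
    result: List[Component] = []
    for comp in pipeline:
        if result and result[-1].output_format != comp.input_format:
            key = (result[-1].output_format, comp.input_format)
            if key not in transformers:
                raise ValueError(f"no transformer available for {key}")
            result.append(transformers[key])
        result.append(comp)
    return result
```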
118
Component Substitution
  • Substitution
  • A Jacobi Iteration linear solver replaced by
    Conjugate Gradient linear solver according to the
    output of the Discretizer (FEM)
  • Based on observing the meta-data associated with
    previous components

A (sparse and diagonally dominant)
JI linear solver
FEM
b
119
Pruning
  • Workflow Pruning
  • Workflows may contain unused components.
    Especially when composed from other sub-workflows
  • Remove redundant components
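One way to make "unused" precise, offered here as an assumption rather than as the method used at Imperial, is to keep only components from which some declared workflow output is reachable. The sketch below walks the graph backwards from the outputs; the graph encoding and the `prune` name are hypothetical.

```python
from typing import Dict, List, Set

def prune(edges: Dict[str, List[str]], outputs: Set[str]) -> Dict[str, List[str]]:
    """Drop components that cannot contribute to any workflow output.

    `edges` maps each component to the components that consume its results.
    """
    # Reverse the graph: consumer -> producers.
    producers: Dict[str, List[str]] = {}
    for src, dsts in edges.items():
        for dst in dsts:
            producers.setdefault(dst, []).append(src)

    # Backward reachability from the outputs.
    needed: Set[str] = set()
    stack = list(outputs)
    while stack:
        node = stack.pop()
        if node in needed:
            continue
        needed.add(node)
        stack.extend(producers.get(node, []))

    return {src: [d for d in dsts if d in needed]
            for src, dsts in edges.items() if src in needed}
```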

120
Performance Aware Scheduling
Globus Resources
Performance Repository
Globus Launcher
JSDL
Query
Component Repository
Single Resource Launcher
Scheduler
Query
Request Reservation
SGE Resources
Reservation Launcher
Reservation Service
Negotiate Reservation WS-Agreement
121
Execution Pipeline
122
Workflow Patterns
  • Identify and reuse common idioms in some
    scientific domain and across different scientific
    domains.
  • An idiom captures common knowledge and
    experience and describes how a similar set of
    experiments is to be set up and managed.

From Cecilia Gomes
123
Usage
  • To allow computational scientists and developers
    to capture design patterns that express common
    usage of software infrastructure within
    scientific domains
  • To provide a software engineering tool that
    supports
  • application configuration,
  • execution control, and
  • reconfiguration of software services

From Cecilia Gomes
124
Approach
  • Patterns are divided into two categories for
    flexibility
  • Co-ordination (Behavioural) patterns
  • Capture interactions between software sub-systems
  • Structural patterns
  • Capture connectivity between particular types of
    Grid software/hardware components

From Cecilia Gomes
125
Approach
  • Patterns as first class entities at design,
    execution, and reconfiguration time
  • Pattern templates are manipulated through Pattern
    Operators
  • Structural operators
  • Behavioural operators

From Cecilia Gomes
126
Structural Pattern Templates
  • Encode component connectivity, e.g. Pipeline,
    Ring, Star, Façade, Adapter, Proxy.

From Cecilia Gomes
127
Structural Operators
  • Manipulate structural patterns keeping their
    structural constraints.
  • Examples
  • Increase, Decrease,
  • Extend, Reduce,
  • Embed, Extract,
  • Group,
  • Rename/Reshape,

From Cecilia Gomes
129
Increase Structural Operator
Pattern
Result Pattern
From Cecilia Gomes
130
Extend Structural Operator
Pattern
Result Pattern
From Cecilia Gomes
131
Behavioural Pattern Templates
  • Capture temporal or (data/control) flow
    dependencies between components.
  • Examples
  • Client/Server,
  • Master/Slave,
  • Streaming,
  • Service Adapter,
  • Service Migration,
  • Broker Service
  • Service Aggregator/Decomposer,

From Cecilia Gomes
132
Behavioural Operators
  • Act over the temporal or flow dependencies for
    execution control and reconfiguration.
  • Examples
  • Start, Terminate,
  • Log,
  • Stop, Resume,
  • Restart, Limit,
  • Repeat,

From Cecilia Gomes
133
Pattern Operators - example
  • main
  • P1P2
  • section1
  • Rename(P1,P2)
  • Replicate(P2,3)
  • Owner(P2, scmofr)
  • section2
  • Start(P2)
  • Log(P2)
  • Limit(30,P2)
  • section1 section2

From Cecilia Gomes
134
Workflow Planning/Adaptation
  • Goal-oriented
  • Abstract → Concrete workflow translation
  • May utilise a number of different infrastructure
    services (Pegasus)
  • Level of automation can vary
  • Find components
  • Find sub-workflows
  • Find infrastructure services
  • Publish output data at specific locations

135
Chimera is developed at ANL by I. Foster, M.
Wilde, and J. Voeckler
From Ewa Deelman
136
HTN Planning (Activity Composition)
HTN Planning: use of Methods (task decomposition)
and Operators (task execution)
  • Introduce activities to achieve preconditions
  • Resolve interactions between conditions and
    effects
  • Handle constraints (e.g. world state, resource,
    spatial, etc.)
From Austin Tate (Edinburgh)
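The distinction between methods and operators can be illustrated with a deliberately small sketch. It covers only task decomposition; preconditions, condition and effect interactions, and constraints are left out. The task names and the `METHODS` table are hypothetical and are not taken from any particular planner.

```python
from typing import Dict, List

# Methods decompose a compound task into ordered subtasks. Any task without
# a method is treated as a primitive operator that can be executed directly.
METHODS: Dict[str, List[str]] = {
    "analyse_dataset": ["fetch_data", "mine_data", "publish_result"],
    "mine_data": ["classify", "plot"],
}

def decompose(task: str, methods: Dict[str, List[str]]) -> List[str]:
    """Expand a task recursively into a sequence of primitive operators."""
    if task not in methods:
        return [task]          # primitive: an operator
    plan: List[str] = []
    for subtask in methods[task]:
        plan.extend(decompose(subtask, methods))
    return plan
```

A fuller planner would choose among several alternative methods per task and backtrack when preconditions or constraints cannot be met.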
137
HTN Planning (Initial Plan Stated as Goals)
Initial Plan can be any combination of Activities
and Constraints
From Austin Tate (Edinburgh)
138
Composer Enactor
From Austin Tate (Edinburgh)
139
Product Model Refinement Step Using <I-N-C-A>
Framework
From Austin Tate (Edinburgh)
140
BDI Planning
  • Situated so actions, percepts, time
  • Fire engine
  • See nearby fires, road conditions, hear messages
    from other agents, hear civilian calls for help.
  • Move, squirt, tell (broadcast), say, plan route
    (internal)

Percepts
Choose an action a ∈ As
Action
From Michael Winikoff, RMIT
141
BDI Planning 2
  • Reactive so events (significant occurrences)
  • New fire, fire extinguished, fire urgent, help
    requested
  • Proactive so goals
  • Put out fire, discover fire, assist, coordinate

From Michael Winikoff, RMIT
142
BDI Planning 3
  • Implementation uses plans and beliefs
  • Cache for means, and world information
    respectively
  • Beliefs Map (incl. fires, buildings), fire
    assignment and priority
  • Plans Put out fire, roam,

Percepts
Events
Beliefs
Goals
Actions
Action
Plans
From Michael Winikoff, RMIT
143
BDI agents (based on AgentSpeak(L))
  • Chosen plan added to intention stack (can be
    either an event (posted) or action (executed))
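A heavily simplified sketch of this cycle, assuming one flat intention stack of plan steps rather than AgentSpeak(L)'s stacks of partially executed plans. The `Agent` class, the plan-library format and the fire-engine plan names used with it are hypothetical illustrations.

```python
from typing import Callable, Dict, List, Set, Tuple

Step = Tuple[str, str]                           # ("action", name) or ("event", name)
Plan = Tuple[Callable[[Set[str]], bool], List[Step]]  # (context test, body)

class Agent:
    def __init__(self, plans: Dict[str, List[Plan]], beliefs: Set[str]):
        self.plans = plans              # plan library: event -> candidate plans
        self.beliefs = beliefs
        self.events: List[str] = []     # pending events, oldest first
        self.intentions: List[Step] = []  # stack: last element runs next
        self.log: List[str] = []        # actions executed so far

    def step(self) -> bool:
        """Run one reasoning cycle; return False when nothing is left to do."""
        if self.intentions:
            kind, name = self.intentions.pop()
            if kind == "action":
                self.log.append(name)      # actions are executed
            else:
                self.events.append(name)   # events are posted
            return True
        if self.events:
            event = self.events.pop(0)
            for context, body in self.plans.get(event, []):
                if context(self.beliefs):  # first applicable plan is chosen
                    self.intentions.extend(reversed(body))
                    break
            return True
        return False
```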

144
BDI-based Enactor
  • Enactor can maintain local plan library
  • update of plan library as new conditions are
    detected
  • Useful in a dynamic environment (Grid) -- as
    agents are goal directed
  • Execution of a plan leads to update of beliefs
  • useful mechanism to adapt agent behaviour in a
    Grid context
  • Potentially useful to allow detection of plan
    conflicts
  • Traditional approach
  • number of tasks fixed, resources identical
  • fixed number of resources, tasks pre-defined
  • Delegate scheduling priorities to each resource
    and task agent (no central schedulers)

145
Planning as Model Checking
  • Planning based on
  • Non-determinism: cannot predict interactions with
    external processes
  • Partial Observability: can only observe external
    interactions (as BPEL), not internal status
  • Extended Goals: behaviour of the process is
    important, and not just the final goal
  • Conditional Preferences: may require multiple
    conditions to hold for goal to be satisfied
  • Given current state, evaluate possible likely
    states (may require an exhaustive checking of
    possibilities)

146
Planning
Context captures state
147
Web Services Modelling Ontology (WSMO)
  • Use of Semantic Web Services to aid automated
    composition
  • Given a goal, identify how services could be
    composed to achieve the goal
  • Specifies a complete set of infrastructure that
    is necessary to achieve this
  • Provides three main components
  • Web Services Modelling Ontology
  • Web Services Modelling Language
  • Execution Environment

From John Domingue, Open University
148
WSMO Working Groups

A Conceptual Model for SWS
A Formal Language for WSMO
Execution Environment for WSMO
A Rule-based Language for SWS
From John Domingue, Open University
149
WSMO Top Level Notions
Goals: Objectives that a client wants to achieve by
using Web Services
Ontologies: Provide the formally specified terminology
of the information used by all other components
  • Web Services: Semantic description of Web Services
  • Capability (functional)
  • Interfaces (usage)

Mediators: Connectors between components with
mediation facilities for handling heterogeneities
From John Domingue, Open University
150
Conceptual Model (WSMX-O)
151
Non-Functional Properties List
  • Dublin Core Metadata: Contributor, Coverage,
    Creator, Description, Format, Identifier,
    Language, Publisher, Relation, Rights, Source,
    Subject, Title, Type
  • Quality of Service: Accuracy, NetworkRelatedQoS,
    Performance, Reliability, Robustness,
    Scalability, Security, Transactional, Trust
  • Other: Financial, Owner, TypeOfMatch, Version

Service Descriptions make use of this
From John Domingue, Open University
153
Ontology Specification
  • Non functional properties (see before)
  • Imported Ontologies: importing existing
    ontologies where no heterogeneities arise
  • Used mediators: OO Mediators (ontology import
    with terminology mismatch handling)
  • Ontology Elements
  • Concepts: set of concepts that belong to the
    ontology, incl.
  • Attributes: set of attributes that belong to a
    concept
  • Relations: define interrelations between several
    concepts
  • Functions: special type of relation (unary range
    = return value)
  • Instances: set of instances that belong to the
    represented ontology
  • Axioms: axiomatic expressions in ontology (logical
    statements)

From John Domingue, Open University
155
WSMO Web Service Description
  • Non-functional Properties: DC, QoS, Version,
    financial
  • complete item description
  • quality aspects
  • Web Service Management
  • Capability: functional description
  • Advertising of Web Service
  • Support for WS Discovery
  • Choreography
  • client-service interaction interface for
    consuming WS
  • External Visible Behavior
  • Communication Structure
  • Grounding
  • Orchestration
  • realization of functionality by aggregating
    other Web Services
  • functional decomposition
  • WS composition

Web Service Implementation (not of interest in
Web Service Description)
Choreography --- Service Interfaces ---
Orchestration
From John Domingue, Open University
156
Capability Specification
  • Non functional properties
  • Imported Ontologies
  • Used mediators
  • OO Mediator: importing ontologies with mismatch
    resolution
  • WG Mediator: link to a Goal for which the service
    is not usable a priori
  • Pre-conditions: what a web service expects in
    order to be able to provide its service. They
    define conditions over the input.
  • Assumptions: conditions on the state of the
    world that have to hold before the Web Service
    can be executed
  • Post-conditions: describe the result of the Web
    Service in relation to the input, and conditions
    on it
  • Effects: conditions on the state of the world
    that hold after execution of the Web Service
    (i.e. changes in the state of the world)

From John Domingue, Open University
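The four kinds of condition in a capability can be illustrated by wrapping a service call with checks. This is a sketch in plain Python predicates, not WSML: the `Capability` class, `checked_invoke` and the seat-booking example are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Capability:
    precondition: Callable[[dict], bool]         # condition over the input
    assumption: Callable[[dict], bool]           # world state before execution
    postcondition: Callable[[dict, dict], bool]  # relates input and output
    effect: Callable[[dict, dict], bool]         # relates world before and after

def checked_invoke(cap: Capability, service, inputs: dict, world: dict) -> dict:
    """Invoke `service` only if the capability's conditions are respected."""
    if not cap.precondition(inputs):
        raise ValueError("pre-condition violated")
    if not cap.assumption(world):
        raise ValueError("assumption violated")
    before = dict(world)             # snapshot for the effect check
    output = service(inputs, world)  # the service may change the world
    if not cap.postcondition(inputs, output):
        raise RuntimeError("post-condition violated")
    if not cap.effect(before, world):
        raise RuntimeError("effect violated")
    return output

# Hypothetical seat-booking service, used only as an illustration.
def book(inputs: dict, world: dict) -> dict:
    world["free_seats"] -= inputs["seats"]
    return {"booked": inputs["seats"]}

booking = Capability(
    precondition=lambda i: i["seats"] > 0,
    assumption=lambda w: w["free_seats"] > 0,
    postcondition=lambda i, o: o["booked"] == i["seats"],
    effect=lambda b, a: a["free_seats"] < b["free_seats"],
)
```

Pre-conditions and post-conditions talk about the information exchanged, while assumptions and effects talk about the world, which is why the sketch passes them different arguments.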
157
Service Interface Description Model
  • Vocabulary Ω
  • ontology schema(s) used in service interface
    description
  • usage for information interchange: in, out,
    shared, controlled
  • States ω(Ω)
  • a stable status in the information space
  • defined by attribute values of ontology instances
  • Guarded Transition GT(ω)
  • state transition
  • general structure: if (condition) then (action)
  • different for Choreography and Orchestration

From John Domingue, Open University
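The "if (condition) then (action)" structure can be sketched as a small rule interpreter over a state made of attribute values. This is a simplification under stated assumptions: rules fire one at a time in list order, which need not match the semantics WSMO actually prescribes, and the `run` function and the quote/confirm attributes are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, object]
GuardedTransition = Tuple[Callable[[State], bool], Callable[[State], State]]

def run(state: State, transitions: List[GuardedTransition],
        max_steps: int = 100) -> State:
    """Apply the first enabled transition repeatedly until none is enabled."""
    for _ in range(max_steps):
        for condition, action in transitions:
            if condition(state):
                state = action(state)   # move to the next state
                break
        else:
            return state                # no guard holds: a stable state
    raise RuntimeError("no stable state reached within max_steps")

# Hypothetical two-rule interface: quote on request, confirm on acceptance.
TRANSITIONS: List[GuardedTransition] = [
    (lambda s: bool(s.get("request")) and not s.get("quote"),
     lambda s: {**s, "quote": True}),
    (lambda s: bool(s.get("quote")) and bool(s.get("accepted")) and not s.get("confirmed"),
     lambda s: {**s, "confirmed": True}),
]
```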