AI Planning and Knowledge-based approaches to workflow

Provided by: jimb172


1
AI Planning and Knowledge-based approaches to
workflow
  • Jim Blythe,
  • Ewa Deelman, Yolanda Gil, Carl Kesselman
  • USC Information Sciences Institute
  • http://www.isi.edu/ikcap/cognitive-grids

2
The connection between workflows and the semantic
grid
  • Intelligent workflow maintenance systems require
    semantics at several levels
  • purpose of each computational task
  • explicit constraints
  • reasons for choices: policies of security, access
    rights, fairness
  • The appropriate semantics for data depend on the
    operations that are performed in workflows
  • Workflow creation and maintenance can determine
    minimal semantic requirements for data
    descriptions

3
Outline
  • Requirements for intelligent workflow support
  • Our work to date on Pegasus at ISI
  • AI planning to generate executable workflows for
    grids
  • Used in GriPhyN's LIGO pulsar search application
  • How we use semantic information
  • Knowledge required for intelligent workflow
    maintenance
  • Where planning knowledge comes from
  • Pervasive knowledge sources and intelligent
    reasoners, smart workflows

4
Example: LIGO Experiment (Laser Interferometer
Gravitational-Wave Observatory)
  • Aims to detect gravitational waves predicted
    by the theory of relativity
  • Can be used to detect:
  • binary pulsars
  • mergers of black holes
  • starquakes in neutron stars
  • Two installations in Louisiana (Livingston) and
    Washington State
  • Other projects: Virgo (Italy), GEO (Germany),
    Tama (Japan)
  • Data collected during experiments is a collection
    of time series (multi-channel)
  • Analysis is performed in the time and Fourier domains

5
LIGO's Pulsar Search (Laser Interferometer
Gravitational-Wave Observatory)
[Figure: pulsar search workflow. Steps shown: extract channel, Short Fourier Transform, transpose, extract frequency range, construct image, find candidate, store (event DB). Data labels: long time frames (30 minutes), short time frames, single frame, time-frequency image.]
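The figure's main stages can be illustrated with a toy Python sketch. It is not LIGO's analysis code. It assumes a single real-valued channel, non-overlapping frames, a naive DFT and a fixed magnitude threshold. The function names (`short_fourier_transforms`, `extract_band`, `find_candidates`) are hypothetical, and the transpose and event-DB steps are omitted.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Naive O(n^2) DFT of one frame; returns |X[k]| for k in [0, n/2)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2)]


def short_fourier_transforms(samples, frame_len):
    """Cut one channel into non-overlapping short frames and transform each.

    The result is a time-frequency image indexed as image[frame][bin].
    """
    starts = range(0, len(samples) - frame_len + 1, frame_len)
    return [dft_magnitudes(samples[s:s + frame_len]) for s in starts]


def extract_band(image, lo_bin, hi_bin):
    """Keep frequency bins lo_bin..hi_bin (inclusive) of every frame."""
    return [row[lo_bin:hi_bin + 1] for row in image]


def find_candidates(image, threshold):
    """Return (frame, bin) pairs whose magnitude exceeds the threshold."""
    return [(t, k) for t, row in enumerate(image)
            for k, value in enumerate(row) if value > threshold]


# A sinusoid that completes 5 cycles per 32-sample frame.
samples = [math.sin(2 * math.pi * 5 * t / 32) for t in range(128)]
image = short_fourier_transforms(samples, frame_len=32)
band = extract_band(image, lo_bin=4, hi_bin=6)
print(find_candidates(band, threshold=8.0))  # [(0, 1), (1, 1), (2, 1), (3, 1)]
```

The pure tone lands in bin 5 of every frame, which is index 1 of the extracted band 4..6. It therefore produces one candidate per frame.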
6
Motivation: using today's Grid
  • Users have high level requirements naturally
    stated in terms of the application domain
  • Example: obtain the frequency spectrum for signal S in
    instrument I and time frame T
  • Users have to turn these requirements into
    executable job workflows in detailed scripts
  • must figure out which code generates desired
    products, required inputs as files, physical
    location of the files, hosts that support
    execution given code requirements, availability
    of hosts, access policies, etc.
  • must query Grid middleware: metadata catalog,
    replica locator, resource descriptor and
    monitoring, etc.
  • Users must oversee execution

7
Challenges for intelligent workflow support
  • Usability: users should not need to be
    proficient in grid computing
  • Complexity: many interrelated choices and dead
    ends
  • Solution cost: even finding feasible solutions
    is already hard
  • Global cost: optimization with contention and
    collaboration among many users
  • Reliability of execution: failure-driven workflow
    repair

8
Outline
  • What we need for intelligent workflow support
  • Our work to date on Pegasus at ISI
  • AI planning techniques exploit knowledge to
    generate executable job workflows for grids
  • Used in GriPhyN's LIGO pulsar search application
  • How we use semantic information
  • Knowledge required for intelligent workflow
    maintenance
  • Where planning knowledge comes from
  • Pervasive knowledge sources and intelligent
    reasoners, smart workflows

9
[Figure; surviving label: queue service]
10
Existing tools for building workflows: VDL for
abstract workflow generation
  • Chimera
  • Input-output transforms specified on individual
    files, in Virtual Data Language

DV first1->createSFT( b=@{output:"H2_SFT_LSC-AS-Q_714384000_64.gwf"},
    t1="714384000", t2="714384063", format="frame",
    channel="H2:LSC-AS-Q", instrument="H2" )
DV first2->createSFT( b=@{output:"H2_SFT_LSC-AS-Q_714384064_64.gwf"},
    t1="714384064", t2="714384127", format="frame",
    channel="H2:LSC-AS-Q", instrument="H2" )
DV third1->pulsar( a=@{input:"H2_sSFT_LSC-AS-Q_714384000_256_50_1.ilwd"},
    b=@{output:"H2_pulsar_LSC-AS-Q_714384000_256_50.5_0.004_3.123643_2.56234.ilwd"},
    t1="714384000", t2="714384255", format="ilwd", channel="LSC-AS-Q",
    fcenter="50.5", fband="0.004", instrument="H2",
    ra="3.123643", de="2.56234", fderv1="0.0", fderv2="0.0",
    fderv3="0.0", fderv4="0.0", fderv5="0.0" )
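Each DV statement names the files that a derivation reads (`@{input:...}`) and writes (`@{output:...}`). That information is enough to recover the abstract workflow, because one derivation depends on another when it reads a file the other writes. The Python sketch below shows this idea. It is not Chimera's implementation. File names are shortened, and the `second1` derivation is hypothetical: it stands in for whatever produces the `sSFT` input of `third1`.

```python
from collections import defaultdict, deque


def build_dag(derivations):
    """derivations maps a derivation name to (input_files, output_files).

    Returns (edges, order): edges[a] is the set of derivations that read a
    file written by a, and order is a topological order of all derivations.
    """
    producer = {}
    for name, (_, outputs) in derivations.items():
        for f in outputs:
            producer[f] = name

    edges = defaultdict(set)
    indegree = {name: 0 for name in derivations}
    for name, (inputs, _) in derivations.items():
        for f in inputs:
            parent = producer.get(f)   # None: raw input, not derived
            if parent is not None and name not in edges[parent]:
                edges[parent].add(name)
                indegree[name] += 1

    # Kahn's algorithm, seeded in sorted order so the result is deterministic.
    ready = deque(sorted(n for n, d in indegree.items() if d == 0))
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for child in sorted(edges.get(n, ())):
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    if len(order) != len(derivations):
        raise ValueError("cyclic file dependencies")
    return dict(edges), order


derivations = {
    "first1": ([], ["H2_SFT_714384000"]),
    "first2": ([], ["H2_SFT_714384064"]),
    "second1": (["H2_SFT_714384000", "H2_SFT_714384064"], ["H2_sSFT_714384000"]),
    "third1": (["H2_sSFT_714384000"], ["H2_pulsar_714384000"]),
}
edges, order = build_dag(derivations)
print(order)  # ['first1', 'first2', 'second1', 'third1']
```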
11
Existing tools (2): concrete planner
  • Assigns specific hosts and data locations for
    tasks
  • Makes random selection of resources and data
  • Provides a feasible solution
  • Reuses existing data products

[Figure: panels labelled INPUT and OUTPUT]
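The concrete planner's behaviour, as listed above, has three parts: it assigns hosts and data locations, it chooses at random, and it reuses existing products. The following hypothetical Python illustration sketches that behaviour. It is not Pegasus code. `concretize`, the replica table and the step tuples are assumptions made for the example.

```python
import random


def concretize(order, derivations, replicas, hosts, seed=None):
    """Map an abstract workflow onto concrete hosts and data locations.

    order:       derivation names in dependency order
    derivations: {name: (input_files, output_files)}
    replicas:    {file: [sites already holding a copy]}
    hosts:       candidate execution sites
    Every input must already exist or be produced by an earlier step.
    Returns steps ("transfer", file, src, dst) and ("run", name, host).
    """
    rng = random.Random(seed)
    where = {f: list(sites) for f, sites in replicas.items()}
    steps = []
    for name in order:
        inputs, outputs = derivations[name]
        # Reuse existing data products: skip tasks whose outputs all exist.
        if outputs and all(f in where for f in outputs):
            continue
        host = rng.choice(hosts)                  # random resource choice
        for f in inputs:
            if host not in where[f]:
                src = rng.choice(where[f])        # random replica choice
                steps.append(("transfer", f, src, host))
                where[f].append(host)
        steps.append(("run", name, host))
        for f in outputs:
            where.setdefault(f, []).append(host)
    return steps


derivations = {"sft": ([], ["sft.gwf"]),
               "search": (["sft.gwf"], ["pulsar.ilwd"])}
print(concretize(["sft", "search"], derivations,
                 replicas={"sft.gwf": ["caltech"]}, hosts=["usc"], seed=1))
# [('transfer', 'sft.gwf', 'caltech', 'usc'), ('run', 'search', 'usc')]
```

Because `sft.gwf` already exists at `caltech`, the `sft` task is pruned. Only a transfer and the search remain.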
12
Desired properties for a workflow generator
  • Allow users to refer to data requirements by
    descriptions, not file names
  • Intuitive, allows regression of metadata
    requirements
  • Model a variety of constraints declaratively
  • Data dependencies, resource constraints, user
    access rights,
  • Allow more flexible reasoning, easier to maintain
  • Seek high quality workflows
  • Use general reasoning techniques allowing search

13
Workflow Generation as AI Planning
  • Goal (Provided by the user)
  • A metadata specification of the information the
    user requires and the desired location for the
    output file
  • Initial State (Automatically extracted from Grid
    environment)
  • Available hosts, queue lengths, locations for
    existing data,
  • Operators (Encoded for the application domain)
  • Represent application components and chosen host
  • File movements across the network
  • Heuristics as search control rules (Grid or
    application specific)
  • specify options that should be exclusively
    considered at any choice point in the search
    algorithm (e.g., execute close to the data)
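A minimal STRIPS-style sketch in Python makes this mapping concrete. Facts describe file locations. Each operator is either an application component on a chosen host or a file transfer. The goal is a set of facts about the requested data. The search here is plain breadth-first forward search with no control rules, which is a deliberate simplification. The planner described in these slides relies on heuristics and search control, and all fact and operator names below are hypothetical.

```python
from collections import deque
from typing import FrozenSet, List, NamedTuple, Optional, Tuple

Fact = Tuple[str, ...]


class Operator(NamedTuple):
    name: str
    pre: FrozenSet[Fact]
    add: FrozenSet[Fact]
    delete: FrozenSet[Fact] = frozenset()


def plan(initial, goal, operators) -> Optional[List[str]]:
    """Breadth-first forward search from the initial state to any state
    that contains every goal fact. Returns operator names or None."""
    start = frozenset(initial)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, steps = frontier.popleft()
        if frozenset(goal) <= state:
            return steps
        for op in operators:
            if op.pre <= state:
                nxt = (state - op.delete) | op.add
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, steps + [op.name]))
    return None


def make_op(name, pre, add):
    return Operator(name, frozenset(pre), frozenset(add))


# Initial state: raw data exists at caltech (as if read from a replica catalog).
initial = {("at", "raw", "caltech")}
# Goal: an SFT of the raw data, delivered to the user's site.
goal = {("at", "sft", "user")}
operators = [
    make_op("transfer raw caltech->usc", [("at", "raw", "caltech")], [("at", "raw", "usc")]),
    make_op("createSFT on usc", [("at", "raw", "usc")], [("at", "sft", "usc")]),
    make_op("transfer sft usc->user", [("at", "sft", "usc")], [("at", "sft", "user")]),
]
print(plan(initial, goal, operators))
```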

14
Operator template
(action ?Application-component
  parameters
    (?host
     ?output metadata
     ?input metadata)
  precondition
    (and (resource constraints)
         (files for input metadata available at host))
  effect
    (files for output metadata created at host))

15
Example operator from LIGO domain
(action pulsar-search
  parameters
    (?host - (or Condor-pool Mpi)
     ?file - File-Handle
     ?start-time - Number
     ?channel - Channel
     ?fcenter - Number
     ?right-ascension - Number
     ?sample-rate - Number
     ;; Compute parameters for the frequency-extract.
     ?f0 - (and Number (get-low-freq-from-center-and-band
                         ?fcenter ?fband))
     ?fN - (and Number (get-high-freq-from-center-and-band
                         ?fcenter ?fband))
     ?run-time - (and Number
                      (estimate-pulsar-search-run-time
                        ?start-time ?end-time ?sample-rate
                        ?f0 ?fN ?host ?run-time)))
  precondition
  effect
    (and
      (created ?file)
      (at ?file ?host)
      (add (pulsar ?start-time ?end-time ?channel
                   ?instrument ?format
                   ?fcenter ?fband
                   ?fderv1 ?fderv2 ?fderv3 ?fderv4 ?fderv5
                   ?right-ascension ?declination ?sample-rate
                   ?file))))
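The derived parameters ?f0, ?fN and ?run-time are computed by functions when the operator is instantiated, rather than read from the state. The Python sketch below illustrates that binding step. It assumes the band is centred on ?fcenter, so that f0 = fcenter - fband/2 and fN = fcenter + fband/2. It also uses a made-up linear run-time model. Neither assumption comes from the slide, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


def low_freq_from_center_and_band(fcenter: float, fband: float) -> float:
    # Assumption: the band is centred on fcenter.
    return fcenter - fband / 2.0


def high_freq_from_center_and_band(fcenter: float, fband: float) -> float:
    return fcenter + fband / 2.0


def estimate_run_time(start_time, end_time, sample_rate, f0, fn, host_speed):
    # Hypothetical cost model: work grows with the number of samples and
    # the width of the frequency band, and shrinks with host speed.
    samples = (end_time - start_time + 1) * sample_rate
    return samples * (fn - f0) / host_speed


@dataclass
class PulsarSearchStep:
    host: str
    f0: float
    fn: float
    run_time: float


def bind_pulsar_search(hosts: Dict[str, Tuple[str, float]], start_time, end_time,
                       sample_rate, fcenter, fband) -> List[PulsarSearchStep]:
    """Instantiate pulsar-search once per admissible host, computing the
    derived parameters ?f0, ?fN and ?run-time at binding time."""
    f0 = low_freq_from_center_and_band(fcenter, fband)
    fn = high_freq_from_center_and_band(fcenter, fband)
    steps = []
    for name, (kind, speed) in hosts.items():
        if kind not in ("Condor-pool", "Mpi"):  # ?host - (or Condor-pool Mpi)
            continue
        run_time = estimate_run_time(start_time, end_time, sample_rate, f0, fn, speed)
        steps.append(PulsarSearchStep(name, f0, fn, run_time))
    return steps


hosts = {"mpi1": ("Mpi", 2.0), "pool": ("Condor-pool", 1.0), "ws": ("Workstation", 5.0)}
for step in bind_pulsar_search(hosts, 714384000, 714384255, 1, 50.5, 0.004):
    print(step)
```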

16
Seeking high-quality workflows: using local
heuristics and global metrics
  • Need local heuristics since search space is
    intractable
  • e.g. prefer to run component on host with high
    bandwidth connection to where the output is
    required
  • Generate many plans and test a global metric
    (e.g. overall runtime) since local heuristics can
    lead to globally poor solution
  • Search control to eliminate redundant solutions
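A linear chain of tasks is enough to illustrate how local heuristics and a global metric work together. In this hypothetical Python sketch, the local heuristic keeps the k fastest hosts for each task. Every combination of those choices is a candidate plan. The global metric is total runtime, including a fixed cost for each transfer between hosts. The cost numbers and function names are invented for illustration.

```python
from itertools import product


def total_runtime(assignment, tasks, compute, transfer_cost):
    """Global metric for a linear chain: compute time on the chosen hosts,
    plus transfer_cost whenever consecutive tasks run on different hosts."""
    total = 0.0
    previous = None
    for task, host in zip(tasks, assignment):
        total += compute[task][host]
        if previous is not None and previous != host:
            total += transfer_cost
        previous = host
    return total


def best_plan(tasks, compute, transfer_cost, k):
    """Local heuristic: keep the k fastest hosts per task. Then generate every
    plan from those choices and return the one with the lowest global cost."""
    candidates = [sorted(compute[t], key=compute[t].get)[:k] for t in tasks]
    return min(product(*candidates),
               key=lambda plan: total_runtime(plan, tasks, compute, transfer_cost))


tasks = ["sft", "extract", "search"]
compute = {"sft": {"A": 1, "B": 2},
           "extract": {"A": 3, "B": 2},
           "search": {"A": 1, "B": 2}}
print(best_plan(tasks, compute, transfer_cost=5, k=1))  # ('A', 'B', 'A')
print(best_plan(tasks, compute, transfer_cost=5, k=2))  # ('A', 'A', 'A')
```

With k=1, only the greedy plan ('A', 'B', 'A') remains, and it costs 14. With k=2, the search finds ('A', 'A', 'A') at a cost of 5, because keeping the data on one host avoids two transfers. This is a case where a locally good choice is globally poor.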

17
Grid-specific domain-independent..
Grid-specific rule:
(control-rule only-transfer-from-loc-with-greatest-bandwidth
  (if (and (considering transfer-file)
           (trying-to-achieve (at ?file ?dest))
           (currently (at ?file ?loc1))
           (currently (at ?file ?loc2))
           (higher-bandwidth ?loc1 ?loc2 ?dest)))
  (then reject value ?loc2 as source))

Domain-specific rule:
(control-rule prefer-mpi-to-condor-for-pulsar-search
  (if (and (considering pulsar-search)
           (type-of ?mpi Mpi)
           (type-of ?condor Condor-pool)))
  (then prefer value ?mpi as host to ?condor as host))
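The effect of these two control rules can be sketched as Python filters over the choices a planner considers. This sketch illustrates what the rules mean; it is not the planner's rule engine. The first rule rejects ?loc2 whenever some ?loc1 has higher bandwidth to ?dest. That pairwise rejection is implemented here by keeping only the highest-bandwidth sources. The second rule is implemented as an ordering, because preferring a value does not discard the alternatives. Function names and the bandwidth table are hypothetical.

```python
def only_transfer_from_best_source(sources, dest, bandwidth):
    """Grid-specific rule: reject any source ?loc2 for which another source
    ?loc1 has higher bandwidth to ?dest, i.e. keep only the best sources."""
    best = max(bandwidth[(s, dest)] for s in sources)
    return [s for s in sources if bandwidth[(s, dest)] == best]


def prefer_mpi_for_pulsar_search(operator, hosts, host_type):
    """Domain-specific rule: when choosing a host for pulsar-search, try Mpi
    hosts before Condor-pool hosts. A preference reorders, it never rejects."""
    if operator != "pulsar-search":
        return list(hosts)
    # sorted() is stable, so hosts of the same type keep their original order.
    return sorted(hosts, key=lambda h: 0 if host_type[h] == "Mpi" else 1)


bandwidth = {("isi", "caltech"): 10, ("uwm", "caltech"): 100}
print(only_transfer_from_best_source(["isi", "uwm"], "caltech", bandwidth))  # ['uwm']

host_type = {"pool1": "Condor-pool", "mpi1": "Mpi", "pool2": "Condor-pool"}
print(prefer_mpi_for_pulsar_search("pulsar-search", ["pool1", "mpi1", "pool2"], host_type))
# ['mpi1', 'pool1', 'pool2']
```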
18
Pegasus planning environment
19
Application: LIGO's Pulsar Search
  • Used LIGO's data collected during the first
    scientific run of the instrument
  • Targeted a set of 1000 locations: known pulsar
    locations or random locations
  • Performed 200 searches in 100 hours of runtime
    (planning time is negligible)
  • Results of the analysis were published to the LIGO
    Scientific Collaboration
  • Used compute and storage resources at Caltech,
    University of Southern California, and University
    of Wisconsin-Milwaukee.

With A. Arbree, R. Cavanaugh, K. Blackburn, A.
Lazzarini, S. Koranda, G. Mehta, K. Vahi, S.
Patil, S. Rao, G. Singh. Visualization by M.
Thiebaux
20
References
  • Publications in AI forums
  • "The Role of Planning in Grid Computing," Jim
    Blythe, Ewa Deelman, Yolanda Gil, Carl Kesselman,
    Amit Agarwal, Gaurang Mehta, Karan Vahi.
    International Conference on Automated Planning
    and Scheduling (ICAPS), 2003.
  • "Transparent Grid Computing: A Knowledge-Based
    Approach," Jim Blythe, Ewa Deelman, Yolanda Gil,
    Carl Kesselman. Innovative Applications of
    Artificial Intelligence Conference (IAAI), 2003.
  • Publications in Grid forums
  • "Mapping Abstract Complex Workflows onto Grid
    Environments," Ewa Deelman, Jim Blythe, Yolanda
    Gil, Carl Kesselman, Gaurang Mehta, Karan Vahi,
    Adam Arbree, Richard Cavanaugh, Kent Blackburn,
    Albert Lazzarini, Scott Koranda. Journal of Grid
    Computing, Vol. 1, No. 1, 2003.
  • "Workflow Management in GriPhyN," Ewa Deelman, Jim
    Blythe, Yolanda Gil, Carl Kesselman. Chapter in
    the book Grid Resource Management, 2003.

21
Outline
  • What we need for intelligent workflow support
  • Our work to date on Pegasus at ISI
  • AI planning techniques exploit knowledge to
    generate executable job workflows for grids
  • Used in GriPhyN's LIGO pulsar search application
  • How we use semantic information
  • Knowledge required for intelligent workflow
    maintenance
  • Where planning knowledge comes from
  • Pervasive knowledge sources and intelligent
    reasoners, smart workflows

22
Workflow planning: types of knowledge used
  • Knowledge about application components and hosts
  • Constraints on appropriate hosts for components
  • Explicit preferences for workflow construction
    search
  • Knowledge about data
  • Input-output conditions for components
  • Requires sufficient information for regression
    through the workflow
  • Focused file semantics
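Regression through the workflow means working backwards from the requested data description to the descriptions that each earlier step must receive. A minimal Python sketch of classical STRIPS goal regression shows the idea. The operators and metadata facts are hypothetical, and real data descriptions carry many more attributes.

```python
from typing import FrozenSet, Iterable, NamedTuple, Optional, Tuple

Fact = Tuple[str, ...]


class Operator(NamedTuple):
    name: str
    pre: FrozenSet[Fact]
    add: FrozenSet[Fact]
    delete: FrozenSet[Fact] = frozenset()


def regress(goal: Iterable[Fact], op: Operator) -> Optional[FrozenSet[Fact]]:
    """What must hold before op so that goal holds after it; None if op
    achieves none of the goal or deletes part of it."""
    goal = frozenset(goal)
    if not (goal & op.add) or (goal & op.delete):
        return None
    return (goal - op.add) | op.pre


def regress_through(goal, workflow):
    """Regress a goal through a workflow listed in execution order."""
    current = frozenset(goal)
    for op in reversed(workflow):
        previous = regress(current, op)
        if previous is None:
            raise ValueError(f"{op.name} does not contribute to the goal")
        current = previous
    return current


sft = Operator("createSFT",
               frozenset({("raw", "H2", "714384000-714384063")}),
               frozenset({("sft", "H2", "714384000-714384063")}))
search = Operator("pulsar-search",
                  frozenset({("sft", "H2", "714384000-714384063")}),
                  frozenset({("pulsar", "H2", "fcenter=50.5")}))
print(regress_through({("pulsar", "H2", "fcenter=50.5")}, [sft, search]))
# frozenset({('raw', 'H2', '714384000-714384063')})
```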

23
What workflow planning tells us about semantics
in the grid
  • Data and process semantics are closely related
  • Fuzzy boundary between data content descriptions
    and provenance (or reverse provenance)

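[Figure: data from Instrument 1 and Instrument 2 for ranges A, B and C, processed by SFT algm 1 and SFT algm 2. One resulting file carries the description "SFT file for range B, using instrument 1, using algm 2, run on sft.isi.edu, created on 9/17/03, for Jim".]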
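A file description can mix content attributes (range, instrument) with provenance attributes (algorithm, host, date, requester), so a request can constrain either kind. The Python sketch below treats both kinds uniformly as key-value metadata. The first catalog entry paraphrases the description in the figure. The other entries, the attribute names and the `find` helper are hypothetical.

```python
def matches(description, request):
    """True if the description agrees with every attribute in the request,
    whether that attribute is about content or about provenance."""
    return all(description.get(key) == value for key, value in request.items())


catalog = [
    {"type": "SFT", "range": "B", "instrument": 1, "algorithm": 2,
     "host": "sft.isi.edu", "created": "9/17/03", "for": "Jim"},
    {"type": "SFT", "range": "B", "instrument": 1, "algorithm": 1},
    {"type": "SFT", "range": "A", "instrument": 2, "algorithm": 2},
]


def find(request):
    """Indices of catalog entries that satisfy the request."""
    return [i for i, d in enumerate(catalog) if matches(d, request)]


print(find({"type": "SFT", "range": "B"}))       # [0, 1]  content only
print(find({"range": "B", "algorithm": 2}))      # [0]     content + provenance
```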
24
Integrating with distributed knowledge sources
Current system: knowledge from several sources must be used.
[Figure. Knowledge sources: info from Grid services (RLS, MCS, etc.), task requirements, existing data in files, state info (files, resources), user policies, available resources, resource queues, network bandwidth. Components: component selector, resource selector, monolithic planner (KBs combined in one location), Grid task schedulers, execution monitor. Also labelled: concrete tasks.]
25
Where does knowledge used by our planners come
from?
[Figure: an operator skeleton, (Operator (preconditions ..) (effects ..)), drawing on task resource requirements, user policies and preferences, resource policies, and data dependencies (VDL).]
Each knowledge component is used for other
purposes beyond planning
26
Automatically generated operators for several
application domains
[Figure: operator skeletons, (Operator (preconditions ..) (effects ..)), built from task resource requirements, policies and data dependencies (VDL) for several application domains: Digital sky survey, LIGO, GEO, Galaxy morphology, Tomography.]
Investigating patterns of data descriptions for
more efficient planning
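Automatic operator generation can be pictured as combining a transformation's input/output signature (as in VDL) with its resource requirements, following the operator template shown earlier. The Python sketch below is hypothetical: the `Transformation` record, the predicate names and the rendering format are all assumptions, not the format of the generated operators.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Transformation:
    """Hypothetical summary of one VDL transformation and its needs."""
    name: str
    inputs: List[str]
    outputs: List[str]
    requirements: Dict[str, str] = field(default_factory=dict)


def generate_operator(tr: Transformation) -> dict:
    """Follow the operator template: preconditions are the resource
    constraints plus input files at ?host; effects are outputs at ?host."""
    pre: List[Tuple[str, ...]] = [("resource", key, value, "?host")
                                  for key, value in sorted(tr.requirements.items())]
    pre += [("at", f, "?host") for f in tr.inputs]
    effect = [("at", f, "?host") for f in tr.outputs]
    return {"action": tr.name, "parameters": ["?host"],
            "precondition": pre, "effect": effect}


def to_sexpr(op: dict) -> str:
    """Render the operator in a Lisp-like surface syntax."""
    def fmt(fact):
        return "(" + " ".join(fact) + ")"
    pre = " ".join(fmt(f) for f in op["precondition"])
    eff = " ".join(fmt(f) for f in op["effect"])
    params = " ".join(op["parameters"])
    return (f"(action {op['action']} parameters ({params}) "
            f"precondition (and {pre}) effect (and {eff}))")


tr = Transformation("createSFT", ["raw.gwf"], ["sft.gwf"], {"os": "linux"})
print(to_sexpr(generate_operator(tr)))
```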
27
Longer-term goal: Incremental Generation of Smart
Workflows
[Figure: users submit a request; workflow refinement moves through levels of abstraction (application-level knowledge, relevant components, logical tasks, full abstract workflow, tasks bound to resources and sent for execution). Other components: policy reasoner, onto-based matchmaker, workflow repair. Along a time axis the workflow is in partial execution, with some tasks executed and others not yet executed.]
28
Summary
  • Intelligent workflow creation and maintenance
    requires semantic descriptions at many levels
  • Our experiences with the implemented LIGO system
    show interesting relations between process and
    data semantics
  • Tremendous opportunity for AI techniques: both
    flexible and expressive representations and
    reasoners
  • http://www.isi.edu/ikcap/cognitive-grids

29
Back-up slides
30
Technologies that contribute to the semantic grid
  • The semantic grid can provide
  • expressive representations
  • flexible reasoners
  • Many Artificial Intelligence (AI) techniques are
    relevant
  • Planning to achieve given requirements
  • Scheduling and resource allocation techniques
  • Search
  • Using and combining heuristics
  • Expressive knowledge representation languages
  • Reasoners that can incorporate rules,
    definitions, axioms, etc.

31
Representing appropriate information units with
metadata
  • Previously, application components were specified
    in terms of specific files
  • DV run59000->extractSFTData(
      input=@{input:"nSFT.59000"}, ..., @{input:"nSFT.59999"},
      output=@{output:"eSFT.59000"}, ..., @{output:"eSFT.59999"},
      t1="714384000", t2="714384063",
      freq=1008, band=4, instrument="H2" )   [1000 files]
  • 59 similar clauses
  • DV final->computeFStatistic(
      input=@{input:"eSFT.00000"}, ..., @{input:"eSFT.59999"} )   [60000 files]
32
Metadata representation
  • Replace with two clauses, two input predicates
  • Simpler to model, greater generality, more
    efficient for the reasoner
  (operator run-extractSFTData-range
    (preconds
      (( Number)
       ( (and Number ( 0)))
       ( (and Number (gen-smaller-number 1000))))
      (and (range "eSFT" 2 1)
           (range "nSFT" 2 1 999)))
    (effects ()
      ((add (range "eSFT" 2)))))
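The gain from range predicates can be sketched in Python. A single (prefix, start, end) record stands for thousands of file names, and coverage checks work on intervals rather than on individual files. The `FileRange` type, the zero-padded naming and the helpers are hypothetical. The numbers mirror the 60 clauses of 1000 files in the VDL example.

```python
from typing import List, NamedTuple


class FileRange(NamedTuple):
    """One predicate standing for prefix.start ... prefix.end (inclusive)."""
    prefix: str
    start: int
    end: int

    def names(self, width: int = 5) -> List[str]:
        """Expand to individual file names (only needed at execution time)."""
        return [f"{self.prefix}.{i:0{width}d}" for i in range(self.start, self.end + 1)]


def covered(request: FileRange, available: List[FileRange]) -> bool:
    """True if ranges with the same prefix jointly cover the request."""
    spans = sorted((r.start, r.end) for r in available if r.prefix == request.prefix)
    need = request.start                 # first file not yet covered
    for lo, hi in spans:
        if lo > need:
            break                        # gap before the next span
        need = max(need, hi + 1)
        if need > request.end:
            return True
    return need > request.end


def split(request: FileRange, chunk: int) -> List[FileRange]:
    """Split a range into sub-ranges of at most `chunk` files each."""
    return [FileRange(request.prefix, s, min(s + chunk - 1, request.end))
            for s in range(request.start, request.end + 1, chunk)]


jobs = split(FileRange("nSFT", 0, 59999), 1000)
print(len(jobs), jobs[-1])  # 60 FileRange(prefix='nSFT', start=59000, end=59999)
print(covered(FileRange("eSFT", 0, 59999),
              [FileRange("eSFT", 0, 29999), FileRange("eSFT", 30000, 59999)]))  # True
```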

33
Current Work
  • Knowledge-rich computational Grid in support of
    scientific communities
  • Experimental evaluation of performance: Blythe,
    Deelman, R. Yu (UT Austin)
  • Incorporating execution dynamics and replanning
  • Interactive workflow generation: Kim and Gil 03
  • Ontology-based resource matchmaking: Decker and
    Tangmunarunkit 03
  • Planning as a service: Blythe and Wu (U. Maryland)
  • Migration to OGSA: Gil and Ratnakar

34
Related Work
  • Improving grids with algorithmic approaches
  • GRaDS, GriPhyN (Chimera)
  • Improving grids with knowledge/semantics
  • myGrid (semantic component matching)
  • Semantic grid, Knowledge grid
  • Planning techniques for software composition
  • Lansky et al. 94, Chien et al. 96, Golden et al.
    02, McDermott 02, McIlraith et al. 02

35
Need for intelligent infrastructure
  • Next generation IT and problem solving
    environments will require distributed,
    intelligent infrastructure that facilitates the
    collaboration between people, software, hardware,
    data and other infrastructure elements
  • Virtual Organizations
  • The Grid and current distributed intelligent
    systems technology provide critical pieces
  • Web services, OGSA, Semantic Web, Ontologies,
  • Essential to integrate these technologies
  • Bring intelligence to Grid infrastructure
  • Provide robust infrastructure to distributed
    intelligent systems

36
Benefits of knowledge-based approach to workflow
  • Easy to represent goals and components using
    declarative descriptions
  • Use general techniques to search for solutions
  • Explores alternatives, supports backtracking
  • Can incorporate declarative heuristics (as search
    control rules)
  • Allows easy addition of new constraints and rules
  • Incorporate optimality and policy into the search
    for solutions
  • Interleave decisions at various levels
  • Can integrate the generation of workflows across
    users and policies within virtual orgs.

37
Summary
  • The Future Grid
  • Knowledge-based reasoning about resources enables
  • Semantic matchmaking
  • Aggregate resource reasoning
  • Task-level reasoning to plan and schedule jobs
    and resources
  • More agility and coordination
  • Wide range of users can specify high level
    requirements in a mixed-initiative mode
  • Mapping of high-level requirements to details
    required for execution
  • End-to-end resource negotiation and adaptive
    strategies to accommodate failure
  • The Grid Now
  • Syntax-based matchmaking of resources to job
    requirements
  • Condor matchmaker
  • Attribute based discovery and selection
  • Scheduling of jobs relies on Grid-able users who
    specify job execution sequences and computing
    requirements
  • Scripting languages
  • Workflow languages
  • Task graphs
  • Explicit mappings from task to jobs, simple job
    brokers
  • Explicit service negotiation and recovery
    strategies

38
Interacting with related services
  • For example, a matchmaker suggests resources for
    an individual task. Several alternatives:
  • Planner calls matchmaker as a service during
    planning
  • Planner calls matchmaker for relevant matches
    prior to planning, or in batch mode
  • Incorporate the matchmaker's knowledge in the
    planning system
  • Planner builds abstract plans, matchmaker called
    for online scheduling
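The first two alternatives differ mainly in when the matchmaker is consulted. A hypothetical Python sketch makes the difference visible. Here the matchmaker is any function from a task type to a ranked host list. The online planner calls it once per task during planning. The batch planner calls it once per distinct task type before planning. Neither function reflects a real matchmaker API.

```python
from typing import Callable, Dict, List, Tuple

# Any function from a task type to a ranked list of suitable hosts.
Matchmaker = Callable[[str], List[str]]
Task = Tuple[str, str]  # (task id, task type)


def plan_online(tasks: List[Task], matchmaker: Matchmaker) -> Dict[str, str]:
    """Call the matchmaker as a service during planning, once per task."""
    return {task_id: matchmaker(task_type)[0] for task_id, task_type in tasks}


def plan_batch(tasks: List[Task], matchmaker: Matchmaker) -> Dict[str, str]:
    """Query the matchmaker for every task type before planning (batch mode),
    then bind tasks from the cached answers."""
    cache = {tt: matchmaker(tt) for tt in sorted({tt for _, tt in tasks})}
    return {task_id: cache[task_type][0] for task_id, task_type in tasks}


calls = []


def toy_matchmaker(task_type):
    calls.append(task_type)           # record each service call
    return {"createSFT": ["usc", "caltech"], "pulsar-search": ["mpi1"]}[task_type]


tasks = [("t1", "createSFT"), ("t2", "createSFT"), ("t3", "pulsar-search")]
print(plan_online(tasks, toy_matchmaker), len(calls))  # 3 calls
calls.clear()
print(plan_batch(tasks, toy_matchmaker), len(calls))   # 2 calls
```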

39
Many areas of planning research relevant for grid
  • Planning for a dynamic environment: plan
    monitoring and repair, planning under uncertainty
  • Scheduling: resource reasoning, temporal
    reasoning
  • Plan quality: learning, acquiring preferences,
    local search planning
  • Planning for information gathering: integrating
    access to grid services with workflow creation
  • Domain modeling: handling multiple ontologies,
    acquiring metadata descriptions, acquiring
    operators

40
Conclusions
  • Implemented system takes data description
    requests from LIGO users, composes workflow and
    executes on the Grid
  • Many interesting challenges for planning and
    scheduling research from Grid applications
  • Relatively fixed set of services, arbitrary tasks
    in workflow
  • http://www.isi.edu/ikcap/cognitive-grids
  • http://www.isi.edu/deelman/pegasus.htm