A Grid-based Extensible, Composable Service Execution (PowerPoint presentation)

About This Presentation
Slides: 51
Provided by: gridfor

Transcript and Presenter's Notes


1
A Grid-based Extensible, Composable Service
Execution
  • Karpjoo Jeong (jeongk_at_konkuk.ac.kr)
  • Konkuk University
  • Suntae Hwang (sthwang_at_kookmin.ac.kr)
  • Kookmin University

2
MGrid: An Integrated and Shared Molecular
Simulation Grid/e-Science Infrastructure (Korea
e-Science Initiative)
  • Karpjoo Jeong (jeongk_at_konkuk.ac.kr)
  • Konkuk University, KISTI
  • Co-PIs
  • Seunho Jung, Konkuk University
  • Suntae Hwang, Kookmin University
  • Yoonsup Lee, KAIST

3
Molecular Simulation
Computation
  • Wide Application Areas
  • Nanotechnology
  • Biotechnology
  • Medical Research
  • Mechanical Engineering
  • Etc.

4
Major Obstacles to Effective Molecular Simulation
  • Obstacle: Enormous Computational Requirements
  • E.g., a protein simulation may take months even on
    supercomputers
  • Solution: Grid Computing
  • Cost-effective large-scale computing
    infrastructure built by sharing and aggregating
    computing resources
  • New obstacle: Complexity
  • Grid computing is still too difficult for
    scientists
  • Obstacle: Simulation Result Validation
  • Different parameter settings or simulation tools
    may produce different results even for the same
    molecules
  • Solution: Comparative Study
  • Perform simulations with various parameter
    settings and tools
  • Comparative analysis of related simulation
    results
  • New obstacle: Exponential Increase in
    Computational Requirements
  • Individual scientists or institutes may not be
    able to afford this approach
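
A small sketch of why the comparative-study approach inflates computational requirements: every combination of simulation tool and parameter setting becomes a separate run. The tool names come from this presentation; the parameter names and values below are hypothetical placeholders, not settings used in MGrid.

```python
from itertools import product

# Tools named in the slides; the parameter grid is a hypothetical example.
tools = ["CHARMM", "AMBER", "GAUSSIAN"]
parameter_grid = {
    "temperature_K": [280, 300, 320],
    "timestep_fs": [1.0, 2.0],
    "solvent": ["explicit", "implicit"],
}

def sweep(tools, grid):
    """Yield one run description per (tool, parameter combination)."""
    names = list(grid)
    for tool in tools:
        for values in product(*(grid[n] for n in names)):
            yield {"tool": tool, **dict(zip(names, values))}

runs = list(sweep(tools, parameter_grid))
print(len(runs))  # 3 tools x 3 x 2 x 2 settings = 36 runs
```

Each additional parameter with k candidate values multiplies the run count by k, so the total grows exponentially with the number of parameters varied.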

5
MGrid Approach: Integrated and Shared
  • Integrated Grid Environment
  • Web-based PSE, Computing, Databases, and Analyses
  • Shared Environment
  • Sharing of Simulation Results
  • Comparative and Collective Analyses
  • Research Community
  • Promote Collaborative Simulation Efforts

6
MGrid Approach: Collaborative Research Environment
7
Decide on Publication from the PSE
  • Publish simulation results
  • Select the destination: a general-purpose Semantic
    Grid or e-Glycoconjugates

8
Register Simulation Results (Insert Metadata)
into e-Glycoconjugates
9
Search Simulation Results from e-Glycoconjugates
  • Retrieve by user ID

10
Download Simulation Jobs into the PSE
11
Analysis and Visualization Using Plug-in Programs
12
Re-run Simulation Jobs: Download Jobs to the PC
13
Re-run Simulation Jobs: Create a New Job
14
Re-run Simulation Jobs: Edit Script
15
Re-run Simulation Jobs: Upload Input Files
  • Upload a related file, such as a 3D coordinate,
    parameter, or topology file

16
Re-run Simulation Jobs: Execute
  • Click the Auto or Manual button to run the job

17
MGrid Software Architecture
18
  • Single System View
  • Centralized monitoring and control
  • Long-running jobs

[Diagram: PSE, Scheduling, Cluster, Computational/Data/Semantic Grids; legacy software supported through a standard interface]
19
MGrid Structure
[Diagram: several PSEs, Grid Portal, XML interface, several Distributed Job Servers]
20
Challenge: Application-specific Support on a
General Grid System Structure
[Diagram: Client PSE, Simulation Server, Grid Middleware (e.g., Globus), Local Resource Manager (e.g., PBS); application-specific request, application-independent connection/integration/management, application-specific service]
21
MGrid Approach: Shared Info Infrastructure
[Diagram: Client PSE, Simulation Server, Grid Middleware (e.g., Globus), Local Resource Manager (e.g., PBS), Grid Portal, Distributed Directory Repository, Job Metadata Management (Identity, Conf, etc.)]
22
[Diagram: Web Browser, Portal Framework, Job Management, Active File System, Global ID Service, Simulation Server, Grid Middleware (e.g., Globus), Local Job Management System (e.g., PBS, LSF)]
23
[Diagram (Globus Toolkit 2): Web Browser, Portal Framework, MGridJob Management, Active File System, Global ID Service; RSL <invoker XML> over GRAM; Invoker; Job Execution Framework with Information Provider, Event Manager, (MSM) Job Manager, and Shared Repository; PBS, LSF, Condor; (WSM) Wrapper Legacy Driver]
24
[Diagram (Globus Toolkit 4): Web Browser, Portal Framework, MGridJob Management, Active File System, Global ID Service; RSL Version 2 and JSDL over WS-GRAM; Job Execution Framework with Job Factory, (MSM) Job Manager, Information Provider, Event Manager, and Shared Repository; PBS, LSF, Condor; several (WSM) Wrapper Legacy Drivers]
25
Legacy Simulation Package
  • Client software with useful utilities
  • (e.g., GUI, visualization, molecule building, and
    data management tools)
  • Simulation program (a kind of engine)
  • A kind of script interpreter
  • Assumes a working directory where the script file,
    input data files, and output files are stored

[Diagram: Analysis/Visualization Tools, GUI Utilities, and a Simulation Engine driven by a script; internal data files are program-specific]
26
  • Integration Approach
  • Fine-grained data management: deal with each
    file individually (complicated)
  • Coarse-grained data management: treat the
    directory as a unit (simple)

[Diagram: Simulation Engine, script, and Grid Middleware, shown once for each integration approach]
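
A minimal sketch of the coarse-grained approach, assuming the working directory can be shipped as a single archive: the whole directory (script, inputs, outputs) is packed and unpacked as one unit, so no per-file bookkeeping is needed. The function names are hypothetical; MGrid itself moves data with GridFTP rather than with in-memory archives.

```python
import io
import tarfile
from pathlib import Path

def pack_directory(workdir: Path) -> bytes:
    """Coarse-grained transfer: bundle the whole working directory
    (script, inputs, outputs) into one gzip-compressed tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for child in sorted(workdir.iterdir()):
            # tar.add recurses into subdirectories by default
            tar.add(str(child), arcname=child.name)
    return buf.getvalue()

def unpack_directory(blob: bytes, target: Path) -> None:
    """Recreate the working directory on the other side."""
    target.mkdir(parents=True, exist_ok=True)
    # Use the safe "data" extraction filter where this Python supports it.
    extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as tar:
        tar.extractall(path=str(target), **extra)
```

The trade-off matches the slide: the directory-level unit is simple, but it copies everything, including files the other side may not need.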
27
System Structure
[Diagram: Client System with PSE; Simulation Service Server with Global Scheduler; a Legacy Simulation Package, a Legacy Simulation Package Management System, and a Simulation Working Directory on each side; Remote Program Invocation System; Synchronization between the working directories; Remote Monitoring System]
28
Challenging Design Issues
  • Parametric Design
  • Defined as simulation-system(x), where x is CHARMM,
    GAUSSIAN, or AMBER
  • Minimize and localize the dependency on legacy
    software
  • Current design: interface for legacy software
  • Remote program invocation
  • Replicating working directories
  • Standard interface for legacy simulation SW
    management system

[Diagram: control message flow among the PSE, the Remote Program Invocation System, and the Legacy Simulation Package Management System; replicated directory synchronization of legacy software-specific data between two Simulation Working Directories]
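
One way to express the parametric design simulation-system(x) is a standard driver interface plus a registry keyed by package name, so that application-independent code never mentions CHARMM, GAUSSIAN, or AMBER directly. This is a sketch under assumptions: the class and function names are hypothetical, and the CHARMM command line and output suffixes are illustrative, not the real CHARMM invocation.

```python
from abc import ABC, abstractmethod
from pathlib import Path

class LegacyDriver(ABC):
    """Standard interface to one legacy simulation package; all
    package-specific knowledge is confined to subclasses."""

    @abstractmethod
    def command(self, script: str) -> list[str]:
        """Argument vector handed to remote program invocation."""

    @abstractmethod
    def is_output(self, path: Path) -> bool:
        """True if a working-directory file is a simulation result."""

_DRIVERS: dict[str, type] = {}

def register(name: str):
    """Class decorator that makes a driver available as simulation_system(name)."""
    def wrap(cls):
        _DRIVERS[name] = cls
        return cls
    return wrap

def simulation_system(x: str) -> LegacyDriver:
    """simulation-system(x): return the driver for package x."""
    try:
        return _DRIVERS[x]()
    except KeyError:
        raise ValueError(f"no driver registered for {x!r}") from None

@register("CHARMM")
class CharmmDriver(LegacyDriver):
    # Hypothetical: executable name and output suffixes are illustrative.
    def command(self, script: str) -> list[str]:
        return ["charmm", script]

    def is_output(self, path: Path) -> bool:
        return path.suffix in {".out", ".crd"}

def collect_outputs(driver: LegacyDriver, workdir: Path) -> list[Path]:
    """Application-independent code: uses only the driver interface."""
    return sorted(p for p in workdir.rglob("*") if p.is_file() and driver.is_output(p))
```

Supporting GAUSSIAN or AMBER would then mean adding one registered subclass, leaving the scheduling and invocation code untouched.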
29
  • Working Directory Replication
  • Some data files are architecture-dependent binary
    data
  • Some data files such as log files are very big
    (e.g., a few hundred MBs or more)
  • Synchronization between remote program invocation
    and directory replication is required
  • Current design: an intelligent replication mechanism
    controlled by the legacy simulation package
    management system

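
A hedged sketch of what an "intelligent" replication decision could look like, assuming the legacy simulation package management system classifies each file before copying: architecture-dependent binaries are skipped, very large files (such as big logs) are deferred to on-demand sync, and everything else is replicated. The suffix set and the 50 MB threshold are assumptions for illustration.

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass(frozen=True)
class ReplicationPolicy:
    """Hypothetical per-package policy owned by the legacy simulation
    package management system."""
    # Illustrative suffixes for architecture-dependent binary files.
    arch_dependent_suffixes: frozenset = frozenset({".bin"})
    # Files larger than this are synced on demand instead of eagerly.
    max_eager_bytes: int = 50 * 1024 * 1024

    def classify(self, path: Path, size: int) -> str:
        if path.suffix in self.arch_dependent_suffixes:
            return "skip"        # meaningless on another architecture
        if size > self.max_eager_bytes:
            return "on_demand"   # e.g. log files of hundreds of MB
        return "replicate"

def replication_plan(workdir: Path, policy: ReplicationPolicy) -> dict[str, list[str]]:
    """Group every file in the working directory by replication action."""
    plan = {"replicate": [], "on_demand": [], "skip": []}
    for p in sorted(workdir.rglob("*")):
        if p.is_file():
            action = policy.classify(p, p.stat().st_size)
            plan[action].append(p.relative_to(workdir).as_posix())
    return plan
```

Because the policy object belongs to the package-specific layer, a different legacy package can supply its own suffixes and thresholds without changing the replication code.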
30
  • Real-time Remote Monitoring of Legacy Software
    Execution
  • Complicated by grid-unaware legacy software,
    remote execution, and application-dependent
    monitoring data (e.g., represented as data files
    or plotted graphs)
  • Current design: support remote execution of the
    traditional monitoring methods scientists already
    have
  • Simplified by replicating working directories
  • Performance issue: local visualization vs. remote
    visualization

[Diagram: PSE, Remote Program Invocation System, and Legacy Simulation Package Management System; simulation directory copy and synchronization between two Simulation Working Directories; local visualization vs. remote visualization]
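
Replicating the working directory lets scientists keep their usual monitoring habits, such as running grep over a log. The sketch below, assuming the log has already been replicated locally, reproduces `grep PATTERN log | tail -n N` while streaming the file so that a log of several hundred MB is never held in memory at once. The function name is hypothetical.

```python
import re
from collections import deque
from pathlib import Path

def tail_matching(log: Path, pattern: str, n: int = 10) -> list[str]:
    """Equivalent of `grep PATTERN log | tail -n N`, streamed line by line."""
    regex = re.compile(pattern)
    window = deque(maxlen=n)   # keeps only the most recent n matches
    with log.open("r", errors="replace") as fh:
        for line in fh:
            if regex.search(line):
                window.append(line.rstrip("\n"))
    return list(window)
```

For example, `tail_matching(Path("run.out"), r"^ENER>", 5)` would return the last five lines starting with `ENER>`; the file name and the CHARMM-style prefix are illustrative only.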
31
CHARMM-based Prototype
[Diagram: Client System, Simulation Service Server, PSE, CHARMM, CHARMM Management System in Python, CHARMM Management System in JavaCoG, grep and gnuplot, GRAM-based Remote Program Invocation, GridFTP-based Synchronization, and a Simulation Data Repository on each side]
32
Virtual Directory-based Design
  • Logically allocated to each simulation job and
    shared by PSEs and Computing Servers
  • Physically implemented by data grids with GridFTP

[Diagram labels: metadata command, output files]
33
Decoupling of Control and Data Channels
  • Control channels: do not deal with the data inside
    files
  • Data channels: do not deal with computing control

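
The separation of concerns can be sketched as two independent components: a control side that only interprets job commands and metadata (including which virtual directory holds the data) and a data side that only stores and moves files. All names are hypothetical, and an in-memory dictionary stands in for the GridFTP-backed data grid.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ControlMessage:
    """Control channel payload: a command plus metadata naming the data.
    It never carries file contents."""
    job_id: str
    command: str       # hypothetical commands: "submit", "finish", "cancel"
    virtual_dir: str   # the virtual directory that holds the job's files

class DataChannel:
    """Data channel: stores and moves files of virtual directories and
    knows nothing about jobs. An in-memory stand-in for GridFTP."""

    def __init__(self) -> None:
        self._dirs: dict[str, dict[str, bytes]] = {}

    def put(self, vdir: str, name: str, data: bytes) -> None:
        self._dirs.setdefault(vdir, {})[name] = data

    def get(self, vdir: str, name: str) -> bytes:
        return self._dirs[vdir][name]

    def listing(self, vdir: str) -> list[str]:
        return sorted(self._dirs.get(vdir, {}))

class JobController:
    """Application-independent control side: tracks job state only."""
    _next_state = {"submit": "RUNNING", "finish": "DONE", "cancel": "CANCELLED"}

    def __init__(self) -> None:
        self.jobs: dict[str, tuple[str, str]] = {}   # job_id -> (state, virtual_dir)

    def handle(self, msg: ControlMessage) -> str:
        if msg.command not in self._next_state:
            raise ValueError(f"unknown command {msg.command!r}")
        state = self._next_state[msg.command]
        self.jobs[msg.job_id] = (state, msg.virtual_dir)
        return state
```

Neither class imports or calls the other; they share only the virtual directory name, which is what keeps the control side application-independent.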
34
MGrid Data Grids
  • Distributed Simulation Result Repository
  • A Collection of Virtual Directories for
    simulation results
  • Support global access
  • Information Service
  • Maintain metadata about simulation jobs
    automatically

[Diagram labels: Information Service, Relocate]
35
Synchronization Issues in Data Grids
  • Synchronization between control and data channels
  • Synchronization between the PSE and data grids
  • Synchronization between computing servers and data
    grids

36
Active File System: Development of a PSE for BT
Applications on the Computational Grid
  • Suntae Hwang
  • School of Computer Science
  • Kookmin University
  • sthwang_at_kookmin.ac.kr

37
Our Approach for PSE
  • Integrate legacy software utilities with our PSE
  • Allow scientists to use client software they
    already have
  • E.g., visualization and molecular structure
    building/analysis tools
  • Design and implement a gluing system
  • Workflow Management System
  • Allow scientists to plan and execute experiments
    (a set of simulation tasks and human intervention
    tasks) in a workflow style
  • Designed to support a single system view
  • Simulation tasks look as if they were run locally
  • Allow centralized monitoring and control
  • Actual simulation execution is delegated to a
    distributed simulation platform through grid
    middleware
  • Support BT applications first, then extend to other
    similar applications
  • Chiral Separation by Cyclocarbohydrates Konkuk
    University
  • MGrid A Molecular Simulation Grid

38
Experiment: Chiral Separation Database
  • Differentiate chiral drug candidates (pairs) by
    docking them with chiral selectors
  • Chiral drug candidates (guests): 1000 for now
  • Chiral selectors (hosts): 50 for now
  • Motivation for molecular simulation
  • So far, real experiments have mostly been used,
    but they take a couple of years for a single
    guest-host pair. Selecting the right host is very
    important. Molecular simulation takes much less
    time
  • By building databases of host-guest docking,
    develop a host prediction method
  • Estimated computation time
  • On a single workstation, molecular simulation of
    a single guest-host pair takes about two weeks
  • Molecular simulation of all 1000 × 50 (50,000)
    pairs takes about 2,000 years on a single
    workstation. With MGrid, we can shorten this time
    significantly.
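
The 2,000-year figure follows directly from the numbers on this slide (1000 guests, 50 hosts, about two weeks per pair on one workstation):

```latex
\begin{aligned}
N_{\text{pairs}} &= 1000\ \text{guests} \times 50\ \text{hosts} = 50{,}000\ \text{pairs} \\
T &\approx 50{,}000 \times 2\ \text{weeks} = 100{,}000\ \text{weeks} \\
  &\approx \frac{100{,}000}{52}\ \text{years} \approx 1{,}923\ \text{years} \approx 2{,}000\ \text{years}
\end{aligned}
```

Because each guest-host pair can be simulated independently, P workstations would ideally cut this to about T/P (roughly two years with 1,000 machines), ignoring scheduling and data-transfer overhead; that ideal speedup is an assumption, not a measured MGrid result.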

39
PSE Design Issues
  • Workflow Management
  • Manage inter-sub-workflow dependencies manually
  • Using the PSE client and many legacy SW utilities
  • Manage inter-task dependencies with an automatic
    engine
  • Activate tasks according to dependencies
  • Forward triggering
  • By automatic management
  • Using complete data products
  • Sometimes requires user confirmation to trigger
  • Activate tasks on users' demand for results
  • Backward triggering
  • For monitoring/analyzing
  • Using intermediate data products

[Diagram: tasks grouped into three sub-workflows]
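
A minimal sketch of the two triggering modes, assuming tasks form an acyclic dependency graph and that a task counts as done once its data product is complete. Forward triggering runs whatever becomes ready (asking for confirmation where required); backward triggering starts from a result the user asks for and runs its prerequisites first. The class names, the confirmation callback, and the task names are hypothetical, and access to intermediate data products is not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Task:
    name: str
    deps: list[str] = field(default_factory=list)
    needs_confirmation: bool = False
    done: bool = False                    # data product is complete

class Workflow:
    """Tasks are assumed to form a DAG (no cycle detection)."""

    def __init__(self, tasks: list[Task]) -> None:
        self.tasks = {t.name: t for t in tasks}
        self.log: list[str] = []          # order in which tasks ran

    def _run(self, name: str) -> None:
        # Placeholder for dispatching the task to the grid and waiting.
        self.tasks[name].done = True
        self.log.append(name)

    def forward(self, confirm: Callable[[str], bool] = lambda name: True) -> None:
        """Forward triggering: run every task whose dependencies are all
        complete, repeating until nothing new can start."""
        progressed = True
        while progressed:
            progressed = False
            for t in self.tasks.values():
                if t.done or not all(self.tasks[d].done for d in t.deps):
                    continue
                if t.needs_confirmation and not confirm(t.name):
                    continue              # user declined; leave it pending
                self._run(t.name)
                progressed = True

    def backward(self, name: str) -> None:
        """Backward triggering: the user demands `name`, so first run
        whatever it depends on (explicit demand implies confirmation)."""
        task = self.tasks[name]
        for d in task.deps:
            if not self.tasks[d].done:
                self.backward(d)
        if not task.done:
            self._run(name)
```

For example, with two structure-building tasks feeding an MC docking task and an MD simulation that needs confirmation, `forward(confirm=lambda n: False)` stops after docking, while `backward("mc_dock")` on a fresh workflow runs only the two structure-building tasks and the docking.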
40
PSE Design Issues (cont)
  • Gluing system
  • The user manually handles the flow of interactive
    tasks (automatic workflows are not built for them
    because they are too complicated)
  • A glue system is needed to keep all activities
    across user interactions and tasks coordinated

[Diagram: PSE with interactive software utilities, a job preparation tool, and workflow management for sub-workflows; example tasks: MC Docking of beta-cyclodextrin with N-acetyltyrosine, MD Simulation, Compute Energy Field]
41
PSE Design Issues (cont)
  • Product-oriented view of tasks
  • A task consists of a Product List and an associated
    Creator
  • All application data that affect inter-task
    dependencies must belong to some Product List
  • All tasks (manual or automatic) that access
    application data are synchronized through the
    Product List

[Diagram: Structure Building (beta-cyclodextrin), Structure Building (N-acetyltyrosine), MC Docking (beta-cyclodextrin, N-acetyltyrosine), and MD Simulation tasks linked through Application Data]
42
PSE on Active File System
[Diagram: the PSE (job preparation tool, interactive software utilities, workflow management for sub-workflows) on top of the Active File System. Tasks: Structure Building (beta-cyclodextrin), Structure Building (N-acetyltyrosine), MC Docking (mc-doc.inp/CHARMM), and MD Simulation (md-sim.inp/CHARMM), with their Creator and Product List. Structures enter via Import/FTP; access paths: normal file access to the Ordinary File System, file path information, and file access through the Product List]
Ordinary File System
43
Components of Active File System: Creator
  • Creator
  • Consists of an input file list and an output file
    list
  • The two lists must contain all associated file
    names completely
  • All input file names must be mapped to active
    files
  • Zero or more output file names are mapped to
    active files in the associated Product List
  • Special kinds of creators
  • Interactive utilities that are aware of the Active
    File System can be creators
  • E.g., FTP between the ordinary file system and the
    Active File System (import)
  • An Active File System-enabled text editor (saving
    files to a Product List)
  • Input/output file lists
  • Generated automatically or manually by either the
    Active File Manager or the Job preparation tool
  • Map information
  • Filled in automatically or manually by either the
    Active File Manager or the Job preparation tool
  • Resource information
  • Filled in automatically by the scheduler or
    manually by other tools

[Diagram: a Creator with Inputs, Resource Information, and Outputs]
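
The Creator rules above can be captured as a small data structure with a validation check. This is a sketch: the field names, the string identifiers for active files, and the `validate` method are hypothetical, and only two stated constraints are checked (every input is mapped to an active file; every mapped name appears in the input or output list).

```python
from dataclasses import dataclass, field

@dataclass
class Creator:
    """Hypothetical model of a Creator in the Active File System."""
    inputs: list[str]                      # complete input file list
    outputs: list[str]                     # complete output file list
    # Map information: file name -> active file identifier
    map_info: dict[str, str] = field(default_factory=dict)
    # Resource information, e.g. filled in by a scheduler
    resource_info: dict[str, str] = field(default_factory=dict)

    def validate(self) -> list[str]:
        """Return rule violations; an empty list means the Creator is valid."""
        problems = []
        for name in self.inputs:
            if name not in self.map_info:
                problems.append(f"input {name!r} is not mapped to an active file")
        for name in self.map_info:
            if name not in self.inputs and name not in self.outputs:
                problems.append(f"mapped file {name!r} is not in the input or output list")
        return problems
```

Outputs are deliberately allowed to stay unmapped, matching "zero or more output file names are mapped to active files".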
44
Components of Active File System: Product List
  • Product List
  • Contains zero or more active files
  • Must have exactly one associated creator
  • Can be updated dynamically by adding/removing
    active files whenever the user decides to
    see/discard them

[Diagram: three Creators with Inputs and Outputs]
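
A sketch of the Product List invariants, assuming active files and creators are identified by strings: the creator is fixed when the list is created, while active files can be added or discarded at any time. The class, method names, and identifiers are hypothetical.

```python
class ProductList:
    """Hypothetical Product List: zero or more active files,
    bound to exactly one Creator for its whole lifetime."""

    def __init__(self, creator_id: str) -> None:
        if not creator_id:
            raise ValueError("a Product List needs exactly one creator")
        self._creator_id = creator_id
        self._files: set[str] = set()

    @property
    def creator_id(self) -> str:
        return self._creator_id            # read-only: no setter is defined

    def add(self, active_file: str) -> None:
        self._files.add(active_file)       # user decides to see a file

    def discard(self, active_file: str) -> None:
        self._files.discard(active_file)   # user decides to discard it

    def files(self) -> list[str]:
        return sorted(self._files)
```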
45
Dispatching a Task
  • The working directory can be built from the Map
    Information
  • File path information can be resolved using
    information about the resource allocated to a task
  • Determine whether remote files are staged, copied,
    or synced

[Diagram: tasks dispatched from Creators (Inputs, Outputs, Resource Information) and Product Lists to a Working Directory via file staging or copying, using file path information; file sync on demand; normal file access and export to the Ordinary File System; file access through the Product List]
46
Components of Active File System: Active File
  • Active File
  • Consists of an anchor file and an ordinary file,
    which may be located at a remote site
  • The anchor file holds a status (INITIATED,
    CREATING, or COMPLETE) and a sync state (Never
    Synced, Partially Synced, or Completely Synced)
  • All access to an active file is synchronized by a
    multiple-reader/single-writer lock at the Active
    File System level
  • Readers: visualization utilities, editor
    (read-only), FTP (export)
  • Writers: simulation task, editor (save),
    FTP (import)
  • The creator must match or be compatible when
    writing to a Product List
  • Otherwise, save active files in a different or
    new Product List

[Diagram: Creators (Inputs, Outputs, Resource Information) and a Product List; file path information; a Working Directory with file sync on demand; normal file access and export to the Ordinary File System; file access through the Product List]
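
A sketch of an active file's anchor state and its multiple-reader/single-writer lock, built on Python's `threading.Condition`. It assumes the two groups of states on this slide are separate fields (creation status and sync state) and that a successful write moves the status from CREATING to COMPLETE; both are interpretations, and all names are hypothetical. This simple lock does not prevent writer starvation.

```python
import threading
from contextlib import contextmanager
from enum import Enum

class Status(Enum):
    INITIATED = "INITIATED"
    CREATING = "CREATING"
    COMPLETE = "COMPLETE"

class SyncState(Enum):
    NEVER = "Never Synced"
    PARTIAL = "Partially Synced"
    COMPLETE = "Completely Synced"

class ReadWriteLock:
    """Many concurrent readers or one writer (no writer priority)."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self) -> None:
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_read(self) -> None:
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_write(self) -> None:
        with self._cond:
            while self._writer or self._readers > 0:
                self._cond.wait()
            self._writer = True

    def release_write(self) -> None:
        with self._cond:
            self._writer = False
            self._cond.notify_all()

class ActiveFile:
    """Anchor state plus the location of the ordinary file it describes."""

    def __init__(self, location: str) -> None:
        self.location = location          # may point at a remote site
        self.status = Status.INITIATED
        self.sync = SyncState.NEVER
        self.lock = ReadWriteLock()

    @contextmanager
    def reading(self):
        self.lock.acquire_read()
        try:
            yield self
        finally:
            self.lock.release_read()

    @contextmanager
    def writing(self):
        self.lock.acquire_write()
        self.status = Status.CREATING
        try:
            yield self
            self.status = Status.COMPLETE  # only reached if the write succeeded
        finally:
            self.lock.release_write()
```

Readers such as a visualization utility would use `with af.reading():`, and writers such as a simulation task would use `with af.writing():`.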
47
API for Active File System
  • Access primitives for Active File
  • open
  • create
  • close
  • read
  • write
  • lseek
  • unlink
  • remove
  • fcntl
  • Standard I/O
  • getc/putc
  • In context
  • access, chmod, chown, link, rename, symlink,
    readlink
  • stat, fstat
  • mkproductlist/rmproductlist
  • mkcreator/rmcreator

48
PSE on Active File System (again)
[Diagram: Legacy interactive SW tools (Molecular Structure Builder, e.g., Insight2; Molecular Structure Viewer/Analyzer, e.g., gOpenMol; other tools, e.g., a text viewer) work with products such as beta-cyclodextrin, mc-doc.inp/CHARMM, and MC Docking results through Import and Extract between the Ordinary File System and the Active File System. New tools for the PSE on the Active File System: Workflow Manager, Job Preparation Tool, Creator (Task) Scheduling/Monitoring, head/tail filter, and built-in viewer (producing a filtered OUT file). Further tasks: md-sim.inp/CHARMM (MD Simulation results), energy.inp/CHARMM (Energy Field), gyration.inp/CHARMM (Gyration Field). Data movement: file staging or copying to Working Directories, file sync on demand, and ordinary file read/write.]
49
Summary
50
Summary
  • MGrid is an integrated molecular simulation grid
    environment for computing, databases, and
    analyses
  • MGrid software architecture is designed to be
    extensible and composable
  • Make the PSE, Distributed Batch System, and Job
    Execution as independent of each other as possible
  • Isolate application-dependent operations
  • Decoupling of data and control channels
  • Control channels are application-independent
  • Data channels are application-dependent
  • Active File System
  • Virtual Directory/File-based Design