Application Development (Current toolkits and approaches for grid-enabling applications)

Transcript and Presenter's Notes

1
Application Development (Current toolkits and
approaches for grid-enabling applications)
  • Hartmut Kaiser hkaiser_at_cct.lsu.edu
  • Center of Computation and Technology
  • Louisiana State University

2
Basic Grid Model
[Figure: layered grid model. Labels in the diagram, in order: Grid Applications / New devices, Sensors, Wireless / Common policies, Grid Economy, Global networks / Application Grid Middleware / Common Infrastructure / Global Resources]
3
Why another Grid-API?
  • The situation today
  • Grids everywhere
  • Supposedly. At least there are many projects.
  • Grid applications nowhere
  • Almost. At least our experience shows that this
    is difficult (GGF APPS group)
  • Why is this?
  • Application programmers accept the Grid as a
    computing paradigm only very slowly.
  • Problems (manifold and often cited, among
    others)
  • Interfaces are NOT simple (see next slides...)
  • Typical Globus code... ahem...
  • Different and evolving interfaces to the Grid
  • Versions, new services, new implementations; WSDL
    does not solve all of these problems
  • Environment changes in many ways
  • Globus, grid members, services, network,
    applications, ...

4
Dynamic Middleware
  • Globus, Unicore, my_service, your_service, ...
  • The same functionality has different interfaces
    all over the place.
  • But you don't want to recompile your app every
    time, not to speak of recoding...
  • WSDL does not mean the end of all problems (see CoG
    code), but the beginning of new ones: on the
    application level, WSDL is not simple enough
  • Restricting yourself to Globus does not help
    either: the version changes every couple of months
  • (2.4.x, 3.2.y, 4.a.b)
  • and gets bug fixes. Changes often are MAJOR; we
    have seen a number of them over the last couple
    of years...
  • The application that runs today will fail
    tomorrow!
  • Right now, it is basically impossible for a
    programmer to focus on the science, not on IT
    (i.e. Grid) problems.

5
Dynamic Grids
  • Services (and interfaces) get exchanged
    (upgraded) on a regular basis
  • That is related to the point above, but is also a
    social problem!
  • Institutions (resources, services, applications)
    join/leave YOUR grid without (much) notice.
  • The grid is designed to ease and simplify that
    kind of fluctuation: it's not a bug, it's a
    feature!
  • But the applications are not able to make use of
    that feature right now
  • The Grid changes AT RUNTIME: services go down,
    resources get busy/free, disks and storage nodes
    are empty/full, ... THINGS CONSTANTLY CHANGE.
  • Today's Grid middleware makes it possible to cope
    with that, but utilizing it in an intelligent way
    is a major programming effort, and bloats the
    application with code that needs constant
    maintenance...
  • Applications need LOTS of code for handling
    transient problems.
  • Most applications share most of these problems,
    but code reuse is difficult/impossible.
  • We can reuse the Globus libraries, right, but
    isn't every project re-inventing its own
    abstraction layer for these? In our
    experience/projects they do!
  • Aren't we all re-inventing abstraction layers for
    this all the time?

6
Existing frameworks
  • Specialized client side libraries
  • Globus pre-WS C client bindings
  • Web Services (WS) and Web Services Resource
    Framework (WSRF) interfaces
  • Globus
  • Many other Grid Services out there...

7
Existing frameworks
  • Commodity Grid (Java CoG) Kits
  • Have over the years successfully provided access
    to grid services through the Java framework.
  • Provide client and limited server side
    capabilities
  • Significantly enhance the capabilities of the
    Globus Toolkit by introducing grid workflows,
    control flows, and a task based programming model
  • More information at http://www.cogkits.org

8
Existing frameworks
  • RealityGrid
  • A UK e-Science pilot project that has used
    grid-based computational steering for a wide
    range of scientific applications
  • Computational steering is defined as any mechanism
    that enables the control of the evolution of a
    computation, based upon the real-time analysis and
    visualization of the status of a simulation
  • Oriented towards scientific computation, not a
    generic solution
  • More information: http://www.realitygrid.org/

9
Existing frameworks
  • gLite (Lightweight Grid Interface)
  • Provides its own middleware stack and related
    stable application interfaces
  • Job management
  • Data management
  • Information services
  • Deployment modules
  • Oriented towards EGEE project needs and
    middleware
  • Very similar to the GAT
  • More information at http://www.glite.org

10
Copy a File: Globus GASS
11
Copy a File: CoG/RFT
12
Copy a File: Wishful thinking
    #include <iostream>
    #include <string>
    #include <GAT.hpp>

    // Copy source_url to target_url; report failure through the result code.
    GATResult RemoteFileGetFile (GAT::Context context,
        std::string source_url, std::string target_url)
    {
      try {
        GAT::File file (context, source_url);
        file.Copy (target_url);
      }
      catch (GAT::Exception const &e) {
        std::cerr << "Some error: " << e.what() << std::endl;
        return e.Result();
      }
      return GAT_SUCCESS;
    }

13
Copy a File: GAT/C++
    #include <iostream>
    #include <string>
    #include <GAT.hpp>

    // Same function as on the previous slide, written against the GAT C++ API.
    GATResult RemoteFileGetFile (GAT::Context context,
        std::string source_url, std::string target_url)
    {
      try {
        GAT::File file (context, source_url);
        file.Copy (target_url);
      }
      catch (GAT::Exception const &e) {
        std::cerr << e.what() << std::endl;
        return e.Result();
      }
      return GAT_SUCCESS;
    }

14
Code Statistics
15
Simplicity!
  • The key objective for application programmers
  • Remember: an application programmer is a
    physicist, chemist, linguist, medical ...
  • Simple APIs should
  • be easy to use
  • Simple, finite, consistent API which allows error
    tracing
  • be invariant: make upgrades really, really simple
  • Well defined API which rarely changes.
  • Implementation which allows dynamic exchange of
    key elements and provides runtime abstractions.
  • avoid refactoring/recoding/recompilation
  • The same application runs today and tomorrow, here
    and there, on Globus and Unicore, on Globus 2.2.4
    and Globus 2.4, on Linux and on Mac, locally and
    on the grid
  • focus on well-known programming paradigms
  • (e.g., for a file provide a file API without
    services to services to files...)
  • Files are the best example: expect open, close,
    read, write, seek. Do not introduce fancy things
    like the need to ask a service discovery service
    to tell me the location of a service which is
    able to tell me the location of my file...

16
What Applications want
  • ...and what they GAT
  • Enough of 'we want this' and 'we don't want that';
    you get the picture, right? So, here is what
    we do
  • An API that allows implementing basic Grid use
    cases
  • Stay simple! As simple as possible! But not
    simpler!
  • Focus on applications, and scientists, rather
    than Grid nerds
  • Like you and me
  • The next slides give an overview of what we
    think is essential, and how we envision its
    usage. Remember: this is version 1, our first
    shot. We know it's not perfect, but we are already
    converging on something we can work with.

17
GAT API Scope
  • Files
  • Monitoring and Events
  • Resources, Jobs
  • Information Exchange
  • Utility classes (error handling, security,
    preferences...)
  • NOTHING ELSE

18
API Sub Systems
  • The pipe stuff in the file subsystem is
    'historical'; pipe is VERY simple. We don't do
    real communication and data exchange a la MPI!
  • The actual API is somewhat larger (especially
    the resource part): 34 objects as opposed to the
    27 shown here. BUT THAT'S IT!

19
Examples in GAT
  • Read remote physical file
  • Read a logical file
  • Spawn a Subtask
  • Migrate a Subtask

20
Read a remote File
    try {
      char data[25];
      GAT::FileStream file (context, source_url);
      file.seek (100, SEEK_SET);          // skip the first 100 bytes
      file.read (data, sizeof(data));     // read the next 25 bytes
    }
    catch (GAT::Exception const &e) {
      std::cerr << "Some error: " << e.what() << std::endl;
      return e.Result();
    }
  • Well known paradigm.
  • Whatever service/lib/... implements that, the
    programmer does not know (no reference to
    Globus...)
  • Whatever the URL/protocol (ftp://, gsiftp://,
    http://, file://): no code changes! No service
    specific parameter settings (can be a drawback BUT
    SIMPLIFIES)
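The "whatever the URL/protocol, no code changes" point can be illustrated with a small self-contained sketch. It does not use the GAT library: `scheme_of`, `pick_backend` and the backend names are hypothetical, introduced here only for illustration, and the sketch assumes the implementation is selected purely from the URL scheme.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical helper: extract the scheme ("ftp", "gsiftp", ...) from a URL.
// A string without a leading "scheme://" is treated as a local path ("file").
std::string scheme_of(const std::string& url) {
    const std::string::size_type pos = url.find("://");
    if (pos == std::string::npos || pos == 0) return "file";
    return url.substr(0, pos);
}

// Hypothetical registry: which backend would serve which scheme.
// The backend names are placeholders, not real GAT adaptor names.
std::string pick_backend(const std::string& url) {
    static const std::map<std::string, std::string> backends = {
        {"file", "local-file-backend"},
        {"ftp", "ftp-backend"},
        {"gsiftp", "gridftp-backend"},
        {"http", "http-backend"},
    };
    const auto it = backends.find(scheme_of(url));
    if (it == backends.end())
        throw std::runtime_error("no backend for URL: " + url);
    return it->second;
}
```

The calling code only ever passes a URL string; swapping ftp:// for gsiftp:// changes which backend is picked, not the caller.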

21
Read a logical file
    try {
      GAT::LogicalFile logical_file (context, name);
      // A logical file maps to one or more physical replicas.
      std::list<GAT::File> files = logical_file.get_files();
      files.front().Copy(dest_url);
    }
    catch (GAT::Exception const &e) {
      std::cerr << "Some error: " << e.what() << std::endl;
      return e.Result();
    }
  • SAME paradigm: 'private' name space
  • URL unknown! Service unknown! Complete
    abstraction => Virtualization!
  • Still simple
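The idea behind a logical file (one private name, several physical copies) can be sketched without any grid middleware. The class below is a hypothetical in-memory stand-in, not the GAT implementation; `ReplicaCatalog` and `add_replica` are names made up for this sketch, and `get_files` merely mirrors the call on the slide.

```cpp
#include <cassert>
#include <list>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical in-memory replica catalog: logical name -> physical URLs.
class ReplicaCatalog {
public:
    // Register one more physical location for a logical file.
    void add_replica(const std::string& logical_name, const std::string& url) {
        replicas_[logical_name].push_back(url);
    }

    // Return all known physical locations, in registration order.
    std::list<std::string> get_files(const std::string& logical_name) const {
        const auto it = replicas_.find(logical_name);
        if (it == replicas_.end())
            throw std::runtime_error("unknown logical file: " + logical_name);
        return it->second;
    }

private:
    std::map<std::string, std::list<std::string>> replicas_;
};
```

A caller asks for a name and takes `front()` of the result, exactly as on the slide; which URL or service is behind it stays hidden.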

22
Spawn a Subtask
    GAT::Table sdt; sdt.add ("location", "/bin/date");
    GAT::Table hdt; hdt.add ("machine.type", "i686");
    GAT::SoftwareDescription sd (sdt);
    GAT::HardwareResourceDescription hrd (hdt);
    GAT::JobDescription jd (context, sd, hrd);
    GAT::ResourceBroker rb (context, prefs);
    GAT::Job j = rb.submit (jd);

23
Migrate a Job
    GAT::Table sdt; sdt.add ("location", "/bin/sleep");
                    sdt.add ("arguments", "36000");
    GAT::Table hdt; hdt.add ("machine.type", "i386");
    GAT::SoftwareDescription sd (sdt);
    GAT::HardwareResourceDescription hrd (hdt);
    GAT::JobDescription jd (context, sd, hrd);
    GAT::ResourceBroker rb (context, prefs);
    GAT::Job job = rb.submit (jd);

    // Ask for a specific target machine and move the running job there.
    hdt.set ("machine.name", "fs0.das2.cs.vu.nl");
    std::list<GAT::Resource> resources = rb.find_resources (hrd);
    job.Migrate (resources.front ());
    if (GATJobState_Running == job.GetState ()) { /* rest not shown on the slide */ }
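How a broker might turn a key/value hardware description into a list of candidate resources can be sketched as a plain filter. This is an assumption about the matching rule (every requested key must be present with the same value), not the GAT resource broker; `Table`, `Resource`, `matches` and the free function `find_resources` are hypothetical stand-ins.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a key/value description table.
using Table = std::map<std::string, std::string>;

struct Resource {
    std::string name;
    Table properties;
};

// A resource matches if it has every requested key with the requested value.
bool matches(const Resource& r, const Table& request) {
    for (const auto& kv : request) {
        const auto it = r.properties.find(kv.first);
        if (it == r.properties.end() || it->second != kv.second) return false;
    }
    return true;
}

// Return all resources satisfying the request, in their original order.
std::vector<Resource> find_resources(const std::vector<Resource>& pool,
                                     const Table& request) {
    std::vector<Resource> result;
    for (const auto& r : pool)
        if (matches(r, request)) result.push_back(r);
    return result;
}
```

Adding a key such as "machine.name" to the request narrows the result, which is the step the migration example relies on.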

24
How does it work?
  • Architecture
  • Timing diagrams

25
Architecture
26
Architecture
  • API is a very thin layer, provides no
    capabilities by itself
  • (bind to the Grid-Environment)
  • Adaptors implement capabilities mirroring the API
  • Engine mediates between API and adaptors
  • switch adaptors at runtime (shared libraries)
  • error tracing and fallback mechanisms
  • (default local adaptor set)
  • CPI is also well defined - adaptors are
    interchangeable
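The engine's role (mediating between a thin API and interchangeable adaptors, with fallback and error tracing) can be sketched as follows. This is a simplified model under stated assumptions: adaptors are plain function objects rather than shared libraries, they are tried in registration order, and `Adaptor`, `Engine`, `register_adaptor` and `trace` are hypothetical names, not GAT identifiers.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical adaptor: a name plus one capability (copy a file).
// The callable returns true on success, false if it cannot serve the request.
struct Adaptor {
    std::string name;
    std::function<bool(const std::string& src, const std::string& dst)> copy;
};

// Hypothetical engine: tries adaptors in registration order and records
// which ones failed, so the caller can trace errors afterwards.
class Engine {
public:
    void register_adaptor(Adaptor a) { adaptors_.push_back(std::move(a)); }

    // Returns the name of the adaptor that succeeded, or "" if none did.
    std::string copy(const std::string& src, const std::string& dst) {
        trace_.clear();
        for (const auto& a : adaptors_) {
            if (a.copy(src, dst)) return a.name;
            trace_.push_back(a.name + " failed");
        }
        return "";
    }

    // Failure log of the most recent copy() call.
    const std::vector<std::string>& trace() const { return trace_; }

private:
    std::vector<Adaptor> adaptors_;
    std::vector<std::string> trace_;
};
```

Registering a local adaptor last gives the "default local adaptor set" behaviour from the slide: if every remote adaptor declines, the local one still gets a chance.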

27
GAT Engine Initialisation
28
Making an API Call
29
An API Call
30
State of Affairs
  • API Status
  • Implementation
  • Demo

31
API Status
  • Version 1.6
  • Is object oriented, but language neutral
  • Defines syntax and semantics of GRID access
  • Specification is an open process
  • Hopefully gets input from many communities
  • Will evolve along with the findings of GGF's new
    SAGA-WG (Simple APIs for Grid Applications)

32
Implementation (Engine)
  • C version fully implemented
  • C++ wrapper to C fully implemented
  • Python wrapper to C (90% completed)
  • Java version fully implemented
  • Even adaptor writing is possible in all of the
    languages (mixed language programming possible
    at least for C/C++ and Python)
  • Fortran, Perl (wrappers to C) to follow (SWIG?)
  • Focus: portability, lightness, flexibility,
    adaptivity

33
Implementation (Adaptors)
  • Full set of local adaptors implemented (contained
    in the download)
  • Couple of external adaptors implemented as well
    (GridLab services, Globus)
  • Lots of adaptors currently under development
    (DRMAA, GRAM, Curl/wget etc.)
  • http://www.gridlab.org/WorkPackages/wp-1/adaptorreleases.html

34
Demo
  • Copying a file
  • Locally
  • Remotely

35
Open problems
  • Memory management is very tedious in the C
    version
  • User has to do a lot by himself
  • Track all GAT object instance copies
  • Free all GAT object instances
  • Call constructor and destructor
  • Error handling is complicated (even with macros)
  • Always check for error codes, cluttered code.
  • All of these problems are solved by the C++ and
    Python wrappers: use these if possible!
  • Current implementation of the GAT is not thread
    safe
  • Asynchronicity is (almost) completely missing

36
SAGA: Simple API for Grid Applications
  • A GGF standardization effort that provides a
    simple, stable, and uniform programming interface
    that integrates the most common grid programming
    abstractions.
  • Think of it as GAT V2, incorporating the lessons
    we've learned
  • Asynchronous by design, synchronous API on top
  • Modular architecture allows:
  • Later extensions
  • Smooth migration from GAT
  • Minimal library footprint for concrete needs
  • Integrates well with other standards developed by
    the GGF (GridRPC, DRMAA etc.)
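"Asynchronous by design, synchronous API on top" describes a layering that can be sketched with the C++ standard library alone. This is not the SAGA API: `copy_async` and `copy_sync` are hypothetical names, and the "copy" only builds a status string so that the example stays self-contained.

```cpp
#include <cassert>
#include <future>
#include <string>

// Hypothetical asynchronous primitive: starts the work on another thread
// and immediately returns a future for the result.
// The "copy" here only builds a status string; no file is touched.
std::future<std::string> copy_async(std::string src, std::string dst) {
    return std::async(std::launch::async, [src, dst]() {
        return "copied " + src + " -> " + dst;
    });
}

// Synchronous convenience call layered on top: start, then wait.
std::string copy_sync(const std::string& src, const std::string& dst) {
    return copy_async(src, dst).get();
}
```

Building the blocking call from the non-blocking one is the cheap direction; retrofitting asynchrony onto a synchronous core is much harder, which matches the missing asynchronicity noted among the GAT's open problems.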

37
Conclusions
  • The GAT provides a simple and stable API to
    various Grid environments
  • It is used as a prototype implementation for the
    ongoing standardization process at the SAGA WG of
    the GGF
  • Downloads
  • http://www.gridlab.org/wp-1/
  • Currently, snapshots available for GAT V1.6
  • Platforms: Linux, Windows, Mac OS X, SGI Irix,
    Tru64 UNIX
  • Support via mailing list: gat_at_gridlab.org