The Grid: Experience and Practice
1
The Grid: Experience and Practice
Seminar, April 14th 2004
  • Mark Baker
  • The Distributed Systems Group
  • University of Portsmouth
  • http://dsg.port.ac.uk/mab/

2
Outline
  • Characterisation of the Grid.
  • What is not a grid!
  • Evolution of the Grid.
  • Experiences with grid middleware.
  • Comments on grid software.
  • Observations and Summary.
  • DSG Projects:
  • GridRM,
  • jGMA,
  • Semantic Logging,
  • MPJ.

3
Characterisation of the Grid
  • In 1998, Ian Foster and Carl Kesselman provided
    an initial definition in "The Grid: Blueprint for
    a New Computing Infrastructure" (see ref 1):
  • "A computational grid is a hardware and software
    infrastructure that provides dependable,
    consistent, pervasive, and inexpensive access to
    high-end computational capabilities."
  • This particular definition stems from the earlier
    roots of the Grid, that of inter-connecting high
    performance facilities at various US laboratories
    and universities.

4
Characterisation of the Grid
  • Since this early definition there have been a
    number of other attempts to define what a grid
    is.
  • For example:
  • "A grid is a software framework providing layers
    of services to access and manage distributed
    hardware and software resources" (CCA - see ref
    2).
  • A "widely distributed network of high-performance
    computers, stored data, instruments, and
    collaboration environments shared across
    institutional boundaries" (IPG - see ref 3).

5
Characterisation of the Grid
  • In 2001, Foster, Kesselman and Tuecke refined
    their definition of a grid to:
  • "co-ordinated resource sharing and problem
    solving in dynamic, multi-institutional virtual
    organizations" (see ref 4).
  • This latest definition is the one most commonly
    used today to abstractly define a grid.

6
Characterisation of the Grid
  • Foster later produced a checklist (see ref 5)
    that could be used to help understand exactly
    what can be identified as a grid system; it has
    three parts:
  • Co-ordinated resource sharing, with no centralised
    point of control, where the users reside
    within different administrative domains.
  • If this is not true, it is probably the case that
    this is not a grid system!
  • Standard, open, general-purpose protocols and
    interfaces:
  • If not, it is unlikely that system components
    will be able to communicate or inter-operate, and
    it is likely that we are dealing with an
    application-specific system, and not the Grid.

7
Characterisation of the Grid
  • Delivering non-trivial qualities of service -
    here we are considering how the components that
    make up a grid can be used in a co-ordinated way
    to deliver combined services, which are
    appreciably greater than the sum of the individual
    components.
  • These services may be associated with throughput,
    response time, mean time between failures,
    security, or many other facets.

8
Characterisation of the Grid
  • From a commercial viewpoint, IBM defines a grid
    as:
  • "a standards-based application/resource sharing
    architecture that makes it possible for
    heterogeneous systems and applications to share
    compute and storage resources transparently" (see
    ref 6).

9
What is not a Grid!
  • A cluster, a network-attached storage device, a
    desktop PC, a scientific instrument, a network -
    these are not grids.
  • Each might be an important component of a grid,
    but by itself, it does not constitute a grid.
  • Screen savers/cycle stealers:
  • SETI@home, Folding@home, etc.
  • Other application-specific distributed computing.
  • Most of the current Grid providers:
  • Proprietary technology with a closed model of
    operation.
  • Globus:
  • It is a toolkit to build a system that might work
    as or within a grid.
  • Sun Grid Engine, Platform LSF and related.
  • Most anything referred to as a Grid by marketeers!

10
The Evolution of the Grid The First Generation
  • The early to mid 1990s marks the emergence of the
    early metacomputing or grid environments.
  • Typically, the objective of these early
    metacomputing projects was to provide
    computational resources to a range of high
    performance applications.
  • Two representative projects in the vanguard of
    this type of technology were FAFNER (see ref 7)
    and I-WAY (see ref 8), both circa 1995.

11
Convergence of Technologies
  • Both projects attempted to provide metacomputing
    resources, but from opposite ends of the computing
    spectrum:
  • FAFNER was a Web-based effort at factoring the RSA
    challenge, capable of running on any workstation
    with more than 4 Mbytes of memory, and was
    aimed at a trivially parallel application.
  • I-WAY was a means of unifying the resources of
    large US supercomputing centres, and was targeted
    at high-performance applications (compute/data
    intensive).
  • Each project was in the vanguard of metacomputing
    and helped pave the way for many of the
    succeeding projects.
  • FAFNER was the forerunner of the likes of
    SETI@home, Folding@home and Distributed.net,
  • I-WAY was the same for Globus, Legion, and
    UNICORE.

12
Convergence of Technologies
  • Since the emergence of the second generation of
    systems (e.g. Globus/Legion, circa 1995) there
    have been a number of classes of wide-area
    systems developed:
  • Grid-based, aimed at HPC compute/data intensive
    work, e.g. Globus/Legion/UNICORE.
  • Object-based, e.g. CORBA/CCA/Jini/Java-RMI.
  • Web-based, e.g. Javelin, SETI@home, Charlotte,
    Folding@home, ParaWeb, Distributed.net.
  • Enterprise - bespoke systems, such as IBM's
    WebSphere, BEA's WebLogic, and Microsoft's .NET
    platform.

13
Convergence of Technologies
  • As the developers in these four areas evolved
    their systems over the years, there were many
    overlaps, various collaborations started, and, to
    an extent, a realisation was reached that a
    unified approach to the development of middleware
    to support wide-area applications was needed.
  • Unifying standards bodies helped this process,
    for example GGF, OASIS, W3C, and IETF.
  • Convergence of WS, HPC, OO, SOA, etc.
  • A result of this was that the Open Grid Services
    Architecture (OGSA) was announced at GGF4 in Feb
    2002, and was declared their flagship
    architecture in March 2004.
  • OGSA was based on Web Services technologies.

14
Convergence of Technologies
  • The OGSA document, first released at GGF11 in
    June 2004, gave current thinking on the required
    capabilities and was released in order to
    stimulate further discussion.
  • Note: instantiations of OGSA depend on emerging
    specifications.
  • Currently the OGSA document does not contain
    sufficient information to develop an actual
    implementation of an OGSA-based system.
  • The first OGSA-based reference implementation was
    GT3 (OGSI), released in July 2003.
  • Major problems were identified with OGSI; some
    were political and others were technical.

15
Convergence of Technologies
  • In Jan 2004, a significant shift happened when
    WS-RF was announced.
  • Problems had been identified with OGSI:
  • Re-implementation of a lot of layers which are
    already standardised in commodity WS, for example
    GWSDL,
  • Felt to be too much in one specification,
  • Did not work well with existing tooling for WS,
  • Too OO!
  • Whereas with WS-RF:
  • New mechanisms build on top of existing WS
    standards and add a few,
  • Basically rebuilding OGSI functionality using WS
    tooling, extending where necessary,
  • Dependent on six new or emerging WS
    specifications!

16
Grid and Web Services Convergence!
[Timeline diagram: the Grid track (GT1 → GT2 → OGSI → WSRF) and the
Web track (HTTP → WSDL, WS-* → WSDL 2, WSDM) started far apart and
converge at WSRF.]
WSRF means that the Grid and Web communities are
moving forward on a common base!
17
Emerging Grid Standards
Latest issue of IEEE Computer
18
Emerging Grid Standards
19
Experiences with the Grid
  • Background:
  • First installed Globus at Portsmouth back in
    early 2000 (GT1).
  • Developed a monitoring system based on Globus
    MDS, and the Liquid Crystal Portal.
  • Oct 2003: funded to lead the OGSA Testbed:
  • Consortium of Daresbury, Manchester, Reading and
    Westminster,
  • Funded to explore, investigate and feed back our
    experiences installing, maintaining and using
    OGSI (GT3/OGSI::Lite) and deploying our
    applications across the testbed,
  • Details at http://dsg.port.ac.uk/projects/ogsa-testbed/.

20
The OGSA Testbed Project
21
Recap - Core Globus Services
  • GridFTP - high-performance, secure, reliable data
    transfer protocol for wide-area networks.
  • GRAM (Globus Resource Allocation Manager)
    provides a standard interface for requesting and
    using remote resources for the execution of
    "jobs".
  • The most common use is remote job submission and
    control.
  • MDS (Monitoring and Discovery System) is the
    information services component and provides
    information about the available resources and
    their status.
  • GSI (Grid Security Infrastructure) for secure
    authentication and communication over an open
    network.
  • GSI provides a number of useful services for
    Grids, including mutual authentication and single
    sign-on.
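As a rough sketch of how these core services are typically driven from the command line in the GT2/GT3 era (the hostname and jobmanager below are placeholders, and exact flags vary between releases):

```shell
# GSI: create a short-lived proxy credential from your certificate.
grid-proxy-init

# GRAM: run a job on a remote resource (host/jobmanager are placeholders).
globus-job-run grid-node.example.ac.uk/jobmanager-pbs /bin/hostname

# GridFTP: copy a local file to a remote server over gsiftp.
globus-url-copy file:///tmp/results.dat \
    gsiftp://grid-node.example.ac.uk/tmp/results.dat
```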

22
Experiences with Globus
  • Documentation:
  • The 3.2 installation guide is better; <3.0 was a
    nightmare.
  • Earlier documents had gaps which were glossed
    over, and things did not happen for us as the docs
    described.
  • Size of install (GT3):
  • 251 Mbytes for 3.0.2, 320 Mbytes for 3.2.
  • Time to compile:
  • 6 hours on a 1 GHz, 256 Mbyte PC,
  • 2 hours on a dual 2.8 GHz with 2 Gbytes of RAM.
  • Setting up GT security and certificates:
  • Getting an e-Science certificate: OK,
  • The grid-mapfile (an ACL): fairly easy; the problem
    is you need to hand-edit the file.
  • For small organisations with few users this is
    fine, but many users means more work - need to
    add the GridPP patch.
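For illustration, each line of the hand-edited grid-mapfile maps a certificate subject (DN) to a local account; the DN and username below are made up:

```
"/C=UK/O=eScience/OU=Portsmouth/L=DSG/CN=some user" someuser
```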

23
Experiences with Globus
  • Test programs:
  • Yes, but they do not test whether a service is
    functioning correctly.
  • We used/developed GT3 GITS scripts.
  • Bugs/features - reporting!
  • Yes, via http://bugzilla.globus.org/globus/
  • Week commencing March 20th 2005: 25 bugs (just
    Monday), 241 marked as new...
  • Total unresolved 438 and resolved 1950, just for
    globus.org.
  • There is an issue with which ones get prioritised.
  • When we asked a non-standard (newbie) question on
    the mailing list we never got a useful reply,
    just lots of people saying "yep, same problem
    here".
  • Application versus installation answers.

24
Experiences with Globus
  • Hardwired software, pinned to a platform!?
  • Pretty good now for Java!
  • Usually works on 32-bit platforms; 64-bit
    platforms, like the IBM SP/HP, are painful.
  • Works on one Linux platform, but not another!
  • Strange how it did not work on some
    distributions, and worked better on Debian than
    some versions of Red Hat.
  • Nowhere near as good as most portable projects
    (e.g. Apache), which build on everything,
    correctly.
  • Implications of frequent updates and reinstalls:
  • Often a complete rebuild/reinstall,
  • Software not backwardly compatible,
  • No direct path or bridge from GT 2.x → 3.2 → 4.0.

25
Experiences with Globus
  • Opening ports - many open!
  • Globus container: 8080,
  • Gatekeeper: 2119,
  • GridFTP: 2811, plus a range of TCP ports (roughly
    256 ports, as recommended by the UK grid-support
    centre).
  • Makes systems people VERY unhappy!
  • Apache Tomcat as service container:
  • GT comes with Tomcat, but this was just a
    development environment,
  • Needed to deploy GT in a Tomcat container:
  • Memory problems - GC having problems, eventually
    failed,
  • Sorted out the issues and it has now been running
    continuously for 5 months.
  • Needed to figure out ways of working with Tomcat
    and GT.
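One way to make the open-port problem tractable for the firewall team is to pin Globus to a fixed ephemeral range via the `GLOBUS_TCP_PORT_RANGE` environment variable; the range below is illustrative, not an official default:

```shell
# Restrict the TCP ports that GridFTP data channels and GRAM
# callbacks may use, so only a known range needs opening in the
# firewall alongside the fixed service ports (8080, 2119, 2811).
export GLOBUS_TCP_PORT_RANGE=50000,50255
```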

26
Other Experiences
  • The Portsmouth firewall committee decided to stop
    access to FTP on University systems, and also
    updated the firewall system:
  • GridFTP stopped working!
  • It took two weeks to convince the systems people
    that GridFTP was not plain FTP and was secure!
  • OGSA-DAI: middleware based on OGSI for accessing
    distributed databases.
  • It was natural to run the examples first to test
    all was well, so we did:
  • One did not run - it had a "feature" that we were
    informed would be fixed in the next release! One
    contained bugs.
  • Confused roadmap now! Trying to support too many
    grid platforms.
  • January 2004 refactoring exercise:
  • OGSI to WS-RF,
  • No consultation - a slight hiccup!

27
Summary of Experiences
  • Globus is an ambitious effort to produce middleware
    that satisfies the needs of wide-area distributed
    applications.
  • Good for people who are familiar with GT - like
    us now - but it is a total disaster for a newbie:
  • It expects application scientists to have technical
    know-how!
  • Very steep learning curve.
  • Globus is a worthy effort, but it is still
    research software, with all the implications of
    such.
  • Many projects are staying with GT2.4, as this
    provides a more stable platform.
  • No new services have been developed over the last
    few years.
  • DataGrid/EGEE are having a significant effect on
    future grid middleware offerings.

28
Other Observations
  • In the UK, we went to GT2 too early (probably by
    12 months); GT3 is deprecated; now awaiting GT4 in
    early 2005.
  • OGSI → WS-RF was done for the right reasons, but
    the announcement confounded the community and
    frustrated many developers.
  • GT is not production-quality software yet, so
    expect the associated problems.
  • GSI is a success, being used widely by the
    community.
  • We need alternative OGSA instantiations; emerging
    systems such as WSRF::Lite and UNICORE will help
    this diversity.
  • We need hardened and usable software, otherwise
    the Grid will encounter its own "AI Winter".
  • The UK OMII is addressing this area.

29
Other Observations
  • We need money to develop a robust middleware
    infrastructure, not just money to do further
    research into future infrastructure and
    applications.
  • Question: is a research council (e.g. UK EPSRC)
    the right place to allocate these funds from!?
  • Currently there is much confusion as to which
    standard to follow:
  • WS-RF: GT4 - 1st/2nd quarter 2005,
  • WS-GAF,
  • WS/WS-I/WS-I+.
  • Many developers in the UK are using just Web
    Services - mainly SOAP and WSDL.
  • UDDI does not satisfy the needs of a grid
    information service.

30
DSG Projects
  • The development of a selection of grid and cluster
    middleware.

31
DSG Projects
  • GridRM: a unifying resource monitoring system,
    capable of being used for a number of diverse
    purposes, including scheduling, performance,
    faults, and policing QoS or SLAs.
  • jGMA: an event-based messaging system with an
    integrated P2P-based registry.
  • Semantic Logging: an RDF-based system unifying and
    annotating log data for a more complete
    analysis of distributed systems.
  • MPJ: a Java MPI-based message-passing system and
    runtime infrastructure.
  • Others: portals, OGSA-DAI, investigation of
    NaradaBroker - can mention further if interested!

32
GridRM
  • A data-gathering framework for monitoring and
    managing the Grid.
  • http://gridrm.org/

33
Background
  • Lack of knowledge about the status of the
    resources in any distributed system will hamper
    strategies for optimal scheduling, allocation and
    usage.
  • There is a need for a ubiquitous framework that
    provides information about the health and status
    of Grid resources:
  • Gathering resource information, such as:
  • Compute (nodes, CPU, memory),
  • Network (inter-site communications links, network
    devices),
  • Sensors (specialised devices, Web cam,
    microphone),
  • Software services (information services,
    schedulers).
  • Need a generic system that does not need another
    local agent, but can utilise whatever exists:
  • SNMP, Network Weather Service, NetLogger,
    Ganglia, /proc, MDS, or other services.

34
GridRM Structure
  • A global layer of peer-related gateways,
  • Which in turn have a local layer that interacts
    with the local data sources, and/or a hierarchy of
    child gateways.

35
GridRM Architecture
36
GridRM Local Layer
37
GridRM Layered View
38
GridRM Query API
  • Producing an API is fairly simple, but creating
    one that will be taken up and accepted is another
    matter.
  • We are using an API based on JDBC from Java.
  • Example of the API:
  • Agent Driver Interface:
  • Class.forName("GridRM.sql.agent.NWSDriver")
  • Class.forName("GridRM.sql.agent.SNMPv1Driver")
  • Connection Interface:
  • String agentURL = "GridRM:NWS/barney:5550/PerfData"
  • Connection con = DriverManager.getConnection(agentURL)
  • Statement Interface:
  • Statement stmt = con.createStatement()
  • ResultSet rs = stmt.executeQuery("get CPU table")
  • Manipulating Results:
  • ResultSet is another interface; it contains a
    handful of methods for manipulating the data
    returned from the agent.

39
GridRM Naming Schema
  • There is no single naming schema for this area at
    the moment.
  • We needed something that can mark up the
    information that can be gathered by the local
    agents:
  • Static and dynamic information:
  • Name/IP/OS/Processor/NIC/...
  • CPU load/memory available/disk space/network/...
  • We did not want to produce our own schema, so chose
    an emerging one that is increasingly being used -
    the Grid Laboratory Uniform Environment (GLUE)
    schema:
  • A schema that defines the attributes of computer
    system resources (CE/NE/...).
  • Others: CIM, UNICORE, etc.

40
GridRM Drivers and Manager
  • The GridRM Driver Manager takes requests from the
    Agent API and translates them into something that
    the local agents can understand.
  • The Driver Manager also provides other
    functionality that is particular to GridRM, such
    as configuration, caching, streaming, or
    pushing/pulling data to/from clients.
  • The Driver Manager includes a simple low-level
    API to interact with the local agents, based on
    a common sub-set of information that can be
    retrieved from all the agents.

41
Local Layer Use of SQL
  • SQL used extensively throughout the framework.
  • All resources are seen as databases and queried
    using SQL.
  • Resource queries enter the framework as SQL
    syntax.
  • Pluggable resource drivers are implemented as
    JDBC drivers:
  • Translate SQL requests into the native protocol.
  • Normalise results according to the selected schema.
  • Framework benefits from a single, flexible
    approach to resource interaction.
  • Makes for a simple, extensible framework.
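The driver flow above can be sketched as follows. This is an illustrative toy, not GridRM's actual driver API: the class name, the "native protocol" string, and the GLUE-flavoured attribute names are all invented for the example.

```java
// Toy sketch of the GridRM local-layer idea: a pluggable "driver"
// accepts an SQL query, translates it into a native agent request,
// and normalises the native reply into schema-based attribute names.
import java.util.LinkedHashMap;
import java.util.Map;

public class ToyProcDriver {

    // Translate a (very restricted) SQL query into a native request.
    // A real driver would parse SQL properly; here we only recognise
    // a SELECT against a "cpu" table.
    public static String toNativeRequest(String sql) {
        String q = sql.trim().toLowerCase();
        if (q.startsWith("select") && q.contains("from cpu")) {
            return "read:/proc/loadavg";   // this agent's native protocol
        }
        throw new IllegalArgumentException("unsupported query: " + sql);
    }

    // Normalise a native reply (a /proc/loadavg-style line) into
    // GLUE-flavoured attribute names (invented here for illustration).
    public static Map<String, String> normalise(String nativeReply) {
        String[] fields = nativeReply.trim().split("\\s+");
        Map<String, String> row = new LinkedHashMap<>();
        row.put("GlueHostProcessorLoadAverage1Min", fields[0]);
        row.put("GlueHostProcessorLoadAverage5Min", fields[1]);
        return row;
    }
}
```

Because every resource looks like a database behind such a driver, the framework above it only ever deals with SQL in and normalised rows out.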

42
GridRM GUI
Homogeneous view of the data sources
43
GridRM Portal
  • The GridRM Portal (gridrm.org) is a demonstration
    of gateways, data sources, SQL and data
    normalisation.
  • An example of the use of GridRM, particularly its
    ability to discover and utilise resource data.
  • An example of a GridRM client which:
  • Allows the use of GridRM with no knowledge of the
    underlying technologies,
  • Hides details like SQL, XML, etc.,
  • Provides an abstraction everyone can use
    (clickerity click!).

44
GridRM Portal
45
International Testbed
46
GridRM
47
GridRM
48
Summary
  • Heterogeneous information is returned from a
    diverse range of possible data sources.
  • Need to harvest data into a homogeneous form:
  • Hide the underlying complexity from clients.
  • Provide data in a format that meets a client's
    requirements.
  • Combine legacy resources with modern cluster and
    Grid information servers to provide:
  • An over-arching grid information system,
  • Independent of particular middleware and
    services.
  • GridRM promotes homogeneity through:
  • JDBC-like data source drivers,
  • Standard SQL syntax,
  • The GLUE naming schema,
  • Request translation and result normalisation.

49
Future Work
  • Provide an example of a job submission system
    using GridRM; several options:
  • Other schedulers: Condor, SGE, ...
  • Further security:
  • Integrate UK e-Science certificates for resource
    access control.
  • A secure interface for remote Gateway
    administration.
  • Performance and scalability testing.
  • More translation schemas for different resources:
  • DBMS, telescope, surf conditions!
  • Use of portlet technologies to provide a better
    Web interface - GridSphere.

50
jGMA
  • An event-based messaging system
  • http://dsg.port.ac.uk/projects/jGMA/

51
jGMA
  • We needed a lightweight implementation of the GGF
    Grid Monitoring Architecture (GMA) in Java for
    GridRM.
  • There are others:
  • R-GMA,
  • pyGMA,
  • Autopilot, MDS, NWS, CODE.
  • We found that the existing systems were
    heavyweight, complex or not standalone.
  • Decided to produce our own version.
  • Aims:
  • GMA compliant,
  • Easy to install and use,
  • Easy to program and extend,
  • Java-based.

52
jGMA Architecture
  • GMA Compliance:
  • 21 features,
  • The GGF document is only a guide,
  • It is very easy to claim to be compliant,
  • For now jGMA is GMA-like.

53
jGMA Infrastructure
54
jGMA Demo
55
jGMA Demo
56
jGMA Status and Future Work
  • The jGMA messaging API is complete.
  • Currently completing the virtual registry:
  • Text file/MySQL interfaces complete,
  • Implementing the P2P part.
  • Testing the implementation in June versus
    NaradaBroker and R-GMA.
  • jGMA v1 can be downloaded from
    http://dsg.port.ac.uk/projects/jGMA/
  • A couple of demos are linked to the web page.
  • Applying jGMA to GridRM, myGrid applications, and
    eventually as an on-line gaming infrastructure.

57
jGMA
58
Semantic Logging
  • Semantic Logging using RDF
  • http://dsg.port.ac.uk/projects/UISB/

59
UISB
  • We had a desire to investigate Semantic Web
    technologies for the purposes of unifying
    Information Services (LDAP/MDS/LUS/UDDI); we used
    RDF as an information store.
  • We can harvest and annotate IS data and store it
    in a centralised RDF store.
  • Initially funded by an IBM Innovation award;
    used the Eclipse platform.
  • We have developed all the components needed, and
    discovered a number of hurdles and issues.

60
UISB
61
UISB
62
Semantic Logging
  • The UISB project has diverged.
  • There was a keen interest in investigating the idea
    that the UISB components could be used to unify log
    events from various sources and provide a better
    overall source of information for analysing the
    behaviour of distributed systems and
    applications.
  • The idea is that we harvest log data (events) from
    the OS, executing middleware and applications
    from a range of systems, store it in our RDF-based
    repository, and then visualise ALL the various
    events in the logs in order to better understand
    the system's overall behaviour.

63
Semantic Logging RDF view!
64
Semantic Logging A View of Events
65
MPJ
  • A Java-based message-passing system
  • http://dsg.port.ac.uk/projects/MPJ/

66
Introduction
  • There is a lot of interest in a Java messaging
    system.
  • We wanted to produce a reference pure-Java
    messaging system that follows the MPJ API
    specification:
  • Create an MPJ implementation which is the
    analogue of MPICH.
  • What does a Java messaging system have to offer?
  • Portability:
  • Write once, run anywhere.
  • Object-oriented programming concepts:
  • A higher level of abstraction for parallel
    programming.
  • An extensive set of API libraries:
  • Avoids reinventing the wheel.
  • A multi-threaded language:
  • Thread-safe.
  • Automatic memory management.
  • Popularity: often a first language, and therefore
    good for teaching message passing as well.

67
MPJ
68
The New Design
69
(No Transcript)
70
MPJ Status
  • The MPJ API is complete and being tested.
  • The MPJ runtime infrastructure is being developed -
    we want something that works the same on
    UNIX/Linux and Windows!
  • Installation is being looked at!
  • Further devices (SHMEM).
  • Release of a beta version in June.
  • Applications are being ported.

71
Other Projects
  • Grid integration tests.
  • Optimisation of complex distributed queries using
    OGSA-DAI, based on GIS/SDSS data.
  • Investigation of NaradaBroker, leading to the
    development of a P2P file store.
  • Portal work for JISC: development of a range of
    JSR-168 compliant services.

72
Summary
  • The DSG is involved in a range of projects that
    are developing middleware for clusters and the
    Grid.
  • We are attempting to use generally accepted and
    widely used standards - many of those being
    purported today are ephemeral!
  • We want to create relatively simple and easy-to-use
    software - not trying to reinvent the wheel,
    which seems to be a common occurrence.

73
Shameless Plug
http://www.amazon.co.uk/exec/obidos/ASIN/0470094176/
74
The End!
  • Any Questions?

75
References
  • 1. Ian Foster and Carl Kesselman (Editors), "The
    Grid: Blueprint for a New Computing
    Infrastructure", Morgan Kaufmann, 1st edition
    (November 1, 1998), ISBN 1558604758.
  • 2. CCA, http://www.extreme.indiana.edu/ccat/glossary.html
  • 3. IPG, http://www.ipg.nasa.gov/ipgflat/aboutipg/glossary.html
  • 4. I. Foster, C. Kesselman, and S. Tuecke, "The
    Anatomy of the Grid: Enabling Scalable Virtual
    Organizations", International J. Supercomputer
    Applications, 15(3), 2001.

76
References
  • 5. Checklist, http://www.gridtoday.com/02/0722/100136.html
  • 6. IBM Grid Computing, http://www-1.ibm.com/grid/grid_literature.shtml
  • 7. FAFNER, http://www.npac.syr.edu/factoring.html
  • 8. I. Foster, J. Geisler, W. Nickless, W. Smith, S.
    Tuecke, "Software Infrastructure for the I-WAY
    High Performance Distributed Computing
    Experiment", in Proc. 5th IEEE Symposium on High
    Performance Distributed Computing, pp. 562-571,
    1997.
  • 9. LCG, http://lcg.web.cern.ch/LCG/
  • 10. WS-GAF, http://www.neresc.ac.uk/ws-gaf
  • 11. WS-I, http://www.ws-i.org
  • 12. WS-RF, http://www.globus.org/wsrf