Supercomputing 2003, UK e-Science Booth - PowerPoint PPT Presentation

Provided by: terrysloan

Transcript and Presenter's Notes



1
Welcome to the Real World: Industry and the Grid
Paul Graham, Edinburgh Parallel Computing Centre,
United Kingdom, p.graham@epcc.ed.ac.uk
2
Intro: What we do
  • Based at the University of Edinburgh
  • 65+ staff
  • Host Europe's largest academic supercomputer
    (1280-CPU IBM HPCx system)
  • Cover a broad range of activities
  • Facilities, HPC Research, Training, Visitor
    programme and European coordination
  • Technology Transfer
  • Project-based consultancy to industry and
    commerce
  • Over 30 clients over the last three years, local
    to multinational
  • Funded by businesses, Scottish Enterprise, EC
    and DTI

3
FirstDIG
  • Data Investigation on the Grid
  • First plc
  • UK's largest transport operator: over 10,000
    vehicles in the UK
  • Specific business questions
  • How is our revenue affected by punctuality,
    routes, etc.?
  • Do service disruptions affect customer
    satisfaction, and to what degree?

4
Data Mining
  • Used to help answer these questions
  • The efficient discovery of valuable, non-obvious
    information from a large collection of data
  • EPCC's past experience
  • TSB, CG
  • But a huge range of fragmented data sources

5
Data Sources in the Bus Industry
  • Many different kinds of data involved with
    running a bus company
  • Mileage, revenue, customer contact, schedule,
    fuel consumption, vehicle maintenance, routes
  • Many means to collect data
  • Manually entered data at depot
  • Data collected on buses from ticket machines
  • Data collected on buses from GPS systems
  • The problem: data is typically stored in
    disparate databases, which introduces challenges
    for data analysis
  • Need the ability to analyse their contents in a
    uniform manner, including cross-database analysis
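To make the idea of cross-database analysis concrete, here is a minimal sketch that uses SQLite's ATTACH DATABASE as a stand-in for two separate company databases. This is not OGSA-DAI, and the table names, columns and rows are hypothetical. Once both databases are reachable through one connection, a single SQL join can relate revenue to punctuality.

```python
import sqlite3

def build_demo() -> sqlite3.Connection:
    """Create two separate in-memory databases holding hypothetical demo rows."""
    conn = sqlite3.connect(":memory:")                 # 'main' stands in for revenue data
    conn.execute("ATTACH DATABASE ':memory:' AS ops")  # 'ops' stands in for operations data
    conn.execute("CREATE TABLE main.revenue (route TEXT, day TEXT, takings REAL)")
    conn.execute("CREATE TABLE ops.punctuality (route TEXT, day TEXT, late_pct REAL)")
    conn.executemany("INSERT INTO main.revenue VALUES (?, ?, ?)",
                     [("7", "Mon", 120.0), ("22", "Mon", 80.0)])
    conn.executemany("INSERT INTO ops.punctuality VALUES (?, ?, ?)",
                     [("7", "Mon", 5.0)])
    return conn

def revenue_vs_punctuality(conn: sqlite3.Connection):
    """Join rows across the two attached databases on route and day."""
    return conn.execute(
        """SELECT r.route, r.day, r.takings, p.late_pct
           FROM main.revenue AS r
           JOIN ops.punctuality AS p
             ON p.route = r.route AND p.day = r.day
           ORDER BY r.route, r.day"""
    ).fetchall()
```

With the demo data, `revenue_vs_punctuality(build_demo())` returns `[('7', 'Mon', 120.0, 5.0)]`: route 22 has no punctuality record, so the inner join drops it.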

6
The solution
  • OGSA-DAI
  • Open Grid Services Architecture Data Access and
    Integration
  • DAIS-WG at GGF
  • Grid middleware
  • Assists with the access and integration of data
    from separate data sources via the Grid
  • Represents databases as Grid Services
  • Enables access from other machines in a secure
    manner
  • Built on GT3, extending to WS-RF, WS-I and other
    interfaces in OGSA-DAIT
  • OGSA-DAI Partners
  • Funded under UK e-Science Core program
  • Universities of Edinburgh, Manchester and
    Newcastle
  • IBM and Oracle
  • http://www.ogsadai.org.uk

7
Why OGSA-DAI?
  • Need a way to access these disparate databases
  • Grid-enable them
  • Why the Grid solution?
  • Can be accessed from anywhere within the company
    (assuming appropriate permissions)
  • Independent of the underlying database management
    systems
  • Independent of the underlying operating systems
  • Users have more time to think about how to utilise
    the data rather than how to access it
  • Previously impractical analysis becomes practical

8
Issues and Solutions
  • OGSA-DAI did not support MS Access or dBASE IV
  • Initially aimed at XML databases, MySQL, etc.
  • However, it does support JDBC-accessible databases
  • Solution: use the Microsoft-provided ODBC driver
    and the Sun-provided JDBC-ODBC bridge
  • Field data
  • The BIT data type (Yes/No fields) and the Date
    format
  • Out-of-range character codes: a limitation of
    XML
  • Solution: OGSA-DAI team
  • Usability
  • Use of XML, etc., can be confusing
  • Developed a GUI front end to allow the submission
    of queries and display of results
  • Installation
  • Documentation
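The field-data problems above come down to turning each database value into text that XML can carry. The sketch below is a hypothetical helper, not OGSA-DAI code. It assumes BIT fields arrive as Python booleans and dates as `date`/`datetime` objects, and it drops characters that the XML 1.0 `Char` production does not allow. Dropping rather than substituting them is one design choice among several.

```python
import re
from datetime import date

# Characters permitted by the XML 1.0 "Char" production. Anything outside
# these ranges (e.g. most ASCII control codes) cannot appear in an XML
# document at all, not even as a character reference.
_INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

def to_xml_text(value) -> str:
    """Convert one field value to XML-safe text (hypothetical helper)."""
    if value is None:
        return ""
    if isinstance(value, bool):        # BIT / Yes-No fields
        return "true" if value else "false"
    if isinstance(value, date):        # covers datetime too (it subclasses date)
        return value.isoformat()       # unambiguous ISO 8601 format
    # Strip characters that XML 1.0 cannot represent.
    return _INVALID_XML_CHARS.sub("", str(value))
```

For example, `to_xml_text("Route 7\x00")` returns `"Route 7"`, because NUL is not a legal XML character.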

9
FirstDIG conclusion
  • Successfully demonstrated the use of Grid
    middleware in a real-world environment
  • OGSA-DAI gained valuable feedback and has taken on
    the GUI; the project would have suffered without a
    close relationship with the OGSA-DAI team
  • First have discovered valuable information from
    their data which otherwise would have been
    virtually unattainable
  • "This will revolutionise the bus industry" -
    Darren Unwin

10
INWA
  • Innovation Node Western Australia
  • Data mining involving commercial and public
    domain data
  • 10 partners from both academia and commerce
    (finance and telecoms)
  • Investigation of mortgage and property trends

11
Grid technology
  • Transfer-queue Over Globus (TOG v1.1) from the UK
    e-Science Sun Data and Compute Grids project
  • Provides access to remote HPC resources
  • OGSA-DAI (release 3.1)
  • To provide access control and discovery of
    distributed heterogeneous data resources
  • FirstDIG GUI data browser
  • Provides SQL access to the OGSA-DAI enabled data
    sources
  • Globus Toolkit 2 and 3
  • GridFTP, middleware

12
The INWA Grid
13
Issues
  • Getting the data
  • Traditionally a file export; OGSA-DAI is
    available, but organizations will not
    contemplate external access to operational or
    sensitive data
  • So back to a file export
  • UK Land registry
  • Public data source but no OGSA-DAI interface
  • Analysing data
  • Little real support for data integration over the
    Grid
  • OGSA-DQP (Distributed Query Processing) is
    limited
  • Needs Linux and uses OQL, which is similar to SQL
    but not as common
  • Used FirstDIG browser
  • Relevant data pulled over and joined locally
  • This works, but it is obviously not ideal
  • A lot of user interaction is required - 7 queries
    are necessary to join two datasets
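The workaround of pulling the relevant data over and joining it locally can be sketched as an in-memory hash join. This is an illustration under assumptions: rows are modelled as Python dicts, and the column names used in any call are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List

Row = Dict[str, Any]

def hash_join(left: List[Row], right: List[Row],
              left_key: str, right_key: str) -> List[Row]:
    """Inner-join two lists of rows where left[left_key] == right[right_key].

    Builds an index over `right` once, then probes it for each `left` row,
    so the cost is roughly linear in the total number of rows.
    """
    index: Dict[Any, List[Row]] = defaultdict(list)
    for row in right:
        index[row[right_key]].append(row)

    joined: List[Row] = []
    for lrow in left:
        for rrow in index.get(lrow[left_key], []):
            merged = dict(lrow)
            # Prefix right-hand columns so identically named fields do not clash.
            merged.update({f"r_{k}": v for k, v in rrow.items()})
            joined.append(merged)
    return joined
```

Indexing one side and probing it with the other avoids the nested-loop comparison of every row pair. It assumes the smaller extract fits comfortably in memory.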

14
Issues (cont)
  • Grid Computation: large data sets, so
  • Cleaning and mining jobs sent to where data is
    resident (UK and Australia)
  • Globus Toolkit V2.x (GT2), Grid Engine and TOG
    used
  • But
  • Installation issues with GT2
  • Not out-of-the-box: requires significant time,
    effort and expertise
  • Security issues with GT2 and TOG
  • Bug in the Globus Java CoG Kit
  • Security flag omission in TOG
  • However, it all now works and is currently being
    used between the UK and Australia

15
INWA Summary
  • Trust - Organizations understandably wary about
    installation of software and the access it
    provides
  • Security, Security, Security
  • Not mature enough
  • Software
  • Not robust enough
  • Bugs found in all the major software used: Globus,
    OGSA-DAI and TOG
  • Sys admin skills still necessary to maintain the
    grid
  • INWA conclusion
  • In this case, middleware probably not mature
    enough for commercial deployment
  • Follow-on work with China

16
PGPGrid
  • Pepper's Ghost Productions Ltd (PGP) is a UK
    company that produces Computer-Generated
    Animations (CGA)
  • 3D-Matic Lab at Glasgow University offers
    optical, 3D-data transfer technology to SMEs
  • 24-camera capture rig allows the viewpoint to be
    where you want it, even if there was no camera
    there in the first place
  • Actors play the part, their expressions are
    abstracted and applied to animated models
  • 3D-Matic code: CPU-, memory- and data-intensive,
    and inherently parallel
  • Rendering, a late stage in animation, is
    compute-intensive and inherently parallel, but
    animator control is necessary

17
Modelling
18
Using the Grid
  • Ranging and modelling code
  • A system of nodes can solve a problem or pass it
    on, and can also pass the communications channel on
  • Lookup for available nodes is done through a
    Resource Locator Service based on web services
  • Remote Rendering
  • Animators can use as many CPUs as they can find
    for rendering when they have a contract, but they
    have frequent periods of idling.
  • There are businesses that provide farms for CGA.
  • Animators have little control as they lose
    interactivity
  • That's costly, as flawed rendered batches need to
    be discarded.
  • The Remote Rendering Service (RRS) is a web
    service which will allow the submission and
    monitoring of rendering jobs to a remote farm
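The "solve it or pass it on" behaviour of the ranging and modelling nodes, together with a lookup service, might look roughly like the following. Everything here is hypothetical. `Node`, `ResourceLocator` and `dispatch` are illustrative names, the locator is an in-memory list rather than a web service, and the `reply` callback stands in for the communications channel that is handed on so that whichever node solves the task can answer the requester directly.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set

@dataclass
class Node:
    """A compute node that either solves a task or hands it on (hypothetical)."""
    name: str
    can_solve: Callable[[dict], bool]
    solve: Callable[[dict], object]

class ResourceLocator:
    """In-memory stand-in for a web-service-based Resource Locator Service."""

    def __init__(self) -> None:
        self._nodes: List[Node] = []

    def register(self, node: Node) -> None:
        self._nodes.append(node)

    def available(self, exclude: Set[str]) -> List[Node]:
        """Return registered nodes that have not yet seen the task."""
        return [n for n in self._nodes if n.name not in exclude]

def dispatch(task: dict, node: Node, locator: ResourceLocator,
             reply: Callable[[str, object], None],
             visited: Optional[Set[str]] = None) -> bool:
    """Solve `task` at `node`, or pass the task and the reply channel on.

    Returns True once some node has answered through `reply`,
    or False if every registered node has declined.
    """
    visited = set() if visited is None else visited
    visited.add(node.name)
    if node.can_solve(task):
        # The solving node answers the original requester directly.
        reply(node.name, node.solve(task))
        return True
    candidates = locator.available(visited)
    if not candidates:
        return False
    return dispatch(task, candidates[0], locator, reply, visited)
```

Tracking visited nodes guarantees the hand-off terminates even if no node can take the task.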

19
Issues and Solutions
  • GridFTP is earmarked for the transfer of files
  • A challenge in its own right: we aim to use JavaCoG to
    avoid Globus Toolkit x
  • There is no GridFTP-server implementation for
    Windows, and PGP use PCs
  • When rendering is complete, the rendering server
    cannot push the files back
  • GridFTP is asymmetric: only the GridFTP client
    initiates transfers
  • Client pings server periodically and pulls the
    files instead
  • No monitoring capabilities for remote rendering
  • We will implement our own as a web service (WS)
  • Work in progress
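Because only the GridFTP client can initiate transfers, retrieval has to be driven from the client side. A minimal sketch of that poll-then-pull loop follows. `is_done`, `list_outputs` and `pull` are hypothetical hooks standing in for a job-status query and client-side GridFTP "get" operations, and the default interval and timeout are arbitrary.

```python
import time
from typing import Callable, List

def poll_and_pull(job_id: str,
                  is_done: Callable[[str], bool],
                  list_outputs: Callable[[str], List[str]],
                  pull: Callable[[str], None],
                  interval_s: float = 30.0,
                  timeout_s: float = 3600.0,
                  sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Ask the server periodically whether the job is finished, then pull results.

    All three hooks are hypothetical; `sleep` is injectable for testing.
    """
    waited = 0.0
    while not is_done(job_id):
        if waited >= timeout_s:
            raise TimeoutError(f"job {job_id} not finished after {timeout_s} s")
        sleep(interval_s)
        waited += interval_s
    files = list_outputs(job_id)
    for name in files:
        pull(name)  # the client opens every transfer; the server never pushes
    return files
```

The timeout keeps a lost or stalled job from blocking the client forever, and the polling interval trades responsiveness against load on the rendering server.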

20
Conclusions
  • We've seen three real-world Grid projects
  • There are many challenges to getting industry
    involved with Grid projects, and in making them
    successful
  • Robustness
  • Ease of use
  • Security
  • Support
  • but it is possible and potentially very
    rewarding to all involved.

21
Thank you
  • Paul Graham, p.graham@epcc.ed.ac.uk
  • FirstDIG, http://www.epcc.ed.ac.uk/firstdig
  • INWA, http://www.epcc.ed.ac.uk/inwa
  • PGPGrid, http://www.epcc.ed.ac.uk/pgpgrid