Transcript: Session 2, Overview of e-Science and Distributed Systems
1
Session 2 Overview of e-Science and
Distributed Systems
7 July 2008
  • Malcolm Atkinson

2
Overview
  • e-Science & Computational thinking
  • A turning point in the history of science
  • Modern challenges
  • Combined approaches
  • Grids in context
  • What can a grid do?
  • What can't it do?
  • Principles
  • Scenarios

3
New modes in Research, Thought and Collaboration
4
Vision
  • We are undergoing a transition in
  • the power of affordable computing
  • the wealth of accessible data and
  • the capacity of digital communication
  • e-Science provides leadership in
    interdisciplinary collaboration
  • By combining these we will provide unprecedented
    ability to address pressing research challenges

5
Definition of e-Science
Computing has become a fundamental tool in all
research disciplines, which often proceed by
assembling and managing large data collections
and exploiting computer models and simulations
(a topic called e-Science). (Phil Wadler, 2008)
e-Science is the invention and application of
computer-enabled methods to achieve new, better,
faster or more efficient research in any
discipline. It draws on advances in mathematical
sciences, informatics, computation and digital
communications. As such it has been an important
tool for researchers for many decades. The data
deluge and the scale and complexity of today's
research challenges have greatly increased its
importance for researchers. As a consequence, in
2001 the UK led the world by initiating a
coordinated e-Science research programme to
stimulate the development of e-Science across all
fields of research.
6
Strengths of e-Science
Communities and e-Infrastructure supporting
research and innovation
7
Computational thinking
  • Transforming the way we think
  • Incremental refinement
  • Solution by composition
  • Layers of abstractions
  • Process models
  • Notations
  • Recursive thinking
  • Simulation, Randomisation
  • Enabled by ubiquitous computers
  • Analogue of the printing press

Jeannette M. Wing, Computational Thinking,
Communications of the ACM, March 2006, Vol. 49,
No. 3, pp. 33-35
8
WWW & acting
  • The Long Tail
  • Data is the Next Intel Inside
  • Users Add Value
  • Network Effects by Default
  • Some Rights Reserved
  • The Perpetual Beta
  • Cooperate, Don't Control
  • Software Above the Level of a Single Device
  • Transforming the way we act
  • Data is key ingredient
  • Community action
  • Global collaboration
  • Community thinking
  • Minimal (?) control
  • Minimal reserved rights
  • Composition via wikis
  • Mash ups
  • Enabled by ubiquitous digital communication
  • Analogue of the radio

http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html
9
Is e-Science making a difference?
10
Tremendous global challenges
11
Scale, Urgency, Complexity,
12
Achieving the CI Vision requires synergy between
3 types of Foundation-wide activities:
Transformative Applications: to enhance discovery
& learning
Provisioning: creation, deployment and operation
of advanced CI
R&D to enhance technical and social dimensions of
future CI systems
Cyberinfrastructure Vision for 21st Century
Discovery, NSF Cyberinfrastructure Council, March
2007
13
The Information Explosion
[Chart: global information growth, from 161 EB in
2006 (IDC estimate) to a projected 988 EB in 2010,
approaching 1 ZB; the remaining labels, in
Japanese, did not survive extraction.]
Slide: Satoshi Matsuoka
14
The 21st Century
"This is the century of information."
Prime Minister Gordon Brown, University of
Westminster, 25 October 2007
Thanks for images to Mark Birkin (MoSeS &
Genysis projects) and Michael Batty (GeoVue
project)
15
Historical perspective
16
Timeline
Foundations for Collaborative Behaviour
Today
"Wellbeing: the global-scale killer app." Sir
Robin Saxby, Oct. 2006
17
Healthcare @ Home
[Diagram: referral pathways in distributed
diabetes care. The patient, GP, diabetician,
diabetes specialist / other specialist nurses,
dietitian, biochemist, and community nurses /
health visitors each participate from home,
mobile or clinic via TV, PDA, laptop, PC or
paper; referrals also flow to various distributed
clinical specialists, e.g. ophthalmologist,
podiatrist, vascular surgeons, renal specialists,
wound clinic, foot care clinic, neurologists,
cardiologists. Illness, referral and case events
are mediated through a variables access matrix.]
Slide from Alex Hardisty
18
Distributed Systems History
[Timeline figure, 1960-2000, starting from the
ARPAnet.]
19
Distributed Systems to Grids
[Timeline figure, 1960-2000.]
20
e-Infrastructure
  • A shared resource
  • That enables science, research, engineering,
    medicine, industry,
  • It will improve UK/European/ productivity
  • Lisbon Accord 2000
  • e-Science Vision, SR2000, John Taylor
  • Commitment by UK government
  • Sections 2.23-2.25
  • Always there & multi-purpose
  • cf. telephones, transport, power
  • OSI report

www.nesc.ac.uk/documents/OSI/index.html
21
A Grid Computing Timeline
[Timeline figure; milestones, roughly in order:]
  • I-Way demonstration at SuperComputing 95
  • US Grid Forum forms at SC 98
  • European & AP Grid Forums
  • Grid Forums merge, form GGF
  • Anatomy paper; Physiology paper
  • OGSA-WG formed; OGSA v1.0
  • GGF & EGA form OGF
Source: Hiro Kishimoto, GGF17 Keynote, May 2006
22
What is a Grid?
  • A grid is a system consisting of
  • Distributed but connected resources and
  • Software and/or hardware that provides and
    manages logically seamless access to those
    resources to meet desired objectives

[Diagram: resources ranging from handhelds and
workstations to servers, clusters, supercomputers
and data centers.]
Source: Hiro Kishimoto, GGF17 Keynote, May 2006
23
Grid Related Paradigms
  • Cluster
  • Tightly coupled
  • Homogeneous
  • Cooperative working
  • Distributed Computing
  • Loosely coupled
  • Heterogeneous
  • Single Administration
  • Grid Computing
  • Large scale
  • Cross-organizational
  • Geographical distribution
  • Distributed Management

Source: Hiro Kishimoto, GGF17 Keynote, May 2006
24
Views of Grids
25
Grids: integrating & providing homogeneity
  • Grids are (potentially) Generic & Industry
    Supported
  • Grids combine many heterogeneous distributed
    resources
  • Data & Information
  • Computation & software
  • Instruments, sensors & actuators
  • Research processes & procedures
  • System operations processes & procedures
  • Grids restrict choices
  • Harder for provider to make localised decisions
  • Deployment can be challenging
  • Grids provide virtual homogeneity through
    virtualisation
  • Should be easier to compose services
  • More opportunity to amortise costs
  • A component of e-Infrastructure

Deliberately choosing consistent interfaces,
protocols & management controls across a set of
compatible services. Giving up some freedom to
differ.
26
Grids as a Foundation for Solutions
  • The grid per se doesn't provide
  • Supported e-Science methods
  • Supported data & information resources
  • Computations
  • Convenient access
  • Collaborative behaviour
  • Grids help organisations provide
  • International & national secure e-Infrastructure
  • Standards for interoperation
  • Standard APIs to promote re-use
  • But Research Support must be built
  • What is needed?
  • Who should build it?

27
Grids as a Foundation for Solutions
Much to be done by developers of application
services and by resource providers
(bullets as on the previous slide)
28
Grids as a Foundation for Solutions
Much to be done by developers of application
services and by resource providers
  • Must support many categories of user
  • Application & Service developers
  • Tool builders
  • Deployers & Operations teams
  • Gateway developers
  • App, tool & gateway users
(remaining bullets as on the previous slide)
29
Motives for Grids
30
Why use / build Grids?
  • Research Arguments
  • Enables new ways of working
  • New distributed collaborative research
  • Unprecedented scale and resources
  • Economic & Ecological Arguments
  • Reduced system management costs
  • Shared resources → better utilisation
  • Pooled resources → increased capacity
  • Greener / less power consumption →
    environmentally acceptable computing
  • Load sharing & utility computing
  • Cheaper disaster recovery

31
Why use / build Grids?
  • Computer Science Arguments
  • New attempt at an old hard problem
  • Frustrating ignorance about existing results
  • New scale, new dynamics, new scope
  • Engineering Arguments
  • Enable autonomous organisations to
  • Write complementary software components
  • Set up, run & use complementary services
  • Share operational responsibility
  • General consistent environment for Abstraction,
    Automation, Optimisation & Tools
  • Generally available code mobility

32
Why use / build Grids?
  • Political & Management Arguments
  • Stimulate innovation
  • Promote intra-organisation collaboration
  • Promote inter-enterprise collaboration

33
Collaboration is key
34
Biomedical Research Informatics Delivered by Grid
Enabled Services
Portal
http://www.brc.dcs.gla.ac.uk/projects/bridges/
Slide by Richard Sinnott
35
eDiaMoND: Screening for Breast Cancer
1 Trust → Many Trusts: Collaborative Working,
Audit capability, Epidemiology
  • Other Modalities
  • MRI
  • PET
  • Ultrasound

Better access to case information and digital
tools
Supplement mentoring with access to digital
training cases and sharing of information
across clinics
Provided by the eDiaMoND project, Prof. Sir Mike
Brady et al.
36
climateprediction.net and GENIE
  • Largest climate model ensemble
  • >45,000 users, >1,000,000 model years

[Figure: response of Atlantic circulation to
freshwater forcing; panel labels 10K and 2K.]
37
Integrative Biology
  • Tackling two Grand Challenge research questions
  • What causes heart disease?
  • How does a cancer form and grow?
  • Together these diseases cause 61% of all UK
    deaths

Will build a powerful, fault-tolerant Grid
infrastructure for biomedical science, enabling
biomedical researchers to use distributed
resources such as high-performance computers,
databases and visualisation tools to develop
complex models of how these killer diseases
develop.
Slide: David Gavaghan & IB team, Oxford
38
Foundations of Collaboration
  • Strong commitment by individuals
  • To work together
  • To take on communication challenges
  • Mutual respect & mutual trust
  • Strong leadership
  • Distributed technology
  • To support information interchange
  • To support resource sharing
  • To support data integration
  • To support trust building
  • Sufficient time
  • Common goals
  • Complementary knowledge, skills & data

Can we predict when it will work? Can we find
remedies when it doesn't?
39
Grid Collaboration Questions
  • Without collaboration little is achievable
  • Must collaboration precede successful grid
    applications?
  • Or will persistently and pervasively available
    grids stimulate collaborations?
  • If we deliver support for collaborative teams,
    will we also support the individual researcher?
  • Can we use grids to democratise computation?
  • Broadening access
  • Open science

40
CARMEN - Scales of Integration
Understanding the brain may be the greatest
informatics challenge of the 21st century
See talk: Paul Watson at Google Scalability
conf., Seattle, June 2008,
www.youtube.com/watch?v=2m4EvnlgL8Q
Slide from Colin Ingram & Paul Watson
41
CARMEN Consortium
Leadership & e-Infrastructure
Colin Ingram
Paul Watson
Leslie Smith
Jim Austin
Slide from Colin Ingram & Paul Watson
42
CARMEN Consortium
International Partners
Slide from Colin Ingram & Paul Watson
43
CARMEN Consortium
Commercial Partners
- applications in the pharmaceutical sector
- interfacing of data acquisition software
- application of infrastructure
- commercialisation of tools
Slide from Colin Ingram & Paul Watson
44
Summary
45
Grids in context
  • Technology is transforming research
  • Computer power, network speed, data bonanza,
    pervasive devices
  • Social and commercial impact of web-based
    computing
  • Part of a long-term drive for distributed
    computing
  • A new and ambitious form
  • Search for trade-offs & multiple uses
  • Leads to many varieties
  • Multiple stakeholders
  • Many good reasons for building & using grids
  • Questions
  • Will we have many grids?
  • A consistent general purpose foundation grid?
  • What are the minimum standards across the grids?
  • Collaboration is a key driver & enabler

46
Minimum Grid Functionalities
  • Supports distributed computation
  • Data and computation
  • Over a variety of
  • hardware components (servers, data stores, )
  • Software components (services: resource managers,
    computation and data services)
  • With regularity that can be exploited
  • By applications
  • By other middleware tools
  • By providers and operations
  • It will normally have security mechanisms
  • To develop and sustain trust regimes

Users want uniform and consistent access to
computing and data: desktop, cloud, cluster,
institutional and regional grids, national and
international facilities
47
Distributed Systems: Introduction, Principles &
Foundations
48
Principles of Distributed Computing
  • Issues you can't avoid
  • Lack of Complete Knowledge (LoCK)
  • Latency
  • Heterogeneity
  • Autonomy
  • Unreliability
  • Change
  • A Challenging goal
  • balance technical feasibility
  • against virtual homogeneity, stability and
    reliability
  • Balance between usability and productivity
  • Affordable
  • Wide user base to amortise costs
  • Manageable and maintainable

This is NOT easy
49
Lack of Complete Knowledge
  • Technical origins of LoCK
  • Dynamics of systems involve very large state
    spaces
  • Can't track or explore all the states
  • Latency prevents up-to-date knowledge being
    available
  • By the time a notification of a state change
    arrives the state may have changed again
  • Failures inhibit information propagation
  • Unanticipated failure modes
  • If you ask a remote system
  • By the time the answer arrives it may be wrong

Never assume you know the state of a remote
system
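A toy sketch of the point above (illustrative, not from the slides): a client that checks remote state and then acts on it is racing every other client, so the check can be stale by the time the action arrives. Robust clients attempt the operation and handle rejection.

```python
# Minimal sketch: why "check then act" fails against a remote system.
# The remote state can change between the check and the act.
import random

class RemoteQueue:
    """Stand-in for a remote service whose state changes concurrently."""
    def __init__(self):
        self.slots = 1

    def has_free_slot(self):      # a query: stale the moment it returns
        return self.slots > 0

    def submit(self, job):        # the authoritative operation
        if random.random() < 0.5: # another client may have taken the slot
            self.slots = 0
        if self.slots == 0:
            raise RuntimeError("no free slots")
        self.slots -= 1
        return f"accepted {job}"

queue = RemoteQueue()
# Fragile pattern: decides on possibly stale knowledge, so it must still
# handle rejection when the earlier answer turns out to be wrong.
if queue.has_free_slot():
    try:
        print(queue.submit("job-1"))
    except RuntimeError as err:
        print("rejected despite positive check:", err)
```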
50
Lack of Complete Knowledge 2
  • Human origins of LoCK
  • lack of understanding
  • Incomplete & simplified models
  • Intractable models
  • Poor & incomplete descriptions
  • Erroneous descriptions
  • Socio-Economic effects generate LoCK
  • Autonomous owners do not reveal all
  • About services, resources and performance
  • Intermediaries aggregate & simplify
  • Present favourably the services they want to sell
    or want you to use

51
LoCK Counter Strategies
  • Improve the quality of the available knowledge
  • Better static information
  • Better information collection & dissemination
  • Improve quality of Distributed System Models
  • Prove invariants that algorithms can exploit
  • Test axioms with real systems
  • Build algorithms that behave reasonably well
  • When they have incomplete knowledge

52
Latency
  • It is always going to be there
  • Consequence of signal transmission times
  • Consequence of messages / packets in queues
  • Consequence of message processing time
  • Errors cause retries → multiplied delays
  • It gets worse
  • Geographic scale increases latency
  • System complexity increases number of queues
  • Scale & complexity increase processing time
  • Think about
How many operations a system can do while a
    message it sends reaches its destination, a reply
    is formed and the reply travels back
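As a rough worked example (round numbers assumed for illustration: a ~3 GHz processor retiring about one instruction per cycle, and a ~100 ms wide-area round trip):

```python
# Back-of-envelope: instructions a CPU retires during one request/reply.
# Both numbers are assumptions, not measurements.
cpu_rate = 3e9     # instructions per second (~3 GHz, ~1 instruction/cycle)
round_trip = 0.1   # seconds (~100 ms wide-area request + reply)

print(f"{cpu_rate * round_trip:,.0f} instructions per round trip")
# 300,000,000 instructions per round trip
```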

53
Latency Counter Strategies
  • Design algorithms that require fewer round trips
  • This is THE complexity measure!
  • Batch requests and responses
  • Shorten distance to get information
  • Caching, pre-fetching & replication
  • But may be stale data!
  • Move data to computation
  • But be smart about which data & when
  • Move computation to data
  • Succinct computation & volumes of data
  • But safety and privacy issues arise

Communication is very expensive
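A minimal sketch of the batching strategy (the server and its API are invented for illustration): every message exchange pays one round trip, so shipping fifty keys in one request costs one latency instead of fifty.

```python
# Simulate a server where every message exchange costs one round trip,
# and compare N naive calls with a single batched call.
LATENCY = 0.1                      # seconds per round trip (assumed 100 ms)

class Server:
    def __init__(self, data):
        self.data = data
        self.round_trips = 0

    def get(self, key):            # one message, one reply
        self.round_trips += 1
        return self.data[key]

    def get_many(self, keys):      # same payload, a single exchange
        self.round_trips += 1
        return [self.data[k] for k in keys]

server = Server({k: k * k for k in range(50)})
naive = [server.get(k) for k in range(50)]   # 50 round trips: ~5.0 s in transit
batched = server.get_many(list(range(50)))   # 1 round trip:   ~0.1 s in transit
print(server.round_trips, naive == batched)  # 51 True
```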
54
Heterogeneity
Some of the variation is wanted and exploited
  • Hardware variation
  • Different computer architectures
  • Big-endian vs little-endian
  • Number representation
  • Address length
  • Performance
  • Different Storage systems
  • Architectures
  • Technologies
  • Available operations
  • Different Instrument systems
  • Accepting different control inputs
  • Generating different output data streams
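The big-endian vs little-endian variation above can be made concrete with Python's standard struct module; agreeing one wire order is a tiny instance of the virtual homogeneity discussed on the following slides.

```python
# Same 32-bit value, two hardware byte orders.
import struct

value = 0x01020304
print(struct.pack(">I", value).hex())  # 01020304 (big-endian / network order)
print(struct.pack("<I", value).hex())  # 04030201 (little-endian, e.g. x86)

# A protocol that fixes ">I" as the wire format hides the variation:
assert struct.unpack(">I", struct.pack(">I", value))[0] == value
```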

55
Heterogeneity 2
Some of the variation is unnecessary
  • Operating System variation
  • Different O/S architectures
  • Unix families & versions
  • Windows families and versions
  • Specialised O/S, e.g. for Instruments & Mobile
    devices
  • Implementation system variation
  • Programming languages
  • Scripting languages
  • Workflow systems
  • Data models
  • Description languages
  • Grid systems
  • Many implementations of same functionality

56
Heterogeneity Counter Measures
  • Invest in virtual Homogeneity
  • Agree standards (formally or de facto)
  • Introduce intermediate code
  • That hides unwanted variation
  • Presenting it in standard form
  • But this has high cost
  • Developing the standard
  • Developing the intermediate code
  • Executing the intermediate code
  • It may hide variations some want
  • Provide direct access to facilities as well
  • But this may inhibit optimisation & automation
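A minimal sketch of such intermediate code (the interface and backends are invented for illustration): applications program against one standard form, and per-backend adapters absorb the unwanted variation.

```python
# One abstract storage interface, two adapters for different backends.
from abc import ABC, abstractmethod

class Store(ABC):
    """The standard form presented to applications."""
    @abstractmethod
    def read(self, name: str) -> bytes: ...

class PosixStore(Store):
    def read(self, name):
        with open(name, "rb") as f:           # local filesystem semantics
            return f.read()

class ObjectStore(Store):
    def __init__(self, client):
        self.client = client                  # hypothetical remote client
    def read(self, name):
        return self.client.get_object(name)   # different native operation

def checksum(store: Store, name: str) -> int:
    """Application code sees only the uniform interface."""
    return sum(store.read(name)) % 256
```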

57
Heterogeneity Counter Measures 2
  • Automatically manage diversity
  • Manual agreement and construction of virtual
    homogeneity will not scale or compose
  • Develop abstract and higher level models
  • Describe each component
  • Generate adaptations as needed from descriptions
  • Not yet achievable for general complete systems
  • Relevant for specific domains

58
Autonomy and Change
  • Necessary
  • To persuade organisations & individuals to engage
  • They need to control their own facilities
  • They have best knowledge to develop their
    services
  • Their business opportunity
  • Because coordinated change is unachievable
  • Systems & workloads are busy
  • Service commitments must be met
  • Large-scale scheduling of work is very hard
  • To correct errors
  • To plug vulnerabilities
  • To obtain new capabilities

59
Autonomy and Change 2
  • What changes: local decisions
  • The underlying technology delivering a service
  • The operations available from a service
  • The semantics of the operations
  • Policy changes, e.g. authorisation rules, costs,
  • What changes: corporate decisions
  • Some agreed standard is changed
  • E.g. a new version of a protocol is introduced

60
Autonomy and change Counter Measures
  • Users & other providers expect stability
  • Agree some standards that are rarely changed
  • As a platform & framework
  • As a means of communicating change
  • Introduce change-absorbing technology
  • Mark the protocols and services with version
    information
  • Transform between protocols when changes occur
  • Anneal the change out of the system
  • Develop algorithms tolerant to change
  • Revalidate dependencies where they may change
  • Handle failures due to change

Change is an asset. Embrace and manage it. Ignore
it at your peril.
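One way to picture the change-absorbing idea (the message format and fields are invented for illustration): version-tag every message and upgrade old versions on receipt, so senders migrate at their own pace and the change is annealed out of the system as traffic flows.

```python
# Version-tagged messages upgraded on receipt.
def upgrade_v1_to_v2(msg):
    msg = dict(msg, version=2)
    msg.setdefault("priority", "normal")  # field added in v2, with a default
    return msg

UPGRADES = {1: upgrade_v1_to_v2}

def accept(msg, current=2):
    """Apply upgrades until the message reaches the current version."""
    while msg["version"] < current:
        msg = UPGRADES[msg["version"]](msg)
    return msg

print(accept({"version": 1, "job": "sweep-17"}))
# {'version': 2, 'job': 'sweep-17', 'priority': 'normal'}
```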
61
Unreliability
  • Failures are inevitable
  • Equipment, software & operations errors
  • Network outages, power outages,
  • Their effects must be localised
  • Cannot afford total system outages
  • This is not easy
  • Each error may occur when system is in any state
  • The system is an unknown composition of
    subsystems
  • Errors often occur while other errors are still
    active
  • Errors often occur during error recovery actions
  • Errors may be caused by deliberate attack
  • Attackers may continue their attack

62
Unreliability Counter Measures
  • Requires much R&D
  • Continuous arms race as the scale of Grids grows
  • Ideal of a continuously available stable service
  • Not achievable: recognise that drops in response
    and local failures must be dealt with
  • Design resilient architectures
  • Design resilient algorithms
  • Improve reliability of each component
  • Distribute the responsibility
  • For failure detection
  • For recovery action

Invest heavily in error detection and recovery
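A small sketch of one such counter measure (illustrative, assuming the remote call raises ConnectionError on transient failure): retry with exponential backoff and jitter, so failures stay local and a recovering service is not stampeded by synchronised retries.

```python
# Retry a failed remote call with exponential backoff and jitter.
import random
import time

def call_with_retries(op, attempts=5, base=0.5):
    for attempt in range(attempts):
        try:
            return op()
        except ConnectionError:
            if attempt == attempts - 1:
                raise                    # give up: report, don't hang forever
            delay = base * 2 ** attempt * random.uniform(0.5, 1.5)
            time.sleep(delay)            # jitter spreads out the retries

# Usage (client and job are hypothetical):
#   call_with_retries(lambda: client.submit(job))
```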
63
Service Oriented Architectures
64
Three Components
Registries
Register an available service Send name
description
Service Consumers
Services
65
Three Components
[Diagram as before]
2. Request a service: send a description to the
registry
66
Three Components
[Diagram as before]
3. Registry returns a set (possibly empty) of
matching services
67
Three Components
[Diagram as before]
4. Consumer requests a service operation
68
Three Components
[Diagram as before]
5. Service returns a result or an error
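The five exchanges above fit in a few lines; this sketch (names invented, not a real SOA toolkit) shows registration, lookup by description, and invocation, with the registry out of the invocation path.

```python
# Registry, service and consumer in miniature.
class Registry:
    def __init__(self):
        self.entries = []                       # (name, description, service)

    def register(self, name, description, service):
        self.entries.append((name, description, service))

    def lookup(self, wanted):
        """Return the (possibly empty) set of matching services."""
        return [s for (_, d, s) in self.entries if wanted in d]

registry = Registry()
# Service side: register with a name and a description.
registry.register("conv1", "temperature unit conversion",
                  lambda c: c * 9 / 5 + 32)

# Consumer side: look up by description, then invoke directly.
for service in registry.lookup("temperature"):
    print(service(100.0))                       # 212.0
```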
69
Composed behaviour
  • Services are themselves consumers
  • They may compose and wrap other services
  • The registry is itself a consumer
  • A federation of registries may deal with the
    registry service's reliability & performance
  • Observer services may report on quality of
    services and help with diagnostics
  • Agreements between services may be set up
  • Service-Level Agreements
  • Permitting sustained interaction

70
Composed behaviour
(bullets as on the previous slide)

Requires Organising as an Architecture
71
Scenarios
72
Why Scenarios
  • Abstraction of what people want to do
  • Catches the essence of their requirement
  • Framework for
  • Discussion
  • Comparison
  • Elaboration
  • Check how technologies cover scenarios
  • Scenarios should not be about implementation
  • Scenario can be decomposed into steps
  • Possibly in many ways
  • These are less abstract requirements

73
Job submission scenario
1. Create or revise a job description. Q: In what
language? Q: What must it / can it say?
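One possible answer to "in what language?" (the fields below are illustrative, loosely echoing what JSDL-style job description languages carry, not a real schema):

```python
# A job description as structured data: what to run, what it needs,
# and what data moves in and out. All values are invented examples.
job_description = {
    "name": "climate-run-042",
    "executable": "/apps/model/run.sh",       # what to run
    "arguments": ["--scenario", "a2"],
    "stdout": "run042.out",
    "resources": {                            # what it needs
        "cpu_count": 16,
        "memory_mb": 8192,
        "wall_clock_minutes": 240,
    },
    "staging": {                              # data movement before/after
        "in":  ["gsiftp://data.example.org/forcing.nc"],
        "out": ["results/*.nc"],
    },
}
```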
74
Job submission scenario
2. Submit the job description. Q: How? Q: With what
extra parameters?
75
Job submission scenario
3. Ask about progress. Q: How? Q: What can they
learn, and when? Q: Is the reply in user or system
terms?
76
Job submission scenario
4. Retrieve results. Q: How? Q: Where can they be
found? Q: Are there helpful diagnostics?
77
Job submission scenario
Q: Who provides and runs this system? Q: How does
it get paid for? Q: What are its policies for
allocating resources to JD submissions? Q: How
reliable and efficient is it? User's view?
Manager's view?
78
Job submission scenario
Q: How much effort does it take to submit the same
job to another system? Q: How does the code for
the application get to be executed? Q: How are
data read or created during the computation
handled? Q: How will this system evolve? Will
users need to learn new tricks?
79
Ensemble run scenario
Computing resources: any type, anywhere
80
Ensemble run scenario
Computing resources: any type, anywhere
Coordination system
Results store
81
Ensemble run scenario
Computing resources: any type, anywhere
Results store
1. Create plan for the ensemble run, e.g.
parameter space to sweep and sampling method
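Step 1 might look like the following sketch (parameter names and values invented): enumerate a small parameter space and emit one job description per point, here with simple grid sampling.

```python
# Turn a parameter space into a list of job descriptions to submit,
# reusing the illustrative format from the job-submission scenario.
from itertools import product

sensitivities = [1.5, 2.0, 2.5, 3.0]      # example sweep dimensions
forcings = [0.1, 0.2, 0.4]

plan = [
    {"name": f"run-s{s}-f{f}",
     "executable": "/apps/model/run.sh",
     "arguments": ["--sensitivity", str(s), "--forcing", str(f)]}
    for s, f in product(sensitivities, forcings)
]
print(len(plan), "jobs to submit")        # 12 jobs for this grid sampling
```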
82
Ensemble run scenario
Computing resources: any type, anywhere
Results store
2. Initiate the production and submission of jobs
83
Ensemble run scenario
Computing resources: any type, anywhere
Results store
3. Result accumulation
84
Ensemble run scenario
Computing resources: any type, anywhere
Results store
4. Researcher monitors and steers progress
85
Ensemble run scenario
Computing resources: any type, anywhere
Results store
5. Researcher recovers and analyses results,
computes derivatives
86
Ensemble run scenario
Computing resources: any type, anywhere
Results store
6. Researcher completes analyses, discards or
archives results
87
Ensemble run scenario with context
Computing resources: any type, anywhere
Everything as before, plus interleaved requests
for context data from each job as it runs
Runs draw data from context stores: boundary
conditions, pre-computed data, observations
88
Ensemble run scenario with metadata
Computing resources: any type, anywhere
Everything as before, plus use and generation of
metadata as each job runs
Runs are organised using metadata and jobs generate
metadata; this helps manage 1000s of files
89
Repetition of Scenario
  • Normally, users repeatedly perform the same
    scenario
  • Analysis of the next sample
  • Re-analysis by other researchers & designers
  • Calibration and normalisation of the latest
    observational run
  • Re-verification against the latest data
  • Evaluation of the risk of the next share purchase
  • (Revising the) design of an(other similar) engine
    component
  • Often with parametric variations
  • Often with progressive refinements
  • A better pattern recogniser
  • A refinement in calibration
  • Code fixes, updates to reference data,
  • How well do the solutions on offer support
    repetition?

90
Data integration scenario
Researcher wants to obtain specified data from
multiple distributed data sources and to supply
the result to a process and then view its output.
1. Researcher formulates query
2. Researcher submits query
3. Query system transforms and distributes query
4. Data services send back local results
5. Query system combines these to form the
requested data
6. Query system sends data to process
7. Process system sends derived data to researcher
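A minimal sketch of steps 3-5 (in-memory stand-ins replace the remote data services): fan the query out, collect each site's local results, and combine them before handing the data to the process step.

```python
# Federated query in miniature: distribute, collect, combine.
sources = {
    "site_a": [{"id": 1, "temp": 11.2}, {"id": 2, "temp": 9.8}],
    "site_b": [{"id": 3, "temp": 12.5}],
}

def local_query(rows, predicate):
    return [r for r in rows if predicate(r)]   # step 4: local results

def federated_query(predicate):
    combined = []                              # step 5: combine
    for site, rows in sources.items():
        combined.extend(local_query(rows, predicate))
    return combined

result = federated_query(lambda r: r["temp"] > 10)
print(result)   # rows from both sites, ready for the process step
```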
91
Summary & Conclusions
92
Grids
  • Many reasons motivating investment in grids
  • Collaboration for Global Science & Business
  • Resource integration & sharing
  • New approach to large-scale distributed systems
  • Large coordinated effort necessary
  • Industry & Academia
  • Economic & Creative niches
  • Can they be assembled to provide all that is
    needed?
  • Many technical and socio-economic challenges
  • Work for you all
  • Many new opportunities
  • Work for you all

93
Summary: Take-home message
  • e-Infrastructure is arriving
  • Built on Grids & Web Services
  • Data and Information grow in importance
  • Must include user support
  • Must be based on good socio-economic
    understanding
  • There is a dramatic rate of change
  • An opportunity for everyone

Can you ride the wave?
94
Picture composition by Luke Humphry, based on
prior art by Frans Hals
www.omii.ac.uk