1
Cyberinfrastructure and Scientific Collaboration
  • Science and Technology in the Pacific Century
    (STIP)
  • East Asian Colloquium, sponsored by the East
    Asian Studies Center
  • Indiana University Ballantine Hall 004
  • November 30, 2007
  • Geoffrey Fox
  • Computer Science, Informatics, Physics
  • Pervasive Technology Laboratories
  • Indiana University, Bloomington IN 47401
  • gcf@indiana.edu
  • http://www.infomall.org

2
Abstract
  • eScience (or better eResearch) denotes a research
    model of virtual organizations of people, data,
    instruments, and computers linked across the
    globe.
  • Although the United States pioneered technologies
    that support this model, technology efforts are
    now much stronger in Europe and Asia.
  • NSF's new CDI (Cyber-Enabled Discovery and
    Innovation) and other initiatives are
    establishing the virtual organizations implied by
    eResearch.
  • Professor Fox will discuss particular examples of
    well-known international projects and how this
    collaboration model impacts the role of brick and
    mortar organizations.

3
e-moreorlessanything
  • "e-Science is about global collaboration in key
    areas of science, and the next generation of
    infrastructure that will enable it." — from the
    inventor of the term, John Taylor, Director General
    of Research Councils UK, Office of Science and
    Technology
  • e-Science is about developing tools and
    technologies that allow scientists to do faster,
    better or different research
  • Similarly e-Business captures an emerging view of
    corporations as dynamic virtual organizations
    linking employees, customers and stakeholders
    across the world.
  • This generalizes to e-moreorlessanything,
    including presumably e-PacificResearch,
    e-Olympics, e-Education ...
  • A deluge of data of unprecedented and inevitable
    size must be managed and understood.
  • People (see Web 2.0), computers, data (including
    sensors and instruments) must be linked.
  • On-demand assignment of experts, computers,
    networks and storage resources must be supported.

4
Community Grids Laboratory Research
  • Develop and apply technology to distributed
    enterprises, mainly science
  • Funded by NSF, NASA, NIH, DoE and DoD
  • Cheminformatics: High Throughput Screening data
    and filtering; PubChem and PubMed, including
    document analysis
  • Interactive global Particle Physics (and Plasma
    Physics) data analysis
  • Earthquake Science: predicting earthquakes using
    simulations and a satellite and GPS (global
    positioning system) Sensor Grid
  • Ice Sheet Dynamics: melting of glaciers
  • Navajo Nation Grid: education (Science Gateways)
    and healthcare, supporting digital repositories
    of American Indian culture
  • Architecture of Air Force sensor and decision
    support systems
  • eSports: collaboration for real-time training of
    athletes, with HPER, the IU School of Health,
    Physical Education, and Recreation

5
Some Collaborations
  • The naïve strategy is to work with the best group
    in a given field
  • This group is rarely at Indiana University; does
    this have implications for broad schools like
    Informatics?
  • Earthquake Science with California (UC Davis USC
    JPL) and Australia, Japan, China in a group ACES
    originally set up under APEC
  • Ice Sheet Dynamics with Kansas University and
    Elizabeth City State (North Carolina)
  • Cheminformatics with Cambridge University UK and
    IU Informatics
  • Particle Physics with Caltech and a small
    business, Deep Web Technologies, in Santa Fe (SBIR)
  • DoD architectures with Ball Aerospace and a small
    business, Anabas, in California (SBIR)
  • Visualization for Plasma Physics with General
    Atomics and Anabas (STTR)
  • Minority Serving Institutions with University of
    Houston-Downtown, AIHEC and HACU
  • Technologies with IU Computer Science, Open Grid
    Forum and many around the world
  • International contacts are best with P.R. China
    and the United Kingdom

6
Applications, Infrastructure, Technologies
  • This field is confused by inconsistent use of
    terminology; I define:
  • Web Services, Grids and (aspects of) Web 2.0
    (Enterprise 2.0) are technologies
  • Grids could be everything (Broad Grids
    implementing some sort of managed web) or
    reserved for specific architectures like OGSA or
    Web Services (Narrow Grids)
  • These technologies combine and compete to build
    electronic infrastructures termed
    e-infrastructure or Cyberinfrastructure
  • e-moreorlessanything is an emerging application
    area of broad importance that is hosted on these
    infrastructures
  • e-Science or perhaps better e-Research is a
    special case of e-moreorlessanything

7
What is Cyberinfrastructure?
  • Cyberinfrastructure is (from NSF) infrastructure
    that supports distributed science (e-Science):
    data, people, computers
  • Clearly core concept more general than Science
  • Exploits Internet technology (Web2.0) adding (via
    Grid technology) management, security,
    supercomputers etc.
  • It has two aspects: parallel (low latency,
    microseconds, between nodes) and distributed
    (highish latency, milliseconds, between nodes)
  • The parallel aspect is needed to get high
    performance on individual large simulations, data
    analysis etc.; one must decompose the problem
  • Distributed aspect integrates already distinct
    components especially natural for data
  • Cyberinfrastructure is in general a distributed
    collection of parallel systems
  • Cyberinfrastructure is made of services
    (originally Web services) that are just
    programs or data sources packaged for distributed
    access
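The last bullet — a service is just a program or data source packaged for distributed access — can be made concrete with a toy sketch. Everything here is invented for illustration (the port, the `/`-with-`x` query interface, the squaring "program"); real Cyberinfrastructure services were WSDL-described Web services, but the packaging idea is the same: the client sees only an advertised interface, never the implementation.

```python
# Minimal sketch: wrap a program (here, squaring a number) as a
# network-accessible "service" using only the Python standard library.
# The endpoint, port and parameter name are hypothetical.
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

class SquareService(BaseHTTPRequestHandler):
    def do_GET(self):
        # Parse ?x=... from the request and run the wrapped "program"
        query = parse_qs(urlparse(self.path).query)
        x = float(query.get("x", ["0"])[0])
        body = json.dumps({"result": x * x}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 8731), SquareService)
threading.Thread(target=server.serve_forever, daemon=True).start()

# A client needs only the advertised interface, not the code behind it:
with urllib.request.urlopen("http://127.0.0.1:8731/?x=3") as resp:
    result = json.load(resp)["result"]
print(result)  # 9.0
server.shutdown()
```

The same pattern scales from this toy to a supercomputer job launcher or a sensor feed: only the message format at the interface is shared between the parties.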

8
Underpinnings of Cyberinfrastructure
  • Distributed software systems are being
    revolutionized by developments from e-commerce,
    e-Science and the consumer Internet. There is
    rapid progress in technology families termed Web
    services, Grids and Web 2.0
  • The emerging distributed system picture is of
    distributed services with advertised interfaces
    but opaque implementations communicating by
    streams of messages over a variety of protocols
  • Complete systems are built by combining either
    services or predefined/pre-existing collections
    of services together to achieve new capabilities
  • As well as Internet/Communication revolutions
    (distributed systems), multicore chips will
    likely be hugely important (parallel systems)
  • Industry, not academia, is leading innovation in
    these technologies

9
Computing and Cyberinfrastructure TeraGrid
TeraGrid resources include more than 250
teraflops of computing capability and more than
30 petabytes of online and archival data storage,
with rapid access and retrieval over
high-performance networks. TeraGrid is
coordinated at the University of Chicago, working
with the Resource Provider sites Indiana
University, Oak Ridge National Laboratory,
National Center for Supercomputing Applications,
Pittsburgh Supercomputing Center, Purdue
University, San Diego Supercomputer Center, Texas
Advanced Computing Center, University of
Chicago/Argonne National Laboratory, and the
National Center for Atmospheric Research.
[Map: TeraGrid sites — Grid Infrastructure Group at
UChicago; Resource Provider (RP) sites UW, PSC,
UC/ANL, NCAR, PU, NCSA, UNC/RENCI, IU, Caltech,
ORNL, SDSC, TACC; Software Integration Partner
USC/ISI]
10
The first particle physics experiment: The Big
Bang
  • A Brief History of Time
  • 10^-43 secs to 10^-37 secs: Gravity and Strong
    forces separate
  • 10^-35 secs: Inflation
  • 10^-10 seconds: Quark-Antiquark Annihilation (CP
    Violation)
  • 10 microseconds: Quarks form protons, neutrons
  • 380,000 years (last scatter, seen today as the
    CMB): Nuclei capture electrons, form atoms;
    universe transparent to light
  • 1.0 Gigayear: Galaxies begin to form
  • 13.7 Gigayears: Today

11
Large Hadron Collider, CERN, Geneva: 2008 Start
  • pp collisions at √s = 14 TeV, L = 10^34 cm^-2 s^-1
  • 27 km tunnel under Switzerland and France
  • Experiments: CMS and Atlas (pp, general purpose,
    also heavy ions), TOTEM, ALICE (heavy ions), LHCb
    (B-physics)
  • 5000 Physicists, 250 Institutes, 60 Countries
  • Physics goals: Higgs, SUSY, Extra Dimensions, CP
    Violation, Quark-Gluon Plasma, the Unexpected
  • Challenges: Analyze petabytes of complex data
    cooperatively; harness global computing, data and
    network resources
12
The LHC Data Grid Hierarchy (developed at Caltech,
1999): >10 Tier1 and 100 Tier2 Centers Transforming
Science
  • Online System at the Experiment produces
    ~PByte/sec, feeding the CERN Center (Tier 0/1,
    PBs of disk and tape robot) at 150-1500 MBytes/sec
  • Tier 1 centers (FNAL, IN2P3, INFN, RAL) linked at
    10-40 Gbps
  • Tier 2 centers linked to Tier 1 at 10 Gbps, among
    themselves at 1-10 Gbps
  • Tier 3: institute physics data caches, 1 to 10 Gbps
  • Tier 4: workstations
  • Tens of Petabytes by 2007-8; an Exabyte 5-7 years
    later; 100 Gbps data networks
  • Emerging Vision: A Richly Structured, Global
    Dynamic System
13
Tier-2s
The Proliferation of Tier2s: LHC Computing will be
More Dynamic and Network-Oriented
  • ~100 identified; the number is still growing
    (J. Knobloch)
14
Closing CMS for the first time (July)
15
Higgs diphoton Analysis using Rootlets
16
Data and Cyberinfrastructure
  • DIKW Data ? Information ? Knowledge ? Wisdom
    transformation
  • Applies to e-Science, Distributed Business
    Enterprise (including outsourcing), Military
    Command and Control and general decision support
  • (SOAP or just RSS) messages transport information,
    expressed in a semantically rich fashion, between
    sources and services that enhance and transform
    the information so that the complete system
    provides useful knowledge and decision support
  • Semantic Web technologies like RDF and OWL might
    help us to have rich expressivity but they might
    be too complicated
  • We are meant to build application specific
    information management/transformation systems for
    each domain
  • Each domain has specific services/standards (for
    APIs and information, such as KML and GML for
    Geographical Information Systems)
  • and will use generic services (like R for
    datamining) and
  • generic standards (such as RDF, WSDL)
  • Standards made before consensus or not observant
    of technology progress are dubious
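The DIKW transformation above can be sketched as a chain of filter services, each accepting one form and emitting the next. The stages, sensor readings and anomaly threshold here are all invented for illustration; a real pipeline would be domain specific, with the stages implemented as separately deployed services exchanging messages.

```python
# Sketch of Data -> Information -> Knowledge -> Wisdom as chained
# filters. All values and thresholds are illustrative inventions.
raw_data = [12.1, 11.8, 47.3, 12.0, 11.9, 46.8]  # e.g. sensor readings

def to_information(samples):
    # Data -> Information: attach a summary statistic
    mean = sum(samples) / len(samples)
    return {"readings": samples, "mean": mean}

def to_knowledge(info):
    # Information -> Knowledge: flag readings well above the mean
    anomalies = [x for x in info["readings"] if x > 1.5 * info["mean"]]
    return {**info, "anomalies": anomalies}

def to_wisdom(knowledge):
    # Knowledge -> Wisdom: turn findings into a recommended action
    action = "investigate" if knowledge["anomalies"] else "no action"
    return {"action": action, "evidence": knowledge["anomalies"]}

# Compose the pipeline the way services would be chained by messages
decision = to_wisdom(to_knowledge(to_information(raw_data)))
print(decision["action"])  # investigate
```

In a distributed deployment each function would sit behind its own service interface, and the dictionaries would travel as (SOAP or RSS) messages between them.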

17
Information and Cyberinfrastructure
Raw Data → Data → Information → Knowledge → Wisdom
→ Decisions
[Diagram: a Portal and many interlinked services —
Sensor Services (SS), Filter Services (FS), Other
Services (OS) and MetaData (MD) — exchanging
inter-service messages within and between several
grids ("AnotherGrid") and external services
("AnotherService")]
18
Information Cyberinfrastructure Architecture
  • The party-line approach to Information
    Infrastructure is clear: one creates a
    Cyberinfrastructure consisting of distributed
    services accessed by portals/gadgets/gateways/RSS
    feeds
  • Services include
  • Computing
  • original data
  • Transformations or filters implementing DIKW
    (Data Information Knowledge Wisdom) pipeline
  • Final Decision Support step converting wisdom
    into action
  • Generic services such as security, profiles etc.
  • Some filters could correspond to large
    simulations
  • Infrastructure will be set up as a System of
    Systems (Grids of Grids)
  • Services and/or Grids just accept some form of
    DIKW and produce another form of DIKW
  • Original data sources have no explicit input, just
    output

19
Virtual Observatory Astronomy Grid: Integrate
Experiments
[Sky maps integrated across wavelengths: Radio,
Far-Infrared, Visible, X-ray, plus a Dust Map and a
Galaxy Density Map]
20
(No Transcript)
21
Minority Serving Institutions and the Grid
  • Historically the R1 Research University
    powerhouses dominated research due to their
    concentration of expertise
  • Cyberinfrastructure allows others to participate
    in same way it supports distributed collaboration
    in spirit of old distance education
  • Navajo Nation (Colorado Plateau, covering over
    25,000 square miles in northeast Arizona,
    northwest New Mexico, and southeast Utah) with
    110 communities and over 40% unemployment;
    building a wireless grid for education and
    healthcare
  • http://www.win-hec.org/ World Indigenous Nations
    Higher Education Consortium
  • Cyberinfrastructure allows Nations to preserve
    their geographical identity but participate fully
    with world class jobs and research
  • Some 335 MSIs in Alliance have similar hopes for
    Cyberinfrastructure to jump start their
    advancement!

Is this really true? It didn't work for distance
education?
22
Navajo Nation Wireless Grid
  • Internet to Hogan dedicated January 29, 2007 at
    Navajo Technical College, Crownpoint NM

23
Example Setting up a Polar CI-Grid
  • The North and South poles are melting with
    potential huge environmental impact
  • As a result of MSI meetings, I am working with
    MSI ECSU in North Carolina and Kansas University
    to design and set up a Polar Grid
    (Cyberinfrastructure)
  • This is a network of computers, sensors (on
    robots and satellites), data and people aimed at
    understanding science of ice-sheets and impact of
    global warming
  • We have changed the 100,000-year glacier cycle
    into a 50-year cycle; the field has increased
    dramatically in importance and interest
  • A good area to get involved in, as there is not
    much established work

24
Jacobshavn
  • Greenland's mass loss doubled in the last decade
  • 0.23 ± 0.08 mm sea-level rise/yr in 1996
  • 0.57 ± 0.1 mm sea-level rise/yr in 2005
  • 2/3 of the loss is caused by ice dynamics
  • 1/3 is due to enhanced runoff

Rignot and Kanagaratnam, Science (2006)
Jakobshavn's discharge: 24 km³/yr (5.6 mile³/yr) in
1996 to 46 km³/yr (10.8 mile³/yr) in 2005
25
(No Transcript)
26
Slide courtesy of Dr. Yehuda Bock
http://sopac.ucsd.edu/input/realtime/CRTN_NGGPSUG.ppt
27
APEC Cooperation for Earthquake Simulation
  • ACES is an eight-year-long collaboration among
    scientists interested in earthquake and tsunami
    prediction
  • iSERVO is the infrastructure to support the work
    of ACES
  • SERVOGrid is a (completed) US Grid that is a
    prototype of iSERVO
  • http://www.quakes.uq.edu.au/ACES/
  • Chartered under APEC, the Asia Pacific Economic
    Cooperation of 21 economies

28
Grid of Grids: Research Grid and Education Grid
[Diagram: SERVOGrid links a Database Grid
(repositories, federated databases), a Sensor Grid
(sensors, streaming data), and a Compute Grid
(research simulations, computer farm); data filter
services and an analysis and visualization portal
serve research, while customization services carry
results from research to education on an Education
Grid]
29
SERVOGrid and Cyberinfrastructure
  • Grids are the technology, based on Web services,
    that implement Cyberinfrastructure, i.e. support
    eScience, or "science as a team sport"
  • Internet-scale managed services that link
    computers, data repositories, sensors, instruments
    and people
  • There is a portal and services in SERVOGrid for
  • Applications such as GeoFEST, RDAHMM, Pattern
    Informatics, Virtual California (VC), Simplex,
    mesh generating programs ..
  • Job management and monitoring web services for
    running the above codes.
  • File management web services for moving files
    between various machines.
  • Geographical Information System services
  • QuakeTables earthquake-specific database
  • Sensors as well as databases
  • Context (dynamic metadata) and UDDI system for
    long-term metadata services
  • Services support streaming real-time data
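To give a flavor of the QuakeTables-style database service above, here is a sketch of a bounding-box catalog query. The query function and in-memory catalog are illustrative stand-ins, not the actual QuakeTables API (which was exposed through Geographical Information System Web services); the three events are real Southern California earthquakes.

```python
# Illustrative sketch of an earthquake-catalog query service:
# filter events by bounding box and minimum magnitude.
# The real QuakeTables interface differed; this shows only the idea.
catalog = [
    {"name": "Landers",     "lat": 34.2, "lon": -116.4, "mag": 7.3},
    {"name": "Northridge",  "lat": 34.2, "lon": -118.5, "mag": 6.7},
    {"name": "Hector Mine", "lat": 34.6, "lon": -116.3, "mag": 7.1},
]

def query(catalog, min_lat, max_lat, min_lon, max_lon, min_mag=0.0):
    """Return events inside the bounding box at or above min_mag."""
    return [q for q in catalog
            if min_lat <= q["lat"] <= max_lat
            and min_lon <= q["lon"] <= max_lon
            and q["mag"] >= min_mag]

# Large quakes in the Mojave region (Northridge falls outside the box)
hits = query(catalog, 34.0, 35.0, -117.0, -116.0, min_mag=7.0)
print([q["name"] for q in hits])  # ['Landers', 'Hector Mine']
```

In SERVOGrid such a query result would then flow to simulation codes like Virtual California or to the visualization portal as the next stage in the pipeline.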

30
Grid Workflow Datamining in Earth Science
  • Work with the Scripps Institution of Oceanography
  • Grid services, controlled by workflow, process
    real-time data from 70 GPS sensors in Southern
    California

[Diagram: NASA GPS sensor streams feeding
earthquake analysis]
31
Grid-style portal as used in Earthquake Grid
  • The Portal is built from portlets, providing
    user-interface fragments for each service, that
    are composed into the full interface; it uses OGCE
    technology, as does the planetary science VLAB
    portal with the University of Minnesota

32
(No Transcript)
33
ACES Components
Country/Economy | Data (shared as part of a collaboration) | Earthquake Forecast/Model | Wave Motion | Infrastructure | Institutions
Australia | Seismic data, fault database, GPS | Finley, LSM | PANDAS prototype | | Access
Canada | Polaris, Radarsat | Pattern Informatics | | |
P.R. China | Seismic, GPS | LURR | | China National Grid | CAS
Japan | GPS, Seismic, Daichi (InSAR) | GeoFEM | | Earth Simulator, Naregi | JST-CREST
Chinese Taipei | FORMOSAT-3/COSMIC (F/C) | | | |
U.S.A. | QuakeTables, Seismic, InSAR, PBO (GPS) | Pattern Informatics, ALLCAL | GeoFEST, PARK, Virtual California, TeraShake | SERVOGrid, GEON, SCECGrid, VLab |
International | IMS | | | | Pacific Rim Universities (APRU), PRAGMA
34
Grid Workflow Data Assimilation in Earth Science
  • Grid services, triggered by abnormal events and
    controlled by workflow, process real-time data
    from radar and high-resolution simulations for
    tornado forecasts

Typical graphical interface to service composition
35
Service or Web Service Approach
  • One uses GML, CML etc. to define the data
    structures in a system, and one uses services to
    capture methods or programs
  • In eScience, important services fall in three
    classes
  • Simulations
  • Data access, storage, federation, discovery
  • Filters for data mining and manipulation
  • Services could use something like WSDL (Web
    Service Definition Language) to define
    interoperable interfaces, but Web 2.0 follows old
    library practice: one just specifies the interface
  • A service interface (WSDL) establishes a contract,
    independent of implementation, between two
    services or between a service and a client
  • Services should be loosely coupled, which normally
    means they are coarse grain
  • Services will be composed (linked together) by
    mashups (typically scripts) or workflow (often
    XML-based BPEL)
  • Software Engineering and
    Interoperability/Standards are closely related
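The "mashup" style of composition above can be sketched as a short script that chains two loosely coupled services through their interfaces alone. Both functions (`get_stations`, `detect_offsets`) and the station data are hypothetical stand-ins for independently owned services, in the spirit of the GPS workflow described earlier in the talk.

```python
# Sketch of mashup-style service composition: glue code that knows
# only the interfaces of two stand-in "services". Names and data
# are invented for illustration.
def get_stations(region):
    # Stand-in for a data-access service returning GPS time series
    # (displacement samples per station); values are fabricated.
    return {"P473": [0.0, 0.1, 0.1, 5.2],
            "P474": [0.0, 0.0, 0.1, 0.2]}

def detect_offsets(series, threshold=1.0):
    # Stand-in for a filter/datamining service: flag any jump
    # between consecutive samples larger than the threshold.
    return any(abs(b - a) > threshold for a, b in zip(series, series[1:]))

# The "mashup": a few lines of script compose the two services
alerts = [name for name, series in get_stations("SoCal").items()
          if detect_offsets(series)]
print(alerts)  # ['P473']
```

A workflow engine (BPEL, Taverna) would express the same composition declaratively, adding fault handling and provenance; the script form trades that robustness for immediacy, which is exactly the Web 2.0 versus Grid trade-off the slide describes.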

36
Relevance of Web 2.0
  • They say that Web 1.0 was a read-only Web while
    Web 2.0 is the wildly read-write collaborative
    Web
  • Web 2.0 can help e-Science in many ways
  • Its tools can enhance scientific collaboration,
    i.e. effectively support virtual organizations,
    in different ways from grids
  • The popularity of Web 2.0 can provide high
    quality technologies and software that (due to
    large commercial investment) can be very useful
    in e-Science and preferable to Grid or Web
    Service solutions
  • The usability and participatory nature of Web 2.0
    can bring science and its informatics to a
    broader audience
  • Web 2.0 can even help the emerging challenge of
    using multicore chips i.e. in improving parallel
    computing programming and runtime environments

37
Grid Capabilities for Science
  • Open technologies for any large-scale distributed
    system, adopted by industry, many sciences and
    many countries (including UK, EU, USA, Asia)
  • Security, Reliability, Management and state
    standards
  • Service and messaging specifications
  • User interfaces via portals and portlets
    virtualizing to desktops, email, PDAs etc.
  • 20 TeraGrid Science Gateways (their name for
    portals)
  • OGCE Portal technology effort led by Indiana
  • Uniform approach to access distributed
    (super)computers supporting single (large) jobs
    and spawning lots of related jobs
  • Data and meta-data architecture supporting
    real-time and archives as well as federation
  • Links to Semantic web and annotation
  • Grid (Web service) workflow with standards and
    several successful instantiations (such as
    Taverna and MyLead)
  • Many Earth science grids including ESG (DoE),
    GEON, LEAD, SCEC, SERVO LTER and NEON for
    Environment
  • http://www.nsf.gov/od/oci/ci-v7.pdf

38
CICC Chemical Informatics and Cyberinfrastructure
Collaboratory: Web Service Infrastructure
  • Portal Services: RSS feeds, user profiles,
    collaboration as in Sakai
  • Core Grid Services: service registry, job
    submission and management
  • Local clusters: IU Big Red, TeraGrid, Open
    Science Grid
39
Process Chemistry-Biology Interaction Data from
HTS (High Throughput Screening)
  • Percent inhibition or IC50 data is retrieved from
    HTS
  • Workflows encoding plate control well statistics,
    distribution analysis, etc. (Question: Was this
    screen successful?)
  • Workflows encoding distribution analysis of
    screening results (Question: What should the
    active/inactive cutoffs be?)
  • Workflows encoding statistical comparison of
    results to similar screens, docking of compounds
    into proteins to correlate binding with activity,
    literature search of active compounds, etc.
    (Question: What can we learn about the target
    protein or cell line from this screen?)
  • Compound data submitted to PubChem
  • Scientists at IU prefer Web 2.0 to Grid/Web
    Service for workflow
  • Grids can link data analysis (e.g. image
    processing developed in existing Grids),
    traditional cheminformatics tools, and annotation
    tools (Semantic Web, del.icio.us), and enhance
    lead ID and SAR analysis: a Grid of Grids linking
    collections of services at PubChem, ECCR centers
    and MLSCN centers
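The "Was this screen successful?" question is commonly answered from the plate control wells with the Z'-factor, Z' = 1 - 3(σ_pos + σ_neg)/|μ_pos - μ_neg|. The slide does not name a specific statistic, so this is one standard choice rather than the actual CICC workflow, and the control-well readings below are invented.

```python
# Sketch: assay-quality check from plate control wells via the
# Z'-factor. The readings are fabricated for illustration.
from statistics import mean, stdev

pos_controls = [95.0, 97.0, 94.0, 96.0]   # percent inhibition
neg_controls = [3.0, 5.0, 4.0, 4.0]

def z_prime(pos, neg):
    """Z' = 1 - 3*(sigma_pos + sigma_neg) / |mu_pos - mu_neg|."""
    return 1 - 3 * (stdev(pos) + stdev(neg)) / abs(mean(pos) - mean(neg))

zp = z_prime(pos_controls, neg_controls)
print(round(zp, 3))
# Common rule of thumb: Z' > 0.5 indicates an excellent assay window
print(zp > 0.5)  # True
```

In a workflow engine this check would be one early node, gating whether the distribution-analysis and cutoff-selection steps that follow are worth running at all.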
40
Workflows - Taverna (taverna.sourceforge.net)
41
Supporting distributed Enterprise I
  • Technologies support "virtual organizations",
    which are real organizations linked
    electronically; these refer to linkage of people
  • Asynchronous: there are rather difficult-to-use
    Grid technologies, and powerful but not so
    security/privacy-sensitive Web 2.0 technologies,
    varying from YouTube and Connotea to email, Wikis
    and Blogs
  • Synchronous: there are audio-video conferencing
    and Polycom/WebEx-style tools
  • Such real time collaboration tools are still
    unreliable (I have worked on them since 1997) and
    you still need a lot of travel

42
Supporting distributed Enterprise II
  • Technologies support linkage of resources among
    themselves and to people
  • People and data are intrinsically distributed, but
    computers are not
  • Particle Physics has one accelerator, but raw data
    becomes processed data in some 50 places around
    the globe
  • Earthquakes occur all over, and so does their data
  • Polar science uses UAVs gathering data all over
    the poles
  • Cloud Computing offers seamless access to a pile
    of computers anywhere
  • Grid computing integrates multiple computers in
    different places, which is harder as one must link
    computers with different owners and policies

43
Distance Education etc.
  • 10 years ago, I expected distance education to be
    very important
  • See http://www.old-npac.org/users/gcf/icwujan98/index.html
    or http://www.npac.syr.edu/users/gcf/virtuniv95/index.html
  • Plans for ICWU International Collaborative
    University
  • This describes the plans of NPAC (Fox) and Peking
    University (Li) to set up an International
    Collaborative Web University and offer initial
    courses
  • Initial plans are a 6-course Graduate Internetics
    Program and a 2-course Web/Java High School
    Program; students and instruction will be spread
    over at least 6 institutions
  • NPAC and several Chinese universities are already
    committed, and we expect other Asian, U.S. and
    European participation
  • Very little was actually done. Why?
  • The quality of the real-time interactive
    experience is still poor, or needs more
    infrastructure than most people have
  • Tools supporting my distance education classes
    2001-2005
    (http://grids.ucs.indiana.edu/ptliupages/jsucourse2005/)
    were poor compared to those I had 1998-2000

44
Teaching Jackson State Fall 97 to Spring 2005
[Map: teaching link from Syracuse to JSU]
45
(No Transcript)
46
The Virtual University I
  • Motivated either by decreased cost or increased
    quality of learning environment
  • Will succeed due to market pressures (it will
    offer the best product)
  • Is technologically possible today and can only
    get better
  • Main problem is pervasive Quality of Service for
    digital audio and video
  • In structured settings like briefings, lectures,
    etc., support is easier, as these occur at fixed
    times and digital video is of secondary importance
  • Brainstorming and general collaboration are
    technically harder

47
The Virtual University II
  • Centers of Excellence (the "Hermit's Cave" Virtual
    University) are the natural entity to produce and
    deliver classes
  • Today, 1 faculty member delivers 2 courses a
    semester, each to say 25 students
  • Instead, 3 faculty collaborate on 1 course and
    deliver it to some 200 students, perhaps in
    multiple sessions (200 students are required to
    fund quality curricula, and 200 students require
    distance education except in a few classes)
  • University acts as an integrator putting together
    a set of classes where it may only teach some 20
    but acts as a mentor to all
  • Important issues remain as to certification and
    the natural unit of instruction (smaller than a
    typical degree)

48
Global Computer Science Status
  • There was a major federal computer science
    initiative 1990-2000 (HPCC High Performance
    Computing and Communication)
  • At that time, Europe tried and failed to compete,
    and Japan was a serious but not leading player in
    Asia (Japan had a failed Fifth Generation project
    based on dubious Artificial Intelligence ideas)
  • Grids and Cyberinfrastructure have replaced HPCC
    and here the status is different
  • US Business (Google, Amazon, Microsoft) clearly
    dominant
  • Government sponsored work is in classic Grid
    strongest in Europe
  • US Research rather chaotic but may be correctly
    pointing to change
  • The core of the field is due to US research of
    around 10 years ago
  • Work in China, Japan, India, Australia and Latin
    America is world class and has interesting EU
    support
  • China needs to make the step from developing
    research to being a major research power!