Cyberinfrastructure Technologies and Applications - PowerPoint PPT Presentation

1
Cyberinfrastructure Technologies and Applications
  • Summit on Cyberinfrastructure Innovation At Work
  • Banff Springs Hotel
  • Banff, Canada, October 11, 2007
  • Geoffrey Fox
  • Computer Science, Informatics, Physics
  • Pervasive Technology Laboratories
  • Indiana University Bloomington IN 47401
  • http://grids.ucs.indiana.edu/ptliupages/presentations/
  • gcf@indiana.edu http://www.infomall.org

2
e-moreorlessanything
  • "e-Science is about global collaboration in key
    areas of science, and the next generation of
    infrastructure that will enable it." (from its
    inventor John Taylor, Director General of Research
    Councils UK, Office of Science and Technology)
  • e-Science is about developing tools and
    technologies that allow scientists to do faster,
    better or different research
  • Similarly e-Business captures an emerging view of
    corporations as dynamic virtual organizations
    linking employees, customers and stakeholders
    across the world.
  • This generalizes to e-moreorlessanything,
    including presumably e-AlbertaEnterprise,
    e-oilandgas and e-geoscience.
  • A deluge of data of unprecedented and inevitable
    size must be managed and understood.
  • People (see Web 2.0), computers, data (including
    sensors and instruments) must be linked.
  • On-demand assignment of experts, computers,
    networks and storage resources must be supported.

3
What is Cyberinfrastructure?
  • Cyberinfrastructure is (from NSF) infrastructure
    that supports distributed science (e-Science):
    data, people, computers
  • Clearly the core concept is more general than
    science
  • Exploits Internet technology (Web 2.0), adding (via
    Grid technology) management, security,
    supercomputers etc.
  • It has two aspects: parallel, with low latency
    (microseconds) between nodes, and distributed,
    with highish latency (milliseconds) between nodes
  • Parallel is needed to get high performance on
    individual large simulations, data analysis etc.;
    one must decompose the problem
  • The distributed aspect integrates already distinct
    components; this is especially natural for data
  • Cyberinfrastructure is in general a distributed
    collection of parallel systems
  • Cyberinfrastructure is made of services
    (originally Web services) that are "just"
    programs or data sources packaged for distributed
    access
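To make the last point concrete, here is a minimal Python sketch of "a program packaged for distributed access": an ordinary function exposed over HTTP using only the standard library. The function name `average`, the `/mean` path and the `x` query parameter are hypothetical choices for illustration. They are not part of any system described in these slides.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlparse


def average(values):
    """An ordinary program: it knows nothing about networks."""
    return sum(values) / len(values)


class AverageService(BaseHTTPRequestHandler):
    """Packages average() for distributed access as GET /mean?x=1&x=2 ..."""

    def do_GET(self):
        url = urlparse(self.path)
        if url.path != "/mean":
            self.send_error(404)
            return
        try:
            values = [float(v) for v in parse_qs(url.query).get("x", [])]
        except ValueError:
            self.send_error(400, "x must be numeric")
            return
        if not values:
            self.send_error(400, "need at least one x parameter")
            return
        body = json.dumps({"mean": average(values)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging for the example


def start_service(port=0):
    """Serve in a background thread; port 0 asks the OS for a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), AverageService)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_service()
    port = server.server_address[1]
    with urllib.request.urlopen(f"http://127.0.0.1:{port}/mean?x=1&x=2&x=6") as reply:
        print(json.load(reply))  # {'mean': 3.0}
    server.shutdown()
```

Any client that can issue an HTTP GET can now call the program, whether that client is a browser, a mashup script or a workflow engine. The implementation stays opaque behind the interface.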

4
Underpinnings of Cyberinfrastructure
  • Distributed software systems are being
    revolutionized by developments from e-commerce,
    e-Science and the consumer Internet. There is
    rapid progress in technology families termed Web
    services, Grids and Web 2.0
  • The emerging distributed system picture is of
    distributed services with advertised interfaces
    but opaque implementations communicating by
    streams of messages over a variety of protocols
  • Complete systems are built by combining either
    services or predefined/pre-existing collections
    of services together to achieve new capabilities
  • As well as Internet/Communication revolutions
    (distributed systems), multicore chips will
    likely be hugely important (parallel systems)
  • Industry, not academia, is leading innovation in
    these technologies

5
Service or Web Service Approach
  • One uses GML, CML etc. to define the data
    structure in a system and one uses services to
    capture methods or programs
  • In e-Science, important services fall into three
    classes
  • Simulations
  • Data access, storage, federation, discovery
  • Filters for data mining and manipulation
  • Services could use something like WSDL (Web
    Services Description Language) to define
    interoperable interfaces, but Web 2.0 follows old
    library practice: one just specifies the interface
  • Service Interface (WSDL) establishes a contract
    independent of implementation between two
    services or a service and a client
  • Services should be loosely coupled which normally
    means they are coarse grain
  • Services will be composed (linked together) by
    mashups (typically scripts) or workflow (often
    XML BPEL)
  • Software Engineering and Interoperability/Standards
    are closely related
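The idea of an interface as a contract can be sketched in Python with `typing.Protocol`. The client is written only against the advertised interface, and two opaque implementations satisfy it. One runs in-process. The other, like a remote service, sees only serialized JSON messages. `StatsService`, `LocalStats`, `MessageStats` and `report` are hypothetical names. A real deployment would describe the interface in WSDL or a documented REST API rather than in a Python class.

```python
import json
import statistics
from typing import Protocol, Sequence


class StatsService(Protocol):
    """The advertised interface: the only thing clients may rely on."""

    def summarize(self, values: Sequence[float]) -> dict: ...


class LocalStats:
    """One opaque implementation: plain in-process Python."""

    def summarize(self, values: Sequence[float]) -> dict:
        return {"n": len(values), "mean": statistics.fmean(values)}


class MessageStats:
    """Another implementation that, like a remote service, only ever sees
    serialized request and response messages (JSON strings here)."""

    def _handle(self, request: str) -> str:
        values = json.loads(request)["values"]
        return json.dumps({"n": len(values), "mean": sum(values) / len(values)})

    def summarize(self, values: Sequence[float]) -> dict:
        return json.loads(self._handle(json.dumps({"values": list(values)})))


def report(service: StatsService, values: Sequence[float]) -> str:
    """Client code written against the contract, not an implementation."""
    summary = service.summarize(values)
    return f"n={summary['n']} mean={summary['mean']:.2f}"


if __name__ == "__main__":
    for service in (LocalStats(), MessageStats()):
        print(type(service).__name__, report(service, [1.0, 2.0, 6.0]))
```

Swapping `LocalStats` for `MessageStats` requires no change to `report`. This is the loose coupling the slide asks for.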

6
Computing and Cyberinfrastructure: TeraGrid
TeraGrid resources include more than 250
teraflops of computing capability and more than
30 petabytes of online and archival data storage,
with rapid access and retrieval over
high-performance networks. TeraGrid is
coordinated at the University of Chicago, working
with the Resource Provider sites: Indiana
University, Oak Ridge National Laboratory,
National Center for Supercomputing Applications,
Pittsburgh Supercomputing Center, Purdue
University, San Diego Supercomputer Center, Texas
Advanced Computing Center, University of
Chicago/Argonne National Laboratory, and the
National Center for Atmospheric Research.
(Map of TeraGrid sites. Legend: Grid Infrastructure Group (UChicago), Resource Provider (RP), Software Integration Partner. Sites shown: UW, PSC, UC/ANL, NCAR, PU, NCSA, UNC/RENCI, IU, Caltech, ORNL, USC/ISI, SDSC, TACC)
7
Data and Cyberinfrastructure
  • DIKW: Data → Information → Knowledge → Wisdom
    transformation
  • Applies to e-Science, Distributed Business
    Enterprise (including outsourcing), Military
    Command and Control and general decision support
  • (SOAP or just RSS) messages transport information
    expressed in a semantically rich fashion between
    sources and services that enhance and transform
    information so that complete system provides
  • Semantic Web technologies like RDF and OWL might
    help us to have rich expressivity but they might
    be too complicated
  • We are meant to build application specific
    information management/transformation systems for
    each domain
  • Each domain has Specific Services/Standards (for
    APIs and Information such as KML and GML for
    Geographical Information Systems)
  • and will use Generic Services (like R for
    datamining) and
  • Generic Standards (such as RDF, WSDL)
  • Standards made before consensus or not observant
    of technology progress are dubious

8
Information and Cyberinfrastructure
Raw Data → Data → Information → Knowledge → Wisdom
(Diagram: Sensor Services (SS), Filter Services (FS), Other Services (OS) and MetaData (MD) linked by Inter-Service Messages, with a Portal, Decisions, and links to AnotherGrid and AnotherService)
9
Information Cyberinfrastructure Architecture
  • The Party Line approach to Information
    Infrastructure is clear: one creates a
    Cyberinfrastructure consisting of distributed
    services accessed by portals/gadgets/gateways/RSS
    feeds
  • Services include
  • Computing
  • Original data
  • Transformations or filters implementing the DIKW
    (Data → Information → Knowledge → Wisdom) pipeline
  • Final Decision Support step converting wisdom
    into action
  • Generic services such as security, profiles etc.
  • Some filters could correspond to large
    simulations
  • Infrastructure will be set up as a System of
    Systems (Grids of Grids)
  • Services and/or Grids just accept some form of
    DIKW and produce another form of DIKW
  • Original data has no explicit input, just output
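A DIKW pipeline of this kind can be sketched with Python generators. Each "service" consumes one form of DIKW and yields another, and the original-data source has no input. The stage names, the calibration offset and the anomaly threshold are hypothetical, and the mapping of stages onto DIKW levels is only illustrative. In a real Cyberinfrastructure each stage would be a separate distributed service exchanging messages, not a local function.

```python
from typing import Iterable, Iterator


def sensor_service(readings: Iterable[float]) -> Iterator[dict]:
    """Original data: no explicit input from other services, just output."""
    for t, raw in enumerate(readings):
        yield {"t": t, "raw": raw}


def calibration_filter(stream: Iterable[dict], offset: float = 0.5) -> Iterator[dict]:
    """Raw data -> data: apply a (hypothetical) calibration offset."""
    for msg in stream:
        yield {**msg, "value": msg["raw"] - offset}


def anomaly_filter(stream: Iterable[dict], window: int = 3,
                   threshold: float = 3.0) -> Iterator[dict]:
    """Data -> information: flag values far from the mean of the previous window."""
    history: list[float] = []
    for msg in stream:
        recent = history[-window:]
        anomalous = (len(recent) == window
                     and abs(msg["value"] - sum(recent) / window) > threshold)
        history.append(msg["value"])
        yield {**msg, "anomaly": anomalous}


def decision_service(stream: Iterable[dict]) -> list[str]:
    """Final decision-support step: turn flagged information into actions."""
    return [f"alert at t={msg['t']}" for msg in stream if msg["anomaly"]]


if __name__ == "__main__":
    readings = [1.5, 1.4, 1.6, 1.5, 9.0, 1.5]
    pipeline = anomaly_filter(calibration_filter(sensor_service(readings)))
    print(decision_service(pipeline))  # ['alert at t=4']
```

Because each stage only consumes and produces messages, any stage can be replaced, for example by a remote service or a Grid, without touching the others.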

10
Virtual Observatory Astronomy Grid: Integrate
Experiments
(Figure panels: Radio, Far-Infrared, Visible, Dust Map, Visible X-ray, Galaxy Density Map)
11
(No Transcript)
12
CReSIS PolarGrid
  • Important CReSIS-specific Cyberinfrastructure
    components include
  • Managed data from sensors and satellites
  • Data analysis such as SAR processing possibly
    with parallel algorithms
  • Electromagnetic simulations (currently commercial
    codes) to design instrument antennas
  • 3D simulations of ice-sheets (glaciers) with
    non-uniform meshes
  • GIS Geographical Information Systems
  • Also need capabilities present in many Grids
  • Portal, i.e. Science Gateway
  • Submitting multiple sequential or parallel jobs
  • The need for three distinct types of components:
    Continental USA, with multiple base and field
    camps
  • Base and field camps must be power efficient
  • Terrible connectivity from base and field camps
    to Continental subGrid

13
CICC: Chemical Informatics and Cyberinfrastructure
Collaboratory Web Service Infrastructure
(Diagram layers: Portal Services (RSS Feeds, User Profiles, Collaboration as in Sakai); Core Grid Services (Service Registry, Job Submission and Management); Local Clusters, IU Big Red, TeraGrid, Open Science Grid)
14
Process Chemistry-Biology Interaction Data from
HTS (High Throughput Screening)
  • Percent Inhibition or IC50 data is retrieved from
    HTS
  • Scientists at IU prefer Web 2.0 to Grid/Web
    Service for workflow
  • Grids can link data analysis (e.g. image
    processing developed in existing Grids),
    traditional cheminformatics tools, as well as
    annotation tools (Semantic Web, del.icio.us) and
    enhance lead ID and SAR analysis
  • A Grid of Grids linking collections of services at
    PubChem, ECCR centers and MLSCN centers
  • Question: Was this screen successful? Workflows
    encoding plate control well statistics,
    distribution analysis, etc.
  • Question: What should the active/inactive cutoffs
    be? Workflows encoding distribution analysis of
    screening results
  • Question: What can we learn about the target
    protein or cell line from this screen? Workflows
    encoding statistical comparison of results to
    similar screens, docking of compounds into
    proteins to correlate binding with activity,
    literature search of active compounds, etc.
  • Compound data submitted to PubChem
(Diagram columns: CHEMINFORMATICS, PROCESS, GRIDS)
15
People and Cyberinfrastructure: Web 2.0
  • Web 2.0 has tools (sites) and technologies
  • Technologies (later) are competition for Grids
    and Web Services
  • Sites (below) are the best way to integrate
    people into Cyberinfrastructure
  • Kazaa, Instant Messengers, Skype, Napster,
    BitTorrent for P2P collaboration: text,
    audio-video conferencing, files
  • del.icio.us, Connotea, Citeulike, Bibsonomy,
    Biolicious manage shared bookmarks
  • MySpace, YouTube, Bebo, Hotornot, Facebook, or
    similar sites allow you to create (upload)
    community resources and share them; Friendster,
    LinkedIn create networks
  • http://en.wikipedia.org/wiki/List_of_social_networking_websites
  • Writely, Wikis and Blogs are powerful specialized
    shared document systems
  • Google Scholar and Windows Live Academic Search
    tell you who has cited your papers, while
    publisher sites tell you about co-authors

16
Best Web 2.0 Sites -- 2006
  • Extracted from http://web2.wsj2.com/
  • Social Networking
  • Start Pages
  • Social Bookmarking
  • Peer Production News
  • Social Media Sharing
  • Online Storage (Computing)

17
Web 2.0 Systems are Portals, Services, Resources
  • Captures the incredible development of
    interactive Web sites enabling people to create
    and collaborate

18
Web 2.0 and Web Services I
  • Web Services have clearly defined protocols
    (SOAP) and a well defined mechanism (WSDL) to
    define service interfaces
  • There is good .NET and Java support
  • The so-called WS-* specifications provide a rich
    sophisticated but complicated standard set of
    capabilities for security, fault tolerance,
    meta-data, discovery, notification etc.
  • Narrow Grids build on Web Services and provide
    a robust managed environment with growing
    adoption in Enterprise systems and distributed
    science (so called e-Science)
  • Web 2.0 supports a similar architecture to Web
    services but has developed in a more chaotic but
    remarkably successful fashion with a service
    architecture with a variety of protocols
    including those of Web and Grid services
  • Over 500 Interfaces defined at
    http://www.programmableweb.com/apis
  • Web 2.0 also has many well known capabilities
    with Google Maps and Amazon Compute/Storage
    services of clear general relevance
  • There are also Web 2.0 services supporting novel
    collaboration modes and user interaction with the
    web as seen in social networking sites, portals,
    MySpace, YouTube

19
Web 2.0 and Web Services II
  • I once thought Web Services were inevitable but
    this is no longer clear to me
  • Web services are complicated, slow and
    non-functional
  • WS-Security is unnecessarily slow and pedantic
    (canonicalization of XML)
  • WS-RM (Reliable Messaging) seems to have poor
    adoption and doesn't work well in collaboration
  • WSDM (distributed management) specifies a lot
  • There are de facto standards like Google Maps and
    powerful suppliers like Google which define the
    rules
  • One can easily combine SOAP (Web Service) based
    services/systems with HTTP messages but the
    lowest common denominator suggests additional
    structure/complexity of SOAP will not easily
    survive

20
Applications, Infrastructure, Technologies
  • The discussion is confused by inconsistent use of
    terminology; this is what I mean
  • Multicore, Narrow and Broad Grids and Web 2.0
    (Enterprise 2.0) are technologies
  • These technologies combine and compete to build
    infrastructures termed e-infrastructure or
    Cyberinfrastructure
  • Although multicore can and will support
    standalone clients, probably the most important
    client and server applications of the future will
    be Internet enhanced/enabled, so a key aspect of
    multicore is its role and integration in
    e-infrastructure
  • e-moreorlessanything is an emerging application
    area of broad importance that is hosted on the
    infrastructures: e-infrastructure or
    Cyberinfrastructure

21
Some Web 2.0 Activities at IU
  • Use of Blogs, RSS feeds, Wikis etc.
  • Use of Mashups for Cheminformatics Grid workflows
  • Moving from Portlets to Gadgets in portals (or at
    least supporting both)
  • Use of Connotea to produce tagged document
    collections such as http://www.connotea.org/user/crmc
    for parallel computing
  • Semantic Research Grid integrates multiple
    tagging and search systems and copes with
    overlapping inconsistent annotations
  • MSI-CIEC portal augments Connotea to tag a mix of
    URL and URIs e.g. NSF TeraGrid use, PIs and
    Proposals
  • Hopes to support collaboration (for Minority
    Serving Institution faculty)

22
Use blog to create posts.
Display blog RSS feed in MediaWiki.
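The blog-to-wiki step can be sketched as a small transformation. The sketch parses the items of an RSS 2.0 feed and emits MediaWiki bullet lines containing external links (`* [url title]`). The slide does not say how the feed was actually embedded in MediaWiki. `rss_to_wikitext` and the sample feed are hypothetical and only illustrate the data flow.

```python
import xml.etree.ElementTree as ET

# A tiny hypothetical RSS 2.0 feed standing in for the blog's real feed.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel><title>Group blog</title>
<item><title>First post</title><link>http://example.org/1</link></item>
<item><title>Second post</title><link>http://example.org/2</link></item>
</channel></rss>"""


def rss_to_wikitext(feed_xml: str) -> str:
    """Turn each RSS <item> into a MediaWiki bullet with an external link."""
    root = ET.fromstring(feed_xml)
    lines = []
    for item in root.iter("item"):
        title = item.findtext("title", default="(untitled)")
        link = item.findtext("link", default="")
        lines.append(f"* [{link} {title}]")  # MediaWiki external-link syntax
    return "\n".join(lines)


if __name__ == "__main__":
    print(rss_to_wikitext(SAMPLE_FEED))
```

Since RSS is just structured XML over HTTP, the same few lines would let a portal gadget or a wiki page display any blog's posts.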
23
Semantic Research Grid (SRG) Architecture
24
MSI-CIEC Portal
MSI-CIEC: Minority Serving Institution
CyberInfrastructure Empowerment Coalition
25
Mashups v Workflow?
  • Mashup Tools are reviewed at
    http://blogs.zdnet.com/Hinchcliffe/?p=63
  • Workflow Tools are reviewed by Gannon and Fox
    http://grids.ucs.indiana.edu/ptliupages/publications/Workflow-overview.pdf
  • Both include scripting in PHP, Python, sh etc., as
    both implement distributed programming at the level
    of services
  • Mashups use all types of service interfaces and
    perhaps do not have the potential robustness
    (security) of the Grid service approach
  • Mashups typically use pure HTTP (REST)

26
Grid Workflow Datamining in Earth Science
  • Work with Scripps Institute
  • Grid services controlled by workflow process real
    time data from 70 GPS Sensors in Southern
    California

(Figure labels: NASA, GPS, Earthquake)
27
Grid Workflow Data Assimilation in Earth Science
  • Grid services triggered by abnormal events and
    controlled by workflow process real time data
    from radar and high resolution simulations for
    tornado forecasts

Typical graphical interface to service composition
28
Web 2.0 uses all types of Services
  • Here a Gadget Mashup uses a 3-service workflow
    with a JavaScript Gadget Client

29
Web 2.0 Mashups and APIs
  • http://www.programmableweb.com/apis had (Sept 12,
    2007) 2312 Mashups and 511 Web 2.0 APIs, with
    GoogleMaps the most often used in Mashups
  • This is the Web 2.0 UDDI (service registry)

30
The List of Web 2.0 APIs
  • Each site has API and its features
  • Divided into broad categories
  • Only a few used a lot (49 APIs used in 10 or
    more mashups)
  • RSS feed of new APIs
  • Amazon S3 growing in popularity

31
Grid-style portal as used in Earthquake Grid
  • The Portal is built from portlets providing
    user interface fragments for each service that
    are composed into the full interface; it uses OGCE
    technology, as does the planetary science VLAB
    portal with the University of Minnesota

Now to Portals
32
Portlets v. Google Gadgets
  • Portals for Grid Systems are built using portlets
    with software like GridSphere integrating these
    on the server-side into a single web-page
  • Google (at least) offers the Google sidebar and
    Google home page which support Web 2.0 services
    and do not use a server side aggregator
  • Google is more user friendly!
  • The many Web 2.0 competitions are an interesting
    model for promoting development in the world-wide
    distributed collection of Web 2.0 developers
  • I guess the Web 2.0 model will win!

33
Typical Google Gadget Structure
  • Lots of HTML and JavaScript </Content> </Module>

Portlets build User Interfaces by combining
fragments in a standalone Java Server. Google
Gadgets build User Interfaces by combining
fragments with JavaScript on the client.
34
Web 2.0 v Narrow Grid I
  • Web 2.0 and Grids are addressing a similar
    application class although Web 2.0 has focused on
    user interactions
  • So technology has similar requirements
  • Web 2.0 chooses simplicity (REST rather than
    SOAP) to lower barrier to everyone participating
  • Web 2.0 and Parallel Computing tend to use
    traditional (possibly visual) (scripting)
    languages for the equivalent of workflow, whereas
    Grids use a visual interface with the backend
    recorded in BPEL
  • Web 2.0 and Grids both use SOA (Service Oriented
    Architectures)
  • System of Systems: Grids and Web 2.0 are likely
    to build systems hierarchically out of smaller
    systems
  • We need to support Grids of Grids, Webs of Grids,
    Grids of Services etc. i.e. systems of systems of
    all sorts

35
Web 2.0 v Narrow Grid II
  • Web 2.0 has a set of major services like
    GoogleMaps or Flickr but the world is composing
    Mashups that make new composite services
  • End-point standards are set by end-point owners
  • Many different protocols covering a variety of
    de-facto standards
  • Narrow Grids have a set of major software systems
    like Condor and Globus and a different world is
    extending with custom services and linking with
    workflow
  • Popular Web 2.0 technologies are PHP, JavaScript,
    JSON, AJAX and REST with Start Page (e.g.
    Google Gadgets) interfaces
  • Popular Narrow Grid technologies are Apache Axis,
    BPEL, WSDL and SOAP with portlet interfaces
  • Robustness of Grids demanded by the Enterprise?
  • Not so clear that Web 2.0 won't eventually
    dominate other application areas, and with
    Enterprise 2.0 it's invading Grids

36
Web 2.0 v Narrow Grid III
  • Narrow Grids have a strong emphasis on standards
    and structure; Web 2.0 lets a 1000 flowers
    (protocols) and a million developers bloom and
    focuses on functionality, broad usability and
    simplicity
  • Semantic Web/Grid has structure to allow
    reasoning
  • Annotation in sites like del.icio.us and
    uploading to MySpace/YouTube is unstructured and
    free text search replaces structured ontologies
  • Portals are likely to feature both Web and
    desktop client technology although it is
    possible that Web approach will be adopted more
    or less uniformly
  • Web 2.0 has a very active portal activity which
    has similar architecture to Grids
  • A page has multiple user interface fragments
  • Web 2.0 user interface integration is typically
    client side, using Gadgets, AJAX and JavaScript,
    while Grids use a special JSR168 portal, server
    side, using Portlets, WSRP and Java

37
The Ten areas covered by the 60 core WS-* Specifications
WS-* Specification Area | Typical Grid/Web Service Examples
1 Core Service Model | XML, WSDL, SOAP
2 Service Internet | WS-Addressing, WS-MessageDelivery; Reliable Messaging WSRM; Efficient Messaging MOTM
3 Notification | WS-Notification, WS-Eventing (Publish-Subscribe)
4 Workflow and Transactions | BPEL, WS-Choreography, WS-Coordination
5 Security | WS-Security, WS-Trust, WS-Federation, SAML, WS-SecureConversation
6 Service Discovery | UDDI, WS-Discovery
7 System Metadata and State | WSRF, WS-MetadataExchange, WS-Context
8 Management | WSDM, WS-Management, WS-Transfer
9 Policy and Agreements | WS-Policy, WS-Agreement
10 Portals and User Interfaces | WSRP (Remote Portlets)
38
WS-* Areas and Web 2.0
WS-* Specification Area | Web 2.0 Approach
1 Core Service Model | XML becomes optional but still useful; SOAP becomes JSON, RSS, ATOM; WSDL becomes REST with API as GET, PUT etc.; Axis becomes XmlHttpRequest
2 Service Internet | No special QoS. Use JMS or equivalent?
3 Notification | Hard with HTTP without polling. JMS perhaps?
4 Workflow and Transactions (no Transactions in Web 2.0) | Mashups, Google MapReduce; scripting with PHP, JavaScript
5 Security | SSL, HTTP Authentication/Authorization; OpenID is Web 2.0 Single Sign On
6 Service Discovery | http://www.programmableweb.com
7 System Metadata and State | Processed by application, no system state; Microformats are a universal metadata approach
8 Management/Interaction | WS-Transfer style protocols: GET, PUT etc.
9 Policy and Agreements | Service dependent; processed by application
10 Portals and User Interfaces | Start Pages, AJAX and Widgets (Netvibes), Gadgets
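The first row of this table can be made concrete with a small Python sketch that encodes the same call both ways. One form is a SOAP 1.1 envelope built with `xml.etree.ElementTree`. The other is a REST-style `GET` on a resource URL with a JSON reply. The operation `getReading`, the `urn:example:stations` namespace and the URL layout are hypothetical. Only the envelope namespace is the standard SOAP 1.1 one.

```python
import json
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # standard SOAP 1.1 namespace
APP_NS = "urn:example:stations"  # hypothetical application namespace


def soap_request(station_id: str) -> bytes:
    """Web Service style: an XML envelope wrapping a getReading call."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}getReading")
    ET.SubElement(call, f"{{{APP_NS}}}stationId").text = station_id
    return ET.tostring(envelope, encoding="utf-8")


def rest_request(station_id: str) -> tuple[str, str]:
    """Web 2.0 style: the HTTP verb and the URL carry the whole call."""
    return "GET", f"/stations/{station_id}/reading"


def json_response(station_id: str, value: float) -> str:
    """A JSON reply plays the role of the SOAP response envelope."""
    return json.dumps({"stationId": station_id, "value": value})


if __name__ == "__main__":
    print(soap_request("S1").decode("utf-8"))
    print(*rest_request("S1"))
    print(json_response("S1", 1.5))
```

The REST form needs no envelope, schema or toolkit. This is the "lowest common denominator" pressure on SOAP that the earlier slides describe.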
39
Too much Computing?
  • Historically one has tried to increase computing
    capabilities by
  • Optimizing performance of codes
  • Exploiting all possible CPUs such as Graphics
    co-processors and idle cycles
  • Making central computers available such as
    NSF/DoE/DoD supercomputer networks
  • The next crisis in the technology area will be the
    opposite problem: commodity chips will be
    32-128-way parallel in 5 years' time and we
    currently have no idea how to use them,
    especially on clients
  • Only 2 releases of standard software (e.g.
    Office) in this time span
  • Gaming and Generalized decision support (data
    mining) are two obvious ways of using these
    cycles
  • Intel RMS analysis
  • Note even cell phones will be multicore
  • There is "too much data" as well as "too much
    computing", but the implications are unclear

40
Intel's Projection
41
RMS: Recognition, Mining, Synthesis
  • Recognition: "What is …?" Model; model-less →
    model-based multimodal recognition
  • Mining: "Is it …?" Find a model instance;
    real-time streaming and transactions on static
    structured datasets → real-time analytics on
    dynamic, unstructured, multimodal datasets
  • Synthesis: "What if …?" Create a model instance;
    very limited realism → photo-realism and
    physics-based animation
42
  • Recognition: What is a tumor?
  • Mining: Is there a tumor here?
  • Synthesis: What if the tumor progresses?
It is all about dealing efficiently with complex
multimodal datasets
Images courtesy http://splweb.bwh.harvard.edu:8000/pages/images_movies.html
43
Intel's Application Stack
44
Multicore SALSA at IU
  • Service Aggregated Linked Sequential Activities
  • http://www.infomall.org/multicore
  • Aims to link parallel and distributed (Grid)
    computing by developing parallel applications as
    services and not as programs or libraries
  • Improve traditionally poor parallel programming
    development environments
  • Can use messaging to link parallel and Grid
    services, but the performance/functionality
    tradeoffs are different
  • Parallelism needs latency of a few µs for
    messaging and thread spawning
  • Network overheads in Grids are 10-100s of µs
  • Developing Service (library) of multicore
    parallel data mining algorithms

45
Microsoft CCR for Parallelism
  • Use Microsoft CCR/DSS where DSS is
    mash-up/workflow service model built from CCR and
    CCR supports MPI or Dynamic threads
  • CCR Supports exchange of messages between threads
    using named ports
  • FromHandler: spawn threads without reading ports
  • Receive: each handler reads one item from a
    single port
  • MultipleItemReceive: each handler reads a
    prescribed number of items of a given type from a
    given port. Note items in a port can be general
    structures but all must have the same type.
  • MultiplePortReceive: each handler reads one
    item of a given type from multiple ports.
  • JoinedReceive: each handler reads one item from
    each of two ports. The items can be of different
    types.
  • Choice: execute a choice of two or more
    port-handler pairings
  • Interleave: consists of a set of arbiters
    (port-handler pairs) of 3 types that are
    Concurrent, Exclusive or Teardown (called at end
    for clean up). Concurrent arbiters are run
    concurrently but exclusive handlers are …
  • http://msdn.microsoft.com/robotics/
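The CCR receive patterns above can be imitated, very loosely, with Python threads and `queue.Queue`. This is not the CCR API, which is a .NET library with its own port and arbiter types. `Port`, `receive`, `multiple_item_receive` and `joined_receive` are hypothetical names chosen to mirror the bullets. One simplification is flagged in the code: this join blocks on one port and then the other. A true join fires only when both items are available.

```python
import queue
import threading


class Port:
    """A named message port that accepts items of a single type."""

    def __init__(self, name: str, item_type: type):
        self.name, self.item_type = name, item_type
        self._items = queue.Queue()

    def post(self, item) -> None:
        if not isinstance(item, self.item_type):
            raise TypeError(f"port {self.name!r} accepts {self.item_type.__name__} only")
        self._items.put(item)

    def take(self):
        return self._items.get()  # blocks until an item is available


def receive(port, handler) -> threading.Thread:
    """Like Receive: the handler gets one item from a single port."""
    return threading.Thread(target=lambda: handler(port.take()))


def multiple_item_receive(port, count, handler) -> threading.Thread:
    """Like MultipleItemReceive: the handler gets `count` items from one port."""
    return threading.Thread(target=lambda: handler([port.take() for _ in range(count)]))


def joined_receive(port_a, port_b, handler) -> threading.Thread:
    """Like JoinedReceive: one item from each of two ports (types may differ).

    Simplification: takes from port_a, then port_b, rather than atomically."""
    return threading.Thread(target=lambda: handler(port_a.take(), port_b.take()))


if __name__ == "__main__":
    results = []
    partials = Port("partials", float)
    labels, totals = Port("labels", str), Port("totals", float)
    workers = [
        # gather three partial sums, then post their total to another port
        multiple_item_receive(partials, 3, lambda xs: totals.post(sum(xs))),
        # combine a label (str) with a total (float) from two different ports
        joined_receive(labels, totals, lambda s, x: results.append(f"{s}: {x}")),
    ]
    for w in workers:
        w.start()
    for x in (1.0, 2.0, 3.5):
        partials.post(x)
    labels.post("sum")
    for w in workers:
        w.join()
    print(results)  # ['sum: 6.5']
```

Handlers never share variables directly; they only exchange messages through ports. This is what lets the same style scale from threads on one multicore chip to services on a Grid.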

46
Timing of HP Opteron Multicore as a function of
number of simultaneous two-way service messages
processed (November 2006 DSS Release)
DSS Service Measurements
  • Measurements of Axis 2 show about 500
    microseconds; DSS is 10 times better

47
MPI Exchange Latency in µs (20-30 µs computation between messaging)
Machine | OS | Runtime | Grains | Parallelism | MPI Exchange Latency (µs)
Intel8cgf12 (8 core 2.33 GHz, in 2 chips) | Redhat | MPJE (Java) | Process | 8 | 181
Intel8cgf12 (8 core 2.33 GHz, in 2 chips) | Redhat | MPICH2 (C) | Process | 8 | 40.0
Intel8cgf12 (8 core 2.33 GHz, in 2 chips) | Redhat | MPICH2 Fast | Process | 8 | 39.3
Intel8cgf12 (8 core 2.33 GHz, in 2 chips) | Redhat | Nemesis | Process | 8 | 4.21
Intel8cgf20 (8 core 2.33 GHz) | Fedora | MPJE | Process | 8 | 157
Intel8cgf20 (8 core 2.33 GHz) | Fedora | mpiJava | Process | 8 | 111
Intel8cgf20 (8 core 2.33 GHz) | Fedora | MPICH2 | Process | 8 | 64.2
Intel8b (8 core 2.66 GHz) | Vista | MPJE | Process | 8 | 170
Intel8b (8 core 2.66 GHz) | Fedora | MPJE | Process | 8 | 142
Intel8b (8 core 2.66 GHz) | Fedora | mpiJava | Process | 8 | 100
Intel8b (8 core 2.66 GHz) | Vista | CCR (C#) | Thread | 8 | 20.2
AMD4 (4 core 2.19 GHz) | XP | MPJE | Process | 4 | 185
AMD4 (4 core 2.19 GHz) | Redhat | MPJE | Process | 4 | 152
AMD4 (4 core 2.19 GHz) | Redhat | mpiJava | Process | 4 | 99.4
AMD4 (4 core 2.19 GHz) | Redhat | MPICH2 | Process | 4 | 39.3
AMD4 (4 core 2.19 GHz) | XP | CCR | Thread | 4 | 16.3
Intel4 (4 core 2.8 GHz) | XP | CCR | Thread | 4 | 25.8
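The exchange latency in this table is the kind of number a ping-pong test produces. The Python sketch below measures the average one-way latency of passing a message between two threads through `queue.Queue`. It only illustrates the measurement method. It does no computation between messages, whereas the table's runs did 20-30 µs of work. CPython's interpreter overhead and global interpreter lock also mean its results are not comparable with the MPI or CCR figures above. `exchange_latency_us` is a hypothetical helper.

```python
import queue
import threading
import time


def exchange_latency_us(rounds: int = 10_000) -> float:
    """Average one-way latency (µs) of a ping-pong between two threads."""
    ping, pong = queue.Queue(), queue.Queue()

    def echo():
        for _ in range(rounds):
            pong.put(ping.get())  # bounce each message straight back

    worker = threading.Thread(target=echo)
    worker.start()
    start = time.perf_counter()
    for i in range(rounds):
        ping.put(i)
        pong.get()
    elapsed = time.perf_counter() - start
    worker.join()
    # each round trip consists of two one-way exchanges
    return elapsed / rounds / 2 * 1e6


if __name__ == "__main__":
    print(f"{exchange_latency_us():.1f} µs per one-way exchange")
```

Averaging over many round trips hides timer resolution. Halving the round-trip time assumes the two directions cost the same.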
48
The clustering algorithm anneals by decreasing the
distance scale and gradually finds more clusters
as resolution improves. Here we see 10 clusters
increasing to 30 as the algorithm progresses.
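The slides do not give the clustering code, but annealing "by decreasing distance scale" appears to be deterministic annealing clustering. Below is a simplified one-dimensional Python sketch of that technique. It is not the authors' implementation. At temperature T, each point is shared among centres with probability proportional to exp(-(x - y)^2 / T). Before re-solving, each centre is duplicated with a tiny perturbation, and duplicates that fail to separate are merged. As T falls, centres split and the number of clusters grows. All parameter names and defaults (`t_min`, `cooling`, `iters`, `merge_tol`, and a starting temperature of four times the variance) are assumptions.

```python
import math


def da_cluster_1d(xs, t_min=1e-3, cooling=0.9, iters=50, merge_tol=1e-3):
    """Deterministic-annealing clustering of 1-D points (simplified sketch).

    Returns a list of (temperature, sorted centres) snapshots, one per
    temperature step, so the growth in the number of clusters can be seen.
    """
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    t = 4.0 * var + 1e-9          # start above the first splitting temperature
    centres = [mean]              # at high T everything is one cluster
    history = []
    while t > t_min:
        # duplicate each centre with a tiny perturbation so it is able to split
        centres = [c + d for c in centres for d in (-1e-4, 1e-4)]
        for _ in range(iters):
            weight = [0.0] * len(centres)
            moment = [0.0] * len(centres)
            for x in xs:
                d2 = [(x - c) ** 2 for c in centres]
                d_min = min(d2)
                # soft assignments p(k|x) ~ exp(-d2/T); shift by d_min for stability
                p = [math.exp(-(d - d_min) / t) for d in d2]
                z = sum(p)
                for k, pk in enumerate(p):
                    weight[k] += pk / z
                    moment[k] += pk / z * x
            # each centre moves to the probability-weighted mean of the points
            centres = [m / w if w > 0 else c
                       for m, w, c in zip(moment, weight, centres)]
        # merge duplicates that did not separate: they describe the same cluster
        merged = []
        for c in sorted(centres):
            if not merged or c - merged[-1] > merge_tol:
                merged.append(c)
        centres = merged
        history.append((t, centres))
        t *= cooling              # lower the temperature (the distance scale)
    return history


if __name__ == "__main__":
    points = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
    for t, centres in da_cluster_1d(points)[::10]:
        print(f"T={t:9.4f}  clusters={len(centres)}")
```

Annealing avoids the sensitivity to initial centres that plain k-means has: each new cluster appears by a controlled split of an existing one, at the temperature where the data first resolves it.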
49
Parallel Multicore Clustering (C# on Windows)
Parallel overhead on 8 threads running on Intel 8-core:
$\text{Speedup} = 8/(1 + \text{Overhead})$
$\text{Overhead} = \text{Constant1} + \text{Constant2}/n$, with
Constant1 = 0.05 to 0.1 (client Windows) due to
thread runtime fluctuations
(Plot: parallel overhead versus 10000/(grain size n, in points per core), with curves for 10 and 20 clusters)
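The overhead model on this slide is easy to evaluate, and the Python sketch below codes it directly. `Constant1` and `Constant2` are fitted quantities that the slide does not tabulate, beyond Constant1 being 0.05 to 0.1. The value of `c2` in the example is therefore a made-up placeholder. As n grows, the speedup saturates at 8/(1 + Constant1), which is between 8/1.1 ≈ 7.3 and 8/1.05 ≈ 7.6 on 8 cores.

```python
def overhead(n: float, c1: float, c2: float) -> float:
    """Overhead = Constant1 + Constant2 / n, with n the grain size (points per core)."""
    return c1 + c2 / n


def speedup(cores: int, f: float) -> float:
    """Speedup = cores / (1 + Overhead); the slide's measurements use 8 cores."""
    return cores / (1.0 + f)


if __name__ == "__main__":
    c1, c2 = 0.05, 500.0  # c1 from the slide's range; c2 is a hypothetical value
    for n in (1_000, 10_000, 100_000):
        f = overhead(n, c1, c2)
        print(f"n={n:>7}  overhead={f:.3f}  speedup={speedup(8, f):.2f}")
```

The Constant2/n term is the usual communication-to-computation effect and shrinks with larger grains. The Constant1 floor does not shrink, which is why the run-time fluctuations matter.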
50
We use DSS as the service framework, integrated
with CCR supporting MPI/threading
51
Intel 8-core C# with 80 Clusters: Vista Run Time
Fluctuations for Clustering Kernel
  • 2 Quadcore Processors
  • This is the average of the standard deviation of
    the run time of the 8 threads between messaging
    synchronization points

52
Intel 8 core with 80 Clusters: Redhat Run Time
Fluctuations for Clustering Kernel
  • This is the average of the standard deviation of
    the run time of the 8 threads between messaging
    synchronization points
(Plot: standard deviation/run time versus number of threads)
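The plotted quantity can be computed from per-thread timings, as sketched below in Python. The slides do not specify the exact normalization. Following the "Standard Deviation/Run Time" axis label, this sketch assumes each interval's standard deviation (population form) is divided by that interval's mean thread run time, and the ratios are then averaged over synchronization intervals. `mean_relative_fluctuation` and the timing layout are hypothetical.

```python
import statistics


def mean_relative_fluctuation(times: list[list[float]]) -> float:
    """times[s][i] = run time of thread i between synchronization points s and s+1.

    For each interval, divide the standard deviation of the thread run times by
    their mean; then average these ratios over all intervals.
    """
    ratios = [statistics.pstdev(interval) / statistics.fmean(interval)
              for interval in times]
    return statistics.fmean(ratios)


if __name__ == "__main__":
    # two synchronization intervals, four threads each (made-up timings in ms)
    timings = [[1.0, 1.0, 1.0, 1.0],
               [1.0, 3.0, 1.0, 3.0]]
    print(mean_relative_fluctuation(timings))  # 0.25
```

Every thread waits at the synchronization point for the slowest one. These fluctuations therefore show up directly as the constant overhead term that is attributed above to thread runtime fluctuations.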
53
What should one do?
  • i.e. how does one Cyberinfrastructure-enable a
    given area/application XYZ?
  • As computing is free, focus on identifying the
    data/information/knowledge/wisdom needed (there
    is probably too much data but not so much wisdom
    in the DIKW pipeline)
  • Should we care just about original data or also
    about the whole DIKW pipeline?
  • Scope out supercomputer/computer services needed
    and exploit OGF standards
  • Identify services (filters, often data mining)
    needed by XYZ
  • Will we need parallel implementations of filters?
    If so, use multicore-compatible frameworks
  • Identify standards for application XYZ
  • Set up distributed XYZ Services
  • Use Web 2.0 (as it makes things easier), not
    current Grids (which make things harder)
  • Build a Programmable XYZ Web
  • Emphasize Simplicity
  • Is Secrecy important and in fact viable? Often
    important but hard
  • What are synergies of XYZ to pervasive
    capabilities such as Web 2.0 sites, National
    resources like TeraGrid, and Personal aides in
    an information rich world (future of PC) ?