TITAN: A Next-Generation Infrastructure for Integrating Computing and Communication

Transcript and Presenter's Notes



1
TITAN: A Next-Generation Infrastructure for Integrating Computing and Communication
  • David E. Culler
  • Computer Science Division
  • U.C. Berkeley
  • NSF Research Infrastructure Meeting
  • Aug 7, 1999

2
Project Goal
  • Develop a new type of system which harnesses
    breakthrough communications technology to
    integrate a large collection of commodity
    computers into a powerful resource pool that can
    be accessed directly through its constituent
    nodes or through inexpensive media stations.
  • SW architecture for global operating system
  • programming language support
  • advanced applications
  • multimedia application development

3
Project Components
The Building is the Computer
4
Use what you build, learn from use,...
Develop Enabling Systems Technology
Develop Driving Applications
5
Highly Leveraged Project
  • Large industrial contribution
  • HP media stations
  • Sun compute stations
  • Sun SMPs
  • Intel media stations
  • Bay Networks ATM, Ethernet
  • Enabled several federal grants
  • NOW
  • Titanium, Castle
  • Daedalus, Mash
  • DLIB
  • Berkeley Multimedia Research Center

6
Landmarks
  • Top 500 Linpack Performance List
  • MPI, NPB performance on par with MPPs
  • RSA 40-bit Key challenge
  • World Leading External Sort
  • Inktomi search engine
  • NPACI resource site

Sustains 500 MB/s disk bandwidth and 1,000 MB/s network bandwidth
7
Sample of '98 Degrees from Titan
  • Amin Vahdat: WebOS
  • Steven Lumetta: Multiprotocol Communication
  • Wendy Heffner: Multicast Communication Protocols
  • Doug Ghormley: Global OS
  • Andrea Dusseau: Implicit Co-scheduling
  • Armando Fox: TACC Proxy Architecture
  • John Byers: Fast, Reliable Bulk Communication
  • Elan Amir: Media Gateway
  • David Bacon: Compiler Optimization
  • Kristen Wright: Scalable Webcast
  • Jeanna Neefe: xFS
  • Steven Gribble: Web Caching
  • Ian Goldberg: Wingman
  • Eshwar Balani: WebOS Security
  • Paul Gautier: Scalable Search Engines

8
Results
  • Constructed three prototypes, culminating in the
    100-processor UltraSPARC NOW, plus three extensions
  • GLUnix global operating system layer
  • Active Messages providing fast, general purpose
    user-level communication
  • xFS cluster file system
  • Fast sockets, MPI, and SVM
  • Titanium and Split-C parallel languages
  • ScaLapack libraries
  • Heavily used in dept. and external research
  • => instrumental in establishing clusters as a
    viable approach to large-scale computing
  • => transitioned to an NPACI experimental resource
  • The Killer App: Scalable Internet Services

9
First HP/FDDI Prototype
  • FDDI on the HP/735 graphics bus.
  • First fast message layer on a non-reliable network

10
SparcStation ATM NOW
  • ATM was going to take over the world.
  • Myrinet SAN emerged

The original INKTOMI
11
Technological Revolution
  • The Killer Switch
  • single chip building block for scalable networks
  • high bandwidth
  • low latency
  • very reliable
  • if it's not unplugged
  • => System Area Networks
  • 8 bidirectional ports of 160 MB/s each way
  • < 500 ns routing delay
  • Simple - just moves the bits
  • Detects connectivity and deadlock

12
100 node Ultra/Myrinet NOW
13
NOW System Architecture
(Diagram) Parallel applications and large sequential applications run over
Sockets, Split-C, MPI, HPF, and SVM on the Global Layer UNIX (GLUnix), which
provides process migration, distributed files, network RAM, and resource
management. Each UNIX workstation runs communication software over its
network interface hardware, and all nodes are connected by a fast commercial
switch (Myrinet).
14
Software Warehouse
  • Coherent software environment throughout the
    research program
  • Billions of bytes of code
  • Mirrored externally
  • New SWW-NT

15
Multi-Tier Networking Infrastructure
  • Myrinet Cluster Interconnect
  • ATM backbone
  • Switched Ethernet
  • Wireless

16
Multimedia Development Support
  • Authoring tools
  • Presentation capabilities
  • Media stations
  • Multicast support / MBone

17
Novel Cluster Designs
  • Tertiary Disk
  • very low cost massive storage
  • hosts the archive of the Museum of Fine Arts
  • Pleiades Clusters
  • functionally specialized storage and information
    servers
  • constant back-up and restore at large scale
  • NOW tore apart traditional AUSPEX servers
  • CLUMPS
  • cluster of SMPs with multiple NICs per node

18
Massive Cheap Storage
  • Basic unit
  • 2 PCs double-ending four SCSI chains

Currently serving fine art at http://www.thinker.org/imagebase/
19
Information Servers
  • Basic Storage Unit
  • Ultra 2, 300 GB RAID, 800 GB tape stacker, ATM
  • scalable backup/restore
  • Dedicated Info Servers
  • web,
  • security,
  • mail,
  • VLANs project into dept.

20
Cluster of SMPs (CLUMPS)
  • Four Sun E5000s
  • 8 processors each
  • 3 Myricom NICs each
  • Multiprocessor, Multi-NIC, Multi-Protocol

21
Novel Systems Design
  • Virtual networks
  • integrate communication events into virtual
    memory system
  • Implicit Co-scheduling
  • cause local schedulers to co-schedule parallel
    computations using two-phase spin-block waiting
    and observation of round-trip times
  • Co-operative caching
  • access remote caches, rather than local disk, and
    enlarge global cache coverage by simple
    cooperation
  • Reactive Scalable I/O
  • Network virtual memory, fast sockets
  • ISAAC active security
  • Internet Server Architecture
  • TACC Proxy architecture

22
Fast Communication
  • Fast communication on clusters is obtained
    through direct access to the network, as on MPPs
  • The challenge is to make this general purpose
  • the system implementation should not dictate how
    it can be used (see the sketch below)

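The general-purpose layer referred to here is Active Messages. A minimal sketch of the request/handler programming model follows, in C; the names am_register and am_request and the in-process delivery are assumptions for illustration, not the actual AM-II interface.

/* Minimal sketch of an Active-Message-style interface (hypothetical names,
 * not the real AM-II API): a request carries a handler index and a few
 * word-sized arguments, and the handler runs on "arrival". Delivery is
 * simulated in-process so the example is self-contained. */
#include <stdio.h>

#define MAX_HANDLERS 16

typedef void (*am_handler_t)(int src_node, long arg0, long arg1);

static am_handler_t handler_table[MAX_HANDLERS];

/* Register a handler under a small integer index. */
static void am_register(int index, am_handler_t h) { handler_table[index] = h; }

/* "Send" a request: a real layer writes directly to the NIC from user space;
 * here we just invoke the handler to show the control flow. */
static void am_request(int dest_node, int handler_index, long arg0, long arg1) {
    (void)dest_node;                       /* no real network in this sketch */
    handler_table[handler_index](0, arg0, arg1);
}

/* Example handler: a remote increment, a typical small AM operation. */
static long counter = 0;
static void incr_handler(int src_node, long amount, long unused) {
    (void)src_node; (void)unused;
    counter += amount;
}

int main(void) {
    am_register(0, incr_handler);
    am_request(1, 0, 5, 0);                /* ask "node 1" to add 5 */
    printf("counter = %ld\n", counter);    /* prints 5 */
    return 0;
}
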
23
Virtual Networks
  • An endpoint abstracts the notion of being
    attached to the network.
  • A virtual network is a collection of endpoints
    that can name each other.
  • Many processes on a node can each have many
    endpoints, each with its own protection domain
    (see the sketch below).

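A minimal sketch of the endpoint and virtual-network abstraction, in C; the type names and fields are assumptions for illustration, not the actual NOW data structures.

/* Sketch of the endpoint / virtual-network abstraction (hypothetical types).
 * Each process may hold many endpoints; endpoints in the same virtual
 * network can name each other by index. */
#include <stdint.h>
#include <stdio.h>

#define QUEUE_DEPTH 32

typedef struct {
    uint64_t payload[QUEUE_DEPTH];   /* simplified message slots            */
    int      head, tail;
} msg_queue_t;

typedef struct {
    int         owner_pid;           /* protection domain of this endpoint  */
    uint32_t    protection_tag;      /* checked on every send/receive       */
    msg_queue_t send_q, recv_q;
} endpoint_t;

typedef struct {
    uint32_t    tag;                 /* shared tag for this virtual network */
    int         n_endpoints;
    endpoint_t *endpoints[64];       /* endpoints that can name each other  */
} virtual_network_t;

int main(void) {
    endpoint_t a = { .owner_pid = 100, .protection_tag = 0xBEEF };
    endpoint_t b = { .owner_pid = 200, .protection_tag = 0xBEEF };
    virtual_network_t vn = { .tag = 0xBEEF, .n_endpoints = 2,
                             .endpoints = { &a, &b } };
    printf("virtual network with %d endpoints, tag %#x\n",
           vn.n_endpoints, (unsigned)vn.tag);
    return 0;
}
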
24
How are they managed?
  • How do you get direct hardware access for
    performance with a large space of logical
    resources?
  • Just like virtual memory
  • the active portion of a large logical space is
    bound to physical resources

(Diagram: host memory holds endpoints for processes 1 through n; the network
interface's NIC memory holds the currently active endpoints.)
25
Network Interface Support
  • NIC has endpoint frames
  • Services active endpoints
  • Signals misses to driver
  • using a system endpoint

(Diagram: endpoint frames 0 through 7 on the NIC, each with transmit and
receive areas, plus an endpoint-miss path to the driver; see the sketch
below.)
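A minimal sketch of the virtual-memory analogy, in C: the NIC holds a small number of endpoint frames as a cache over a much larger space of logical endpoints, and a miss is serviced by the driver, much like a page fault. The frame count, eviction policy, and function names are assumptions for illustration.

/* Sketch of NIC endpoint frames as a cache over a larger space of logical
 * endpoints (illustration only). A lookup that misses "faults" to a driver
 * routine which evicts a frame and binds the endpoint. */
#include <stdio.h>

#define NUM_FRAMES 8          /* e.g. 8 endpoint frames on the NIC  */
#define NUM_ENDPOINTS 64      /* much larger logical endpoint space */

static int frame_to_endpoint[NUM_FRAMES];   /* -1 means the frame is free   */
static int next_victim = 0;                 /* trivial round-robin eviction */

/* Driver-side miss handler: bind endpoint `ep` to some frame. */
static int service_miss(int ep) {
    int frame = next_victim;
    next_victim = (next_victim + 1) % NUM_FRAMES;
    frame_to_endpoint[frame] = ep;
    printf("miss: endpoint %d loaded into frame %d\n", ep, frame);
    return frame;
}

/* NIC-side lookup: hit if the endpoint is resident, otherwise signal a miss. */
static int lookup_endpoint(int ep) {
    for (int f = 0; f < NUM_FRAMES; f++)
        if (frame_to_endpoint[f] == ep)
            return f;                       /* active endpoint: fast path   */
    return service_miss(ep);                /* inactive endpoint: slow path */
}

int main(void) {
    for (int f = 0; f < NUM_FRAMES; f++) frame_to_endpoint[f] = -1;
    lookup_endpoint(3);     /* miss, loaded */
    lookup_endpoint(3);     /* hit          */
    lookup_endpoint(42);    /* miss, loaded */
    return 0;
}
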
26
Communication under Load
=> Use of networking resources adapts to demand.
=> VIA (or improvements on it) needs to become widespread.
27
Implicit Coscheduling
  • Problem: parallel programs are designed to run in
    parallel => huge slowdowns under uncoordinated
    local scheduling
  • gang scheduling is rigid, fault-prone, and
    complex
  • Coordinate schedulers implicitly using the
    communication in the program
  • very easy to build, robust to component failures
  • inherently service on-demand, scalable
  • Local service component can evolve.

28
Why it works
  • Infer non-local state from local observations
  • React to maintain coordination
  • observation => implication => action
  • fast response => partner scheduled => spin
  • delayed response => partner not scheduled => block
    (see the sketch below)

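A minimal sketch of the two-phase spin-block rule, in C; the round-trip threshold, the polling and blocking placeholders, and the function names are assumptions for illustration.

/* Two-phase spin-block sketch (illustrative only). A process waiting for a
 * remote reply spins for about one expected round-trip; a fast reply implies
 * the partner is scheduled, so keep spinning, while a late reply implies it
 * probably is not, so block and free the CPU. */
#include <stdio.h>
#include <stdbool.h>
#include <time.h>

/* Placeholder for "has the reply arrived?"; a real system polls the endpoint. */
static bool reply_arrived(void) { return false; }

/* Placeholder for blocking until the reply wakes us (e.g. on a semaphore). */
static void block_until_reply(void) { puts("blocking: partner likely descheduled"); }

static void wait_for_reply(long expected_rtt_ns) {
    struct timespec start, now;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (;;) {
        if (reply_arrived()) {
            puts("reply within RTT: partner scheduled, keep spinning");
            return;
        }
        clock_gettime(CLOCK_MONOTONIC, &now);
        long elapsed = (now.tv_sec - start.tv_sec) * 1000000000L
                     + (now.tv_nsec - start.tv_nsec);
        if (elapsed > expected_rtt_ns) {     /* phase 2: give up the CPU */
            block_until_reply();
            return;
        }
        /* phase 1: keep spinning */
    }
}

int main(void) {
    wait_for_reply(20000);   /* assume a ~20 us round-trip for illustration */
    return 0;
}
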
29
I/O Lessons from NOW sort
  • A complete system on every node is a powerful
    basis for data-intensive computing
  • complete disk sub-system
  • independent file systems
  • MMAP, not read(); MADVISE (see the sketch after
    this list)
  • full OS => threads
  • Remote I/O (with fast comm.) provides same
    bandwidth as local I/O.
  • I/O performance is very temperamental
  • variations in disk speeds
  • variations within a disk
  • variations in processing, interrupts, messaging,
    ...

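A minimal sketch of the "MMAP not read, MADVISE" lesson, in C: map the input file and advise the kernel of sequential access instead of issuing read() calls. The file name is a placeholder.

/* Map the input file and tell the kernel the access pattern, rather than
 * copying it through read(). */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

int main(void) {
    const char *path = "records.dat";            /* placeholder input file */
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    /* Map the whole file read-only. */
    unsigned char *data = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED) { perror("mmap"); return 1; }

    /* Tell the VM system we will scan sequentially so it can read ahead. */
    madvise(data, st.st_size, MADV_SEQUENTIAL);

    /* Scan the mapped data; here we just checksum it. */
    unsigned long sum = 0;
    for (off_t i = 0; i < st.st_size; i++)
        sum += data[i];
    printf("%lld bytes, checksum %lu\n", (long long)st.st_size, sum);

    munmap(data, st.st_size);
    close(fd);
    return 0;
}
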
30
Reactive I/O
  • Loosen data semantics
  • e.g., an unordered bag of records
  • Build flows from producers (e.g., disks) to
    consumers (e.g., summation)
  • Flow data to where it can be consumed (see the
    sketch below)

(Diagrams: static vs. adaptive parallel aggregation.)
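A minimal single-process sketch of the adaptive idea, in C: once record order is relaxed to an unordered bag, consumers simply claim the next available batch, so a faster consumer naturally processes more of the data than a static, even split would give it. The batch size and consumer speeds are assumptions for illustration.

/* Demand-driven (adaptive) aggregation over an unordered bag of records.
 * Two consumers of different speeds pull batches from a shared cursor; the
 * faster one ends up with more of the data than a static half/half split. */
#include <stdio.h>

#define TOTAL_RECORDS 1000
#define BATCH 10

static int next_record = 0;   /* shared cursor over the bag of records */

/* Claim the next batch; returns the number of records claimed (0 when done). */
static int claim_batch(void) {
    if (next_record >= TOTAL_RECORDS) return 0;
    int n = TOTAL_RECORDS - next_record < BATCH ? TOTAL_RECORDS - next_record : BATCH;
    next_record += n;
    return n;
}

int main(void) {
    int consumed[2] = {0, 0};
    int speed[2] = {3, 1};    /* consumer 0 is 3x faster than consumer 1 */

    /* Round-based simulation: each round, a consumer pulls batches in
     * proportion to its speed, i.e. whenever it is ready for more work. */
    int n;
    for (;;) {
        int progress = 0;
        for (int c = 0; c < 2; c++)
            for (int s = 0; s < speed[c]; s++)
                if ((n = claim_batch()) > 0) { consumed[c] += n; progress = 1; }
        if (!progress) break;
    }
    printf("fast consumer got %d records, slow consumer got %d\n",
           consumed[0], consumed[1]);
    return 0;
}
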
31
Performance Scaling
  • Allows more data to go to the faster consumer

32
Driving Applications
  • Inktomi Search Engine
  • World-record disk-to-disk sort
  • RSA 40-bit key
  • IRAM simulations, Turbulence, AMR, Lin. Alg.
  • Parallel image processing
  • Protocol verification, Tempest, Bio, Global
    Climate. . .
  • Multimedia work drove network-aware transcoding
    services on demand
  • Parallel Software-only Video Effects
  • TACC (transcoding) Proxy
  • Transcend
  • Wingman
  • MBONE media gateway

33
Transcend Transcoding Proxy
(Diagram: service requests, front-end service threads, user profile database,
manager, caches, and physical processors.)
  • The application provides services to clients
  • It grows and shrinks according to demand,
    availability, and faults

34
UCB CSCW Class
Sigh... no multicast, no bandwidth, no CSCW class...
Problem: Enable heterogeneous sets of participants
to seamlessly join MBone sessions.
35
A Solution: Media Gateways
  • Software agents that enable local processing
    (e.g. transcoding) and forwarding of source
    streams.
  • Offer the isolation of a local rate controller
    for each source stream (see the sketch below).
  • Controlling bandwidth allocation and format
    conversion to each source prevents link
    saturation and accommodates heterogeneity.

(Diagram: media gateways (GW) forwarding source streams.)
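A minimal sketch of a per-source rate controller, in C, using a token bucket; the real gateway also performs format conversion and forwarding, and the rates shown are assumptions for illustration.

/* Per-source rate controller sketch (illustrative token bucket). Each source
 * stream gets its own bucket, so one misbehaving source cannot saturate the
 * outgoing link. */
#include <stdio.h>

typedef struct {
    double tokens;        /* current allowance, in bytes     */
    double rate;          /* refill rate, bytes per second   */
    double burst;         /* maximum bucket depth, in bytes  */
    double last_time;     /* time of last update, in seconds */
} rate_controller_t;

/* Returns 1 if a packet of `bytes` may be forwarded at time `now`. */
static int rc_allow(rate_controller_t *rc, double now, int bytes) {
    rc->tokens += (now - rc->last_time) * rc->rate;
    if (rc->tokens > rc->burst) rc->tokens = rc->burst;
    rc->last_time = now;
    if (rc->tokens >= bytes) { rc->tokens -= bytes; return 1; }
    return 0;             /* over budget: drop or transcode to a lower rate */
}

int main(void) {
    /* One controller per source stream; allow ~128 kB/s with 16 kB bursts. */
    rate_controller_t src = { .tokens = 16000, .rate = 128000,
                              .burst = 16000, .last_time = 0.0 };
    for (int i = 0; i < 5; i++) {
        double now = i * 0.01;                    /* a packet every 10 ms */
        printf("t=%.2fs packet %s\n", now,
               rc_allow(&src, now, 8000) ? "forwarded" : "dropped");
    }
    return 0;
}
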
36
A Solution: Media Gateways
Sigh... no multicast, no bandwidth, no MBone...
AHA!
MBone
Media GW
37
FIAT LUX: Bringing it all together
  • Combines
  • Image Based Modeling and Rendering,
  • Image Based Lighting,
  • Dynamics Simulation and
  • Global Illumination in a completely novel fashion
    to achieve unprecedented levels of scientific
    accuracy and realism
  • Computing Requirements
  • 15 days' worth of time for development
  • 5 days for rendering the final piece
  • 4 days for rendering at HDTV resolution on 140
    processors
  • Storage
  • 72,000 Frames, 108 Gigabytes of storage
  • 7.2 Gigs after motion blur
  • 500 MB JPEG
  • Premiered at the SIGGRAPH 99 Electronic Theater
  • http://fiatlux.berkeley.edu/