1
SimMillennium and Beyond: From Computer Systems,
Computational Science and Engineering in the
Large to Petabyte Stores
  • David Culler,
  • NSF Site Visit
  • March 5, 2003

2
SimMillennium Project Goals
  • Vision: To work, think, and study in a
    computationally rich environment with deep
    information stores and powerful services
  • Enable major advances in Computational Science
    and Engineering
  • Simulation, Modeling, and Information Processing
    becoming ubiquitous
  • Explore novel design techniques for large,
    complex systems
  • Fundamental Computer Science problems ahead are
    problems of scale
  • Organized in concert with Univ. structure ⇒
    computational economy
  • Develop fundamentally better ways of
    assimilating and interacting with large volumes
    of information and with each other
  • Explore emerging technologies
  • networking, OS, devices

3
Research Infrastructure We Built
  • Cluster of Clusters (CLUMPS) distributed over
    multiple departments
  • gigabit ethernet within and between
  • Myrinet High speed interconnect
  • Vineyard Cluster System Architecture
  • Rootstock remote cluster installation tools
  • Ganglia remote cluster monitoring
  • GEXEC remote execution, GM (Myricom) messaging,
    MPI
  • PCP parallel file tools
  • collection of port daemons, tools to make it all
    hand together
  • Gigabit to desktop, immersadesk, ...

4
(No Transcript)
5
Cluster Counts
  • Millennium Central Cluster
  • 99 Dell 2300/6400/6450 Xeon duals/quads: 336
    processors
  • Total 238 GB memory, 2 TB disk
  • Myrinet 2000, 1000 Mb fiber ethernet
  • Millennium Campus Clusters (Astro, Math, CE, EE,
    Physics, Bio)
  • 176 proc, 34 GB mem, 1.2 TB local disk
  • total 512 proc, 292 GB mem, 3.2 TB scratch
  • NPACI ROCKS Cluster
  • 8 proc, 2 GB mem, 36 GB disk
  • OceanStore/ROC cluster
  • PlanetLab Cluster
  • 6 proc, 1.32 GHz, 3 GB mem, 180 GB disk
  • CITRIS Cluster 1: 3/2002 deployment (Intel
    Donation)
  • 4 Dell Precision 730 Itanium duals: 8 processors
  • Total 8 GB memory, 128 GB disk
  • Myrinet 2000, 1000 Mb copper ethernet (SimMil)
  • CITRIS Cluster 2: deployment (Intel Donation)
  • ~128 Dell McKinley-class duals: 256 processors
  • 16x2 installed

6
Cluster Top Users 2/2003
http://ganglia.millennium.berkeley.edu
  • 800 users total on central cluster
  • 84 major users for 2/2003; average 62% total CPU
    utilization
  • ROC: middle-tier storage layer
    testing/performance (bling, ach, fox@stanford)
  • Computer Vision Group: image recognition,
    boundary detection and segmentation, data mining
    (aberg, lwalk, dmartin, ryanw, xren); 2 hours on
    the cluster vs. 2 weeks on local resources
  • Computational Biology Lab: large-scale
    biological sequence database searches in parallel
    (brenner@compbio)
  • Tempest: TCAD tools for next-generation
    lithography (yunfei)
  • Internet services: performance characteristics
    of multithreaded servers (jrvb, jcondit)
  • Sensor networks: power reduction (vwen)
  • Economic modeling (stanton@haas)
  • Machine learning: information retrieval, text
    processing (blei)
  • Analyzing trends in BGP routing tables (sagarwal,
    mccaesar)
  • Graphics: optical simulation and high-quality
    rendering (adamb, csh)
  • Digital Library Project: image retrieval by
    image content (loretta)
  • Bottleneck analysis of fine-grain parallelism
    (bfields)
  • SPUR: earthquake simulation (jspark@ce)
  • Titanium: compiler and runtime system design for
    high-performance parallel programming languages
    (bonachea)
  • AMANDA: neutrino detection from polar ice core
    samples (amanda)

7
Impact
  • Numerous groups doing research they could not
    have done without it
  • Malik: photorealistic rendering, physics
    simulation, ...
  • Yelick: Titanium, heart modeling, ...
  • Wilensky: Digital Library, image segmentation
  • Brewer, Culler: Ninja Internet service
    architecture ...
  • Price: AMANDA, ...
  • Kubiatowicz: OceanStore; Katz: Sahara;
    Hellerstein: PIER
  • First eScience portals
  • Tempest (EUV lithography) and Sugar (MEMS)
    simulation services
  • safe.millennium.berkeley.edu on Sept 11
  • built within hours, scaled to a million hits per
    day
  • CS267: core of the MS in computational science
  • Cluster tools widely adopted
  • NPACI ROCKS
  • Ganglia: the most downloaded cluster tool,
    included in all the distributions (OSCAR), with an
    open-source development team

8
Computational Economy
  • Developed economic-based resource allocation
  • decentralized design
  • interactive and batch
  • Advanced the state of the art
  • controlled experiments with priced and unpriced
    clusters
  • analysis of utility gain relative to traditional
    resource allocation algorithms
  • Picked up in several other areas
  • index pricing of internet bandwidth
  • iceberg pricing in the telco/internet convergence
  • core to internet design for planetary scale
    services
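The slides do not spell out the allocation mechanism itself, so the following is only an illustrative sketch of one common economic approach, proportional-share scheduling: users spend credits to bid for the cluster, and each interval's capacity is divided in proportion to the bids. All names and numbers are hypothetical, not the SimMillennium design.

```python
# Illustrative sketch only: bid-based, proportional-share allocation of
# cluster CPU-hours.  This is a generic economic-scheduling idea, not
# the project's actual mechanism; names and numbers are made up.
from dataclasses import dataclass

@dataclass
class Bid:
    user: str
    credits: float   # credits the user is willing to spend this interval

def allocate(total_cpu_hours: float, bids: list[Bid]) -> dict[str, float]:
    """Split the interval's CPU-hours in proportion to each user's bid."""
    total_bid = sum(b.credits for b in bids)
    if total_bid == 0:
        return {b.user: 0.0 for b in bids}
    return {b.user: total_cpu_hours * b.credits / total_bid for b in bids}

if __name__ == "__main__":
    bids = [Bid("vision", 40.0), Bid("compbio", 25.0), Bid("titanium", 10.0)]
    # One day on a hypothetical 336-processor cluster.
    for user, hours in allocate(336 * 24, bids).items():
        print(f"{user:10s} {hours:8.1f} CPU-hours")
```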

9
Emergence of Planetary-Scale Services
  • In the past year Millennium became THE simulation
    engine for P2P
  • OceanStore, I3, Sahara, BGP alternatives, PIER
  • Ganglia was the technical enabler for PlanetLab
  • >100 machines at >50 sites in >8 countries
  • THE testbed for internet-scale systems research

10
Fundamental Bottleneck: Storage
  • Current storage hierarchy
  • based on NPACI reference
  • 3 TB local /scratch and /net/MMxx/scratch, 4-day
    deletion
  • 0.5 TB global NFS /work, 9-day deletion
  • inadequate BW and capacity
  • 4 TB /home and /project
  • uniform naming through automount
  • doesn't scale to cluster access
  • ⇒ augment capacity, BW, and metadata BW
  • we've been tracking cluster storage options since
    xFS on NOW and Tertiary Disk in 1995.
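The 4-day and 9-day deletion policies are just age-based purges of the scratch areas. A minimal sketch of such a purge follows; the path, the use of modification time, and the dry-run default are assumptions for illustration, not the cluster's actual purge script.

```python
# Minimal sketch of an age-based scratch purge (e.g. the 4-day /scratch
# policy).  Paths and policy details are assumptions, not the real script.
import os
import time

def purge_old_files(root: str, max_age_days: float, dry_run: bool = True):
    """Delete (or list) regular files under root older than max_age_days."""
    cutoff = time.time() - max_age_days * 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(path) < cutoff:
                    print(("would remove " if dry_run else "removing ") + path)
                    if not dry_run:
                        os.remove(path)
            except OSError:
                pass   # file vanished or is unreadable; skip it

if __name__ == "__main__":
    purge_old_files("/scratch", max_age_days=4)
```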

11
Another Cluster: a storage cluster
[Diagram: Millennium clusters, CITRIS clusters, and massive
storage clusters connected through a scalable GigE core and
a Myrinet SAN]
  • Designed for higher reliability
  • Avoid competition from on-going computation
  • Local disks heavily used as scratch
12
Initial Cluster Design with 3.5TB Distributed
File Store
[Diagram: 128 dual Itanium 2 compute nodes (1 TFlop, 1.6 TB
memory) and 2 frontend nodes on Myrinet 2000 and 1-gigabit
ethernet (Foundry 1500/8000 switches, uplink to the campus
core); 4 storage controllers and 2 metaservers attached to
3.5 TB of Fibre Channel storage]
13
Initial 3.5 TB Cluster Data Store
[Diagram: 2 meta servers and 4 storage controllers, each
controller fronting 864 GB of 36 GB 15K rpm fibre-channel
disks (BlueArc si8300 with 24 disks and growth room);
interconnects: fibre channel, gigabit ethernet, Myrinet]
14
Lustre: A High-Performance, Scalable, Distributed
File System for Clusters and Shared-Data
Environments
  • Progress since xFS
  • TruCluster, GPFS, pvfs, ...
  • need production quality
  • NAS is finally here
  • History: CMU, Seagate, Los Alamos, Sandia,
    Tri-Labs
  • Distributed filesystem replacing NFS
  • Object-based file storage
  • an object, like an inode, represents a file
  • Open-source development managed by Cluster File
    Systems, Inc.
  • Gaining wide acceptance for production
    high-performance computing
  • PNNL and LLNL
  • Los Alamos and Sandia Labs
  • HP support as part of its Linux cluster effort
  • Intel Enterprise Architecture Lab

15
Lustre Key Advantages
  • Open protocols and standards: Portals API, XML,
    LDAP
  • Runs on commodity PC hardware and 3rd-party OSTs
  • such as BlueArc
  • Uses commodity filesystems on OSTs
  • such as ext3, JFS, ReiserFS, and XFS
  • Scalable and efficient design splits:
  • (qty 2) metadata servers storing file system
    metadata
  • (up to 100) object storage targets storing files
  • to support up to 2000 clients
  • Flexible model for adding new storage to an
    existing Lustre file system
  • Metadata server failover

16
Lustre Functionality
[Diagram: clients talk to the meta servers (metadata
servers) for directory metadata and concurrency, and to the
storage controllers (object storage targets) for system and
parallel file I/O and file locking; meta servers and storage
controllers coordinate on recovery, file status, and file
creation]
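To make the split concrete: the metadata servers own the namespace and per-file layout, the object storage targets hold raw file objects, and clients stripe their I/O directly across the targets. The toy model below is only a conceptual sketch of that idea, not Lustre's real protocol or API; all class and method names are invented for illustration.

```python
# Toy conceptual model of Lustre's split between metadata servers (MDS)
# and object storage targets (OSTs).  A sketch of the idea only.
from dataclasses import dataclass, field

@dataclass
class MetadataServer:
    """Owns the namespace: file creation and per-file object layout."""
    layouts: dict = field(default_factory=dict)   # path -> [(ost_id, object_id)]

    def create(self, path: str, num_osts: int, stripe_count: int = 2):
        layout = [(i % num_osts, f"{path}#{i}") for i in range(stripe_count)]
        self.layouts[path] = layout
        return layout

@dataclass
class ObjectStorageTarget:
    """Stores raw file objects; knows nothing about pathnames."""
    objects: dict = field(default_factory=dict)   # object_id -> bytes

class Client:
    """Asks the MDS for a layout, then does file I/O directly to the OSTs."""
    def __init__(self, mds, osts):
        self.mds, self.osts = mds, osts

    def write(self, path: str, data: bytes):
        layout = self.mds.create(path, num_osts=len(self.osts))
        stripe = len(data) // len(layout) + 1
        for i, (ost_id, obj_id) in enumerate(layout):
            self.osts[ost_id].objects[obj_id] = data[i * stripe:(i + 1) * stripe]

    def read(self, path: str) -> bytes:
        layout = self.mds.layouts[path]
        return b"".join(self.osts[ost_id].objects[obj_id] for ost_id, obj_id in layout)

if __name__ == "__main__":
    mds = MetadataServer()
    osts = [ObjectStorageTarget() for _ in range(4)]
    client = Client(mds, osts)
    client.write("/work/results.dat", b"simulation output" * 4)
    assert client.read("/work/results.dat") == b"simulation output" * 4
```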
17
Growth Plan
  • based on a conservative 50% per year density
    improvement
  • expect capacity to roughly double each year

Year   Capacity   Storage servers (SS)   Meta servers (MS)
y03    3.5 TB     4                      2
y04    8 TB       6                      3
y05    14 TB      8                      3
y06    23 TB      8                      3
y07    35 TB      8                      3
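A quick sanity check on the "roughly double" expectation, under the assumption that the 50%/year figure is per-disk density and the server counts follow the table above (first step, y03 to y04, shown):

```latex
% First growth step (y03 -> y04), assuming the 50%/year figure is
% per-disk density and storage servers grow from 4 to 6 as planned.
\[
3.5\,\mathrm{TB}
  \times \underbrace{1.5}_{\text{density}}
  \times \underbrace{\tfrac{6}{4}}_{\text{more servers}}
  \approx 7.9\,\mathrm{TB} \approx 8\,\mathrm{TB}
\]
```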
18
Example Projects
  • Cluster monitoring trace
  • ¼ TB per year for 300 nodes
  • ROC failure data
  • ¼ TB per year, much higher if we get industrial
    feeds
  • Digital Library
  • Video
  • 100 GB/hour uncompressed
  • Vision
  • 100 GB per experiment
  • PlanetLab
  • internet-wide instrumentation and logging
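For scale, dividing the initial 3.5 TB store by the uncompressed-video rate quoted above shows how quickly a single project can fill it (no assumptions beyond the numbers on this slide):

```latex
% Hours of uncompressed video the initial 3.5 TB store can hold,
% at the 100 GB/hour rate quoted above.
\[
\frac{3.5\,\mathrm{TB}}{100\,\mathrm{GB/hour}}
  = \frac{3500\,\mathrm{GB}}{100\,\mathrm{GB/hour}}
  = 35\ \text{hours}
\]
```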

We will look back and say, we are doing
research today that we could not have done
without this
19
End of the Tape Era
20
Emergence of the Sensor Net Era
  • 100s of research groups and companies using the
    Berkeley Mote / TinyOS platform
  • dozens of projects on campus
  • billions of networked devices connected to the
    physical world constantly streaming data
  • ⇒ start building the storage and processing
    infrastructure for this new class of system today!

21
Environment Monitoring Experience
  • Canonical patch net architecture
  • live and historical readings: www.greatduckisland.net
  • 43 nodes, 7/13-11/18
  • above and below ground
  • light, temperature, relative humidity, and
    occupancy data, at 1 minute resolution
  • >1 million measurements
  • Best nodes: 90,000
  • 3 major maintenance events
  • node design and packaging in harsh environment
  • -20 to 100 degrees, rain, wind
  • power mgmt and interplay with sensors and
    environment
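A back-of-the-envelope check on those figures, assuming the 7/13 to 11/18 window is about 128 days of 1-minute sampling per node:

```latex
% Possible readings per node over ~128 days at one reading per minute,
% and the implied data yields for the best node and the average node.
\[
128\ \text{days} \times 1440\ \tfrac{\text{readings}}{\text{day}}
  \approx 184{,}000\ \text{possible readings per node}
\]
\[
\text{best node: } \tfrac{90{,}000}{184{,}000} \approx 49\%,
\qquad
\text{average (}\geq 1\,\text{M total over 43 nodes): }
  \tfrac{23{,}000}{184{,}000} \approx 13\%
\]
```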

22
Sample Results
[Figure panels: node lifetime and utility; effective
communication phase; packet loss correlation]