SCD Update - PowerPoint PPT Presentation


Title: SCD Update


1
SCD Update
  • Tom Bettge
  • Deputy Director
  • Scientific Computing Division
  • National Center for Atmospheric Research
  • Boulder, CO USA

User Forum 17-19 May 2005
2
NCAR/SCD
[Chart: Position (1 to 350) by Year; labels: 1996 Procurement, IBM Power3, IBM Power4]
3
(No Transcript)
4
SCD Update
  • Production HEC Computing
  • Mass Storage System
  • Services
  • Server Consolidation and Decommissions
  • Physical Facility Infrastructure Update
  • Future HEC at NCAR

5
News: Production Computing
  • Redeployed SGI 3800 as Data Analysis engine
  • chinook became tempest
  • departure of dave
  • IBM Power 3 blackforest decommissioned Jan 2005
  • Loss of 2.0 Tflops of peak computing capacity
  • IBM Linux Cluster lightning joined production
    pool
  • March 2005
  • Gain of 1.1 Tflops of peak computing capacity
  • 256 processors (128 dual node configuration)
  • 2.2 GHz AMD Opteron processors
  • 6 TByte FastT500 RAID with GPFS
  • 40% faster than bluesky (1.3 GHz POWER4) cluster
    on parallel POP and CAM simulations
  • 3rd party vendor compilers
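
As a rough cross-check of the 1.1 Tflops figure above, the sketch below multiplies processor count, clock rate, and floating-point operations per cycle. The value of 2 double-precision flops per cycle per Opteron processor is an assumption that is not stated on the slide, and the function name is purely illustrative.

```python
def peak_tflops(processors: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak in Tflops: processors x clock (GHz) x flops per cycle."""
    return processors * clock_ghz * flops_per_cycle / 1000.0


# lightning: 256 processors at 2.2 GHz (both figures from the slide).
# ASSUMPTION: 2 double-precision flops per cycle per processor.
lightning = peak_tflops(256, 2.2, 2)

# Net change in peak capacity: lightning gained, blackforest (2.0 Tflops) lost.
net_change = lightning - 2.0

print(f"lightning peak: {lightning:.2f} Tflops")
print(f"net change:     {net_change:+.2f} Tflops")
```

Under that assumption the estimate comes out at roughly 1.1 Tflops, in line with the slide, and the net change in peak capacity after retiring blackforest is negative.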

6
Resource Usage FY04
  • At the end of FY04, the combined supercomputing
    capacity at NCAR was 11 TFLOPs
  • Roughly 81% of that capacity was used for
    climate simulation and analysis (Climate IPCC)

7
bluesky Workload by Facility
April 2005
8
Computing Demand
  • Science Driving Demand for Scientific Computing

  • Summer 2004: CSL Requests 1.5x Availability
  • Sept 2004: NCAR Requests 2x Availability
  • Sept 2004: University Requests 3x Availability
  • March 2005: University Requests 1.7x Availability
9
Computational Campaigns
  • BAMEX Spring 2003
  • IPCC FY 2004
  • MMM Spring Real-Time Forecasts Spring 2004
  • WRF Real-Time Hurricane Forecast Fall 2004
  • DTC Winter Real-Time Forecasts Winter 2004-2005
  • MMM Spring Real-Time Forecast Spring 2005
  • MMM East Pacific Hurricane Formation July 2005

10
bluesky 8-way
11
bluesky 32-way
12
Servicing the Demand: NCAR Computing Facility
  • SCD's supercomputers are well utilized ...
  • ... yet average job queue-wait times are
    measured in hours (was minutes in 04), not days

April 2005 average
13
Average bluesky Queue-Wait Times (HH:MM)
14
bluesky Queue Wait Times
  • blackforest removed
  • lightning charging did not start until March 1
  • Corrective (minor) actions taken
  • Disallow batch node_usage=shared jobs
  • Increase utility of the shared nodes (4 nodes,
    128 PEs)
  • Shift the facility split (CSL/Community) from
    50/50 to 45/55
  • More accurately reflects the actual allocation
    distribution
  • Reduce premium charge from 2.0x to 1.5x
  • Encourage use of premium if needed for critical
    turnaround
  • Have reduced NCAR 30-day allocation limit from
    130% to 120%
  • Matches other groups (leveled playing field)
  • SCD is watching closely
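
The effect of the premium charge reduction can be illustrated with a small calculation. This is only a sketch: it assumes the premium factor is a simple multiplier on a job's base charge, which the slide does not spell out, and the 100-GAU base charge is a made-up figure.

```python
def charged_gaus(base_gaus: float, premium_factor: float) -> float:
    """ASSUMPTION: premium charge = base charge x premium factor."""
    return base_gaus * premium_factor


base = 100.0  # hypothetical base charge for one job, in GAUs

old = charged_gaus(base, 2.0)  # premium factor before the change (from the slide)
new = charged_gaus(base, 1.5)  # premium factor after the change (from the slide)
saving = (old - new) / old     # fractional reduction in the premium charge

print(f"old: {old:.0f} GAUs, new: {new:.0f} GAUs, reduction: {saving:.0%}")
```

If the multiplier model holds, a premium job costs a quarter less than before, whatever its base charge.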

15
Average Compute Factor per GAU Charged
[Chart: x-axis from Jan 1 to May 1, 2005]
16
Mass Storage System
17
(No Transcript)
18
Mass Storage System
  • Disk cache expanded to service files 100MB
  • 60% of files this size being read from cache,
    not tape mount
  • Deployment of 200GB cartridges (previously 60GB)
  • Now over 500TB of data on these cartridges
  • Drives provide 3x increase in transfer rate
  • Full silo holds 1.2 PB; 5 silos hold 6 PB of
    data
  • Users have recently moved to single copy class of
    service (motivated by GAU compute charges)
  • Embarking on project to address future MSS growth
  • Manageable growth rate
  • User management tools (identify, remove, etc)
  • User access patterns / User Education (archive
    selectively, tar)
  • Compression
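
The capacity figures on this slide can be checked against one another with a few lines of arithmetic. The sketch below assumes decimal units (1 PB = 1,000,000 GB) and that a full silo holds only 200GB cartridges, neither of which is stated on the slide. The variable names are illustrative.

```python
GB_PER_TB = 1_000      # ASSUMPTION: decimal units
GB_PER_PB = 1_000_000  # ASSUMPTION: decimal units

cartridge_gb = 200  # new cartridge capacity (from the slide)
silo_pb = 1.2       # capacity of one full silo (from the slide)
silos = 5           # number of silos (from the slide)

# Total capacity across all silos.
total_pb = silo_pb * silos

# ASSUMPTION: a full silo is populated entirely with 200GB cartridges.
cartridges_per_silo = silo_pb * GB_PER_PB / cartridge_gb

# Minimum number of full cartridges needed to hold 500TB.
cartridges_for_500tb = 500 * GB_PER_TB / cartridge_gb

print(f"total capacity:       {total_pb:.1f} PB")
print(f"cartridges per silo:  {cartridges_per_silo:.0f}")
print(f"cartridges for 500TB: {cartridges_for_500tb:.0f}")
```

The total matches the 6 PB quoted on the slide. The per-silo cartridge count is only as good as the assumptions stated above.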

19
SCD Customer Support
  • Consistent with SCD Reorganization
  • Phased Deployment
  • Dec 2004 - May 2005
  • Advantages
  • Enhanced service: Computer Production Group 24/7
  • Effectively utilize other SCD groups in customer
    support
  • Easier questions handled sooner
  • Harder questions routed to correct group sooner
  • Feedback Plan

SCD will provide a balanced set of services to
enable researchers to easily and effectively
utilize community resources.
20
Server Decommissions
  • MIGS: MSS access from remote sites
  • Decommission April 12, 2005
  • Other contemporary methods now available
  • IRJE: job submittal to supers (firewall made
    obsolete)
  • Decommissioned March 21, 2005
  • Front-End Server Consolidation to single new
    server over next few months
  • UCAR front-end Sun server (meeker)
  • UCAR front-end Linux server (longs)
  • Joint SCD/CSS Sun computational server (k2)
  • SCD front-end Sun server (niwot)

21
Physical Facility Infrastructure Update
  • Chilled water upgrade continues
  • Brings cooling up to power capacity of data
    center.
  • Startup of new chiller went flawlessly on March
    15th
  • May 19-22: Last planned shutdown
  • Stand-By Generators proved themselves again
    during outage March 13th, and Xcel power drops
    April 29
  • Design phase of planning electrical distribution
    upgrades to be completed by late 2005
  • Risk assessment identified concerns about
    substation 3
  • Power to data center (station is near lifetime
    limit)
  • Additional testing completed Feb. 26th
  • Awaiting report

22
Future Plans for HEC at NCAR
23
SCD Strategic Plan: High-End Computing
  • Within the current funding envelope, achieve a
    25-fold increase over current sustained computing
    capacity in five years.
  • SCD intends as well to pursue opportunities
    for substantial additional funding for
    computational equipment and infrastructure to
    support the realization of demanding
    institutional science objectives.
  • SCD will continue to investigate and acquire
    experimental hardware and software systems.
  • IBM BlueGene/L

1Q2005
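
To put the 25-fold target in perspective, the sketch below computes the annual growth factor it implies. It assumes capacity grows by the same factor every year, which is a simplification: the plan does not say how the increase would be phased, and real procurements arrive in steps.

```python
target_factor = 25.0  # 25-fold increase (from the slide)
years = 5             # five years (from the slide)

# ASSUMPTION: capacity grows by the same factor every year.
annual_factor = target_factor ** (1.0 / years)

print(f"required annual growth factor: {annual_factor:.2f}x")
for year in range(1, years + 1):
    print(f"after year {year}: {annual_factor ** year:5.1f}x current capacity")
```

Under that assumption, sustained capacity would need to nearly double each year, growing by a factor of about 1.9.
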
24
SCD Target Capacity
25
Challenges in Achieving 2006-2007 Goals
  • Capability vs. Capacity
  • Costs (price performance)
  • Need/Desire for Capability Computing (define!)
  • Balance within center of capability and capacity.
    How?
  • NCAR/SCD fixed income
  • Business Plans
  • Evaluating Year 5 Option with IBM
  • Engaging vendors to informally analyze SCD
    Strategic Plan for HEC
  • Likely to enter year-long procurement for 4Q2006
    deployment of additional capacity and capability

26
Beyond 2006
  • Data Center Limitations / Data Center Expansion
  • NCAR center limits of power/cooling/space will be
    reached with 2006 computing addition
  • New center requirements have been
    compiled/completed
  • Conceptual Design for new center is near
    completion
  • Funding options being developed with UCAR
  • Opportunity of NSF Petascale Computing Initiative
  • Commitment to balanced and sustained investment
    in robust cyberinfrastructure.
  • Supercomputing systems
  • Mass storage
  • Networking
  • Data Management Systems
  • Software Tools and Frameworks
  • Services and Expertise
  • Security

27
Scientific Computing Division Strategic Plan 2005-2009
to serve the computing, research and data
management needs of atmospheric and related
sciences.
www.scd.ucar.edu
28
Questions