NERSC Site Report - PowerPoint PPT Presentation

1
NERSC Site Report
  • HEPiX
  • October 20, 2003
  • TRIUMF

2
LBL, NERSC, and PDSF
  • LBL manages the NERSC Center for DOE
  • PDSF is the production Linux cluster at NERSC
    used primarily for HEP science
  • The site report touches on activities of interest
    to the HEPiX community at each of these levels

3
PDSF - New Hardware
  • 96 Dual Athlon Systems
  • 8 Storage Nodes - 18 TB formatted
  • All gigabit attached (Dell switches)
  • Purchased two Opteron systems for testing

4
PDSF Projects
  • HostDB - presentation later
  • Sun GridEngine evaluation
    • Met all requirements (long list)
    • Being put into semi-production on retired nodes
  • Grid certificate DN kernel module
  • 1-Wire-based monitoring and control network
  • High-availability server
    • Uses heartbeat code
    • IDE-based Fibre Channel array
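
The slide says only that the server "uses heartbeat code". The idea behind heartbeat-based failover is that a standby node listens for periodic messages from the active node and takes over the service once none has arrived within a dead-time window. A minimal Python sketch of that decision, assuming a single peer and a fixed dead time; `HeartbeatMonitor`, `deadtime`, and `FakeClock` are hypothetical names, not taken from any heartbeat package:

```python
import time
from typing import Callable, Optional


class HeartbeatMonitor:
    """Hypothetical sketch of heartbeat-based failure detection for one peer."""

    def __init__(self, deadtime: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.deadtime = deadtime      # seconds of silence before the peer is presumed dead
        self.clock = clock            # injectable clock so the logic can be exercised in tests
        self.last_seen: Optional[float] = None

    def beat(self) -> None:
        """Record that a heartbeat from the peer has just arrived."""
        self.last_seen = self.clock()

    def peer_alive(self) -> bool:
        """True while the most recent heartbeat is no older than `deadtime`."""
        if self.last_seen is None:
            return False              # no heartbeat seen yet: treat the peer as down
        return self.clock() - self.last_seen <= self.deadtime

    def should_take_over(self) -> bool:
        """The standby node takes over the service once the peer looks dead."""
        return not self.peer_alive()


class FakeClock:
    """Manually advanced clock used to demonstrate the monitor."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


if __name__ == "__main__":
    clock = FakeClock()
    mon = HeartbeatMonitor(deadtime=10.0, clock=clock)
    mon.beat()                        # peer sends a heartbeat at t = 0 s
    clock.now = 5.0
    print(mon.should_take_over())     # 5 s of silence is within the 10 s dead time
    clock.now = 12.0
    print(mon.should_take_over())     # 12 s of silence exceeds the dead time
```

Treating a never-heard-from peer as down is one design choice; real deployments usually allow a longer initial grace period at startup.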

5
PDSF - Other news
  • Aztera
    • Zambeel has folded
    • StorAd is making a best-effort attempt to support
      the system
  • New user groups
    • KamLAND
    • e896
    • ALICE

6
IBM SP
  • Upgraded
    • 208 nodes added (16-way Nighthawk II)
    • Additional 20 TB of disk
  • Total system
    • 10 Tflop/s peak
    • 7.8 TB memory
    • 44 TB of GPFS storage

7
Mass Storage
  • Hardware
    • New DataDirect disk cache
    • New tape drives allow high-capacity cartridges
      (200 GB)
  • Software
    • Currently running HPSS 4.3
    • Testing HPSS 5.1
  • Testing
    • DMAPI
    • htar command

8
Grid Activities
  • GridFTP and gatekeeper deployed on all
    production systems (the gatekeeper on Seaborg
    is coming soon)
  • Integrating the account management system with
    grid certificates
    • Testing a MyProxy-based system
  • Portal
    • Web interface to HPSS
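
The slide does not say how certificates are tied to accounts. A common Globus convention is the grid-mapfile, in which each line maps a quoted certificate subject DN to one or more comma-separated local account names. A minimal Python sketch of parsing such a file, assuming that format; `parse_grid_mapfile`, `local_account`, and the sample DNs are hypothetical:

```python
import re
from typing import Dict, List, Optional

# A quoted subject DN, whitespace, then a comma-separated list of local accounts.
_ENTRY = re.compile(r'^"(?P<dn>[^"]+)"\s+(?P<accounts>\S.*)$')


def parse_grid_mapfile(text: str) -> Dict[str, List[str]]:
    """Map each certificate DN to its list of local account names."""
    mapping: Dict[str, List[str]] = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                                   # skip blank lines and comments
        m = _ENTRY.match(line)
        if m is None:
            raise ValueError(f"line {lineno}: malformed entry {raw!r}")
        accounts = [a.strip() for a in m.group("accounts").split(",") if a.strip()]
        mapping[m.group("dn")] = accounts
    return mapping


def local_account(mapping: Dict[str, List[str]], dn: str) -> Optional[str]:
    """Return the default (first) local account for a DN, or None if unmapped."""
    accounts = mapping.get(dn)
    return accounts[0] if accounts else None


if __name__ == "__main__":
    sample = '''
# hypothetical entries
"/DC=org/DC=doegrids/OU=People/CN=Jane Doe 123456" jdoe
"/DC=org/DC=doegrids/OU=People/CN=John Roe 654321" jroe, starprod
'''
    table = parse_grid_mapfile(sample)
    print(local_account(table, "/DC=org/DC=doegrids/OU=People/CN=John Roe 654321"))
```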

9
Networking
  • Jumbo frame support to ESnet
    • Looking for other sites to test jumbo frames across the WAN
  • New production router (Juniper)
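
Why jumbo frames matter for bulk transfers can be shown with a little arithmetic. A minimal Python sketch, assuming a 9000-byte jumbo MTU (the slide gives no size), standard Ethernet framing, and IPv4/TCP headers without options; the function names are illustrative:

```python
# Assumed per-frame costs on a standard Ethernet link, in bytes.
ETH_OVERHEAD = 14 + 4 + 8 + 12   # header + FCS + preamble/SFD + inter-frame gap
IP_TCP_HEADERS = 20 + 20         # IPv4 + TCP headers, no options


def wire_bytes(mtu: int) -> int:
    """Bytes a full-size frame occupies on the wire for a given IP MTU."""
    return mtu + ETH_OVERHEAD


def tcp_payload_efficiency(mtu: int) -> float:
    """Fraction of wire bytes that carry TCP payload in full-size frames."""
    return (mtu - IP_TCP_HEADERS) / wire_bytes(mtu)


def frames_per_second(mtu: int, link_bps: float) -> float:
    """Full-size frames per second needed to saturate a link of link_bps."""
    return link_bps / (8 * wire_bytes(mtu))


if __name__ == "__main__":
    for mtu in (1500, 9000):       # standard vs. an assumed 9000-byte jumbo MTU
        print(f"MTU {mtu}: payload efficiency {tcp_payload_efficiency(mtu):.3f}, "
              f"{frames_per_second(mtu, 1e9):,.0f} frames/s at 1 Gb/s")
```

Under these assumptions payload efficiency rises only from about 94.9% to 99.1%, but the frame rate needed to fill a gigabit link drops roughly sixfold (about 81,000 to 13,800 frames/s), so per-frame processing on the end hosts falls accordingly.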

10
GUPFS (Global Unified Parallel File System)
  • Hardware testbed
    • 3Par Data
    • Yotta Yotta
    • Dell EMC
    • Dot Hill
    • DataDirect (soon)
    • Panasas
  • Interconnect hardware
    • Topspin (IB)
    • Infinicon (IB)
    • Cisco (iSCSI)
    • QLogic (iSCSI)
    • Adaptec (iSCSI)
    • Myrinet 2000
    • Various FC
  • Filesystems
    • ADIC license
    • GPFS license
    • GFS 5.2 license
    • Lustre
  • Test clients
    • Dual-processor 2.2 GHz Xeons
    • 2 GB memory
    • 2 PCI-X
    • Local HD for OS

11
Distributed Systems Dept.
  • Net100 (http://www.net100.org/): Built on Web100
    (PSC, NCAR, NCSA) and NetLogger (LBNL), Net100
    modifies operating systems to respond dynamically
    to network conditions and adjust network
    transfers, sending data as fast as the network
    will allow.
  • Self-Configuring Network Monitor (SCNM)
    (http://dsd.lbl.gov/Net-Mon/Self-Config.html):
    provides accurate, comprehensive, on-demand
    application-to-application monitoring
    capabilities throughout the interior of the
    interconnecting network domains.
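
One adjustment of the kind Net100 targets is sizing TCP buffers to the path's bandwidth-delay product: a static buffer smaller than that product caps a single stream's throughput regardless of link speed. A minimal Python sketch of the arithmetic; the 1 Gb/s link and 60 ms round-trip time are hypothetical values, not measurements from this report:

```python
def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill the path."""
    return bandwidth_bps * rtt_s / 8


def window_limited_bps(window_bytes: float, rtt_s: float) -> float:
    """Throughput ceiling of one TCP stream whose window is window_bytes."""
    return window_bytes * 8 / rtt_s


if __name__ == "__main__":
    link_bps = 1e9   # hypothetical 1 Gb/s wide-area path
    rtt_s = 0.060    # hypothetical 60 ms round-trip time
    print(f"buffer needed: {bdp_bytes(link_bps, rtt_s) / 1e6:.1f} MB")
    # A fixed 64 KiB buffer caps a single stream far below the link rate.
    print(f"64 KiB window ceiling: {window_limited_bps(65536, rtt_s) / 1e6:.1f} Mb/s")
```

With these example numbers the path needs about 7.5 MB in flight, while a 64 KiB buffer limits one stream to under 9 Mb/s; that mismatch is what dynamic buffer tuning is meant to remove.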

12
Distributed Systems (contd)
  • NetLogger (http://www-didc.lbl.gov/NetLogger/)
  • pyGlobus (http://dsd.lbl.gov/gtg/projects/pyGlobus/):
    a Python interface to the Globus Toolkit. The
    LIGO gravitational-wave experiment is using it to
    replicate TB/day of data around the US with the
    LIGO Data Replicator
    (http://www.lsc-group.phys.uwm.edu/LDR/)
  • DOEGrids.org: PKI for the DOE science community,
    part of a federation supporting international
    scientific collaborations
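
To put "TB/day" in perspective, moving a terabyte every 24 hours needs a sustained average of roughly 93 Mb/s before protocol overhead or retransmissions. A small Python calculation, assuming decimal terabytes (10^12 bytes); the function name is illustrative:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def sustained_rate_mbps(bytes_per_day: float) -> float:
    """Average network rate (megabits/s) needed to move bytes_per_day in 24 h."""
    return bytes_per_day * 8 / SECONDS_PER_DAY / 1e6


if __name__ == "__main__":
    # 1 TB taken as 10**12 bytes; a binary TiB would need about 10% more.
    print(f"{sustained_rate_mbps(1e12):.1f} Mb/s per TB/day")
```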

13
Repaired Hardware
  • Systems from 2000: widespread failure (half of 90
    systems)
  • Had the broken systems inspected by the LBL
    Electronics Shop
  • Discovered 4 bad capacitors (2)
  • Prepd systems can be repaired for 20/board
  • 16 systems repaired so far
  • Plan to eventually repair all systems from the batch