1
First stop: Long Island. The BNL RHIC/US ATLAS
Tier 1 Computing Center
SC2004, Pittsburgh, PA, November 2004
  • Christopher Hollowell, Dantong Yu
  • Physics Department
  • Brookhaven National Laboratory

2
Facility Overview
  • Created in the mid-1990s to provide centralized
    computing services for the RHIC experiments
  • Expanded our role in the late 1990s to act as
    the Tier 1 computing center for ATLAS in the
    United States
  • Ramping up resources provided to ATLAS; Data
    Challenge 2 (DC2) underway
  • RHIC Run 5 scheduled to begin in late December
    2004

3
Facility Overview (Cont.)
  • Storage resources
  • Disk storage and tape storage
  • Computing resources
  • Grid activities and network

4
Centralized Disk Storage
  • 37 NFS servers running Solaris 9 (recently
    upgraded from Solaris 8)
  • Underlying filesystems upgraded to VxFS 4.0
  • Issue with quotas on filesystems larger than 1 TB
    in size
  • 220 TB of fibre channel SAN-based RAID5 storage
    available; added 100 TB in the past year

5
Centralized Disk Storage (Cont.)
  • Scalability issues with NFS (network-limited to
    70 MB/s max per server; 75-90 MB/s max local
    I/O in our configuration); testing of new
    network storage models, including Panasas and
    IBRIX, in progress (see the throughput sketch below)
  • Panasas tests look promising: 4.5 TB of storage
    on 10 blades available for evaluation by our user
    community; DirectFlow client in use on over 400
    machines
  • Both systems allow for NFS export of data
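
As a rough illustration of the per-server bottleneck noted above, the sketch below combines the 37 NFS servers from the previous slide with the quoted 70 MB/s network ceiling. Only figures taken from these slides are used; the even-load assumption is an idealization, not a measurement from the facility.

```python
# Back-of-the-envelope NFS throughput estimate using the numbers quoted
# on these slides; assumes load is spread evenly across servers, which
# is an idealization of the real facility.

NFS_SERVERS = 37            # servers listed on the previous slide
NET_LIMIT_MB_S = 70         # per-server network ceiling (MB/s)
LOCAL_IO_MB_S = (75, 90)    # per-server local I/O range (MB/s)

aggregate_net = NFS_SERVERS * NET_LIMIT_MB_S
print(f"Ideal aggregate over the network: {aggregate_net} MB/s "
      f"({aggregate_net / 1024:.1f} GB/s)")

# Any single client or hot dataset is still capped by one server:
print(f"A single hot server still tops out at {NET_LIMIT_MB_S} MB/s, "
      f"even though its local I/O can reach {LOCAL_IO_MB_S[1]} MB/s")
```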

6
Centralized Disk Storage (Cont.)
7
Centralized Disk Storage AFS
  • Moving servers from Transarc AFS running on AIX
    to OpenAFS 1.2.11 on Solaris 9
  • The move from Transarc to OpenAFS motivated by
    Kerberos 4/Kerberos 5 issues and Transarc AFS end
    of life
  • Total of 7 fileservers and 6 DB servers; 2 DB
    servers and 2 fileservers running OpenAFS
  • 2 cells (a status-check sketch follows below)
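
For readers unfamiliar with the AFS pieces named above (cells, DB servers, fileservers), the sketch below checks cell and server status with the standard OpenAFS command-line tools. The server name is hypothetical; this is illustrative, not the facility's actual procedure.

```python
# Minimal sketch for checking which AFS cells a client knows about and
# whether a given server's OpenAFS processes are running, using the
# standard OpenAFS tools (fs, bos). The server name is hypothetical.
import subprocess

def afs_status(server="afsdb1.example.bnl.gov"):
    # Cells this client workstation knows about (includes the local cell)
    subprocess.run(["fs", "listcells"], check=True)
    # Status of the server processes (fileserver, volserver, ...) on one host
    subprocess.run(["bos", "status", server, "-noauth"], check=True)

if __name__ == "__main__":
    afs_status()
```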

8
Mass Tape Storage
  • Four STK Powderhorn silos provided, each capable
    of holding 6,000 tapes (see the capacity estimate
    below)
  • 1.7 PB of data currently stored
  • HPSS version 4.5.1; likely upgrade to version 6.1
    or 6.2 after RHIC Run 5
  • 45 tape drives available for use
  • Latest STK tape technology: 200 GB/tape
  • 12 TB disk cache in front of the system
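
Putting the silo, slot, and cartridge-capacity figures from this slide together gives a rough upper bound on raw tape capacity. The arithmetic below uses only the numbers quoted here and ignores compression, reserved slots, and any older lower-capacity media still in the silos.

```python
# Rough raw-capacity estimate for the tape system from the figures on
# this slide (ignores compression, reserved cleaning/scratch slots, and
# mixed tape generations).

SILOS = 4
TAPES_PER_SILO = 6000
GB_PER_TAPE = 200           # latest STK cartridge capacity quoted above
STORED_PB = 1.7

raw_capacity_pb = SILOS * TAPES_PER_SILO * GB_PER_TAPE / 1_000_000
print(f"Raw slot capacity: {raw_capacity_pb:.1f} PB")          # ~4.8 PB
print(f"Currently stored:  {STORED_PB} PB "
      f"({STORED_PB / raw_capacity_pb:.0%} of raw capacity)")
```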

9
Mass Tape Storage (Cont.)
  • PFTP, HSI, and HTAR available as interfaces (see
    the usage sketch below)
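
For readers unfamiliar with these HPSS clients, the sketch below shows how HSI and HTAR are typically driven from a script. The HPSS paths are hypothetical and the exact option set depends on the site installation, so treat this as illustrative rather than as the facility's actual workflow.

```python
# Illustrative use of the HSI and HTAR command-line clients for HPSS.
# Paths are hypothetical; real invocations depend on site configuration
# and authentication setup.
import subprocess

# List an HPSS directory with HSI
subprocess.run(["hsi", "ls -l /home/user/data"], check=True)

# Store a single file into HPSS with HSI (local name : HPSS name)
subprocess.run(["hsi", "put results.root : /home/user/data/results.root"],
               check=True)

# Bundle a whole directory into a tar archive written directly into HPSS
subprocess.run(["htar", "-cvf", "/home/user/data/run5.tar", "run5/"],
               check=True)
```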

10
LINUX Farm
  • Farm of 1423 dual-CPU (Intel) systems
  • Added 335 machines this year
  • 245 TB local disk storage (SCSI and IDE)
  • Upgrade of RHIC Central Analysis Servers/Central
    Reconstruction Servers (CAS/CRS) to Scientific
    Linux 3.0.2 (updates) underway; should be
    complete before the next RHIC run

11
LINUX Farm (Cont.)
  • LSF (5.1) and Condor (6.6.6/6.6.5) batch systems
    in use; upgrade to LSF 6.0 planned (see the
    submission sketch below)
  • Kickstart used to automate node installation
  • Ganglia and custom software used for system
    monitoring
  • Retiring 142 VA Linux 2U PIII 450 MHz systems
    after the next purchase
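
As a reminder of how jobs reach a farm like this through the two batch systems named above, here is a minimal submission sketch. The queue name and executable are hypothetical, and site policies at BNL may wrap these commands differently.

```python
# Minimal job-submission sketch for the two batch systems mentioned above.
# Queue names and executables are hypothetical; real submissions at the
# facility go through site-specific wrappers and policies.
import subprocess

# LSF: submit a job to a (hypothetical) queue, capturing output per job ID
subprocess.run(
    ["bsub", "-q", "cas_short", "-o", "reco.%J.out", "./run_reco.sh"],
    check=True,
)

# Condor: check what is currently queued for this user
subprocess.run(["condor_q"], check=True)
```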

12
LINUX Farm (Cont.)
13
CAS/CRS Farm (Cont.)
14
Grid Activities
  • Brookhaven planning on upgrading external network
    connectivity to OC48 (2.488 Gbps) from OC12 (622
    Mbps) to support ATLAS activity
  • DOE recently funded us to investigate advanced
    network technologies such as (G)MPLS and QoS
  • ATLAS Data Challenge 2 jobs submitted via Grid3
  • GUMS (Grid User Management System)
  • Generates grid-mapfiles for gatekeeper hosts (see
    the sketch below)
  • In production since May 2004
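
GUMS itself is a service, but the mapping it maintains ends up in ordinary grid-mapfiles on the gatekeepers. The sketch below writes a few lines in that format from a DN-to-account table, purely to illustrate what such a file looks like; the DNs and account names are invented, and this is not GUMS code.

```python
# Illustration of the grid-mapfile format that GUMS generates for the
# gatekeeper hosts: each line maps a certificate DN to a local account.
# The DNs and account names below are invented; this is not the GUMS
# implementation itself.

dn_to_account = {
    "/DC=org/DC=doegrids/OU=People/CN=Example User 12345": "usatlas1",
    "/DC=org/DC=doegrids/OU=People/CN=Another User 67890": "usatlas2",
}

with open("grid-mapfile", "w") as f:
    for dn, account in dn_to_account.items():
        f.write(f'"{dn}" {account}\n')
```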

15
Storage Resource Manager (SRM)
  • SRM middleware providing dynamic storage
    allocation and data management services
  • Automatically handles network/space allocation
    failures
  • HRM (Hierarchical Resource Manager)-type SRM
    server in production
  • Accessible from within and outside the facility
  • 350 GB Cache
  • Berkeley HRM 1.2.1

16
dCache
  • Provides global name space over disparate storage
    elements
  • Hot spot detection
  • Client software: data access through the libdcap
    library or the libpdcap preload library (see the
    sketch below)
  • ATLAS and PHENIX dCache pools
  • PHENIX pool: expanding performance tests to
    production machines
  • ATLAS pool: interacting with HPSS using HSI; no
    way of throttling data transfer requests as of yet
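
For context on the client side mentioned above, dCache data can be read either through the dcap protocol directly (via libdcap or the libpdcap preload library) or by copying files out with the dccp tool. The sketch below shows the copy route; the door host, port, and pnfs path are hypothetical and not tied to the BNL configuration.

```python
# Illustrative dCache access via the dccp client over the dcap protocol.
# The door host, port, and pnfs path are hypothetical; applications can
# instead link against libdcap (or preload libpdcap) and open dcap:// URLs
# with ordinary POSIX-style calls.
import subprocess

src = "dcap://dcache-door.example.bnl.gov:22125/pnfs/example/atlas/file.root"
dst = "/tmp/file.root"
subprocess.run(["dccp", src, dst], check=True)
```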

17
Conclusion
  • USATLAS facility
  • Participating in Grid development, deployment,
    and data challenge production
  • Providing the computing capacity required by
    ATLAS
  • Facility and staff continue to grow to meet
    LHC Tier 1 computing requirements
  • RHIC computing facility
  • Facility keeps growing (300 nodes coming
    in-house)
  • More experiments (STAR, PHENIX and PHOBOS) have
    adopted Grid technology for their data
    production

18
More Information
  • USATLAS
  • http://www.usatlas.bnl.gov
  • http://www.acf.bnl.gov
  • RHIC
  • http://www.rhic.bnl.gov/RCF