QCDOC: Project Status and First Results - PowerPoint PPT Presentation

1
QCDOC: Project Status and First Results
SciDAC 2005, June 30, 2005
Norman H. Christ, Columbia University
2
Outline
  • QCDOC Computer
    • Architecture
    • Construction
    • Software
    • Performance
    • SciDAC component
  • First QCDOC results
  • Overview of Lattice QCD
    • Current emphasis
      • Control/reduce errors
      • Extend reach
    • New ideas
      • Symanzik improvement
      • Chiral fermions
  • Targeted QCD Computers

3
Review of Lattice QCD
  • Introduce a space-time lattice.
  • Perform the Euclidean Feynman path integral.
  • A precise, non-perturbative formulation, capable of numerical evaluation.
  • Evaluate using Monte Carlo importance sampling, with hybrid molecular dynamics/Langevin evolution.
  • The space-time formulation is used directly, so the problem is easily mounted on a parallel computer.
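The importance-sampling step above can be illustrated with a minimal Metropolis sketch for a toy 1-D free scalar field; the lattice size, mass parameter, and proposal width below are illustrative assumptions, not QCDOC production parameters.

```python
import math
import random

random.seed(7)

N = 32              # illustrative 1-D lattice size
phi = [0.0] * N     # scalar field, one value per site
m2 = 0.5            # illustrative mass-squared parameter

def action(f):
    # Euclidean action: nearest-neighbor kinetic term plus mass term
    s = 0.0
    for i in range(N):
        dphi = f[(i + 1) % N] - f[i]      # periodic boundary conditions
        s += 0.5 * dphi * dphi + 0.5 * m2 * f[i] * f[i]
    return s

def metropolis_sweep(f, step=0.5):
    # Propose a local change at each site; accept with probability exp(-dS)
    accepted = 0
    for i in range(N):
        old = f[i]
        s_old = action(f)                 # O(N) per site: fine for a sketch
        f[i] = old + random.uniform(-step, step)
        d_s = action(f) - s_old
        if d_s <= 0 or random.random() < math.exp(-d_s):
            accepted += 1                 # keep the new value
        else:
            f[i] = old                    # reject: restore the old value
    return accepted / N

rates = [metropolis_sweep(phi) for _ in range(200)]
print(f"mean acceptance: {sum(rates) / len(rates):.2f}")
```

Production codes use hybrid molecular dynamics/Langevin evolution rather than site-by-site Metropolis, but the accept/reject logic driven by the change in the Euclidean action is the same idea.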

4
Physics Underlying QCD
  • Quarks interact by simple gluon exchange.
  • Same geometrical beauty as E&M.
  • With the ~1%-accurate neglect of electromagnetism, this treatment is exact!
  • The interaction energy generates 99% of the known mass in the Universe.
  • Should explain all of nuclear physics.
  • Must be mastered if the underlying properties of quarks are to be learned from experiment.

Quark/anti-quark pair: the π meson
5
Sources of Error
6
Finite Lattice Spacing Errors
  • Computational costs rise rapidly as a → 0, roughly as 1/a⁸:
    • Space-time volume: 1/a⁴
    • Dirac operator inversions: 1/a
    • Critical slowing down: 1/a
    • Molecular dynamics time step: 1/a²
  • Symanzik improvement (Runge-Kutta for field theory):
    • Represent O(aⁿ) lattice-theory errors by higher-dimension operators in an effective continuum theory.
    • Adjust irrelevant lattice operators to make the coefficients c_i⁽ⁿ⁾ = 0.
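The power counting above can be checked with simple arithmetic; halving the lattice spacing multiplies the cost of each contribution as follows (a sketch of the slide's factor counting, not a precise cost model):

```python
# Cost growth when the lattice spacing a is halved (a -> a/2).
factors = {
    "space-time volume (1/a^4)": 2 ** 4,
    "Dirac operator inversions (1/a)": 2,
    "critical slowing down (1/a)": 2,
    "molecular dynamics time step (1/a^2)": 2 ** 2,
}
total = 1
for name, growth in factors.items():
    total *= growth
    print(f"{name}: x{growth}")
print(f"combined: x{total} = 2^8, i.e. cost ~ 1/a^8")
```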

7
Chiral Fermions
  • Domain wall fermions are the most thoroughly explored.
  • A 5-D theory with 4-D, chiral surface states.

8
Residual Chiral Symmetry Breaking
  • Finite Ls produces residual chiral symmetry breaking.
  • The size of m_res depends on the roughness of the gauge field.

9
QCD Machines?
  • The regularity of lattice QCD makes parallelization easy and reduces network cost.
  • Vanishing I/O and small memory needs allow an economical configuration.
  • The simple, fundamental character of the theory and the stability of the numerical formulation encourage a hardware effort.

10
QCD Machines!
(Figure: timeline of QCD machines, 1985, 1987, 1989, 1998, 2005, with labels 16 Mflops, 64 Mflops, 3.2 Gflops, 64 Gflops.)
11
Columbia QCD Machines
16-node, 0.256 Gflops (1985)
256-node, 16 Gflops (1989)
8192-node, 0.4 Tflops QCDSP machine (1998)
12
QCDOC Goals
  • Massively parallel machine capable of strong scaling: use many nodes on a small problem.
  • Large inter-node bandwidth.
  • Small communications latency.
  • $1/sustained-Mflops cost/performance.
  • Low power, easily maintained, modular design.

13
QCDOC Collaboration
  • UKQCD (PPARC)
  • Peter Boyle
  • Mike Clark
  • Balint Joo
  • RBRC (RIKEN)
  • Shigemi Ohta
  • Tilo Wettig
  • IBM
  • Dong Chen
  • Alan Gara
  • Design groups
  • Yorktown Heights, NY
  • Rochester, MN
  • Raleigh, NC
  • Columbia (DOE)
  • Norman Christ
  • Saul Cohen
  • Calin Cristian
  • Zhihua Dong
  • Changhoan Kim
  • Ludmila Levkova
  • Sam Li
  • Xiaodong Liao
  • Huey-Wen Lin
  • Guofeng Liu
  • Meifeng Lin
  • Robert Mawhinney
  • Azusa Yamaguchi
  • BNL (SciDAC)
  • Robert Bennett
  • Chulwoo Jung
  • Konstantin Petrov
  • Stratos Efstathiadis

14
QCDOC Architecture
  • IBM-fabricated, single-chip node.
    • 50 million transistors, 5 Watt, 1.3 cm x 1.3 cm.
  • Processor:
    • PowerPC 32-bit RISC core.
    • 64-bit, 1 Gflops floating point unit.
  • Memory/node: 4 Mbyte (on-chip) plus up to 2 Gbyte of DIMM.
  • Communications network:
    • 6-dimensional, supporting lower-dimensional partitions.
    • Global sum/broadcast functionality.
    • Multiple DMA engines for minimal processor overhead.
  • Ethernet connection to each node for booting, I/O, and host control.
  • 7-8 Watt/node, 15 in³ per node.
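Scaling the per-node figures on this slide up to one of the large installations gives the machine-level numbers; this is a back-of-envelope sketch assuming the nominal 1 Gflops/node peak and the upper 8 W/node figure.

```python
nodes = 12_288                  # size of the RBRC and DOE installations
peak_per_node_gflops = 1.0      # nominal 64-bit FPU peak per node
watts_per_node = 8              # upper end of the 7-8 Watt/node figure

# Aggregate peak performance and power draw for the full machine
peak_tflops = nodes * peak_per_node_gflops / 1000
power_kw = nodes * watts_per_node / 1000
print(f"peak: {peak_tflops:.1f} Tflops, power: about {power_kw:.0f} kW")
```

The roughly 12 Tflops peak is consistent with the 3-5 Tflops sustained figures quoted later for the production machines.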

15
Network Architecture
  • Red boxes are nodes.
  • Blue boxes are mother boards.
  • Red lines are communications links.
  • Green lines are Ethernet connections.
  • Green boxes are Ethernet switches.
  • Pink boxes are host CPU processors.

16
Mesh geometry
  • N0 x N1 x N2 mother boards are wired as an N0 x N1 x N2 torus.
  • With 2⁶ = 64 nodes on a mother board, the resulting machine is a 2N0 x 2N1 x 2N2 x 2 x 2 x 2, six-dimensional torus.
  • Use the extra two dimensions to create lower-dimensional tori.
  • qpartition_remap -X01 -Y23 -Z4 -T5 maps the six machine dimensions (0-5) into four physical dimensions automatically.

4x4x2 (machine) → 32 (physics)
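The dimension-folding that qpartition_remap performs can be sketched as below; the slide does not give the utility's actual mapping details, so the lexicographic fold and the function name here are illustrative assumptions.

```python
def remap(coord, dims, groups):
    """Fold a 6-D machine coordinate into fewer physical dimensions.

    coord:  per-dimension machine coordinates, e.g. (x0, ..., x5)
    dims:   size of each machine dimension
    groups: which machine dimensions combine into each physical axis,
            e.g. [(0, 1), (2, 3), (4,), (5,)] mimics -X01 -Y23 -Z4 -T5
    """
    phys = []
    for g in groups:
        c, size = 0, 1
        for d in g:                    # lexicographic fold within the group
            c += coord[d] * size
            size *= dims[d]
        phys.append(c)
    return tuple(phys)

dims = (4, 4, 2, 2, 2, 2)              # a 6-D machine partition
groups = [(0, 1), (2, 3), (4,), (5,)]  # fold into X, Y, Z, T
print(remap((3, 2, 1, 0, 1, 1), dims, groups))
```

With this grouping, machine dimensions 0 and 1 fold into a single physical X axis of size 16, so a 4x4x2x2x2x2 partition can present itself as a lower-dimensional torus to the application.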
17
QCDOC Chip
  • 50 million transistors, 0.18 micron process, 1.3 x 1.3 cm die, 5 Watt.

18
Software Environment
  • Lean kernel on each node:
    • Protected kernel mode and address space.
    • RPC support for host access.
    • NFS access to NAS disks (/pfs).
    • Normal Unix services, including stdout and stderr.
  • Threaded host daemon:
    • Efficient performance on an 8-processor SMP host.
    • User shell (qsh) with extended commands.
    • Host file system (/host).
    • Simple remapping of the 6-D machine to a (6-n)-D torus.
  • Programming environment:
    • POSIX-compatible, open-source libc.
    • gcc and xlc compilers.
    • SciDAC standards:
      • Level 1: QMP protocol.
      • Level 2: parallelized linear algebra, QDP/QDP++.
      • Level 3: efficient inverters for Wilson/clover, domain wall fermions, ASQTAD, and p4 (underway).
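The nearest-neighbor transfers that the Level-1 QMP layer supports follow a standard halo-exchange pattern, sketched below with plain Python lists standing in for remote nodes' memory; the function and data layout are illustrative, not the real QMP API.

```python
def halo_exchange(local, left_nbr, right_nbr):
    """Exchange boundary sites with neighbors on a 1-D torus.

    local: this node's interior data; left_nbr and right_nbr are
    plain lists simulating the neighboring nodes' memory.
    """
    ghost_left = left_nbr[-1]    # site received from the left neighbor
    ghost_right = right_nbr[0]   # site received from the right neighbor
    return [ghost_left] + local + [ghost_right]

# Three "nodes" on a 1-D ring, each owning 4 lattice sites
nodes = [[i * 4 + j for j in range(4)] for i in range(3)]
padded = [
    halo_exchange(nodes[i], nodes[(i - 1) % 3], nodes[(i + 1) % 3])
    for i in range(3)
]
print(padded[0])   # node 0 with ghost cells: [11, 0, 1, 2, 3, 4]
```

On QCDOC the analogous transfers are driven by the per-link DMA engines, which is what keeps the processor overhead of this pattern minimal.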

19
Daughter board (2 nodes)
20
BNL constructed test jig
21
Mother board (64 nodes)
22
Edge view of mother board
23
Single mother board test jig
24
512-Node Machine
25
First 4 racks installed at Columbia
26
UKQCD Machine (12,288 nodes/10 Tflops)
27
SciDAC Role
  • SciDAC-supported, community-wide software support:
    • Postdocs at universities.
    • New staff at national labs.
    • Some management support.
    • Software coordinating committee.
  • Software structure defined and much code written:
    • Level 3: high-performance inverters, tailored for QCDOC and clusters.
    • Level 2: parallel linear algebra routines needed for LGT.
    • Level 1:
      • Single-node, optimized linear algebra routines.
      • QMP message-passing protocol (MPI-like): supports efficient nearest-neighbor transfers and includes efficient use of QCDOC hardware.

28
SciDAC Software Project
UK: Peter Boyle, Balint Joo (Mike Clark)
Software coordinating committee
29
SciDAC Pay-Off
  • Funding for a common effort encouraged
    unprecedented collaboration.
  • Solid software preparation permitted a compelling
    case to be made to HEP/NP for significant program
    funding.
  • New multi-year DOE program support
  • 5 Tflops QCDOC installed at BNL.
  • Continuing multi-Teraflops investment in clusters
    starting FY06.
  • U.S. LGT community resources increased 10X.
  • Major science advances in the next 1-2 years.

30
Brookhaven Installation
  • DOE (left) and RBRC (right) 12K-node QCDOC
    machines

31
Project Status
  • UKQCD: 13,312 nodes, $5.2M, 3-5 Tflops sustained.
    • Installed in Edinburgh 12/04.
    • Running production at 400 MHz.
  • RBRC: 12,288 nodes, $5M, 3-5 Tflops sustained.
    • Installed at BNL 2/05.
    • Running production at 400 MHz.
  • DOE: 12,288 nodes, $5.1M, 3-5 Tflops sustained.
    • Installed at BNL 4/05.
    • 1/3 being debugged.
    • 2/3 performing physics tests.
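Dividing the quoted costs by the sustained performance range checks these machines against the $1/sustained-Mflops goal stated earlier; this assumes the slide's cost figures are in dollars.

```python
# (cost in dollars, low and high sustained Tflops) per installation
installs = {
    "UKQCD": (5.2e6, 3, 5),
    "RBRC":  (5.0e6, 3, 5),
    "DOE":   (5.1e6, 3, 5),
}
for name, (cost, tflops_lo, tflops_hi) in installs.items():
    best = cost / (tflops_hi * 1e6)    # $/Mflops at 5 Tflops sustained
    worst = cost / (tflops_lo * 1e6)   # $/Mflops at 3 Tflops sustained
    print(f"{name}: ${best:.2f}-${worst:.2f} per sustained Mflops")
```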

32
ASQTAD Performance
33
Application Performance (double precision)
1024-node machine
4096-node machine (UKQCD)
34
QCDOC First Results
  • Given the difficulty of QCD and our ambitious goals, important first results will require 6 months to 1 year.
  • Topics now being pursued:
    • QCD thermodynamics, study of the quark-gluon plasma (Bielefeld/Brookhaven/Columbia/RBRC).
    • Dynamical, 2+1-flavor, staggered fermions (Asqtad) (MILC and UKQCD collaborations).
    • Dynamical, 2+1-flavor, domain wall fermions:
      • JLab/UKQCD algorithm development.
      • RBC/UKQCD large-scale simulation.

Monte Carlo samples will be used for many
cutting-edge projects.
35
RBC Collaboration
  • Columbia
  • Michael Cheng
  • Norman Christ
  • Saul Cohen
  • Changhoan Kim (Southampton)
  • Ludmila Levkova (Indiana)
  • Meifeng Lin
  • Huey-Wen Lin
  • Oleg Loktik
  • Robert Mawhinney
  • Samuel Shu
  • Azusa Yamaguchi (Glasgow)
  • RBRC
  • Yasumichi Aoki (Wuppertal)
  • Tom Blum
  • Chris Dawson
  • Taku Izubuchi (Kanazawa)
  • Yukio Nemoto
  • Jun-Ichi Noaki (Southampton)
  • Kostas Orginos (MIT)
  • Norikazu Yamada
  • Takeshi Yamazaki
  • BNL
  • Federico Berruto
  • Michael Creutz
  • Jack Laiho (Fermilab)
  • Peter Petreczky
  • Konstantin Petrov
  • Sasa Prelovsek
  • Amarjit Soni

36
Edinburgh/UKQCD
  • Edinburgh
  • David Antonio
  • Kenneth Bowler
  • Peter Boyle
  • Michael Clark
  • Balint Joo
  • Anthony Kennedy
  • Richard Kenway
  • Christopher Maynard
  • Robert Tweedie
  • Glasgow
  • Azusa Yamaguchi
  • Southampton
  • Changhoan Kim
  • Jun-Ichi Noaki

37
Simulations with Dynamical DWF(RBC)
  • Improved algorithms give a 2-4x speed-up.
  • 2002-2004 on QCDSP:
    • Algorithm development and interesting physics.

Evolution of topological charge: 2-flavor, 16³ x 32, Ls = 12, 1/a ≈ 1.7 GeV
38
B_K Results
39
Large DWF Simulations
  • First parts of QCDOC used to explore:
    • lattice spacing,
    • action,
    • quark mass.
  • Joint RBC/UKQCD effort.
  • Extensive initial studies performed.

40
Exploratory Runs this Spring (preliminary): 16³ x 32, Ls = 8
41
Large DWF Simulations
  • These studies determined initial parameter values:
    • Iwasaki gauge action.
    • m_strange · a = 0.04.
    • m_ud · a = 0.01, 0.02, and 0.03.
    • β = 2.13 → 1/a ≈ 1.8 GeV.
  • These three runs on large, 24³ x 64 lattices are now underway at BNL and Edinburgh.
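The quoted β → 1/a conversion fixes the physical box size of these 24³ x 64 runs; the arithmetic below uses the standard conversion ħc ≈ 0.1973 GeV·fm, and the resulting numbers are approximate.

```python
HBARC_GEV_FM = 0.1973      # hbar * c in GeV * fm
a_inv_gev = 1.8            # inverse lattice spacing from beta = 2.13
spatial_sites = 24         # spatial extent of the 24^3 x 64 lattice

a_fm = HBARC_GEV_FM / a_inv_gev      # lattice spacing in fm
box_fm = spatial_sites * a_fm        # spatial box size in fm
print(f"a = {a_fm:.3f} fm, spatial box = {box_fm:.2f} fm")
```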

42
First 40 trajectories: 24³ x 64, Ls = 16
Evolution of the gauge action. The upper graph is from Edinburgh and the lower from Brookhaven.
43
Outlook
  • New QCDOC machines offer >10x capability.
  • SciDAC software support:
    • Convenient tools boost efficiency.
    • Application-specific communications interface.
    • High-level staff to support/evolve the software base.
  • Close UK-US collaboration.
  • Expect important results with a major impact on high energy and nuclear physics.