Transcript and Presenter's Notes

Title: The High Performance Cluster for QCD Calculations: System Monitoring and Benchmarking


1
The High Performance Cluster for QCD
Calculations: System Monitoring and Benchmarking
  • Lucas Fernandez Seivane
  • quevedin@mail.desy.de
  • Summer Student 2002
  • IT Group, DESY Hamburg
  • Supervisor: Andreas Gellrich
  • Oviedo University (Spain)

2
Topics
  • Some Ideas of QM
  • The QFT Problem
  • Lattice Field Theory
  • What can we get?
  • Approaches to the computing
  • lattice.desy.de
  • Hardware
  • Software
  • The stuff we made: Clumon
  • Possible improvements

3
Let's do some physics
  • QM: the real behavior of the world is a fuzzy world
  • Relativity means causality (cause must precede
    consequence!)
  • Any complete description of Nature must combine
    both ideas
  • The only consistent way of doing this is
    QUANTUM FIELD THEORY

4
The QFT Problem
  • Impossible to solve exactly
  • PERTURBATIVE APPROACH
  • Needs a small coupling constant (like α_em ≈ 1/137)
  • Example: QED (the strange theory of light and
    matter)
  • Taylor expansion: α_em + α_em²/2 + α_em³/6 + …
    (see the schematic series below)
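
As a schematic reminder (the slide does not name a specific observable), a
perturbative prediction is organized as a power series in the coupling; with
α_em ≈ 1/137 each extra order is suppressed by roughly two more orders of
magnitude, which is why the approach works so well for QED:

    O(\alpha_{\mathrm{em}}) = c_0 + c_1\,\alpha_{\mathrm{em}}
      + c_2\,\alpha_{\mathrm{em}}^2 + c_3\,\alpha_{\mathrm{em}}^3 + \cdots,
    \qquad \alpha_{\mathrm{em}} \approx \tfrac{1}{137}

The coefficients c_n are process-dependent; truncating after a few terms
already gives excellent accuracy in QED.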

5
…but for QCD
  • No small coupling constant (at least at low
    energies)
  • We cannot explain (at least analytically) a
    proton!!!
  • We do need something exact (the LATTICE is EXACT)

6
Lattice field theory
  • Generic tool for approaching non-perturbative QFT
  • But most needed in QCD (non-perturbative
    aspects)
  • Even pure theoretical interests (Wilson approach)

7
What can we get?
  • We are interested in the spectra (bound states,
    masses of particles)
  • We can do it by means of correlation functions;
    if we could calculate them exactly, we would have
    solved the theory
  • They are extracted from Path Integrals (foil 1)
  • The problem is calculating Path Integrals
  • The lattice can calculate Path Integrals (see the
    schematic expressions below)
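
For concreteness, the standard (Euclidean, schematic) expressions behind
these bullets: expectation values come from a path integral weighted by
e^{-S}, and a particle mass m is read off the exponential decay of a
two-point correlation function at large time separation:

    \langle O \rangle = \frac{1}{Z} \int \mathcal{D}\phi \; O[\phi]\,
      e^{-S_E[\phi]},
    \qquad Z = \int \mathcal{D}\phi \; e^{-S_E[\phi]}

    C(t) = \langle O(t)\, O^\dagger(0) \rangle \;\sim\; A\, e^{-m t}
    \quad (t \to \infty)

These are textbook formulas, not specific to the DESY setup.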


8
A Naïve Approach
  • Discretize space-time
  • Monte-Carlo methods for choosing field
    configurations (random number generators); a toy
    sketch follows this list
  • Numerical evaluation of Path Integrals and
    correlation functions!!!
  • (typical lattice parameters: a ≈ 0.05–0.1 fm,
    1/a ≈ 2 GeV, L = 32)
  • but…
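
A deliberately tiny illustration of the recipe above: Metropolis Monte Carlo
for a one-dimensional lattice scalar field. Everything here (the couplings
M2 and LAM, the lattice size N, the step width) is invented for the sketch;
real QCD codes update SU(3) gauge links, not a single scalar field. A
minimal sketch in Python:

    # Toy illustration only: Metropolis Monte Carlo for a 1D lattice
    # scalar field (anharmonic oscillator), NOT lattice QCD. It shows the
    # pattern: discretize, propose local changes, accept/reject against
    # exp(-S), then measure observables on the sampled configurations.
    import math
    import random

    N = 64          # lattice sites (periodic boundary)
    M2 = 1.0        # bare mass squared (lattice units, invented)
    LAM = 0.5       # quartic coupling (invented)
    STEP = 0.5      # proposal width
    random.seed(42)

    def action_local(phi, i):
        """Contribution of site i to the Euclidean action S[phi]."""
        left, right = phi[(i - 1) % N], phi[(i + 1) % N]
        kinetic = 0.5 * ((phi[i] - left) ** 2 + (right - phi[i]) ** 2)
        potential = 0.5 * M2 * phi[i] ** 2 + LAM * phi[i] ** 4
        return kinetic + potential

    def metropolis_sweep(phi):
        """One sweep of local Metropolis updates over all sites."""
        for i in range(N):
            old = phi[i]
            s_old = action_local(phi, i)
            phi[i] = old + random.uniform(-STEP, STEP)
            d_s = action_local(phi, i) - s_old
            if not (d_s <= 0 or random.random() < math.exp(-d_s)):
                phi[i] = old       # reject: restore old value

    phi = [0.0] * N                # cold start
    for _ in range(500):           # thermalization sweeps
        metropolis_sweep(phi)

    measurements = []
    for _ in range(2000):          # measurement sweeps
        metropolis_sweep(phi)
        measurements.append(sum(x * x for x in phi) / N)

    print("<phi^2> =", sum(measurements) / len(measurements))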

9
…but
  • Huge computer power is needed
  • Very high-dimensional integrals
  • The calculation requires computing the inverse
    of an essentially infinite-dimensional matrix, which
    takes a lot of CPU time and RAM
  • That's why we need clusters, supercomputers or
    special machines (to divide the work)
  • The amount of data transferred is not so
    important; the deciding factors are the LATENCY of
    the network and the scalability above 1 TFlops
    (a ping-pong latency sketch follows this list)
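
Latency is usually measured with a ping-pong test: two MPI ranks bounce a
tiny message back and forth, and half the average round-trip time
approximates the one-way latency. A minimal sketch, assuming Python with
mpi4py is available (the cluster itself ran MPICH-GM, normally driven from
C or Fortran; the script name and repetition count are arbitrary):

    # Ping-pong latency microbenchmark: rank 0 and rank 1 exchange a small
    # message many times. Run with e.g.:  mpirun -np 2 python pingpong.py
    import time
    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    REPS = 10000
    msg = np.zeros(1, dtype=np.byte)   # tiny payload: latency, not bandwidth

    comm.Barrier()                     # start both ranks together
    t0 = time.perf_counter()
    for _ in range(REPS):
        if rank == 0:
            comm.Send(msg, dest=1)     # ping
            comm.Recv(msg, source=1)   # pong
        elif rank == 1:
            comm.Recv(msg, source=0)
            comm.Send(msg, dest=0)
    t1 = time.perf_counter()

    if rank == 0:
        # Each rep is a full round trip, so one-way latency is half.
        print(f"one-way latency ~ {(t1 - t0) / REPS / 2 * 1e6:.1f} us")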

10
How can we get it?
  • General Purpose Supercomputers
  • Very expensive
  • Rigid (difficult upgrades on hardware)
  • Fully custom parallel machines
  • Completely optimized
  • Single-purpose (difficult to recycle)
  • Need to design, develop and build (or
    modify) both hardware and software
  • Commodity clusters
  • Cheap PC components
  • Completely customizable
  • Easy to upgrade / recycle

11
Machines
  • Commercial Supercomputers
  • Cray T3E, Fujitsu VPP77, NEC SX-4, Hitachi SR8000
  • Parallel machines
  • APEmille/apeNEXT (INFN/DESY)
  • QCDSP/QCDOC (CU/UKQCD/RIKEN)
  • CP-PACS (Tsukuba/Hitachi)
  • Commodity clusters + fast networking
  • Low latency (fast networking)
  • High speed
  • Standard software and programming environments

12
Lattice cluster@DESY
  • Cluster bought from a company (Megware), Beowulf
    type (1 master, 32 slaves)
  • Before upgrade (some weeks ago)
  • 32 nodes: Intel XEON P4, 1.7 GHz, 256 KB cache
  • 1 GB Rambus RAM
  • 2 × 64-bit PCI slots
  • 18 GB SCSI hard disks
  • Fast Ethernet switch (normal networking, NFS
    disk mounting)
  • Myrinet network (low latency)
  • Upgrade (August 2002)
  • 16 nodes: 2 × Intel XEON P4, 1.7 GHz, 256 KB cache
  • 16 nodes: 2 × Intel XEON P4, 2.0 GHz, 512 KB cache

13
Lattice cluster@DESY (2)
  • Software: SuSE Linux (modified by Megware)
  • MPICH-GM (MPI-Chameleon implementation for the
    Myrinet GM system)
  • Megware Clustware (modified OpenSCE/SCMS): tool
    for monitoring and administration (but no logs)

14
Lattice cluster@DESY (3)
  • http://lattice.desy.de/cgi-bin/clumon/cgi_clumon.pl
  • First version by Andreas Gellrich
  • Provides logs and monitoring
  • Written in Perl (customizable)

15
Lattice cluster@DESY (4)
  • http://lattice.desy.de/cgi-bin/clumon/cgi_clumon.pl
  • New version by Andreas Gellrich and me
  • Also graphical data and an additional log measure
  • Uses MRTG to graph the data (a collector sketch
    follows this list)
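
MRTG was built for SNMP counters, but it can also graph arbitrary values:
a Target[] entry in mrtg.cfg can invoke an external program, which is
expected to print four lines on stdout (two integer readings, an uptime
string, a target name). A minimal collector sketch in that spirit; all
paths and scalings here are chosen for illustration, and the actual Clumon
code is Perl and is not reproduced in the slides:

    # Hypothetical MRTG data-collector sketch (names invented). Reports
    # the 1-minute load average (scaled x100, since MRTG wants integers)
    # and the number of running processes, read from /proc on Linux.
    import socket

    def read_loadavg():
        """Parse /proc/loadavg, e.g. '0.42 0.35 0.30 2/180 12345'."""
        with open("/proc/loadavg") as f:
            fields = f.read().split()
        load1 = float(fields[0])
        running = int(fields[3].split("/")[0])
        return load1, running

    def read_uptime():
        """Format /proc/uptime seconds as a short human-readable string."""
        with open("/proc/uptime") as f:
            seconds = float(f.read().split()[0])
        days, rest = divmod(int(seconds), 86400)
        hours, rest = divmod(rest, 3600)
        return f"{days}d {hours}h {rest // 60}m"

    load1, running = read_loadavg()
    print(int(load1 * 100))       # line 1: first value (load x100)
    print(running)                # line 2: second value (running procs)
    print(read_uptime())          # line 3: uptime string
    print(socket.gethostname())   # line 4: target name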

16
Clumon v2.0 (1)
17
Clumon v2.0 (2)
18
Work done (in progress)
  • Getting a flavor of a real high-performance cluster
  • Learning Perl (more or less) to understand
    Andreas' tool
  • Playing around with Andreas' tool
  • Researching how to graph this kind of data
  • Learning how to use MRTG/RRDtool
  • Some tests and earlier versions
  • Only the last retouches (polishing) remain
  • Time info for the cluster
  • Better documentation of the tools
  • Playing around with other stuff this last week
  • Preparing the talk and the write-up

19
Possible Improvements
  • The cluster is not connected to DESY AFS
  • Need for backups / archiving of the stored data
    (dCache, theoc01)
  • Maybe reinstall the cluster with DESY Linux (to
    fully know what's in it)
  • Play around with other cluster stuff
  • OpenSCE, OSCAR, ROCKS