1
Scaling up at UTSW/TACC: Facilitating use of
TACC's Lonestar cluster by UTSW researchers
  • Stuart Johnson
  • TACC (embedded at UTSW)

2
UTSW applications so far
  • Working with roughly 10 different labs
  • Range of apps: refinement of structures from NMR
    data, to protein structure comparison, to electron
    microscopy image processing
  • With a few exceptions, these are coarse-grained,
    embarrassingly parallel (EP) problems or sequences
    of EP problems (SEP); in many cases the primary
    difficulties are software installation and
    shoveling data around the system
  • EP means no coordination between nodes is
    required to get the problem done (no CS papers in
    the P)
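
To make the EP idea concrete, here is a minimal sketch (illustrative only, not code from any of these projects): each worker runs every n-th item of an independent task list, so the workers never talk to each other. "serial_app" is a hypothetical stand-in for any of the serial codes above.

# EP split: worker k of n runs tasks k, k+n, k+2n, ... with no communication.
import subprocess
import sys

def run_my_share(tasks, worker_id, n_workers):
    """Run this worker's slice of an independent task list."""
    for task in tasks[worker_id::n_workers]:
        subprocess.run(["serial_app", task], check=True)  # hypothetical serial code

if __name__ == "__main__":
    worker_id, n_workers = int(sys.argv[1]), int(sys.argv[2])
    run_my_share(["input_%04d.dat" % i for i in range(1000)], worker_id, n_workers)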

3
For these applications, scaling up involves
  • running on lots of nodes: handling data movement,
    catching errors/problems with the code, and making
    sure the runtime environment (RTE) is accessible
    to every node in an efficient way; but generally
    not so much figuring out how to distribute the
    actual work
  • adapting to a shared resource: moving from an
    on-demand compute resource usage model (cluster in
    your office/department) to a shared machine model
    (scheduler, transient disk space, etc.); many
    tricks here!
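
As one illustration of that adaptation, here is a minimal sketch of wrapping a formerly on-demand run as an LSF batch job, staging data through transient disk. The queue name, time limit, processor count, paths, and the refine_all command are assumptions for illustration, not Lonestar or project specifics.

# Generate and submit a scheduler-friendly LSF job from Python.
import subprocess

JOB_SCRIPT = """#BSUB -J refine_nmr
#BSUB -n 32
#BSUB -W 4:00
#BSUB -q normal
#BSUB -o refine.%J.out
# Stage input to fast transient disk, run, copy results back to shared space.
cp $HOME/data/run01.tar /tmp && cd /tmp && tar xf run01.tar
$HOME/bin/refine_all run01/
cp -r run01/results $HOME/data/
"""

with open("job.lsf", "w") as f:
    f.write(JOB_SCRIPT)
with open("job.lsf") as f:
    subprocess.run(["bsub"], stdin=f, check=True)  # bsub reads the script on stdin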

4
Applications (note: UTSW contact; code authors in
parentheses)
  • 1) ARIA/CNS (Kevin Gardner; Axel Brünger, et al.,
    Yale (CNS); M. Habeck, W. Rieping, J. Linge and
    M. Nilges, Institut Pasteur, Paris (ARIA))
  • fitting protein structures to NMR data
  • Solution: change on-demand coding style to a batch
    scheduler-friendly implementation (a very simple
    approach)
  • currently in testing phase
  • (SEP, Python/FORTRAN/C, 10s of processors, ? data)
  • 2) DaliLite (Bong-Hyun Kim, Nick Grishin; Liisa
    Holm, U. of Helsinki, Finland)
  • Protein structure comparisons (36M pairs)
  • Solution: master/slave C/MPI code to manage
    serial code runs, data movement, local/global
    storage, logging, and RTE copy to /tmp (a sketch
    of the pattern follows this list)
  • Running on Lonestar - 100,000 SUs
  • (EP, C/MPI/Python/Perl/FORTRAN, 128 processors,
    100 GB output)
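
The actual DaliLite harness described above is C/MPI; what follows is an illustrative sketch of the same master/slave pattern in Python with mpi4py. "dalilite_pair" is a hypothetical stand-in for the serial comparison run, and the data-movement and /tmp staging steps are omitted for brevity.

# Master rank 0 farms independent pair comparisons out to slave ranks;
# each slave runs the serial code and reports back for more work.
from mpi4py import MPI
import subprocess

TAG_WORK, TAG_DONE = 1, 2

def master(comm, tasks):
    status = MPI.Status()
    it = iter(tasks)
    active = 0
    for dest in range(1, comm.Get_size()):   # prime every slave with one task
        task = next(it, None)
        if task is None:
            break
        comm.send(task, dest=dest, tag=TAG_WORK)
        active += 1
    while active:                            # refill slaves as results return
        comm.recv(source=MPI.ANY_SOURCE, tag=TAG_DONE, status=status)
        active -= 1
        task = next(it, None)
        if task is not None:
            comm.send(task, dest=status.Get_source(), tag=TAG_WORK)
            active += 1
    for dest in range(1, comm.Get_size()):   # tell every slave to shut down
        comm.send(None, dest=dest, tag=TAG_WORK)

def slave(comm):
    while True:
        pair = comm.recv(source=0, tag=TAG_WORK)
        if pair is None:
            return
        subprocess.run(["dalilite_pair", pair[0], pair[1]])  # hypothetical wrapper
        comm.send(pair, dest=0, tag=TAG_DONE)

if __name__ == "__main__":
    comm = MPI.COMM_WORLD
    if comm.Get_rank() == 0:
        master(comm, [("1abc", "2xyz"), ("1abc", "3pqr"), ("2xyz", "3pqr")])
    else:
        slave(comm)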

5
Applications (note: UTSW contact; code authors in
parentheses)
  • 3) Ruby/Helix (Masahide Kikkawa)
  • Electron microscopy image processing for helical
    structures
  • Solution: use mpi_ruby to implement parallelism;
    install, test, and code up an mpi_ruby example
  • currently in testing phase
  • (SEP, Ruby/MPI/mpi_ruby/Fortran/C, 10s of
    processors, ? data)
  • 4) EMAN (Masahide Kikkawa; Steve Ludtke, et al.,
    NCMI, Baylor College of Medicine)
  • Electron microscopy image processing for
    particles
  • Solution: add LSF (TACC's batch scheduler)
    compatibility to the code, and install it (a
    sketch of what LSF compatibility involves follows
    this list)
  • Currently in testing phase
  • (SEP, Python/C/FORTRAN, 10s of processors, ?
    data)
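
For context, "LSF compatibility" here mostly means discovering the job's allocated hosts from LSF's environment instead of a user-supplied machine file. Below is a minimal sketch using LSF's standard LSB_HOSTS variable; the launch step is an assumption, not EMAN's actual code.

# LSF exports LSB_HOSTS: the allocated host names, one entry per slot.
import os

def lsf_host_list():
    """Return the hosts LSF allocated to this job, one entry per slot."""
    hosts = os.environ.get("LSB_HOSTS", "")
    if not hosts:
        raise RuntimeError("not running under LSF (LSB_HOSTS unset)")
    return hosts.split()

if __name__ == "__main__":
    for host in lsf_host_list():
        print(host)  # a real wrapper would start one processing task per slot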

6
Applications
  • 5) Altschuler/Wu Lab
  • Large scale systems biology - image processing,
    data reduction
  • Solution: MATLAB compiler; deploy the RTE and
    compiled applications
  • currently scaling up image segmentation phase
  • Probably looking at O(10,000) CPU hours for first
    end-to-end data analysis of an existing data set
  • (EP, MATLAB, 10s of processors, 350 GB input data)
  • 6) MoNET (Feng Luo, Richard Scheuermann)
  • Finding modularity in large (100K nodes/edges)
    networks of interactions (a toy sketch of the
    measure follows this list)
  • Solution: adapt code for the largest part of the
    problem from another research group; collaborate
    on the final code
  • currently testing the parallel implementation
  • Tentatively looking at O(1000) CPU hours / graph
  • (non-EP, C/MPI, ? processors, ? data)
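
For background on the quantity being computed, here is a toy sketch of one standard modularity measure (Newman's Q: the fraction of edges inside each community minus the fraction expected by chance). It is illustrative only, not necessarily MoNET's exact definition, and the 100K node/edge graphs need the parallel C/MPI code.

# Q = sum over communities c of (intra-edge fraction - (degree fraction)^2).
from collections import defaultdict

def modularity(edges, community):
    m = len(edges)
    intra = defaultdict(int)   # edges with both endpoints in the community
    degree = defaultdict(int)  # total degree falling in the community
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            intra[community[u]] += 1
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(modularity(edges, {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}))  # ~0.36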

7
Applications coming up
  • 7) Large sparse matrix eigenvalue problems (Feng
    Luo, Richard Scheuermann)
  • Data analysis tool for large data sets
  • Solution: cobble together from parallel libraries
    (a serial sketch of the computation follows this
    list)
  • (non-EP, C/FORTRAN/MPI, ? processors, ? data)
  • 8) Radiology imaging problems (Matthew Lewis)
  • (?, ?, ? processors, ? data)
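
For a sense of the eigenvalue computation itself, here is a serial sketch using SciPy's ARPACK wrapper to pull a few extreme eigenpairs out of a large sparse symmetric matrix. The plan above is to assemble parallel C/FORTRAN/MPI libraries for production scale, and the random matrix here is only a stand-in for real data.

# Compute the six largest-algebraic eigenvalues of a sparse symmetric matrix.
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

n = 100_000
a = sp.random(n, n, density=1e-5, format="csr", random_state=0)
a = a + a.T                      # symmetrize so eigsh applies
vals, vecs = eigsh(a, k=6, which="LA")
print(vals)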

8
General characteristics of applications
  • Large range of data TACC <-> UTSW movement
    requirements and TACC CPU usage requirements
  • One user needs to move O(10 TB)/year of data
    UTSW -> TACC (~1 TB / experiment week)
  • Most users are anticipating at least O(10K) CPU
    hours, some users will need at least O(100K) CPU
    hours
  • Most of these applications are community codes
    which are used by numerous research groups
  • Most users need to know HOW to scale up their
    application(s) (or adapt them to a shared
    resource) and may need various coding approaches
    to accomplish this

9
General comments
  • Training: users here need a diverse set of
    cluster-use examples to get them started toward a
    solution
  • Coarse-grained computing with scripting languages
    to control workflow
  • Distributed and parallel: MPI, Perl, Python, Ruby,
    MATLAB, C, FORTRAN
  • Interesting possibilities for grid (distributed,
    non-cluster) computing
  • Interesting possibilities for providing
    applications to multiple users via the internet -
    but not many lightweight applications
  • Bandwidth to the labs!
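
A quick back-of-the-envelope on that last point, using the roughly 1 TB per experiment-week figure from slide 8 (ignoring burstiness and protocol overhead):

# Sustained network rate implied by ~1 TB/week of UTSW -> TACC data movement.
bytes_per_week = 1e12
seconds_per_week = 7 * 24 * 3600
mbit_per_s = bytes_per_week * 8 / seconds_per_week / 1e6
print("~%.0f Mbit/s sustained" % mbit_per_s)  # ~13 Mbit/s, continuous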