1
Tools and Libraries for ManyCore
  • William Gropp
    www.mcs.anl.gov/gropp

2
Correctness
  • Yesterday, I missed the first part of the
    workshop
  • United's computers were down for 2 hours,
    stranding aircraft system-wide
  • We need to make it harder to write incorrect
    programs
  • It isn't enough to make it easy to write correct
    programs
  • (Tim Mattson's Cognitive Dimension 3)
  • Shared Memory (the programming model) is
    particularly challenging
  • Few users understand the interaction of their
    programming language with shared memory
  • E.g., even if they know about volatile, they
    don't know about write barriers/fences
  • Many (Most?) texts introducing threads include a
    race in their example
  • Convenience of global access is paired with
    complexity of ensuring atomic updates
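The race most introductory texts ship is the unsynchronized read-modify-write on a shared counter. A minimal Python sketch (illustrative, not taken from any particular text) of the bug and its lock-based fix:

```python
import threading

def increment(counter, lock, n):
    # Without the lock, counter["value"] += 1 is a read-modify-write
    # race: two threads can read the same old value and lose an update.
    for _ in range(n):
        with lock:
            counter["value"] += 1

counter = {"value": 0}
lock = threading.Lock()
threads = [threading.Thread(target=increment, args=(counter, lock, 100_000))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter["value"])  # 400000, only because the lock makes updates atomic
```

Deleting the `with lock:` line reproduces the textbook race: the final count can silently come up short, which is exactly the convenience-of-global-access trap named above.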

3
Some Answers to Questions to the Panel
  • What will Multicore do to the ecosystem?
  • Concurrency will be taught to CS students
  • Threads will be 'fixed'
  • Users will demand tools for making data races
    less likely and easier to find (by meeting the
    system halfway; don't require complete
    automation)
  • How will programming models change?
  • Expose concurrency
  • Enable dynamic resources (e.g., through
    virtualization)
  • Face reality on locality
  • Face reality on the difficulty in writing correct
    shared-memory programs
  • Is it already here (CTM/CUDA)? (Ack!)
  • Memory Hierarchy
  • Note David Patterson's sparse mat-vec example
    yesterday. After autotuning, the fraction of peak
    was still terrible, but it was nearly optimal
    for those systems (they were starved for memory
    bandwidth).
  • Is a single programming model the right solution?
  • That's easy: No
  • Too late for many applications - they're already
    written
  • Everyone wants the simple programming model that
    does their problem well - the union of these is
    large and the intersection is null
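The bandwidth-starvation point can be made concrete with a back-of-the-envelope roofline estimate; the peak and bandwidth figures below are hypothetical round numbers, not Patterson's:

```python
# Roofline-style estimate (hypothetical machine, not Patterson's data):
peak_flops = 10e9          # 10 GFLOP/s peak floating-point rate
mem_bw = 5e9               # 5 GB/s sustained memory bandwidth

# CSR sparse mat-vec streams roughly 12 bytes per nonzero (8-byte value
# plus 4-byte column index) and performs 2 flops (multiply + add).
bytes_per_flop = 12 / 2

achievable = mem_bw / bytes_per_flop   # flop rate the memory system allows
fraction_of_peak = achievable / peak_flops
print(f"{fraction_of_peak:.0%} of peak")
```

On these numbers the kernel tops out at under a tenth of peak, yet it is doing essentially the best the memory system permits, which is the "terrible fraction of peak, nearly optimal" observation above.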

4
Two Examples
  • An application for (small) multicore - The MPICH2
    implementation of MPI
  • What tools did we need?
  • What gaps were there in the language support?
  • What was hard?
  • A success in parallel programming
  • The problem with MPI is not that it is message
    passing
  • It's what MPI does not do: help the user with
    distributed data structures
  • The search for a universal, efficient description
    of distributed data structures is doomed to
    failure
  • What has worked is support for important classes
    of data structures
  • What is missing is an easy way to add to those
    classes
  • And one way to move forward

5
Building a Demanding Multicore Application
  • Consider an MPI implementation on multicore -
    challenges in achieving performance
  • Performance counters for memory operations
  • How will these be coordinated in multicore?
  • They must be associated with code, not hardware
    (unless code is associated with particular
    hardware)
  • Atomic operations for shared structure updates
  • Need a signaling model for data ready (poll,
    locks, ready bits, etc)
  • If coarse-grained, processes may be OK; any
    elegant model allowed
  • If fine-grained, need a low-cost, low-overhead
    sharable mechanism
  • Polling is required if multiple completion
    mechanisms in use
  • Libraries and tools need to have a common, low
    cost, low impact completion mechanism
  • Need a way to work with the host
    language/compiler to make reliable use of these
    instructions; asm isn't it.
  • If all done well, you can get the results on the
    next slide
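One way to get a data-ready signal without a polling loop is a blocking handoff; a minimal Python sketch using a queue (illustrative only, not MPICH2's actual completion mechanism):

```python
import threading
import queue

# "Data ready" signaling sketch: instead of polling a flag, the consumer
# blocks on a queue. The queue gives a low-cost completion mechanism that
# several libraries could share without busy-waiting.
done = queue.Queue()

def producer(result_slot):
    result_slot["data"] = sum(range(1000))  # simulate the real work
    done.put("ready")                       # signal completion once

slot = {}
t = threading.Thread(target=producer, args=(slot,))
t.start()
token = done.get()   # blocks until the producer signals; no polling loop
t.join()
print(token, slot["data"])
```

The queue's internal synchronization guarantees the consumer sees the produced data once the token arrives; when multiple completion mechanisms are in use, as the bullet notes, something must still poll across them, which is why a single common mechanism matters.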

6
MPI on Multicore Processors
  • Work of Darius Buntinas and Guillaume Mercier
  • 340 ns MPI ping/pong latency
  • More room for improvement (but will require
    better software engineering tools)

7
MPI Threads
  • Threads are a terrible programming model
  • Global common variables make analysis hard,
    errors easy
  • Very first thing needed: thread-private variables
    (by default) for globals used to support common
    performance monitoring and task state
  • Update model for data is too low level
  • Transactional memory is interesting, but can it
    work for arbitrary sections of code? If not,
    what are the limits? What are the performance
    consequences for contention?
  • Null test: Can you efficiently implement
    shared-memory barrier, broadcast, and reduce
    algorithms?
  • How does the user describe thread usage?
  • Busy threads (compute)
  • Latency hiding threads (e.g., for distant I/O,
    remote memory access/update)
  • Service threads (e.g., for RMA passive-target
    update)
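Python's `threading.local` illustrates the thread-private-globals behavior asked for above: each thread gets its own copy of state used for things like performance counters. A sketch (not MPICH2 code):

```python
import threading

# Thread-private "global": each thread sees its own counter attribute,
# so performance-monitoring state needs no locking at all.
state = threading.local()

def work(results, idx):
    state.counter = 0          # private to this thread
    for _ in range(1000):
        state.counter += 1     # no race: each thread updates its own copy
    results[idx] = state.counter

results = [0, 0]
threads = [threading.Thread(target=work, args=(results, i)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # [1000, 1000]
```

The point of "private by default" is that an ordinary module-level counter here would have been shared and racy; the programmer had to opt in to privacy, which is the wrong default for this kind of state.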

8
Separation of Concerns: User Code/PETSc Library
9
Domain Specific Parallel Programming
  • PETSc is a scalable, parallel numerical library
    for the solution of linear and nonlinear
    equations arising from PDEs; 100s of users
  • A complete 2-d Poisson solver in PETSc takes
    only two pages (7 slides) of code. Features of
    this solver include:
  • Fully parallel
  • 2-d decomposition of the 2-d mesh
  • Linear system described as a sparse matrix; user
    can select many different sparse data structures
  • Linear system solved with any user-selected
    Krylov iterative method and preconditioner
    provided by PETSc, including GMRES with ILU,
    BiCGstab with Additive Schwarz, etc.
  • Complete performance analysis built-in
  • Did I say only 7 slides of code!
  • Key is to define the correct abstractions and
    provide efficient implementations
  • There are numerous places where the library
    approach is awkward or inefficient
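The "correct abstraction" here is: describe the operator, then hand it to a Krylov method without committing to a storage format. A pure-Python conjugate-gradient sketch on a matrix-free 1-d Poisson operator shows the shape of that abstraction (illustrative only, not PETSc's API):

```python
def matvec(x):
    # 1-d Poisson (tridiagonal 2, -1) applied matrix-free: the solver
    # only needs the action of the operator, not its storage format.
    n = len(x)
    return [2 * x[i]
            - (x[i - 1] if i > 0 else 0.0)
            - (x[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]

def cg(b, iters=200, tol=1e-12):
    # Textbook conjugate gradient for a symmetric positive-definite
    # operator; PETSc's KSP plays this role with many more methods.
    x = [0.0] * len(b)
    r = list(b)                  # residual of the zero initial guess
    p = list(r)
    rs = sum(v * v for v in r)
    for _ in range(iters):
        Ap = matvec(p)
        alpha = rs / sum(pi * ai for pi, ai in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        rs_new = sum(v * v for v in r)
        if rs_new < tol:
            break
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

x = cg([1.0] * 8)
residual = [ai - 1.0 for ai in matvec(x)]
```

Because the solver touches the operator only through `matvec`, swapping in a different mesh, decomposition, or sparse format changes nothing in the Krylov loop; that separation is what lets PETSc keep the user code to a few pages.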

10
Program Annotation Tools
  • How can we make rapid progress on
    programmability?
  • Use annotations to augment existing languages
  • Not a new approach: used in HPF, OpenMP,
    aspect-oriented programming, others
  • Some applications already use this approach for
    performance portability
  • Annotations do have limitations
  • A standard framework for annotating source code
    that can invoke third party transformation
    tools can enable faster progress
  • Creates an annotation ecosystem to spur
    evolution of improved tools
  • Provides a uniform approach for applications
  • Autotuners become a component in this ecosystem
  • A key is to make the compiler a partner in the
    dialog with the rest of the ecosystem
  • Enable the creation of efficient, domain-specific
    abstractions
  • Let the expert programmers build tools for all
    programmers to use
  • We need to avoid the artificial boundaries
    between languages, environments, and code
    transformation tools
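The annotation idea can be sketched as directives embedded in comments: the code still compiles unmodified, while a third-party tool scans for the directives and applies transformations. The `%%tool:` directive syntax below is hypothetical, not a real annotation standard:

```python
import re

# Annotation-scanning sketch: directives live in ordinary C comments, so
# the compiler ignores them, but an external tool can find and act on
# them (the %%tool: syntax here is invented for illustration).
source = """
double a[100];
/* %%tool: unroll(4) */
for (int i = 0; i < 100; i++) a[i] = 0.0;
"""

annotations = re.findall(r"/\*\s*%%tool:\s*(\w+)\((\d+)\)\s*\*/", source)
print(annotations)  # [('unroll', '4')]
```

A standard framework would fix this directive grammar once, so autotuners, source-to-source transformers, and compilers could all consume the same annotations instead of each inventing its own.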

11
Final Comments
  • Please Remember Correctness!
  • A programming model should enable the clearly
    (provably) (weakly) deterministic expression of
    deterministic algorithms
  • We should not make it easier to write buggy
    programs
  • Composing approaches is key: efficiently
    switching between tasks without polling is
    critical, especially for applications built from
    multiple components
  • Multicore is not SMP
  • Good news for in-cache problems
  • Bad news for out-of-cache (accentuates the need
    for someone, whether algorithm developer,
    programmer, and/or software stack, to manage
    memory locality)
  • Successful abstractions have been used for
    10,000-way parallelism
  • But these are domain specific, not a universal
    parallel abstraction
  • Can we make 95% happy by providing a modest-sized
    collection of parallel abstractions? Do these
    belong in the language or somewhere else?
  • I'll believe compiler-based solutions when
    vendors compile the reference HPL benchmark
    (including the BLAS) for their Top500 submission