Berkeley UPC - PowerPoint PPT Presentation

Provided by: DanBon5

Transcript and Presenter's Notes

Title: Berkeley UPC


1
Berkeley UPC
Kathy Yelick, Christian Bell, Dan Bonachea, Wei Chen, Jason Duell,
Paul Hargrove, Parry Husbands, Costin Iancu, Rajesh Nishtala, Mike Welcome
LBNL and U.C. Berkeley
http://upc.lbl.gov
2
Berkeley UPC Compiler Status
  • Recent Berkeley UPC release (v2.2)
    • Supports the UPC 1.2 language spec
    • Supports collectives (tuning ongoing); memory model compliance
    • Supports UPC I/O (naïve reference implementation)
  • Compiler work
    • Optimization phase and improved performance in v2.2
    • Work on automated communication overlap, upc_forall
  • Large effort in quality assurance and robustness
    • Test suite: 600 tests run nightly on 20 platform configs
    • > 30,000 UPC compilations and > 20,000 UPC test runs per night
  • Test suite infrastructure extended to support any UPC compiler
    • Now running nightly with GCC/UPC on UPCR (the Berkeley UPC runtime)
    • Also has been used on HP UPC and Cray UPC

3
Berkeley UPC Collaborations
  • GCC UPC on the Berkeley UPC Runtime
    • Used for cluster (GASNet) implementations
    • Now works with the pthread runtime
  • Source-level debugging with TotalView 7.x
    • Joint project with Etnus
  • General framework for source-to-source translators
  • Future work
    • Cray XT3 and other Rainier/Adams ports
    • Possible BlueGene/L port
    • XT3 and BG/L both run on the MPI conduit

4
Berkeley Applications Benchmarks
  • Some new applications
    • FT: 0.45 TFlops on 512-processor Itanium/Quadrics (Elan4)
    • CG: 30 GFlops on 512-processor HP Alpha/Quadrics (Elan3)
    • LU: > 2 TFlops on 512-processor Itanium/Quadrics (Elan4)
    • Barnes-Hut: fine-grained (based on SPLASH)
    • CFG uses to Chombo
  • More on LU
    • Towards a sparse direct solver (SuperLU)
    • Currently a full (Top500-compliant) HPL implementation
    • All UPC except for calls to the BLAS

5
End of Berkeley Status
6
Data Movement and Synchronization
7
Motivation for Data Movement and Synchronization
  • Some operations are (at best) hard to express or slow in UPC
  • Benchmarks highlight these
    • FT: communication-limited all-to-all; want to overlap communication with computation
    • MG: fill in ghost regions
      • Remote writes are often faster than remote reads
      • But need to synchronize to let the other processor know the data is available
      • See Tarek and John Mellor-Crummey's PPoPP'05 paper
      • Signaling store in Split-C
      • Implementation issue: reordering
    • LU: remotely enqueue a task
    • GUPS and Histogram: remotely increment/XOR a value
      • With or without atomicity
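The "remote write, then signal" pattern and its reordering hazard can be sketched in plain C. This is a shared-memory analogy only, assuming POSIX threads and C11 atomics stand in for two UPC threads; it is not the Split-C signaling store or any Berkeley UPC API, and the names `producer`, `consume`, and `signaling_store_demo` are hypothetical. The producer writes the payload and then sets a flag with a release store; the consumer waits on the flag with an acquire load before reading, so it can never see the flag ahead of the data.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define N 1024

/* Hypothetical stand-ins for UPC shared data: a payload buffer
   written by one thread and read by another, plus a ready flag. */
static int buf[N];
static atomic_int ready;

/* Producer ("remote write"): store the payload, then signal.
   The release store orders all prior writes to buf before the flag. */
static void *producer(void *arg) {
    int base = *(const int *)arg;
    for (int i = 0; i < N; i++)
        buf[i] = base + i;
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

/* Consumer: wait for the signal with an acquire load, then read.
   Without the release/acquire pair, the flag could become visible
   before the payload writes (the reordering hazard). */
static long consume(int base) {
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        ; /* busy-wait; a real runtime would poll or block instead */
    long sum = 0;
    for (int i = 0; i < N; i++) {
        assert(buf[i] == base + i); /* data must be visible once flagged */
        sum += buf[i];
    }
    return sum;
}

/* Run one put-then-signal exchange; returns the sum the consumer read,
   or -1 if the producer thread could not be started. */
long signaling_store_demo(int base) {
    pthread_t t;
    atomic_store(&ready, 0);
    if (pthread_create(&t, NULL, producer, &base) != 0)
        return -1;
    long sum = consume(base);
    pthread_join(t, NULL);
    return sum;
}
```

Compile with `-pthread`. `signaling_store_demo(0)` returns 523776, the sum of 0 through 1023. The spinning consumer keeps the sketch short; it is only one possible way to wait for the signal.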

8
Who Would Like to Talk?
  • Non-blocking memget/memput (Dan)
  • Semaphores (Dan)
  • Semaphore example (Tarek)
  • Remote Atomics (Phil)
  • Floating functions (Jason)