Comparative Performance Evaluation of Cache-Coherent NUMA and COMA Architectures - PowerPoint PPT Presentation

Comparative Performance Evaluation of Cache-Coherent NUMA and COMA Architectures, by Per Stenstrom, Truman Joe and Anoop Gupta. Presented by Colleen Lewis.

Slides: 17
Provided by: Collee107
Transcript and Presenter's Notes



1
Comparative Performance Evaluation of
Cache-Coherent NUMA and COMA Architectures
  • Per Stenstrom, Truman Joe and Anoop Gupta
  • Presented by Colleen Lewis

2
Overview
  • Common Features
  • CC-NUMA
  • COMA
  • Cache Misses
  • Performance Expectations
  • Simulation Results
  • COMA-F

3
Common Features
  • Large-scale multiprocessors
  • Single address space
  • Distributed main memory
  • Directory-based cache coherence
  • Scalable interconnection network
  • Examples

4
Cache-Coherent Non-Uniform-Memory-Access Machines
  • Network independent
  • Write-invalidate cache coherence protocol
  • 2-hop miss
  • 3-hop miss
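
The difference between a 2-hop and a 3-hop miss can be sketched as below. This is a minimal illustration, assuming a directory entry that records a home node and an optional remote node holding a modified copy; the names `DirectoryEntry` and `read_miss_hops` are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DirectoryEntry:
    home: int                          # node whose memory holds the block
    dirty_owner: Optional[int] = None  # node holding a modified copy, if any


def read_miss_hops(requester: int, entry: DirectoryEntry) -> int:
    """Count network hops needed to service a read miss (toy model)."""
    if entry.dirty_owner is None or entry.dirty_owner == entry.home:
        # The home node can supply the data itself.
        return 0 if requester == entry.home else 2  # request + reply
    if requester == entry.home:
        # Home fetches the block from the remote owner: request + reply.
        return 2
    # Request to home, forward to the owner, reply to the requester.
    return 3


clean = DirectoryEntry(home=0)
dirty = DirectoryEntry(home=0, dirty_owner=2)
print(read_miss_hops(1, clean))  # 2
print(read_miss_hops(1, dirty))  # 3
```

In this model the third hop appears only when the up-to-date copy sits in a node other than the home.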

5
COMA
Cache-Only Memory Architectures
  • Attraction memory: per-node memory acts as a
    secondary/tertiary cache
  • Data is distributed and mobile
  • Directory is dynamically distributed in a
    hierarchy
  • Combining can optimize multiple reads
  • LU - 47%, Barnes-Hut - 6%, remaining < 1%
  • Reduces the average cache latency
  • Increased overhead for directory structure
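
The attraction-memory idea can be sketched as a per-node memory that behaves like a large cache. The code below is a toy model under stated assumptions: LRU replacement, a linear search standing in for the hierarchical directory, and no handling of the last remaining copy of a block on eviction (a real COMA must relocate it rather than drop it). `AttractionMemory` and `read` are hypothetical names.

```python
from collections import OrderedDict


class AttractionMemory:
    """Toy per-node memory that acts like a large LRU cache of blocks."""

    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.blocks = OrderedDict()  # block id -> data, oldest first

    def lookup(self, block):
        if block in self.blocks:
            self.blocks.move_to_end(block)  # mark as most recently used
            return True
        return False

    def install(self, block, data):
        self.blocks[block] = data
        self.blocks.move_to_end(block)
        if len(self.blocks) > self.capacity:
            # Simplification: evict the LRU block without relocating it.
            self.blocks.popitem(last=False)


def read(nodes, requester, block):
    """Serve a read locally if possible, else replicate from another node."""
    if nodes[requester].lookup(block):
        return "local"
    for i, node in enumerate(nodes):
        if i != requester and block in node.blocks:
            nodes[requester].install(block, node.blocks[block])
            return "remote"
    raise KeyError(block)


nodes = [AttractionMemory(2), AttractionMemory(2)]
nodes[1].install("A", 42)
print(read(nodes, 0, "A"))  # remote
print(read(nodes, 0, "A"))  # local
```

After the first remote access the block is held locally, so there is no fixed home for the data: it moves to where it is used.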

6
Cache Misses
Which architecture has lower latency?

Cold miss
Capacity miss
Coherence miss
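
One common way to tell the three miss types apart in a memory trace is sketched below. It is an assumption-laden illustration, not the paper's simulator: each processor gets a fully associative LRU cache, a write invalidates every other copy, and `classify_misses` is a hypothetical name.

```python
def classify_misses(trace, cache_lines):
    """Classify references in a trace of (processor, op, block) tuples.

    op is "r" or "w". Returns counts of hits and of cold, capacity
    and coherence misses.
    """
    caches = {}       # processor -> list of blocks, least recently used first
    seen = {}         # processor -> blocks it has ever referenced
    invalidated = {}  # processor -> blocks it lost to a remote write
    counts = {"hit": 0, "cold": 0, "capacity": 0, "coherence": 0}

    for proc, op, block in trace:
        cache = caches.setdefault(proc, [])
        seen_p = seen.setdefault(proc, set())
        inval_p = invalidated.setdefault(proc, set())

        if block in cache:
            counts["hit"] += 1
            cache.remove(block)
        elif block not in seen_p:
            counts["cold"] += 1       # first reference by this processor
        elif block in inval_p:
            counts["coherence"] += 1  # copy was invalidated by another writer
        else:
            counts["capacity"] += 1   # copy was evicted to make room

        inval_p.discard(block)
        seen_p.add(block)
        cache.append(block)           # block becomes most recently used
        if len(cache) > cache_lines:
            cache.pop(0)              # evict the least recently used block

        if op == "w":
            # Write-invalidate: remove the block from every other cache.
            for other, other_cache in caches.items():
                if other != proc and block in other_cache:
                    other_cache.remove(block)
                    invalidated.setdefault(other, set()).add(block)
    return counts


trace = [(0, "r", "A"), (1, "w", "A"), (0, "r", "A"),
         (0, "r", "B"), (0, "r", "A"), (0, "r", "A")]
print(classify_misses(trace, cache_lines=1))
```

With a one-line cache, this trace produces three cold misses, one coherence miss, one capacity miss and one hit.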

7
Figure 1

8
Performance Expectations

9
Simulation
  • 16 processors
  • Cache line size: 16 bytes
  • Cache size: 4 Kbytes
  • (Kept small to force capacity misses)
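
The arithmetic behind these parameters is worked through below. The line count follows directly from the two sizes; the address split additionally assumes a direct-mapped cache, which the slides do not state, so treat `split_address` as a hypothetical illustration.

```python
CACHE_SIZE_BYTES = 4 * 1024
LINE_SIZE_BYTES = 16

num_lines = CACHE_SIZE_BYTES // LINE_SIZE_BYTES  # 256 lines per cache
offset_bits = LINE_SIZE_BYTES.bit_length() - 1   # 4 bits select a byte in a line
index_bits = num_lines.bit_length() - 1          # 8 bits, ASSUMING direct-mapped


def split_address(addr):
    """Split a byte address into (tag, index, offset) for the assumed cache."""
    offset = addr & (LINE_SIZE_BYTES - 1)
    index = (addr >> offset_bits) & (num_lines - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


print(num_lines)              # 256
print(split_address(0x1234))  # (1, 35, 4)
print(split_address(0x0234))  # (0, 35, 4): same index, so the two lines conflict
```

With only 256 lines per processor, a working set of more than 4 Kbytes cannot stay resident, which is what forces the capacity misses.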

10
Results
11
Results
  • MP3D: Particle-based wind tunnel simulation
  • PTHOR: Distributed-time logic simulation
  • LocusRoute: VLSI standard cell router
  • Water: Molecular dynamics code
  • Cholesky: Cholesky factorization of a sparse
    matrix
  • LU: LU decomposition of a dense matrix
  • Barnes-Hut: N-body problem solver, O(N log N)
  • Ocean: Ocean basin simulation

12
Page Migration and Page Size
  • Introduces additional overhead
  • Node hit rate increases as page size decreases
  • Reduces false sharing
  • Fewer pages accessed by multiple processors
  • Likely won't work if data chunks are much smaller
    than pages (example - LU)
  • NUMA-M performs better for Cholesky
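
The link between page size and sharing can be illustrated by counting how many pages are touched by more than one processor. The sketch below uses a made-up access pattern (four processors, each working on its own contiguous 1 Kbyte chunk); `shared_page_count` is a hypothetical helper, not part of the paper's methodology.

```python
from collections import defaultdict


def shared_page_count(accesses, page_size):
    """Count pages touched by more than one processor.

    accesses is an iterable of (processor, byte_address) pairs.
    """
    sharers = defaultdict(set)
    for proc, addr in accesses:
        sharers[addr // page_size].add(proc)
    return sum(1 for procs in sharers.values() if len(procs) > 1)


# Illustrative pattern: processor p touches bytes [p*1024, (p+1)*1024).
accesses = [(p, p * 1024 + k * 16) for p in range(4) for k in range(64)]

print(shared_page_count(accesses, page_size=4096))  # 1
print(shared_page_count(accesses, page_size=1024))  # 0
```

With 4 Kbyte pages all four chunks fall on one page, so no single placement is local to everyone. With 1 Kbyte pages each page has a single user and can be migrated to it. If the chunks were much smaller than any practical page, shrinking the page would not help.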

13
Initial Placement
  • Implemented as page migration in which each page
    can be migrated at most once
  • LU does significantly better
  • Ocean performs the same with single and multiple
    migrations
  • Requires increased work for compiler and
    programmer
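
Treating initial placement as migration with a cap of one move per page can be sketched as follows. The trigger used here (a per-node count of remote misses reaching a threshold) is an assumption made for illustration; the slides do not describe the actual policy, and `PagePlacement`, `threshold` and `max_migrations` are hypothetical names.

```python
from collections import defaultdict


class PagePlacement:
    """Toy page migration policy with an optional cap on moves per page."""

    def __init__(self, threshold, max_migrations=None):
        self.threshold = threshold            # remote misses before a move
        self.max_migrations = max_migrations  # None = unlimited, 1 = initial placement
        self.home = {}                        # page -> node holding it
        self.migrations = defaultdict(int)    # page -> moves so far
        self.remote_misses = defaultdict(int) # (page, node) -> miss count

    def access(self, page, node):
        home = self.home.setdefault(page, 0)  # sketch: pages start on node 0
        if node == home:
            return "local"
        self.remote_misses[(page, node)] += 1
        capped = (self.max_migrations is not None
                  and self.migrations[page] >= self.max_migrations)
        if not capped and self.remote_misses[(page, node)] >= self.threshold:
            self.home[page] = node
            self.migrations[page] += 1
            # Start counting afresh for this page after the move.
            for key in [k for k in self.remote_misses if k[0] == page]:
                del self.remote_misses[key]
            return "migrated"
        return "remote"


placement = PagePlacement(threshold=2, max_migrations=1)
print([placement.access(7, 1) for _ in range(3)])  # ['remote', 'migrated', 'local']
print([placement.access(7, 2) for _ in range(2)])  # ['remote', 'remote']
```

With `max_migrations=1` the page settles on the first node that uses it heavily and then stays put, which matches the idea of a one-time placement decision.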

14
Cache Size/Network Variations
  • Cache Size Variations
  • Increasing the cache size causes coherence misses
    to dominate
  • With a 64 KB cache, CC-NUMA (without migration) is
    better for every application except Ocean.
  • Network Latency Variations
  • Even with aggressive implementations of the directory
    structure, COMA can't compensate in applications
    with a significant coherence miss rate
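
The reasoning on this slide can be made concrete with a weighted-average miss penalty. Every number below is invented purely for illustration and is not a result from the paper: the assumption is only that COMA serves capacity misses more cheaply (from the attraction memory) while paying more for coherence misses (through the directory hierarchy).

```python
def avg_miss_penalty(miss_mix, penalties):
    """Average penalty per miss, weighting each miss type by its count."""
    total = sum(miss_mix.values())
    return sum(miss_mix[kind] * penalties[kind] for kind in miss_mix) / total


# Hypothetical penalties in cycles (NOT measured values).
cc_numa = {"capacity": 100, "coherence": 100}
coma = {"capacity": 30, "coherence": 150}

# Hypothetical miss mixes: a small cache is dominated by capacity misses,
# a large cache by coherence misses.
small_cache = {"capacity": 80, "coherence": 20}
large_cache = {"capacity": 10, "coherence": 90}

for name, mix in [("small cache", small_cache), ("large cache", large_cache)]:
    print(name, avg_miss_penalty(mix, cc_numa), avg_miss_penalty(mix, coma))
```

Under these made-up inputs COMA wins when capacity misses dominate and loses once coherence misses take over, which is the qualitative trend the slide describes.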

15
COMA-F
  • Directory information for each data block has a
    home node (as in CC-NUMA)
  • Supports replication and migration of data blocks
    (as in COMA-H)
  • Attempts to reduce the coherence miss penalty

16
Conclusion
  • CC-NUMA and COMA perform well for different
    application characteristics