1
Lecture 12: Hardware/Software Trade-Offs
  • Topics: COMA, Software Virtual Memory

2
Capacity Limitations
(Figure: two nodes, each with processors (P) and caches (C) on a
local bus (B1) along with a coherence monitor and local memory (Mem);
the nodes are connected by interconnect B2)
  • In the Sequent NUMA-Q design above, a remote access is
    involved if data cannot be found in the remote access cache
  • The remote access cache and local memory are both DRAM
  • Can we expand the cache and reduce local memory?

3
Cache-Only Memory Architectures
  • COMA takes the extreme approach: no local memory and a very
    large remote access cache
  • The cache is now known as an attraction memory
  • Overheads/issues that must be addressed (see the sketch after
    this list):
  • Need a much larger tag space
  • More care while evicting a block
  • Finding a clean copy of a block
  • Easier to program: data need not be pre-allocated
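
A sketch of an attraction memory lookup, to make the tag overhead
concrete: since all of local DRAM is managed as a cache, every block
carries a tag, and each access begins with a tag comparison. The
geometry and names below are illustrative assumptions, not taken
from a real machine:

    #include <stdbool.h>
    #include <stdint.h>

    #define WAYS     4      /* COMA wants high associativity (slide 5) */
    #define NUM_SETS 4096   /* illustrative; a real attraction memory
                               spans all of local DRAM                 */

    struct am_line {
        uint64_t tag;       /* per-block tag, kept in DRAM */
        bool     valid;
    };

    static struct am_line am[NUM_SETS][WAYS];

    /* Unlike an ordinary local memory access, an attraction memory
     * access must compare tags; a miss triggers a remote access. */
    bool am_lookup(uint64_t block_addr)
    {
        uint64_t set = block_addr % NUM_SETS;
        uint64_t tag = block_addr / NUM_SETS;
        for (int w = 0; w < WAYS; w++)
            if (am[set][w].valid && am[set][w].tag == tag)
                return true;   /* hit: data is in local DRAM   */
        return false;          /* miss: fetch from remote node */
    }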

4
COMA Performance
  • Attraction memories reduce the frequency of remote accesses
    by reducing capacity/conflict misses
  • Attraction memory access time is longer than local memory
    access time in the CC-NUMA case (since the latter does not
    involve tag comparison)
  • COMA helps programs that have frequent capacity misses to
    remotely allocated data

5
COMA Implementation
  • Even though the memory block has no fixed home, the directory
    can continue to remain fixed; on a miss or on a write, contact
    the directory to identify valid cached copies
  • In order to not evict the last copy of a block, one of the
    sharers has the block in master state; while replacing the
    master copy, a message must be sent to the directory; the
    directory attempts to find another node that can accommodate
    this block in master state (sketch below)
  • For high performance, the physical memory allocated to an
    application must be smaller than the attraction memory
    capacity, and the attraction memory must be highly associative
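
A sketch of the master-copy replacement step just described; the
directory and transfer helpers are hypothetical names used for
illustration, not drawn from a specific machine:

    /* Hypothetical messages between an attraction memory and the
     * fixed-home directory. */
    extern int  directory_find_new_master(int block_id, int evictor);
    extern void transfer_block(int from, int to, int block_id);

    /* Evicting a block held in master state: this may be the last
     * copy, so the directory must relocate it before the eviction
     * can proceed. */
    void evict_master_copy(int block_id, int this_node)
    {
        int target = directory_find_new_master(block_id, this_node);
        if (target >= 0)
            transfer_block(this_node, target, block_id); /* new master */
        /* else: no node can accommodate the block; the fallback
         * policy is outside this sketch */
    }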

6
Reducing Cost
  • Hardware cache coherence involves specialized communication
    assists; cost can be reduced by using commodity hardware and
    software cache coherence
  • Software cache coherence: each processor translates the
    application's virtual address space into its own physical
    memory; if the local physical memory does not exist (page
    fault), a copy is made by contacting the home node; a software
    layer is responsible for tracking updates and propagating them
    to cached copies; also known as shared virtual memory (SVM);
    a fault-handler sketch follows
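
A minimal sketch of how an SVM layer can use standard POSIX page
protection to catch accesses to non-resident pages: a fault on a
shared page fetches a copy from the home node and remaps the page.
The fetch_page_from_home() helper is a hypothetical placeholder for
the messaging layer:

    #include <signal.h>
    #include <stdint.h>
    #include <sys/mman.h>

    #define PAGE_SIZE 4096

    /* Hypothetical helper: contact the home node and copy the
     * page's contents into local memory. */
    extern void fetch_page_from_home(void *page_base);

    /* An access to a non-resident shared page traps here. */
    static void svm_fault_handler(int sig, siginfo_t *info, void *ctx)
    {
        (void)sig; (void)ctx;
        /* Round the faulting address down to its page base. */
        void *page = (void *)((uintptr_t)info->si_addr &
                              ~(uintptr_t)(PAGE_SIZE - 1));
        fetch_page_from_home(page);           /* make a local copy     */
        mprotect(page, PAGE_SIZE, PROT_READ); /* read-only: a later    */
                                              /* write faults again so */
                                              /* updates can be tracked */
    }

    void svm_install_handler(void)
    {
        struct sigaction sa = {0};
        sa.sa_sigaction = svm_fault_handler;
        sa.sa_flags = SA_SIGINFO;
        sigaction(SIGSEGV, &sa, NULL);
    }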

7
Shared Virtual Memory Performance
  • Every communication is expensive: it involves the OS,
    message-passing over slower I/O interfaces, and protocol
    processing that happens at the processor
  • Since the implementation is based on the processor's virtual
    memory support, the granularity of sharing is a page
    → high degree of false sharing
  • For a sequentially consistent execution, false sharing leads
    to a high degree of expensive communication

8
Relaxed Memory Models
  • Relaxed models such as release consistency can reduce the
    frequency of communication (while increasing programming
    effort)
  • Writes are not immediately propagated, but have to wait until
    the next synchronization point
  • In hardware CC, messages are sent immediately and relaxed
    models prevent the processor from stalling; in software CC,
    relaxed models allow us to defer message transfers to amortize
    their overheads

9
Hardware and Software CC
(Figure: two event timelines, "Traffic with hardware CC" and
"Traffic with software CC", showing a sequence of Rd y and Wr x
operations between synch points)
  • Relaxed memory models in hardware cache coherence hide latency
    from the processor → false sharing can result in significant
    network traffic
  • In software cache coherence, the relaxed memory model sends
    messages only at synchronization points, reducing the traffic
    caused by false sharing

10
Eager Release Consistency
  • When a processor issues a release operation, all writes by
    that processor are propagated to other nodes (as updates or
    invalidates); see the sketch after this list
  • When other processors issue reads, they encounter a cache miss
    (if we are using an invalidate protocol) and get a clean copy
    of the block from the last writer
  • Does the read really have to see the latest value?
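
A sketch of what an eager release might do: flush the set of pages
dirtied since the last synchronization point, invalidating every
cached copy before the release completes. The sharer-list arrays and
send_invalidate() are assumptions for illustration:

    #include <stddef.h>

    #define MAX_PAGES 1024
    #define MAX_NODES 16

    /* Hypothetical transport and directory state. */
    extern void send_invalidate(int node, int page_id);
    extern int  sharers[MAX_PAGES][MAX_NODES];
    extern int  num_sharers[MAX_PAGES];

    static int    dirty_pages[MAX_PAGES];
    static size_t num_dirty;

    /* Track pages written since the last release (no overflow
     * check in this sketch). */
    void record_write(int page_id) { dirty_pages[num_dirty++] = page_id; }

    /* Eager release: all writes since the last synchronization
     * point are propagated (here as invalidates) right now. */
    void eager_release(void)
    {
        for (size_t i = 0; i < num_dirty; i++) {
            int pg = dirty_pages[i];
            for (int s = 0; s < num_sharers[pg]; s++)
                send_invalidate(sharers[pg][s], pg);
        }
        num_dirty = 0;
    }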

11
Lazy Release Consistency
  • RCsc guarantees SC between special operations
  • P2 must see updates by P1 only if P1 issued a release,
    followed by an acquire by P2
  • In LRC, updates/invalidates are visible to a processor only
    after it does an acquire (sketch below); it is possible that
    some processors will never see the update (not true cache
    coherence)
  • LRC reduces the amount of traffic, but increases the latency
    and complexity of an acquire
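
In contrast with the eager sketch above, a lazy acquire pulls the
releaser's write notices only when the synchronization variable
changes hands; the helpers below are hypothetical:

    #include <stddef.h>

    /* Hypothetical transport helpers. */
    extern size_t fetch_write_notices(int last_releaser, int *pages);
    extern void   invalidate_local_copy(int page_id);

    /* Lazy acquire: only now do the last releaser's updates become
     * visible here. A processor that never acquires never pays
     * this cost, and may never see the update. */
    void lazy_acquire(int last_releaser)
    {
        int pages[256];
        size_t n = fetch_write_notices(last_releaser, pages);
        for (size_t i = 0; i < n; i++)
            invalidate_local_copy(pages[i]);
    }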

12
LRC vs. ERC vs. Hardware RC

    P1                          P2
    lock L1
    ptr = non_null_value
    unlock L1
                                while (ptr == null) { }
                                lock L1
                                a = ptr
                                unlock L1

  • The point of the example: under hardware RC and ERC, P1's
    unlock propagates the write to ptr, so P2's spin loop
    terminates; under LRC, P2 performs no acquire while spinning,
    so it may never see the update (cf. slide 11)
13
Multiple Writer Protocols
  • It is important to support two concurrent writes to different
    words within a page and to merge the writes at a later point
  • Each process makes a twin copy of the page before it starts
    writing; updates are sent as a diff between the old and new
    copies; after an acquire, a process must get diffs from all
    releasing processes and apply them to its own copy of the page
    (see the sketch after this list)
  • If twins are kept around for a long time, storage overhead
    increases; it helps to have a home location of the page that
    is periodically updated with diffs
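
A minimal sketch of the twin/diff mechanism, assuming a simple
word-granularity diff encoded as (index, value) pairs; the names and
diff format are illustrative:

    #include <stdlib.h>
    #include <string.h>

    #define PAGE_SIZE      4096
    #define WORDS_PER_PAGE (PAGE_SIZE / sizeof(unsigned))

    /* Before the first write, save a twin (pristine copy). */
    unsigned *make_twin(const unsigned *page)
    {
        unsigned *twin = malloc(PAGE_SIZE);
        memcpy(twin, page, PAGE_SIZE);
        return twin;
    }

    /* At a release, compare the page against its twin and encode
     * only the modified words; diffs from writers to different
     * words of the same page can later be merged safely. */
    size_t make_diff(const unsigned *page, const unsigned *twin,
                     unsigned *idx, unsigned *val)
    {
        size_t n = 0;
        for (size_t i = 0; i < WORDS_PER_PAGE; i++)
            if (page[i] != twin[i]) {
                idx[n] = (unsigned)i;
                val[n] = page[i];
                n++;
            }
        return n;
    }

    /* After an acquire, apply the diffs from all releasing
     * processes to the local copy of the page. */
    void apply_diff(unsigned *page, const unsigned *idx,
                    const unsigned *val, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            page[idx[i]] = val[i];
    }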

14
Simple COMA
  • SVM takes advantage of virtual memory to provide easy
    implementations of address translation, replication, and
    replacement
  • These can be applied to the COMA architecture
  • Simple COMA: if virtual address translation fails, the OS
    generates a local copy of the page; when the page is replaced,
    the OS ensures that the data is not lost; if data is not found
    in the attraction memory, hardware is responsible for fetching
    the relevant cache block from a remote node (note that the
    physical address must be translated back to a virtual address)
