Transactional Memory: An Overview of Hardware Alternatives

1
Transactional Memory: An Overview of Hardware Alternatives
  • David A. Wood
  • University of Wisconsin
  • Transactional Memory Workshop
  • April 8th, 2005

2
What's a database got to do with it?
  • Atomicity
  • All updates, or none
  • Consistency
  • Correct at begin and end
  • Isolation
  • Partial work not visible
  • Inputs stay stable
  • Durability
  • Survive system failures

All (or some) memory ops, not just database objects
Despite increasing awareness of failures
3
801 Database Storage
  • Lock bits on virtual memory
  • 128 byte granularity
  • Added to pagetable and TLB
  • Caches user's lock state
  • Trap on lock conflict
  • No h/w for logging, abort, etc.
  • Only uniprocessors
  • 801 and RS/6000

[Figure labels: Memory, TLB, Tid]
Was this transactional memory?
4
SQL/801
  • "The development of SQL/801 was greatly
    simplified because, with minor exceptions, it
    considers only a single user. It achieves
    multiuser concurrency on a uniprocessor by
    running in multiple processes using the shared
    database storage." (Chang and Mergen, '88)
  • Largest transactional memory application
  • Only real hardware transactional memory
    implementation
  • No one seems to be looking at what they learned

5
Basic Transactional Mechanisms
  • Isolation
  • Detect when transactions conflict
  • Track read and write sets
  • Version management
  • Record new and old values
  • Atomicity
  • Commit new values
  • Abort back to old values
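
The three mechanisms above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the `Transaction` class, its method names, and the dictionary standing in for memory are all illustrative and do not come from any of the systems surveyed here. It tracks read and write sets (isolation), records old values before overwriting (version management), and either keeps the new values or restores the old ones (atomicity).

```python
class Transaction:
    """Toy transaction over a shared dict; every name here is illustrative."""

    def __init__(self, memory):
        self.memory = memory
        self.read_set = set()     # isolation: addresses this transaction read
        self.write_set = set()    # isolation: addresses this transaction wrote
        self.old_values = {}      # version management: values to restore on abort

    def read(self, addr):
        self.read_set.add(addr)
        return self.memory[addr]

    def write(self, addr, value):
        if addr not in self.old_values:
            self.old_values[addr] = self.memory[addr]   # record old value once
        self.write_set.add(addr)
        self.memory[addr] = value                       # new value goes in place

    def conflicts_with(self, other):
        """Conflict: one side wrote something the other side read or wrote."""
        return bool(self.write_set & (other.read_set | other.write_set)
                    or other.write_set & self.read_set)

    def commit(self):
        self.old_values.clear()   # new values simply become permanent

    def abort(self):
        for addr, value in self.old_values.items():
            self.memory[addr] = value                   # back to old values
        self.old_values.clear()


memory = {"x": 1, "y": 2}
t1, t2 = Transaction(memory), Transaction(memory)
t2.read("x")
t1.write("x", 10)
conflict = t1.conflicts_with(t2)   # t1 wrote what t2 read
t1.abort()                         # memory["x"] is restored
```

The sketch updates memory in place and keeps old values on the side; buffering new values and keeping memory untouched until commit is the other common choice, and several of the systems below take it.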

6
H/W Transactional Memory Systems
  • Knight's Lisp Work
  • Transactional Memory
  • Oklahoma Update
  • SLE/TLR
  • Transactional Coherence and Consistency
  • Unbounded TM
  • Virtual TM
  • Thread-level TM

7
Knight's Lisp Work ('86)
  • Parallel execution of sequential code
  • Break program into transaction blocks
  • Multiple loads in a transaction
  • Exactly one store ends the transaction
  • No register state passed between transactions
  • Execute transactions in parallel
  • Track dependences (i.e., read set)
  • Abort and restart on conflicting write
  • Transactions commit in sequential order
  • Broadcast writes on commit

8
Knight's Hardware
  • Two caches
  • Dependency cache
  • Tracks read set
  • Bus monitor detects conflicts
  • Confirm cache
  • Holds write set
  • Supports multiple writes
  • Commits
  • Check dep. cache
  • Broadcast writes
  • Fast aborts
  • Invalidate Confirm cache
  • Use old values in Dep. Cache
  • Immediately restart execution

[Figure labels: Memory, Confirm Cache, Dependency Cache]
Spawned two threads: TLS and TM
9
Herlihy and Moss's Transactional Memory ('93)
  • Targets explicitly parallel (non-functional)
    codes
  • Motivated by lock-free data structures
  • Transactions
  • Read and write multiple locations
  • Commit in arbitrary order
  • Implicit begin, explicit commit operations
  • Abort affects memory, not registers
  • Software manages restarting execution
  • Validate instruction detects pending abort
  • Implementation extends cache coherence
  • Read/Write locks correspond to MOESI states
  • Add orthogonal transaction states

10
Herlihy and Moss's Transactional Memory
  • Adds Transaction Cache
  • Stores all data accessed by transactions
  • 2 copies of each line
  • Before and after image
  • Even for read-only data
  • Small, fully associative
  • Abort on all conflicts
  • NACK conflicting requests
  • Abort NACKed transaction
  • Fast commit and abort
  • Change trans. cache state
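
A minimal sketch of the two-copies-per-line idea, assuming a dictionary as backing memory and a hypothetical `TransactionCache` class. Writing the surviving image back to `memory` at the end stands in for the hardware simply changing the state of cache entries; the capacity limit models the small, fully associative structure.

```python
class TransactionCache:
    """Toy transaction cache: a before and an after image for every line."""

    def __init__(self, memory, capacity=4):
        self.memory = memory
        self.capacity = capacity    # assumed size, in lines
        self.lines = {}             # addr -> {"before": value, "after": value}

    def _allocate(self, addr):
        if addr not in self.lines:
            if len(self.lines) == self.capacity:
                raise OverflowError("transaction does not fit in the cache")
            value = self.memory[addr]
            # Both images are allocated, even if the line is only ever read.
            self.lines[addr] = {"before": value, "after": value}
        return self.lines[addr]

    def load(self, addr):
        return self._allocate(addr)["after"]

    def store(self, addr, value):
        self._allocate(addr)["after"] = value

    def commit(self):
        self._finish("after")       # keep the after images

    def abort(self):
        self._finish("before")      # keep the before images

    def _finish(self, image):
        for addr, line in self.lines.items():
            self.memory[addr] = line[image]
        self.lines.clear()


memory = {"x": 1, "y": 2}
cache = TransactionCache(memory)
cache.store("x", 5)
cache.load("y")                     # read-only data still occupies a line
lines_used = len(cache.lines)
cache.abort()
after_abort = dict(memory)
cache.store("x", 7)
cache.commit()
```

Because every accessed line costs two entries, read-heavy transactions exhaust the cache quickly, which is one reason the size limit matters.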

[Figure labels: Memory, Cache, Transaction Cache]
11
SLE/TLR
  • Hardware exploits speculative processors
  • Read sets tracked by coherence protocol
  • Write set maintained in store queue
  • Abort restarts execution, including register
    state
  • Speculative lock elision (SLE)
  • Elide locks from the dynamic execution stream
  • Convert critical sections to optimistic
    transactions
  • Concurrently execute non-conflicting transactions
  • Fall back on explicit locks if conflicts
  • Transactional Lock Removal (TLR)
  • Resolve conflicts using priority ordering
    (timestamps)
  • Delay lower priority transactions
  • Deadlock and starvation free
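
The TLR conflict rule can be illustrated with a small sketch. It assumes that a transaction takes its timestamp once and keeps it across restarts, so a repeatedly delayed transaction eventually becomes the oldest one; the `Txn` class, the `resolve` function, and the counter used as a clock are illustrative names rather than the actual hardware interface.

```python
import itertools

_clock = itertools.count()          # stand-in for a global timestamp source


class Txn:
    def __init__(self, name):
        self.name = name
        self.timestamp = next(_clock)   # taken once at first start
        self.restarts = 0

    def restart(self):
        self.restarts += 1              # timestamp is deliberately unchanged


def resolve(holder, requester):
    """Return (winner, delayed): the older timestamp has priority."""
    if requester.timestamp < holder.timestamp:
        return requester, holder
    return holder, requester


t_old, t_new = Txn("old"), Txn("new")
t_old.restart()                         # restarting does not lose priority
winner, delayed = resolve(holder=t_new, requester=t_old)
```

Since timestamps form a total order, two transactions can never each wait on the other, which is the intuition behind the deadlock-free claim.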

12
Transactional Coherence and Consistency ('04)
  • TCC unifies coherence, memory consistency, and
    transaction support
  • All transactions, all the time
  • Transaction ordering
  • Ordered, Unordered, Partially Ordered
  • Supports thread-level speculation
  • Optimistic concurrency model
  • Unordered transactions serialize at commit
  • Conflicts detected at commit
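
Commit-time conflict detection can be sketched as follows, assuming two simulated processors that share a dictionary. The `Processor` class and its `snoop` method are hypothetical names for this illustration; a real TCC system does this in the cache and interconnect hardware.

```python
class Processor:
    """Toy optimistic processor: buffers writes, checks conflicts at commit."""

    def __init__(self, name, shared):
        self.name = name
        self.shared = shared
        self.read_set = set()       # addresses read in the current transaction
        self.write_buffer = {}      # new values held until commit
        self.aborts = 0

    def load(self, addr):
        if addr in self.write_buffer:
            return self.write_buffer[addr]
        self.read_set.add(addr)
        return self.shared[addr]

    def store(self, addr, value):
        self.write_buffer[addr] = value     # invisible to others for now

    def snoop(self, written_addrs):
        # Another processor committed: abort if it wrote something we read.
        if self.read_set & written_addrs:
            self.aborts += 1
            self.read_set.clear()
            self.write_buffer.clear()

    def commit(self, others):
        written = set(self.write_buffer)
        self.shared.update(self.write_buffer)   # updates become visible
        for other in others:
            other.snoop(written)                # broadcast written addresses
        self.read_set.clear()
        self.write_buffer.clear()


shared = {"counter": 0}
p0, p1 = Processor("p0", shared), Processor("p1", shared)
p0.store("counter", p0.load("counter") + 1)
p1.store("counter", p1.load("counter") + 1)
p0.commit([p1])                               # p1 read "counter", so it aborts
p1.store("counter", p1.load("counter") + 1)   # p1 re-executes
p1.commit([p0])
```

Neither processor notices the conflict while executing; it surfaces only when the first commit broadcasts its written addresses.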

13
TCC
[Figure: TCC processor node]
  • On-chip interconnect: broadcasts updates at commit
  • Write buffer: 4 kB, holds new values until commit
  • Shadow register file (SRF): checkpoints architectural
    registers
  • L2 cache: logically shared
  • L1 D cache: tracks read set, one bit per line
  • CPU
14
TCC
  • Commits are sequential
  • Broadcasts addresses of all updates
  • Supports large transactions
  • Serialize all other transactions
  • Grabs and holds the commit bus
  • Cannot abort large transactions
  • Updates affect L2/memory; no undo
  • Extensions forthcoming
  • talk to Kunle and Christos

15
Unbounded Transactional Memory (UTM)
  • Unbounded transactions
  • Arbitrary size
  • Not limited by write buffer, cache, or memory
  • Arbitrary duration
  • Not limited by interrupts, context switch, etc.
  • Complex implementation
  • Not justified by performance
  • Settle for nearly unbounded transactions
  • Much simpler hardware

16
Transactional Linux
[Figure, log-log scale]
  • Almost all of the transactions require < 100
    cache lines
  • 99.9% need fewer than 54 cache lines
  • There are, however, some very large transactions!
  • > 500 KB fully associative cache required

17
Large Transaction Memory (LTM)
  • Register checkpoints
  • Snapshot of rename maps
  • Cache tracks read and write sets
  • T-bits mark transactional blocks
  • Cache holds new data values in place
  • O-bit indicates overflow to in-memory hashtable
  • Memory holds committed state
  • Abort invalidates all modified blocks
  • Miss on re-execution
  • Transactional writes force memory updates
  • Repeated writes (e.g., to local data) are written
    through

18
Virtual Transactional Memory (VTM)
  • Only an overflow mechanism
  • No overhead on common in-cache case
  • Check shared overflow counter on cache miss
  • Low overhead when no conflict
  • Shared Bloom Filter rules out conflicts
  • Filter resides in virtual memory
  • Higher overhead on possible conflict
  • Hardware table walk to detect actual conflict
  • Table resides in virtual memory
  • Only incurred by large transactions with likely
    conflict
  • Supports context switches and paging
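
The three-level check on a cache miss can be sketched like this. The `OverflowState` class, the 64-bit filter, and the use of SHA-256 to pick filter bits are all assumptions made for illustration; the sketch also omits removing entries when a transaction ends.

```python
import hashlib


class OverflowState:
    """Toy overflow structures: counter, Bloom filter, and overflow table."""

    def __init__(self, filter_bits=64):
        self.overflow_count = 0
        self.filter_bits = filter_bits
        self.bloom = 0              # bit vector stored as an int
        self.table = {}             # addr -> id of the owning transaction

    def _bit_positions(self, addr):
        digest = hashlib.sha256(str(addr).encode()).digest()
        return [digest[0] % self.filter_bits, digest[1] % self.filter_bits]

    def record_overflow(self, addr, txn_id):
        self.table[addr] = txn_id
        self.overflow_count += 1
        for bit in self._bit_positions(addr):
            self.bloom |= 1 << bit

    def check_on_miss(self, addr, txn_id):
        """Return (conflict, step), where step is the last check performed."""
        if self.overflow_count == 0:
            return False, "counter"         # common case: nothing overflowed
        hit = all((self.bloom >> bit) & 1 for bit in self._bit_positions(addr))
        if not hit:
            return False, "filter"          # filter rules out a conflict
        owner = self.table.get(addr)        # expensive step: walk the table
        return owner is not None and owner != txn_id, "table walk"


state = OverflowState()
first = state.check_on_miss(0x100, txn_id=1)
state.record_overflow(0x100, txn_id=1)
other_txn = state.check_on_miss(0x100, txn_id=2)
same_txn = state.check_on_miss(0x100, txn_id=1)
unrelated_conflict, _ = state.check_on_miss(0x2000, txn_id=2)
```

An unrelated address may still pass the filter by coincidence, but the table walk then reports no conflict, so a false positive costs time rather than correctness.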

19
801 revisited
  • Why didn't 801 database storage succeed?
  • Lock bits helped performance and simplified
    software
  • Answer 1
  • Changing lock bits requires TLB shootdown
  • Too complicated for the benefits?
  • => Not a current problem: transaction h/w is easy
  • Answer 2
  • Not universally available
  • DB2 was (is) multiplatform
  • Can't rely on a feature only available in one
    architecture
  • => Still a relevant concern

20
Need Standard Transaction Interface
  • Abstract away resource requirements
  • Support large, long transactions
  • Virtualize transactional memory
  • Transaction semantics between threads
  • NOT a hardware property
  • Permit range of implementations
  • Hardware, software, and combinations

21
Thread-level Transactional Memory
  • Abstract mechanisms
  • Version management
  • Update memory in place
  • Log before images to thread level VM
  • Isolation
  • Logically extend memory words with read and write
    bits
  • Implementations can be conservative (e.g.,
    blocks)
  • Atomicity
  • Commits easy due to in place updates
  • Aborts trap to user-level software
  • Hardware can accelerate common case
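
A small sketch of these abstract mechanisms, assuming memory is a list of words and that read and write bits are kept per block of `BLOCK_SIZE` words (a conservative implementation in the sense above). The `ThreadTM` class and the block size are illustrative choices, and the `abort` method plays the role of the user-level software handler.

```python
BLOCK_SIZE = 4                      # words per block; an assumed granularity


class ThreadTM:
    """Toy thread-level TM: in-place updates plus a log of before images."""

    def __init__(self, memory):
        self.memory = memory        # list of words, updated in place
        self.log = []               # before images as (address, old value)
        self.read_bits = set()      # block numbers whose read bit is set
        self.write_bits = set()     # block numbers whose write bit is set

    def load(self, addr):
        self.read_bits.add(addr // BLOCK_SIZE)
        return self.memory[addr]

    def store(self, addr, value):
        self.log.append((addr, self.memory[addr]))   # log the before image
        self.write_bits.add(addr // BLOCK_SIZE)
        self.memory[addr] = value

    def conflicts(self, addr, is_write):
        """Would another thread's access to addr conflict with this one?"""
        block = addr // BLOCK_SIZE
        return block in self.write_bits or (is_write and block in self.read_bits)

    def commit(self):
        self._reset()               # easy: memory already holds the new values

    def abort(self):
        # Undo in reverse order so the oldest before image is restored last.
        for addr, old in reversed(self.log):
            self.memory[addr] = old
        self._reset()

    def _reset(self):
        self.log.clear()
        self.read_bits.clear()
        self.write_bits.clear()


memory = [0] * 8
tm = ThreadTM(memory)
tm.store(1, 5)
tm.store(1, 6)
false_conflict = tm.conflicts(2, is_write=False)   # same block, different word
tm.abort()
```

The access to word 2 is flagged only because it shares a block with word 1; that is the price of tracking bits at block granularity.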

22
Conclusions
  • Make the common case fast
  • 99% of transactions fit in hardware
  • Lots of alternatives
  • Make both commits and aborts fast
  • Handle the uncommon case
  • Large transactions will occur, deal with 'em
  • Shouldn't be limited by hardware
  • Agree on a common abstraction
  • Success requires multi-platform support
  • Let vendors compete on price-performance