Applying Thread Level Speculation to Database Transactions

Provided by: csC76
Learn more at: https://cs.login.cmu.edu
1
Applying Thread Level Speculation to Database
Transactions
  • Chris Colohan
  • (Adapted from his Thesis Defense talk)

2
Chip Multiprocessors are Here!
AMD Opteron
IBM Power 5
Intel Yonah
  • 2 cores now, soon will have 4, 8, 16, or 32
  • Multiple threads per core
  • How do we best use them?

3
Multi-Core Enhances Throughput
Database Server
Users
Cores can run concurrent transactions and improve throughput
4
Multi-Core Enhances Throughput
Database Server
Users
Can multiple cores improve transaction latency?
5
Parallelizing transactions
DBMS
SELECT cust_info FROM customer
UPDATE district WITH order_id
INSERT order_id INTO new_order
foreach(item)
    GET quantity FROM stock
    quantity--
    UPDATE stock WITH quantity
    INSERT item INTO order_line
  • Intra-query parallelism
  • Used for long-running queries (decision support)
  • Does not work for short queries
  • Short queries dominate in commercial workloads

6
Parallelizing transactions
DBMS
  • Intra-transaction parallelism
  • Each thread spans multiple queries
  • Hard to add to existing systems!
  • Need to change interface, add latches and locks,
    worry about correctness of parallel execution

7
Parallelizing transactions
DBMS
  • Intra-transaction parallelism
  • Breaks transaction into threads
  • Hard to add to existing systems!
  • Need to change interface, add latches and locks,
    worry about correctness of parallel execution

Thread Level Speculation (TLS) makes parallelization easier.
8
Thread Level Speculation (TLS)
[Figure: the same code, with accesses to p and q, run sequentially and in parallel under TLS]
9
Thread Level Speculation (TLS)
  • Use epochs
  • Detect violations
  • Restart to recover
  • Buffer state
  • Worst case: sequential
  • Best case: fully parallel

[Figure: Epoch 1 and Epoch 2 run in parallel; a conflicting access to p causes a violation and the later epoch restarts. Sequential vs. Parallel execution.]
Data dependences limit performance.
10
A Coordinated Effort
  • Transactions: TPC-C
  • DBMS: BerkeleyDB
  • Hardware: Simulated machine
11
A Coordinated Effort
  • Transaction Programmer: choose epoch boundaries
  • DBMS Programmer: remove performance bottlenecks
  • Hardware Developer: add TLS support to architecture
12
Outline
  • Introduction
  • Related work
  • Dividing transactions into epochs
  • Removing bottlenecks in the DBMS
  • Hardware support
  • Results
  • Conclusions

13
Case Study: New Order (TPC-C)
GET cust_info FROM customer
UPDATE district WITH order_id
INSERT order_id INTO new_order
foreach(item)
    GET quantity FROM stock WHERE i_id=item
    UPDATE stock WITH quantity-1 WHERE i_id=item
    INSERT item INTO order_line
  • Only dependence is the quantity field
  • Very unlikely to occur (1/100,000)

14
Case Study: New Order (TPC-C)
GET cust_info FROM customer
UPDATE district WITH order_id
INSERT order_id INTO new_order
TLS_foreach(item)
    GET quantity FROM stock WHERE i_id=item
    UPDATE stock WITH quantity-1 WHERE i_id=item
    INSERT item INTO order_line
  • Only change from the sequential version: foreach(item) becomes TLS_foreach(item)
15
Outline
  • Introduction
  • Related work
  • Dividing transactions into epochs
  • Removing bottlenecks in the DBMS
  • Hardware support
  • Results
  • Conclusions

16
Dependences in DBMS
17
Dependences in DBMS
  • Dependences serialize execution!
  • Performance tuning: profile execution, remove the bottleneck dependence, repeat

18
Buffer Pool Management
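[Figure: a CPU calls get_page(5) and then put_page(5); page 5's reference count in the buffer pool is 1 while the page is held and 0 after put_page(5)]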
19
Buffer Pool Management
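[Figure: several epochs each call get_page(5) and put_page(5) on the shared buffer pool, so all of them update page 5's reference count]
TLS ensures the first epoch gets the page first. Who cares? The order of the get_page() calls does not matter for correctness, yet the shared reference count turns them into a dependence.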
20
Buffer Pool Management
  • Escape speculation
  • Invoke operation
  • Store undo function
  • Resume speculation

[Figure: several epochs call get_page(5) and put_page(5) on the buffer pool outside of speculation; page 5's reference count ends at 0]
21
get_page() wrapper
    page_t get_page_wrapper(pageid_t id) {
        static tls_mutex mut;
        page_t ret;
        tls_escape_speculation();
        check_get_arguments(id);
        tls_acquire_mutex(mut);
        ret = get_page(id);
        tls_release_mutex(mut);
        tls_on_violation(put, ret);
        tls_resume_speculation();
        return ret;
    }

Wraps get_page()
22
get_page() wrapper
No violations while calling get_page() (tls_escape_speculation())
23
get_page() wrapper
May get bad input data from speculative thread! (check_get_arguments(id))
24
get_page() wrapper
Only one epoch per transaction at a time (tls_acquire_mutex(mut) and tls_release_mutex(mut))
25
get_page() wrapper
How to undo get_page(): tls_on_violation(put, ret)
26
get_page() wrapper
  • Isolated: undoing this operation does not cause cascading aborts
  • Undoable: there is an easy way to return the system to its initial state
  • Can also be used for cursor management and malloc()

27
Buffer Pool Management
[Figure: several epochs call get_page(5) on the buffer pool and one epoch calls put_page(5), dropping page 5's reference count to 0]
put_page() is not undoable!
28
Buffer Pool Management
[Figure: several epochs call get_page(5) on the buffer pool; put_page(5) is held back until the end of the epoch]
  • Delay put_page until end of epoch
  • Avoid dependence

29
Removing Bottleneck Dependences
  • We introduce three techniques:
    • Delay operations until non-speculative: mutex and lock acquire and release; buffer pool, memory, and cursor release; log sequence number assignment
    • Escape speculation: buffer pool, memory, and cursor allocation
    • Traditional parallelization: memory allocation, cursor pool, error checks, false sharing

30
Outline
  • Introduction
  • Related work
  • Dividing transactions into epochs
  • Removing bottlenecks in the DBMS
  • Hardware support
  • Results
  • Conclusions

31
TLS in Database Systems
  • Large epochs
    • More dependences: must tolerate them
    • More state: need bigger buffers

[Figure: epoch sizes in non-database TLS vs. TLS in database systems]
32
Feedback Loop
for() do_work()
33
Violations Feedback
[Figure: the Sequential vs. Parallel example again; the violation on p is reported back so the dependence can be found]
34
Eliminating Violations
0x0FD8 → 0xFD20
0x0FC0 → 0xFC18
35
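Tolerating Violations: Sub-epochs
[Figure: an epoch divided into sub-epochs; after a violation on q, execution rewinds only to the start of the sub-epoch containing the violated access, not to the start of the epoch]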
36
Sub-epochs
  • Started periodically by hardware
    • How many?
    • When to start?
  • Hardware implementation
    • Just like epochs
    • Use more epoch contexts
    • No need to check violations between sub-epochs within an epoch

37
Old TLS Design
[Figure: four CPUs, each with a private L1 cache, sharing one L2 cache and the rest of the memory system]
  • Buffer speculative state in the write-back L1 cache
  • Detect violations through invalidations
  • Restart by invalidating speculative lines
  • Rest of system only sees committed data
  • Problems:
    • L1 cache not large enough
    • Later epochs only get values on commit
38
New Cache Design
[Figure: four CPUs with private L1 caches, two L2 caches, and the rest of the memory system]
  • Speculative writes immediately visible to L2 (and later epochs)
  • Restart by invalidating speculative lines
  • Buffer speculative and non-speculative state for all epochs in L2
  • Detect violations at lookup time
  • Invalidation coherence between L2 caches
39
New Features
[Figure: the new cache design (four CPUs with private L1 caches, two L2 caches, rest of the memory system) with the new features marked "New!"]
  • Speculative state in L1 and L2 cache
  • Cache line replication (versions)
  • Data dependence tracking within cache
  • Speculative victim cache
40
Outline
  • Introduction
  • Related work
  • Dividing transactions into epochs
  • Removing bottlenecks in the DBMS
  • Hardware support
  • Results
  • Conclusions

41
Experimental Setup
  • Detailed simulation
    • Superscalar, out-of-order, 128-entry reorder buffer
    • Memory hierarchy modeled in detail
  • TPC-C transactions on BerkeleyDB
    • In-core database
    • Single user
    • Single warehouse
    • Measure interval of 100 transactions
    • Measuring latency, not throughput

42
Optimizing the DBMS New Order
[Chart: New Order execution time (normalized to Sequential) as DBMS bottlenecks are removed]
  • 26% improvement
  • Other CPUs not helping
  • Can't optimize much more
  • Cache misses increase
43
Optimizing the DBMS New Order
This process took me 30 days and fewer than 1,200 lines of code.
44
Other TPC-C Transactions
[Chart: normalized time for New Order, Delivery, Stock Level, Payment, and Order Status, broken down into Idle CPU, Violated, Cache Miss, and Busy]
45
Scaling
[Chart: normalized time for Seq., 2 CPUs, 4 CPUs, and 8 CPUs]
46
Scaling
[Chart: normalized time for New Order and New Order 150, each at Seq., 2 CPUs, 4 CPUs, and 8 CPUs]
47
Conclusions
  • A new form of parallelism for databases
  • Tool for attacking transaction latency
  • Intra-transaction parallelism
  • Without major changes to DBMS
  • With feasible new hardware
  • TLS can be applied to more than transactions
  • Halve transaction latency by using 4 CPUs