CSC 4250 Computer Architectures

Slides: 22
Provided by: stude6
Learn more at: http://www.cs.rpi.edu
Transcript and Presenter's Notes

1
CSC 4250 Computer Architectures
  • October 20, 2006
  • Chapter 3. Instruction-Level Parallelism and Its Dynamic Exploitation

2
One More Example on Tomasulo's Algorithm
  • L.D F0,0(R0)
  • ADD.D F0,F0,F2
  • MUL.D F0,F0,F4
  • ADD.D F0,F0,F2
  • MUL.D F0,F0,F4
  • S.D F0,0(R0)
  • ADD.D F0,F4,F2

3
IBM 360 Assembly Language
  • Only two operands. Advantage? Disadvantage?
  • Example
  • L.D F0,0(R0)
  • ADD.D F0,F2
  • MUL.D F0,F4
  • ADD.D F0,F2
  • MUL.D F0,F4
  • S.D F0,0(R0)

4
Figure 0.1
Instruction       Issue  Execute  Write Result
L.D F0,0(R0)      √
ADD.D F0,F0,F2
MUL.D F0,F0,F4
ADD.D F0,F0,F2
MUL.D F0,F0,F4
S.D F0,0(R0)
ADD.D F0,F4,F2

Name    Busy  Op     Vj        Vk        Qj      Qk      A
Load1   Yes   Load                                       0+Reg[R0]
Add1    No
Add2    No
Add3    No
Mult1   No
Mult2   No
Store1  No

      F0      F2   F4   F6   F8   F10  F12  ...  F30
Qi    Load1
5
Figure 0.2
Instruction       Issue  Execute  Write Result
L.D F0,0(R0)      √      √
ADD.D F0,F0,F2    √
MUL.D F0,F0,F4
ADD.D F0,F0,F2
MUL.D F0,F0,F4
S.D F0,0(R0)
ADD.D F0,F4,F2

Name    Busy  Op     Vj        Vk        Qj      Qk      A
Load1   Yes   Load                                       0+Reg[R0]
Add1    Yes   Add              Reg[F2]   Load1
Add2    No
Add3    No
Mult1   No
Mult2   No
Store1  No

      F0      F2   F4   F6   F8   F10  F12  ...  F30
Qi    Add1
6
Figure 0.3
Instruction       Issue  Execute  Write Result
L.D F0,0(R0)      √      √
ADD.D F0,F0,F2    √
MUL.D F0,F0,F4    √
ADD.D F0,F0,F2
MUL.D F0,F0,F4
S.D F0,0(R0)
ADD.D F0,F4,F2

Name    Busy  Op     Vj        Vk        Qj      Qk      A
Load1   Yes   Load                                       0+Reg[R0]
Add1    Yes   Add              Reg[F2]   Load1
Add2    No
Add3    No
Mult1   Yes   Mult             Reg[F4]   Add1
Mult2   No
Store1  No

      F0      F2   F4   F6   F8   F10  F12  ...  F30
Qi    Mult1
7
Figure 0.4
Instruction       Issue  Execute  Write Result
L.D F0,0(R0)      √      √
ADD.D F0,F0,F2    √
MUL.D F0,F0,F4    √
ADD.D F0,F0,F2    √
MUL.D F0,F0,F4
S.D F0,0(R0)
ADD.D F0,F4,F2

Name    Busy  Op     Vj        Vk        Qj      Qk      A
Load1   Yes   Load                                       0+Reg[R0]
Add1    Yes   Add              Reg[F2]   Load1
Add2    Yes   Add              Reg[F2]   Mult1
Add3    No
Mult1   Yes   Mult             Reg[F4]   Add1
Mult2   No
Store1  No

      F0      F2   F4   F6   F8   F10  F12  ...  F30
Qi    Add2
8
Figure 0.5
Instruction       Issue  Execute  Write Result
L.D F0,0(R0)      √      √
ADD.D F0,F0,F2    √
MUL.D F0,F0,F4    √
ADD.D F0,F0,F2    √
MUL.D F0,F0,F4    √
S.D F0,0(R0)
ADD.D F0,F4,F2

Name    Busy  Op     Vj        Vk        Qj      Qk      A
Load1   Yes   Load                                       0+Reg[R0]
Add1    Yes   Add              Reg[F2]   Load1
Add2    Yes   Add              Reg[F2]   Mult1
Add3    No
Mult1   Yes   Mult             Reg[F4]   Add1
Mult2   Yes   Mult             Reg[F4]   Add2
Store1  No

      F0      F2   F4   F6   F8   F10  F12  ...  F30
Qi    Mult2
9
Figure 0.6
Instruction       Issue  Execute  Write Result
L.D F0,0(R0)      √      √
ADD.D F0,F0,F2    √
MUL.D F0,F0,F4    √
ADD.D F0,F0,F2    √
MUL.D F0,F0,F4    √
S.D F0,0(R0)      √
ADD.D F0,F4,F2

Name    Busy  Op     Vj        Vk        Qj      Qk      A
Load1   Yes   Load                                       0+Reg[R0]
Add1    Yes   Add              Reg[F2]   Load1
Add2    Yes   Add              Reg[F2]   Mult1
Add3    No
Mult1   Yes   Mult             Reg[F4]   Add1
Mult2   Yes   Mult             Reg[F4]   Add2
Store1  Yes   Store                              Mult2   0+Reg[R0]

      F0      F2   F4   F6   F8   F10  F12  ...  F30
Qi    Mult2
10
Figure 0.7
Instruction       Issue  Execute  Write Result
L.D F0,0(R0)      √      √
ADD.D F0,F0,F2    √
MUL.D F0,F0,F4    √
ADD.D F0,F0,F2    √
MUL.D F0,F0,F4    √
S.D F0,0(R0)      √
ADD.D F0,F4,F2    √

Name    Busy  Op     Vj        Vk        Qj      Qk      A
Load1   Yes   Load                                       0+Reg[R0]
Add1    Yes   Add              Reg[F2]   Load1
Add2    Yes   Add              Reg[F2]   Mult1
Add3    Yes   Add    Reg[F4]   Reg[F2]
Mult1   Yes   Mult             Reg[F4]   Add1
Mult2   Yes   Mult             Reg[F4]   Add2
Store1  Yes   Store                              Mult2   0+Reg[R0]

      F0      F2   F4   F6   F8   F10  F12  ...  F30
Qi    Add3
11
Figure 0.8
Instruction       Issue  Execute  Write Result
L.D F0,0(R0)      √      √
ADD.D F0,F0,F2    √
MUL.D F0,F0,F4    √
ADD.D F0,F0,F2    √
MUL.D F0,F0,F4    √
S.D F0,0(R0)      √
ADD.D F0,F4,F2    √      √        √

Name    Busy  Op     Vj        Vk        Qj      Qk      A
Load1   Yes   Load                                       0+Reg[R0]
Add1    Yes   Add              Reg[F2]   Load1
Add2    Yes   Add              Reg[F2]   Mult1
Add3    No
Mult1   Yes   Mult             Reg[F4]   Add1
Mult2   Yes   Mult             Reg[F4]   Add2
Store1  Yes   Store                              Mult2   0+Reg[R0]

      F0      F2   F4   F6   F8   F10  F12  ...  F30
Qi
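
The register-status (Qi) entries traced in Figures 0.1 to 0.7 can be reproduced with a small sketch of the issue stage alone. This is only an illustration under stated assumptions, not the course's simulator: the function name `issue_all`, the tuple encoding of instructions, and the rule that a station is never freed during the trace are all hypothetical simplifications.

```python
# Hypothetical model of the issue stage only. Each issued instruction claims the
# next free reservation station of its kind, and the register status table (Qi)
# records which station will eventually write each destination register.
# Assumption: no station is released during the trace and none runs out.

PROGRAM = [
    ("L.D",   "F0", []),             # load writes F0
    ("ADD.D", "F0", ["F0", "F2"]),
    ("MUL.D", "F0", ["F0", "F4"]),
    ("ADD.D", "F0", ["F0", "F2"]),
    ("MUL.D", "F0", ["F0", "F4"]),
    ("S.D",   None, ["F0"]),         # store has no register destination
    ("ADD.D", "F0", ["F4", "F2"]),
]

STATION_KIND = {"L.D": "Load", "ADD.D": "Add", "MUL.D": "Mult", "S.D": "Store"}


def issue_all(program):
    """Return one (station, tags waited on, Qi of F0) tuple per issued instruction."""
    handed_out = {}   # station kind -> number of stations already allocated
    qi = {}           # register -> station that will produce its next value
    trace = []
    for op, dest, sources in program:
        kind = STATION_KIND[op]
        handed_out[kind] = handed_out.get(kind, 0) + 1
        station = f"{kind}{handed_out[kind]}"
        # A source with a pending producer is captured as a tag (Qj/Qk);
        # a source with no pending producer is read from the register file.
        waits_on = [qi[src] for src in sources if src in qi]
        if dest is not None:
            qi[dest] = station       # rename: later readers wait on this station
        trace.append((station, waits_on, qi.get("F0")))
    return trace


if __name__ == "__main__":
    for station, waits_on, f0_tag in issue_all(PROGRAM):
        print(f"{station:7s} waits on {waits_on or '-'}; Qi[F0] = {f0_tag}")
```

The last ADD.D reads F4 and F2, which have no pending producer, so it waits on nothing. That is why it is the only instruction able to execute and write its result in Figure 0.8 while the earlier chain is still stalled behind the load.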
12
Modified Loop-Based Example
  • Loop: L.D F0,0(R1)
  • MUL.D F0,F0,F2
  • ADD.D F0,F0,F4
  • S.D F0,0(R1)
  • DADDIU R1,R1,-8
  • BNE R1,R2,Loop

13
Figure 0.1. One active iteration of loop
Instruction       Iteration  Issue  Execute  Write Result
L.D F0,0(R1)      1          √      √
MUL.D F0,F0,F2    1          √
ADD.D F0,F0,F4    1          √
S.D F0,0(R1)      1          √
L.D F0,0(R1)      2
MUL.D F0,F0,F2    2
ADD.D F0,F0,F4    2
S.D F0,0(R1)      2

Name    Busy  Op     Vj        Vk        Qj      Qk      A
Load1   Yes   Load                                       Reg[R1]
Load2   No
Add1    Yes   Add              Reg[F4]   Mult1
Add2    No
Mult1   Yes   Mult             Reg[F2]   Load1
Mult2   No
Store1  Yes   Store                              Add1    Reg[R1]
Store2  No

      F0      F2   F4   F6   F8   F10  F12  ...  F30
Qi    Add1
14
Figure 0.2. Two active iterations of loop
Instruction       Iteration  Issue  Execute  Write Result
L.D F0,0(R1)      1          √      √
MUL.D F0,F0,F2    1          √
ADD.D F0,F0,F4    1          √
S.D F0,0(R1)      1          √
L.D F0,0(R1)      2          √      √
MUL.D F0,F0,F2    2          √
ADD.D F0,F0,F4    2          √
S.D F0,0(R1)      2          √

Name    Busy  Op     Vj        Vk        Qj      Qk      A
Load1   Yes   Load                                       Reg[R1]
Load2   Yes   Load                                       Reg[R1]-8
Add1    Yes   Add              Reg[F4]   Mult1
Add2    Yes   Add              Reg[F4]   Mult2
Mult1   Yes   Mult             Reg[F2]   Load1
Mult2   Yes   Mult             Reg[F2]   Load2
Store1  Yes   Store                              Add1    Reg[R1]
Store2  Yes   Store                              Add2    Reg[R1]-8

      F0      F2   F4   F6   F8   F10  F12  ...  F30
Qi    Add2
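
The two-iteration table above shows how renaming lets the hardware unroll the loop dynamically: each iteration gets its own stations, so iteration 2 never waits on iteration 1. A rough sketch of how those entries arise follows. The `Station` class and `issue_loop` function are hypothetical names, and the sketch assumes that F2 and F4 are already available, that the branch is predicted taken, and that the station fields follow the Name/Op/Vj/Vk/Qj/Qk/A layout of the tables (with the store's pending value tracked in Qk).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Station:
    """One reservation-station entry; unused fields stay None."""
    name: str
    op: str
    vj: Optional[str] = None
    vk: Optional[str] = None
    qj: Optional[str] = None
    qk: Optional[str] = None
    a: Optional[str] = None


def issue_loop(iterations):
    """Issue L.D / MUL.D / ADD.D / S.D for each iteration; return (stations, Qi)."""
    qi = {}         # register status: register -> producing station
    stations = {}
    for i in range(1, iterations + 1):
        # R1 drops by 8 per iteration (DADDIU R1,R1,-8).
        addr = "Reg[R1]" if i == 1 else f"Reg[R1]-{8 * (i - 1)}"

        load = Station(f"Load{i}", "Load", a=addr)
        qi["F0"] = load.name

        # MUL.D F0,F0,F2: F0 is pending (tag in Qj), F2 is ready (value in Vk).
        mult = Station(f"Mult{i}", "Mult", vk="Reg[F2]", qj=qi["F0"])
        qi["F0"] = mult.name

        # ADD.D F0,F0,F4: same pattern, now waiting on the multiply.
        add = Station(f"Add{i}", "Add", vk="Reg[F4]", qj=qi["F0"])
        qi["F0"] = add.name

        # S.D F0,0(R1): the value to store is pending, so only its tag is kept.
        store = Station(f"Store{i}", "Store", qk=qi["F0"], a=addr)

        for entry in (load, mult, add, store):
            stations[entry.name] = entry
    return stations, qi


if __name__ == "__main__":
    table, status = issue_loop(2)
    for entry in table.values():
        print(entry)
    print("Qi[F0] =", status["F0"])
```

Every tag in iteration 2 points at another iteration-2 station, which is the sense in which the two iterations can overlap.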
16
Dynamic Branch Prediction
  • Static branch prediction in Appendix A
  • Branch prediction buffer: a small memory indexed
    by the low-order bits of the branch instruction's
    address. Each entry holds one bit recording whether
    the branch was most recently taken or not
  • The buffer has no tags, so the prediction bit may
    have been set by a different branch whose address
    shares the same low-order bits
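
A minimal sketch of such a one-bit buffer is given below, mainly to make the aliasing point concrete. The class name `OneBitPredictor` and its methods are hypothetical; the 4-bit index on the word address follows the buffer figure on the next slide, and 4-byte instructions are assumed.

```python
class OneBitPredictor:
    """Untagged 1-bit branch prediction buffer (illustrative sketch only)."""

    def __init__(self, index_bits=4):
        self.mask = (1 << index_bits) - 1
        # One bit per entry; False means "predict not taken" (assumed initial state).
        self.bits = [False] * (1 << index_bits)

    def index(self, pc):
        # Drop the byte offset to get the word address, then keep the low-order bits.
        return (pc >> 2) & self.mask

    def predict(self, pc):
        return self.bits[self.index(pc)]

    def update(self, pc, taken):
        # Remember only the most recent outcome for this entry.
        self.bits[self.index(pc)] = taken


if __name__ == "__main__":
    bpb = OneBitPredictor()
    bpb.update(0x1000, True)       # a branch at 0x1000 was taken
    print(bpb.predict(0x1000))     # True: same branch, predicted taken
    print(bpb.predict(0x1040))     # True: a different branch that shares the entry
    print(bpb.predict(0x1004))     # False: maps to a different entry
```

Because no tag is stored, the buffer cannot tell the branches at 0x1000 and 0x1040 apart; the prediction is treated as a hint that may or may not belong to the branch being fetched.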

17
Figure 3.14. A Branch Prediction Buffer
  • Use the 4 low-order address bits of the branch
    (word address) to choose a row.

18
Nested Loops
  • Loop1: L.D F2,1600(R1)
  • DADDIU R2,R0,80
  • Loop2: L.D F0,1000(R2)
  • ADD.D F0,F0,F2
  • S.D F0,1000(R2)
  • DADDIU R2,R2,-8
  • BNEZ R2,Loop2
  • DADDIU R1,R1,-8
  • BNEZ R1,Loop1

19
Figure 3.7. States in 2-bit Prediction Scheme
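
To see why a second bit helps on the Nested Loops code, the sketch below counts mispredictions for the inner-loop branch (BNEZ R2,Loop2), which is taken 9 times and then not taken once on each pass, since R2 counts down from 80 in steps of 8. It models an n-bit saturating counter, a common form of the 2-bit scheme; the exact state transitions in the figure may differ. The function names, the initial counter value of 0, and the choice of 5 outer iterations are assumptions made for illustration.

```python
def mispredictions(outcomes, bits):
    """Count mispredictions of a single n-bit saturating counter on one branch."""
    top = (1 << bits) - 1
    threshold = 1 << (bits - 1)    # counter >= threshold means "predict taken"
    counter = 0                    # assumed initial state: strongly not taken
    wrong = 0
    for taken in outcomes:
        if (counter >= threshold) != taken:
            wrong += 1
        # Move toward the actual outcome, saturating at 0 and at top.
        counter = min(top, counter + 1) if taken else max(0, counter - 1)
    return wrong


def inner_branch_history(outer_iterations, inner_iterations=10):
    """Outcomes of the inner-loop branch: taken on every pass except the last."""
    one_pass = [True] * (inner_iterations - 1) + [False]
    return one_pass * outer_iterations


if __name__ == "__main__":
    history = inner_branch_history(outer_iterations=5)
    print(mispredictions(history, bits=1))   # 10: two misses per pass
    print(mispredictions(history, bits=2))   # 7: three on the first pass, then one per pass
```

With one bit, the loop exit flips the prediction, so the first branch of the next pass is also mispredicted. With two bits, a single not-taken outcome only weakens the prediction, leaving one miss per pass once the counter has warmed up.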
20
Figure 3.8. Prediction Accuracy of 4096-entry
2-bit Prediction Buffer for SPEC89 Benchmarks
21
Figure 3.9. Prediction Accuracy of 4096-entry
2-bit Prediction Buffer versus an infinite 2-bit
Prediction Buffer for SPEC89