Provided by: kmit
Title: P6 and IA64


1
P6 and IA-64
  • 8086 released in 1978
  • Pentium released in 1993
  • The 8086 line was upgraded with pipelining, superscalar
    execution, higher clock frequency, cache, and so on
  • But the 8086 design has limits, so efficiency is hard to improve
  • Intel released a new technology called P6

C. Vongchumyen 1 / 2004
Computer Organization and Assembly Language 8
2
P6
  • Pentium's L2 cache problem
  • 256 to 512 KB
  • Pentium interfaces to cache and main memory via the
    external bus
  • Pipeline stall
  • 1. Prefetch unit reads the cache
  • and 2. Execution unit reads data from main
    memory
  • Both use the same bus
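The stall described on this slide can be illustrated with a toy cycle count. This is a minimal sketch, not a model of the real Pentium bus: it assumes every transfer holds a bus for exactly one cycle, and the function name `bus_cycles` and the request counts are made up for illustration.

```python
def bus_cycles(prefetch_reads: int, data_reads: int, shared_bus: bool) -> int:
    """Cycles needed to serve both request streams (1 cycle per transfer, assumed)."""
    if shared_bus:
        # Single external bus: cache and memory transfers are serialized,
        # so one unit stalls while the other owns the bus.
        return prefetch_reads + data_reads
    # Separate cache bus and memory bus: the two streams overlap.
    return max(prefetch_reads, data_reads)


if __name__ == "__main__":
    # Illustrative request counts, not measured data.
    print("shared bus:", bus_cycles(100, 60, shared_bus=True))
    print("two buses: ", bus_cycles(100, 60, shared_bus=False))
```

With two independent buses the total time is bounded by the longer of the two streams instead of their sum, which is the motivation for the P6 cache bus.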

3
P6
  • P6 moves the L2 cache into the same package as the CPU
  • Pentium Pro and Pentium II: cache on a separate die from
    the CPU, with 512 KB cache
  • ( including Pentium III in Slot 1 )
  • Celeron integrates 128 KB cache on die
  • Pentium III ( Coppermine ) integrates 256 KB
    cache on die
  • ( 28 million transistors on the Pentium III die )
  • Xeon integrates 2 MB cache on die

4
DIB
  • DIB ( Dual Independent Bus ): FSB and BSB
  • Cache bus ( Back Side Bus ): 64 bit
  • 256 bit in Pentium III ( Coppermine )
  • BSB speed is higher than the mainboard's bus speed
  • Pentium Pro and Pentium II ( including Pentium III
    in Slot 1 )
  • BSB speed = ½ CPU speed
  • Celeron and Pentium III ( Coppermine )
  • BSB speed = CPU speed
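To make the width and speed figures above concrete, the following sketch estimates peak back-side-bus bandwidth. It assumes one transfer per BSB clock, which the slide does not state; the function name and the 450 MHz and 1130 MHz core clocks are illustrative choices.

```python
def peak_bandwidth_mb_s(bus_width_bits: int, cpu_mhz: float, bsb_ratio: float) -> float:
    """Peak BSB bandwidth in MB/s, assuming one transfer per BSB clock."""
    bytes_per_transfer = bus_width_bits / 8
    bsb_mhz = cpu_mhz * bsb_ratio  # BSB clock derived from the core clock
    return bytes_per_transfer * bsb_mhz


if __name__ == "__main__":
    # 64-bit BSB at half core speed (Slot 1 style), core clock assumed 450 MHz.
    print(peak_bandwidth_mb_s(64, 450, 0.5))
    # 256-bit BSB at full core speed (Coppermine style), core clock assumed 1130 MHz.
    print(peak_bandwidth_mb_s(256, 1130, 1.0))
```

Under these assumptions, both the wider bus and the full-speed clock multiply the available cache bandwidth.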

5
BSB and FSB
  • A 128 KB cache has a hit rate > 90%
  • BSB gives the CPU a dedicated cache bus,
  • so the CPU clock is independent of the mainboard clock
  • ( September 2000 ) CPU speed is 1.13 GHz,
  • but the mainboard clock is 133 MHz
  • FSB ( Front Side Bus ): bus on the mainboard
  • interfaces the CPU with I/O and main memory
  • FSB speed: 66, 100, and 133 MHz
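Two quick calculations follow from the numbers on this slide: the ratio between core clock and mainboard clock, and the average access time implied by a given hit rate. This is a rough sketch; the 5 ns cache and 60 ns memory latencies are assumptions chosen only to show the arithmetic, and the function names are hypothetical.

```python
def core_multiplier(core_mhz: float, fsb_mhz: float) -> float:
    """Ratio of CPU core clock to front-side-bus clock."""
    return core_mhz / fsb_mhz


def average_access_time(hit_rate: float, cache_ns: float, memory_ns: float) -> float:
    """Weighted average of cache and main-memory latency."""
    return hit_rate * cache_ns + (1.0 - hit_rate) * memory_ns


if __name__ == "__main__":
    # 1.13 GHz core on a 133 MHz mainboard clock (figures from the slide).
    print(round(core_multiplier(1130, 133), 1))
    # 90% hit rate from the slide; latencies are assumed values.
    print(round(average_access_time(0.90, cache_ns=5, memory_ns=60), 2))
```

The second function shows why a high hit rate matters: most accesses are served at cache latency, so the slow mainboard bus is used only for the remaining misses.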

6
P6 Architecture
  • Separate cache
  • L1 to L2 via BSB
  • L1 to Mem via FSB
  • L1 cache 32 KB
  • - Instruction cache 16 KB
  • - Data cache 16 KB

7
P6 Architecture
  • Dynamic Execution Microarchitecture
  • Fetch / Decode Unit
  • Dispatch / Execute Unit
  • Retire Unit
  • Instruction Pool
  • Dynamic Execution
  • Multiple Branch Prediction
  • Dynamic Data Flow Analysis
  • Speculative Execution

8
P6 Architecture
  • Multiple Branch Prediction
  • Concept from mainframes
  • Uses multiple pipelines for call or return
    instructions
  • The Fetch/Decode unit is used to find branch instructions
  • Dynamic Data Flow Analysis
  • Analyzes and searches for instructions that can run out of order
  • The Dispatch/Execute unit scans and sorts instructions
    to
  • maximize usage of the execution units
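The data flow analysis bullets can be sketched as a small scheduler that issues any instruction whose source operands are ready, regardless of program order. This is a simplified illustration under stated assumptions: every result is available one cycle after issue, each destination register is written once (so no renaming is needed), and the names `Instr` and `schedule` are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    name: str
    dest: str
    srcs: Tuple[str, ...]


def schedule(program: List[Instr], ready_regs: List[str], width: int = 2) -> List[List[str]]:
    """Return, per cycle, the names of the instructions issued in that cycle."""
    ready = set(ready_regs)
    pending = list(program)
    cycles = []
    while pending:
        # Scan the pool in program order and pick up to `width` ready instructions.
        issued = [i for i in pending if all(s in ready for s in i.srcs)][:width]
        if not issued:
            raise ValueError("no instruction can issue: unsatisfied dependency")
        for i in issued:
            pending.remove(i)
        # Results become visible to later cycles only.
        ready.update(i.dest for i in issued)
        cycles.append([i.name for i in issued])
    return cycles


if __name__ == "__main__":
    program = [
        Instr("i1", "r1", ("a",)),       # r1 = load a
        Instr("i2", "r2", ("r1",)),      # r2 = r1 + 1   (depends on i1)
        Instr("i3", "r3", ("b",)),       # r3 = load b   (independent)
        Instr("i4", "r4", ("r3", "r2")), # r4 = r3 + r2
    ]
    print(schedule(program, ready_regs=["a", "b"]))
```

In this example `i3` issues ahead of `i2` because its operand is already available, so both execution slots are used in the first cycle.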

9
P6 Architecture
  • Speculative Execution
  • The Dispatch/Execute unit is used to analyze instructions
  • Executes instructions ahead of time and sends them to the
    instruction pool
  • Keeps results in temporary registers
  • The Retire unit finds instructions that were executed
  • out of order ( no branch ), commits and
    confirms their results in
  • registers, then deletes them from the pool
  • These 3 techniques make P6 a non-sequential CPU
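The retire step can be sketched as a pool that commits results strictly from its head, even when instructions finish in a different order. This is a minimal illustration, assuming no mispredicted branches need to be squashed; the function name `retire_batches` is hypothetical.

```python
from collections import deque
from typing import List


def retire_batches(program_order: List[str], completion_order: List[str]) -> List[List[str]]:
    """For each completed instruction, list what the retire unit can commit."""
    pool = deque(program_order)  # instruction pool, oldest instruction at the head
    done = set()                 # executed, result held in a temporary register
    batches = []
    for name in completion_order:
        done.add(name)
        batch = []
        # Commit only from the head, so architectural state changes in program order.
        while pool and pool[0] in done:
            batch.append(pool.popleft())
        batches.append(batch)
    return batches


if __name__ == "__main__":
    # i3 finishes first but must wait for i1 and i2 before it can retire.
    print(retire_batches(["i1", "i2", "i3", "i4"], ["i3", "i1", "i2", "i4"]))
```

Execution is out of order, but the committed register state always advances in program order, which keeps the CPU behaving like a sequential machine from the software's point of view.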

10
Pentium Pro
11
P6 Architecture
  • P6 is the next evolution of Intel's CPUs
  • No more 80x86 core
  • The P6 core is RISC
  • All instructions are redesigned on a RISC core
  • Backward compatible by mapping 80x86 instructions to RISC
    commands
  • Improved branch prediction
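The mapping from 80x86 instructions to RISC commands can be pictured with a toy decoder that splits a memory-operand instruction into load, operate, and store steps. This is only a sketch of the idea: the micro-op names, the `tmp0`/`tmp1` temporaries, and the `decode` function are invented for illustration and do not reflect Intel's actual micro-op encoding.

```python
from typing import List


def decode(instr: str) -> List[str]:
    """Split a two-operand x86-style instruction into simple RISC-like steps."""
    op, operands = instr.split(maxsplit=1)
    dst, src = [s.strip() for s in operands.split(",")]
    uops = []
    if src.startswith("["):
        # Memory source: load it into a temporary register first.
        uops.append(f"load tmp1, {src}")
        src = "tmp1"
    if dst.startswith("["):
        # Memory destination: load, operate on a temporary, then store back.
        uops.append(f"load tmp0, {dst}")
        uops.append(f"{op} tmp0, tmp0, {src}")
        uops.append(f"store {dst}, tmp0")
    else:
        uops.append(f"{op} {dst}, {dst}, {src}")
    return uops


if __name__ == "__main__":
    print(decode("add eax, ebx"))
    print(decode("add [mem], eax"))
```

A register-to-register instruction maps to a single step, while a read-modify-write on memory expands to three, which is why the core can schedule the pieces independently.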

12
P6
  • Pentium Pro: first P6 architecture
  • Short life cycle, only a few series of Pentium Pro
  • Speed: 150, 166, 180, and 200 MHz
  • L1 cache ( 8 + 8 KB )
  • L2 cache: 256 and 512 KB on the same package
  • L2 cache: 1 MB at 200 MHz

Pentium Pro
Pentium Pro 1 MB L2 Cache
13
Pentium II
  • Pentium II = Pentium Pro + MMX
  • Speed: 233, 266, 300, and 333 MHz
  • Package: S.E.C.C ( Slot 1 )
  • FSB 66 MHz
  • L1 cache 16 + 16 KB, L2 cache 512 KB
  • FSB 100 MHz, speed 350, 400, 450 MHz
  • L2 cache 2 MB: named Pentium II Xeon ( cache
    speed = CPU speed )
  • Package: S.E.C.C 2 ( Slot 2 )

14
Celeron
  • Celeron = Pentium II but with lower throughput (
    same core )
  • Speed: 266, 300 MHz
  • No L2 cache
  • L1 cache 16 + 16 KB
  • FSB 66 MHz
  • L2 cache 128 KB ( cache speed = CPU speed )
  • Speed: 300A, 333, 366, 400, 433, 466, and 500 MHz
  • FSB 66 MHz

15
Celeron
  • Package PPGA ( Plastic Pin Grid Array ) 370 Pin
  • Package FC-PGA ( SSE )
  • Change to 0.18 micron
  • Core 1.5 VDC

16
SSE
  • 3D speed upgraded by adding new instructions
  • Streaming SIMD Extensions ( SSE )
  • Can bypass the L2 cache
  • Processor Serial Number
  • Pentium III
  • L1 cache 16 + 16 KB
  • L2 cache 512 KB
  • ( Coppermine cache 256 KB )

17
Pentium 4
  • In the P6 architecture
  • Speed upgraded from 150 MHz to 1.13 GHz
  • Process technology changed from 0.5 to 0.25 and 0.13 micron
  • VCC from 3.3 to 2.2 and 1.5 V
  • Pentium 4
  • Same core as Pentium III
  • But many things have changed

18
Pentium 4
  • 133 MHz bus to 200 MHz and 400 MHz DDR ( Double
    Data Rate )
  • Double clock speed in integer ALU ( < 1 clock /
    instruction )
  • Adds Execution Trace Cache ( keeps translated
    micro-ops )
  • Upgraded pipeline and branch prediction from P6
  • SSE Extension 2 ( 144 new instructions )
  • Floating point 128 bit
  • Dynamic Execution enlarges the instruction pool
  • from holding 40 micro-ops to 100 micro-ops
  • Execution Trace Cache and Dynamic Execution
  • All loops run inside the instruction pool

19
AMD K5
20
AMD K5
  • 5-stage pipeline
  • Superscalar technique
  • Branch prediction
  • Dynamic execution
  • Architecture is the same as Pentium
  • But the Pentium pipeline is better

21
K6-III
P6 architecture is better than K6-III
22
K7 ( Athlon )
23
Crusoe
  • Intel and AMD structure
  • RISC core with 80x86 shell
  • Mapping 80x86 instructions to RISC core
    instructions
  • Crusoe
  • CPU from Transmeta
  • Uses software to help the hardware work
  • Translates instructions in software ( Code
    Morphing )

24
Crusoe
25
Crusoe
  • Software Code Morphing
  • 128-bit VLIW ( Very Long Instruction Word ), 4
    instructions
  • 4 execution units
  • Integer, floating point, load/store, and branch
  • Crusoe TM5400
  • 64 registers
  • Instruction cache 64 KB
  • Data cache 64 KB
  • L2 cache 256 KB
  • Speed 266 to 533 MHz
  • Low power consumption: 1/3 of Pentium III
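The idea of translating 80x86 code in software and reusing the result can be sketched as a translation cache. This is an illustrative toy, not Transmeta's actual Code Morphing software: the `translate` function simply groups up to 4 operations per long instruction word and ignores dependencies, and all names here are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Word = Tuple[str, ...]  # one long instruction word holding up to 4 operations


def translate(x86_block: Sequence[str]) -> List[Word]:
    """Group operations 4 at a time (dependency checks omitted for brevity)."""
    ops = tuple(x86_block)
    return [ops[k:k + 4] for k in range(0, len(ops), 4)]


class TranslationCache:
    """Translate each code block once, then reuse the stored translation."""

    def __init__(self, translator: Callable[[Sequence[str]], List[Word]]):
        self._translator = translator
        self._cache: Dict[int, List[Word]] = {}
        self.translations = 0  # how many times the translator actually ran

    def fetch(self, address: int, x86_block: Sequence[str]) -> List[Word]:
        if address not in self._cache:
            self._cache[address] = self._translator(x86_block)
            self.translations += 1
        return self._cache[address]


if __name__ == "__main__":
    cache = TranslationCache(translate)
    block = ["mov", "add", "cmp", "jne", "sub"]
    print(cache.fetch(0x1000, block))
    cache.fetch(0x1000, block)  # second visit hits the cache
    print("translations performed:", cache.translations)
```

The cost of translation is paid once per block, so frequently executed loops run from already translated long instruction words.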

26
Crusoe
27
CPU Compare
28
IA-64 ( Intel and HP )
  • CISC and RISC processors
  • RISC core with CISC instruction set
  • RISC processors: PowerPC, Alpha, SPARC, MIPS
  • CPU problems and their solutions
  • Jump: branch prediction
  • Read memory: cache and prefetch queue
  • > 1 instruction/clock: superscalar

29
Merced ( Itanium )
  • EPIC (Explicitly Parallel Instruction Computing)
  • 128 general registers
  • 128 floating-point registers
  • Parallel processing units
  • VLIW ( Very Long Instruction Word ) 128 bit ( 41
    x 3 + 5 )
  • Compiler optimization
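The 128-bit figure above is three 41-bit instruction slots plus a 5-bit template. The sketch below packs and unpacks such a bundle as a Python integer. Placing the template in the lowest 5 bits is an assumption made for this illustration, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

SLOT_BITS = 41
TEMPLATE_BITS = 5
SLOT_MASK = (1 << SLOT_BITS) - 1
TEMPLATE_MASK = (1 << TEMPLATE_BITS) - 1


def pack_bundle(template: int, slots: Sequence[int]) -> int:
    """Combine a 5-bit template and three 41-bit slots into one 128-bit value."""
    if len(slots) != 3:
        raise ValueError("a bundle holds exactly 3 instruction slots")
    if not 0 <= template <= TEMPLATE_MASK:
        raise ValueError("template must fit in 5 bits")
    bundle = template
    for n, slot in enumerate(slots):
        if not 0 <= slot <= SLOT_MASK:
            raise ValueError("each slot must fit in 41 bits")
        bundle |= slot << (TEMPLATE_BITS + n * SLOT_BITS)
    return bundle


def unpack_bundle(bundle: int) -> Tuple[int, List[int]]:
    """Split a 128-bit bundle back into its template and three slots."""
    template = bundle & TEMPLATE_MASK
    slots = [(bundle >> (TEMPLATE_BITS + n * SLOT_BITS)) & SLOT_MASK for n in range(3)]
    return template, slots


if __name__ == "__main__":
    b = pack_bundle(0x10, [1, 2, SLOT_MASK])
    print(b.bit_length() <= 128, unpack_bundle(b) == (0x10, [1, 2, SLOT_MASK]))
```

The template tells the hardware which execution unit type each slot targets, so the grouping decision is made by the compiler rather than at run time.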

30
Branch Removal
if (a) b = c + d
if (e) h = i - j

cmp.ne p1,p2 = a,r0   // p1 <- a != 0
cmp.ne p3,p4 = e,r0   // p3 <- e != 0
(p1) add b = c,d      // If a != 0 then add
(p3) sub h = i,j      // If e != 0 then sub
  • Predicate Register ( 64 )
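The branch removal on this slide can be emulated in Python to show that the predicated form produces the same result as the two `if` statements. This is a sketch of the concept only: Python has no predicate registers, so the predicates are modeled as 0/1 integers that select between the new and old value, and the function names are hypothetical.

```python
from itertools import product
from typing import Tuple


def with_branches(a: int, e: int, c: int, d: int, i: int, j: int,
                  b: int = 0, h: int = 0) -> Tuple[int, int]:
    """Original form: two conditional branches."""
    if a:
        b = c + d
    if e:
        h = i - j
    return b, h


def with_predicates(a: int, e: int, c: int, d: int, i: int, j: int,
                    b: int = 0, h: int = 0) -> Tuple[int, int]:
    """Predicated form: both operations are computed, predicates gate the write."""
    p1 = int(a != 0)  # cmp.ne p1,p2 = a,r0
    p3 = int(e != 0)  # cmp.ne p3,p4 = e,r0
    b = p1 * (c + d) + (1 - p1) * b  # (p1) add b = c,d
    h = p3 * (i - j) + (1 - p3) * h  # (p3) sub h = i,j
    return b, h


if __name__ == "__main__":
    # Compare both forms for every combination of the two conditions.
    same = all(
        with_branches(a, e, 3, 4, 9, 2, b=7, h=8) == with_predicates(a, e, 3, 4, 9, 2, b=7, h=8)
        for a, e in product([0, 1, 5], repeat=2)
    )
    print("results match:", same)
```

Because no jump is taken, there is nothing for the branch predictor to mispredict, and the two predicated operations are independent and can be issued in parallel.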

31
IA-64 Technique
  • 64-bit processor, improved from the P6 architecture
  • VLIW
  • Compiler optimization
  • Speculation feature ( reduces memory access time )
  • 6 GFLOPS FPU
  • 128 + 128 registers
  • Supported by many software providers
  • IA-32 compatible ( Virtual 8086 Mode )
  • IA-32 to IA-64 by hardware translation mechanism

32
Itanium
  • March 2000
  • 800 MHz
  • 20 instructions / clock
  • 3-level cache, 4 MB
  • 320 million transistors
  • 25 million for the CPU
  • 295 million for the L3 cache

33
Itanium