Intel's P6 Processor Family Architecture - PowerPoint PPT Presentation

About This Presentation
Slides: 33
Provided by: geor208
Learn more at: http://cgi.di.uoa.gr
Transcript and Presenter's Notes

Title: Intel's P6 Processor Family Architecture


1
Intel's P6 Processor Family Architecture
?a?a??t??? Georgios, student ID (AM) 438, Postgraduate Program in
Informatics, 3rd Track, Advanced Computer Architectures,
Winter Semester 2001
2
Intel IA-32 Processor Families
  • P5
    • Pentium
    • Pentium MMX
  • P6
    • Pentium Pro
    • Pentium II
    • Celeron
    • Pentium III
  • NetBurst
    • Xeon
    • Pentium 4

3
Members of the P6 Family
  • Pentium Pro
    • 133-200MHz
    • 66MHz bus
    • 0.5µ
    • 256KB, 512KB, or 1MB L2 cache
    • 8+8KB L1 cache
  • Pentium II
    • The "consumer" version of the PPro
    • External 512KB L2 cache; Slot 1 vs. Socket
    • 233-300MHz (Klamath), up to 450MHz (Deschutes)
    • 66-100MHz bus
    • 0.25µ
    • Better 16-bit code performance
    • 16KB L1 cache
  • Celeron
    • Economy version of the PII
    • 66-100MHz bus
    • 128-256KB L2 cache
    • 266MHz-1.2GHz
    • 0.25µ-0.18µ-0.13µ
    • SIMD
    • 32KB L1 cache
  • Pentium III
    • 450-550MHz (Katmai), up to 1.33GHz (Coppermine)
    • 133MHz bus
    • 0.25µ-0.18µ-0.13µ
    • SIMD extensions
    • 256KB on-chip cache
    • Single Edge Contact Cartridge v2

4
Competition (when the P6 series appeared)
  • CPUs
    • AMD's K5
    • NexGen's Nx586
    • Cyrix's M1
  • Advantages
    • Better pipelining
    • 0.5µ chip technology initially, then 0.35µ
    • 133MHz CPU clock
    • On-chip 256KB L2 cache
  • Features that also appear in competitors
    • Micro-ops
    • Dynamic Execution (out of order)
    • Branch Prediction

5
P6 in Brief
6
What Sets the P6 Architecture Apart
  • Bus / memory bottleneck
    • Achieves a performance improvement with existing,
      inexpensive memory technologies.
    • Minimizes use of the system bus. The P6 uses less than
      25% of the bus bandwidth, leaving room for more CPUs
      and/or I/O.
  • Serial instruction execution bottleneck
    • Dynamic Execution: out-of-order execution, data flow
      analysis, branch prediction, speculative execution.
    • Deeper pipelining
  • Manufacturing improvements
    • Degree of miniaturization.
    • Higher operating frequency, first external and later
      internal as well.

7
Level 3 Super scalar engine.
  • Shrinks each pipeline stage and increases the number of
    stages
  • 3 main execution units
    • Fetch/Decode Unit
      • Fills the instruction pool with instructions, based
        on the instruction pointer. On a branch, it tries to
        predict the target.
    • Dispatch/Execute Unit
      • Executes from the instruction pool whichever
        instruction has its operands available. Dataflow
        analysis determines the order in which the program
        will execute fastest.
      • The results are not yet committed (speculative
        execution).
    • Retirement Unit
      • Scans the instruction pool for instructions that must
        be committed and sends the changes to memory.
  • Throughput
    • Fetch/Decode: up to 3 instructions per cycle
    • Dispatch/Execute: up to 5 instructions per cycle (3
      typical)
    • Retirement: up to 3 instructions per cycle
8
Non Blocking Caches
  • Dual Ported Caches
  • L1 Performance
    • One LOAD and one STORE simultaneously in every cycle.
  • Non-Blocking - 4 Stage Pipelined L2 Cache
    • A cache miss does not stall the CPU.
    • 4 requests are serviced at a time, and a further 12 can
      wait in a queue.
  • Transaction Bus
    • Transaction support. 4 requests can be outstanding for
      a single P6, and 8 for the bus as a whole.
    • No cycles are lost waiting.
  • Out Of Order Bus
    • In an MP environment the bus management protocol, with
      its Retry and Defer mechanisms, responds out of order.
  • 64GB cacheable memory

9
MP Ready
  • Communication with the L2 cache over a private bus
    • Frees up the system bus; simpler support hardware
  • MESI
  • Snooping memory
    • Keeps the state of the cache valid.
  • 4 CPUs for full bus utilization
  • Integrated APIC
    • Simpler support hardware

10
P6 Miscellaneous
  • Dual Independent Bus Architecture (DIB): the bus used for
    internal operation is independent of the bus used to
    communicate with the system, and the two run at entirely
    different frequencies.
  • L1 Code Cache: 4-way set associative, organized in 32-byte
    lines; it tracks the Next_IP requests.
  • L1 Data Cache: 2-way set associative, organized in 32-byte
    lines; it tracks the [?e???t??e?] addresses of the
    instruction execution unit.
  • Registers: 40 physical 64-bit registers for use in
    renaming. 8 logical data registers, visible to the
    programmer. 8 FP registers and 8 MMX registers, which are
    aliased onto the FP registers. Six 16-bit segment
    registers and one flags register.
  • Performance Counters: internal registers that keep
    statistics.
11
P6 Functional Units Diagram
  • Key points
  • In order Section
  • Instruction Cache
  • Instruction Fetch / Decode Units
  • Data Cache
  • Out of Order Section
  • Execution Engine
  • Micro-ops (load / store)
  • Reservation Stations

12
Pipeline Stages
Branch check, Next ID change
Commit
Instruction execution
Fetch, rotate, and mark 16 bytes
Marking and dispatch to the instruction pool
IP update
Decoding
Register renaming
Decide on and retrieve from the instruction pool the instructions to execute
13
Instruction Fetching
  • Instruction Prefetcher
  • Next IP determines the next memory location from which an
    instruction is requested.
  • The instruction cache answers the Next IP request with 16
    aligned bytes.
  • 3 instruction decoders receive the 16 bytes after they
    have been rotated and the instructions have been marked
    (start/end).
  • 128bit bus

14
Instruction Decoders
  • Decoders
  • One general decoder: one macro-op per cycle.
  • Two simple decoders: one micro-op per cycle each. If a
    simple decoder meets a macro-op, it stalls for one cycle
    and the instruction is passed to the general decoder.
  • Micro-ops vs. macro-ops
  • Simple instructions of the register-register form
    are only one micro-op.
  • Load instructions are only one micro-op.
  • Store instructions have two micro-ops.
  • Simple read-modify instructions are two
    micro-ops.
  • Simple instructions of the register-memory form
    have two to three micro-ops.
  • Simple read-modify-write instructions are four
    micro-ops.
  • Complex instructions generally have more than
    four micro-ops, therefore they take multiple
    cycles to decode.
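
The decoder rules above can be illustrated with a small model. This is a minimal sketch under simplifying assumptions, not Intel's actual decode logic: each cycle the general decoder takes the next instruction, and each of the two simple decoders takes a following instruction only if it is a single micro-op. The function name `decode_cycles` is hypothetical, the one-cycle stall is folded into "the instruction waits for the general decoder", and microcoded instructions (more than four micro-ops) are left out.

```python
def decode_cycles(uop_counts):
    """Cycles needed to decode a sequence of instructions.

    uop_counts lists the number of micro-ops of each instruction,
    in program order (hypothetical input format).
    """
    cycles = 0
    i = 0
    n = len(uop_counts)
    while i < n:
        if not 1 <= uop_counts[i] <= 4:
            raise ValueError("this sketch only models 1 to 4 micro-ops")
        i += 1  # the general decoder takes the next instruction
        for _ in range(2):  # the two simple decoders
            if i < n and uop_counts[i] == 1:
                i += 1  # single micro-op: a simple decoder can take it
            else:
                break  # multi-micro-op instruction waits for the general decoder
        cycles += 1
    return cycles


# A store (2 micro-ops) followed by two register-register operations
# fits in one cycle; putting the store second costs an extra cycle.
print(decode_cycles([2, 1, 1]))
print(decode_cycles([1, 2, 1]))
```

In this model, ordering instructions so that the multi-micro-op one comes first in each group of three keeps all three decoders busy.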

15
Instruction Decoding (µ-ops)
  • The decoder translates each IA-32 instruction into
    triadic µ-ops
  • 2 logical sources, 1 logical destination
  • Usually: 1 IA-32 instruction -> 1 µ-op
  • More rarely: 1 IA-32 instruction -> 4 µ-ops
  • Very rarely: Microcode Instruction Sequencer.
  • Microcode: a set of pre-programmed sequences of simple
    µ-ops.

16
Instruction Decoding (RAT)
  • The µ-ops are sent to the RAT (Register Alias Table) in
    order
  • Register renaming maps the logical IA-32 registers onto
    the physical P6 registers
  • The µ-ops are tagged with status bits (allocator stage)
  • The result is forwarded to the instruction waiting area
    (instruction pool)
  • The instruction pool, or ReOrder Buffer (ROB), is
    implemented as a content-addressable memory of 40
    entries
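
Register renaming as described above can be sketched in a few lines. This is an illustrative model only: the class name `RegisterAliasTable` and its methods are hypothetical, the 40 physical registers come from the slides, and the sketch never returns registers to the free list (real hardware frees them at retirement).

```python
class RegisterAliasTable:
    """Maps logical register names to physical register numbers."""

    def __init__(self, logical_regs, num_physical=40):
        self.free = list(range(num_physical))
        # Give every logical register an initial physical home.
        self.alias = {reg: self.free.pop(0) for reg in logical_regs}

    def rename(self, dst, src1, src2):
        """Rename one triadic micro-op: dst <- op(src1, src2)."""
        # Reads use the current mapping of each source register.
        p1, p2 = self.alias[src1], self.alias[src2]
        if not self.free:
            raise RuntimeError("no free physical register (sketch never frees)")
        # Every write gets a fresh physical register.
        pd = self.free.pop(0)
        self.alias[dst] = pd
        return pd, p1, p2


rat = RegisterAliasTable(["eax", "ebx", "ecx"])
print(rat.rename("eax", "eax", "ebx"))  # eax is read, then remapped
print(rat.rename("ecx", "eax", "ebx"))  # this read sees the new eax
```

Because the second write to a logical register lands in a different physical register, two micro-ops that merely reuse a register name no longer have to wait for each other.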

17
Instruction Dispatch/Execution
  • Instructions are selected from the instruction pool
    according to their status.
  • A check is made that the operands are available.
  • A check is made that the required execution unit is
    available.
  • The RS removes the instruction from the ROB and forwards
    it to the unit. The order in the ROB does not matter.
  • 5-ported RS (as in the diagram)
  • max 5 µ-ops per cycle, 3 sustained
  • Data flow analysis (pseudo-FIFO)
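
The selection rule above (operands ready, unit free, roughly oldest first) can be sketched as follows. This is a simplified model, not the real reservation station: `Uop`, `dispatch_one_cycle`, and the one-µ-op-per-port-per-cycle rule are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Uop:
    name: str
    port: int            # execution port the micro-op needs (0..4)
    sources: frozenset   # names of the micro-ops whose results it reads


def dispatch_one_cycle(pool, completed, num_ports=5):
    """Pick the micro-ops to dispatch this cycle.

    pool is a list in program order; completed is the set of names
    whose results are already available. Scanning oldest-first gives
    the pseudo-FIFO behaviour: age only breaks ties among ready ops.
    """
    busy_ports = set()
    issued = []
    for uop in pool:
        if not 0 <= uop.port < num_ports:
            raise ValueError("unknown port")
        if uop.port in busy_ports:
            continue  # required unit already taken this cycle
        if uop.sources <= completed:
            busy_ports.add(uop.port)
            issued.append(uop)
    for uop in issued:
        pool.remove(uop)
    return issued


pool = [
    Uop("a", 0, frozenset()),
    Uop("b", 0, frozenset({"a"})),  # waits for a
    Uop("c", 1, frozenset()),
    Uop("d", 0, frozenset()),       # ready, but port 0 goes to a first
]
print([u.name for u in dispatch_one_cycle(pool, set())])
print([u.name for u in dispatch_one_cycle(pool, {"a", "c"})])
```

Note that `c` is dispatched ahead of the older `b`, which is the out-of-order behaviour the slide describes.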

18
Instruction Dispatch/Execution (Branches)
  • Some of the µ-ops are branches.
  • They are tagged as they are inserted, in order, with the
    jump address and the fall-through address.
  • When the branch is actually executed, the outcome is
    compared with what was assumed.
  • On success, the branch is validated (retirement), along
    with all the instructions between it and the next branch
    (speculative execution).
  • The BTB predicts most branches, but not all. It holds 512
    entries of previous branches and their targets.
  • On failure, the Jump Execution Unit (JEU) removes all the
    instructions from the ROB, and the pipeline continues
    from the new, correct address.
  • Cost
    • Not taken on hit: no penalty
    • Taken on hit: one-cycle delay (fetch and issue pause)
    • Mis-predicted: minimum cost of 9 cycles (the length of
      the in-order issue pipeline, one IF cycle, and the time
      to validate the correct branch). Typically 10, at most
      26 cycles.
    • Static predictions, conditional and unconditional,
      on hit: 5-6 cycles.

19
Branch Prediction (Dynamic)
  • 4 branch predictions per 128-bit line.
  • 2-level adaptive (Yeh method).
  • 4 bits of information per branch. Predicts sequences of
    branches.
  • Return Stack Buffer (RET instructions)
  • The BTB is checked for a previous execution of the
    branch.
  • If there is no BTB entry, static prediction is used and
    the BTB is then updated.
  • Addition of the CMOV instruction.
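
A two-level adaptive predictor of the kind named above can be sketched as follows. The 4 bits of history per branch come from the slides; everything else (the class name `TwoLevelPredictor`, 2-bit saturating counters, dictionaries instead of a fixed 512-entry table, counters starting at "weakly not taken") is an assumption for illustration and not a description of the actual P6 BTB.

```python
class TwoLevelPredictor:
    """Per-branch history selects a 2-bit saturating counter."""

    def __init__(self, history_bits=4):
        self.mask = (1 << history_bits) - 1
        self.history = {}   # branch address -> last outcomes as a bit string
        self.counters = {}  # (address, history) -> counter in 0..3

    def predict(self, addr):
        h = self.history.get(addr, 0)
        # Counter values 2 and 3 mean "predict taken".
        return self.counters.get((addr, h), 1) >= 2

    def update(self, addr, taken):
        h = self.history.get(addr, 0)
        c = self.counters.get((addr, h), 1)
        c = min(3, c + 1) if taken else max(0, c - 1)
        self.counters[(addr, h)] = c
        # Shift the real outcome into the branch's history register.
        self.history[addr] = ((h << 1) | int(taken)) & self.mask


predictor = TwoLevelPredictor()
hits = []
for taken in [True, False] * 20:  # a strictly alternating branch
    hits.append(predictor.predict(0x40) == taken)
    predictor.update(0x40, taken)
print(sum(hits[-10:]), "of the last 10 predictions correct")
```

A plain 2-bit counter cannot learn an alternating branch, while the history-indexed counters learn it after a short warm-up, which is what "predicts sequences of branches" refers to.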

20
Branch Prediction (Static)
21
Retire Unit
  • Checks the status of the µ-ops in the ROB.
  • Those that have completed must be removed from the ROB.
  • It must take care to preserve their order as IA-32
    instructions correctly, even amid interrupts, traps,
    faults, breakpoints, and mis-predictions.
  • As it decides which instructions to validate next, it
    forwards them, in order, to the Retirement Register
    File.
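
In-order retirement can be sketched as a queue that only ever releases entries from its head. This is a minimal model: the function name `retire`, the dictionary layout of a ROB entry, and the use of the 3-per-cycle limit from the earlier throughput slide are illustrative assumptions.

```python
from collections import deque


def retire(rob, max_per_cycle=3):
    """Retire completed micro-ops from the head of the ROB, in order."""
    retired = []
    # Stop at the first unfinished entry: younger completed micro-ops
    # must wait, which keeps the architectural state in program order.
    while rob and len(retired) < max_per_cycle and rob[0]["done"]:
        retired.append(rob.popleft()["name"])
    return retired


rob = deque([
    {"name": "a", "done": True},
    {"name": "b", "done": False},  # still executing
    {"name": "c", "done": True},   # finished early, but behind b
])
print(retire(rob))
rob[0]["done"] = True              # b completes
print(retire(rob))
```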

22
Bus Interface Unit
  • Two kinds of instructions: loads (1 µ-op: address, width,
    register) and stores (2 µ-ops, one producing the address
    and one the data)
  • Stores are never executed speculatively, because there is
    no way to undo them.
  • Stores are never reordered among themselves.
  • A store executes only when both of its µ-ops are ready and
    no earlier stores are pending.
  • The MOB acts as something like an RS and reorder buffer
    for loads and stores. It lets loads pass other loads and
    stores, and re-issues them once any blocking conditions
    (resource dependencies) are lifted.

23
Execution Details
  • A load and a store to the same address can execute in the
    same cycle.
  • Store latency matters little for performance (3-5%
    according to Intel).
  • Register renaming: every read of a logical register
    refers to the same physical register. Every subsequent
    write refers to a new physical register.
  • An FMUL cannot execute in the cycle immediately after the
    previous FMUL.
  • FPU stages, as in the P5, i.e.:
    • Conversion of the operands to the internal format
    • Execution of the operation at a higher level of
      precision
    • Rounding and conversion of the operands to the
      standard format
    • Error reporting

24
Execution Modes
  • Protected mode. The native state of the
    processor. All instructions and architectural
    features are available, providing the highest
    performance and capability. Recommended mode for
    all new applications and operating systems. Also
    offers the ability to directly execute
    real-address mode 8086 software in a protected,
    multi-tasking environment. This (Virtual-8086)
    mode is not actually a processor mode but a
    protected mode attribute that can be enabled for
    any task.
  • Real-address mode. Provides the programming
    environment of the Intel 8086 processor, with a
    few extensions (such as the ability to switch to
    protected or system management mode). The
    processor is placed in real-address mode
    following power-up or a reset. From real-address
    mode, only a single instruction is required to
    switch to protected mode.
  • System management mode. A standard architectural
    feature in all Intel processors, beginning
    with the Intel386 SL processor. Provides an
    operating system or executive with a transparent
    mechanism for implementing platform-specific
    functions such as power management. The processor
    enters SMM when the external SMM interrupt pin (SMI#)
    is activated or an SMI is received from the
    advanced programmable interrupt controller
    (APIC). In SMM, the processor switches to a
    separate address space while saving the entire
    context of the currently running program or task.
    SMM-specific code may then be executed
    transparently. Upon returning from SMM, the
    processor is placed back into its state prior to
    the system management interrupt.

25
Addressing Modes
  • Flat memory model: memory appears to a program
    as a single, continuous address space, called a
    linear address space. Code (a program's
    instructions), data, and the procedure stack are
    all contained in this address space. The linear
    address space is byte addressable.
  • Segmented memory model: memory appears to a
    program as a group of independent address spaces
    called segments. Code, data, and stacks are
    typically contained in separate segments. To
    address a byte in a segment, a program must issue
    a logical address (far pointer), which consists
    of a segment selector and an offset. The segment
    selector identifies the segment to be accessed
    and the offset identifies a byte in the address
    space of the segment. A program can address up to
    16,383 segments of different sizes and types. The
    processor translates each logical address into a
    linear address to access a memory location,
    transparently to the application program.
    Segmentation increases the reliability of programs
    and systems.
  • Real-address mode model: the same as on the Intel
    8086 processor, for backward compatibility. Uses a
    specific implementation of segmented memory in
    which the linear address space for the program
    and the operating system/executive consists of an
    array of equally sized segments.
  • In protected mode, any of these memory models can
    be used.
  • In real-address mode and in SMM, only the
    real-address mode model is available.
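
Two small calculations make the logical-address mechanics above concrete. This is a sketch using the standard IA-32 layouts: in the real-address mode model the linear address is the 16-bit segment value times 16 plus the 16-bit offset, and a protected-mode segment selector holds a 13-bit table index, a table indicator bit, and a 2-bit requested privilege level. The function names are hypothetical, and the sketch does not model address wrap-around at 1MB.

```python
def real_mode_linear(segment, offset):
    """Linear address in the real-address mode model."""
    return ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)


def split_selector(selector):
    """Split a protected-mode segment selector into its fields."""
    index = (selector >> 3) & 0x1FFF             # descriptor table index
    table = "LDT" if selector & 0x4 else "GDT"   # table indicator bit
    rpl = selector & 0x3                         # requested privilege level
    return index, table, rpl


print(hex(real_mode_linear(0xB800, 0x0000)))  # 0xb8000
print(hex(real_mode_linear(0xF000, 0xFFF0)))  # 0xffff0
print(split_selector(0x001B))                 # (3, 'GDT', 3)
```

The 13-bit index allows 8,192 descriptors in each of the two tables; with the first GDT entry reserved as the null descriptor, that gives the 16,383 segments quoted above.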

26
Miscellaneous Topics
  • Data flow analysis
    • If more than one instruction in the ROB is ready to be
      sent to an execution unit, the choice is made with a
      pseudo-FIFO algorithm [? ?at??????].
  • ECC
    • As an extension of parity checking on the address bus
      signals and on transaction control, the processor
      supports error detection and correction on the data
      signals, both for the L2 cache bus and for the system
      bus. Critical data are thus protected, because
      single-bit errors are corrected and double-bit errors
      are detected. Errors can be logged, so that system
      failures can be traced afterwards.
  • MMX
    • Math Matrix Extensions. SIMD type.
    • An extension of the basic x86 instruction repertoire
      for handling images, video, and sound, which first
      appears in the P5 family.
  • Streaming SIMD Extensions
    • 70 new instructions for handling images, 3D rendering,
      sound, video, and speech recognition.
  • Intel Processor Serial Number (PII and PIII)
  • Difficulty executing unaligned instructions
    • Instructions that come from the 16-bit code of the
      past. This forced the rapid introduction of the PII.
  • PGE (page global enable)
    • Allows pages to be marked as global (e.g. kernel
      pages), so that their TLB entries are not invalidated
      on a context switch.

27
MESI MODEL
Goal: tracking the state of the cache lines
[????? t?? ?d??? ap?????? t???].
  • Modified
    • The data have been changed by a write hit. They must be
      written back to main memory.
  • Exclusive
    • The data are most probably held only in this cache.
  • Shared
    • The data are probably also present in other caches, but
      have not been changed.
  • Invalid
    • The initial state, in which the cache line holds no
      valid data.
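
The four states can be tied together with a small transition function. This is a sketch of the textbook MESI protocol, not of the exact P6 bus behaviour: the event names (`local_read`, `local_write`, `snoop_read`, `snoop_write`) and the `others_have_copy` flag are hypothetical, and side effects such as the write-back of a Modified line are only noted in comments.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


def next_state(state, event, others_have_copy=False):
    """Next MESI state of one cache line in one cache."""
    if event == "local_read":
        if state is State.INVALID:
            # A miss: the line is loaded as Shared if another cache
            # also holds it, otherwise as Exclusive.
            return State.SHARED if others_have_copy else State.EXCLUSIVE
        return state
    if event == "local_write":
        # Other copies (if any) are invalidated over the bus.
        return State.MODIFIED
    if event == "snoop_read":
        if state in (State.MODIFIED, State.EXCLUSIVE):
            # A Modified line is written back before being shared.
            return State.SHARED
        return state
    if event == "snoop_write":
        return State.INVALID
    raise ValueError(f"unknown event: {event}")


s = State.INVALID
for event in ["local_read", "local_write", "snoop_read", "snoop_write"]:
    s = next_state(s, event)
    print(event, "->", s.value)
```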

28
Associative Caches / CAM
  • Associative
    • A block can map to any line of the cache
    • The address is translated into a tag and a word
    • The tag uniquely identifies a block
    • The tag of every line is checked for a match, which
      makes the search slow
  • Set Associative
    • The cache is divided into sets
    • Each set contains several lines
    • A given block maps to one of the lines of one and only
      one set (n-way, where n is the number of lines per
      set). For example, a 2-way set associative cache
      contains sets of two lines. Multiple blocks map to the
      same set.
    • The search time is reduced dramatically
  • CAM
    • The data themselves describe the storage location,
      which limits the need for searching
    • Any given data have one and only one storage location,
      which is predetermined
    • Similar data are stored in neighboring locations
    • The caller supplies the data and receives the address
      within a single cycle
    • Hardware CAM components have recently been implemented.
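
The set-associative mapping can be shown by splitting an address into tag, set index, and byte offset. This is a sketch with assumed parameters: the 2-way organization and 32-byte lines match the L1 data cache described earlier, while the 8KB capacity is an assumption taken from the Pentium Pro figure. The function name `split_address` is hypothetical.

```python
def split_address(addr, cache_bytes=8192, ways=2, line_bytes=32):
    """Return (tag, set_index, offset) for a set-associative cache."""
    num_sets = cache_bytes // (ways * line_bytes)  # 128 sets by default
    offset = addr % line_bytes                     # byte within the line
    set_index = (addr // line_bytes) % num_sets    # which set to search
    tag = addr // (line_bytes * num_sets)          # compared within the set
    return tag, set_index, offset


tag, set_index, offset = split_address(0x12345)
print(tag, set_index, offset)  # 18 26 5
```

Only the `ways` tags of the selected set have to be compared, instead of every tag in the cache as in the fully associative case.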

29
Results
  • 90% correct branch prediction, even for deeply nested
    branches
  • 25% usage of the system bus bandwidth
  • Improved integer instruction performance

30
Comparisons
31
References
  • Intel's P6 Uses Decoupled Superscalar Design, by
    Linley Gwennap (Microdesign Resources, vol. 9, no. 2,
    16 February 1995)
  • The P6 Architecture: Background Information for
    Developers, by Intel Corporation
    (1995 - p6arc.pdf)
  • IA-32 Intel Architecture Software Developer's
    Manual, Volume 2: Instruction Set Reference, by
    Intel Corporation (2001 - 24547104.pdf)
  • Pentium Pro Family Developer's Manual, Vol. 2:
    Programmer's Reference Manual, by Intel
    Corporation (Dec 1995 - 24269101.pdf)
  • The Pentium Pro at 150, 166, 180 and 200MHz,
    by Intel Corporation (Jun 1997 - 24276905.pdf)
  • The Intel Architecture Optimization Manual, by
    Intel Corporation (1997 - 24281601.pdf)
  • Pentium II Developer's Manual, by Intel
    Corporation (Oct 1999 - 24350201.pdf)
  • The Intel Celeron Processor up to 1.1GHz
    Datasheet, by Intel Corporation (Aug 2001 -
    24365819.pdf)
  • The P6 Family of Processors Hardware
    Developer's Manual, by Intel Corporation (Sep
    1998 - 24400101.pdf)
  • Pentium III Processor at 450 MHz to 1.13 GHz
    Datasheet, by Intel Corporation (Jul 2000 -
    24445208.pdf)

32
Functional Units (alternative view)