The x86 Server Platform - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: The x86 Server Platform


1
The x86 Server Platform
  • .. Resistance is futile.
  • Dec 6, 2004

2
Server Shipments: Total vs. x86
3
Market Share: Servers, United States, 2Q04
Michael McLaughlin, "Market Share: Servers, United
States, 2Q04," 7 October 2004, Gartner
4
x86 Platform CPUs
  • Intel
  • Xeon MP: Gallatin (future is Potomac)
  • Xeon SP/DP with EM64T: Nocona
  • Itanium II MP: Madison (future is Montecito)
  • AMD
  • Opteron

5
Gallatin - MP
  • 130 nm
  • 3 GHz
  • 4 MB L3 Cache
  • FSB - 400 MHz

6
ES7000: 32 Gallatins
7
Nocona: Single Processor with EM64T
  • 90 nm
  • Clock Speed 3.2-3.6 GHz
  • L3 4 MB
  • FSB 800 MHz

8
Itanium II - Madison
  • 130 nm
  • 9 MB L3 cache
  • 1.6 GHz
  • FSB 400 MHz

9
(No Transcript)
10
(No Transcript)
11
STOP
  • Why Multi-Core?
  • ... and while we're at it, why Multi-Threading?
  • It's all about the balance of
  • Silicon real estate
  • Compiler technology
  • Cost
  • Power
  • ... to meet the constant pressure to double
    performance every 18 months

12
Memory Latency vs CPU Speed
[Chart: microprocessor operating frequency (GHz) and commodity DRAM access frequency ((10^-9 sec)^-1), log scale from 0.01 to 10, plotted against production year, 1990-2010; series: microprocessor on-chip clock and commodity DRAM]
13
Processor Architecture
  • When latency → 0 and bandwidth → ∞, we will have
    the perfect CPU
  • A great deal of innovation has centered around
    approximating this perfect world
  • CISC
  • CPU Cache
  • RISC
  • EPIC
  • Multi-Threading
  • Multiple Cores

14
Complex Instruction Set Computer
  • Hardware implements assembler instructions
  • MULT A, B
  • hardware loads registers, multiplies and stores
    results
  • Multiple clocks needed for an instruction
  • RAM requirements are relatively small
  • Compilers translate high-level languages down to
    assembler instructions (Von Neumann)

http://www.hardwarecentral.com/hardwarecentral/tutorials/2427
15
CPU Cache
  • When CPU speeds started to increase, memory
    latency emerged as a bottleneck
  • CPU caches were used to keep local references
    close to the CPU
  • For SMP systems, memory banks were more than a
    clock away
  • It is not uncommon today to find 3 orders of
    magnitude between the fastest and slowest memory
    latency (a simple measurement sketch follows below)
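To make the latency gap concrete, here is a minimal pointer-chasing sketch in C (illustrative only; the array size, step count, and use of clock_gettime are arbitrary choices, not anything from the slides). Each load depends on the previous one, so the loop measures raw memory latency rather than bandwidth, and with an array much larger than the caches the reported time approaches DRAM latency.

    /* Pointer-chasing latency sketch (illustrative, not a rigorous benchmark).
       Following a random single-cycle permutation defeats hardware prefetching,
       so each dependent load pays close to the full latency of wherever the
       cache line lives (L1/L2/L3 or DRAM). Compile with: cc -O2 chase.c */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N     (16 * 1024 * 1024)   /* 16M entries (~128 MB): far bigger than cache */
    #define STEPS (10 * 1000 * 1000)

    int main(void)
    {
        size_t *next = malloc(N * sizeof *next);
        if (!next) return 1;

        /* Sattolo's algorithm: build one random cycle covering all N entries,
           so the chase cannot fall into a short, cache-resident loop. */
        for (size_t i = 0; i < N; i++) next[i] = i;
        for (size_t i = N - 1; i > 0; i--) {
            size_t j = (size_t)rand() % i;           /* j in [0, i-1] */
            size_t t = next[i]; next[i] = next[j]; next[j] = t;
        }

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);

        volatile size_t idx = 0;                     /* volatile: keep the loop alive */
        for (long s = 0; s < STEPS; s++)
            idx = next[idx];                         /* serial dependency: pure latency */

        clock_gettime(CLOCK_MONOTONIC, &t1);
        double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (double)(t1.tv_nsec - t0.tv_nsec);
        printf("~%.1f ns per dependent load\n", ns / STEPS);
        return 0;
    }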

16
Reduced Instruction Set Computer
  • Hardware is simplified; fewer transistors are
    needed for the full instruction set
  • RAM requirements are higher to store intermediate
    results and more code
  • Compilers are more complex
  • Clock speeds increase because instructions are
    simpler
  • Deterministic, simple instructions allow
    pipelining
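As a schematic side-by-side of the CISC and RISC slides above (the mnemonics in the comment are invented for illustration and are not the opcodes of any real CPU), the same C statement can be encoded as one complex memory-to-memory instruction or as a sequence of simple, fixed-format instructions that pipeline well.

    /* One source statement, two instruction-set philosophies (schematic only;
       the mnemonics below are illustrative, not taken from a real ISA).
     *
     *     a = a * b;
     *
     * CISC-style: one complex instruction, several clocks, compact code
     *     MULT  A, B           ; load A and B from memory, multiply, store to A
     *
     * RISC-style: simple load/store instructions that each take roughly one
     * clock and flow through a pipeline, at the cost of more code and more
     * register traffic
     *     LOAD  r1, A
     *     LOAD  r2, B
     *     MUL   r3, r1, r2
     *     STORE r3, A
     */
    void mult_in_place(long *a, const long *b)
    {
        *a = *a * *b;
    }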

17
Pipelining
[Diagram: pipeline stage utilization rising from 25% busy through 40%, 60% and 80% to 100% busy as the pipeline fills; callout: Higher Clock Speeds!]
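A rough reading of the diagram, assuming a classic five-stage pipeline (fetch, decode, execute, memory, write-back): without pipelining only one stage is busy at a time, so utilization sits near 20-25%; as instructions are overlapped, utilization climbs through 40%, 60% and 80% to 100%, and one instruction completes per clock once the pipeline is full. Because each stage now does only a fraction of the total work, the clock period itself can shrink, which is where the higher clock speeds come from.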
18
Branch Prediction
  • While processing in parallel, branches occur
  • Branch prediction is used to guess which path a
    branch will take so the pipeline can keep being
    filled
  • If the prediction is incorrect, the pipeline must
    be flushed and the CPU stalls
  • Statistics
  • 10-20% of instructions are branches
  • Predictions are incorrect about 10% of the time
  • As the pipeline deepens, the probability of a miss
    increases and more cycles are discarded
  • 80-deep pipeline / 20% branches / 10% miss rate:
    roughly an 80% chance of a miss and a penalty of up
    to 80 cycles (rough arithmetic follows below)
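One way to arrive at the slide's numbers (a rough estimate that treats each prediction as independent): an 80-deep pipeline with 20% branches holds about 80 × 0.2 = 16 branches in flight at once; if each prediction is wrong 10% of the time, the probability that at least one of them is mispredicted is 1 - 0.9^16 ≈ 0.81, i.e. roughly an 80% chance of a flush, with up to 80 cycles of in-flight work discarded.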

19
Itanium II EPIC Instruction Set: Explicitly
Parallel Instruction Computing
  • Compiler can indicate code that can be executed
    in parallel
  • Both branches are pipelined
  • No lost cycles due to misprediction
  • Pipeline can be deeper
  • Complexity continues to move into the compiler
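As a loose analogy in C (this is not IA-64 code; on Itanium the compiler tags both sides of a short branch with predicate registers and the hardware discards the untaken results), rewriting a data-dependent branch as straight-line code shows the basic idea: both outcomes are computed and one is selected, so there is no prediction to get wrong.

    /* Branchy version: the hardware must guess which way the 'if' will go. */
    long clamp_branchy(long x, long limit)
    {
        if (x > limit)
            return limit;
        return x;
    }

    /* Branch-free version: both candidate results exist and a 0/1 predicate
       selects one. Compilers often turn the equivalent ternary expression into
       a conditional move; EPIC predication generalizes the same idea to whole
       instruction groups. */
    long clamp_predicated(long x, long limit)
    {
        long over = (x > limit);                 /* predicate: 0 or 1 */
        return over * limit + (1 - over) * x;    /* select without branching */
    }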

20
Multi-Threading
21
(No Transcript)
22
Multiple Cores
  • Fabrication sizes continue to diminish
  • The additional real estate has been used to put
    more and more memory on the die
  • Multi-core technology provides a new way to
    exploit the additional space
  • The clock rates cannot continue to climb due to
    the excessive heat
  • P ∝ C · V² · f  (C = switch capacitance, V = supply
    voltage, f = clock frequency)
  • Multiple cores are the next step toward faster
    execution times for applications (a rough worked
    example follows below)
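To see why the P ∝ C·V²·f relation favors adding cores over raising clocks, here is a rough worked example with illustrative numbers (not taken from any specific part): pushing a single core to 1.2× the clock typically also requires about 1.2× the voltage, so power grows by roughly 1.2 × 1.2² ≈ 1.7× for at most 20% more performance. Spending the same transistor budget on a second identical core doubles peak throughput for about 2× the power, and running both cores at 0.85× the voltage and frequency cuts that to roughly 2 × 0.85³ ≈ 1.2× the original power while still offering up to about 1.7× the single-core throughput, provided the application can use both cores.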

23
(End of 2005?)
24
(No Transcript)
25
(No Transcript)
26
(No Transcript)
27
(No Transcript)
28
(No Transcript)
29
(No Transcript)
30
AMD Opteron 800 Series
  • 130 nm
  • Clock Speed 1.4-2.4 GHz
  • L2 1 MB
  • 6.4 GB/s HyperTransport

31
Architectural Comparison
[Diagram: 4-way Opteron system with 6.4 GB/s HyperTransport links between CPUs, per-CPU 144-bit DDR memory, PCI-X bridges and I/O hubs, compared with a 4-way Xeon system sharing a front-side bus into an SNC with memory address buffers, PCI-X bridges, I/O hubs and other bridges]
32
Mapping Workloads onto Architecture
  • Consider a dichotomy of workloads
  • Large Memory Model: needs a large, single
    system image and a large amount of coherent
    memory
  • Database apps: SQL Server / Oracle
  • Business Intelligence / Data Warehousing /
    Analytics
  • Memory-resident databases
  • 64-bit architectures allow memory addressability
    above 1 TB
  • Small/Medium Memory Model: can be
    cost-effective for workloads that do not require
    extensive shared memory/state
  • Stateless Applications and Web Services
  • Web Servers
  • Clusters of systems for parallelized applications
    and grids

33
Large Server Vendors
  • Intel Announcement (Nov 19)
  • Otellini said product development, marketing and
    software efforts (for Itanium) will all now be
    aimed at "greater than four-way systems". He also
    said, "The mainframe isn't dead. That's where I'd
    like to push Itanium over time."
  • The size of the SMP is affected by Intel's chipset
    support for coherent memory
  • OEM Vendors (Unisys, HP, SGI, Fujitsu, IBM)
  • Each has a unique chipset to build basic
    four-ways into large SMP systems
  • IBM has Power5, which is a direct competitor
  • Intel 32-bit and EM64T
  • This could emerge as the flagship product

34
Where Are We Going?
  • Since the early CISC computers, we have moved
    more and more of the complexity out to the
    compiler to achieve parallelism and fully exploit
    the silicon real estate
  • The power requirements, along with the smaller
    fabrication sizes, have pushed the CPU vendors to
    exploit multiple cores
  • The key to performance for these future machines
    will be the application's ability to exploit
    parallelism (a minimal threaded sketch follows
    below)
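As a minimal sketch of what "exploiting parallelism" looks like in application code (assuming POSIX threads; the array size, thread count, and function names are arbitrary choices for illustration), independent slices of a loop can be handed to separate cores and combined at the end.

    /* Minimal POSIX-threads sketch: summing an array on multiple cores.
       Each thread sums a disjoint slice, so there is no shared mutable state
       until the partial results are combined. Compile with: cc -O2 -pthread sum.c */
    #include <pthread.h>
    #include <stdio.h>

    #define N       1000000
    #define THREADS 4

    static double data[N];

    struct slice { int begin, end; double partial; };

    static void *sum_slice(void *arg)
    {
        struct slice *s = arg;
        double acc = 0.0;
        for (int i = s->begin; i < s->end; i++)
            acc += data[i];
        s->partial = acc;
        return NULL;
    }

    int main(void)
    {
        for (int i = 0; i < N; i++) data[i] = 1.0;   /* dummy workload */

        pthread_t tid[THREADS];
        struct slice slices[THREADS];
        int chunk = N / THREADS;

        for (int t = 0; t < THREADS; t++) {
            slices[t].begin = t * chunk;
            slices[t].end   = (t == THREADS - 1) ? N : (t + 1) * chunk;
            pthread_create(&tid[t], NULL, sum_slice, &slices[t]);
        }

        double total = 0.0;
        for (int t = 0; t < THREADS; t++) {
            pthread_join(tid[t], NULL);
            total += slices[t].partial;
        }
        printf("sum = %.0f\n", total);
        return 0;
    }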