Title: CSE 520 Computer Architecture Lec 4
1 CSE 520 Computer Architecture Lec 4
Quantifying Cost, Energy-Consumption,
Performance, and Dependability (Chapter 1)
- Sandeep K. S. Gupta
- School of Computing and Informatics
- Arizona State University
Based on Slides by David Patterson and M. Younis
2 Moore's Law gets life-term extension
- Intel, IBM unveil new chip technology - Breakthrough, using new material, will allow processors to become smaller and more powerful.
CNN Money: http://money.cnn.com/2007/01/27/technology/bc.microchips.reut/index.htm?cnnyes
- Intel Corp. and IBM have announced one of the biggest advances in transistors in four decades, overcoming a frustrating obstacle by ensuring microchips can get even smaller and more powerful.
- The latest breakthrough means Intel, IBM and others can proceed with technology roadmaps that call for the next generation of chips to be made with circuitry as small as 45 nanometers, about 1/2000th the width of a human hair.
- Researchers are optimistic the new technology can be used at least through two more technology generations, when circuitry will be just 22 nanometers.
- "This gives the entire chip industry a new life in terms of Moore's Law, in all three of the big metrics: performance, power consumption, and transistor density." - David Lammers, director, WeSRCH.com, a social networking site for semiconductor enthusiasts (part of VLSI Research Inc.)
3 What is the Breakthrough?
- How to reduce energy loss in microchips' transistors as the technology shrinks, as the transistor shrinks to the atomic scale?
- The problem is that the silicon dioxide used for more than 40 years as an insulator inside transistors has been shaved so thin that an increasing amount of current is seeping through, wasting electricity and generating unnecessary heat.
- Intel and IBM have discovered a way to replace SiO2 with various metals: e.g., Intel is using a silvery metal called hafnium in parts called the gate, which turns the transistor on and off, and the gate dielectric, an insulating layer, which helps improve transistor performance and retain more energy.
4 What does it mean for Intel and Arizona?
- "The chip, to be used in Intel's new Penryn microprocessor, will be produced in Intel facilities throughout the world. But the new $3B plant, called Fab 32, will allow Chandler to remain a key site for the company's manufacturing operation." - The Arizona Republic, "New Intel chip is fab news for Chandler," M. Jarman, Jan. 28, 2007.
- Intel says the new chip is a result of the biggest breakthrough in transistor technology in 40 years.
- It also ratchets up the competition between Intel and rival chipmaker Advanced Micro Devices Inc., which helped IBM develop the technology along with electronics maker Sony Corp. and Toshiba Corp.
- "Intel will be the first to have this in production, but IBM could potentially have a density advantage compared with Intel's scheme. But both should get the gold medal."
Source The Arizona Republic, Jan 28, 2007.
5 Recap
- Execution (CPU) time is the only true measure of performance.
- One must be careful when using other measures such as MIPS.
- Computer architects (industry) need to be aware of technology trends to design computer architectures which address the various walls.
- The increasing proportion of static (or leakage) current (in comparison to dynamic current) is a cause of concern.
- One of the motivations for multicore design is to reduce thermal dissipation.
6 Amdahl's Law
The performance enhancement possible with a given improvement is limited by the amount that the improved feature is used.
- A common theme in hardware design is to make the common case fast:
- Increasing the clock rate would not affect memory access time
- Using a floating-point processing unit does not speed integer ALU operations
- Example: Floating-point instructions improved to run 2 times faster, but only 10% of the actual instructions are floating point.
- Exec-Time_new = Exec-Time_old x (0.9 + 0.1/2) = 0.95 x Exec-Time_old
- Speedup_overall = Exec-Time_old / Exec-Time_new = 1/0.95 = 1.053
Slide by M. Younis
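The Amdahl's Law arithmetic above can be sketched as a short calculation (an illustrative sketch; the function name `amdahl_speedup` is mine, not from the slides):

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Amdahl's Law: overall speedup when a fraction of execution
    time is improved by a given factor."""
    new_time = (1 - fraction_enhanced) + fraction_enhanced / speedup_enhanced
    return 1 / new_time

# Floating-point instructions run 2x faster but account for
# only 10% of execution time:
print(round(amdahl_speedup(0.10, 2.0), 3))  # 1.053
```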
7 Processor Performance Equation
CPU time = Clock cycle time x Sum over i = 1..n of (CPI_i x C_i)
where C_i is the count of instructions of class i executed,
CPI_i is the average number of cycles per instruction for that instruction class, and
n is the number of different instruction classes.
Slide by M. Younis
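As a sketch, the equation above can be evaluated for a hypothetical instruction mix (all counts and CPI values below are made-up illustration values, not from the slides):

```python
def cpu_time(clock_cycle_time_s, counts, cpis):
    """CPU time = clock cycle time x sum over classes of (C_i x CPI_i)."""
    return clock_cycle_time_s * sum(c * cpi for c, cpi in zip(counts, cpis))

# Hypothetical mix: 3 instruction classes on a 1 GHz clock (1 ns cycle)
counts = [2_000_000, 1_000_000, 500_000]  # instructions executed per class
cpis = [1.0, 2.0, 3.0]                    # average cycles per instruction per class
print(round(cpu_time(1e-9, counts, cpis), 6))  # 0.0055 seconds
```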
8 Performance Metrics - Summary
- Maximizing performance means
- minimizing response (execution) time
Figure is courtesy of Dave Patterson
9 Chapter 1 Fundamentals of Computer Design
- Technology Trends: Culture of tracking, anticipating and exploiting advances in technology
- Understanding Cost
- Careful, quantitative comparisons:
- Define, quantify, and summarize relative performance
- Define and quantify relative cost
- Define and quantify dependability
- Define and quantify power
10 Moore's Law: 2X transistors / year
- "Cramming More Components onto Integrated Circuits"
- Gordon Moore, Electronics, 1965
- # of transistors on a cost-effective integrated circuit doubles every N months (12 <= N <= 24)
11 Latency Lags Bandwidth (last 20 years)
- Performance Milestones (latency improvement, bandwidth improvement):
- Processor: 286, 386, 486, Pentium, Pentium Pro, Pentium 4 (21x, 2250x)
- Ethernet: 10 Mb/s, 100 Mb/s, 1000 Mb/s, 10000 Mb/s (16x, 1000x)
- Memory Module: 16-bit plain DRAM, Page Mode DRAM, 32b, 64b, SDRAM, DDR SDRAM (4x, 120x)
- Disk: 3600, 5400, 7200, 10000, 15000 RPM (8x, 143x)
12 Rule of Thumb for Latency Lagging BW
- In the time that bandwidth doubles, latency improves by no more than a factor of 1.2 to 1.4
- (and capacity improves faster than bandwidth)
- Stated alternatively: bandwidth improves by more than the square of the improvement in latency.
13 6 Reasons Latency Lags Bandwidth
- Moore's Law helps BW more than latency
- Distance limits latency
- Bandwidth easier to sell (bigger = better)
- Latency helps BW, but not vice versa
- Bandwidth hurts latency
- Operating system overhead hurts latency more than bandwidth
14 Summary of Technology Trends
- For disk, LAN, memory, and microprocessor, bandwidth improves by the square of the latency improvement
- In the time that bandwidth doubles, latency improves by no more than 1.2X to 1.4X
- The lag is probably even larger in real systems, as bandwidth gains are multiplied by replicated components:
- Multiple processors in a cluster or even in a chip
- Multiple disks in a disk array
- Multiple memory modules in a large memory
- Simultaneous communication in a switched LAN
- HW and SW developers should innovate assuming Latency Lags Bandwidth:
- If everything improves at the same rate, then nothing really changes
- When rates vary, real innovation is required
15 Chapter 1 Fundamentals of Computer Design
- Technology Trends: Culture of tracking, anticipating and exploiting advances in technology
- Understanding Cost
- Careful, quantitative comparisons:
- Define, quantify, and summarize relative performance
- Define and quantify relative cost
- Define and quantify dependability
- Define and quantify power
16 Trends in Cost
- Textbooks usually ignore the cost half of cost-performance because costs change.
- Yet understanding cost and its factors is essential for designers to make intelligent decisions about what features to include in designs when cost is an issue.
- Agenda: study the impact of time, volume and commodification.
- Underlying principle: the learning curve: manufacturing costs decrease over time.
- Measured by change in yield: the percentage of manufactured devices that survives the testing procedure.
17 Integrated Circuits: Fueling Innovation
- Chip manufacturing begins with silicon, a substance found in sand.
- Silicon does not conduct electricity well and is thus called a semiconductor.
- A special chemical process can transform tiny areas of silicon into:
- Excellent conductors of electricity (like copper)
- Excellent insulators from electricity (like glass)
- Areas that can conduct or insulate under a special condition (a switch)
- A transistor is simply an on/off switch controlled by electricity.
- Integrated circuits combine dozens to hundreds of transistors in a chip.
Advances in IC technology affect H/W and S/W design philosophy.
18 Microelectronics Process
- Silicon ingots are 6-12 inches in diameter and about 12-24 inches long.
- The manufacturing process of integrated circuits is critical to the cost of a chip.
- Impurities in the wafer can lead to defective devices and reduce the yield.
19 Integrated Circuits Costs
Die cost roughly goes with (die area)^4
Slide is courtesy of Dave Patterson
20 Example: Dies per Wafer
- Find the number of dies per 300 mm (30 cm) wafer
for a die that is 1.5 cm on a side
21 Example: Dies per Wafer
- Find the number of dies per 300 mm (30 cm) wafer for a die that is 1.5 cm on a side.
- Die area = 2.25 cm^2
- Dies per wafer = (π x (30/2)^2)/2.25 - (π x 30)/sqrt(2 x 2.25)
- = (706.9/2.25) - (94.2/2.12) ≈ 270
22 Example: Die Yield
- Find the die yield for dies that are 1.5 cm on a side and 1.0 cm on a side, assuming a defect density of 0.4 per cm^2 and α is 4.
23 Example: Die Yield
- Find the die yield for dies that are 1.5 cm on a side and 1.0 cm on a side, assuming a defect density of 0.4 per cm^2 and α is 4.
- Die areas: 2.25 cm^2 and 1.00 cm^2, respectively.
- For the larger die, yield = (1 + (0.4 x 2.25)/4.0)^-4 = 0.44
- For the smaller die, yield = (1 + (0.4 x 1)/4.0)^-4 = 0.68
- I.e., less than half of all the large dies are good, but more than two-thirds of the small dies are good.
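The yield formula above can be wrapped in a helper to reproduce both numbers (`die_yield` is my name; wafer yield is assumed to be 100% here):

```python
def die_yield(wafer_yield, defects_per_cm2, die_area_cm2, alpha=4.0):
    """Slide's yield model: wafer_yield x (1 + defect_density x die_area / alpha)^(-alpha)."""
    return wafer_yield * (1 + defects_per_cm2 * die_area_cm2 / alpha) ** -alpha

print(round(die_yield(1.0, 0.4, 2.25), 2))  # 0.44 for the 1.5 cm die
print(round(die_yield(1.0, 0.4, 1.00), 2))  # 0.68 for the 1.0 cm die
```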
24 Real World Examples
- From "Estimating IC Manufacturing Costs," by Linley Gwennap, Microprocessor Report, August 2, 1993, p. 15.
Slide is courtesy of Dave Patterson
25 Costs and Trends in Cost
- Understanding trends in component costs (how they will change over time) is an important issue for designers.
- Component prices drop over time without major improvements in manufacturing technology.
- What affects cost:
- Learning curve:
- The more experience in manufacturing a component, the better the yield (the number of good devices / total number of devices).
- In general, a chip, board or system with twice the yield will have half the cost.
- The learning curve is different for different components, thus complicating new system design decisions.
- Volume:
- Larger volume increases the rate of the learning curve and manufacturing efficiency.
- Doubling the volume typically reduces cost by 10%.
- Commodities:
- Are essentially identical products sold by multiple vendors in large volumes.
- Aid competition and drive efficiency higher, and thus cost down.
26 Cost Trends for DRAM
($ per DRAM chip over time)
One dollar in 1977 ≈ $2.95 in 2001. Cost/MB: $500 in 1977, $0.35 in 2000, $0.08 in 2001.
Demand exceeded supply -> slow price drop.
Each generation drops in dollar price by a factor of 10 to 30 over its lifetime.
27 Cost Trends for Processors
Price drops are due to yield enhancements.
Intel list price for 1000 units of the Pentium III.
28 Cost vs. Price
- Component Costs: raw material cost for the system's building blocks
- Direct Costs (add 25% to 40%): recurring costs: labor, purchasing, scrap, warranty
- Gross Margin (add 82% to 186%): nonrecurring costs: R&D, marketing, sales, equipment maintenance, rental, financing cost, pretax profits, taxes
- Average Discount to get List Price (add 33% to 66%): volume discounts and/or retailer markup
Slide is courtesy of Dave Patterson
29 Example: Price vs. Cost
Chip prices (August 1993) for a volume of 10,000 units.
Slide is courtesy of Dave Patterson
30 Outline
- Technology Trends: Culture of tracking, anticipating and exploiting advances in technology
- Understanding Cost
- Careful, quantitative comparisons:
- Define and quantify power
- Define and quantify dependability
- Define, quantify, and summarize relative performance
- Define and quantify relative cost
31 Define and quantify power (1/2)
- For CMOS chips, the traditionally dominant energy consumption has been in switching transistors, called dynamic power:
- Power_dynamic = 1/2 x Capacitive load x Voltage^2 x Frequency switched
- For mobile devices, energy is the better metric: Energy_dynamic = Capacitive load x Voltage^2
- For a fixed task, slowing the clock rate (frequency switched) reduces power, but not energy
- Capacitive load: a function of the number of transistors connected to an output and of the technology, which determines the capacitance of wires and transistors
- Dropping voltage helps both, so voltages went from 5V to 1V
- To save energy and dynamic power, most CPUs now turn off the clock of inactive modules (e.g. the floating-point unit)
32 Example of quantifying power
- Suppose a 15% reduction in voltage results in a 15% reduction in frequency. What is the impact on dynamic power?
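A sketch of the answer, using the proportionality dynamic power ∝ Capacitive load x Voltage^2 x Frequency (the helper name below is mine): scaling both V and f by 0.85 scales dynamic power by 0.85^3, roughly a 39% reduction.

```python
def dynamic_power_ratio(voltage_scale: float, freq_scale: float) -> float:
    """Dynamic power scales as Voltage^2 x Frequency (capacitance unchanged),
    so the new/old power ratio is voltage_scale^2 x freq_scale."""
    return voltage_scale ** 2 * freq_scale

# 15% lower voltage and 15% lower frequency:
print(round(dynamic_power_ratio(0.85, 0.85), 2))  # 0.61, i.e. ~39% less dynamic power
```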
33 Define and quantify power (2/2)
- Because leakage current flows even when a transistor is off, static power is now important too: Power_static = Static current x Voltage
- Leakage current increases in processors with smaller transistor sizes
- Increasing the number of transistors increases power even if they are turned off
- In 2006, the goal for leakage was 25% of total power consumption; high-performance designs were at 40%
- Very low power systems even gate the voltage to inactive modules to control loss due to leakage
34 Outline
- Review
- Technology Trends: Culture of tracking, anticipating and exploiting advances in technology
- Careful, quantitative comparisons:
- Define and quantify power
- Define and quantify dependability
- Define, quantify, and summarize relative performance
- Define and quantify relative cost
35 Define and quantify dependability (1/3)
- How do we decide when a system is operating properly?
- Infrastructure providers now offer Service Level Agreements (SLAs) to guarantee that their networking or power service will be dependable
- Systems alternate between 2 states of service with respect to an SLA:
- 1. Service accomplishment, where the service is delivered as specified in the SLA
- 2. Service interruption, where the delivered service is different from the SLA
- Failure = transition from state 1 to state 2
- Restoration = transition from state 2 to state 1
36 Define and quantify dependability (2/3)
- Module reliability = measure of continuous service accomplishment (or time to failure). 2 metrics:
- Mean Time To Failure (MTTF) measures reliability
- Failures In Time (FIT) = 1/MTTF, the rate of failures
- Traditionally reported as failures per billion hours of operation
- Mean Time To Repair (MTTR) measures service interruption
- Mean Time Between Failures (MTBF) = MTTF + MTTR
- Module availability measures service as it alternates between the 2 states of accomplishment and interruption (a number between 0 and 1, e.g. 0.9)
- Module availability = MTTF / (MTTF + MTTR)
37 Example: calculating reliability
- If modules have exponentially distributed lifetimes (the age of a module does not affect its probability of failure), the overall failure rate is the sum of the failure rates of the modules.
- Calculate FIT and MTTF for 10 disks (1M-hour MTTF per disk), 1 disk controller (0.5M-hour MTTF), and 1 power supply (0.2M-hour MTTF).
38 Example: calculating reliability
- If modules have exponentially distributed lifetimes (the age of a module does not affect its probability of failure), the overall failure rate is the sum of the failure rates of the modules.
- Calculate FIT and MTTF for 10 disks (1M-hour MTTF per disk), 1 disk controller (0.5M-hour MTTF), and 1 power supply (0.2M-hour MTTF).
- Failure rate = 10 x (1/1,000,000) + 1/500,000 + 1/200,000 = 17/1,000,000 per hour = 17,000 FIT
- MTTF = 1 / Failure rate = 1,000,000,000 / 17,000 ≈ 59,000 hours
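The failure-rate arithmetic can be checked with a short script (`system_failure_rate` is a name I chose for illustration):

```python
def system_failure_rate(mttfs_hours):
    """With exponentially distributed lifetimes, the system failure rate
    is the sum of the component failure rates (1/MTTF for each component)."""
    return sum(1.0 / m for m in mttfs_hours)

# 10 disks (1M-hour MTTF each), 1 controller (0.5M), 1 power supply (0.2M):
rate = system_failure_rate([1e6] * 10 + [0.5e6, 0.2e6])
fit = rate * 1e9    # failures per billion hours: 17,000 FIT
mttf = 1.0 / rate   # system MTTF: about 59,000 hours
print(fit, mttf)
```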
39 Outline
- Review
- Technology Trends: Culture of tracking, anticipating and exploiting advances in technology
- Careful, quantitative comparisons:
- Define and quantify power
- Define and quantify dependability
- Define, quantify, and summarize relative performance
- Define and quantify relative cost
40 Definition: Performance
- Performance is in units of things per second
- Bigger is better
- If we are primarily concerned with response time:
- "X is n times faster than Y" means n = Execution-Time_Y / Execution-Time_X
41 Performance: What to measure
- Usually rely on benchmarks vs. real workloads
- To increase predictability, collections of benchmark applications, called benchmark suites, are popular
- SPECCPU: popular desktop benchmark suite
- CPU only, split between integer and floating-point programs
- SPECint2000 has 12 integer programs, SPECfp2000 has 14 floating-point programs
- SPECCPU2006 announced in Spring 2006
- SPECSFS (NFS file server) and SPECWeb (WebServer) added as server benchmarks
- Transaction Processing Council measures server performance and cost-performance for databases:
- TPC-C: complex query for Online Transaction Processing
- TPC-H: models ad hoc decision support
- TPC-W: a transactional web benchmark
- TPC-App: application server and web services benchmark
42 Performance Tuning Cycle
[Flowchart: Benchmarks and Independent Software Vendors define a Workload; the Product is evaluated via Simulation/Silicon; if the result is not satisfactory, H/W or S/W changes are made and the cycle repeats; if OK, done.]
Based on talk with Jim Abele, Intel Chandler
(8/30/07)
43 Some Comments
- Industry teams usually look far into the future:
- Currently the Intel Chandler team is looking at workloads for 2012
- The workstation workloads of today are the PC workloads of tomorrow
- Independent S/W vendors (such as Microsoft/Adobe) may or may not work with chip manufacturers to make changes in their products.
- Modern chips provide many performance counters and event tracing; these can be used in conjunction with performance-enhancement tools such as VTune from Intel.
44 How Summarize Suite Performance (1/5)
- Arithmetic average of execution time of all programs?
- But they vary by 4X in speed, so some would be more important than others in an arithmetic average
- Could add a weight per program, but how to pick weights?
- Different companies want different weights for their products
- SPECRatio: normalize execution times to a reference computer, yielding a ratio proportional to performance:
- SPECRatio = time on reference computer / time on computer being rated
45 How Summarize Suite Performance (2/5)
- If a program's SPECRatio on Computer A is 1.25 times bigger than on Computer B, then Execution-Time_B / Execution-Time_A = 1.25, i.e., A is 1.25 times faster than B
- Note that when comparing 2 computers as a ratio, the execution times on the reference computer drop out, so the choice of reference computer is irrelevant
46 How Summarize Suite Performance (3/5)
- Since these are ratios, the proper mean is the geometric mean (SPECRatio is unitless, so the arithmetic mean is meaningless):
- Geometric mean = (product of the n SPECRatios)^(1/n)
- The geometric mean of the ratios is the same as the ratio of the geometric means
- Ratio of geometric means = geometric mean of performance ratios -> choice of reference computer is irrelevant!
- These two points make the geometric mean of ratios attractive for summarizing performance
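The reference-independence claim is easy to demonstrate numerically (the execution times below are made up purely for illustration):

```python
import math

def geometric_mean(xs):
    """Geometric mean via logs: exp(mean(log(x)))."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

# Execution times (seconds) on a reference machine and on computers A and B
ref = [100.0, 200.0, 400.0]
a = [50.0, 160.0, 100.0]
b = [80.0, 80.0, 200.0]

spec_a = [r / t for r, t in zip(ref, a)]  # SPECRatios for A
spec_b = [r / t for r, t in zip(ref, b)]  # SPECRatios for B

# Ratio of geometric means equals the geometric mean of the per-program
# performance ratios of A over B (time_B / time_A), so ref cancels out:
lhs = geometric_mean(spec_a) / geometric_mean(spec_b)
rhs = geometric_mean([tb / ta for ta, tb in zip(a, b)])
print(math.isclose(lhs, rhs))  # True
```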
47 How Summarize Suite Performance (4/5)
- Does a single mean summarize the performance of the programs in the benchmark suite well?
- Can decide if the mean is a good predictor by characterizing the variability of the distribution using the standard deviation
- Like the geometric mean, the geometric standard deviation is multiplicative rather than arithmetic
- Can simply take the logarithms of the SPECRatios, compute the arithmetic mean and standard deviation of the logs, and then exponentiate to convert back
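The log-transform recipe above can be written directly (the SPECRatio values below are invented for illustration, and the population standard deviation is used):

```python
import math

def geo_mean_and_mult_stdev(ratios):
    """Take logs, compute the arithmetic mean and (population) standard
    deviation of the logs, then exponentiate both back."""
    logs = [math.log(r) for r in ratios]
    mean = sum(logs) / len(logs)
    var = sum((x - mean) ** 2 for x in logs) / len(logs)
    return math.exp(mean), math.exp(math.sqrt(var))

# Hypothetical SPECRatios:
gm, gsd = geo_mean_and_mult_stdev([1.2, 2.5, 0.8, 3.1, 1.9])
# For a lognormal distribution, ~68% of ratios fall in [gm/gsd, gm*gsd]
print(round(gm, 2), round(gsd, 2))  # 1.7 1.64
```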
48 How Summarize Suite Performance (5/5)
- The standard deviation is more informative if we know the distribution has a standard form:
- bell-shaped normal distribution, whose data are symmetric around the mean
- lognormal distribution, where the logarithms of the data (not the data itself) are normally distributed (symmetric) on a logarithmic scale
- For a lognormal distribution, we expect that:
- 68% of samples fall in the range [GM/gstdev, GM x gstdev]
- 95% of samples fall in the range [GM/gstdev^2, GM x gstdev^2]
- Note: Excel provides the functions EXP(), LN(), and STDEV() that make calculating the geometric mean and the multiplicative standard deviation easy
49 Example: Standard Deviation (1/2)
- GM and multiplicative StDev of SPECfp2000 for
Itanium 2
50 Example: Standard Deviation (2/2)
- GM and multiplicative StDev of SPECfp2000 for AMD
Athlon
51 Comments on Itanium 2 and Athlon
- The standard deviation of 1.98 for Itanium 2 is much higher (vs. 1.40 for Athlon), so results will differ more widely from the mean, and therefore are likely less predictable
- Falling within one standard deviation:
- 10 of 14 benchmarks (71%) for Itanium 2
- 11 of 14 benchmarks (78%) for Athlon
- Thus, the results are quite compatible with a lognormal distribution (expect 68%)
52 Comparing and Summarizing Performance
- The wrong summary can present a confusing picture:
- A is 10 times faster than B for program 1
- B is 10 times faster than A for program 2
- Total execution time is a consistent summary measure
- The relative execution times for the same workload are an informative performance summary
- Assuming that programs 1 and 2 execute the same number of times on computers A and B
Execution time is the only valid and unimpeachable measure of performance.
53 Performance Reports
The guiding principle is reproducibility (report the environment and experimental setup).
54 And in conclusion ...
- Computer Architecture >> ISA
- Tracking and extrapolating technology is part of the architect's responsibility
- Expect bandwidth in disks, DRAM, network, and processors to improve by at least as much as the square of the improvement in latency
- Quantify dynamic and static power
- Capacitance x Voltage^2 x Frequency; Energy vs. power
- Quantify dependability
- Reliability (MTTF, FIT), Availability (e.g. 99.9%)
- Quantify and summarize performance
- Ratios, geometric mean, multiplicative standard deviation
- Next Week: Quiz on Chapter 1; ILP (Ch. 2, assumes Appendix A)