Chapter 2 Computer Evolution and Performance - PowerPoint PPT Presentation

Slides: 69
Provided by: kmhY

1
Chapter 2 Computer Evolution and Performance
2
Contents
  • A Brief History of Computers
  • Designing for Performance
  • Pentium and PowerPC Evolution
  • Performance Evaluation

3
ENIAC
A brief history of computers
  • Electronic Numerical Integrator And Computer
  • John Mauchly and John Presper Eckert
  • Trajectory tables for weapons
  • Started 1943 / Finished 1946
  • Too late for war effort
  • Used until 1955
  • Decimal (not binary)
  • 20 accumulators of 10 digits
  • Programmed manually by switches
  • 18,000 vacuum tubes
  • 30 tons
  • 1,500 square feet
  • 140 kW power consumption
  • 5,000 additions per second

4
von Neumann/Turing
A brief history of computers
  • Stored Program concept
  • Main memory storing programs and data
  • ALU operating on binary data
  • Control unit interpreting instructions from
    memory and executing
  • Input and output equipment operated by control
    unit
  • Princeton Institute for Advanced Study
  • IAS
  • Completed 1952

5
von Neumann Machine
A brief history of computers
Input Output Equipment
Arithmetic And Logic Unit
Main Memory
Program Control Unit
  • If a program could be represented in a form
    suitable for storing in memory, the programming
    process could be facilitated
  • A computer could get its instructions from
    memory, and a program could be set or altered by
    setting the values of a portion of memory

6
IAS Memory Formats
A brief history of computers
Sign Bit
  • 1000 x 40 bit words
  • Binary number
  • 2 x 20 bit instructions
  • Each instruction consisting of an 8-bit opcode
  • A 12-bit address designating one of the words in
    memory
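
The instruction format above can be sketched in a few lines of Python. This is only an illustration: it assumes the left instruction occupies the high-order 20 bits of the word, and the function names and sample opcode/address values are hypothetical.

```python
# Minimal sketch: unpack a 40-bit IAS word holding two 20-bit instructions.
# Assumption: the left instruction sits in the high-order 20 bits.
INSTR_BITS = 20   # each instruction is 20 bits
ADDR_BITS = 12    # low 12 bits of an instruction are the address


def split_instruction(instr):
    """Split a 20-bit instruction into (8-bit opcode, 12-bit address)."""
    opcode = (instr >> ADDR_BITS) & 0xFF
    address = instr & 0xFFF
    return opcode, address


def decode_word(word):
    """Return ((left_opcode, left_addr), (right_opcode, right_addr))."""
    left = (word >> INSTR_BITS) & 0xFFFFF
    right = word & 0xFFFFF
    return split_instruction(left), split_instruction(right)


def pack_word(left_op, left_addr, right_op, right_addr):
    """Build a 40-bit word from two opcode/address pairs."""
    left = (left_op << ADDR_BITS) | left_addr
    right = (right_op << ADDR_BITS) | right_addr
    return (left << INSTR_BITS) | right


# Illustrative values only, not taken from the slides.
word = pack_word(0x01, 0x0FA, 0x05, 0x0FB)
print(decode_word(word))
```

A 12-bit address can name 4096 locations, which is enough to cover the 1000 words of memory.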

7
IAS Registers
A brief history of computers
  • Memory Buffer Register
  • Containing a word to be stored in memory, or used
    to receive a word from memory
  • Memory Address Register
  • Specifying the address in memory of the word to
    be written from or read into the MBR
  • Instruction Register
  • Containing the 8-bit opcode instruction being
    executed
  • Instruction Buffer Register
  • Employed to hold temporarily the right-hand
    instruction from a word in memory
  • Program Counter
  • Containing the address of the next
    instruction-pair to be fetched from memory
  • Accumulator and Multiplier Quotient
  • Employed to hold temporarily operands and results
    of ALU operations.

8
Structure of IAS
A brief history of computers
9
Partial Flowchart of IAS
A brief history of computers
10
The IAS Instruction Set
A brief history of computers
11
The IAS Instruction Set
A brief history of computers
12
The IAS Instruction Set
A brief history of computers
  • Data transfer
  • Move data between memory and ALU registers or
    between two ALU registers
  • Unconditional branch
  • The normal sequence of execution can be changed
    by a branch instruction, allowing repetitive
    operations
  • Conditional branch
  • The branch can be made dependent on a condition,
    thus allowing decision points
  • Arithmetic
  • Operations performed by the ALU
  • Address modify
  • Permits addresses to be computed in the ALU and
    then inserted into instructions stored in memory.

13
Commercial Computers
A brief history of computers
  • 1947 - Eckert-Mauchly Computer Corporation
  • UNIVAC I (Universal Automatic Computer)
  • US Bureau of Census 1950 calculations
  • Became part of Sperry-Rand Corporation
  • Late 1950s - UNIVAC II
  • Faster
  • More memory

14
IBM
A brief history of computers
  • Had helped build the Mark I
  • Punched-card processing equipment
  • 1953 - the 701
  • IBM's first stored-program computer
  • Scientific calculations
  • 1955 - the 702
  • Business applications
  • Led to the 700/7000 series

15
Computer Generations
A brief history of computers
Generation | Approximate Dates | Technology | Typical Speed (operations per second)
1 | 1946-1957 | Vacuum tube | 40,000
2 | 1958-1964 | Transistor | 200,000
3 | 1965-1971 | Small- and medium-scale integration | 1,000,000
4 | 1972-1977 | Large-scale integration | 10,000,000
5 | 1978- | Very-large-scale integration | 100,000,000
16
Transistors
A brief history of computers
  • Replaced vacuum tubes
  • Smaller
  • Cheaper
  • Less heat dissipation
  • Solid State device
  • Made from Silicon (Sand)
  • Invented 1947 at Bell Labs
  • William Shockley et al.

17
Transistor Based Computers
A brief history of computers
  • Second generation machines
  • NCR and RCA produced small transistor machines
  • IBM 7000
  • Digital Equipment Corporation (DEC) - 1957
  • Produced PDP-1

18
IBM 700/7000 Series
A brief history of computers
19
IBM 700/7000 Series
A brief history of computers
20
An IBM 7094 Configuration
A brief history of computers
21
The IBM 7094
A brief history of computers
  • The most important point is the use of data
    channels. A data channel is an independent I/O
    module with its own processor and its own
    instruction set.
  • Another new feature is the multiplexor, which is
    the central termination point for the data
    channels, the CPU, and memory.

22
Microelectronics
A brief history of computers
  • Literally - small electronics
  • A computer is made up of gates, memory cells and
    interconnections
  • These can be manufactured on a semiconductor
  • e.g. silicon wafer

23
Microelectronics
A brief history of computers
  • Data storage
  • Provided by memory cells
  • Data processing
  • Provided by gates
  • Data movement
  • The paths between components are used to move
    data from memory to memory and from memory
    through gates to memory
  • Control
  • The paths between components can carry control
    signals. The memory cell will store the bit on
    its input lead when the WRITE control signal is
    ON and will place that bit on its output lead
    when the READ control signal is ON.
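
The READ/WRITE behaviour of a memory cell described above can be modelled as a toy class. This is a behavioural sketch only, not a circuit description; the class and method names are hypothetical.

```python
class MemoryCell:
    """Toy model of a one-bit memory cell with READ and WRITE control signals."""

    def __init__(self):
        self._bit = 0  # stored state, initially 0 (an assumption)

    def step(self, data_in=0, write=False, read=False):
        """Apply one set of input and control signals.

        When WRITE is on, the bit on the input lead is stored.
        When READ is on, the stored bit is placed on the output lead;
        otherwise the output is None (nothing driven).
        """
        if write:
            self._bit = data_in & 1
        return self._bit if read else None


cell = MemoryCell()
cell.step(data_in=1, write=True)   # store a 1
print(cell.step(read=True))        # read it back
```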

24
Wafer, Chip, and Gate
A brief history of computers
  • Small-scale integration (SSI)

25
Generations of Computer
A brief history of computers
  • Vacuum tube - 1946-1957
  • Transistor - 1958-1964
  • Small scale integration - 1965 on
  • Up to 100 devices on a chip
  • Medium scale integration - to 1971
  • 100-3,000 devices on a chip
  • Large scale integration - 1971-1977
  • 3,000 - 100,000 devices on a chip
  • Very large scale integration - 1978 to date
  • 100,000 - 100,000,000 devices on a chip
  • Ultra large scale integration
  • Over 100,000,000 devices on a chip

26
Moore's Law
A brief history of computers
  • Increased density of components on chip
  • Gordon Moore - cofounder of Intel
  • Number of transistors on a chip will double every
    year
  • Since 1970s development has slowed a little
  • Number of transistors doubles every 18 months
  • Cost of a chip has remained almost unchanged
  • Higher packing density means shorter electrical
    paths, giving higher performance
  • Smaller size gives increased flexibility
  • Reduced power and cooling requirements
  • Fewer interconnections increases reliability
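
The two doubling rates quoted above (every year, later every 18 months) can be compared with a short calculation. This is a rough sketch assuming simple exponential growth; the starting count of 1,000 transistors is a hypothetical figure chosen only for illustration.

```python
def projected_transistors(initial_count, years, doubling_period_years=1.5):
    """Project a transistor count assuming it doubles every fixed period."""
    return initial_count * 2 ** (years / doubling_period_years)


start = 1_000  # hypothetical starting count
for years in (3, 6, 9):
    yearly = projected_transistors(start, years, doubling_period_years=1.0)
    slower = projected_transistors(start, years, doubling_period_years=1.5)
    print(f"after {years} years: x2 per year = {yearly:,.0f}, "
          f"x2 per 18 months = {slower:,.0f}")
```

Even the slower rate compounds quickly: three years at 18-month doubling is still a factor of four.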

27
Growth in CPU Transistor Count
A brief history of computers
28
IBM 360 series
A brief history of computers
  • 1964
  • Replaced (and was not compatible with) the 7000
    series
  • First planned family of computers
  • Similar or identical instruction sets
  • Similar or identical O/S
  • Increasing speed
  • Increasing number of I/O ports (i.e. more
    terminals)
  • Increased memory size
  • Increased cost
  • Multiplexed switch structure

29
Key Characteristics of 360 Family
A brief history of computers
  • Many of its features have become standard on
    other large computers

30
DEC PDP-8
A brief history of computers
  • 1964
  • First minicomputer (after miniskirt!)
  • Did not need air conditioned room
  • Small enough to sit on a lab bench
  • $16,000
  • $100k for IBM 360
  • Embedded applications & OEM
  • Later models of the PDP-8 used a bus structure
    that is now virtually universal for minicomputers
    and microcomputers

31
PDP-8/E Block Diagram
A brief history of computers
  • Highly flexible architecture allowing modules to
    be plugged into the bus to create various
    configurations

32
Semiconductor Memory
A brief history of computers
  • The first application of integrated circuit
    technology to computers
  • construction of the processor
  • also used to construct memories
  • 1970
  • Fairchild
  • Size of a single core
  • i.e. 1 bit of magnetic core storage
  • Holds 256 bits
  • Non-destructive read
  • Much faster than core
  • Capacity approximately doubles each year

33
Evolution of Intel Microprocessors
A brief history of computers
34
Evolution of Intel Microprocessors
A brief history of computers
35
Evolution of Intel Microprocessors
A brief history of computers
36
Microprocessor Speed
Design for performance
  • In memory chips, the relentless pursuit of speed
    has quadrupled the capacity of DRAM every three
    years
  • Pipelining
  • On board cache
  • On board L1 & L2 cache
  • Branch prediction
  • Data flow analysis
  • Speculative execution

37
Evolution of DRAM / Processor Characteristics
Design for performance
38
Performance Mismatch
Design for performance
  • Processor speed increased
  • Memory capacity increased
  • Memory speed lags behind processor speed

39
Performance Balance
Design for performance
  • The interface between processor and main memory
    is the most crucial pathway in the entire
    computer
  • It is responsible for carrying a constant flow of
    program instructions and data between memory
    chips and the processor

40
Trends in DRAM use
Design for performance
41
Performance Balance
Design for performance
  • On average, the number of DRAMs per system is
    going down.
  • The solid black lines in the figure show that,
    for a fixed-sized memory, the number of DRAMs
    needed is declining
  • The shaded bands show that for a particular type
    of system, main memory size has slowly increased
    while the number of DRAMs has declined

42
Solutions
Design for performance
  • Increase number of bits retrieved at one time
  • Make DRAM wider rather than deeper
  • Change DRAM interface
  • Cache
  • Reduce frequency of memory access
  • More complex cache and cache on chip
  • Increase interconnection bandwidth
  • High speed buses
  • Hierarchy of buses

43
Performance Balance
Design for performance
  • Two constantly evolving factors to be coped with
  • The rate at which performance is changing in the
    various technology areas differs greatly from one
    type of element to another
  • New applications and new peripheral devices
    constantly change the nature of the demand on the
    system in terms of typical instruction profile
    and the data access patterns.

44
Intel
Pentium and PowerPC evolution
  • Pentium - results of design effort on CISCs
  • 1971 - 4004
  • First microprocessor
  • All CPU components on a single chip
  • 4 bit
  • Followed in 1972 by 8008
  • 8 bit
  • Both designed for specific applications
  • 1974 - 8080
  • Intel's first general purpose microprocessor
  • 8086
  • 16 bit, instruction cache, or queue
  • 80286
  • addressing a 16-Mbyte memory

45
Intel
Pentium and PowerPC evolution
  • 80386
  • 32 bit, multitasking
  • 80486
  • built-in math coprocessor
  • Pentium
  • superscalar techniques
  • Pentium Pro
  • Pentium II
  • Intel MMX technology
  • Pentium III
  • additional floating-point instructions
  • Merced
  • 64-bit organization

46
PowerPC
Pentium and PowerPC evolution
  • RISC systems
  • PowerPC Processor Summary

47
Two Notions of Performance
Performance evaluation
Plane | Speed | DC to Paris | Passengers | Throughput (pmph)
Boeing 747 | 610 mph | 6.5 hours | 470 | 286,700
BAC/Sud Concorde | 1350 mph | 3 hours | 132 | 178,200
  • Which has higher performance?
  • Time to do the task (Execution Time)
  • execution time, response time, latency
  • Tasks per day, hour, week, sec, ns, ...
    (Performance)
  • throughput, bandwidth
  • Response time and throughput often are in
    opposition

48
To Assess Performance
Performance evaluation
  • Response Time
  • Time to complete a task
  • Throughput
  • Total amount of work done per time
  • Execution Time (CPU Time)
  • User CPU time
  • Time spent in the program
  • System CPU time
  • Time spent in OS
  • Elapsed Time
  • Execution time + time of I/O and time sharing
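
The difference between CPU time and elapsed time can be observed directly. The sketch below uses Python's standard `os.times()` and `time.perf_counter()`; the helper names `measure` and `busy_then_wait` are hypothetical, and the exact numbers depend on the machine and the timer resolution.

```python
import os
import time


def measure(task):
    """Run task() and report user CPU, system CPU, total CPU and elapsed time."""
    wall_start = time.perf_counter()
    cpu_start = os.times()
    task()
    cpu_end = os.times()
    wall_end = time.perf_counter()
    user = cpu_end.user - cpu_start.user        # time spent in the program
    system = cpu_end.system - cpu_start.system  # time spent in the OS
    return {
        "user": user,
        "system": system,
        "cpu": user + system,                   # execution (CPU) time
        "elapsed": wall_end - wall_start,       # includes waiting
    }


def busy_then_wait():
    """Do some computation, then wait as if for I/O."""
    sum(i * i for i in range(200_000))
    time.sleep(0.05)


result = measure(busy_then_wait)
print(result)
```

The sleep adds to elapsed time but not to CPU time, which is why the two measures answer different questions.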

49
Criteria of Performance
Performance evaluation
  • Execution time seems to measure the power of the
    CPU
  • Elapsed time measures the performance of whole
    system including OS and I/O
  • User is interested in elapsed time
  • Sales people are interested in the highest
    performance number that can be quoted
  • Performance analyst is interested in both
    execution time and elapsed time

50
Definitions
Performance evaluation
  • Performance is in units of things-per-second
  • bigger is better
  • If we are primarily concerned with response time
  • $\text{performance}(x) = \dfrac{1}{\text{execution\_time}(x)}$
  • "X is n times faster than Y" means
  • $n = \dfrac{\text{Performance}(X)}{\text{Performance}(Y)}$

51
Example
Performance evaluation
  • Time of Concorde vs. Boeing 747?
  • bigger is better
  • Concorde is 1350 mph / 610 mph = 2.2 times faster
  • = 6.5 hours / 3 hours
  • Throughput of Concorde vs. Boeing 747?
  • Concorde is 178,200 pmph / 286,700 pmph
  • = 0.62 times faster
  • Boeing is 286,700 pmph / 178,200 pmph
  • = 1.6 times faster
  • Boeing is 1.6 times (60%) faster in terms of
    throughput
  • Concorde is 2.2 times (120%) faster in terms of
    flying time
  • We will focus primarily on execution time for a
    single job
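
The "n times faster" definition can be checked against the aircraft figures with a short script. This is a sketch; the function names are hypothetical, and the input numbers are the ones quoted on the slides.

```python
def performance(execution_time):
    """Performance is the reciprocal of execution time."""
    return 1.0 / execution_time


def times_faster(perf_x, perf_y):
    """Return n such that X is n times faster than Y."""
    return perf_x / perf_y


# Flying time: Concorde takes 3 hours, Boeing 747 takes 6.5 hours.
time_ratio = times_faster(performance(3.0), performance(6.5))

# Throughput in passenger-miles per hour (pmph) = passengers x speed.
throughput_747 = 470 * 610
throughput_concorde = 132 * 1350
throughput_ratio = times_faster(throughput_747, throughput_concorde)

print(f"Concorde vs 747, flying time: {time_ratio:.1f}x")
print(f"747 vs Concorde, throughput:  {throughput_ratio:.1f}x")
```

The same two aircraft rank in opposite order depending on whether latency or throughput is the metric.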

52
Basis of Evaluation
Performance evaluation
Actual Target Workload
  • Pros: representative
  • Cons: very specific; non-portable; difficult to
    run or measure; hard to identify cause

Full Application Benchmarks
  • Pros: portable; widely used; improvements useful
    in reality
  • Cons: less representative

Small kernel Benchmarks
  • Pros: easy to run, early in design cycle
  • Cons: easy to fool

Microbenchmarks
  • Pros: identify peak capability and potential
    bottlenecks
  • Cons: peak may be a long way from application
    performance
53
MIPS
Performance evaluation
  • Millions of Instructions (Executed) Per Second
  • Often-used measure of performance
  • Native MIPS

54
MIPS
Performance evaluation
  • Meaningless information
  • Run a program and time it
  • Count the number of executed instructions to get
    the MIPS rating
  • Problems
  • Cannot compare different computers with different
    instruction sets
  • Varies between programs executed on the same
    computer
  • Peak MIPS
  • This is what many manufacturers provide
  • Usually neglecting the word "peak"
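
The "run a program, time it, count instructions" procedure amounts to one division. The sketch below uses a hypothetical function name and made-up instruction counts and times to show why the rating varies between programs on the same computer.

```python
def native_mips(instruction_count, execution_time_s):
    """Native MIPS = executed instructions / (execution time x 10^6)."""
    return instruction_count / (execution_time_s * 1e6)


# Two hypothetical programs timed on the same machine.
program_a = native_mips(instruction_count=200_000_000, execution_time_s=4.0)
program_b = native_mips(instruction_count=90_000_000, execution_time_s=3.0)

print(f"program A: {program_a:.0f} MIPS")
print(f"program B: {program_b:.0f} MIPS")
```

With these illustrative inputs the same machine rates 50 MIPS on one program and 30 MIPS on the other, so a single MIPS figure says little without the workload.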

55
Relative MIPS
Performance evaluation
  • Call VAX 11/780 1 MIPS machine (not true)
  • Makes MIPS rating more independent of benchmark
    programs
  • Advantage of relative MIPS is small

56
FLOPS
Performance evaluation
  • Million Floating Point Instructions Per Second
  • Used for engineering and scientific applications
    where floating point operations account for a
    high fraction of all executed instructions
  • Problems
  • Program dependent
  • Many programs do not use floating point
    operations
  • Machine dependent
  • Depends on relative mixture of integer and
    floating point operations
  • Depends on relative mixture of cheap (e.g. add,
    subtract) and expensive (e.g. divide) floating
    point operations
  • Normalized FLOPS (relative FLOPS)
  • Peak FLOPS

57
SPEC Marks
Performance evaluation
  • System Performance Evaluation Cooperative
  • Non-profit group initially founded by APOLLO, HP,
    MIPSCO, and SUN
  • Now includes many more like IBM, DEC, AT&T,
    MOTOROLA, etc
  • Measures the ratio of execution time on a
    VAX 11/780 to that on the target machine
  • Summarizes performance by taking the geometric
    mean of the ratios
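
Summarizing by geometric mean can be sketched as follows. The code assumes the convention of dividing the reference machine's time by the target machine's time, so a larger ratio is better; the function names and the benchmark times are hypothetical.

```python
import math


def spec_ratios(reference_times, target_times):
    """Per-benchmark ratio: reference time / target time (larger is better)."""
    return [ref / tgt for ref, tgt in zip(reference_times, target_times)]


def geometric_mean(values):
    """n-th root of the product, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical execution times in seconds for two benchmarks.
reference = [100.0, 400.0]   # on the reference machine
target = [50.0, 50.0]        # on the machine being rated

ratios = spec_ratios(reference, target)
print(ratios, geometric_mean(ratios))
```

A useful property of the geometric mean is that the ranking of two machines does not depend on which machine is chosen as the reference.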

58
SPEC95
Performance evaluation
  • Eighteen application benchmarks (with inputs)
    reflecting a technical computing workload
  • Eight integer
  • go, m88ksim, gcc, compress, li, ijpeg, perl,
    vortex
  • Ten floating-point intensive
  • tomcatv, swim, su2cor, hydro2d, mgrid, applu,
    turb3d, apsi, fpppp, wave5
  • Must run with standard compiler flags
  • eliminate special undocumented incantations that
    may not even generate working code for real
    programs

59
Metrics of performance
Performance evaluation
  • Application: answers per month, useful operations
    per second
  • Programming Language
  • Compiler
  • ISA: (millions) of instructions per second
    (MIPS); (millions) of (F.P.) operations per
    second (MFLOP/s)
  • Datapath, Control: megabytes per second
  • Function Units, Transistors, Wires, Pins: cycles
    per second (clock rate)
  • Each metric has a place and a purpose, and each
    can be misused
60
Aspects of CPU Performance
Performance evaluation
61
Criteria of Performance
Performance evaluation
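  • CPU Time
  • $\text{CPU Time} = \text{Instruction count} \times \text{CPI} \times \text{Clock cycle time}$
  • $= \text{number of instructions} \times \dfrac{\text{cycles}}{\text{instruction}} \times \dfrac{\text{seconds}}{\text{cycle}}$
  • Clock Rate
  • $\text{Clock rate} = \dfrac{\text{cycles}}{\text{second}}$
  • Depends on technology and organization
  • CPI
  • Cycles Per Instruction
  • Depends on organization and instruction set
  • Instruction Count
  • Depends on compiler and instruction set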
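
The CPU time relation is easy to evaluate numerically. A minimal sketch: the function name and the sample values (one million instructions, CPI of 2, 100 MHz clock) are hypothetical.

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time = instruction count x CPI x clock cycle time.

    The clock cycle time is 1 / clock rate, so the product is
    written here as a division by the clock rate.
    """
    return instruction_count * cpi / clock_rate_hz


# Hypothetical program: 1,000,000 instructions, CPI = 2, 100 MHz clock.
seconds = cpu_time(1_000_000, 2.0, 100e6)
print(f"{seconds * 1e3:.1f} ms")
```

Halving any one of the three factors halves CPU time, but in practice they interact: a richer instruction set may lower the instruction count while raising CPI.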
62
Criteria of Performance
Performance evaluation
  • If CPI is not uniform across all instructions
  • $\text{CPU cycles} = \sum_{i=1}^{n} (\text{CPI}_i \times I_i)$
  • n - number of instructions in instruction set
  • CPI_i - CPI for instruction i
  • I_i - number of times instruction i occurs in a
    program
  • $\text{CPU Time} = \sum_{i=1}^{n} (\text{CPI}_i \times I_i \times \text{clock cycle time})$
  • CPI
  • It assumes that a given instruction always takes
    the same number of cycles to execute

63
Aspects of CPU Performance
Performance evaluation
64
CPI
Performance evaluation
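Average cycles per instruction
$\text{CPI} = \dfrac{\text{CPU Time} \times \text{Clock Rate}}{\text{Instruction Count}} = \dfrac{\text{Clock Cycles}}{\text{Instruction Count}}$
$\text{CPU Time} = \text{Clock Cycle Time} \times \sum_{i=1}^{n} (\text{CPI}_i \times I_i)$
$\text{CPI} = \sum_{i=1}^{n} (\text{CPI}_i \times F_i)$, where $F_i = \dfrac{I_i}{\text{Instruction Count}}$ is the "instruction frequency"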
  • Invest Resources where time is Spent!

65
Example of RISC
Performance evaluation
Base Machine (Reg / Reg), Typical Mix
Op | Freq | Cycles | CPI(i) | Time (%)
ALU | 50% | 1 | 0.5 | 23%
Load | 20% | 5 | 1.0 | 45%
Store | 10% | 3 | 0.3 | 14%
Branch | 20% | 2 | 0.4 | 18%
Total | | | 2.2 |
  • How much faster would the machine be if a better
    data cache reduced the average load time to 2
    cycles?
  • How does this compare with using branch
    prediction to shave a cycle off the branch time?
  • What if two ALU instructions could be executed at
    once?
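
The three questions can be worked through with the frequency-weighted CPI sum. The sketch below is one way to do it, with hypothetical helper names. It assumes the instruction count and clock rate stay fixed, and it models "two ALU instructions at once" as an effective 0.5 cycles per ALU instruction, which is an idealization (every ALU instruction finds a partner).

```python
# (frequency, cycles) per instruction class, from the typical mix table.
BASE_MIX = {
    "ALU": (0.5, 1),
    "Load": (0.2, 5),
    "Store": (0.1, 3),
    "Branch": (0.2, 2),
}


def average_cpi(mix):
    """Average CPI = sum over classes of frequency x cycles."""
    return sum(freq * cycles for freq, cycles in mix.values())


def with_cycles(mix, op, cycles):
    """Return a copy of the mix with one class's cycle count changed."""
    changed = dict(mix)
    changed[op] = (mix[op][0], cycles)
    return changed


def speedup(base, improved):
    """With instruction count and clock fixed, speedup is the CPI ratio."""
    return average_cpi(base) / average_cpi(improved)


print("base CPI:", average_cpi(BASE_MIX))
print("load in 2 cycles:", speedup(BASE_MIX, with_cycles(BASE_MIX, "Load", 2)))
print("branch in 1 cycle:", speedup(BASE_MIX, with_cycles(BASE_MIX, "Branch", 1)))
print("paired ALU ops:", speedup(BASE_MIX, with_cycles(BASE_MIX, "ALU", 0.5)))
```

Under these assumptions the faster loads give about 1.38x, paired ALU operations about 1.13x, and branch prediction 1.10x, which reflects that loads account for the largest share of execution time.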
66
Amdahl's Law
Performance evaluation
  • Speedup due to enhancement E
  • $\text{Speedup}(E) = \dfrac{\text{ExTime w/o } E}{\text{ExTime w/ } E} = \dfrac{\text{Performance w/ } E}{\text{Performance w/o } E}$
  • Suppose that enhancement E accelerates a fraction
    F of the task by a factor S and the remainder of
    the task is unaffected; then
  • $\text{ExTime}(\text{with } E) = \left((1 - F) + \dfrac{F}{S}\right) \times \text{ExTime}(\text{without } E)$
  • $\text{Speedup}(\text{with } E) = \dfrac{1}{(1 - F) + \dfrac{F}{S}}$
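
Amdahl's Law translates directly into a one-line function. In this sketch the function name is hypothetical; the second call reuses the RISC example's load figures (loads take 1.0 of 2.2 cycles per instruction, and going from 5 cycles to 2 is a factor of 2.5).

```python
def amdahl_speedup(fraction_enhanced, enhancement_factor):
    """Overall speedup when a fraction F of the task is sped up by a factor S."""
    f, s = fraction_enhanced, enhancement_factor
    return 1.0 / ((1.0 - f) + f / s)


# Half of the task made twice as fast.
print(amdahl_speedup(0.5, 2.0))

# Loads are 1.0 / 2.2 of execution time and become 2.5 times faster.
print(amdahl_speedup(1.0 / 2.2, 2.5))

# Even an enormous factor cannot beat 1 / (1 - F).
print(amdahl_speedup(0.5, 1e9))
```

The last call shows the limit: when only half the task is enhanced, the overall speedup approaches but never reaches 2.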

67
Cost
Performance evaluation
  • Traditionally ignored by textbooks because of
    rapid change
  • Driven by the learning curve: manufacturing costs
    decrease with time
  • Understanding learning curve effects on yield is
    key to cost projection
  • Yield
  • Fraction of manufactured items that survive the
    testing procedure
  • Testing and Packaging
  • Big factors in lowering costs

68
Cost
Performance evaluation
  • Cost of Chips
  • Cost
  • Cost of die
  • Wafer Yield dies / wafer
  • Cost vs. Price
  • Component cost 15-33%
  • Direct cost 6-8%
  • Gross margin 34-39%
  • Average discount 25-40%