7. Instruction Encoding - PowerPoint PPT Presentation

Slides: 33
Provided by: george354

Transcript and Presenter's Notes

2
Recap
  • Technology trends
  • Cost/performance

3
Measuring and Reporting Performance
  • What does it mean to say computer X is faster
    than computer Y?
  • E.g. Machine A executes a program in 10s; Machine
    B executes the same program in 15s.
  • Which is true?
  • A is 50% faster than B?
  • A is 33% faster than B?

4
Performance
  • H&P's definition: "X is n times faster than Y"
    means Execution time(Y) / Execution time(X) = n

5
Example
  • E.g. Machine A executes a program in 10s; Machine
    B executes the same program in 15s.
  • Which is true?
  • A is 50% faster than B?
  • A is 33% faster than B?
  • Answer: 1) A is 50% faster than B (15s / 10s = 1.5)
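
The arithmetic behind this answer can be sketched in a few lines of Python. This is only an illustration, assuming the usual definition that n is the ratio of the two execution times; the function names are made up for this sketch.

```python
def times_faster(time_x: float, time_y: float) -> float:
    """Return n such that X is n times faster than Y (n = time_y / time_x)."""
    return time_y / time_x


def percent_faster(time_x: float, time_y: float) -> float:
    """Express the same ratio as 'X is p% faster than Y'."""
    return (times_faster(time_x, time_y) - 1.0) * 100.0


# Machine A takes 10s, machine B takes 15s for the same program.
n = times_faster(10.0, 15.0)
p = percent_faster(10.0, 15.0)
print(f"A is {n:.2f} times faster than B, i.e. {p:.0f}% faster")
```

The 33% figure comes from dividing the 5s difference by B's time (15s) rather than A's (10s), which answers a different question: B's time would have to shrink by a third to match A.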

6
Performance
  • Response time?
  • Throughput?

7
Measuring Performance
  • Focus on execution time of real programs
  • Measuring execution time?
  • Wall clock time (elapsed time)
  • CPU time (excludes I/O and other processes)
  • User CPU time
  • System CPU time

    iota:~$ time gcc -g tmpcnv.s -o tmpcnv
    real 0m3.352s
    user 0m0.367s
    sys  0m0.468s

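
As a rough illustration of how the three figures relate, the sketch below parses the timings from the gcc run above and compares CPU time (user + system) with elapsed time. The helper name parse_time is hypothetical, and the parser assumes the "XmY.YYYs" format shown on this slide.

```python
import re


def parse_time(text: str) -> float:
    """Convert a string such as '0m3.352s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unrecognised time format: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


real = parse_time("0m3.352s")      # wall clock (elapsed) time
user = parse_time("0m0.367s")      # user CPU time
sys_time = parse_time("0m0.468s")  # system CPU time

cpu = user + sys_time              # total CPU time
print(f"CPU time {cpu:.3f}s is {100 * cpu / real:.1f}% of elapsed time")
```

For this run the CPU time is roughly a quarter of the elapsed time; the rest is time the process spent waiting (I/O, other processes).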
8
Choosing Programs to Measure Performance
  • Real Programs
  • Compilers, text-processing, CAD tools, etc.
  • Modified applications
  • Scripted or modified for portability
  • Kernels
  • Attempt to extract key sections from real
    programs (Livermore loops, Linpack)
  • Toy Benchmarks
  • Short examples (e.g. Sieve of Eratosthenes)
  • Synthetic Benchmarks
  • Whetstone, Dhrystone

9
Benchmarking
  • H&P: car magazines are more scientific about
    reporting performance than many CS journals!

10
Benchmark Suites
  • Collections of benchmarks
  • E.g. SPEC CPU2000 (INT and FP)
  • 25 real FORTRAN/C/C++ programs, modified for
    portability
  • Specific graphics benchmarks

11
Server Benchmarks
  • SPEC also has server benchmarks
  • File server
  • Web server
  • TPC: Transaction Processing Council
  • Various transaction processing benchmarks

12
Embedded Benchmarks
  • Much less well developed
  • Tend to use Dhrystone!
  • EEMBC
  • Recent development
  • 34 benchmarks (mainly kernels) in five
    application areas

13
Summarising Performance Measurements
  • Complex area
  • Weighted arithmetic mean
  • Geometric mean
  • Normalised results
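
The following sketch shows one way the summaries listed above can be computed. The execution times are hypothetical values chosen for illustration, not measurements from the lecture, and the function names are assumptions of this sketch.

```python
import math


def weighted_arithmetic_mean(times, weights):
    """Weighted mean of execution times; weights need not sum to 1."""
    return sum(w * t for w, t in zip(weights, times)) / sum(weights)


def geometric_mean(ratios):
    """Geometric mean, typically applied to normalised execution times."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical execution times (seconds) for two programs on two machines.
times_a = [1.0, 1000.0]
times_b = [10.0, 100.0]

# Equal weights: the long-running program dominates the arithmetic mean.
print(weighted_arithmetic_mean(times_a, [0.5, 0.5]))  # machine A
print(weighted_arithmetic_mean(times_b, [0.5, 0.5]))  # machine B

# Normalise B's times to A's, then take the geometric mean.
normalised_b = [tb / ta for tb, ta in zip(times_b, times_a)]
print(geometric_mean(normalised_b))
```

With these numbers the arithmetic mean favours machine B, while the geometric mean of the normalised times rates the two machines as equal, which is one reason this is a "complex area".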

14
1.6 Quantitative Principles
  • Make the common case fast!
  • E.g. addition: focus on normal addition, not
    overflow situations
  • Amdahl's Law
  • Quantifies improvements gained by focussing on
    one aspect of a design

15
Amdahl's Law
  • Speedup(overall) = 1 / ((1 - Fraction(enhanced))
    + Fraction(enhanced) / Speedup(enhanced))

16
Example
  • We are considering an enhancement that is 10
    times faster than the original, but is only used
    40% of the time.
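
A minimal sketch of this example, assuming the standard form of Amdahl's Law (overall speedup = 1 / ((1 - f) + f / s)) and reading the slide's figure as 40% of execution time. The function and parameter names are chosen for this sketch.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when only part of the execution time is enhanced."""
    unenhanced = 1.0 - fraction_enhanced
    return 1.0 / (unenhanced + fraction_enhanced / speedup_enhanced)


# Enhancement is 10 times faster but applies to only 40% of the time.
overall = amdahl_speedup(0.4, 10.0)
print(f"Overall speedup: {overall:.4f}")
```

The result is 1 / 0.64, about 1.56, far short of 10: the 60% of the time that is not enhanced limits the gain.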

17
CPU Performance
  • CPU time related to clock speed
  • Period (e.g. 1ns)
  • Rate (e.g. 1GHz)
  • Also interested in Cycles Per Instruction (CPI)

18
Three Equal Factors
  • Clock rate (technology)
  • CPI (architecture)
  • Instruction count (architecture and compiler)
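
The three factors combine in the usual CPU performance equation, CPU time = instruction count x CPI / clock rate. The sketch below assumes that formulation; the workload numbers are hypothetical.

```python
def cpu_time(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time in seconds = instruction count * CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


# Hypothetical program: 1e9 instructions, CPI of 2.0, on a 1GHz clock.
baseline = cpu_time(1e9, 2.0, 1e9)

# Improving any single factor by 25% has the same effect on CPU time.
faster_clock = cpu_time(1e9, 2.0, 1.25e9)
lower_cpi = cpu_time(1e9, 1.6, 1e9)
fewer_instructions = cpu_time(0.8e9, 2.0, 1e9)
print(baseline, faster_clock, lower_cpi, fewer_instructions)
```

This is the sense in which the factors are "equal": a given percentage improvement in any one of them yields the same improvement in CPU time, provided the other two do not change.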

19
Measuring IC & CPI
  • Many modern processors include hardware counters
    for instructions and clock cycles
  • Simulations can give even more detail
  • Time consuming, but can be very accurate
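
Given readings from such counters, CPI is simply cycles divided by instructions. The sketch below uses made-up counter values and does not read any real hardware counter; how counters are accessed depends on the processor and operating system.

```python
def cycles_per_instruction(cycles: int, instructions: int) -> float:
    """Average CPI over a measured interval."""
    if instructions <= 0:
        raise ValueError("instruction count must be positive")
    return cycles / instructions


# Hypothetical counter readings taken before and after a program run.
cycles_before, cycles_after = 1_000_000, 4_000_000
instr_before, instr_after = 500_000, 2_500_000

cpi = cycles_per_instruction(cycles_after - cycles_before,
                             instr_after - instr_before)
print(f"Measured CPI: {cpi:.2f}")
```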

20
Another Principle: Locality
  • Locality of Reference
  • 90/10 Rule
  • Also applies to data
  • Two aspects
  • Temporal locality
  • Spatial locality

21
Taking Advantage of Parallelism
  • Key principle for improving performance
  • Examples
  • System level: parallel processing, disk arrays,
    etc.
  • Processor level: pipelining
  • Digital design: caches, ALU adders, etc.

22
1.7 Putting It All Together: Performance and
Price/Performance
  • Measure performance and performance/cost for
    three categories
  • Desktop (SPEC INT and FP)
  • TP Servers (TPC-C)
  • Embedded Processors (EEMBC)

23
Desktop
  • Integer
  • Performance/cost tracks performance
  • FP
  • Not as closely related
  • Pentium 4 much better than Pentium III
  • AMD Athlon very good value for money

24
Servers
  • Twelve systems
  • Six top performers
  • Six best price-performance
  • Multiprocessors
  • 3 P3s to 280 P3s
  • Cost
  • $131,000 to $15 million

25
Embedded Processors
  • Difficult to assess
  • Benchmarks very new
  • Designs very application-specific
  • Power a major constraint
  • Cost difficult to quantify (are support chips
    required?)

26
Embedded Processors
  • Range
  • 500MHz AMD K6 ($78) and IBM PowerPC ($94) used
    for network switches, etc.
  • 167MHz NEC VR 5432 ($25) popular in colour laser
    printers
  • 180MHz NEC VR 4122 ($33) popular in PDAs (low
    power)

27
1.8 Another View: Power Consumption and
Efficiency
  • Embedded processors from previous example: power
    ranged from 700mW to 9600mW
  • Fig. 1.27 Performance/watt
  • NEC VR 4122 huge leader

28
1.9 Fallacies and Pitfalls
  • Fallacy: Relative performance of two similar
    processors can be judged by clock rate or by a
    single benchmark
  • Factors such as pipeline structure and memory
    system have major impact
  • E.g. Pentium III vs. Pentium 4 (Fig. 1.28)

29
1.7GHz P4 vs 1.0GHz P3
30
Fallacies and Pitfalls
  • Fallacy: Benchmarks remain valid indefinitely
  • Optimisations change
  • Perhaps deliberately!
  • Even real programs are affected by changes in
    technology
  • E.g. gcc: an increasing percentage of its run time
    is system time
  • SPEC has adapted considerably

31
Fallacies and Pitfalls
  • Pitfall: Comparing hand-coded assembly and
    compiled high-level language performance
  • E.g. embedded processor benchmarks
  • Hand-coded is 5 to 87 times faster!
