14:332:331 Computer Architecture and Assembly Language, Spring 2005, Week 12: Buses and I/O system

Slides: 43
Provided by: jani177


1
14:332:331 Computer Architecture and Assembly Language
Spring 2005, Week 12: Buses and I/O system
  • Adapted from Dave Patterson's UCB CS152 slides
    and
  • Mary Jane Irwin's PSU CSE331 slides

2
Heads Up
  • This week's material
  • Buses: connecting I/O devices
  • Reading assignment: PH 8.4
  • Memory hierarchies
  • Reading assignment: PH 7.1 and B.8-9
  • Reminders
  • Next week's material
  • Basics of caches
  • Reading assignment: PH 7.2

3
Review Major Components of a Computer
[Figure: block diagram of a computer; labels: Processor (Control, Datapath), Memory (Cache, Main Memory, Secondary Memory (Disk)), Devices (Input, Output)]
4
Input and Output Devices
  • I/O devices are incredibly diverse with respect to
  • Behavior
  • Partner
  • Data rate

Device            Behavior         Partner  Data rate (KB/sec)
Keyboard          input            human    0.01
Mouse             input            human    0.02
Laser printer     output           human    200.00
Graphics display  output           human    60,000.00
Network/LAN       input or output  machine  500.00-6,000.00
Floppy disk       storage          machine  100.00
Magnetic disk     storage          machine  2,000.00-10,000.00
5
Magnetic Disk
  • Purpose:
  • Long-term, nonvolatile storage
  • Lowest level in the memory hierarchy
  • Slow, large, inexpensive
  • General structure:
  • A rotating platter coated with a magnetic surface
  • A moveable read/write head is used to access the disk
  • Advantages of hard disks over floppy disks:
  • Platters are more rigid (metal or glass), so they
    can be larger
  • Higher density, because the disk can be controlled more
    precisely
  • Higher data rate, because the disk spins faster
  • Can incorporate more than one platter

6
Organization of a Magnetic Disk
[Figure: disk organization; labels: Platters, Track, Sector]
  • Typical numbers (depending on the disk size):
  • 1 to 15 platters (2 surfaces each) per disk, with a 1 to
    8 inch diameter
  • 1,000 to 5,000 tracks per surface
  • 63 to 256 sectors per track
  • A sector is the smallest unit that can be read or written
    (typically 512 to 1,024 B)
  • Traditionally, all tracks have the same number of
    sectors
  • Newer disks with smart controllers can record
    more sectors on the outer tracks (constant bit
    density)
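
As a rough illustration of how the numbers above combine, the sketch below estimates raw capacity for the traditional case where every track holds the same number of sectors. The function name and the sample geometry are hypothetical; the values are simply picked from within the ranges on the slide and do not describe a real drive.

```python
def disk_capacity_bytes(platters, tracks_per_surface, sectors_per_track,
                        bytes_per_sector, surfaces_per_platter=2):
    """Raw capacity, assuming every track holds the same number of sectors."""
    surfaces = platters * surfaces_per_platter
    return surfaces * tracks_per_surface * sectors_per_track * bytes_per_sector


# Hypothetical geometry chosen from within the ranges on the slide.
capacity = disk_capacity_bytes(platters=4, tracks_per_surface=5000,
                               sectors_per_track=256, bytes_per_sector=512)
print(f"{capacity / 10**9:.2f} GB")
```

A disk that records more sectors on its outer tracks would need a per-zone sum instead of a single sectors-per-track value.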

7
Magnetic Disk Characteristic
  • Cylinder: all the tracks under the heads
    at a given point on all surfaces
  • Reading or writing data is a three-stage process
  • Seek time: position the arm over the
    proper track (6 to 14 ms avg.)
  • Due to locality of disk references, the
    actual average seek time may be only 25% to
    33% of the advertised number
  • Rotational latency: wait for the desired
    sector to rotate under the read/write head (½ of
    1/RPM, i.e., half a rotation on average)
  • Transfer time: transfer a block of bits
    (sector) under the read/write head (2 to 20
    MB/sec typical)
  • Controller time: the overhead the disk controller
    imposes in performing a disk I/O access
    (typically < 2 ms)
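
The components listed above add up to the time for one disk access. The following is a minimal sketch of that arithmetic; the function names and the sample request are assumptions made for illustration, and 1 MB is taken to be 10**6 bytes.

```python
def avg_rotational_latency_ms(rpm):
    """Average wait is half a revolution: 0.5 * (60,000 ms per minute / RPM)."""
    return 0.5 * 60_000 / rpm


def transfer_time_ms(num_bytes, rate_mb_per_s):
    """Time to move num_bytes at rate_mb_per_s (assuming 1 MB = 10**6 bytes)."""
    return num_bytes / (rate_mb_per_s * 10**6) * 1000


def disk_access_time_ms(seek_ms, rpm, num_bytes, rate_mb_per_s, controller_ms):
    """Seek + average rotational latency + transfer + controller overhead."""
    return (seek_ms + avg_rotational_latency_ms(rpm)
            + transfer_time_ms(num_bytes, rate_mb_per_s) + controller_ms)


# Hypothetical request: one 512 B sector, 6 ms seek, 10,000 RPM,
# 20 MB/sec transfer rate, 2 ms controller overhead.
total = disk_access_time_ms(seek_ms=6, rpm=10_000, num_bytes=512,
                            rate_mb_per_s=20, controller_ms=2)
print(f"{total:.4f} ms")
```

For a single sector the transfer term is tiny, so seek time and rotational latency dominate the total.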

8
Magnetic Disk Examples
Characteristic Sun X6713A Toshiba MK2016
Disk diameter (inches) 3.5 2.5
Capacity 73 GB 20 GB
MTTF (k hrs) 1,200 300
# of platters - # of heads 2 - 4
# of cylinders 16,383
B/sector - sectors/track 512 - 63
Rotation speed (RPM) 10,000 4,200
Max. - Avg. seek time (ms) ? - 6.6 24 - 13
Avg. rot. latency (ms) 3 7.14
Transfer rate (PIO) 35 MB/sec 16.6 MB/sec
Power (watts) < 2.5
Volume (in³) 4.01
Weight (oz) 3.49
9
I/O System Interconnect Issues
[Figure: subsystems joined by an interconnect; labels: Processor, Main Memory, Keyboard, Receiver]
  • A bus is a shared communication link (a set of
    wires used to connect multiple subsystems)
  • Performance
  • Expandability
  • Resilience in the face of failure (fault
    tolerance)

10
Performance Measures
  • Latency (execution time, response time): the
    total time from the start to the finish of one
    instruction or action
  • usually used to measure processor performance
  • Throughput: the total amount of work done in a given
    amount of time
  • aka execution bandwidth
  • the number of operations performed per second
  • Bandwidth: the amount of information communicated
    across an interconnect (e.g., a bus) per unit
    time
  • the bit width of the operation × the rate of the
    operation
  • usually used to measure I/O performance
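
The bandwidth definition above is a simple product of width and rate. A small sketch, with a hypothetical function name and a hypothetical bus used as the example:

```python
def bandwidth_bytes_per_s(width_bits, transfers_per_s):
    """Bandwidth = bit width of each transfer x transfer rate, in bytes/sec."""
    return width_bits * transfers_per_s / 8


# Hypothetical 32-bit bus performing 33 million transfers per second.
print(bandwidth_bytes_per_s(32, 33_000_000) / 10**6, "MB/sec")
```

This is a peak figure; it assumes every transfer slot carries data.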

11
I/O System Expandability
  • Usually have more than one I/O device in the
    system
  • each I/O device is controlled by an I/O Controller

[Figure: I/O system organization; labels: Processor, Cache, interrupt signals, Memory - I/O Bus, Main Memory, three I/O Controllers, and the devices Terminal, Disk, Disk, Network]
12
Quiz
  • What is disk seek time, and what is rotational
    time?

13
Bus Characteristics
  • Control lines:
  • Signal requests and acknowledgments
  • Indicate what type of information is on the data
    lines
  • Data lines:
  • Carry data, complex commands, and addresses
  • A bus transaction consists of:
  • Sending the address
  • Receiving (or sending) the data

[Figure: a bus drawn as a set of Control Lines and a set of Data Lines]
14
Output (Read) Bus Transaction
  • Bus transactions are defined by what they do to memory
  • Read (output): transfers data from memory (read)
    to the I/O device (write)

15
Input (Write) Bus Transaction
  • Bus transactions are defined by what they do to memory
  • Write (input): transfers data from the I/O device
    (read) to memory (write)

16
Advantages and Disadvantages of Buses
  • Advantages
  • Versatility
  • New devices can be added easily
  • Peripherals can be moved between computer systems
    that use the same bus standard
  • Low Cost
  • A single set of wires is shared in multiple ways
  • Disadvantages
  • It creates a communication bottleneck
  • The bus bandwidth limits the maximum I/O
    throughput
  • The maximum bus speed is largely limited by:
  • The length of the bus
  • The number of devices on the bus
  • It needs to support a range of devices with
    widely varying latencies and data transfer rates

17
Types of Buses
  • Processor-Memory Bus (proprietary)
  • Short and high speed
  • Matched to the memory system to maximize the
    memory-processor bandwidth
  • Optimized for cache block transfers
  • I/O Bus (industry standard, e.g., SCSI, USB, ISA,
    IDE)
  • Usually lengthy and slower
  • Needs to accommodate a wide range of I/O devices
  • Connects to the processor-memory bus or backplane
    bus
  • Backplane Bus (industry standard, e.g., PCI)
  • The backplane is an interconnection structure
    within the chassis
  • Used as an intermediary bus connecting I/O buses
    to the processor-memory bus

18
A Two Bus System
[Figure: two-bus system; labels: Processor-Memory Bus, Processor, Memory]
  • I/O buses tap into the processor-memory bus via
    Bus Adaptors (which do speed matching between
    buses)
  • The processor-memory bus is mainly for
    processor-memory traffic
  • I/O buses provide expansion slots for I/O
    devices

19
A Three Bus System
[Figure: three-bus system; labels: Processor-Memory Bus, Processor, Memory]
  • A small number of Backplane Buses tap into the
    Processor-Memory Bus
  • The Processor-Memory Bus is used for processor-memory
    traffic
  • I/O buses are connected to the Backplane Bus
  • Advantage: loading on the Processor-Memory Bus is
    greatly reduced

20
I/O System Example (Apple Mac 7200)
  • Typical of a midrange to high-end desktop system in
    1997

[Figure: Apple Mac 7200 I/O organization; labels: Processor, Cache Memory, Processor-Memory Bus, PCI Interface/Memory Controller, Main Memory, PCI, I/O Controllers, Serial ports, Audio I/O, SCSI bus, CD-ROM, Disk, Tape, Graphic Terminal, Network]
21
Example Pentium System Organization
[Figure: Pentium system organization; labels: Processor-Memory Bus, Memory controller (Northbridge), PCI Bus, I/O Buses]
http://developer.intel.com/design/chipsets/850/animate.htm?iidPCGdevside
22
Synchronous and Asynchronous Buses
  • Synchronous Bus
  • Includes a clock in the control lines
  • Uses a fixed protocol for communication that is
    relative to the clock
  • Advantage: involves very little logic and can run
    very fast
  • Disadvantages:
  • Every device on the bus must run at the same
    clock rate
  • To avoid clock skew, synchronous buses cannot be long if they
    are fast
  • Asynchronous Bus
  • It is not clocked, so it requires a handshaking
    protocol (req, ack)
  • Implemented with additional control lines
  • Advantages:
  • Can accommodate a wide range of devices
  • Can be lengthened without worrying about clock
    skew or synchronization problems
  • Disadvantage: slow(er)

23
Asynchronous Handshaking Protocol
  • Output (read) data from memory to an I/O device.

I/O device signals a request by raising
ReadReq and putting the addr on the data lines
  1. Memory sees ReadReq, reads addr from data lines,
    and raises Ack
  2. I/O device sees Ack and releases the ReadReq and
    data lines
  3. Memory sees ReadReq go low and drops Ack
  4. When memory has data ready, it places it on data
    lines and raises DataRdy
  5. I/O device sees DataRdy, reads the data from data
    lines, and raises Ack
  6. Memory sees Ack, releases the data lines, and
    drops DataRdy
  7. I/O device sees DataRdy go low and drops Ack
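
The steps above can be walked through in code. This is a minimal sketch that replays the handshake as a fixed sequence and records the signal values after each step; it is not a concurrent hardware simulation. The function name `handshake_read`, the dictionary standing in for memory, and the `data_lines` field are hypothetical names chosen for illustration.

```python
def handshake_read(memory, addr):
    """Replay the ReadReq/Ack/DataRdy handshake; return (data, trace)."""
    sig = {"ReadReq": 0, "Ack": 0, "DataRdy": 0, "data_lines": None}
    trace = []

    def step(actor, action, **changes):
        # Apply the signal changes and snapshot the bus state.
        sig.update(changes)
        trace.append((actor, action, dict(sig)))

    # Initial request: device raises ReadReq and drives the address.
    step("device", "raise ReadReq, drive addr", ReadReq=1, data_lines=addr)
    # 1. Memory sees ReadReq, latches the address, raises Ack.
    latched_addr = sig["data_lines"]
    step("memory", "raise Ack", Ack=1)
    # 2. Device sees Ack, releases ReadReq and the data lines.
    step("device", "release ReadReq and data lines", ReadReq=0, data_lines=None)
    # 3. Memory sees ReadReq go low and drops Ack.
    step("memory", "drop Ack", Ack=0)
    # 4. Memory places the data on the data lines and raises DataRdy.
    step("memory", "drive data, raise DataRdy",
         data_lines=memory[latched_addr], DataRdy=1)
    # 5. Device sees DataRdy, reads the data, raises Ack.
    received = sig["data_lines"]
    step("device", "raise Ack", Ack=1)
    # 6. Memory sees Ack, releases the data lines, drops DataRdy.
    step("memory", "release data lines, drop DataRdy", data_lines=None, DataRdy=0)
    # 7. Device sees DataRdy go low and drops Ack.
    step("device", "drop Ack", Ack=0)
    return received, trace


data, trace = handshake_read({0x10: 0xAB}, 0x10)
for actor, action, signals in trace:
    print(f"{actor:6} {action:34} {signals}")
```

Note that the data lines carry the address first and the data later, which is why the device must release them in step 2 before memory can drive them in step 4.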

24
Key Characteristics of Two Bus Standards
Characteristic               Firewire (1394)            USB 2.0
Type                         I/O                        I/O
Data bus width (signals)     4                          2
Clocking                     asynchronous               asynchronous
Theoretical peak bandwidth   50 MB/sec (Firewire 400)   0.2 MB/sec (low speed),
                             or 100 MB/sec              1.5 MB/sec (full speed),
                             (Firewire 800)             or 60 MB/sec (high speed)
Hot pluggable                yes                        yes
Max. devices                 63                         127
Max. length (copper wire)    4.5 meters                 5 meters
25
Review Major Components of a Computer
[Figure: block diagram of a computer; labels: Processor (Control, Datapath), Memory, Devices (Input, Output)]
26
A Typical Memory Hierarchy
  • By taking advantage of the principle of locality:
  • Present the user with as much memory as is
    available in the cheapest technology.
  • Provide access at the speed offered by the
    fastest technology.

[Figure: a typical memory hierarchy; labels: On-Chip Components (Control, Datapath, RegFile, ITLB, DTLB, Instr Cache, Data Cache), eDRAM, Second Level Cache (SRAM), Main Memory (DRAM), Secondary Memory (Disk). Moving away from the processor: Speed (ns) .1's, 1's, 10's, 100's, 1,000's; Size (bytes) 100's, K's, 10K's, M's, T's; Cost from highest to lowest]
27
Characteristics of the Memory Hierarchy
[Figure: levels of the hierarchy below the Processor: L1, L2, Main Memory, Secondary Memory; access time increases with distance from the processor, and the (relative) size of the memory grows at each level]
28
Memory Hierarchy Technologies
  • Random Access
  • "Random" is good: access time is the same for all
    locations
  • DRAM: Dynamic Random Access Memory
  • High density (1-transistor cells), low power,
    cheap, slow
  • Dynamic: needs to be refreshed regularly (about
    every 8 ms)
  • SRAM: Static Random Access Memory
  • Low density (6-transistor cells), high power,
    expensive, fast
  • Static: content will last "forever" (until power is
    turned off)
  • Size: DRAM/SRAM = 4 to 8
  • Cost/Cycle time: SRAM/DRAM = 8 to 16
  • Not-so-random Access Technology
  • Access time varies from location to location and
    from time to time (e.g., disk, CD-ROM)

29
Classical SRAM Organization (Square)
[Figure: classical SRAM organization; labels: row address, row decoder, RAM Cell Array, column address, Column Selector and I/O Circuits, data word]
One memory row holds a block of data, so the
column address selects the requested word from
that block
30
Classical DRAM Organization (Square Planes)
[Figure: classical DRAM organization as square planes; labels: row address, row decoder, word (row) select, bit (data) lines, column address, Column Selector and I/O Circuits, one data bit per plane forming the data word]
Each intersection represents a 1-T DRAM cell
  • The column address selects the requested bit from the row in each
    plane
31
RAM Memory Definitions
  • Caches use SRAM for speed
  • Main Memory is DRAM for density
  • Addresses are divided into 2 halves (row and column)
  • RAS, or Row Access Strobe, triggers the row decoder
  • CAS, or Column Access Strobe, triggers the column
    selector
  • Performance of Main Memory DRAMs
  • Latency: time to access one word
  • Access Time: time between the request and when the word
    arrives
  • Cycle Time: time between requests
  • Usually cycle time > access time
  • Bandwidth: how much data can be supplied per unit
    time
  • width of the data channel × the rate at which it
    can be used
32
Classical DRAM Operation
  • DRAM Organization
  • N rows x N columns x M bits
  • Read or write M bits at a time
  • Each M-bit access requires a RAS / CAS cycle

[Figure: a DRAM array of N rows with Row Address and Column Address inputs and an M-bit Output, above a timing diagram in which the 1st and 2nd M-bit accesses are one Cycle Time apart and each presents a Row Address followed by a Col Address]
33
Ways to Improve DRAM Performance
  • Memory interleaving
  • Fast Page Mode DRAMs (FPM DRAMs)
  • www.usa.samsungsemi.com/products/newsummary/asyncdram/K4F661612D.htm
  • Extended Data Out DRAMs (EDO DRAMs)
  • www.chips.ibm.com/products/memory/88H2011/88H2011.pdf
  • Synchronous DRAMs (SDRAMs)
  • www.usa.samsungsemi.com/products/newsummary/sdramcomp/K4S641632D.htm
  • Rambus DRAMs
  • www.rambus.com/developer/quickfind_documents.html
  • www.usa.samsungsemi.com/products/newsummary/rambuscomp/K4R271669B.htm
  • Double Data Rate DRAMs (DDR DRAMs)
  • www.usa.samsungsemi.com/products/newsummary/ddrsyncdram/K4D62323HA.htm
  • . . .

34
Increasing Bandwidth - Interleaving
[Figure: access pattern without interleaving (labels: CPU, Memory, Start Access for D1, D1 available, Start Access for D2, D2 available, Access Time, Cycle Time), followed by the access pattern with 4-way interleaving]
35
Problems with Interleaving
  • How many banks?
  • Ideally, the number of banks ≥ the number of clocks
    we have to wait to access the next word in the
    bank
  • Only works for sequential accesses (i.e., first
    word requested in first bank, second word
    requested in second bank, etc.)
  • Increasing DRAM sizes => fewer chips => harder to
    have banks
  • Growth in bits/chip for DRAM: 50%-60%/yr
  • Can only be used for very large memory systems (e.g.,
    those encountered in supercomputer systems)
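
The bank-count rule above can be checked with a toy timing model. This sketch makes several simplifying assumptions that are not on the slide: word i lives in bank i % n_banks, at most one access can start per clock, and a bank cannot start a new access until cycle_time clocks after its previous one began. The function name and the clock counts are hypothetical.

```python
def sequential_read_clocks(n_words, access_time, cycle_time, n_banks):
    """Clocks until the last of n_words (>= 1) sequential words is available."""
    bank_free = [0] * n_banks          # earliest clock each bank can start again
    start = -1
    for i in range(n_words):
        bank = i % n_banks
        # Start one clock after the previous access, or when the bank frees up.
        start = max(start + 1, bank_free[bank])
        bank_free[bank] = start + cycle_time
    return start + access_time


# Hypothetical timings: access time 3 clocks, cycle time 4 clocks.
print(sequential_read_clocks(4, access_time=3, cycle_time=4, n_banks=1))
print(sequential_read_clocks(4, access_time=3, cycle_time=4, n_banks=4))
```

With the number of banks at least equal to the cycle time in clocks, each bank is ready again by the time the sequential stream wraps around to it, so a new access can start every clock.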

36
Fast Page Mode DRAM Operation
  • Fast Page Mode DRAM
  • An N x M SRAM register saves a row
  • After a row is read into the SRAM register
  • Only CAS is needed to access other M-bit blocks
    on that row
  • RAS remains asserted while CAS is toggled

[Figure: a DRAM array of N rows by N cols with Row Address and Column Address inputs, the N x M SRAM row register, and an M-bit Output]
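
The benefit of keeping a row in the SRAM register can be sketched by counting clocks. In this simplified model, which is an assumption for illustration only, classical DRAM pays both RAS and CAS on every access, while fast page mode pays RAS only when the row changes. The function name and the clock counts are hypothetical.

```python
def access_clocks(addresses, n_cols, ras_clocks, cas_clocks, fast_page_mode):
    """Total clocks to access a sequence of M-bit block addresses."""
    total = 0
    open_row = None                    # row currently held in the SRAM register
    for addr in addresses:
        row = addr // n_cols
        if not fast_page_mode or row != open_row:
            total += ras_clocks        # a new row must be read out
            open_row = row
        total += cas_clocks            # column select for this block
    return total


# Hypothetical timings: RAS takes 3 clocks, CAS takes 2 clocks, 4 columns per row.
blocks = range(8)
print(access_clocks(blocks, n_cols=4, ras_clocks=3, cas_clocks=2, fast_page_mode=False))
print(access_clocks(blocks, n_cols=4, ras_clocks=3, cas_clocks=2, fast_page_mode=True))
```

The saving depends on consecutive accesses falling in the same row, which is the spatial locality the hierarchy relies on.
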
37
Why Care About the Memory Hierarchy?
[Figure: Processor-DRAM Memory Gap; performance (log scale, 1 to 1000) versus time (1980 to 2000), with a CPU curve labeled Moore's Law and a DRAM curve; the processor-memory performance gap grows 50% / year]
38
Memory Hierarchy Goals
  • Fact: large memories are slow, fast memories are
    small
  • How do we create a memory that gives the illusion
    of being large, cheap, and fast (most of the
    time)?
  • By taking advantage of
  • The Principle of Locality: programs access a
    relatively small portion of the address space at
    any instant of time.

39
Memory Hierarchy Why Does it Work?
  • Temporal Locality (Locality in Time)
  • => Keep the most recently accessed data items closer
    to the processor
  • Spatial Locality (Locality in Space)
  • => Move blocks consisting of contiguous words to
    the upper levels

[Figure: data moves to and from the processor through the Upper Level Memory, which holds Blk X; the Lower Level Memory holds Blk Y]
40
Memory Hierarchy Terminology
  • Hit: data appears in some block in the upper
    level (Block X)
  • Hit Rate: the fraction of memory accesses found
    in the upper level
  • Hit Time: time to access the upper level, which
    consists of
  • RAM access time + time to determine hit/miss
  • Miss: data needs to be retrieved from a block in
    the lower level (Block Y)
  • Miss Rate = 1 - (Hit Rate)
  • Miss Penalty: time to replace a block in the
    upper level + time to
    deliver the block to the processor
  • Hit Time << Miss Penalty
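
These terms are commonly combined into an average access time: hit time + miss rate × miss penalty. The slide does not state this formula, so treat the sketch below as a standard textbook relation added for illustration; the function name and the sample numbers are hypothetical.

```python
def average_access_time(hit_time, hit_rate, miss_penalty):
    """Average access time = hit time + miss rate x miss penalty."""
    miss_rate = 1 - hit_rate           # Miss Rate = 1 - (Hit Rate)
    return hit_time + miss_rate * miss_penalty


# Hypothetical upper level: 1-cycle hit time, 95% hit rate, 100-cycle miss penalty.
print(average_access_time(hit_time=1, hit_rate=0.95, miss_penalty=100))
```

Because the miss penalty is much larger than the hit time, even a small miss rate contributes most of the average.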

41
How is the Hierarchy Managed?
  • registers <-> memory
  • by the compiler (programmer?)
  • cache <-> main memory
  • by the hardware
  • main memory <-> disks
  • by the hardware and operating system (virtual
    memory)
  • by the programmer (files)

42
Summary
  • DRAM is slow but cheap and dense
  • Good choice for presenting the user with a BIG
    memory system
  • SRAM is fast but expensive and not very dense
  • Good choice for providing the user FAST access
    time
  • Two different types of locality:
  • Temporal Locality (Locality in Time): If an item
    is referenced, it will tend to be referenced
    again soon.
  • Spatial Locality (Locality in Space): If an item
    is referenced, items whose addresses are close by
    tend to be referenced soon.
  • By taking advantage of the principle of locality:
  • Present the user with as much memory as is
    available in the cheapest technology.
  • Provide access at the speed offered by the
    fastest technology.