Title: 14:332:331 Computer Architecture and Assembly Language Spring 2005 Week 12 Buses and I/O system
- Adapted from Dave Patterson's UCB CS152 slides and Mary Jane Irwin's PSU CSE331 slides
2. Heads Up
- This week's material
  - Buses: connecting I/O devices
    - Reading assignment: PH 8.4
  - Memory hierarchies
    - Reading assignment: PH 7.1 and B.8-9
- Reminders
- Next week's material
  - Basics of caches
    - Reading assignment: PH 7.2
3. Review: Major Components of a Computer
[Figure: Processor (Control, Datapath), Memory, and Devices (Input, Output), with Memory broken out into Cache, Main Memory, and Secondary Memory (Disk)]
4. Input and Output Devices
- I/O devices are incredibly diverse with respect to
  - Behavior
  - Partner
  - Data rate
Device            Behavior          Partner   Data rate (KB/sec)
Keyboard          input             human     0.01
Mouse             input             human     0.02
Laser printer     output            human     200.00
Graphics display  output            human     60,000.00
Network/LAN       input or output   machine   500.00-6000.00
Floppy disk       storage           machine   100.00
Magnetic disk     storage           machine   2000.00-10,000.00
5. Magnetic Disk
- Purpose
  - Long-term, nonvolatile storage
  - Lowest level in the memory hierarchy
    - slow, large, inexpensive
- General structure
  - A rotating platter coated with a magnetic surface
  - Use a moveable read/write head to access the disk
- Advantages of hard disks over floppy disks
  - Platters are more rigid (metal or glass) so they can be larger
  - Higher density because it can be controlled more precisely
  - Higher data rate because it spins faster
  - Can incorporate more than one platter
6. Organization of a Magnetic Disk
[Figure: disk Platters, with a Track and a Sector labeled]
- Typical numbers (depending on the disk size)
  - 1 to 15 (2-surface) platters per disk, with 1 to 8 inch diameter
  - 1,000 to 5,000 tracks per surface
  - 63 to 256 sectors per track
    - A sector is the smallest unit that can be read/written (typically 512 to 1,024 B)
- Traditionally all tracks have the same number of sectors
  - Newer disks with smart controllers can record more sectors on the outer tracks (constant bit density)
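The typical numbers above multiply out to a raw capacity. The sketch below is only an illustration under the traditional assumption that every track holds the same number of sectors; the function name `disk_capacity_bytes` and the sample geometry are hypothetical and do not describe a real drive.

```python
def disk_capacity_bytes(platters, tracks_per_surface, sectors_per_track,
                        bytes_per_sector, surfaces_per_platter=2):
    """Raw capacity of a disk whose tracks all hold the same number of sectors."""
    surfaces = platters * surfaces_per_platter
    return surfaces * tracks_per_surface * sectors_per_track * bytes_per_sector


# Upper end of every "typical" range on the slide (illustrative geometry only).
largest = disk_capacity_bytes(platters=15, tracks_per_surface=5000,
                              sectors_per_track=256, bytes_per_sector=1024)
print(f"{largest / 1e9:.1f} GB")
```

A disk with constant bit density would need a per-zone sector count instead of a single `sectors_per_track` value.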
7. Magnetic Disk Characteristics
- Cylinder: all the tracks under the heads at a given point on all surfaces
- Reading or writing data is a three-stage process
  - Seek time: position the arm over the proper track (6 to 14 ms avg.)
    - Due to locality of disk references, the actual average seek time may be only 25% to 33% of the advertised number
  - Rotational latency: wait for the desired sector to rotate under the read/write head (½ of 1/RPM, i.e., half a rotation on average)
  - Transfer time: transfer a block of bits (sector) under the read/write head (2 to 20 MB/sec typical)
- Controller time: the overhead the disk controller imposes in performing a disk I/O access (typically < 2 ms)
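The components above add up to the time for one disk access. The following is a minimal sketch, assuming 1 MB = 10^6 bytes and that the four components do not overlap; the function names and the sample parameter values are illustrative choices, not figures for a particular drive.

```python
def rotational_latency_ms(rpm):
    """Average rotational latency: half of one rotation, in milliseconds."""
    return 0.5 * 60_000.0 / rpm


def disk_access_time_ms(seek_ms, rpm, bytes_transferred,
                        transfer_mb_per_s, controller_ms):
    """Seek + rotational latency + transfer + controller overhead (ms)."""
    # Assumes 1 MB = 10**6 bytes.
    transfer_ms = bytes_transferred / (transfer_mb_per_s * 1e6) * 1e3
    return seek_ms + rotational_latency_ms(rpm) + transfer_ms + controller_ms


# One 512 B sector from a 10,000 RPM disk: 6 ms seek, 20 MB/sec, 2 ms controller.
print(rotational_latency_ms(10_000))                  # half a rotation at 10,000 RPM
print(disk_access_time_ms(6.0, 10_000, 512, 20.0, 2.0))
```

With these inputs the seek and rotational terms dominate; the transfer of a single sector contributes only a few hundredths of a millisecond.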
8. Magnetic Disk Examples
Characteristic                 Sun X6713A    Toshiba MK2016
Disk diameter (inches)         3.5           2.5
Capacity                       73 GB         20 GB
MTTF (k hrs)                   1,200         300
# of platters - # of heads                   2 - 4
# of cylinders                               16,383
B/sector - sectors/track                     512 - 63
Rotation speed (RPM)           10,000        4,200
Max. - Avg. seek time (ms)     ? - 6.6       24 - 13
Avg. rot. latency (ms)         3             7.14
Transfer rate (PIO)            35 MB/sec     16.6 MB/sec
Power (watts)                                < 2.5
Volume (in^3)                                4.01
Weight (oz)                                  3.49
9. I/O System Interconnect Issues
[Figure: Processor, Receiver, Main Memory, Keyboard]
- A bus is a shared communication link (a set of wires used to connect multiple subsystems)
- Performance
- Expandability
- Resilience in the face of failure: fault tolerance
10. Performance Measures
- Latency (execution time, response time) is the total time from the start to finish of one instruction or action
  - usually used to measure processor performance
- Throughput: total amount of work done in a given amount of time
  - aka execution bandwidth
  - the number of operations performed per second
- Bandwidth: amount of information communicated across an interconnect (e.g., a bus) per unit time
  - the bit width of the operation × the rate of the operation
  - usually used to measure I/O performance
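The bandwidth definition above (bit width times operation rate) can be worked through numerically. This is a small sketch; the function name and the example bus (32 bits wide, 33 million transfers per second) are assumptions chosen for illustration, and it assumes one transfer moves exactly `width_bits` bits.

```python
def bandwidth_bytes_per_sec(width_bits, transfers_per_sec):
    """Peak bandwidth = width of one transfer x number of transfers per second."""
    return width_bits / 8 * transfers_per_sec


# Hypothetical 32-bit bus performing 33 million transfers per second.
peak = bandwidth_bytes_per_sec(32, 33e6)
print(f"{peak / 1e6:.0f} MB/sec")
```

This is a peak figure; protocol overhead such as sending addresses or arbitration lowers the sustained rate.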
11. I/O System Expandability
- Usually have more than one I/O device in the system
  - each I/O device is controlled by an I/O Controller
[Figure: Processor with Cache, Main Memory, and three I/O Controllers on the Memory - I/O Bus; the controllers serve a Terminal, two Disks, and a Network, and interrupt signals go to the Processor]
12. Quiz
- What is disk seek time, and what is rotational latency?
13. Bus Characteristics
- Control lines
  - Signal requests and acknowledgments
  - Indicate what type of information is on the data lines
- Data lines
  - Data, complex commands, and addresses
- A bus transaction consists of
  - Sending the address
  - Receiving (or sending) the data
[Figure: a bus drawn as Control Lines and Data Lines]
14. Output (Read) Bus Transaction
- Bus transactions are defined by what they do to memory
  - read = output: transfers data from memory (read) to I/O device (write)
15. Input (Write) Bus Transaction
- Bus transactions are defined by what they do to memory
  - write = input: transfers data from I/O device (read) to memory (write)
16. Advantages and Disadvantages of Buses
- Advantages
  - Versatility
    - New devices can be added easily
    - Peripherals can be moved between computer systems that use the same bus standard
  - Low cost
    - A single set of wires is shared in multiple ways
- Disadvantages
  - It creates a communication bottleneck
    - The bus bandwidth limits the maximum I/O throughput
  - The maximum bus speed is largely limited by
    - The length of the bus
    - The number of devices on the bus
  - It needs to support a range of devices with widely varying latencies and data transfer rates
17. Types of Buses
- Processor-Memory Bus (proprietary)
  - Short and high speed
  - Matched to the memory system to maximize the memory-processor bandwidth
  - Optimized for cache block transfers
- I/O Bus (industry standard, e.g., SCSI, USB, ISA, IDE)
  - Usually is lengthy and slower
  - Needs to accommodate a wide range of I/O devices
  - Connects to the processor-memory bus or backplane bus
- Backplane Bus (industry standard, e.g., PCI)
  - The backplane is an interconnection structure within the chassis
  - Used as an intermediary bus connecting I/O buses to the processor-memory bus
18. A Two Bus System
[Figure: Processor and Memory on the Processor-Memory Bus]
- I/O buses tap into the processor-memory bus via Bus Adaptors (that do speed matching between buses)
- Processor-memory bus: mainly for processor-memory traffic
- I/O buses: provide expansion slots for I/O devices
19. A Three Bus System
[Figure: Processor and Memory on the Processor-Memory Bus]
- A small number of Backplane Buses tap into the Processor-Memory Bus
  - Processor-Memory Bus is used for processor-memory traffic
  - I/O buses are connected to the Backplane Bus
- Advantage: loading on the Processor-Memory Bus is greatly reduced
20. I/O System Example (Apple Mac 7200)
- Typical of midrange to high-end desktop system in 1997
[Figure: Processor, Cache, Processor-Memory Bus, PCI Interface/Memory Controller, Main Memory, PCI bus with I/O Controllers, and a SCSI bus; devices shown: Disk, CDRom, Tape, Graphic Terminal, Network, Serial ports, Audio I/O]
21. Example: Pentium System Organization
[Figure: Processor-Memory Bus, Memory controller ("Northbridge"), PCI Bus, I/O Buses]
http://developer.intel.com/design/chipsets/850/animate.htm?iidPCGdevside
22. Synchronous and Asynchronous Buses
- Synchronous Bus
  - Includes a clock in the control lines
  - A fixed protocol for communication that is relative to the clock
  - Advantage: involves very little logic and can run very fast
  - Disadvantages
    - Every device on the bus must run at the same clock rate
    - To avoid clock skew, they cannot be long if they are fast
- Asynchronous Bus
  - It is not clocked, so it requires a handshaking protocol (req, ack)
    - Implemented with additional control lines
  - Advantages
    - Can accommodate a wide range of devices
    - Can be lengthened without worrying about clock skew or synchronization problems
  - Disadvantage: slow(er)
23. Asynchronous Handshaking Protocol
- Output (read) data from memory to an I/O device
- The I/O device signals a request by raising ReadReq and putting the addr on the data lines
- Memory sees ReadReq, reads addr from data lines, and raises Ack
- I/O device sees Ack and releases the ReadReq and data lines
- Memory sees ReadReq go low and drops Ack
- When memory has data ready, it places it on data lines and raises DataRdy
- I/O device sees DataRdy, reads the data from data lines, and raises Ack
- Memory sees Ack, releases the data lines, and drops DataRdy
- I/O device sees DataRdy go low and drops Ack
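The handshake above can be traced in software. The sketch below is a sequential walk-through rather than a hardware model: the dictionary `lines` stands in for the bus wires, `memory` is a plain dict, and the function name `async_read` is hypothetical. Each step first checks the signal that the acting party is waiting for.

```python
def async_read(memory, addr):
    """Trace the asynchronous read handshake; return (data, trace, lines)."""
    lines = {"ReadReq": 0, "Ack": 0, "DataRdy": 0, "data": None}
    trace = []

    # Request: the I/O device raises ReadReq and drives the address.
    lines["ReadReq"], lines["data"] = 1, addr
    trace.append("device: raise ReadReq, put addr on data lines")

    # Memory sees ReadReq, latches the address, and raises Ack.
    assert lines["ReadReq"] == 1
    latched_addr = lines["data"]
    lines["Ack"] = 1
    trace.append("memory: read addr, raise Ack")

    # Device sees Ack and releases ReadReq and the data lines.
    assert lines["Ack"] == 1
    lines["ReadReq"], lines["data"] = 0, None
    trace.append("device: release ReadReq and data lines")

    # Memory sees ReadReq go low and drops Ack.
    assert lines["ReadReq"] == 0
    lines["Ack"] = 0
    trace.append("memory: drop Ack")

    # Memory has the data ready: drive it and raise DataRdy.
    lines["data"] = memory[latched_addr]
    lines["DataRdy"] = 1
    trace.append("memory: put data on data lines, raise DataRdy")

    # Device sees DataRdy, reads the data, and raises Ack.
    assert lines["DataRdy"] == 1
    data = lines["data"]
    lines["Ack"] = 1
    trace.append("device: read data, raise Ack")

    # Memory sees Ack, releases the data lines, and drops DataRdy.
    assert lines["Ack"] == 1
    lines["data"] = None
    lines["DataRdy"] = 0
    trace.append("memory: release data lines, drop DataRdy")

    # Device sees DataRdy go low and drops Ack; the bus is idle again.
    assert lines["DataRdy"] == 0
    lines["Ack"] = 0
    trace.append("device: drop Ack")

    return data, trace, lines


data, trace, lines = async_read({0x10: 42}, 0x10)
for entry in trace:
    print(entry)
```

Because the address and the data share the same lines, the device must release them before memory can drive the result, which is why the protocol needs two full request/acknowledge exchanges.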
24. Key Characteristics of Two Bus Standards
Characteristic               Firewire (1394)              USB 2.0
Type                         I/O                          I/O
Data bus width (signals)     4                            2
Clocking                     asynchronous                 asynchronous
Theoretical peak bandwidth   50 MB/sec (Firewire 400) or  0.2 MB/sec (low speed),
                             100 MB/sec (Firewire 800)    1.5 MB/sec (full) or 60 MB/sec (high)
Hot pluggable                yes                          yes
Max. devices                 63                           127
Max. length (copper wire)    4.5 meters                   5 meters
25. Review: Major Components of a Computer
[Figure: Processor (Control, Datapath), Memory, and Devices (Input, Output)]
26. A Typical Memory Hierarchy
- By taking advantage of the principle of locality
  - Present the user with as much memory as is available in the cheapest technology.
  - Provide access at the speed offered by the fastest technology.
[Figure: On-Chip Components (Control, Datapath, RegFile, ITLB, DTLB, Instr Cache, Data Cache, eDRAM), followed by Second Level Cache (SRAM), Main Memory (DRAM), and Secondary Memory (Disk); from the processor outward:]
Speed (ns):    .1's    1's    10's    100's   1,000's
Size (bytes):  100's   K's    10K's   M's     T's
Cost:          highest (nearest the processor) to lowest (farthest)
27. Characteristics of the Memory Hierarchy
[Figure: Processor, then L1, L2, Main Memory, and Secondary Memory, annotated "Increasing distance from the processor in access time" and "(Relative) size of the memory at each level"]
28. Memory Hierarchy Technologies
- Random Access
  - "Random" is good: access time is the same for all locations
  - DRAM: Dynamic Random Access Memory
    - High density (1-transistor cells), low power, cheap, slow
    - Dynamic: needs to be "refreshed" regularly (about every 8 ms)
  - SRAM: Static Random Access Memory
    - Low density (6-transistor cells), high power, expensive, fast
    - Static: content will last "forever" (until power is turned off)
  - Size: DRAM/SRAM = 4 to 8
  - Cost/Cycle time: SRAM/DRAM = 8 to 16
- "Not-so-random" Access Technology
  - Access time varies from location to location and from time to time (e.g., Disk, CDROM)
29. Classical SRAM Organization (Square)
[Figure: RAM Cell Array with a row decoder driven by the row address, and Column Selector & I/O Circuits driven by the column address, producing the data word]
- One memory row holds a block of data, so the column address selects the requested word from that block
30. Classical DRAM Organization (Square Planes)
[Figure: stacked planes of cells; each intersection of a bit (data) line and a word (row) select line represents a 1-T DRAM cell; a row decoder is driven by the row address and Column Selector & I/O Circuits by the column address; each plane contributes one data bit of the data word]
- The column address selects the requested bit from the row in each plane
31. RAM Memory Definitions
- Caches use SRAM for speed
- Main Memory is DRAM for density
  - Addresses divided into 2 halves (row and column)
    - RAS or Row Access Strobe: triggering row decoder
    - CAS or Column Access Strobe: triggering column selector
- Performance of Main Memory DRAMs
  - Latency: time to access one word
    - Access Time: time between request and when word arrives
    - Cycle Time: time between requests
    - Usually cycle time > access time
  - Bandwidth: how much data can be supplied per unit time
    - width of the data channel × the rate at which it can be used
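Splitting a DRAM address into its row and column halves can be sketched as follows. Which half is the row is not stated on the slide; this example assumes the upper bits form the row (sent with RAS) and the lower bits form the column (sent with CAS). The function name `split_address` is hypothetical.

```python
def split_address(addr, addr_bits):
    """Split an addr_bits-wide address into (row, column) halves."""
    if addr_bits % 2:
        raise ValueError("addr_bits must be even to split into two equal halves")
    if not 0 <= addr < (1 << addr_bits):
        raise ValueError("addr does not fit in addr_bits")
    half = addr_bits // 2
    row = addr >> half               # upper half, presented with RAS (assumed)
    col = addr & ((1 << half) - 1)   # lower half, presented with CAS (assumed)
    return row, col


row, col = split_address(0xABCD, 16)
print(hex(row), hex(col))
```

Sending the two halves one after the other lets the chip use half as many address pins, at the cost of a two-step RAS/CAS sequence per access.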
32. Classical DRAM Operation
- DRAM Organization
  - N rows x N columns x M bits
  - Read or write M bits at a time
  - Each M-bit access requires a RAS / CAS cycle
[Figure: DRAM array (N rows, M bits) with Row Address and Column Address inputs and an M-bit Output]
[Figure: timing diagram showing the Cycle Time of the 1st and 2nd M-bit accesses; each access presents a Row Address and then a Col Address, with CAS shown]
33. Ways to Improve DRAM Performance
- Memory interleaving
- Fast Page Mode DRAMs (FPM DRAMs)
  - www.usa.samsungsemi.com/products/newsummary/asyncdram/K4F661612D.htm
- Extended Data Out DRAMs (EDO DRAMs)
  - www.chips.ibm.com/products/memory/88H2011/88H2011.pdf
- Synchronous DRAMs (SDRAMs)
  - www.usa.samsungsemi.com/products/newsummary/sdramcomp/K4S641632D.htm
- Rambus DRAMs
  - www.rambus.com/developer/quickfind_documents.html
  - www.usa.samsungsemi.com/products/newsummary/rambuscomp/K4R271669B.htm
- Double Data Rate DRAMs (DDR DRAMs)
  - www.usa.samsungsemi.com/products/newsummary/ddrsyncdram/K4D62323HA.htm
- . . .
34. Increasing Bandwidth - Interleaving
[Figure: access pattern without interleaving, between CPU and Memory: Start Access for D1, D1 available after the Access Time, Start Access for D2 after the Cycle Time, D2 available]
[Figure: access pattern with 4-way interleaving]
35. Problems with Interleaving
- How many banks?
  - Ideally, the number of banks ≥ number of clocks we have to wait to access the next word in the bank
  - Only works for sequential accesses (i.e., first word requested in first bank, second word requested in second bank, etc.)
- Increasing DRAM sizes => fewer chips => harder to have banks
  - Growth in bits/chip DRAM: 50%-60%/yr
  - Only can use for very large memory systems (e.g., those encountered in supercomputer systems)
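The rule of thumb for the number of banks can be checked with a toy timing model. Everything in this sketch is an assumption made for illustration: word i lives in bank i mod n_banks, the bus can start at most one access per clock, a bank stays busy for `cycle_time` clocks after an access starts, and data arrives `access_time` clocks after the start. The function name is hypothetical.

```python
def sequential_read_clocks(n_words, n_banks, access_time, cycle_time):
    """Clocks until the last of n_words sequential words is available."""
    bank_free = [0] * n_banks   # first clock at which each bank can start again
    next_slot = 0               # first clock at which the bus can start an access
    done = 0
    for i in range(n_words):
        bank = i % n_banks      # sequential words map to successive banks
        start = max(next_slot, bank_free[bank])
        bank_free[bank] = start + cycle_time
        next_slot = start + 1
        done = start + access_time
    return done


# 8 sequential words, access time 3 clocks, cycle time 4 clocks.
for banks in (1, 2, 4, 8):
    print(banks, sequential_read_clocks(8, banks, access_time=3, cycle_time=4))
```

In this model the time stops improving once the number of banks reaches the cycle time in clocks, which matches the "ideally" condition on the slide. A non-sequential access pattern that keeps hitting the same bank would behave like the single-bank case.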
36. Fast Page Mode DRAM Operation
- Fast Page Mode DRAM
  - N x M "SRAM" to save a row
- After a row is read into the SRAM "register"
  - Only CAS is needed to access other M-bit blocks on that row
  - RAS remains asserted while CAS is toggled
[Figure: DRAM array (N rows, N cols) with Row Address and Column Address inputs, the row register, and an M-bit Output]
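The benefit of keeping a row open can be estimated with a simplified cost model. This sketch charges a row-access cost only when the row changes and a column-access cost on every access; it ignores precharge and refresh, and the timing values (60 ns and 30 ns) are hypothetical, as is the function name.

```python
def fpm_access_ns(addresses, cols_per_row, ras_ns, cas_ns):
    """Total time for a sequence of accesses with a single open row."""
    open_row = None
    total = 0
    for addr in addresses:
        row = addr // cols_per_row
        if row != open_row:
            total += ras_ns     # new row must be read into the row register
            open_row = row
        total += cas_ns         # every access still needs a CAS
    return total


# Eight accesses that all fall in one 1024-column row.
print(fpm_access_ns(range(8), cols_per_row=1024, ras_ns=60, cas_ns=30))
# The same eight accesses when every one needs a full RAS / CAS cycle.
print(fpm_access_ns(range(8), cols_per_row=1, ras_ns=60, cas_ns=30))
```

Setting `cols_per_row=1` forces a row change on every access, which reproduces the classical RAS / CAS cycle per access for comparison.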
37. Why Care About the Memory Hierarchy?
[Figure: Processor-DRAM Memory Gap. Performance (log scale, 1 to 1000) versus Time (1980 to 2000) for CPU ("Moore's Law") and DRAM; the Processor-Memory Performance Gap grows 50% / year]
38. Memory Hierarchy Goals
- Fact: Large memories are slow, fast memories are small
- How do we create a memory that gives the illusion of being large, cheap and fast (most of the time)?
  - By taking advantage of
    - The Principle of Locality: Programs access a relatively small portion of the address space at any instant of time.
39. Memory Hierarchy: Why Does it Work?
- Temporal Locality (Locality in Time)
  - => Keep most recently accessed data items closer to the processor
- Spatial Locality (Locality in Space)
  - => Move blocks consisting of contiguous words to the upper levels
[Figure: Upper Level Memory holding Blk X, with data going To Processor and From Processor; Lower Level Memory holding Blk Y]
40. Memory Hierarchy Terminology
- Hit: data appears in some block in the upper level (Block X)
  - Hit Rate: the fraction of memory accesses found in the upper level
  - Hit Time: time to access the upper level, which consists of RAM access time + time to determine hit/miss
- Miss: data needs to be retrieved from a block in the lower level (Block Y)
  - Miss Rate = 1 - (Hit Rate)
  - Miss Penalty: time to replace a block in the upper level + time to deliver the block to the processor
- Hit Time << Miss Penalty
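These terms combine into the standard average memory access time formula, hit time + miss rate × miss penalty. The formula is not stated on the slide; it is included here as a worked example, and the function name and the sample values (1-cycle hit, 5% miss rate, 100-cycle penalty) are illustrative.

```python
def average_access_time(hit_time, miss_rate, miss_penalty):
    """Average memory access time = hit time + miss rate * miss penalty."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time + miss_rate * miss_penalty


# Hypothetical upper level: 1-cycle hit time, 5% miss rate, 100-cycle miss penalty.
print(average_access_time(hit_time=1, miss_rate=0.05, miss_penalty=100))
```

Because the miss penalty is so much larger than the hit time, even a small miss rate contributes most of the average.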
41. How is the Hierarchy Managed?
- registers <-> memory
  - by compiler (programmer?)
- cache <-> main memory
  - by the hardware
- main memory <-> disks
  - by the hardware and operating system (virtual memory)
  - by the programmer (files)
42. Summary
- DRAM is slow but cheap and dense
  - Good choice for presenting the user with a BIG memory system
- SRAM is fast but expensive and not very dense
  - Good choice for providing the user FAST access time
- Two different types of locality
  - Temporal Locality (Locality in Time): If an item is referenced, it will tend to be referenced again soon.
  - Spatial Locality (Locality in Space): If an item is referenced, items whose addresses are close by tend to be referenced soon.
- By taking advantage of the principle of locality
  - Present the user with as much memory as is available in the cheapest technology.
  - Provide access at the speed offered by the fastest technology.
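The effect of the two kinds of locality on hit rate can be seen in a toy simulation. This is a sketch under assumptions that are not from the slides: the upper level holds a fixed number of blocks, whole blocks are brought in on a miss, and the least recently used block is replaced. The function name and the access patterns are hypothetical.

```python
from collections import OrderedDict


def hit_rate(addresses, block_words, n_blocks):
    """Fraction of accesses found in an upper level of n_blocks blocks (LRU)."""
    if n_blocks < 1 or block_words < 1:
        raise ValueError("n_blocks and block_words must be at least 1")
    upper = OrderedDict()        # block number -> True, oldest entry first
    hits = 0
    total = 0
    for addr in addresses:
        total += 1
        block = addr // block_words
        if block in upper:
            hits += 1
            upper.move_to_end(block)        # mark as most recently used
        else:
            if len(upper) >= n_blocks:
                upper.popitem(last=False)   # evict the least recently used block
            upper[block] = True
    return hits / total if total else 0.0


# Spatial locality: 64 sequential words, 4-word blocks, room for 4 blocks.
print(hit_rate(range(64), block_words=4, n_blocks=4))
# No locality: each access touches a new block and never returns to it.
print(hit_rate(range(0, 256, 4), block_words=4, n_blocks=4))
# Temporal locality: the same four words are reused over and over.
print(hit_rate([0, 1, 2, 3] * 16, block_words=4, n_blocks=4))
```

The sequential pattern misses once per block and hits on the remaining words of that block, while the strided pattern never reuses a block and so gains nothing from the upper level.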