14:332:331 Computer Architecture and Assembly Language Spring 2005 Week 12 Buses and I/O system - PowerPoint PPT Presentation

Slides: 43

Provided by: jani177

Learn more at: https://www.ece.rutgers.edu

Transcript and Presenter's Notes

1
14:332:331 Computer Architecture and Assembly Language
Spring 2005, Week 12: Buses and I/O System

Adapted from Dave Patterson's UCB CS152 slides and Mary Jane Irwin's PSU CSE331 slides

2
Heads Up

This week's material
Buses: connecting I/O devices
Reading assignment: PH 8.4
Memory hierarchies
Reading assignment: PH 7.1 and B.8-9
Reminders
Next week's material
Basics of caches
Reading assignment: PH 7.2

3
Review: Major Components of a Computer
[Figure: Processor (Control, Datapath), Memory (Cache, Main Memory, Secondary Memory (Disk)), and Devices (Input, Output)]
4
Input and Output Devices

I/O devices are incredibly diverse with respect to
Behavior
Partner
Data rate

Device | Behavior | Partner | Data rate (KB/sec)
Keyboard | input | human | 0.01
Mouse | input | human | 0.02
Laser printer | output | human | 200.00
Graphics display | output | human | 60,000.00
Network/LAN | input or output | machine | 500.00-6000.00
Floppy disk | storage | machine | 100.00
Magnetic disk | storage | machine | 2000.00-10,000.00
5
Magnetic Disk

Purpose
Long-term, nonvolatile storage
Lowest level in the memory hierarchy: slow, large, inexpensive
General structure
A rotating platter coated with a magnetic surface
Use a moveable read/write head to access the disk
Advantages of hard disks over floppy disks
Platters are more rigid (metal or glass) so they can be larger
Higher density because the head can be controlled more precisely
Higher data rate because the disk spins faster
Can incorporate more than one platter

6
Organization of a Magnetic Disk
[Figure labels: Platters, Track, Sector]

Typical numbers (depending on the disk size)
1 to 15 platters (2 surfaces each) per disk, with a 1 to 8 inch diameter
1,000 to 5,000 tracks per surface
63 to 256 sectors per track
A sector is the smallest unit that can be read or written (typically 512 to 1,024 B)
Traditionally, all tracks have the same number of sectors
Newer disks with smart controllers can record more sectors on the outer tracks (constant bit density)
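
As a rough illustration of how the typical numbers above combine, the sketch below multiplies them into a raw capacity. It assumes every track holds the same number of sectors (the traditional layout) and ignores formatting overhead; the function name and parameters are made up for this example.

```python
def disk_capacity_bytes(platters, tracks_per_surface, sectors_per_track,
                        bytes_per_sector, surfaces_per_platter=2):
    """Raw capacity, assuming the same number of sectors on every track."""
    surfaces = platters * surfaces_per_platter
    return surfaces * tracks_per_surface * sectors_per_track * bytes_per_sector

# Low end and high end of the "typical numbers" ranges on the slide.
smallest = disk_capacity_bytes(1, 1_000, 63, 512)      # 64,512,000 bytes
largest = disk_capacity_bytes(15, 5_000, 256, 1_024)   # 39,321,600,000 bytes
```

Under these assumptions the ranges span roughly 64 MB to 39 GB.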

7
Magnetic Disk Characteristic

Cylinder: all the tracks under the heads at a given point on all surfaces
Reading or writing data is a three-stage process
Seek time: position the arm over the proper track (6 to 14 ms avg.)
Due to locality of disk references, the actual average seek time may be only 25% to 33% of the advertised number
Rotational latency: wait for the desired sector to rotate under the read/write head (on average half a rotation, i.e., ½ of 1/RPM minutes)
Transfer time: transfer a block of bits (a sector) under the read/write head (2 to 20 MB/sec typical)
Controller time: the overhead the disk controller imposes in performing a disk I/O access (typically < 2 ms)
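
The components above add up to the time for one disk access. The following is a minimal sketch of that sum, assuming 1 MB = 10^6 bytes and using illustrative input values picked from the ranges on the slide; the function names are hypothetical.

```python
def avg_rotational_latency_ms(rpm):
    """Half a rotation on average: 0.5 * (60,000 ms per minute / RPM)."""
    return 0.5 * 60_000.0 / rpm

def transfer_time_ms(num_bytes, rate_mb_per_sec):
    """Time to move num_bytes at the given rate (assumes 1 MB = 10**6 bytes)."""
    return num_bytes / (rate_mb_per_sec * 1_000_000) * 1_000.0

def disk_access_time_ms(seek_ms, rpm, num_bytes, rate_mb_per_sec, controller_ms):
    """Seek + rotational latency + transfer + controller overhead."""
    return (seek_ms
            + avg_rotational_latency_ms(rpm)
            + transfer_time_ms(num_bytes, rate_mb_per_sec)
            + controller_ms)

# Illustrative values: 6 ms seek, 10,000 RPM, one 512 B sector,
# 20 MB/sec transfer rate, 2 ms controller time.
total_ms = disk_access_time_ms(6.0, 10_000, 512, 20.0, 2.0)
```

With these inputs the seek and rotational terms dominate; the transfer of a single sector contributes only a few hundredths of a millisecond.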

8
Magnetic Disk Examples
Characteristic Sun X6713A Toshiba MK2016
Disk diameter (inches) 3.5 2.5
Capacity 73 GB 20 GB
MTTF (k hrs) 1,200 300
Number of platters - heads 2 - 4
Number of cylinders 16,383
B/sector - sectors/track 512 - 63
Rotation speed (RPM) 10,000 4,200
Max. - Avg. seek time (ms) ? - 6.6 24 - 13
Avg. rot. latency (ms) 3 7.14
Transfer rate (PIO) 35 MB/sec 16.6 MB/sec
Power (watts) < 2.5
Volume (in^3) 4.01
Weight (oz) 3.49
9
I/O System Interconnect Issues
[Figure labels: Processor, Receiver, Main Memory, Keyboard]

A bus is a shared communication link (a set of wires used to connect multiple subsystems)
Performance
Expandability
Resilience in the face of failure (fault tolerance)

10
Performance Measures

Latency (execution time, response time) is the total time from the start to the finish of one instruction or action
usually used to measure processor performance
Throughput: total amount of work done in a given amount of time
aka execution bandwidth
the number of operations performed per second
Bandwidth: amount of information communicated across an interconnect (e.g., a bus) per unit time
the bit width of the operation × the rate of the operation
usually used to measure I/O performance
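
The bandwidth definition (bit width times operation rate) can be sketched as a one-line calculation. The 32-bit width and 33 million transfers per second below are illustrative values chosen for this example, not figures from the slides.

```python
def bandwidth_bytes_per_sec(width_bits, transfers_per_sec):
    """Bandwidth = bit width of the operation x rate of the operation."""
    return width_bits / 8 * transfers_per_sec

# Hypothetical bus: 32 bits wide, 33 million transfers per second.
example = bandwidth_bytes_per_sec(32, 33_000_000)  # 132,000,000 bytes/sec
```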

11
I/O System Expandability

Usually have more than one I/O device in the system
Each I/O device is controlled by an I/O controller

[Figure: Processor, Cache, and Main Memory on a Memory - I/O Bus, together with three I/O Controllers serving a Terminal, two Disks, and a Network; interrupt signals go to the Processor]
12
Quiz

What is disk seek time, and what is rotational
time?

13
Bus Characteristics

Control lines
Signal requests and acknowledgments
Indicate what type of information is on the data lines
Data lines
Data, complex commands, and addresses
A bus transaction consists of
Sending the address
Receiving (or sending) the data
14
Output (Read) Bus Transaction

Bus transactions are defined by what they do to memory
A read (output) transfers data from memory (read) to an I/O device (write)

15
Input (Write) Bus Transaction

Bus transactions are defined by what they do to memory
A write (input) transfers data from an I/O device (read) to memory (write)

16
Advantages and Disadvantages of Buses

Advantages
Versatility
New devices can be added easily
Peripherals can be moved between computer systems
that use the same bus standard
Low Cost
A single set of wires is shared in multiple ways
Disadvantages
It creates a communication bottleneck
The bus bandwidth limits the maximum I/O
throughput
The maximum bus speed is largely limited by
The length of the bus
The number of devices on the bus
It needs to support a range of devices with
widely varying latencies and data transfer rates

17
Types of Buses

Processor-Memory Bus (proprietary)
Short and high speed
Matched to the memory system to maximize the
memory-processor bandwidth
Optimized for cache block transfers
I/O Bus (industry standard, e.g., SCSI, USB, ISA,
IDE)
Usually is lengthy and slower
Needs to accommodate a wide range of I/O devices
Connects to the processor-memory bus or backplane
bus
Backplane Bus (industry standard, e.g., PCI)
The backplane is an interconnection structure
within the chassis
Used as an intermediary bus connecting I/O busses
to the processor-memory bus

18
A Two Bus System
Processor-Memory Bus
Processor
Memory

I/O buses tap into the processor-memory bus via
Bus Adaptors (that do speed matching between
buses)
Processor-memory bus mainly for
processor-memory traffic
I/O busses provide expansion slots for I/O
devices

19
A Three Bus System
Processor-Memory Bus
Processor
Memory

A small number of Backplane Buses tap into the Processor-Memory Bus
The Processor-Memory Bus is used for processor-memory traffic
I/O buses are connected to the Backplane Bus
Advantage: loading on the Processor-Memory Bus is greatly reduced

20
I/O System Example (Apple Mac 7200)

Typical of a midrange to high-end desktop system in 1997

[Figure: block diagram with the Processor, Cache Memory, Processor-Memory Bus, PCI Interface/Memory Controller, Main Memory, PCI bus, I/O Controllers, SCSI bus, Serial ports, Audio I/O, and the devices CDRom, Disk, Tape, Graphic Terminal, and Network]
21
Example Pentium System Organization
[Figure labels: Processor-Memory Bus, Memory controller (Northbridge), PCI Bus, I/O Busses]
http://developer.intel.com/design/chipsets/850/animate.htm?iidPCGdevside
22
Synchronous and Asynchronous Buses

Synchronous Bus
Includes a clock in the control lines
A fixed protocol for communication that is relative to the clock
Advantage: involves very little logic and can run very fast
Disadvantages
Every device on the bus must run at the same clock rate
To avoid clock skew, synchronous buses cannot be long if they are fast
Asynchronous Bus
It is not clocked, so it requires a handshaking protocol (req, ack)
Implemented with additional control lines
Advantages
Can accommodate a wide range of devices
Can be lengthened without worrying about clock skew or synchronization problems
Disadvantage: slow(er)

23
Asynchronous Handshaking Protocol

Output (read): data moves from memory to an I/O device.

I/O device signals a request by raising ReadReq and putting the addr on the data lines
Memory sees ReadReq, reads addr from data lines, and raises Ack
I/O device sees Ack and releases the ReadReq and data lines
Memory sees ReadReq go low and drops Ack
When memory has data ready, it places it on data lines and raises DataRdy
I/O device sees DataRdy, reads the data from data lines, and raises Ack
Memory sees Ack, releases the data lines, and drops DataRdy
I/O device sees DataRdy go low and drops Ack
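
The handshake steps can be traced in software. The sketch below is only a sequential model of the signal changes listed above, not real concurrent hardware; the dictionary-based "bus", the `async_read` name, and the log format are assumptions made for this illustration.

```python
def async_read(memory, addr):
    """Trace one asynchronous read; returns (data, log of bus states)."""
    lines = {"ReadReq": 0, "Ack": 0, "DataRdy": 0, "Data": None}
    log = []

    def step(actor, action, **changes):
        # Apply the signal changes and record a snapshot of the bus.
        lines.update(changes)
        log.append((actor, action, dict(lines)))

    step("device", "raise ReadReq, put addr on data lines", ReadReq=1, Data=addr)
    latched_addr = lines["Data"]           # memory reads addr from the data lines
    step("memory", "read addr, raise Ack", Ack=1)
    step("device", "release ReadReq and data lines", ReadReq=0, Data=None)
    step("memory", "see ReadReq low, drop Ack", Ack=0)
    step("memory", "place data on data lines, raise DataRdy",
         Data=memory[latched_addr], DataRdy=1)
    received = lines["Data"]               # device reads the data
    step("device", "read data, raise Ack", Ack=1)
    step("memory", "release data lines, drop DataRdy", Data=None, DataRdy=0)
    step("device", "see DataRdy low, drop Ack", Ack=0)
    return received, log

data, trace = async_read({0x10: 0xAB}, 0x10)
for actor, action, state in trace:
    print(f"{actor:6s} {action:45s} {state}")
```

Every control line returns to 0 at the end of the trace, which is what lets the next transaction start from a known state.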

24
Key Characteristics of Two Bus Standards
Characteristic | Firewire (1394) | USB 2.0
Type | I/O | I/O
Data bus width (signals) | 4 | 2
Clocking | asynchronous | asynchronous
Theoretical peak bandwidth | 50 MB/sec (Firewire 400) or 100 MB/sec (Firewire 800) | 0.2 MB/sec (low speed), 1.5 MB/sec (full speed), or 60 MB/sec (high speed)
Hot pluggable | yes | yes
Max. devices | 63 | 127
Max. length (copper wire) | 4.5 meters | 5 meters
25
Review: Major Components of a Computer
[Figure: Processor (Control, Datapath), Memory, and Devices (Input, Output)]
26
A Typical Memory Hierarchy

By taking advantage of the principle of locality
Present the user with as much memory as is
available in the cheapest technology.
Provide access at the speed offered by the
fastest technology.

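[Figure: components shown are the On-Chip Components (Control, Datapath, RegFile, ITLB, DTLB, Instr Cache, Data Cache), eDRAM, Second Level Cache (SRAM), Main Memory (DRAM), and Secondary Memory (Disk). Moving away from the processor, speed (ns) goes from .1s through 1s, 10s, and 100s to 1,000s; size (bytes) goes from 100s through Ks, 10Ks, and Ms to Ts; cost goes from highest to lowest.]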
27
Characteristics of the Memory Hierarchy
[Figure: Processor, L1, L2, Main Memory, Secondary Memory, in order of increasing distance from the processor in access time; the (relative) size of the memory at each level is shown]
28
Memory Hierarchy Technologies

Random Access
"Random" is good: access time is the same for all locations
DRAM: Dynamic Random Access Memory
High density (1-transistor cells), low power, cheap, slow
Dynamic: needs to be refreshed regularly (about every 8 ms)
SRAM: Static Random Access Memory
Low density (6-transistor cells), high power, expensive, fast
Static: content will last "forever" (until power is turned off)
Size: DRAM/SRAM = 4 to 8
Cost/cycle time: SRAM/DRAM = 8 to 16
Not-so-random Access Technology
Access time varies from location to location and from time to time (e.g., disk, CDROM)

29
Classical SRAM Organization (Square)
[Figure: a row decoder (driven by the row address) beside the RAM Cell Array, with the Column Selector and I/O Circuits (driven by the column address) producing the data word]
One memory row holds a block of data, so the column address selects the requested word from that block
30
Classical DRAM Organization (Square Planes)
[Figure: stacked square planes, each with a row decoder (driven by the row address), word (row) select lines, bit (data) lines, and a Column Selector with I/O Circuits (driven by the column address); each intersection represents a 1-T DRAM cell, and each plane supplies one data bit of the data word]
The column address selects the requested bit from the row in each plane
31
RAM Memory Definitions

Caches use SRAM for speed
Main Memory is DRAM for density
Addresses are divided into 2 halves (row and column)
RAS or Row Access Strobe: triggers the row decoder
CAS or Column Access Strobe: triggers the column selector
Performance of Main Memory DRAMs
Latency: time to access one word
Access Time: time between the request and when the word arrives
Cycle Time: time between requests
Usually cycle time > access time
Bandwidth: how much data can be supplied per unit time
width of the data channel × the rate at which it can be used
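
To make the row/column split concrete, here is a small sketch that divides an address into two halves. It assumes the upper half is the row address (sent with RAS) and the lower half is the column address (sent with CAS), and that the address has an even number of bits; the slides do not say which half is which, so treat that ordering as an assumption.

```python
def split_address(addr, addr_bits):
    """Split an address into (row, column) halves.

    Assumes addr_bits is even, the upper half is the row address,
    and the lower half is the column address.
    """
    half = addr_bits // 2
    row = addr >> half
    col = addr & ((1 << half) - 1)
    return row, col

# An 8-bit address 1011 0110 splits into row 1011 and column 0110.
row, col = split_address(0b1011_0110, 8)  # (11, 6)
```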

32
Classical DRAM Operation
DRAM Organization
N rows x N columns x M bits
Read or write M bits at a time
Each M-bit access requires a RAS / CAS cycle

[Figure: a DRAM with N rows, Row Address and Column Address inputs, and an M-bit output. Timing diagram: in each cycle time a Row Address and then a Col Address are presented, once for the 1st M-bit access and again for the 2nd M-bit access.]
33
Ways to Improve DRAM Performance

Memory interleaving
Fast Page Mode DRAMs (FPM DRAMs)
www.usa.samsungsemi.com/products/newsummary/asyncdram/K4F661612D.htm
Extended Data Out DRAMs (EDO DRAMs)
www.chips.ibm.com/products/memory/88H2011/88H2011.pdf
Synchronous DRAMs (SDRAMs)
www.usa.samsungsemi.com/products/newsummary/sdramcomp/K4S641632D.htm
Rambus DRAMs
www.rambus.com/developer/quickfind_documents.html
www.usa.samsungsemi.com/products/newsummary/rambuscomp/K4R271669B.htm
Double Data Rate DRAMs (DDR DRAMs)
www.usa.samsungsemi.com/products/newsummary/ddrsyncdram/K4D62323HA.htm
. . .

34
Increasing Bandwidth - Interleaving
[Figure: access pattern without interleaving. The CPU starts the access for D1, D1 becomes available after the access time, and the access for D2 starts only after the full cycle time.]
[Figure: access pattern with 4-way interleaving.]
35
Problems with Interleaving

How many banks?
Ideally, the number of banks ≥ the number of clocks we have to wait to access the next word in the bank
Only works for sequential accesses (i.e., first word requested in first bank, second word requested in second bank, etc.)
Increasing DRAM sizes => fewer chips => harder to have banks
Growth in bits/chip for DRAM: 50%-60%/yr
Can only be used for very large memory systems (e.g., those encountered in supercomputer systems)
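
The "how many banks" rule can be checked with a toy timing model. The sketch below assumes word i lives in bank i mod (number of banks), each bank stays busy for a fixed number of clocks per access, and one new access can be issued per clock; these simplifications and all names are assumptions for illustration only.

```python
def sequential_access_clocks(num_words, num_banks, bank_cycle_clocks):
    """Clocks until the last of num_words sequential accesses completes."""
    bank_free_at = [0] * num_banks
    clock = 0
    for word in range(num_words):
        bank = word % num_banks                    # low-order interleaving
        start = max(clock, bank_free_at[bank])     # wait if the bank is busy
        bank_free_at[bank] = start + bank_cycle_clocks
        clock = start + 1                          # next issue one clock later
    return max(bank_free_at)

# Eight sequential words, each bank busy for 4 clocks per access.
no_interleave = sequential_access_clocks(8, 1, 4)  # 32 clocks
two_banks = sequential_access_clocks(8, 2, 4)      # 17 clocks
four_banks = sequential_access_clocks(8, 4, 4)     # 11 clocks
```

In this model, once the number of banks reaches the bank cycle time in clocks, a sequential stream never stalls on a busy bank, which matches the rule of thumb above.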

36
Fast Page Mode DRAM Operation
Fast Page Mode DRAM
N x M SRAM to save a row
After a row is read into the SRAM register
Only CAS is needed to access other M-bit blocks on that row
RAS remains asserted while CAS is toggled

[Figure: a DRAM with N rows and N cols, Row Address and Column Address inputs, an N x M SRAM row register, and an M-bit output]
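
A back-of-the-envelope comparison shows why keeping the row in the SRAM register helps. The sketch assumes every access hits the same row and charges a fixed number of clocks for the RAS and CAS phases; the clock counts are made-up values, not datasheet figures.

```python
def classical_dram_clocks(num_accesses, ras_clocks, cas_clocks):
    """Every M-bit access pays a full RAS / CAS cycle."""
    return num_accesses * (ras_clocks + cas_clocks)

def fast_page_mode_clocks(num_accesses, ras_clocks, cas_clocks):
    """One RAS to load the row, then only CAS per access on that row."""
    if num_accesses == 0:
        return 0
    return ras_clocks + num_accesses * cas_clocks

# Four accesses to the same row, with hypothetical 3-clock RAS and 2-clock CAS.
classical = classical_dram_clocks(4, 3, 2)   # 20 clocks
fast_page = fast_page_mode_clocks(4, 3, 2)   # 11 clocks
```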
37
Why Care About the Memory Hierarchy?
[Figure: Processor-DRAM Memory Gap. Performance (1 to 1000) versus time (1980 to 2000) for the CPU (Moore's Law) and DRAM; the processor-memory performance gap grows 50% per year.]
38
Memory Hierarchy Goals

Fact: large memories are slow, fast memories are small
How do we create a memory that gives the illusion of being large, cheap, and fast (most of the time)?
By taking advantage of
The Principle of Locality: programs access a relatively small portion of the address space at any instant of time.

39
Memory Hierarchy Why Does it Work?

Temporal Locality (Locality in Time)
=> Keep the most recently accessed data items closer to the processor
Spatial Locality (Locality in Space)
=> Move blocks consisting of contiguous words to the upper levels

[Figure: Upper Level Memory holding Blk X exchanges data to and from the processor; Lower Level Memory holds Blk Y]
40
Memory Hierarchy Terminology

Hit: data appears in some block in the upper level (Block X)
Hit Rate: the fraction of memory accesses found in the upper level
Hit Time: time to access the upper level, which consists of
RAM access time + time to determine hit/miss
Miss: data needs to be retrieved from a block in the lower level (Block Y)
Miss Rate = 1 - (Hit Rate)
Miss Penalty: time to replace a block in the upper level + time to deliver the block to the processor
Hit Time << Miss Penalty
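
These terms are commonly combined into an average access time: hit time plus miss rate times miss penalty. That formula is standard but is not stated on this slide, so the sketch below is an added illustration, and the 95% hit rate, 1-clock hit time, and 100-clock miss penalty are made-up values.

```python
def average_access_time(hit_time, hit_rate, miss_penalty):
    """Average access time = hit time + miss rate * miss penalty."""
    miss_rate = 1.0 - hit_rate          # Miss Rate = 1 - (Hit Rate)
    return hit_time + miss_rate * miss_penalty

# Hypothetical upper level: 1-clock hit time, 95% hit rate, 100-clock miss penalty.
avg_clocks = average_access_time(1.0, 0.95, 100.0)  # about 6 clocks
```

Even a 5% miss rate makes the average several times the hit time here, because the miss penalty is so much larger than the hit time.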

41
How is the Hierarchy Managed?

registers <-> memory
by the compiler (programmer?)
cache <-> main memory
by the hardware
main memory <-> disks
by the hardware and operating system (virtual memory)
by the programmer (files)

42
Summary

DRAM is slow but cheap and dense
Good choice for presenting the user with a BIG memory system
SRAM is fast but expensive and not very dense
Good choice for providing the user FAST access time
Two different types of locality
Temporal Locality (Locality in Time): if an item is referenced, it will tend to be referenced again soon.
Spatial Locality (Locality in Space): if an item is referenced, items whose addresses are close by tend to be referenced soon.
By taking advantage of the principle of locality
Present the user with as much memory as is available in the cheapest technology.
Provide access at the speed offered by the fastest technology.