Transcript and Presenter's Notes

Title: Chapter 8: Part II


1
Chapter 8 Part II
  • Storage, Network and Other Peripherals

2
Performance Analysis Sync. vs. Async.
  • Synchronous bus: clock time = 50 ns, each
    transaction takes one clock cycle
  • Asynchronous bus: 40 ns per handshake
  • Data portion: 32 bits
  • Question: Find the bandwidth of each bus when
    performing one-word reads from a 200 ns memory.

3
Sync. vs. Async. Buses (I)
  • For the synchronous bus:
  • Send the address to memory: 50 ns
  • Read the memory: 200 ns
  • Send the data to the device: 50 ns
  • Total time = 300 ns; bandwidth = 4 bytes / 300 ns
    = 13.3 MB/s

4
Sync. vs. Async. Buses (II)
  • For the asynchronous bus:
  • Step 1: 40 ns
  • Steps 2, 3, 4: max(3 × 40 ns, 200 ns) = 200 ns
  • Steps 5, 6, 7: 3 × 40 ns = 120 ns
  • Total time = 360 ns; maximum bandwidth
    = 4 bytes / 360 ns = 11.1 MB/s
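
A minimal Python sketch that checks the two bandwidth figures above, using only the timings given on these slides (MB taken as 10^6 bytes):

    # Synchronous bus: address (50 ns) + memory read (200 ns) + data (50 ns)
    sync_time_ns = 50 + 200 + 50                    # 300 ns per one-word read
    sync_bw = 4 / (sync_time_ns * 1e-9) / 1e6       # bytes/s -> MB/s

    # Asynchronous bus: step 1, then steps 2-4 overlapped with the 200 ns read
    async_time_ns = 40 + max(3 * 40, 200) + 3 * 40  # 360 ns per one-word read
    async_bw = 4 / (async_time_ns * 1e-9) / 1e6

    print(f"synchronous:  {sync_bw:.1f} MB/s")      # 13.3 MB/s
    print(f"asynchronous: {async_bw:.1f} MB/s")     # 11.1 MB/s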

5
Increasing Bus Bandwidth
  • Data bus width
  • Separate versus multiplexed address and data
    lines
  • Block transfers

6
Performance Analysis of Two Bus Schemes
  • Given a system with:
  • a memory and bus system supporting block access
    of 4 to 16 words
  • a 64-bit synchronous bus clocked at 200 MHz, with
    each 64-bit transfer taking 1 clock cycle, and 1
    clock cycle to send an address to memory
  • two clock cycles needed between each bus
    operation
  • memory access for the first 4 words taking 200 ns,
    with each additional set of 4 words requiring 20 ns

7
Question
  • Find the sustained bandwidth and latency for a
    read of 256 words for transfers using 4-word
    blocks and 16-word blocks.
  • Find the effective number of bus transactions for
    each case.

8
4-Word Block Transfer
  • 1 clock cycle to send the address to memory
  • 200 ns / (5 ns/cycle) = 40 cycles to read memory
  • 2 cycles to send the data from memory
  • 2 idle cycles
  • Total: 45 cycles per transaction
  • 256 words (64 transactions) require 45 × 64
    = 2880 cycles

9
4-Word Block Transfer
  • Latency = 2880 cycles × 5 ns/cycle = 14,400 ns
  • Number of bus transactions = 64 × (1 s / 14,400 ns)
    = 4.44M transactions/s
  • Bandwidth = (256 × 4 bytes) × (1 / 14,400 ns)
    = 71.11 MB/s
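
A small Python check of the 4-word-block numbers on the two slides above (the 5 ns cycle time comes from the 200 MHz bus clock):

    CYCLE_NS = 5                                    # 200 MHz bus clock
    cycles_per_txn = 1 + 40 + 2 + 2                 # address + read + send + idle = 45
    txns = 256 // 4                                 # 64 four-word transactions
    latency_ns = cycles_per_txn * txns * CYCLE_NS   # 2880 cycles -> 14,400 ns

    txn_rate = txns / (latency_ns * 1e-9)           # ~4.44e6 transactions/s
    bandwidth = 256 * 4 / (latency_ns * 1e-9)       # bytes/s, ~71.11 MB/s
    print(latency_ns, f"{txn_rate / 1e6:.2f}M txn/s",
          f"{bandwidth / 1e6:.2f} MB/s")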

10
16-Word Block Transfer
  • 1 clock cycle to send address to memory
  • 40 cycles to read first 4 words from memory
  • 2 cycles to send data, during which the read of
    the next 4 words is started.
  • 2 idle cycles between transfers, during which the
    read of the next block is completed.
  • Need to repeat the last two steps 3 times to read
    a total of 16 words.

11
16-Word Block Transfer
  • Total cycles required = 1 + 40 + 4 × (2 + 2)
    = 57 cycles
  • 256 / 16 = 16 transactions are required
  • Total number of cycles required for 256 words
    = 16 × 57 = 912 cycles; latency = 4560 ns
  • Number of bus transactions = 16 × (1 s / 4560 ns)
    = 3.51M transactions/s
  • Bandwidth = (256 × 4 bytes) × (1 / 4560 ns)
    = 224.56 MB/s
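
The same check for 16-word blocks, written to mirror the "repeat the last two steps" description on slide 10 above:

    CYCLE_NS = 5
    cycles = 1 + 40                      # address + read of the first 4 words
    for _ in range(4):                   # 2 cycles to send each 4-word group plus
        cycles += 2 + 2                  # 2 idle cycles; the next read overlaps these
    assert cycles == 57

    txns = 256 // 16                     # 16 transactions for 256 words
    latency_ns = cycles * txns * CYCLE_NS                      # 912 cycles -> 4560 ns
    print(latency_ns,
          f"{txns / (latency_ns * 1e-9) / 1e6:.2f}M txn/s",    # ~3.51M transactions/s
          f"{256 * 4 / (latency_ns * 1e-9) / 1e6:.2f} MB/s")   # ~224.56 MB/s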

12
Bus Arbitration
  • Daisy chain arbitration (not very fair)
  • Centralized arbitration (requires an arbiter),
    e.g., PCI
  • Self selection, e.g., NuBus used in Macintosh
  • Collision detection, e.g., Ethernet

13
Bus Standards
  • PCI (a general-purpose backplane bus)
  • SCSI (Small Computer System Interface)
  • IEEE 1394 (Firewire)
  • USB 2.0

Characteristic     Firewire (1394)              USB 2.0
Bus width          4                            2
Clocking           asynchronous                 asynchronous
Peak bandwidth     50 MB/s (Firewire 400),      0.2 MB/s, 1.5 MB/s,
                   100 MB/s (Firewire 800)      60 MB/s
Hot pluggable      Yes                          Yes
Max. devices       63                           127
Max. bus length    4.5 m                        5 m
14
Interfacing I/O Devices
  • How is a user I/O request transformed into a
    device command and communicated to the device?
  • How is data actually transferred to or from a
    memory location?
  • What is the role of the operating system?

15
Role of the OS
  • The OS plays a major role in handling I/O, in
    that:
  • the I/O system is shared by multiple programs
    using the processor
  • I/O systems often use interrupts (which cause a
    transfer to supervisor mode)
  • low-level control of I/O is complex

16
Communications between OS and I/O Devices
  • The OS must be able to give commands to I/O.
  • The I/O device must be able to notify the OS when
    an operation has completed or an error has occurred.
  • Data must be transferred between memory and an
    I/O device.

17
Giving Commands to I/O
  • To give a command, the processor must be able to
    address the device and to supply command words
  • memory-mapped I/O: portions of the address space
    are assigned to I/O devices
  • special I/O instructions: dedicated I/O
    instructions in the processor

18
Communicating with the Processor
  • Polling
  • Interrupts
  • DMA

19
Polling
  • Polling: the processor periodically checks the
    status of an I/O device.
  • Overhead of polling in an I/O system:
  • Example 1: mouse
  • Example 2: floppy disk
  • Example 3: hard disk

20
Mouse
  • Assume the number of clock cycles for a polling
    operation, including transferring to the polling
    routine, accessing the device, and restarting the
    user program, is 400, with a 500 MHz clock.
  • The mouse must be polled 30 times a second to
    ensure that no user movement is missed.
  • Fraction of CPU time = 30 × 400 / (500 × 10^6)
    = 0.002%

21
Floppy Disk
  • The floppy disk transfers data to the processor
    in 16-bit units and has a data rate of 50 KB/s.
  • Polling rate = (50 KB/s) / (2 bytes/poll)
    = 25K polls/sec
  • Fraction of CPU time = 25K × 400 / (500 × 10^6)
    = 2%

22
Hard Disk
  • Transfers in 4-word blocks
  • Transfer rate = 4 MB/s
  • Polling rate = (4 MB/s) / (4 × 4 bytes/poll)
    = 250K polls/sec
  • Fraction of CPU time = 250K × 400 / (500 × 10^6)
    = 20%
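
A minimal sketch that reproduces all three polling overheads above (mouse, floppy disk, hard disk) from the 400-cycle polling cost and 500 MHz clock given on the mouse slide:

    CLOCK_HZ = 500e6
    POLL_CYCLES = 400

    def cpu_fraction(polls_per_sec):
        """Fraction of CPU time spent polling at the given rate."""
        return polls_per_sec * POLL_CYCLES / CLOCK_HZ

    devices = {
        "mouse":  cpu_fraction(30),           # 30 polls/s         -> ~0.002%
        "floppy": cpu_fraction(50e3 / 2),     # 50 KB/s, 2 B/poll  -> 2%
        "disk":   cpu_fraction(4e6 / 16),     # 4 MB/s, 16 B/poll  -> 20%
    }
    for name, frac in devices.items():
        print(f"{name}: {frac * 100:.4g}% of the CPU")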

23
Overhead of Polling
  • Polling can be done only when the device is
    active, which reduces the overhead.
  • However, the overhead is still significant, which
    motivates another design: interrupt-driven I/O.

24
Overhead of Interrupt-Driven I/O
  • Assume the overhead for each transfer, including
    the interrupt, is 500 cycles.
  • Cycles per second for the disk = 250K × 500
    = 125 × 10^6 cycles/s
  • Fraction of processor consumed
    = 125 × 10^6 / (500 × 10^6) = 25%
  • Assuming the disk is transferring data 5% of the
    time, fraction of CPU on average = 25% × 5%
    = 1.25%
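
A corresponding check for interrupt-driven I/O on the hard disk (the 250K transfers/s rate carries over from the polling example):

    CLOCK_HZ = 500e6
    INT_CYCLES = 500                      # overhead per transfer, including the interrupt
    transfers_per_sec = 4e6 / 16          # 4 MB/s in 16-byte blocks = 250K transfers/s

    busy = transfers_per_sec * INT_CYCLES / CLOCK_HZ   # 0.25 while transferring
    average = busy * 0.05                              # disk active only 5% of the time
    print(f"{busy:.0%} of the CPU while transferring, {average:.2%} on average")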

25
Direct Memory Access (DMA)
  • If the disk is transferring data most of the
    time, the overhead of interrupt-driven I/O is
    still high.
  • For a high-bandwidth device, let the device
    controller transfer data directly to or from
    memory without involving the processor; this is
    known as direct memory access.
  • An interrupt is used to signal the completion of
    the I/O transfer or an error.
  • Note: how does DMA affect cache design?

26
Overhead of I/O Using DMA
  • Assume the initial setup of a DMA transfer takes
    1000 cycles, handling the interrupt at DMA
    completion takes 500 cycles, and the average
    transfer from disk is 8 KB.
  • Each DMA transfer takes 8 KB / (4 MB/s)
    = 2 × 10^-3 s
  • If the disk is constantly transferring data, it
    requires (1000 + 500) / (2 × 10^-3 s)
    = 750 × 10^3 cycles/s
  • Fraction of CPU time = 750 × 10^3 / (500 × 10^6)
    = 0.15%
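
And the same arithmetic for DMA, assuming (as the slide does) that the disk transfers continuously:

    CLOCK_HZ = 500e6
    transfer_time_s = 8e3 / 4e6                           # 8 KB at 4 MB/s = 2 ms per DMA transfer
    cycles_per_sec = (1000 + 500) / transfer_time_s       # setup + completion interrupt = 750K cycles/s
    print(f"{cycles_per_sec / CLOCK_HZ:.2%} of the CPU")  # 0.15%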

27
I/O System Design
  • Latency constraints: ensuring the latency to
    complete an I/O operation is bounded
  • Bandwidth constraints
  • Performance analysis techniques: queuing theory,
    simulation, analysis

28
I/O System Design- Example
  • CPU: 3 BIPS, with an average of 100,000 OS
    instructions per I/O operation
  • Backplane bus: transfer rate = 1000 MB/s
  • SCSI-Ultra 320 controllers: transfer rate
    = 320 MB/s, each accommodating up to 7 disks
  • Disks: bandwidth = 75 MB/s, seek + rotational
    latency = 6 ms
  • Workload: 64-KB reads, with the user program
    needing 200,000 instructions per I/O

29
Example
  • Find:
  • the maximum sustainable I/O rate
  • the number of disks and SCSI controllers required
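
The slides stop at the question, so the following is only a sketch of one way to work it out from the parameters on the previous slide; it treats KB/MB as decimal units, ignores queuing effects, and assumes the smallest resource limit is the bottleneck:

    import math

    # CPU limit: 3 BIPS with 200,000 user + 100,000 OS instructions per I/O
    cpu_ios = 3e9 / (200e3 + 100e3)             # 10,000 I/Os per second

    # Backplane bus limit: 1000 MB/s moving 64 KB per I/O
    bus_ios = 1000e6 / 64e3                     # ~15,625 I/Os per second

    # Per-disk rate: 6 ms seek + rotation plus 64 KB at 75 MB/s
    disk_time = 6e-3 + 64e3 / 75e6              # ~6.85 ms per I/O
    disk_ios = 1 / disk_time                    # ~146 I/Os per second per disk

    max_ios = min(cpu_ios, bus_ios)             # CPU-limited: 10,000 I/Os per second
    disks = math.ceil(max_ios / disk_ios)       # ~69 disks to sustain that rate
    controllers = math.ceil(disks / 7)          # 7 disks per SCSI controller -> 10
    # Controller bandwidth check: 7 disks x ~146 I/O/s x 64 KB ~ 65 MB/s < 320 MB/s
    print(max_ios, disks, controllers)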

30
Real Stuff: Buses and Networks of the P4
31
Intel P4 I/O Chip Sets
32
A Digital Camera
33
SoC (System on a chip)