Computer Architecture

1
Computer Architecture and Related Topics
  • Ben Schrooten
  • Shawn Borchardt, Eddie Willett
  • Vandana Chopra

2
Presentation Topics
  • Computer Architecture History
  • Single CPU Design
  • GPU Design (Brief)
  • Memory Architecture
  • Communications Architecture
  • Dual Processor Design
  • Parallel Supercomputing Design

3
Part 1: History and Single CPU
  • Ben Schrooten

4
HISTORY!!!
One of the first computing devices to come about
was . . . the ABACUS!
5
The ENIAC (1946)
  • Completed: 1946
  • Programmed: plug board and switches
  • Speed: 5,000 operations per second
  • Input/output: cards, lights, switches, plugs
  • Floor space: 1,000 square feet

6
The EDSAC (1949) and the UNIVAC I (1951)

UNIVAC I
  • Speed: 1,905 operations per second
  • Input/output: magnetic tape, unityper, printer
  • Memory size: 1,000 12-digit words in delay lines
  • Memory type: delay lines, magnetic tape
  • Technology: serial vacuum tubes, delay lines, magnetic tape
  • Floor space: 943 cubic feet
  • Cost: F.O.B. factory $750,000 plus $185,000 for a high-speed printer

EDSAC
  • Technology: vacuum tubes
  • Memory: 1K words
  • Speed: 714 operations per second
  • First practical stored-program computer
7
Progression of the Architecture
(Pictured: Intel 4004, 1971)
  • Vacuum tubes -- 1940-1950
  • Transistors -- 1950-1964
  • Integrated circuits -- 1964-1971
  • Microprocessor chips -- 1971-present
8
Current CPU Architecture
9
  • Basic CPU Overview

10
Single Bus: Slow Performance
11
Example of Triple Bus Architecture
12
(No Transcript)
13
(No Transcript)
14
(No Transcript)
15
(No Transcript)
16
Motherboards / Chipsets / Sockets
OH MY!
  • Chipset
  • In charge of:
  • Memory Controller
  • EIDE Controller
  • PCI Bridge
  • Real Time Clock
  • DMA Controller
  • IrDA Controller
  • Keyboard
  • Mouse
  • Secondary Cache
  • Low-Power CMOS SRAM

17
Sockets
  • Socket 4 and 5
  • Socket 7
  • Socket 8
  • Slot 1
  • Slot A

18
(No Transcript)
19
(No Transcript)
20
GPUs
  • Allow real-time rendering of graphics on a
    small PC
  • GPUs are true processing units
  • Pentium 4 contains 42 million transistors on a
    0.18 micron process
  • GeForce3 contains 57 million transistors on a
    0.15 micron manufacturing process

21
More GPU
22
Sources
  • Memory Functionality - Dana Angluin: http://zoo.cs.yale.edu/classes/cs201/Fall_2001/handouts/lecture-13/node4.html
  • Benchmark Graphics - Digital Life: http://www.digit-life.com/articles/pentium4/index3.html
  • Chipset and Socket Information - Motherboards.org: http://www.motherboards.org/articlesd/tech-planations/17_2.html
  • AMD Processor Pictures - Tom's Hardware: http://www6.tomshardware.com/search/search.html?category=all&words=Athlon
  • GPU Info - 4th Wave Inc.: http://www.wave-report.com/tutorials/gpu.htm
  • NV20 Design Pictures - Digital Life: http://www.digit-life.com/articles/nv20/
  • Source for DX4100 Picture - Oneironaut: http://oneironaut.tripod.com/dx4100.jpg
  • Source for Computer Architecture Overview Picture: http://www.eecs.tulane.edu/courses/cpen201/slides/201Intro.pdf
  • Pictures of CPU Overview, Single Bus Architecture, Triple Bus Architecture - Roy M. Wnek, Virginia Tech, CS5515 Lecture 5: http://www.nvc.cs.vt.edu/wnek/cs5515/slide/Grad_Arch_5.PDF
  • Historical Data and Pictures - The Computer Museum History Center: http://www.computerhistory.org/
  • Intel Motherboard Diagram / Pentium 4 Picture - Intel Corporation: http://www.intel.com
  • The Abacus - Abacus-Online-Museum: http://www.hh.schule.de/metalltechnik-didaktik/users/luetjens/abakus/china/china.htm
  • Information also from Clint Fleri: http://www.geocities.com/cfleri/
23
Main Memory
24
Memory Hierarchy
25
DRAM vs. SRAM
  • DRAM is short for Dynamic Random Access Memory
  • SRAM is short for Static Random Access Memory
  • DRAM is dynamic in that, unlike SRAM, it needs to
    have its storage cells refreshed (given a new
    electronic charge) every few milliseconds.
  • SRAM does not need refreshing because each cell
    holds its value in a small flip-flop (a current
    switched in one of two directions) rather than in
    a storage cell that holds a charge in place.

26
(No Transcript)
27
Parity vs. Non-Parity
  • Parity is an error-detection scheme developed to
    notify the user of data errors. A single bit is
    added to each byte of data; this bit is used to
    check the integrity of the other 8 bits while the
    byte is moved or stored (a minimal sketch of the
    idea follows below).
  • Since memory errors are so rare, most of today's
    memory is non-parity.
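
A minimal sketch of the parity idea (illustrative Python; even parity is assumed, and the helper names are hypothetical rather than part of any real memory controller):

  def parity_bit(byte):
      # Even parity: the stored bit records whether the byte has an odd number of 1s.
      assert 0 <= byte <= 0xFF
      return bin(byte).count("1") % 2

  def store(byte):
      # Simulate storing a byte together with its parity bit.
      return byte, parity_bit(byte)

  def check(byte, stored_parity):
      # Re-compute parity on read; a mismatch signals a single-bit error.
      return parity_bit(byte) == stored_parity

  # Flip one bit "in transit" and detect it.
  data, p = store(0b10110010)
  corrupted = data ^ 0b00000100      # single-bit error
  print(check(data, p))              # True  - byte intact
  print(check(corrupted, p))         # False - error detected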

28
SIMM vs. DIMM vs. RIMM?
  • SIMM - Single In-line Memory Module
  • DIMM - Dual In-line Memory Module
  • RIMM - Rambus In-line Memory Module
  • SIMMs offer a 32-bit data path while DIMMs offer
    a 64-bit data path, so SIMMs have to be used in
    pairs on Pentium and more recent processors.
  • RIMM is one of the newer designs. Because of the
    fast data transfer rate of these modules, a heat
    spreader (an aluminum plate covering) is used on
    each module.

29
Evolution of Memory
  • 1970        RAM / DRAM     4.77 MHz
  • 1987        FPM            20 MHz
  • 1995        EDO            20 MHz
  • 1997        PC66 SDRAM     66 MHz
  • 1998        PC100 SDRAM    100 MHz
  • 1999        RDRAM          800 MHz
  • 1999/2000   PC133 SDRAM    133 MHz
  • 2000        DDR SDRAM      266 MHz
  • 2001        EDRAM          450 MHz
30
  • FPM - Fast Page Mode DRAM: traditional DRAM
  • EDO - Extended Data Output: increases the read
    cycle between memory and the CPU
  • SDRAM - Synchronous DRAM: synchronizes itself
    with the CPU bus and runs at higher clock speeds

31
  • RDRAM - Rambus DRAM: DRAM with a very high
    bandwidth (1.6 GB/s)
  • EDRAM - Enhanced DRAM: dynamic (power-refreshed)
    RAM that includes a small amount of static RAM
    (SRAM) inside a larger amount of DRAM, so that
    many memory accesses will be to the faster SRAM.
    EDRAM is sometimes used as L1 and L2 memory and,
    together with Enhanced Synchronous DRAM, is known
    as cached DRAM.

32
Read Operation
  • On a read, the CPU first tries to find the data
    in the cache; if it is not there, the cache is
    filled from main memory and the data is then
    returned to the CPU.

33
Write Operation
  • On a write, the CPU writes the information into
    both the cache and the main memory (i.e.
    write-through). A small sketch of both operations
    follows.
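
A minimal sketch of the two operations above (illustrative Python; a fill-on-miss read and a write-through write, with hypothetical class and method names rather than any particular CPU's actual policy):

  class SimpleCache:
      def __init__(self, main_memory):
          self.main_memory = main_memory   # address -> value
          self.cache = {}                  # unbounded, no eviction, for clarity

      def read(self, addr):
          if addr not in self.cache:                # miss: fill from main memory
              self.cache[addr] = self.main_memory[addr]
          return self.cache[addr]                   # hit, or the freshly filled line

      def write(self, addr, value):
          self.cache[addr] = value                  # update the cache ...
          self.main_memory[addr] = value            # ... and main memory (write-through)

  # Usage: a miss fills the cache; a write updates both copies.
  mem = {0x10: 7}
  c = SimpleCache(mem)
  print(c.read(0x10))   # miss, then returns 7
  c.write(0x10, 9)      # cache and mem now both hold 9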

34
References
  • http://www-ece.ucsd.edu/weathers/ece30/downloads/Ch7_memory(4x).pdf
  • http://home.cfl.rr.com/bjp/eric/ComputerMemory.html
  • http://aggregate.org/EE380/JEL/ch1.pdf

35
(No Transcript)
36
Defining a Bus
  • A set of parallel electrical lines that connects
    the major components of a computer, allowing data
    and control signals to be transferred from one
    connected component to any other

37
VESA - Video Electronics Standards Association
  • 32 bit bus
  • Found mostly on 486 machines
  • Relied on the 486 processor to function
  • People started to switch to the PCI bus because
    of this
  • Also known as VLB (VESA Local Bus)

38
ISA - Industry Standard Architecture
  • Very old technology
  • Bus speed: 8 MHz
  • Maximum throughput of 42.4 Mb/s
  • Very few ISA slots are found in modern machines.

39
MCA - Micro Channel Bus
  • IBM's attempt to compete with the ISA bus
  • 32 bit bus
  • Automatically configured cards (Like Plug and
    Play)
  • Not compatible with ISA

40
EISA - Extended Industry Standard Architecture
  • An attempt to compete with IBM's MCA bus
  • Ran at an 8.33 MHz clock rate
  • 32-bit slots
  • Backward compatible with ISA
  • Went the way of MCA

41
PCI - Peripheral Component Interconnect
  • Speeds up to 960 Mb/s
  • Bus speed of 33 MHz
  • 32-bit architecture (a 64-bit extension also
    exists)
  • Developed by Intel in 1993
  • Synchronous or asynchronous operation
  • PCI popularized Plug and Play
  • Runs at half of the system bus speed

42
PCI-X
  • Up to 133 MHz bus speed
  • 64-bit data path
  • 1 GB/s throughput
  • Backwards compatible with all PCI
  • Primarily developed for increased I/O demands of
    technologies such as Fibre Channel, Gigabit
    Ethernet and Ultra3 SCSI.

43
AGP - Accelerated Graphics Port
  • Essentially a high-speed PCI port
  • Capable of running at 4 times the PCI bus speed
    (133 MHz)
  • Used for high-speed 3D graphics cards
  • Considered a port, not a bus
  • Only two devices involved
  • Not expandable

44
(No Transcript)
45
IDE - Integrated Drive Electronics
  • Tons of other names: ATA, ATA/ATAPI, EIDE,
    ATA-2, Fast ATA, ATA-3, Ultra ATA, Ultra DMA
  • Good performance at a low cost
  • Most widely used interface for hard disks

46
SCSI - Small Computer System Interface ("skuzzy")
  • Capable of handling internal/external peripherals
  • Speeds anywhere from 80 to 640 Mb/s
  • Many types of SCSI

47

48
Serial Port
  • Uses DB9 or DB25 connector
  • Adheres to RS-232c spec
  • Capable of speeds up to 115 kb/s

49
USB
  • 1.0
  • hot plug-and-play
  • Full-speed USB devices signal at 12 Mb/s
  • Low-speed devices use a 1.5 Mb/s subchannel
  • Up to 127 devices chained together
  • 2.0
  • Data rate of 480 megabits per second

50
USB On-The-Go
  • For portable devices.
  • Limited host capability to communicate with
    selected other USB peripherals
  • A small USB connector to fit the mobile form
    factor

51
FireWire (a.k.a. IEEE 1394 and i.LINK)
  • High-speed serial port
  • 400 Mb/s transfer rate
  • 30 times faster than USB 1.0
  • hot plug-and-play

52
PS/2 Port
  • Mini-DIN plug with 6 pins
  • Mouse port and keyboard port
  • Developed by IBM

53
Parallel port (a.k.a. printer port)
  • Old type, plus two newer types:
  • ECP (Extended Capabilities Port)
  • EPP (Enhanced Parallel Port)
  • The newer types are ten times faster than the old
    parallel port
  • Capable of bi-directional communication.

54
Game Port
  • Uses a DB15 connector
  • Used for joystick connection to the computer

55
(No Transcript)
56
Parallel Computer Architecture
  • By
  • Vandana Chopra

57
Need for High Performance Computing
  • There is a need for tremendous computational
    capability in science, engineering, and business
  • There are applications that require gigabytes of
    memory and gigaflops of performance

58
What is a High Performance Computer
  • Definition of a High Performance computer: an
    HPC computer can solve large problems in a
    reasonable amount of time
  • Characteristics:
  • Fast computation
  • Large memory
  • High-speed interconnect
  • High-speed input/output

59
How is an HPC computer made to go fast
  • Make the sequential computation faster
  • Do more things in parallel

60
Applications
  1. Weather Prediction
  2. Aircraft and Automobile Design
  3. Artificial Intelligence
  4. Entertainment Industry
  5. Military Applications
  6. Financial Analysis
  7. Seismic Exploration
  8. Automobile Crash Testing

61
Who Makes High Performance Computers
  • SGI/Cray
  • Power Challenge Array
  • Origin-2000
  • T3D/T3E
  • HP/Convex
  • SPP-1200
  • SPP-2000
  • IBM
  • SP2
  • Tandem

62
Trends in Computer Design
  • The performance of the fastest computers has
    grown exponentially from 1945 to the present,
    averaging a factor of 10 every five years
  • The growth flattened somewhat in the 1980s but
    accelerated again as massively parallel computers
    became available

63
(No Transcript)
64
Increase in the Number of Processors
65
Real World Sequential Processes
  • Sequential processes we find in the world.
  • The passage of time is a classic example of a
    sequential process.
  • Day breaks as the sun rises in the morning.
  • Daytime has its sunlight and bright sky.
  • Dusk sees the sun setting in the horizon.
  • Nighttime descends with its moonlight, dark sky
    and stars.

66
Parallel Processes
  • Music
  • An orchestra performance, where every instrument
    plays its own part, and playing together they
    make beautiful music.

67
Parallel Features of Computers
  • Various methods available on computers for doing
    work in parallel are:
  • Computing environment
  • Operating system
  • Memory
  • Disk
  • Arithmetic

68
Computing Environment - Parallel Features
  • Using a timesharing environment
  • The computer's resources are shared among many
    users who are logged in simultaneously.
  • Your process uses the CPU for a time slice, and
    then is rolled out while another user's process
    is allowed to compute.
  • The opposite of this is to use dedicated mode
    where yours is the only job running.
  • The computer overlaps computation and I/O
  • While one process is writing to disk, the
    computer lets another process do some computation

69
Operating System - Parallel Features
  • Using the UNIX background processing facility:
  • a.out > results &
  • man etime
  • Using the UNIX cron jobs feature:
  • You submit a job that will run at a later time.
  • Then you can play tennis while the computer
    continues to work.
  • This overlaps your computer work with your
    personal time.

70
Memory - Parallel Features
  • Memory Interleaving
  • Memory is divided into multiple banks, and
    consecutive data elements are interleaved among
    them.
  • There are multiple ports to memory. When the
    data elements that are spread across the banks
    are needed, they can be accessed and fetched in
    parallel.
  • Memory interleaving increases the memory
    bandwidth (see the sketch below).
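
A minimal sketch of low-order interleaving (illustrative Python; the bank count of 4 and the simple word addressing are assumptions, not a description of any specific memory system):

  NUM_BANKS = 4

  def bank_and_offset(addr):
      # Consecutive word addresses land in consecutive banks, so a stride-1
      # stream of accesses can be serviced by all banks in parallel.
      return addr % NUM_BANKS, addr // NUM_BANKS

  # Consecutive elements 0..7 are spread round-robin across the banks.
  for addr in range(8):
      bank, offset = bank_and_offset(addr)
      print("addr %d -> bank %d, offset %d" % (addr, bank, offset))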

71
Memory - Parallel Features(Cont)
  • Multiple levels of the memory hierarchy
  • Global memory which any processor can access.
  • Memory local to a partition of the processors.
  • Memory local to a single processor
  • cache memory
  • memory elements held in registers

72
Disk - Parallel Features
  • RAID disk
  • Redundant Array of Inexpensive Disks
  • Striped disk
  • When a dataset is written to disk, it is broken
    into pieces which are written simultaneously to
    different disks in a RAID disk system.
  • When the same dataset is read back in, the pieces
    of the dataset are read in parallel, and the
    original dataset is reassembled in memory (see
    the sketch below).
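
A small sketch of striping and reassembly (illustrative Python; the stripe size and disk count are assumptions, and no parity or redundancy is modeled, so this shows striping only rather than a full RAID level):

  STRIPE = 4          # bytes per stripe unit
  NUM_DISKS = 3

  def stripe_write(data):
      # Split the dataset into stripe units and distribute them round-robin.
      disks = [[] for _ in range(NUM_DISKS)]
      for i in range(0, len(data), STRIPE):
          disks[(i // STRIPE) % NUM_DISKS].append(data[i:i + STRIPE])
      return disks

  def stripe_read(disks, length):
      # Read the stripe units back in round-robin order and reassemble.
      out = bytearray()
      unit = 0
      while len(out) < length:
          out += disks[unit % NUM_DISKS][unit // NUM_DISKS]
          unit += 1
      return bytes(out[:length])

  data = b"The quick brown fox jumps over the lazy dog"
  disks = stripe_write(data)
  assert stripe_read(disks, len(data)) == data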

73
Arithmetic - Parallel Features
  • We will examine the following features that lend
    themselves to parallel arithmetic
  • Multiple Functional Units
  • Super Scalar arithmetic
  • Instruction Pipelining

74
Parallel Machine Model
(Architectures)
  • von Neumann Computer

75
MultiComputer
  • A multicomputer comprises a number of von Neumann
    computers, or nodes, linked by an interconnection
    network
  • In an idealized network, the cost of sending a
    message between two nodes is independent of both
    node location and other network traffic, but does
    depend on message length

76
  • Locality
  • Scalability
  • Concurrency

77
Distributed Memory (MIMD)
  • MIMD means that each processor can execute a
    separate stream of instructions on its own local
    data; distributed memory means that memory is
    distributed among the processors rather than
    placed in a central location (a minimal
    message-passing sketch follows)
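
A minimal distributed-memory, message-passing sketch (illustrative Python; the standard multiprocessing module stands in for separate nodes and an interconnect, and the worker computation is arbitrary):

  from multiprocessing import Process, Pipe

  def worker(rank, local_data, conn):
      # Each process runs its own instruction stream on its own local data.
      partial = sum(x * rank for x in local_data)
      conn.send((rank, partial))       # results are shared only by explicit messages
      conn.close()

  if __name__ == "__main__":
      pipes, procs = [], []
      for rank, chunk in enumerate([[1, 2, 3], [4, 5], [6]]):
          parent, child = Pipe()
          p = Process(target=worker, args=(rank, chunk, child))
          p.start()
          pipes.append(parent)
          procs.append(p)
      results = [conn.recv() for conn in pipes]   # gather the messages
      for p in procs:
          p.join()
      print(sorted(results))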

78
  • Difference between the idealized multicomputer
    model and a real distributed-memory machine:
  • On a real machine, the cost of sending a message
    is not independent of node location and other
    network traffic

79
Examples of MIMD machines
80
MultiProcessor or Shared Memory MIMD
  • All processors share access to a common memory
    via a bus or a hierarchy of buses

81
Example for Shared Memory MIMD
  • Silicon Graphics Challenge

82
SIMD Machines
  • All processors execute the same instruction
    stream, each on a different piece of data (see
    the sketch below)
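
A small sketch of the SIMD idea (illustrative Python; NumPy is assumed to be available and stands in for lockstep hardware lanes):

  import numpy as np

  a = np.array([1.0, 2.0, 3.0, 4.0])
  b = np.array([10.0, 20.0, 30.0, 40.0])

  # Scalar-style loop: one element at a time.
  scalar_result = [float(a[i] + b[i]) for i in range(len(a))]

  # SIMD-style: the same add is applied to every element at once.
  vector_result = a + b

  print(scalar_result)     # [11.0, 22.0, 33.0, 44.0]
  print(vector_result)     # [11. 22. 33. 44.]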

83
Example of SIMD machine
  • MasPar MP

84
Use of Cache
  • Why is cache used on parallel computers?
  • The advances in memory technology aren't keeping
    up with processor innovations.
  • Memory isn't speeding up as fast as the
    processors.
  • One way to alleviate the performance gap between
    main memory and the processors is to have local
    cache.
  • The cache memory can be accessed faster than the
    main memory.
  • Cache keeps up with the fast processors, and
    keeps them busy with data.

85
Shared Memory
(Diagram: processors 1-3, each with its own cache,
connected through a network to memories 1-3.)
86
Cache Coherence
  • What is cache coherence?
  • It keeps a data element found in several caches
    current with the other copies and with the value
    in main memory (a minimal write-invalidate sketch
    follows).
  • Various cache coherence protocols are used:
  • snoopy protocol
  • directory-based protocol
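
A minimal write-invalidate snooping sketch (illustrative Python with only valid/invalid line states; real snoopy protocols such as MESI track more states, and the class names here are hypothetical):

  class Bus:
      def __init__(self):
          self.caches = []

      def broadcast_invalidate(self, writer, addr):
          # Every other cache snoops the write and drops its stale copy.
          for cache in self.caches:
              if cache is not writer:
                  cache.lines.pop(addr, None)

  class Cache:
      def __init__(self, bus, memory):
          self.lines = {}            # addr -> value (valid lines only)
          self.bus = bus
          self.memory = memory
          bus.caches.append(self)

      def read(self, addr):
          if addr not in self.lines:             # miss: fetch from memory
              self.lines[addr] = self.memory[addr]
          return self.lines[addr]

      def write(self, addr, value):
          self.bus.broadcast_invalidate(self, addr)   # invalidate other copies
          self.lines[addr] = value
          self.memory[addr] = value              # write-through, for simplicity

  memory = {0: 1}
  bus = Bus()
  c1, c2 = Cache(bus, memory), Cache(bus, memory)
  print(c1.read(0), c2.read(0))   # both caches hold 1
  c1.write(0, 42)                 # c2's copy is invalidated
  print(c2.read(0))               # 42 - re-fetched, coherent value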

87
Various Other Issues
  • Data Locality Issue
  • Distributed Memory Issue
  • Shared Memory Issue

88
Thanks