Computer Architecture

1
Computer Architecture and Related Topics
  • Ben Schrooten
  • Shawn Borchardt, Eddie Willett
  • Vandana Chopra

2
Presentation Topics
  • Computer Architecture History
  • Single CPU Design
  • GPU Design (Brief)
  • Memory Architecture
  • Communications Architecture
  • Dual Processor Design
  • Parallel Supercomputing Design

3
Part 1: History and Single CPU
  • Ben Schrooten

4
HISTORY!!!
One of the first computing devices to come about
was . . . the ABACUS!
5
The ENIAC (1946)
  • Completed: 1946
  • Programmed: plug board and switches
  • Speed: 5,000 operations per second
  • Input/output: cards, lights, switches, plugs
  • Floor space: 1,000 square feet

6
The EDSAC (1949) and the UNIVAC I (1951)

UNIVAC I
  • Speed: 1,905 operations per second
  • Input/output: magnetic tape, unityper, printer
  • Memory size: 1,000 12-digit words in delay lines
  • Memory type: delay lines, magnetic tape
  • Technology: serial vacuum tubes, delay lines, magnetic tape
  • Floor space: 943 cubic feet
  • Cost: F.O.B. factory $750,000 plus $185,000 for a high-speed printer

EDSAC
  • Technology: vacuum tubes
  • Memory: 1K words
  • Speed: 714 operations per second
  • First practical stored-program computer
7
Progression of the Architecture
(Pictured: Intel 4004, 1971)
  • Vacuum tubes -- 1940-1950
  • Transistors -- 1950-1964
  • Integrated circuits -- 1964-1971
  • Microprocessor chips -- 1971-present
8
Current CPU Architecture
9
  • Basic CPU Overview

10
Single Bus: Slow Performance
11
Example of Triple Bus Architecture
12
(No Transcript)
13
(No Transcript)
14
(No Transcript)
15
(No Transcript)
16
Motherboards / Chipsets / Sockets
OH MY!
  • Chipset
  • In charge of:
  • Memory Controller
  • EIDE Controller
  • PCI Bridge
  • Real Time Clock
  • DMA Controller
  • IrDA Controller
  • Keyboard
  • Mouse
  • Secondary Cache
  • Low-Power CMOS SRAM

17
Sockets
  • Socket 4 and 5
  • Socket 7
  • Socket 8
  • Slot 1
  • Slot A

18
(No Transcript)
19
(No Transcript)
20
GPUs
  • Allow real-time rendering of graphics on a
    small PC
  • GPUs are true processing units
  • Pentium 4 contains 42 million transistors on a
    0.18 micron process
  • GeForce3 contains 57 million transistors on a
    0.15 micron manufacturing process

21
More GPU
22
Sources
  • Memory Functionality - Dana Angluin: http://zoo.cs.yale.edu/classes/cs201/Fall_2001/handouts/lecture-13/node4.html
  • Benchmark Graphics - Digital Life: http://www.digit-life.com/articles/pentium4/index3.html
  • Chipset and Socket Information - Motherboards.org: http://www.motherboards.org/articlesd/tech-planations/17_2.html
  • AMD Processor Pictures - Tom's Hardware: http://www6.tomshardware.com/search/search.html?category=all&words=Athlon
  • GPU Info - 4th Wave Inc.: http://www.wave-report.com/tutorials/gpu.htm
  • NV20 Design Pictures - Digital Life: http://www.digit-life.com/articles/nv20/
  • Source for DX4100 Picture - Oneironaut: http://oneironaut.tripod.com/dx4100.jpg
  • Source for Computer Architecture Overview Picture: http://www.eecs.tulane.edu/courses/cpen201/slides/201Intro.pdf
  • Pictures of CPU Overview, Single Bus Architecture, Triple Bus Architecture - Roy M. Wnek, Virginia Tech, CS5515 Lecture 5: http://www.nvc.cs.vt.edu/wnek/cs5515/slide/Grad_Arch_5.PDF
  • Historical Data and Pictures - The Computer Museum History Center: http://www.computerhistory.org/
  • Intel Motherboard Diagram / Pentium 4 Picture - Intel Corporation: http://www.intel.com
  • The Abacus - Abacus-Online-Museum: http://www.hh.schule.de/metalltechnik-didaktik/users/luetjens/abakus/china/china.htm
  • Information also from Clint Fleri: http://www.geocities.com/cfleri/
23
Main Memory
24
Memory Hierarchy
25
DRAM vs. SRAM
  • DRAM is short for Dynamic Random Access Memory
  • SRAM is short for Static Random Access Memory
  • DRAM is dynamic in that, unlike SRAM, it needs to
    have its storage cells refreshed (given a new
    electronic charge) every few milliseconds.
  • SRAM does not need refreshing because each cell
    holds its value in a small flip-flop (a current
    switched in one of two directions) rather than in
    a storage cell that holds a charge in place.

26
(No Transcript)
27
Parity vs. Non-Parity
  • Parity is an error-detection scheme developed to
    notify the user of data errors. A single bit is
    added to each byte of data; this bit is used to
    check the integrity of the other 8 bits while the
    byte is moved or stored (a minimal sketch of the
    idea follows below).
  • Since memory errors are so rare, most of today's
    memory is non-parity.
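
A minimal sketch of the parity idea (illustrative Python; even parity is assumed, and the helper names are hypothetical rather than part of any real memory controller):

  def parity_bit(byte):
      # Even parity: the stored bit records whether the byte has an odd number of 1s.
      assert 0 <= byte <= 0xFF
      return bin(byte).count("1") % 2

  def store(byte):
      # Simulate storing a byte together with its parity bit.
      return byte, parity_bit(byte)

  def check(byte, stored_parity):
      # Re-compute parity on read; a mismatch signals a single-bit error.
      return parity_bit(byte) == stored_parity

  # Flip one bit "in transit" and detect it.
  data, p = store(0b10110010)
  corrupted = data ^ 0b00000100      # single-bit error
  print(check(data, p))              # True  - byte intact
  print(check(corrupted, p))         # False - error detected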

28
SIMM vs. DIMM vs. RIMM?
  • SIMM - Single In-line Memory Module
  • DIMM - Dual In-line Memory Module
  • RIMM - Rambus In-line Memory Module
  • SIMMs offer a 32-bit data path while DIMMs offer
    a 64-bit data path, so SIMMs have to be used in
    pairs on Pentium and more recent processors.
  • RIMM is one of the newer designs. Because of the
    fast data transfer rate of these modules, a heat
    spreader (an aluminum plate covering) is used on
    each module.

29
Evolution of Memory
  • 1970        RAM / DRAM     4.77 MHz
  • 1987        FPM            20 MHz
  • 1995        EDO            20 MHz
  • 1997        PC66 SDRAM     66 MHz
  • 1998        PC100 SDRAM    100 MHz
  • 1999        RDRAM          800 MHz
  • 1999/2000   PC133 SDRAM    133 MHz
  • 2000        DDR SDRAM      266 MHz
  • 2001        EDRAM          450 MHz
30
  • FPM - Fast Page Mode DRAM: traditional DRAM
  • EDO - Extended Data Output: increases the read
    cycle between memory and the CPU
  • SDRAM - Synchronous DRAM: synchronizes itself
    with the CPU bus and runs at higher clock speeds

31
  • RDRAM - Rambus DRAM: DRAM with a very high
    bandwidth (1.6 GB/s)
  • EDRAM - Enhanced DRAM: dynamic (power-refreshed)
    RAM that includes a small amount of static RAM
    (SRAM) inside a larger amount of DRAM, so that
    many memory accesses will be to the faster SRAM.
    EDRAM is sometimes used as L1 and L2 memory and,
    together with Enhanced Synchronous DRAM, is known
    as cached DRAM.

32
Read Operation
  • On a read, the CPU first tries to find the data
    in the cache; if it is not there, the cache is
    filled from main memory and the data is then
    returned to the CPU.

33
Write Operation
  • On a write, the CPU writes the information into
    both the cache and the main memory (i.e.
    write-through). A small sketch of both operations
    follows.
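
A minimal sketch of the two operations above (illustrative Python; a fill-on-miss read and a write-through write, with hypothetical class and method names rather than any particular CPU's actual policy):

  class SimpleCache:
      def __init__(self, main_memory):
          self.main_memory = main_memory   # address -> value
          self.cache = {}                  # unbounded, no eviction, for clarity

      def read(self, addr):
          if addr not in self.cache:                # miss: fill from main memory
              self.cache[addr] = self.main_memory[addr]
          return self.cache[addr]                   # hit, or the freshly filled line

      def write(self, addr, value):
          self.cache[addr] = value                  # update the cache ...
          self.main_memory[addr] = value            # ... and main memory (write-through)

  # Usage: a miss fills the cache; a write updates both copies.
  mem = {0x10: 7}
  c = SimpleCache(mem)
  print(c.read(0x10))   # miss, then returns 7
  c.write(0x10, 9)      # cache and mem now both hold 9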

34
References
  • http://www-ece.ucsd.edu/weathers/ece30/downloads/Ch7_memory(4x).pdf
  • http://home.cfl.rr.com/bjp/eric/ComputerMemory.html
  • http://aggregate.org/EE380/JEL/ch1.pdf

35
(No Transcript)
36
Defining a Bus
  • A set of parallel electrical lines that connects
    the major components of a computer, allowing data
    and control signals to be transferred from one
    connected component to any other

37
VESA - Video Electronics Standards Association
  • 32 bit bus
  • Found mostly on 486 machines
  • Relied on the 486 processor to function
  • People started to switch to the PCI bus because
    of this
  • Also known as VLB (VESA Local Bus)

38
ISA - Industry Standard Architecture
  • Very old technology
  • Bus speed: 8 MHz
  • Maximum throughput of 42.4 Mb/s
  • Very few ISA slots are found in modern machines.

39
MCA - Micro Channel Bus
  • IBM's attempt to compete with the ISA bus
  • 32 bit bus
  • Automatically configured cards (Like Plug and
    Play)
  • Not compatible with ISA

40
EISA - Extended Industry Standard Architecture
  • An attempt to compete with IBM's MCA bus
  • Ran at an 8.33 MHz clock rate
  • 32-bit slots
  • Backward compatible with ISA
  • Went the way of MCA

41
PCI - Peripheral Component Interconnect
  • Speeds up to 960 Mb/s
  • Bus speed of 33 MHz
  • 32-bit architecture (a 64-bit extension also
    exists)
  • Developed by Intel in 1993
  • Synchronous or asynchronous operation
  • PCI popularized Plug and Play
  • Runs at half of the system bus speed

42
PCI-X
  • Up to 133 MHz bus speed
  • 64-bit data path
  • 1 GB/s throughput
  • Backwards compatible with all PCI
  • Primarily developed for increased I/O demands of
    technologies such as Fibre Channel, Gigabit
    Ethernet and Ultra3 SCSI.

43
AGP - Accelerated Graphics Port
  • Essentially a high-speed PCI port
  • Capable of running at 4 times the PCI bus speed
    (133 MHz)
  • Used for high-speed 3D graphics cards
  • Considered a port, not a bus
  • Only two devices involved
  • Not expandable

44
(No Transcript)
45
IDE - Integrated Drive Electronics
  • Tons of other names: ATA, ATA/ATAPI, EIDE,
    ATA-2, Fast ATA, ATA-3, Ultra ATA, Ultra DMA
  • Good performance at a low cost
  • Most widely used interface for hard disks

46
SCSI - Small Computer System Interface ("skuzzy")
  • Capable of handling internal/external peripherals
  • Speeds anywhere from 80 to 640 Mb/s
  • Many types of SCSI

47

48
Serial Port
  • Uses DB9 or DB25 connector
  • Adheres to RS-232c spec
  • Capable of speeds up to 115 kb/s

49
USB
  • 1.0
  • hot plug-and-play
  • Full-speed USB devices signal at 12 Mb/s
  • Low-speed devices use a 1.5 Mb/s subchannel
  • Up to 127 devices chained together
  • 2.0
  • Data rate of 480 megabits per second

50
USB On-The-Go
  • For portable devices.
  • Limited host capability to communicate with
    selected other USB peripherals
  • A small USB connector to fit the mobile form
    factor

51
FireWire (a.k.a. IEEE 1394 and i.LINK)
  • High-speed serial port
  • 400 Mb/s transfer rate
  • 30 times faster than USB 1.0
  • hot plug-and-play

52
PS/2 Port
  • Mini-DIN plug with 6 pins
  • Mouse port and keyboard port
  • Developed by IBM

53
Parallel port (a.k.a. printer port)
  • Old type, plus two newer types:
  • ECP (Extended Capabilities Port)
  • EPP (Enhanced Parallel Port)
  • The newer types are ten times faster than the old
    parallel port
  • Capable of bi-directional communication.

54
Game Port
  • Uses a DB15 connector
  • Used for joystick connection to the computer

55
(No Transcript)
56
Parallel Computer Architecture
  • By
  • Vandana Chopra

57
Need for High Performance Computing
  • There is a need for tremendous computational
    capability in science, engineering, and business
  • There are applications that require gigabytes of
    memory and gigaflops of performance

58
What is a High Performance Computer
  • Definition of a High Performance computer: an
    HPC computer can solve large problems in a
    reasonable amount of time
  • Characteristics:
  • Fast computation
  • Large memory
  • High-speed interconnect
  • High-speed input/output

59
How is an HPC computer made to go fast
  • Make the sequential computation faster
  • Do more things in parallel

60
Applications
  1. Weather Prediction
  2. Aircraft and Automobile Design
  3. Artificial Intelligence
  4. Entertainment Industry
  5. Military Applications
  6. Financial Analysis
  7. Seismic Exploration
  8. Automobile Crash Testing

61
Who Makes High Performance Computers
  • SGI/Cray
  • Power Challenge Array
  • Origin-2000
  • T3D/T3E
  • HP/Convex
  • SPP-1200
  • SPP-2000
  • IBM
  • SP2
  • Tandem

62
Trends in Computer Design
  • The performance of the fastest computers has
    grown exponentially from 1945 to the present,
    averaging a factor of 10 every five years
  • The growth flattened somewhat in the 1980s but
    accelerated again as massively parallel computers
    became available

63
(No Transcript)
64
Increase in the Number of Processors
65
Real World Sequential Processes
  • Sequential processes we find in the world.
  • The passage of time is a classic example of a
    sequential process.
  • Day breaks as the sun rises in the morning.
  • Daytime has its sunlight and bright sky.
  • Dusk sees the sun setting in the horizon.
  • Nighttime descends with its moonlight, dark sky
    and stars.

66
Parallel Processes
  • Music
  • An orchestra performance, where every instrument
    plays its own part, and playing together they
    make beautiful music.

67
Parallel Features of Computers
  • Various methods available on computers for doing
    work in parallel are:
  • Computing environment
  • Operating system
  • Memory
  • Disk
  • Arithmetic

68
Computing Environment - Parallel Features
  • Using a timesharing environment
  • The computer's resources are shared among many
    users who are logged in simultaneously.
  • Your process uses the CPU for a time slice, and
    then is rolled out while another user's process
    is allowed to compute.
  • The opposite of this is to use dedicated mode
    where yours is the only job running.
  • The computer overlaps computation and I/O
  • While one process is writing to disk, the
    computer lets another process do some computation

69
Operating System - Parallel Features
  • Using the UNIX background processing facility:
  • a.out > results &
  • man etime
  • Using the UNIX cron jobs feature:
  • You submit a job that will run at a later time.
  • Then you can play tennis while the computer
    continues to work.
  • This overlaps your computer work with your
    personal time.

70
Memory - Parallel Features
  • Memory Interleaving
  • Memory is divided into multiple banks, and
    consecutive data elements are interleaved among
    them.
  • There are multiple ports to memory. When the
    data elements that are spread across the banks
    are needed, they can be accessed and fetched in
    parallel.
  • Memory interleaving increases the memory
    bandwidth (see the sketch below).
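
A minimal sketch of low-order interleaving (illustrative Python; the bank count of 4 and the simple word addressing are assumptions, not a description of any specific memory system):

  NUM_BANKS = 4

  def bank_and_offset(addr):
      # Consecutive word addresses land in consecutive banks, so a stride-1
      # stream of accesses can be serviced by all banks in parallel.
      return addr % NUM_BANKS, addr // NUM_BANKS

  # Consecutive elements 0..7 are spread round-robin across the banks.
  for addr in range(8):
      bank, offset = bank_and_offset(addr)
      print("addr %d -> bank %d, offset %d" % (addr, bank, offset))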

71
Memory - Parallel Features(Cont)
  • Multiple levels of the memory hierarchy
  • Global memory which any processor can access.
  • Memory local to a partition of the processors.
  • Memory local to a single processor
  • cache memory
  • memory elements held in registers

72
Disk - Parallel Features
  • RAID disk
  • Redundant Array of Inexpensive Disks
  • Striped disk
  • When a dataset is written to disk, it is broken
    into pieces which are written simultaneously to
    different disks in a RAID disk system.
  • When the same dataset is read back in, the pieces
    of the dataset are read in parallel, and the
    original dataset is reassembled in memory (see
    the sketch below).
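
A small sketch of striping and reassembly (illustrative Python; the stripe size and disk count are assumptions, and no parity or redundancy is modeled, so this shows striping only rather than a full RAID level):

  STRIPE = 4          # bytes per stripe unit
  NUM_DISKS = 3

  def stripe_write(data):
      # Split the dataset into stripe units and distribute them round-robin.
      disks = [[] for _ in range(NUM_DISKS)]
      for i in range(0, len(data), STRIPE):
          disks[(i // STRIPE) % NUM_DISKS].append(data[i:i + STRIPE])
      return disks

  def stripe_read(disks, length):
      # Read the stripe units back in round-robin order and reassemble.
      out = bytearray()
      unit = 0
      while len(out) < length:
          out += disks[unit % NUM_DISKS][unit // NUM_DISKS]
          unit += 1
      return bytes(out[:length])

  data = b"The quick brown fox jumps over the lazy dog"
  disks = stripe_write(data)
  assert stripe_read(disks, len(data)) == data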

73
Arithmetic - Parallel Features
  • We will examine the following features that lend
    themselves to parallel arithmetic
  • Multiple Functional Units
  • Super Scalar arithmetic
  • Instruction Pipelining

74
Parallel Machine Model
(Architectures)
  • von Neumann Computer

75
MultiComputer
  • A multicomputer comprises a number of von Neumann
    computers, or nodes, linked by an interconnection
    network
  • In an idealized network, the cost of sending a
    message between two nodes is independent of both
    node location and other network traffic, but does
    depend on message length

76
  • Locality
  • Scalability
  • Concurrency

77
Distributed Memory (MIMD)
  • MIMD means that each processor can execute a
    separate stream of instructions on its own local
    data; distributed memory means that memory is
    distributed among the processors rather than
    placed in a central location (a minimal
    message-passing sketch follows)
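
A minimal distributed-memory, message-passing sketch (illustrative Python; the standard multiprocessing module stands in for separate nodes and an interconnect, and the worker computation is arbitrary):

  from multiprocessing import Process, Pipe

  def worker(rank, local_data, conn):
      # Each process runs its own instruction stream on its own local data.
      partial = sum(x * rank for x in local_data)
      conn.send((rank, partial))       # results are shared only by explicit messages
      conn.close()

  if __name__ == "__main__":
      pipes, procs = [], []
      for rank, chunk in enumerate([[1, 2, 3], [4, 5], [6]]):
          parent, child = Pipe()
          p = Process(target=worker, args=(rank, chunk, child))
          p.start()
          pipes.append(parent)
          procs.append(p)
      results = [conn.recv() for conn in pipes]   # gather the messages
      for p in procs:
          p.join()
      print(sorted(results))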

78
  • Difference between the idealized multicomputer
    model and a real distributed-memory machine:
  • On a real machine, the cost of sending a message
    is not independent of node location and other
    network traffic

79
Examples of MIMD machines
80
MultiProcessor or Shared Memory MIMD
  • All processors share access to a common memory
    via a bus or a hierarchy of buses

81
Example for Shared Memory MIMD
  • Silicon Graphics Challenge

82
SIMD Machines
  • All processors execute the same instruction
    stream, each on a different piece of data (see
    the sketch below)
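
A small sketch of the SIMD idea (illustrative Python; NumPy is assumed to be available and stands in for lockstep hardware lanes):

  import numpy as np

  a = np.array([1.0, 2.0, 3.0, 4.0])
  b = np.array([10.0, 20.0, 30.0, 40.0])

  # Scalar-style loop: one element at a time.
  scalar_result = [float(a[i] + b[i]) for i in range(len(a))]

  # SIMD-style: the same add is applied to every element at once.
  vector_result = a + b

  print(scalar_result)     # [11.0, 22.0, 33.0, 44.0]
  print(vector_result)     # [11. 22. 33. 44.]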

83
Example of SIMD machine
  • MasPar MP

84
Use of Cache
  • Why is cache used on parallel computers?
  • The advances in memory technology aren't keeping
    up with processor innovations.
  • Memory isn't speeding up as fast as the
    processors.
  • One way to alleviate the performance gap between
    main memory and the processors is to have local
    cache.
  • The cache memory can be accessed faster than the
    main memory.
  • Cache keeps up with the fast processors, and
    keeps them busy with data.

85
Shared Memory
(Diagram: processors 1-3, each with its own cache,
connected through a network to memories 1-3.)
86
Cache Coherence
  • What is cache coherence?
  • It keeps a data element found in several caches
    current with the other copies and with the value
    in main memory (a minimal write-invalidate sketch
    follows).
  • Various cache coherence protocols are used:
  • snoopy protocol
  • directory-based protocol
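
A minimal write-invalidate snooping sketch (illustrative Python with only valid/invalid line states; real snoopy protocols such as MESI track more states, and the class names here are hypothetical):

  class Bus:
      def __init__(self):
          self.caches = []

      def broadcast_invalidate(self, writer, addr):
          # Every other cache snoops the write and drops its stale copy.
          for cache in self.caches:
              if cache is not writer:
                  cache.lines.pop(addr, None)

  class Cache:
      def __init__(self, bus, memory):
          self.lines = {}            # addr -> value (valid lines only)
          self.bus = bus
          self.memory = memory
          bus.caches.append(self)

      def read(self, addr):
          if addr not in self.lines:             # miss: fetch from memory
              self.lines[addr] = self.memory[addr]
          return self.lines[addr]

      def write(self, addr, value):
          self.bus.broadcast_invalidate(self, addr)   # invalidate other copies
          self.lines[addr] = value
          self.memory[addr] = value              # write-through, for simplicity

  memory = {0: 1}
  bus = Bus()
  c1, c2 = Cache(bus, memory), Cache(bus, memory)
  print(c1.read(0), c2.read(0))   # both caches hold 1
  c1.write(0, 42)                 # c2's copy is invalidated
  print(c2.read(0))               # 42 - re-fetched, coherent value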

87
Various Other Issues
  • Data Locality Issue
  • Distributed Memory Issue
  • Shared Memory Issue

88
Thanks