Title: 14:332:331 Computer Architecture and Assembly Language Fall 2003 Week 6
1. 14:332:331 Computer Architecture and Assembly Language, Fall 2003, Week 6
- Adapted from Dave Patterson's UCB CS152 slides and Mary Jane Irwin's PSU CSE331 slides
2. Heads Up
- This week's material
  - VHDL modeling
    - Reading assignment: Y, Chapters 4 and 5
  - MIPS arithmetic operations
    - Reading assignment: PH 4.1 through 4.3
- Next week's material
  - MIPS logic and multiply instructions
    - Reading assignment: PH 4.4
  - MIPS ALU design
    - Reading assignment: PH 4.5
3. Review: Entity-Architecture Features
[Figure: a design entity-architecture pair models a hardware component; the entity gives its external characteristics, the architecture (body) its internal behavior or structure]
- Entity defines externally visible characteristics
  - Ports: channels of communication
- Architecture defines the internal behavior or structure
  - Declaration of internal signals
  - Description of behavior
    - concurrent behavioral description: collection of CSAs (concurrent signal assignments)
    - process behavioral description: CSAs and variable assignment statements within a process description
    - structural description: system described in terms of the interconnections of its components
4. Review: Model of Execution
- CSAs are executed concurrently: the textual order of the statements is irrelevant to correct operation
- Two-stage model of circuit execution
  - first stage
    - all CSAs with events occurring at the current time on signals on their right-hand side (RHS) are evaluated
    - all future events generated by this evaluation are scheduled
  - second stage
    - time is advanced to the time of the next event
- The VHDL programmer specifies
  - events: with CSAs
  - delays: with CSAs with delay annotation
  - concurrency: by having a distinct CSA for each signal
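The execution model above can be illustrated with two CSAs. This is a minimal sketch, not taken from the original slides: the half_adder entity, its port names, and the 5 ns delays are assumptions chosen for the example.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical example entity: a half adder described only with CSAs.
entity half_adder is
  port(a, b       : in  std_logic;
       sum, carry : out std_logic);
end half_adder;

architecture concurrent_behavior of half_adder is
begin
  -- One CSA per signal. Swapping these two lines changes nothing,
  -- because each CSA is re-evaluated only when a or b has an event.
  carry <= a and b after 5 ns;  -- event on carry scheduled 5 ns ahead
  sum   <= a xor b after 5 ns;  -- event on sum scheduled 5 ns ahead
end concurrent_behavior;
```

An event on a or b at time t triggers the first stage (both right-hand sides are evaluated and new values are scheduled for t + 5 ns); the second stage then advances time to the next scheduled event.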
5. Review: Signal Resolution
- Resolving values of pairs of std_logic type signals
- When a signal has multiple drivers (e.g., a bus), the value of the resulting signal is determined by a resolution function
Values: U uninitialized, X forcing unknown, 0 forcing 0, 1 forcing 1, Z high impedance, W weak unknown, L weak 0, H weak 1, - don't care
     U  X  0  1  Z  W  L  H  -
U    U  U  U  U  U  U  U  U  U
X    U  X  X  X  X  X  X  X  X
0    U  X  0  X  0  0  0  0  X
1    U  X  X  1  1  1  1  1  X
Z    U  X  0  1  Z  W  L  H  X
W    U  X  0  1  W  W  W  W  X
L    U  X  0  1  L  W  L  W  X
H    U  X  0  1  H  W  W  H  X
-    U  X  X  X  X  X  X  X  X
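A small sketch of a signal with two drivers may make the table concrete. The shared_line entity and all of its port names are hypothetical; only the std_logic resolution behavior comes from the table above.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical example: two tri-state drivers sharing one std_logic wire.
entity shared_line is
  port(en_a, en_b : in  std_logic;  -- driver enables
       d_a, d_b   : in  std_logic;  -- driver data
       line_out   : out std_logic);
end shared_line;

architecture two_drivers of shared_line is
  signal bus_line : std_logic;  -- resolved type, so two CSAs may drive it
begin
  -- Each CSA is a separate driver; a disabled driver contributes 'Z'.
  bus_line <= d_a when en_a = '1' else 'Z';
  bus_line <= d_b when en_b = '1' else 'Z';
  line_out <= bus_line;
end two_drivers;
```

Reading the table: with only one driver enabled the line takes that driver's value (for example 1 resolved with Z gives 1); with both disabled it is Z; with both enabled and driving 0 and 1 it resolves to X.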
6. Motivation for Process Construct
- How would you build the logic (and the VHDL code) for a 32 by 2 multiplexor given inverters and 2-input NANDs?
[Figure: multiplexor with data input A on input 0, data input B on input 1, select input SEL, and output DOUT]
7. MUX CSA Description
  entity MUX32X2 is
    port(A, B : in  std_logic_vector(31 downto 0);
         DOUT : out std_logic_vector(31 downto 0);
         SEL  : in  std_logic);
  end MUX32X2;
- How can we describe the circuit in VHDL if we don't know what primitive gates we will be designing with?
8. Mux Process Description
  entity MUX32X2 is
    port(A, B : in  std_logic_vector(31 downto 0);
         DOUT : out std_logic_vector(31 downto 0);
         SEL  : in  std_logic);
  end MUX32X2;

  architecture process_behavior of MUX32X2 is
  begin
    mux32x2_process: process(A, B, SEL)
    begin
      if (SEL = '0') then
        DOUT <= A after 5 ns;
      else
        DOUT <= B after 4 ns;
      end if;
    end process mux32x2_process;
  end process_behavior;
[Figure: multiplexor with A on input 0, B on input 1, select SEL, output DOUT]
- Process fires whenever a signal in the sensitivity list changes
9. VHDL Process Features
- Process body is executed sequentially to completion in zero (simulation) time
- Delays are associated only with assignment of values to signals
  - marked by the CSA <= operator
- Variable assignments take effect immediately
  - marked by the := operator
- Upon initialization all processes are executed once
- After initialization processes are data-driven
  - activated by events on signals in the sensitivity list
  - waiting for the occurrence of specific events using wait statements
10. Process Programming Constructs
- if-then-else
  - Boolean-valued expressions are evaluated sequentially until the first true one is encountered
      if (expression1 = value1) then
        . . .
      elsif (expression2 = value2) then
        . . .
      end if;
- case
  - branches must cover all possible values for the case expression
      case (expression) is
        when value0 =>
          . . .
      end case;
- for loop
  - loop index declared (locally) by virtue of use in loop stmt
  - loop index cannot be assigned a value or altered in loop body
      for index in value1 to value2 loop
- while loop
  - condition may involve variables modified within the loop
      while (condition) loop
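The constructs above can be seen working together in a small sketch. The parity_select entity, its ports, and the choice of functions (parity and a 4-way select) are assumptions made for illustration only.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical example entity exercising a for loop and a case statement.
entity parity_select is
  port(data   : in  std_logic_vector(7 downto 0);
       sel    : in  std_logic_vector(1 downto 0);
       parity : out std_logic;   -- xor of all bits of data
       picked : out std_logic);  -- one of data(3 downto 0), chosen by sel
end parity_select;

architecture process_behavior of parity_select is
begin
  parity_process: process(data)
    variable p : std_logic;
  begin
    p := '0';
    -- The loop index i is declared implicitly and cannot be assigned.
    for i in data'range loop
      p := p xor data(i);
    end loop;
    parity <= p;
  end process parity_process;

  select_process: process(data, sel)
  begin
    -- The branches must cover every value of sel; "others" catches
    -- the std_logic combinations containing U, X, Z, and so on.
    case sel is
      when "00"   => picked <= data(0);
      when "01"   => picked <= data(1);
      when "10"   => picked <= data(2);
      when "11"   => picked <= data(3);
      when others => picked <= 'X';
    end case;
  end process select_process;
end process_behavior;
```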
11. Behavioral Description of a Register File
[Figure: register file of 32 words by 32 bits, with inputs write_cntrl, src1_addr, src2_addr, dst_addr, and write_data and outputs src1_data and src2_data]
  library IEEE;
  use IEEE.std_logic_1164.all;
  use IEEE.std_logic_arith.all;

  entity regfile is
    port(write_data                     : in  std_logic_vector(31 downto 0);
         dst_addr, src1_addr, src2_addr : in  UNSIGNED(4 downto 0);
         write_cntrl                    : in  std_logic;
         src1_data, src2_data           : out std_logic_vector(31 downto 0));
  end regfile;
12. Behavioral Description of a Register File, cont
  architecture process_behavior of regfile is
    type reg_array is array(0 to 31) of std_logic_vector(31 downto 0);
  begin
    regfile_process: process(src1_addr, src2_addr, write_cntrl)
      variable data_array : reg_array := (
        (X"00000000"), (X"00000000"), . . . (X"00000000"));
      variable addrofsrc1, addrofsrc2, addrofdst : integer;
    begin
      addrofsrc1 := conv_integer(src1_addr);
      addrofsrc2 := conv_integer(src2_addr);
      addrofdst  := conv_integer(dst_addr);
      if write_cntrl = '1' then
        data_array(addrofdst) := write_data;
      end if;
      src1_data <= data_array(addrofsrc1) after 10 ns;
      src2_data <= data_array(addrofsrc2) after 10 ns;
    end process regfile_process;
  end process_behavior;
13. Process Construct with Wait Statement
[Figure: positive edge-triggered D flip-flop dff with inputs D and clk and outputs Q and Qbar]
  library IEEE;
  use IEEE.std_logic_1164.all;
  use IEEE.std_logic_arith.all;

  entity dff is
    port(D, clk  : in  std_logic;
         Q, Qbar : out std_logic);
  end dff;

  architecture dff_behavior of dff is
  begin
    output: process
    begin
      wait until (clk'event and clk = '1');
      Q    <= D after 5 ns;
      Qbar <= not D after 5 ns;
    end process output;
  end dff_behavior;
14. Wait Statement Types
- Wait statements specify conditions under which a process may resume execution after suspension
  - wait for time expression
    - suspends process for a period of time defined by the time expression
        wait for (20 ns);
  - wait on signal
    - suspends process until an event occurs on one (or more) of the signals
        wait on clk, reset, status;
  - wait until condition
    - suspends process until condition evaluates to specified Boolean
        wait until (clk'event and clk = '1');
  - wait
- Process resumes execution at the first statement following the wait statement
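The wait forms listed above could be combined in a small simulation-only model such as the following. The wait_demo entity, the signal names, and the 10 ns and 25 ns figures are all assumptions made for the sketch.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical testbench-style entity (no ports), for simulation only.
entity wait_demo is
end wait_demo;

architecture sim of wait_demo is
  signal clk   : std_logic := '0';
  signal reset : std_logic := '1';
  signal count : integer   := 0;
begin
  -- "wait for": free-running clock with a 20 ns period.
  clock_gen: process
  begin
    clk <= '0';
    wait for 10 ns;
    clk <= '1';
    wait for 10 ns;
  end process clock_gen;

  -- Plain "wait": release reset after 25 ns, then suspend forever.
  reset_gen: process
  begin
    wait for 25 ns;
    reset <= '0';
    wait;
  end process reset_gen;

  -- "wait until": resume only on a rising edge of clk.
  counter: process
  begin
    wait until (clk'event and clk = '1');
    if reset = '1' then
      count <= 0;
    else
      count <= count + 1;
    end if;
  end process counter;
end sim;
```

Because clock_gen never stops, a model like this has to be run for a bounded simulation time.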
15. Signal Attributes
- Attributes are used to return various types of information about a signal
Function attribute         Function
signal_name'event          Boolean value signifying a change in value on this signal
signal_name'active         Boolean value signifying an assignment made to this signal (may not be a new value!)
signal_name'last_event     Time since the last event on this signal
signal_name'last_active    Time since the signal was last active
signal_name'last_value     Previous value of this signal
16. Things to Remember About Processes
- A process must have either a sensitivity list or at least one wait statement
- A process cannot have both a sensitivity list and a wait statement
- Remember, all processes are executed once when the simulation is started
- Don't confuse signals and variables.
  - Signals are declared either in the port definitions in the entity description or as internal signals in the architecture description. They are used in CSAs. Signals will be updated only after the next simulation cycle.
  - Variables exist only inside architecture process descriptions. They are used in variable assignment statements. Variables are updated immediately.
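The signal/variable distinction is easiest to see side by side. The following is a minimal sketch; the sig_vs_var entity and its port names are hypothetical.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical example contrasting signal and variable assignment.
entity sig_vs_var is
  port(clk, d                   : in  std_logic;
       via_signal, via_variable : out std_logic);
end sig_vs_var;

architecture process_behavior of sig_vs_var is
  signal s : std_logic := '0';      -- internal signal
begin
  demo: process
    variable v : std_logic := '0';  -- visible only inside this process
  begin
    wait until (clk'event and clk = '1');
    s := d when false else s;       -- placeholder removed below
  end process demo;
end process_behavior;
```

The process body above is intentionally replaced by the version that follows, which is the one to read; it uses only plain assignments:

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

entity sig_vs_var2 is
  port(clk, d                   : in  std_logic;
       via_signal, via_variable : out std_logic);
end sig_vs_var2;

architecture process_behavior of sig_vs_var2 is
  signal s : std_logic := '0';
begin
  demo: process
    variable v : std_logic := '0';
  begin
    wait until (clk'event and clk = '1');
    s <= d;             -- scheduled; s keeps its old value for now
    v := d;             -- takes effect immediately
    via_signal   <= s;  -- reads the OLD s (d from the previous edge)
    via_variable <= v;  -- reads the NEW v (d from this edge)
  end process demo;
end process_behavior;
```

In the second listing via_signal trails via_variable by one clock edge, which is the usual symptom of mixing the two up.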
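17. Finite State Machine Structure
[Figure: combinational block comb with inputs a and b and output z, connected to two flip-flops with inputs D(0) and D(1), outputs Q(0) and Q(1), and a common clk]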
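18. Structural VHDL Model
- System is described by its component interconnections
  - assumes we have previously designed entity-architecture descriptions for both comb and dff with behavioral models
[Figure: seq_circuit with inputs in1, in2, and clk and output out1. in1 and in2 connect to ports a and b of comb and its z output drives out1. nxt_state(0) and nxt_state(1) feed D(0) and D(1) of the two dff instances, Q(0) and Q(1) return as c_state(0) and c_state(1), Qbar(0) and Qbar(1) are shown on the flip-flops, and clk drives both flip-flops.]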
19. Finite State Machine Structural VHDL
  entity seq_circuit is
    port(in1, in2, clk : in  std_logic;
         out1          : out std_logic);
  end seq_circuit;

  architecture structural of seq_circuit is
    component comb
      port(a, b      : in  std_logic;
           z         : out std_logic;
           c_state   : in  std_logic_vector(1 downto 0);
           nxt_state : out std_logic_vector(1 downto 0));
    end component;
    component dff
      port(D, clk  : in  std_logic;
           Q, Qbar : out std_logic);
    end component;
    for all: comb use entity work.comb(comb_behavior);
    for all: dff use entity work.dff(dff_behavior);
    signal s1, s2 : std_logic_vector(1 downto 0);
  begin
    C0: comb port map(a=>in1, b=>in2, c_state=>s1, z=>out1, nxt_state=>s2);
    D0: dff port map(D=>s2(0), clk=>clk, Q=>s1(0), Qbar=>open);
    D1: dff port map(D=>s2(1), clk=>clk, Q=>s1(1), Qbar=>open);
  end structural;
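The structural model binds comb to an architecture named comb_behavior that the slides do not show. The following is one hypothetical way to fill it in so that the structure can be elaborated; the next-state and output logic here (a 2-bit counter that advances when a = '1', with z asserted in state "11" when b = '1') is invented purely for illustration.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Port list copied from the component declaration in the structural model.
entity comb is
  port(a, b      : in  std_logic;
       z         : out std_logic;
       c_state   : in  std_logic_vector(1 downto 0);
       nxt_state : out std_logic_vector(1 downto 0));
end comb;

-- Hypothetical behavior, NOT from the original slides.
architecture comb_behavior of comb is
begin
  comb_process: process(a, b, c_state)
  begin
    -- Default: hold the current state.
    nxt_state <= c_state;
    -- Advance the state when a is asserted (the last assignment wins).
    if a = '1' then
      case c_state is
        when "00"   => nxt_state <= "01";
        when "01"   => nxt_state <= "10";
        when "10"   => nxt_state <= "11";
        when others => nxt_state <= "00";
      end case;
    end if;
    -- Output depends on the current state and input b.
    if (c_state = "11" and b = '1') then
      z <= '1';
    else
      z <= '0';
    end if;
  end process comb_process;
end comb_behavior;
```

Since the dff model has no reset, the state starts out as 'U' in simulation; in this sketch the "others" branch moves it to "00" the first time a = '1' at a clock edge.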
20. Summary
- Introduction to VHDL
  - A language to describe hardware
    - entity as symbol, architecture as schematic, signals as wires
  - Inherently concurrent (parallel)
  - Has time as concept
- Behavioral descriptions of a component
  - can be specified using CSAs
  - can be specified using one or more processes and sequential statements
- Structural descriptions of a system are specified in terms of its interconnections
  - behavioral models of each component must be provided
21.
"Because ease of use is the purpose, this ratio of function to conceptual complexity is the ultimate test of system design. Neither function alone nor simplicity alone defines a good design."
- The Mythical Man-Month, Brooks, pg. 43
22. Review: MIPS ISA
Category | Instr | Op Code | Example | Meaning
Arithmetic (R & I format) | add | 0 and 32 | add $s1, $s2, $s3 | $s1 = $s2 + $s3
Arithmetic (R & I format) | subtract | 0 and 34 | sub $s1, $s2, $s3 | $s1 = $s2 - $s3
Arithmetic (R & I format) | add immediate | 8 | addi $s1, $s2, 6 | $s1 = $s2 + 6
Arithmetic (R & I format) | or immediate | 13 | ori $s1, $s2, 6 | $s1 = $s2 OR 6
Data Transfer (I format) | load word | 35 | lw $s1, 24($s2) | $s1 = Memory($s2+24)
Data Transfer (I format) | store word | 43 | sw $s1, 24($s2) | Memory($s2+24) = $s1
Data Transfer (I format) | load byte | 32 | lb $s1, 25($s2) | $s1 = Memory($s2+25)
Data Transfer (I format) | store byte | 40 | sb $s1, 25($s2) | Memory($s2+25) = $s1
Data Transfer (I format) | load upper imm | 15 | lui $s1, 6 | $s1 = 6 * 2^16
Cond. Branch (I & R format) | br on equal | 4 | beq $s1, $s2, L | if ($s1 == $s2) go to L
Cond. Branch (I & R format) | br on not equal | 5 | bne $s1, $s2, L | if ($s1 != $s2) go to L
Cond. Branch (I & R format) | set on less than | 0 and 42 | slt $s1, $s2, $s3 | if ($s2 < $s3) $s1 = 1 else $s1 = 0
Cond. Branch (I & R format) | set on less than immediate | 10 | slti $s1, $s2, 6 | if ($s2 < 6) $s1 = 1 else $s1 = 0
Uncond. Jump (J & R format) | jump | 2 | j 2500 | go to 10000
Uncond. Jump (J & R format) | jump register | 0 and 8 | jr $t1 | go to $t1
Uncond. Jump (J & R format) | jump and link | 3 | jal 2500 | go to 10000; $ra = PC + 4
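23. Review: MIPS Organization, so far
[Figure: MIPS organization so far. Processor: a register file of 32 registers ($zero - $ra), 32 bits each, with 5-bit src1 addr, src2 addr, and dst addr inputs, 32-bit src1 data and src2 data outputs, and a 32-bit write data input; an ALU; the PC with an adder that increments it by 4 and an adder for the br offset. Memory: 2^30 words of 32 bits, with a 32-bit read/write addr, 32-bit read data, and 32-bit write data; shown with word addresses in binary (00000, 00100, 01000, 01100, ..., 11100) and big-Endian byte addresses (bytes 0 1 2 3 in the first word, 4 5 6 7 in the next).]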
24. Arithmetic
- Where we've been
  - Abstractions
    - Instruction Set Architecture (ISA)
    - Assembly and machine language
- What's up ahead
  - Implementing the architecture (in VHDL)
[Figure: ALU with 32-bit inputs A and B, 4-bit operation select m, 32-bit output result, and 1-bit outputs zero and ovf]
25. ALU VHDL Representation
  entity ALU is
    port(A, B   : in  std_logic_vector(31 downto 0);
         m      : in  std_logic_vector(3 downto 0);
         result : out std_logic_vector(31 downto 0);
         zero   : out std_logic;
         ovf    : out std_logic);
  end ALU;

  architecture process_behavior of ALU is
    . . .
  begin
    ALU: process
    begin
      . . .
      result <= A + B;
      . . .
    end process ALU;
  end process_behavior;
26. Number Representation
- Bits are just bits (have no inherent meaning)
  - conventions define the relationships between bits and numbers
- Binary numbers (base 2) - integers
  - 0000 -> 0001 -> 0010 -> 0011 -> 0100 -> 0101 -> 0110 -> 0111 -> 1000 -> 1001 -> . . .
  - in decimal from 0 to 2^n - 1 for n bits
- Of course, it gets more complicated
  - storage locations (e.g., register file words) are finite, so have to worry about overflow (i.e., when the number is too big to fit into 32 bits)
  - have to be able to represent negative numbers, e.g., how do we specify -8 in
      addi $sp, $sp, -8    # $sp = $sp - 8
  - in real systems have to provide for more than just integers, e.g., fractions and real numbers (and floating point)
27. Possible Representations
Sign Mag.      Two's Comp.    One's Comp.
               1000 = -8
1111 = -7      1001 = -7      1000 = -7
1110 = -6      1010 = -6      1001 = -6
1101 = -5      1011 = -5      1010 = -5
1100 = -4      1100 = -4      1011 = -4
1011 = -3      1101 = -3      1100 = -3
1010 = -2      1110 = -2      1101 = -2
1001 = -1      1111 = -1      1110 = -1
1000 = -0                     1111 = -0
0000 = +0      0000 = 0       0000 = +0
0001 = +1      0001 = +1      0001 = +1
0010 = +2      0010 = +2      0010 = +2
0011 = +3      0011 = +3      0011 = +3
0100 = +4      0100 = +4      0100 = +4
0101 = +5      0101 = +5      0101 = +5
0110 = +6      0110 = +6      0110 = +6
0111 = +7      0111 = +7      0111 = +7
- Issues
  - balance
  - number of zeros
  - ease of operations
- Which one is best? Why?
28. MIPS Representations
- 32-bit signed numbers (2's complement)
    0000 0000 0000 0000 0000 0000 0000 0000two = 0ten
    0000 0000 0000 0000 0000 0000 0000 0001two = +1ten
    0000 0000 0000 0000 0000 0000 0000 0010two = +2ten
    ...
    0111 1111 1111 1111 1111 1111 1111 1110two = +2,147,483,646ten
    0111 1111 1111 1111 1111 1111 1111 1111two = +2,147,483,647ten    (maxint)
    1000 0000 0000 0000 0000 0000 0000 0000two = -2,147,483,648ten    (minint)
    1000 0000 0000 0000 0000 0000 0000 0001two = -2,147,483,647ten
    1000 0000 0000 0000 0000 0000 0000 0010two = -2,147,483,646ten
    ...
    1111 1111 1111 1111 1111 1111 1111 1101two = -3ten
    1111 1111 1111 1111 1111 1111 1111 1110two = -2ten
    1111 1111 1111 1111 1111 1111 1111 1111two = -1ten
- What if the bit string represented addresses?
  - need operations that also deal with only positive (unsigned) integers
29. Review: Signed Binary Representation
2's comp    decimal
1000        -8        = -2^3
1001        -7        = -(2^3 - 1)
1010        -6
1011        -5
1100        -4
1101        -3
1110        -2
1111        -1
0000         0
0001         1
0010         2
0011         3
0100         4
0101         5
0110         6
0111         7        = 2^3 - 1
- Negation example: complement all the bits of 0101 (5) to get 1010, then add a 1 to get 1011 (-5)
30. Two's Complement Operations
- Negating a two's complement number: complement all the bits and add a 1
  - remember: negate and invert are quite different!
- Converting n-bit numbers into numbers with more than n bits
  - MIPS 16-bit immediate gets converted to 32 bits for arithmetic
  - copy the most significant bit (the sign bit) into the other bits
      0010 -> 0000 0010
      1010 -> 1111 1010
  - sign extension versus zero extend (lb vs. lbu)
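Both operations above can be written as one-line CSAs. This sketch assumes the IEEE numeric_std package (the slides themselves use std_logic_arith); the extend16 entity and its port names are hypothetical.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

-- Hypothetical example: extend and negate a 16-bit immediate.
entity extend16 is
  port(imm      : in  std_logic_vector(15 downto 0);
       sign_ext : out std_logic_vector(31 downto 0);
       zero_ext : out std_logic_vector(31 downto 0);
       negated  : out std_logic_vector(15 downto 0));
end extend16;

architecture concurrent_behavior of extend16 is
begin
  -- Sign extension: resize on a signed value copies the sign bit
  -- imm(15) into the 16 new upper bits.
  sign_ext <= std_logic_vector(resize(signed(imm), 32));

  -- Zero extension: resize on an unsigned value pads with '0'.
  zero_ext <= std_logic_vector(resize(unsigned(imm), 32));

  -- Negation: complement all the bits, then add a 1.
  negated <= std_logic_vector(unsigned(not imm) + 1);
end concurrent_behavior;
```

For imm = X"FFFA" (-6) this gives sign_ext = X"FFFFFFFA", zero_ext = X"0000FFFA", and negated = X"0006".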
31. Goal: Design an ALU for the MIPS ISA
- Must support the Arithmetic/Logic operations of the ISA
- Tradeoffs of cost and speed based on frequency of occurrence, hardware budget
32. MIPS Arithmetic and Logic Instructions
Instruction formats (bit positions marked in the figure: 31, 25, 20, 15, 5, 0)
R-type: op | Rs | Rt | Rd | funct
I-type: op | Rs | Rt | Immed 16

Type     op        funct
ADDI     001000    xx
ADDIU    001001    xx
SLTI     001010    xx
SLTIU    001011    xx
ANDI     001100    xx
ORI      001101    xx
XORI     001110    xx
LUI      001111    xx

Type     op        funct
ADD      000000    100000
ADDU     000000    100001
SUB      000000    100010
SUBU     000000    100011
AND      000000    100100
OR       000000    100101
XOR      000000    100110
NOR      000000    100111

Type     op        funct
         000000    101000
         000000    101001
SLT      000000    101010
SLTU     000000    101011
         000000    101100
- Signed arithmetic generates overflow, but no carry out
33. Design Trick: Divide & Conquer
- Break the problem into simpler problems, solve them and glue together the solution
- Example: assume the immediates have been taken care of before the ALU
  - now down to 10 operations
  - can encode in 4 bits (codes below are in octal and match the low four bits of funct)
00 add    01 addu    02 sub    03 subu    04 and    05 or    06 xor    07 nor    12 slt    13 sltu
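With the ten operations encoded in 4 bits, a behavioral ALU reduces to one case statement. The following is a sketch under stated assumptions, not the course's ALU: it reads the codes above as octal (so slt is "1010" and sltu is "1011"), uses the IEEE numeric_std package, and the entity name alu_sketch is hypothetical.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

entity alu_sketch is
  port(A, B      : in  std_logic_vector(31 downto 0);
       m         : in  std_logic_vector(3 downto 0);
       result    : out std_logic_vector(31 downto 0);
       zero, ovf : out std_logic);
end alu_sketch;

architecture process_behavior of alu_sketch is
begin
  alu_process: process(A, B, m)
    variable r         : std_logic_vector(31 downto 0);
    variable sum, diff : signed(31 downto 0);
    variable v         : std_logic;
  begin
    sum  := signed(A) + signed(B);  -- wraps around on overflow
    diff := signed(A) - signed(B);
    v    := '0';
    case m is
      when "0000" =>                               -- add
        r := std_logic_vector(sum);
        -- same-sign operands, different-sign result
        v := (not (A(31) xor B(31))) and (A(31) xor sum(31));
      when "0001" => r := std_logic_vector(sum);   -- addu
      when "0010" =>                               -- sub
        r := std_logic_vector(diff);
        -- different-sign operands, result sign differs from A
        v := (A(31) xor B(31)) and (A(31) xor diff(31));
      when "0011" => r := std_logic_vector(diff);  -- subu
      when "0100" => r := A and B;                 -- and
      when "0101" => r := A or B;                  -- or
      when "0110" => r := A xor B;                 -- xor
      when "0111" => r := A nor B;                 -- nor
      when "1010" =>                               -- slt (signed compare)
        r := (others => '0');
        if signed(A) < signed(B) then r(0) := '1'; end if;
      when "1011" =>                               -- sltu (unsigned compare)
        r := (others => '0');
        if unsigned(A) < unsigned(B) then r(0) := '1'; end if;
      when others => r := (others => 'X');         -- unused codes
    end case;
    result <= r;
    ovf    <= v;
    if r = X"00000000" then zero <= '1'; else zero <= '0'; end if;
  end process alu_process;
end process_behavior;
```

Only add and sub set ovf here, matching the rule that the unsigned variants do not detect overflow.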
34. Addition & Subtraction
- Just like in grade school (carry/borrow 1s)
      0111        0111        0110
    + 0110      - 0110      - 0101
- Two's complement operations easy
  - subtraction using addition of negative numbers
      0111   ->     0111
    - 0110   ->   + 1010
- Overflow (result too large for finite computer word)
  - e.g., adding two n-bit numbers does not yield an n-bit number
      0111
    + 0001
      1000
35. Building a 1-bit Binary Adder
[Figure: 1 bit Full Adder with inputs A, B, and carry_in and outputs S and carry_out]
A  B  carry_in  |  carry_out  S
0  0  0         |  0          0
0  0  1         |  0          1
0  1  0         |  0          1
0  1  1         |  1          0
1  0  0         |  0          1
1  0  1         |  1          0
1  1  0         |  1          0
1  1  1         |  1          1
- S = A xor B xor carry_in
- carry_out = (A and B) or (A and carry_in) or (B and carry_in)
  - (majority function)
- How can we use it to build a 32-bit adder?
- How can we modify it easily to build an adder/subtractor?
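The two equations above translate directly into CSAs. A minimal sketch follows; the entity name full_adder and the architecture name are assumptions, and no gate delays are modeled.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

entity full_adder is
  port(A, B, carry_in : in  std_logic;
       S, carry_out   : out std_logic);
end full_adder;

architecture concurrent_behavior of full_adder is
begin
  -- Sum bit: 1 when an odd number of the three inputs are 1.
  S <= A xor B xor carry_in;
  -- Carry: 1 when at least two of the three inputs are 1 (majority).
  carry_out <= (A and B) or (A and carry_in) or (B and carry_in);
end concurrent_behavior;
```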
36. Building 32-bit Adder
- Just connect the carry-out of the least significant bit FA to the carry-in of the next least significant bit and connect . . .
- Ripple Carry Adder (RCA)
  - advantage: simple logic, so small (low cost)
  - disadvantage: slow and lots of glitching (so lots of energy consumption)
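The carry chain described above can be written structurally with a generate statement. This is a sketch under assumptions: the rca32 entity, the internal carry vector c, and the full_adder entity repeated here (so the listing is self-contained) are all illustrative names, and VHDL-93 direct entity instantiation is used instead of component declarations.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

entity full_adder is
  port(A, B, carry_in : in  std_logic;
       S, carry_out   : out std_logic);
end full_adder;

architecture concurrent_behavior of full_adder is
begin
  S         <= A xor B xor carry_in;
  carry_out <= (A and B) or (A and carry_in) or (B and carry_in);
end concurrent_behavior;

library IEEE;
use IEEE.std_logic_1164.all;

entity rca32 is
  port(A, B      : in  std_logic_vector(31 downto 0);
       carry_in  : in  std_logic;
       S         : out std_logic_vector(31 downto 0);
       carry_out : out std_logic);
end rca32;

architecture structural of rca32 is
  -- c(i) is the carry into bit i; c(32) is the final carry out.
  signal c : std_logic_vector(32 downto 0);
begin
  c(0) <= carry_in;
  -- One full adder per bit; each carry-out feeds the next carry-in.
  stage: for i in 0 to 31 generate
    FA: entity work.full_adder(concurrent_behavior)
      port map(A => A(i), B => B(i), carry_in => c(i),
               S => S(i), carry_out => c(i + 1));
  end generate stage;
  carry_out <= c(32);
end structural;
```

The worst-case path runs through all 32 carry stages, which is the "slow" in the disadvantage listed above.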
37. Building 32-bit Adder/Subtractor
- Remember 2's complement is just
  - complement all the bits
  - add a 1 in the least significant bit
      A     0111   ->     0111
      B   - 0110   ->   + 1010
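One common way to turn the adder into an adder/subtractor is to xor each bit of B with a control line and feed the same line in as the carry-in. The sketch below does this in a single process; the addsub32 entity and the subtract port are hypothetical names.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

entity addsub32 is
  port(A, B     : in  std_logic_vector(31 downto 0);
       subtract : in  std_logic;  -- '0': S = A + B, '1': S = A - B
       S        : out std_logic_vector(31 downto 0));
end addsub32;

architecture process_behavior of addsub32 is
begin
  addsub_process: process(A, B, subtract)
    variable c   : std_logic;  -- running carry
    variable b_i : std_logic;  -- bit of B after optional complement
  begin
    -- A carry-in of 1 supplies the "add a 1" of the negation.
    c := subtract;
    for i in 0 to 31 loop
      -- xor with subtract complements B only when subtracting.
      b_i  := B(i) xor subtract;
      S(i) <= A(i) xor b_i xor c;
      c    := (A(i) and b_i) or (A(i) and c) or (b_i and c);
    end loop;
  end process addsub_process;
end process_behavior;
```

With 4-bit values, 0111 - 0110 becomes 0111 + 1001 + 1 = 0001, the same result as adding 1010.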
38. Overflow Detection and Effects
- Overflow: the result is too large to represent in the number of bits allocated
- When adding operands with different signs, overflow cannot occur! Overflow occurs when
  - adding two positives yields a negative
  - or, adding two negatives gives a positive
  - or, subtract a negative from a positive gives a negative
  - or, subtract a positive from a negative gives a positive
- On overflow, an exception (interrupt) occurs
  - Control jumps to predefined address for exception
  - Interrupted address (address of instruction causing the overflow) is saved for possible resumption
- Don't always want to detect (interrupt on) overflow
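The four overflow cases above depend only on sign bits, so they can be checked with a few gates. This is a minimal sketch; the ovf_detect entity and its port names are assumptions.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

entity ovf_detect is
  port(a_sign, b_sign : in  std_logic;  -- sign bits of the operands
       r_sign         : in  std_logic;  -- sign bit of the result
       subtract       : in  std_logic;  -- '1' when computing A - B
       ovf            : out std_logic);
end ovf_detect;

architecture concurrent_behavior of ovf_detect is
  signal b_eff : std_logic;  -- sign of B as seen by the adder
begin
  -- Subtraction adds the complement of B, which flips its sign bit.
  b_eff <= b_sign xor subtract;
  -- Overflow: both adder inputs have the same sign,
  -- but the result has the opposite sign.
  ovf <= (not (a_sign xor b_eff)) and (a_sign xor r_sign);
end concurrent_behavior;
```

For example, subtracting a negative from a positive (a_sign = 0, b_sign = 1, subtract = 1) gives b_eff = 0, so a negative result (r_sign = 1) raises ovf.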
39. New MIPS Instructions
Category | Instr | Op Code | Example | Meaning
Arithmetic (R & I format) | add unsigned | 0 and 33 | addu $s1, $s2, $s3 | $s1 = $s2 + $s3
Arithmetic (R & I format) | subt unsigned | 0 and 35 | subu $s1, $s2, $s3 | $s1 = $s2 - $s3
Arithmetic (R & I format) | add imm. unsigned | 9 | addiu $s1, $s2, 6 | $s1 = $s2 + 6
Data Transfer | load byte unsigned | 36 | lbu $s1, 25($s2) | $s1 = Memory($s2+25)
Cond. Branch (I & R format) | set on less than unsigned | 0 and 43 | sltu $s1, $s2, $s3 | if ($s2 < $s3) $s1 = 1 else $s1 = 0
Cond. Branch (I & R format) | set on less than imm. unsigned | 11 | sltiu $s1, $s2, 6 | if ($s2 < 6) $s1 = 1 else $s1 = 0
- Sign extend: addiu, sltiu
- Zero extend: lbu
- No overflow detected: addu, subu, addiu, sltu, sltiu
40. Conclusion
- We can build an ALU to support the MIPS ISA
  - we can efficiently perform subtraction using two's complement
  - we can replicate a 1-bit ALU to produce a 32-bit ALU
- Important points about hardware
  - all of the gates are always working (concurrent)
  - the speed of a gate is affected by the number of inputs to the gate (fan-in) and the number of gates that the output is connected to (fan-out)
  - the speed of a circuit is affected by the number of gates in series (on the critical path or the number of levels of logic)
- Our primary focus: comprehension, however,
  - Clever changes to organization can improve performance (similar to using better algorithms in software)