ECE 545 Project 1 Introduction

Provided by: Krzysz1
1
ECE 545 Project 1: Introduction and Specification
2
Schedule
Project 1: RTL design for FPGAs (30 points)
Due date: Tuesday, November 21, midnight
Final choice of the project topic: Thursday, October 19
Progress reports:
  • Thursday-Friday, November 2-3
  • Thursday-Friday, November 16-17
3
Groups
  • ONE-person and TWO-person teams allowed
  • Teams must be formed at the moment when the project
    topic is selected, i.e., by Thursday, October 19
  • TWO-person teams work on more complex versions
    of each project topic
  • One final grade per entire team

4
Honor Code Rules
  • Using somebody else's code and presenting it as
    your own is a serious Honor Code violation and
    may result in an F grade for the entire course.
  • All student teams are expected to write and debug
    their code by themselves and are not allowed to
    share their code with other teams.
  • Students are encouraged to help and support each
    other in all problems related to
      • the basic understanding of the problem
      • the operation of the CAD tools.

5
Project 1 - Platform and tools
  • Target devices: Xilinx FPGA Spartan 3 family
  • Tools
      • VHDL Simulation: Aldec Active-HDL or ModelSim
      • VHDL Synthesis: Synplify Pro or Xilinx XST
      • Implementation: Xilinx ISE or Xilinx WebPack

6
Project 1 - Final Deliverables
  • All block diagrams and ASM charts describing the
    entire circuit and its components (electronic
    form, PDF)
  • All synthesizable VHDL source codes
  • All testbenches used to verify the operation of
    the entire circuit and its components, and the
    corresponding input files containing test
    vectors, and output files containing results
  • Timing waveforms demonstrating the correct
    operation of the entire circuit and its
    components
  • Final report

7
Final Report (1)
  • 1. Short description of the block diagrams and ASM
    charts. Discussion of any alternative architectures
    and solutions.
  • 2. List of source codes and a short description
    of major modules.
  • 3. Source of test vectors and a way of generating
    these test vectors.
  • 4. Format of input and output files.
    Short description of a testbench.

8
Final Report (2)
  • 5. Results
      • resource utilization (CLB slices, LUTs,
        FFs, BRAMs, etc.)
      • post-synthesis timing
          • clock frequency
          • throughput
          • latency
          • critical path
      • post placing and routing timing
          • clock frequency
          • throughput
          • latency
          • critical path

9
Final Report (3)
  • 6. Discussion of the obtained results and any
    optimizations applied in order to obtain
    the optimum design.
  • 7. Speed-up vs. software implementation.
  • 8. Discussion of dependence of results on
    parameters of the application.
  • 9. Deviations from the original specification,
    encountered problems, and unresolved issues.
10
Two topics from two different areas to choose from
  • Cryptography: Stream cipher qualified to Phase 2 of the eSTREAM contest
  • Digital Signal Processing: Finite Impulse Response Filter
11
Stream cipher qualified to Phase 2 of the eSTREAM
contest
12
Cipher
[Figure: cipher block with an m-bit Message / Ciphertext input, a k-bit Cryptographic Key input, a 1-bit Encrypt/Decrypt control input, and an m-bit Ciphertext / Message output]
13
Secret-Key Ciphers
[Figure: Alice and Bob communicate over a network; the Encryption and Decryption blocks both use the shared key of Alice and Bob, K_AB]
14
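Block vs. stream ciphers
Block cipher: plaintext blocks $M_1, M_2, \dots, M_n$ are encrypted under key $K$ into ciphertext blocks $C_1, C_2, \dots, C_n$, with
$C_i = f_K(M_i)$
Every block of ciphertext is a function of only
one corresponding block of plaintext.
Stream cipher: plaintext blocks $m_1, m_2, \dots, m_n$ are encrypted under key $K$, using internal memory, into ciphertext blocks $c_1, c_2, \dots, c_n$, with
$c_i = f_K(m_i, m_{i-1}, \dots, m_2, m_1)$
Every block of ciphertext is a function of the
current and all preceding blocks of plaintext.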
15
Typical stream cipher
[Figure: Sender and Receiver each feed the key and the initialization vector (seed) into a Pseudorandom Key Generator that produces the keystream k_i. The sender combines plaintext m_i with k_i to obtain ciphertext c_i; the receiver combines ciphertext c_i with k_i to recover plaintext m_i]
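The sender/receiver symmetry in this slide can be illustrated with a toy sketch. It assumes the keystream is combined with the data by bitwise XOR (the usual choice, but not stated on the slide), and it uses Python's `random.Random` as a stand-in for the pseudorandom key generator. The names `toy_keystream` and `xor_stream` are hypothetical, and the generator is not cryptographically secure.

```python
import random


def toy_keystream(key: bytes, iv: bytes, n: int) -> bytes:
    """Stand-in for the pseudorandom key generator (NOT secure).

    The same (key, iv) pair always yields the same keystream, which is
    the property the sender and receiver rely on.
    """
    rng = random.Random(key + iv)
    return bytes(rng.randrange(256) for _ in range(n))


def xor_stream(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Combine data with the keystream by XOR (assumed combining step)."""
    ks = toy_keystream(key, iv, len(data))
    return bytes(d ^ k for d, k in zip(data, ks))


if __name__ == "__main__":
    key, iv = b"0123456789", b"\x00\x01\x02\x03"
    ciphertext = xor_stream(key, iv, b"plaintext")
    # Applying the same operation again recovers the plaintext.
    print(xor_stream(key, iv, ciphertext))
```

Because XOR is its own inverse, one function serves for both encryption and decryption, which is consistent with a single `enc_dec`-style datapath in hardware.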
16
eSTREAM - Contest for a new stream cipher
standard, 2004-2008
PROFILE 1
  • Stream cipher suitable for software
    implementations optimized for high speed
  • Key size - 128 bits
  • Initialization vector - 64 bits or 128 bits

PROFILE 2
  • Stream cipher suitable for hardware
    implementations with limited memory,
    number of gates, or power supply
  • Key size - 80 bits
  • Initialization vector - 32 bits or 64 bits

17
eSTREAM - Contest for a new stream cipher
standard, 2004-2008
Schedule of the contest
  • November 2004: Request for proposals
  • 29 April 2005: Deadline for submissions
    (34 ciphers; 23 candidates for PROFILE 1,
    26 candidates for PROFILE 2)
  • 26-27 May 2005: Stream Cipher Workshop, Denmark
  • March 2006: End of Phase I
  • July 2006: Beginning of the evaluation part of Phase II
  • September 2007: End of Phase II
  • January 2008: Final report
http://www.ecrypt.eu.org/stream/timetable.html
18
10 focus candidates
PROFILE 1 (Software)
  • Dragon - Ed Dawson, Kevin Chen, Matt Henricksen,
    William Millan, Leonie Simpson, HoonJae Lee,
    SangJae Moon
  • HC-256 - Hongjun Wu
  • LEX - Alex Biryukov
  • Phelix - Doug Whiting, Bruce Schneier,
    Stefan Lucks, Frédéric Muller
  • Py - Eli Biham and Jennifer Seberry
  • Salsa20 - Daniel Bernstein
  • SOSEMANUK - Côme Berbain, Olivier Billet,
    Anne Canteaut, Nicolas Courtois, Henri Gilbert,
    Louis Goubin, Aline Gouget, Louis Granboulan,
    Cédric Lauradoux, Marine Minier, Thomas Pornin,
    Hervé Sibert
PROFILE 2 (Hardware)
  • Grain - Martin Hell, Thomas Johansson and
    Willi Meier
  • MICKEY-128 - Steve Babbage and Matthew Dodd
  • Phelix - Doug Whiting, Bruce Schneier,
    Stefan Lucks, Frédéric Muller
  • Trivium - Christophe De Cannière and Bart Preneel
19
Your task
  • For groups of size ONE:
    implement ONE out of the following FIVE ciphers
  • For groups of size TWO:
    implement TWO out of the following FIVE ciphers
Ciphers: Grain, MICKEY-128, Phelix, Salsa20, Trivium
20
Optimization Criteria
I. Minimum area
II. Maximum ratio of Throughput divided by Total Circuit Area (CLB slices)
21
Required interface
[Figure: eSTREAM cipher block. Signals: clk, reset, key_IV, key_IV_ready, key_IV_write, data_in, data_in_ready, data_in_write, data_out, write, full, enc_dec. Bus widths are labeled k and d]
k = 1, 2, 4, 8, 16, 32, 64
d = set of allowed values specific to a given algorithm
22
Tasks of a TWO-person team
  • Implement TWO ciphers
  • Compare TWO ciphers against each other

23
eSTREAM Implementation Hints
24
Example of an eSTREAM cipher
25
Linear Feedback Shift Register (LFSR)
An LFSR is denoted $\langle L, C(D) \rangle$, where $L$ is the length and $C(D)$ is the connection polynomial:
$C(D) = 1 + c_1 D + c_2 D^2 + \dots + c_L D^L$
26
Example of LFSR
$\langle 4, 1 + D + D^4 \rangle$: length 4, connection polynomial $C(D) = 1 + D + D^4$
27
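[Figure: L-stage shift register whose stages hold s_{j-1}, s_{j-2}, ..., s_{j-(L-1)}, s_{j-L}]
Initial state
$[s_{L-1}, s_{L-2}, \dots, s_1, s_0]$
LFSR recursion
$s_j = c_1 s_{j-1} \oplus c_2 s_{j-2} \oplus \dots \oplus c_{L-1} s_{j-(L-1)} \oplus c_L s_{j-L}$ for $j \ge L$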
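The recursion can be checked in software before writing any RTL. The following is a minimal sketch, assuming the feedback combines taps with XOR and that `taps` lists the coefficients c_1..c_L in order; the function name `lfsr_sequence` is hypothetical. It is run on the example LFSR with length 4 and connection polynomial 1 + D + D^4 (c_1 = 1, c_4 = 1).

```python
def lfsr_sequence(taps, initial_state, n):
    """Return the first n output bits s_0, s_1, ... of an LFSR.

    taps          -- [c_1, c_2, ..., c_L], coefficients of C(D)
    initial_state -- [s_0, s_1, ..., s_{L-1}]
    """
    L = len(taps)
    if len(initial_state) != L:
        raise ValueError("initial state must contain exactly L bits")
    s = list(initial_state)
    while len(s) < n:
        j = len(s)
        bit = 0
        # s_j = c_1 s_{j-1} XOR c_2 s_{j-2} XOR ... XOR c_L s_{j-L}
        for i, c in enumerate(taps, start=1):
            bit ^= c & s[j - i]
        s.append(bit)
    return s[:n]


if __name__ == "__main__":
    # C(D) = 1 + D + D^4  ->  c_1 = 1, c_2 = 0, c_3 = 0, c_4 = 1
    print(lfsr_sequence([1, 0, 0, 1], [1, 0, 0, 0], 30))
```

With this polynomial and a non-zero initial state, the sequence repeats every 15 bits, the maximum possible for a 4-stage register.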
28
LFSR State Sequence
29
Non-linear Feedback Shift Register (NFSR)
30
Doubling the speed of Grain
31
Resources
eSTREAM PHASE 2: the ECRYPT Stream Cipher Project,
available at http://www.ecrypt.eu.org/stream/
Source of test vectors
Reference C implementations provided by the
authors of the algorithms.
32
Finite Impulse Response Filter
33
Topic proposed and co-advised by
Dr. David Hwang
Dr. Kathleen Wage
34
DSP Project: FIR Digital Filter Design
  • Digital filters are widely used in digital
    communications and audio/video processing.
  • In particular, finite impulse response (FIR)
    filters are used for their ease of implementation
    and stability.
  • In this project, you will investigate different
    FIR filter structures and their VLSI
    implementations
  • Step 1: Implement and compare direct form versus
    direct form transposed structures
  • Step 2: Implement and compare fast FIR structures
    which reduce the number of required
    multiplications per sample

35
Example: Gigabit Ethernet Transceiver
  • As seen above, digital filters (boxed in blue)
    play a crucial role in digital communication
    chips such as Ethernet transceivers, cable
    modems, DSL modems, satellite receivers, mobile
    phones, etc.

36
Step 1a: Direct Form FIR Filter
[Figure: direct form FIR filter. The input x(n) feeds a chain of Z^-1 delays; the delayed samples are multiplied by h0, h1, h2, ..., hN-1 and summed to give y(n)]
  • An FIR filter implements a convolution in the
    time domain
  • Critical path of N-tap filter
      • (N-1) adds + 1 multiply
  • Arithmetic complexity of N-tap filter modeled as
      • N multiplications/sample + (N-1) adds/sample
  • Problem 1a: Design a parametrizable direct form
    FIR filter
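A behavioral reference model of the direct form structure can help when generating expected outputs for a testbench. The sketch below is an illustration only: it works on exact Python numbers and ignores the rounding required later in the project, and the name `fir_direct` is hypothetical.

```python
def fir_direct(h, x):
    """Direct form FIR: y(n) = sum over k of h[k] * x(n - k).

    `delay` models the chain of Z^-1 registers holding
    x(n-1), ..., x(n-N+1), all initially zero.
    """
    delay = [0] * (len(h) - 1)
    y = []
    for sample in x:
        taps = [sample] + delay                 # x(n), x(n-1), ..., x(n-N+1)
        y.append(sum(hk * xk for hk, xk in zip(h, taps)))
        delay = taps[:-1]                       # shift the delay line
    return y


if __name__ == "__main__":
    # An impulse at the input reproduces the coefficients at the output.
    print(fir_direct([1, 2, 3], [1, 0, 0, 0]))
```

All N products are formed from the delay line and then summed in one combinational chain, which is where the (N-1) adds + 1 multiply critical path comes from.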

37
Step 1b: Direct Form Transpose FIR Filter
[Figure: transposed FIR filter. The input x(n) is multiplied by hN-1, hN-2, hN-3, ..., h0 in parallel; the products are accumulated through a chain of adders separated by Z^-1 delays to give y(n)]
  • Use a signal flow graph reversal to reduce the
    critical path → transpose structure
  • Critical path of N-tap transposed filter
      • 1 add + 1 multiply
  • Arithmetic complexity of N-tap filter modeled as
      • N multiplications/sample + (N-1) adds/sample
  • Problem 1b: Design a parametrizable direct form
    transpose FIR filter
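The transposed structure computes the same output with registers placed between the adders. A minimal behavioral sketch follows, again on exact numbers with no rounding; the name `fir_transposed` and the register ordering are illustrative choices, not taken from the slides.

```python
def fir_transposed(h, x):
    """Transposed direct form FIR.

    reg[i] is the register feeding the adder of tap i:
        y(n)        = h[0]*x(n) + reg[0]
        reg[i] next = h[i+1]*x(n) + reg[i+1]   (last register: no upstream term)
    """
    n_taps = len(h)
    reg = [0] * (n_taps - 1)
    y = []
    for sample in x:
        prod = [hk * sample for hk in h]        # every tap sees x(n) directly
        y.append(prod[0] + (reg[0] if reg else 0))
        new_reg = []
        for i in range(n_taps - 1):
            upstream = reg[i + 1] if i + 1 < n_taps - 1 else 0
            new_reg.append(prod[i + 1] + upstream)
        reg = new_reg
    return y


if __name__ == "__main__":
    print(fir_transposed([1, 2, 3], [1, 1, 1, 1]))
```

Between any two registers there is only one multiplier and one adder, matching the 1 add + 1 multiply critical path stated above.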

38
Step 2: Power Reduction via Parallel
Subexpression Sharing
[Figure: 2-parallel fast FIR structure. Inputs x(2n) and x(2n+1) are processed by three N/2-tap subfilters H0(z), H0(z)+H1(z), and H1(z), with one Z^-1 delay, to produce outputs y(2n) and y(2n+1)]
  • Direct form and transpose form structures
    (running at the same rate) require N
    multiplications/sample and N-1 adds/sample
  • Methods exist to reduce this complexity by
    parallel processing and subexpression sharing.
    See [1] and [2] for details and derivation.
  • In the 2-parallel structure above, two inputs
    arrive at half the original clock rate and are
    processed in parallel by three ceil(N/2)-tap
    filters (ceil() is the ceiling function)
  • Arithmetic complexity of the 2-parallel filter is
    approximately
      • 3 x N/2 multiplications / two samples;
        3 x (N/2-1) adds / two samples + 4 adds / two samples
      • = 3N/4 multiplications/sample, (3N/4 + 1/2)
        adds/sample
  • If power is dominated by multipliers, 25% power
    savings over traditional structures!
  • Problem 2a: Design a 2-parallel parametrizable
    FIR filter
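The 2-parallel structure can be prototyped in software to confirm that it matches an ordinary FIR filter. The sketch below assumes the standard 2-parallel fast FIR recombination from the literature, y(2k) = u(k) + v(k-1) and y(2k+1) = w(k) - u(k) - v(k), where u, v, w are the outputs of H0, H1, and H0+H1. The slide's block diagram is not available here, so this mapping is an assumption. The code also assumes an even number of taps and input samples, and the function names are hypothetical.

```python
def fir(h, x):
    """Plain FIR convolution, truncated to len(x) outputs."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def fir_2parallel(h, x):
    """2-parallel fast FIR using three half-length subfilters."""
    if len(h) % 2 or len(x) % 2:
        raise ValueError("this sketch assumes even len(h) and len(x)")
    h0, h1 = h[0::2], h[1::2]                   # polyphase decomposition by 2
    h_sum = [a + b for a, b in zip(h0, h1)]     # H0 + H1
    x0, x1 = x[0::2], x[1::2]                   # x(2k), x(2k+1)
    x_sum = [a + b for a, b in zip(x0, x1)]     # pre-processing add
    u = fir(h0, x0)
    v = fir(h1, x1)
    w = fir(h_sum, x_sum)
    y = []
    for k in range(len(x0)):
        v_delayed = v[k - 1] if k > 0 else 0    # the Z^-1 at the block rate
        y.append(u[k] + v_delayed)              # y(2k)
        y.append(w[k] - u[k] - v[k])            # y(2k+1)
    return y


if __name__ == "__main__":
    h, x = [1, 2, 3, 4], [1, 2, 3, 4, 5, 6]
    print(fir_2parallel(h, x), fir(h, x))
```

Per pair of samples there is one pre-processing add and three post-processing adds or subtracts, which accounts for the 4 extra adds in the complexity estimate.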

39
Obtaining Coefficients of 2-Parallel Subfilters
  • Example for N = 8
  • H(z) = {h0, h1, h2, h3, h4, h5, h6, h7}
  • Subfilter coefficients obtained by performing a
    polyphase decomposition by 2. Each subfilter has
    N/2 = 4 coefficients
  • H0(z) = {h0, h2, h4, h6}
  • H1(z) = {h1, h3, h5, h7}
  • H0(z) + H1(z) = {h0+h1, h2+h3, h4+h5, h6+h7}

40
3-parallel filter
  • In the 3-parallel filter, three inputs arriving
    at a third of the original rate are processed by
    six parallel ceil(N/3)-tap filters
  • Arithmetic complexity of the 3-parallel filter is
    approximately
      • 2N/3 multiplications/sample, (2N/3 + 4/3) adds/sample
  • 33% reduction in multiplications/sample
  • Problem 2b: Design a 3-parallel parametrizable
    FIR filter
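The approximate complexity figures quoted for the direct, 2-parallel, and 3-parallel structures can be tabulated for the tap counts used in the project. This is a small sketch that simply evaluates those formulas with exact fractions; the function name `ops_per_sample` is hypothetical, and the results are only meaningful when N is divisible by the parallelism level.

```python
from fractions import Fraction


def ops_per_sample(n_taps, structure):
    """Return (multiplications/sample, adds/sample) for an N-tap filter."""
    n = Fraction(n_taps)
    if structure == "direct":        # direct and transposed forms
        return n, n - 1
    if structure == "2-parallel":
        return 3 * n / 4, 3 * n / 4 + Fraction(1, 2)
    if structure == "3-parallel":
        return 2 * n / 3, 2 * n / 3 + Fraction(4, 3)
    raise ValueError(f"unknown structure: {structure}")


if __name__ == "__main__":
    for n_taps in (12, 24):
        for structure in ("direct", "2-parallel", "3-parallel"):
            mults, adds = ops_per_sample(n_taps, structure)
            print(n_taps, structure, mults, adds)
```

For N = 12 this gives 12, 9, and 8 multiplications per sample, i.e. the 25% and 33% reductions stated on the slides.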

41
Obtaining Coefficients of 3-Parallel Subfilters
  • Example for N = 9
  • H(z) = {h0, h1, h2, h3, h4, h5, h6, h7, h8}
  • Subfilter coefficients obtained by performing a
    polyphase decomposition by 3. Each subfilter has
    N/3 = 3 coefficients
  • H0(z) = {h0, h3, h6}
  • H1(z) = {h1, h4, h7}
  • H2(z) = {h2, h5, h8}
  • H0(z) + H1(z) = {h0+h1, h3+h4, h6+h7}
  • H1(z) + H2(z) = {h1+h2, h4+h5, h7+h8}
  • H0(z) + H1(z) + H2(z) = {h0+h1+h2, h3+h4+h5,
    h6+h7+h8}
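The subfilter coefficient sets can be generated mechanically from H(z). The sketch below assumes the coefficients are held in a Python list and that N is a multiple of the decomposition factor; `polyphase` and `add_filters` are hypothetical helper names. Using h_i = i makes it easy to see which coefficient lands where.

```python
def polyphase(h, p):
    """Split h into p subfilters: subfilter q holds h[q], h[q+p], h[q+2p], ..."""
    return [h[q::p] for q in range(p)]


def add_filters(*filters):
    """Coefficient-wise sum of equal-length subfilters."""
    return [sum(taps) for taps in zip(*filters)]


if __name__ == "__main__":
    h = list(range(9))                 # h_i = i, so values double as indices
    h0, h1, h2 = polyphase(h, 3)
    print(h0, h1, h2)
    print(add_filters(h0, h1), add_filters(h1, h2), add_filters(h0, h1, h2))
```

The same two helpers cover the 2-parallel case by calling `polyphase(h, 2)`.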

42
Further parallelism
  • These parallel structures introduce issues such
    as increased area, adder overhead (pre- and
    post-processing), etc. which eventually become
    prohibitive as the subsampling rate increases

43
Assumptions
All coefficients are loaded into the circuit before
the start of processing and do not change during
runtime. Registers storing coefficients are
connected in a chain, so coefficients must be
loaded serially, in the proper order,
starting from the ones with the smallest indices.
44
Parameters of the design
  • N - number of taps (N = 8, 12, 16, 24, 32)
  • M - fractional wordlength of input (M = 8..10)
  • K - fractional wordlength of output (K = 8..10)
  • L - fractional wordlength of coefficients (L = 7-11)
45
Required interface - basic architecture
[Figure: FIR Filter block. Signals: clk, reset_datapath, reset_coeff, d_in (1.M), d_out (1.K), coeff (1.L), load_coeff_done, filt_mode, load_begin]
  • filt_mode: 0 = load coefficients, 1 = run filter
  • load_begin: 0 = idle, 1 = start to load coefficients
46
Required interface - 2-parallel structure
[Figure: FIR Filter block. Signals: clk, reset_datapath, reset_coeff, d_in_1 (1.M), d_in_2 (1.M), d_out_1 (1.K), d_out_2 (1.K), coeff (1.L), load_coeff_done, filt_mode, load_begin]
  • filt_mode: 0 = load coefficients, 1 = run filter
  • load_begin: 0 = idle, 1 = start to load coefficients
47
One-Person Team Requirements
  • Matlab code will be given for five different
    configurations (A, B, C, D, E), each with
    different values of N, M, L, and K.
      • CASE A: N = 8, M = 8, K = 8, L = 7
      • CASE B: N = 12, M = 9, K = 9, L = 8
      • CASE C: N = 16, M = 9, K = 10, L = 9
      • CASE D: N = 24, M = 10, K = 11, L = 10
      • CASE E: N = 32, M = 11, K = 12, L = 11
  • Step 1: Direct form and transpose form
    structures
      • Generate parametrizable VHDL code; round the output
        of each multiplier to K fractional bits
      • Generate test vectors using Matlab and verify the
        test vectors in RTL for configurations A-E
      • Implement configurations B and D on FPGA
          • Optimize for minimum area
          • Optimize for maximum ratio of throughput / area
            (CLB slices)
  • Step 2: 2-parallel and 3-parallel fast FIR
    structures
      • Generate parametrizable VHDL code; round the output
        of each multiplier to K fractional bits
      • Generate test vectors using Matlab and verify the
        test vectors in RTL for configurations B and D
      • Implement configurations B and D on FPGA
          • Optimize for minimum area
          • Optimize for maximum ratio of throughput / area
            (CLB slices)

48
Two-Person Team Additional Requirements
  • Step 3: 4-parallel and 6-parallel fast FIR
    structures. See ref. [2] for block diagrams.
      • Generate parametrizable VHDL code; round the output
        of each multiplier to K fractional bits
      • Generate test vectors using Matlab and verify the
        test vectors in RTL for configurations B and D
      • Implement configurations B and D on FPGA
          • Optimize for minimum area
          • Optimize for maximum ratio of throughput / area
            (CLB slices)
  • Step 4: Quantization studies
      • For the 6-parallel filter and configurations B
        and D, implement truncation instead of rounding
        after the multipliers.
          • Optimize for minimum area
          • Optimize for maximum ratio of throughput / area
            (CLB slices)
      • For the 4-parallel filter and configurations B
        and D, round to K+4 bits after the multipliers.
        Round again to K bits right before the filter
        outputs to produce a 1.K output.
          • Optimize for minimum area
          • Optimize for maximum ratio of throughput / area
            (CLB slices)

49
Required reading
[1] Z. Mou and P. Duhamel, "Short-length FIR filters and their use in fast nonrecursive filtering," IEEE Transactions on Signal Processing, vol. 39, no. 6, pp. 1322-1332, June 1991.
[2] K.K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation, John Wiley, pp. 256-275, 1999.
Source of test vectors
Matlab implementation provided by Dr. Hwang
50
Important Notes on Two's Complement Arithmetic
51
Project Notation
  • For this project, we are using two's complement
    fractional notation
  • An m.M number indicates a two's complement (m+M)-bit
    word with m integer bits and M fractional
    bits
  • Example: 1.3 number
      • 0.111 = 0.875
      • 1.000 = -1
      • 1.111 = -0.125
  • Example: 2.2 number
      • 00.11 = 0.75
      • 10.00 = -2
      • 01.01 = 1.25
  • The dynamic range of an m.M number is
    $[-2^{m-1}, 2^{m-1})$
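The notation can be checked with a short decoder. This sketch assumes numbers are written as bit strings with a binary point, as in the examples on this slide; `decode` and `dynamic_range` are hypothetical names.

```python
def decode(bits: str) -> float:
    """Value of a two's complement fractional bit string such as '1.111'."""
    int_part, frac_part = bits.split(".")
    width = len(int_part) + len(frac_part)
    raw = int(int_part + frac_part, 2)
    if raw >= 1 << (width - 1):        # sign bit set: subtract 2^width
        raw -= 1 << width
    return raw / (1 << len(frac_part))


def dynamic_range(m: int):
    """(inclusive lower bound, exclusive upper bound) of an m.M number."""
    return -(2 ** (m - 1)), 2 ** (m - 1)


if __name__ == "__main__":
    for word in ("0.111", "1.000", "1.111", "00.11", "10.00", "01.01"):
        print(word, decode(word))
    print(dynamic_range(1), dynamic_range(2))
```

Note that the range depends only on the number of integer bits; the fractional bits set the resolution.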

52
Two's Complement Multiplication
[Figure: multiplier with inputs a (1.M) and b (1.L) and output 1.(M+L)]
  • The wordlength required for the product of 1.M x
    1.L numbers
      • 2.(M+L) if we assume -1 x -1 = 1 may occur
      • 1.(M+L) if we assume -1 x -1 = 1 will never
        occur
  • In general, a product of m.M x l.L numbers
      • (m+l).(M+L) if we assume (most neg value of a) x
        (most neg value of b) may occur
      • (m+l-1).(M+L) if we assume (most neg value of a) x
        (most neg value of b) will never occur
  • In this project, we assume that (most neg value
    of a) x (most neg value of b) will never occur
    for any multiplier in any filter structure. This
    is guaranteed by scaling the inputs and
    coefficients properly in Matlab.
  • Examples: 1.5 x 2.5 = 2.10, 1.4 x 1.6 = 1.10, 3.4
    x 2.3 = 4.7
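The product wordlength rule is easy to encode as a helper for sizing signals in a parametrizable design. This is a minimal sketch; the function name `product_format` and the tuple representation (integer bits, fractional bits) are illustrative choices.

```python
def product_format(a, b, most_negative_product_possible=False):
    """Wordlength (int_bits, frac_bits) of the product of m.M and l.L numbers.

    a, b -- (integer bits, fractional bits) of each operand
    most_negative_product_possible -- True if (most negative a) x
        (most negative b) may occur, which costs one extra integer bit
    """
    (m, frac_a), (l, frac_b) = a, b
    int_bits = m + l if most_negative_product_possible else m + l - 1
    return int_bits, frac_a + frac_b


if __name__ == "__main__":
    # Project assumption: the most negative product never occurs.
    print(product_format((1, 5), (2, 5)))
    print(product_format((1, 4), (1, 6)))
    print(product_format((3, 4), (2, 3)))
```

With the project's assumption, a 1.M input times a 1.L coefficient needs only a 1.(M+L) product.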

53
Two's Complement Truncation versus Rounding
  • In this project, we ask you to round the output
    of each multiplier to K fractional bits.
  • To round a k.K' number to a k.K number (K < K')
      • Truncate the k.K' number to become a k.K number
      • Add the former fractional bit K+1 to fractional
        position K
  • For information purposes, to truncate a k.K'
    number to a k.K number (K < K')
      • Truncate the k.K' number to become a k.K number
  • Rounding and truncation produce equal noise
    variance, but rounding is (approximately)
    unbiased and truncation is biased

54
Truncation versus Rounding
Example: 2.5 number to a 2.3 number
ROUNDING (truncated value + former 4th fractional bit)
00.01110 → 00.011 + 1 = 00.100
11.01000 → 11.010 + 0 = 11.010
10.00110 → 10.001 + 1 = 10.010
TRUNCATION
00.01110 → 00.011
10.00110 → 10.001
11.01000 → 11.010
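The 2.5-to-2.3 examples can be reproduced on integers, which mirrors what the hardware does with bit vectors. The sketch below assumes each number is stored as a signed integer scaled by 2^(fractional bits); the helper names are hypothetical. Python's `>>` on negative integers is an arithmetic shift, so it matches dropping the low bits of a two's complement word.

```python
def truncate(n, frac_from, frac_to):
    """Drop the (frac_from - frac_to) least significant fractional bits."""
    return n >> (frac_from - frac_to)


def round_half_up(n, frac_from, frac_to):
    """Truncate, then add the first dropped bit at the new LSB position."""
    d = frac_from - frac_to                      # must be at least 1
    return (n >> d) + ((n >> (d - 1)) & 1)


def to_bits(n, int_bits, frac_bits):
    """Format a signed scaled integer as a two's complement bit string."""
    width = int_bits + frac_bits
    s = format(n & ((1 << width) - 1), f"0{width}b")
    return s[:int_bits] + "." + s[int_bits:]


if __name__ == "__main__":
    # 00.01110 = 14, 11.01000 = -24, 10.00110 = -58 (scaled by 2^5)
    for n in (14, -24, -58):
        print(to_bits(n, 2, 5),
              "round:", to_bits(round_half_up(n, 5, 3), 2, 3),
              "truncate:", to_bits(truncate(n, 5, 3), 2, 3))
```

In hardware terms, rounding costs one extra incrementer per multiplier output compared with truncation, which simply discards wires.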
55
Two's Complement Addition
[Figure: direct form FIR datapath. The 1.M input x(n) and its Z^-1-delayed copies are multiplied by coefficients h0, h1, ...; each product is passed through a ROUND block to 1.K, and the 1.K values are summed to give y(n)]
  • FIR filters perform chains of additions
  • A k.K number plus a k.K number requires a (k+1).K
    number to represent the sum
      • Ex. 0.110 (0.75) + 0.110 (0.75) = 01.100 (1.5)
      • Ex. 1.000 (-1) + 1.000 (-1) = 10.000 (-2)
  • In general, an adder chain summing J numbers,
    each of wordlength k.K, requires a wordlength of
    (k + ceil(log2(J))).K after the final adder
  • This grows for a large number of coefficients N
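The worst-case growth rule can be evaluated for the tap counts used in the project. A minimal sketch, using `(J - 1).bit_length()` as an exact integer form of ceil(log2(J)); the function name `adder_chain_format` is hypothetical.

```python
def adder_chain_format(k, frac_bits, j):
    """Wordlength (int_bits, frac_bits) after summing j numbers of format k.K."""
    if j < 1:
        raise ValueError("need at least one operand")
    growth = (j - 1).bit_length()      # equals ceil(log2(j)) for j >= 1
    return k + growth, frac_bits


if __name__ == "__main__":
    for n_taps in (8, 12, 16, 24, 32):
        print(n_taps, adder_chain_format(1, 9, n_taps))
```

For example, summing 12 values of format 1.9 needs a 5.9 result in the worst case, four integer bits more than each operand.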

56
Two's Complement Adder Chain Trick using Modulo
Arithmetic
[Figure: adder chain summing 1.K values to y(n), drawn twice: once with intermediate wordlengths growing to 2.K and 3.K, and once with every intermediate node kept at 1.K]
  • Trick: if we know the output of the adder chain is
    bounded within a k.K value (where k is some known
    value), then all intermediate addition nodes only
    require k.K bit wordlengths
  • Provides hardware savings for a large number of
    coefficients N!
  • This is only true if we know the output of the
    adder chain is bounded
  • Be careful, because
      • x(2n) + x(2n+1) is not guaranteed to be bounded
        in 1.M; you need the full 2.M
      • h(0) + h(1) is not guaranteed to be bounded in
        1.L; you need the full 2.L
  • In this project, this trick helps after
    multiplier outputs, not on multiplier inputs
  • In our project, the final output y(n) is bounded
    within a 1.K bit wordlength. This has been
    controlled by scaling the inputs and coefficients
    in Matlab.
  • To learn about more helpful hardware tricks
    take ECE 645 next semester!
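The modulo trick can be demonstrated with a few lines of integer arithmetic. The sketch below models a fixed-width adder that silently wraps around, using 1.3 numbers stored as integers scaled by 8; the helper names `wrap` and `wrapped_sum` are hypothetical.

```python
def wrap(n, bits):
    """Reduce n to a signed two's complement value of the given width."""
    n &= (1 << bits) - 1
    return n - (1 << bits) if n >= 1 << (bits - 1) else n


def wrapped_sum(values, bits):
    """Sum values with an accumulator that is only `bits` wide."""
    acc = 0
    for v in values:
        acc = wrap(acc + v, bits)
    return acc


if __name__ == "__main__":
    # 1.3 format: 0.75 -> 6, -1 -> -8. True sum is 0.75 + 0.75 - 1 = 0.5 -> 4.
    values = [6, 6, -8]
    print(wrap(6 + 6, 4))              # intermediate node overflows and wraps
    print(wrapped_sum(values, 4))      # final result is still correct
```

The intermediate sum 1.5 does not fit in 1.3 and wraps to a negative value, yet the final result is correct because the true total lies inside the 1.3 range. If the true total were out of range (for example, summing only the first two values), the wrapped result would be wrong, which is why the bound on y(n) matters.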