Digital Filtering In Hardware PowerPoint PPT Presentation

Title: Digital Filtering In Hardware


1
Digital Filtering In Hardware
  • Adnan Aziz

2
Introduction
  • Digital filtering vs. analog filtering
  • More robust (process variations, temperature),
    flexible (bit precision, programmability), can
    store and recover
  • Lower performance (esp. at high frequencies), more
    area/power, cannot sense, needs data converters
  • Can perform digital filtering in hardware or
    software
  • Software (DSP/generic microprocessors): flexible,
    less up-front cost
  • Hardware (ASIC/FPGA): customized, cheaper in
    volume, lower area/power

3
Applications
  • Applications: noise filtering, equalization,
    image processing, seismology, radar, ECC,
    audio/image compression
  • Focus: implementing difference equations
  • No feedback: FIR; feedback: IIR
  • Assume coefficient synthesis is done
  • Operate almost exclusively in the time domain (FFT
    done)
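
A minimal Python sketch of the two kinds of difference equation named above. The function names and the first-order IIR form y(n) = a*y(n-1) + x(n) are illustrative assumptions, not taken from the slides; coefficients are assumed to be already synthesized.

```python
def fir(x, b):
    """FIR: y(n) = sum_k b[k] * x(n-k). No feedback, output depends only on inputs."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, bk in enumerate(b):
            if n - k >= 0:          # samples before n = 0 are taken as zero
                acc += bk * x[n - k]
        y.append(acc)
    return y


def iir_first_order(x, a):
    """IIR: y(n) = a * y(n-1) + x(n). The previous output is fed back."""
    y = []
    prev = 0.0                      # y(-1) assumed to be zero
    for xn in x:
        prev = a * prev + xn
        y.append(prev)
    return y


print(fir([1, 0, 0, 0], [1, 2, 3]))          # [1.0, 2.0, 3.0, 0.0]
print(iir_first_order([1, 0, 0, 0], 0.5))    # [1.0, 0.5, 0.25, 0.125]
```

The impulse response of the FIR ends after three samples, while the IIR response keeps decaying because of the feedback term.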

4
Evolution
5
Various Representations
  • 3-tap FIR
  • Nonterminating: repeatedly execute the same code
  • Iteration: execute all operations once; iteration
    period: time to perform one iteration; iteration
    rate: inverse of the iteration period
  • Sampling rate (aka throughput): number of samples
    per second; critical path: maximum combinational
    delay (no wave pipelining!)
  • Block diagram
  • Close to actual hardware: interconnected
    functional blocks, potentially with delay
    elements between blocks

6
Block Diagram
7
Block Diagram
8
Signal Flow Graph
  • Unique source, sink (input and output)
  • Edges represent constant multipliers, delays
  • Nodes represent I/O, adders, multipliers
  • Useful for wordlength effects, less for
    architecture design

9
SFG
10
Dataflow Graph
  • DFG
  • Nodes: computations (functions, subtasks)
  • Edges: datapaths
  • Captures the data-driven nature of DSP,
    intra-iteration and inter-iteration constraints
  • Very general: nonlinear, multirate, asynchronous,
    synchronous
  • Difference from block diagram: hardware is not
    allocated or scheduled in a DFG

11
DFG
12
DFG
13
Multirate DFG
14
Iteration Bound
  • In a DFG, each node executes once per iteration
  • Iteration: all nodes executed once
  • Critical path: combinational path with maximum
    total execution time (note: we're reserving the
    term "delay" for sequential delay)
  • Loop (cycle): path beginning and ending at the same
    node
  • Loop bound for loop L: T_L / W_L, where T_L is the
    loop's total execution time and W_L is its number
    of delays
  • Iteration bound: maximum of all loop bounds
  • Lower bound on execution time per iteration for the
    DFG (assuming only pipelining, retiming, unfolding)
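
The definition above can be sketched directly: enumerate every loop, compute T_L / W_L, and take the maximum. This brute-force version is only meant for small graphs; the graph encoding (`exec_time`, `edges`) and the example values are assumptions made for illustration.

```python
from fractions import Fraction


def iteration_bound(exec_time, edges):
    """Return max over all loops of (loop execution time) / (loop delays).

    exec_time: dict node -> execution time (integers)
    edges:     list of (src, dst, delays)
    Returns None if the graph has no loops.
    """
    succ = {}
    for u, v, w in edges:
        succ.setdefault(u, []).append((v, w))

    best = None

    def dfs(start, node, time, delays, visited):
        nonlocal best
        for nxt, w in succ.get(node, []):
            if nxt == start:                      # closed a loop
                total = delays + w
                if total == 0:
                    raise ValueError("delay-free loop: graph is not computable")
                bound = Fraction(time, total)
                if best is None or bound > best:
                    best = bound
            elif nxt not in visited and nxt > start:
                # Only visit nodes larger than start so each loop is found once.
                dfs(start, nxt, time + exec_time[nxt], delays + w, visited | {nxt})

    for s in sorted(exec_time):
        dfs(s, s, exec_time[s], 0, {s})
    return best


# Hypothetical DFG with two loops: A->B->A (time 6, 2 delays) and A->C->A (time 7, 3 delays).
times = {"A": 2, "B": 4, "C": 5}
graph = [("A", "B", 0), ("B", "A", 2), ("A", "C", 0), ("C", "A", 3)]
print(iteration_bound(times, graph))   # 3
```

Loop enumeration grows exponentially in the worst case, so larger graphs call for a polynomial-time method instead.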

15
Iteration Bound
16
Iteration Bound
17
Iteration Bound
18
2.3
19
2.4
20
2.5
21
2.6
22
2.7
23
Pipeline and Parallelize
  • Pipelining: insert delay elements to reduce
    critical path length
  • Pro: faster (more throughput), lower power
  • Con: added latency, latches/clocking
  • Parallelism: compute multiple outputs in a single
    clock cycle
  • Pro: faster, lower power
  • Con: added hardware, sequencing logic

24
Pipelining
  • General: applicable to microprocessor
    architectures, logic circuits, DFGs
  • Have to place delays (flops) carefully
  • On feed-forward cutsets
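
A behavioral sketch of pipelining a 3-tap FIR along a feed-forward cutset. The particular cutset (after the first adder) and the register names `p_reg` and `q_reg` are assumptions for illustration. Every edge crossing the cutset gets one register, so the output is the same sequence delayed by one cycle.

```python
def fir3_direct(x, b0, b1, b2):
    """Unpipelined 3-tap FIR: y(n) = b0*x(n) + b1*x(n-1) + b2*x(n-2)."""
    y, x1, x2 = [], 0, 0
    for xn in x:
        y.append(b0 * xn + b1 * x1 + b2 * x2)
        x2, x1 = x1, xn
    return y


def fir3_pipelined(x, b0, b1, b2):
    """Same filter with one register on each edge of a feed-forward cutset.

    Stage 1 computes the partial sum b0*x(n) + b1*x(n-1).
    Stage 2 adds b2*x(n-2) to the registered partial sum.
    """
    out, x1, x2 = [], 0, 0
    p_reg, q_reg = 0, 0                   # pipeline registers on the cutset
    for xn in x:
        out.append(p_reg + b2 * q_reg)    # stage 2 works on last cycle's values
        p_reg = b0 * xn + b1 * x1         # stage 1 result, latched at the clock edge
        q_reg = x2                        # x(n-2) must cross the cutset as well
        x2, x1 = x1, xn
    return out


x = [1, 2, 3, 4, 5]
print(fir3_direct(x, 1, 2, 3))      # [1, 4, 10, 16, 22]
print(fir3_pipelined(x, 1, 2, 3))   # [0, 1, 4, 10, 16]
```

In this sketch each stage contains one multiply and one add, compared with one multiply and two adds on the longest path of the direct form. Leaving a register off one of the two cutset edges would mix samples from different cycles, which is why the delays have to be placed carefully.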

25
Pipelining
26
Pipelining Parallel
27
Pipelining
28
Feed-forward Cutset
29
Transposition
30
Transposition
31
Data Broadcast
32
Fine-grain Pipelining
33
Parallel Processing
  • Process blocks of L samples at a time
  • Clock period = L × sample period
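
A behavioral sketch of block processing for a 3-tap FIR, assuming a block size of L = 3. Each pass of the outer loop stands for one clock cycle that consumes L inputs and produces L outputs. The function names are illustrative.

```python
def fir3_reference(x, b0, b1, b2):
    """Sample-at-a-time 3-tap FIR used as the reference."""
    y, x1, x2 = [], 0, 0
    for xn in x:
        y.append(b0 * xn + b1 * x1 + b2 * x2)
        x2, x1 = x1, xn
    return y


def fir3_parallel(x, b0, b1, b2, L=3):
    """Block-process the same filter: L outputs per clock cycle."""
    if len(x) % L != 0:
        raise ValueError("input length must be a multiple of the block size L")
    y = []
    x1, x2 = 0, 0                          # x(n-1), x(n-2) carried between cycles
    for k in range(0, len(x), L):          # one pass = one clock cycle
        hist = [x2, x1] + list(x[k:k + L])  # hist[i + 2] == x(k + i)
        # The L outputs are independent of each other, so hardware can
        # compute them with L copies of the filter datapath in parallel.
        y.extend(b0 * hist[i + 2] + b1 * hist[i + 1] + b2 * hist[i]
                 for i in range(L))
        x2, x1 = hist[-2], hist[-1]
    return y


x = [1, 2, 3, 4, 5, 6]
print(fir3_parallel(x, 1, 2, 3))   # [1, 4, 10, 16, 22, 28]
```

Because one clock cycle now covers L samples, the clock can run L times slower than the sample rate for the same throughput.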

34
Parallelism
35
Parallelism
36
Components
37
Need for Parallelism
38
Parallelism
  • Why not use pipelining?
  • May have a single large delay element that cannot
    be divided (communication between chips)
  • Can use in conjunction with pipelining
  • Relatively less efficient than pipelining (area
    cost and power savings)
  • Note that we've skirted the issue of parallelizing
    general DFGs
  • Loops make life hard

39
Parallelize Pipeline
40
Area Efficiency
41
Pipelining Processors
  • Classic DLX processor
  • ISA: Load/Store or Mem Access
  • 5 stages: IF, ID, EX, MEM, WB
  • Pipelining processors is hard
  • Data hazards
  • ADD r1, r2, r3; SUB r4, r5, r1
  • Solution: use bypass logic
  • LD r1, r2; ADD r4, r1, r2
  • Solution?
  • Branch hazards
  • PC not changed till end of ID
  • Solution: redo IF (only) if branch taken
  • Pipelining DFGs is easy (no control flow!)
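
A small sketch that classifies the two data hazards listed above. The tuple encoding of instructions is an assumption for illustration. It also assumes the usual behavior of a classic 5-stage pipeline, where loaded data is not available until the end of MEM, so a dependent instruction right after a load needs one stall cycle in addition to bypassing.

```python
def classify_hazard(producer, consumer):
    """Classify the read-after-write hazard between two back-to-back instructions.

    Each instruction is (opcode, dest_register, source_registers).
    Returns "none", "bypass" (forwarding suffices) or "stall+bypass".
    """
    op, dest, _ = producer
    _, _, sources = consumer
    if dest not in sources:
        return "none"
    # An ALU result exists at the end of EX and can be forwarded directly;
    # a load result only exists at the end of MEM, one cycle too late.
    return "stall+bypass" if op == "LD" else "bypass"


add = ("ADD", "r1", ("r2", "r3"))
sub = ("SUB", "r4", ("r5", "r1"))
ld = ("LD", "r1", ("r2",))
add2 = ("ADD", "r4", ("r1", "r2"))

print(classify_hazard(add, sub))    # bypass
print(classify_hazard(ld, add2))    # stall+bypass
```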

42
Pipelining Processors
43
Retiming
  • Basic idea (for logic circuits)
  • Move flops back and forth across gates
  • Use for clock period reduction, flop
    minimization, power minimization, resynthesis
  • Same idea holds for DFGs
  • Examples
  • Algorithm
  • C-slow retiming

44
Retiming
45
Retiming
46
Cutset Retiming
47
Cutset Retiming
48
C-Slow Retiming
49
Min Delay Retiming
  • Formalize: use the notion of a retiming function on
    nodes
  • Amount of delay pushed back through the node (can be
    negative; think of it as a retardation function)
  • Want to know if cycle time T_C is feasible
  • Set up constraints
  • Long paths have to be broken
  • No negative delays on edges
  • Solve using a custom ILP
  • Uses efficient graph algorithms
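
A sketch of the two constraints, not the ILP itself: it applies a given retiming function r, rejects it if any edge would get a negative delay, and measures the resulting cycle time as the longest zero-delay path. It assumes the common convention that an edge u -> v with w delays ends up with w + r(v) - r(u) delays. The 4-node graph and its execution times are a hypothetical example.

```python
def retime(edges, r):
    """Edge u -> v with w delays becomes w + r[v] - r[u] delays."""
    retimed = [(u, v, w + r[v] - r[u]) for u, v, w in edges]
    if any(w < 0 for _, _, w in retimed):
        raise ValueError("infeasible retiming: an edge would get negative delay")
    return retimed


def clock_period(exec_time, edges):
    """Longest total execution time along any path made of zero-delay edges."""
    zero_succ = {}
    for u, v, w in edges:
        if w == 0:
            zero_succ.setdefault(u, []).append(v)
    memo = {}

    def longest_from(u, on_path=()):
        if u in on_path:
            raise ValueError("delay-free loop")
        if u not in memo:
            tail = max((longest_from(v, on_path + (u,))
                        for v in zero_succ.get(u, [])), default=0)
            memo[u] = exec_time[u] + tail
        return memo[u]

    return max(longest_from(u) for u in exec_time)


# Hypothetical DFG: nodes 1 and 2 are adders (time 1), 3 and 4 multipliers (time 2).
times = {1: 1, 2: 1, 3: 2, 4: 2}
graph = [(1, 3, 1), (1, 4, 2), (3, 2, 0), (4, 2, 0), (2, 1, 1)]
print(clock_period(times, graph))                 # 3

r = {1: 0, 2: 1, 3: 0, 4: 0}
print(clock_period(times, retime(graph, r)))      # 2
```

Checking whether a target cycle time is feasible then amounts to searching for some r that passes both tests.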

50
Unfolding
  • Analogous to loop unrolling for programs
  • for (I = 1; I < 5; I++) a[I] = b[I] + c[I]
  • Many benefits, at the price of a potential increase
    in code size
  • Look at 2-unfolding of
  • y(n) = x(n) + a y(n-9)
  • General algorithm for J-unfolding a DFG
  • Uses J nodes for each original node, new delay
    values
  • Nontrivial fact: the algorithm works
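
A sketch of the standard J-unfolding rule: each node U becomes U_0 .. U_{J-1}, and each edge U -> V with w delays becomes J edges U_i -> V_{(i+w) mod J} with floor((i+w)/J) delays. The edge-list encoding and the node names `add` and `mul` are assumptions for illustration.

```python
def unfold(edges, J):
    """J-unfold a DFG given as a list of (src, dst, delays) edges.

    Nodes of the unfolded graph are (name, index) pairs with index in 0..J-1.
    """
    unfolded = []
    for u, v, w in edges:
        for i in range(J):
            unfolded.append(((u, i), (v, (i + w) % J), (i + w) // J))
    return unfolded


# y(n) = x(n) + a*y(n-9): the adder output reaches the multiplier through 9 delays.
loop = [("add", "mul", 9), ("mul", "add", 0)]
for edge in unfold(loop, 2):
    print(edge)
# (('add', 0), ('mul', 1), 4)
# (('add', 1), ('mul', 0), 5)
# (('mul', 0), ('add', 0), 0)
# (('mul', 1), ('add', 1), 0)
```

The 9 original delays are split into 4 and 5, so the total number of delays is preserved while each iteration of the unfolded graph produces two outputs.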

51
Unfolding
52
Unfolding
53
Applications
  • Meet iteration bound
  • When a single node has large execution time
  • When IB is nonintegral

54
Applications: IB
55
Applications: Fractional IB
56
Applications Parallelize
  • Recall in Chapter 3, we never gave a systematic
    way of generating parallel circuits
  • Loop unfolding gives a way

57
Applications Bit-Digit
  • Convert a bit-serial architecture to a
    digit-serial architecture

58
Folding
  • Trade area for time
  • Use same hardware unit for multiple nodes in the
    DFG
  • Example: y(n) = a(n) + b(n) + c(n)
  • Need a general, systematic approach to folding
  • Math formulation: folding orders, folding sets,
    folding factors
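
A cycle-level sketch of the example, assuming the two additions are folded onto one adder with two clock cycles per sample. The mux selection by `phase` and the register `reg` are illustrative names, not taken from the slides.

```python
def folded_sum3(a, b, c):
    """Compute y(n) = a(n) + b(n) + c(n) with a single time-multiplexed adder."""
    y = []
    reg = 0                                # holds the partial sum between cycles
    for cycle in range(2 * len(a)):        # two clock cycles per input sample
        n, phase = divmod(cycle, 2)
        if phase == 0:
            left, right = a[n], b[n]       # mux selects a(n) and b(n)
        else:
            left, right = reg, c[n]        # mux selects partial sum and c(n)
        reg = left + right                 # the one shared adder
        if phase == 1:
            y.append(reg)                  # y(n) is ready after the second cycle
    return y


print(folded_sum3([1, 2], [10, 20], [100, 200]))   # [111, 222]
```

One adder replaces two, at the cost of two clock cycles per sample plus the mux and register.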

59
Folding
60
Folding
61
Folding
62
Folding
63
Folding
64
Folding
65
Register Minimization
  • Consider a DSP program that produces 3 variables
  • a: live during time units 1,2,3,4
  • b: live during time units 2,3,4,5,6,7
  • c: live during time units 5,6,7
  • Number of live variables at time units 1 to 7:
    1,2,2,2,2,2,2
  • Intuitively, should be able to get by with 2
    registers
  • However, DSP programs are periodic
  • May have a variable live across iterations
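
A sketch of the live-variable count for this example. The optional `period` argument models periodic execution by wrapping time units modulo the iteration period; the period of 6 used below is a hypothetical value chosen only to show the effect.

```python
from collections import Counter


def live_counts(lifetimes, period=None):
    """Count live variables per time unit.

    lifetimes: dict name -> iterable of time units in which the variable is live
    period:    if given, time units wrap modulo the iteration period, so
               lifetimes from consecutive iterations overlap
    """
    counts = Counter()
    for times in lifetimes.values():
        for t in times:
            slot = t if period is None else t % period
            counts[slot] += 1
    return dict(sorted(counts.items()))


def min_registers(lifetimes, period=None):
    """The register count is the maximum number of simultaneously live variables."""
    return max(live_counts(lifetimes, period).values())


lifetimes = {"a": range(1, 5), "b": range(2, 8), "c": range(5, 8)}
print(live_counts(lifetimes))              # {1: 1, 2: 2, 3: 2, 4: 2, 5: 2, 6: 2, 7: 2}
print(min_registers(lifetimes))            # 2
print(min_registers(lifetimes, period=6))  # 3
```

With the assumed period of 6, time unit 7 of one iteration coincides with time unit 1 of the next, so a, b and c are all live in the same slot and 2 registers no longer suffice.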

66
Linear Lifetime Chart
67
Lifetime Analysis Matrix
68
Lifetime Chart Matrix
69
Register Allocation Table
70
Reg Assignment Matrix
71
Reg Assignment Biquad
72
Reg Assignment Biquad
73
Pipelined Parallel IIR
  • Feedback loops make pipelining and parallelism
    very hard
  • Impossible to beat iteration bound without
    rewriting the difference equation
  • Example
  • Pipeline interleaving of y(n+1) = a y(n) + b u(n)
  • Note that IB goes up, but can run multiple
    streams in parallel
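
A behavioral sketch of pipeline interleaving, assuming M independent input streams of equal length share one loop that holds M delay registers. Clock cycle n*M + s serves stream s. The function names are illustrative.

```python
def iir_single(u, a, b):
    """Reference: y(n+1) = a*y(n) + b*u(n) with y(0) = 0; returns y(1), y(2), ..."""
    y, out = 0, []
    for un in u:
        y = a * y + b * un
        out.append(y)
    return out


def iir_interleaved(streams, a, b):
    """Run M independent streams through one loop with M delays."""
    M = len(streams)
    regs = [0] * M                     # the M delay elements in the loop
    outs = [[] for _ in range(M)]
    for n in range(len(streams[0])):
        for s in range(M):             # one inner pass = one clock cycle
            state = regs.pop(0)        # value written M clock cycles ago
            state = a * state + b * streams[s][n]
            regs.append(state)
            outs[s].append(state)
    return outs


u1, u2 = [1, 1, 1], [1, 0, 0]
print(iir_interleaved([u1, u2], 2, 1))   # [[1, 3, 7], [1, 2, 4]]
```

Each stream sees its own state again only every M clock cycles, which is what allows the multiply-add inside the loop to be pipelined into M stages.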

74
Pipeline Interleaved IIR
75
Pipeline Interleaved IIR
76
Pipeline Interleaved IIR
77
Pipelining 1-st Order IIR
  • y(n) = a y(n-1) + u(n)
  • Sample rate set by multiply and add time
  • Can do better by look-ahead pipelining
  • Basically, changing the difference equation to
    get more delays in the loop
  • Key: functionality unchanged
  • Best understood in terms of Z-transforms
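
A sketch of one look-ahead step, assuming zero initial conditions. Substituting y(n-1) = a y(n-2) + u(n-1) into y(n) = a y(n-1) + u(n) gives y(n) = a^2 y(n-2) + a u(n-1) + u(n), which has two delays in the loop. The code checks that both forms produce the same output.

```python
def iir_original(u, a):
    """y(n) = a*y(n-1) + u(n): one delay in the feedback loop."""
    y, prev = [], 0
    for un in u:
        prev = a * prev + un
        y.append(prev)
    return y


def iir_lookahead2(u, a):
    """y(n) = a^2*y(n-2) + a*u(n-1) + u(n): two delays in the feedback loop."""
    y = []
    for n, un in enumerate(u):
        y2 = y[n - 2] if n >= 2 else 0      # y(n-2), zero before the start
        u1 = u[n - 1] if n >= 1 else 0      # u(n-1), zero before the start
        y.append(a * a * y2 + a * u1 + un)
    return y


u = [1, 2, 3, 4]
print(iir_original(u, 3))     # [1, 5, 18, 58]
print(iir_lookahead2(u, 3))   # [1, 5, 18, 58]
```

The extra term a*u(n-1) is feed-forward, so it can be pipelined freely; the feedback loop now has two delays available for retiming into the multiply-add.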

78
Pipelining 1-st Order IIR
79
Pipelining 1-st Order IIR
80
Pipelining High Order IIR
  • Three basic approaches
  • Clustered look-ahead
  • Scattered look-ahead
  • Direct synthesis with constraints

81
Pipelining High Order IIR
82
Pipelining High Order IIR
83
Pipelining High Order IIR
84
Pipelining High Order IIR
85
Pipelining High Order IIR
86
Pipelining High Order IIR
87
Pipelining High Order IIR