Advanced Topics on FPGA Applications Screen A - PowerPoint PPT Presentation

Provided by: jywu2
Learn more at: http://www-ppd.fnal.gov
1
Advanced Topics on FPGA Applications: Screen A
  • Wu, Jinyuan
  • Fermilab
  • IEEE NSS 2007 Refresher Course
  • Supplemental Materials
  • Oct, 2007

2
Doublet Matching, Hash Sorter
3
Hit Matching
Software                        | FPGA Typical                   | FPGA Resource Saving Approaches
O(n^2): for() for()             | O(n)*O(N): Comparator Array    | Hash Sorter: O(n)*O(N) in RAM
O(n^3): for() for() for()       | O(n)*O(N^2): CAM, Hough Trans. | Tiny Triplet Finder: O(n)*O(N log N)
O(n^4): for() for() for() for() |                                |
4
Hash Sorter
  • Pass 1: Data in Group 1 are stored in the hash sorter bins based on key number K.
  • Pass 2: Data in Group 2 are fed through and paired up with the corresponding Group 1 data that have the same key number K.

[Figure: hash sorter with Group 1 and Group 2 inputs; each item carries a key number K and data D.]
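The two passes can be sketched in software as follows. This is a minimal Python illustration and not the FPGA implementation: the function name `hash_sorter_match` and the (key, data) tuple layout are assumptions made for this example, and a dictionary of lists stands in for the RAM bins.

```python
from collections import defaultdict


def hash_sorter_match(group1, group2):
    """Pair Group 2 items with Group 1 items that share the same key K."""
    bins = defaultdict(list)  # stand-in for the hash sorter bins in RAM

    # Pass 1: store Group 1 data in the bin selected by key number K.
    for key, data in group1:
        bins[key].append(data)

    # Pass 2: feed Group 2 through and pair it with stored Group 1 data.
    pairs = []
    for key, data in group2:
        for stored in bins.get(key, []):
            pairs.append((key, stored, data))
    return pairs


# Hypothetical (key, data) items, for illustration only.
group1 = [(3, "a"), (7, "b"), (3, "c")]
group2 = [(7, "x"), (3, "y"), (5, "z")]
pairs = hash_sorter_match(group1, group2)
```

Each item is visited once per pass, so the work grows with the number of items rather than with the number of item pairs, which is the point of replacing the nested for() loops.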
5
Hash Sorter
[Figure: hash sorter diagram, indexed by key number K.]
6
Hash Sorter Implementation
  • Single-clock-cycle fast reset
  • Pipelined structure: single-clock-cycle push or pop
7
An Example of Track Recognition: Event
  • We explain the track recognition process using
    this 20-track example.

8
Tangent Angle Measurements
  • There are various techniques to measure the
    tangent angle of the track segment (or doublet,
    or cluster).
  • Sometimes extra ghost segments may exist.
  • The ghost segments may be resolved later in the track recognition process.

[Figure label: tangent angle α]
9
A Large Curvature Track
  • A soft track hits the large φ region.
  • A global algorithm is better suited.
  • The high-pT approximation is not valid globally.
  • An exact track equation is needed.

[Figure: circular track geometry. Labels: R (parameter: radius of curvature), r, φ, α0 (parameter: initial angle). Caption: measure the tangent angle.]
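The slide does not reproduce the track equation itself. As a hedged illustration, the sketch below uses plain circle geometry under stated assumptions: the track starts at the origin, bends counter-clockwise, has radius of curvature R and initial angle alpha0, and a hit is described by polar coordinates (r, phi). Under those assumptions r = 2 R sin(phi - alpha0) and the tangent angle at the hit is 2 phi - alpha0. All function names are hypothetical.

```python
import math


def radius_at(R, alpha0, phi):
    """Exact radial distance of a hit at azimuth phi (circle through origin)."""
    return 2.0 * R * math.sin(phi - alpha0)


def tangent_angle_at(alpha0, phi):
    """Direction of the track at a hit seen at azimuth phi."""
    return 2.0 * phi - alpha0


def point_on_track(R, alpha0, theta):
    """Point reached after the track direction has turned by theta."""
    # The circle center lies a distance R to the left of the initial direction.
    cx, cy = -R * math.sin(alpha0), R * math.cos(alpha0)
    return (cx + R * math.sin(alpha0 + theta),
            cy - R * math.cos(alpha0 + theta))


# Cross-check the closed forms against a point generated on the circle.
R, alpha0, theta = 2.0, 0.3, 1.0
x, y = point_on_track(R, alpha0, theta)
r, phi = math.hypot(x, y), math.atan2(y, x)
```

For small phi - alpha0 the sine is nearly linear, which is where a linearized (high-pT style) treatment would hold; at large phi the exact form is needed.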
10
An Example of Track Recognition: Clustering
  • The 9-bin scheme: for doublets on the seeding super layer in this bin, search for coincidences in these 9 bins.
  • The 4-bin scheme: for doublets on the seeding super layer in this bin, search for coincidences in these 4 bins.
  • The doublets in clusters are grouped together.
  • The ghost doublets are gone.
[Figure: clustering of doublets; axes α0 and c0.]
11
FPGA Block Diagram
[Figure: block diagram with hash sorters for α0 and hash sorters for c0.]
12
Without Full Track Recognition
  • Two track parameters can be calculated for each
    doublet.
  • Useful trigger primitives can be found without
    full track recognition.
  • For example

13
Triplet Finding, Tiny Triplet Finder
14
Hit Matching
Software                        | FPGA Typical                       | FPGA Resource Saving Approaches
O(n^2): for() for()             | O(n)*O(N): Comparator Array        | Hash Sorter: O(n)*O(N) in RAM
O(n^3): for() for() for()       | O(n)*O(N^2): AM, CAM, Hough Trans. | Tiny Triplet Finder: O(n)*O(N log N)
O(n^4): for() for() for() for() |                                    |
15
Hits, Hit Data, and Triplets
  • Hit data come out of the detector planes in random order.
  • Hit data from 3 planes generated by the same particle track are organized together to form triplets.

16
TTF Operations, Phase I: Filling Bit Arrays
  • xA + xC = 2 xB
  • xA - xC = constant

[Figure: physical planes and bit array/shifters (note the flipped bit order). For any hit, fill a corresponding logic cell.]
17
TTF Operations, Phase II: Making Match
[Figure: physical planes and bit array/shifters. For any center plane hit, logically shift the bit array and perform a bit-wise AND in this range; a triplet is found.]
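The two phases can be imitated in software with integers used as bit arrays. This is a minimal sketch under stated assumptions, not the firmware: hits are integer positions 0..n_bins-1 on three equally spaced planes A, B (center), and C, a straight track satisfies xA + xC = 2 xB, and the function name `tiny_triplet_finder` is hypothetical. Plane C is stored in flipped bit order so that a single shift per center-plane hit lines up the two arrays.

```python
def tiny_triplet_finder(hits_a, hits_b, hits_c, n_bins):
    """Return (xA, xB, xC) triplets with xA + xC == 2 * xB."""
    mask = (1 << n_bins) - 1

    # Phase I: fill the bit arrays; plane C uses flipped bit order.
    bits_a = 0
    for xa in hits_a:
        bits_a |= 1 << xa
    bits_c_flipped = 0
    for xc in hits_c:
        bits_c_flipped |= 1 << (n_bins - 1 - xc)

    # Phase II: for each center-plane hit, shift and bit-wise AND.
    triplets = []
    for xb in hits_b:
        shift = n_bins - 1 - 2 * xb
        if shift >= 0:
            aligned = bits_c_flipped >> shift
        else:
            aligned = (bits_c_flipped << -shift) & mask
        match = bits_a & aligned  # bit xA is set when xC = 2*xB - xA is hit
        for xa in range(n_bins):
            if (match >> xa) & 1:
                triplets.append((xa, xb, 2 * xb - xa))
    return triplets


# Hypothetical hit positions, for illustration only.
found = tiny_triplet_finder([2, 5], [3, 4], [4, 3, 7], n_bins=8)
```

Only one set of AND logic is used for every center-plane hit; the shift distance is the only thing that changes from hit to hit.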
18
Tiny Triplet Finder: Reuse Coincident Logic via Shifting Hit Patterns
[Figure: detector layers C1, C2, and C3.]
One set of coincident logic is implemented.
For an arbitrary hit on C3, rotate, i.e., shift
the hit patterns for C1 and C2 to search for
coincidence.
19
Tiny Triplet Finder for Circular Tracks
Also works with more than 3 layers
[Figure: block diagram with two bit arrays, two shifters, and bit-wise coincident logic.]
  1. Fill the C1 and C2 bit arrays. (n1 clock cycles)
  2. Loop over C3 hits, shift bit arrays and check for
    coincidence. (n3 clock cycles)

[Figure labels: R1/R3 and R2/R3; the triplet map output goes to the decoder.]
20
Tiny? Yes, Tiny! Logic Cell Usage
  • AM, CAM, Hough Transform, etc.: O(N^2)
  • Tiny Triplet Finder: O(N log N)
21
Complex Triplet Finding Problems
22
Options of Sequence Control
23
Micro-computing vs. Reconfigurable Computing
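[Figure: the same calculation carried out in a CPU and in an FPGA. Example: an expression of the form (100 _ 3 - 4) _ 5 _ 7 = ? on the data 100, 3, 4, 5, 7, where _ marks an operator that is not legible in this transcript. Control steps: LD, (-), (), (), (). CPU inputs: Data, Program. FPGA inputs: Data, Program, Configuration.]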
  • In a microprocessor, users specify a program that runs on fixed logic circuits.
  • In an FPGA, users specify the logic circuits (as well as the program).
  • FPGA computing need not follow microprocessor architectures. (But useful experience can be borrowed.)
  • The usefulness of FPGA reconfigurable computing is still to be fully appreciated.

24
ELMS: Enclosed Loop Micro-Sequencer
Allows jump back as in microprocessors
Special in ELMS: supports FOR loops at machine code level
  • PC+ROM is a good sequencer in an FPGA.
  • Adding Conditional Branch Logic allows the program to loop back.
  • Loop Return Logic Stack is a special feature in ELMS that supports FOR loops at machine code level.
PC | Control Signals | Operation
00 | 000000000000000 |
01 | 001000100011010 | LD R1, n
02 | 000010001000000 | LD R2, addr_a
03 | 000000000000100 | LD R3, addr_X
04 | 000000010001000 | LD R7, 0
05 | 000000000100001 | BckA1: LD R4, (R2)
06 | 000100000010000 | INC R2
07 | 000001000100000 | LD R5, (R3)
08 | 000100010000001 | INC R3
09 | 001001000100000 | MUL R6, R4, R5
0a | 000000010001000 | EndA1: ADD R7, R7, R6
0b | 000010000010000 | DEC R1
0c | 000000100000100 | BRNZ BckA1
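Read as ordinary assembly, the listing looks like a multiply-accumulate loop: R1 counts down from n, R2 and R3 walk through two arrays, and R7 accumulates the products. The Python sketch below steps through that reading with a tiny interpreter. It is an interpretation for illustration only: the opcode names LDI and LDM (splitting LD into immediate and memory forms), the flag behavior of DEC, and the memory layout are assumptions, and the control-signal bits are not modeled.

```python
def run(program, labels, memory):
    """Step a program counter through (opcode, operands...) tuples."""
    reg = {f"R{i}": 0 for i in range(1, 8)}
    pc, nonzero = 0, False
    while pc < len(program):
        op, *args = program[pc]
        pc += 1
        if op == "LDI":            # LD Rd, constant
            reg[args[0]] = args[1]
        elif op == "LDM":          # LD Rd, (Rs)
            reg[args[0]] = memory[reg[args[1]]]
        elif op == "INC":
            reg[args[0]] += 1
        elif op == "DEC":          # assumed to set the flag used by BRNZ
            reg[args[0]] -= 1
            nonzero = reg[args[0]] != 0
        elif op == "MUL":
            reg[args[0]] = reg[args[1]] * reg[args[2]]
        elif op == "ADD":
            reg[args[0]] = reg[args[1]] + reg[args[2]]
        elif op == "BRNZ":         # branch back while the counter is not zero
            if nonzero:
                pc = labels[args[0]]
        else:
            raise ValueError(f"unknown opcode {op}")
    return reg


# Hypothetical data: array a at address 0, array X at address 3.
a, x = [1, 2, 3], [4, 5, 6]
memory = a + x
program = [
    ("LDI", "R1", len(a)), ("LDI", "R2", 0), ("LDI", "R3", len(a)),
    ("LDI", "R7", 0),
    ("LDM", "R4", "R2"),            # BckA1
    ("INC", "R2"),
    ("LDM", "R5", "R3"),
    ("INC", "R3"),
    ("MUL", "R6", "R4", "R5"),
    ("ADD", "R7", "R7", "R6"),      # EndA1
    ("DEC", "R1"),
    ("BRNZ", "BckA1"),
]
result = run(program, {"BckA1": 4}, memory)["R7"]
```

With the sample data the accumulator ends at 1*4 + 2*5 + 3*6 = 32.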
25
Software: Using a Spreadsheet as Compiler
26
What's Good about ELMS: No ALU => Small Resource Usage
[Figure: three architectures compared.
  Princeton Architecture: Program Control, ALU, Program and DATA Memory.
  Harvard Architecture: Program Control, ALU, Program Memory, DATA Memory.
  Fermilab Architecture(?): Sequencer (ELMS), Data Processor, Program Memory, DATA Memory.]
  • The Princeton Architecture is more suitable at the system level, while the Harvard Architecture is better suited at the micro-structure level.
  • Regular microprocessors cannot run a looped program without an ALU.
  • The ALU takes a large amount of resources yet may not be efficiently utilized for data processing tasks in an FPGA.
  • The ELMS can run nested-loop programs without an ALU.
  • Further separation of program and data is therefore possible.
  • The ELMS is kept small.

27
Recursive Structure
28
The Digitizer Card for the Fermilab Beam Loss
Monitor System
  • Beam loss input signals from ion chambers are
    integrated and digitized.
  • Sliding sums are accumulated and compared with
    pre-loaded thresholds.
  • Over-threshold conditions in several places cause a beam abort based on pre-defined settings.
  • Beam loss signals are filtered and de-rippled for display purposes.
  • The sequence is controlled by the Seq128 block.

29
Filter Functions
[Figure: filter responses plotted against frequency. Sampling: 21 µs/sample, 124 samples. Curves: Sliding Sum; Cascaded Integrator Comb (CIC) Sum of 2nd Order. First zero at 360 Hz.]
  • The CIC sum is a sliding sum of sliding sums.
  • The frequency response of the CIC sum is a sinc^2(x) function that has 2nd-order zeros and better stop-band suppression.

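The statement that the CIC sum is a sliding sum of sliding sums can be tried out directly. The sketch below is a minimal Python illustration with hypothetical function names and a short window of 4 samples instead of 124. Its impulse response is a triangle, that is, a boxcar convolved with itself, which is why the frequency response is the square of the sliding-sum response.

```python
def sliding_sum(samples, length):
    """Sum of the most recent `length` samples (zeros before the start)."""
    out, total = [], 0
    for n, x in enumerate(samples):
        total += x
        if n >= length:
            total -= samples[n - length]  # drop the sample leaving the window
        out.append(total)
    return out


def cic_sum_2nd_order(samples, length):
    """Second-order CIC sum: a sliding sum applied to a sliding sum."""
    return sliding_sum(sliding_sum(samples, length), length)


# Impulse response with a short window, for illustration only.
impulse = [1] + [0] * 8
boxcar = sliding_sum(impulse, 4)
triangle = cic_sum_2nd_order(impulse, 4)
```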
30
Filter Implementation
Recursive != IIR

                             | Finite Impulse Response (FIR) | Infinite Impulse Response (IIR)
Non-Recursive Implementation | Yes                           | No
Recursive Implementation     | Possible                      | Yes

Callouts in the figure: Resource Friendly; Sliding Sum.
  • The non-recursive implementation needs 124 memory fetches and 124 additions, and more operations for longer sum lengths.
  • The recursive implementation needs 1 memory fetch and 2 add/sub operations, regardless of sum length.
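The difference in operation counts can be seen in a short software sketch. This is an illustration under assumptions, not the digitizer firmware: the function names are hypothetical, samples before the start of the record are taken as zero, and a window of 3 is used instead of 124.

```python
def sliding_sum_nonrecursive(samples, length):
    """Re-add the whole window for every output: `length` fetches and adds."""
    out = []
    for n in range(len(samples)):
        window = samples[max(0, n - length + 1): n + 1]
        out.append(sum(window))
    return out


def sliding_sum_recursive(samples, length):
    """Update the previous sum: one fetch, one add, one subtract per output."""
    out, total = [], 0
    for n, newest in enumerate(samples):
        oldest = samples[n - length] if n >= length else 0  # single memory fetch
        total = total + newest - oldest                      # two add/sub operations
        out.append(total)
    return out


# Hypothetical sample values, for illustration only.
data = [3, 1, 4, 1, 5, 9, 2, 6]
direct = sliding_sum_nonrecursive(data, 3)
updated = sliding_sum_recursive(data, 3)
```

Both functions produce the same finite impulse response; only the recursive one has a cost per output that does not depend on the window length.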

31
BLM DC Process Sequencing
Fully Sequencing
Partially Flat
  • The processes of calculating sliding sums and CIC
    sums are fully sequenced.
  • The de-ripple processor is flat along the process path, but it operates sequentially over the 4 channels.

32
The End. Thanks
33
Resource Saving Tricks
Loop-Reduction Tricks: The number of computations in a given task is reduced by (1) using fewer iterations in loops and/or (2) using fewer operations in each iteration.
Non-Loop-Reduction Tricks: The number of computations in a given task is unchanged. FPGA resources are saved by (1) reusing the resources multiple times via sequencing and/or (2) using transistor-saving resources such as RAM.
34
Resource Saving Tricks: Loop-Reduction
  • Recursive Implementation of FIR Filter
  • Tiny Triplet Finder: O(n)*O(N log(N))
  • Multiplier-less (ML) Approaches
  • FFT: O(n)*O(log(N))
35
Resource Saving Tricks: Non-Loop-Reduction
  • Sequencing
  • Using RAM: Hash Sorter/Histogram
[Figure: sequencing diagrams showing initialization steps (Initialization, Initialization 1 to 3) followed by repeated runs of operations OP1 to OP4.]