SKAMP/LFD Correlator

1
SKAMP/LFD Correlator
  • FPGAs in Radioastronomy
  • 5-8 February 2007, Hobart, Tasmania,
  • John Bunton
  • CSIRO ICT Centre, Sydney

2
Molonglo
  • 1960s
  • Commissioned as 408 MHz One-Mile Mills Cross
  • (1600 m long arms)
  • 1980s
  • Converted to the Molonglo Observatory Synthesis
    Telescope
  • (MOST), an 843 MHz synthesis telescope, using
    just the E-W arm
  • 2000s
  • SKAMP

3
SKAMP
  • A new low-frequency spectral line instrument.
  • Features: wide field of view imaging,
    polarisation, spectral line capability, RFI
    mitigation
  • Strategy: parallel 3-stage re-development of MOST
  • Science and technology prototyping for the Square
    Kilometre Array (SKA)
  • 1% collecting area,
  • wide-field imaging
  • Line feeds
  • Signal transport
  • Correlators

4
SKAMP stage 1 (2004-2006)
  • Continuum correlator using existing IF
  • Observing frequency 843 MHz,
  • IF 4.4 MHz wide at 11 MHz,
  • 96 inputs, 4560 baselines
  • Processing in Xilinx Spartan II and 3e FPGAs
  • After digitising: FPGA processing for delay,
    complex to real conversion and fringe stopping
  • Correlator: systolic array 16x16 (256 baselines at
    once)
  • Array reuse to calculate all baselines (18
    passes)
  • Lesson: data duplication leads to an increase in
    routing resources
  • Must store all correlations on chip (18 per MAC)
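The baseline and pass counts on this slide can be checked with a few lines of arithmetic. This is a minimal sketch in Python; the variable names are illustrative and do not come from the presentation.

```python
import math

n_inputs = 96      # SKAMP stage 1 inputs (from the slide)
array_side = 16    # 16x16 systolic array (from the slide)

# Cross-correlation baselines between distinct inputs: N(N-1)/2
baselines = n_inputs * (n_inputs - 1) // 2

# Baselines computed in one pass of the array
per_pass = array_side * array_side

# Passes needed to cover every baseline, rounded up
passes = math.ceil(baselines / per_pass)

print(baselines, per_pass, passes)  # 4560 256 18
```

The 18 passes are also why each MAC must hold 18 partial correlations on chip.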

5
SKAMP stages 2 and 3 (2006 - )
  • Stage 2
  • New downconversion system, 100 MHz bandwidth
  • 30 MHz useable, limited by existing feed
  • Digitise at antenna, fibre optic transmission to
    correlator
  • Spectral line correlator: 368 inputs, resolution
    5-7 kHz
  • Stage 3
  • Broadband dual polarisation line feed, 0.6-1.2 GHz
  • New LNAs, analogue beamforming
  • 100 MHz bandwidth
  • Same digitisation, signal transport, and
    correlator

6
Common Correlator Design
  • Correlator technologies developed for SKAMP2/3
    have now been adopted by LFD (MIT and Aust Uni
    Consortium); only modification needed: 4 extra
    GigE
  • Prototype for the xNTD correlator
  • Same correlator board useable for 30 to 512
    antennas
  • Team: SKAMP / LFD correlator
  • Uni Sydney / ATNF: Ludi de Souza
  • Uni Sydney: Duncan Campbell-Wilson, Adrian Blake
  • Domain 42: John Russel, Chris Weimann
  • MIT: Roger Cappallo, Bart Kincaid
  • ATNF / ICT Centre: John Bunton, Jaysri Joseph

7
SKAMP 3 specification
  • 368 inputs, 67,712 signal pairs
  • 100 MHz each, 36.8 GHz total bandwidth
  • Close to Australia Telescope upgrade
  • 6.7 Tera complex multiply-accumulates per sec
  • Similar to EVLA
  • 5 kHz resolution, 20,000 channels
  • Average to get lower resolution; extra
    programming for finer
  • 1,354 Mega correlations per integration period
  • Enormous data set; on-FPGA storage insufficient
  • Image Processing !!!!!
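The headline numbers in this specification follow from the input count, bandwidth and resolution. The sketch below reproduces them in Python. Two points are assumptions rather than statements from the slides: the pair count is taken as N^2/2, which is what reproduces the quoted 67,712, and the multiply-accumulate rate assumes one complex sample per hertz of bandwidth.

```python
n_inputs = 368
bandwidth_hz = 100_000_000     # 100 MHz per input
resolution_hz = 5_000          # 5 kHz channels

# Assumption: N^2 / 2 matches the slide's 67,712 signal pairs
signal_pairs = n_inputs * n_inputs // 2

# Total bandwidth entering the correlator
total_bandwidth_hz = n_inputs * bandwidth_hz

# Assumption: one complex multiply-accumulate per pair per complex sample
cmac_per_second = signal_pairs * bandwidth_hz

channels = bandwidth_hz // resolution_hz
correlations_per_integration = signal_pairs * channels

print(signal_pairs)                          # 67712
print(total_bandwidth_hz / 1e9)              # 36.8 (GHz)
print(cmac_per_second / 1e12)                # about 6.77 (Tera CMAC/s)
print(channels)                              # 20000
print(correlations_per_integration / 1e6)    # about 1354 (Mega correlations)
```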

8
FX correlators
  • General principle divide and conquer
  • Filter data into multiple subbands and distribute a
    subset of subbands for all inputs to each
    correlator unit
  • Main problem is getting the right data to the
    right FPGA at the right time

9
384 way Cross Connect
  • Switch? 736 GigE ports
  • Leads to custom design
  • Too hard to do in one stage, so do in stages
  • 1st stage (8-way)
  • First FPGA Coarse Filter bank can handle 8 inputs
  • 2nd stage (12-way)
  • Backplane interconnection between 12 filterbank
    boards
  • 3rd stage (2 way)
  • Data aggregation after fine filterbanks
  • 4th stage (2 way)
  • Cables from two card cages
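The four stages multiply out to the 384-way cross connect in the slide title. A small Python check follows; the stage labels are paraphrased from the bullets, and reading the 736 ports as two per input is an assumption, not something the slide states.

```python
import math

# Fan-out of each cross-connect stage, as listed on the slide
stages = {
    "coarse filterbank FPGA": 8,
    "backplane": 12,
    "data aggregation": 2,
    "card cage cables": 2,
}

total_ways = math.prod(stages.values())

n_inputs = 368
# Assumption: the 736 switch ports correspond to two ports per input
switch_ports = 2 * n_inputs

print(total_ways, switch_ports)   # 384 736
print(total_ways >= n_inputs)     # True: 384 ways cover the 368 inputs
```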

10
System Overview
Diagram Ludi de Souza
11
Advanced TCA
  • 12 filterbank boards, 12-way cross connect on
    backplane

12
Backplane Crossconnect
  • Using Rocket I/O over backplane
  • Two pairs to and from each pair of boards
  • Implementation: three FX20
  • Output mapping and null connection different for
    each board
  • Send nine bands/sets to outer FPGA and re-route
    selected fourth band to centre FPGA
  • Input to board invariant

Diagram Ludi de Souza
13
Filterbank/Crossconnect board
Diagram Ludi de Souza
14
First Stage Filterbank
  • Have broken filtering into two stages
  • Can't do eight 32k filterbanks in one SX35
  • First stage needs to divide data into 12
    sub-bands
  • But 12 requires a mixed radix FFT
  • Instead implement a 256-point real FFT
  • Gives 128 channels
  • 8 boards get 11 channels and 4 get 10
  • Standard processing: advance data by length of FFT
    at each operation of the FFT
  • Critical sampling: corruption due to aliasing
  • Solution: oversampled filterbank
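The split of 128 channels over 12 boards is a plain integer division with remainder. A short Python sketch follows; which boards receive the extra channel is an assumption made here for illustration only.

```python
n_channels = 128   # from the 256-point real FFT
n_boards = 12      # filterbank boards on the backplane

base, extra = divmod(n_channels, n_boards)   # 10 channels each, 8 left over

# Assumption: the first `extra` boards take one additional channel
per_board = [base + 1 if b < extra else base for b in range(n_boards)]

print(per_board.count(11), per_board.count(10))  # 8 4
print(sum(per_board))                            # 128
```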

Diagram Ludi de Souza
15
Oversampling
  • Increase output data rate for each channel
    without altering channel characteristics
  • 15% increase in data rate
  • All data across 100 MHz can now be recovered
    without aliasing
  • Method: for successive filter bank operations,
    advance input data by less than FFT length
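The method in the last bullet can be illustrated by listing where each FFT input block starts. The Python sketch below compares an advance equal to the FFT length (critical sampling) with a shorter advance (oversampling). The advance of 224 samples is a hypothetical value chosen only for illustration, not the value used in the design, and the FIR prefilter of a polyphase filterbank is left out.

```python
from fractions import Fraction

def block_starts(n_samples, fft_len, advance):
    """Start index of each FFT input block when the window moves by `advance`."""
    return list(range(0, n_samples - fft_len + 1, advance))

fft_len = 256
n_samples = 2048

# Critical sampling: advance by the full FFT length
critical = block_starts(n_samples, fft_len, advance=fft_len)

# Oversampling: advance by less than the FFT length (hypothetical 224)
oversampled = block_starts(n_samples, fft_len, advance=224)

# Output rate per channel grows by fft_len / advance
ratio = Fraction(fft_len, 224)

print(len(critical), len(oversampled))  # 8 9
print(ratio)                            # 8/7
```

Each channel keeps the same frequency response, but more output samples are produced per channel for the same input, which is the data rate increase the slide refers to.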

16
Second stage filterbank
  • Oversampling leads to data duplication at band
    edges
  • Pass each band through second filterbank and
    discard duplicated data
  • Hope to use SX25 in final implementation
  • Example shows 12-channel first filterbank and
    2048-channel second

Diagram Ludi de Souza
Diagram Ludi de Souza
17
Second stage problem
  • In simplest implementation each second stage FPGA
    is processing 48 2048-band filter banks
  • 12-stage FIR x (10+10) bits x 2048 = 491,520 bits,
    by 48 inputs
  • Not enough on-chip memory
  • Use external RAM; need capacity of DRAM
  • DRAM bandwidth limits performance
  • 2 SX35 have sufficient processing but use 4 to
    get DRAM BW
  • Must reorder data so that long sequences for each
    input are processed
  • Process each antenna for 2048 x 512 samples
  • For a PFB with 12-point FIR, efficiency is
    2048/2060, about 99%
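The memory figure on this slide can be reproduced as follows. The sketch reads the sample width as 10 bits real plus 10 bits imaginary, which is an assumption that happens to reproduce the quoted 491,520 bits; the efficiency ratio is taken directly from the slide.

```python
fir_taps = 12
bits_per_sample = 10 + 10      # assumption: 10-bit real + 10-bit imaginary
filterbank_len = 2048
inputs_per_fpga = 48

# FIR state needed for one 2048-band polyphase filterbank
bits_per_input = fir_taps * bits_per_sample * filterbank_len

# State for all filterbanks handled by one second-stage FPGA
bits_per_fpga = bits_per_input * inputs_per_fpga

# Efficiency quoted on the slide (note 2060 = 2048 + 12)
efficiency = 2048 / 2060

print(bits_per_input)              # 491520
print(bits_per_fpga)               # 23592960, about 23.6 Mbit
print(round(efficiency * 100, 1))  # 99.4
```

Roughly 23.6 Mbit of filter state per FPGA is what pushes the design to external DRAM.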

18
Fine Filter Bank Input Re-ordering
Diagram Ludi de Souza
Diagram Ludi de Souza
19
Input Ordering for Correlator
  • Correlator needs data ordered so that it receives
    data for a single frequency channel for all
    inputs at one time
  • Filter bank is generating data for one input at a
    time
  • Second re-ordering operation needed
  • Originally in same memory as input re-ordering
  • Design modification
  • Replace two FX20 by single FX60 and add DRAM
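The second re-ordering is a transpose (corner turn) from per-input order to per-channel order. A toy Python sketch with tiny sizes follows; the function name and the tuple labels are illustrative only, and the real operation runs through DRAM rather than Python lists.

```python
def corner_turn(per_input):
    """Transpose data indexed [input][channel] into [channel][input]."""
    return [list(column) for column in zip(*per_input)]

n_inputs, n_channels = 4, 3   # toy sizes; the real system has 368 inputs

# Filterbank output: one input at a time, all channels for that input
filterbank_out = [[(i, c) for c in range(n_channels)] for i in range(n_inputs)]

# Correlator input: one channel at a time, all inputs for that channel
correlator_in = corner_turn(filterbank_out)

print(correlator_in[2])  # [(0, 2), (1, 2), (2, 2), (3, 2)]
```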

20
Output re-ordering
Diagram Ludi de Souza
21
Correlator Task
  • 368 inputs, 67,712 signal pairs
  • (Even worse for LFD: 1024 inputs, 523,776
    signal pairs)
  • 20,000 frequency channels
  • 1,354 Mega correlations
  • DRAM for long term accumulations
  • But each correlation occurs at a 5 kHz rate
  • CMAC units operate at 300 MHz
  • Each handles 60,000 correlations
  • With hundreds of CMACs per FPGA, still need
    hundreds of FPGAs
  • Developed Correlation Cell concept to ease data
    flow
  • 35,000 correlations at one time in a single SX35
  • Two FPGAs to process a single channel
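The time-sharing argument can be worked through numerically. In the Python sketch below, the figure of 100 CMACs per FPGA is a hypothetical stand-in for the slide's "hundreds", used only to show the scale.

```python
cmac_clock_hz = 300_000_000     # CMAC units operate at 300 MHz
correlation_rate_hz = 5_000     # each correlation updates at 5 kHz

# One CMAC can be time-shared across this many correlations
correlations_per_cmac = cmac_clock_hz // correlation_rate_hz

total_correlations = 67_712 * 20_000   # signal pairs x channels

# Ceiling division: CMAC units needed to cover every correlation
cmacs_needed = -(-total_correlations // correlations_per_cmac)

cmacs_per_fpga = 100            # hypothetical value for illustration
fpgas_needed = -(-cmacs_needed // cmacs_per_fpga)

print(correlations_per_cmac)    # 60000
print(cmacs_needed)             # 22571
print(fpgas_needed)             # 226
```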

22
67,712 correlations in two SX35
  • Systolic array too inflexible
  • 144 CMACs before new data required
  • New data needed 470 times to process one time
    sample
  • 24 values for each use of the array; total input
    11,280 samples
  • Problem is even worse for LFD
  • Approach developed: Correlation Cell
  • Combination of multiply-accumulate and storage
  • Each cell handles 256 correlations at a time
  • 37,000 correlations per FPGA simultaneously
  • Total input data less than 1000 samples per time
    instance
  • 512 time-sample short integration on chip
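The systolic array input load quoted above can be checked directly. In this Python sketch the slide's rounding of 470.2 down to 470 is followed; reading 144 as a 12x12 array with 12 + 12 = 24 edge inputs is an interpretation, not something the slide spells out.

```python
signal_pairs = 67_712
array_cmacs = 144        # CMACs in the array (12 x 12 by interpretation)
values_per_load = 24     # input values per use of the array (12 + 12)

# Number of times the array must be reloaded for one time sample
loads = signal_pairs // array_cmacs

# Total input values moved per time sample
total_input_samples = loads * values_per_load

print(loads, total_input_samples)  # 470 11280
```

The Correlation Cell figure of fewer than 1000 input samples per time instance is more than a tenfold reduction on this.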

23
Correlation Cell
  • Input: 16 pairs of data
  • 4-bit complex multiply in 18-bit multiplier
  • Accumulation to block RAM
  • Calculate 256 correlations for 512 successive time
    samples
  • Data reordering in filterbank
  • xNTD: 4-7 cells for all correlations
  • 30-70 MHz BW per FPGA
  • All baselines: LFD 16, SKAMP3 2 FPGAs
  • 1.2-1.5 MHz of bandwidth
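A functional model may help show what one correlation cell computes: every product between two sets of 16 inputs, accumulated over 512 time samples. The Python sketch below is a behavioural illustration only. The function name, the conjugation convention and the test data are assumptions, and the hardware does this with a single time-shared multiplier and block RAM rather than nested loops.

```python
def correlation_cell(set_a, set_b):
    """Accumulate a[i] * conj(b[j]) over time for every (i, j) pair.

    set_a, set_b: lists of time samples, each a list of complex values.
    Returns a 2D list of accumulated correlations.
    """
    n_a, n_b = len(set_a[0]), len(set_b[0])
    acc = [[0j] * n_b for _ in range(n_a)]   # stands in for the block RAM
    for a, b in zip(set_a, set_b):           # one time sample at a time
        for i in range(n_a):
            for j in range(n_b):
                acc[i][j] += a[i] * b[j].conjugate()
    return acc

N_INPUTS = 16    # inputs per set
N_TIMES = 512    # short integration length

# Small integers as a stand-in for 4-bit complex samples (made-up data)
set_a = [[complex((t + i) % 8 - 4, (t * i) % 8 - 4) for i in range(N_INPUTS)]
         for t in range(N_TIMES)]
set_b = [[complex((t + 3 * j) % 8 - 4, (t + j) % 8 - 4) for j in range(N_INPUTS)]
         for t in range(N_TIMES)]

acc = correlation_cell(set_a, set_b)
n_correlations = sum(len(row) for row in acc)
print(n_correlations)  # 256
```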

24
Board Manufacture Simplification (1)
  • Manufacture of correlator board is a major task
  • Examples: SKAMP1, EVLA
  • Correlation cell reduces input data rate into
    correlation chip
  • For an individual correlation cell, 2 sets of 16
    inputs require 256 clock cycles to process
  • Data rate reduction up to a factor of 16
  • This value is approached for SKAMP3 and LFD
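The factor of 16 follows from comparing the values a cell loads with the values a bare multiply-accumulate unit would consume over the same clock cycles. A minimal Python sketch, assuming one complex multiply-accumulate per clock and a baseline unit fed two fresh values every clock (both assumptions made for the comparison):

```python
inputs_per_set = 16
n_sets = 2

# Every input of one set is correlated with every input of the other
correlations = inputs_per_set * inputs_per_set    # 256

# Assumption: one complex multiply-accumulate per clock cycle
clock_cycles = correlations                       # 256

# Values actually loaded into the cell for those cycles
values_loaded = n_sets * inputs_per_set           # 32

# A bare CMAC fed two fresh values per clock would need this many
values_without_reuse = 2 * clock_cycles           # 512

reduction = values_without_reuse // values_loaded
print(clock_cycles, values_loaded, reduction)     # 256 32 16
```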

25
Board Manufacture Simplification (2)
  • Correlation cell also reduces data duplication
  • SKAMP1: 4x4 systolic array, EVLA: 8x8 systolic
    array
  • Data duplication: 8 in EVLA, higher in SKAMP1 due
    to array reuse
  • Each Correlation Cell processes 256 correlations
    at once
  • Can reduce size of systolic array by
    sqrt(256) = 16
  • No data duplication on board for up to 150
    antennas
  • Data duplication: 1.5 for SKAMP3 and 4 for LFD
  • LFD: 16 FPGAs, data input 1/sqrt(16) of total per
    FPGA
  • Correlation cell leads to a large simplification
    of correlator board

26
Putting it Together: The SKAMP prototype
Correlations
Correlator interface
Control and data output for SKAMP
Long Term Accumulations
27
Input, Daisy Chain, Route, Autocorrelate
Infiniband 1X
  • Two FX20
  • Interconnection for high antenna number designs
  • Input: 16 Rocket I/O on unidirectional Infiniband
  • Output: 16 Rocket I/O on unidirectional Infiniband
  • Can daisy chain modules for reuse of data in
    further processing modules: beamforming, pulsar
    processor

Infiniband 4X
28
Compute Engine
  • Eight SX35 FPGAs
  • Input 16, output 18 LVDS per FPGA; low I/O leads
    to reduction in support chips
  • 1152 Correlation cells total
  • Up to 294k correlations on board at a time (256
    per cell)
  • Data re-ordering in filterbank to achieve
    processing of 512 time values for each frequency
    channel
  • Then dump to LTA
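The board-level capacity is the cell count times the correlations per cell. A short Python check using only figures from this slide:

```python
n_fpgas = 8                    # eight SX35 FPGAs
cells_total = 1152             # correlation cells on the board
correlations_per_cell = 256

cells_per_fpga = cells_total // n_fpgas
correlations_on_board = cells_total * correlations_per_cell
correlations_per_fpga = cells_per_fpga * correlations_per_cell

print(cells_per_fpga)          # 144
print(correlations_on_board)   # 294912, the slide's "294k"
print(correlations_per_fpga)   # 36864, about 37,000 per FPGA
```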

29
Long Term Accumulator intermediate routing
  • Number of correlations requires DRAM for storage
  • Data rate requires a DIMM module per pair of
    SX35

30
Estimated Performance
  • Correlator board: clock rate 330 MHz, 192
    cells/FPGA, 6 FPGAs
  • Board processing rate 400 GCMAC/s, 2.8 Tops/s
  • Power consumption 100 W per board
  • Power efficiency 0.25 W per GCMAC/s (4-bit FX)
  • EVLA an order of magnitude higher (not pure FX)
  • Filterbank board: 3.2 Gsample/s, two polyphase
    filterbanks, 32 operations per sample
  • Board processing rate 100 Gops/s (18 bit)
  • Power consumption 60 W
  • Power efficiency 0.6 W per Gop/s
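These estimates can be reproduced from the stated clock rate, cell count and power figures. The Python sketch below assumes each correlation cell performs one complex multiply-accumulate per clock, which is an assumption and not stated on the slide. The slide rounds the results to 400 GCMAC/s and 100 Gops/s.

```python
# Correlator board
clock_hz = 330_000_000
cells_per_fpga = 192
n_fpgas = 6
board_power_w = 100

# Assumption: one complex multiply-accumulate per cell per clock
cmac_per_s = clock_hz * cells_per_fpga * n_fpgas
w_per_gcmac = board_power_w / (cmac_per_s / 1e9)

# Filterbank board
samples_per_s = 3_200_000_000
ops_per_sample = 32
filterbank_power_w = 60

ops_per_s = samples_per_s * ops_per_sample
w_per_gop = filterbank_power_w / (ops_per_s / 1e9)

print(cmac_per_s / 1e9)       # 380.16 GCMAC/s (slide: 400)
print(round(w_per_gcmac, 2))  # 0.26 (slide: 0.25 using the rounded rate)
print(ops_per_s / 1e9)        # 102.4 Gops/s (slide: 100)
print(round(w_per_gop, 2))    # 0.59 (slide: 0.6)
```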

31
Conclusion
  • Common hardware, hardware modules and VHDL for
    SKAMP3 and LFD (prototyping for xNTD)
  • SKAMP3 in the lead with filterbank and correlator
    hardware well on the way
  • Initial Manufacture this year
  • Correlator common to all using correlation cell
    to gain required flexibility
  • Developing international project with distributed
    design team
  • Concept and design: Sydney
  • Hardware: Tasmania
  • Correlator Verilog: MIT
  • Filterbank VHDL: Sydney