TigerSHARC CLU: Exploration of XCORRS for Take-Home Quiz 4 BIAWPQHI -- 13 April start of class

1
TigerSHARC CLU: Exploration of XCORRS for
Take-Home Quiz 4
BIAWPQHI -- 13 April start of class
  • M. Smith,
  • University of Calgary, Canada
  • smithmr_at_ucalgary.ca

2
Ideal -- Take Home Quiz
  • Develop tests for complex correlation
  • Time and functionality
  • Evaluate on
  • C in default and optimized mode (especially
    optimized)
  • Your optimized complex assembly code in complex
    correlation in SISD and SIMD modes
  • XCORRS in complex correlation in SISD and SIMD
    modes

3
Reasonable -- Take Home Quiz: Code and report
  • Develop Functionality and Time tests for real FIR
    -- based on Lab. 3
  • Use on optimized C and your SISD and SIMD FIR
  • Develop Functionality and Time tests for real
    correlation -- based on Lab. 3 / 4
  • Use on optimized C and your SISD and SIMD
    correlation
  • Work out (theory) the speed changes expected on your
    SISD and SIMD code if you went to complex. Use as a
    template for expected changes in optimized C
  • Develop Functionality and Time tests for complex
    FIR
  • Use on optimized C
  • Develop Functionality and Time tests for complex
    correlation
  • Use on optimized C and your SISD and SIMD
    XCORRS only
  • Report on whether changes in C code speed work
    the way you expect
  • Use these figures to scale for FIR and
    correlation to complex data
  • Report on relative speeds
  • C in default and optimized mode (especially
    optimized)
  • Your optimized complex assembly code in complex
    correlation in SISD and SIMD modes
  • XCORRS in complex correlation in SISD and SIMD
    modes

4
Mark assignment
  • My tests and C are available on the web
  • If you use my tests, then you must say so, and
    10% of marks are deducted
  • If you use my C code, then you must say so, and
    10% of marks are deducted
  • If you use my C code and my tests, then you must
    say so, and 20% of marks are deducted

5
Speed comparison Part 1
  • Real FIR
  • float / int values[ ], params[ ]
  • Loop
  • sum = sum + values[i] * params[i]
  • 2 memory fetches
  • 1 add and 1 mult per loop cycle
  • done in ½ cycle in theory
  • Time = N / 2 + overhead
  • Determine overhead by measuring with and without
    the loop-sum
  • Complex FIR
  • CMPX float / int values[ ], params[ ]
  • Loop -- many common factors with FFT. Hint for
    final?
  • sum = sum + values[i] * params[i]
  • Real sum += v.re * p.re - v.im * p.im
  • Imag sum += v.re * p.im + v.im * p.re
  • 8 memory fetches
  • 3 add / sub and 4 mult per loop
  • Time = ??? + overhead
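
The "with and without the loop-sum" measurement can be sketched on a host machine as below. This is only an illustration under assumptions: RealFIRSketch, TimeCalls and LoopOnlyTime are hypothetical names, and std::chrono stands in for whatever cycle counter is used on the TigerSHARC.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical stand-in for the routine under test. The 'overhead' flag
// makes it return before the loop-sum, so only call overhead is timed.
float RealFIRSketch(const float *values, const float *params, int size, bool overhead) {
    if (overhead) return 0.0f;
    float sum = 0.0f;
    for (int i = 0; i < size; i++) sum += values[i] * params[i];
    return sum;
}

// Time 'repeats' calls; returns elapsed nanoseconds on the host clock
// (an assumption: on the DSP this would be a cycle count instead).
long long TimeCalls(const float *values, const float *params, int size,
                    bool overhead, int repeats) {
    assert(repeats > 0);
    volatile float sink = 0.0f;  // keeps the calls from being optimized away
    auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; r++)
        sink = RealFIRSketch(values, params, size, overhead);
    auto stop = std::chrono::steady_clock::now();
    (void)sink;
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}

// Loop-only time = time with the loop-sum minus time without it.
long long LoopOnlyTime(long long withLoop, long long overheadOnly) {
    return withLoop - overheadOnly;
}
```

Calling TimeCalls twice, once with overhead false and once with overhead true, and passing both numbers to LoopOnlyTime gives the figure to compare against N / 2.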

6
Speed comparison Part 2
  • Speed in theory without doing anything special
  • Any special way to store complex values to speed
    up memory access?
  • Do we need to do 8 memory fetches
  • On the Blackfin?
  • In the TigerSHARC?
  • Expected optimal speed?
  • Time = ??? + overhead
  • Complex FIR
  • CMPX float / int values[ ], params[ ]
  • Loop -- many common factors with FFT. Hint for
    final?
  • sum = sum + values[i] * params[i]
  • Real sum += v.re * p.re - v.im * p.im
  • Imag sum += v.re * p.im + v.im * p.re
  • 8 memory fetches
  • 3 add / sub and 4 mult per loop
  • Time = ??? + overhead
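
On the question of whether all 8 memory fetches are needed: a minimal sketch, assuming separate real and imaginary arrays, in which each operand is read once per tap (4 fetches) and reused for both sums. ComplexSum and ComplexFIRSketch are hypothetical names, and an optimizing compiler may already do this reuse by itself.

```cpp
#include <cassert>

struct ComplexSum { float re; float im; };

// Complex multiply-accumulate with each operand fetched once per tap.
ComplexSum ComplexFIRSketch(const float *valR, const float *valI,
                            const float *parR, const float *parI, int size) {
    assert(size >= 0);
    ComplexSum sum = {0.0f, 0.0f};
    for (int i = 0; i < size; i++) {
        // Four fetches: v.re, v.im, p.re, p.im
        const float vr = valR[i], vi = valI[i];
        const float pr = parR[i], pi = parI[i];
        sum.re += vr * pr - vi * pi;   // Real sum
        sum.im += vr * pi + vi * pr;   // Imag sum
    }
    return sum;
}
```

For a single tap with v = 1 + j2 and p = 3 + j4 this gives -5 + j10, which is a quick hand check of the sign on the real part.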

7
Speed comparison Part 3?
  • Do these speed calculations scale the same way
    for complex correlation as for complex FIR?
  • Do a theory calculation and then compare the result
    for debug and optimized C code to validate.
    Within 25% of predicted changes is probably more
    than reasonable for a back-of-envelope
    calculation
  • Use scaling factor on your real FIR and
    correlation functions
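
The "within 25%" comparison could be written as a small helper like the one below. WithinFraction is a hypothetical name, and treating the tolerance as a fraction of the predicted value is an assumption about how the check is meant.

```cpp
#include <cassert>
#include <cmath>

// True when 'measured' lies within 'fraction' (e.g. 0.25) of 'predicted'.
bool WithinFraction(double predicted, double measured, double fraction) {
    assert(fraction >= 0.0);
    if (predicted == 0.0) return measured == 0.0;
    return std::fabs(measured - predicted) <= fraction * std::fabs(predicted);
}
```

For example, a predicted slowdown of 4.0 and a measured slowdown of 4.9 passes at 0.25, while a measured 5.5 does not.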

8
Tests for following functions needed -- when to convert
from float to int?
  • void ConvertReal2Complex(float *, CMPX32 *, int
    size)
  • Make Complex = Real + j0
  • bool ConvertC32_2_C8(CMPX32 *, CMPX8 *, int
    size)
  • Take bottom 8 bits of complex 32
  • Return false if overflows
  • Complex 8 is padded, 2 complex into 32 bits
    --- int in format
  • bool ConvertC32_2_C1(CMPX32 *, CMPX1 *, int
    size)
  • Take bottom 1 bit of complex 32
  • Return false if overflows, or if not ±1
    ±j1 format
  • Complex 1 is padded, 16 complex into 32 bits
    --- int in format
  • void ConvertC8_2_C32(CMPX8 *, CMPX32 *, int
    size) -- needed? YES
  • um
  • void ConvertC1_2_C32(CMPX1 *, CMPX32 *, int
    size) -- needed?

9
Tests for following functions needed
  • float RealFIR(float *vals, float *params, int
    size, bool overhead)
  • CMPLX ComplexFIR(CMPLX *vals, CMPLX *params, int
    size, bool overhead) -- vals in dm and params in pm
  • void RealCorrs(float *vals, int size1, float
    *params, int size2, float *result, int *size3,
    bool overhead)
  • void ComplexCorrs(CMPLX *vals, int size1, CMPLX
    *params, int size2, CMPLX *result, int *size3,
    bool overhead)
  • void XCORRS(CMPLX *vals, int size1, CMPLX *params,
    int size2, CMPLX *result, int *size3, bool
    overhead, int version)
  • version: 0 = works, 1 = SISD, 2 = SIMD

10
Some hints
  • void XCORRS(CMPLX *vals, int size1, CMPLX *params,
    int size2, CMPLX *result, int *size3, bool
    overhead, int version)
  • bool ConvertC32_2_C8(CMPX32 *, dm CMPX8 *,
    int size1)
  • bool ConvertC32_2_C1(CMPX32 *, pm CMPX1 *,
    int size2)
  • size3 = size1 - size2
  • for result = 1 to size3
  • result = 0
  • if (!overhead) XCORRS(dm CMPX8 *, pm CMPX1 *,
    dm? Result, size1, size2, size3,
    whichversion)

11
Some Hints
  • void ComplexCorrs(CMPLX *vals, int size1, CMPLX
    *params, int size2, CMPLX *result, int *size3,
    bool overhead)
  • if (overhead) return
  • size3 = size1 - size2
  • for loop to size3
  • result[loop] = ComplexFIR(vals, params, size2,
    overhead)
  • vals++
  • end loop

12
Some decisions
  • Complex 32 -- first decision
  • Store real in dm space and imaginary in pm space?
  • Complex8 in dm space, Complex1 in pm space
  • Doing everything with static pm variables
  • Using dm variables on stack, in an attempt to
    avoid running out of memory
  • Try with satellite of size 2048 and PRN data of
    size 1024, but suspect may not have enough room
    when doing with Complex 32, so may have to test on
    smaller sizes for comparison
  • I ended up generating the same data as for
    the xcorrs( ) shown last Friday: size 48 = 16 *
    3. Decided that if I could handle that (3 times
    round xcorrs loop) then it was a fair enough test

13
Some Tests developed 1
TEST(ConvertReal2CMPLX32, D_TEST) {
    TEST_LEVEL(1);
#define TEST_SIZE 8
    float values[TEST_SIZE] = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0};
    float zeros[TEST_SIZE] = {0, 0, 0, 0, 0, 0, 0, 0};
    ConvertReal2Complex(values, C32Real, C32Imag, TEST_SIZE);
    ARRAYS_EQUAL(values, C32Real, TEST_SIZE);
    ARRAYS_EQUAL(zeros, C32Imag, TEST_SIZE);
}
14
Test for padded data C8 format
#define TEST_SIZE 8
pm float imag1[TEST_SIZE] = {0x04, 0x14, -0x8, -0x18, 0x24, 0x34, 0x44, 0x54};
float real1[TEST_SIZE] = {0x08, 0x18, -1, -2, 0x28, 0x38, 0x48, 0x58};

TEST(ConvertToCMPLX8, D_TEST) {
    TEST_LEVEL(1);
#define TEST_SIZE 8
    unsigned int result[4] = {0x14180408, 0xE8FEF8FF, 0x34382428, 0x54584448};
    CHECK(!ConvertC32_2_C8(real1, imag1, DATAC8, 1));
    CHECK(ConvertC32_2_C8(real1, imag1, DATAC8, TEST_SIZE));
    ARRAYS_EQUAL(DATAC8, result, TEST_SIZE / 2);
}
15
Test for padded data C1 format
#define LONGER_SIZE 32
pm float imag2[LONGER_SIZE] = {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ..};
float real2[LONGER_SIZE] = {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ..};
pm float imag4[LONGER_SIZE];
float real4[LONGER_SIZE];

TEST(ConvertCMPLX1, D_TEST) {
    TEST_LEVEL(1);
    unsigned int result1[2] = {0x00000000, 0x00000000};
    unsigned int result2[2] = {0xFFFFFFFF, 0xFFFFFFFF};
    CHECK(!ConvertC32_2_C1(real1, imag1, PRNC1, 1));
    CHECK(!ConvertC32_2_C1(real1, imag1, PRNC1, TEST_SIZE));
    CHECK(!ConvertC32_2_C1(real2, imag2, PRNC1, 1));
    CHECK(ConvertC32_2_C1(real2, imag2, PRNC1, LONGER_SIZE));
    ARRAYS_EQUAL(PRNC1, result1, LONGER_SIZE / 16);
    for (int i = 0; i < LONGER_SIZE; i++) {
        real4[i] = -1 * real2[i];
        imag4[i] = -1 * imag2[i];
    }
    CHECK(ConvertC32_2_C1(real4, imag4, PRNC1, LONGER_SIZE));
    ARRAYS_EQUAL(PRNC1, result2, LONGER_SIZE / 16);
}
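
The slides do not show ConvertC32_2_C1 itself, so the following is only a host-side sketch of a packer that would satisfy the checks in this test. PackC1Sketch is a hypothetical name. The mapping of +1 to bit 0 and -1 to bit 1 follows from the expected words 0x00000000 and 0xFFFFFFFF; the ordering of real and imaginary bits inside the word is an assumption, since the test data cannot distinguish it.

```cpp
#include <cassert>
#include <cstdint>

// Pack 16 complex +/-1 +/-j1 values into each 32-bit word.
// Returns false for values other than +/-1 or for sizes that do not
// fill whole words.
bool PackC1Sketch(const float *inR, const float *inI, uint32_t *C1, int size) {
    assert(inR != nullptr && inI != nullptr && C1 != nullptr);
    if (size <= 0 || size % 16 != 0) return false;
    for (int i = 0; i < size; i++) {
        if (inR[i] != 1.0f && inR[i] != -1.0f) return false;
        if (inI[i] != 1.0f && inI[i] != -1.0f) return false;
    }
    for (int word = 0; word < size / 16; word++) {
        uint32_t packed = 0;
        for (int k = 0; k < 16; k++) {
            const int i = word * 16 + k;
            const uint32_t realBit = (inR[i] < 0.0f) ? 1u : 0u;  // -1 -> 1
            const uint32_t imagBit = (inI[i] < 0.0f) ? 1u : 0u;
            // Assumed layout: real bit at 2k, imaginary bit at 2k + 1
            packed |= (realBit << (2 * k)) | (imagBit << (2 * k + 1));
        }
        C1[word] = packed;
    }
    return true;
}
```

With 32 values of +1 + j1 this fills two words with zeros, and with 32 values of -1 - j1 it fills them with 0xFFFFFFFF, matching result1 and result2.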
16
RealFIR
#define TEST_SIZE 8
pm float params[TEST_SIZE] = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0};

TEST(RealFIR, D_TEST) {
    TEST_LEVEL(1);
    float impulse[TEST_SIZE];
    float results[TEST_SIZE];
    for (int i = 0; i < TEST_SIZE; i++) {
        for (int j = 0; j < TEST_SIZE; j++)   // Set to zero
            impulse[j] = 0;
        impulse[i] = 1;
        results[i] = RealFIR(impulse, params, TEST_SIZE, false);
    }
    ARRAYS_EQUAL(results, params, TEST_SIZE);
}
17
Complex FIR tests (3 of them) -- to see if I got
both Real and Imag correct
pm float resultsI[TEST_SIZE];

TEST(ComplexFIR, D_TEST) {
    TEST_LEVEL(1);
    float impulse[TEST_SIZE];
    float resultsR[TEST_SIZE];
    float zeros[TEST_SIZE] = {0, 0, 0, 0, 0, 0, 0, 0};
    for (int i = 0; i < TEST_SIZE; i++) {
        for (int j = 0; j < TEST_SIZE; j++)   // Set to zero
            impulse[j] = 0;
        impulse[i] = 1;
        for (int j = 0; j < TEST_SIZE; j++) {
            C32Real[j] = impulse[j];
            C32Imag[j] = 0;
            C32Real1[j] = params[j];
            C32Imag1[j] = 0;
        }
        ComplexFIR(C32Real, C32Imag, C32Real1, C32Imag1,
                   &resultsR[i], &resultsI[i], TEST_SIZE, false);
    }
    ARRAYS_EQUAL(resultsR, params, TEST_SIZE);
    ARRAYS_EQUAL(resultsI, zeros, TEST_SIZE);
}
18
Real Correlation
pm float PRN32I[TEST_SIZE] = {1, -1, 1, -1, 1, 0, 0, 0};

TEST(RealCorrelation, D_TEST) {
    TEST_LEVEL(1);
    float data[TEST_SIZE * 2] = {0, 0, 0, 0, 1, -1, 1, -1, 1,
                                 0, 0, 0, 0, 0, 0, 0};
    float result[TEST_SIZE];
    int Iresult[TEST_SIZE];
    int size3;
    RealCorrs(data, 2 * TEST_SIZE, PRN32I, TEST_SIZE, result, &size3, false);
    CHECK(size3 == TEST_SIZE);
    for (int j = 0; j < TEST_SIZE; j++)
        Iresult[j] = result[j];
    CHECK(MaximumLocation(Iresult, TEST_SIZE) == 4);
}
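
MaximumLocation is used in this test but not shown on the slides. A minimal host-side sketch, assuming it returns the index of the largest element, together with a plain reference correlation to show why the peak is expected at index 4. MaximumLocationSketch and RealCorrsSketch are hypothetical names, not the author's routines.

```cpp
#include <cassert>

// Index of the largest element (the first one on ties).
int MaximumLocationSketch(const int *values, int size) {
    assert(size > 0);
    int best = 0;
    for (int i = 1; i < size; i++)
        if (values[i] > values[best]) best = i;
    return best;
}

// Reference correlation: result[k] = sum over i of vals[k + i] * params[i].
// Returns the number of results written (size1 - size2).
int RealCorrsSketch(const float *vals, int size1, const float *params,
                    int size2, float *result) {
    const int size3 = size1 - size2;
    for (int k = 0; k < size3; k++) {
        float sum = 0.0f;
        for (int i = 0; i < size2; i++) sum += vals[k + i] * params[i];
        result[k] = sum;
    }
    return size3;
}
```

With the data and PRN arrays from this test, the five non-zero PRN values line up with the data at lag 4 and sum to 5, while neighbouring lags give -4 and 3.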
19
Complex Correlation -- Simple Test
pm float dataI[TEST_SIZE * 2] = {0, 0, 0, 0, 1.0, -1, 1, -1, 1,
                                 0, 0, 0, 0, 0, 0, 0};
pm float resI[TEST_SIZE];

TEST(ComplexCorrelation, D_TEST) {
    TEST_LEVEL(1);
    float dataR[TEST_SIZE * 2] = {0.0, 0, 0, 0, 0, 0, 0, 0,
                                  0, 0, 0, 0, 0, 0, 0, 0};
    float resR[TEST_SIZE];
    int Iresult[TEST_SIZE];
    float parR[TEST_SIZE] = {0, 0, 0, 0, 0, 0, 0, 0};
    int size3;
    ComplexCorrs(dataR, dataI, TEST_SIZE * 2,
                 parR, PRN32I, TEST_SIZE,
                 resR, resI, &size3, false);
    CHECK(size3 == TEST_SIZE);
    for (int j = 0; j < TEST_SIZE; j++)
        Iresult[j] = abs(resR[j]);
    CHECK(MaximumLocation(Iresult, TEST_SIZE) == 4);
}
20
Complex Correlation related to results from last
lecture
for (int i = 0; i < 96; i += 3) {
    satXCORRSR[i] = -1;  satXCORRSR[i + 1] = 1;  satXCORRSR[i + 2] = 1;
    satXCORRSI[i] = 0;   satXCORRSI[i + 1] = 0;  satXCORRSI[i + 2] = 0;
}
for (int i = 0; i < 48; i += 3) {
    prnXCORRSR[i] = -1;  prnXCORRSR[i + 1] = 1;  prnXCORRSR[i + 2] = 1;
    prnXCORRSI[i] = -1;  prnXCORRSI[i + 1] = 1;  prnXCORRSI[i + 2] = 1;
}
ComplexCorrs(satXCORRSR, satXCORRSI, 96, prnXCORRSR, prnXCORRSI,
             48, resXCORRSR, resXCORRSI, &size3, false);
CHECK(size3 == 48);
for (int j = 0; j < 48; j++)
    Iresult[j] = abs(resXCORRSR[j]);
for (int j = 1; j < 45; j += 3) {
    CHECK(resXCORRSR[j - 1] == 48);
    CHECK(resXCORRSR[j] == -16);
    CHECK(resXCORRSR[j + 1] == -16);
    CHECK(MaximumLocation(Iresult + j, 48 - j) == 2);
}
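
The expected values 48 and -16 in these checks can be confirmed by hand or with a few lines on the host. The sketch below is an illustration only; PatternCorrelationReal is a hypothetical name. It relies on the satellite data being purely real, so the real part of each result reduces to the sum of satR * prnR over the 48 PRN values.

```cpp
#include <cassert>

// Real part of the correlation at a given lag for the repeating
// pattern -1, 1, 1 used for both satXCORRSR and prnXCORRSR.
int PatternCorrelationReal(int lag) {
    assert(lag >= 0);
    const int pattern[3] = {-1, 1, 1};
    int sum = 0;
    for (int i = 0; i < 48; i++) {
        const int sat = pattern[(lag + i) % 3];  // satXCORRSR[lag + i]
        const int prn = pattern[i % 3];          // prnXCORRSR[i]
        sum += sat * prn;
    }
    return sum;
}
```

Aligned lags (0, 3, 6, ...) give 3 per group of three, so 48 in total; the two misaligned lags each give -1 per group, so -16.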
21
Complex Correlation ASM related to results from
last lecture
for (int i = 0; i < 96; i += 3) {
    satXCORRSR[i] = -1;  satXCORRSR[i + 1] = 1;  satXCORRSR[i + 2] = 1;
    satXCORRSI[i] = 0;   satXCORRSI[i + 1] = 0;  satXCORRSI[i + 2] = 0;
}
for (int i = 0; i < 48; i += 3) {
    prnXCORRSR[i] = -1;  prnXCORRSR[i + 1] = 1;  prnXCORRSR[i + 2] = 1;
    prnXCORRSI[i] = -1;  prnXCORRSI[i + 1] = 1;  prnXCORRSI[i + 2] = 1;
}
ComplexCorrsASM(satXCORRSR, satXCORRSI, 96,
                prnXCORRSR, prnXCORRSI,
                48, resXCORRSR, resXCORRSI,
                &size3, false);
CHECK(size3 == 48);
for (int j = 0; j < 48; j++)
    Iresult[j] = abs(resXCORRSR[j]);
for (int j = 1; j < 45; j += 3) {
    CHECK(resXCORRSR[j - 1] == 48);
    CHECK(resXCORRSR[j] == -16);
    CHECK(resXCORRSR[j + 1] == -16);
    CHECK(MaximumLocation(Iresult + j, 48 - j) == 2);
}
22
bool ConvertC32_2_C8(float *inR, pm float *inI,
                     unsigned int *C8, int size) {
    float *holdR = inR;
    pm float *holdI = inI;
    for (int i = 0; i < size; i++) {
        if ((*inR > 127) || (*inR < -128)) return false;
        if ((*inI > 127) || (*inI < -128)) return false;
        inR++;  inI++;
    }
    // Not going to bother with things that don't fit
    if (size & 1) return false;
    inR = holdR;  inI = holdI;
    for (int half = 0; half < size; half += 2) {
        unsigned int first  = ((int) *inR++) & 0xFF;
        unsigned int second = ((int) *inI++) & 0xFF;
        unsigned int third  = ((int) *inR++) & 0xFF;
        unsigned int fourth = ((int) *inI++) & 0xFF;
        *C8++ = ((((((fourth << 8) | third) << 8) | second) << 8) | first);
    }
    return true;
}
23
C8 -> C32 and C16 -> C32
float UINT8ToFloat(unsigned int value) {
    if (value & 0x80) {
        value = value | 0xFFFFFF00;
        return ((int) value);
    }
    else return value;
}

void ConvertC8_2_C32(unsigned int *C8, float *inR, pm float *inI, int size) {
    for (int i = 0; i < size; i += 2) {
        unsigned int value = *C8++;
        *inR++ = UINT8ToFloat(value & 0xFF);
        value >>= 8;
        *inI++ = UINT8ToFloat(value & 0xFF);
        value >>= 8;
        *inR++ = UINT8ToFloat(value & 0xFF);
        value >>= 8;
        *inI++ = UINT8ToFloat(value & 0xFF);
    }
}
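
As a quick cross-check of the byte order, here is a portable round trip for a single packed word. PackC8Word and UnpackC8Field are hypothetical helper names; the layout (first real value in the lowest byte, then its imaginary part, then the second pair) is taken from the expected words in the C8 test, such as 0x14180408.

```cpp
#include <cassert>
#include <cstdint>

// Pack two complex values, each part fitting in a signed 8-bit field.
uint32_t PackC8Word(int real0, int imag0, int real1, int imag1) {
    return  (static_cast<uint32_t>(real0) & 0xFFu)
         | ((static_cast<uint32_t>(imag0) & 0xFFu) << 8)
         | ((static_cast<uint32_t>(real1) & 0xFFu) << 16)
         | ((static_cast<uint32_t>(imag1) & 0xFFu) << 24);
}

// Recover one field with sign extension:
// 0 = real0, 1 = imag0, 2 = real1, 3 = imag1.
int UnpackC8Field(uint32_t word, int field) {
    assert(field >= 0 && field < 4);
    const uint32_t byte = (word >> (8 * field)) & 0xFFu;
    return (byte & 0x80u) ? static_cast<int>(byte) - 256 : static_cast<int>(byte);
}
```

Packing (0x08, 0x04, 0x18, 0x14) gives 0x14180408 and packing (-1, -0x8, -2, -0x18) gives 0xE8FEF8FF, the first two expected words in the C8 test; unpacking returns the original signed values.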
24
FIR filters
float RealFIR(float *values, pm float *params,
              int size, bool overhead) {
    if (overhead) return 0.0;
    float sum = 0;
    for (int i = 0; i < size; i++)
        sum += *values++ * *params++;
    return sum;
}

pm float sumI = 0;

void ComplexFIR(float *valR, pm float *valI, float *parR, pm float *parI,
                float *resultR, pm float *resultI,
                int size, bool overhead) {
    if (overhead) {
        *resultR = *resultI = 0;
        return;
    }
    float sumR = 0;
    sumI = 0;   // Was a static variable
    for (int i = 0; i < size; i++) {
        sumR += *valR * *parR - *valI * *parI;
        sumI += *valR * *parI + *valI * *parR;
        valR++;  valI++;  parR++;  parI++;
    }
    *resultR = sumR;
    *resultI = sumI;
    return;
}
25
Correlation
void RealCorrs(float *vals, int size1, pm float *params,
               int size2, float *result, int *size3,
               bool overhead) {
    if (overhead) return;
    *size3 = size1 - size2;
    for (int j = 0; j < *size3; j++)   // one result per lag
        *result++ = RealFIR(vals++, params, size2, overhead);
}

void ComplexCorrs(float *valR, pm float *valI, int size1,
                  float *parR, pm float *parI, int size2,
                  float *resR, pm float *resI, int *size3,
                  bool overhead) {
    if (overhead) return;
    *size3 = size1 - size2;
    for (int j = 0; j < *size3; j++)   // one result per lag
        ComplexFIR(valR++, valI++, parR, parI,
                   &resR[j], &resI[j], size2, false);
}
26
Correlation XCORRS
extern "C" void xcorrsfunc(unsigned int *C8, pm unsigned int *C1,
                           unsigned int *C16, int size);

void ComplexXCORRS(float *valR, pm float *valI, int size1,
                   float *parR, pm float *parI, int size2,
                   float *resR, pm float *resI, int *size3,
                   bool overhead) {
    ConvertC32_2_C8(valR, valI, DATAC8, size1);
    PRNC1[0] = 0x0;   // Need to shift the PRN to location C15
    ConvertC32_2_C1(parR, parI, PRNC1 + 1, size2);
    *size3 = size1 - size2;
    if (!overhead) xcorrsfunc(DATAC8, PRNC1, RESULTC16, *size3);
    ConvertC16_2_C32(RESULTC16, resR, resI, *size3);
}
27
XCORRS same code as before, except need to
transfer results out
// Shift out the values in TR registers into results
    xR3:0 = TR3:0;;     Q[J6 += 4] = xR3:0;;
    xR3:0 = TR7:4;;     Q[J6 += 4] = xR3:0;;
    xR3:0 = TR11:8;;    Q[J6 += 4] = xR3:0;;
    xR3:0 = TR15:12;;   Q[J6 += 4] = xR3:0;;
    IF NLC0E, JUMP OUTERLOOP;;
28
Need to get inpars and go round more than 16
times
    J0 = zeros;;              // Clear the THR registers the hard way
    R3:0 = Q[J0 += 4];;
    THR3:0 = R3:0;;
    R7:4 = R3:0;;
    // K0 = prn
    J2 = J4;;                 // satellite_data
    LC0 = 3;;
OUTERLOOP:
    K0 = J5;;
    J2 = J4;;
    J4 = J4 + 8;;             // Increment by 8 and not 16
    REST OF CODE UNCHANGED
    // Load THR with PRN code
    R1:0 = L[K0 += 2];;
    THR1:0 = R1:0;;
    R1:0 = L[K0 += 2];;
    THR3:2 = R1:0;;
29
Test results