Title: TigerSHARC CLU Exploration of XCORRS for TakeHome Quiz 4 BIAWPQHI 13 April start of class
1 TigerSHARC CLU: Exploration of XCORRS for Take-Home Quiz 4 (BIAWPQHI) -- 13 April, start of class
- M. Smith,
- University of Calgary, Canada
- smithmr_at_ucalgary.ca
2 Ideal -- Take Home Quiz
- Develop tests for complex correlation
  - Time and functionality
- Evaluate on
  - C in default and optimized mode (especially optimized)
  - Your optimized complex assembly code in complex correlation in SISD and SIMD modes
  - XCORRS in complex correlation in SISD and SIMD modes
3 Reasonable -- Take Home Quiz: Code and report
- Develop functionality and time tests for real FIR -- based on Lab. 3
  - Use on optimized C and your SISD and SIMD FIR
- Develop functionality and time tests for real correlation -- based on Lab. 3 / 4
  - Use on optimized C and your SISD and SIMD correlation
- Work out (in theory) the speed changes expected in your SISD and SIMD code if it went to complex data. Use this as a template for the expected changes in optimized C
- Develop functionality and time tests for complex FIR
  - Use on optimized C
- Develop functionality and time tests for complex correlation
  - Use on optimized C and your SISD and SIMD XCORRS only
- Report on whether the changes in C code speed work the way you expect
  - Use these figures to scale your FIR and correlation results to complex data
- Report on relative speeds
  - C in default and optimized mode (especially optimized)
  - Your optimized complex assembly code in complex correlation in SISD and SIMD modes
  - XCORRS in complex correlation in SISD and SIMD modes
4 Mark assignment
- My tests and C are available on the web
  - If you use my tests, then you must say so, and 10% of marks are deducted
  - If you use my C code, then you must say so, and 10% of marks are deducted
  - If you use my C code and my tests, then you must say so, and 20% of marks are deducted
5 Speed comparison Part 1
- Real FIR
  - float / int values[], params[]
  - Loop
    - sum = sum + values * params
  - 2 memory fetches
  - 1 add and 1 mult per loop cycle
  - done in ½ cycle in theory
  - Time = N / 2 + overhead
  - Determine overhead by measuring with and without the loop-sum
- Complex FIR
  - CMPX float / int values[], params[]
  - Loop (many common factors with FFT. Hint for final?)
    - sum = sum + values * params
    - Real: sum.re = sum.re + v.re * p.re - v.im * p.im
    - Imag: sum.im = sum.im + v.re * p.im + v.im * p.re
  - 8 memory fetches
  - 3 add / sub and 4 mult per loop
  - Time = ??? + overhead
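The two loops above can be sketched in portable C as follows. This is a minimal illustration, not the course's reference code: the names `real_fir_sum`, `complex_fir_sum` and the interleaved `cmpx_sketch` type are assumptions made for this example. Written this way, each complex tap loads `v` and `p` once into locals; a naive version that re-reads every operand for every product would issue the 8 fetches counted on the slide.

```c
#include <assert.h>
#include <stddef.h>

/* Real FIR tap sum: 2 fetches, 1 multiply and 1 add per tap. */
float real_fir_sum(const float *values, const float *params, size_t n)
{
    assert(values && params);
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += values[i] * params[i];
    return sum;
}

/* Hypothetical complex type: real and imaginary parts stored side by side. */
typedef struct { float re, im; } cmpx_sketch;

/* Complex FIR tap sum: sum += v * p, expanded into real and imaginary parts. */
cmpx_sketch complex_fir_sum(const cmpx_sketch *values,
                            const cmpx_sketch *params, size_t n)
{
    assert(values && params);
    cmpx_sketch sum = {0.0f, 0.0f};
    for (size_t i = 0; i < n; i++) {
        cmpx_sketch v = values[i];   /* load each operand once */
        cmpx_sketch p = params[i];
        sum.re += v.re * p.re - v.im * p.im;
        sum.im += v.re * p.im + v.im * p.re;
    }
    return sum;
}
```

Counting the operations in the loop body of each function is one way to start the "Time = ??? + overhead" estimate before measuring anything.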
6 Speed comparison Part 2
- Speed in theory without doing anything special
- Any special way to store complex values to speed up memory access?
  - Do we need to do 8 memory fetches?
    - On the Blackfin?
    - In the TigerSHARC?
- Expected optimal speed?
  - Time = ??? + overhead
- Complex FIR
  - CMPX float / int values[], params[]
  - Loop (many common factors with FFT. Hint for final?)
    - sum = sum + values * params
    - Real: sum.re = sum.re + v.re * p.re - v.im * p.im
    - Imag: sum.im = sum.im + v.re * p.im + v.im * p.re
  - 8 memory fetches
  - 3 add / sub and 4 mult per loop
  - Time = ??? + overhead
7 Speed comparison Part 3?
- Do these speed calculations scale the same way for complex correlation as for complex FIR?
- Do a theory calculation and then compare the result for debug and optimized C code to validate
  - Within 25% of the predicted changes is probably more than reasonable for a back-of-envelope calculation
- Use the scaling factor on your real FIR and correlation functions
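The back-of-envelope comparison described above could be sketched as two small helpers. Everything here is an assumption for illustration: the function names, the idea of scaling only the loop part of the time (total minus measured overhead) by a theoretical complex-to-real cost ratio, and the tolerance being expressed as a fraction such as 0.25.

```c
#include <assert.h>
#include <stdbool.h>

/* Predict the complex-data time from a measured real-data time.
   Only the loop part (total minus overhead) is scaled by op_ratio,
   the theoretical complex/real cost ratio per loop pass. */
double predict_complex_cycles(double real_cycles, double overhead_cycles,
                              double op_ratio)
{
    assert(real_cycles >= overhead_cycles);
    return (real_cycles - overhead_cycles) * op_ratio + overhead_cycles;
}

/* True when the measurement is within the given fraction of the prediction. */
bool within_fraction(double predicted, double measured, double fraction)
{
    assert(predicted > 0.0);
    double diff = measured - predicted;
    if (diff < 0.0)
        diff = -diff;
    return diff <= fraction * predicted;
}
```

A timing test could then assert `within_fraction(predicted, measured, 0.25)` for each routine, with `op_ratio` taken from your own theory calculation.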
8 Tests for following functions needed. When to convert from float to int?
- void ConvertReal2Complex(float *, CMPX32 *, int size)
  - Make Complex = Real + j0
- bool ConvertC32_2_C8(CMPX32 *, CMPX8 *, int size)
  - Take bottom 8 bits of complex 32
  - Return false if overflows
  - Complex 8 is packed 2 complex values into 32 bits --- stored in int format
- bool ConvertC32_2_C1(CMPX32 *, CMPX1 *, int size)
  - Take bottom 1 bit of complex 32
  - Return false if overflows, or if not in +/-1 +/-j1 format
  - Complex 1 is packed 16 complex values into 32 bits --- stored in int format
- void ConvertC8_2_C32(CMPX8 *, CMPX32 *, int size) needed? YES - um
- void ConvertC1_2_C32(CMPX1 *, CMPX32 *, int size) needed?
9 Tests for following functions needed
- float RealFIR(float *vals, float *params, int size, bool overhead)
- CMPLX ComplexFIR(CMPLX *vals, CMPLX *params, int size, bool overhead)
  - vals in dm and params in pm
- void RealCorrs(float *vals, int size1, float *params, int size2, float *result, int &size3, bool overhead)
- void ComplexCorrs(CMPLX *vals, int size1, CMPLX *params, int size2, CMPLX *result, int &size3, bool overhead)
- void XCORRS(CMPLX *vals, int size1, CMPLX *params, int size2, CMPLX *result, int &size3, bool overhead, int version)
  - version: 0 = works, 1 = SISD, 2 = SIMD
10 Some hints
- void XCORRS(CMPLX *vals, int size1, CMPLX *params, int size2, CMPLX *result, int &size3, bool overhead, int version)
  - bool ConvertC32_2_C8(CMPX32 *, dm CMPX8 *, int size1)
  - bool ConvertC32_2_C1(CMPX32 *, pm CMPX1 *, int size2)
  - size3 = size1 - size2
  - for result 1 to size3
    - result = 0
  - if (!overhead) XCORRS(dm CMPX8 *, pm CMPX1 *, dm? result, size1, size2, size3, whichversion)
11 Some hints
- void ComplexCorrs(CMPLX *vals, int size1, CMPLX *params, int size2, CMPLX *result, int &size3, bool overhead)
  - if (overhead) return
  - size3 = size1 - size2
  - for loop to size3
    - result[loop] = ComplexFIR(vals, CMPLX *params, int size, bool overhead)
    - vals++
  - end loop
12 Some decisions
- Complex 32: first decision
  - Store real in dm space and imaginary in pm space?
- Complex8 in dm space, Complex1 in pm space
- Doing everything with static pm variables
- Using dm variables on stack, in an attempt to avoid running out of memory
- Try with satellite data of size 2048 and PRN data of size 1024, but suspect there may not be enough room when doing this with Complex 32, so may have to test on smaller sizes for comparison
- I ended up generating the same data as for the xcorrs( ) shown last Friday: size 48 = 16 * 3. Decided that if I could handle that (3 times round the xcorrs loop) then it was a fair enough test
13 Some Tests developed 1
TEST(ConvertReal2CMPLX32, D_TEST) {
    TEST_LEVEL(1);
#define TEST_SIZE 8
    float values[TEST_SIZE] = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0};
    float zeros[TEST_SIZE] = {0, 0, 0, 0, 0, 0, 0, 0};
    ConvertReal2Complex(values, C32Real, C32Imag, TEST_SIZE);
    ARRAYS_EQUAL(values, C32Real, TEST_SIZE);
    ARRAYS_EQUAL(zeros, C32Imag, TEST_SIZE);
}
14 Test for packed data C8 format
#define TEST_SIZE 8
pm float imag1[TEST_SIZE] = {0x04, 0x14, -0x8, -0x18, 0x24, 0x34, 0x44, 0x54};
float real1[TEST_SIZE] = {0x08, 0x18, -1, -2, 0x28, 0x38, 0x48, 0x58};

TEST(ConvertToCMPLX8, D_TEST) {
    TEST_LEVEL(1);
    // Each 32-bit word holds two complex values: imag[1] real[1] imag[0] real[0]
    unsigned int result[4] = {0x14180408, 0xE8FEF8FF, 0x34382428, 0x54584448};
    CHECK(!ConvertC32_2_C8(real1, imag1, DATAC8, 1));
    CHECK(ConvertC32_2_C8(real1, imag1, DATAC8, TEST_SIZE));
    ARRAYS_EQUAL(DATAC8, result, TEST_SIZE / 2);
}
15 Test for packed data C1 format
#define LONGER_SIZE 32
pm float imag2[LONGER_SIZE] = {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ..};
float real2[LONGER_SIZE] = {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ..};
pm float imag4[LONGER_SIZE];
float real4[LONGER_SIZE];

TEST(ConvertCMPLX1, D_TEST) {
    TEST_LEVEL(1);
    unsigned int result1[2] = {0x00000000, 0x00000000};
    unsigned int result2[2] = {0xFFFFFFFF, 0xFFFFFFFF};
    CHECK(!ConvertC32_2_C1(real1, imag1, PRNC1, 1));
    CHECK(!ConvertC32_2_C1(real1, imag1, PRNC1, TEST_SIZE));
    CHECK(!ConvertC32_2_C1(real2, imag2, PRNC1, 1));
    CHECK(ConvertC32_2_C1(real2, imag2, PRNC1, LONGER_SIZE));
    ARRAYS_EQUAL(PRNC1, result1, LONGER_SIZE / 16);
    for (int i = 0; i < LONGER_SIZE; i++) {
        real4[i] = -1 * real2[i];
        imag4[i] = -1 * imag2[i];
    }
    CHECK(ConvertC32_2_C1(real4, imag4, PRNC1, LONGER_SIZE));
    ARRAYS_EQUAL(PRNC1, result2, LONGER_SIZE / 16);
}
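The slides do not show ConvertC32_2_C1 itself, so the following is only a sketch of a packer that would satisfy the test above. The name `pack_c1_sketch`, plain (non-`pm`) pointers, and the bit layout are all assumptions: here complex value k of a word puts its real sign in bit 2k and its imaginary sign in bit 2k+1, with +1 stored as 0 and -1 as 1. The test data (all +1 or all -1) cannot distinguish this ordering from others, so check it against what XCORRS actually expects. Rejecting sizes that are not a multiple of 16 is also an assumption, chosen because the size-1 call in the test must fail.

```c
#include <assert.h>
#include <stdbool.h>

/* Pack 16 complex +/-1 +/-j1 values into each 32-bit word (assumed layout). */
bool pack_c1_sketch(const float *re, const float *im, unsigned int *c1, int size)
{
    assert(re && im && c1);
    if (size <= 0 || size % 16 != 0)
        return false;                     /* assumed: whole words only */

    /* Validate everything first so a failure leaves the output untouched. */
    for (int i = 0; i < size; i++) {
        if ((re[i] != 1.0f && re[i] != -1.0f) ||
            (im[i] != 1.0f && im[i] != -1.0f))
            return false;                 /* not in +/-1 +/-j1 format */
    }

    for (int w = 0; w < size / 16; w++) {
        unsigned int word = 0;
        for (int k = 0; k < 16; k++) {
            int i = w * 16 + k;
            unsigned int re_bit = (re[i] < 0.0f) ? 1u : 0u;
            unsigned int im_bit = (im[i] < 0.0f) ? 1u : 0u;
            word |= (re_bit << (2 * k)) | (im_bit << (2 * k + 1));
        }
        c1[w] = word;
    }
    return true;
}
```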
16 RealFIR
#define TEST_SIZE 8
pm float params[TEST_SIZE] = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0};

TEST(RealFIR, D_TEST) {
    TEST_LEVEL(1);
    float impulse[TEST_SIZE];
    float results[TEST_SIZE];
    for (int i = 0; i < TEST_SIZE; i++) {
        for (int j = 0; j < TEST_SIZE; j++)   // Set to zero
            impulse[j] = 0;
        impulse[i] = 1;
        results[i] = RealFIR(impulse, params, TEST_SIZE, false);
    }
    ARRAYS_EQUAL(results, params, TEST_SIZE);
}
17 Complex FIR tests (3 of them). To see if I got both Real and Imag correct
pm float resultsI[TEST_SIZE];

TEST(ComplexFIR, D_TEST) {
    TEST_LEVEL(1);
    float impulse[TEST_SIZE];
    float resultsR[TEST_SIZE];
    float zeros[TEST_SIZE] = {0, 0, 0, 0, 0, 0, 0, 0};
    for (int i = 0; i < TEST_SIZE; i++) {
        for (int j = 0; j < TEST_SIZE; j++)   // Set to zero
            impulse[j] = 0;
        impulse[i] = 1;
        for (int j = 0; j < TEST_SIZE; j++) {
            C32Real[j] = impulse[j];
            C32Imag[j] = 0;
            C32Real1[j] = params[j];
            C32Imag1[j] = 0;
        }
        ComplexFIR(C32Real, C32Imag, C32Real1, C32Imag1,
                   &resultsR[i], &resultsI[i], TEST_SIZE, false);
    }
    ARRAYS_EQUAL(resultsR, params, TEST_SIZE);
    ARRAYS_EQUAL(resultsI, zeros, TEST_SIZE);
}
18 Real Correlation
pm float PRN32I[TEST_SIZE] = {1, -1, 1, -1, 1, 0, 0, 0};

TEST(RealCorrelation, D_TEST) {
    TEST_LEVEL(1);
    float data[TEST_SIZE * 2] = {0, 0, 0, 0, 1, -1, 1, -1, 1,
                                 0, 0, 0, 0, 0, 0, 0};
    float result[TEST_SIZE];
    int Iresult[TEST_SIZE];
    int size3;
    RealCorrs(data, 2 * TEST_SIZE, PRN32I, TEST_SIZE, result, size3, false);
    CHECK(size3 == TEST_SIZE);
    for (int j = 0; j < TEST_SIZE; j++)
        Iresult[j] = result[j];
    CHECK(MaximumLocation(Iresult, TEST_SIZE) == 4);   // PRN starts at data[4]
}
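MaximumLocation is used by the test above but is not shown on the slides. A minimal sketch of such a helper, together with a plain sliding-dot-product correlation to exercise it, might look like this. The names `maximum_location_sketch` and `real_corr_sketch` and their signatures are assumptions for illustration, and ties are resolved in favour of the first maximum.

```c
#include <assert.h>

/* Hypothetical helper: index of the first largest element. */
int maximum_location_sketch(const int *values, int size)
{
    assert(values && size > 0);
    int best = 0;
    for (int i = 1; i < size; i++)
        if (values[i] > values[best])
            best = i;
    return best;
}

/* result[lag] = sum over i of data[lag + i] * prn[i],
   for lag = 0 .. data_size - prn_size - 1. Returns the number of lags. */
int real_corr_sketch(const float *data, int data_size,
                     const float *prn, int prn_size, float *result)
{
    assert(data && prn && result && data_size >= prn_size);
    int lags = data_size - prn_size;
    for (int lag = 0; lag < lags; lag++) {
        float sum = 0.0f;
        for (int i = 0; i < prn_size; i++)
            sum += data[lag + i] * prn[i];
        result[lag] = sum;
    }
    return lags;
}
```

With the data and PRN arrays from the slide, the five non-zero PRN taps line up with the data at lag 4, which is why the test expects the maximum there.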
19 Complex Correlation -- Simple Test
pm float dataI[TEST_SIZE * 2] = {0, 0, 0, 0, 1.0, -1, 1, -1, 1,
                                 0, 0, 0, 0, 0, 0, 0};
pm float resI[TEST_SIZE];

TEST(ComplexCorrelation, D_TEST) {
    TEST_LEVEL(1);
    float dataR[TEST_SIZE * 2] = {0.0, 0, 0, 0, 0, 0, 0, 0,
                                  0, 0, 0, 0, 0, 0, 0, 0};
    float resR[TEST_SIZE];
    int Iresult[TEST_SIZE];
    float parR[TEST_SIZE] = {0, 0, 0, 0, 0, 0, 0, 0};
    int size3;
    ComplexCorrs(dataR, dataI, TEST_SIZE * 2,
                 parR, PRN32I, TEST_SIZE,
                 resR, resI, size3, false);
    CHECK(size3 == TEST_SIZE);
    for (int j = 0; j < TEST_SIZE; j++)
        Iresult[j] = abs(resR[j]);
    CHECK(MaximumLocation(Iresult, TEST_SIZE) == 4);
}
20 Complex Correlation related to results from last lecture
for (int i = 0; i < 96; i += 3) {
    satXCORRSR[i] = -1;  satXCORRSR[i + 1] = 1;  satXCORRSR[i + 2] = 1;
    satXCORRSI[i] = 0;   satXCORRSI[i + 1] = 0;  satXCORRSI[i + 2] = 0;
}
for (int i = 0; i < 48; i += 3) {
    prnXCORRSR[i] = -1;  prnXCORRSR[i + 1] = 1;  prnXCORRSR[i + 2] = 1;
    prnXCORRSI[i] = -1;  prnXCORRSI[i + 1] = 1;  prnXCORRSI[i + 2] = 1;
}
ComplexCorrs(satXCORRSR, satXCORRSI, 96,
             prnXCORRSR, prnXCORRSI, 48,
             resXCORRSR, resXCORRSI, size3, false);
CHECK(size3 == 48);
for (int j = 0; j < 48; j++)
    Iresult[j] = abs(resXCORRSR[j]);
for (int j = 1; j < 45; j += 3) {
    CHECK(resXCORRSR[j - 1] == 48);
    CHECK(resXCORRSR[j] == -16);
    CHECK(resXCORRSR[j + 1] == -16);
    CHECK(MaximumLocation(Iresult + j, 48 - j) == 2);
}
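The expected values 48, -16, -16 in the test above can be checked off-target with a portable reference correlation. The sketch below is an assumption-laden stand-in, not the course's ComplexCorrs: it uses plain pointers instead of `dm`/`pm` arrays, returns size3 through a pointer, and the names `complex_corr_sketch` and `fill_pattern_sketch` are made up for this example. It multiplies without conjugating, matching the real and imaginary sums given on the Speed comparison slides.

```c
#include <assert.h>

/* Reference complex correlation with split real/imaginary arrays:
   res[lag] = sum over i of val[lag + i] * par[i] (no conjugate). */
void complex_corr_sketch(const float *valR, const float *valI, int size1,
                         const float *parR, const float *parI, int size2,
                         float *resR, float *resI, int *size3)
{
    assert(valR && valI && parR && parI && resR && resI && size3);
    assert(size1 >= size2);
    *size3 = size1 - size2;
    for (int lag = 0; lag < *size3; lag++) {
        float sumR = 0.0f, sumI = 0.0f;
        for (int i = 0; i < size2; i++) {
            float vr = valR[lag + i], vi = valI[lag + i];
            sumR += vr * parR[i] - vi * parI[i];
            sumI += vr * parI[i] + vi * parR[i];
        }
        resR[lag] = sumR;
        resI[lag] = sumI;
    }
}

/* Fill dst with the repeating (-1, +1, +1) pattern used on the slide. */
void fill_pattern_sketch(float *dst, int size)
{
    assert(dst);
    for (int i = 0; i < size; i++)
        dst[i] = (i % 3 == 0) ? -1.0f : 1.0f;
}
```

Because the satellite data repeats every 3 samples, lags 0, 3, 6, ... align all 48 PRN taps (sum 48), while the two lags in between give -1 per group of three taps (sum -16).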
21 Complex Correlation ASM related to results from last lecture
for (int i = 0; i < 96; i += 3) {
    satXCORRSR[i] = -1;  satXCORRSR[i + 1] = 1;  satXCORRSR[i + 2] = 1;
    satXCORRSI[i] = 0;   satXCORRSI[i + 1] = 0;  satXCORRSI[i + 2] = 0;
}
for (int i = 0; i < 48; i += 3) {
    prnXCORRSR[i] = -1;  prnXCORRSR[i + 1] = 1;  prnXCORRSR[i + 2] = 1;
    prnXCORRSI[i] = -1;  prnXCORRSI[i + 1] = 1;  prnXCORRSI[i + 2] = 1;
}
ComplexCorrsASM(satXCORRSR, satXCORRSI, 96,
                prnXCORRSR, prnXCORRSI, 48,
                resXCORRSR, resXCORRSI, size3, false);
CHECK(size3 == 48);
for (int j = 0; j < 48; j++)
    Iresult[j] = abs(resXCORRSR[j]);
for (int j = 1; j < 45; j += 3) {
    CHECK(resXCORRSR[j - 1] == 48);
    CHECK(resXCORRSR[j] == -16);
    CHECK(resXCORRSR[j + 1] == -16);
    CHECK(MaximumLocation(Iresult + j, 48 - j) == 2);
}
22 ConvertC32_2_C8
bool ConvertC32_2_C8(float *inR, pm float *inI, unsigned int *C8, int size) {
    float *holdR = inR;
    pm float *holdI = inI;
    // First pass: every value must fit in a signed 8-bit integer
    for (int i = 0; i < size; i++) {
        if ((*inR > 127) || (*inR < -128)) return false;
        if ((*inI > 127) || (*inI < -128)) return false;
        inR++;  inI++;
    }
    // Not going to bother with things that don't fit
    if (size & 1) return false;
    inR = holdR;
    inI = holdI;
    // Second pass: two complex values (4 bytes) per 32-bit word
    for (int half = 0; half < size; half += 2) {
        unsigned int first  = ((int) *inR++) & 0xFF;
        unsigned int second = ((int) *inI++) & 0xFF;
        unsigned int third  = ((int) *inR++) & 0xFF;
        unsigned int fourth = ((int) *inI++) & 0xFF;
        *C8++ = ((((((fourth << 8) | third) << 8) | second) << 8) | first);
    }
    return true;
}
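To try the packing off-target, the same logic can be written in portable C without the `pm` qualifier. This is a sketch under assumptions: the name `pack_c8_sketch` is made up, indexing replaces pointer stepping, and a 32-bit `unsigned int` is assumed. The byte order (real[0] in the lowest byte, then imag[0], real[1], imag[1]) follows the expected words in the C8 test slide.

```c
#include <assert.h>
#include <stdbool.h>

/* Pack pairs of complex values into 32-bit words, one signed byte per part. */
bool pack_c8_sketch(const float *re, const float *im, unsigned int *c8, int size)
{
    assert(re && im && c8);
    if (size <= 0 || (size & 1))
        return false;                      /* need whole words */

    /* Range check first so a failure leaves the output untouched. */
    for (int i = 0; i < size; i++) {
        if (re[i] > 127.0f || re[i] < -128.0f) return false;
        if (im[i] > 127.0f || im[i] < -128.0f) return false;
    }

    for (int i = 0; i < size; i += 2) {
        unsigned int first  = (unsigned int)(int)re[i]     & 0xFFu;
        unsigned int second = (unsigned int)(int)im[i]     & 0xFFu;
        unsigned int third  = (unsigned int)(int)re[i + 1] & 0xFFu;
        unsigned int fourth = (unsigned int)(int)im[i + 1] & 0xFFu;
        c8[i / 2] = (fourth << 24) | (third << 16) | (second << 8) | first;
    }
    return true;
}
```

Feeding it the `real1` and `imag1` arrays from the C8 test slide gives the four words listed there, which is a quick way to confirm the byte order before moving to the TigerSHARC build.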
23 C8 -> C32 and C16 -> C32
float UINT8ToFloat(unsigned int value) {
    if (value & 0x80) {                // Negative: sign-extend to 32 bits
        value = value | 0xFFFFFF00;
        return ((int) value);
    }
    else return value;
}

void ConvertC8_2_C32(unsigned int *C8, float *inR, pm float *inI, int size) {
    for (int i = 0; i < size; i += 2) {
        unsigned int value = *C8++;    // Two complex values per word
        *inR++ = UINT8ToFloat(value & 0xFF);
        value = value >> 8;
        *inI++ = UINT8ToFloat(value & 0xFF);
        value = value >> 8;
        *inR++ = UINT8ToFloat(value & 0xFF);
        value = value >> 8;
        *inI++ = UINT8ToFloat(value & 0xFF);
    }
}
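The unpacking direction can be checked the same way off-target. In this sketch the names `uint8_to_float_sketch` and `unpack_c8_sketch` are hypothetical and plain pointers stand in for the `pm` array. One deliberate difference from the slide code: the sign extension subtracts 256 instead of OR-ing in 0xFFFFFF00 and casting, which avoids relying on how an out-of-range unsigned value converts to `int`.

```c
#include <assert.h>

/* Interpret the low 8 bits as a signed two's-complement byte. */
float uint8_to_float_sketch(unsigned int value)
{
    assert(value <= 0xFFu);
    int v = (int)value;
    if (v & 0x80)
        v -= 256;                 /* 0x80..0xFF map to -128..-1 */
    return (float)v;
}

/* Unpack words holding imag[1] real[1] imag[0] real[0] (high to low byte). */
void unpack_c8_sketch(const unsigned int *c8, float *re, float *im, int size)
{
    assert(c8 && re && im && size >= 0 && size % 2 == 0);
    for (int i = 0; i < size; i += 2) {
        unsigned int word = c8[i / 2];
        re[i]     = uint8_to_float_sketch(word & 0xFFu);
        im[i]     = uint8_to_float_sketch((word >> 8) & 0xFFu);
        re[i + 1] = uint8_to_float_sketch((word >> 16) & 0xFFu);
        im[i + 1] = uint8_to_float_sketch((word >> 24) & 0xFFu);
    }
}
```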
24 FIR filters
float RealFIR(float *values, pm float *params, int size, bool overhead) {
    if (overhead) return 0.0;
    float sum = 0;
    for (int i = 0; i < size; i++)
        sum += *values++ * *params++;
    return sum;
}

pm float sumI = 0;

void ComplexFIR(float *valR, pm float *valI, float *parR, pm float *parI,
                float *resultR, pm float *resultI, int size, bool overhead) {
    if (overhead) {
        *resultR = *resultI = 0;
        return;
    }
    float sumR = 0;
    sumI = 0;                          // Was a static variable
    for (int i = 0; i < size; i++) {
        sumR += *valR * *parR - *valI * *parI;
        sumI += *valR * *parI + *valI * *parR;
        valR++;  valI++;  parR++;  parI++;
    }
    *resultR = sumR;
    *resultI = sumI;
    return;
}
25 Correlation
void RealCorrs(float *vals, int size1, pm float *params, int size2,
               float *result, int &size3, bool overhead) {
    if (overhead) return;
    size3 = size1 - size2;
    for (int j = 0; j < size3; j++)    // One FIR per output lag
        *result++ = RealFIR(vals++, params, size2, overhead);
}

void ComplexCorrs(float *valR, pm float *valI, int size1,
                  float *parR, pm float *parI, int size2,
                  float *resR, pm float *resI, int &size3, bool overhead) {
    if (overhead) return;
    size3 = size1 - size2;
    for (int j = 0; j < size3; j++)    // One FIR per output lag
        ComplexFIR(valR++, valI++, parR, parI, &resR[j], &resI[j], size2, false);
}
26 Correlation XCORRS
extern "C" void xcorrsfunc(unsigned int *C8, pm unsigned int *C1,
                           unsigned int *C16, int size);

void ComplexXCORRS(float *valR, pm float *valI, int size1,
                   float *parR, pm float *parI, int size2,
                   float *resR, pm float *resI, int &size3, bool overhead) {
    ConvertC32_2_C8(valR, valI, DATAC8, size1);
    *PRNC1 = 0x0;                      // Need to shift the PRN to location C15
    ConvertC32_2_C1(parR, parI, PRNC1 + 1, size2);
    size3 = size1 - size2;
    if (!overhead) xcorrsfunc(DATAC8, PRNC1, RESULTC16, size3);
    ConvertC16_2_C32(RESULTC16, resR, resI, size3);
}
27 XCORRS: same code as before, except need to transfer results out
// Shift out the values in TR registers into results
xR3:0 = TR3:0;;      Q[J6 += 4] = xR3:0;;
xR3:0 = TR7:4;;      Q[J6 += 4] = xR3:0;;
xR3:0 = TR11:8;;     Q[J6 += 4] = xR3:0;;
xR3:0 = TR15:12;;    Q[J6 += 4] = xR3:0;;
IF NLC0E, JUMP OUTERLOOP;;
28 Need to get inpars and go round more than 16 times
J0 = zeros;;                  // Clear the THR registers the hard way
R3:0 = Q[J0 += 4];;
THR3:0 = R3:0;;
R7:4 = R3:0;;
// K0 = prn;;
J2 = J4;;                     // satellite_data
LC0 = 3;;
OUTERLOOP:
K0 = J5;;
J2 = J4;;
J4 = J4 + 8;;                 // Increment by 8 and not 16
REST OF CODE UNCHANGED
// Load THR with PRN code
R1:0 = L[K0 += 2];;    THR1:0 = R1:0;;
R1:0 = L[K0 += 2];;    THR3:2 = R1:0;;
29 Test results