DSP Lecture 2332009 - PowerPoint PPT Presentation

1
DSP Lecture 23/3-2009
  • Per Zetterberg

2
Next DSP lecture moved
  • New date April 6.
  • Time 15.15

3
Agenda
  • Starting CCS
  • Comparing matlab and DSP results.
  • Profiling when comparing matlab and DSP results.
  • Matlab <-> DSP communication.
  • EDMA
  • EDMA_RTDX_GPIO, QUAD_DAC_ADC.
  • _empty
  • State-machine using case statement.
  • Data formats.
  • Overlap and add.
  • Stack and heap.
  • Simple optimization rules.
  • Cache
  • Some advice.

4
Starting CCS
  • CCStudio v3.3 is the code development
    environment.
  • Use Setup CCStudio v3.3 when you need to change
    between targets.
  • C6713 DSK-USB
  • C6713 Device Cycle Accurate Simulator (little
    endian)
  • C6416 Device Cycle Accurate Simulator (little
    endian)
  • Connect to matlab
  • cc = ccsdsp;
  • cc.visible(0), cc.run, cc.isrunning.

The hardware
When doing tutorial
5
Comparing matlab and DSP result
  • Principle: test isolated functions, e.g. a
    decoder
  • Generate input in matlab.
  • Write input to the DSP.
  • Call DSP version of function.
  • Read output from the DSP.
  • Call matlab version of function.
  • Compare results.
  • Let's have a look at the compare_with_matlab_31
    skeleton!

6
Test important functions by
  • Copy the entire compare_with_matlab_31.pjt
    project.
  • Replace FuncionToBeTested with your code
  • In the C-code.
  • In the matlab code.
  • Define input and output data parameters as
    relevant for your function.
  • Change the matlab code to generate relevant input
    data.
  • Sometimes called test harness in industry.

7
Matlab <-> DSP communication 1(3)
  • Sending data between matlab and DSP when the DSP
    is not running (matlab code):
  • Input_obj = createobj(cc,'Input'); % Input is a
    global in the DSP code.
  • write(Input_obj,Input); % write data
  • Input = read(Input_obj); % read data

8
DSP -> PC communication 2(3)
  • When the DSP is running (RTDX).
  • On the DSP side:
    RTDX_write(&ctrl_chan_dsp2pc, data_to_matlab,
    sizeof(float)*NO_FLOATS_TO_MATLAB);
  • On the matlab side:
    data_from_DSP = readmsg(cc.rtdx,'ctrl_chan_dsp2pc','single');
  • Recommendation: Re-use code in the _empty skeletons.
9
Matlab <-> DSP communication 3(3)
  • The PC <-> DSP interface is slow?
  • Allowed cheating (if necessary):
  • Pre-read data into memory before real-time
    processing.
  • Read result from memory, after real-time
    processing.
  • Large memory areas are available in external memory:
  • #pragma DATA_SECTION(Data,".external_mem") // On
    DSP
  • short Data[1000]; // On DSP
  • write(cc,h_Data.address(1),int16(Data)) % In
    matlab
  • The data is not cleared when the program is
    reloaded.

10
Enhanced Direct Memory Access (EDMA)
Leaves DSP free from moving data back and forth
to ADC/DAC!
11
EDMA PaRAM
12
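Ping-Pong Buffering
  • Two EDMA reload entries: hEdmaReloadXmtPing and
    hEdmaReloadXmtPong.
  • SRC = gBufferXmtPing and SRC = gBufferXmtPong,
    respectively.
  • DST = DXR for both.
  • The LINK fields refer to hEdmaReloadXmtPing and
    hEdmaReloadXmtPong, so the transfer keeps
    alternating between the two buffers.
Let me show you EDMA_RTDX_GPIO_empty and
QUAD_DAC_ADC_empty!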
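The alternation between the two transmit buffers can be modelled in plain C. This is only a software sketch of the idea: on the DSP the switch is done by the EDMA hardware through the LINK fields, not by code like this. The function name on_transfer_complete, the flag edma_is_sending_ping and the start state are assumptions made for illustration; BUFFSIZE and the buffer names follow the slides.

```c
#define BUFFSIZE 256  /* buffer length in shorts, as stated for EDMA_RTDX_GPIO.h */

static short gBufferXmtPing[BUFFSIZE];
static short gBufferXmtPong[BUFFSIZE];

/* Assumed start state: the first transfer sends the ping buffer. */
static int edma_is_sending_ping = 1;

/* Hypothetical helper, called when one transfer has completed.
   The transfer moves on to the other buffer, and the buffer that
   was just sent is returned so that the CPU can refill it. */
short *on_transfer_complete(void)
{
    short *finished = edma_is_sending_ping ? gBufferXmtPing : gBufferXmtPong;
    edma_is_sending_ping = !edma_is_sending_ping;
    return finished;
}
```

The point of the scheme is that the CPU always writes to the buffer that is not currently being transmitted.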
13
Skeleton programs handling EDMA+RTDX
  • Single-antenna
  • EDMA_RTDX_GPIO_31_empty (code development)
  • EDMA_RTDX_GPIO_31 (matlab prototype)
  • Dual-antenna
  • QUAD_ADC_DAC_31_empty (code development)
  • QUAD_ADC_DAC_31 (matlab prototype)

14
EDMA_RTDX_GPIO
  • Let's go through EDMA_RTDX_GPIO_31_empty
  • Then go through EDMA_RTDX_GPIO_31
  • This is the DSP <-> matlab interface to be used in
    the matlab prototype!!
  • Note: Documentation in main.c!

15
State Machine using Case Statement in
appl_Process
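A state machine built around a case statement could look roughly like the sketch below. The states, the energy threshold and the loop-back behaviour are hypothetical and chosen only to show the pattern; they are not taken from the skeleton. The signature of appl_Process and BUFFSIZE follow the later slides.

```c
#define BUFFSIZE 256            /* buffer length in shorts */
#define ENERGY_THRESHOLD 10000  /* hypothetical detection threshold */
#define BUFFERS_TO_RECEIVE 3    /* hypothetical burst length in buffers */

/* Hypothetical states for illustration. */
typedef enum { STATE_IDLE, STATE_SEARCH, STATE_RECEIVE } appl_state_t;

static appl_state_t state = STATE_IDLE;
static int buffers_received = 0;

/* Called once per filled buffer; exactly one case runs per call. */
void appl_Process(short *receive_buffer, short *transmit_buffer)
{
    int i;
    int energy = 0;

    switch (state) {
    case STATE_IDLE:
        /* Transmit silence, then start searching on the next buffer. */
        for (i = 0; i < BUFFSIZE; i++) transmit_buffer[i] = 0;
        state = STATE_SEARCH;
        break;

    case STATE_SEARCH:
        /* Stay here until the input carries enough signal. */
        for (i = 0; i < BUFFSIZE; i++) {
            int v = receive_buffer[i];
            energy += (v < 0) ? -v : v;
            transmit_buffer[i] = 0;
        }
        if (energy > ENERGY_THRESHOLD) {
            buffers_received = 0;
            state = STATE_RECEIVE;
        }
        break;

    case STATE_RECEIVE:
        /* Loop the input back for a fixed number of buffers. */
        for (i = 0; i < BUFFSIZE; i++) transmit_buffer[i] = receive_buffer[i];
        if (++buffers_received >= BUFFERS_TO_RECEIVE) state = STATE_IDLE;
        break;
    }
}
```

Keeping the state in a static variable means each call only does the work of the current state, which keeps the per-buffer processing time predictable.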
16
Data formats
  • C types: char = 8 bits, short = 16 bits, int = 32 bits,
    float = 32 bits.
  • Integers are signed or unsigned.
  • Float: sign = 1 bit, exponent = 8 bits, fraction = 23
    bits.
  • In C, conversion is automatic (when pointers are
    not involved).
  • However, note the range ...
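The range remark matters when a float result is written back to a short buffer: converting a float that does not fit in the target integer type is undefined behaviour in C. A minimal sketch of a clipping conversion is shown below; the function name float_to_short_sat is hypothetical, and the code assumes a 16-bit short as listed above.

```c
#include <limits.h>

/* Convert a float to a short, clipping to the range of short
   (-2^15 .. 2^15-1 for a 16-bit short) instead of relying on
   an out-of-range cast. */
short float_to_short_sat(float x)
{
    if (x >= (float)SHRT_MAX) return SHRT_MAX;
    if (x <= (float)SHRT_MIN) return SHRT_MIN;
    return (short)x;  /* in range: the cast truncates toward zero */
}
```

For example, float_to_short_sat(40000.0f) gives 32767 rather than an unpredictable value.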

17
The buffers in EDMA_RTDX_GPIO
  • appl_Process(short *receive_buffer, short
    *transmit_buffer)
  • The buffers consist of BUFFSIZE shorts (range
    -2^15 to 2^15-1).
  • BUFFSIZE is defined in EDMA_RTDX_GPIO.h to be
    256.
  • The number of bytes is 2*BUFFSIZE = 512.
  • In EDMA_RTDX_GPIO there are 2 channels (i.e. ADC
    and DAC converters) which are interleaved.
  • Thus the number of 2-dimensional vector samples
    is BUFFSIZE/2 = 128.
  • In QUAD_ADC_DAC there are 4 channels which are
    interleaved.
  • Thus the number of 4-dimensional vector samples
    is BUFFSIZE/4 = 64.
  • BUFFSIZE can be changed.
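Splitting an interleaved two-channel buffer into per-channel arrays can be sketched as follows. The function name deinterleave and the assumption that channel 0 comes first in each pair are illustrative; check the skeleton for the actual channel order.

```c
#define BUFFSIZE 256
#define NUM_CHANNELS 2
#define SAMPLES_PER_CHANNEL (BUFFSIZE / NUM_CHANNELS)  /* 128 */

/* Split an interleaved buffer (ch0, ch1, ch0, ch1, ...) into two
   arrays of SAMPLES_PER_CHANNEL shorts each.
   Assumption: channel 0 is stored first in each pair. */
void deinterleave(const short *buffer, short *ch0, short *ch1)
{
    int n;
    for (n = 0; n < SAMPLES_PER_CHANNEL; n++) {
        ch0[n] = buffer[NUM_CHANNELS * n];
        ch1[n] = buffer[NUM_CHANNELS * n + 1];
    }
}
```

With 4 channels the same pattern applies with a stride of 4 and 64 samples per channel.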

18
Overlap and add
  • Say we want to implement a FIR filter.
  • The input buffer is 128 samples.
  • The filter is 10 samples.
  • The filtered signal is 128+10-1 = 137 samples.
  • But the output buffer is 128 samples.
  • Solution: overlap and add.
  • Variant 1: Save the last 9 samples. Add them to
    the next buffer.
  • Variant 2: Overlap-and-add with additional
    buffer. See next slide.
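Variant 1 can be sketched in C as below: each 128-sample block is filtered into a 137-sample result, the first 128 samples go to the output, and the last 9 are saved and added to the start of the next block. The function name fir_block_overlap_add and the use of float samples are assumptions for illustration; the sizes follow the slide.

```c
#define BLOCK 128          /* input and output buffer length */
#define TAPS  10           /* FIR filter length */
#define TAIL  (TAPS - 1)   /* 9 samples spill into the next block */

/* Tail saved from the previous block (static, so it starts at zero). */
static float tail[TAIL];

/* Filter one block of BLOCK samples with the TAPS coefficients in h. */
void fir_block_overlap_add(const float *in, float *out, const float *h)
{
    float full[BLOCK + TAIL];  /* 128+10-1 = 137 samples */
    int n, k;

    /* Full linear convolution of this block with the filter. */
    for (n = 0; n < BLOCK + TAIL; n++) full[n] = 0.0f;
    for (n = 0; n < BLOCK; n++)
        for (k = 0; k < TAPS; k++)
            full[n + k] += in[n] * h[k];

    /* Add the 9 samples left over from the previous block. */
    for (n = 0; n < TAIL; n++) full[n] += tail[n];

    /* Output 128 samples and save the last 9 for the next call. */
    for (n = 0; n < BLOCK; n++) out[n] = full[n];
    for (n = 0; n < TAIL; n++) tail[n] = full[BLOCK + n];
}
```

Called on consecutive blocks, the outputs concatenate to the same signal as filtering the whole input at once. This sketch favours clarity; the inner loops are not arranged for software pipelining.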

19
Overlap and Add With additional buffer
  • Buffer layout in the figure: 128 samples, 128
    samples, 9 samples.
  • Move 128+9 samples.
  • Zero these samples (the part marked in the
    figure).
  • Add the new signal.
  • Good if transmit signal is 128 samples and
    unsynchronized!
20
Stack and Heap
  • float myfunction(short *buffer)
  • float internal_buffer[1000];

This data is stored on the stack. At least 4000
bytes are needed.
The stack size is set in build options. No
warning is given by the compiler if the stack
size is too small!!!
Allocated in heap:
float *internal_buffer; internal_buffer = (float
*) malloc(1000*sizeof(float));
The heap size is also set in build options.
Also no warning!!!
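Since nothing warns about a heap that is too small, the one check available at run time is the return value of malloc. The sketch below uses a hypothetical function buffer_mean in place of myfunction, with an assumed error code of -1, to show the pattern.

```c
#include <stdlib.h>

#define N_INTERNAL 1000

/* Hypothetical example function: copies the input to a heap buffer
   and returns its mean through *mean.
   Returns 0 on success and -1 if the heap allocation failed. */
int buffer_mean(const short *buffer, float *mean)
{
    float *internal_buffer;
    float sum = 0.0f;
    int i;

    internal_buffer = (float *)malloc(N_INTERNAL * sizeof(float));
    if (internal_buffer == NULL) {
        return -1;  /* heap too small: fail visibly instead of crashing */
    }

    for (i = 0; i < N_INTERNAL; i++) {
        internal_buffer[i] = (float)buffer[i];
        sum += internal_buffer[i];
    }

    free(internal_buffer);
    *mean = sum / (float)N_INTERNAL;
    return 0;
}
```

There is no equivalent run-time check for the stack, so large local arrays are the riskier choice when the stack size is uncertain.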
21
Code Optimization
  • Let me show you optimization_example.

22
Simple Optimization Rules 1(2)
  • Turn optimization on. Flags: -o3, program mode
    compilation -pm, and -op3 if possible.
  • Turn debug off, i.e. do not use -g.
  • Avoid function calls inside loops!
  • Use of division (/) is a function call! Use
    _rcpsp instead. For other intrinsics, see Table
    8-6 in spru187n.
  • Avoid math functions such as sin(x); use look-up
    tables instead.
  • Check that all important loops are pipelined by
    searching for "SOFTWARE PIPELINE INFORMATION" in
    the generated .asm files.
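The rule about division can be illustrated in portable C by computing the reciprocal once, outside the loop, and multiplying inside it. The function name scale_buffer is hypothetical. On the DSP, the single remaining division could in turn be replaced by the _rcpsp intrinsic mentioned above, at the cost of reduced precision; that step is not shown here.

```c
/* Divide every element of x by scale.
   The division is done once, outside the loop; the loop body then
   contains only a multiply and no function call. */
void scale_buffer(float *x, int n, float scale)
{
    const float inv_scale = 1.0f / scale;  /* one division in total */
    int i;

    for (i = 0; i < n; i++) {
        x[i] *= inv_scale;
    }
}
```

Note that multiplying by a rounded reciprocal can differ from a true division in the last bit, which matters when comparing against matlab results.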

23
Simple Optimization Rules 2(2)
  • Allocate all time-critical code and data in
    internal memory (in our skeletons this is the
    default; allocating to external memory requires a
    pragma statement).
  • Use the touch function in an initialization
    routine to have the most important data structure
    cached in internal memory. (This function can be
    copied from the cache_miss_example skeleton.)
  • float ImportantData[100];
  • ...
  • touch(ImportantData,100);
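The idea behind a touch routine is to read one location in every cache line of the data so that each line is loaded before the real-time code needs it. The sketch below is a hypothetical stand-in, not the function from cache_miss_example: the name touch_sketch, the byte-count argument and the returned checksum are assumptions, and the 32-byte line size is the L1D figure used in this lecture.

```c
#define CACHE_LINE_BYTES 32  /* L1D line size used in this lecture */

/* Read one byte from every cache line covered by the data.
   The volatile pointer keeps the compiler from removing the reads.
   Returns the sum of the bytes read, which the caller can ignore. */
int touch_sketch(const void *data, int num_bytes)
{
    const volatile unsigned char *p = (const volatile unsigned char *)data;
    int sum = 0;
    int i;

    for (i = 0; i < num_bytes; i += CACHE_LINE_BYTES) {
        sum += p[i];
    }
    /* Also read the last byte, in case the data is not line-aligned. */
    if (num_bytes > 0) {
        sum += p[num_bytes - 1];
    }
    return sum;
}
```

A call such as touch_sketch(ImportantData, sizeof(ImportantData)) would go in the initialization routine.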

24
TMS320C6713 cache
25
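One-way cache (L1P)
  • Cache line 0: Mem 0x0000-0x001F
  • Cache line 1: Mem 0x0020-0x003F
  • ...
  • Cache line 127: Mem 0x0FE0-0x0FFF
  • The following SDRAM blocks, Mem 0x1000-0x101F,
    Mem 0x1020-0x103F, ..., Mem 0x1FE0-0x1FFF, map to
    the same cache lines again, starting from line 0.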
26
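Two-way cache (L1D)
  • Cache line 0A: Mem 0x000-0x01F
  • Cache line 1A: Mem 0x020-0x03F
  • ...
  • Cache line 63A: Mem 0x7E0-0x7FF
  • The following blocks, Mem 0x800-0x81F,
    Mem 0x820-0x83F, ..., Mem 0x0FE0-0x0FFF, map to
    the same sets again, starting from set 0.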
27
L1D cache
L1D address allocation
  • A new line of 32 bytes is loaded on a read-miss
    with a penalty of 4 clock cycles.
  • If two words are loaded per clock cycle (reading
    sequentially from a memory segment) the overhead
    is 8/32*4 = 1 clock cycle per instruction cycle.
  • A write-miss doesn't lead to loading of a
    new line. A write buffer of four words handles up
    to four misses without penalty.

28
cache_miss_example
  • main.c: Illustrates impact of L1D write and read
    misses (compulsory misses).
  • main2.c: Illustrates the problem with several
    data objects in the same set (thrashing).
  • Two data objects are in the same set if
  • Aa + K*2048 = Ab,
  • for some address Aa and Ab in Object A or B
    respectively, and for some K.
  • Two code objects are in the same set if
  • Aa + K*4096 = Ab,
  • for some address Aa and Ab in Object A or B
    respectively, and for some K.
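The two conditions above amount to comparing set (or line) indices. A small sketch, using the line sizes and counts drawn on the cache slides (32-byte lines, 64 sets for L1D, 128 lines for L1P); the function names are hypothetical.

```c
#define LINE_BYTES    32u   /* line size as drawn on the slides */
#define L1D_NUM_SETS  64u   /* 64 sets * 32 bytes = 2048 bytes per way */
#define L1P_NUM_LINES 128u  /* 128 lines * 32 bytes = 4096 bytes */

/* Set index of a data address in the two-way L1D cache. */
unsigned int l1d_set_index(unsigned long addr)
{
    return (unsigned int)((addr / LINE_BYTES) % L1D_NUM_SETS);
}

/* Line index of a code address in the one-way (direct-mapped) L1P cache. */
unsigned int l1p_line_index(unsigned long addr)
{
    return (unsigned int)((addr / LINE_BYTES) % L1P_NUM_LINES);
}
```

Two data addresses compete for the same set when l1d_set_index gives the same value for both, for example 0x0 and 0x800. Because L1D is two-way, the problem starts with the third object in the same set.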

29
What to consider when programming to make good
use of the cache
  • Align all data buffers on 32-byte boundaries
    (#pragma DATA_ALIGN).
  • Avoid allocating more than two objects that map
    to the same set in the same algorithm.
  • Avoid having two or more computationally complex
    algorithms that map to the same set.
  • Profile the algorithms with and without cached
    data and program (see cache_miss_example).
  • Force caching of important data and code before
    the real-time program starts (e.g. in
    appl_Init()) by reading the data (touch) and
    calling the functions.
  • Test processing data in smaller buffers to see if
    performance improves.

30
Some advice 1(2)
  • Start with a skeleton.
  • Only insert functions which have been checked
    against matlab.
  • Make one change at a time => much easier to find
    out what went wrong.
  • Save the code before and after each change.
  • Don't use printf.

31
Some advice 2(2)
  • Check that all pointers are initialized.
  • If a variable is corrupted, check the .map file to
    see how it could be overwritten.
  • Use an extern declaration both in the file where
    the variable is declared and where it is used.
  • In real-time debugging, store results to
    debug globals.
  • When using sqrt, log, log10, use #include
    <math.h>.