Generating a software loop with memory accesses

Provided by: Micha1192

Transcript and Presenter's Notes

1
Generating a software loop with memory accesses
  • TigerSHARC assembly syntax

2
Concepts
  • Learning just enough TigerSHARC assembly code to
    make a software loop work
  • Comparing the timings for rectification of
    integer and floating point arrays, using
      • Debug C code
      • Release C code
      • Our FIRST_ASM code
  • Looking in MIXED mode at the code generated by
    the compiler

3
Test Driven Development
  • Work with the customer to check that the tests
    properly express what the customer wants done.
    This is an iterative process with the customer
    heavily involved (Agile methodology).

[Figure labels: CUSTOMER, DEVELOPER]
4
Note: Special marker
Compiler optimization
  • FLOATS 927 -> 304 (THREE-FOLD)
  • INTS 960 -> 150 (SIX-FOLD)
Why the difference? Can we do better, and do we want
to? Note the failures. What are they?
5
Write tests about passing values back from an
assembly code routine
6
More detailed look at the code
  • As with 68K and Blackfin, needs a .section, but
    the name and format are different
  • As with 68K, need a .align statement. Is the 4 in
    bytes (8 bits) or words (32 bits)?
  • As with 68K, need .global to tell other code that
    this function exists
  • Single semi-colons and double semi-colons
  • Start function label and end function label:
    used for profiling code
  • Label format similar to 68K: needs a leading
    underscore and a final colon
7
Return registers
  • There are many, depending on what you need to
    return
  • Here we need to use J8 as the return register to
    pass back integer pointer
  • Many registers are available, so we need the
    ability to control their usage
  • J0 to J31 registers (integers and pointers)
    (SISD mode)
  • XR0 to XR31 registers (integers) (SISD mode)
  • XFR0 to XFR31 registers (floats) (SISD mode)
  • Did I also mention:
  • I0 to I31 registers (integers and pointers)
    (SISD mode)
  • YR0 to YR31, YFR0 to YFR31 (SIMD mode)
  • XYR, YXR and R registers (SIMD mode)
  • And also the MIMD modes
  • And the double registers and the quad registers
  • #define return_pt_J8 J8   // J8 is a VOLATILE, NON-PRESERVED register

8
Parameter passing
  • SPACES for first four parameters ARE ALWAYS
    present on the stack (as with 68K)
  • But the first four parameters are passed in
    registers (J4, J5, J6 and J7 most of the time)
    (as with MIPS and Blackfin)
  • The parameters passed in registers are often
    stored into the spaces on the stack (like the
    MIPS) as the first step when assembly code
    functions call assembly code functions
  • J4, J5, J6 and J7 are volatile, non-preserved
    registers

9
Can we pass back the start of the final array?
Still passing tests by accident, and this needs
to be a conditional return value
10
What we need to know based on experiences from
other processors
  • Can we return from an assembly language routine
    without crashing the processor?
  • Return a parameter from assembly language routine
  • (Is it same for ints and floats?)
  • Pass parameters into assembly language
  • (Is it same for ints and floats?)
  • Do IF THEN ELSE statements
  • Read and write values to memory
  • Read and write values in a loop
  • Do some mathematics on the values fetched from
    memory
  • All this stuff is demonstrated by coding
    HalfWaveRectifyASM( )

11
Why is ELSE a keyword?
  • FOUR PART ELSE INSTRUCTION IS LEGAL
      IF JLT; ELSE, J1 = J2 + J3;     // Conditional execution -- if true
              ELSE, XR1 = XR2 + XR3;  // Conditional -- if true
              YFR1 = YFR2 + YFR3;;    // Unconditional -- always
  • The DO form uses the same layout
      IF JLT; DO, J1 = J2 + J3;       // Conditional execution -- if true
              DO, XR1 = XR2 + XR3;    // Conditional -- if true
              YFR1 = YFR2 + YFR3;;    // Unconditional -- always
  • Having this sort of format means that the
    instruction pipeline is not disrupted when we do
    IF statements

12
Label name is not the problem
NOTE: This is C-like syntax, but it is not C.
A statement must end in ;; and not ;
  • ONE semicolon: end of instruction
  • TWO semicolons: end of parallel instruction line
13
Add dual semi-colons everywhere. Worry about
multiple issues later
This dual semi-colon is so important that
you MUST code review for it all the time, or else
you waste so much time in the Lab. Key in exams /
quizzes
At last, an error I know how to fix?
14
Well I thought I understood it !!!
  • Speed issue: JUMP instructions can't be too
    close together when stored in memory
  • Not normally a problem when the code is larger

15
Add a single instruction line of 4 NOPs:
nop; nop; nop; nop;; (TEMPORARY)
  • Fix the last error as part of Assignment 1

Fix the remaining error in handling the IF THEN
ELSE as part of Assignment 1. Worry about code
efficiency later (refactor) when all the code is working
16
What we need to know based on experiences from
other processors
  • Can we return from an assembly language routine
    without crashing the processor?
  • Return a parameter from assembly language routine
  • (Is it same for ints and floats?)
  • Pass parameters into assembly language
  • (Is it same for ints and floats?)
  • Do IF THEN ELSE statements
  • Read and write values to memory
  • Read and write values in a loop
  • Do some mathematics on the values fetched from
    memory
  • All this stuff is demonstrated by coding
    HalfWaveRectifyASM( )

17
Target. Changing this C code into assembly (to
get more speed)
  • Code we generated yesterday was similar to parts
    of this, but not equivalent.
  • Re-factor the code to make the assembly code and
    C functionality equivalent
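
The C target itself is not reproduced in this transcript, so the following is only a minimal sketch of what a half-wave rectifier matching the behaviour described in these slides might look like. The name HalfWaveRectifyC, the parameter order, and the choice of returning NULL when N <= 0 are assumptions, based on the notes about passing back the start of the final array and needing a conditional return value.

```c
#include <stddef.h>

/* Hypothetical C reference for the rectifier discussed in the slides.
 * Assumed contract (not confirmed by the source):
 *   - negative samples become 0, all others are copied unchanged
 *   - returns the start of the final array, or NULL when N <= 0
 */
int *HalfWaveRectifyC(const int initial[], int final[], int N)
{
    if (N <= 0) {
        return NULL;              /* conditional return value */
    }
    for (int count = 0; count < N; count++) {
        int value = initial[count];            /* read from memory  */
        final[count] = (value < 0) ? 0 : value; /* rectify and store */
    }
    return final;                 /* pass back start of final array */
}
```

A reference like this gives the tests something to compare the assembly routine against, element by element, before any optimization is attempted.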

18
The code was not exactly what we designed (the C
equivalent). Re-factor, and retest after the
re-factoring
NEXT STEP
19
Refactored C code
  • I think I understand enough to change the format
    of the IF-THEN-ELSE to optimize this particular
    bit of code: use a single-line "IF TRUE, EXECUTE
    THIS STATEMENT"
  • Avoiding JUMPS in the main flow of the code will
    speed the flow of the code
  • Almost right: SYNTAX ERROR. Look in the manual to
    find the correct syntax
      IF NJLE; DO, J8 = 0;;
20
No syntax errors (no CODE ERRORS). Code does not
work (CODE DEFECTS)
We don't have enough code to pass all the
tests, but we are failing tests we did not expect
to fail
21
Run forensic tests to find out where the DEFECT is
being introduced
Identify the mistake by removing code sections
Without the IF
22
Add another line to the code. Can now spot the
error
The new format of IF-THEN-ELSE is doing exactly the
opposite of what we want: IF NOT TRUE, return
NULL (0). Need JLE, not NJLE
23
Assignment 1: code the following as a software
loop, following the MIPS / Blackfin approach
  • DONE DURING TUTORIAL

    int CalculateSum(void) {
        int sum = 0;
        for (int count = 0; count < 6; count++)
            sum = sum + count;
        return sum;
    }

24
Reminder: a software for-loop becomes a while loop
with an initial test

    int CalculateSum(void) {
        int sum = 0;
        int count = 0;
        while (count < 6) {
            sum = sum + count;
            count++;
        }
        return sum;
    }

  • Do a line-by-line translation into assembly code
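
One way to prepare for the line-by-line translation is to first rewrite the while loop in C using only a test and jumps, since that is the shape the assembly code will take. The sketch below is illustrative only; the function name CalculateSumJumps and the labels LOOP_TEST and LOOP_END are hypothetical and do not come from the slides.

```c
/* Same computation as CalculateSum(), written with explicit jumps so that
 * each C line corresponds roughly to one assembly instruction line.
 * Names and labels are hypothetical. */
int CalculateSumJumps(void)
{
    int sum = 0;
    int count = 0;

LOOP_TEST:
    if (count >= 6) goto LOOP_END;  /* initial test: leave loop when done */
    sum = sum + count;              /* loop body                          */
    count = count + 1;              /* loop control                       */
    goto LOOP_TEST;                 /* jump back to the test              */

LOOP_END:
    return sum;
}
```

Note that the exit test uses the inverted condition (count >= 6 rather than count < 6), which is the same inversion that has to be expressed with the available condition codes in the assembly version.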

25
USE SOFTWARE LOOP HERE. Do loop control first
  • Have some jumps too close together

NOTE: JGE is ILLEGAL. USE NJLT. Customize:
#define JGE NJLT
26
Run the tests with 4 nop padding to check that
we get out of the loop as expected
Adding 4 nops: lose 1 cycle, gain an hour by not
trying to solve the problem. If we need the 1
cycle, refactor the code later
27
Accessing memory
  • Basic mode
  • Special register J31 acts as zero when used in
    additions
  • Pt_J5 is a pointer register into an array
  • Value_J1 is being used as a data register
  • J registers are like MIPS registers (used as
    pointer and data), NOT like 68K or Blackfin
    registers, which can be used as either data or
    address registers but not both
  • NOTE: Later we will find that using TigerSHARC
    registers for data operations is a BAD idea
  • Value_J1 = [Pt_J5]  reads the value from the
    memory location pointed to by J5. Compare to
    Blackfin: Value_R0 = [Pt_P0]
  • Value_J1 = [Pt_J5 + J31]  reads the value from the
    memory location pointed to by J5. I read
    somewhere that this CAN be faster than just
    Value_J1 = [Pt_J5] -- NEED TO CONFIRM

28
Accessing memory step 2
  • Basic mode
  • Pt_J5 is a pointer register into an array
  • Offset_J4 is used as an offset
  • Value_J1 is being used as a data register to
    receive the memory value (load / store
    architecture)
  • Read_J1 = [Pt_J5 + Offset_J4]  reads the value from
    the memory location pointed to by (J5 + J4)
  • PRE-MODIFY: address used is J5 + J4, no change in
    J5
  • Read_J1 = [Pt_J5 += Offset_J4]  reads the value from
    the memory location pointed to by J5, and then
    performs an add operation on the J5 register
    (points to NEXT location)
  • POST-MODIFY: address used is J5, then perform
    J5 = J5 + J4
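
The difference between the two addressing modes can be sketched in C with ordinary pointers. This is only an analogy, not TigerSHARC code: the function names read_pre_modify and read_post_modify are hypothetical, and the offset is counted in array elements, as is usual for C pointer arithmetic.

```c
#include <stddef.h>

/* PRE-MODIFY analogy: the address used is (pt + offset);
 * the pointer itself is left unchanged. */
int read_pre_modify(const int *pt, ptrdiff_t offset)
{
    return *(pt + offset);
}

/* POST-MODIFY analogy: the address used is pt;
 * afterwards the pointer is advanced by offset. */
int read_post_modify(const int **pt, ptrdiff_t offset)
{
    int value = **pt;      /* read at the current pointer   */
    *pt = *pt + offset;    /* then update the pointer       */
    return value;
}
```

With a pointer at the start of an array, read_pre_modify(pt, 2) fetches the third element and leaves pt alone, while repeated calls to read_post_modify(&pt, 1) walk through the array one element at a time, which is the pattern a loop over an array typically wants.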

29
Add in the memory accesses
FORGOT: TigerSHARC is a RISC PROCESSOR
LOAD/STORE ONLY, like MIPS and Blackfin. Must place
the value into a register, and then copy the
register to memory
  • NO:  [J5 + J0] = 0;;
  • NO:  J3 = 0; [J5 + J0] = J3;;
    Uses the wrong J3. Remember: TigerSHARC can
    handle parallel instructions
  • YES: J3 = 0;;
         [J5 + J0] = J3;;
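
Why the second form stores the wrong J3 can be illustrated with a toy model in C. This is not a TigerSHARC simulator; the ToyState type and both function names are hypothetical, and the model assumes only what the slide states, namely that instructions placed on one parallel line see the register values from before that line.

```c
#include <stdint.h>

/* Toy machine state (hypothetical): one register and a small memory. */
typedef struct {
    int32_t j3;
    int32_t memory[4];
} ToyState;

/* Models clearing J3 and storing J3 on the SAME parallel instruction line:
 * the store samples J3 before the clear takes effect. */
void store_zero_same_line(ToyState *s, int index)
{
    int32_t old_j3 = s->j3;      /* operand read at start of the line */
    s->j3 = 0;                   /* register write                    */
    s->memory[index] = old_j3;   /* store uses the OLD value          */
}

/* Models the same two instructions on two SEPARATE instruction lines:
 * the store sees the freshly written zero. */
void store_zero_two_lines(ToyState *s, int index)
{
    s->j3 = 0;                   /* first line                        */
    s->memory[index] = s->j3;    /* second line reads the new value   */
}
```

If J3 holds a leftover value such as 99, the same-line version writes 99 to memory, whereas the two-line version writes 0.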
30
Understand the error message "Too many J resource
usage": a semicolon is missing
Unintentionally doing the parallel instruction
line [J5 + J0] = J2; J0 = J0 + 1;;
31
Note: A missing label is not an assembler error,
it's a linker error
Fix warnings. The DEFECT may sit for days before
we try to link, and is then hard to find
32
NOW that the assembler knows where CONTINUE is,
it can tell you that you have two JUMP
instructions too close together
  • Fix with the magic 4 nops and lose one cycle / loop

33
Not getting the expected test results. Something is
logically wrong (DEFECT)
34
Obvious question: are we even getting into the
loop? Add a BREAKPOINT to TEST the code flow. (We
don't add BREAKPOINTS to follow the code in detail.)
CODE NEVER GOT TO BREAKPOINT means the code never
entered the loop. Forgot to do count = 0, so we are
not even getting into the loop, as there is a
garbage value already in Count_J0 from code we
executed earlier -- DEFECT
35
Not bad for a first effort. Faster than the compiler
in debug mode
36
Where did the float ASM code suddenly appear from?
  • Integer 0 has bit pattern 0x0000 0000
  • Float 0.0 has bit pattern 0x0000 0000
  • Integer 6 has format b 0??? ???? ???? ????
    ???? ???? ???? ????
  • Float 6.0 has format b 0??? ???? ???? ????
    ???? ???? ???? ????
  • Integer -6 has format b 1??? ???? ???? ????
    ???? ???? ???? ????
  • Float -6.0 has format b 1??? ???? ???? ????
    ???? ???? ???? ????
  • Formats are very different, but the sign bit is
    in the same place
  • Float algorithm: if S = 1 (negative), set to
    zero
  • Otherwise leave unchanged, same as the integer
    algorithm
  • Just re-use integer algorithm with a change of
    name
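
The sign-bit argument above can be checked in C by treating the bit pattern of a float as a 32-bit integer and applying the integer test to it. This is a minimal sketch that assumes 32-bit IEEE 754 floats and a two's complement int32_t; the function name rectify_float_via_int_bits is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Rectify a float using only an integer comparison on its bit pattern.
 * Assumes sizeof(float) == 4 and IEEE 754 single precision. */
float rectify_float_via_int_bits(float value)
{
    int32_t bits;
    float result;

    memcpy(&bits, &value, sizeof bits);    /* reinterpret the bit pattern */
    if (bits < 0) {                        /* sign bit set: negative      */
        bits = 0;                          /* 0x00000000 is also 0.0f     */
    }
    memcpy(&result, &bits, sizeof result); /* back to float               */
    return result;
}
```

Because integer 0 and float 0.0 share the same bit pattern, the integer rectify routine gives the right answer for floats too, which is why the float version only needs a different name.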

[Figure label: EXPONENT]
37
Final code: the float rectify code just has a
different name
38
What we NOW KNOW
  • Can we return from an assembly language routine
    without crashing the processor?
  • Return a parameter from assembly language routine
  • (Is it same for ints and floats?)
  • Pass parameters into assembly language
  • (Is it same for ints and floats?)
  • Do IF THEN ELSE statements
  • Read and write values to memory
  • Read and write values in a loop
  • Do some mathematics on the values fetched from
    memory
  • All this stuff is demonstrated by coding
    HalfWaveRectifyASM( )