Software Optimization Of MPEG Audio Layer-III For A 32Bit RISC Processor, Wonchul Lee, Kisun You and Wonyong Sung

1
Software Optimization Of MPEG Audio Layer-III For
A 32Bit RISC Processor Wonchul Lee, Kisun You
and Wonyong Sung

School of Electrical Engineering
Seoul National University
Wonyong Sung
Dec. 17th, 2002

2
Contents

Introduction
Architecture features
IMDCT and Subband synthesis optimization
Assembly language optimization
Using block data transfer
ARM7/ARM9 based implementation
Results
Conclusion

3
Introduction (1)

MPEG1/2 Layer-III (MP3)
Compressed audio standard
Computationally intensive DSP algorithm
Custom VLSI-based implementation
High speed, low power
Limited flexibility
Software implementation
Format and application flexibility
Applicable to multi-standard portable players
Not only MP3, but also AC3, AAC and WMA

4
Introduction (2)

Implementation on an ARM RISC processor
RISC processors are at a disadvantage when implementing DSP algorithms
Need software optimization methods that exploit architecture features
Reducing the number of cycles and the number of memory operands
Converting the floating-point version to a fixed-point version
Automatic scaling method, AUTOSCALER (Kum, Sung)
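To make the floating-point to fixed-point conversion concrete, here is a minimal Python sketch of Q-format quantization and multiplication. The helper names (to_fixed, to_float, fixed_mul) and the number of fractional bits are hypothetical; AUTOSCALER chooses the scaling automatically, whereas this sketch takes it as a hand-picked parameter.

```python
INT32_MIN = -(1 << 31)
INT32_MAX = (1 << 31) - 1


def to_fixed(x: float, frac_bits: int) -> int:
    """Quantize x to a signed 32-bit integer with frac_bits fractional bits (saturating)."""
    v = round(x * (1 << frac_bits))  # round to nearest
    return max(INT32_MIN, min(INT32_MAX, v))


def to_float(v: int, frac_bits: int) -> float:
    """Convert a fixed-point integer back to a float."""
    return v / (1 << frac_bits)


def fixed_mul(a: int, b: int, frac_bits: int) -> int:
    """Multiply two values in the same format. The raw product carries
    2 * frac_bits fractional bits, so shift right to restore the format."""
    return (a * b) >> frac_bits  # arithmetic shift, like ASR on ARM


a = to_fixed(0.5, 15)    # 0.5 in Q15
b = to_fixed(-0.25, 15)  # -0.25 in Q15
print(to_float(fixed_mul(a, b, 15), 15))
```

Python integers never overflow, so the intermediate product is exact here; on the ARM target the same step needs a 64-bit product (e.g., SMULL) or operands scaled so that the 32-bit product cannot overflow.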

5
Architecture

ARM architecture
No floating-point unit
32×8-bit multiplier
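As we understand the ARM7TDMI Technical Reference Manual, the 32×8 multiplier array handles 8 bits of the multiplier operand (Rs) per cycle with early termination, so MUL costs 1S + mI cycles, where m is 1 to 4 depending on how many upper bytes of Rs are all zeros or all ones. The small Python model below (function names hypothetical) shows why full-precision multiplications are expensive on this core; check the exact counts against the manual before relying on them.

```python
def mul_m(rs: int) -> int:
    """Internal-cycle count m for an ARM7TDMI MUL, set by how many
    8-bit chunks of the multiplier operand Rs are significant."""
    rs &= 0xFFFFFFFF  # view as a 32-bit register value
    for m, shift in ((1, 8), (2, 16), (3, 24)):
        top = rs >> shift
        # Early termination: the remaining upper bits are all 0s or all 1s.
        if top == 0 or top == (0xFFFFFFFF >> shift):
            return m
    return 4


def mul_cycles(rs: int) -> int:
    """MUL takes 1S + mI cycles; assume S = I = 1 cycle (e.g., cache hit)."""
    return 1 + mul_m(rs)


for rs in (100, -3, 30000, 0x123456, 0x12345678):
    print(hex(rs & 0xFFFFFFFF), mul_cycles(rs))
```

A 32-bit coefficient with all bytes significant costs the full 4 internal cycles, which is one reason algorithms with fewer multiplications pay off on this processor.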

6
Architecture Features

32×8-bit multiplier
Not well suited to multiplication-intensive DSP programs
Compensate for these weaknesses by exploiting ARM features
ARM architecture features
Conditional execution
Significantly reduces the control overhead in MP3 decoders
32-bit barrel shifter
Shifts and rotations execute together with ALU operations
e.g., scaling, multiplication by 2
Multiple load/store instructions
Reduce total memory access time
Software optimization methods
Loop unrolling, loop termination, circular addressing, data arrangement
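Circular addressing fits the subband synthesis filterbank well: its ISO reference description shifts a 1024-sample V vector by 64 positions every time 32 new subband samples arrive. The Python sketch below (class name hypothetical, not necessarily the authors' implementation) keeps the data in place and moves only a start offset; with a power-of-two buffer size the wrap-around is a single AND with a mask.

```python
class SynthesisFifo:
    """Circular 1024-sample V buffer for the synthesis filterbank."""

    SIZE = 1024
    STEP = 64
    MASK = SIZE - 1  # valid because SIZE is a power of two

    def __init__(self) -> None:
        self.buf = [0.0] * self.SIZE
        self.offset = 0  # physical index of logical V[0]

    def push(self, new_samples: list[float]) -> None:
        """Insert 64 new V values at logical indices 0..63.
        Moving the offset replaces shifting 960 old samples."""
        assert len(new_samples) == self.STEP
        self.offset = (self.offset - self.STEP) & self.MASK
        for i, v in enumerate(new_samples):
            self.buf[(self.offset + i) & self.MASK] = v

    def __getitem__(self, i: int) -> float:
        """Logical V[i] for 0 <= i < 1024."""
        return self.buf[(self.offset + i) & self.MASK]


# Check against the explicit shift: V[i] = V[i - 64], then V[0..63] = new.
fifo = SynthesisFifo()
ref = [0.0] * 1024
for frame in range(20):
    new = [float(frame * 64 + i) for i in range(64)]
    fifo.push(new)
    ref = new + ref[:-64]
assert all(fifo[i] == ref[i] for i in range(1024))
```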

7
MP3 Decoding Algorithms

Processing-intensive
IMDCT, subband synthesis
Control-intensive
Dequantization, Huffman decoding

Major processing parts (about 84%): focus of optimization
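For reference, the dequantization stage listed above applies the ISO 11172-3 requantization formula; for long blocks, xr = sign(is) · |is|^(4/3) · 2^((global_gain - 210)/4) · 2^(-scalefac_multiplier · (scalefac_l + preflag · pretab)). The floating-point Python sketch below uses a lookup table for |is|^(4/3), a common decoder technique but an assumption here; a fixed-point port would store that table as scaled integers. The function name is hypothetical, and the table size assumes the Layer III maximum magnitude of 15 + (2^13 - 1) = 8206.

```python
import math

# |is|^(4/3) lookup table, sized for magnitudes up to 8206 (assumed maximum).
POW43 = [i ** (4.0 / 3.0) for i in range(8207)]


def dequant_long(is_val: int, global_gain: int, scalefac: int,
                 preflag: int, pretab: int, scalefac_scale: int) -> float:
    """Requantize one long-block spectral value."""
    if is_val == 0:
        return 0.0
    multiplier = 0.5 if scalefac_scale == 0 else 1.0  # scalefac_multiplier
    gain = 2.0 ** (0.25 * (global_gain - 210))
    sf = 2.0 ** (-multiplier * (scalefac + preflag * pretab))
    return math.copysign(POW43[abs(is_val)] * gain * sf, is_val)
```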
8
IMDCT and Subband Synthesis Optimization

Employed the Britanak and Rao IMDCT algorithm
Requires a small number of multiplications
Well suited to the ARM CPU, which lacks a full-precision multiplier
MPEG1/2 Audio standard
N = 36 for long blocks, N = 12 for short blocks
< ISO reference vs. Britanak and Rao's algorithm >
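The ISO reference form of the IMDCT maps N/2 spectral values to N samples as x_i = Σ_k X_k cos(π/(2N) (2i + 1 + N/2)(2k + 1)), so direct evaluation needs N · N/2 multiplications. The Python sketch below is that direct form, shown only as the baseline a fast algorithm such as Britanak and Rao's improves on; it is not their algorithm, and the function names are hypothetical.

```python
import math


def imdct_direct(X: list[float]) -> list[float]:
    """Direct-form IMDCT: N/2 spectral values in, N time samples out."""
    half = len(X)
    n = 2 * half
    return [
        sum(X[k] * math.cos(math.pi / (2 * n) * (2 * i + 1 + half) * (2 * k + 1))
            for k in range(half))
        for i in range(n)
    ]


def direct_multiplies(n: int) -> int:
    """Multiplications in the direct form: one per (i, k) pair."""
    return n * (n // 2)


print(direct_multiplies(36), direct_multiplies(12))
```

For long blocks (N = 36) the direct form needs 36 × 18 = 648 multiplications per block, and for short blocks (N = 12) 12 × 6 = 72, which is the cost a fast IMDCT is meant to cut.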

9
Assembly Language Optimization

Using block transfer instructions, LDM, STM
Rarely found in compiler-generated code!!
Accessing the internal memory and cache
N = S = I = 1 cycle
Accessing external DRAM
N >> 1 cycle

S: sequential cycle, N: non-sequential cycle, I: internal cycle
Block transfer (15 registers): 14S + 2N cycles (store), 15S + 1N + 1I cycles (load)
Single transfers: 2N × 15 cycles (store), (1S + 1N + 1I) × 15 cycles (load)
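The cycle formulas above can be evaluated directly to see how the gap grows once N becomes expensive. In this Python sketch the function names are hypothetical, the parameter names N, S and I follow the slide's notation, and N = 8 for external DRAM is an illustrative assumption (S and I are kept at 1 cycle).

```python
def block_load(n: int, N: int, S: int = 1, I: int = 1) -> int:
    """LDM of n registers: nS + 1N + 1I."""
    return n * S + N + I


def block_store(n: int, N: int, S: int = 1) -> int:
    """STM of n registers: (n - 1)S + 2N."""
    return (n - 1) * S + 2 * N


def single_loads(n: int, N: int, S: int = 1, I: int = 1) -> int:
    """n separate loads: (1S + 1N + 1I) each."""
    return n * (S + N + I)


def single_stores(n: int, N: int) -> int:
    """n separate stores: 2N each."""
    return n * 2 * N


for N in (1, 8):  # 1: cache/internal memory; 8: hypothetical external DRAM cost
    print(N, block_load(15, N), single_loads(15, N),
          block_store(15, N), single_stores(15, N))
```

With N = 1, moving 15 registers costs 17 vs. 45 cycles for loads and 16 vs. 30 for stores; because the block forms pay N only once or twice, their advantage widens as N grows.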
10
Instructions vs. Clock cycles

Optimization using block transfer instructions, LDM, STM
IMDCT
Instructions: 28% decrease
Clock cycles: 21% decrease
Subband synthesis
Instructions: 34% decrease
Clock cycles: 35% decrease

See Instruction types
11
Instruction types

The subband synthesis part benefits more from block transfer instructions

Load/store reduction!!
Fewer memory access operands!!

12
Number of Memory Accesses

Purpose: to evaluate external memory accesses, cache performance, power consumption, etc.
ARM7 architecture-based implementation
Unified cache
ARM9 architecture-based implementation
Separate caches
Cache simulator
DineroIV
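DineroIV is a trace-driven cache simulator: it replays a sequence of memory references and counts hits and misses. The Python sketch below is not DineroIV; it is a minimal direct-mapped model (class name hypothetical) with a write-allocate switch, useful for seeing how access patterns and the write policy change the miss count. Real ARM7/ARM9 caches are set-associative, and the 16-byte line size in the example is an assumption.

```python
class DirectMappedCache:
    """Tiny trace-driven cache model (direct-mapped, hit/miss counting only)."""

    def __init__(self, size_bytes: int, line_bytes: int, write_allocate: bool) -> None:
        self.line_bytes = line_bytes
        self.num_lines = size_bytes // line_bytes
        self.tags = [None] * self.num_lines  # tag held by each line, or None
        self.write_allocate = write_allocate
        self.accesses = 0
        self.misses = 0

    def access(self, addr: int, is_write: bool = False) -> bool:
        """Return True on a hit. A write miss fills the line only when
        write allocation is enabled; otherwise it goes straight to memory."""
        self.accesses += 1
        block = addr // self.line_bytes
        index = block % self.num_lines
        tag = block // self.num_lines
        if self.tags[index] == tag:
            return True
        self.misses += 1
        if not is_write or self.write_allocate:
            self.tags[index] = tag
        return False

    def miss_ratio(self) -> float:
        return self.misses / self.accesses if self.accesses else 0.0


cache = DirectMappedCache(8 * 1024, 16, write_allocate=False)
for addr in range(0, 256, 4):  # 64 sequential 32-bit word reads
    cache.access(addr)
print(cache.miss_ratio())  # 0.25
```

Sequential word reads miss once per 16-byte line, which is the spatial-locality effect that grouping data for block transfers tries to exploit.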

13
ARM7 Architecture-based Implementation

ARM7
8KB unified cache
Performance degradation due to cache misses can be significant
No write-allocate support
Improving the spatial locality of data with block transfer instructions reduces the miss ratio
→ reduces the number of accesses to/from the external DRAM

[Figure: cache miss ratios of the ARM7 implementation; chart values 1.7 and 13.5]
14
ARM9 Architecture-based Implementation

ARM9
Separate 16KB instruction and 16KB data caches
Supports write allocation

[Figure: ARM7 (unified 8KB) instruction-cache and data-cache miss ratios; chart values 13.5 and 1.7]
15
Miss penalty

Miss penalty between the main processor and the external SDRAM
SDRAM model
8 clock cycles of latency for the first word read
5 clock cycles for the first word write
1 clock cycle for each successive memory read or write
Assumption
SDRAM clock freq. = CPU clock freq.

[Figure: number of clock cycles for different SDRAM bus widths (16/32-bit); chart values 72 and 23]
[Figure: number of clock cycles for different cache sizes (ARM9)]
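Under the SDRAM model above, the cost of a cache-line transfer follows from the first-word latency plus one cycle per further transfer; because the SDRAM clock equals the CPU clock, the result is directly in CPU cycles. This Python sketch assumes that "word" means one bus-width transfer, so a 16-bit bus needs twice as many transfers as a 32-bit bus; the function names and the 32-byte line size are hypothetical.

```python
def burst_cycles(nbytes: int, bus_bits: int, first: int, successive: int = 1) -> int:
    """Cycles to move nbytes over an SDRAM bus of width bus_bits:
    `first` cycles for the first transfer, `successive` for each further one."""
    transfers = (nbytes * 8) // bus_bits
    return first + (transfers - 1) * successive


def line_fill(line_bytes: int, bus_bits: int) -> int:
    """Read miss: 8-cycle first-word latency."""
    return burst_cycles(line_bytes, bus_bits, first=8)


def line_write(line_bytes: int, bus_bits: int) -> int:
    """Write burst: 5-cycle first-word latency."""
    return burst_cycles(line_bytes, bus_bits, first=5)


for bus in (32, 16):
    print(bus, line_fill(32, bus), line_write(32, bus))
```

For a 32-byte line this gives 15 cycles per read miss on a 32-bit bus and 23 on a 16-bit bus, which is why bus width shows up directly in the total cycle count.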
16
Performance

Tested against the ISO reference, and verified with 10 popular pop songs
Target: ARM7TDMI at 60 MHz
Mono, stereo, joint stereo
Sampling freq.: 44.1 kHz, 22.05 kHz
Bit rate: 32 kbps to 192 kbps
Average SNR of 94.24 dB
Average of 16.5 MIPS
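The slides do not say exactly how the SNR figure was computed. One common definition compares the decoder's PCM output with a reference decoder's output sample by sample; the Python sketch below implements that definition (function name hypothetical) as one plausible way to reproduce such a measurement.

```python
import math


def snr_db(reference: list[float], decoded: list[float]) -> float:
    """SNR of decoded output against reference output, in dB."""
    if len(reference) != len(decoded):
        raise ValueError("sample counts differ")
    signal = sum(r * r for r in reference)
    noise = sum((r - d) ** 2 for r, d in zip(reference, decoded))
    return math.inf if noise == 0 else 10.0 * math.log10(signal / noise)
```

Each 6 dB of SNR corresponds to roughly one bit of precision, so a result around 94 dB suggests the fixed-point output tracks the reference to roughly 16-bit PCM accuracy.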

17
Future Work

More accurate models of the DRAM/CPU interface, including the internal cache memory, are needed.
Apply recently published memory optimization techniques to reduce the MP3 decoder's read/write memory and its data cache miss ratio, which is much higher than the instruction cache miss ratio.
Assembly language optimizations are difficult to reuse on other systems.
Higher-level algorithm optimization is needed
With a compiler?
With algorithm changes?
With a program transformation tool?

18
Conclusion

Implementation of the MPEG1/2 Layer-III decoding algorithm on ARM7- and ARM9-based systems
Restructuring the code to increase the locality of memory references and then applying block transfer instructions reduced instruction demand by 26% and data demand by 8%.
The data-transfer overhead must be taken very seriously in real-time and low-power implementations.
