1
Architecture Tuning in Embedded Systems
  • Greg Stitt, Frank Vahid, Tony Givargis
  • Dept. of Computer Science & Engineering
  • University of California, Riverside

Roman Lysecky, Department of IP Management,
Conexant, Newport Beach
This work was supported by the National Science
Foundation under grants CCR-9811164 and
CCR-9876006, and by a Design Automation
Conference graduate scholarship.
This work is being presented at CASES'00
(Compilers, Architectures and Synthesis for
Embedded Systems), November 18-19, 2000, San
Jose, CA.
2
A short list of embedded systems
Anti-lock brakes; Auto-focus cameras; Automatic teller machines;
Automatic toll systems; Automatic transmission; Avionic systems;
Battery chargers; Camcorders; Cell phones; Cell-phone base stations;
Cordless phones; Cruise control; Curbside check-in systems;
Digital cameras; Disk drives; Electronic card readers;
Electronic instruments; Electronic toys/games; Factory control;
Fax machines; Fingerprint identifiers; Home security systems;
Life-support systems; Medical testing systems; Modems; MPEG decoders;
Network cards; Network switches/routers; On-board navigation; Pagers;
Photocopiers; Point-of-sale systems; Portable video games; Printers;
Satellite phones; Scanners; Smart ovens/dishwashers;
Speech recognizers; Stereo systems; Teleconferencing systems;
Televisions; Temperature controllers; Theft tracking systems;
TV set-top boxes; VCRs, DVD players; Video game consoles;
Video phones; Washers and dryers
  • And the list goes on and on

3
Introduction: Traditional microprocessor use in
embedded systems
  • Tasks (not necessarily in the given order)
  • (1) Buy a microprocessor IC (integrated circuit)
  • (2) Integrate it with other ICs onto a board and
    insert it into an embedded system
  • (3) Download a software program

[Figure: software, processor, and board, labeled with tasks 1 to 3]
  • Notice that the processor IC is designed
    independent of the software
  • Different microprocessor variations thus exist,
    like low-power or high-performance ICs

4
Introduction: Modern core-based approach
  • Tasks
  • (1) Buy a microprocessor CORE
  • Hard: layout; Firm: structural HDL; Soft:
    synthesizable HDL
  • You are buying Intellectual Property, like a file
    that may come on a floppy, CD-ROM, over the web,
    etc. You are NOT buying hardware.
  • (2) Design a system-on-a-chip (SOC) from this and
    other cores
  • (3) Fabricate a SOC IC
  • (4) Insert the IC into an embedded system
  • (5) Download a software program

[Figure: software, processor cores, and their HDL, labeled with tasks 1 to 5]
5
Introduction: The fixed-program feature unique to
embedded systems
  • SOCs implementing an embedded system have a
    unique feature
  • Implements a particular application
  • Thus, the processor may execute a single fixed
    program that never changes
  • Unlike desktop systems, which execute a variety
    of programs
  • Examples: digital camera, automobile
    cruise controller
  • We can exploit this fixed-program feature
  • For example, by using mask-programmed ROM
  • But much more can be done

The software in here never changes after
production
6
Introduction: Proposed core-based approach with
architecture tuning
  • Tasks
  • (1) Buy a microprocessor core
  • (2) Design a system-on-a-chip (SOC) from this and
    other cores
  • (3) TUNE the SOC architecture to a software
    program
  • (4) Fabricate a SOC IC
  • (5) Insert the IC into an embedded system
  • (6) Download the software program

[Figure: software, processor cores, and their HDL, labeled with tasks 1 to 6]
7
Introduction: Architecture tuning
Fixed program
  • Architecture tuning
  • A way to exploit the fixed-program feature of
    embedded systems
  • First, do architecture design for the particular
    application
  • Then, tune the core-based system architecture
    to the particular application program, before IC
    fabrication
  • Goals: better performance, power, size

[Figure: three stages (architecture design, architecture tuning,
fabrication). The program, processor HDL, and peripheral HDL pass
through tuning to become tuned cores, which are fabricated as an IC.]
8
Introduction: Architecture tuning
  • Examples of tuning optimizations
  • Memory hierarchy: no cache, L1 cache, L1+L2 cache
  • Cache organization: size, associativity, write
    policies
  • Bus structure, data/address encoding
  • DMA block sizes
  • Microprocessor optimizations
  • Internal small-loop table
  • Controller partitioning
  • Datapath shortcuts
  • Register file copies

9
Introduction: Tuning is a special case of Y-Chart
iteration
  • Philips/TriMedia approach of simultaneously
    developing architecture and its applications

[Figure: Y-Chart with Architecture and Applications feeding Mapping,
followed by Analysis, which produces Numbers]
10
Problem description
  • Focus of this work
  • Tuning a microcontroller to its program
  • Goal is reduced power without performance loss
  • Restrict tuning to maintain exact instruction set
    compatibility
  • No instructions may be added or deleted
  • Thus, no modification to software development
    environment
  • Also, no problems with porting software to/from
    other versions of the microcontroller
  • Instruction set incompatibility can be a show
    stopper
  • Maintenance/upgrades/re-porting of binaries over
    the lifetime of product and for product
    variations is a key issue
  • Likewise, a stable software development
    environment is needed

11
Previous work
  • Application-specific instruction-set processors
    [Fisher 99]
  • Customize a microprocessor to its application(s)
  • Delete unnecessary instructions, add new ones
    along with accompanying datapath extensions
  • e.g., Tensilica
  • Customized instruction-set requires customized
    development tools (e.g., compiler, debugger)
  • Tuning compiler to architecture [Tiwari et al. 94]
  • Architectural description languages to inform
    compiler of architecture features [Halambi et al.
    99]
  • Tuning cache and cache/bus organization to
    application [Givargis et al. 99]

12
Tuning environment
  • Currently for the 8051 microcontroller
  • Starts from VHDL synthesizable model of 8051
    (soft core)
  • Uses Synopsys synthesis, simulation and power
    analysis
  • Uses 8051 instruction-set simulator
  • Uses numerous scripts
  • Goal of the environment
  • Understand how power is being consumed for a
    particular application, so that modifications to
    the architecture (or application) can be made to
    minimize that power
  • Three main tools
  • Architectural view
  • Instruction-set view
  • Program/data memory view

13
Tuning environment: architectural view tool
14
Tuning environment: instruction-set view tool
Instruction   Power (mW)
ADDC_1        7.340834
ADD_1         7.350741
ANL_1         6.631394
CLR_1         3.76228
CPL_1         5.481627
DA            5.28897
DEC_1         5.368807
DIV           7.716592
INC_1         4.662862
MOVC_1        6.078014
MOVC_2        5.021021
MOV_1         5.577664
MOV_2         6.164267
MUL           5.522886
NOP           4.900275
ORL_1         6.954121
POP           8.103867
PUSH          8.7116
15
Tuning environment: program/data memory view tool
Addr    Ins     Freq   Pwr       Freq*Pwr
00000   LJMP    1      0         0
00003   MOV_9   108    5.46067   589.752
00005   MOV_9   108    5.46067   589.752
00007   MOV_9   108    5.46067   589.752
00009   MOV_9   108    5.46067   589.752
00011   RET     108    0         0
00012   MOV_9   27     5.46067   147.438
00014   MOV_9   27     5.46067   147.438
00016   MOV_9   27     5.46067   147.438
00018   MOV_9   27     5.46067   147.438
00020   MOV_4   27     4.83507   130.547
00022   LCALL   27     0         0

Addr    Purpose   Accesses
00128   P0        1311
00129   SP        70317
00130   DPL       31189
00131   DPH       7977
00144   P1        161
00208   PSW       413527
00224   ACC       360949
00240   B         2598
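The last column of the program memory view is the execution frequency multiplied by the per-instruction power. A minimal Python sketch of that weighting follows, using only the rows shown on this slide. The function names are illustrative assumptions and are not taken from the tuning environment's own scripts.

```python
# Rows copied from the program memory view on this slide:
# (address, instruction, execution frequency, power in mW).
PROGRAM_VIEW = [
    (0, "LJMP", 1, 0.0),
    (3, "MOV_9", 108, 5.46067),
    (5, "MOV_9", 108, 5.46067),
    (7, "MOV_9", 108, 5.46067),
    (9, "MOV_9", 108, 5.46067),
    (11, "RET", 108, 0.0),
    (12, "MOV_9", 27, 5.46067),
    (14, "MOV_9", 27, 5.46067),
    (16, "MOV_9", 27, 5.46067),
    (18, "MOV_9", 27, 5.46067),
    (20, "MOV_4", 27, 4.83507),
    (22, "LCALL", 27, 0.0),
]


def weighted_power(rows):
    """Return (address, instruction, freq * power), largest contribution first."""
    weighted = [(addr, ins, freq * pwr) for addr, ins, freq, pwr in rows]
    return sorted(weighted, key=lambda row: row[2], reverse=True)


def totals_by_instruction(rows):
    """Sum freq * power over all addresses that hold the same instruction."""
    totals = {}
    for _, ins, freq, pwr in rows:
        totals[ins] = totals.get(ins, 0.0) + freq * pwr
    return totals


if __name__ == "__main__":
    totals = totals_by_instruction(PROGRAM_VIEW)
    for ins, total in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{ins:6s} {total:10.3f}")
```

On this excerpt MOV_9 dominates the weighted total, which is the kind of hot spot the view is meant to expose.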
16
Tuning environment
17
Design flow using the tuning environment
18
Experiments
  • Started with 8051 soft core in VHDL
  • Tuning environment was used to
  • Examine where power consumption was occurring for
    a given application
  • Quickly evaluate the impact of tuning
    optimizations
  • These are early results, much more work remains

19
Power consumption of the initial 8051 model
  • Power consumption
  • Mainly due to switching wires
  • Any wire whose value changes (from 0 to 1)
    consumes power
  • Want to minimize switching
  • 8051 power consumption
  • 5 main components
  • Controller, RAM, and ALU are the most expensive
    components
  • These components have potential for general
    optimizations
  • Total Gates - 25854

Average power 37.1824 mW
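Switching activity of this kind can be estimated from a trace of values on a bus. The following is a rough sketch, assuming a trace is just a list of integer bus values sampled once per cycle. The trace is made up for illustration, and the counts are only a proxy for power, not what the Synopsys power analysis computes.

```python
def rising_transitions(prev: int, curr: int) -> int:
    """Count bits that change from 0 to 1 between two bus values."""
    return bin(~prev & curr).count("1")


def toggles(prev: int, curr: int) -> int:
    """Count bits that change in either direction."""
    return bin(prev ^ curr).count("1")


def switching_activity(trace):
    """Return (rising, total) transition counts over consecutive samples."""
    pairs = list(zip(trace, trace[1:]))
    rising = sum(rising_transitions(a, b) for a, b in pairs)
    total = sum(toggles(a, b) for a, b in pairs)
    return rising, total


# Hypothetical 8-bit bus trace, one value per clock cycle.
trace = [0x00, 0xFF, 0x0F, 0xF0]
print(switching_activity(trace))  # (12, 20)
```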
20
General optimizations made to the 8051
  • Prevent unnecessary switching on wires connecting
    to memories
  • Wires connecting processor to memories are high
    capacitance
  • They were switching even when not being used
  • So we inserted latches to hold the previous
    value, a standard power-saving technique
  • Prevent unnecessary switching in decoder and ALU
  • Again, by latching the inputs coming from the
    controller
  • Fetch instruction bytes only when needed
  • Hold ROM output when not being read
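The effect of holding the previous value on an unused bus can be illustrated with a small behavioral model. This is a sketch under stated assumptions: each cycle is a hypothetical (enabled, value) pair, where the value is whatever the driver would put on the wires that cycle, and the sample data is invented. It is not derived from the 8051 VHDL model.

```python
def count_toggles(values):
    """Total number of bit flips across consecutive bus values."""
    return sum(bin(a ^ b).count("1") for a, b in zip(values, values[1:]))


def bus_without_latch(cycles):
    """The wires follow the driver every cycle, whether or not the bus is in use."""
    return [value for _, value in cycles]


def bus_with_latch(cycles, initial=0):
    """The wires change only on enabled cycles; otherwise the old value is held."""
    held = initial
    seen = []
    for enabled, value in cycles:
        if enabled:
            held = value
        seen.append(held)
    return seen


# Hypothetical activity: the bus is actually used in cycles 0 and 3 only.
cycles = [(True, 0x12), (False, 0xFF), (False, 0x00), (True, 0x13), (False, 0xA5)]

print(count_toggles(bus_without_latch(cycles)))  # 22
print(count_toggles(bus_with_latch(cycles)))     # 1
```

The latch removes every toggle that happens while the bus is idle, which matters most on the high-capacitance wires to the memories.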

21
Power after general optimizations
  • Overall power reduction from 37.2 to 11.6 mW.
  • Total gates - 25951
  • Improvements
  • ROM 82.9%
  • RAM 70.5%
  • ALU 60.0%
  • CTR 19.9%

Average power 11.6025 mW
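As a quick check on the figures on this slide and on the initial-model slide, the relative power saving and gate-count overhead can be computed directly. This is a small sketch; the helper names are illustrative.

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative decrease from before to after, in percent."""
    return 100.0 * (before - after) / before


def percent_increase(before: float, after: float) -> float:
    """Relative increase from before to after, in percent."""
    return 100.0 * (after - before) / before


# Average power (mW) and total gates before and after the general optimizations.
power_saving = percent_reduction(37.1824, 11.6025)
gate_overhead = percent_increase(25854, 25951)

print(f"power reduction: {power_saving:.1f}%")   # 68.8%
print(f"gate overhead:   {gate_overhead:.2f}%")  # 0.38%
```

So the general optimizations cut average power by roughly two thirds while adding well under one percent to the gate count.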
22
Tuning optimizations
  • Sought to tune the microprocessor to a particular
    application
  • GCD (Greatest common divisor) computation
  • Tuning optimizations invoked
  • 1) Replace frequently-accessed RAM locations by
    internal registers
  • 2) Create datapath shortcuts for most common
    instructions
  • 3) Partition the controller into a big controller
    and a small controller, with the small one
    handling the most frequently-executed GCD
    instructions

23
Sample tuning optimization
  • Observation
  • RAM consumes much power
  • Address 224 accessed frequently
  • Possible tuning optimization
  • Replace this RAM location by a register
  • Steps
  • Modify VHDL model
  • Run all three view tools
  • Results
  • Power reduction 7.67 to 7.27 mW
  • RAM reduced from 1.42 to 0.8 mW, CTRL increased
    slightly

Addr    Purpose   Accesses
00128   P0        1311
00129   SP        70317
00130   DPL       31189
00131   DPH       7977
00144   P1        161
00208   PSW       413527
00224   ACC       360949
00240   B         2598
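Picking candidates for this optimization amounts to ranking memory locations by access count. The following is a minimal sketch using the data memory view above; the function name and the choice of taking the top two locations are assumptions for illustration.

```python
# Data memory view from this slide: address -> (purpose, access count).
ACCESSES = {
    128: ("P0", 1311),
    129: ("SP", 70317),
    130: ("DPL", 31189),
    131: ("DPH", 7977),
    144: ("P1", 161),
    208: ("PSW", 413527),
    224: ("ACC", 360949),
    240: ("B", 2598),
}


def register_candidates(accesses, top_n=2):
    """Return the top_n most-accessed locations as (address, purpose, count)."""
    ranked = sorted(accesses.items(), key=lambda kv: kv[1][1], reverse=True)
    return [(addr, name, count) for addr, (name, count) in ranked[:top_n]]


candidates = register_candidates(ACCESSES)
total = sum(count for _, count in ACCESSES.values())
share = sum(count for _, _, count in candidates) / total

for addr, name, count in candidates:
    print(f"{addr:5d} {name:4s} {count}")
print(f"share of listed accesses: {share:.1%}")
```

For the locations listed, PSW (208) and ACC (224) account for most of the accesses, which makes them the natural first choices to move into registers.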
24
Replacing certain RAM locations by registers
  • PSW and accumulator are separated from RAM
    entity, placed in internal registers
  • Total gates - 26465
  • Improvements
  • RAM 46.1%
  • Overall 15.8%

Average Power 9.7684 mW
25
Optimized datapath
Addr    Ins     Freq   Pwr       Freq*Pwr
00000   LJMP    1      0         0
00003   MOV_9   108    5.46067   589.752
00005   MOV_9   108    5.46067   589.752
00007   MOV_9   108    5.46067   589.752
00009   MOV_9   108    5.46067   589.752
00011   RET     108    0         0
00012   MOV_9   27     5.46067   147.438
00014   MOV_9   27     5.46067   147.438
00016   MOV_9   27     5.46067   147.438
00018   MOV_9   27     5.46067   147.438
00020   MOV_4   27     4.83507   130.547
00022   LCALL   27     0         0
  • MOV from reg7 to ACC very common
  • Add shortcut signal to register file
  • Avoids having data go through ALU
  • Total Gates - 26315
  • Power reduced by 0.32 mW (2.7%)

Average power 11.2857 mW
26
Controller Partitioning
  • Motivation
  • In many applications, 90% of the time is spent in
    10% of the code (or some similar ratio)
  • So let's partition the controller into two, one
    handling the 10% of frequently executed code
  • This smaller controller should consume less power
  • Results
  • Average power reduced from 11.6 mW to 11.3 mW
    (2.6%)
  • Total gates - 28731
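One way to decide which instructions the small controller should handle is to take the most frequently executed instructions until a target share of execution is covered. The sketch below assumes a greedy selection and a 90% target. The counts are summed from the program memory view excerpt shown earlier and stand in for a full profile; the source does not state how the actual partition was chosen.

```python
def hot_instruction_set(exec_counts, coverage=0.90):
    """Greedily pick the most-executed instructions until coverage is reached.

    Returns (instructions for the small controller, fraction of execution covered).
    """
    total = sum(exec_counts.values())
    hot, covered = [], 0
    ranked = sorted(exec_counts.items(), key=lambda kv: kv[1], reverse=True)
    for ins, count in ranked:
        if covered >= coverage * total:
            break
        hot.append(ins)
        covered += count
    return hot, covered / total


# Dynamic execution counts per instruction type, summed from the excerpt:
# MOV_9 appears at 8 addresses (4 x 108 + 4 x 27 executions).
exec_counts = {"MOV_9": 540, "RET": 108, "MOV_4": 27, "LCALL": 27, "LJMP": 1}

hot, fraction = hot_instruction_set(exec_counts)
print(hot, f"{fraction:.1%}")
```

With these counts two instruction types already cover more than 90% of executed instructions, so the small controller would need to decode very little.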

27
Conclusions
  • Described an environment for tuning a
    microprocessor to its application for low power
  • Full instruction set compatibility
  • Multiple views help find power hogs
  • Fully automated
  • Focus is now on developing tuning optimizations
  • Controller partitioning, small-loop table,
    datapath shortcuts, register-file copies, etc.
  • Investigate possibility of automating tuning
    optimizations, develop more general tuning
    methodology
  • Environment for the 8051 is available on the web
  • http://www.cs.ucr.edu/dalton