  • Number of slides: 17

A First-step Towards an Architecture Tuning Methodology for Low Power
Greg Stitt, Frank Vahid*, Tony Givargis, Roman Lysecky
Dept. of Computer Science & Engineering, University of California, Riverside
*also with the Center for Embedded Computer Systems, UC Irvine
Department of IP Management, Conexant, Newport Beach
This work was supported by the National Science Foundation under grants CCR-9811164 and CCR-9876006, and by a Design Automation Conference graduate scholarship.

Introduction: advent of cores
  • In the past, board-level embedded systems were built using discrete ICs
  • Today, single-IC systems are increasingly being built using IPs (intellectual property), a.k.a. “cores”
      - Hard core: layout
      - Firm core: structure (HDL)
      - Soft core: synthesizable behavior (HDL)
  • “System-on-a-chip” (SOC)
[Figure: a board with separate processor, memory, and peripheral ICs, contrasted with a single IC built from IP cores (processor, memory, peripheral) drawn from a core library containing PeripheralA, PeripheralB, and ProcessorX.]

Introduction: embedded systems
  • An SOC implementing an embedded system has a unique feature
      - It implements a particular application
      - Thus, the processor may execute a single fixed program that never changes
      - Unlike desktop systems, which execute a variety of programs
      - Examples: digital camera, automobile cruise controller
  • We can exploit this fixed-program feature
      - For example, by using mask-programmed ROM
      - But much more can be done

Introduction: architecture tuning
  • A way to exploit the fixed-program feature of embedded systems
  • First, do architecture design for the particular application
  • Then, “tune” the core-based system architecture to the particular application program, before IC fabrication
  • Goals: better performance, power, size
[Figure: design flow. A fixed program and a core library (PeripheralA, PeripheralB, ProcessorX) feed architecture design; the resulting peripheral and processor HDL, together with the program, go through architecture tuning to produce tuned cores; fabrication then yields the IC.]

Introduction: architecture tuning
  • Examples of tuning optimizations
      - Memory hierarchy: no cache, L1+L2 cache
      - Cache organization: size, associativity, line size
      - Bus structure, data/address encoding
      - Microprocessor optimizations: internal small-loop table, controller partitioning, datapath shortcuts, register file copies

Introduction: Tuning is a special case of Y-Chart iteration
  • Philips/TriMedia approach of simultaneously developing an architecture and its applications
[Figure: Y-chart with Architecture and Applications feeding Mapping, then Analysis, then Numbers; part of the chart is marked “Our focus”.]

Problem description
  • Focus of this work: tuning a microcontroller to its program
      - Goal is reduced power without performance loss
  • Restrict tuning to maintain exact instruction-set compatibility
      - No instructions may be added or deleted
      - Thus, no modification to the software development environment
      - Also, no problems with porting software to/from other versions of the microcontroller
      - Instruction-set incompatibility can be a show stopper

Previous work
  • Application-specific instruction-set processors [Fisher 99]
      - Customize a microprocessor to its application(s), e.g., Tensilica
      - Customized instruction set, requiring customized tools
  • Tuning compiler to architecture [Tiwari et al 94]
      - Architectural description languages to inform the compiler of architecture features [Halambi et al 99]
  • Tuning cache and cache/bus organization to the application [Givargis et al 99]

Tuning environment
  • Currently for the 8051 microcontroller
      - Starts from a synthesizable VHDL model of the 8051 (soft core)
      - Uses Synopsys synthesis, simulation, and power analysis
      - Uses an 8051 instruction-set simulator
      - Uses numerous scripts
  • Goal of the environment
      - Understand how power is being consumed for a particular application, so that modifications to the architecture (or application) can be made to minimize that power
  • Three main tools
      - Architectural view
      - Instruction-set view
      - Program/data memory view

Tuning environment: architectural view tool
[Figure: tool flow. The microprocessor soft core goes through the RT-synthesizer to yield the microprocessor structure; the program binary goes through the ROM generator to yield a ROM entity; both feed the simulator and power analyzer, which emits “flat” power data; the structural hierarchical power data translator and xdu display then present the power per module.]

Module     Power (mW)
Total      7.66
CTRL       2.69
ALU        1.62
RAM        1.42
ROM        1.04
DECODER    0.07
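To make the per-module figures easier to compare, here is a minimal Python sketch that expresses each module's power as a share of the reported total. The numbers are the ones on the slide; the function name `power_shares` and the data layout are assumptions for illustration and are not part of the authors' tools.

```python
# Per-module power (mW) as shown on the architectural view slide.
module_power_mw = {
    "ROM": 1.04,
    "ALU": 1.62,
    "RAM": 1.42,
    "CTRL": 2.69,
    "DECODER": 0.07,
}
# Total reported on the slide; the listed modules sum to less than this.
total_mw = 7.66


def power_shares(modules, total):
    """Return (module, percent of total) pairs, largest consumer first."""
    ranked = sorted(modules.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, 100.0 * mw / total) for name, mw in ranked]


shares = power_shares(module_power_mw, total_mw)
for name, pct in shares:
    print(f"{name:8s} {pct:5.1f}%")
```

Sorting by share puts CTRL first, which is consistent with the later slides targeting the controller and the RAM.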

Tuning environment: instruction-set view tool
[Figure: tool flow. For each instruction, a binary that exercises that instruction goes through the ROM generator to yield a ROM entity; together with the microprocessor structure it feeds the simulator and power analyzer, which emits flat power data for that instruction; the power data collector, structural power data translator, and xdu display present the results.]

Instruction    Power (mW)
ADDC_1         7.340834
ADD_1          7.350741
ANL_1          6.631394
CLR_1          3.76228
CPL_1          5.481627
DA             5.28897
DEC_1          5.368807
DIV            7.716592
INC_1          4.662862
MOVC_1         6.078014
MOVC_2         5.021021
MOV_1          5.577664
MOV_2          6.164267
MUL            5.522886
NOP            4.900275
ORL_1          6.954121
POP            8.103867
PUSH           8.7116

Tuning environment: program/data memory view tool
[Figure: tool flow. Per-instruction power data and the program binary feed the instruction-set simulator, which produces program/data memory access frequencies and power; the program hierarchy power translator and xdu display present the results.]

Program memory view (columns as extracted from the slide; only part of each column survives):
  Addr: 000003 00005 00007 00009 00011 00012 00014 00016 00018 00020 00022
  Ins: LJMP MOV_9 RET MOV_9 MOV_4 LCALL
  Freq: 1 108 108 108 27 27 27
  Pwr: 0 5.46067 5.46067 4.83507 0
  Freq*Pwr: 0 589.752 0 147.438 130.547 0

Data memory view:
Addr     Purpose    Accesses
00128    P0         1311
00129    SP         70317
00130    DPL        31189
00131    DPH        7977
00144    P1         161
00208    PSW        413527
00224    ACC        360949
00240    B          2598
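The Freq*Pwr column of the program memory view is the execution frequency of an instruction multiplied by its per-instruction power. The short Python sketch below recomputes the three nonzero products from the Freq and Pwr values on the slide. The pairing of values into rows is inferred from the products, and the helper name `weighted_power` is an assumption for illustration.

```python
# (frequency, per-instruction power in mW) pairs read off the slide.
# Which instruction each pair belongs to is not recoverable from the
# slide text, so the rows are left unnamed.
rows = [(108, 5.46067), (27, 5.46067), (27, 4.83507)]


def weighted_power(freq, power_mw):
    """Freq*Pwr: execution count times per-instruction power."""
    return freq * power_mw


weights = [weighted_power(freq, power) for freq, power in rows]
for (freq, power), weight in zip(rows, weights):
    print(f"{freq:4d} x {power:.5f} = {weight:.3f}")
```

Rounded to three decimals, the products agree with the nonzero Freq*Pwr entries on the slide (589.752, 147.438, 130.547).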

Tuning environment
  • Inputs: program binary, microprocessor core
  • Program/data memory view tool (seconds): program power data
  • Architectural view tool (1 hour): architecture power data
  • Instruction-set power view tool (1 day): instruction-set power data

Design flow using the tuning environment
[Figure: flowchart. After changing the application, run the program/data memory view tool. After changing the architecture, run the architecture view tool and the instruction-set view tool. If not satisfied with the results, make another change and repeat; if satisfied, done.]
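The flowchart describes a manual loop driven by the designer's judgment. Purely as an illustration, the Python sketch below casts it as a greedy loop that keeps a change only when measured power drops. Everything here is hypothetical: the `tune` function, the representation of a design as a set of change names, and the toy power model. Only the 7.67 mW baseline and the 0.40 mW saving come from the sample optimization slide; the second delta is a made-up placeholder that exists only to show a rejected change.

```python
def tune(design, changes, measure_power_mw, satisfied):
    """Try each change in turn; keep it only if measured power drops."""
    power = measure_power_mw(design)
    for change in changes:
        if satisfied(power):
            break
        trial = design | {change}
        # Stands in for re-running the view tools on the modified design.
        trial_power = measure_power_mw(trial)
        if trial_power < power:
            design, power = trial, trial_power
    return design, power


# Toy power model (assumption): a baseline plus a fixed delta per change.
BASELINE_MW = 7.67
TOY_DELTA_MW = {
    "acc_in_register": -0.40,        # 7.67 -> 7.27 mW, from the slides
    "placeholder_bad_change": 0.10,  # made up, not a measured value
}


def toy_measure(design):
    return BASELINE_MW + sum(TOY_DELTA_MW[change] for change in design)


final_design, final_power = tune(
    frozenset(),
    ["placeholder_bad_change", "acc_in_register"],
    toy_measure,
    satisfied=lambda power: power <= 7.0,
)
print(sorted(final_design), round(final_power, 2))
```

In the real flow each measurement can take from seconds to a day depending on which view tool must be rerun, so the order in which changes are tried matters far more than this sketch suggests.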

Sample tuning optimization
  • Observation
      - RAM consumes much power
      - Address 224 is accessed frequently
  • Possible tuning optimization
      - Replace this RAM location by a register inside the CTRL module
  • Steps
      - Modify the VHDL model
      - Run all three view tools
  • Results
      - Power reduction: 7.67 to 7.27 mW

Module     Power (mW)
Total      7.66
CTRL       2.69
ALU        1.62
RAM        1.42
ROM        1.04
DECODER    0.07

Addr     Purpose    Accesses
00128    P0         1311
00129    SP         70317
00130    DPL        31189
00131    DPH        7977
00144    P1         161
00208    PSW        413527
00224    ACC        360949
00240    B          2598
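As a rough illustration of how the data memory view points at candidates, the Python sketch below ranks the listed addresses by access count and computes the relative saving of the reported result. The access counts and the 7.67 and 7.27 mW figures are from the slide; the names `ranked` and `percent_reduction` are assumptions for illustration.

```python
# (address, purpose, accesses) rows from the data memory view.
data_memory = [
    (128, "P0", 1311),
    (129, "SP", 70317),
    (130, "DPL", 31189),
    (131, "DPH", 7977),
    (144, "P1", 161),
    (208, "PSW", 413527),
    (224, "ACC", 360949),
    (240, "B", 2598),
]

# Most frequently accessed locations first.
ranked = sorted(data_memory, key=lambda row: row[2], reverse=True)


def percent_reduction(before_mw, after_mw):
    """Relative power saving, in percent of the original power."""
    return 100.0 * (before_mw - after_mw) / before_mw


reduction = percent_reduction(7.67, 7.27)
for addr, purpose, accesses in ranked[:3]:
    print(f"{addr:3d} {purpose:4s} {accesses}")
print(f"reduction: {reduction:.1f}%")
```

Address 224 (ACC), the location chosen on the slide, ranks second by access count in this table, behind address 208 (PSW).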

Some recent data
  • Applied the tuning environment to a particular application
  • Converted two frequently accessed RAM locations to registers
      - 15% total power savings
  • Introduced datapath shortcuts for the two most common register-to-register moves of the application, thus bypassing the ALU
      - 10% total power savings
  • Partitioned the controller into two, with one small controller implementing the frequently executed instructions
      - 10-15% power savings, but we expect much more with a better partitioning of the design

Conclusions
  • Described an environment for tuning a microprocessor to its application for low power
      - Full instruction-set compatibility
      - Multiple views help find power hogs
      - Fully automated
  • Focus is now on developing tuning optimizations
      - Controller partitioning, small-loop table, datapath shortcuts, register-file copies, etc.
      - Investigate the possibility of automating tuning optimizations and develop a more general tuning methodology
  • Environment for the 8051 is available on the web:
      - http://www.cs.ucr.edu/~dalton