Layered Approach To Intrinsic Evolvable Hardware Using Direct Bitstream Manipulation Of Virtex II Pro Devices
Rashad S. Oreifej, Rawad N. Al-Haddad, Heng Tan and Ronald F. DeMara
University of Central Florida

Evolvable Hardware
Automated Construction: develop Electronic Circuits by Intelligent Search
Applications: Design, Optimization, or Failure Recovery phases
[Figure: Intelligent Search (Bayesian, Simulated Annealing, Genetic Algorithms, Nearest Neighbor) combined with Hardware Design (Amplifiers, Filters, FPGAs, Antennas) yields Evolvable Hardware Applications]
• GAs frequently use binary strings to represent candidate solutions: the genotype
- Translation to the FPGA configuration bitstream maps genotype to phenotype
• FPGAs are used for evolving digital logic
[Figure: Individual (Chromosome) made up of GENEs]

GAs and Evolution
Genetic Algorithms:
- Implement guided trial-and-error search using principles of Darwinian evolution
- Iterative selection enforces "survival of the fittest"
- Genetic operators (mutation, crossover, …) can be used to refurbish designs
Extrinsic Evolution (software model, simulation in the loop)
• Functional models abstract physical aspects of the device
• Representation has to undergo placement and routing before implementation
Intrinsic Evolution (hardware in the loop)
• Fitness is measured using physical device output
• Observes constraints imposed by internal structure
[Figure: Genetic Algorithm loop with a "Done?" check, followed by "Build it" in the extrinsic case]

Related Work
• Conventional vs. Evolutionary Design [Miller, Alg. Evol Strat. 98]
- A GA is presented that can evolve 100% functional adder and multiplier circuits
- Explored the effect of the device's physical constraints (Xilinx 6216 FPGA)
- Emphasized EH feasibility over FPGA implementation concerns
• Fitness-based vs. Population-based Evolution [Keymeulen, IEEE Trans. Rel 02]
- Design of fault-insensitive electronic components using evolutionary techniques
- Online and offline repair techniques via an intrinsic design tool (EHWPack)
- A fine-grained CMOS Field Programmable Transistor Array (FPTA) architecture is used to evolve an analog multiplier and a digital XNOR
• Intrinsic EHW on Virtex Devices [Hollingworth, ICES 00]
- Evolution by partial reconfiguration of the bitstream for changes from a baseline circuit
- Runtime configuration using Xilinx's JBits interface (Java in the loop)
• Recent General-purpose Frameworks Support Bitstream Reuse
- Blodget et al. [Blodget FPL 03]: two-layer framework for Virtex II devices using the Xilinx Partial Reconfiguration Toolkit (XPART), utilizing a soft processor core within the FPGA
- Williams et al. [Williams ERSA 04]: Egret focuses on a full SoC solution using ICAP and an embedded Linux system on a Xilinx Virtex II chip, with bash shell scripts performing operations such as obtaining partial bitstreams from remote servers and initiating reconfiguration
- Kalte et al. [Kalte PDPS 05]: the REPLICA (Relocation per online Configuration Alteration) filter uses the SelectMAP interface to perform bitstream manipulation, carrying out the relocation during the regular download process

UCF Intrinsic Evolution Platform
The developed platform utilizes the following hardware components on the FPGA chip:
1. JTAG (IEEE 1149.1) Port
• Half-duplex serial communication interface
• Connects to the General-purpose Native jtAg Tester (GNAT) on the FPGA side, and to the parallel port (IEEE 1284) on the host PC using a Xilinx Parallel Cable
• Conveys the input/output data exchanged between the host PC and the FPGA
2. GNAT
• Implemented in the bitstream to reside in the reconfigurable area
• Connects to the BSCAN_VIRTEX2 block via the Test Data Input (TDI), Test Data Output (TDO), and Control signals, and to the targeted circuit via a straightforward read/write bus interface
3. Evolved Circuit
• Circuit to be evolved on the FPGA chip
• Circuit peripherals are connected to the read/write bus of the GNAT to receive and deliver input/output data

UCF Platform Software Components
The developed platform consists of the following software components:
1. GA Engine
• C++ based console application implemented using an object-oriented architecture
• Implements a conventional population-based GA with runtime-customizable parameters
2. Chromosome Manipulator
• C based GA operators library (yet executed using Visual Studio .NET)
• Provides a logical abstraction and hardware transparency of genetic operators to the GA Engine module
3. MRRA
• Partitions operations into Logic, Translation, and Reconfiguration layers with a standardized set of APIs
• FPGA configurations are manipulated at runtime using on-chip resources on Xilinx Virtex II Pro via PC (JTAG) or PowerPC (SelectMAP)
4. Bitstream File
• Pre-compiled baseline bitstream generated using the Xilinx CAD tools
• The platform manipulates this bitstream to carry out the physical mapping of the crossover or mutation

Intrinsic Evolution Workflow
START:
1. Initialization: obtain configuration from .bit (module-based flow)
Iterate (frame-based flow):
2. GA Operations: derive new individuals
3. Fitness Evaluation: performed in two phases
- FPGA Reconfiguration
- Pattern Evaluation

Multilayer Runtime Reconfiguration Architecture (MRRA)
Framework for Dynamic Reconfiguration
• Three layers (Logic, Translation, and Reconfiguration) with well-defined interfaces promote modularity and reuse within a set of high-level APIs that carry out the partial reconfiguration process with reduced manual intervention.
• Task-level Modularity: provide support at levels down to and including task-level granularity. A task is defined as an arbitrary function synthesized to a module that can be dynamically downloaded into the reconfigurable device: Module-based or Frame-based.
• Runtime Scenario Support: provide the ability to generate and reconfigure task bitstreams at runtime as well as at design time. Runtime scenarios envisioned at design time may not know in advance which tasks will arrive, when they will arrive, or, in selected cases, what some of their specific properties will be.
• Encapsulation: the control logic of each layer is self-contained, with a fixed interface to the other layers. If new control algorithms are added or the device platform is changed, the system can be ported more readily.

MRRA Logic Control Flow
The Module-based Flow is adopted from the standard Xilinx flow and integrated with selected area management ability and a direct bit management process, which we term the Frame-based Flow. The Module-based Flow is utilized at design time. Later, the translation engine supports autonomous reconfiguration without a GUI.
• One-Dimensional Area Management is performed on the full physical FPGA device by partitioning it into one-dimensional column-based rectangles for fixed and reconfigurable modules, arranged based on size and specified area constraints. Tools such as PlanAhead are accommodated. Bus Macros maintain correct connections between modules by spanning the boundaries of these rectangular regions.
• Next, the modules are implemented and verified individually to create the Module Implementation, and optimized by additional Two-Dimensional Area Allocation placements inside each module to minimize the partial reconfiguration bitstream size.
• After the initial bitstream download, precompiled partial bitstreams can be monitored by algorithms in the Logic Layer and updated directly to the device for dynamic reconfiguration. New modification requests can be generated by the user logic in the form of the hardware-independent representation depicted by the Runtime Flow. Although the boundary of a module is fixed, the physical logic resources inside it can be modified at runtime.

Direct Bitstream Manipulation Concept and Case Study
[Figure: (a) 1-bit full adder with inputs X, Y, Cin, outputs S and Cout, and LUT contents 0x96 and 0xE8; (b) 1-bit full subtracter with inputs X, Y, Bin, outputs D and Bout, and LUT contents 0x96 and 0x8E; a logic switch S/D selects adder or subtracter. Truth tables accompany both circuits.]
• Change a one-bit full adder to a one-bit full subtracter
• Both have three one-bit inputs and two one-bit outputs, and use 2 LUTs with identical logic connections between the LUTs and the I/O signals
• The only difference is the truth table stored inside one LUT, which changes from 0xE8 to 0x8E
• Practical case study: dynamically reconfigurable SHA-1/MD5 Message Digest hashing algorithms
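The LUT constants quoted on this slide can be checked with a few lines of Python. This is only a sketch: the helper names are hypothetical, and it assumes that bit i of a truth table holds the output for the input index formed with X as the most significant bit and Cin/Bin as the least significant bit. Under that assumed ordering the sum and difference tables are both 0x96, and only the carry/borrow table changes from 0xE8 to 0x8E.

```python
def truth_table(fn):
    """Pack a 3-input boolean function into an 8-bit integer.

    Assumption: bit i of the result is the output for input index i,
    where i = (X << 2) | (Y << 1) | Cin.
    """
    table = 0
    for index in range(8):
        x, y, c = (index >> 2) & 1, (index >> 1) & 1, index & 1
        table |= (fn(x, y, c) & 1) << index
    return table


def adder_sum(x, y, cin):
    return x ^ y ^ cin


def adder_cout(x, y, cin):
    # Carry out is the majority of the three inputs.
    return (x & y) | (x & cin) | (y & cin)


def sub_diff(x, y, bin_):
    return x ^ y ^ bin_


def sub_bout(x, y, bin_):
    # Borrow out of X - Y - Bin.
    not_x = 1 - x
    return (not_x & y) | (not_x & bin_) | (y & bin_)


if __name__ == "__main__":
    for name, fn in [("S", adder_sum), ("Cout", adder_cout),
                     ("D", sub_diff), ("Bout", sub_bout)]:
        print(f"{name}: 0x{truth_table(fn):02X}")
```

Because the sum and difference LUTs are identical, rewriting a single LUT's contents is enough to switch the circuit between the two functions.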

Direct Bitstream Management
Equations deduced to locate logic content in the V2Pro bitstream:
• Each CLB has 4 slices arranged in 2 columns × 2 rows and named XiYj, where Xi is the slice column number, 0 ≤ i ≤ 2N-1, counted from the left (N = number of CLB columns), and Yj is the slice row number, 0 ≤ j ≤ 2K-1, counted from bottom to top (K = number of CLB rows), e.g. for the XC2VP7, N=40 and K=34
• Each configuration frame has a unique 32-bit address composed of a Block Address (BA), a Major Address (MJA), a Minor Address (MNA), and a byte number offset
• Let X denote the column, and let the overhead include the GCLK + leftmost IOB + IOI columns (e.g. 3): [equation not preserved]
• Full configuration file: organized consecutively by frame without labeling: [equation not preserved]
• In the 5 bytes of a slice, the first 16 bits hold the G-LUT truth table (left to right as MSB to LSB) and the last 16 bits hold the F-LUT truth table (in reverse order, from LSB to MSB). Each LUT has at most 4 inputs and up to 16 truth table entries; when fewer than 4 inputs are used, the remaining unused entries are filled with duplicates of the effective values of the used entries.
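The slice layout and the duplication rule from the last bullet can be sketched in Python as follows. The function names are hypothetical. The sketch assumes that the 5 bytes are read in order, that the two leading and two trailing bytes are each interpreted big-endian, and that unused LUT inputs are the most significant address lines, so duplication becomes plain repetition of the used entries. None of these byte-level details are stated on the slide.

```python
def reverse16(value):
    """Reverse the bit order of a 16-bit value."""
    result = 0
    for _ in range(16):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def decode_slice(raw):
    """Split a 5-byte slice record into (G-LUT, F-LUT) truth tables.

    Assumption: the first 16 bits are the G-LUT stored MSB to LSB, and the
    last 16 bits are the F-LUT stored LSB to MSB, so it is bit-reversed here.
    The middle byte is not interpreted.
    """
    if len(raw) != 5:
        raise ValueError("a slice record is expected to be 5 bytes")
    g_lut = int.from_bytes(raw[0:2], "big")
    f_lut = reverse16(int.from_bytes(raw[3:5], "big"))
    return g_lut, f_lut


def replicate(table, n_inputs):
    """Fill all 16 LUT entries by repeating the 2**n_inputs used entries."""
    if not 1 <= n_inputs <= 4:
        raise ValueError("a LUT has between 1 and 4 inputs")
    used = 2 ** n_inputs
    mask = (1 << used) - 1
    full = 0
    for offset in range(0, 16, used):
        full |= (table & mask) << offset
    return full


if __name__ == "__main__":
    # A 3-input carry function 0xE8 occupies the whole LUT as 0xE8E8.
    print(hex(replicate(0xE8, 3)))
    g, f = decode_slice(bytes([0x96, 0x96, 0x00, 0x17, 0x17]))
    print(hex(g), hex(f))
```

With this ordering, a 3-input function stored in a 4-input LUT produces the same output whichever value the unused input takes.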

Experimental Setup
Target Circuit: 4-bit x 4-bit unsigned adder
Experiments:
1) Unseeded Design: random .bit population
2) Seeded Design: single functional .bit individual
3) Repair: single randomly injected stuck-at fault (0 or 1)
GA Parameters:
Parameter                    Range Evaluated    Value Selected
Number of LUTs for design    8                  8
Number of LUTs for repair    8-13               13
Population Size              5-20               10
Mutation Rate                5%-90%             50%
Crossover Rate               30%-90%            60%
Tournament Size              1-8                6
Elitism Size                 1-2                1
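To make the selected parameters concrete, the following Python sketch runs a population-based GA with those values over a chromosome of 8 LUT genes of 16 bits each. It is not the platform's GA Engine. The one-point crossover at LUT boundaries, the single-bit-flip mutation, the per-individual interpretation of the rates, and all function names are assumptions, and `fitness` is a hypothetical callable standing in for the intrinsic evaluation on the device.

```python
import random

N_LUTS, LUT_BITS = 8, 16            # design experiments use 8 LUTs
POP_SIZE = 10
MUTATION_RATE, CROSSOVER_RATE = 0.50, 0.60
TOURNAMENT_SIZE, ELITISM = 6, 1


def random_individual(rng):
    """A chromosome is a list of 16-bit LUT truth tables (genes)."""
    return [rng.getrandbits(LUT_BITS) for _ in range(N_LUTS)]


def tournament(population, scores, rng):
    """Return the fittest of TOURNAMENT_SIZE randomly drawn individuals."""
    contenders = rng.sample(range(len(population)), TOURNAMENT_SIZE)
    return population[max(contenders, key=lambda i: scores[i])]


def crossover(a, b, rng):
    """One-point crossover at a LUT boundary (assumed operator)."""
    point = rng.randrange(1, N_LUTS)
    return a[:point] + b[point:]


def mutate(individual, rng):
    """Flip one random truth-table bit in one random LUT (assumed operator)."""
    child = list(individual)
    child[rng.randrange(N_LUTS)] ^= 1 << rng.randrange(LUT_BITS)
    return child


def evolve(fitness, generations, seed=0):
    """Run the GA; return the final population and the best score per generation."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(POP_SIZE)]
    history = []
    for _ in range(generations):
        scores = [fitness(ind) for ind in population]
        ranked = sorted(range(POP_SIZE), key=lambda i: scores[i], reverse=True)
        history.append(scores[ranked[0]])
        # Elitism: the best individual survives unchanged.
        next_gen = [list(population[i]) for i in ranked[:ELITISM]]
        while len(next_gen) < POP_SIZE:
            child = list(tournament(population, scores, rng))
            if rng.random() < CROSSOVER_RATE:
                child = crossover(child, tournament(population, scores, rng), rng)
            if rng.random() < MUTATION_RATE:
                child = mutate(child, rng)
            next_gen.append(child)
        population = next_gen
    return population, history


if __name__ == "__main__":
    # Toy fitness (number of set bits) used only to exercise the loop.
    toy = lambda ind: sum(bin(gene).count("1") for gene in ind)
    _, best = evolve(toy, generations=30)
    print(best[0], best[-1])
```

With an elitism size of 1 and a deterministic fitness function, the best score recorded per generation never decreases.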

Stuck-at Zero and One Fault Modeling
• The Virtex II Pro chip has 16-bit LUTs with four input lines and one output
• If the Least Significant Bit (LSB) input pin is stuck-at zero, only the memory locations matching the pattern (XXX0)₂ remain accessible
• This behavior can be achieved by copying the content of the memory locations matching (XXX0)₂ into (XXX1)₂, overwriting their old values
• The same concept extends to the other inputs: the location of the stuck input line (0, 1, 2, 3) determines the stride (1, 2, 4, 8) between the memory locations to copy, and the value of the stuck-at condition (zero or one) determines the direction of the copy operation (left or right)
[Figure: LUT address]
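The copy operation described above can be sketched in Python. The function name is hypothetical, and the sketch assumes that bit `a` of the 16-bit integer is the memory location selected by address `a`. For every address, the stuck bit is forced to its stuck value and the content of the resulting reachable location is copied in, which gives the stride of 1, 2, 4, or 8 between source and destination.

```python
def inject_stuck_at(lut, line, value):
    """Return LUT contents that emulate input `line` stuck at `value`.

    lut   : 16-bit truth table, bit `a` is the entry at address `a` (assumed)
    line  : index of the stuck input, 0 (LSB) to 3 (MSB)
    value : 0 for stuck-at zero, 1 for stuck-at one
    """
    if line not in range(4) or value not in (0, 1):
        raise ValueError("line must be 0..3 and value must be 0 or 1")
    stride = 1 << line
    faulty = 0
    for address in range(16):
        # Only locations whose stuck address bit equals `value` are reachable.
        source = (address | stride) if value else (address & ~stride)
        faulty |= ((lut >> source) & 1) << address
    return faulty


if __name__ == "__main__":
    # 0xAAAA outputs input 0 directly, so a stuck input 0 fixes the output.
    print(hex(inject_stuck_at(0xAAAA, 0, 0)), hex(inject_stuck_at(0xAAAA, 0, 1)))
```

A fault on an input that the function ignores leaves the table unchanged, which is a quick sanity check for the model.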

Performance Metrics
Metrics recorded for each run:
• The numerical measure of fitness for the best individual of the final generation, e.g. a maximum of 2^8 input patterns (two 4-bit inputs) × 5 output bits = 1280
• The arithmetic mean of the fitness of all individuals in the final generation of the run
• The total number of generations in the run
• The time elapsed performing the GA crossover and mutation during the entire run
• The time elapsed applying the input patterns and reading back the corresponding outputs for all the fitness evaluations during the entire run
• The average time taken by a single genetic crossover for a given GA run
• The average time taken by a single genetic mutation for a given GA run
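The maximum fitness of 1280 follows from scoring one point per correct output bit over all input patterns. A minimal Python sketch of such a bitwise fitness measure is given below. It assumes this per-bit scoring, and `circuit` is a hypothetical callable that stands in for applying a pattern to the device and reading back its 5-bit output.

```python
def max_fitness(input_bits=8, output_bits=5):
    """Number of input patterns times the number of output bits."""
    return (2 ** input_bits) * output_bits


def adder_fitness(circuit):
    """Count the output bits that match a 4-bit + 4-bit unsigned adder."""
    score = 0
    for a in range(16):
        for b in range(16):
            expected = a + b                 # fits in 5 bits (0..30)
            observed = circuit(a, b) & 0x1F  # keep the 5 output bits
            # Each agreeing output bit contributes one point.
            score += 5 - bin(expected ^ observed).count("1")
    return score


if __name__ == "__main__":
    print(max_fitness())
    print(adder_fitness(lambda a, b: a + b))
```

A fully functional individual reaches the maximum, and any individual with at least one wrong output bit scores lower.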

Experimental Results Summary
[Table annotations: fastest convergence; repair must overcome the failed-resource limitation; microsecond order]

Circuit Evolution: Fitness vs. Time
[Figure panels: Unseeded Design; Seeded Design; Repair: Stuck-at Fault]

Results Summary
• An intrinsic evolution platform is developed for genetic operators and fitness assessment using API layers which directly manipulate the configuration bitstream on Xilinx Virtex II Pro devices
• Three experiments were conducted: unseeded design, seeded design, and repair
• Full design/repair is achievable using this platform with an average time of 0.4 microseconds to perform the genetic mutation, 0.7 microseconds to perform the genetic crossover, and 5.6 milliseconds for the intrinsic evaluation of one input pattern
• Performance advantage of three orders of magnitude over JBits (millisecond order) and more than seven orders of magnitude over the Xilinx design-tool-driven flow (multiple seconds) for realizing intrinsic genetic operators on a Virtex II Pro device
• Current work is on utilizing partial reconfiguration to reduce JTAG transfer time and on porting to the Virtex-4 platform

References
[1] S. Vigander, "Evolutionary Fault Repair in Space Applications," Masters Thesis, Dept. of Computer & Information Science, Norwegian University of Science and Technology (NTNU), Trondheim, 2001.
[2] J. F. Miller, P. Thomson, and T. Fogarty, "Designing Electronic Circuits Using Evolutionary Algorithms. Arithmetic Circuits: A Case Study," in Algorithms and Evolution Strategy in Engineering and Computer Science, D. Quagliarella, J. Periaux, C. Poloni, and G. Winter, Eds. Chichester, England, 1998, pp. 105-131.
[3] D. Keymeulen, R. S. Zebulum, Y. Jin, and A. Stoica, "Fault-Tolerant Evolvable Hardware Using Field-Programmable Transistor Arrays," IEEE Transactions on Reliability, vol. 49, issue 3, September 2000.
[4] R. S. Oreifej, C. A. Sharma, and R. F. DeMara, "Expediting GA-Based Evolution Using Group Testing Techniques for Reconfigurable Hardware," in Proc. International Conference on Reconfigurable Computing and FPGAs (Reconfig'06), San Luis Potosi, Mexico, September 20-22, 2006, pp. 106-113.
[5] R. F. DeMara and K. Zhang, "Autonomous FPGA Fault Handling through Competitive Runtime Reconfiguration," in Proc. of the NASA/DoD Conference on Evolvable Hardware (EH'05), Washington D.C., U.S.A., June 29-July 1, 2005.
[6] G. Hollingworth, S. Smith, and A. Tyrrell, "The intrinsic evolution of virtex devices through internet reconfigurable logic," in Proc. of the Third International Conference on Evolvable System, April 2000.
[7] H. Tan and R. F. DeMara, "A Device-Controlled Dynamic Configuration Framework Supporting Heterogeneous Resource Management," in Proc. of the International Conference on Engineering of Reconfigurable Systems and Algorithms (ERSA'05), Las Vegas, Nevada, U.S.A., June 27-30, 2005.
[8] D. Wallace, "Using the JTAG Interface as a General-Purpose Communication Port," www.xilinx.com/publications/xcellonline/xcell_53/xc_pdf/xc_jtag53.pdf, 2005.
[9] Xilinx, "Parallel Cable IV Connects Faster and Better," Xcell Journal, Spring 2002.
[10] Xilinx, "Using a Microprocessor to Configure Xilinx FPGAs via Slave Serial or SelectMAP Mode," v1.4, November 2003.
[11] B. Blodget, P. James-Roxby, E. Keller, S. McMillan, and P. Sundararajan, "A Self-Reconfiguring Platform," in Proceedings of Field-Programmable Logic and Applications 2003, Lisbon, Portugal, September 1-3, 2003.
[12] J. Williams and N. Bergmann, "Embedded Linux as a Platform for Dynamically Self-Reconfiguring Systems-On-Chip," in Proceedings of Engineering of Reconfigurable Systems and Algorithms (ERSA 2004), Las Vegas, Nevada, USA, June 21-24, 2004.
[13] H. Kalte, G. Lee, M. Porrmann, and U. Ruckert, "REPLICA: A Bitstream Manipulation Filter for Module Relocation in Partial Reconfigurable Systems," in Proceedings of 19th IEEE International Proceedings of Parallel and Distributed Processing Symposium, Denver, Colorado, USA, April 04-08, 2005.

MRRA Translation Process

Current Work: Direct Bitstream Management