5c59a571c4b9732264972b67b8ada06f.ppt
- Number of slides: 23
Upset Susceptibility and Design Mitigation of PowerPC 405 Processors Embedded in Virtex II-Pro FPGAs
Swift 1 P 173/MAPLD 2005
Authors
• Gary Swift, Jet Propulsion Laboratory/California Institute of Technology
• Gregory Allen, Jet Propulsion Laboratory/California Institute of Technology
• Jeffrey George, The Aerospace Corporation
Authors (continued)
• Sana Rezgui, Xilinx Corporation
• Carl Carmichael, Xilinx Corporation
• Fayez Chayab, MDRobotics
Abstract
We show recent results for the upset susceptibility of the registers and caches in the embedded PowerPC 405 in the Xilinx V2P40 FPGA. For critical flight designs where configuration upsets are mitigated effectively, these upsets can dominate the system error rate. We consider several techniques for implementing various levels of redundancy to reduce system errors, including single-, dual- and triple-chip options. We conclude that the dual-chip option may often be the best choice and warrants further study.
Background - Reconfigurable FPGA Upsets
The basic building blocks of the reconfigurable FPGA are soft (susceptible) to upset [Ref. 1].
Background - Upset Mitigation
Critical applications require design-level upset mitigation.
• Design Triplication
– The use of TMR (triple modular redundancy) in a design allows correct function through triplicated majority voters even when a configuration element is upset.
– The extra design effort is now largely automated by new software (TMRtool).
• Active Configuration Scrubbing
– Upsets in the configuration must not be allowed to accumulate or TMR will “break”.
– Scrubbing uses some resources, but can be implemented so that it is transparent to system operation.
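The two bullets above interact, and a small model may make that concrete. The following is a minimal sketch, not the TMRtool implementation: it treats each redundant copy as an 8-bit integer, and the names `majority`, `scrub`, and `golden` are hypothetical. It shows a single upset being outvoted, two accumulated upsets "breaking" the vote, and scrubbing between upsets keeping the vote correct.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority vote over three redundant copies."""
    return (a & b) | (a & c) | (b & c)


def scrub(copies: list, golden: int) -> list:
    """Rewrite every copy from a known-good ("golden") value (assumed available)."""
    return [golden for _ in copies]


golden = 0b1011_0010
copies = [golden, golden, golden]

# One upset: bit 0 flips in copy 0. The voter still returns the right value.
copies[0] ^= 0b0000_0001
voted_after_one = majority(*copies)

# Without scrubbing, a second upset of the same bit in another copy
# outvotes the one remaining good copy, so the voted result is wrong.
copies[1] ^= 0b0000_0001
voted_after_two = majority(*copies)

# With a scrub between the two upsets, the first is repaired before the
# second arrives, and the vote stays correct.
copies = scrub([golden ^ 0b0000_0001, golden, golden], golden)
copies[1] ^= 0b0000_0001
voted_with_scrub = majority(*copies)
```

In this toy model the scrubber simply overwrites all copies; in a real device scrubbing rewrites configuration memory, which is not modeled here.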
Embedded “Hard-Core” Processor(s) Upset
PowerPC 405 cores in Virtex II-Pro family FPGAs offer unprecedented computational power inside an FPGA, but include additional upsettable storage elements.
Processor Upsets – Data Cache
Processor caches are very important features for increased performance; however, upsets in the caches can lead to system errors.
Processor Upset Mitigation
The “obvious” solution of implementing TMR with three processor cores is not available as a single-chip option because the maximum number of processors per FPGA is currently two. Tradeoffs between upset robustness and system complexity, possibly spanning multiple FPGAs, must be considered.
One-Chip Solution
Running two processors in lockstep is conceptually simple, especially as they can reside in a single FPGA. A fast TMR-ed comparison block is required to contain errors and keep them from propagating into the rest of the system. A processor upset appears to the comparison block as a disagreement, which requires that both processors be stopped within the current clock cycle. Both must then be forced to roll back to a known good software “bookmark” or, alternatively, to reboot.
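The compare, stop, and roll-back sequence described above can be sketched in software. This is only an illustrative model under stated assumptions: `step` is a hypothetical stand-in for one clock cycle of a processor, bookmarks are taken every `bookmark_every` cycles, and a single transient bit flip is injected into one processor. None of these names come from the original design.

```python
from typing import Optional, Tuple


def step(state: int) -> int:
    """Hypothetical deterministic update standing in for one processor cycle."""
    return (state * 5 + 1) & 0xFFFF


def run_lockstep(cycles: int, bookmark_every: int,
                 upset_at: Optional[int]) -> Tuple[int, int]:
    """Run two copies in lockstep; return (final state, number of rollbacks)."""
    a = b = 0
    bookmark_state, bookmark_cycle = 0, 0
    rollbacks = 0
    cycle = 0  # number of cycles completed and agreed upon
    while cycle < cycles:
        a, b = step(a), step(b)
        if upset_at is not None and cycle == upset_at:
            a ^= 0x0004       # inject one bit flip into processor A only
            upset_at = None   # a transient upset occurs once
        if a != b:
            # The comparator cannot tell which processor is wrong, so both
            # are stopped and restored to the last known good bookmark.
            a = b = bookmark_state
            cycle = bookmark_cycle
            rollbacks += 1
            continue
        cycle += 1
        if cycle % bookmark_every == 0:
            bookmark_state, bookmark_cycle = a, cycle
    return a, rollbacks


clean_state, clean_rollbacks = run_lockstep(20, 4, None)
upset_state, upset_rollbacks = run_lockstep(20, 4, 10)
```

With the upset injected, the run ends in the same state as the fault-free run but pays for it with one rollback, which is the outage cost this option accepts.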
Flow Chart: One-Chip Solution
Advantages
• Contained in one chip
– No chip-to-chip interconnects (minimal latency and propagation delay)
– Lower power consumption
– Less board area
– No chip-to-chip synchronization
• Technology is more developed and tested [See Reference 2]
Disadvantages
• More system outages
– Reboot or rollback on every error
– Not suitable for some critical real-time applications
• Twice as many errors as on a single processor, but at least they are detected
Note: Requires an extra device – either a watchdog timer or an external configuration scrubber
Two-Chip Solution
With four processors in lockstep (necessitating two chips), a solution as robust as full TMR is possible. In this scheme, a pair of processors that fall into disagreement due to an upset is stopped while the system runs without interruption on the pair that is in agreement. Correct internal state information is available in the working pair, so the stopped pair can be resynchronized at a convenient opportunity, preferably soon. Thus, it is possible to resynchronize almost transparently and rapidly get back to full four-processor lockstep operation with minimal intrusion. As a side effect of using two separate FPGAs, additional robustness is possible by adding cross-strapped configuration control.
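The pair-level behavior described above can be illustrated with a small simulation. This is a sketch under assumptions rather than the flight design: `step` is a hypothetical one-cycle processor update, `Pair` models the two processors in one FPGA, and the upset and resynchronization cycles are chosen arbitrarily. Inter-chip communication and reboot handling are not modeled.

```python
from typing import Tuple


def step(state: int) -> int:
    """Hypothetical deterministic update standing in for one processor cycle."""
    return (state * 5 + 1) & 0xFFFF


class Pair:
    """Two lockstepped processors in one chip (hypothetical model)."""

    def __init__(self) -> None:
        self.p = [0, 0]
        self.running = True

    def advance(self) -> None:
        if self.running:
            self.p = [step(s) for s in self.p]

    def agrees(self) -> bool:
        return self.p[0] == self.p[1]


def run_two_chip(cycles: int, upset_cycle: int,
                 resync_cycle: int) -> Tuple[int, bool, bool]:
    """Return (system state, all four in lockstep at end, outage occurred)."""
    chips = [Pair(), Pair()]
    for cycle in range(cycles):
        for chip in chips:
            chip.advance()
        if cycle == upset_cycle:
            chips[0].p[1] ^= 0x0010   # one bit flip in one processor of chip 0
        for chip in chips:
            if chip.running and not chip.agrees():
                chip.running = False  # stop only the disagreeing pair
        healthy = [c for c in chips if c.running]
        if not healthy:
            return 0, False, True     # both pairs upset: reboot (not modeled)
        if cycle == resync_cycle:
            for chip in chips:
                if not chip.running:
                    # copy correct state from the working pair and restart
                    chip.p = list(healthy[0].p)
                    chip.running = True
    healthy = [c for c in chips if c.running]
    in_lockstep = len(healthy) == 2 and chips[0].p == chips[1].p
    return healthy[0].p[0], in_lockstep, False


state, in_lockstep, outage = run_two_chip(cycles=20, upset_cycle=5, resync_cycle=9)
```

Between the upset and the resynchronization the system keeps running on the healthy pair, so the final state matches a fault-free run and no outage is recorded.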
Flow Chart: Two-Chip Solution
Advantages
• Reboots are rare; one requires simultaneous errors in two separate processors
• Processor upsets are handled transparently, without system outage, until a convenient re-synchronization opportunity
• Enhanced robustness – outages lowered to less than the SEFI rate of ~1 in 80 years per device
• Allows added configuration robustness
– Chips check each other (not self-checking)
– Eliminates need for external watchdog timer
Disadvantages
• Complicated
– Inter-chip communication/synchronization
– Transparent reboot/resynchronization of both processors in the chip with the error
• Twice the power consumption
• In-beam testing is not yet done (although planned for the near future)
Three-Chip Solution
The three-chip implementation (also known as the “virtual FPGA” solution [Ref. 3]) takes the responsibility for error detection out of the hands of the upsettable FPGAs by adding a radiation-hardened ASIC. Note that only one processor per FPGA is needed. The ASIC handles stopping error propagation and re-synchronizing an upset processor. Additionally, the ASIC can be used for configuration control of all three FPGAs.
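The voting and re-synchronizing role assigned to the ASIC can be sketched as follows. This is a simplified software model, not the ASIC logic: `step` is a hypothetical one-cycle processor update, `vote` and `run_three_chip` are illustrative names, and re-synchronization is reduced to copying the voted state into the dissenting processor.

```python
from collections import Counter
from typing import Dict, List, Tuple


def step(state: int) -> int:
    """Hypothetical deterministic update standing in for one processor cycle."""
    return (state * 5 + 1) & 0xFFFF


def vote(outputs: List[int]) -> Tuple[int, List[int]]:
    """Return (majority value, indices of dissenting processors)."""
    value, count = Counter(outputs).most_common(1)[0]
    if count < 2:
        raise RuntimeError("no majority: all three processors disagree")
    dissenters = [i for i, out in enumerate(outputs) if out != value]
    return value, dissenters


def run_three_chip(cycles: int, upsets: Dict[int, int]) -> Tuple[int, int]:
    """upsets maps cycle -> index of the processor hit. Returns (state, resyncs)."""
    states = [0, 0, 0]
    voted = 0
    resyncs = 0
    for cycle in range(cycles):
        states = [step(s) for s in states]
        if cycle in upsets:
            states[upsets[cycle]] ^= 0x0100  # one bit flip in one processor
        voted, dissenters = vote(states)     # only the voted value leaves the voter
        for i in dissenters:
            states[i] = voted                # reload the upset processor's state
            resyncs += 1
    return voted, resyncs


final_state, resync_count = run_three_chip(20, {3: 0, 11: 2})
```

Because the voter identifies which processor dissented, the other two never stop, which is why this option reports no system outages for single upsets.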
Flow Chart: Three-Chip Solution
Advantages
• Maximum robustness to upsets
• Only three processors in lockstep (but in 3 chips)
• More fabric available for other functions
• No system outages; errors and SEFIs are handled transparently
• Most implementation details are confined to the ASIC and don’t affect the IP in the FPGAs significantly
Disadvantages
• Complex ASIC development for controller to vote outputs and re-load/re-sync upset processor
• ASIC development cost (currently funded though)
• Board area
Conclusions
• Both two-chip and three-chip solutions have about the same robustness, power consumption, and system complication, but handle upsets better than the one-chip solution.
• The two- vs. three-chip decision mostly boils down to the familiar FPGA vs. ASIC debate.
• The three-chip solution may use less power than the two-chip. (Is the ASIC’s power consumption less than that of one processor core?)
• At present, the JPL-preferred approach is the two-chip implementation, achieving maximum flexibility and near maximum robustness to upsets.
References
• [1] J. George et al., “Initial Single-Event Effects Testing and Mitigation in the Xilinx Virtex II-Pro FPGA,” Paper 211, MAPLD 2005.
• [2] M. Wang and G. Bolotin, “SEU Mitigation Techniques for Xilinx Virtex-II Pro FPGA,” Paper D110, MAPLD 2004, http://klabs.org/mapld04/presentations/session_d/1_d110_wang_s.ppt
• [3] J. Lyke and B. Marty, Virtual Field Programmable Gate Array Triple Modular Redundant Cell Design, Air Force Research Laboratory: Space Vehicles Directorate, AFRL-VS-PS-TR-2004-1093, April 28, 2004.