- Number of slides: 26
Tradeoffs in Flight-Design Upset Mitigation in State-of-the-Art FPGAs
Hardened By Design vs. Design-Level Hardening
Gary M. Swift and Ramin Roosta
Jet Propulsion Laboratory / California Institute of Technology
The research done in this paper was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under contract with the National Aeronautics and Space Administration (NASA) and was partially sponsored by the NASA Electronic Parts and Packaging Program. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise, does not constitute or imply its endorsement by the United States Government or the Jet Propulsion Laboratory, California Institute of Technology.
Swift and Roosta 1 144_C 4 / MAPLD 04
In the beginning was Actel …
  • Leveraging from a commercial product line
    § ONO anti-fuse based one-time programmable (OTP)
  • “beginning” = 1993
    § Reference: Katz, R.; Barto, R.; McKerracher, P.; Carkhuff, B.; Koga, R.; “SEU hardening of field programmable gate arrays (FPGAs) for space applications and device characterization,” IEEE Transactions on Nuclear Science, Dec. 1994
Later, Xilinx
  • Leveraging from a commercial product line
    § SRAM based reconfigurable
  • “later” = 1998
    § Reference: Guertin, S. M.; Swift, G. M.; Nguyen, D.; “Single-event upset test results for the Xilinx XQ1701L PROM,” Radiation Effects Data Workshop Record, 1999
    § Quote: (Xilinx SRAM-based FPGAs) … “do appear suited to a broad range of other (non-critical) applications, such as sensor and camera controllers.”
OUTLINE
  • FPGAs: A key enabling technology for modern spacecraft
  • Background in radiation testing of FPGAs
    ▫ Earlier, Katz/Swift collaboration
    ▫ Recently, Xilinx Consortium
  • Feature Comparison
  • Triple Modular Redundancy (TMR): hardware approach vs. software approach
  • Concluding Remarks
FPGAs: A key “enabling technology”
  • Like custom ASICs, FPGAs can replace whole boards
    § Saving mass, volume, power
    § Achieving extra functionality
  • FPGAs are much cheaper than ASICs
    § Design efforts can be later in the schedule
    § Design mistakes don’t require a re-spin through the foundry
MER Pyro-Controller
  Used self-checking of its configuration to initiate a reconfiguration after detecting an upset
MER Pyro-Controller Nearing Mars
  Xilinx XQR4062XL
My Background
  • Actel experience is older
    § No direct involvement in radiation tests since the ONO anti-fuse was replaced
    § Results here are from others’ work
  • Xilinx experience is recent
    § Active participant in Xilinx Rad Test Consortium
    § Currently finishing a two+ year test campaign targeting the Virtex II family
Currently Available Devices
  Actel RT54SX-S family (-SU) vs. Xilinx Virtex II family
  Note: both are essentially immune to single-event latchup and have good total ionizing dose tolerance [Actel > 135 krad(Si); Xilinx > 200 krad(Si)]
Main Feature Comparison
                    Actel RT54SX72S    Xilinx XQR2V6000
  Gates:            72,000             ~6 M (/~3.2)
  Flip-flops:       2012               67,584 / 3.2 = ~20 k
  I/O pins:         360                824 / 3 = 274
  Speed, external:  230 MHz            622 Mb/s (I-mode LVDS)
  Speed, internal:  310 MHz            360 MHz
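
The divisors in the Xilinx column can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the "/3.2" and "/3" factors represent the resource overhead of Xilinx-style TMR (three copies of the logic plus voters, and three pins per I/O signal); the slide itself does not spell this out, and the function and variable names are hypothetical.

```python
from fractions import Fraction

# Raw XQR2V6000 figures quoted on the slide.
RAW_FLIP_FLOPS = 67_584
RAW_IO_PINS = 824

# Divisors quoted on the slide. Reading them as TMR overhead is an assumption:
# ~3.2x for logic (three copies plus voters) and 3x for I/O (three pins per signal).
LOGIC_DIVISOR = Fraction("3.2")
IO_DIVISOR = Fraction(3)


def usable_after_tmr(raw_count: int, divisor: Fraction) -> int:
    """Return the whole number of resources left after dividing out the overhead."""
    return raw_count // divisor


usable_flip_flops = usable_after_tmr(RAW_FLIP_FLOPS, LOGIC_DIVISOR)
usable_io_pins = usable_after_tmr(RAW_IO_PINS, IO_DIVISOR)

print(f"flip-flops: {usable_flip_flops}")  # 21120, which the slide rounds to ~20 k
print(f"I/O pins:   {usable_io_pins}")     # 274
```

Exact fractions are used instead of floats so that the floor division does not pick up binary rounding error in 3.2.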
Extra Features Comparison
                    Actel RT54SX72S    Xilinx XQR2V6000
  Block RAM:        no                 2.5 Mb
  I/O standards:                       many
  Others:           hardwired TMR      Clock Manager, Multipliers
Actel: What bits can upset?
  • User flip-flops only
    § Direct hits of the same flip-flop in multiple domains
      ▫ Very unlikely due to layout
    § Clock domain hits
  • SEFI modes essentially eliminated
Xilinx: What bits can upset?
  • Configuration Bits
    § Logical Function (NAND, Ex-OR, flip-flop type, etc.)
    § Routing
    § User Options (type of I/O, mode of Block RAM access, Clock Manager, etc.)
  • Block RAM
  • User Flip-flops
  • Control Registers
Xilinx: Heavy Ion Test Results
  [Plot annotations: Low Threshold (soft); Low Susceptibility (hard)]
  Resulting in fairly low in-space rates: ~6 per day for 2V6000 in GCRmin.
Actel: Heavy Ion Test Results
  Data for two RTAX2000S prototypes at 1 MHz using checkerboard pattern, from Fig. 12, J. J. Wang et al., NSREC 2003 [Ref. 1]
  [Plot annotations: Where’s Threshold???; Low Susceptibility (~100x harder)]
  Very low in-space rates (assume LETth > 40 achieved): ~1 per 6800 years for SX72-S in GCRmin.
Actel-style TMR
  SX-A “R” cell triplicates to: RTSX-S “R” cell
Actel-style TMR is fairly straightforward:
  § Each flip-flop is replaced by three plus feedback voter
  § Triplicated elements spread out physically
  § Uses one clock/inverse-clock domain
  § No external parts needed
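
The "three plus feedback voter" idea can be illustrated with a small behavioral model. This is only a sketch under stated assumptions: the class and method names are hypothetical, and it models the voting behavior, not the actual RTSX-S "R" cell circuit, which the slides do not describe in detail.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise two-out-of-three majority vote."""
    return (a & b) | (a & c) | (b & c)


class TmrFlipFlop:
    """Behavioral model of one flip-flop stored as three voted copies."""

    def __init__(self, value: int = 0) -> None:
        self.copies = [value, value, value]

    def output(self) -> int:
        """The value seen by downstream logic is the vote of the three copies."""
        return majority(*self.copies)

    def upset(self, index: int) -> None:
        """Model a single-event upset by flipping one stored copy."""
        self.copies[index] ^= 1

    def clock(self, d=None) -> None:
        """Load new data, or, when holding (d is None), reload the voted
        value through the feedback path, which repairs a flipped copy."""
        new_value = self.output() if d is None else d
        self.copies = [new_value, new_value, new_value]


ff = TmrFlipFlop(1)
ff.upset(0)                      # one copy flips to 0
print(ff.copies, ff.output())    # the vote still returns the stored value
ff.clock()                       # hold cycle: feedback rewrites all three copies
print(ff.copies)
```

In this model a single upset never reaches the output, and the next clock edge restores the damaged copy, so errors do not accumulate in static storage.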
Xilinx-style TMR is more complicated:
  § First, it’s not too useful without configuration scrubbing
  § Whole functional blocks are triplicated, not individual flip-flops
  § Three voters are used
  § Three clock domains
  § Elimination of:
    ▫ Weak keepers (aka half latches)
    ▫ Use of configuration cells as part of the design
      - For example, SRL16
  § Needs some external circuitry (at least, a watchdog timer + PROMs)
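
Why triplication depends on scrubbing can be seen in a toy model. The sketch below rests on simplifying assumptions that are not from the slides: every configuration upset is taken to break the domain it lands in (pessimistic), a scrub pass repairs all domains instantly at fixed intervals, and the voters fail as soon as two domains are broken at once. The function name and the event times are hypothetical.

```python
def first_voter_failure(upsets, scrub_period=None):
    """Return the time of the first unmasked error, or None if every upset is masked.

    upsets:        list of (time, domain) tuples sorted by time, domain in {0, 1, 2}
    scrub_period:  interval between scrub passes, or None for no scrubbing
    """
    corrupted = set()
    scrubs_seen = 0
    for time, domain in upsets:
        if scrub_period is not None:
            scrubs_done = time // scrub_period
            if scrubs_done > scrubs_seen:
                # At least one scrub pass ran since the previous upset.
                corrupted.clear()
                scrubs_seen = scrubs_done
        corrupted.add(domain)
        if len(corrupted) >= 2:
            # Two of three domains are wrong, so the majority is wrong.
            return time
    return None


events = [(10, 0), (50, 1), (90, 2)]  # hypothetical upset times and domains
print(first_voter_failure(events))                   # 50: upsets accumulate
print(first_voter_failure(events, scrub_period=20))  # None: each upset is scrubbed away
```

Without scrubbing the second upset defeats the voters, while with scrubbing a failure needs two upsets in different domains within one scrub interval.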
Xilinx-style TMR
Xilinx-style TMR
  In Xilinx-style TMR, I/Os use three pins tied externally:
  [Diagram labels: three Minority Voter blocks (P, D) for domains 0, 1, 2; Pins; Board Traces]
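
One common reading of the minority voter is that each domain compares its own output bit with the other two and releases its pin when it is the odd one out, so the two agreeing pins set the level on the shared board trace. The sketch below models that reading. It assumes "P" is the tri-state control of the output buffer, which the slide does not state, and the function names are hypothetical.

```python
def in_minority(own: int, other_a: int, other_b: int) -> bool:
    """True when this domain's bit disagrees with both other domains."""
    return own != other_a and own != other_b


def board_net(d0: int, d1: int, d2: int) -> int:
    """Resolve the level on the board trace that ties the three pins together.

    A domain that finds itself in the minority tri-states its pin, so only
    agreeing domains drive the trace. Inputs are single bits (0 or 1).
    """
    bits = (d0, d1, d2)
    driven = []
    for i, own in enumerate(bits):
        others = [bits[j] for j in range(3) if j != i]
        if not in_minority(own, others[0], others[1]):
            driven.append(own)
    if not driven or len(set(driven)) != 1:
        raise ValueError("inputs must be single bits (0 or 1)")
    return driven[0]


print(board_net(1, 1, 0))  # domain 2 is upset and releases its pin
print(board_net(0, 1, 0))  # domain 1 is upset and releases its pin
```

With single-bit inputs at most one domain can be in the minority, so the remaining drivers always agree and no contention occurs on the trace.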
Xilinx TMRtool
  • Xilinx-style TMR done by hand is difficult and tedious
  • An automated tool which integrates into the design flow has been developed (“now” available)
  • In-beam testing shows the tool is very effective
Upset Comparison
  • ATMR has now eliminated:
    § Upsets of static storage elements, and
    § SEFIs
  • ATMR upsets come from:
    § Transients that are clocked into storage
    § Clock tree hits
  • Xilinx FPGAs have a small susceptibility to two types of SEFIs
    § Reset (sometimes only partial)
    § Disable scrub port
  • XTMR in combination with scrubbing can lower system upset rates below the SEFI rate
Rate Comparison
  • Actel
    • Dominated by transients
    • Roughly one system error per thousand years (GCRmin)
  • Xilinx
    • Dominated by SEFI rate
    • Expect one SEFI per ~65 years in GCRmin
    • Expect one system error ~5-20x less often
  GCR = Galactic Cosmic Ray background (interplanetary space), almost identical to geosynchronous orbit
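
The quoted intervals can be turned into per-mission odds with a short calculation. This is a sketch under assumptions not made in the slides: events are treated as a Poisson process with a constant rate, the "5-20x less often" figure is read as relative to the SEFI interval, and the 10-year mission length is purely hypothetical.

```python
import math

SEFI_INTERVAL_YEARS = 65           # Xilinx: one SEFI per ~65 years (GCRmin)
ACTEL_ERROR_INTERVAL_YEARS = 1000  # Actel: roughly one system error per thousand years
XILINX_ERROR_FACTORS = (5, 20)     # Xilinx system errors ~5-20x rarer than SEFIs


def event_probability(mission_years: float, mean_interval_years: float) -> float:
    """Probability of at least one event in the mission, assuming Poisson arrivals."""
    return 1.0 - math.exp(-mission_years / mean_interval_years)


# Mean interval between Xilinx system errors implied by the 5-20x range.
xilinx_error_intervals = tuple(SEFI_INTERVAL_YEARS * f for f in XILINX_ERROR_FACTORS)
print(xilinx_error_intervals)  # (325, 1300) years

MISSION_YEARS = 10  # hypothetical mission length, chosen only for illustration
cases = [
    ("Xilinx SEFI", SEFI_INTERVAL_YEARS),
    ("Xilinx system error, 5x case", xilinx_error_intervals[0]),
    ("Xilinx system error, 20x case", xilinx_error_intervals[1]),
    ("Actel system error", ACTEL_ERROR_INTERVAL_YEARS),
]
for label, interval in cases:
    print(f"{label}: {event_probability(MISSION_YEARS, interval):.1%}")
```

The implied Xilinx system-error interval of 325 to 1300 years brackets the Actel figure of about a thousand years, which matches the slide's point that SEFIs, not voted upsets, dominate on the Xilinx side.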
CONCLUSIONS
  For the present: both can achieve very acceptable radiation tolerance
  Actel wins on:
    ▫ Less burden on the designer
    ▫ No auxiliary components
    ▫ Lower SEFI susceptibility
  Xilinx wins on:
    ▫ Designer control of the resources vs. hardness tradeoff
    ▫ On-chip feature set
    ▫ Re-configurability
  Competition is good.
Acronyms
  FPGA - Field Programmable Gate Array
  ASIC - Application Specific Integrated Circuit
  SEU - Single Event Upset
  SEFI - Single Event Functionality Interrupt
  TMR - Triple Modular Redundancy
  ATMR - Actel-style TMR
  XTMR - Xilinx-style TMR
  LET - Linear Energy Transfer (proportional to deposited charge per micron for a heavy ion strike on an active node)
  GCRmin - Galactic Cosmic Ray background (highest during “solar minimum” period of ~11-yr cycle of sunspots)
  MER - Mars Exploration Rovers (i.e., Spirit and Opportunity)
Additional References
  [1] J. J. Wang, W. Wong, S. Wolday, B. Cronquist, J. McCollum, R. Katz, I. Kleyner, “Single event upset and hardening in 0.15 µm antifuse-based field programmable gate array,” IEEE Transactions on Nuclear Science, Dec. 2003
  [2] Jih-Jong Wang, R. B. Katz, F. Dhaoui, J. L. McCollum, W. Wong, B. E. Cronquist, R. T. Lambertson, E. Hamdy, I. Kleyner, W. Parker, “Clock buffer circuit soft errors in antifuse-based field programmable gate arrays,” IEEE Transactions on Nuclear Science, Dec. 2000
  [3] R. Katz, J. J. Wang, R. Koga, K. A. LaBel, J. McCollum, R. Brown, R. A. Reed, B. Cronquist, S. Crain, T. Scott, W. Paolini, B. Sin, “Current radiation issues for programmable elements and devices,” IEEE Transactions on Nuclear Science, Dec. 1998