
Tradeoffs in Flight-Design Upset Mitigation in State-of-the-Art FPGAs
Hardened By Design vs. Design-Level Hardening
Gary M. Swift and Ramin Roosta
Jet Propulsion Laboratory / California Institute of Technology
The research done in this paper was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under contract with the National Aeronautics and Space Administration (NASA) and was partially sponsored by the NASA Electronic Parts and Packaging Program. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise, does not constitute or imply its endorsement by the United States Government or the Jet Propulsion Laboratory, California Institute of Technology.
Swift and Roosta 144_C4 / MAPLD 04

In the beginning was Actel …
• Leveraging from a commercial product line
  § ONO anti-fuse based one-time programmable (OTP)
• “beginning” = 1993
  § Reference: Katz, R.; Barto, R.; McKerracher, P.; Carkhuff, B.; Koga, R.; “SEU hardening of field programmable gate arrays (FPGAs) for space applications and device characterization,” IEEE Transactions on Nuclear Science, Dec. 1994

Later, Xilinx
• Leveraging from a commercial product line
  § SRAM based reconfigurable
• “later” = 1998
  § Reference: Guertin, S. M.; Swift, G. M.; Nguyen, D.; “Single-event upset test results for the Xilinx XQ1701L PROM,” Radiation Effects Data Workshop Record, 1999
  § Quote: (Xilinx SRAM-based FPGAs)… “do appear suited to a broad range of other (non-critical) applications, such as sensor and camera controllers.”

OUTLINE
• FPGAs: A key enabling technology for modern spacecraft
• Background in radiation testing of FPGAs
  ▫ Earlier, Katz/Swift collaboration
  ▫ Recently, Xilinx Consortium
• Feature Comparison
• Triple Modular Redundancy (TMR) hardware approach vs. software approach
• Concluding Remarks

FPGAs: A key “enabling technology”
• Like custom ASICs, FPGAs can replace whole boards
  § Saving mass, volume, power
  § Achieving extra functionality
• FPGAs are much cheaper than ASICs
  § Design efforts can be later in the schedule
  § Design mistakes don’t require a re-spin through the foundry

MER Pyro-Controller
Used self-checking of configuration to initiate a reconfiguration after spotting an upset
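The self-check-then-reconfigure idea can be sketched in a few lines of Python. The slide does not say how the MER pyro-controller performed its check, so the CRC-32 comparison and the names `config_checksum`, `self_check`, and `check_and_repair` are illustrative assumptions, not the flight design.

```python
import zlib


def config_checksum(config_image: bytes) -> int:
    """CRC-32 of a configuration image (assumed stand-in for the real check)."""
    return zlib.crc32(config_image)


def self_check(readback: bytes, golden_crc: int) -> bool:
    """Return True when the read-back configuration matches the golden checksum."""
    return config_checksum(readback) == golden_crc


def check_and_repair(readback: bytes, golden_image: bytes):
    """Return (configuration to run, whether a reconfiguration was triggered)."""
    if self_check(readback, config_checksum(golden_image)):
        return readback, False
    # An upset was spotted: reload the known-good image.
    return golden_image, True
```

For example, flipping a single bit in a copy of the golden image makes `check_and_repair` return the golden image with the reconfiguration flag set.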

MER Pyro-Controller Nearing Mars
Xilinx XQR4062XL

My Background
• Actel experience is older
  § No direct involvement in radiation tests since the ONO anti-fuse was replaced
  § Results here are from others’ work
• Xilinx experience is recent
  § Active participant in Xilinx Rad Test Consortium
  § Currently, finishing two+ year test campaign targeting the Virtex II family

Currently Available Devices
Actel RT54SX-S family (-SU) vs. Xilinx Virtex II family
Note: both are essentially immune to single-event latchup and have good total ionizing dose tolerance [Actel > 135 krad(Si); Xilinx > 200 krad(Si)]

Main Feature Comparison
                    Actel RT54SX72S    Xilinx XQR2V6000
Gates:              72,000             ~6 M ( /~3.2 )
Flip-flops:         2,012              67,584 / 3.2 = ~20 k
I/O Pins:           360                824 / 3 = 274
Speed, external:    230 MHz            622 Mb/s (I-mode LVDS)
Speed, internal:    310 MHz            360 MHz
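The divisions in the Xilinx column can be reproduced directly. The sketch below assumes that the factors 3.2 and 3 represent the resource cost of Xilinx-style TMR (triplication plus voters for logic, three pins per signal for I/O); that reading is an interpretation of the slide, and the variable names are hypothetical.

```python
# Figures taken from the comparison slide.
XILINX_FLIP_FLOPS = 67_584
XILINX_IO_PINS = 824
ACTEL_FLIP_FLOPS = 2_012
ACTEL_IO_PINS = 360

# Assumed meaning of the slide's divisors: cost of applying TMR on the Xilinx part.
LOGIC_OVERHEAD = 3.2
IO_OVERHEAD = 3

effective_flip_flops = XILINX_FLIP_FLOPS / LOGIC_OVERHEAD   # about 21,000, i.e. "~20 k"
effective_io_pins = XILINX_IO_PINS // IO_OVERHEAD           # whole signals only

print(f"Usable flip-flops after TMR: {effective_flip_flops:,.0f}")
print(f"Usable I/O signals after TMR: {effective_io_pins}")
print(f"Flip-flop ratio vs. Actel: {effective_flip_flops / ACTEL_FLIP_FLOPS:.1f}x")
print(f"I/O ratio vs. Actel: {effective_io_pins / ACTEL_IO_PINS:.2f}x")
```

Under that assumption the Xilinx part still offers roughly ten times the usable flip-flops, but fewer usable I/O signals than the Actel part.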

Extra Features Comparison
                    Actel RT54SX72S    Xilinx XQR2V6000
Block RAM:          no                 2.5 Mb
I/O Standards:                         many
Others:             hardwired TMR      Clock Manager, Multipliers

Actel: What bits can upset?
• User flip-flops only
  § Direct hits of same flip/flop in multiple domains
    ▫ Very unlikely due to layout
  § Clock domain hits
• SEFI modes essentially eliminated

Xilinx: What bits can upset?
• Configuration Bits
  § Logical Function (NAND, Ex-OR, Flip-Flop type, etc…)
  § Routing
  § User Options (Type of I/O, Mode of Block RAM Access, Clock Manager, etc…)
• Block RAM
• User Flip-flops
• Control Registers

Xilinx: Heavy Ion Test Results
• Low Threshold (soft)
• Low Susceptibility (hard)
Resulting in fairly low in-space rates: ~6 per day for 2V6000 in GCRmin.

Actel: Heavy Ion Test Results
Data for two RTAX2000S prototypes at 1 MHz using checkerboard pattern, from Fig. 12, J. J. Wang et al., NSREC 2003 [Ref. 1]
• Where’s Threshold???
• Low Susceptibility (~100x harder)
Very low in-space rates (assume LETth > 40 achieved): ~1 per 6800 years for SX72-S in GCRmin.

Actel-style TMR
SX-A “R” cell triplicates to: RTSX-S “R” cell

Actel-style TMR is fairly straightforward:
  § Each flip-flop is replaced by three plus feedback voter
  § Triplicated elements spread out physically
  § Uses one clock/inverse-clock domain
  § No external parts needed
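A behavioural sketch of a triplicated flip-flop with a feedback voter may help make the first bullet concrete. It is a software model only: the actual RTSX-S cell circuit is not described on the slide, and the class and method names are hypothetical.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise two-out-of-three majority vote."""
    return (a & b) | (a & c) | (b & c)


class TmrFlipFlop:
    """Three storage copies whose voted output is fed back on every clock."""

    def __init__(self, init: int = 0):
        self.copies = [init, init, init]

    @property
    def q(self) -> int:
        """Voted output seen by the rest of the design."""
        return majority(*self.copies)

    def clock(self, d=None) -> None:
        """Load d when given; otherwise hold by reloading the voted value."""
        value = self.q if d is None else d
        self.copies = [value, value, value]

    def upset(self, index: int) -> None:
        """Model a single-event upset flipping one stored copy."""
        self.copies[index] ^= 1
```

A single upset leaves `q` unchanged, and the next clock edge rewrites all three copies with the voted value, so errors do not accumulate in this model.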

Xilinx-style TMR is more complicated:
  § First, it’s not too useful without configuration scrubbing
  § Whole functional blocks are triplicated, not individual flip-flops
  § Three voters are used
  § Three clock domains
  § Elimination of:
    ▫ Weak keepers (aka half latches)
    ▫ Use of configuration cells as part of the design
      - For example, SRL16
  § Needs some external circuitry (at least, a watchdog timer + PROMs)
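The first bullet, that block-level TMR depends on scrubbing, can be illustrated with a toy model. The sketch assumes a hypothetical failure mode in which a configuration upset inverts a domain's one-bit output; real upsets have many other effects, and `XtmrBlock` is an invented name.

```python
from typing import Callable, List


def majority(a: int, b: int, c: int) -> int:
    """Bitwise two-out-of-three majority vote."""
    return (a & b) | (a & c) | (b & c)


class XtmrBlock:
    """Three copies of a functional block, each with its own configuration."""

    def __init__(self, func: Callable[[int], int]):
        self.func = func
        self.config_upset = [False, False, False]

    def domain_outputs(self, x: int) -> List[int]:
        good = self.func(x) & 1
        # Assumed failure mode: an upset configuration inverts the output bit.
        return [good ^ 1 if bad else good for bad in self.config_upset]

    def voted_outputs(self, x: int) -> List[int]:
        a, b, c = self.domain_outputs(x)
        # Three voters, one per domain, all seeing the same three inputs.
        return [majority(a, b, c) for _ in range(3)]

    def upset(self, domain: int) -> None:
        self.config_upset[domain] = True

    def scrub(self) -> None:
        """Rewrite the configuration, clearing accumulated upsets."""
        self.config_upset = [False, False, False]
```

One upset is masked by the voters, but a second upset in another domain before a scrub defeats the vote; scrubbing between upsets keeps the block correct.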

Xilinx-style TMR

Xilinx-style TMR
In Xilinx-style TMR, I/O’s use three pins tied externally:
[Figure: domains D0, D1, and D2 each drive a pin (P) through a minority voter; board traces tie the three pins together as signal D.]
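A minority voter can be sketched as follows. The slide shows only the figure labels, so the tri-state behaviour modelled here (a domain stops driving its pin when it disagrees with the other two) is an assumption about how such voters are typically used, and all function names are hypothetical.

```python
from typing import Optional


def in_minority(own: int, other_a: int, other_b: int) -> bool:
    """True when this domain's bit disagrees with both other domains."""
    return own != other_a and own != other_b


def pin_drive(own: int, other_a: int, other_b: int) -> Optional[int]:
    """Value driven onto this domain's pin, or None when tri-stated."""
    return None if in_minority(own, other_a, other_b) else own


def board_net(d0: int, d1: int, d2: int) -> int:
    """Value on the board trace that ties the three pins together."""
    drives = [
        pin_drive(d0, d1, d2),
        pin_drive(d1, d0, d2),
        pin_drive(d2, d0, d1),
    ]
    active = [value for value in drives if value is not None]
    # For one-bit values at least two domains always agree, so the
    # remaining drivers never fight each other on the shared trace.
    return active[0]
```

With one faulty domain, the faulty pin goes high-impedance and the two healthy pins drive the trace with the correct value.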

Xilinx TMRtool
• Xilinx-style TMR done by hand is difficult and tedious
• An automated tool which integrates into the design flow has been developed (“now” available)
• In-beam testing shows tool is very effective

Upset Comparison
• ATMR now has eliminated:
  § Upsets of static storage elements, and
  § SEFIs
• ATMR upsets come from:
  § Transients that are clocked into storage
  § Clock tree hits
• Xilinx FPGAs have a small susceptibility to two types of SEFIs
  § Reset (sometimes only partial)
  § Disable scrub port
• XTMR in combination with scrubbing can lower system upset rates below the SEFI rate

Rate Comparison
• Actel
  § Dominated by transients
  § Roughly one system error per thousand years (GCRmin)
• Xilinx
  § Dominated by SEFI rate
  § Expect one SEFI per ~65 years in GCRmin
  § Expect one system error ~5-20x less often
GCR = Galactic Cosmic Ray background (interplanetary space), almost identical to geosynchronous orbit
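The quoted intervals translate into per-mission probabilities with a little arithmetic. The sketch below makes two assumptions that are not on the slide: that "5-20x less often" is measured against the SEFI interval, and that events follow a Poisson process with a constant rate. The five-year mission length is purely illustrative.

```python
import math

SEFI_INTERVAL_YEARS = 65            # slide: one SEFI per ~65 years (GCRmin)
ACTEL_ERROR_INTERVAL_YEARS = 1000   # slide: roughly one system error per thousand years

# Assumption: "5-20x less often" is relative to the SEFI interval.
xilinx_system_error_years = (5 * SEFI_INTERVAL_YEARS, 20 * SEFI_INTERVAL_YEARS)


def probability_of_event(mission_years: float, mean_interval_years: float) -> float:
    """Chance of at least one event, assuming a constant-rate Poisson process."""
    return 1.0 - math.exp(-mission_years / mean_interval_years)


MISSION_YEARS = 5  # illustrative mission length, not from the slides
p_xilinx_sefi = probability_of_event(MISSION_YEARS, SEFI_INTERVAL_YEARS)
p_actel_error = probability_of_event(MISSION_YEARS, ACTEL_ERROR_INTERVAL_YEARS)

print(f"Xilinx system-error interval: {xilinx_system_error_years} years")
print(f"P(at least one Xilinx SEFI in {MISSION_YEARS} yr): {p_xilinx_sefi:.3f}")
print(f"P(at least one Actel system error in {MISSION_YEARS} yr): {p_actel_error:.4f}")
```

Under these assumptions the Xilinx system-error interval of 325 to 1300 years brackets the Actel figure of about a thousand years, which is consistent with the slide's point that the SEFI rate, not the mitigated upset rate, dominates on the Xilinx side.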

CONCLUSIONS
For the present: both can achieve very acceptable radiation tolerance
Actel wins on:
  ▫ Less burden on the designer
  ▫ No auxiliary components
  ▫ Lower SEFI susceptibility
Xilinx wins on:
  ▫ Designer control of the resources vs. hardness tradeoff
  ▫ On-chip feature set
  ▫ Re-configurability
Competition is good.

Acronyms
FPGA - Field Programmable Gate Array
ASIC - Application Specific Integrated Circuit
SEU - Single Event Upset
SEFI - Single Event Functional Interrupt
TMR - Triple Modular Redundancy
ATMR - Actel-style TMR
XTMR - Xilinx-style TMR
LET - Linear Energy Transfer (proportional to deposited charge per micron for a heavy ion strike on an active node)
GCRmin - Galactic Cosmic Ray background (highest during “solar minimum” period of ~11-yr cycle of sunspots)
MER - Mars Exploration Rovers (i.e., Spirit and Opportunity)

Additional References
[1] J. J. Wang, W. Wong, S. Wolday, B. Cronquist, J. McCollum, R. Katz, I. Kleyner, “Single event upset and hardening in 0.15 µm antifuse-based field programmable gate array,” IEEE Transactions on Nuclear Science, Dec. 2003
[2] J. J. Wang, R. B. Katz, F. Dhaoui, J. L. McCollum, W. Wong, B. E. Cronquist, R. T. Lambertson, E. Hamdy, I. Kleyner, W. Parker, “Clock buffer circuit soft errors in antifuse-based field programmable gate arrays,” IEEE Transactions on Nuclear Science, Dec. 2000
[3] R. Katz, J. J. Wang, R. Koga, K. A. LaBel, J. McCollum, R. Brown, R. A. Reed, B. Cronquist, S. Crain, T. Scott, W. Paolini, B. Sin, “Current radiation issues for programmable elements and devices,” IEEE Transactions on Nuclear Science, Dec. 1998