

  • Number of slides: 30

National Aeronautics and Space Administration
Software Reliability Techniques Applied to Constellation
Technical Briefing
NASA OSMA Software Assurance Symposium, September 9-11, 2008
Allen P. Nikora, JPL/Caltech, PI
Sergio Guarro, ASCA, Inc., Co-I
This research was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration. The work was sponsored by the NASA Office of Safety and Mission Assurance under the Software Assurance Research Program led by the NASA Software IV&V Facility. This activity is managed locally at JPL through the Assurance and Technology Program Office.
09/09/2008 SAS 08_Classify_Defects_Nikora

Agenda
• Problem/Approach
• Relevance to NASA
• Accomplishments and/or Tech Transfer Potential
• Technology Readiness Level
• Data Availability
• Impediments to Research or Application
• Next Steps

Problem/Approach
• Software-related failures were responsible for more than half of NASA major space mission losses or malfunctions between 1996 and 2007.
  – A large majority were due to system conditions that had not been anticipated or fully understood in the system/software specification and design process.
  – As NASA space missions are increasingly controlled by software, the probability of mission failure due to software may increase if no action is taken.
  – Minimizing loss of crew/loss of mission requires appropriate techniques to evaluate the reliability of onboard and ground-based support software during all development phases.

Problem/Approach (cont’d)
• Modeling a software system in its anticipated operational context is an important aspect of assuring software reliability.
  – Recognized in the concept of the “operational profile” and in software reliability model assumptions
  – Many techniques for modeling software reliability treat software in isolation from the hardware on which it runs and which it controls.
• Goals:
  – Demonstrate the feasibility of applying the Context-based Software Risk Modeling (CSRM) technique to CxP applications/scenarios
    • Focus on mission-critical applications such as GN&C, Safety and Health Monitoring, Launch Abort
  – Develop guidelines for use of context-based techniques
  – Infuse context-based SW reliability modeling techniques into other NASA SW development efforts

Relevance to NASA
• The reliability of a software component depends on its operating environment. CSRM explicitly includes context in system/software models.
• Unlike traditional software reliability modeling techniques, CSRM helps guide software testing.
• CSRM can be used to evaluate the risk of software failure during the specification and design phases as well as during implementation and test.
  – Identifying risk-prone areas earlier in development reduces the number of defects passed through to test and operations.
  – Earlier identification of risk-prone areas enables more effective management of development resources.

Accomplishments and/or Tech Transfer Potential
• Selected PA-1 as the initial scenario to be modeled
• Acquired relevant artifacts from Windchill and JSC contacts
• Analysis of PA-1 software specifications/design in progress
• Development of CSRM models of PA-1 software in progress
  – GNC is the initial software component selected for modeling

Technology Readiness Level
• CSRM is TRL 9
  – Actual system has been thoroughly demonstrated and tested in its operational environment.
  – All documentation completed.
  – Successful operational experience.
  – Sustaining engineering support in place.
• The goal of this effort is to apply CSRM to CxP rather than to develop new software reliability modeling techniques.

Data Availability
• Access to Windchill repository, CxP artifacts
• Contact points at JSC and GSFC to:
  – Help with navigation through the repository
  – Obtain needed artifacts from contractors that aren’t in the repository

Impediments to Research or Application
• Large volume of data: it is difficult to navigate through the repository and identify appropriate artifacts.

Next Steps
• Complete development of PA-1 model(s)
• Analyze models; evaluate software failure risk
• Review models, results
• Refine models
• Select further applications to model

Technical Detail

CSRM Key Features
From “Risk-Informed Software Assurance for NASA Space Missions”, Sergio Guarro, ASCA Inc., November 2007
• CSRM: Context-based Software Risk Model
• A practical approach and framework for assurance of mission-critical, software-intensive systems for use in NASA programs
  – Oriented toward system and mission scenario analysis
  – Integrates traditional PRA event-tree/fault-tree models with Dynamic Flowgraph Methodology (DFM) models suited to handle software-intensive and human-in-the-loop systems (“dynamic PRA” environments)
  – Can be applied both for preliminary assessments of yet-to-be-written software and for in-depth assessment of existing, testable software
  – Produces software test guidance, as well as assurance and PRA-integrated risk models and metrics
  – Supported by implementation toolsets
    • Classical PRA and DFM software

CSRM Technical Highlights
From “Risk-Informed Software Assurance for NASA Space Missions”, Sergio Guarro, ASCA Inc., November 2007
• PRA-style development of mission and risk scenario models
• Uses traditional event-tree/fault-tree logic models at the top modeling level to capture the basic aspects of mission scenarios
• Uses Dynamic Flowgraph Methodology (DFM) models to capture dynamic and logically complex aspects of system/software/operator interactions
  – DFM analytical and quantitative results are fully compatible with, and can be integrated with, PRA tool binary models and results (SAPHIRE, CAFTA)
• Can incorporate risk, reliability and assurance info from other tools and sources
  – SW-process-quality information and non-project-specific reliability data and assessments
    • SW reliability info collected in other projects and deemed applicable as first estimates of risk levels in current SW modules of interest
  – SW defect/reliability model output (e.g., Schneidewind’s model or other)
  – Traditional test results
• Produces software test guidance, as well as assurance and PRA-integrated risk models and metrics

CSRM Analysis Overview
From “Risk-Informed Software Assurance for NASA Space Missions”, Sergio Guarro, ASCA Inc., November 2007
1. Inspect/examine conventional PRA ET/FT models and identify SW-related system functions and events.
2. Quantify SW functions and events via process-quality assessment methods and/or generic SW data (as needed and applicable for preliminary assessment and prioritization purposes).
[Figure: event-tree branch point to be further modeled and analyzed]

CSRM Analysis Overview (cont’d)
From “Risk-Informed Software Assurance for NASA Space Missions”, Sergio Guarro, ASCA Inc., November 2007
3. Develop a DFM model of high-priority SW-related functions, expanding the ET branch point or FT events of interest accordingly.
[Figure: expanded event-tree branch point with branch probabilities P1 and 1 - P1]
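To make the branch-point expansion concrete, here is a minimal Python sketch of how sequence probabilities propagate through a simple event tree. It assumes each branch point has a success probability already conditioned on the path that leads to it; in CSRM the value for a SW-related branch point (the P1 / 1 - P1 split) would come from the DFM sub-model. The function name and branch names are hypothetical.

```python
def sequence_probabilities(p_initiator, branch_points):
    """Enumerate end-state probabilities of a simple event tree.

    p_initiator   -- probability of the initiating event
    branch_points -- ordered list of (name, p_success) pairs; each p_success is
                     assumed to be already conditioned on the preceding path
    Returns a dict mapping each path (tuple of (name, outcome)) to its probability.
    """
    sequences = {(): p_initiator}
    for name, p_success in branch_points:
        expanded = {}
        for path, p in sequences.items():
            # Split every existing sequence at this branch point
            expanded[path + ((name, "success"),)] = p * p_success
            expanded[path + ((name, "failure"),)] = p * (1.0 - p_success)
        sequences = expanded
    return sequences


# Hypothetical example: one SW-related branch point with P1 = 0.999
for path, p in sequence_probabilities(1.0, [("gnc_sw", 0.999)]).items():
    print(path, p)
```

Keeping the whole path as the dictionary key makes it easy to attach a DFM-derived probability to exactly one branch point while leaving the rest of the tree untouched.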

CSRM Analysis Overview (cont’d)
From “Risk-Informed Software Assurance for NASA Space Missions”, Sergio Guarro, ASCA Inc., November 2007
4. Use DFM multi-valued logic/dynamic analysis of a higher-level ET or FT event to identify SW and HW/SW potential failure mode sub-scenarios (e.g., a “cut set” constituted of < HW-failure-X AND SW-faulty-response-Y >).
5. Test HW/SW in an actual or simulated integrated system setup, to exclude the existence of the analytically identified potential cut sets or to establish a risk upper bound for them.
6. Insert and integrate the Step 4 and 5 results into the overall PRA ET/FT models, to obtain a full system-level mission assurance, risk analysis and quantification perspective.

CSRM Data Needs
From “Risk-Informed Software Assurance for NASA Space Missions”, Sergio Guarro, ASCA Inc., November 2007
• Logic model(s) development and qualitative analysis
  – Logic model(s) development and qualitative (i.e., logic) analysis are iterative processes.
  – Logic model(s) for the software and the balance-of-system will evolve with the design of the system.
  – The fidelity of the model(s) and the qualitative analytical results increases with this evolution process.
Data needs by phase (Early Design Phase / Design Maturity / System Integration Phase):
• Early Design Phase
  – SW: Interface documents, preliminary SW design spec., Preliminary Hazard Analyses, FMECAs, classification of SW failure data for similar designs
  – Balance-of-system: Conceptual design docs., high-level qualitative risk assessment models such as FMEAs, master logic diagrams
• Design Maturity
  – SW: Detailed SW design docs., pseudo code, preliminary module testing (qualitative results, e.g., types of contexts tested, types of errors encountered)
  – Balance-of-system: Detailed design docs., preliminary qualitative risk assessment models such as event sequence diagrams, event trees, fault trees, fishbone models, etc.
• System Integration Phase
  – SW: Executable code, module & integration testing (qualitative results)
  – Balance-of-system: System integration docs., system PRA model

CSRM Data Needs (cont’d)
From “Risk-Informed Software Assurance for NASA Space Missions”, Sergio Guarro, ASCA Inc., November 2007
• Quantitative Analysis
  – Quantitative analysis is also an iterative process:
    • Preliminary qualitative and quantitative results identify SW error-forcing contexts to be tested and establish the testing criteria for meeting the reliability threshold.
    • More detailed qualitative and quantitative results identify areas of refinement for risk management and risk reduction.
    • Final qualitative and quantitative results estimate the contribution of the SW to the overall system risk.
Data needs by phase (Early Design Phase / Design Maturity / System Integration Phase):
• Early Design Phase
  – SW: Generic SW failure data or reliability/risk assessments for similar designs
  – Balance-of-system: High-level quantitative risk assessment models such as top-level event-tree/fault-tree quantifications
• Design Maturity
  – SW: Preliminary module testing (qualitative/quantitative results, e.g., type and no. of contexts tested, no. of tests executed, type & no. of errors encountered)
  – Balance-of-system: Preliminary quantitative risk assessment results, such as quantitative estimates for failure modes of subsystems interacting with the SW
• System Integration Phase
  – SW: Executable code, module & integration testing (quantitative results)
  – Balance-of-system: Quantitative risk assessment results

Dynamic Flowgraph Methodology
From “Risk-Informed Software Assurance for NASA Space Missions”, Sergio Guarro, ASCA Inc., November 2007
• DFM is a directed-graph modeling methodology that uses multi-valued logic and a discrete-event dynamic representation of system parameter and component states.
• Capable of handling, within the limits of the discrete state and time representations:
  – Cause-effect relationships
  – Time-dependent relationships
  – Feedback and logic loops
  – Cognitive models of human operator actions
• A DFM system model, once constructed, can be analyzed in either deductive (e.g., “fault-tree like”) or inductive (e.g., “FMEA or event-tree like”) mode.
  – Deductive analysis produces the “prime implicants” for any “top event” that can be defined in terms of combinations of possible system parameter and/or component states (even across time boundaries).
  – Inductive analysis tracks the evolution of parameter, component and system states over discrete time and logic steps, starting from any user-defined combination of states that represents a possible system state.
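The inductive mode can be pictured as forward simulation over discrete clock steps. The Python sketch below is not the DFM toolset: it uses a hypothetical three-state attitude node and a made-up drift rule (attitude error grows one level per clock when a leak coincides with a slightly inaccurate thruster command, or whenever the command is fully inaccurate) to show how a user-defined initial state is tracked forward in time.

```python
# Hypothetical three-level attitude node, ordered from best to worst
ATTITUDE_LEVELS = ["correct", "slightly_inaccurate", "inaccurate"]


def next_attitude(attitude, leak, thruster_cmd):
    """One clock step of a made-up decision table for the attitude node."""
    level = ATTITUDE_LEVELS.index(attitude)
    drifting = (
        (leak != "none" and thruster_cmd == "slightly_inaccurate")
        or thruster_cmd == "inaccurate"
    )
    if drifting:
        # Error grows by one level per clock, saturating at the worst state
        level = min(level + 1, len(ATTITUDE_LEVELS) - 1)
    return ATTITUDE_LEVELS[level]


def inductive_trace(initial_attitude, leak, thruster_cmd, steps):
    """Track the attitude node forward from a user-defined initial state."""
    trace = [initial_attitude]
    for _ in range(steps):
        trace.append(next_attitude(trace[-1], leak, thruster_cmd))
    return trace


print(inductive_trace("correct", "small", "slightly_inaccurate", 3))
# ['correct', 'slightly_inaccurate', 'inaccurate', 'inaccurate']
print(inductive_trace("correct", "none", "slightly_inaccurate", 3))
# ['correct', 'correct', 'correct', 'correct']
```

The second call shows the point made later for the Mini-AERCam example: with this rule, the software condition alone does not drive the attitude off nominal.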

DFM and PRA/PSA Tools
From “Risk-Informed Software Assurance for NASA Space Missions”, Sergio Guarro, ASCA Inc., November 2007
• DFM is not intended to be a substitute for any existing PRA tool (although in “binary mode” it can mimic both event-tree and fault-tree models).
• DFM can be most useful as a PRA/PSA modeling supplement, for those special portions of a system or mission that call for the use of non-static, non-binary models.
• A DFM system model, once constructed, can be analyzed in either deductive (e.g., “fault-tree like”) or inductive (e.g., “FMEA or event-tree like”) mode.
• DFM can be integrated with an existing PRA/PSA framework by inserting its results into an existing ET/FT model framework.
  – This can be automated if the ET/FT tool offers a data interchange utility and/or an “open API”.

DFM Constructs and Modeling Representations
From “Risk-Informed Software Assurance for NASA Space Missions”, Sergio Guarro, ASCA Inc., November 2007
• Nodes and discretized state vectors represent key process parameters and/or components.
• Mapping between the discretized state vectors is governed by multi-valued logic rules:
  – Transfer boxes (decision tables)
  – Transition boxes (decision tables with built-in time transitions)
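A minimal Python sketch of the two mapping constructs, assuming decision tables keyed by tuples of input states. The class names and the valve/flow example are hypothetical; the only difference modeled is that a transition box reads its inputs one or more clock steps in the past.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

InputStates = Tuple[str, ...]


@dataclass
class TransferBox:
    """Decision table: output at time t depends on the inputs at time t."""
    table: Dict[InputStates, str]

    def evaluate(self, history: List[InputStates], t: int) -> str:
        return self.table[history[t]]


@dataclass
class TransitionBox:
    """Decision table with a built-in time transition of `delay` clock steps."""
    table: Dict[InputStates, str]
    delay: int = 1
    initial: str = "unknown"  # output before enough history exists (assumption)

    def evaluate(self, history: List[InputStates], t: int) -> str:
        if t - self.delay < 0:
            return self.initial
        return self.table[history[t - self.delay]]


# Hypothetical valve -> flow mapping
table = {("open",): "nominal", ("stuck_closed",): "none"}
history = [("open",), ("stuck_closed",), ("stuck_closed",)]
transfer = TransferBox(table)
transition = TransitionBox(table, delay=1, initial="nominal")
print([transfer.evaluate(history, t) for t in range(3)])    # ['nominal', 'none', 'none']
print([transition.evaluate(history, t) for t in range(3)])  # ['nominal', 'nominal', 'none']
```

Keeping both constructs as plain decision tables mirrors the slide's description: the time behavior lives only in which history entry the box reads.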

Steps in Typical DFM Analysis
From “Risk-Informed Software Assurance for NASA Space Missions”, Sergio Guarro, ASCA Inc., November 2007
Step 1: Model Construction
• Construct a DFM model of the system of interest
  – Representing the system behavior and flow of causality
  – (The model is a network of nodes, transfer boxes, transition boxes, and associated arc connections)
Step 2: System Analysis
• Use the DFM inductive and deductive engines to:
  1. Verify specified behavior (can be done on the system “design model”)
  2. Identify system failure modes in terms of basic component failure modes (“Automated FMEA”)
  3. Develop “Dynamic Scenario Trees” (similar to dynamic event trees)
  4. Identify prime implicants for system failure (“Top Events” of interest)
  5. Define test sequences specifically suited to identify and isolate various classes of possible faults. (This feature is useful for generating input vectors for testing software-based systems.)
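Item 4 of Step 2 (prime implicants) can be illustrated with a brute-force Python sketch over a tiny, time-free multi-valued model. The variables, states, and top-event rule are hypothetical (loosely echoing the Mini-AERCam example), and real DFM deductive analysis also handles time steps and is far more efficient; the sketch only shows what a prime implicant is: a minimal set of node = state conditions that guarantees the top event.

```python
from itertools import product

# Hypothetical variables and discretized states (time steps omitted)
DOMAINS = {
    "iso_valve": ["ok", "stuck_closed"],
    "leak": ["none", "small", "large", "critical"],
    "thruster_cmd": ["correct", "slightly_inaccurate", "inaccurate"],
}


def top_event(s):
    """Made-up rule for an 'autonomous hold failure' top event."""
    return (
        s["iso_valve"] == "stuck_closed"
        or s["leak"] == "critical"
        or s["thruster_cmd"] == "inaccurate"
        or (s["leak"] in ("small", "large")
            and s["thruster_cmd"] == "slightly_inaccurate")
    )


def is_implicant(partial):
    """True if every completion of the partial assignment triggers the top event."""
    free = [v for v in DOMAINS if v not in partial]
    for values in product(*(DOMAINS[v] for v in free)):
        if not top_event({**partial, **dict(zip(free, values))}):
            return False
    return True


def prime_implicants():
    names = list(DOMAINS)
    primes = []
    # None means "variable left unconstrained"
    for choice in product(*([None] + DOMAINS[v] for v in names)):
        partial = {v: c for v, c in zip(names, choice) if c is not None}
        if not partial or not is_implicant(partial):
            continue
        # Prime: removing any single condition breaks the implication
        if all(not is_implicant({k: x for k, x in partial.items() if k != drop})
               for drop in partial):
            primes.append(partial)
    return primes


for pi in prime_implicants():
    print(pi)
```

Checking single-condition removals is enough for minimality here, because any superset of an implicant's conditions is itself an implicant.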

Steps in Typical DFM Analysis
From “Risk-Informed Software Assurance for NASA Space Missions”, Sergio Guarro, ASCA Inc., November 2007
Step 3: Quantification of System Analysis
• DFM model results usually identify subevents that contribute probability to the branch points of a system/mission event tree.
  – DFM analysis is equivalent in concept and results to the fault-tree analyses carried out in traditional PRA to provide further definition and quantification of system sequences initially defined via event-tree models.
• DFM “top events” are quantified in a fashion similar to fault-tree “top events”.
• To quantify a DFM Top Event, the set of n associated prime implicants (PIs) is first converted into a set of m mutually exclusive implicants (MEIs):
  $\text{Top Event} = \mathrm{MEI}_1 \lor \mathrm{MEI}_2 \lor \cdots \lor \mathrm{MEI}_m$
• The sum of the probabilities of the MEIs yields the probability of the Top Event:
  $P(\text{Top Event}) = P(\mathrm{MEI}_1) + P(\mathrm{MEI}_2) + \cdots + P(\mathrm{MEI}_m)$
• The above is, in essence, the multi-valued logic equivalent of the BDD (Binary Decision Diagram) quantification process for fault trees.
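The MEI idea can be checked numerically with a small Python sketch. It assumes statistically independent multi-valued variables with known state probabilities, and it obtains mutually exclusive implicants by the simplest (exponential) route: expanding into full state assignments, which are disjoint by construction. The variable names and probabilities are hypothetical; a BDD-style algorithm would avoid the full enumeration.

```python
from itertools import product
from math import prod


def top_event_probability(domains, probs, implicants):
    """Exact P(Top Event) for independent multi-valued variables.

    domains    -- {variable: [state, ...]}
    probs      -- {variable: {state: probability}}
    implicants -- list of {variable: state} conjunctions (e.g. prime implicants)
    """
    names = list(domains)
    total = 0.0
    for values in product(*(domains[v] for v in names)):
        state = dict(zip(names, values))
        # Full assignments are mutually exclusive, so those covered by any
        # implicant can simply be summed.
        if any(all(state[k] == s for k, s in pi.items()) for pi in implicants):
            total += prod(probs[v][state[v]] for v in names)
    return total


domains = {"a": ["ok", "failed"], "b": ["ok", "degraded", "failed"]}
probs = {"a": {"ok": 0.9, "failed": 0.1},
         "b": {"ok": 0.9, "degraded": 0.08, "failed": 0.02}}
pis = [{"a": "failed"}, {"b": "failed"}]
print(top_event_probability(domains, probs, pis))  # approximately 0.118
```

Here the two prime implicants overlap (both "a" and "b" failed), so simply adding their probabilities would give 0.12 instead of the exact 0.118; the disjoint expansion removes that double count.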

Use of DFM in CSRM Framework
From “Risk-Informed Software Assurance for NASA Space Missions”, Sergio Guarro, ASCA Inc., November 2007
• CSRM (Context-based Software Risk Model) is a framework to address and guide the integration of functional models of software-related risk into “classical” PRA/PSA frameworks.
• CSRM is the modeling approach for software-intensive space systems recommended and illustrated in the NASA PRA Procedures Guide.
• CSRM can be implemented for simpler systems using only standard ET/FT PRA models.
• For more complex systems, use of methods with more advanced and dynamic features (such as DFM or “colored Markov”) is recommended, at least for part of the modeling and analytical effort.

Example: Top-Level DFM Model of Mini-AERCam System
From “Risk-Informed Software Assurance for NASA Space Missions”, Sergio Guarro, ASCA Inc., November 2007
[Figure: top-level DFM model of the Mini-AERCam system; 1 clk = 1 sec.]
• The GN&C Software node is a sub-model; it is expanded in the next slide.
• The attitude node represents the actual attitude of the Mini-AERCam. It is discretized into 3 states:
  1. Correct (error < 3°)
  2. Slightly Inaccurate (error of 3° to 10°)
  3. Inaccurate (error > 10°)
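The three attitude states amount to a simple discretization of a continuous error. A Python sketch, assuming the error magnitude is taken in degrees and that exactly 3° and 10° fall in the Slightly Inaccurate band (the slide does not say how boundaries are assigned):

```python
def attitude_state(error_deg: float) -> str:
    """Discretize attitude error (degrees) into the slide's three states.

    Assumes the magnitude of the error is used and that 3 and 10 degrees
    belong to the Slightly Inaccurate band.
    """
    error = abs(error_deg)
    if error < 3.0:
        return "Correct"
    if error <= 10.0:
        return "Slightly Inaccurate"
    return "Inaccurate"


print([attitude_state(e) for e in (1.5, 3.0, 7.2, 12.0)])
# ['Correct', 'Slightly Inaccurate', 'Slightly Inaccurate', 'Inaccurate']
```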

Example: DFM Model of Mini-AERCam GN&C Sub-Model
From “Risk-Informed Software Assurance for NASA Space Missions”, Sergio Guarro, ASCA Inc., November 2007
[Figure: DFM model of the Mini-AERCam GN&C sub-model; 1 clk = 1 sec.]
• One sub-model includes the GPS hardware and the translational navigation software.
• The other sub-model includes the angular rate gyro hardware and the rotational navigation software.

Example: DFM Model of Mini-AERCam Propulsion Subsystem
From “Risk-Informed Software Assurance for NASA Space Missions”, Sergio Guarro, ASCA Inc., November 2007
• This node represents a leak in the propulsion system fuel lines after the isovalve but before the thruster solenoids. It is discretized into 4 states:
  1. None
  2. Small (1-40%): A small leak produces thrust and torque of less than 40% of the total thrust and torque the Mini-AERCam can produce to counteract it. A leak of this magnitude should not significantly affect the performance of the Mini-AERCam.
  3. Large (41-80%): A large leak produces thrust or torque of up to 80% of what the Mini-AERCam can produce. The Mini-AERCam can compensate and should be recoverable, but its performance is inadequate to perform its mission safely.
  4. Critical (> 81%): The Mini-AERCam is expected to be uncontrollable.
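Likewise, a Python sketch of the leak-size discretization. The input is assumed to be the leak's thrust/torque as a percentage of what the Mini-AERCam can produce to counteract it; because the slide's bands (1-40%, 41-80%, > 81%) leave gaps, this sketch assumes any value above 0 up to 40 is Small, above 40 up to 80 is Large, and anything above 80 is Critical.

```python
def leak_state(percent_of_capability: float) -> str:
    """Discretize leak thrust/torque (as % of counteracting capability).

    Band edges are an assumption that closes the gaps in the slide's
    1-40% / 41-80% / > 81% ranges.
    """
    if percent_of_capability <= 0.0:
        return "None"
    if percent_of_capability <= 40.0:
        return "Small"
    if percent_of_capability <= 80.0:
        return "Large"
    return "Critical"


print([leak_state(p) for p in (0.0, 25.0, 60.0, 90.0)])
# ['None', 'Small', 'Large', 'Critical']
```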

Analysis of Mini-AERCam DFM Model
From “Risk-Informed Software Assurance for NASA Space Missions”, Sergio Guarro, ASCA Inc., November 2007
• Analysis of the Autonomous Hold Failure Top Event yields n prime implicants (PIs):
  $\text{Top Event} = \mathrm{PI}_1 \lor \mathrm{PI}_2 \lor \cdots \lor \mathrm{PI}_n$
• DFM prime implicants identify:
  – HW-only fault conditions
  – SW-only fault conditions
  – Combinations of HW & SW fault conditions
• For example:
  – Prime Implicant 1 is Iso.Valve.Cond = Stuck Closed at time-1 (HW-only fault).
  – Prime Implicant 2 is Target.Att = Inaccurate at time-1 (SW-only error). The Target.Att node in the GN&C sub-model represents the accuracy of the target attitude determined by the rotational guidance software function. This PI identifies the possibility that a programmer introduced an error when coding the module, resulting in severely inaccurate output when the module is used.
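The HW/SW breakdown of prime implicants can be automated once each DFM node is tagged as hardware or software. In this Python sketch the node tags are an assumption (inferred from the slide text, not taken from the model), a PI is represented as a list of (node, state, time) literals, and the third PI is the combined HW & SW case from this example.

```python
# Assumed HW/SW tagging of the nodes named on the slides
NODE_KIND = {
    "Iso.Valve.Cond": "HW",
    "Prop.Line.Leak": "HW",
    "Target.Att": "SW",
    "Rot.Thruster.Comm": "SW",
}


def classify(prime_implicant):
    """Classify a PI, given as a list of (node, state, time) literals."""
    kinds = {NODE_KIND[node] for node, _state, _time in prime_implicant}
    if kinds == {"HW"}:
        return "HW-only"
    if kinds == {"SW"}:
        return "SW-only"
    return "HW & SW combination"


pi_1 = [("Iso.Valve.Cond", "Stuck Closed", "time-1")]
pi_2 = [("Target.Att", "Inaccurate", "time-1")]
pi_3 = [("Prop.Line.Leak", "Small Leak", "time-2"),
        ("Rot.Thruster.Comm", "Slightly Inaccurate", "time-1")]
for pi in (pi_1, pi_2, pi_3):
    print(classify(pi))
```

Mixed PIs such as pi_3 are the ones that point to off-nominal entry conditions worth targeted testing.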

Mini-AERCam Model Analysis (cont’d)
From “Risk-Informed Software Assurance for NASA Space Missions”, Sergio Guarro, ASCA Inc., November 2007
• Prime Implicant 3 is Prop.Line.Leak = Small Leak at time-2 AND Rot.Thruster.Comm = Slightly Inaccurate at time-1.
• This Prime Implicant corresponds to a combination of hardware and software conditions. (The hardware condition is a small leak in one of the propellant lines. The software condition is an algorithmic fault that causes drifting of the attitude control given a sub-nominal thrust caused by a line leak.)
  – If only one of the two conditions exists, the Mini-AERCam does not fail:
    • The GN&C software works properly when no leak exists.
    • If a small leak occurs but there is no drift error in the attitude control, the GN&C is able to compensate for the leak by using the thrusters.
• This PI example shows how DFM analysis can identify an off-nominal entry condition for which the SW may have to be tested, and which:
  – does not correspond to a normal state of the system;
  – would not usually be identified and tested for in a standard SW V&V process addressing the SW operational profile.

Risk-Informed Testing of Potential SW Risk Scenario and Quantification of DFM Prime Implicant
From “Risk-Informed Software Assurance for NASA Space Missions”, Sergio Guarro, ASCA Inc., November 2007
• Prime Implicant 3 is one of the mutually exclusive implicants. It can be quantified by considering:
  – The “entry condition” (i.e., small propellant line leak)
  – The conditional probability that the software causes an attitude shift under this triggering condition
• From a HW failure rate database (e.g., NPRD), the entry condition can be determined to occur with a failure rate of 6.00E-06/hr. For a 5-hour mission duration, the associated probability is P(C3) = 3.00E-05.
• The SW attitude control function can then be tested in the (real or simulated) presence of the system (HW fault) entry condition to determine whether it performs correctly or not.
  – Without the specific identification of the HW fault condition, random sampling of the SW normal operational input space may never cover the actual system condition!
• In the case discussed, the risk quantification process was completed via a simulated “hardware in the loop” test process.
  – Sampling was conducted across the possible range of initial states (i.e., Mini-AERCam spatial and rotational positions, compatible thruster command settings, etc.) in which the system could be at the onset of the leak condition.
• With the aid of the CSRM/DFM analysis, a normalized sampling set of 450 tests was sufficient to “demonstrate” a risk contribution on the order of 1.E-6 from this scenario, if no erroneous GN&C SW response was observed in the tests.
  – This was obtained via a straight Bayesian estimation, starting from a uniform, non-informative prior.
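The arithmetic on this slide can be reproduced with a short Python sketch. It assumes a constant failure rate (exponential model), for which P = 1 - exp(-λt), approximately λt when λt is small, and it assumes the “uniform, non-informative prior” is a Beta(1, 1) prior on the per-test probability of an erroneous SW response, so that zero failures in n tests gives a Beta(1, n + 1) posterior. The slide does not define its “normalized” sampling set, so the sketch does not attempt to reproduce the quoted ~1.E-6 risk figure; the function names are hypothetical.

```python
import math


def entry_condition_probability(rate_per_hr, mission_hr):
    """P(entry condition occurs during the mission), constant failure rate."""
    return 1.0 - math.exp(-rate_per_hr * mission_hr)


def posterior_mean_zero_failures(n_tests):
    """Posterior mean of Beta(1, n + 1): uniform prior, n tests, no failures."""
    return 1.0 / (n_tests + 2)


def posterior_upper_bound_zero_failures(n_tests, confidence=0.95):
    """Upper credible bound from the Beta(1, n + 1) CDF, 1 - (1 - p)**(n + 1)."""
    return 1.0 - (1.0 - confidence) ** (1.0 / (n_tests + 1))


p_c3 = entry_condition_probability(6.0e-6, 5.0)
print(f"P(C3) = {p_c3:.2e}")  # P(C3) = 3.00e-05
print(f"posterior mean after 450 tests = {posterior_mean_zero_failures(450):.2e}")
print(f"95% upper bound after 450 tests = {posterior_upper_bound_zero_failures(450):.2e}")
```

Multiplying P(C3) by a conditional SW failure probability estimated this way gives the scenario's risk contribution under these stated assumptions.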