
  • Slide count: 89

Developing Safety Critical Software: Fact and Fiction — John A. McDermid

Overview
• Fact – costs and distributions
• Fiction – get the requirements right
• Fiction – get the functionality right
• Fiction – abstraction is the solution
• Fiction – safety critical code must be “bug free”
• Some key messages

Part 1
• Fact – costs and distributions
• Fiction – get the requirements right


Costs and Distributions
• Examples of industrial experience
  – A specific example
  – Some more general observations
• The example covers
  – Cost by phase
  – Where errors are introduced
  – Where errors are detected
  – and their relationships

Effort/Cost by Phase
[chart: process phases, from system specification, via software engineering, to system integration]

Error Introduction
[chart] FE = Functional Effect; the minimum FE is typically a data change

Finding Requirements Errors
• Requirements testing tends to find requirements errors
[pie chart: phases, including system validation]

Result – High Development Cost
• Errors introduced here… are not found until here
• After following a safety critical development process

Software and Money
• Typical productivity
  – 5 Lines of Code (LoC) per person-day ≈ 1 kLoC per person-year
  – Requirements through to end of module test
• Typical avionics “box”
  – 100 kLoC
  – 100 person-years of effort
  – Circa £10M for software, so £500M on a modern aircraft?
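A back-of-envelope model of these figures. The 200 working days per year and £100k fully loaded cost per person-year are assumptions, chosen only because they reproduce the slide's numbers:

```python
LOC_PER_PERSON_DAY = 5
WORKING_DAYS_PER_YEAR = 200                # assumption: gives 1 kLoC/person-year
LOC_PER_PERSON_YEAR = LOC_PER_PERSON_DAY * WORKING_DAYS_PER_YEAR

def effort_person_years(kloc):
    """Effort to take kloc thousand lines from requirements to end of module test."""
    return kloc * 1000 / LOC_PER_PERSON_YEAR

def cost_gbp(kloc, gbp_per_person_year=100_000):   # assumption: £100k/person-year
    return effort_person_years(kloc) * gbp_per_person_year

assert effort_person_years(100) == 100     # 100 kLoC "box" -> 100 person-years
assert cost_gbp(100) == 10_000_000         # circa £10M for the software
```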

US Aircraft Software Dependence
[chart: % of functions performed by software vs year]
Source: DoD Defense Science Board Task Force on Defense Software, November 2000

Increasing Dependence
• Software is often the determinant of function
• Software operates autonomously
  – Without opportunity for human intervention, e.g. Mercedes Brake Assist
• Software is affected by other changes
  – e.g. a new weapons fit on Eurofighter
• Software has high levels of authority

Inappropriate CofG control in the fuel system can reduce the fatigue life of the wings

Growing Dependency
• The problem is growing
  – Now about a third of aircraft development costs
  – An increasing proportion of car development
    • Around 25% of the capital cost of a new car is in electronics
  – Made more visible by the rate of improvement in tools for “mainstream” software development

Growth of Airborne Software
[chart] Approx £1.5B at current productivity and costs

The Problem – Size Matters
[chart: size in function points vs probability of a software project being cancelled]
• 1 function point = 80 SLOC of Ada
• 1 function point = 128 SLOC of C
Source: Capers Jones, Becoming Best in Class, Software Productivity Research, 1995 briefing
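The backfiring ratios on the slide convert directly between SLOC and function points; a minimal sketch, using only the two ratios the slide gives:

```python
SLOC_PER_FUNCTION_POINT = {"Ada": 80, "C": 128}   # ratios from the slide

def function_points(sloc, language):
    """Approximate size in function points from source lines of code."""
    return sloc / SLOC_PER_FUNCTION_POINT[language]

# The same functionality looks "smaller" in C's SLOC count per function point
assert function_points(80_000, "Ada") == 1000
assert function_points(128_000, "C") == 1000
```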

Is Software Safety an Issue?
• Software has a good track record
  – A few high-profile accidents
    • Therac-25
    • Ariane 501
    • Cali (strictly data, not software)
  – Analysis of 1,100 “computer-related deaths”
    • Only 34 attributed to software

Chinook – Mull of Kintyre
• Was this caused by FADEC software?

But Don’t Be Complacent
• Many instances of “pilot error” are system-assisted
• Software failures typically leave no trace
• Increasing software complexity and authority
• Can’t measure software safety (no agreement on how)
• Unreliability of commercial software
• Cost of safety critical software

Summary
• Safety critical software is a growing issue
  – Software-based systems are the dominant source of product differentiation
  – Starting to become a major cost driver
  – Starting to become the drive (drag) on product development
    • Can’t cancel, have to keep on spending!
  – Not a major contributor to fatal accidents
    • Although many incidents


Requirements Fiction
• Fiction stated
  – Get the requirements right, and the development will be easy
• Facts
  – Getting requirements right is difficult
  – Requirements are the biggest source of errors
  – Requirements change
  – Errors occur at organisational boundaries

Embedded Systems
• A computer system embedded in a larger engineering system
• Requirements come from
  – “Flow down” from the system
  – Design decisions (commitments)
  – Safety and reliability analyses
    • Derived safety requirements (DSRs)
  – Fault management/accommodation
    • As much as 80% for control applications

Almost Everything on One Picture
[diagram] NB based on Parnas’ four-variable model


Types of Layer
• Some layers have design meaning
  – Abstraction from computing hardware
    • Time in ms from a reference, or …
    • Not interrupts or bit patterns from clock hardware
  – The “system” HAL
    • “Raw” sensed values, e.g. pressure in psia
    • Not bit patterns from analogue-to-digital converters
  – FMAA to application
    • Validated values of platform properties
• May also have computational meaning
  – e.g. a call to the HAL forces a scheduling action

Commitments
• Development proceeds via a series of commitments
  – A design decision which can only be revoked at significant cost
  – Often associated with an architectural decision or choice of component
    • Use of triplex redundancy, choice of pump, power supply, etc.
  – Commitments can be functional or physical
    • Most common to make physical commitments

Derived Requirements
• Commitments introduce derived requirements (DRs)
  – Choice of pump gives DRs for the control algorithm and iteration rate, plus requirements for initialisation, etc.
  – Also get derived safety requirements (DSRs), e.g. detection and management of sensor failure for safety

System Level Requirements
• Allocated requirements
  – System level requirements which come from the platform
  – May be (slightly) modified due to design commitments, e.g.
    • Platform – control engine thrust to within ±0.5% of demanded
    • System – control EPR or N1 to within ±0.5% of demanded

Stakeholder Requirements
• Direct requirements from stakeholders, e.g.
  – The radar shall be able to detect targets travelling at up to Mach 2.5 at 200 nautical miles, with 98% probability
  – In principle allocated from the platform
    • In practice often stated in system terms
  – Need to distinguish legitimate requirements from “solutioneering”
    • Legitimacy depends on the stakeholder, e.g. CESG and cryptos

Requirements Types
• Main requirements types
  – Invariants, e.g.
    • Forward and reverse thrust shall not be commanded at the same time
  – Functional – transform inputs to outputs, e.g.
    • Thrust demand from thrust-lever resolver angle
  – Event response – action on event, e.g.
    • Activate ATP on passing a signal at danger
  – Non-functional (NFR) – constraints, e.g.
    • Timing, resource usage, availability

Changes to Types
• Note that requirements types can change, e.g. NFR to functional
  – System – achieve < 10^-5 per hour unsafe failures
  – Software – detect failure modes x, y and z of pressure sensor P30 with 99% coverage, and mitigate by …
• Requirements notations/methods must be able to reflect requirements types

Requirements Challenges
• Even if system requirements are clear, software requirements
  – Must deal with quantisation (sensors)
  – Must deal with temporal constraints (iteration rates, jitter)
  – Must deal with failures
• System requirements are often tricky
  – Open-loop control under failure
  – Incomplete understanding of the physics

Requirements Errors
• Project data suggests
  – Typically more than 70% of errors found post unit test are requirements errors
  – F-22 (and other data sets) put requirements errors at 85%
  – Finding errors drives change
    • The later they are found, the greater the cost
    • Some data, e.g. F-22: write 3 LoC for every one delivered

The Certainty of Change
[chart: cumulative % change per module]
• May verify all code 3 times!
• 20% change, mainly due to requirements errors
• The cost is due to reverification in the presence of dependencies
• The high majority of modules are stable

Requirements and Organisations
• Requirements errors are often based on misinterpretations (it’s obvious that …)
  – Thus errors (more likely to) happen at organisational/cultural boundaries
    • Systems to software, safety to software, …
  – Study at NASA by Robyn Lutz
    • 85% of requirements errors arose at organisational boundaries

Summary
• Getting requirements right is a major challenge
  – Software is deeply embedded
    • Discretisation, timing, etc. are an issue
  – Physics not always understood
  – Requirements (genuinely) change
• The notion that one can simply get the requirements right is simplistic
  – The notion of “correct by construction” is optimistic

Part 2
• Fiction – get the functionality right
• Fiction – abstraction is the solution
• Fiction – safety critical code must be “bug free”
• Some key messages


Functionality Fiction
• Fiction stated
  – Get the functionality right, and the rest is easy
• Facts
  – Functionality doesn’t drive design
    • Non-functional requirements (NFRs) are critical
    • Functionality isn’t independent of NFRs
  – Fault management is a major aspect of complexity

Functionality and Design
• Functionality
  – System functions allocated to software
  – Elements of REQ which end up in SOFTREQ
    • NB, most of them
  – At the software level, requirements have to allow for properties of sensors, etc.
• Consider an aero engine example

Engine Pressure Block
[diagram]

Engine Pressure Sensor
• The aero engine measures P0
  – Atmospheric pressure
  – A key input to fuel control, etc.
• Example input: P0Sens
  – Byte from the A/D converter
  – Resolution – 1 bit = 0.055 psia
  – Base = 2, 0 = low (high value ≈ 16)
  – Update rate = 50 ms

Pressure Sensing Example
• Simple requirement
  – Provide a validated P0 value to other functions and the aircraft
• Output data item – P0Val
  – 16 bits
  – Resolution – 1 bit = 0.00025 psia
  – Base = 0, 0 = low (high value ≈ 16.4)

Example Requirements
• Simple functional requirement
  – RS1: P0Val shall be provided within 0.03 bar of the sensed value
  – R1: P0Val = P0Sens [±0.03] (software level)
  – Note: simple algorithm
    • P0Val = (P0Sens × 0.055 + 2)/0.00025
    • P0Sens = 0 → P0Val = 8000 = 0001 1111 0100 0000 binary
    • P0Sens = 1111 1111 (255) → 16.025 → P0Val = 64100 = 1111 1010 0110 0100 binary
  – Does R1 meet RS1? Does the algorithm meet R1?
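The slide's scaling algorithm can be written out directly; a minimal sketch in which the identifier names follow the slide, and rounding to the nearest output count is an assumption:

```python
def p0_val(p0_sens):
    """Scale a raw 8-bit P0Sens reading to the 16-bit P0Val format."""
    psia = p0_sens * 0.055 + 2.0      # input: 0.055 psia/bit, 2 psia offset
    return round(psia / 0.00025)      # output: 0.00025 psia/bit, zero offset

# Worked values from the slide
assert p0_val(0) == 8000      # 0b0001_1111_0100_0000
assert p0_val(255) == 64100   # 0b1111_1010_0110_0100
```

Whether this algorithm meets R1, and whether R1 meets RS1 (note the unit mismatch between bar and psia), are exactly the validation questions the slide poses.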

A Non-Functional Requirement
• Assume duplex sensors
  – P0Sens1 and P0Sens2
• System level
  – RS2: no single point of failure shall lead to loss of function (assume P0Val is covered by this requirement)
    • This will be a safety or availability requirement
    • NB in practice there may be different sensors wired to different channels, and cross-channel comms

Software Level NFR
• Software level
  – R2: If |P0Sens1 − P0Sens2| < 0.06 then P0Val = (P0Sens1 + P0Sens2)/2 else P0Val = 0
  – Is R2 a valid requirement?
    • In other words, have we stated the right thing?
  – Does R2 satisfy RS2?
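R2 can be sketched as a function over one cycle's readings; a hedged reading of the slide in which the 0.06 threshold and the failed-value 0 are as stated, and psia units are assumed:

```python
def p0_val_duplex(s1, s2):
    # R2: average the duplex readings if they agree to within 0.06,
    # otherwise output 0 to flag the loss of a valid value.
    if abs(s1 - s2) < 0.06:
        return (s1 + s2) / 2.0
    return 0.0

assert abs(p0_val_duplex(14.50, 14.52) - 14.51) < 1e-9   # sensors agree
assert p0_val_duplex(14.50, 15.50) == 0.0                # disagreement flagged
```

Writing it out makes the validity question concrete: outputting 0 on disagreement loses the function on a single sensor failure, which is what the "Does R2 satisfy RS2?" question probes.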

Temporal Requirements
• Timing is often an important system property
  – It may be a safety property, e.g. sequencing in weapons release
• System level
  – RS3: the validated pressure value shall never lag the sensed value by more than 100 ms
    • NB not uncommon, to ensure quality of control

Software Level Timing
• Software level requirement, assuming scheduling on 50 ms cycles
  – R3: P0Val(t) = P0Sens(t−2) [±0.03]
  – t is quantised in units of 50 ms, representing cycles
  – Is R3 a valid requirement?
  – Does R3 satisfy RS3?
    • NB need data on processor timing to validate
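R3 can be read as a check over sampled traces, with t counted in 50 ms cycles. The ±0.03 band and the two-cycle lag follow the slide; representing each signal as a list indexed by cycle is an assumption for the sketch:

```python
def r3_holds(p0_val_trace, p0_sens_trace, t):
    # R3: P0Val(t) = P0Sens(t-2) [±0.03], t in 50 ms cycles
    return abs(p0_val_trace[t] - p0_sens_trace[t - 2]) <= 0.03

sens = [14.00, 14.01, 14.02, 14.03]
val  = [0.0,   0.0,   14.00, 14.01]   # output lags sensed value by 2 cycles
assert r3_holds(val, sens, 2) and r3_holds(val, sens, 3)
```

Two cycles of 50 ms is a 100 ms lag, which is why the relationship between R3 and RS3 hinges on processor timing data, as the slide notes.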

Timing and Safety
• Software level
  – R4: If |P0Sens1(t) − P0Sens2(t)| < 0.06
     then P0Val(t+1) = (P0Sens1(t) + P0Sens2(t))/2
     else if |P0Sens1(t) − P0Sens1(t−1)| < |P0Sens2(t) − P0Sens2(t−1)|
     then P0Val(t+1) = P0Sens1(t)
     else P0Val(t+1) = P0Sens2(t)
  – What does R4 respond to (can you think of an RS4)?
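R4 as stated needs one cycle of history per sensor; a sketch under the slide's assumptions, where the rate-of-change comparison selects the less suspect channel on disagreement:

```python
def p0_val_r4(s1_t, s1_prev, s2_t, s2_prev):
    # R4: if the sensors agree, average them; otherwise trust the
    # channel whose reading changed least over the last cycle.
    if abs(s1_t - s2_t) < 0.06:
        return (s1_t + s2_t) / 2.0
    if abs(s1_t - s1_prev) < abs(s2_t - s2_prev):
        return s1_t
    return s2_t

# Sensor 2 jumps 2.0 in one cycle while sensor 1 is steady -> use sensor 1
assert p0_val_r4(14.5, 14.5, 16.5, 14.5) == 14.5
```

Unlike R2, this continues to deliver a value after a single sensor fault, trading a possible transient error for availability.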

Requirements Validation
• Is R4 a valid requirement?
  – Is R4 “safe” in the system context? (Assume that misleading values of P0 could lead to a hazard, e.g. a thrust roll-back on take-off)
• Does R4 satisfy RS3?
• Does R4 satisfy RS2?
• Does R4 satisfy RS1?

Real Requirements
• The example is still somewhat simplistic
  – Need to store sensor state, i.e. knowledge of what has failed
• Typically timing, safety, etc. drive the detailed design
  – Aspects of requirements, e.g. error bands, depend on the timing of the code
  – Requirements involve trade-offs between, say, safety and availability

Requirements and Architecture
• NFRs also drive the architecture
  – Failure rate 10^-6 per hour
    • Probably just duplex (especially if fail-stop)
    • Functions for cross comms and channel change
  – Failure rate 10^-9 per hour
    • Probably triplex or quadruplex
    • Changes in redundancy management
• NB a change in failure rate affects low-level functions

Quantification
• The “system level” functionality is in the minority
  – Typically over half is fault management
  – Eurofighter example
    • FCS 1/3 MLoC
    • Control laws 18 kLoC
• Note: very hard to validate
  – 777 flight incident in Australia due to an error in fault management, and a software change

Boeing 777 Incident near Perth
• Problem caused by the Air Data Inertial Reference Unit (ADIRU)
  – The software contained a latent fault which was revealed by a change
  – June 2001: accelerometer #5 fails with erroneously high output values; the ADIRU discards its output
  – A power cycle of the ADIRU occurs each time the aircraft electrical system is restarted
  – Aug 2005: accelerometer #6 fails; the latent software error allows use of the previously failed accelerometer #5

Summary
• Functionality is important
  – But not the primary driver of design
• Key drivers of design
  – Safety and availability
    • Turn into fault management at the software level
  – Timing behaviour
• Functionality is not independent of NFRs
  – Requirements change to reflect NFRs


Abstraction
• Fiction stated
  – Careful use of abstraction will address the problems of requirements, etc.
• Facts
  – Most forms of abstraction don’t work in embedded control systems
  – State abstraction is of some use
  – The devil is in the detail

Data Abstraction
• Most data is simple
  – Boolean, integer, floating point
  – Complex data structures are rare
    • May exist in a maintenance subsystem (e.g. records of fault events)
  – Systems engineers work in low-level terms, e.g. pressures, temperatures, etc.
    • Hence requirements are in these terms

Control Models are Low Level
[diagram]

Looseness
• A key objective is to ensure that requirements are complete
  – Specify behaviour under all conditions
  – Normal behaviour (everything working)
  – Fault conditions
    • Single faults, and combinations
  – Impossible conditions
    • So the design is robust against an incompletely understood requirements/environment

Despatch Requirements
• Can despatch (use) the system “carrying” failures
  – Despatch analysis based on a Markov model
  – Evaluate the probability of being in a non-despatchable state, e.g. only one failure from a hazard
  – Links the safety/availability process and software design
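A toy illustration of the Markov-model idea: track the probability of "carrying" one failure in a duplex channel between inspections. The per-channel failure rate, one-hour step size, and no-repair-between-inspections assumption are all made up for the sketch:

```python
FAIL_PER_HOUR = 1e-4   # assumed per-channel failure rate

def p_one_failed(hours):
    """Probability of having exactly one failed channel after `hours` of operation."""
    p0, p1 = 1.0, 0.0                    # start with both channels healthy
    for _ in range(hours):
        move = p0 * 2 * FAIL_PER_HOUR    # either of the two channels may fail
        p0, p1 = p0 - move, p1 + move    # no repair until the next inspection
    return p1

assert p_one_failed(0) == 0.0
assert 0.019 < p_one_failed(100) < 0.021   # roughly 2% after 100 hours
```

A real despatch analysis would add further states (second failure, hazard) and compare the resulting probabilities against despatchability criteria.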

Fault Management Logic
• Fault-accommodation requirements may use a four-valued logic
  – Working (w), undetected (u), detected (d) and confirmed (c)
  – A table illustrates “logical and” ([.])
  – Used for analysis
[table partially garbled in extraction; legible entries include w.w = w, u.u = u, d.d = d, c.c = c, with mixed operands taking the more severe value]

Example Implementation

    .  | w  d  c
    ---+---------
    w  | w  d  c
    d  | d  d  c
    c  | c  c  c
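One plausible reading of the implementation table is a worst-case join over a severity ordering of fault states. The ordering w < u < d < c is an assumption pieced together from the partially legible tables, not something the slides state outright:

```python
SEVERITY = {"w": 0, "u": 1, "d": 2, "c": 3}   # assumed ordering of fault states

def fault_and(a, b):
    """Four-valued 'logical and' [.]: the result is the more severe operand."""
    return a if SEVERITY[a] >= SEVERITY[b] else b

# Entries legible on the slide
assert fault_and("w", "w") == "w"
assert fault_and("w", "d") == "d"
assert fault_and("d", "c") == "c"
assert fault_and("c", "c") == "c"
```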

State Abstraction
• Some state abstraction is possible
  – Mainly low-level state to operational modes
• Aero engine control
  – Want to produce thrust proportional to demand (thrust-lever angle in the cockpit)
  – Can’t measure thrust directly
  – Can use various “surrogates” for thrust
    • Work with the best value, but have reversionary modes

Thrust Control
• Engine pressure ratio (EPR)
  – Ratio between atmospheric and exhaust pressures
  – Best approximation to thrust
  – Depends on P0
• Low-level state models the “health” of the P0 sensor
  – If P0 fails, revert to using N1 (fan speed)
  – Have control modes
    • EPR, N1, etc., which abstract away from the details of sensor fault state
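The mode-level abstraction can be sketched as a simple selector from sensor health to control mode. The function and the "DEGRADED" fallback are illustrative assumptions, not taken from the slides:

```python
def control_mode(p0_healthy, n1_healthy):
    """Map low-level sensor health onto a system-level control mode."""
    if p0_healthy:
        return "EPR"       # engine pressure ratio: best surrogate for thrust
    if n1_healthy:
        return "N1"        # revert to fan speed if the P0 sensor has failed
    return "DEGRADED"      # hypothetical further reversion

assert control_mode(True, True) == "EPR"
assert control_mode(False, True) == "N1"
```

This is the sense in which state abstraction works here: the rest of the controller sees only the mode, not the individual sensor fault states.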

Summary
• The opportunity for abstraction is much more limited than in “IT” systems
  – Hinders many classical approaches
• Abstraction is of some value
  – Mainly state abstraction, relating low-level state information, e.g. sensor “health”, to system-level control modes
• NB formal refinement, à la B, is helped by this, as there is little data refinement


“Bug Free” Fiction
• Fiction stated
  – Safety critical code must be “bug free”
• Facts
  – It is hard to correlate fault density and failure rate
  – <1 fault per kLoC is pretty good!
  – Being “bug free” is unrealistic, and there is a need to “sentence” faults

Close to Fault Free?
• DO-178A Level 1 software (engine controller)
  – Now would be DAL A
  – Natural language specifications and macro-assembler
  – Over 20,000 hours without hazardous failure
  – But on version 192 (last time I knew)
    • Changes “trims” to reflect hardware properties

Pretty Buggy
• DO-178B Level A software (aircraft system)
  – Natural language, control diagrams and a high-level language
  – 118 “bugs” found in the first 18 months, 20% critical
  – Flight incidents but no accidents
  – Informally “less safe” than the other example, but still flying, still no accidents

Fault Density
• So far as one can get data
  – <1 flaw per kLoC is pretty good for safety critical software
  – Commercial software is much worse, perhaps as high as 30 faults per kLoC
  – Some “extreme” cases
    • Space Shuttle – 0.1 per kLoC
    • Praxis system – 0.04 per kLoC
  – But will a hazardous situation arise?

Faults and Failures
• Why doesn’t software “crash” more often?
  – Paths miss “bugs” as they don’t get critical data
  – Testing “cleans up” common paths
  – Also “subtle faults” which don’t cause a crash
    • NB IBM OS – 1/3 of failures were “3,000-year events”

Commercial Software
[pictures © 3BP.com]
• Examples of data-dependent faults?
  – Loss of availability is acceptable
• Most safety critical systems have to operate through faults
  – Can’t “fail stop” – even reactor protection software needs to run circa 24 hours for heat removal

Retrospective Analysis
• Retrospective analysis of a US civil product for UK military use
  – Analysis of over 500 kLoC, in several languages
  – Found 23 faults per kLoC, 3% safety critical
  – The vast majority were not safety critical
    • NB most of the 3% related to assumptions, i.e. were requirements issues

Find and Fix
• If a fault is found it may not be fixed
  – First it will be “sentenced”
    • If not critical, it probably won’t be fixed
  – Potentially critical faults will be analysed
    • Can it give rise to a problem in practice?
    • If the decision is not to change, document the reasons
  – Note: changes may bring (unknown) faults
    • e.g. Boeing 777 near Perth

Perils of Change
[chart: dependency vs module]

Summary
• Probably no safety critical software is fault free
  – Less than 1 fault per kLoC is good
  – Hard to correlate fault density with failure rate (especially unsafe failures)
• In practice
  – Sentence faults, and change only if there is a net benefit
• Need to show the presence of faults
  – To decide whether to remove them


Summary of the Summaries
• Safety critical software
  – Has a good track record
  – Increased dependency, complexity, etc. mean that this may not continue
• Much of the difficulty is in requirements
  – Partly a systems engineering issue
  – Many of the problems arise from errors in communication
  – Classical CS approaches have limited utility

Research Directions (1)
• Advances may come at the architecture level
  – Improve notations to work at the architecture level and implement via code generation
  – Develop approaches, e.g. good interfaces, product lines, to ease change
  – Focus on V&V, recognising that the aim is fault-finding
• AADL is an interesting development

Research Directions (2)
• Advances may come at the requirements level
  – Work with systems engineering notations
    • Improve them to address the issues needed for software design and assessment, NB PFS
    • Produce better ways of mapping to architecture
    • Try to find ways of modularising, to bound the impact of change, e.g. contracts
  – Focus on V&V, e.g. simulation
• Developments of Parnas/Jackson ideas?

Research Directions (3)
• Work on automation, especially for V&V
  – Design remains creative
  – V&V is 50% of life-cycle cost, and can be automated
  – Examples include
    • Auto-generation of test data and test oracles
    • Model-checking consistency/completeness
• The best way to apply “classical” CS?

Coda
• Safety critical software research
  – Always “playing catch up”
  – Aspirations for applications are growing fast
• To be successful
  – Focus on the “right problems”, i.e. where the difficulties arise in practice
  – If possible work with industry – to try to provide solutions to their problems