Verification Based on Run-Time, Field-Data, and Beyond

Séverine Colin, Laboratoire d’Informatique (LIFC), Université de Franche-Comté-CNRS-INRIA
Leonardo Mariani, Dipartimento di Informatica, Sistemistica e Comunicazione (DISCo), Università di Milano Bicocca
Tope Omitola, Computer Laboratory, University of Cambridge, UK

Outline
• Traditional Run-Time Verification Techniques
  – checking properties on execution data at run-time
• Test and Verification Techniques based on Field-Data
  – gathering execution data to increase effectiveness of (offline) test and verification techniques
• Discussion on Test, Verification and Model-Checking
• Conclusions

Run-Time Verification Techniques
• Basic idea: extract an execution trace from a running program and analyze it to detect errors
• To check classical error patterns (data races, deadlocks)
• To verify a program against a formal specification

Data Race Detection
• Data race: two concurrent threads access a shared variable at the same time, and at least one access is a write
• The Eraser tool dynamically detects data races
• It enforces that every shared variable is protected by some lock
• The Eraser algorithm is used by PathExplorer and Visual Threads
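The locking discipline above can be illustrated with a minimal sketch of basic lockset refinement: each shared variable keeps a set of candidate locks, which is intersected with the locks held at every access, and an empty set signals a possible race. The class and method names (`LocksetMonitor`, `acquire`, `release`, `access`) are hypothetical and are not Eraser's interface. The sketch also leaves out the refinements the real tool adds to suppress false alarms on initialization and read-only sharing.

```python
from collections import defaultdict


class LocksetMonitor:
    """Basic lockset refinement over a stream of lock and access events."""

    def __init__(self):
        self.held = defaultdict(set)   # thread -> locks it currently holds
        self.candidates = {}           # variable -> locks held on every access so far
        self.races = []                # (thread, variable) pairs flagged as possible races

    def acquire(self, thread, lock):
        self.held[thread].add(lock)

    def release(self, thread, lock):
        self.held[thread].discard(lock)

    def access(self, thread, var):
        held_now = set(self.held[thread])
        if var not in self.candidates:
            # First access: every lock held now is a candidate protector.
            self.candidates[var] = held_now
        else:
            # Later accesses: keep only locks held on all accesses.
            self.candidates[var] &= held_now
        if not self.candidates[var]:
            # No single lock protects every access to this variable.
            self.races.append((thread, var))


monitor = LocksetMonitor()
monitor.acquire("t1", "L")
monitor.access("t1", "x")
monitor.release("t1", "L")
monitor.access("t2", "x")   # t2 touches x without holding L
print(monitor.races)        # [('t2', 'x')]
```

The check is independent of the actual thread interleaving: it reports a violated discipline even if the two accesses never overlapped in this particular run.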

Deadlock Detection
• Deadlock: can occur whenever multiple shared resources are required to accomplish a task
• A model of the program is constructed during program execution
• Deadlock: circularity in the dependency graph
• Used by Visual Threads and PathExplorer
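One way to picture the dependency graph is as a lock-order graph: an edge A → B is recorded whenever a thread requests lock B while holding lock A, and a cycle indicates a potential deadlock. The sketch below rests on that assumption. The event format and the function names (`lock_order_graph`, `find_cycle`) are hypothetical and do not describe how Visual Threads or PathExplorer build their models.

```python
from collections import defaultdict


def lock_order_graph(events):
    """Build a graph with an edge A -> B when B is acquired while A is held.

    `events` is a list of (thread, action, lock), action in {"acquire", "release"}.
    """
    held = defaultdict(list)    # thread -> locks currently held, in acquisition order
    graph = defaultdict(set)
    for thread, action, lock in events:
        if action == "acquire":
            for outer in held[thread]:
                graph[outer].add(lock)
            held[thread].append(lock)
        elif action == "release":
            held[thread].remove(lock)
    return graph


def find_cycle(graph):
    """Return one cycle as a list of locks (first == last), or None."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = defaultdict(int)   # every node starts WHITE

    def visit(node, path):
        colour[node] = GREY
        path.append(node)
        for nxt in graph.get(node, ()):
            if colour[nxt] == GREY:              # back edge: cycle found
                return path[path.index(nxt):] + [nxt]
            if colour[nxt] == WHITE:
                cycle = visit(nxt, path)
                if cycle:
                    return cycle
        path.pop()
        colour[node] = BLACK
        return None

    for node in list(graph):
        if colour[node] == WHITE:
            cycle = visit(node, [])
            if cycle:
                return cycle
    return None


trace = [
    ("t1", "acquire", "A"), ("t1", "acquire", "B"),
    ("t1", "release", "B"), ("t1", "release", "A"),
    ("t2", "acquire", "B"), ("t2", "acquire", "A"),
    ("t2", "release", "A"), ("t2", "release", "B"),
]
print(find_cycle(lock_order_graph(trace)))   # ['A', 'B', 'A']
```

The trace above never deadlocks, yet the cycle is still reported: the run-time model exposes an ordering conflict that another schedule could turn into an actual deadlock.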

Monitoring and Checking (MaC)
• System requirements are formalised
• A monitoring script is constructed:
  – to instrument the code
  – to establish a mapping from low-level information into high-level events
• At run-time, generated events are monitored for compliance with the requirements specification

MaC: Events and Conditions
• Events occur instantaneously during the system execution
• Conditions are information that holds for a duration of time
• Three-valued logic: true, false, undefined
• PEDL (Primitive Event Definition Language): language for monitoring scripts
• MEDL (Meta Event Definition Language): language for safety requirements
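The distinction between conditions and events can be sketched in a few lines. Here a condition is sampled over time as `True`, `False` or `None` (standing for undefined), and instantaneous start/end events are derived from its changes. This is not PEDL or MEDL syntax. The connectives follow Kleene's three-valued logic, which is an assumption about how undefined values combine, and the function names are hypothetical.

```python
from typing import List, Optional, Tuple

Tri = Optional[bool]   # True, False, or None for "undefined"


def tri_not(a: Tri) -> Tri:
    return None if a is None else not a


def tri_and(a: Tri, b: Tri) -> Tri:
    # Assumed Kleene semantics: a definite False decides the conjunction.
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def edge_events(name: str, samples: List[Tuple[int, Tri]]) -> List[Tuple[int, str, str]]:
    """Derive instantaneous events from a condition sampled as (time, value)."""
    events = []
    previous: Tri = None
    for time, value in samples:
        if value is True and previous is not True:
            events.append((time, "start", name))   # condition begins to hold
        if previous is True and value is not True:
            events.append((time, "end", name))     # condition stops holding
        previous = value
    return events


samples = [(0, None), (1, True), (2, True), (3, False)]
print(edge_events("door_open", samples))
# [(1, 'start', 'door_open'), (3, 'end', 'door_open')]
```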

PathExplorer (1/2)
• An instrumentation module (using Jtrek): it emits relevant events
• An interaction module: it sends events to the observer module
• An observer module: it verifies the requirements specification

PathExplorer (2/2)
• Requirements are written using past-time LTL (monitoring operators are added: ↑F, ↓F, [F1, F2)s, [F1, F2)w)
• Use the recursive nature of past-time temporal logic: the satisfaction relation for a formula can be calculated along the execution trace by looking only one step backwards (see our paper for the algorithm)
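The one-step-backwards idea can be shown with a minimal sketch, which is not the algorithm from the paper. Formulas are nested tuples, a state is the set of atomic propositions true in it, and the monitor keeps only the subformula values of the previous state. The tuple encoding, the operator names (`"once"`, `"start"`, `"end"`) and the example property are assumptions made for illustration, and the interval operators are omitted.

```python
def step(formula, state, pre):
    """Evaluate `formula` and its subformulas on `state`.

    `pre` holds the subformula values of the previous state (None on the
    first state). Returns the values for the current state.
    """
    now = {}

    def ev(f):
        if f in now:
            return now[f]
        op = f[0]
        if op == "atom":
            value = f[1] in state
        elif op == "not":
            value = not ev(f[1])
        elif op == "or":
            left, right = ev(f[1]), ev(f[2])   # evaluate both sides, no short-circuit
            value = left or right
        elif op == "once":     # F holds now or held at some earlier state
            inner = ev(f[1])
            value = inner or (pre is not None and pre[f])
        elif op == "start":    # ↑F: F holds now and did not hold one step back
            inner = ev(f[1])
            value = inner and pre is not None and not pre[f[1]]
        elif op == "end":      # ↓F: F does not hold now but held one step back
            inner = ev(f[1])
            value = (not inner) and pre is not None and pre[f[1]]
        else:
            raise ValueError(f"unknown operator: {op}")
        now[f] = value
        return value

    ev(formula)
    return now


def monitor(formula, trace):
    """Return the verdict of `formula` after each state of `trace`."""
    pre, verdicts = None, []
    for state in trace:
        pre = step(formula, state, pre)
        verdicts.append(pre[formula])
    return verdicts


# "Whenever alarm starts, request has occurred at some point in the past."
PROPERTY = ("or", ("not", ("start", ("atom", "alarm"))), ("once", ("atom", "request")))

print(monitor(PROPERTY, [set(), {"request"}, {"alarm"}, {"alarm"}]))  # [True, True, True, True]
print(monitor(PROPERTY, [set(), {"alarm"}]))                          # [True, False]
```

Memory use is bounded by the size of the formula, not by the length of the trace, which is what makes this style of monitoring practical at run-time.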

T&V Techniques based on Field-Data
• Field-data: “run-time data collected from the field”
• Why collect field data for Test and Verification?
  – limited knowledge about the final system
    • e.g., software components are usually developed in isolation, assembled with third-party components and, finally, deployed in unknown environments
  – uncertainty of the final environment
    • e.g., in the case of ubiquitous computing, pervasive computing, mobile computing, and wireless networks, it is not possible to predict in advance every possible situation
  – dynamic environments
    • e.g., in the case of mobile code, self-adaptive systems and peer-to-peer systems, resources suddenly appear and disappear

Existing Approaches
• Field-data has been collected for:
  – Evaluating usability of an application (usability testing)
  – Modelling usage of the system
    • which components, modules and functionalities are used?
  – Learning properties of the implementation
  – Modelling program faults
    • which failures have been recognized on the target system?

Evaluating Usability
• Traditionally, data for usability testing has been gathered by running testing sessions
• Novel approaches: silent data-gathering systems
  – Automatic Navigability Testing System (ANTS) [Rod 02]
  – Web Variable Instrumenter Program (WebVIP) [VG]
  – Gamma System [OLHL 02]

Silent Data-Gathering Systems (1/2)
[Figure: architectures of ANTS and WebVIP. Labels recoverable from the figure: ANTS server, data server, upload server, server agent, client-side agent, communication, user’s actions, session file, multimedia content, script.]

Silent Data-Gathering Systems (2/2)
[Figure: the Gamma system; figure appeared in [OLHL 02].]

Modelling Usage of the System (1/2)
• for performing system-specific impact analysis
  – Law and Rothermel’s impact analysis [LR 03]
    • the program is instrumented to produce execution traces representing the procedure-level execution flow, e.g., MBrACDrErrrrx
    • the impacted set for procedure P is computed by selecting procedures that are called by P and procedures that are in the call stack when P returns
  – Orso et al.’s impact analysis [OAH 03]
    • entity-level instrumentation: an execution trace is a sequence of traversed entities
    • a change c on entity e potentially affects all entities of traces containing e
    • the impact set is given by the intersection between the potentially affected entities and the result of a forward slice that uses the variables used in change c as slicing criterion
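The trace-based rule above can be sketched by replaying the trace on an explicit call stack. The sketch reads the example trace as follows, which is an assumption: an uppercase letter is a procedure entry, `r` is a return and `x` is program exit. It follows the rule as summarized on this slide, not the algorithm as published, and the function name `impacted_set` is hypothetical. Including the changed procedure itself in the result is also an assumption.

```python
def impacted_set(trace: str, changed: str) -> set:
    """Procedures potentially impacted by a change to procedure `changed`."""
    impacted = {changed}
    stack = []                      # current call stack, innermost last
    for symbol in trace:
        if symbol == "x":           # program exit
            break
        if symbol == "r":           # return from the innermost procedure
            returning = stack.pop()
            if returning == changed:
                # Everything still on the stack resumes after `changed` returns.
                impacted.update(stack)
        else:                       # procedure entry
            if changed in stack:
                # Entered while `changed` is active: called (transitively) by it.
                impacted.add(symbol)
            stack.append(symbol)
    return impacted


print(sorted(impacted_set("MBrACDrErrrrx", "E")))   # ['A', 'C', 'E', 'M']
print(sorted(impacted_set("MBrACDrErrrrx", "A")))   # ['A', 'C', 'D', 'E', 'M']
```

In both runs procedure B is excluded: it returned before the changed procedure was entered, so this execution gives no evidence that the change can reach it.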

Modelling Usage of the System (2/2)
• Information from impact analysis can be used in regression testing
  – Orso et al.’s regression testing [OAH 03]
    • entity-level instrumentation
    • test suite T’ is initialized with all test cases of the existing test suite T that traverse the change
    • T’ is augmented with test cases covering uncovered impacted entities computed with Orso et al.’s impact analysis technique
    • test suite prioritization is performed by privileging test cases covering more impacted entities
• for increasing confidence in the program
  – Pavlopoulou and Young’s perpetual testing [PY 99]
    • normal executions are considered as tests
    • instrumentation measures statement coverage of uncovered blocks, even in the final environment
    • the program can be iteratively regenerated to reduce instrumentation
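The selection, augmentation and prioritization steps can be sketched over a coverage map from test names to the entities each test traverses. The data layout, the function name `select_and_prioritize` and the sample data are hypothetical. The sketch only draws additional tests from the existing suite and reports impacted entities that no available test reaches, whereas writing new tests for them is outside its scope.

```python
def select_and_prioritize(coverage, changed, impacted):
    """Return (ordered test suite T', impacted entities that remain uncovered).

    coverage: dict mapping test name -> set of traversed entities
    changed:  the changed entity
    impacted: set of entities computed by impact analysis
    """
    # Step 1: start from the tests that traverse the change.
    selected = [t for t, entities in coverage.items() if changed in entities]
    covered = set().union(*(coverage[t] for t in selected))

    # Step 2: add tests that reach impacted entities not yet covered.
    uncovered = impacted - covered
    extra = [t for t in coverage if t not in selected and coverage[t] & uncovered]
    still_uncovered = uncovered - set().union(*(coverage[t] for t in extra))

    # Step 3: run first the tests that cover more impacted entities.
    suite = selected + extra
    suite.sort(key=lambda t: len(coverage[t] & impacted), reverse=True)
    return suite, still_uncovered


coverage = {
    "t1": {"e1", "e2"},
    "t2": {"e3"},
    "t3": {"e2", "e3", "e4"},
    "t4": {"e9"},
}
suite, missing = select_and_prioritize(coverage, "e2", {"e2", "e3", "e4", "e5"})
print(suite)     # ['t3', 't1']
print(missing)   # {'e5'}
```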

Learning Properties (1/2)
• Automatic synthesis of properties/invariants
  – Ernst et al.’s approach [ECGN 01]
    • initially, a large set of invariants is supposed to hold over monitored variables
    • each execution can falsify some invariants; falsified invariants are deleted
    • for each surviving invariant, the probability that it “randomly holds” is computed
    • if this probability is below a given threshold, the invariant is accepted
    • synthesized properties are defined by the set of accepted invariants
• Automatic synthesis of programs
  – Many approaches from machine learning, but they learn very simple functions
  – Lau et al.’s approach [LDW 03]
    • it is still simple, but it learns small computer programs based on accurate execution traces and programming constructs
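The falsify-then-filter loop can be sketched with a handful of candidate invariants over two monitored variables. The candidate set, the function name `infer_invariants` and the sample data are hypothetical. The chance estimate is a deliberately crude stand-in, assuming each observation satisfies an invariant by accident with a fixed probability, and it is not the statistical test used in the cited work.

```python
# Candidate invariants over monitored variables x and y (hypothetical set).
CANDIDATES = {
    "x >= 0": lambda s: s["x"] >= 0,
    "x <= y": lambda s: s["x"] <= s["y"],
    "x == y": lambda s: s["x"] == s["y"],
    "y != 0": lambda s: s["y"] != 0,
}


def infer_invariants(samples, chance_per_sample=0.5, threshold=0.01):
    """Return the names of the invariants that survive and are accepted.

    samples: list of dicts with the observed values of x and y.
    chance_per_sample: ASSUMED probability that one observation satisfies
        an invariant by accident (a simplification).
    """
    alive = dict(CANDIDATES)
    for sample in samples:
        # Delete every invariant falsified by this observation.
        alive = {name: check for name, check in alive.items() if check(sample)}

    # Probability that a surviving invariant held on all samples by chance.
    chance = chance_per_sample ** len(samples)
    return sorted(alive) if chance < threshold else []


observations = [{"x": i, "y": 2 * i + 1} for i in range(8)]
print(infer_invariants(observations))       # ['x <= y', 'x >= 0', 'y != 0']
print(infer_invariants(observations[:3]))   # [] (too few samples to rule out chance)
```

The second call shows why the threshold matters: with three observations nothing has been falsified beyond `x == y`, but the evidence is too thin to accept any invariant.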

Learning Properties (2/2)
• Synthesized properties, invariants and programs can be used to
  – check the implementation with respect to the specification
  – verify safety of updates (in terms of components’ replacements)
    • Ernst et al.’s approach has been used to verify the Pre-cond, Post-cond and Inv corresponding to implemented services when replacing components [ME 03]
  – derive test suites
  – give the programmer confidence in the implementation

Test, Verification and Model-Checking (TVM)
• Evolution of Testing, Model Checking, and Run-Time Verification
• Their advantages and disadvantages
• Future research agenda
• Conclusion

TVM
• It started with “The Software Crisis” [NATO, 1968]
• Led to calls for software “Engineering” [Bauer, 1968]
• Focus on methodology for constructing software (e.g. Structured Programming [Dijkstra, 1969]; Chief Programmer Team [Harlan Mills @ IBM, 1973])

TVM
• Higher-level languages viewed as a panacea (C, Java, ML, Meta-ML)
• Buggy software was still being produced
• Focus shifted to detecting and preventing mistakes during software construction: Testing

TVM - Testing
• Two main approaches to Testing: Reliability Growth Modelling (RGM) and Random Testing
• In RGM, the program is tested, fails, is corrected, and is tested again; this cycle repeats many times
• The MTBF (Mean Time Between Failures) is entered into a mathematical model derived from previous experience

TVM - Testing
• When the model indicates a very long MTBF, we stop testing and ship the product
• Pitfalls of RGM:
  – Very tenuous (weak) link between past development processes and the current one
  – Correction of a bug can introduce new bugs, which reduces dependability

TVM - Testing
• Pitfalls of RGM (continued):
  – Industrial practice found you need extremely large amounts of failure-free testing
  – RGM is therefore not cost-effective
• Random Testing: test cases are selected randomly from a domain of possible inputs
• Advantages of Random Testing over RGM:
  – Random, therefore automatable
  – You are more likely to find errors

TVM - Testing
• Advantages of Random Testing (continued):
  – Random testing draws on tools from information theory to analyse results
• Pitfalls of Random Testing:
  – The distribution of random test cases may not be the same as the real usage of the system
  – Random testing takes no account of program size: a 10-line program is treated the same as a 10,000-line program
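A minimal random-testing harness makes both the appeal and the first pitfall concrete. Inputs are drawn uniformly from a finite domain and the program under test is compared against an oracle. The function names (`random_test`, `buggy_abs`) and the deliberately faulty program are hypothetical, and uniform sampling is an assumption that may not match the real usage profile of a system.

```python
import random


def random_test(candidate, oracle, domain, trials=200, seed=0):
    """Run `candidate` on randomly chosen inputs and return the failing inputs."""
    rng = random.Random(seed)           # seeded so that a run can be repeated
    failures = []
    for _ in range(trials):
        x = rng.choice(domain)          # uniform selection from the input domain
        if candidate(x) != oracle(x):
            failures.append(x)
    return failures


def buggy_abs(x):
    # Deliberately wrong for negative even numbers (hypothetical fault).
    return x if x % 2 == 0 else abs(x)


domain = list(range(-50, 51))
failures = random_test(buggy_abs, abs, domain)
print(sorted(set(failures)))            # the distinct failing inputs that were hit
```

If real users only ever supplied positive inputs, the failures found here would say little about reliability in the field, which is the distribution pitfall listed above.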

TVM - Program Review
• Buggy software was still being produced
• Another panacea tried was Program Review (Software Inspection)
• Depends on humans making the right decisions
• Vulnerable to human error

TVM - Program Proving (Theorem Provers)
• The solution then became Formal Deductive Reasoning: Program Proving
• Automated Theorem Provers (e.g. Isabelle [Camb]) were developed to prove programs
• A main problem with theorem provers is the impracticality of proving all layers of the system, from software programs to hardware to circuits

TVM - Model Checking
• An alternative approach to theorem proving is model checking
• In model checking, the specification for a system is expressed in temporal logic, the system is modelled as a finite state-transition graph, and a model checker checks whether the graph satisfies the temporal logic specification
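For the simplest class of temporal properties, invariants that must hold in every reachable state, the check reduces to a search of the state-transition graph. The sketch below covers only that special case and is not a full temporal-logic model checker. The function name `check_invariant` and the toy transition system are hypothetical.

```python
from collections import deque


def check_invariant(initial, successors, holds):
    """Breadth-first search of the reachable states.

    Returns None if `holds` is true in every reachable state, otherwise a
    shortest path from `initial` to a violating state.
    """
    parent = {initial: None}            # also serves as the visited set
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if not holds(state):
            path = []
            while state is not None:    # walk back to the initial state
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None


# Toy system: a 3-bit counter that can either increment or double.
def successors(s):
    return [(s + 1) % 8, (s * 2) % 8]


print(check_invariant(0, successors, lambda s: s != 6))   # [0, 1, 2, 3, 6]
print(check_invariant(0, successors, lambda s: s < 8))    # None
```

The search terminates only because the state space is finite, and its cost grows with the number of reachable states.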

TVM - Model Checking
• Advantages over theorem provers:
  – Algorithmic: the user need only press a button and wait for the result, whereas with a theorem prover the user may need to direct the prover towards a solution
  – Gives counterexamples if the formula is not satisfied

Model Checking
• Disadvantages of model checking:
  – Computational complexity
  – Some information about the system is lost when a system with an infinite number of states is reduced to a finite number
• There are calls for Run-Time Verification of software

TVM - Run-Time Verification (RTV)
• Some ideas of this were presented above
• Observations of some RTV tools:
  – They are simply debuggers with fancy features
  – Or they provide good tracing mechanisms
• Encouraging observations of RTV tools:
  – Some use LTL (or extensions) to describe the program monitor

TVM - RTV
• Some use LTL as the basis for a Property Specification Language, such as PEDL and MEDL
• May be used as a basis for understanding and for theory

Call to Arms - Future Research Agenda
• We need a Theory of Testing
• Such a theory should integrate the good aspects of testing, model checking, and run-time verification
• I shall mention some approaches (references in our paper)

Some Approaches to Theory of Testing
• Type Systems/Abstract Interpretation
  – Work from compiling and type systems directed towards optimisation of code can provide good information to direct selection of test cases
  – Polymorphism and linearity can help
• Very little work so far on Semantics of Testing (encouraging work from this workshop)

Some Approaches to Theory of Testing
• Developing semantic structures (e.g. of domain) that facilitate testing may be something to look at
• Semantics of A.I. Planning to provide a basis for semantics of run-time verification (ref. in our paper)
• Domain theory in concurrency to provide semantics for distributed system testing (ref. in paper)

Conclusions
• Call to arms for theory builders and tool builders
  – Come up with good theories and better tools
  – Provide tools that software professionals can use to specify, design, build, test, audit, and monitor systems
• Let’s do it!