Catching Bugs in Software
Rajeev Alur
Systems Design Research Lab, University of Pennsylvania
www.cis.upenn.edu/~alur/

Software Reliability
- Software bugs are pervasive
  - Bugs can be expensive
  - Bugs can cost lives
  - Bulk of development cost is in validation, testing, bug fixes
- Old problem that just won’t go away
- Many approaches and decades of research
  - Systematic testing
  - Programming languages technology (e.g. types)
  - Formal methods (specification and verification)
Grand challenge for computer science: Tools for designing “correct” software

[Diagram: a software/model and a correctness specification are given to a Verifier, which answers Yes/proof or No/bug]
- Correctness is formalized as a mathematical claim to be proved or falsified rigorously
  - always with respect to the given specification
- A brief history of formal verification
  1. Structured programs; Hoare logic; 1969
  2. Network protocols; State-space search; 1990
  3. Cache coherency protocols; Symbolic search; 1995
  4. Device drivers; Automated abstraction; 2001

1. Program Verification
- Hoare logic: formalizing correctness of structured programs (late 1960s)
- Typical examples: sorting, graph algorithms
- Specification for sorting
  - Permute(A, B): array B is a permutation of elements in array A
  - Sorted(A): for 0<i<n, A[i]<=A[i+1]
- Function sort is correct if the following holds:
  {True} B := sort(A) {Permute(A, B) & Sorted(B)}
- Provides a calculus for pre/post conditions of structured programs
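
The sorting specification above can be made concrete as executable checks. The sketch below is only a runtime test of the postcondition on one input, not a Hoare-logic proof; the function names `permute`, `is_sorted`, and `satisfies_sort_spec` are illustrative and do not come from the slides.

```python
from collections import Counter


def permute(a, b):
    """Permute(A, B): b holds exactly the elements of a, with multiplicity."""
    return Counter(a) == Counter(b)


def is_sorted(b):
    """Sorted(B): every adjacent pair is in non-decreasing order."""
    return all(b[i] <= b[i + 1] for i in range(len(b) - 1))


def satisfies_sort_spec(sort_fn, a):
    """Check the postcondition of {True} B := sort(A) {Permute & Sorted} on one input."""
    b = sort_fn(list(a))
    return permute(a, b) and is_sorted(b)
```

A candidate that returns its input unchanged fails the `Sorted` half, and one that drops elements fails the `Permute` half, which is why the specification needs both conjuncts.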

Sample Proof: Bubble Sort
Key to proof: Finding suitable loop invariants

BubbleSort (A : array[1..n] of int) {
  B = A : array[1..n] of int;
  for (i=0; i<n; i++) {
    // Invariant: Permute(A, B), Sorted(B[n-i, n]),
    //            for 0<k<=n-i-1 and n-i<=k’<=n: B[k]<=B[k’]
    for (j=0; j<n-i; j++) {
      // Invariant: Permute(A, B), Sorted(B[n-i, n]),
      //            for 0<k<=n-i-1 and n-i<=k’<=n: B[k]<=B[k’],
      //            for 0<k<j: B[k]<=B[j]
      if (B[j]>B[j+1]) swap(B, j, j+1)
    }
  };
  return B;
}
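
One way to get a feel for the loop invariants is to assert them at run time. The following is a minimal sketch that uses 0-based Python lists rather than the slide's array[1..n], so the index bounds are adapted; the assertions only test the invariants on the inputs tried and are not a proof.

```python
from collections import Counter


def bubble_sort(a):
    b = list(a)
    n = len(b)
    for i in range(n):
        # Outer invariant: b is a permutation of a, the last i slots are sorted,
        # and no prefix element exceeds any element of the sorted suffix.
        assert Counter(a) == Counter(b)
        assert all(b[k] <= b[k + 1] for k in range(n - i, n - 1))
        assert all(b[k] <= b[m] for k in range(n - i) for m in range(n - i, n))
        for j in range(n - i - 1):
            # Inner invariant: b[j] is the largest element of b[0..j].
            assert all(b[k] <= b[j] for k in range(j))
            if b[j] > b[j + 1]:
                b[j], b[j + 1] = b[j + 1], b[j]
    return b
```

Weakening any one of the asserted conditions still lets the tests pass, but the inductive argument no longer goes through, which is the sense in which finding suitable invariants is the key step.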

Program Verification
- Powerful mathematical logic (e.g. first-order logic, higher-order logics) needed for formalization
  - Automation extremely difficult
  - Finding proof decomposition requires great expertise
- Alive and well, but not booming
- Contemporary theorem provers: HOL, PVS, ACL2 provide decision procedures and tactics for decomposition
- Main applications: Microprocessor verification, Correctness of JVM…

2. Protocol Analysis
- Automated analysis of finite-state protocols
  - Network protocols, Distributed algorithms
- Great progress in the last 20 years
  - Protocol modeled as communicating finite-state processes
  - Correctness specified using temporal logic
  - Verification performed automatically to reveal errors
  - Highly optimized state-space search techniques
- Model checker SPIN from Bell Labs
  - ACM Software System Award (2001)
  - Success in finding high-quality bugs in real systems (NASA space shuttle, Lucent’s PathStar switch)

Example: X.21 Communication Protocol

State-space Explosion!!
- Analysis is basically a reachability problem in a graph
  - Nodes are states, where each state gives values of all the variables of all the communicating processes
  - An edge represents execution of a single action of one of the processes (asynchronous communication)
- Size of graph grows exponentially in the number of bits required for state encoding, but…
  - Graph is constructed only incrementally, on-the-fly
  - Clever hashing and state compaction techniques
  - Many techniques for exploiting structure: symmetry, data independence, partial order reduction…
  - Millions of states can be explored quickly to reveal bugs
- Great flexibility in modeling
  - Abstract many details, simplify
  - Scale down parameters (buffer size, number of network nodes…)
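
The on-the-fly search described above can be sketched as a breadth-first exploration that generates successors only when a state is first visited and uses a hash table to detect revisits. The two-process protocol in `successors` is a hypothetical toy model, not one from the slides, and none of the optimizations mentioned (state compaction, partial order reduction) are attempted.

```python
from collections import deque


def successors(state):
    """Toy protocol: each process cycles idle -> trying -> critical -> idle.
    A process may enter 'critical' only if the other is not already there."""
    order = {"idle": "trying", "trying": "critical", "critical": "idle"}
    for p in (0, 1):
        nxt = order[state[p]]
        if nxt == "critical" and state[1 - p] == "critical":
            continue  # guard: blocked while the other process is critical
        new = list(state)
        new[p] = nxt
        yield tuple(new)  # one edge = one action of one process


def find_path(initial, is_bad, succ):
    """Return a shortest path to a state satisfying is_bad, or None."""
    parent = {initial: None}  # hash table of visited states
    queue = deque([initial])
    while queue:
        s = queue.popleft()
        if is_bad(s):
            path = []
            while s is not None:
                path.append(s)
                s = parent[s]
            return path[::-1]
        for t in succ(s):
            if t not in parent:
                parent[t] = s
                queue.append(t)
    return None
```

When the search finds a bad state, the returned path is the counterexample trace; when it returns `None`, every reachable state has been visited and the property holds for this model.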

3. Symbolic Model Checking
- Constraint-based analysis of Boolean systems
  - Cache coherency protocols, Memory controllers, …
- Active in the past 12 years
  - Symbolic Boolean representations (propositional formulas, BDDs) used to encode system dynamics
  - Correctness specified using temporal logic CTL
  - Fix-point computation over state sets
  - Highly optimized memory management
- Model checker SMV from CMU
  - ACM Kanellakis Theory and Practice Award (1999)
  - Success in finding high-quality bugs in hardware applications (VHDL/Verilog code)

Cache consistency: Gigamax
Real design of a distributed multiprocessor
[Figure: processors (P) and memories (M) on cluster buses, each cluster attached through a UIC to a global bus]
- Read-shared/read-owned/write-invalid/write-shared/…
- Deadlock found using SMV
- Similar successes: IEEE Futurebus+ standard, IBM/Intel/Motorola…

Symbolic Reachability Problem
- Model variables X = {x1, …, xn}; each variable is of finite type, say, boolean
- Initialization: I(X), a condition over X
- Update: T(X, X’), how new variables X’ are related to old variables X as a result of executing one step of the program
- Target set: F(X)
- Computational problem: Can F be satisfied starting with I by repeatedly applying T? (A graph search problem)

Symbolic Solution
Data type: region to represent state-sets
  R := I(X)
  Repeat
    If R intersects F report “yes”
    else if R contains Post(R) report “no”
    else R := R union Post(R)
Post(R(X)) = (Exists X. R(X) and T(X, X’))[X’ -> X]
Operations needed: union, intersection, test for inclusion/emptiness, projection, renaming
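
The fixpoint loop above can be sketched with any data type that supports union, intersection, and inclusion tests. The version below uses explicit Python sets as the region type purely for illustration; a real symbolic checker would use BDDs or formulas, and `post` here iterates over an explicit transition list instead of doing projection and renaming.

```python
def post(region, transitions):
    """Image of a region: all states reachable in one step of T."""
    return {t for (s, t) in transitions if s in region}


def reachable(init, transitions, target):
    """Can a state in `target` (F) be reached from `init` (I) via T?"""
    r = set(init)
    while True:
        if r & target:          # R intersects F
            return True
        image = post(r, transitions)
        if image <= r:          # R contains Post(R): fixpoint reached
            return False
        r |= image              # R := R union Post(R)
```

Termination follows because `r` grows strictly on every iteration that does not return, and the state space is finite.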

Binary Decision Diagrams
Popular representations for Boolean functions
[Figure: BDD for the function (a and b) or (c and d), with decision nodes a, b, c, d, edges labeled 0/1, and terminals 0 and 1]
- Like a decision graph
  - No redundant nodes
  - No isomorphic subgraphs
  - Variables tested in fixed order
- Key properties:
  - Canonical!
  - Size depends on choice of ordering of variables
  - Operations such as union/intersection are efficient
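
The dependence of BDD size on variable ordering can be seen on the slide's own function. The sketch below builds a reduced ordered BDD by brute-force Shannon expansion, which is exponential in the number of variables and therefore only a teaching device; production BDD packages build diagrams compositionally. The function name `build_bdd` and the node encoding are illustrative choices.

```python
def build_bdd(f, order):
    """Return (root, number of decision nodes) of the reduced ordered BDD of f."""
    unique = {}  # (var, low, high) -> node id; shares isomorphic subgraphs

    def mk(var, low, high):
        if low == high:                      # redundant test: skip the node
            return low
        key = (var, low, high)
        if key not in unique:
            unique[key] = len(unique) + 2    # ids 0 and 1 are the terminals
        return unique[key]

    def build(i, env):
        if i == len(order):
            return int(bool(f(env)))
        var = order[i]
        low = build(i + 1, {**env, var: False})
        high = build(i + 1, {**env, var: True})
        return mk(var, low, high)

    root = build(0, {})
    return root, len(unique)


def slide_function(env):
    return (env["a"] and env["b"]) or (env["c"] and env["d"])
```

With the ordering a, b, c, d the diagram has four decision nodes; interleaving the two conjunctions as a, c, b, d yields six, because the diagram must remember the value of a while it tests c.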

Symbolic Search Techniques
- Size of BDDs can explode during search, and is quite unpredictable
  - Years of research leading to a plethora of heuristics
- Significant industrial interest
  - In-house groups: Cadence, Synopsys, IBM, NEC…
  - Commercial model checkers/verification consultants
- Recent focus: SAT solvers
  - Checking whether F can be reached within k steps can be formulated as satisfiability of a propositional formula with n·k variables
  - Extremely fast solvers such as zChaff (from Princeton) can solve problems with 1000 vars fast!
  - SAT + BDD can be combined to great effect
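
The bounded reachability question can be written as one formula over copies X0, …, Xk of the state variables: I(X0) and T(X0, X1) and … and T(Xk-1, Xk) and (F(X0) or … or F(Xk)). The sketch below decides that formula by enumerating all assignments, which is a deliberately naive stand-in for a SAT solver; the 2-bit counter used to exercise it is a made-up example.

```python
from itertools import product


def bmc(init, trans, target, n, k):
    """Return a witness path of k+1 states if the target is reachable within
    k steps, else None. Each state is a tuple of n booleans."""
    for bits in product([False, True], repeat=n * (k + 1)):
        states = [bits[i * n:(i + 1) * n] for i in range(k + 1)]
        if (init(states[0])
                and all(trans(states[i], states[i + 1]) for i in range(k))
                and any(target(s) for s in states)):
            return states
    return None


def counter_init(s):
    return s == (False, False)


def counter_step(s, t):
    hi, lo = s
    return t == (hi != lo, not lo)   # 2-bit increment: hi' = hi xor lo, lo' = not lo


def counter_full(s):
    return s == (True, True)
```

A satisfying assignment is itself the counterexample trace, which is one reason the SAT formulation is convenient for bug finding.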

4. Software Model Checking via Abstraction
- Can we apply model checking to C programs?
  - SPIN approach is fine for analyzing models, but constructing models is expensive, and models have no relation to code
- Given a program P, build an abstract finite-state (Boolean) model A such that the set of behaviors of P is a subset of those of A (conservative abstraction)
  - Basic ideas around for a while, but all components put together effectively only recently by the Microsoft Research team in the project SLAM
  - Shown to be effective on Windows device drivers, Linux source code (about 10K lines of code)

Program Abstraction
Predicate abstraction with predicates bx: x>0; by: y>0

Concrete program:
  int x, y;
  if x>0 { … y:=x+1 … }
  else   { … y:=x+1 … }

Abstract boolean program:
  bool bx, by;
  if bx { … by:=true … }
  else  { … by:={true, false} … }
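
The abstract assignments on this slide can be recomputed mechanically: for each value of bx, collect every value that by can take after y := x+1. The sketch below does this by enumerating a bounded range of integers, which is an assumption made for illustration; an actual tool would discharge these queries with a theorem prover over all integers.

```python
def abstract_assignment(bx_value, xs=range(-50, 51)):
    """Possible values of by (y > 0) after y := x + 1, given bx (x > 0).
    Assumption: the bounded range xs stands in for a theorem prover."""
    outcomes = set()
    for x in xs:
        if (x > 0) == bx_value:
            y = x + 1
            outcomes.add(y > 0)
    return outcomes
```

When bx is true the only outcome is true, giving `by := true`; when bx is false both outcomes occur (x = 0 gives y = 1, x = -1 gives y = 0), giving the nondeterministic `by := {true, false}`.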

Verification Example
Does this code obey the locking spec?

do {
  KeAcquireSpinLock();
  nPacketsOld = nPackets;
  if(request){
    request = request->Next;
    KeReleaseSpinLock();
    nPackets++;
  }
} while (nPackets != nPacketsOld);
KeReleaseSpinLock();

Specification [state machine]: states Unlocked, Locked, Error. Acq takes Unlocked to Locked and Rel takes Locked to Unlocked; Rel in Unlocked or Acq in Locked leads to Error.

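Initial Abstraction

do {
  KeAcquireSpinLock();
  if(*){
    KeReleaseSpinLock();
  }
} while (*);
KeReleaseSpinLock();

[Annotations: each program point is labeled with the reachable specification states U (Unlocked), L (Locked), E (Error); E is reachable]
Model checking boolean program using BDDs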

Feasibility Analysis
Is error path feasible in C program? Requires theorem prover for constraint propagation

do {
  KeAcquireSpinLock();
  nPacketsOld = nPackets;
  if(request){
    request = request->Next;
    KeReleaseSpinLock();
    nPackets++;
  }
} while (nPackets != nPacketsOld);
KeReleaseSpinLock();

[Annotations: program points along the error path are labeled with specification states U, L, E]

Predicate Discovery
New predicate b : (nPacketsOld == nPackets)
Add new predicate to boolean program (new techniques)

do {
  KeAcquireSpinLock();
  nPacketsOld = nPackets;              b = true;
  if(request){
    request = request->Next;
    KeReleaseSpinLock();
    nPackets++;                        b = b ? false : *;
  }
} while (nPackets != nPacketsOld);     !b
KeReleaseSpinLock();

[Annotations: program points are labeled with specification states U, L, E]

Revised Abstraction
b : (nPacketsOld == nPackets)

do {
  KeAcquireSpinLock();
  b = true;
  if(*){
    KeReleaseSpinLock();
    b = b ? false : *;
  }
} while ( !b );
KeReleaseSpinLock();

[Annotations: program points are labeled with pairs of the predicate value and lock state, such as (b, L), (b, U), (!b, U); E is no longer reachable]
Model checking refined boolean program
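
The effect of adding the predicate b can be reproduced with a small explicit-state search over the boolean program. In the sketch below a state is (program counter, lock held, b); the numbering of program points and the `refined` switch are illustrative conventions. With `refined=False` the loop condition is treated as `*`, which matches the initial abstraction, and with `refined=True` it is `!b`.

```python
def step(state, refined):
    pc, locked, b = state
    if pc == 0:    # KeAcquireSpinLock(): error if already locked
        return ["ERROR"] if locked else [(1, True, b)]
    if pc == 1:    # nPacketsOld = nPackets  becomes  b = true
        return [(2, locked, True)]
    if pc == 2:    # if (*): both branches are possible
        return [(3, locked, b), (5, locked, b)]
    if pc == 3:    # KeReleaseSpinLock(): error if not locked
        return [(4, False, b)] if locked else ["ERROR"]
    if pc == 4:    # nPackets++  becomes  b = b ? false : *
        return [(5, locked, False)] if b else [(5, locked, False), (5, locked, True)]
    if pc == 5:    # loop test: !b when refined, * otherwise
        again = [not b] if refined else [True, False]
        return [(0 if a else 6, locked, b) for a in again]
    if pc == 6:    # final KeReleaseSpinLock()
        return [(7, False, b)] if locked else ["ERROR"]
    return []      # pc == 7: end of program


def error_reachable(refined):
    start = (0, False, False)
    seen, stack = {start}, [start]
    while stack:
        for nxt in step(stack.pop(), refined):
            if nxt == "ERROR":
                return True
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False
```

Without b, the search finds the spurious path that skips the `if` body and loops back to acquire a held lock. With b, that path is excluded because the loop repeats only when b is false, and b is false only after the lock has been released.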

Abstraction Based Techniques
- Tools for verifying source code combine many techniques
  - Program analysis techniques such as slicing
  - Abstraction
  - Model checking
  - Refinement from counter-examples
- New challenges for model checking (beyond finite-state reachability analysis)
  - Recursion gives pushdown control
  - Pointers, dynamic creation of objects, inheritance…
- A very active and emerging research area

Research in Formal Methods
[Diagram: software, model, and correctness specification feed a Verifier that outputs proof or bug; research topics are attached to each part]
- Model: Modeling languages; Hierarchy, recursion; Real-time, Hybrid; Stochastic
- Bridging the gap between software and model: Model extraction; Model-based design: from models to code
- Verifier: Decision procedures; Algorithms engineering; Automated abstraction; Compositional analysis
- Specification: Temporal logics; Automata; From requirements to specs

Current Research Projects
- Foundations
  - Analysis of context-free models
  - Stochastic hybrid systems
  - Decision problems for timed automata
- Algorithms Engineering
  - Combining SAT, BDDs, Abstraction
  - Symbolic solutions to games
- Model-based design
  - From hybrid automata to embedded software
  - From state-machine models to Java Card policies
- Software verification for Java classes

Classical Model Checking
- Both model M and specification S are regular (finite-state)
  - M as a generator of all possible behaviors
  - S as an acceptor of “good” behaviors (verification is language inclusion of M in S) or as an acceptor of “bad” behaviors (verification is checking emptiness of intersection of M and S)
- Typical specifications (using automata or temporal logic)
  - Safety: Always not (both P1 and P2 have write-exclusive copy)
  - Liveness: Always (if P1 requests, eventually it gets response)
- Robustness of theory of regular languages helps in many ways
  - M can be product of several components (closure under intersection)
- For liveness properties, one needs to consider automata over infinite words, but the corresponding theory of omega-regular languages is well developed and well understood
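
For the "bad behaviors" formulation, verification amounts to searching the product of M and S for an accepting state of S. The sketch below covers only the safety case over finite prefixes (liveness would need automata over infinite words, which it does not attempt). The cache-arbiter model, the event labels, and the monitor are hypothetical examples built around the slide's safety property.

```python
def bad_behavior_reachable(model, m_init, monitor, mon_init, mon_bad):
    """True iff some behavior of the model drives the monitor into mon_bad,
    i.e. the intersection of M and the bad-behavior automaton is non-empty."""
    seen = {(m_init, mon_init)}
    stack = [(m_init, mon_init)]
    while stack:
        m, q = stack.pop()
        if q in mon_bad:
            return True
        for label, m_next in model.get(m, []):
            q_next = monitor.get((q, label), q)  # unlisted labels leave q unchanged
            if (m_next, q_next) not in seen:
                seen.add((m_next, q_next))
                stack.append((m_next, q_next))
    return False


# Monitor for "P1 and P2 both hold a write-exclusive copy".
MONITOR = {
    ("none", "grant1"): "p1", ("none", "grant2"): "p2",
    ("p1", "release1"): "none", ("p2", "release2"): "none",
    ("p1", "grant2"): "both", ("p2", "grant1"): "both",
}
GOOD = {
    "idle": [("grant1", "own1"), ("grant2", "own2")],
    "own1": [("release1", "idle")],
    "own2": [("release2", "idle")],
}
# Buggy arbiter: grants P2 while P1 still owns the line.
BUGGY = dict(GOOD, own1=[("release1", "idle"), ("grant2", "own2")])
```

The search visits at most |M| times |S| product states, which reflects the closure of regular languages under intersection noted above.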

Boolean Programs

main() { bool y; … x = P(y); … z = P(x); … }
bool P(u: bool) { … return Q(u); }
bool Q(w: bool) { if … else return P(~w) }

Recursive State Machines
[Figure: components A1, A2, A3, each with entry points and exit points, containing boxes (superstates) that refer to other components]

Model Checking of Recursive Models
- Control flow requires a stack, so model M defines a context-free language
- Algorithms exist for checking regular specifications against context-free models
  - Emptiness of pushdown automata is solvable
  - Intersection (product) of a regular language and a context-free language is context-free
- But checking a context-free spec against a context-free model is undecidable!
  - Context-free languages are not closed under intersection
  - Inclusion as well as emptiness of intersection are undecidable

Are Context-free Specs Interesting?
- Classical Hoare-style pre/post conditions
  - If p holds when procedure A is invoked, q holds upon return
  - Total correctness: every invocation of A terminates
  - Integral part of emerging standard JML
- Stack inspection properties (security/access control)
  - If a variable x is being accessed, procedure A must be in the call stack
- Above requires matching of calls with returns, or finding unmatched calls
  - Recall: Language of words over [, ] such that brackets are well matched is not regular, but context-free
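
The stack inspection property can be checked on a single execution trace by replaying calls and returns against an explicit stack, which shows why a finite-state monitor alone is not enough. The event encoding below, pairs of `("call" | "ret" | "access", name)`, is an assumption made for this sketch.

```python
def access_guarded(trace, guard="A", var="x"):
    """True iff every access to `var` happens while `guard` is on the call stack."""
    stack = []
    for kind, name in trace:
        if kind == "call":
            stack.append(name)
        elif kind == "ret":
            if not stack or stack.pop() != name:
                raise ValueError("return does not match the pending call")
        elif kind == "access" and name == var and guard not in stack:
            return False
    return True
```

This checks one trace only; checking all traces of a recursive program is the model checking problem discussed on the following slides.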

Caret for Context-free Specifications
- Caret: Temporal Logic of Calls and Returns [AEM 03]
  - Context-free extension of Pnueli’s Linear Temporal Logic LTL
  - Allows specification of pre/post conditions
  - Allows specification of stack inspection properties
- Main result: Checking Caret specifications against a context-free model is decidable
  - Polynomial in the size of the model and exponential in the size of the formula (as in the case of classical model checking)
  - Proof technique: Product of pushdown model M and Caret specification S is again a pushdown automaton
  - Key to success: The notion of calls and returns is the same for M as well as S

Caret Definition
Interpreted over “structured” words in which positions are marked with calls { and returns }
[Figure: a structured word over p, q, r with calls and returns, annotated at each position with p’ = Always(p or q) and q’ = Next(q)]
Caret provides classical temporal operators such as Next and Always

Caret Abstract Operators
Abstract versions of operators jump from a call to the matching return
[Figure: a structured word over p, q, r with calls and returns, annotated with p’ = abstract-always(p or q) and q’ = abstract-next(q)]
Sample specification (pre/post): Always( p & call -> abstract-next q )

Visibly Pushdown Languages [AM 03]
- Subclass of context-free languages that is suitable for program analysis / algorithmic verification
- Alphabet is structured: Symbols are tagged with calls and returns
- A visibly pushdown automaton’s moves are constrained by input
  - If current symbol is a call, it must push
  - If current symbol is a return, it must pop
  - Else it can only update control state
- Class of languages defined by these automata is very robust
  - Closed under union, intersection, complement, Kleene-*
  - Emptiness, inclusion, equivalence decidable
  - Alternative characterizations: Embeddings of regular tree languages, Monadic Second Order theory with a binary matching predicate
- Caret is a subset of visibly pushdown languages
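
The input-driven stack discipline can be illustrated with a tiny recognizer in which the symbol alone decides whether to push, pop, or leave the stack untouched. The alphabet below, with two kinds of calls `(` and `[` and their returns, is an illustrative choice; the sketch accepts exactly the well-matched words and is not a general visibly pushdown automaton library.

```python
CALLS = {"(": ")", "[": "]"}     # call symbol -> the return it expects
RETURNS = set(CALLS.values())


def accepts(word):
    """Accept words in which every call is closed by its matching return."""
    stack = []
    for sym in word:
        if sym in CALLS:                 # call: must push
            stack.append(CALLS[sym])
        elif sym in RETURNS:             # return: must pop
            if not stack or stack.pop() != sym:
                return False
        # any other symbol is internal: the stack is not touched
    return not stack
```

Because the push/pop decision depends only on the input symbol, two such automata reading the same word keep their stacks at the same height, which is what makes the product construction behind the closure properties work.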

Synthesis of Behavioral Interfaces
- Behavioral type of a class specifies the allowed sequences of method calls
- Type for a file class may be (open; (read+open)*; close)*
- Can we synthesize this type automatically?
  - Given source code for the class implementation
  - Construct a regular language over the method calls so that a particular exception is never raised
- This is useful for compositional verification also: behavioral interface is a suitable abstraction of the class
- Proposed route (ongoing project)
  - Use abstraction to get a finite-state model
  - Solve a symbolic game to get the most general strategy for invoking methods to keep the abstract model “safe”
  - Extract interface type from the game solution

AbstractList.Itr

public Object next() { … lastRet = cursor++; … }
public Object prev() { … lastRet = cursor; … }
public void remove() { if (lastRet==-1) throw new IllegalExc(); … lastRet = -1; … }
public void add(Object o) { … lastRet = -1; … }

Behavioral Interface
[State diagram: states Start, Safe, Unsafe; Start has outgoing edges labeled next and add; edges labeled next, prev lead to Safe; edges labeled remove, add lead from Safe to Unsafe]
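
A behavioral interface of this kind can be run as a small automaton over method names. The sketch below is one reading of the code shown: the state records only whether lastRet is -1 ("unsafe", where remove would throw) or not ("safe"). It merges Start with Unsafe and ignores list-boundary conditions such as calling next at the end of the list, both of which are simplifying assumptions.

```python
TRANSITIONS = {
    ("unsafe", "next"): "safe",
    ("unsafe", "prev"): "safe",
    ("unsafe", "add"): "unsafe",
    ("safe", "next"): "safe",
    ("safe", "prev"): "safe",
    ("safe", "remove"): "unsafe",
    ("safe", "add"): "unsafe",
}


def allowed(calls, state="unsafe"):
    """True iff the call sequence never invokes remove while lastRet == -1."""
    for call in calls:
        state = TRANSITIONS.get((state, call))
        if state is None:        # no transition: the call would raise
            return False
    return True
```

A client that is checked against this automaton can be verified without looking at the iterator's implementation, which is the compositional use of the interface mentioned earlier.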

Game in Abstracted Program
[Figure: game graph over the abstract program with black and purple states and edges labeled next, prev]
- From black states, Player 0 gets to choose the input method call
- From purple states, Player 1 gets to choose a path in the abstract program till the call returns
- Objective for Player 0: Ensure error states (from which an exception can be raised) are avoided
- Winning strategy: Correct sequences of method calls
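
A safety game like this one can be solved by computing the states from which Player 1 can force an error and taking the complement. The sketch below does this with an explicit fixpoint over a hand-built toy game; the state names are hypothetical, the game is perfect-information, and a Player 0 state with no moves counts as safe.

```python
def player0_winning_region(states, owner, edges, error):
    """States from which Player 0 can avoid `error` forever."""
    losing = set(error)
    changed = True
    while changed:
        changed = False
        for s in states:
            if s in losing:
                continue
            succ = edges.get(s, [])
            if owner[s] == 0:    # Player 0 loses only if every move loses
                bad = bool(succ) and all(t in losing for t in succ)
            else:                # Player 1 needs just one move into losing
                bad = any(t in losing for t in succ)
            if bad:
                losing.add(s)
                changed = True
    return set(states) - losing


# Toy game: "u"/"s" are Player 0 states (lastRet == -1 or not);
# "u.next" etc. are Player 1 states that execute the chosen method.
OWNER = {"u": 0, "s": 0, "u.next": 1, "u.remove": 1,
         "s.next": 1, "s.remove": 1, "err": 1}
EDGES = {
    "u": ["u.next", "u.remove"], "s": ["s.next", "s.remove"],
    "u.next": ["s"], "u.remove": ["err"],
    "s.next": ["s"], "s.remove": ["u"],
}
```

Restricting Player 0 to moves that stay inside the winning region gives the most permissive strategy, and reading off the method names along those moves yields the interface automaton.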

Challenges
- Techniques for generating finite-state abstractions
- How to solve large games symbolically?
  - In fact, a partial information game (Player 0 should choose the next method call based only on values returned so far)
- How to construct an understandable behavioral type from the winning strategy?
- Abstraction refinement
  - If Player 0 does not invoke any method, exceptions can never be raised
  - How to refine the current abstraction based on quality of current behavioral type?
- Integrating all these into a working tool