786d34de227dbabc3477a5cf19fb3040.ppt
- Number of slides: 42
Intelligent automatic test pattern generation for C-based HW/SW co-design descriptions through combined use of concrete and symbolic simulations
Masahiro Fujita, Yoshihisa Kojima
University of Tokyo
May 2, 2008

Background
- In high-level SoC design, system behavior can be described in C-like programming languages
  - Target both hardware and software
  - Tool support is not sufficient
- Difficulties compared with RTL or lower design descriptions
  - Many wide-bit word-level signals (large exploration space)
  - Complicated control flow (many paths)
  - Difficulty in modeling various descriptions
    - SW: pointers, pointer arithmetic, casting, dynamic allocation, recursive calls…
    - HW: concurrency, synchronization, throughput, latency…
- Our goal is to assist test case generation for system-level descriptions in C-like languages
  - Automatic input pattern generation
    - Assertion-based verification to find bugs
    - For higher code coverage that results in higher confidence

Most important issues in debugging
- Generally speaking, counterexamples generated by simulation/emulation are very "long"
  - Could be billions of cycles
  - Not easy at all to understand why the error occurs
- Need much shorter counterexamples just to understand why the bug happens
  - Are those long sequences really necessary?
  - There can be a more direct path
  - Loops can be skipped
- Bounded model checking is based on assertions with "constraints"
  - Bounds cannot be large
  - Can we derive good constraints from the counterexamples found in simulation/emulation?
[Figure: two state-space diagrams, each showing a path from the initial state to the bug state.]

Target language
- SpecC = ANSI-C + mechanisms for HW
  - Structural hierarchy
  - Parallelism
  - Synchronization
- Languages discussed here
  - C language
  - Some additional features
[Figure: SpecC behavior B with ports p1 and p2, channel c1, interfaces, variable (wire) v1, and child behaviors b1 and b2.]

Outline
- Background
- Problem definitions for input pattern generation
- Preliminaries
  - Branch / path / coverage definitions
  - Concrete/symbolic hybrid simulation
    - Concrete simulation, symbolic simulation
    - Hybrid simulation
- Proposed method for branch coverage
- Implementation
- Experimental results
- Conclusion and future work

Requirements for input pattern generation (1)
- For assertion failure detection
  - Given a design description annotated with
    - Input variable definitions
    - Assumptions on input variables as predicates
    - Assertion predicates
  - Possible results
    - Assertion violation (and input value assignments),
    - Assertion holds for all possible input values,
    - Unknown

    int func(int x, int y) {
        int r = 0;
        if (x - y > 0) r = x - y;
        else r = y - x;
        return r;
    }

    int x, y;
    FL_INPUT(x);
    FL_INPUT(y);
    FL_ASSUME(x >= 0);
    FL_ASSUME(y >= 0);
    FL_ASSERT(func(x, y) > 0);

Assertion failure: counterexamples exist, e.g. (x = 0, y = 0), (x = 3, y = 3), ...

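To make the three possible results concrete, the following is a minimal sketch of the simplest possible checker: an exhaustive search over a small bounded input range. The helper name find_counterexample and the bound limit are assumptions introduced here for illustration; they are not part of FLEC or of the FL_* annotations. Within the bound the sketch either reports a violating input or finds none, and finding none says nothing about inputs outside the bound (the "Unknown" result).

```c
#include <assert.h>
#include <stdbool.h>

/* Function under test, as on the slide. */
int func(int x, int y) {
    int r = 0;
    if (x - y > 0) r = x - y;
    else r = y - x;
    return r;
}

/* Hypothetical helper, not a FLEC API: tries every input with
 * 0 <= x, y <= limit (the FL_ASSUME constraints plus an upper bound)
 * and stores the first input that violates func(x, y) > 0. */
bool find_counterexample(int limit, int *cx, int *cy) {
    assert(limit >= 0);
    for (int x = 0; x <= limit; x++) {
        for (int y = 0; y <= limit; y++) {
            if (!(func(x, y) > 0)) {   /* FL_ASSERT would fail here */
                *cx = x;
                *cy = y;
                return true;
            }
        }
    }
    return false;                      /* no violation within the bound */
}
```

Calling find_counterexample(5, &cx, &cy) returns true with cx and cy both 0, which matches the first counterexample listed on the slide.
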
Requirements for input pattern generation (2)
- For branch coverage:
  - Given a design description with annotations and a target branch coverage
  - Generate a set of test cases (input value assignments) to cover branches
  - Tell how to activate as many code fragments as possible (over multiple runs)

    int x, y;
    FL_INPUT(x);
    FL_INPUT(y);
    if (x > 2) { }
    if (y > 2) { }

The test cases (1) (x = 0, y = 0) and (2) (x = 3, y = 3) will achieve 100% branch coverage.

Branch / path definitions
- A (pair of) conditional branch(es):
  - Associated with if, do-while, for, switch-case, and while statements
- A branch is covered when the associated condition has been evaluated as true (or false) at least once (over multiple runs)
[Figure: an if (cond) statement; the then branch has branch condition BC = cond and the else branch has BC = !cond.]

Branch / path definitions
- A path is a sequence of branches taken
- A path condition is defined as the conjunction of all the branch conditions taken
- A false (infeasible) path is a path such that there is no value assignment which satisfies the path condition

    void func(int x, int y) {
        if (x > 2) { } else { }
        if (y > 2) { } else { }
    }

There are 4 paths; the path condition of one of them is (x > 2) AND NOT(y > 2).

    void func(int x, int y) {
        if (x > 2) { }
        if (x < 2) { }
    }

There appear to be 4 paths, but the path condition (x > 2) AND (x < 2) is INFEASIBLE!

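The infeasible-path example can be checked mechanically. The sketch below encodes the path condition of the second function as a predicate and searches a bounded range for a witness. The names path_condition, has_witness, and count_feasible_paths are hypothetical. A bounded search that finds no witness is only evidence, not a proof; here the conclusion is safe because (x > 2) AND (x < 2) is unsatisfiable by inspection.

```c
#include <assert.h>
#include <stdbool.h>

/* Path condition for the second function on the slide:
 *   if (x > 2) { }   if (x < 2) { }
 * then1 and then2 say whether the path enters each if body. */
bool path_condition(int x, bool then1, bool then2) {
    return ((x > 2) == then1) && ((x < 2) == then2);
}

/* Hypothetical bounded check: searches lo..hi for an x that
 * satisfies the path condition. */
bool has_witness(bool then1, bool then2, int lo, int hi) {
    assert(lo <= hi);
    for (int x = lo; x <= hi; x++) {
        if (path_condition(x, then1, then2)) return true;
    }
    return false;
}

/* Counts how many of the 4 syntactic paths have a witness in lo..hi. */
int count_feasible_paths(int lo, int hi) {
    int feasible = 0;
    for (int t1 = 0; t1 <= 1; t1++) {
        for (int t2 = 0; t2 <= 1; t2++) {
            if (has_witness(t1 != 0, t2 != 0, lo, hi)) feasible++;
        }
    }
    return feasible;
}
```

For the range -100..100 this reports 3 feasible paths out of the 4 syntactic ones; the path that enters both if bodies has no witness.
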
Branch / path coverage definitions
- Branch coverage
  - # of branches covered out of # of all branches
- Path coverage
  - # of paths covered out of # of all (or feasible) paths
  - Difficult to use in practice because:
    - The number of feasible paths cannot be known so easily
    - The number of possible paths can be huge
      - Exponential w.r.t. # of if-statements * loop iterations
Example (two consecutive if statements), exercised with 2 runs:
  branch coverage: 4 / (2 + 2) (100%)
  path coverage: 2 / (2 * 2) (50%)

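The two metrics in the example can be reproduced with a small hand-instrumented program. This is a sketch only: the tables branch_hit and path_hit and the function names are assumptions made for illustration, and the two conditions (x > 2) and (y > 2) are borrowed from the earlier branch-coverage slide.

```c
#include <assert.h>
#include <stdbool.h>

static bool branch_hit[4];  /* if #1 true/false, if #2 true/false */
static bool path_hit[4];    /* one entry per combination of the two outcomes */

/* One concrete run of two consecutive if statements, with profiling. */
void run(int x, int y) {
    bool b1 = x > 2;
    bool b2 = y > 2;
    branch_hit[b1 ? 0 : 1] = true;
    branch_hit[b2 ? 2 : 3] = true;
    path_hit[(b1 ? 2 : 0) + (b2 ? 1 : 0)] = true;
}

/* Counts the entries that have been set in a coverage table. */
int count_hits(const bool *table, int n) {
    assert(n >= 0);
    int hits = 0;
    for (int i = 0; i < n; i++) {
        if (table[i]) hits++;
    }
    return hits;
}
```

After run(0, 0) and run(3, 3), count_hits(branch_hit, 4) is 4 (100% branch coverage) while count_hits(path_hit, 4) is 2 (50% path coverage).
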
Traditional (concrete) simulation approach
- Create test cases (input values) by hand
- Or, generate randomly
  - Automated, but maybe difficult to activate the corner cases
  - In system-level descriptions, the search space can be huge (e.g. 32-bit word-level signals)
- Run simulation
  - Very simple, but how long does it take to hit the failure?
- Incomplete: cannot prove the assertion ALWAYS holds
  - Not so easy unless all possible values have been exercised (not practically possible)
  - Confidence (quality of tests): given by coverage metrics, e.g. branch coverage
Example:
  Try (x=3, y=100) => r=97 > 0 OK
  Try (x=1, y=20) => r=19 > 0 OK
  ...
  Try (x=10, y=10) => r=0 > 0 NG! (may eventually happen, but very rarely)

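How rarely random inputs hit the failure can be quantified for the running example. The sketch below counts the violating inputs exhaustively on a small square domain; the helper count_violations is hypothetical. Only inputs with x equal to y violate func(x, y) > 0, so n out of n*n inputs fail, a fraction of 1/n. Extrapolating to non-negative 32-bit int inputs, a uniformly random trial would fail with probability of about 1 in 2^31.

```c
#include <assert.h>

/* Function under test, as on the earlier slide. */
int func(int x, int y) {
    int r = 0;
    if (x - y > 0) r = x - y;
    else r = y - x;
    return r;
}

/* Hypothetical helper: counts the inputs in [0, n) x [0, n) that
 * violate the asserted property func(x, y) > 0. */
long count_violations(int n) {
    assert(n > 0 && n <= 10000);   /* keep the exhaustive loop cheap */
    long bad = 0;
    for (int x = 0; x < n; x++) {
        for (int y = 0; y < n; y++) {
            if (!(func(x, y) > 0)) bad++;
        }
    }
    return bad;
}
```

For n = 100 the function returns 100, that is, 1% of the 10,000 inputs.
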
Formal approach
- Build the formal expressions and mathematically solve the constraints
  - Precise & complete
  - Computationally expensive
- Word-level approach: symbolic simulation
  - Evaluates values as symbolic expressions instead of concrete values

Symbolic Simulation
- Needs to enumerate all the paths
  - Sometimes the path can be infeasible (false-path problem)

    int func(int x, int y) {
        int r = 0;
        if (x - y > 0) r = x - y;   /* path 1 */
        else r = y - x;             /* path 2 */
        return r;
    }

Enumerates possible paths (including infeasible ones):
  Path 1: (r_1 = 0) (x - y > 0) (r_2 = x - y); (x >= 0) (y >= 0) -> (r_2 > 0) is VALID for all x, y
  Path 2: (r_1 = 0) NOT(x - y > 0) (r_2 = y - x); (x >= 0) (y >= 0) -> (r_2 > 0) is INVALID
    Counterexample: (y - x = 0) (some of them may be reported)

Symbolic simulation (cont'd)
- Employs an SMT (satisfiability modulo theories) solver
  - To solve path conditions
  - To evaluate assertions
- For each path:
  - One symbolic simulation on a path corresponds to concrete simulations of all possible values on that path
- Limitations:
  - # of paths (including false paths)
  - Size of symbolic expressions
  - Solver capability (non-linear algebra)
  - How to model complicated descriptions
- May not be applied straightforwardly to complex / large descriptions

Concrete-symbolic hybrid approach
- Combines concrete simulation and symbolic simulation (originally proposed by Larson [5])
- CUTE [11] is proposed for unit testing
  - Exhaustive traversal on all paths
  - Concrete run guides the path for symbolic simulation (initially random simulation)
  - Symbolic run on that path derives the path condition
  - Use concrete values for approximation if the constraints cannot be processed (e.g. non-linear)
  - Solve the constraints to guide the path to another
    - Negate some path-condition term to take another branch

Concolic Simulation (1st): initially random

    void test(int x, int y, int z) {
        if (x > 3)                  // B1
            if (y > 11)             // B2
                if (z == y*y)       // B3
                    if (x < 5)      // B4
                        reach_me();
    }

Goal: find the inputs to reach_me()
Concrete states: x=0, y=0, z=0; (0 > 3)? -> no!
Symbolic states: x=i1, y=i2, z=i3; (i1 > 3)?
Path condition: (i1 <= 3)
Negate this condition and solve to take the THEN branch at B1.

Concolic Simulation (2nd)

    void test(int x, int y, int z) {
        if (x > 3)                  // B1
            if (y > 11)             // B2
                if (z == y*y)       // B3
                    if (x < 5)      // B4
                        reach_me();
    }

Goal: find the inputs to reach_me()
Concrete states: x=10, y=0, z=0; (10 > 3), (0 > 11)? -> no!
Symbolic states: x=i1, y=i2, z=i3; (x > 3), (y <= 11)
Path condition: (i1 > 3) (i2 <= 11)
Negate the last condition and solve to take the THEN branch at B2.

Concolic Simulation (3rd)

    void test(int x, int y, int z) {
        if (x > 3)                  // B1
            if (y > 11)             // B2
                if (z == y*y)       // B3
                    if (x < 5)      // B4
                        reach_me();
    }

Goal: find the inputs to reach_me()
Concrete states: x=10, y=20, z=0; (10 > 3), (20 > 11), (0 == 400)? -> no!
Symbolic states: x=i1, y=i2, z=i3; (x > 3), (y > 11), (z == y*y)
Path condition: (i1 > 3) (i2 > 11) (i3 != 400)
  The non-linear term i2*i2 is replaced by its concrete value 400.
Negate the last condition and solve to take the THEN branch at B3.

Concolic Simulation (4th)

    void test(int x, int y, int z) {
        if (x > 3)                  // B1
            if (y > 11)             // B2
                if (z == y*y)       // B3
                    if (x < 5)      // B4
                        reach_me();
    }

Goal: find the inputs to reach_me()
Concrete states: x=10, y=20, z=400; (10 > 3), (20 > 11), (400 == 400), (10 < 5)? -> no!
Symbolic states: x=i1, y=i2, z=i3; (x > 3), (y > 11), (z == 400), (x >= 5)
Path condition: (i1 > 3) (i2 > 11) (i3 == 400) (i1 >= 5)
Negate the last condition and solve to take the THEN branch at B4.

Concolic Simulation (5th)

    void test(int x, int y, int z) {
        if (x > 3)                  // B1
            if (y > 11)             // B2
                if (z == y*y)       // B3
                    if (x < 5)      // B4
                        reach_me();
    }

Goal: find the inputs to reach_me()
Concrete states: x=4, y=20, z=400; (4 > 3), (20 > 11), (400 == 400), (4 < 5)
Symbolic states: x=i1, y=i2, z=i3; (x > 3), (y > 11), (z == 400), (x < 5)
Path condition: (i1 > 3) (i2 > 11) (i3 == 400) (i1 < 5)
Reached successfully!

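The walkthrough above can be sketched as a loop: run concretely, record the branch outcomes, negate the last one, solve, and repeat. In the sketch below the SMT solver is replaced by a brute-force search over a small input range, which is an assumption made purely to keep the example self-contained; the names Input, run, solve_flip, and concolic are hypothetical. Because the brute-force search evaluates z == y*y directly, the sketch does not need the concrete-value substitution shown in the 3rd step.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct { int x, y, z; } Input;

/* Instrumented copy of test(): records the outcomes of B1..B4 in order and
 * returns how many branches were evaluated before the run left the nest. */
int run(Input in, bool taken[4]) {
    int n = 0;
    taken[n++] = (in.x > 3);              /* B1 */
    if (!taken[0]) return n;
    taken[n++] = (in.y > 11);             /* B2 */
    if (!taken[1]) return n;
    taken[n++] = (in.z == in.y * in.y);   /* B3 */
    if (!taken[2]) return n;
    taken[n++] = (in.x < 5);              /* B4 */
    return n;
}

/* Brute-force stand-in for the SMT solver (an assumption of this sketch):
 * finds an input that repeats the first k-1 recorded outcomes and takes
 * the opposite side at the k-th branch. */
bool solve_flip(const bool cur[4], int k, Input *out) {
    assert(k >= 1 && k <= 4);
    bool t[4];
    for (int x = 0; x < 16; x++) {
        for (int y = 0; y < 32; y++) {
            for (int z = 0; z < 1024; z++) {
                Input c = { x, y, z };
                if (run(c, t) < k) continue;
                bool ok = (t[k - 1] != cur[k - 1]);
                for (int i = 0; ok && i < k - 1; i++) ok = (t[i] == cur[i]);
                if (ok) { *out = c; return true; }
            }
        }
    }
    return false;
}

/* Concolic loop. Returns the number of concrete runs needed to reach
 * reach_me(), or -1 if the search gives up. */
int concolic(Input *in) {
    bool taken[4];
    for (int runs = 1; runs <= 16; runs++) {
        int n = run(*in, taken);
        if (n == 4 && taken[3]) return runs;   /* B1..B4 all took THEN */
        if (!solve_flip(taken, n, in)) return -1;
    }
    return -1;
}
```

Starting from (0, 0, 0), this sketch reaches reach_me() on the 4th run with (x=4, y=12, z=144). It needs one run fewer than the slides because its search happens to return x=4 at B1, whereas the solver in the walkthrough returned x=10 and had to revisit x at B4.
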
Concolic approach
- Can be applied to work around non-linear constraints
- Can be used to enumerate the paths
  - Good for path coverage
- Can be used to guide the path
  - But CUTE does not think about which path should be tried next
    - As CUTE's strategy is exhaustive
  - May not terminate if # of paths is huge

Proposed method
- Flip a branch condition on a path only when not covered yet
  - Gives the priority for path enumeration
  - Skips the uncovered paths that do not contribute to the branch coverage
  - Terminates when the target coverage is achieved
  - Tries to avoid enumerating all the paths
- Not guaranteed to cover all possible branches
  - Derived alternative paths may not be feasible
  - Worst case: all paths need to be enumerated
  - Also limited by the solver's capability (i.e. path condition may not be solved)

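The selection rule can be sketched as a small function over an executed path and a coverage table. Everything named here (Step, covered, mark_path, pick_flip) is hypothetical, and scanning from the deepest step first is an assumption of this sketch; the slides do not state the order in which candidate branches are considered.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct { int branch; bool taken; } Step;   /* one executed branch */

enum { MAX_BRANCHES = 16 };
static bool covered[MAX_BRANCHES][2];   /* [branch][outcome] seen so far */

/* Records every branch outcome on an executed path. */
void mark_path(const Step *path, int len) {
    for (int i = 0; i < len; i++) {
        assert(path[i].branch >= 0 && path[i].branch < MAX_BRANCHES);
        covered[path[i].branch][path[i].taken ? 1 : 0] = true;
    }
}

/* Returns the index of a step whose opposite outcome is still uncovered,
 * or -1 when flipping any step on this path would add no branch coverage. */
int pick_flip(const Step *path, int len) {
    for (int i = len - 1; i >= 0; i--) {
        if (!covered[path[i].branch][path[i].taken ? 0 : 1]) return i;
    }
    return -1;
}
```

On a path that goes around a loop many times, the loop's branches appear repeatedly, but each of them is proposed for flipping at most until both of its outcomes are covered. This is what lets the search skip most of the syntactic paths.
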
Our implementation
- Implemented on FLEC (our C equivalence checker)
  - Used as SpecC [3] frontend
  - Control/data/communication/… dependencies have been extracted
- AST interpreter
  - Evaluates AST nodes (expression / statement) one by one
    - Cf. CUTE: instrument & compile
    - We can start from any point in the program!
  - Concrete simulator evaluates with concrete values
  - Symbolic simulator evaluates with symbolic expressions
  - Branch/path coverage profiler
  - Input pattern generator
    - For alternative path
    - For assertion failure
- SMT solver: CVC3 [12]
  - To generate input patterns
  - To evaluate assertions
  - Cf. CUTE: lpsolve

Experimental results (1/3)

    int func(int x, int y) {
        int r = 0;
        if (x - y > 0)
            r = x - y;
        else
            r = y - x;
        return r;
    }
    void main() {
        int x, y;
        FL_INPUT(x);
        FL_INPUT(y);
        FL_ASSUME(x >= 0);
        FL_ASSUME(y >= 0);
        FL_ASSERT(func(x, y) > 0);
    }

- Simple example
  - Achieved 2 / 2 (100%) branch coverage with 2 runs
  - Detected assertion failure with (x=0, y=0)

Experimental results (2/3)

    /* Factorial with recursive function calls */
    unsigned int fact_rec(unsigned int s) {
        if (s <= 1) {
            return 1;
        } else {
            unsigned int t;
            t = s * fact_rec(s - 1);
            return t;
        }
    }
    /* Factorial with a for-loop */
    unsigned int fact_for(unsigned int s) {
        unsigned int i;
        unsigned int p;
        p = 1;
        for (i = 1; i <= s; i++) {
            p *= i;
        }
        return p;
    }
    void main() {
        int i, o1, o2;
        FL_INPUT(i);
        FL_ASSUME(i <= 10);
        o1 = fact_for(i);
        o2 = fact_rec(i);
        FL_ASSERT(o1 == o2);
    }

- Calculate factorial with two implementations
  - With recursive function calls
  - With for-loop
- Validated for one path (i = 8)
  - Achieved 4/4 (100%) branch coverage with 1 run

Experimental results (3/3)

    int f(int x, int y, int z) {
        int p;
        if (x+y+z == 6)
            if (2*x+7*y+3*z == 25)
                if (-4*x-2*y+2*z == -2)
                    FL_ASSERT(0);
        for (p = 0; p < 100; p++) {
            if (p == z) {
            }
        }
    }
    void main() {
        int x, y, z;
        FL_INPUT(x);
        FL_INPUT(y);
        FL_INPUT(z);
        f(x, y, z);
    }

- # of branches: 10
- # of paths: 4 * 2^100
- Achieved 10 / 10 (100%) branch coverage with 5 runs
- Detected assertion failure with (x=1, y=2, z=3)
- CUTE got stuck due to too many paths

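The reported failing input can be cross-checked by solving the three nested conditions as a linear system. The sketch below uses Cramer's rule on integers; the function names det3, check_solution, and solve3 are hypothetical, and this only illustrates why a linear-arithmetic solver finds the input directly. It is not a description of how CVC3 works internally. The coefficient determinant is 28, so the solution is unique: exactly one input triple reaches FL_ASSERT(0), which is why random simulation is unlikely to hit it.

```c
#include <assert.h>
#include <stdbool.h>

/* Determinant of a 3x3 integer matrix. */
long det3(long m[3][3]) {
    return m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
         - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
         + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
}

/* Post-condition check: every row equation holds for the solution v. */
void check_solution(long a[3][3], long b[3], long v[3]) {
    for (int r = 0; r < 3; r++) {
        assert(a[r][0] * v[0] + a[r][1] * v[1] + a[r][2] * v[2] == b[r]);
    }
}

/* Solves a * v = b by Cramer's rule. Returns true only when the system
 * has a unique solution and that solution is integral. */
bool solve3(long a[3][3], long b[3], long v[3]) {
    long d = det3(a);
    if (d == 0) return false;
    for (int col = 0; col < 3; col++) {
        long m[3][3];
        for (int r = 0; r < 3; r++) {
            for (int c = 0; c < 3; c++) {
                m[r][c] = (c == col) ? b[r] : a[r][c];
            }
        }
        long dc = det3(m);
        if (dc % d != 0) return false;
        v[col] = dc / d;
    }
    check_solution(a, b, v);
    return true;
}
```

With the coefficients {1, 1, 1}, {2, 7, 3}, {-4, -2, 2} and right-hand side {6, 25, -2} taken from the three if conditions, solve3 yields (1, 2, 3), matching the reported input.
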
Elevator controller profile
- Elevator controller (abstracted model)
  - Cycle-based behavior
  - Simple, but designed by a real engineer
    - There is an unintended bug
- Inputs:
  - 3 floors
    - Up request buttons on 1F and 2F
    - Down request buttons on 2F and 3F
  - 1 cabin
    - 3 buttons for floor stop request
    - 2 buttons for door open / close
- Outputs:
  - Up, Down request status
  - Floor stop request status
  - Door open/close
  - Cabin vertical speed (0: stopped, +1: up, -1: down)
  - Cabin position (on 1F, b/w 1F and 2F, on 2F, b/w 2F and 3F, on 3F)
  - Service direction (0: none, +1: up, -1: down)
[Figure: elevator with floors 1F, 2F, and 3F, and a cabin panel with buttons 1F, 2F, 3F, open, and close.]

Elevator controller profile (cont'd)
- State variables:
  - Up/Down request status (2+2)
  - Floor stop request status (3)
  - Door status (1)
  - Cabin position (on 1F, b/w 1F and 2F, on 2F, b/w 2F and 3F, on 3F)
  - Cabin speed (0: stopped, +1: up, -1: down)
  - Service direction (0: none, +1: up, -1: down)
  - 2^8 * 5 * 3 * 3 = 11,520 (about 11.5k) states (including infeasible ones)
  - Initially stopped on 1F, door closed, no request active
- Original code: 396 lines in SpecC
  - 145 million paths (including infeasible)
- Replaced if-then-else & switch-case statements with conditional (cond ? true : false) expressions
  - To handle multiple paths at once
  - Simple control flow (straight line), but very complex data flow
  - Reduced to 155 lines

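The rewrite from branching statements to conditional expressions can be illustrated on a toy function. The speed-selection logic below is invented for illustration and is not taken from the elevator controller source, which the slides do not show. The branching version has three paths, while the expression version has a single straight-line path whose result is one nested conditional term.

```c
#include <assert.h>

/* Hypothetical speed selection written with control flow: three paths. */
int next_speed_branching(int position, int target) {
    int speed;
    if (target > position) speed = 1;
    else if (target < position) speed = -1;
    else speed = 0;
    return speed;
}

/* The same function as one straight-line conditional expression. */
int next_speed_ite(int position, int target) {
    return (target > position) ? 1 : ((target < position) ? -1 : 0);
}

/* Checks that both versions agree on all positions 0..4. */
void check_equivalence(void) {
    for (int p = 0; p <= 4; p++) {
        for (int t = 0; t <= 4; t++) {
            assert(next_speed_branching(p, t) == next_speed_ite(p, t));
        }
    }
}
```

A symbolic simulator then evaluates one expression instead of enumerating three paths, at the price of a larger expression. That is the trade-off noted on the slide: simple control flow, but complex data flow.
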
Elevator controller profile (cont'd)
- Property examples
  - Elevator must be on or between 1F and 3F
      ASSERT((out_position >= 0) && (out_position <= 4));
  - Door opens only when the elevator is stopped on one of 1F, 2F, and 3F
      ASSERT(!out_door || ((out_speed == 0) && ((out_position == 0) || (out_position == 2) || (out_position == 4))));

Symbolic simulation result
- Symbolic expression explodes in 3-4 cycles of symbolic simulation
  - With constant propagation/substitution
  - With simplifications for ITE, AND, OR, and other operators
  - Without concrete-value substitution (approximation)
  - Without common sub-expression sharing
- # of cycles of symbolic simulation must be highly bounded!
[Figure: expression size over simulation cycles, from the reset sequence through the beginning of symbolic simulation, growing to 300k nodes and more.]

User guided simulation
- Starts symbolic simulation from the state specified by the user
  - Explore with respect to the states of user's interest
    - Some of the states (proved to be) reachable by concrete (random) simulation
    - Jump into the states (which may or may not be feasible)
      - Will need to check its feasibility later
[Figure: state space with the initial states. Concrete simulation reaches a state from which cycle-bounded symbolic simulation starts; symbolic simulation started from a state whose paths from the initial states are unknown might be infeasible.]

User guided result (1)
- Try to generate the input pattern to make a situation where
  - Located on 2F
  - Speed = -1 (down)
  - (not a bug)
- I.e. to violate ASSERT(!((out_speed == -1) && (out_position == 2)))
- This state is out of bound from the initial state (stopped on 1F)
  - Need more than 3 cycles for the elevator to accept a request on 1F, start moving, go up at least to 2F, and go down…

User guided result (1) (cont'd)
- So let's jump into one of the feasible states
  - state_position = 4, state_door = false, state_speed = 0 …
  - Known a priori as a reachable state by random simulation
- Found one of the input patterns to violate the assertion @ cycle 5 (3rd cycle of symbolic sim.)
  - Up request on 1F @ cycle 1 = true
  - Up request on 2F @ cycle 1 = false
  - Down request on 2F @ cycle 1 = false
  - Stop on 1F request @ cycle 1 = false
  - Stop on 2F request @ cycle 1 = false

User guided result (2)
- Try to violate the assertion
  - Elevator must be on or between 1F and 3F
      ASSERT((out_position >= 0) && (out_position <= 4));
- Let's jump into one of the states
  - state_position = 4 (on 3F)
  - state_speed = +1 (up)
  - The next state goes into
    - out_position = 5 (higher than 3F!)
    - And violates the assertion!
- However, the state (state_position = 4, state_speed = +1) is actually infeasible
  - A wrong assumption may lead to a wrong conclusion
  - The feasibility of the originating state should be verified in some way

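The spurious violation can be reproduced on a toy model. The one-cycle update below (position plus speed) is a hypothetical stand-in, since the controller code is not shown in the slides; only the range property is taken from the deck. Starting from the infeasible state (position 4, speed +1), the toy model steps to position 5 and the property fails, although no run from the real initial state would ever reach that starting point.

```c
#include <assert.h>
#include <stdbool.h>

/* Property from the slide: position stays within 0 (1F) .. 4 (3F). */
bool position_in_range(int out_position) {
    return (out_position >= 0) && (out_position <= 4);
}

/* Hypothetical one-cycle position update (an assumption of this sketch;
 * the real controller logic is not shown in the slides). */
int step_position(int state_position, int state_speed) {
    assert(state_speed >= -1 && state_speed <= 1);
    return state_position + state_speed;
}
```

Here position_in_range(step_position(4, 1)) is false, while the same check from (4, 0) or (3, 1) holds. A reported violation is therefore only meaningful once the starting state has been shown to be reachable.
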
Conclusion & Future work
- Conclusion
  - Implemented concrete/symbolic hybrid simulator based on AST interpreter
  - Proposed a method for input pattern generation for branch coverage
  - Experimental results demonstrate the input pattern generation
    - For assertion failure detection
    - For better branch coverage
- Future work
  - Capability to cover the specified target branch
  - Handling of concurrent executions
  - Hybrid simulation heuristic tuning
  - Efficient management of symbolic expressions

References
- [3] D. D. Gajski, J. Zhu, R. Domer, A. Gerstlauer, and S. Zhao. SpecC: Specification Language and Methodology. Kluwer Academic Publishers, 2000.
- [5] E. Larson and T. Austin. High coverage detection of input-related security faults. In SSYM'03: Proc. of the 12th Conf. on USENIX Security Symposium, 2003.
- [11] K. Sen, D. Marinov, and G. Agha. CUTE: a concolic unit testing engine for C. In Proc. of ESEC/SIGSOFT FSE-13, 2005.
- [12] A. Stump, C. Barrett, and D. Dill. CVC: a cooperating validity checker. In 14th Int'l Conf. on Computer-Aided Verification, 2002.

Difficulty compared with RTL or lower
- In traditional methodology for RTL or gate-level
  - Word signals are converted into bit-vectors
  - Then, solved with Boolean algebra
  - Efficient algorithms available: SAT, BDDs…
- In system-level descriptions
  - Too many word signals, too wide words (32 bit / 64 bit)
    - Too wide space to explore
  - Complicated control flow
    - Data flow dynamically changes depending on the path
    - Control conditions are complex
    - Too many paths

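The conversion of word signals into bit-vectors can be illustrated with an adder. The sketch below builds an 8-bit addition from single-bit Boolean operations only (a ripple-carry adder), which is the kind of encoding a SAT- or BDD-based flow works on. The function names are hypothetical. Each extra bit of width adds another stage of Boolean logic, and wide multipliers grow much faster still, which is one reason 32-bit and 64-bit system-level descriptions are hard to handle this way.

```c
#include <assert.h>
#include <stdint.h>

/* Adds two 8-bit words using only single-bit Boolean operations. */
uint8_t add8_bitlevel(uint8_t a, uint8_t b) {
    unsigned sum = 0, carry = 0;
    for (int i = 0; i < 8; i++) {
        unsigned ai = (a >> i) & 1u;
        unsigned bi = (b >> i) & 1u;
        unsigned si = ai ^ bi ^ carry;                /* sum bit */
        carry = (ai & bi) | (carry & (ai ^ bi));      /* carry out */
        sum |= si << i;
    }
    return (uint8_t)sum;                              /* final carry dropped */
}

/* Exhaustively compares the bit-level adder with word-level addition. */
void check_add8(void) {
    for (unsigned a = 0; a < 256; a++) {
        for (unsigned b = 0; b < 256; b++) {
            assert(add8_bitlevel((uint8_t)a, (uint8_t)b) == (uint8_t)(a + b));
        }
    }
}
```

For example, add8_bitlevel(200, 100) wraps around to 44, the same as 8-bit word-level addition.
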
Difficulty compared with RTL or lower (cont'd)
- In system-level descriptions
  - To model software
    - Recursive calls, pointer arithmetic, typecasting, dynamic allocations…
  - To model hardware
    - Concurrency, synchronization, throughput, latency…
- As word-level solvers, SMT solvers can be employed, but with limited capability
  - Usually up to linear algebra
  - Need approximation / workaround, otherwise it would not work!