
Indexing Noncrashing Failures: A Dynamic Program Slicing-Based Approach

Chao Liu, Xiangyu Zhang, Jiawei Han, Yu Zhang, Bharat K. Bhargava
University of Illinois at Urbana-Champaign and Purdue University

Supported by NSF 0242840, 0219110
02/13/2007

Overview

- Problem: automatically cluster program failures that are due to the same bug
- Solution: measure the similarity between the dynamic slices of the program failures

Outline

- Motivation
- Failure Indexing in Formulation
- Dynamic Slicing-Based Failure Indexing
- Experiments
- Conclusion

Automated Failure Reporting

- End-users as beta testers
  - Valuable information about failure occurrences in the field
  - 24.5 million reports per day in Redmond, if all users sent them (John Dvorak, PC Magazine)
- Widely adopted because of its usefulness
  - Microsoft Windows, Linux Gentoo, Mozilla applications, ...
  - Any application can implement this functionality

Failure Report

- Automatic reports (Windows/Mozilla)
  - Application name and version (e.g., winword.exe)
  - Module name and version (e.g., mso.dll)
  - Offset into the module (e.g., 00003cbb)
  - Calling context
- Manual reports (Bugzilla)
  - Textual description of the symptoms
  - Failure-inducing input

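As a concrete illustration, an automatic report of this kind can be modeled as a small record. The field names below are our own sketch, not the actual schema used by Windows Error Reporting or Mozilla:

```python
from dataclasses import dataclass, field

@dataclass
class FailureReport:
    """Illustrative automatic failure report; all field names are hypothetical."""
    app_name: str                 # e.g., "winword.exe"
    app_version: str
    module_name: str              # e.g., "mso.dll"
    module_version: str
    offset: int                   # offset of the faulting location into the module
    calling_context: list = field(default_factory=list)  # call stack at failure time

report = FailureReport(
    app_name="winword.exe", app_version="11.0",
    module_name="mso.dll", module_version="11.0",
    offset=0x00003CBB,
    calling_context=["main", "DocOpen", "ParseStream"],
)
```
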
After Failures Are Collected ...

- Failure triage
  - Failure prioritization: which are the most severe bugs? The worst 1% of bugs account for 50% of failures
  - Duplicate failure removal: the same failure can be reported multiple times
  - Patch suggestion: automatically locate a patch by querying the patch database with the reported failure

A Solution: Failure Indexing

- Cluster failure reports that may correspond to the same fault

[Figure: scatter plot of failure reports in a 2-D space; the reports form clusters ranging from least severe to most severe.]

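To make "indexing" concrete: given pairwise dissimilarities between failure reports (formalized as a proximity matrix later in the talk), clustering can be as simple as grouping reports whose mutual distance falls below a threshold. The sketch below is our illustration only, not the mechanism of any deployed reporting system:

```python
def cluster_by_threshold(M, eps):
    """Group items whose pairwise dissimilarity is below eps (single-linkage flood fill).

    M is a symmetric proximity matrix of dissimilarities; returns one cluster
    label per item. Illustrative only: real failure-indexing systems would use
    more robust clustering.
    """
    n = len(M)
    labels = [-1] * n
    current = 0
    for i in range(n):
        if labels[i] != -1:
            continue
        stack, labels[i] = [i], current
        while stack:                      # flood-fill everything within eps
            u = stack.pop()
            for v in range(n):
                if labels[v] == -1 and M[u][v] < eps:
                    labels[v] = current
                    stack.append(v)
        current += 1
    return labels
```
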
Current Status of Failure Indexing

- Great success in indexing crashing failures
  - The same crashing venue likely implies the same failure
  - E.g., Microsoft Dr. Watson System, Mozilla Quality Feedback Agent, ...
- Elusive: how to index noncrashing failures
  - Noncrashing failures are mainly due to semantic bugs
  - Hard to index because crashing contexts are no longer available

Noncrashing Failures

- Examples:
  - Unwanted dialogs
  - Undesired visual outputs, e.g., colors and layouts
  - Periodic loss of focus
  - Periodic loss of connection
  - Abnormal memory consumption
  - Abnormal performance
- Caused by semantic bugs

Semantic Bugs Dominate Others

[Bug distribution chart from Li et al., ICSE'07; courtesy of Zhenmin Li:]
- Memory-related bugs: many are detectable
- Concurrency bugs
- Semantic bugs: application-specific; only a few are detectable, and mostly require annotations or specifications
- Data: 264 bugs in Mozilla and 98 bugs in Apache checked manually; 29,000 bugs in Bugzilla checked automatically

Existing Approaches to Indexing Noncrashing Failures

- T-Proximity [Podgurski et al., ICSE 2003]
  - Failures exhibiting similar behaviors (e.g., similar branchings) are indexed together
  - The entire execution is considered
- R-Proximity [Liu and Han, FSE 2006]
  - Failures likely due to the same bug are indexed together
  - The bug location for each failure is found automatically by the statistical debugging tool SOBER [Liu et al., FSE 2005]

Comments on Existing Approaches

- Ideal solution (possible through manual effort)
  - Index by root cause (i.e., the exact fault location)
  - But finding the root cause of every failure is exactly what failure indexing is meant to circumvent
- T-Proximity
  - Indexes based on the entire execution
  - But usually only a small part of an execution is failure-relevant
- R-Proximity
  - Indexes by the likely fault location, which is failure-relevant
  - Better quality than T-Proximity, but requires a set of passing executions to find the likely fault location
- Theme of this paper
  - Can we index noncrashing failures as effectively as R-Proximity without any passing executions?

Outline

- Motivation
- Failure Indexing in Formulation
- Dynamic Slicing-Based Failure Indexing
- Experiments
- Conclusion

Failure Indexing in Formulation

- A failure indexing technique is a pair of functions (sig, dist):
  - sig: a signature function that represents a failing execution in a certain way
  - dist: a distance function that calculates the dissimilarity between two failure signatures
- Indexing result:
  - A proximity matrix M whose (i, j) cell is the dissimilarity between failures f_i and f_j, i.e., M(i, j) = dist(sig(f_i), sig(f_j))
  - Failures f_i and f_j are indexed together if M(i, j) is small

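A minimal executable rendering of this formulation, under our own naming (`sig`, `dist`, `proximity_matrix` are not the paper's identifiers):

```python
from typing import Callable, Sequence, TypeVar

Failure = TypeVar("Failure")
Sig = TypeVar("Sig")

def proximity_matrix(
    failures: Sequence[Failure],
    sig: Callable[[Failure], Sig],       # signature function
    dist: Callable[[Sig, Sig], float],   # distance function
) -> list:
    """M[i][j] = dist(sig(f_i), sig(f_j)).

    Failures with a small M[i][j] are indexed together.
    """
    sigs = [sig(f) for f in failures]
    return [[dist(si, sj) for sj in sigs] for si in sigs]
```

Any indexing technique then amounts to a choice of `sig` and `dist`, which is what makes the framework easy to instantiate and compare.
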
Metrics for Indexing Effectiveness

- No quantitative metric for indexing effectiveness existed previously
- Indexing effectiveness:
  - Cohesion: to what extent failures due to the same bug are close to each other
  - Separation: to what extent failures due to different bugs are separated from each other
- Silhouette coefficient
  - A measure adapted from data mining
  - Values range from -1 to 1; the higher, the better
  - More details in the paper (Section 2.2)

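The sketch below follows the standard data-mining definition of the silhouette coefficient; the paper adapts this measure (Section 2.2), so its exact variant may differ:

```python
def silhouette(M, labels):
    """Mean silhouette coefficient from a proximity matrix M and cluster labels.

    For each item i:
      a = mean distance to the other members of i's own cluster (cohesion)
      b = smallest mean distance to the members of any other cluster (separation)
      s(i) = (b - a) / max(a, b)
    The overall score is the mean of s(i), in [-1, 1]; higher is better.
    """
    n = len(labels)
    scores = []
    for i in range(n):
        own = [M[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        other_means = [
            sum(M[i][j] for j in range(n) if labels[j] == c)
            / len([j for j in range(n) if labels[j] == c])
            for c in set(labels) if c != labels[i]
        ]
        if not own or not other_means:   # singleton cluster, or only one cluster
            scores.append(0.0)
            continue
        a, b = sum(own) / len(own), min(other_means)
        scores.append((b - a) / max(a, b))
    return sum(scores) / n
```

With a proximity matrix from an indexing technique and cluster labels given by the ground-truth bug behind each failure, `silhouette(M, labels)` scores how well that technique keeps same-bug failures together and different-bug failures apart.
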
Outline

- Motivation
- Failure Indexing in Formulation
- Dynamic Slicing-Based Failure Indexing
- Experiments
- Conclusion

Dynamic Slicing-Based Failure Indexing

- Dynamic slicing serves as the failure signature function

Dynamic Slicing

- The full dynamic slice (FS) is the set of statements that DID affect the value of a variable at a program point in ONE specific execution [Korel and Laski, 1988].

  10. A = ...
  ...
  20. B = ...
  ...
  30. P = ...
  31. if (P < 0) {
  ...
  35.   A = A + 1
  36. }
  37. B = B + 1
  ...
  40. Error(A)

  FS(A@40) = {10, 30, 31, 35, 40}

Data Slicing

- The full dynamic slice (FS) is the set of statements that DID affect the value of a variable at a program point in ONE specific execution [Korel and Laski, 1988].
- The data slice (DS) considers only data dependences.

  (same code as the previous slide)

  DS(A@40) = {10, 35, 40}

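To make the FS/DS contrast concrete, here is a toy backward slicer over a recorded execution trace. The trace encoding (statement id, variables defined, variables used, controlling predicate) is our simplification of what a real dynamic slicer records; with it, toggling control dependences reproduces the FS and DS sets from these two slides:

```python
# Execution trace of the example, in order. Each entry:
# (stmt_id, vars_defined, vars_used, controlling_predicate_stmt or None)
trace = [
    (10, {"A"}, set(), None),
    (20, {"B"}, set(), None),
    (30, {"P"}, set(), None),
    (31, set(), {"P"}, None),   # the predicate "if (P < 0)", taken
    (35, {"A"}, {"A"}, 31),     # executes only because the branch at 31 was taken
    (37, {"B"}, {"B"}, None),
    (40, set(), {"A"}, None),   # Error(A): the slicing criterion A@40
]

def backward_slice(trace, criterion_stmt, criterion_var, include_control=True):
    """Statements that affected criterion_var at criterion_stmt.

    include_control=True  -> full dynamic slice (data + control dependences)
    include_control=False -> data slice (data dependences only)
    """
    relevant = {criterion_var}   # variables whose reaching definitions we still need
    needed_ctrl = set()          # predicates whose outcomes mattered
    stmts = set()
    for stmt, defs, uses, cdep in reversed(trace):
        if stmt != criterion_stmt and not (defs & relevant) and stmt not in needed_ctrl:
            continue
        stmts.add(stmt)
        relevant = (relevant - defs) | uses   # chase the values this statement consumed
        if include_control and cdep is not None:
            needed_ctrl.add(cdep)
    return stmts

print(sorted(backward_slice(trace, 40, "A", True)))    # FS(A@40): [10, 30, 31, 35, 40]
print(sorted(backward_slice(trace, 40, "A", False)))   # DS(A@40): [10, 35, 40]
```
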
Distance between Dynamic Slices

- For any two non-empty dynamic slices S1 and S2 of the same program, the distance between them is a dissimilarity computed over the two statement sets.

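The slide does not spell the formula out here; a natural set dissimilarity, which we assume purely for illustration and which may differ from the paper's exact definition, is the Jaccard distance dist(S1, S2) = 1 - |S1 ∩ S2| / |S1 ∪ S2|:

```python
def slice_distance(s1: set, s2: set) -> float:
    """Jaccard distance between two non-empty slices: 0 = identical, 1 = disjoint.

    Assumed form for illustration; the paper's definition may differ.
    """
    return 1.0 - len(s1 & s2) / len(s1 | s2)
```

Plugged into `proximity_matrix` above, with `backward_slice` as the signature function, this yields FS-Proximity or DS-Proximity depending on whether control dependences are included.
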
Outline

- Motivation
- Failure Indexing in Formulation
- Dynamic Slicing-Based Failure Indexing
- Experiments
- Conclusion

Experiment Result

- Experiment setup:
  - Benchmark (gzip 1.2.3) obtained from the Software-artifact Infrastructure Repository (SIR, University of Nebraska-Lincoln), together with a test suite
  - 6,184 lines of C code
  - Ground truth determined by which fault(s) each failing case is due to: Fault 1 only, Fault 2 only, or both

Two Semantic Bugs in gzip 1.2.3

[Code excerpt from deflate.c marking the locations of Fault 1 and Fault 2.]

- Ground truth:
  - 217 input test cases (executions) in total
  - 82 cases fail due to the two faults; none of them crash
  - 65 fail due to Fault 1, 17 fail due to Fault 2

Indexing Result

[Proximity graphs (PGs) for R-, T-, FS-, and DS-Proximity. The axes are meaningless; what matters is that objects distant in their original space appear distant in the PG. Red crosses are failures due to Fault 1; blue circles are failures due to Fault 2.]

- R-Proximity is the most effective
  - Expected, because it uses information from both passing and failing executions
- T-Proximity is the worst
  - Expected, because it essentially indexes the entire execution rather than the failure-relevant part
- FS-Proximity and DS-Proximity
  - More effective than T-Proximity because they index failure-relevant information
  - Less effective than R-Proximity because they have no access to passing executions

Indexing Result: A Closer Look (1)

[Detail of the proximity graph. Red crosses are failures due to Fault 1; blue circles are failures due to Fault 2.]

- Data slices precisely capture the error propagation mechanism of Fault 2.

Indexing Result: A Closer Look (2)

[Detail of the proximity graph. Red crosses are failures due to Fault 1; blue circles are failures due to Fault 2.]

- Data slices precisely capture the two different error propagation mechanisms of Fault 1.

Observations

- Dynamic slicing-based failure proximity is more effective than T-Proximity
- DS-Proximity is more accurate than FS-Proximity
- DS-Proximity produces more cohesive individual clusters
  - However, clusters belonging to the same bug may be distant from each other due to different error propagations
- Not as good as R-Proximity, but does not require passing executions

Outline

- Motivation
- Failure Indexing in Formulation
- Dynamic Slicing-Based Failure Indexing
- Experiments
- Conclusion

Conclusions

- Indexing noncrashing failures
  - An increasingly important problem as crashing failures are handled better and better
  - Not yet intensively studied
- Dynamic slicing-based failure indexing
  - Effective, and does not rely on passing executions
- A framework for developing and evaluating more indexing techniques
  - Decomposing an indexing technique into a signature function and a distance function admits many instantiations
  - Quantitative evaluation metrics enable scientific study

For further discussion, contact:
chaoliu@uiuc.edu
xyzhang@cs.purdue.edu