

Efficient Mining of Iterative Patterns for Software Specification Discovery
David Lo†
Joint work with: Siau-Cheng Khoo† and Chao Liu‡
†Prog. Lang. & Sys. Lab, Dept of Comp. Science, National Uni. of Singapore (Current: Sch. of Info. Systems, Singapore Management Uni.)
‡Data Mining Group, Department of Computer Science, Uni. of Illinois at Urbana-Champaign (Current: Microsoft Research, Redmond)

Motivation
o Specification: a description of how software is supposed to behave
  - e.g. Locking Protocol [YEBBD06]
o Existing problems in specification: missing, incomplete and outdated specifications [LK06, ABL02, YEBBD06, DSB04, etc.]
o These make an existing system difficult to understand
o They contribute to high software cost
  – Prog. maintenance: 90% of soft. cost [E00, CC02]
  – Prog. understanding: 50% of maint. cost [S84, CC02]
  – US GDP software component: $214.4 billion [US BEA]
o Solution: Specification Discovery

Our Specification Discovery Approach
o Analyze program execution traces
o Discover patterns of program behavior, e.g.:
  – Locking Protocol [YEBBD06]
  – Telecom. Protocol [ITU], etc. (see paper)
o Address the unique nature of prog. traces:
  – A pattern is repeated across a trace
  – A program generates different traces
  – Interesting events might not occur close together

Need for a Novel Mining Strategy
o Sequential Pattern Mining [AS95, YHA03, WH04]
  - A series of events (itemsets) supported by (i.e. a subsequence of) a significant number of sequences
  - Required extension: consider multiple occurrences of a pattern in a sequence
o Episode Mining [MTV97, G03]
  - A series of closely occurring events recurring frequently within a sequence
  - Required extension: consider multiple sequences; remove the restriction that events occur close together
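A minimal sketch of why the first extension matters, assuming single events instead of itemsets. The helper names and the toy traces are made up for illustration and do not come from the paper. Sequence-level support, as used in sequential pattern mining, counts each sequence at most once, so a pattern that repeats inside one trace gains nothing from the repetition.

```python
from typing import Sequence


def is_subsequence(pattern: Sequence[str], seq: Sequence[str]) -> bool:
    """True if the events of `pattern` occur in `seq` in order (gaps allowed)."""
    it = iter(seq)
    # `ev in it` advances the iterator until ev is found, so order is enforced.
    return all(ev in it for ev in pattern)


def sequence_support(pattern: Sequence[str], seq_db) -> int:
    """Number of sequences that contain `pattern` as a subsequence."""
    return sum(1 for seq in seq_db if is_subsequence(pattern, seq))


# Toy traces, invented for illustration only.
seq_db = [
    ["lock", "use", "unlock", "lock", "use", "unlock"],  # pattern occurs twice
    ["lock", "unlock"],
    ["open", "close"],
]

print(sequence_support(["lock", "unlock"], seq_db))  # 2
```

The first trace contains the lock/unlock behavior twice, yet it contributes only 1 to the support. Counting instances rather than sequences is what the iterative-pattern formulation adds.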

Iterative Patterns – Semantics
o A series of events supported by a significant number of instances:
  - Repeated within a sequence
  - Across multiple sequences
o Follow the semantics of Message Seq. Chart (MSC) [ITU] and Live Seq. Chart (LSC) [DH01].
o Describe constraints between a chart and a trace segment obeying it:
  - Ordering constraint [ITU, KHPLB05]
  - One-to-one correspondence [KHPLB05]

Iterative Patterns – Semantics
[Figure: chart from [ITU] with lifelines Calling Party, Switching Sys and Called Party; events: off_hook, dial_tone_on, dial_tone_off, seizure, ack, ring_tone, answer, connection]
TS1: off_hook, seizure, ack, ring_tone, answer, ring_tone, X X connection_on
TS2: off_hook, seizure, ack, ring_tone, answer, X X connection_on X
TS3: off_hook, seizure, ack, ev1, ring_tone, ev1, answer, connection_on

Iterative Patterns – Semantics
o Given a pattern P (e1 e2 … en), a substring SB is an instance of P iff
  SB = e1; [-e1, …, en]*; e2; …; [-e1, …, en]*; en
  where [-e1, …, en] stands for any event other than e1, …, en
Pattern:
S1: off_hook, ring_tone, seizure, answer, ring_tone, X X connection_on
S2: off_hook, seizure, ring_tone, answer, X X X connection_on
S3: off_hook, seizure, ev1, ring_tone, ev1, answer, connection_on
S4: off_hook, seizure, ev1, ring_tone, ev1, answer, connection_on, off_hook, seizure_int, ev2, ring_tone, ev3, answer, connection_on
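The instance definition can be sketched as a single left-to-right scan. This is an illustrative reading of the definition, not the authors' code. The function name `find_instances` is hypothetical. The pattern used below is an assumption, because the slide's pattern graphic is not part of this transcript. The traces are S1, S3 and S4 from the slide, with the X markers left out.

```python
from typing import List, Sequence, Tuple


def find_instances(seq: Sequence[str], pattern: Sequence[str]) -> List[Tuple[int, int]]:
    """Return 0-based (start, end) index pairs of every instance of `pattern` in `seq`.

    Between two consecutive pattern events, only events outside the
    pattern's alphabet may occur.
    """
    if not pattern:
        return []
    alphabet = set(pattern)
    instances = []
    for start, event in enumerate(seq):
        if event != pattern[0]:
            continue
        k = 1        # index of the next pattern event we expect
        pos = start  # position of the last event examined
        while k < len(pattern):
            pos += 1
            if pos == len(seq):
                break            # trace ended before the pattern completed
            if seq[pos] == pattern[k]:
                k += 1           # expected event found
            elif seq[pos] in alphabet:
                break            # a pattern event occurred out of turn
        if k == len(pattern):
            instances.append((start, pos))
    return instances


# Assumed pattern (the slide's own pattern is not recoverable here).
pattern = ["off_hook", "seizure", "ring_tone", "answer", "connection_on"]

s1 = ["off_hook", "ring_tone", "seizure", "answer", "ring_tone", "connection_on"]
s3 = ["off_hook", "seizure", "ev1", "ring_tone", "ev1", "answer", "connection_on"]
s4 = s3 + ["off_hook", "seizure_int", "ev2", "ring_tone", "ev3", "answer", "connection_on"]

print(find_instances(s1, pattern))  # []
print(find_instances(s3, pattern))  # [(0, 6)]
print(find_instances(s4, pattern))  # [(0, 6)]
```

Under this assumed pattern, S3 matches because the interleaved ev1 events are outside the pattern's alphabet. S1 fails because ring_tone arrives before seizure. The second half of S4 fails because seizure never occurs before ring_tone.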

Mining Algorithm

Projected Database Operations
o Projected-all of SeqDB wrt pattern P
  – Return: all suffixes of sequences in SeqDB where, for each, its infix is an instance of pattern P
[Figure: sequences S1 and S2 with their projected-database entries as (Seq, Start, End): (1, 1, 2), (1, 4, 5), (2, 1, 2)]
o Support of a pattern = size of its projected-all database
o SeqDB^all_ev is formed by considering occurrences of ev
o SeqDB^all_{P++ev} can be formed from SeqDB^all_P
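A sketch of these operations, representing each projected-database entry by its (Seq, Start, End) triple with 1-based positions, as on the slide. The functions `project_single` and `extend` are hypothetical names, and the code is one possible reading of the slide rather than the paper's implementation. The two toy sequences are invented so that the pattern <A, B> yields the triples shown on the slide.

```python
from typing import List, Sequence, Tuple

Entry = Tuple[int, int, int]  # (Seq, Start, End), all 1-based


def project_single(seq_db: Sequence[Sequence[str]], ev: str) -> List[Entry]:
    """Entries for the one-event pattern <ev>: one per occurrence of ev."""
    return [(s, i, i)
            for s, seq in enumerate(seq_db, start=1)
            for i, e in enumerate(seq, start=1) if e == ev]


def extend(seq_db: Sequence[Sequence[str]], pattern: Sequence[str],
           entries: List[Entry], ev: str) -> List[Entry]:
    """Entries for pattern ++ <ev>, derived from the entries of `pattern`."""
    old_alphabet = set(pattern)
    new_alphabet = old_alphabet | {ev}
    out = []
    for s, start, end in entries:
        seq = seq_db[s - 1]
        # If ev is new to the alphabet, it must not occur inside the old instance.
        if ev not in old_alphabet and ev in seq[start - 1:end]:
            continue
        for pos in range(end, len(seq)):  # 0-based positions after `end`
            if seq[pos] == ev:
                out.append((s, start, pos + 1))
                break
            if seq[pos] in new_alphabet:
                break  # a pattern event occurred before ev
    return out


# Toy database, invented to reproduce the slide's triples for <A, B>.
seq_db = [["A", "B", "X", "A", "B"], ["A", "B", "C"]]

db_a = project_single(seq_db, "A")
db_ab = extend(seq_db, ["A"], db_a, "B")
print(db_ab)       # [(1, 1, 2), (1, 4, 5), (2, 1, 2)]
print(len(db_ab))  # support of <A, B>: 3
```

Each entry of the extended database reuses the start of an existing entry, so the support of P++ev can never exceed the support of P.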

Pruning Strategies
Apriori Property
  If a pattern P is not frequent, P++evs cannot be frequent.
Closed Pattern
  Definition: A frequent pattern P is closed if there exists no super-sequence pattern Q such that P and Q have the same support and corresponding instances.
Sketch of Mining Strategy
  1. Depth-first search
  2. Cut the search space of non-frequent and non-closed patterns
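The depth-first search with apriori pruning can be sketched as below. This is a simplified illustration, not the paper's algorithm: it recounts support from scratch instead of using projected databases, and it returns all frequent patterns, leaving out the closed-pattern pruning. The names `support` and `mine_frequent` and the toy database are assumptions made for the example.

```python
from typing import Dict, Sequence, Tuple

Pattern = Tuple[str, ...]


def support(seq_db: Sequence[Sequence[str]], pattern: Pattern) -> int:
    """Count instances of `pattern` over all sequences (gaps hold no pattern events)."""
    alphabet, total = set(pattern), 0
    for seq in seq_db:
        for start in range(len(seq)):
            k = 0  # number of pattern events matched so far
            for ev in seq[start:]:
                if ev == pattern[k]:
                    k += 1
                    if k == len(pattern):
                        total += 1
                        break
                elif k == 0 or ev in alphabet:
                    break  # wrong first event, or a pattern event out of turn
    return total


def mine_frequent(seq_db: Sequence[Sequence[str]], min_sup: int) -> Dict[Pattern, int]:
    """Depth-first pattern growth; an infrequent pattern is never extended."""
    if min_sup < 1:
        raise ValueError("min_sup must be at least 1")
    events = sorted({ev for seq in seq_db for ev in seq})
    frequent: Dict[Pattern, int] = {}

    def grow(pattern: Pattern) -> None:
        for ev in events:
            candidate = pattern + (ev,)
            sup = support(seq_db, candidate)
            if sup >= min_sup:  # apriori: only frequent patterns are grown further
                frequent[candidate] = sup
                grow(candidate)

    grow(())
    return frequent


# Toy database, invented for illustration only.
seq_db = [["A", "B", "X", "A", "B"], ["A", "B", "C"]]
print(mine_frequent(seq_db, min_sup=2))
# {('A',): 3, ('A', 'B'): 3, ('B',): 3}
```

In this toy result, <A> and <B> have the same support as <A, B>, which shows why reporting only closed patterns keeps the output smaller.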

Closure Checks and Pruning – Definitions
o Prefix Extension (PE), Suffix Extension (SE)
  - An event that can be added as a prefix or suffix (of length 1) to a pattern, resulting in another pattern with the same support
o Infix Extension (IE)
  - An event that can be inserted as an infix (one or more times) into a pattern, resulting in another pattern with the same support and corresponding instances
S1 S2 <X, A, B, B, C, D, E, F, G> S3
Pattern:   Prefix Ext: {}   Suffix Ext: {}   Infix Ext: {}

Closure Checks and Pruning – Theorems
o Closure Checks: If a pattern P has no extension (PE, IE or SE), then it is closed; otherwise it is not closed
o InfixScan Pruning Property: If a pattern P has an IE and IE ∉ SeqDB^all_P, then we can stop growing P.
S1 S2 S3
Pattern:   Prefix Ext: {}   Infix Ext: {}   Suffix Ext: {}
is not closed and we can stop growing it. No need to check for

Main Method
- Recursive Pattern Growth
- Closure Checks
- InfixScan Pruning

Performance & Case Studies

Performance Study
o Dataset TCAS
  - Program traces from the Siemens dataset
  - Commonly used as a benchmark in error localization

Case Study
o JBoss App Server
  – Most widely used J2EE server
  – A large, industrial program: more than 100 KLOC
  – Analyze and mine the behavior of the transaction component of JBoss App Server
o Trace generation
  – Weave an instrumentation aspect using AOP
  – Run a set of test cases
  – Obtain 28 traces with 2551 events in total (91 events per trace on average)
o Mine using min_sup set at 65% of |SeqDB|
  – 29 s vs >8 hrs

Case Study
o Post-processing & Ranking
  – 44 patterns
o Top-ranked patterns correspond to interesting patterns of software behavior:
  – Top Longest Patterns
  – Most Observed Pattern

Longest Iter. Pattern from JBoss Transaction Component

Conclusion
o Addressing the specification problem reduces software cost and saves expensive resources.
o Novel formulation & technique to mine a closed set of iterative patterns:
  – Extends closed sequential pattern & episode mining
  – Based on the ordering and one-to-one correspondence requirements of MSC & LSC
Future Work
o Mining other forms of specification commonly used by software engineers: Live Sequence Charts [OOPSLA'07 (Poster), ASE'07], Linear Temporal Logic Rules [PLDI'07 (SRC)], etc.
o Case study, improve mining speed, constraints
o Other uses, post-mining step

Acknowledgement
o Jiawei Han, UIUC
o Shahar Maoz, Weizmann, Israel
o Gazelle dataset, Blue Martini Software

Thank you for your attention
Questions? Advice? Comments?