
Incremental Maintenance of Ontology-Exploiting Association Rules

Ming-Cheng Tseng 1, Wen-Yang Lin 2 and Rong Jeng 3
1, 3 Institute of Information Engineering, I-Shou University, Taiwan
2 Dept. of Comp. Sci. & Info. Eng., National University of Kaohsiung, Taiwan
August 20, 2007

ICMLC 2007, Aug. 19~22, 2007, Hong Kong

Outline
- Introduction
- Problem description
- The proposed algorithm
- Performance evaluation
- Conclusions

Introduction
- Motivation
  - In general, there exist many semantic relationships (domain knowledge) among items
    - It is natural to incorporate a domain ontology into the data mining process to explore more innovative rules
  - The source databases change over time
    - E.g., insertion, deletion, modification
    - The discovered knowledge (rules) has to be updated to reflect the new situation

Introduction (cont.)
- Association rules
  - Given:
    - A database of customer transactions
    - Each transaction is a set of items
  - Find all rules X => Y that correlate the presence of one set of items X with another set of items Y
    - Example: Sony VAIO => HP LaserJet 1300 (Sup. 30%, Conf. 60%)

Introduction (cont.)
- Strong association rules
  - Given:
    - User-specified constraints
      - Minimum support (min_sup)
      - Minimum confidence (min_conf)
  - Find rules X => Y with support and confidence larger than the user-specified minimum values
    - Example: min_sup = 25%, min_conf = 50%
      Sony VAIO => HP LaserJet 1300 (Sup. 30%, Conf. 60%)

Introduction (cont.)
- Frequent itemsets (patterns) mining
  - The association mining problem can be reduced to the problem of mining frequent itemsets, i.e., itemsets with support larger than min_sup
  - Example
    - min_sup = 25%, min_conf = 50%
      sup({Sony VAIO, HP LaserJet 1300}) = 30%
      sup({Sony VAIO}) = 50%
      Sony VAIO => HP LaserJet 1300 (Sup. 30%, Conf. 60%)
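
The numbers in this example can be reproduced with a small sketch. The ten transactions below are made up for illustration (they are not from the talk); they are chosen so that Sony VAIO appears in 5 of 10 transactions and together with HP LaserJet 1300 in 3 of 10. The function names `support` and `confidence` are likewise illustrative.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, lhs, rhs):
    """conf(lhs => rhs) = sup(lhs ∪ rhs) / sup(lhs)."""
    return support(transactions, set(lhs) | set(rhs)) / support(transactions, lhs)


# Hypothetical data: 10 transactions, 5 with Sony VAIO, 3 of those with the printer.
transactions = (
    [["Sony VAIO", "HP LaserJet 1300"]] * 3
    + [["Sony VAIO"]] * 2
    + [["HP LaserJet 1300"]] * 1
    + [["Other item"]] * 4
)

rule_sup = support(transactions, ["Sony VAIO", "HP LaserJet 1300"])
rule_conf = confidence(transactions, ["Sony VAIO"], ["HP LaserJet 1300"])
print(rule_sup, rule_conf)
```

With min_sup = 25% and min_conf = 50%, both values clear the thresholds, so the rule counts as strong.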

Introduction (cont.)
- Ontology
  - W3C Web Ontology Working Group: "An ontology formally defines a common set of terms that are used to describe and represent a domain knowledge."
  - E.g., taxonomy: a kind of ontology presenting classification relationships among objects

Introduction (cont.)
- Ontology-exploiting association rules
  - Example: IBM 60 GB HD => HP DeskJet

Problem Description
- Incremental maintenance of ontology-exploiting association rules
  - Given:
    - A database of customer transactions, DB
    - An incremental database, db
    - An item ontology, T
    - Discovered frequent itemsets in DB, L
    - Minimum support, ms, and minimum confidence, mc
  - Find all frequent itemsets in UD = DB + db w.r.t. ms
  - Construct all strong rules from the frequent itemsets w.r.t. mc

Problem Description (cont.) -- Example

Customer transactions DB

TID  Purchased Items
1    IBM TP, Epson EPL, Toner Cartridge
2    Sony VAIO, IBM TP, Epson EPL
3    IBM TP, HP DeskJet, Ink Cartridge
4    HP DeskJet
5    IBM TP, HP DeskJet, Ink Cartridge
6    Sony VAIO, Ink Cartridge

Item ontology G (figure)

minsup = 70% (algorithms AROC, AROS)

Discovered frequent itemsets L

L1:      {Printer}, {PC}, {IBM TP}, {RAM 256 MB*}, {IBM 60 GB*}
         Count: 5 5 4
L2 & L3: {Printer, PC}, {Printer, IBM TP}, {Printer, RAM 256 MB*}, {Printer, IBM 60 GB*}, {RAM 256 MB*, IBM 60 GB*}, {Printer, RAM 256 MB*, IBM 60 GB*}
         Count: 4 4 4

Problem Description (cont.)
- Example

Item ontology G (figure)

Customer transactions DB

TID  Purchased Items
1    IBM TP, Epson EPL, Toner Cartridge
2    Sony VAIO, IBM TP, Epson EPL
3    IBM TP, HP DeskJet, Ink Cartridge
4    HP DeskJet
5    IBM TP, HP DeskJet, Ink Cartridge
6    Sony VAIO, Ink Cartridge

Incremental transactions db

TID  Items Purchased
7    Toner Cartridge
8    IBM TP, HP DeskJet, IBM 60 GB, Toner Cartridge
9    ? ? IBM 60 GB, Toner Cartridge

minsup = 70%

Updated frequent itemsets L'

The Proposed Algorithm – IMARO
- Basic scheme
  - An Apriori-based maintenance algorithm
  - Employing a bottom-up, level-wise searching strategy
  - Starting from frequent 1-itemsets, L1, then L2, …, Lk, etc.

(Figure: itemset lattice over items A, B, C, D, from the 1-itemsets A, B, C, D up to ABCD)
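
The bottom-up, level-wise strategy can be sketched as plain Apriori. This is not the IMARO algorithm itself, only the generic search scheme it builds on; the function name `apriori`, the absolute `min_count` threshold (count at or above the threshold) and the toy transactions over A, B, C, D are assumptions made for illustration.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Level-wise search: count L1 candidates, then build L2, L3, ... from them."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items]  # candidate 1-itemsets
    frequent = {}
    k = 1
    while level:
        # Count each candidate of the current level in one pass over the data.
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        current = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(current)
        # Join step: combine frequent k-itemsets into (k+1)-item candidates.
        k += 1
        joined = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        level = [
            c for c in joined
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        ]
    return frequent


# Hypothetical transactions over the lattice items A, B, C, D.
toy = ["ABC", "AB", "AC", "BC", "ABCD"]
result = apriori(toy, min_count=3)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print("".join(sorted(itemset)), count)
```

The prune step is what makes the level-wise order pay off: ABC is generated as a candidate only because AB, AC and BC are all frequent, and supersets of the infrequent item D are never counted.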

The Proposed Algorithm – IMARO (cont.)
- Terminology

Notation  Definition
DB        Original database
db        Incremental database
UD        Updated database, UD = DB + db
T         Item ontology
ED        Extension of DB with extended items in T
ed        Extension of db with extended items in T
UE        Updated extended database, UE = ED + ed

The Proposed Algorithm – IMARO (cont.)
- Example (figure)

The Proposed Algorithm – IMARO (cont.)
- Note on database extension
  - A component item may also exist as a primitive item itself
  - To clarify the meaning of associations involving such an item, we have to differentiate the role this item plays
    - E.g., IBM TP => Ink Cartridge: a customer who buys an IBM TP notebook also buys a product composed of Ink Cartridge

Original transaction

TID  Purchased Items
5    IBM TP, HP DeskJet, Ink Cartridge

Extended transaction

TID  Primitive Items                        Extended Items
5    IBM TP, HP DeskJet, Ink Cartridge*     PC, RAM 256 MB, IBM 60 GB, Printer, Ink Cartridge
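
One way to picture the extension step is to tag every item with the role it plays, so that a purchased Ink Cartridge and an Ink Cartridge that is part of a printer stay distinguishable. The sketch below is an assumption-laden illustration, not the authors' implementation: the two dictionaries encode only the ontology fragment implied by transaction 5, and the explicit role tags stand in for the asterisk notation used on the slide.

```python
# Hypothetical encoding of the ontology fragment behind transaction 5.
GENERALIZES_TO = {"IBM TP": "PC", "HP DeskJet": "Printer"}
COMPOSED_OF = {
    "IBM TP": ["RAM 256 MB", "IBM 60 GB"],
    "HP DeskJet": ["Ink Cartridge"],
}


def extend_transaction(items):
    """Return (item, role) pairs: purchased items first, then their extensions."""
    tagged = [(item, "primitive") for item in items]
    seen = set()
    for item in items:
        extensions = []
        if item in GENERALIZES_TO:
            extensions.append((GENERALIZES_TO[item], "generalized"))
        extensions += [(part, "component") for part in COMPOSED_OF.get(item, [])]
        for ext in extensions:
            if ext not in seen:  # add each extended item only once
                seen.add(ext)
                tagged.append(ext)
    return tagged


extended_t5 = extend_transaction(["IBM TP", "HP DeskJet", "Ink Cartridge"])
for item, role in extended_t5:
    print(f"{item:15s} {role}")
```

Ink Cartridge shows up twice in the result, once as a primitive item and once as a component of HP DeskJet, which is exactly the ambiguity the slide asks to resolve.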

The Proposed Algorithm – IMARO (cont.)
- Process flow for updating frequent k-itemsets

(Figure: process flow; one step is labeled "e.g., AROC or AROS")

The Proposed Algorithm – IMARO (cont.)
- Frequent/infrequent itemsets inference

Conditions            Result (in UE)   Action                       Case
L_ED      L_ed
in        in          frequent         none                         1
in        not in      undetermined     compare sup_UD(A) with ms    2
not in    in          undetermined     scan DB                      3
not in    not in      infrequent       none                         4
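
The four cases amount to a small decision procedure. The sketch below assumes that an itemset frequent in ED has its count on record (so case 2 needs no rescan) and that "frequent" means a count at or above ms times the database size; the function names are illustrative, not taken from the talk.

```python
def infer_status(in_L_ED, in_L_ed):
    """Map membership in L_ED and L_ed to (case, result in UE, action)."""
    if in_L_ED and in_L_ed:
        return 1, "frequent", "none"
    if in_L_ED:
        # Count in ED is on record, count in ed was just computed: no rescan.
        return 2, "undetermined", "compare updated support with ms"
    if in_L_ed:
        # Count in ED is unknown because the itemset was not frequent there.
        return 3, "undetermined", "scan DB"
    return 4, "infrequent", "none"


def frequent_after_update(count_ED, count_ed, size_ED, size_ed, ms):
    """Case 2 check: add the two counts and compare with the threshold."""
    return (count_ED + count_ed) >= ms * (size_ED + size_ed)


for flags in [(True, True), (True, False), (False, True), (False, False)]:
    print(flags, infer_status(*flags))

# Hypothetical numbers: 6 old and 3 new transactions, ms = 70%.
print(frequent_after_update(count_ED=5, count_ed=2, size_ED=6, size_ed=3, ms=0.7))
```

Only case 3 touches the original database, which is where an incremental method saves work over re-mining DB + db from scratch.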

The Proposed Algorithm – IMARO (cont.)
- Optimization 1: Candidate pruning
  - Any candidate itemset that contains both an item and any one of its extensions (generalized item or component) is pruned
  - Examples of pruned candidates: {Epson EPL, Printer}, {Epson EPL, Toner Cartridge*}
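
A minimal sketch of this pruning rule follows. The `EXTENSIONS` map is a hypothetical encoding that covers only the Epson EPL example from the slide, and the function names are illustrative.

```python
# Hypothetical map from an item to its extensions (generalized item and components).
EXTENSIONS = {
    "Epson EPL": {"Printer", "Toner Cartridge*"},
}


def contains_item_and_extension(candidate, extensions):
    """True if some item of the candidate occurs together with one of its extensions."""
    items = set(candidate)
    return any(items & extensions.get(item, set()) for item in items)


def prune(candidates, extensions):
    """Drop candidates that pair an item with its own extension."""
    return [c for c in candidates if not contains_item_and_extension(c, extensions)]


candidates = [
    {"Epson EPL", "Printer"},
    {"Epson EPL", "Toner Cartridge*"},
    {"Epson EPL", "PC"},
]
kept = prune(candidates, EXTENSIONS)
print(kept)
```

Such candidates carry no new information: every transaction that contains Epson EPL also contains its extensions after the database is extended, so the pair has the same support as Epson EPL alone.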

The Proposed Algorithm – IMARO (cont.)
- Optimization 2: Extension filtering
  - The extension of an item can be added only if that item does appear in at least one candidate itemset being counted currently

(Figure: item ontology. Top level: Printer, PC. Products: HP DeskJet, Epson EPL, Sony VAIO, IBM TP. Components: Ink Cartridge, Photo Conductor, Toner Cartridge, S 60 GB, RAM 256 MB, IBM 60 GB.)
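
The filtering rule can be sketched as below, under one reading of the slide: an extended item is added to a transaction only when it occurs in some candidate of the current pass. The `EXTENSIONS` map and the candidate list are hypothetical, and the function name is illustrative.

```python
# Hypothetical map from a purchased item to its extended items.
EXTENSIONS = {
    "IBM TP": ["PC", "RAM 256 MB", "IBM 60 GB"],
    "HP DeskJet": ["Printer", "Ink Cartridge"],
}


def filtered_extensions(transaction, extensions, candidates):
    """Extended items worth adding to `transaction` for the current counting pass."""
    needed = set().union(*candidates)  # every item used by some candidate
    added = set()
    for item in transaction:
        for ext in extensions.get(item, ()):
            if ext in needed:
                added.add(ext)
    return added


current_candidates = [{"PC", "Printer"}, {"Printer", "IBM TP"}]
added = filtered_extensions(["IBM TP", "HP DeskJet"], EXTENSIONS, current_candidates)
print(sorted(added))
```

Extensions that no candidate mentions cannot change any count in this pass, so leaving them out shortens the extended transactions without affecting the result.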

Performance Evaluation
- IMARO is compared with applying our proposed algorithms, AROC and AROS, to the whole database DB + db with T
- Test data
  - A synthetic dataset generated by the IBM data generator with an artificially built ontology

Parameter  Description                       Default value
|DB|       Number of original transactions   200,000
|t|        Average size of transactions
N          Number of items                   362
R          Number of groups                  30
L          Number of levels                  4
F          Fanout                            5

Performance Evaluation (cont.)
- Varying minimum supports, |db| = 40,000 (figure)

Performance Evaluation (cont.)
- Varying incremental transaction size, ms = 1.5% (figure)

Conclusions
- We have investigated the problem of updating ontology-exploiting association rules when new transactions are inserted into the database
- An Apriori-based algorithm is proposed
- Other issues
  - More complicated semantic relationships and knowledge
  - Non-uniform minimum support
    - Generalized items or composite items occur more frequently
- Towards a total solution for evolving environments
  - Ontology evolution, database update
  - Interactive refinement of support constraints
  - …

Thanks for your attention!

Conclusions (cont.)
- Taxonomy of semantic relationships (figure)
  *source: 1993, Veda C. Storey, VLDB Journal

Related Work
- Comparison with previous work (model of incremental maintenance of association rules)

Contributors              Type of database update                Type of ontology
Srikant & Agrawal, 1995   none                                   classification
Han & Fu, 1995            none                                   classification
Cheung et al., 1996       insertion                              classification
Cheung et al., 1997       insertion, deletion and modification   none
Jea et al., 2003          none                                   composition
Chien et al., 2005        none                                   classification & composition