eb97b9c0e5a6d7596c19248181eeb287.ppt
- Number of slides: 13

Consensus Group Stable Feature Selection
Steven Loscalzo, Dept. of Computer Science, Binghamton University
Lei Yu, Dept. of Computer Science, Binghamton University
Chris Ding, Dept. of Computer Science and Engineering, University of Texas at Arlington
The 15th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
June 30th, 2009

Overview
• Background and motivation
• Proposed Consensus Feature Group Framework
  • Finding consensus groups
  • Feature selection from consensus groups
• Experimental study
• Conclusion

Feature Selection Stability
[Diagram: All Training Data → Sampling → Feature Selection → Model Building]
Sample      Selected features     Acc %
Sample 1    F = {f2, f5}          92%
Sample 2    F' = {f4, f10}        91%
…
Sample k    F'' = {f5, f11}       93%
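
The instability on this slide can be quantified by comparing the subsets selected on different samples. The sketch below uses the average pairwise Jaccard similarity, which is an assumption made for illustration: the slide does not name a stability measure, and the function names are hypothetical.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity of two feature sets: |a ∩ b| / |a ∪ b|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty selections are treated as identical
    return len(a & b) / len(a | b)

def average_pairwise_stability(subsets):
    """Mean Jaccard similarity over all unordered pairs of subsets."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two subsets")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# The three feature subsets shown on the slide (F, F', F'').
selected = [{"f2", "f5"}, {"f4", "f10"}, {"f5", "f11"}]
stability = average_pairwise_stability(selected)
print(round(stability, 3))
```

Only F and F'' share a feature (f5), so the score is low even though all three models reach similar accuracy.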

Motivation
• Need for stable feature selection
  • Give confidence to lab tests
  • Uncover "truly" relevant information
• Utility of feature groups
  • Model feature interaction
  • When little is known about a single feature, another feature in the group may be well studied

Dense Feature Group Framework
• Dense feature groups can provide stability and accuracy [Yu, Ding, Loscalzo, KDD-08]
• Dense Group Stable Feature Selection Framework
  • Map features as points in sample space
  • Apply kernel density estimation to locate dense feature groups
  • Select top relevant groups from dense groups
• Limitations of this framework
  • Unreliable density estimation in high-dimensional spaces
  • Restricts selection of relevant groups to dense groups
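
To make the first two framework steps concrete, here is a minimal sketch that treats each feature as a point in sample space and moves it uphill on a Gaussian kernel density estimate using generic mean shift. It is not the dense group finder from [Yu, Ding, Loscalzo, KDD-08]; the bandwidth, the merge radius, the toy data, and all function names are assumptions.

```python
import math

def gaussian_weight(x, q, bandwidth):
    """Gaussian kernel weight between position x and data point q."""
    sq = sum((a - b) ** 2 for a, b in zip(x, q))
    return math.exp(-sq / (2 * bandwidth ** 2))

def mean_shift_peaks(points, bandwidth, n_iter=100, tol=1e-6):
    """Shift each point toward a local maximum of the kernel density estimate."""
    peaks = []
    for p in points:
        x = list(p)
        for _ in range(n_iter):
            weights = [gaussian_weight(x, q, bandwidth) for q in points]
            total = sum(weights)
            # The new position is the kernel-weighted mean of all points.
            new_x = [sum(w * q[d] for w, q in zip(weights, points)) / total
                     for d in range(len(x))]
            shift = math.dist(x, new_x)
            x = new_x
            if shift < tol:
                break
        peaks.append(x)
    return peaks

def group_by_peak(peaks, merge_radius):
    """Group features whose converged positions lie within merge_radius."""
    centers, groups = [], []
    for idx, peak in enumerate(peaks):
        for center, members in zip(centers, groups):
            if math.dist(peak, center) < merge_radius:
                members.append(idx)
                break
        else:
            centers.append(peak)
            groups.append([idx])
    return groups

# Toy data: 5 features, each described by its values on 3 samples.
features = [
    [1.0, 1.1, 0.9],
    [1.1, 1.0, 1.0],
    [0.9, 1.0, 1.1],
    [5.0, 5.1, 4.9],
    [5.1, 5.0, 5.0],
]
groups = group_by_peak(mean_shift_peaks(features, bandwidth=0.5), merge_radius=0.1)
print(groups)
```

With real microarray data the number of samples, and therefore the dimensionality of this space, is much larger, which is where the first limitation listed on the slide applies.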

Consensus Feature Group Framework
• Consensus feature groups are an ensemble of feature grouping results
• Select relevant groups from the whole spectrum of consensus groups
• Challenges
  • Base algorithm for ensemble: dense group finder [Yu, Ding, Loscalzo, KDD-08]
  • Aggregate feature grouping results

Group Aggregation
• 3 aggregation ideas:
  • Heuristics (reference set)
  • Cluster based [Fern, Brodley, ICML-03]
  • Instance based [Fern, Brodley, ICML-03]
[Figure: feature group results over features f1 to f5 for data sub-samples 1, 2, and 3, aggregated into consensus feature groups]
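
A minimal sketch of the instance-based idea: count how often each pair of features lands in the same group across the sub-sample groupings. The three groupings below are hypothetical, since the group boundaries in the slide's figure are not recoverable, and the function name is an assumption.

```python
from itertools import combinations

def cooccurrence(groupings, features):
    """W[a][b] = fraction of groupings in which features a and b share a group."""
    w = {a: {b: 0.0 for b in features} for a in features}
    for grouping in groupings:
        for group in grouping:
            for a, b in combinations(sorted(group), 2):
                w[a][b] += 1.0
                w[b][a] += 1.0
    t = len(groupings)
    for a in features:
        for b in features:
            # A feature always co-occurs with itself; other counts become frequencies.
            w[a][b] = 1.0 if a == b else w[a][b] / t
    return w

features = ["f1", "f2", "f3", "f4", "f5"]
# Hypothetical grouping results from three data sub-samples.
groupings = [
    [{"f1", "f2"}, {"f3"}, {"f4", "f5"}],
    [{"f1", "f2"}, {"f3", "f4", "f5"}],
    [{"f1", "f2", "f4"}, {"f3"}, {"f5"}],
]
W = cooccurrence(groupings, features)
print(W["f1"]["f2"], round(W["f4"]["f5"], 2), W["f1"]["f3"])
```

Pairs that are grouped together in every sub-sample get a weight of 1, and pairs that never meet get 0, so the matrix can serve as a similarity measure for a later clustering step.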

The CGS Algorithm
CGS: The Consensus Group Stable Feature Selection Algorithm
[Diagram: D → D1 … Dt → Result Grouping 1 … Result Grouping t → Measure Instance Co-occurrence → Hierarchical Clustering → Consensus Feature Groups]
for i = 1 to t do
    Construct Training Partition Di from D
    Run DGF on Di
for every pair of features Xi and Xj in D
    Update Wi,j := freq. Xi and Xj appear together in results
create consensus groups CG1, CG2, …, CGL via hierarchical clustering of all features based on Wi,j
for i = 1 to L do
    Obtain a representative feature Xi from CGi
    Measure relevance of Xi, set as relevance of CGi
Rank CG1, CG2, …, CGL and return the top k
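
The last steps of the algorithm, from the co-occurrence matrix W to the top k groups, might look like the sketch below. Several choices are assumptions, because the slide only says "hierarchical clustering" and "representative feature": average linkage with a similarity cutoff, a medoid-like representative, and hard-coded values for W and for the relevance scores.

```python
def average_linkage_groups(w, features, threshold):
    """Agglomerative clustering on similarities; stop when the best merge falls below threshold."""
    clusters = [[f] for f in features]
    while len(clusters) > 1:
        best, best_pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(w[a][b] for a in clusters[i] for b in clusters[j])
                sim = total / (len(clusters[i]) * len(clusters[j]))
                if sim > best:
                    best, best_pair = sim, (i, j)
        if best < threshold:
            break
        i, j = best_pair
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

def representative(group, w):
    """Medoid-like choice: the member with the highest total similarity to its group."""
    return max(group, key=lambda a: sum(w[a][b] for b in group))

def rank_groups(groups, w, relevance, k):
    """Score each group by its representative's relevance and return the top k groups."""
    scored = [(relevance[representative(g, w)], g) for g in groups]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [g for _, g in scored[:k]]

features = ["f1", "f2", "f3", "f4", "f5"]
# Hypothetical co-occurrence frequencies (symmetric, 1.0 on the diagonal).
rows = [
    [1.0, 1.0, 0.0, 0.3, 0.0],
    [1.0, 1.0, 0.0, 0.3, 0.0],
    [0.0, 0.0, 1.0, 0.3, 0.3],
    [0.3, 0.3, 0.3, 1.0, 0.7],
    [0.0, 0.0, 0.3, 0.7, 1.0],
]
W = {a: dict(zip(features, row)) for a, row in zip(features, rows)}
# Hypothetical per-feature relevance scores (any relevance measure could supply these).
relevance = {"f1": 0.9, "f2": 0.8, "f3": 0.2, "f4": 0.6, "f5": 0.7}

consensus_groups = average_linkage_groups(W, features, threshold=0.5)
top_groups = rank_groups(consensus_groups, W, relevance, k=2)
print(consensus_groups, top_groups)
```

Because every consensus group is scored, including singletons such as f3 here, selection is not limited to the densest groups.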

Experimental Setup
Data Set    # Genes   # Samples   # Classes
Colon       2000      62          2
Leukemia    7129      72          2
Lung        12533     181         2
Prostate    6034      102         2
Lymphoma    4026      62          3
SRBCT       2308      63          4
Setting
• Used 10 random shuffles of data:
  • 10-fold cross-validation
  • 9/10 folds training
  • 1/10 folds testing
• Results shown are averages across 10 folds x 10 shuffles
Algorithms
• CGS: sub-samples t = 10
• DRAGS [Yu, Ding, Loscalzo, KDD-08]: top dense group based feature selection
• SVM-RFE [Guyon et al, ML-02]: recursively eliminates features based on weights found after training an SVM
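
The evaluation protocol (10 shuffles, each followed by 10-fold cross-validation) can be sketched as an index generator. The seed, the unstratified fold assignment, and the function name are assumptions; the slide does not say how folds were formed.

```python
import random

def shuffled_cv_splits(n_samples, n_folds=10, n_shuffles=10, seed=0):
    """Yield (shuffle, fold, train_idx, test_idx) for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for s in range(n_shuffles):
        order = list(range(n_samples))
        rng.shuffle(order)
        # Deal the shuffled indices into n_folds nearly equal folds.
        folds = [order[f::n_folds] for f in range(n_folds)]
        for f, test_idx in enumerate(folds):
            train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
            yield s, f, train_idx, test_idx

# 62 is the Colon sample count listed in the setup.
splits = list(shuffled_cv_splits(n_samples=62))
print(len(splits))
```

Feature selection and model training would run on each `train_idx`, with accuracy measured on the matching `test_idx` and averaged over all 100 splits.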

Stability
[Figure panels: Selected Groups; Selected Features]

Accuracy Results

Conclusion
• Proposed consensus group stable feature selection framework
  • Stable
  • Accurate
• Future directions
  • Apply different ensemble techniques
  • Incorporate new group finding algorithms

References
Fern, X. Z., and Brodley, C. Random projection for high-dimensional data clustering: a cluster ensemble approach. In Proceedings of the 20th Conference on Machine Learning (ICML-03), 186-192, 2003.
Guyon, I., Weston, J., Barnhill, S., Vapnik, V. Gene selection for cancer classification using support vector machines. Machine Learning (ML-02), 46: 389-422, 2002.
Yu, L., Ding, C., and Loscalzo, S. Stable feature selection via dense feature groups. In Proceedings of the 14th ACM International Conference on Knowledge Discovery and Data Mining (KDD-08), 803-811, 2008.