1e70bda020f415e618b3a3e6f3122fcb.ppt
- Number of slides: 20
Data Mining on ICDM Submission Data
Shusaku Tsumoto, Ning Zhong and Xindong Wu
ICDM 2004 Business Meeting, 11/4/2004

Data Mining on ICDM Submission Data
- 38 countries, 445 submissions
  – Regular papers: 39 (9%)
  – Short papers: 66 (14.8%)
- High acceptance ratio (regular papers)
  – Germany: 4/15 (26.7%)
  – Finland: 2/9 (22.2%)
  – USA: 20/109 (18.3%)

Country       Regular  Short  Total  Ratio
USA                20     28    109  44.0%
China               3      4     55  12.7%
UK                  1      6     39  17.9%
Japan               0      5     28  17.9%
Canada              3      3     25  24.0%
Taiwan              0      1     18   5.6%
Australia           2      1     17  17.6%
Germany             4      5     15  60.0%
France              0      2     14  14.3%
India               1      0     14   7.1%
Singapore           0      3     12  25.0%
Brazil              0      1     12   8.3%
Italy               2      1     10  30.0%
Finland             2      1      9  33.3%
Spain               0      1      7  14.3%
Hong Kong           1      1      6  33.3%
Top 15 Total       39     63    390  26.2%
Total              39     66    445  23.8%

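The Ratio column appears to be the share of a country's submissions accepted as either regular or short papers, while the summary slide quotes regular-only ratios. A minimal sketch of both calculations, using four rows from the table (the function names are illustrative, not from the slides):

```python
# Rows copied from the country table: (country, regular, short, total submissions).
ROWS = [
    ("USA", 20, 28, 109),
    ("China", 3, 4, 55),
    ("Germany", 4, 5, 15),
    ("Finland", 2, 1, 9),
]


def acceptance_ratio(regular: int, short: int, total: int) -> float:
    """Percent of submissions accepted as either regular or short papers."""
    return 100.0 * (regular + short) / total


def regular_ratio(regular: int, total: int) -> float:
    """Percent of submissions accepted as regular papers only."""
    return 100.0 * regular / total


for country, regular, short, total in ROWS:
    overall = acceptance_ratio(regular, short, total)
    regular_only = regular_ratio(regular, total)
    print(f"{country:8s} overall {overall:5.1f}%  regular-only {regular_only:5.1f}%")
```

For Germany this gives 60.0% overall and 26.7% regular-only, which agrees with the table and with the 4/15 figure on the summary slide.
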
Data Mining on ICDM Submission Data
- Top 5 areas of submissions:
  – Data mining applications
  – Data mining and machine learning algorithms and methods
  – Mining text and semi-structured data, and mining temporal, spatial and multimedia data
  – Data pre-processing, data reduction, feature selection and feature transformation
  – Soft computing and uncertainty management for data mining
- High acceptance ratio areas (Regular + Short)
  – Quality assessment and interestingness metrics of data mining results: 5/10 (50.0%)
  – Data pre-processing, data reduction, feature selection and feature transformation: 14/35 (40.0%)
  – Complexity, efficiency, and scalability issues in data mining: 4/11 (36.4%)

Regular  Short  Total  Ratio   Topic
      4     10     84  16.7%   Data mining applications
      9     20     81  35.8%   Data mining and machine learning algorithms and methods
      3      8     44  25.0%   Mining text and semi-structured data, and mining temporal, spatial and multimedia data
      7      7     35  40.0%   Data pre-processing, data reduction, feature selection and feature transformation
  [3 accepted]     34   8.8%   Soft computing and uncertainty management for data mining
      2      1     26  11.5%   Foundations of data mining
      3      4     25  28.0%   Mining data streams
  [1 accepted]     16   6.3%   Human-machine interaction and visual data mining
      2      1     15  20.0%   Security, privacy and social impact of data mining
      1      1     12  16.7%   Data and knowledge representation for data mining
  [1 accepted]     11   9.1%   Pattern recognition and trend analysis
      2      2     11  36.4%   Complexity, efficiency, and scalability issues in data mining
      2      3     10  50.0%   Quality assessment and interestingness metrics of data mining results
  [1 accepted]      9  11.1%   Statistics and probability in large-scale data mining
  [1 accepted]      9  11.1%   Integration of data warehousing, OLAP and data mining
  [2 accepted]      7  28.6%   Collaborative filtering/personalization
      1      1      7  28.6%   Post-processing of data mining results
  [2 accepted]      6  33.3%   Others
  [1 accepted]      2  50.0%   High performance and parallel/distributed data mining
      0      0      1   0.0%   Query languages and user interfaces for mining
     39     66    445  23.8%   Total
[n accepted]: the slide text gives a single accepted count for this topic; whether it belongs under Regular or Short is not recoverable.

Correspondence Analysis (Country vs Final Decision)
[Figure: correspondence analysis map, r1 = 0.378, r2 = 0.177. Decision points: Regular, Short, Reject. Country points: Slovenia, Finland, Hong Kong, Germany, USA, Italy, Australia, India, Canada, UK, France, Japan.]

Correspondence Analysis (Topics vs Final Decision)
[Figure: correspondence analysis map, r1 = 0.280, r2 = 0.184. Decision points: Regular, Short, Reject. Topic points: Applications, Collaborative Filtering, DM Methods, Soft-computing, Quality-assessment, Preprocessing/Feature Selection, Security/privacy, Statistics and probability, High-performance, Post-processing.]

Correspondence Analysis
- Country vs Final Decision
  – Regular: Germany, USA
  – Short: ?
  – Reject: Most of the countries are located near this region.
- Topics vs Final Decision
  – Regular: Quality Assessment, Preprocessing/Feature Selection
  – Short: DM/ML Methods, Collaborative Filtering
  – Reject: DM Applications

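The slides do not say how the maps were computed or what r1 and r2 denote. As a rough sketch only, the standard SVD formulation of correspondence analysis can be applied to a country-by-decision contingency table built from the country table, taking Reject as Total minus Regular minus Short. This is an assumption about the input: the slide's plot also includes Slovenia, which the table does not list, so the output need not match the slide's r1 and r2. The code requires NumPy.

```python
import numpy as np

# (country, regular, short, total) from the country table.
# Assumption: Reject = total - regular - short.
ROWS = [
    ("USA", 20, 28, 109), ("China", 3, 4, 55), ("UK", 1, 6, 39),
    ("Japan", 0, 5, 28), ("Canada", 3, 3, 25), ("Taiwan", 0, 1, 18),
    ("Australia", 2, 1, 17), ("Germany", 4, 5, 15), ("France", 0, 2, 14),
    ("India", 1, 0, 14), ("Singapore", 0, 3, 12), ("Brazil", 0, 1, 12),
    ("Italy", 2, 1, 10), ("Finland", 2, 1, 9), ("Spain", 0, 1, 7),
    ("Hong Kong", 1, 1, 6),
]
DECISIONS = ["Regular", "Short", "Reject"]


def correspondence_analysis(counts):
    """Return singular values and principal coordinates of rows and columns."""
    counts = np.asarray(counts, dtype=float)
    P = counts / counts.sum()          # correspondence matrix
    r = P.sum(axis=1)                  # row masses
    c = P.sum(axis=0)                  # column masses
    # Standardized residuals from the independence model r c^T.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    row_coords = (U * sv) / np.sqrt(r)[:, None]
    col_coords = (Vt.T * sv) / np.sqrt(c)[:, None]
    return sv, row_coords, col_coords


counts = [[reg, sh, tot - reg - sh] for _, reg, sh, tot in ROWS]
sv, row_coords, col_coords = correspondence_analysis(counts)

# With three decision categories there are at most two non-trivial dimensions.
print("singular values:", np.round(sv[:2], 3))
for name, xy in zip(DECISIONS, col_coords[:, :2]):
    print(f"{name:10s} {xy[0]: .3f} {xy[1]: .3f}")
for (name, *_), xy in zip(ROWS, row_coords[:, :2]):
    print(f"{name:10s} {xy[0]: .3f} {xy[1]: .3f}")
```

Countries plotted close to a decision point in the first two dimensions have a decision profile that leans toward that outcome, which is the kind of reading summarized in the list above.
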
Rule Mining on ICDM Submission Data
- Datasets
  – Sample size: 445
  – Attributes: 5
    • Paper No.: ordered by submission date
    • # of Authors
    • # of Characters in Title
    • Country
    • Category
  – Analyzed by Clementine 7.1 (and SPSS 12.0J)

Rule Mining (C5.0) on ICDM Submission Data
- C5.0
  – [Topic=Mining semi-structured data, …] & [129 < Paper No. <= 369] => Reject (Confidence 0.87, Support 10)
  – [Country=USA] & [Topic=Mining semi-structured data, …] & [Paper No. > 369] & [# of Authors <= 3] => Accept (Confidence 0.667, Support 3)
  – [Topic=Preprocessing/Feature Selection] & [# of Authors > 4] => Accept (Confidence 1.0, Support 3)
- Important features: Topic, Paper No., # of Authors

Rule Mining (GRI) on ICDM Submission Data
- Generalized Rule Induction
  – [# of Authors < 2] & [Paper No. < 120.5] => Rejected (Confidence 96.0%, Support 24)
  – [# of Chars in Title < 27] & [Paper No. > 212] => Accepted (Confidence 100%, Support 5)
- Important features: Paper No., # of Chars in Title, # of Authors

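The C5.0 and GRI results are if-then rules, each scored by confidence and support. The sketch below shows one common way to compute the two scores: support as the number of records matching the antecedent, confidence as the fraction of those records whose decision equals the prediction. These definitions are an assumption, and Clementine's exact definitions may differ. The records are hypothetical and exist only to make the example runnable; they are not the ICDM 2004 data.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Submission:
    paper_no: int      # ordered by submission date
    n_authors: int
    title_chars: int
    country: str
    topic: str
    accepted: bool     # True if accepted as a regular or short paper


# HYPOTHETICAL records for illustration only; not the real submission data.
SUBMISSIONS = [
    Submission(12, 1, 64, "USA", "Applications", False),
    Submission(57, 1, 48, "Japan", "Mining data streams", False),
    Submission(101, 1, 71, "UK", "Applications", True),
    Submission(230, 3, 22, "China", "Foundations", True),
    Submission(305, 2, 25, "USA", "Preprocessing/Feature Selection", True),
    Submission(410, 5, 80, "Germany", "Preprocessing/Feature Selection", True),
]


def evaluate_rule(
    records: List[Submission],
    antecedent: Callable[[Submission], bool],
    predicts_accept: bool,
) -> Tuple[int, float]:
    """Return (support, confidence) of an if-then rule.

    support    = number of records matching the antecedent
    confidence = fraction of matching records whose decision equals the prediction
    """
    matched = [r for r in records if antecedent(r)]
    if not matched:
        return 0, 0.0
    correct = sum(1 for r in matched if r.accepted == predicts_accept)
    return len(matched), correct / len(matched)


def early_single_author(r: Submission) -> bool:
    # Antecedent of the slide's rule: [# of Authors < 2] & [Paper No. < 120.5]
    return r.n_authors < 2 and r.paper_no < 120.5


def late_short_title(r: Submission) -> bool:
    # Antecedent of the slide's rule: [# of Chars in Title < 27] & [Paper No. > 212]
    return r.title_chars < 27 and r.paper_no > 212


print(evaluate_rule(SUBMISSIONS, early_single_author, predicts_accept=False))
print(evaluate_rule(SUBMISSIONS, late_short_title, predicts_accept=True))
```

On the hypothetical records the first rule matches three papers and is right for two of them, and the second matches two papers and is right for both. Rules with very small support, such as the slide's Support 3 and Support 5 rules, can reach high confidence on very few papers.
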
Multidimensional Scaling (2004)
[Figure: multidimensional scaling map of the attributes Country, Decision, Topics, Review Score, Paper No., # of Authors, # of Chars in Title]

Summary (2004) of Mining on ICDM Submission Data
- Do not submit a paper too fast!
  – Reflect not only on the content but also on the title.
- Mining text/Web/semi-structured data is very popular.
- The number of application papers is growing (but many are rejected).
- Strong topics
  – Preprocessing/Feature Selection
  – Postprocessing
  – Security and Privacy
- Several topics are emerging in ICDM 2004:
  – Mining Data Streams
  – Collaborative Filtering
  – Quality Assessment

Comparison between 02-04: Review Scores
[Figure: box plot of review scores]

Comparison between 02-04: Countries (acceptance ratio)
- 2002: Hong Kong 64.7%, USA 47.9%, Canada 45.5%, France 33.3%
- 2003: Israel 55.0%, Hong Kong 50.0%, Japan 37.0%, USA 33.0%, Germany 32.0%
- 2004: Germany 60.0%, USA 44.0%, Finland 33.3%, Hong Kong 33.0%, Italy 30.0%

Comparison between 02-04: Topics (top 5 by acceptance ratio)
- 2002: Graph Mining 75.0%, Temporal Data 52.6%, Theory 42.9%, Text Mining 42.1%, Rule 41.7%
- 2003: Process-centric DM 80.0%, Security/privacy 57.0%, Statistics and Probability 47.0%, Visual Data Mining 38.0%, Postprocessing 41.7%
- 2004: Quality Assessment 50.0%, Preprocessing/Feature Selection 40.0%, Complexity/Scalability 36.4%, DM and ML Methods 35.8%, Collaborative Filtering 28.6%, Post-processing 28.6%

Multidimensional Scaling (2003 and 2004)
The topological structure with respect to similarities does not appear to have changed between 2003 and 2004.
[Figure: two multidimensional scaling maps, one for 2003 and one for 2004, each over the attributes Country, Decision, Topics, Review Score, Paper No., # of Authors, # of Chars in Title]

Data Mining on ICDM Submission Data
- Acknowledgements
  – Many thanks to
    • PC chairs, Vice Chairs and PC members
    • All the authors
    • All the contributors to ICDM 2004
  – See you again at ICDM 2005!

Multidimensional Scaling (2003)
[Figure: multidimensional scaling map of the attributes Country, Decision, Topics, Review Score, Paper No., # of Authors, # of Chars in Title]