
  • Number of slides: 9

Weka: An open-source tool for data analysis and mining with machine learning
Quantitative Data Analysis Colloquium
Centenary College of Louisiana
Mark Goadrich
4/17/2008

Regression lines and correlation
• Find the relationship between two attributes
• Correlation coefficient
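
As a rough illustration of the two ideas on this slide, the sketch below computes a Pearson correlation coefficient and a least-squares regression line in plain Python. The function names and the sample values are hypothetical and do not come from the talk or from any dataset it mentions.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def regression_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up values for two numeric attributes (illustration only).
attribute_a = [1.0, 2.0, 3.0, 4.0, 5.0]
attribute_b = [2.1, 3.9, 6.2, 7.8, 10.0]

r = pearson_r(attribute_a, attribute_b)
slope, intercept = regression_line(attribute_a, attribute_b)
print(f"r = {r:.3f}, line: y = {slope:.3f} * x + {intercept:.3f}")
```

A value of r near +1 or -1 suggests a strong linear relationship between the two attributes; a value near 0 suggests little linear relationship.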

Categorization
• Can we learn one category based on the others?
• This search for classification lines is called machine learning

Data Sets
• House of Representatives Votes
• Labor Relations
• Iris (plant)
• Discrimination
• Breast Cancer
• Many more at http://archive.ics.uci.edu/ml/
• Table of Features
  – Example is a row
  – Features are discrete or continuous

Weka Time - Explore
• http://www.cs.waikato.ac.nz/ml/weka/
• Open Explorer
• Open Data File
  – ARFF or CSV
• Visualize All
• Visualize Crosstabs
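
For readers who have not seen an ARFF file, the sketch below renders a small table of features as ARFF text: a relation name, one attribute declaration per column (numeric, or a brace-enclosed list of nominal values), and then the data rows. The helper `to_arff`, the relation name, and the rows are hypothetical; the sketch also assumes attribute names and values contain no spaces or commas, which would otherwise need quoting.

```python
def to_arff(relation, attributes, rows):
    """Render a table of features as ARFF text.

    attributes: list of (name, kind) pairs, where kind is the string
    "numeric" for a continuous feature or a list of allowed values for
    a discrete (nominal) feature.
    rows: one tuple per example, in the same column order.
    """
    lines = ["@relation " + relation, ""]
    for name, kind in attributes:
        if kind == "numeric":
            lines.append("@attribute " + name + " numeric")
        else:
            lines.append("@attribute " + name + " {" + ",".join(kind) + "}")
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


# Made-up toy table: one continuous feature and one discrete class.
text = to_arff(
    "toy_plants",
    [("petal_length", "numeric"), ("species", ["short", "long"])],
    [(1.4, "short"), (5.1, "long")],
)
print(text)
```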

Discrete: Decision Trees
• Reduce confusion (entropy) in the data by drawing recursive lines
• Result is comprehensible to humans
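
To make "reduce confusion (entropy)" concrete, here is a minimal sketch of Shannon entropy and the information gain of splitting on one discrete feature. It shows the general idea only, using plain information gain, and is not a description of the exact splitting criterion any particular Weka learner uses. The function names and toy rows are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def information_gain(rows, labels, feature_index):
    """Entropy reduction obtained by splitting on one discrete feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature_index], []).append(label)
    # Weighted average entropy of the subsets produced by the split.
    remainder = sum(
        len(group) / len(labels) * entropy(group) for group in groups.values()
    )
    return entropy(labels) - remainder


# Made-up examples: feature 0 predicts the label perfectly, feature 1 does not.
rows = [("yes", "a"), ("yes", "b"), ("no", "a"), ("no", "b")]
labels = ["pos", "pos", "neg", "neg"]

for index in range(2):
    print(f"feature {index}: gain = {information_gain(rows, labels, index):.3f}")
```

A tree learner in this style would split on the feature with the largest gain and then repeat the process on each subset, which is where the recursion comes from.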

Continuous: ANN and SVM
• Artificial Neural Networks simulate activating and thresholding neurons
• Support Vector Machines use a kernel to transform data to higher dimensions
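
Both bullets can be illustrated with a toy sketch: a single thresholding neuron, and an explicit map from one dimension to two that makes otherwise inseparable data separable by a line. Real SVM kernels typically compute inner products in the higher-dimensional space without building it explicitly, so the `lift` function here is only a stand-in for the idea. All names, weights, and data points are hypothetical.

```python
def neuron(inputs, weights, bias):
    """Output 1 when the weighted sum plus bias is positive, else 0."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation > 0 else 0


def lift(x):
    """Explicit feature map from 1-D to 2-D: x -> (x, x^2)."""
    return (x, x * x)


# Made-up 1-D data: class 1 lies far from zero on both sides, class 0 near
# zero, so no single threshold on x separates the two classes.
points = [-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0]
labels = [1, 1, 0, 0, 0, 1, 1]

# In the lifted space the line x^2 = 1 separates the classes, which a
# single neuron with weights (0, 1) and bias -1 can represent.
predictions = [neuron(lift(x), (0.0, 1.0), -1.0) for x in points]
print(predictions == labels)
```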

Weka Time - Classify
• Choose Algorithm
  – J48, Multilayer Perceptron, SMO
• Validate Learning
  – Training set
  – Cross validation
• Visualize output
  – ROC Curves
  – Precision-Recall Curves
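
The validation step can be sketched outside of Weka as follows: split the examples into k folds so that each example is tested exactly once, then score predictions with precision and recall. This is a simplified sketch with no shuffling or stratification, which a real tool may apply; the function names and sample labels are hypothetical.

```python
def k_fold_indices(n_examples, k):
    """Yield (train_indices, test_indices) pairs for k roughly equal folds."""
    indices = list(range(n_examples))
    for fold in range(k):
        test = indices[fold::k]  # every k-th example, starting at `fold`
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test


def precision_recall(actual, predicted, positive=1):
    """Precision and recall for the class treated as positive."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Each of 10 examples lands in the test set of exactly one of 5 folds.
for train, test in k_fold_indices(10, 5):
    print("train:", train, "test:", test)

# Made-up true labels and predictions.
print(precision_recall([1, 1, 0, 0], [1, 0, 1, 0]))
```

Evaluating on the training set alone tends to overstate accuracy, which is why the held-out folds matter. Computing precision and recall at several decision thresholds gives the points of a precision-recall curve.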

Future Topics
• Clustering
  – Number and makeup of categories unknown
• Relational Data
  – Features are related within examples
  – Features are related across examples