ff586826fa72a487ac9abf6be70f21f5.ppt
- Number of slides: 11
The GDB Cup: Applying “Real World” Financial Data Mining in an Academic Setting. Gary D. Boetticher, University of Houston - Clear Lake, Houston, Texas, USA
What is the GDB Cup? • Modeled after the KDD Cup • Start with $100,000 + Financial Data + Data Mining Techniques = Make As Much Money as Possible
Motivation • Availability of Data • Gain Experience with DM Process • Synthesize ML + Domain Knowledge • Pragmatic implications
Availability of Data • Different Time Series Perspectives – 1 minute to monthly • Different Financial Instruments – Stocks, Futures, Options, Mutual Funds • Large Sample Size – 400 - 700 Stocks (Daily, 2.5 Years) – EMini Future (5 Minute, 2 Years) • Inexpensive or Free Sources – www.anfutures.com – www.ashkon.com – Screen Scraping (finance.yahoo.com)
DM Process: Data Cleansing • Low = 0 • Volume = 0 • Missing Data (e.g. no Open) • Missing Time Periods
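The four cleansing checks on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the bar layout (a dict with `time`, `open`, `low`, `volume` keys), the function name `clean_bars`, and the sample prices are all hypothetical and do not come from the course material.

```python
from datetime import datetime, timedelta

def clean_bars(bars, step):
    """Drop suspect bars, then list the gaps left in the time series.

    bars: list of dicts with (assumed) keys "time", "open", "low", "volume"
    step: expected timedelta between consecutive bars
    """
    kept = []
    for bar in bars:
        if bar.get("open") is None:   # Missing Data (e.g. no Open)
            continue
        if bar.get("low") == 0:       # Low = 0
            continue
        if bar.get("volume") == 0:    # Volume = 0
            continue
        kept.append(bar)

    kept.sort(key=lambda b: b["time"])

    # Missing Time Periods: any jump larger than the expected step.
    # Bars dropped above also show up here as gaps.
    gaps = []
    for prev, cur in zip(kept, kept[1:]):
        if cur["time"] - prev["time"] > step:
            gaps.append((prev["time"], cur["time"]))
    return kept, gaps

# Made-up 5-minute bars, for illustration only.
t0 = datetime(2003, 1, 2, 9, 30)
five = timedelta(minutes=5)
sample = [
    {"time": t0,            "open": 10.0, "low": 9.9, "volume": 1000},
    {"time": t0 + five,     "open": None, "low": 9.8, "volume": 900},
    {"time": t0 + 2 * five, "open": 10.1, "low": 0,   "volume": 800},
    {"time": t0 + 3 * five, "open": 10.2, "low": 10.0, "volume": 0},
    {"time": t0 + 4 * five, "open": 10.3, "low": 10.1, "volume": 700},
]
kept, gaps = clean_bars(sample, five)
```

A real check for missing periods would also need a trading calendar, since overnight, weekend, and holiday breaks are expected gaps rather than data errors.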
Build Models (Synthesize ML & Domain Knowledge) • Machine Learners: Supervised NN, GP, SVM, Neuro Fuzzy, SOM, ILP, etc. • Tech. Analysis: Moving Averages, RSI, MACD, Stochastics, PNF, etc. (www.equis.com/Education/TAAZ)
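One way to combine the two columns of this slide is to compute technical indicators and feed them to a learner as input features. The sketch below shows two of the named indicators, a simple moving average and RSI. It is an assumption-laden illustration: the function names are hypothetical, and the RSI here uses plain averages of gains and losses (often called Cutler's variant) rather than Wilder's exponentially smoothed original.

```python
def sma(values, window):
    """Simple moving average; None until a full window is available."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out

def rsi(closes, period=14):
    """RSI from plain averages of the last `period` price changes.

    Returns None where fewer than `period` changes are available.
    When there are no losses in the window the value is set to 100
    (conventions differ for a completely flat window).
    """
    out = [None] * len(closes)
    for i in range(period, len(closes)):
        changes = [closes[j] - closes[j - 1]
                   for j in range(i - period + 1, i + 1)]
        gain = sum(c for c in changes if c > 0) / period
        loss = sum(-c for c in changes if c < 0) / period
        if loss == 0:
            out[i] = 100.0
        else:
            out[i] = 100.0 - 100.0 / (1.0 + gain / loss)
    return out

def feature_rows(closes, fast=3, slow=5, period=4):
    """Hypothetical feature vectors: [fast SMA - slow SMA, RSI]."""
    f, s, r = sma(closes, fast), sma(closes, slow), rsi(closes, period)
    return [[f[i] - s[i], r[i]]
            for i in range(len(closes))
            if f[i] is not None and s[i] is not None and r[i] is not None]
```

The rows from `feature_rows` could then be paired with a target such as the next bar's direction and passed to any of the learners listed on the slide. The window lengths are arbitrary placeholders.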
Validating Models • Statistical Validation • Financial Validation • Ignore Market Conditions (Buy & Hold): Start Date Value, End Date Value • Unrealistic Conditions (e.g. Drawdown) • Standardize portfolio management • Validate with EXCEL models
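Two of the financial checks named here have simple arithmetic behind them. The sketch below reads "Buy & Hold" as a baseline computed only from the start-date and end-date values, and "Drawdown" as the largest peak-to-trough drop of an account's equity curve. Both readings are assumptions about what the slide intends, and the function names are hypothetical.

```python
def buy_and_hold_return(start_value, end_value):
    """Return from buying on the start date and holding to the end date."""
    return (end_value - start_value) / start_value

def max_drawdown(equity):
    """Largest fractional drop from a running peak of the equity curve."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst

# Made-up account values, starting from the $100,000 stake.
equity_curve = [100000, 120000, 90000, 130000]
baseline = buy_and_hold_return(100, 125)                   # 0.25
strategy = buy_and_hold_return(equity_curve[0], equity_curve[-1])
drawdown = max_drawdown(equity_curve)                      # 0.25
```

A model whose return beats the buy-and-hold baseline but only by tolerating a very deep drawdown may not be tradable in practice, which is one reason to report both figures side by side.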
Results - 1 • Fall 2002: 12/31/99 - 5/31/02, 452 stocks, Annual ROI = 270% • Spring 2003: 12/31/99 - 5/31/02, 712 stocks, Annual ROI = 310% • Fall 2003: 6/14/02 - 6/12/03, S&P EMini (5 Min.), Annual ROI = 852%
Results - 2 • Spring 2004 (Train): 10/12/01 - 12/26/03, S&P EMini (5 Min.), Annual ROI = 23,300% • Spring 2004 (Test): 12/29/03 - 04/16/04, S&P EMini (5 Min.), Annual ROI = 2,172%
Demo
Conclusions • Effective way to understand DM Process – Data Cleansing – Data Validation • Very Good Results – ROI > 250% in all four cases • Pragmatic implications