- Number of slides: 43
More Data Mining Success Stories for Marketing and Related Fields
Wolfgang Jank, RH Smith School of Business, University of Maryland
What is “Data Mining”?
What is Data Mining?
- Many definitions:
  - Non-trivial extraction of implicit, previously unknown, and potentially useful information from data
  - Exploration and analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns
Related Fields
- [Diagram labels: Machine Learning; Visualization; Data Mining and Knowledge Discovery; Statistics; Databases]
Why Mine Data?
Because there are Data Floods….
Why Mine Data?
- Lots of data is being automatically collected and warehoused
  - Web data, e-commerce
  - Scanner data at department/grocery stores
  - Bank/credit card/insurance transactions
- Computers have become cheaper and more powerful
- Competitive pressure is strong
  - Provide better, customized services for an edge
Big Data Examples
- Europe's Very Long Baseline Interferometry (VLBI) has 16 telescopes, each of which produces 1 Gigabit/second of astronomical data over a 25-day observation session
  - Storage and analysis are a big problem
- AT&T handles billions of calls per day
  - So much data that it cannot all be stored; analysis has to be done “on the fly”, on streaming data
Data Growth
- In 2 years, the size of the largest database TRIPLED!
Data Mining is Particularly Promising Online
- Why?
  - Because every “click” leaves a digital footprint
  - We can use these footprints to better understand our customers…
    - Coupons, ads, discounts, dynamic pricing, …
  - …or guard them against predators
    - Fraud detection, account protection, spam, junk mail, viruses, …
Blog Pulse
- Measures what the world (= the internet) is thinking
  - Measured in terms of blogging activity
- [Chart annotations: “The ‘Obama Buzz’ started here!”; “The Republican Convention & Sarah Palin”]
Google Trends
- Measures what the world is looking for
  - Measured in terms of search words
- [Chart: the world’s interest in “Lehman Brothers” and “AIG”]
Google Flu Trends
- Detects outbreaks of flu early, based only on search terms
  - More accurate and faster than CDC
  - Read more at http://www.google.org/flutrends/
Data Mining Success Stories
The Netflix Recommendation Engine
- Netflix uses data mining to make recommendations to its users
  - Based on past user behavior
  - Based on movie similarities
- Helps cross-selling of products
- Improves the search experience for users
- However, developing good recommendation engines is not easy; therefore, Netflix has initiated the Netflix Challenge
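Recommending by "movie similarities" can be illustrated with item-based cosine similarity. This is only a minimal sketch: the slides do not say which method Netflix uses, and the ratings, user names, and function names below are hypothetical.

```python
from math import sqrt

# Hypothetical ratings: user -> {movie: rating on a 1-5 scale}
ratings = {
    "ann": {"Alien": 5, "Aliens": 4, "Amelie": 1},
    "bob": {"Alien": 4, "Aliens": 5},
    "cat": {"Alien": 1, "Amelie": 5},
}

def movie_vector(movie):
    """Ratings of one movie, keyed by user."""
    return {u: r[movie] for u, r in ratings.items() if movie in r}

def cosine(a, b):
    """Cosine similarity over the users who rated both movies."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[u] * b[u] for u in shared)
    norm_a = sqrt(sum(a[u] ** 2 for u in shared))
    norm_b = sqrt(sum(b[u] ** 2 for u in shared))
    return dot / (norm_a * norm_b)

def most_similar(movie):
    """Other movies ranked by similarity to `movie`, best first."""
    movies = {m for r in ratings.values() for m in r}
    target = movie_vector(movie)
    scores = [(cosine(target, movie_vector(m)), m) for m in movies if m != movie]
    return sorted(scores, reverse=True)
```

Calling `most_similar("Alien")` ranks "Aliens" ahead of "Amelie", because the users who rated both "Alien" and "Aliens" rated them alike.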
The Netflix Challenge
“The Netflix Prize seeks to substantially improve the accuracy of predictions about how much someone is going to love a movie based on their movie preferences”
- Netflix offers $1 million for the person/team that can improve their current data mining method by 10% (i.e., prediction accuracy, measured by root mean squared error)
  - http://www.netflixprize.com/
  - Incremental progress prizes of $50,000 every year
  - An AT&T team won the progress prize in 2007
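The Netflix Prize scored entries by root mean squared error (RMSE) on held-out ratings, so "improve by 10%" means a 10% lower RMSE than the baseline. A minimal sketch of that arithmetic follows; the rating lists are made up for illustration.

```python
from math import sqrt

def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def improvement(baseline_rmse, new_rmse):
    """Relative RMSE reduction; 0.10 corresponds to a 10% improvement."""
    return (baseline_rmse - new_rmse) / baseline_rmse

# Hypothetical ratings, for illustration only
actual = [4, 3, 5, 2]
baseline = [3, 3, 4, 4]
candidate = [4, 3, 4, 3]

gain = improvement(rmse(baseline, actual), rmse(candidate, actual))
```

A candidate method would clear the bar when `gain >= 0.10`.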
Amazon’s Recommendation Engine
- Every time we buy a book on Amazon, we receive recommendations about similar books
- How are they doing this?
- The answer: massive data mining
Google’s Search Algorithm
- Google continuously collects data about web pages using web spiders
- It transforms this massive data into search information using the famous “PageRank” algorithm
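The core idea of PageRank, that a page is important when important pages link to it, can be sketched with a short power iteration. This is a textbook simplification, not Google's production system; the tiny link graph and the damping value of 0.85 are assumptions for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power iteration on a dict: page -> list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank

# Hypothetical four-page web
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
```

In this toy graph, page "c" ends up with the highest rank because three pages link to it, and page "d" the lowest because nothing links to it.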
AT&T’s Fraud Detection
- In the AT&T telephone network, every day old nodes drop out (terminated accounts) and new nodes pop up (new accounts)
- Fraudulent account: terminated! Should this new account be allowed?
- Name: Elizabeth Harmon
  - Terminated account: APT 1045, 4301 ST JOHN RD, SCOTTSDALE, AZ; balance $149.00; disconnected 2/19/04 (nonpayment)
  - New account: 180 N 40TH PL APT 40, PHOENIX, AZ; balance $72.00; connected 1/31/04
AT&T’s Fraud Detection
- AT&T uses massive graph mining to detect fraud in their telephone network data
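One way graph mining can help with the "should this new account be allowed?" question is to compare the numbers a new account calls with those of accounts already terminated for fraud. The slides do not describe AT&T's actual method, so the Jaccard-overlap rule, the 0.5 threshold, and all account data below are assumptions for illustration.

```python
def jaccard(a, b):
    """Share of contacts two accounts have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def flag_new_account(new_contacts, fraud_contacts_by_account, threshold=0.5):
    """Return terminated fraudulent accounts whose contacts resemble the new one."""
    return sorted(
        account
        for account, contacts in fraud_contacts_by_account.items()
        if jaccard(new_contacts, contacts) >= threshold
    )

# Hypothetical data: account id -> numbers it called most often
terminated_fraud = {
    "acct-17": {"555-0101", "555-0102", "555-0103", "555-0104"},
    "acct-42": {"555-0199", "555-0198"},
}
new_account_calls = {"555-0101", "555-0102", "555-0103", "555-0150"}

matches = flag_new_account(new_account_calls, terminated_fraud)
```

Here the new account shares three of five distinct contacts with "acct-17", so it would be flagged for review as a possible reappearance of the same customer.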
Mining Accounting Fraud at PricewaterhouseCoopers
- PwC uses data mining for the automatic analysis of company general ledgers to detect accounting fraud
- Helps comply with the Sarbanes-Oxley Act
- Improves efficiency
- Improves accuracy
Sales Lead Identification at IBM
- IBM uses predictive modeling to estimate opportunities for cross-selling to existing customers and for selling existing services to new customers
  - Uses analytic tools to estimate
    - A potential customer’s wallet size
    - A potential customer’s probability of purchasing a service
Data Mining at IBM
- [Diagram labels: Firmographics; IBM Relationship; Historical total Software sales; State is CA; Sector is IT; Company is HQ; Historical Lotus sales; Historical System p sales; Historical System x sales; Historical System z sales; New Rational sales]
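Once a wallet size and a purchase probability have been estimated for each prospect, a simple way to prioritize sales leads is by expected revenue (wallet size times probability). The slides do not say how IBM combines the two estimates, so this ranking rule and the company figures below are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Lead:
    name: str
    wallet_size: float    # estimated spend available, in dollars
    purchase_prob: float  # estimated probability of buying, 0.0 to 1.0

    @property
    def expected_revenue(self) -> float:
        return self.wallet_size * self.purchase_prob

def rank_leads(leads):
    """Sort leads by expected revenue, highest first."""
    return sorted(leads, key=lambda lead: lead.expected_revenue, reverse=True)

# Hypothetical prospects
leads = [
    Lead("Acme", wallet_size=500_000, purchase_prob=0.10),
    Lead("Globex", wallet_size=120_000, purchase_prob=0.60),
    Lead("Initech", wallet_size=900_000, purchase_prob=0.02),
]
ranked = [lead.name for lead in rank_leads(leads)]
```

The largest wallet ("Initech") lands last here because its purchase probability is so low, which is the point of estimating both quantities.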
zata3: Data-Driven Decisions in Election Campaigns
- zata3 is an election campaign consulting company
- They recently decided to add data mining technology to their services
zata3: Lots of data on voters and past voting behavior
- Goal: to predict who will vote in the next election
- Idea: better-targeted spending of election campaign resources
zata3: Huge savings with data mining
- Zata3 anticipates savings of over 30% using data mining models
Data Mining and Mass e-Customization
Customization for Online Services
- Opportunities:
  - Combination of countless features for highly individualized solutions
    - “A single personalized solution for every customer”
- Challenges:
  - How does the customer understand what’s right for them?
    - Moving from consultative selling to self-consultative buying
Ex.: Freddie Mac Mortgage Services
- Freddie Mac mass customizes mortgage products
  - Combines hundreds of different loan characteristics
- Challenge: How does the customer find the loan that’s right for them?
Ex.: Mass Customization at eBay
- eBay offers any possible product & service in “garage-type” sales
  - However, it does not assist the customer much in finding the right product/service.
Ex.: Books on Amazon.com
- Amazon.com offers books for every taste
  - But: How can we find the book that’s right for us?
Managing Mass Customization at Amazon
- How does Amazon ensure that customers find what they are looking for?
  - Answer: by making (automated) recommendations
Managing Mass Customization
- From Expert Salesperson to Expert System:
  - How can we ensure that our customers get what they are looking for?
- Pre-Internet customization:
  - Expert Salesperson
    - Experienced with product, process
    - Consultative selling
    - Salesperson provides expertise, identifies needs, defines configuration
- Early/current-Internet customization:
  - Expert Customer
    - Experienced with product
    - Revelation, transaction buying
    - Customer provides expertise, knows needs, defines configuration
- Future Internet customization:
  - Non-Expert Customer
    - Inexperienced with product, process
    - Self-consultative buying
    - System provides expertise, identifies needs, defines configuration
Providing the non-expert customer with decision support
- Moving from Expert to Non-Expert Buyers: Computerization
- Assisted service
  - Telephone, email, instant messaging
  - Drawback: requires human interaction, only limited scalability
- Self service
  - Search, user ratings, forums, blogs, expert recommendations
  - Drawback: does not help the customer who is unsure about their needs
- Automated service
  - Expert systems for the non-expert
  - Replaces the salesperson
  - Translates customer characteristics and usage requirements into recommended product configurations
  - Consists of rule-based systems and data mining algorithms
  - Advantage: fully automatic, scalable, updatable
Ex.: Automated Service at AmEx
- Offers an online tool that, based on desired features, recommends the best card
  - Compensates only for lack of product knowledge, but assumes the customer knows why they need the product.
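A feature-based recommender of this kind can be as simple as a rule that picks the product matching the most desired features. The sketch below is not AmEx's tool; the card names, feature labels, and tie-breaking rule (alphabetical order) are hypothetical.

```python
# Hypothetical product catalogue: each card lists the features it offers.
CARDS = {
    "Travel Card": {"airline miles", "no foreign fees"},
    "Cashback Card": {"cash back", "no annual fee"},
    "Student Card": {"no annual fee", "low credit limit"},
}

def recommend(desired_features, catalogue=CARDS):
    """Return the product matching the most desired features, or None."""
    desired = set(desired_features)
    best_name, best_score = None, 0
    # Sorted iteration makes ties resolve to the alphabetically first product.
    for name, features in sorted(catalogue.items()):
        score = len(desired & features)
        if score > best_score:
            best_name, best_score = name, score
    return best_name
```

For example, `recommend(["cash back", "no annual fee"])` returns "Cashback Card". As the slide notes, such a tool only helps customers who already know which features they want.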
Ex.: Blockbuster’s Recommendation System
- Blockbuster recommends similar movies based on movie features and user behavior
  - “If you liked Indiana Jones, then you will also like Tomb Raider”
Key Component for Automated Service Systems: Data Mining
- Collect and mine customer information in order to, e.g.,
  - Segment the market
    - Understand customers’ different needs, expertise, profitability
    - E.g. Dell distinguishes between the segments “Home”, “Small Business”, “Medium/Large Business”, “Public Sector”
  - Analyze behaviors and events
    - Understand when customer has needs and the events that lead to them
    - E.g. path tracking, click stream analysis
  - Optimize pricing
    - Bundling, price discrimination
    - E.g. Amazon’s price testing; Zilliant’s data-driven pricing software
- Key requirement: understand customer data
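Click stream analysis often starts with something very simple: counting which page-to-page steps visitors take most often. A minimal sketch, with made-up sessions and page names:

```python
from collections import Counter

def transition_counts(sessions):
    """Count page-to-page transitions across all sessions."""
    counts = Counter()
    for pages in sessions:
        # Pair each page with the page visited right after it.
        counts.update(zip(pages, pages[1:]))
    return counts

# Hypothetical click streams: one list of page names per visit
sessions = [
    ["home", "search", "product", "cart"],
    ["home", "product", "cart", "checkout"],
    ["home", "search", "product"],
]
counts = transition_counts(sessions)
```

Comparing `counts[("product", "cart")]` with `counts[("cart", "checkout")]` shows where visitors drop out on the path to a purchase.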
Dangers of Data Mining
Dangers of Data Mining
- The danger of using data mining software/technology as a “black box”
  - Data does not mine itself!
  - We still need the domain knowledge and expertise of the user; otherwise outcomes may be meaningless
- Data quality
  - Junk in, junk out
What Data Mining Isn’t
Data Mining Isn’t…
- …smarter than you
  - Example from DeVeaux:
    - A new backpack inkjet printer is showing higher than expected warranty claims
    - A neural network analysis shows that ZIP code is the most important predictor
Data Mining Isn’t…
- …always about algorithms
  - Sometimes collecting and plotting the right data is enough
    - E.g. Blogpulse
More Data Mining Resources
- Repository:
  - http://www.kdnuggets.com/
  - http://www.the-data-mine.com/
- Tutorials
  - http://www.autonlab.org/tutorials/
- Software
  - SAS Enterprise Miner, SPSS Clementine, Orange, Weka, Rattle, R, …


