  • Number of slides: 43

More Data Mining Success Stories for Marketing and Related Fields
Wolfgang Jank
RH Smith School of Business, University of Maryland

What is “Data Mining”?

What is Data Mining?
  • Many definitions
      • Non-trivial extraction of implicit, previously unknown and potentially useful information from data
      • Exploration & analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns

Related Fields
[Diagram labels: Machine Learning; Visualization; Data Mining and Knowledge Discovery; Statistics; Databases]

Why Mine Data?

Because there are Data Floods…

Why Mine Data?
  • Lots of data is being automatically collected and warehoused
      • Web data, e-commerce
      • Scanner data at department/grocery stores
      • Bank/credit card/insurance transactions
  • Computers have become cheaper and more powerful
  • Competitive pressure is strong
      • Provide better, customized services for an edge

Big Data Examples
  • Europe's Very Long Baseline Interferometry (VLBI) has 16 telescopes, each of which produces 1 Gigabit/second of astronomical data over a 25-day observation session
      • Storage and analysis are a big problem
  • AT&T handles billions of calls per day
      • So much data that it cannot all be stored; analysis has to be done “on the fly”, on streaming data
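
To get a feel for the scale of the VLBI example, the figures on the slide can be turned into a total volume. This is only a back-of-the-envelope sketch: it assumes every telescope records continuously for the whole session and uses decimal units (1 TB = 1000 GB), neither of which the slide states.

```python
# Back-of-the-envelope data volume for the VLBI example.
# Assumption (not on the slide): all telescopes record continuously.
TELESCOPES = 16
RATE_GBIT_PER_S = 1      # per telescope, from the slide
SESSION_DAYS = 25        # from the slide

SECONDS_PER_DAY = 24 * 60 * 60


def session_volume_terabytes(telescopes, rate_gbit_per_s, days):
    """Total data in terabytes (8 bits per byte, 1 TB = 1000 GB)."""
    seconds = days * SECONDS_PER_DAY
    gigabits = telescopes * rate_gbit_per_s * seconds
    gigabytes = gigabits / 8
    return gigabytes / 1000


total_tb = session_volume_terabytes(TELESCOPES, RATE_GBIT_PER_S, SESSION_DAYS)
print(f"{total_tb:,.0f} TB per session")  # prints "4,320 TB per session"
```

Under these assumptions a single session amounts to roughly 4.3 petabytes.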

Data Growth
In 2 years, the size of the largest database TRIPLED!

Data Mining is particularly promising Online
  • Why?
      • Because every “click” leaves a digital footprint
      • We can use these footprints to better understand our customers…
          • Coupons, ads, discounts, dynamic pricing, …
      • …or guard them against predators
          • Fraud detection, account protection, spam, junk mail, viruses, …

Blog Pulse
  • Measures what the world (= the internet) is thinking
      • Measured in terms of blogging activity
[Figure annotations: “The ‘Obama Buzz’ started here!”; “The Republican Convention & Sarah Palin”]

Google Trends
  • Measures what the world is looking for
      • Measured in terms of search words
[Figure caption: The world’s interest in “Lehman Brothers” and “AIG”]

Google Flu Trends
  • Detects flu outbreaks early, based only on search terms
      • More accurate and faster than the CDC
      • Read more at http://www.google.org/flutrends/

Data Mining Success Stories

The Netflix Recommendation Engine
  • Netflix uses data mining to make recommendations to its users
      • Based on past user behavior
      • Based on movie similarities
  • Helps cross-selling of products
  • Improves the search experience for users
  • However, developing good recommendation engines is not easy; therefore, Netflix has initiated the Netflix Challenge
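
One common way to recommend “based on movie similarities” is to compare movies by the ratings they received from the same users. The sketch below illustrates that idea with cosine similarity; it is not Netflix's actual method, and the users, titles and ratings are made up for illustration.

```python
from math import sqrt

# Hypothetical ratings (user -> {movie: stars}); not Netflix data.
ratings = {
    "ann":  {"Alien": 5, "Aliens": 5, "Amelie": 1},
    "bob":  {"Alien": 4, "Aliens": 5, "Amelie": 2},
    "cara": {"Alien": 1, "Aliens": 2, "Amelie": 5},
    "dan":  {"Alien": 5, "Amelie": 1},
}


def item_vector(movie):
    """Ratings for one movie, keyed by user."""
    return {user: r[movie] for user, r in ratings.items() if movie in r}


def cosine(a, b):
    """Cosine similarity over the users who rated both movies."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[u] * b[u] for u in shared)
    norm_a = sqrt(sum(a[u] ** 2 for u in shared))
    norm_b = sqrt(sum(b[u] ** 2 for u in shared))
    return dot / (norm_a * norm_b)


def most_similar(movie):
    """The other movie whose rating pattern is closest to `movie`."""
    others = {m for r in ratings.values() for m in r} - {movie}
    target = item_vector(movie)
    return max(others, key=lambda m: cosine(target, item_vector(m)))


print(most_similar("Alien"))
```

A user who liked one movie can then be shown its nearest neighbours that they have not yet seen.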

The Netflix Challenge
“The Netflix Prize seeks to substantially improve the accuracy of predictions about how much someone is going to love a movie based on their movie preferences”
  • Netflix offers $1 million to the person/team that can improve its current data mining method by 10% (i.e. prediction accuracy, scored by root mean squared error on held-out ratings)
      • http://www.netflixprize.com/
      • Incremental progress prizes of $50,000 every year
      • An AT&T team won the progress prize in 2007
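
The “10% improvement” can be made concrete. The contest scored entries by root mean squared error (RMSE) on held-out ratings, so a 10% improvement means an RMSE that is 10% lower than the baseline's. The ratings and predictions below are invented purely to show the arithmetic.

```python
from math import sqrt


def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return sqrt(sum(squared) / len(squared))


def relative_improvement(baseline_rmse, new_rmse):
    """Fraction by which the new RMSE is lower than the baseline."""
    return (baseline_rmse - new_rmse) / baseline_rmse


# Hypothetical held-out ratings and two sets of predictions.
actual = [4, 3, 5, 2, 1]
baseline_predictions = [3, 3, 4, 3, 2]
improved_predictions = [4, 3, 4, 2, 2]

baseline = rmse(baseline_predictions, actual)
improved = rmse(improved_predictions, actual)
gain = relative_improvement(baseline, improved)

print(f"baseline RMSE {baseline:.3f}, new RMSE {improved:.3f}, gain {gain:.1%}")
print("meets the 10% target" if gain >= 0.10 else "below the 10% target")
```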

Amazon’s Recommendation Engine
  • Every time we buy a book on Amazon, we receive recommendations about similar books
  • How are they doing this?
  • The answer: massive data mining

Google’s Search Algorithm
  • Google continuously collects data about web pages using web spiders
  • It transforms this massive data into search information using the famous “PageRank” algorithm
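
The core idea of PageRank is that a page is important if important pages link to it. A minimal sketch of the textbook power-iteration version on a tiny, made-up link graph follows; the damping factor 0.85 and the four-page graph are illustrative assumptions, and a production search engine involves far more than this.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share ("random jump").
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its rank evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical four-page web: A -> B, C; B -> C; C -> A; D -> C.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

In this toy graph page C collects links from three other pages and ends up with the highest score, while D, which nobody links to, ends up with the lowest.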

AT&T’s Fraud Detection
In the AT&T telephone network, every day old nodes drop out (terminated accounts) and new nodes pop up (new accounts)
  • Fraudulent account: terminated!
  • Should this new account be allowed?
[Figure: example account records]
  Name: Elizabeth Harmon
  Address: APT 1045 4301 ST JOHN RD, SCOTTSDALE, AZ
  Address: 180 N 40TH PL APT 40, PHOENIX, AZ
  Balance: $149.00
  Balance: $72.00
  Disconnected: 2/19/04 (nonpayment)
  Connected: 1/31/04

AT&T’s Fraud Detection
AT&T uses massive graph mining to detect fraud in their telephone network data
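
The slides do not say how the graph mining works. One plausible ingredient, sketched here purely as an assumption, is to compare the set of numbers a new account calls with the calling neighborhood of accounts already terminated for fraud: a large overlap suggests the same person is back under a new name. The account names, phone numbers and the 0.5 threshold are all hypothetical.

```python
def jaccard(a, b):
    """Overlap of two sets: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical call graph: account -> set of numbers it has called.
calls = {
    "old_fraud_1": {"555-0101", "555-0102", "555-0103", "555-0104"},
    "new_a": {"555-0101", "555-0102", "555-0103", "555-0199"},
    "new_b": {"555-0150", "555-0151"},
}
terminated_fraud = {"old_fraud_1"}


def flag_suspicious(new_accounts, threshold=0.5):
    """Return new accounts whose calling pattern resembles a known fraudster's."""
    flagged = {}
    for account in new_accounts:
        best = max(jaccard(calls[account], calls[old]) for old in terminated_fraud)
        if best >= threshold:
            flagged[account] = best
    return flagged


print(flag_suspicious(["new_a", "new_b"]))
```

Here `new_a` shares three of its four contacts with the terminated account and is flagged for review, while `new_b` is not.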

Mining Accounting Fraud at PricewaterhouseCoopers
  • PwC uses data mining for the automatic analysis of company general ledgers to detect accounting fraud
  • Helps comply with the Sarbanes-Oxley Act
  • Improves efficiency
  • Improves accuracy

Sales Lead Identification at IBM
  • IBM uses predictive modeling to estimate opportunities for cross-selling to existing customers and for selling existing services to new customers
      • Uses analytic tools to estimate
          • A potential customer’s wallet size
          • A potential customer’s probability of purchasing a service
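
Once wallet size and purchase probability have been estimated, one simple way to prioritize leads is by their product, the expected revenue. That combination is an assumption made here for illustration, not something the slides attribute to IBM, and the customers and numbers are made up.

```python
# Hypothetical leads with model outputs: estimated wallet size (in dollars)
# and estimated probability of purchasing the service.
leads = [
    {"customer": "Acme", "wallet": 200_000, "p_buy": 0.10},
    {"customer": "Globex", "wallet": 50_000, "p_buy": 0.60},
    {"customer": "Initech", "wallet": 500_000, "p_buy": 0.02},
]


def expected_revenue(lead):
    """Wallet size weighted by the chance of actually closing the sale."""
    return lead["wallet"] * lead["p_buy"]


ranked = sorted(leads, key=expected_revenue, reverse=True)
for lead in ranked:
    print(f'{lead["customer"]:<8} {expected_revenue(lead):>10,.0f}')
```

Note that the largest wallet (Initech) ranks last because the purchase is so unlikely; looking at either estimate alone would give a different ordering.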

Data Mining at IBM
[Figure labels: Firmographics; Historical total Software sales; State is CA; Sector is IT; IBM Relationship; Historical Lotus sales; Historical System p sales; Company is HQ; Historical System x sales; Historical System z sales; New Rational sales]

zata3: Data-Driven Decisions in Election Campaigns
  • zata3 is an election campaign consulting company
  • They recently decided to add data mining technology to their services

zata3: Lots of data on voters and past voting behavior
  • Goal: to predict who will vote in the next election
  • Idea: better targeted spending of election campaign resources

zata3: Huge savings with data mining
  • zata3 anticipates savings of over 30% using data mining models

Data Mining and Mass e-Customization

Customization for Online Services
  • Opportunities:
      • Combination of countless features for highly individualized solutions
          • “A single personalized solution for every customer”
  • Challenges:
      • How does the customer understand what’s right for them?
          • Moving from consultative selling to self-consultative buying

Ex.: Freddie Mac Mortgage Services
  • Freddie Mac mass customizes mortgage products
      • Combines hundreds of different loan characteristics
  • Challenge: How does the customer find the loan that’s right for them?

Ex.: Mass Customization at eBay
  • eBay offers any possible product & service in “garage-type” sales
      • However, it does not assist the customer much in finding the right product/service.

Ex.: Books on Amazon
  • Amazon.com offers books for every taste
      • But: How can we find the book that’s right for us?

Managing Mass Customization at Amazon
  • How does Amazon assure that customers find what they are looking for?
      • Answer: by making (automated) recommendations

Managing Mass Customization
  • From Expert Salesperson to Expert System:
      • How can we assure that our customers get what they are looking for?
  • Pre-Internet customization:
      • Expert Salesperson
          • Experienced with product, process
          • Consultative selling
          • Salesperson provides expertise, identifies needs, defines configuration
  • Early/current-Internet customization:
      • Expert Customer
          • Experienced with product
          • Revelation, transaction buying
          • Customer provides expertise, knows needs, defines configuration
  • Future Internet customization:
      • Non-Expert Customer
          • Inexperienced with product, process
          • Self-consultative buying
          • System provides expertise, identifies needs, defines configuration

Providing the non-expert customer with decision support
  • Moving from Expert to Non-Expert Buyers: Computerization
  • Assisted service
      • Telephone, email, instant messaging
      • Drawback: requires human interaction, only limited scalability
  • Self service
      • Search, user ratings, forums, blogs, expert recommendations
      • Drawback: does not help the customer that is unsure about their needs
  • Automated service
      • Expert systems for the non-expert
      • Replaces the salesperson
      • Translates customer characteristics and usage requirements into recommended product configurations
      • Consists of rule-based systems and data mining algorithms
      • Advantage: fully automatic, scalable, updatable
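
The rule-based half of such an automated service can be very small. The sketch below shows how a few hand-written rules might translate customer characteristics into a recommended product; the customer attributes, thresholds and card names are all hypothetical, and in a real system data mining would typically be used to learn or tune such rules.

```python
# Each rule pairs a condition on the customer profile with a product.
# Rules are checked in order; the first match wins.
RULES = [
    (lambda c: c["travels_often"] and c["annual_spend"] >= 20_000, "premium travel card"),
    (lambda c: c["travels_often"], "basic travel card"),
    (lambda c: c["carries_balance"], "low-interest card"),
]
DEFAULT_PRODUCT = "no-fee cash-back card"


def recommend(customer):
    """Translate customer characteristics into a recommended product."""
    for condition, product in RULES:
        if condition(customer):
            return product
    return DEFAULT_PRODUCT


customer = {"travels_often": True, "annual_spend": 8_000, "carries_balance": True}
print(recommend(customer))  # prints "basic travel card"
```

Because the rules live in plain data, they can be updated without touching the rest of the system, which matches the “scalable, updatable” advantage listed above.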

Ex.: Automated Service at AmEx
  • Offers an online tool that, based on desired features, recommends the best card
      • Compensates only for lack of product knowledge, but assumes the customer knows why they need the product.

Ex.: Blockbuster’s Recommendation System
  • Blockbuster recommends similar movies based on movie features and user behavior
      • “If you liked Indiana Jones, then you will also like Tomb Raider”

Key Component for Automated Service Systems: Data Mining
  • Collect and mine customer information in order to, e.g.,
      • Segment the market
          • Understand customers’ different needs, expertise, profitability
          • E.g. Dell distinguishes between the segments “Home”, “Small Business”, “Medium/Large Business”, “Public Sector”
      • Analyze behaviors and events
          • Understand when the customer has needs and the events that lead to them
          • E.g. path tracking, click stream analysis
      • Optimize pricing
          • Bundling, price discrimination
          • E.g. Amazon’s price testing; Zilliant’s data-driven pricing software
  • Key requirement: understand customer data

Dangers of Data Mining

Dangers of Data Mining
  • The danger of using data mining software/technology as a “black box”
      • Data does not mine itself!
      • We still need the domain knowledge and expertise of the user; otherwise outcomes may be meaningless
  • Data quality
      • Junk in, junk out

What Data Mining Isn’t

Data Mining Isn’t…
  • …smarter than you
      • Example from DeVeaux:
          • A new backpack inkjet printer is showing higher than expected warranty claims
          • A neural network analysis shows that ZIP code is the most important predictor

Data Mining Isn’t…
  • …always about algorithms
      • Sometimes collecting and plotting the right data is enough
          • E.g. Blogpulse

More Data Mining Resources
  • Repository:
      • http://www.kdnuggets.com/
      • http://www.the-data-mine.com/
  • Tutorials
      • http://www.autonlab.org/tutorials/
  • Software
      • SAS Enterprise Miner, SPSS Clementine, Orange, Weka, Rattle, R, …