269a87365616f1ba2dfe7f8f45a2f0d5.ppt
- Number of slides: 34
Discriminant Analysis
Database Marketing
Instructor: Nanda Kumar
Multiple Regression
• Y = b0 + b1*X1 + b2*X2 + … + bn*Xn
• Same as Simple Regression in principle
• New Issues:
  – Each Xi must represent something unique
  – Variable selection
Multiple Regression
• Example 1:
  – Spending = a + b*income + c*age
• Example 2:
  – weight = a + b*height + c*sex + d*age
Real Estate Example
• How is price related to the characteristics of the house?
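SAS Code
/* Regress price on the house characteristics; with no DATA= option, PROC REG uses the most recently created data set */
proc reg;
  model price = section lotsize bed bath age other;
run;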
Interpreting the Regression Output
• Parameter Estimates (Slope Coefficients) capture the marginal impact of each explanatory variable on price, holding the other variables fixed
• Example: the coefficient of the variable bed represents the change in price from increasing the number of bedrooms by one
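The marginal-impact reading can be checked with a small Python sketch. The coefficient values and the example house below are invented for illustration; they are not output from the regression on the slides. Only the variable names come from the SAS model statement.

```python
# Hypothetical coefficients for: price = section lotsize bed bath age other
# (made-up numbers, NOT estimates from the course data)
COEF = {
    "intercept": 40000.0,
    "section": 1500.0,
    "lotsize": 3.0,
    "bed": 8000.0,
    "bath": 6000.0,
    "age": -400.0,
    "other": 2500.0,
}


def predict_price(house):
    """Predicted price = intercept + sum of coefficient * variable value."""
    return COEF["intercept"] + sum(COEF[name] * value for name, value in house.items())


# A hypothetical house, and the same house with one extra bedroom
house = {"section": 2, "lotsize": 5000, "bed": 3, "bath": 2, "age": 20, "other": 1}
one_more_bedroom = dict(house, bed=house["bed"] + 1)

# Everything else is held fixed, so the difference is the coefficient of bed
marginal_effect = predict_price(one_more_bedroom) - predict_price(house)
print(marginal_effect, COEF["bed"])
```

Because only bed changes between the two houses, the difference in predicted price equals the bed coefficient regardless of the other values.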
Significance of the Coefficients
• Are they significantly different from zero?
  – Look at the t values and p values
    · An absolute t value above about 2, or p < 0.05, is good
    · Sometimes p < 0.10 is considered reasonably significant
• Overall Goodness of Fit
  – Look at R² (also refer to the note in Session 1)
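As a rough illustration of how a t value and p value relate, the sketch below divides an estimate by its standard error and converts the result to a two-sided p value. The estimate and standard error are hypothetical, and the p value uses a normal approximation, which is an assumption that is reasonable only for large samples; regression software uses the t distribution with the residual degrees of freedom.

```python
from statistics import NormalDist


def t_and_p(estimate, std_error):
    """Return (t value, two-sided p value under a normal approximation)."""
    t = estimate / std_error
    p = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return t, p


# Hypothetical coefficient and standard error (not from the course output)
t_value, p_value = t_and_p(estimate=8000.0, std_error=3200.0)
significant_5pct = p_value < 0.05
print(f"t = {t_value:.2f}, p = {p_value:.4f}, significant at 5%: {significant_5pct}")
```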
Where Are We Now?
[Diagram; labels: Segment 1, Secondary Behavior Data, Segment 2, Factor Analysis, Cluster Analysis, Targeting, Discriminant/Logit Analysis, Distinguishing Characteristics]
Web Browsing
• Identified two groups of consumers
  – One that visits your website frequently
  – One that doesn’t
• Can the differences in behavior be related to socio-demographic variables?
• Can we use these discriminators to classify prospects into one of these two groups?
Catalog Business
• Identified two consumer segments
  – One which buys a lot
  – Other which does not buy as much
• Can we find variables that help discriminate the behavior of these two groups?
• Can we use these discriminators to classify other consumers into one of these two groups?
Promotional Campaigns
• Identify groups based on their response to promotional campaigns
  – One group purchases a lot on promotion
  – Other does not
• Identify characteristics that distinguish these two groups
• Can we use these discriminators to identify price sensitive prospects from the not so price sensitive ones?
Segmentation Analysis
• General Problem
  – Identified segments in the population based on behavior
  – Want to find targetable characteristics that discriminate these groups
  – Classify prospects into different groups
Data
Good Stocks
Bad Stocks
All Stocks
Identifying the Best Discriminators
• Two groups appear to be well separated on each ratio: ROI and GE/A
• Also well separated in two-dimensional space
• But this need not always be the case!
Discriminating Variables
[Figure with axes X1 and X2]
Discriminant Analysis
• Identify a set of variables that best discriminate between the two groups
• It does so by choosing a new line (axis) that maximizes the similarity between members of the same group and minimizes the similarity between members of different groups
Discriminant Function
• Z = w1*GEA + w2*ROI
• Between-Group Sum of Squares: SSb
• Within-Group Sum of Squares: SSw
• Criterion: choose w1 and w2 to maximize the ratio SSb/SSw
More on the Criterion
• For Z to provide maximum separation between the groups, the following must be satisfied:
  – The means of Z for the two groups should be as far apart as possible (high SSb)
  – Values of Z within each group should be as homogeneous as possible (low SSw)
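The two conditions above can be made concrete with a short Python sketch that computes SSb, SSw and their ratio for a candidate pair of weights. The (GEA, ROI) observations and both weight pairs are invented for illustration; they are not the stock data from the slides.

```python
def ss_between_within(scores_by_group):
    """Return (SSb, SSw) for a list of score lists, one list per group."""
    all_scores = [z for group in scores_by_group for z in group]
    grand_mean = sum(all_scores) / len(all_scores)
    ssb = 0.0
    ssw = 0.0
    for group in scores_by_group:
        group_mean = sum(group) / len(group)
        # Between: how far each group mean sits from the grand mean
        ssb += len(group) * (group_mean - grand_mean) ** 2
        # Within: spread of the scores around their own group mean
        ssw += sum((z - group_mean) ** 2 for z in group)
    return ssb, ssw


def criterion(w1, w2, groups):
    """SSb/SSw of Z = w1*GEA + w2*ROI, for groups of (GEA, ROI) pairs."""
    scores = [[w1 * gea + w2 * roi for gea, roi in group] for group in groups]
    ssb, ssw = ss_between_within(scores)
    return ssb / ssw


# Hypothetical (GEA, ROI) observations, not the data from the slides
GOOD = [(0.20, 0.15), (0.18, 0.12), (0.22, 0.18), (0.16, 0.14)]
BAD = [(0.02, 0.01), (0.05, -0.02), (0.00, 0.03), (0.04, 0.00)]

ratio_sum = criterion(1.0, 1.0, [GOOD, BAD])    # Z = GEA + ROI
ratio_diff = criterion(1.0, -1.0, [GOOD, BAD])  # Z = GEA - ROI
print(ratio_sum, ratio_diff)
```

On this made-up data the weights (1, 1) give a much larger ratio than (1, -1), so the first line separates the groups better. Discriminant analysis searches over all weights for the pair with the largest ratio.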
Classification
• Discriminant Function: the line that separates the members of the two groups
• Methods of Classification
  – Cut-Off Value Method
  – Decision Theory Approach
  – Classification Function Approach
  – Mahalanobis Distance Method
Cut-Off Value Method
• Uses the Discriminant Function line to score new observations (prospects) and classify them into one of two groups based on a cut-off value
Classification
[Figure: a cut-off value on the Z axis dividing regions R1 and R2]
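A minimal Python sketch of the cut-off value method follows. Two things in it are assumptions rather than material from the slides: the weights (1, 1) and the data are hypothetical, and the cut-off is taken as the midpoint between the two group mean scores, a common choice when the groups are of equal size and misclassification costs are equal.

```python
def score(gea, roi, w1=1.0, w2=1.0):
    """Discriminant score Z = w1*GEA + w2*ROI (hypothetical weights)."""
    return w1 * gea + w2 * roi


def mean(values):
    return sum(values) / len(values)


# Hypothetical (GEA, ROI) observations with known group membership
GOOD = [(0.20, 0.15), (0.18, 0.12), (0.22, 0.18), (0.16, 0.14)]
BAD = [(0.02, 0.01), (0.05, -0.02), (0.00, 0.03), (0.04, 0.00)]

mean_good = mean([score(gea, roi) for gea, roi in GOOD])
mean_bad = mean([score(gea, roi) for gea, roi in BAD])

# Assumed rule: midpoint of the two group mean scores
cutoff = (mean_good + mean_bad) / 2.0


def classify(gea, roi):
    """Assign a new observation by comparing its score with the cut-off.

    Relies on the good group having the higher mean score, which holds here.
    """
    return "good" if score(gea, roi) > cutoff else "bad"


print(cutoff, classify(0.15, 0.10), classify(0.03, 0.02))
```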
Classification Function Approach
• Classifications based on this approach are identical to those done by the Decision Theory approach
• Classification functions are computed for each group:
  C1 = -7.87 + 61.237*GEA + 21.027*ROI
  C2 = -0.004 + 2.551*GEA - 1.404*ROI
Basic Idea
• Score each new observation using these two scoring functions
• The observation gets assigned to the group with the higher score
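The scoring rule can be sketched in Python using the two classification functions given on the previous slide. The two prospects scored at the end are hypothetical values chosen for illustration.

```python
def c1(gea, roi):
    """Classification function for group 1 (coefficients from the slide)."""
    return -7.87 + 61.237 * gea + 21.027 * roi


def c2(gea, roi):
    """Classification function for group 2 (coefficients from the slide)."""
    return -0.004 + 2.551 * gea - 1.404 * roi


def assign_group(gea, roi):
    """Assign the observation to the group with the higher score."""
    return 1 if c1(gea, roi) > c2(gea, roi) else 2


# Two hypothetical prospects (values invented for illustration)
strong = assign_group(gea=0.20, roi=0.10)
weak = assign_group(gea=0.05, roi=0.02)
print(strong, weak)
```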
What To Look For In The Results?
• Significance of the Discriminating Variables
  – Idea is to test whether the means of the discriminating variables are statistically different across the two groups
  – Statistic: Wilks’ Lambda must be small (look at the p value/significance level)
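To show why a small Wilks' Lambda indicates separated groups, the sketch below computes it for two variables as the determinant of the within-group sums of squares and cross-products matrix divided by the determinant of the total one. The data are hypothetical, and the conversion of Lambda to an F statistic and p value, which statistical software reports, is not shown.

```python
def centroid(rows):
    """Mean of each of the two variables over a list of (x, y) rows."""
    n = len(rows)
    return (sum(x for x, _ in rows) / n, sum(y for _, y in rows) / n)


def sscp(rows, centers):
    """Sums of squares and cross-products of rows about the given centers.

    Returns (sxx, sxy, syy), the distinct entries of a symmetric 2x2 matrix.
    """
    sxx = sxy = syy = 0.0
    for (x, y), (cx, cy) in zip(rows, centers):
        dx, dy = x - cx, y - cy
        sxx += dx * dx
        sxy += dx * dy
        syy += dy * dy
    return sxx, sxy, syy


def wilks_lambda(groups):
    """Wilks' Lambda = det(W) / det(T) for two variables."""
    all_rows = [row for group in groups for row in group]
    grand = centroid(all_rows)
    # Total matrix T: deviations from the grand centroid
    txx, txy, tyy = sscp(all_rows, [grand] * len(all_rows))
    # Within matrix W: deviations from each row's own group centroid
    own_centers = [centroid(group) for group in groups for _ in group]
    wxx, wxy, wyy = sscp(all_rows, own_centers)
    return (wxx * wyy - wxy ** 2) / (txx * tyy - txy ** 2)


# Hypothetical (GEA, ROI) observations, not the data from the slides
GOOD = [(0.20, 0.15), (0.18, 0.12), (0.22, 0.18), (0.16, 0.14)]
BAD = [(0.02, 0.01), (0.05, -0.02), (0.00, 0.03), (0.04, 0.00)]

lam = wilks_lambda([GOOD, BAD])
print(lam)
```

Lambda lies between 0 and 1. Values near 0 mean that most of the total variation is between the groups rather than within them.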
Estimate of The Discriminant Function
• Canonical Discriminant Function
  Z = -2.0018 + 15.0919*GEA + 5.769*ROI
• It is possible that the group means are statistically different even though, for all practical purposes, the differences between the groups may not be large
• Look at the squared Canonical Correlation: ratio of between-group SS to total SS (high is good)
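The ratio of between-group SS to total SS can be sketched in Python by scoring observations with the canonical discriminant function above. The function's coefficients are taken from the slide, but the observations are hypothetical, so the resulting value illustrates only the calculation and is not the course result.

```python
def canonical_score(gea, roi):
    """Canonical discriminant function with the coefficients from the slide."""
    return -2.0018 + 15.0919 * gea + 5.769 * roi


def squared_canonical_correlation(scores_by_group):
    """Between-group SS divided by total SS of the discriminant scores."""
    all_scores = [z for group in scores_by_group for z in group]
    grand_mean = sum(all_scores) / len(all_scores)
    ss_total = sum((z - grand_mean) ** 2 for z in all_scores)
    ss_between = 0.0
    for group in scores_by_group:
        group_mean = sum(group) / len(group)
        ss_between += len(group) * (group_mean - grand_mean) ** 2
    return ss_between / ss_total


# Hypothetical (GEA, ROI) observations, not the data from the slides
GOOD = [(0.20, 0.15), (0.18, 0.12), (0.22, 0.18), (0.16, 0.14)]
BAD = [(0.02, 0.01), (0.05, -0.02), (0.00, 0.03), (0.04, 0.00)]

scores = [[canonical_score(gea, roi) for gea, roi in group] for group in (GOOD, BAD)]
r_squared = squared_canonical_correlation(scores)
print(r_squared)
```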
Importance of the Discriminant Variables and the Discriminant Function
• How important is a variable to the Discriminant Function?
• Look at the structure loadings: Pooled Within Canonical Structure
  – The variable with the higher loading is relatively more important
  – Caution: if the variables are highly correlated, their relative importance can change from sample to sample
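A structure loading can be read as the pooled within-group correlation between a variable and the discriminant score. The Python sketch below computes it that way, using the canonical function coefficients from the earlier slide and hypothetical observations, so the loadings it prints are illustrative only.

```python
from math import sqrt


def canonical_score(gea, roi):
    """Canonical discriminant function with the coefficients from the slide."""
    return -2.0018 + 15.0919 * gea + 5.769 * roi


def within_deviations(groups, value):
    """Deviation of value(row) from its own group mean, for every row."""
    deviations = []
    for group in groups:
        values = [value(row) for row in group]
        group_mean = sum(values) / len(values)
        deviations.extend(v - group_mean for v in values)
    return deviations


def pooled_within_correlation(groups, value_a, value_b):
    """Correlation of two quantities after removing the group means."""
    dev_a = within_deviations(groups, value_a)
    dev_b = within_deviations(groups, value_b)
    cross = sum(a * b for a, b in zip(dev_a, dev_b))
    return cross / sqrt(sum(a * a for a in dev_a) * sum(b * b for b in dev_b))


# Hypothetical (GEA, ROI) observations, not the data from the slides
GOOD = [(0.20, 0.15), (0.18, 0.12), (0.22, 0.18), (0.16, 0.14)]
BAD = [(0.02, 0.01), (0.05, -0.02), (0.00, 0.03), (0.04, 0.00)]
GROUPS = [GOOD, BAD]


def z_of(row):
    return canonical_score(row[0], row[1])


loading_gea = pooled_within_correlation(GROUPS, lambda row: row[0], z_of)
loading_roi = pooled_within_correlation(GROUPS, lambda row: row[1], z_of)
print(loading_gea, loading_roi)
```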
Classification Summary
• Look at Cross-Validation results
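One way to produce cross-validation results is leave-one-out: each observation is classified by a rule fitted to all the other observations. The Python sketch below assumes that scheme, a two-variable linear discriminant rule with a midpoint cut-off, and hypothetical data; the slides do not say which cross-validation scheme was used.

```python
def centroid(rows):
    n = len(rows)
    return (sum(x for x, _ in rows) / n, sum(y for _, y in rows) / n)


def fit_rule(group1, group2):
    """Fit a linear discriminant rule; returns (w1, w2, cutoff)."""
    m1, m2 = centroid(group1), centroid(group2)
    # Pooled within-group sums of squares and cross-products
    sxx = sxy = syy = 0.0
    for rows, (mx, my) in ((group1, m1), (group2, m2)):
        for x, y in rows:
            dx, dy = x - mx, y - my
            sxx += dx * dx
            sxy += dx * dy
            syy += dy * dy
    det = sxx * syy - sxy * sxy
    d1, d2 = m1[0] - m2[0], m1[1] - m2[1]
    # Weights = inverse(W) times the difference in centroids; using W instead
    # of the pooled covariance only rescales the weights and the cut-off
    w1 = (syy * d1 - sxy * d2) / det
    w2 = (sxx * d2 - sxy * d1) / det
    cutoff = 0.5 * (w1 * (m1[0] + m2[0]) + w2 * (m1[1] + m2[1]))
    return w1, w2, cutoff


def predict(rule, row):
    w1, w2, cutoff = rule
    return 1 if w1 * row[0] + w2 * row[1] > cutoff else 2


def leave_one_out(group1, group2):
    """Return counts keyed by (actual group, predicted group)."""
    counts = {(1, 1): 0, (1, 2): 0, (2, 1): 0, (2, 2): 0}
    for actual, rows in ((1, group1), (2, group2)):
        for i, row in enumerate(rows):
            rest = rows[:i] + rows[i + 1:]  # drop the held-out observation
            rule = fit_rule(rest, group2) if actual == 1 else fit_rule(group1, rest)
            counts[(actual, predict(rule, row))] += 1
    return counts


# Hypothetical (GEA, ROI) observations, not the data from the slides
GOOD = [(0.20, 0.15), (0.18, 0.12), (0.22, 0.18), (0.16, 0.14)]
BAD = [(0.02, 0.01), (0.05, -0.02), (0.00, 0.03), (0.04, 0.00)]

counts = leave_one_out(GOOD, BAD)
hit_rate = (counts[(1, 1)] + counts[(2, 2)]) / sum(counts.values())
print(counts, hit_rate)
```

The off-diagonal counts, (1, 2) and (2, 1), are the misclassifications. Because each observation is scored by a rule that never saw it, the hit rate is a less optimistic estimate than one computed on the fitting data.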
Web Browsing
• Can use the Discriminant function to classify prospects into one of these two groups
• Target appropriately
Catalog Business
• Classify other consumers into one of these two groups
• Do stuff!
Promotional Campaigns
• Classify prospects into price sensitive and not so price sensitive segments
• Target appropriately
Summary
• Discriminant Analysis
• Extremely useful Segmentation Analysis tool
• Intermediate step in the overall picture
  – Helps classify prospects and devise the appropriate targeting strategies