
• Number of slides: 29

Regression & factor analyses

Where regression can go wrong
• An example:
  - A financial company wishes to ascertain what the drivers of satisfaction are for their service. The candidate drivers are:
      EXPERT="experts"
      Q30A2="Take the time to understand who you are"
      Q30A3="Communicate clearly, in plain language"
      Q30A6="Go out of their way to tailor the best deal"
      Q30A7="Have the knowledge and authority to make"
      Q30A8="Have a positive, can-do approach"
      Q30A11="Understand your business and the market"
      Q30A12="Are proactive with ideas on how to get t"
      Q30A13="Are prompt and reliable in handling any"
      Q30A14="Treat you with respect and listen"
      Q30A15="Keep in regular contact to keep you updated"
      Q32A1="The competitiveness of their fees and rates"
      Q32A2="Offering a flexible range of lending/rep"
      Q32A3="How easy it is to take out a commercial"
      Q32A4="The features and benefits of their comments"
      Q32A5="Providing a full range of commercial product"
      Q32A6="Being fair and reasonable in their lending"
      Q24="Q3a. AMP BANKING OVERALL RATING"   (NB: this is the response)
  - These were all on a 10 point scale

Example
• Let's clean this data. SAS CODE:

    libname hold '';

    data temp;
      set hold.model;
      array new {*} Q24 EXPERT Q30A2 Q30A3 Q30A6 Q30A7 Q30A8 Q30A11 Q30A12
                    Q30A13 Q30A14 Q30A15 Q32A1 Q32A2 Q32A3 Q32A4 Q32A5 Q32A6;
      /* loop over every element of the array; dim(new) avoids a hard-coded count */
      do i = 1 to dim(new);
        if new[i] = 11 then new[i] = .;
      end;
      drop i;
    run;

    proc standard data=temp replace out=temp;
      var Q24 Q33 Q34 EXPERT Q30A2 Q30A3 Q30A6 Q30A7 Q30A8 Q30A11 Q30A12
          Q30A13 Q30A14 Q30A15 Q32A1 Q32A2 Q32A3 Q32A4 Q32A5 Q32A6;
    run;

    data hold.model;
      set temp;
    run;

• The above code changes 11's to . (missing in SAS) and then replaces the missings with the mean value of each variable
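As a cross-check on what the two cleaning steps do, here is a minimal sketch in Python (used purely for illustration; the deck itself works in SAS). The function name, the constant and the sample ratings are hypothetical.

```python
from statistics import mean

DONT_KNOW = 11  # code used for "don't know" on the 1-10 scale


def recode_and_impute(values):
    """Turn DONT_KNOW codes into None, then fill them with the mean of the rest."""
    recoded = [None if v == DONT_KNOW else v for v in values]
    observed = [v for v in recoded if v is not None]
    if not observed:
        return recoded  # nothing to average, so leave the column as it is
    fill = mean(observed)
    return [fill if v is None else v for v in recoded]


ratings = [8, 11, 6, 10, 11, 7]  # hypothetical answers to one question
cleaned = recode_and_impute(ratings)
print(cleaned)
```

Mean imputation of this kind is why the frequency tables in this deck contain one non-integer value with a large count (for example 7.462890625 for EXPERT).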

Let's look at the data:

STAFF - Experts in Commercial Finance Ma

                                           Cumulative  Cumulative
         EXPERT   Frequency    Percent     Frequency     Percent
    -------------------------------------------------------------
              1           5       1.67            5        1.67
              2           4       1.33            9        3.00
              3           5       1.67           14        4.67
              4           3       1.00           17        5.67
              5          14       4.67           31       10.33
              6          16       5.33           47       15.67
              7          22       7.33           69       23.00
    7.462890625         121      40.33          190       63.33
              8          50      16.67          240       80.00
              9          24       8.00          264       88.00
             10          36      12.00          300      100.00

Take the time to understand who you are

                                           Cumulative  Cumulative
          Q30A2   Frequency    Percent     Frequency     Percent
    -------------------------------------------------------------
              1          10       3.33           10        3.33
              2           4       1.33           14        4.67
              3          11       3.67           25        8.33
              4          10       3.33           35       11.67
              5          19       6.33           54       18.00
              6          19       6.33           73       24.33
              7          25       8.33           98       32.67
   7.4111328125          52      17.33          150       50.00
              8          48      16.00          198       66.00
              9          41      13.67          239       79.67
             10          61      20.33          300      100.00

Communicate clearly, in plain language

                                           Cumulative  Cumulative
          Q30A3   Frequency    Percent     Frequency     Percent
    -------------------------------------------------------------
              1           3       1.00            3        1.00
              2           5       1.67            8        2.67
              3           3       1.00           11        3.67
              4           6       2.00           17        5.67
              5          11       3.67           28        9.33
              6          12       4.00           40       13.33
              7          34      11.33           74       24.67
     7.98046875          33      11.00          107       35.67
              8          81      27.00          188       62.67
              9          48      16.00          236       78.67
             10          64      21.33          300      100.00

Some more code

    proc reg data=hold.model;
      model Q24 = EXPERT Q30A2 Q30A3 Q30A6 Q30A7 Q30A8 Q30A11 Q30A12 Q30A13
                  Q30A14 Q30A15 Q32A1 Q32A2 Q32A3 Q32A4 Q32A5 Q32A6;
    run;

    proc corr data=hold.model;
      var Q24 EXPERT Q30A2 Q30A3 Q30A6 Q30A7 Q30A8 Q30A11 Q30A12 Q30A13
          Q30A14 Q30A15 Q32A1 Q32A2 Q32A3 Q32A4 Q32A5 Q32A6;
    run;

Regression output

Parameter Estimates

                                                              Parameter   Standard
    Variable   Label                                     DF    Estimate      Error   t Value   Pr > |t|
    Intercept  Intercept                                  1     1.99970    0.53770      3.72     0.0002
    EXPERT     STAFF - Experts in Commercial Finance      1     0.05590    0.06486      0.86     0.3895
               Matters
    Q30A2      Take the time to understand who you are    1     0.01870    0.07645      0.24     0.8069
    Q30A3      Communicate clearly, in plain language     1     0.02263    0.07383      0.31     0.7595
    Q30A6      Go out of their way to tailor the best     1     0.01097    0.06114      0.18     0.8578
    Q30A7      Have the knowledge and authority to make   1     0.11831    0.06004      1.97     0.0498
    Q30A8      Have a positive, can-do approach to doing  1     0.13498    0.08037      1.68     0.0942
    Q30A11     Understand your business and the market    1    -0.06802    0.07025     -0.97     0.3338
    Q30A12     Are proactive with ideas on how to get     1     0.02511    0.05764      0.44     0.6634
    Q30A13     Are prompt and reliable in handling any    1     0.37204    0.06702      5.55     <.0001
    Q30A14     Treat you with respect and listen          1    -0.17003    0.08039     -2.12     0.0353
    Q30A15     Keep in regular contact to keep you        1     0.07978    0.04594      1.74     0.0835
               updated
    Q32A1      The competitiveness of their fees and      1     0.00392    0.06439      0.06     0.9514
               rates
    Q32A2      Offering a flexible range of lending/rep   1    -0.05496    0.07295     -0.75     0.4519
    Q32A3      How easy it is to take out a commercial    1     0.07025    0.06019      1.17     0.2442
    Q32A4      The features and benefits of their         1    -0.08790    0.08377     -1.05     0.2949
               comments
    Q32A5      Providing a full range of commercial prod  1     0.07440    0.05614      1.33     0.1861
    Q32A6      Being fair and reasonable in their         1     0.15004    0.06826      2.20     0.0288
               lending

Issues
• Note that many of these coefficients are not significant
• Even worse, some are negative, when we would expect them, in the worst case, to be at least >= 0
  - E.g.:
      Variable   Label                               DF   Estimate   Std Error   t Value   Pr > |t|
      Q30A14     Treat you with respect and listen    1   -0.17003     0.08039     -2.12     0.0353
  - i.e. this seems to imply that not listening and treating people disrespectfully would increase overall satisfaction !#&%$#%*&
• So what is going on?
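The t Value column is simply the estimate divided by its standard error, so the Q30A14 row can be checked by hand. The sketch below is in Python purely for illustration (the deck uses SAS). The function name is hypothetical, and the p-value uses a normal approximation, which is an assumption: SAS uses the t distribution, so the result will only roughly match the 0.0353 reported.

```python
import math


def t_and_p(estimate, std_error):
    """Return the t statistic and a two-sided p-value from a normal approximation."""
    t = estimate / std_error
    # Two-sided tail area under the standard normal curve (approximation only).
    p = math.erfc(abs(t) / math.sqrt(2.0))
    return t, p


# Estimate and standard error for Q30A14, taken from the regression output.
t_q30a14, p_q30a14 = t_and_p(-0.17003, 0.08039)
print(f"t = {t_q30a14:.2f}, approximate p = {p_q30a14:.3f}")
```

A significant negative coefficient on an item like this is a warning sign about the model rather than a finding about customers.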

Some Correlation output

Pearson correlation coefficients (Prob > |r| for the entries shown: <.0001)

                                                           Q24   EXPERT    Q30A2    Q30A3    Q30A6    Q30A7    Q30A8   Q30A11   Q30A12
    Q30A7   Have the knowledge and authority to make   0.61756  0.58737  0.61441  0.59967  0.64270  1.00000  0.71403  0.59881  0.60714
    Q30A8   Have a positive, can-do approach to doin   0.60261  0.58008  0.76265  0.68892  0.70250  0.71403  1.00000  0.76378  0.70638
    Q30A11  Understand your business and the market    0.52959  0.62118  0.81022  0.66246  0.64729  0.59881  0.76378  1.00000  0.71796
    Q30A12  Are proactive with ideas on how to get t   0.53925  0.55677  0.73597  0.59714  0.66199  0.60714  0.70638  0.71796  1.00000
    Q30A13  Are prompt and reliable in handling any    0.64158  0.47558  0.63395  0.64574  0.54768  0.68501  0.68526  0.64092  0.59023
    Q30A14  Treat you with respect and listen to wha   0.47386  0.51258  0.65066  0.69404  0.55816  0.57507  0.66858  0.60788  0.51475
    Q30A15  Keep in regular contact to keep you upda   0.50963  0.51407  0.67322  0.59555  0.55953  0.51578  0.54464  0.60346  0.64993
    Q32A1   The competitiveness of their fees and ra   0.31972  0.37541  0.40878  0.40499  0.46758  0.40688  0.32594  0.38509  0.37980

• It appears that the explanatory variables are very highly correlated with each other

Where do we go from here?
• Clearly we have data that is multicollinear (i.e. the variables are linearly related, and hence one variable may explain others)
• In this case, some relationships may be hidden because another variable has 'hogged' the relationship in terms of explanation
• So how do we go about seeing if we can reduce the number of variables we look at without losing the finer detail?
• The answer is ...
(PS: let's leave this example for a while and return to it later)
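One common way to put a number on multicollinearity is the variance inflation factor (VIF). The deck does not use it, so treat this as an illustrative aside, written in Python rather than the deck's SAS. For a model with only two predictors whose correlation is r, the VIF of either one is 1 / (1 - r^2). The sketch applies that to the 0.81022 correlation between Q30A2 and Q30A11 from the correlation output. With all 17 predictors in the model the true VIF can only be equal or larger. The function name is hypothetical.

```python
def vif_two_predictors(r):
    """VIF of either predictor in a two-predictor model with correlation r."""
    if abs(r) >= 1.0:
        raise ValueError("perfectly correlated predictors: VIF is undefined")
    return 1.0 / (1.0 - r * r)


r_q30a2_q30a11 = 0.81022  # from the correlation output
vif = vif_two_predictors(r_q30a2_q30a11)
print(f"VIF = {vif:.2f}")
```

A VIF near 3 means the variance of that coefficient is roughly three times what it would be if the two predictors were uncorrelated, which is one reason individual coefficients look insignificant or flip sign.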

Factor analyses

Factor Analysis
• Background
  - A Factor Analysis takes answers from many (maybe different) types of questions and summarises them with a smaller number of factors. It works by pulling out "common dimensions" from the input variables and grouping them together (e.g. if Income and Education were input into a Factor Analysis they would probably come out on one factor resembling Socio-Economic Status).
• The reasons for doing this are:
  - to gain greater control over final solutions
  - to equate the scale of variables that have been measured on different scales
  - that the output factors are independent of, or orthogonal to, each other

Principal components vs Factor analysis
• With principal components we compute $y = \Gamma'(x - \mu)$, where $\Gamma$ is orthogonal, $\Gamma' \Sigma \Gamma = \Lambda$, and $\Lambda$ is the diagonal matrix of eigenvalues of $\Sigma$, the covariance matrix of $x$
• With factor analysis we compute $x = \mu + \Lambda f + \varepsilon$, where $\Lambda$ is now a matrix of factor loadings (not the eigenvalue matrix above). Here $\Sigma = \Lambda \Lambda' + \Psi$, with $\Psi$ the diagonal covariance matrix of $\varepsilon$
• PC reduces dimensionality by taking linear combinations of the x's
• FA attempts to understand the correlations between observable variables in terms of underlying factors, which are themselves not directly observable (latent)
• Essentially, what you obtain from the code is PC with 'fudge factors', so that we can investigate underlying or latent patterns (i.e. factors)
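To make the principal components step concrete, consider just two standardised variables with correlation r. Their correlation matrix has eigenvalues 1 + r and 1 - r, so the first component carries a share (1 + r) / 2 of the total variance. The sketch below checks this in Python (for illustration only; the deck uses SAS). The function name and the value r = 0.8 are hypothetical.

```python
import math


def eig_sym_2x2(a, b, d):
    """Eigenvalues (largest first) of the symmetric matrix [[a, b], [b, d]]."""
    mid = (a + d) / 2.0
    half_gap = math.hypot((a - d) / 2.0, b)
    return mid + half_gap, mid - half_gap


r = 0.8  # hypothetical correlation between two standardised variables
lam1, lam2 = eig_sym_2x2(1.0, r, 1.0)
share_first = lam1 / (lam1 + lam2)
print(lam1, lam2, share_first)
```

The more strongly the variables are correlated, the more of their joint variance a single component captures, which is why highly correlated survey items are good candidates for this kind of reduction.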

Problems
• Missing Values
  - To perform a Factor Analysis the variables must contain no missing values. If there are missing values, the entire observation will be omitted from the analysis. To overcome this, any missing values need to be filled in with the mean, median or mode, depending on the type of data.
• Variable Correlation and Factor Interpretation
  - Since Factor Analysis works by grouping variables which are correlated, the correlations between the variables should be checked before performing the analysis. From the qualitative research certain variables are expected to be correlated. This needs to be true if we are to reproduce the qualitative model. If this is not the case, it can result in problems interpreting the factors from the analysis. We need factors that make sense to continue with Regression Analysis or Segmentation (much later).
• Number Of Factors
  - The number of factors used depends upon the individual and the job. The key point to note is that the factors need to be interpretable to be useful in analysis. Interpretability can make the final decision on how many factors you have.

Example
• The following pages are an example of a Factor Analysis from a project done for the Auckland Regional Council (ARC) regarding recycling in businesses. The questions used for the Factor Analysis example are on the next page.
  - What the ARC wanted was a segmentation so they could target recycling programs at businesses which would be receptive to them. They also wanted to find out which media channels would be most effective for reaching the target market.

• Q6 I am now going to read a series of statements which describe how an organisation might feel about buying recycled products. Please indicate how strongly you agree or disagree that each of the following statements applies to your company on a scale of 1 to 10, where 1 means you strongly disagree, and 10 means you strongly agree. ROTATE AND READ
• My company wouldn't use recycled products because they look cheap and nasty.
• Recycled products seem to be of much lower quality than non-recycled products.
• Using recycled products results in our equipment breaking down and needing more maintenance.
• They would need to be a lot cheaper before we would consider buying them.
• If there were no other problems with recycled products we would even pay a small premium to use them.
• All recycled products cost more than non-recycled products.
• It's not worth the time and effort finding and changing suppliers just to get recycled products.
• It would be too hard to make the system changes necessary to use recycled products.
• The range of recycled products available is not wide enough to warrant using them.
• It's just too difficult to get enough people to change their routines and to use more recycled products.
• We would use recycled products if someone in our company took the responsibility to push the initiative ahead.
• Using recycled products doesn't really fit with our image.
• If quality, price and availability were the same, we would choose to buy recycled products over non-recycled products whenever we could.
• Manufacturing recycled products is actually less energy efficient and more harmful to the environment.
• There are benefits to us if our customers see us as "Green".

Preliminaries
Prior to performing a Factor Analysis a couple of preliminaries need to be worked through. First of all, the data used for the Factor Analysis needs to be cleaned (i.e. missing values or don't knows replaced, influential points/outliers checked and null microtab values that result in zeros). Next, the correlations between the variables should be checked to see whether they are as the qualitative researcher (for segmentations) or client (for threshold analyses) expects.

Checking Data
First the variables in the Factor Analysis need to be checked for missing or invalid points. This can be done using a frequency table with code:

    proc freq data=hold.cards;
      table q33a1-q33a15;
    run;

This table will show all values for the listed questions and how many missing values there are. The output for one table is shown below.

    The SAS System                      10:40 Tuesday, February 25, 1997  11

                                       Cumulative  Cumulative
    Q33A13   Frequency    Percent      Frequency     Percent
    ----------------------------------------------------------
         1          12        4.9            12         4.9
         2           8        3.2            20         8.1
         3          16        6.5            36        14.6
         4           8        3.2            44        17.8
         5          26       10.5            70        28.3
         6          23        9.3            93        37.7
         7          31       12.6           124        50.2
         8          47       19.0           171        69.2
         9          13        5.3           184        74.5
        10          60       24.3           244        98.8
        11           3        1.2           247       100.0

Data cleaning issues

Replacing Don't Knows Or Refused's With Missings
Say the questions have a 1-10 scale for answers, with 11's as don't knows. To convert the don't knows to missings the following code can be used:

    data hold.cards;
      set hold.cards;
      /* setting up an array for the variables to be replaced */
      array new {*} q33a1-q33a15;
      /* running through that array */
      do i = 1 to dim(new);
        /* replacing 11's with missings for all variables in the array */
        if new[i] = 11 then new[i] = .;
      end;
      /* dropping unneeded variable i */
      drop i;
    run;

Replacing Missings With Means
Now the variables do not have any don't know answers, but they do have a heap of missing values. To replace all the missings with means the following code can be used:

    proc standard data=hold.cards replace out=hold.cards;
      var q33a1-q33a15;
    run;

Data cleaning issues…

Replacing Missings With Other Values
However, if you want to replace missings with other values, either of the following two sets of code can be used.

To replace missings in all variables with the same value:

    data hold.cards;
      set hold.cards;
      array new {*} q33a1-q33a15;
      do i = 1 to dim(new);
        if new[i] = . then new[i] = 8;
      end;
      drop i;
    run;

To replace missings with a different value for each variable:

    data hold.cards;
      set hold.cards;
      if q33a1 = . then q33a1 = 8;
      if q33a2 = . then q33a2 = 8.25;
      if q33a3 = . then q33a3 = 8.5;
      ...
    run;

Inspecting the data

Checking Variable Correlations
To check correlations between variables the following code can be used:

    proc corr data=hold.cards best=7;
      var q33a1-q33a15;
    run;

The output from this procedure is shown over the next 2 pages. The best= option shows the 7 most highly correlated variables with each variable in the procedure. If the correlations between variables are not as they should be you can either:
  1. leave the offending variable out of the Factor Analysis, or
  2. run separate Factor Analyses for different sets of variables (renaming the different sets of factors in between)

    The SAS System                      10:40 Tuesday, February 25, 1997  12

    Correlation Analysis

    15 'VAR' Variables:  Q33A1  Q33A2  Q33A3  Q33A4  Q33A5  Q33A6  Q33A7  Q33A8
                         Q33A9  Q33A10 Q33A11 Q33A12 Q33A13 Q33A14 Q33A15

    Simple Statistics

    Variable     N        Mean     Std Dev           Sum     Minimum     Maximum
    Q33A1      247    2.534694    2.031215    626.069388    1.000000   10.000000
    Q33A2      247    3.906780    2.452764    964.974576    1.000000   10.000000
    Q33A3      247    2.845000    2.042431    702.715000    1.000000   10.000000
    Q33A4      247    4.289362    2.528153   1059.472340    1.000000   10.000000
    Q33A5      247    4.608333    2.376360   1138.258333    1.000000   10.000000
    Q33A6      247    3.889952    2.238311    960.818182    1.000000   10.000000
    Q33A7      247    4.504098    2.584346   1112.512295    1.000000   10.000000
    Q33A8      247    3.144068    2.055235    776.584746    1.000000   10.000000
    Q33A9      247    4.276316    2.383541   1056.250000    1.000000   10.000000
    Q33A10     247    3.698347    2.384724    913.491736    1.000000   10.000000
    Q33A11     247    5.782427    2.800034   1428.259414    1.000000   10.000000

Inspecting the data

    The SAS System                      10:40 Tuesday, February 25, 1997  20

    Correlation Analysis

    Pearson Correlation Coefficients / Prob > |R| under Ho: Rho=0 / N = 247

    Q33A1     Q33A1     Q33A2     Q33A12    Q33A3     Q33A9     Q33A7     Q33A4
              1.00000   0.41681   0.41245   0.38841   0.31230   0.30666   0.29660
              0.0       0.0001

    Q33A2     Q33A2     Q33A3     Q33A1     Q33A15    Q33A7     Q33A9     Q33A4
              1.00000   0.48408   0.41681   0.36507   0.35198   0.33727   0.32125
              0.0       0.0001

    Q33A3     Q33A3     Q33A2     Q33A1     Q33A4     Q33A10    Q33A15    Q33A8
              1.00000   0.48408   0.38841   0.35555   0.34687   0.30598   0.28709
              0.0       0.0001

    Q33A4     Q33A4     Q33A6     Q33A3     Q33A7     Q33A2     Q33A1     Q33A8
              1.00000   0.42521   0.35555   0.32410   0.32125   0.29660   0.26938
              0.0       0.0001

    Q33A5     Q33A5     Q33A13    Q33A14    Q33A11    Q33A12    Q33A1     Q33A7
              1.00000   0.20620   0.17659   0.11134  -0.09332   0.08699  -0.08606
              0.0       0.0011    0.0054    0.0807    0.1436    0.1729    0.1776

    ...
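The listing above ranks each variable's partners by the absolute size of the correlation (note how the Q33A5 row mixes positive and negative values in descending magnitude). A minimal Python sketch of that ranking, for illustration only since the deck uses SAS, might look as follows. The function names, variable names and ratings are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def best_correlates(data, target, k):
    """The k variables most correlated with target by |r| (target itself included)."""
    pairs = [(name, pearson(data[target], column)) for name, column in data.items()]
    pairs.sort(key=lambda pair: abs(pair[1]), reverse=True)
    return pairs[:k]


data = {  # hypothetical ratings from five respondents
    "q1": [1, 2, 3, 4, 5],
    "q2": [1, 2, 3, 5, 4],
    "q3": [5, 3, 4, 1, 2],
    "q4": [3, 1, 2, 3, 1],
}
top3 = best_correlates(data, "q1", 3)
for name, r in top3:
    print(f"{name}: {r:+.3f}")
```

As in the SAS listing, the variable itself always comes first with a correlation of 1, so asking for the best 7 yields 6 genuine partners.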

SAS Code
• The code for performing a factor analysis is as follows:

    proc factor data=hold.cards nfact=6 rotate=varimax out=hold.cards fuzz=.3;
      var q33a1-q33a15;
    run;

  - data=     input data set
  - nfact=    number of factors asked for
  - out=      output data set with factor values for each individual
  - var       variables in the Factor Analysis
  - fuzz=.3   eliminates any value less than .3 in absolute value in the FA output (see below)

SAS output

    SAS System                          12:05 Monday, February 24, 1997   1

    Initial Factor Method: Principal Components
    Prior Communality Estimates: ONE

1.  Eigenvalues of the Correlation Matrix: Total = 15  Average = 1

                        1        2        3        4        5        6        7        8
    Eigenvalue     3.9958   1.5398   1.1739   1.0474   0.9795   0.8416   0.8112   0.7429
    Difference     2.4560   0.3659   0.1265   0.0679   0.1380   0.0303   0.0683   0.0584
    Proportion     0.2664   0.1027   0.0783   0.0698   0.0653   0.0561   0.0541   0.0495
    Cumulative     0.2664   0.3690   0.4473   0.5171   0.5824   0.6385   0.6926   0.7421

                        9       10       11       12       13       14       15
    Eigenvalue     0.6845   0.6624   0.5972   0.5746   0.5072   0.4646   0.3774
    Difference     0.0221   0.0652   0.0226   0.0673   0.0426   0.0872
    Proportion     0.0456   0.0442   0.0398   0.0383   0.0338   0.0310   0.0252
    Cumulative     0.7878   0.8319   0.8717   0.9100   0.9439   0.9748   1.0000

    6 factors will be retained by the NFACTOR criterion.

2.  Factor Pattern

               FACTOR1    FACTOR2    FACTOR3    FACTOR4    FACTOR5    FACTOR6
    Q33A1      0.64261    0.10466    0.22909    0.00702    0.37260    0.19244
    Q33A2      0.70770    0.06589    0.02988   -0.15700    0.28081   -0.19919
    Q33A3      0.64134    0.14019   -0.13186   -0.04640    0.25951   -0.24317
    Q33A4      0.57238    0.35839   -0.29582    0.10771   -0.02757    0.27656
    Q33A5     -0.08357    0.53286    0.53208   -0.09429    0.15764   -0.23052
    Q33A6      0.43021    0.47717   -0.41209   -0.13827   -0.28342    0.05263
    Q33A7      0.60480    0.04908   -0.10888    0.33017   -0.19909    0.04189
    Q33A8      0.60941   -0.16610    0.22173    0.31312   -0.23898   -0.00199
    Q33A9      0.55460    0.10588    0.28130   -0.21724   -0.42052   -0.08247
    Q33A10     0.58595    0.04968    0.18529    0.21746   -0.27972   -0.17652
    Q33A11    -0.23996    0.50645   -0.23555    0.54390    0.32545    0.01683
    Q33A12     0.51615   -0.32144    0.32763    0.05628    0.22886    0.55261
    Q33A13    -0.24610    0.49055    0.39569    0.02863    0.00242   -0.02809
    Q33A14    -0.37558    0.46974    0.09508   -0.27936   -0.20472    0.47604
    Q33A15     0.48719   -0.00283   -0.24449   -0.54913    0.19115   -0.02302

    Variance explained by each factor

     FACTOR1    FACTOR2    FACTOR3    FACTOR4    FACTOR5    FACTOR6
    3.995833   1.539818   1.173885   1.047402   0.979544   0.841555
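The Proportion and Cumulative rows of the eigenvalue table can be reproduced directly from the eigenvalues, which is a handy way to see how many factors are needed to reach a target share of variance. The sketch below is Python for illustration (the deck uses SAS); the function name and the 0.60 target passed to it are assumptions chosen to match the "approximately 60%" guideline used in this deck.

```python
from itertools import accumulate

# Eigenvalues of the correlation matrix, copied from the SAS output.
eigenvalues = [3.9958, 1.5398, 1.1739, 1.0474, 0.9795, 0.8416, 0.8112, 0.7429,
               0.6845, 0.6624, 0.5972, 0.5746, 0.5072, 0.4646, 0.3774]

total = sum(eigenvalues)  # equals the number of variables (15), up to rounding
proportion = [ev / total for ev in eigenvalues]
cumulative = list(accumulate(proportion))


def factors_needed(target):
    """Smallest number of factors whose cumulative proportion reaches target."""
    for k, share in enumerate(cumulative, start=1):
        if share >= target:
            return k
    return len(cumulative)


needed = factors_needed(0.60)
print(needed, round(cumulative[needed - 1], 4))
```

This is only an arithmetic check. As the deck stresses, interpretability of the factors can override a purely numerical cut-off.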

SAS output…

3.  Final Communality Estimates: Total = 9.578037

       Q33A1      Q33A2      Q33A3      Q33A4      Q33A5      Q33A6      Q33A7      Q33A8
    0.652294   0.649247   0.576990   0.632425   0.660909   0.684801   0.530450   0.603293

       Q33A9     Q33A10     Q33A11     Q33A12     Q33A13     Q33A14     Q33A15
    0.628753   0.536827   0.771585   0.838002   0.459394   0.717317   0.635753

    The SAS System                      12:05 Monday, February 24, 1997   2

    Rotation Method: Varimax

4.  Orthogonal Transformation Matrix

               1          2          3          4          5          6
    1    0.60090    0.60049    0.31986   -0.14237    0.35172   -0.17902
    2   -0.04409    0.07833    0.60942    0.70061   -0.16650    0.31930
    3    0.25547   -0.14998   -0.47831    0.67364    0.37609   -0.29702
    4    0.54786   -0.28654   -0.18312   -0.10344    0.06205    0.75476
    5   -0.46699    0.57038   -0.32109    0.06755    0.37548    0.45600
    6   -0.23126   -0.45094    0.40111   -0.14078    0.74986    0.01305
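Each final communality is the sum of the squared loadings of that variable across the retained factors, so the estimates can be verified from the unrotated factor pattern. The sketch below does this for Q33A1 and Q33A12 in Python (for illustration; the deck uses SAS). The function name is hypothetical and the loadings are copied from the Factor Pattern table.

```python
def communality(loadings):
    """Sum of squared loadings of one variable over the retained factors."""
    return sum(value * value for value in loadings)


# Rows of the unrotated factor pattern (FACTOR1 to FACTOR6).
q33a1 = [0.64261, 0.10466, 0.22909, 0.00702, 0.37260, 0.19244]
q33a12 = [0.51615, -0.32144, 0.32763, 0.05628, 0.22886, 0.55261]

h2_q33a1 = communality(q33a1)
h2_q33a12 = communality(q33a12)
print(round(h2_q33a1, 4), round(h2_q33a12, 4))
```

Because the varimax rotation is orthogonal, it redistributes variance between factors but leaves each variable's communality unchanged.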

SAS output…

5.  Rotated Factor Pattern (values below .3 in absolute value are shown as "."; two entries are truncated in the source)

               FACTOR1    FACTOR2    FACTOR3    FACTOR4    FACTOR5    FACTOR6
    Q33A1       .         0.48345     .          .         0.57939     .
    Q33A2       .         0.72062     .          .          .          .
    Q33A3       .         0.68685     .          .          .          .
    Q33A4       .          .         0.64305     .          .          .
    Q33A5       .          .          .         0.79651     .          .
    Q33A6       .          .         0.76294     .          .          .
    Q33A7      0.59762     .          .          .          .          .
    Q33A8      0.71377     .          .          .          .          .
    Q33A9      0.49688     .          .          .          .        -0.50583
    Q33A10     0.68783     .          .          .          .          .
    Q33A11      .          .          .          .          .         0.83377
    Q33A12      .          .          .          .         0.86208     .
    Q33A13      .          .          .         0.64644     .          .
    Q33A14    -0.38964   -0.45439    0.42849    0.39468     .          .
    Q33A15      .         0.60575    0.301       .          .        -0.3431

    Variance explained by each factor

     FACTOR1    FACTOR2    FACTOR3    FACTOR4    FACTOR5    FACTOR6
    2.095412   2.052493   1.520768   1.401878   1.318371   1.189115

    Final Communality Estimates: Total = 9.578037

       Q33A1      Q33A2      Q33A3      Q33A4      Q33A5      Q33A6      Q33A7      Q33A8
    0.652294   0.649247   0.576990   0.632425   0.660909   0.684801   0.530450   0.603293

       Q33A9     Q33A10     Q33A11     Q33A12     Q33A13     Q33A14     Q33A15
    0.628753   0.536827   0.771585   0.838002   0.459394   0.717317   0.635753

    Scoring Coefficients Estimated by Regression

    Squared Multiple Correlations of the Variables with each Factor

     FACTOR1    FACTOR2    FACTOR3    FACTOR4    FACTOR5    FACTOR6
    1.000000   ...

SAS output…

6.  Standardized Scoring Coefficients

               FACTOR1    FACTOR2    FACTOR3    FACTOR4    FACTOR5    FACTOR6
    Q33A1     -0.08335    0.18455   -0.03213    0.14900    0.43336    0.11645
    Q33A2     -0.05022    0.41908   -0.08899    0.09010   -0.01441   -0.01110
    Q33A3     -0.01743    0.41447   -0.03231    0.02842   -0.12090    0.11730
    Q33A4      0.00491   -0.05167    0.43022   -0.08589    0.15910    0.19259
    Q33A5      0.02684    0.18768   -0.15766    0.60951   -0.04507   -0.01852
    Q33A6      0.00968   -0.01383    0.53336   -0.04939   -0.21569   -0.04682
    Q33A7      0.32195   -0.12140    0.13970   -0.11504   -0.00638    0.15652
    Q33A8      0.42291   -0.16895   -0.08466   -0.01712    0.06781   -0.00350
    Q33A9      0.25110   -0.08846    0.10820    0.19609   -0.12006   -0.42766
    Q33A10     0.42262   -0.06087   -0.03939    0.09682   -0.14606   -0.03909
    Q33A11     0.02286    0.05149    0.08346    0.06972    0.02062    0.71907
    Q33A12    -0.07340   -0.15889   -0.04086   -0.05886    0.76862   -0.01701
    Q33A13     0.05660   -0.05396   -0.00596    0.46108    0.02966    0.03395
    Q33A14    -0.22857   -0.34256    0.45994    0.21551    0.27576   -0.19905
    Q33A15    -0.35190    0.37817    0.15988   -0.08769   -0.01491   -0.26763

SAS Output - interpretation
• The above lines of code result in the output on the last few pages. The output shows:
  1. the eigenvalues for each factor (check for reasonable size). The cumulative row shows what percentage of the variance is explained in the Factor Analysis using different numbers of factors. Aim for approximately 60% or more, ultimately depending on the interpretability of the Factor Analysis.
  2. the unrotated factor pattern (ignore this).
  3. the final communality estimates (check for any low ones). These show how much of each variable's variance is explained by the factors. It is desirable for these to be approximately 60% or better for those variables which are important in the final analysis. Any variable with a low communality is essentially NOT used in the factor solution. If an important variable has a low communality, it can be used in a segmentation as a separate variable (more later).

SAS Output - interpretation…
• The output shows (continued):
  4. the orthogonal transformation matrix (ignore this).
  5. the rotated factor pattern (the key output, so examine this closely). This shows each variable's weighting (loading) on each factor. The important variables for each factor are those with weightings of around 0.5 and over in absolute value.
  6. the standardised scoring coefficients (use this in FA regression).
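To show how the standardised scoring coefficients are used, the sketch below computes part of a FACTOR 2 score: each rating is standardised with its variable's mean and standard deviation and then multiplied by the scoring coefficient. It is Python for illustration only (the deck uses SAS), the respondent's ratings are hypothetical, and only the first three variables are included, so the result is a partial contribution rather than a complete factor score (the full score sums over all 15 variables).

```python
def factor_score_part(ratings, means, stds, coefs):
    """Sum of coefficient * standardised rating over the variables supplied."""
    return sum(c * (x - m) / s for x, m, s, c in zip(ratings, means, stds, coefs))


# Means and standard deviations of Q33A1 to Q33A3 from the Simple Statistics
# table, and their FACTOR 2 standardised scoring coefficients.
means = [2.534694, 3.906780, 2.845000]
stds = [2.031215, 2.452764, 2.042431]
factor2_coefs = [0.18455, 0.41908, 0.41447]

respondent = [5, 7, 3]  # hypothetical ratings for Q33A1, Q33A2, Q33A3
partial_f2 = factor_score_part(respondent, means, stds, factor2_coefs)
print(round(partial_f2, 3))
```

A respondent who rates every item at its mean gets a score of zero, so factor scores are centred on zero, which makes them convenient regressors in a follow-up regression.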

Output - meaning

               F1      F2      F3      F4      F5      F6    Statement
    Q33A1      .      0.48     .       .      0.58     .     My company wouldn't use recycled products because they look
    Q33A2      .      0.72     .       .       .       .     Recycled products seem to be of much lower quality than non-
    Q33A3      .      0.69     .       .       .       .     Using recycled products results in our equipment breaking down a
    Q33A4      .       .      0.64     .       .       .     They would need to be a lot cheaper before we would consider
    Q33A5      .       .       .      0.80     .       .     If there were no other problems with recycled products we would
    Q33A6      .       .      0.76     .       .       .     All recycled products cost more than non-recycled products.
    Q33A7     0.60     .       .       .       .       .     It's not worth the time and effort finding and changing suppliers just
    Q33A8     0.71     .       .       .       .       .     It would be too hard to make the system changes necessary to use
    Q33A9     0.50     .       .       .       .     -0.51   The range of recycled products available is not wide enough to
    Q33A10    0.69     .       .       .       .       .     It's just too difficult to get enough people to change their routines
    Q33A11     .       .       .       .       .      0.83   We would use recycled products if someone in our company took the
    Q33A12     .       .       .       .      0.86     .     Using recycled products doesn't really fit with our image.
    Q33A13     .       .       .      0.65     .       .     If quality, price and availability were the same, we would choose to
    Q33A14   -0.39   -0.45    0.43    0.39     .       .     Manufacturing recycled products is actually less energy efficient and
    Q33A15     .      0.61    0.30     .       .     -0.34   There are benefits to us if our customers see us as "Green".

Interpretation
• So this Factor Analysis explains 64% of the overall variance (from 1. above). The majority of the variables have over 60% of their variance explained (from 3. above). The final factors (from 5. above) are as follows:
  - Factor 1: Hassle Factor - a combination of the performance ratings "It's not worth the time and effort finding and changing suppliers just to get recycled products", "It would be too hard to make the system changes necessary to use recycled products", "The range of recycled products available is not wide enough to warrant using them" and "It's just too difficult to get enough people to change their routines and to use more recycled products."
  - Factor 2: Quality Factor - a combination of the performance ratings "Recycled products seem to be of much lower quality than non-recycled products", "Using recycled products results in our equipment breaking down and needing more maintenance", "There are benefits to us if our customers see us as 'Green'" and a negative weighting on "Manufacturing recycled products is actually less energy efficient and more harmful to the environment."
  - Factor 3: Price Factor - a combination of the performance ratings "They would need to be a lot cheaper before we would consider buying them" and "All recycled products cost more than non-recycled products."
  - Factor 4: Would Use Factor - a combination of the performance ratings "If there were no other problems with recycled products we would even pay a small premium to use them" and "If quality, price and availability were the same, we would choose to buy recycled products over non-recycled products whenever we could."
  - Factor 5: Image Factor - a combination of the performance ratings "My company wouldn't use recycled products because they look cheap and nasty" and "Using recycled products doesn't really fit with our image."
  - Factor 6: Help Factor - the performance rating "We would use recycled products if someone in our company took the responsibility to push the initiative ahead."