Graphics in EG and R
HRP 223 – 2009
November 16th, 2009
Copyright © 1999-2009 Leland Stanford Junior University. All rights reserved.
Warning: This presentation is protected by copyright law and international treaties. Unauthorized reproduction of this presentation, or any portion of it, may result in severe civil and criminal penalties and will be prosecuted to the maximum extent possible under the law.

Robbins
• Creating More Effective Graphs by Naomi Robbins is a wonderful book showing the right and wrong ways to visualize scientific data. Read it when you have an afternoon off. It is an ideal read on a transcontinental flight.

Why Do Data Visualization?
• Well-designed pictures will show you the details and the whole pattern in your data.
• Numeric descriptions can easily hide important patterns.
• Some patterns are hard to detect in tables.
  – Whenever data is reported over time or locations, you need art.

Fisher’s Plot
Data Reported in Cleveland
Panels: Year 1, Year 2
Based on code written by Robert Allison at SAS Institute

Scatter Plot for Correlations
Anscombe 1973, Graphs in Statistical Analysis
All have r² = .67
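
The point of the Anscombe slide can be checked directly, because the quartet ships with base R as the data frame anscombe. The sketch below is not the code behind the original figure; it only recomputes r² for each of the four data sets and draws them in a 2 x 2 grid with their least-squares lines. The axis limits are my own choice.

```r
# Anscombe's quartet: columns x1..x4 and y1..y4 in the built-in `anscombe` data frame.
r_squared <- sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  cor(x, y)^2
})
names(r_squared) <- paste("Set", 1:4)
print(round(r_squared, 2))

# Draw the four data sets with their least-squares lines in a 2 x 2 grid.
old_par <- par(mfrow = c(2, 2))
for (i in 1:4) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  plot(x, y, main = paste("Set", i), xlim = c(3, 20), ylim = c(2, 14))
  abline(lm(y ~ x))
}
par(old_par)
```

The summary numbers agree to two decimals, yet the four scatter plots look nothing alike.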

Bad Things
• First, I want to talk about bad graphics that I frequently see.
  – 3D
  – Pie
  – Donuts
  – Stacked graphics

General
• 3D graphics
  – Don’t, Don’t
While the SAS implementation of 3D graphics is relatively good, don’t use 3D effects unless you are measuring something in 3D. Even then, don’t.

Tufte is a God to many.
• The empiricist in me is very nervous about the amount of pontificating in his books…
  – I want to have evidence-based advice.
• His best advice is to put no extra ink on the page.
  – Think about the ink-to-information ratio.
  – Remove all chart junk.
Note: the irony of the chart junk on this slide…

Example Bar Chart
Serum Samples in Each Trimester
You can remove ink rather than adding.

Ink-to-Information Ratio
• How much ink for seven numbers?
Based on Soukup & Davidson, 2002, Visual Data Mining

Cleveland
• If you want to know how to do scientific visualization, you must read William Cleveland’s work.
  – He attempted to quantify what makes a good graphic good.
• His early work on graphics is one of the reasons why R/S-Plus is taking over the statistical world.

Pie is bad.
• Work by Cleveland (and experimental psychologists) suggests that:
  – people are bad at judging the relative magnitude of angles
  – if you twist the rotation of the pie you can cause people to systematically misjudge the size of the angles
  – a 3rd dimension makes judgment worse
• If you get a glossy handout with a 3D pie, assume someone is lying to you.
• Don’t use them.

Don’t Explode!
• This exploded 3D pie (brought to you by Excel) is nearly useless for judging amounts.
Pie slice labels: Total, tweaked, twisted, wrecked

Forbidden Donut…
• Donut plots have the same problems as pies (if not worse)…

Stacking is Bad
• Cleveland also quantified the fact that people are bad at judging the relative height of stacked data.

Wow, a cinnamon roll plot!
• Good luck making rapid judgments using this stacked 3D pie.

What is a good graphic?
• Don’t make your audience think unnecessarily!
• Minimize the amount of ink on the page.
  – This needs to be studied.
• Show the central tendency and the variability.
• Plot the quantity (inference) that you want people to notice.
• Be sure colorblind people can understand it.
  – Use a black and white photocopier and make sure you can distinguish all groups.

Avoid Thinking
• Put labels on the graphic directly instead of using a key.
• If you want people to compare the difference between two lines, plot the difference, not the two lines.

Bivariate Comparisons with Lines
• People are extremely bad at judging the distance between two curves. Never ask people to judge up and down (vertical) distances between curves.
In this example, the vertical distance between the two curves is the same at all points, even though it does not look that way.
Based on: Robbins, Creating More Effective Graphs, 2005
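
A minimal sketch of the "plot the difference" advice in base R. The two curves are made-up data (an exponential curve and a copy shifted up by exactly 2 units), not the data from the Robbins figure. The left panel shows the two curves; the right panel plots the difference itself.

```r
# Made-up curves: the upper curve is the lower curve plus a constant.
x <- seq(0, 10, length.out = 200)
lower <- exp(x / 3)   # a steeply rising curve
upper <- lower + 2    # the same curve shifted up by exactly 2 units
gap <- upper - lower  # the quantity we actually want readers to judge

old_par <- par(mfrow = c(1, 2))
plot(x, upper, type = "l", ylab = "Value", main = "Two curves")
lines(x, lower, lty = 2)
plot(x, gap, type = "l", ylim = c(0, 4),
     ylab = "Upper minus lower", main = "Difference")
par(old_par)
```

On the left the curves appear to converge as they get steeper; on the right the flat line makes the constant gap obvious.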

Plot Types
• Univariate (one variable)
  – Categorical variables
    • Bar charts
    • Dot plots
    • Waffle plots
  – Continuous variables
    • Histogram
    • Box plot
    • Violin plots

Bar Charts
• The ink-to-information ratio is lousy.
• A one-dimensional quantity is being “expanded” into two dimensions.
  – Doubling of the amount corresponds to how much of an increase in area?

SAS Bar Charts
• SAS makes the reader do extra work by rotating the axis labels in ActiveX images.
• They pointlessly include variable labels by default.

How to do it?
Notice you can edit the data and apply filters. You can right click on variables and apply user-defined formats from the Properties dialog.

First create the format. In the Data pane of the Bar Chart GUI, right click on the variable and change the format to the user-defined format you created.

The GUI is Solid
• My only complaints are that the “rotate grouping values text” option does not work (position in this example) and that the summary statistics do not show up when you request ActiveX images.

Saving the Graphic for Publication
• The easiest way to get publication quality graphics is to set the output type to be RTF.

Comparison: .PNG format and ActiveX image format

Default Output and Graphics
• The default graphic format in EG is ActiveX. These images can be edited (even on the web) but they only display with Internet Explorer.
I have set my graphics to display as ActiveX images. Tweak this with Tools > Options… > Graph.

Types of Images
• The default formats of the images are determined by the ODS destinations you are using:
  – LISTING: png, visible in the Windows Image Fax Viewer
  – HTML: png, gif, jpg, contained in web pages and visible in Internet Explorer, Firefox or Opera
  – LATEX: PostScript, epsi, gif, jpeg, png, visible in GhostView
  – PCL or PS: contained in a PostScript file, visible in GhostView
  – PDF: contained in a pdf, which is visible with Adobe Reader
  – RTF: visible in MS Word

I Typically Use HTML
• Include image_dpi = 200 to set the resolution higher than the default 100 dots per inch. Try 200 for final images that you paste into MS Office.
• This is the appearance template. For optimal results use:
  – Analysis: color
  – Default: overdistinguishes symbols for color or B&W
  – Journal or journal2, etc.: black and white
  – Statistical or statistical2, etc.: color
• This says the images should show tooltips with extra statistical details when you hover the mouse over parts of the graphic. (I can’t image these.)

Useful ods graphics Options
• After the ods graphics on statement, type a / then:
  – imagename = "fileName"
  – reset
    • resets the counter of images back to 0.
  – imagefmt = jpg
  – width = 4.5in
  – height = 4.5in
• If you set only width or height, it will use a 4:3 aspect ratio.

ODS Graphics Editor
• If you want to do extensive tweaking to a graphic, you can use the WYSIWYG ODS Graphics editor. Unfortunately it only works with ODS graphics procedures and you need to rerun the code in SAS to invoke it.
  – Right click on the graphic node and choose Open… Open Last Submitted Code.
  – Copy the code beginning with the SQL that makes the data.
  – Start SAS and paste the code into the program editor.

Specifying the plot name is optional in proc freq.

WYSIWYG Editing
• Right click and/or double click to set properties for objects in the plot. The tool is optimized for some of the ODS style templates but you can use custom colors.

• Right click on things to set properties.
  – Colors, text details, fonts
  – Point and click annotation
  – Symbols, arrows, text, circles

What is ODS?
• The Output Delivery System (ODS) controls the type and appearance of SAS output.
  – Different appearance templates
  – Different output destinations/types

You can browse the ODS appearance templates from the Style Manager on the Tools menu.

ODS Graphics
• Compared to the competition, for the last 10 years SAS graphics have been between poor and pathetic.
  – Graphics procedures rendered okay quality, at best.
  – No “what you see is what you get” editing.
  – Many plots were nearly impossible to render.
  – Custom graphics required extensive programming.
• SAS 9.x has attempted to solve this problem.

Old vs. New Procedures
• The old (commonly used) graphics procedures were gchart and gplot.
• Now most analysis procedures have built-in high quality graphics that can be invoked with an ods graphics on statement.
  – Early on in the class I told you to tweak the EG options to include “ODS graphics on” with every run.
• There are also new “easy to use” statistical graphics (sg) procedures.

New Graphics: Statistical Graphics Procs
• proc sgplot
  – general plotting procedure that replaces gplot
• proc sgscatter
  – lots of tools for scatterplots and scatter matrices
• proc sgpanel
  – quick and easy trellis/lattice/matrix/panel of plots
• proc sgrender
  – used with proc template to make totally custom plots
  – It replaces proc greplay

Plot Types
• Univariate (one variable)
  – Categorical variables
    • Bar charts
    • Dot plots
    • Waffle plots
  – Continuous variables
    • Histogram
    • Box plot
    • Violin plots
    • Quantile and QQ plots

You can get an okay looking graphic using sgpanel.

I was able to get exactly the graphic I wanted using R.

If you want to use R
• Download R for Mac or PC
  cran.cnr.berkeley.edu/bin/macosx/
  cran.cnr.berkeley.edu/bin/windows/base

If you use a PC, also get Perl and Tinn-R
• Perl is a text manipulation language that is used by a couple of key R packages. It ships with Mac OS X. PC users can get ActivePerl (what I use) or Strawberry Perl for Windows. www.perl.org/get.html
• Tinn-R is a text editor that knows the R language. sourceforge.net/projects/tinn-r/

R Help
• R help files are user hostile. To learn about the options for dotchart, type: ?dotchart
• Use: rseek.org
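
As a small illustration of what the dotchart help page describes, here is a hedged sketch of a Cleveland dot plot in base R. The counts are hypothetical (the real serum sample numbers are not in these notes); only the dotchart() call itself is the point.

```r
# Open the help page (only useful in an interactive session).
if (interactive()) help("dotchart")

# Hypothetical counts of serum samples per trimester, for illustration only.
samples <- c(First = 12, Second = 30, Third = 21)

# A dot plot spends far less ink than a bar chart on the same three numbers.
dotchart(samples, xlab = "Number of samples",
         xlim = c(0, max(samples)), pch = 19)
```

Starting the axis at zero keeps the dot positions comparable in the way bar lengths would be.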

Browse
• To see why people use R for graphics look here: addictedtor.free.fr/graphiques/thumbs.php

Additional Libraries
• If you see sample code that includes require() or library(), you will need to do a one-time download of the additional package. If you are using Vista, run R as the administrator (by right clicking on the R icon instead of just double clicking) to install and update packages.
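
The one-time download can be wrapped in a small helper. This is only a sketch: ensure_package is a hypothetical name of my own, not a function from R or from this class. It relies on requireNamespace() and install.packages(), which are part of base R.

```r
# Hypothetical helper: install a package only if it is not already available.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # one-time download from CRAN; needs a network connection
  }
  requireNamespace(pkg, quietly = TRUE)  # TRUE if the package can now be loaded
}

# "stats" ships with R, so this call does not trigger a download.
ensure_package("stats")

# For a contributed package you would write, for example:
# ensure_package("vioplot"); library(vioplot)
```

After the package is installed once, later sessions only need the library() call.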

Waffle Plots
• I have not found software to do them. I need to find their real name…
Image from: Visual Language for Designers by Connie Malamed, 2009.

Continuous Outcomes
• The Distribution Analysis menu option can do basic plots.

The resolution of the histogram is okay but the others are unacceptable.

Use sgplot for high resolution plots.

Violin
• A violin plot mirrors the shape of the histogram (density). They can be done in R.
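
To make "mirrors the shape of the density" concrete, here is a sketch that draws one violin by hand in base R. The data are simulated, and the scaling of the half width is my own choice; contributed packages offer ready-made violin functions, so treat this only as an explanation of the idea.

```r
set.seed(1)
values <- rnorm(200, mean = 6, sd = 0.5)  # simulated data, for illustration only

d <- density(values)                 # kernel density estimate
half_width <- d$y / max(d$y) * 0.4   # scale so the violin is at most 0.8 wide

plot(NA, xlim = c(0.5, 1.5), ylim = range(d$x), xaxt = "n",
     xlab = "", ylab = "Value", main = "Violin plot drawn by hand")
# Trace up the right side of the density and back down the mirrored left side.
polygon(x = c(1 + half_width, rev(1 - half_width)),
        y = c(d$x, rev(d$x)), col = "grey80")
points(1, median(values), pch = 19)  # mark the median
```

The widest part of the violin sits where the histogram would have its tallest bar.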

Grouped Categorical Data
• To graph categorical data in SAS you need to get Michael Friendly’s Visualizing Categorical Data. Unfortunately, his macros are copyrighted with the book… So I will show you the R versions.
  – Fourfold plots
  – Mosaic plots
  – Association plots

Fourfold Plots
• They draw 4 slices of pie with the area corresponding to the number of people in each cell of a 2 x 2 table. They also have confidence bands: if the confidence bounds overlap on adjacent pieces, the pieces are not statistically significantly different.

More males were admitted than females. There is clear evidence of sexist policies in admissions!

Department A admitted a higher proportion of its female applicants than of its male applicants, and every other department had no bias! The joy of Simpson’s paradox.
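
The admissions example can be reproduced with the UCBAdmissions table that ships with R, assuming (as the department labels suggest) that the slides use the same Berkeley data. The sketch compares admission rates pooled over departments with the rates inside department A, then draws both fourfold displays with the base R function fourfoldplot().

```r
# UCBAdmissions is a 2 x 2 x 6 table: Admit x Gender x Dept.
pooled <- apply(UCBAdmissions, c(1, 2), sum)  # collapse over departments
admit_rate_pooled <- pooled["Admitted", ] / colSums(pooled)
print(round(admit_rate_pooled, 2))

dept_a <- UCBAdmissions[, , "A"]
admit_rate_a <- dept_a["Admitted", ] / colSums(dept_a)
print(round(admit_rate_a, 2))

fourfoldplot(pooled)                          # all departments combined
fourfoldplot(UCBAdmissions, mfrow = c(2, 3))  # one panel per department
```

Pooled, men are admitted at a higher rate; within department A the direction reverses, which is the Simpson’s paradox pattern.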

Mosaic Plots
• So you have a contingency table and you want to know if there is an association. You do a chi-square test and it says there are associations between the rows and columns. What next?

Some basic voodoo in R shows which combinations are over-represented (in blue) or under-represented (in red).

I prefer the simpler association plots.
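
A sketch of the "what next?" step in base R. The data here are the built-in HairEyeColor table collapsed over sex, which is a stand-in and not necessarily the table shown on the slides. mosaicplot() with shade = TRUE colors cells by their residuals, and assocplot() draws the simpler association plot.

```r
# Hair color by eye color, collapsed over sex (HairEyeColor ships with R).
hair_eye <- apply(HairEyeColor, c(1, 2), sum)

test <- chisq.test(hair_eye)
print(test$p.value)
print(round(test$residuals, 1))  # Pearson residuals: positive means over-represented

mosaicplot(hair_eye, shade = TRUE, main = "Mosaic plot")  # shading follows the residuals
assocplot(hair_eye, main = "Association plot")
```

The residual table tells you numerically what the shading tells you visually: which cells drive the significant chi-square.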

Grouped Continuous Variables
• You can use the Distribution Analysis to get basic grouped plots.
• For better looking plots you need to write sgplot and/or sgpanel code.

Request distinct graphics by subgroups.

Actually this took a bit of voodoo.

Double click here. Put details on the histogram tweaks here. I use/tweak nrow, ncol, and endpoints often.
endpoints = 2 to 10 by 0.5
midpoints = 5.6 5.8 6.0 6.2 6.4

Side by Side Violin Plots

Scatter Plot: # People in household

Jittered Plot: # People in household
Take the values and add ranuni().
Notice the family is gone. They jittered off of the graphic.

Jitter vs. Sunflowers
In R you can also do sunflower plots.
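
Both approaches exist in base R as jitter() and sunflowerplot(). The sketch below uses hypothetical whole-number data (household size against a made-up 1 to 5 score). It also widens the axis limits by the jitter amount so that no point can jitter off of the graphic.

```r
set.seed(42)
# Hypothetical data: household size against a second whole-number score.
household <- sample(1:6, 100, replace = TRUE)
score <- sample(1:5, 100, replace = TRUE)

amount <- 0.2
household_jit <- jitter(household, amount = amount)  # adds uniform noise in [-0.2, 0.2]
score_jit <- jitter(score, amount = amount)

old_par <- par(mfrow = c(1, 3))
plot(household, score, main = "Raw (overplotted)")
# Widen the axis limits so jittered points cannot fall off the plot.
plot(household_jit, score_jit, main = "Jittered",
     xlim = range(household) + c(-amount, amount),
     ylim = range(score) + c(-amount, amount))
sunflowerplot(household, score, main = "Sunflower")  # one petal per tied point
par(old_par)
```

Jitter keeps every point visible but moves it slightly; sunflowers keep the positions exact and encode the count in the petals.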

Ordinary Least Squares Regression
• People typically plot a regression line to show a relationship between two continuous variables.

Bisquare
• Figure out what is an odd value and then put a weight on it to devalue it. There are many robust regression algorithms around. R and S-Plus software have them well implemented.
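
The weighting idea can be written in a few lines. The function below is Tukey's bisquare (biweight) weight, which I am assuming is the weight this slide's title refers to. The name bisquare_weight and the example residuals are my own; a real robust fit also has to scale the residuals and iterate, which this sketch leaves out.

```r
# Tukey's bisquare weight: 1 at a residual of zero, falling smoothly to 0
# once the scaled residual reaches 1 in absolute value.
bisquare_weight <- function(u) {
  ifelse(abs(u) < 1, (1 - u^2)^2, 0)
}

# Hypothetical scaled residuals: the last two are the "odd" values.
u <- c(0, 0.25, 0.5, 1, 3)
print(bisquare_weight(u))
```

Points near the fit keep almost full weight, and points far from it are ignored entirely.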

Loess
• This technique essentially creates a rolling window and gets a weighted average across the values visible inside the window.
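
A sketch comparing the three fits in base R, using the built-in cars data as a stand-in for the class data. The span of 0.75 is simply the loess() default written out. Setting family = "symmetric" asks loess() for a robust fit based on Tukey's biweight.

```r
# `cars` ships with R: stopping distance (dist) against speed, sorted by speed.
ols_fit <- lm(dist ~ speed, data = cars)
loess_fit <- loess(dist ~ speed, data = cars, span = 0.75)
# family = "symmetric" requests a robust fit that downweights odd values.
robust_fit <- loess(dist ~ speed, data = cars, span = 0.75, family = "symmetric")

plot(dist ~ speed, data = cars, main = "OLS line and loess curves")
abline(ols_fit, lty = 2)
lines(cars$speed, fitted(loess_fit))
lines(cars$speed, fitted(robust_fit), lty = 3)
legend("topleft", legend = c("OLS", "loess", "robust loess"), lty = c(2, 1, 3))
```

A smaller span narrows the rolling window and makes the curve wigglier; a larger span pulls it toward the straight line.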

Specialized Plots
• Most analysis procedures now have customized high resolution graphics. Some are automatically produced if you type ods graphics on.
• Proc Freq
  – I wanted a deviation plot for a 2 x 2 (or really any sized) table showing which cell is driving a significant chi-square. They only give you a plot for a one-way table.
  – The ORPlot is very nice.

Turn on editable graphics. Specifying the plot name is optional in proc freq.

Deviance Plot

Too Many Graphics
• If the ods graphics on statement gives you too many graphics, you can specify which graphics you want by including code designed for the procedure. Typically it looks like this: plots(only) = (table names). This design is poorly implemented because you need to know where to put the plots option and what the table names are. Does it go on the proc line (like phreg), the tables line (like proc freq), or some other line? Also, the table names specified with a plots option do not always match the ODS table names.

• Usually you can use an ODS exclude statement or an ODS select statement to pick the correct things to print.

Proc phreg has a lot of new features but nothing major in the graphics. With phreg, if you specify ods graphics on you do not automatically get any plots. Here I request survival and cumulative hazard plots including the global confidence limits option (cl). Once again the option names are not consistent with the table names.

Proc lifetest can show the number at risk but the implementation is weak. It labels the groups with numbers even if the strata are character strings. You have to manually edit them and this affords ample opportunity for mistakes. I don’t see a way to change the censoring symbol in the legend.

Splitting a Grid
• Some procedures produce a grid of plots. You can get access to the individual plots by specifying plots(unpack). Then you can use plots(only)=tableName to get just the right parts.
• ODS select or exclude statements will not work.

Beyond the Basic Univariate Plots
• There are 4 SG procedures that allow you to build up complex univariate plots and do multivariate (trellis/lattice) plots.

New Graphics: Statistical Graphics Procs
• proc sgplot
  – general plotting procedure that replaces gplot
• proc sgscatter
  – lots of tools for scatterplots and scatter matrices
• proc sgpanel
  – quick and easy trellis/lattice/matrix/panel of plots
• proc sgrender
  – used with proc template to make totally custom plots
  – It replaces proc greplay

proc sgplot
• Basic plots
  – scatter, series, band, needle
• Fits curves and generates confidence bounds
  – loess, regression, penalized b-splines, ellipse
• Distributions
  – boxplots, histograms, normal curves, kernel density
• Categorization
  – dot plots, bar charts, line charts
From Heath 2007. SAS/Graph procedures for creating statistical grap

As you add more requests to the plot, it resizes and shifts things to make room. It draws them in the order you request them, reading the requests from the first listed to the last. Change the order if you want an item to appear layered on top of, or behind, another.
Syntax coloring for some of the new keywords is not set yet in the enhanced editor. Use the menu Tools > Options > Enhanced Editor… then click User Defined Keywords to add the coloring.

I want to add in a reference line showing what is normal and put the categories in order.

Grids
• You can produce lattices full of graphics with proc sgpanel.

Spaghetti Plots
Data from Singer and Willett: www.ats.ucla.edu/stat/examples/alda.htm
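
A spaghetti plot is one line per subject over time. The sketch below shows the idea in base R with the built-in Orange data (five trees measured at seven ages), which is a stand-in I chose and not the Singer and Willett data used on the slides.

```r
# Orange ships with R: circumference of five orange trees at seven ages.
orange <- as.data.frame(Orange)
trees <- levels(orange$Tree)

# Set up empty axes, then add one line per tree.
plot(circumference ~ age, data = orange, type = "n",
     xlab = "Age (days)", ylab = "Circumference (mm)", main = "Spaghetti plot")
for (tree in trees) {
  one_tree <- orange[orange$Tree == tree, ]
  lines(one_tree$age, one_tree$circumference)
}
```

With many subjects the lines tangle, which is where the name comes from; a panel per group or a highlighted average line helps.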
