Lies, Damned Lies, and Health Physics
Some Random Comments About Statistics in Health Physics
Tom LaBone
Savannah River Chapter of the Health Physics Society
Aiken, SC
April 15, 2011

“There are three kinds of lies: lies, damned lies, and statistics.” Mark Twain
“It is easy to lie with statistics.” “It is hard to tell the truth without statistics.” Andrejs Dunkels

Today
• Informal, mostly apocryphal discussion of
  ◦ what statistics really is,
  ◦ who practices statistics and how they do it, and
  ◦ why all of this is important to you as a health physicist
• Main message of talk
  ◦ A good working knowledge of statistics is essential in any endeavor where data are collected and analyzed (e.g., health physics)
  ◦ Everyone in the room should become a statistician (of sorts)
• No math is used in this presentation and no health physicists were harmed during its preparation

Health Physics and Statistics
• Some HP “stat” books I used in school
  ◦ G. F. Knoll, Radiation Detection and Measurement, 1st Edition, 1979
  ◦ J. Shapiro, Radiation Protection, 1st Edition, 1972
  ◦ H. Cember, Introduction to Health Physics, 1st Edition, 1969
  ◦ R. D. Evans, The Atomic Nucleus, 1955
  ◦ P. R. Bevington, Data Reduction and Error Analysis for the Physical Sciences, 1st Edition, 1969
• Statistics was a tool, a “wrench to turn a nut”
  ◦ Is that all it is?

What is Statistics?
“Humans are good, she knew, at discerning subtle patterns that are really there, but equally so at imagining them when they are altogether absent.” Carl Sagan in Contact

Signals and Noise
• Useful information comes to us in the form of signals that form distinct patterns
• The signals are contaminated with varying degrees of noise, which can make it difficult to see the signal

Seeing Patterns
• In our evolutionary history, seeing patterns where none existed may have been less harmful than missing patterns that did exist
  ◦ That noise in the grass – is it just the wind or is it a lion?
• So, we as a species got very good at seeing patterns, even in the absence of a signal

Apophenia
• Apophenia is the experience of seeing meaningful patterns or connections in random or meaningless data
• What do you see below?

Face on Mars
[Images: Viking 1 Orbiter; Mars Global Surveyor]

Face in Food, et cetera

Face in Data

Statistics is …
• … a science that helps us to differentiate signal from noise and make decisions with a known probability of being wrong
• … a very practical, decision-oriented methodology developed to tame our natural tendency to be Apopheniacs
• … based on the idea that variability and noise are natural and unavoidable
• … a relatively modern science that is actively evolving
  ◦ especially since cheap, powerful computers became available

Really, What is Statistics?
“Statistics is concerned with collecting, analyzing, and interpreting data in the best possible way, where the meaning of “best” depends on the particular circumstances of the practical situation”
Chris Chatfield, Problem Solving: A Statistician’s Guide

Exploratory Data Analysis
• Look at data (usually with graphics) and use our ability to see patterns in the data to
  ◦ Suggest hypotheses to test
  ◦ Assess validity of assumptions on which statistical inference will be based
  ◦ Support the selection of appropriate inferential tests
  ◦ Suggest ideas for further data collection

[Figures: Air Filters; Fecal Samples]

Confirmatory Data Analysis
• Use statistical tests to answer questions about the data along with the risks of reaching the wrong conclusion
  ◦ Is the material on the filters the same material that is in the fecal samples?
  ◦ Are the Pu-239 to Am-241 ratios in the fecal samples and air samples the same once we account for random noise?
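
The question of whether two sets of ratios are "the same once we account for random noise" can be sketched with a permutation test on the difference in means. This is only an illustration: the ratios below are made up and are not the measurements from the talk, and both the function name `permutation_test` and the choice of a permutation test are assumptions rather than the method used on the slides.

```python
import random

def permutation_test(a, b, n_perm=10000, seed=1):
    """Two-sided permutation test for a difference in means.

    Returns the estimated fraction of random relabelings whose absolute
    mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_a = pooled[:len(a)]
        perm_b = pooled[len(a):]
        diff = abs(sum(perm_a) / len(perm_a) - sum(perm_b) / len(perm_b))
        # Small tolerance guards against floating-point ties
        if diff >= observed - 1e-12:
            count += 1
    # Add-one correction so the estimate is never exactly zero
    return (count + 1) / (n_perm + 1)

# Hypothetical Pu-239/Am-241 activity ratios (illustrative values only)
air_filter_ratios = [1.38, 1.42, 1.35, 1.44, 1.40, 1.37]
fecal_ratios = [1.36, 1.45, 1.39, 1.41, 1.43, 1.34]

p_value = permutation_test(air_filter_ratios, fecal_ratios)
print(f"permutation p-value: {p_value:.3f}")
```

A large p-value here would mean only that these made-up data give no reason to reject "same ratio"; as a later slide notes, "not rejected" is technically not the same as "accepted".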

Fecal Samples 2
95% CI = (1.33, 1.46)

Data Dredging
• Are the two Pu-239 to Am-241 ratios the same?
• If this question was asked before we saw the data we can proceed with the test to answer it
• If this question was inspired by the data then we should not test the same data to get the answer
  ◦ Referred to as data snooping, data dredging, etc.
  ◦ Cancer clusters
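
Why dredging is risky can be sketched with a small simulation, under assumptions that are not from the talk: 20 unrelated comparisons are run on pure noise, each with a z-test at the 5% level, and a "study" counts as a false alarm if any one of them looks significant. For independent tests the chance of at least one false alarm is 1 - 0.95^20, roughly 0.64, even though each individual test is wrong only 5% of the time.

```python
import math
import random

def any_false_alarm(rng, n_tests=20, n_obs=30, z_crit=1.96):
    """Run n_tests z-tests on pure noise; True if any looks 'significant'."""
    for _ in range(n_tests):
        # Standard normal noise: the true mean is exactly zero
        sample = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        z = (sum(sample) / n_obs) * math.sqrt(n_obs)
        if abs(z) > z_crit:
            return True
    return False

rng = random.Random(2011)
n_studies = 2000
hits = sum(any_false_alarm(rng) for _ in range(n_studies))
dredged_rate = hits / n_studies
expected_rate = 1 - 0.95 ** 20
print(f"simulated: {dredged_rate:.2f}, theory: {expected_rate:.2f}")
```

Picking a question after scanning the data amounts to running many such tests implicitly, which is why the test should be run on data that did not inspire the question.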

Statistical Method
• Define the problem
  ◦ Formulate your questions in such a way that unambiguous answers are possible
• Collect data
  ◦ Collect data capable of answering your question
• Analyze the data
• Present the results
  ◦ in terms your audience can understand

Define the Problem
“An approximate answer to the right problem is worth a good deal more than an exact answer to an approximate problem.” John Tukey
“It is better to solve the right problem the wrong way than to solve the wrong problem the right way.” Richard Hamming

Data Collection
• Collect data that are capable of answering the question asked (Data Quality Objectives)
  ◦ Designed experiments
  ◦ Observational studies
• Sampling
  ◦ You select samples from a population in order to make inferences about the population

GIGO
• The collection of data is often the most time-consuming and expensive part of a study
• Reverend Bayes and all of his horses can’t fix a bum dataset

Analyze the Data
• All statistical procedures have assumptions
• In practice, the assumptions of any given statistical procedure are violated to some degree
  ◦ Can the validity of the assumptions be verified?
  ◦ Can the validity of the answer be verified?
• How robust is your statistical procedure to violations of its assumptions?
• Simple approximate solutions you can understand may be better than complex exact solutions that you can’t
• Augment standard statistical analyses with simulations
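
One way to augment a standard analysis with a simulation is to check how often a nominal 95% confidence interval for the mean actually covers the true mean when the normality assumption is violated. The sketch below is an illustration under assumptions not in the talk: samples of size 10, exponential data standing in for "skewed" measurements, and a hard-coded t critical value for 9 degrees of freedom.

```python
import math
import random

N = 10              # sample size used throughout this sketch
T_CRIT_9DF = 2.262  # two-sided 95% t critical value for N - 1 = 9 degrees of freedom

def coverage(draw, true_mean, n_sim=10000, seed=42):
    """Fraction of nominal 95% t-intervals that contain the true mean."""
    rng = random.Random(seed)
    covered = 0
    for _ in range(n_sim):
        sample = [draw(rng) for _ in range(N)]
        mean = sum(sample) / N
        variance = sum((x - mean) ** 2 for x in sample) / (N - 1)
        half_width = T_CRIT_9DF * math.sqrt(variance / N)
        if mean - half_width <= true_mean <= mean + half_width:
            covered += 1
    return covered / n_sim

# Assumption satisfied: normal data with true mean 0
normal_cov = coverage(lambda rng: rng.gauss(0.0, 1.0), true_mean=0.0)
# Assumption violated: skewed (exponential) data with true mean 1
skewed_cov = coverage(lambda rng: rng.expovariate(1.0), true_mean=1.0)
print(f"normal data: {normal_cov:.3f}, skewed data: {skewed_cov:.3f}")
```

Comparing the two coverage figures gives a direct, if rough, answer to "how robust is this procedure?" for the kind of data at hand.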

Present Results
• Technical answer versus the functional answer
  ◦ “the null hypothesis is not rejected”
  ◦ technically “not rejected” ≠ “accepted”
  ◦ functionally “not rejected” = “accepted”
• Statistical significance and practical significance
  ◦ Apply “so what” test to your answers

What is a Statistician?
“Powerful spirits should only be called by the master himself” Goethe, The Sorcerer's Apprentice

What is a Statistician?
• Based on Chatfield’s definition of statistics, anyone who makes decisions based on the analysis of data might be called a statistician
• However, the title statistician is usually reserved for a professional who has specialized training in the concepts, theoretical bases, and methodologies of statistics
• Key difference between the sorcerer and his apprentice
  ◦ Contrary to what you might think, there is a lot of subjectivity and professional judgment in the practice of statistics
  ◦ Statistics is vast in scope and detail, and the apprentice does not know what he does not know
“It ain't what you don't know that gets you into trouble. It's what you know for sure that just ain't so.” Mark Twain

The Sorcerer’s Apprentice
• We may not be statisticians, but we are clearly doing statistics, often without adult supervision
• Doing our own statistics is a good thing, but we need to become better students of the black arts and consult the master before the brooms get out of control
“Should I refuse a good dinner simply because I do not understand the processes of digestion?” Oliver Heaviside [On being criticized for using formal mathematical manipulations without understanding how they worked]

How We Can be Better Statisticians
• Master the basics
• Learn the language
• Play with your data
• Use better software
• Perform reproducible work
• Consult with a real statistician

Master the Basics
Khan Academy http://www.khanacademy.org/

Statistics MS/Certificate Distance Programs
• University of South Carolina
• Colorado State University
• Texas A&M University
• Penn State University

Concepts and Terminology
• Specialized concepts
  ◦ Population versus sample, for example
• Statistics has a very precise language all its own
  ◦ “the null hypothesis is not rejected”
  ◦ “not rejected” ≠ “accepted”
• Questions and answers are not right unless you use the proper language to convey the proper concept
  ◦ some statisticians can be intolerant of laymen who misuse the language of statistics
• Learn to phrase questions and interpret answers properly

Exploratory Statistics
• Learn to play with your data and see if it is trying to tell you something new
• Study graphs of your data
“There is no data that can be displayed in a pie chart, that cannot be displayed BETTER in some other type of chart.” John Tukey

Software used for Statistics
• I use the following software for statistical calculations (in order of usage)
  ◦ R
  ◦ Minitab
  ◦ SAS
  ◦ Spreadsheet (e.g., MS Excel, Gnumeric)
• There are many others

Spreadsheets (Excel)
• What some people can do in Excel is nothing short of amazing (but should they be doing it?)
  ◦ Amarillo Slim beat tennis champ Bobby Riggs at Ping-Pong, using a frying pan instead of a paddle
• Spreadsheet Addiction by Patrick Burns
  ◦ http://lib.stat.cmu.edu/S/Spoetry/Tutor/spreadsheet_addiction.html
• Problems with spreadsheet implementation
  ◦ Excel has a long history of doing bad stats
• Problems with spreadsheet paradigm
  ◦ Reproducible science

http://www.msnbc.msn.com/id/21033161/from/RS.1/ (9/28/2007)
M. G. Almiron et al., On the Numerical Accuracy of Spreadsheets, Journal of Statistical Software 34(4), 2010
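
As a generic illustration of what "numerical accuracy" means for a statistic as simple as the variance, the sketch below compares the one-pass "calculator" formula with the two-pass formula in ordinary double-precision arithmetic. It does not reproduce the behavior of any particular spreadsheet or any result from the paper cited above; the data values and function names are made up for the example. The true sample variance of the four values is 30.

```python
def variance_one_pass(values):
    """'Calculator' formula: (sum of squares - (sum)^2 / n) / (n - 1).

    Algebraically correct, but subtracts two huge, nearly equal numbers.
    """
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (sum_x2 - sum_x * sum_x / n) / (n - 1)

def variance_two_pass(values):
    """Compute the mean first, then sum the squared deviations from it."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

# Small spread riding on a large offset (floating-point values)
data = [1e9 + d for d in (4.0, 7.0, 13.0, 16.0)]

naive = variance_one_pass(data)
stable = variance_two_pass(data)
print(f"one-pass: {naive}, two-pass: {stable}")
```

The two formulas are identical on paper, so a discrepancy between them is purely an artifact of how the arithmetic is carried out, which is the sort of implementation detail a user normally cannot see.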

Reproducible Research
• Reproducible research refers to the idea that the ultimate product of research is the paper along with the full computational environment used to produce the results in the paper, such as the code, data, etc. necessary for reproduction of the results
Raw Data → Massaging → Calculations → Plots and Tables → Final Paper
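
The chain from raw data to final paper can be sketched as a script in which every step is a function, so that rerunning the script regenerates the same results. Everything here is hypothetical: the function names, the simulated "raw data" (generated with a fixed seed in place of reading a real file), and the -999.0 sentinel for a missing result are all assumptions made for illustration.

```python
import random
import statistics

def load_raw_data(seed=20110415):
    """Stand-in for reading raw measurements; simulated with a fixed seed."""
    rng = random.Random(seed)
    # Twelve made-up results plus one sentinel marking a missing value
    return [round(rng.gauss(1.4, 0.05), 3) for _ in range(12)] + [-999.0]

def massage(raw):
    """Drop the sentinel used (in this made-up dataset) for missing results."""
    return [x for x in raw if x != -999.0]

def calculate(clean):
    """Summary statistics that would feed the plots and tables."""
    return {
        "n": len(clean),
        "mean": statistics.fmean(clean),
        "sd": statistics.stdev(clean),
    }

def make_table(summary):
    """Render the results as plain text for the final paper."""
    return "\n".join(f"{key:>4}: {value:.4g}" for key, value in summary.items())

def run_pipeline():
    """Raw data -> massaging -> calculations -> table, with no manual steps."""
    return make_table(calculate(massage(load_raw_data())))

report = run_pipeline()
print(report)
```

Because the data cleaning lives in `massage` rather than in hand edits, a reviewer can see exactly which values were dropped and why.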

The R Project for Statistical Computing
• R is a language and environment for statistical computing and graphics
• R is available as Free Software under the terms of the GNU General Public License in source code form
• It compiles and runs on a wide variety of UNIX platforms and similar systems (including FreeBSD and Linux), Windows and MacOS
• Download from http://www.r-project.org/

Advantages of R
• Command line interface rather than a GUI
  ◦ Promotes reproducible statistics
• Open source
  ◦ Flexible licensing
  ◦ Availability of source code for peer review
  ◦ Bugs are public knowledge and are fixed quickly
  ◦ New tests and methods tend to appear first in R
• Many dozens of recently published books devoted to R
• Free (and very good) community support available

Consult with a Statistician
• If you are going to involve a statistician, do it at the study design and data collection phases
  ◦ If not, at least estimate how much it will cost to collect the data all over again
• Anybody can analyze compelling data
“To call in the statistician after the experiment is done may be no more than asking him to perform a postmortem examination: he may be able to say what the experiment died of.” Sir Ronald Fisher

Twisted Answers to Crooked Questions
• As health physicists there are times when a decision will be made, with or without good data and a proper statistical analysis
• In such situations we base our decisions on professional judgment, often augmented with “statistics”
  ◦ We must not fool ourselves about what we are doing
    … of all the wrong answers we have to choose from, this one is the best
  ◦ We have no right to expect a statistician to endorse such mischief

The Apprentice Should Beware of …
• The Management Prior
• Being bamboozled by other people’s statistics
• “The only right way to do this is X [insert statistical method here]”
• Being seduced by complexity

Statistics in the Workplace: Musings of a Sorcerer's Apprentice
Presentation to USC Stat Club, March 26, 2009
• Main message
  ◦ A degree in statistics is a “Swiss Army Knife” that is very useful in any endeavor where data are collected and analyzed
  ◦ Everyone in the room should become a health physicist (I had no takers)