Скачать презентацию Quick Simple Simulation in Excel with Clinical Скачать презентацию Quick Simple Simulation in Excel with Clinical

bdf57818874de0ffc99ac1c503b90ca6.ppt

  • Количество слайдов: 57

Quick & Simple Simulation in Excel with Clinical Trials Applications Presented to the Delaware Quick & Simple Simulation in Excel with Clinical Trials Applications Presented to the Delaware Chapter of the American Statistical Association 20 October 2011 Dennis Sweitzer, Ph. D. www. Dennis-Sweitzer. com

Background • Occasional need for simulations • Excel is convenient, but – does not Background • Occasional need for simulations • Excel is convenient, but – does not explicitly support simulations – Simulation usually requires VBA programming (so why not use R or SAS instead) – Or Add-in commercial programs (eg. , @Risk) – Or some academic add-ins • Does have iterative calculations, Solver • Why not simulation?

Simulate what? • Stochastic Models – Unknown parameters? Guestimate a distribution – Optimizing policy? Simulate what? • Stochastic Models – Unknown parameters? Guestimate a distribution – Optimizing policy? Test each with simulations • Sensitivity Analysis – Variations in Inputs Variations in Outputs – 2 parameters: use a table – >2 parameters: simulate & compare variation

Excel: Pros Common Language / Common Tools MEGO • Most people understand Excel • Excel: Pros Common Language / Common Tools MEGO • Most people understand Excel • Many tools available in Excel Transparency: Modeling assumptions can be: Specified -- Graphed -- Debated What you see is what you get! More hands on deck, more eyes on the prize…. : Statistician Team Member Initial Model Explores & breaks model Repair & enhance …Repeat until satisfied

Excel Cons Slower than in SAS, S+, R, etc Lacks some statistical/probability functions • Excel Cons Slower than in SAS, S+, R, etc Lacks some statistical/probability functions • Latest versions are a little better • Still need to add some VBA code • Known bugs in statistical routines (often fixed) Tradeoffs: • Quicker modifications vs slower execution

Simple Solution: Data Tables Excel Data Tables • Creates a table of values of Simple Solution: Data Tables Excel Data Tables • Creates a table of values of a function – (ie, Random Variables) • Leftmost column is used as an argument – (which is ignored in a simulation) • Data Table repeats calculations for each row – (Each row is an iteration of the simulation)

1. Create Simulation Create Random Variables using Inverse Probability Method: For Random Variable X 1. Create Simulation Create Random Variables using Inverse Probability Method: For Random Variable X with distribution function F(x), F(x): → [0, 1] If Random Uniform UÎ [0, 1] X = F-1(U) (Excel: U=Rand() )

2. Align Random Variables • Calculations can be anywhere in Spreadsheet • Reference the 2. Align Random Variables • Calculations can be anywhere in Spreadsheet • Reference the Variables in a row • Is best to label variables in same way

3. Select Data Table • Select table region – 1 st row is Rand 3. Select Data Table • Select table region – 1 st row is Rand Vars – 1 st column is not used (can label iterations) • From toolbar: – Data>Data Table

4. Create Simulation Table • Column input cell = Upper left hand corner of 4. Create Simulation Table • Column input cell = Upper left hand corner of table • Row input cell = ignore • OK Populates the table • (may have to manually recalcule)

5. Execute Simulation Iterative development • Simulation can be changed • Add reporting variables 5. Execute Simulation Iterative development • Simulation can be changed • Add reporting variables • Recalculate to rerun – (no need to use Data Table again, unless expanding) • Hint: debug with short table, expand for final run

The End (of the key concepts) The End (of the key concepts)

But still more…. • Why use inverse probability distributions (instead of random variables)? • But still more…. • Why use inverse probability distributions (instead of random variables)? • When not to use a spreadsheet for simulation? • Tools: – Macros to set up a simulation – VBA functions for common simulation distributions

Inverse Probability Function • Most systems directly generate random variables with the desired distribution Inverse Probability Function • Most systems directly generate random variables with the desired distribution • Why use Inverse Probability Functions? – Which are (probably) slower? Personal opinion • Testing & Debugging • Verification Calculates correctly • Validation Calculations answer Problem • Sensitivity Input vs Output variability

As Mapping function U F-1 Probability Distribution: F(x): → [0, 1] Random Uniform: UÎ As Mapping function U F-1 Probability Distribution: F(x): → [0, 1] Random Uniform: UÎ (0, 1] Inverse PDF: X = F-1(U) For Continuous (or monotone) F-1 Small changes in u∈U small changes in F-1 (u)

Mapping 2 Random Uniform Var As input to Deterministic Function Mapping 2 Random Uniform Var As input to Deterministic Function

Mapping Random numbers in (should) Map to outputs in Mapping Random numbers in (should) Map to outputs in

Example #1 Simple model, function of 2 RV A Max value looks high. Is Example #1 Simple model, function of 2 RV A Max value looks high. Is it a bug? If not, how often? Saved random U[0, 1] For each iteration Check u∈U[0, 1] That generated high value u=0. 983… random high Rarely happens Saving {Ui}: • Verify • Replicate • Quantify

Example #1 (Sensitivity) Sort by U 1, U 2 Sensitive to U 1 Insensitive Example #1 (Sensitivity) Sort by U 1, U 2 Sensitive to U 1 Insensitive to U 2

Spreadsheet limitations • Only simple data structures are available – Rows & columns, no Spreadsheet limitations • Only simple data structures are available – Rows & columns, no lists & trees – Discrete event simulations • Complex algorithms: difficult – Eg, While or for loops – Can improvise (cumbersome, slow, buggy) • Speed: slow • Data Storage: what-you-see-is-all-you-get

Tools: Excel Simulation Template • Adds some missing random functions • Adds some set-up Tools: Excel Simulation Template • Adds some missing random functions • Adds some set-up macros

Macro Simulate. Sampler To start a new simulation when you don't remember the names Macro Simulate. Sampler To start a new simulation when you don't remember the names & parameters of common random variables used in simulation: • Run the Macro Simulation. Sample • Copy, delete, and edit as needed. • Make sure all random values are referenced in the first row of the data table at the bottom.

Macro Simulation. Sampler • Creates a simulation with each of common simulation functions Macro Simulation. Sampler • Creates a simulation with each of common simulation functions

Macro Simulation. Sampler ……… • Sets up header row for data table • Sets Macro Simulation. Sampler ……… • Sets up header row for data table • Sets up a place for statistics

Macro Simulate • Highlight the row of random variables – (1 st row of Macro Simulate • Highlight the row of random variables – (1 st row of simulation table) • Run macro "Simulate” – Prompts for which will ask for the number of simulation iterations, – The default number of iterations is 100 – Debug & develop (manually recalculate) – Final run with >1000 iterations – Visual Basic code is computationally intensive,

Macro Simulate Macro Simulate

Note bene • Run Simulate right after Simulation. Sampler – Risk of “Ref!” error Note bene • Run Simulate right after Simulation. Sampler – Risk of “Ref!” error • Sim. Template, Plot, Sampler contains – The sampler – A distribution plot of all random variable • Crude, but handy for quick comparisons – Ready to edit

Sim. Template, Plot, Sampler • Crude Distribution plot of ALL variables • Uses Percentile Sim. Template, Plot, Sampler • Crude Distribution plot of ALL variables • Uses Percentile Ranks to save space • Good for Continuous Var. • Bad for Discrete Var. 1. Copy 2. Delete unwanted variables 3. Make it pretty

Excel Random Variables Rand() --Random Uniform [0, 1] Norm. SInv() – Inverse Standard Normal Excel Random Variables Rand() --Random Uniform [0, 1] Norm. SInv() – Inverse Standard Normal Distribution Critical. Binomial() – Inverse Binomial Distribution Log. Norm. Inv() - Inverse Log Normal Distribution Caveat: parameters are mean, SD after the Log transformation

Erlang Distribution How long do you wait until you get a predetermined number of Erlang Distribution How long do you wait until you get a predetermined number of arrivals? • Interarrival times are distributed IID exponential • Erlang is Gamma with integer parameter

Beta Distribution Can use as • Distribution of a Binomial probability • Range = Beta Distribution Can use as • Distribution of a Binomial probability • Range = [0, 1] • Generic bounded hump (vs Normal as generic unbounded hump)

Example#2, Problem Client: “Here’s our plan…. ” • Simple spreadsheet calculation – But only Example#2, Problem Client: “Here’s our plan…. ” • Simple spreadsheet calculation – But only the expected value, – but not variability

Example #2, Simulation • Time to 100 th patient • Patients arrive IID Exponential Example #2, Simulation • Time to 100 th patient • Patients arrive IID Exponential Summary Statistics of Simulated values (below) Interpretation: under the assumptions, 90% of simulations required more than 4. 4 months

Added VBA Functions Inverse Functions Needed for Simulation • Poisson, Negative Binomial Interpolation from Added VBA Functions Inverse Functions Needed for Simulation • Poisson, Negative Binomial Interpolation from Table • Interpolate: 1 or 2 dimensional interpolation Convenience • Beta with Mean, SD as parameters • Beta with Hi, Low, and Mode used for parameters (often used for PERT/CPM charts) • Log Normal with mean, SD as parameters

Missing Statistical Functions • Inv. Poisson : : Poisson Distribution • Inv. Pascal : Missing Statistical Functions • Inv. Poisson : : Poisson Distribution • Inv. Pascal : : Integer valued Negative Binomial – (how many failures before k successes) Negative Binomial is continuous valued distribution; discrete version is often denoted Pascal distribution

Example#3, Patients to Screen Expected Enrollment rate = 75% ± 5% ~ Beta Distribution Example#3, Patients to Screen Expected Enrollment rate = 75% ± 5% ~ Beta Distribution # Screen Failures ~ Negative Binomial (Pascal) – Depends on Enrollment Rate

Beta Distribution (2) For Convenience • Beta distribution given Mean, SD, upper, lower bounds Beta Distribution (2) For Convenience • Beta distribution given Mean, SD, upper, lower bounds • Beta distribution given Mode, Upper, Lower bounds – Sometimes used for PERT/Critical Path Analysis • 3 estimates for tasks: Optimistic, Pessimistic, Most Likely • Beta distributed time for each task – Assumes SD = 1/6 of the interval [low, high]

Simulation from a Table Simulate arbitrary distribution: • Top Row: values in [0, 1] Simulation from a Table Simulate arbitrary distribution: • Top Row: values in [0, 1] • Bottom Row: Quantiles • Result: interpolated value of U from table Or a function: y=f(x) • X is found in top row, y is interpolated from bottom row

Table Simulation Uses • Polygonal distributions (like Triangular) • Survival curve (for time to Table Simulation Uses • Polygonal distributions (like Triangular) • Survival curve (for time to event) –Est. K-M curve from data, simulate rest of trial • Arbitrary empirical distributions • Distribution from observations

Simulation from a 2 -dimensional table Here: • Rows are quartiles of a random Simulation from a 2 -dimensional table Here: • Rows are quartiles of a random function • Left column is value of a parameter • A family of distributions which vary with the parameter • Parameter y=75% (can be random) • Generate random numbers from the interpolated distribution.

Example #4: Interim Review • After 2 months, review randomization rates • Continue to Example #4: Interim Review • After 2 months, review randomization rates • Continue to Randomize to 100 patients • How long?

Example#4: Interim Review (Simulation) Y= # Patients at 2 mos ~ Poisson Time to Example#4: Interim Review (Simulation) Y= # Patients at 2 mos ~ Poisson Time to Randomize (100 -Y) additional pts ~ Erlang (Gamma) 80% CI: ; (2. 5, 3. 7) months

Clinical Trials Applications • Simulations for planning • Prototyping larger simulation • Checking assumptions/validation Clinical Trials Applications • Simulations for planning • Prototyping larger simulation • Checking assumptions/validation

Why Simulate? Expected Trial Performance • Usually not of interest -- already done w/o Why Simulate? Expected Trial Performance • Usually not of interest -- already done w/o simulation Variability of Trial Performance • Important for Risk Management: “What’s the earliest, the latest, the most, the least, etc” • 80% CIs Structural Problems • Interactions of parameters may doom the trial before it even starts! (eg, mean (max{ X, Y} ) vs max{ mean(X), mean(Y) } )

Prototyping: • Toy simulation with hands-on teamwork • Development model • Get team buy-in Prototyping: • Toy simulation with hands-on teamwork • Development model • Get team buy-in on assumptions • Processing speed not important • Rapid modifications are important Ideal? • Develop a prototype in an 1 hour meeting • Check for errors later • Run large simulations later for precise estimates

Checking planning assumptions • H 0 = Simulation assumptions • Observed: a value X Checking planning assumptions • H 0 = Simulation assumptions • Observed: a value X • {xi} = corresponding values in simulation • Rank of X in {xi} ≈ p-value Stored Values: Use Function Percent Rank Descriptive Statistics: Use Frequency Count Use to: • Test assumptions, validate model, +? ? • If an observed value of X is rare in the simulation, question assumptions!

Checking Assumptions (2) Example: • A trial is designed based on a non-trivial simulation. Checking Assumptions (2) Example: • A trial is designed based on a non-trivial simulation. • The model predicts a completion rate of 65% with 95% C. I. = (55%, 75%) • 4 months into the trial, a 50% completion rate is observed. • How significant is this discrepancy? Resimulate: • {xi} = simulated completion rates (1/iteration) • Rank of observed 50% in {xi} ≈ p-value • “How likely is the observation, under the modeled assumptions? ”

Example #5: Simulating a 30 patient trial • Each patient is a random variable Example #5: Simulating a 30 patient trial • Each patient is a random variable • Survival times are interpolated • Estimated survival curves have confidence intervals • All 30 patients in an iteration use the same random conf. level • Conf. Level is updated each iteration

Example #5: Testing Assumptions • Assume the trial was carried out 70% of patients Example #5: Testing Assumptions • Assume the trial was carried out 70% of patients complete Q: is this consistent with the simulations? A: Yes, but… Only 6. 1% of simulations had >70% completion Statistics on the patients are the simulation random • variables

Macro Management VBA Editor: Alt-F 11 (or find the menu) • Some versions of Macro Management VBA Editor: Alt-F 11 (or find the menu) • Some versions of Excel • Copy Module between sheets • Copy code from. xls sheet & insert into VBA editor • Open & save as new sheet

Macro Management (newer) Later versions: In Visual Basic From the Tool Bar • File Macro Management (newer) Later versions: In Visual Basic From the Tool Bar • File > Export File – Export VBA code (module: “Sweitzer. Simulation. Core. Code”) • File > Import File – Imports VBA code (into a module)

Further resources Commercial and Free software packages Provide: • More rigorous algorithms • More Further resources Commercial and Free software packages Provide: • More rigorous algorithms • More functions – Resampling, multivariate, etc • More support

Commercial Add-Ins @RISK www. palisade. com Crystal Ball www. decisioneering. com Commercial Add-Ins @RISK www. palisade. com Crystal Ball www. decisioneering. com

Free Add-Ins Pop. Tools www. cse. csiro. au/poptools Sim. Tools. xla http: //home. uchicago. Free Add-Ins Pop. Tools www. cse. csiro. au/poptools Sim. Tools. xla http: //home. uchicago. edu/~rmyerson/addins. htm Caveat: Licensing • Free for non-commercial (eg, education) • Not clear for other uses

Semi-Commercial Low-cost Excel simulation add-in: • Risk. Sim by Michael Middleton www. treeplan. com Semi-Commercial Low-cost Excel simulation add-in: • Risk. Sim by Michael Middleton www. treeplan. com –Currently 1 output var with lots of graphs – also: decision trees, sensitivity analysis – on-line text-book: http: //www. treeplan. com/chapters. htm

Additional Reading INTRODUCTION TO MODELING AND GENERATING PROBABILISTIC INPUT PROCESSES FOR SIMULATION www. informs-sim. Additional Reading INTRODUCTION TO MODELING AND GENERATING PROBABILISTIC INPUT PROCESSES FOR SIMULATION www. informs-sim. org/wsc 07 papers/008. pdf Spreadsheet Simulation (Seila, 2006) www. informs-sim. org/wsc 06 papers/002. pdf Work Smarter, Not Harder: Guidelines for Designing Simulation Experiments www. informs-sim. org/wsc 06 papers/005. pdf Tips for the Successful Practice of Simulation www. informs-sim. org/wsc 06 papers/007. pdf

The End (Actual – not simulated) The End (Actual – not simulated)