
- Number of slides: 48
TRIM Workshop Arco van Strien Wildlife statistics Statistics Netherlands (CBS)
What is TRIM? • TRends and Indices for Monitoring data • Computer program for the analysis of time series of count data with missing observations • Loglinear, Poisson regression (GLM) • Made for the production of wildlife statistics by Statistics Netherlands (Jeroen Pannekoek / freeware / version 3.0) Introduction
Why TRIM? • To get better indices? No, GLM in statistical packages (Splus, Genstat...) may produce similar results • But statistical packages are often impractical for large datasets • TRIM is easier to use Introduction
The program of this workshop Aim: a basic understanding of TRIM • basic theory of imputation • how to use TRIM to impute missing counts and to assess indices etc. • basic theory of weighting procedure to cope with unequal sampling of areas & how to use TRIM to weight particular sites Introduction
INDEX: the total (= sum of all sites) for a year divided by the total of the base year Introduction
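The definition above can be sketched in a few lines of Python. This is an illustration only, not TRIM code: the function name and the counts are made up, the table is assumed to be complete (no missing counts), and scaling the base year to 100 is an assumed convention.

```python
def indices(counts, base_year=0):
    """Return one index per year from a complete sites x years table.

    counts[site][year] holds the count for that site and year.
    The base year is scaled to 100 (an assumed convention).
    """
    n_years = len(counts[0])
    # Year total = sum over all sites for that year.
    totals = [sum(site[y] for site in counts) for y in range(n_years)]
    return [100.0 * t / totals[base_year] for t in totals]


# Hypothetical counts: two sites, two years, both sites double.
counts = [[2, 4],
          [1, 2]]
year_indices = indices(counts)
print(year_indices)  # [100.0, 200.0]
```

If a cell were missing, the year total would be too low and the index would be distorted, which is why imputation is needed first.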
Missing values affect indices Theory imputation
How to impute missing values? 2 6 200 ESTIMATION OF SITE 2 IN YEAR 2? SITE 1 SUGGESTS: TWICE THE NUMBER OF YEAR 1 (site & year effect taken into account) Theory imputation
Another example.. 6 8 200 ESTIMATION OF SITE 2 IN YEAR 2? SITE 1 SUGGESTS: TWICE THE NUMBER OF YEAR 1 Theory imputation
And another example... 9 12 300 ESTIMATION OF SITE 2 IN YEAR 2? SITE 1 SUGGESTS: THREE TIMES AS MANY AS IN YEAR 1 Theory imputation
Try this one… THERE IS NOT A SINGLE SOLUTION (TRIM will prompt an ERROR) Theory imputation
Difficult to guess the missing values here... Theory imputation
Estimating missing values by an iterative procedure (REQUIRED IN CASE OF MORE THAN A FEW MISSING VALUES) Theory imputation
First estimate of site 2, year 2: row total × column total / grand total = 1 × 4/7 = 0.6 >> the margin totals become 1.6, 4.6 and 7.6. RECALCULATE THE MARGIN TOTALS AND REPEAT THE ESTIMATION OF THE MISSING VALUE Theory imputation
2nd estimate of site 2, year 2: 1.6 × 4.6/7.6 = 0.96 REPEAT AGAIN: MISSING VALUE = 1.22, 1.40, 1.54 ETC. … >> 2 Theory imputation
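The iterative procedure can be sketched in Python as follows. This is a minimal illustration of the margin-total method described on these slides, not the (much faster) algorithm that TRIM itself uses. The function name is hypothetical, and the example table (site 1: 2 and 4; site 2: 1 and a missing count) is inferred from the arithmetic on the slides. Because the code does not round between steps, its intermediate estimates differ slightly from the rounded slide values, but the limit is the same.

```python
def impute_iteratively(counts, max_iter=1000, tol=1e-9):
    """Fill missing cells (None) of a sites x years table.

    Each missing cell is repeatedly re-estimated as
    row total * column total / grand total, with the margin totals
    recalculated after every round, until the estimates stop changing.
    """
    table = [[0.0 if c is None else float(c) for c in row] for row in counts]
    missing = [(i, j) for i, row in enumerate(counts)
               for j, c in enumerate(row) if c is None]
    if not missing:
        return table
    for _ in range(max_iter):
        row_tot = [sum(row) for row in table]
        col_tot = [sum(col) for col in zip(*table)]
        grand = sum(row_tot)
        new = {(i, j): row_tot[i] * col_tot[j] / grand for i, j in missing}
        change = max(abs(v - table[i][j]) for (i, j), v in new.items())
        for (i, j), v in new.items():
            table[i][j] = v
        if change < tol:
            break
    return table


# Table inferred from the slide arithmetic; None marks the missing count.
counts = [[2, 4],
          [1, None]]
filled = impute_iteratively(counts)
print(round(filled[1][1], 3))  # 2.0
```

The result agrees with the simple reasoning of the earlier example: site 1 doubles from year 1 to year 2, so site 2 is estimated at twice its year 1 count.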
• To get proper indices, it is necessary to estimate (impute) missing values • Missing values may be estimated from the margin totals using an iterative procedure (taking into account both site and year effects). Note: TRIM uses a much faster algorithm to impute missing values • Assumption: year-to-year changes are similar for all sites (this assumption will be relaxed later!) • Test this assumption using a goodness-of-fit test (X² test) Theory imputation
X²: COMPARE EXPECTED COUNTS WITH REAL COUNTS PER CELL. EXPECTED VALUES: (1.8) (1.2) (4.2) (2.8). X² IS THE SUMMATION OF (COUNTED - EXPECTED VALUE)² / EXPECTED VALUE: (2 - 1.8)²/1.8 + (4 - 4.2)²/4.2 ETC. >> X² = 0.08 WITH A P-VALUE OF 0.78 >> MODEL NOT REJECTED (FITS, but note: cell values in this example are too small for a proper X² test) Theory imputation
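The calculation on this slide can be reproduced with a short Python sketch. The observed table used here (site 1: 2 and 4; site 2: 1 and 3) is inferred from the expected values on the slide and is therefore an assumption, as is the function name. The p-value uses the closed form of the chi-square distribution with 1 degree of freedom, which applies to a 2 × 2 table only.

```python
import math


def pearson_chi2(observed):
    """Pearson X2 for a complete two-way table of counts.

    Expected count per cell = row total * column total / grand total.
    """
    row_tot = [sum(row) for row in observed]
    col_tot = [sum(col) for col in zip(*observed)]
    grand = sum(row_tot)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_tot[i] * col_tot[j] / grand
            chi2 += (obs - expected) ** 2 / expected
    return chi2


# Observed counts inferred from the expected values 1.8, 1.2, 4.2, 2.8.
observed = [[2, 4],
            [1, 3]]
chi2 = pearson_chi2(observed)
# Upper-tail probability of a chi-square variable with 1 degree of freedom.
p_value = math.erfc(math.sqrt(chi2 / 2))
print(round(chi2, 2), round(p_value, 2))  # 0.08 0.78
```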
Imputation without covariate (X² = 18 and p-value = 0.18) Theory imputation
Using a covariate: better imputations & indices, X² = 1.7, p = 0.99 Theory imputation
What is the best model? <<< rejected < not rejected Both model 2 and 3 are valid Theory imputation
Summary imputation theory • To get proper indices, it is necessary to impute missing values • Assumption: year-to-year changes are similar for all sites of the same covariate category • Test the assumption using a GOF test; if p-value < 0.05, try better covariates • If these cannot be found, the resulting indices may be of low quality (and standard errors high). See also the FAQs! Theory imputation
The program of this workshop Aim: a basic understanding of TRIM • basic theory of imputation • how to use TRIM to impute missing counts and to assess indices etc. • basic theory of weighting procedure to cope with unequal sampling of areas & how to use TRIM to weigh particular sites Using TRIM
Using TRIM • several statistical models (time effects, linear model) • statistical complications (overdispersion, serial correlation) taken into account • Wald tests to test significance • model versus imputed indices • interpretation of slope Using TRIM
Time effects model (skylark data) without covariate Using TRIM
Time effects model with covariate (0 = total, 1 = dunes, 2 = heathland) Using TRIM
Linear trend model (uses trend estimate to impute missing values) Using TRIM
Linear trend model with a changepoint at year 2 Using TRIM
Linear trend model with changepoints at year 2 and 3 Using TRIM
Linear trend model with all changepoints = time effects model. Use the linear trend model when: • data are too sparse for the time effects model • one is interested in testing trends, e.g. trends before and after a particular year (or let TRIM search stepwise for relevant changepoints) But be careful with simple linear models! Using TRIM
Statistical complications: • Serial correlation: dependence of counts on those of earlier years (0 = no correlation) • Overdispersion: deviation from the Poisson distribution (1 = Poisson) Run TRIM with overdispersion = on and serial correlation = on, else standard errors and statistical tests are usually invalid Using TRIM
Running TRIM features • TRIM command file • output: GOF (X²) test and Wald tests • output (fitted values, indices) • indices, time totals • overall trend slope • Frequently Asked Questions • different models (linear trend model, changepoints, covariate) Using TRIM
What is the best model? Both 2 and 3 are valid. Model 3 is the sparsest (most parsimonious) model. Using TRIM
Model choice • The indices depend on the statistical model! • TRIM allows one to search for the best model using the GOF test, Akaike's Information Criterion and Wald tests • In case of substantial overdispersion, one has to rely on the Wald tests Using TRIM
Wald tests Different Wald tests to test for the significance of: • the trend slope parameters • changes in the slope • deviations from a linear trend • the effect of each covariate Using TRIM
TRIM generates both model indices and imputed indices Using TRIM
Imputed vs model indices Imputed indices: summation of real counts plus, for missing counts, model predictions. Closer to real counts (more realistic course in time) Model indices: summation of model predictions of all sites. Often more stable Usually model and imputed indices hardly differ! Using TRIM
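The difference between the two kinds of totals can be illustrated with a small Python sketch. All numbers are made up, the "model predictions" are hypothetical values rather than TRIM output, and the base year is scaled to 100 as an assumed convention.

```python
def year_totals(table):
    """Sum a sites x years table over sites, giving one total per year."""
    return [sum(col) for col in zip(*table)]


def to_indices(totals, base_year=0):
    """Scale year totals so that the base year equals 100."""
    return [round(100.0 * t / totals[base_year], 1) for t in totals]


# Hypothetical observed counts (None = missing) and model predictions.
observed = [[3, 4, 6],
            [2, None, 5]]
predicted = [[2.8, 4.3, 5.9],
             [2.2, 3.3, 4.6]]

# Imputed table: real counts where available, predictions where missing.
imputed = [[p if o is None else o for o, p in zip(obs_row, pred_row)]
           for obs_row, pred_row in zip(observed, predicted)]

imputed_indices = to_indices(year_totals(imputed))
model_indices = to_indices(year_totals(predicted))
print(imputed_indices)  # [100.0, 146.0, 220.0]
print(model_indices)    # [100.0, 152.0, 210.0]
```

Only the missing cell is replaced in the imputed table, so the imputed indices stay closer to the real counts, while the model indices follow the fitted values for every cell.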
TRIM computes both additive and multiplicative slopes. Additive slope + s.e.: 0.0485, 0.0124. Multiplicative slope + s.e.: 1.0497, 0.0130. Relation: ln(1.0497) = 0.0485. Multiplicative parameters are easier to understand Using TRIM
Interpretation multiplicative slope: a slope of 1.05 means a 5% increase a year. A standard error of 0.013 means a confidence interval of ± 2 × 0.013 = ± 0.026. Thus, the slope lies between 1.024 and 1.076, or a 2% to 8% increase a year = significantly different from 1 Using TRIM
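The relation between the two slopes and the confidence interval can be checked in Python. The conversion of the standard error (multiplicative slope × additive standard error, a delta-method approximation) is an assumption made for this sketch; it happens to reproduce the value on the slide, but the slides do not state how TRIM computes it. The interval uses the same "plus or minus two standard errors" rule as the slide.

```python
import math

# Additive slope and its standard error, as shown on the slide.
additive_slope = 0.0485
additive_se = 0.0124

# The multiplicative slope is the exponential of the additive slope.
multiplicative_slope = math.exp(additive_slope)
# Assumed delta-method approximation for the standard error.
multiplicative_se = multiplicative_slope * additive_se

# Approximate 95% confidence interval: slope +/- 2 standard errors.
lower = multiplicative_slope - 2 * multiplicative_se
upper = multiplicative_slope + 2 * multiplicative_se
# The trend is significant if the interval excludes 1 (no change).
significant = lower > 1.0 or upper < 1.0

print(f"{multiplicative_slope:.4f} {multiplicative_se:.4f}")  # 1.0497 0.0130
print(f"{lower:.3f} {upper:.3f} {significant}")  # 1.024 1.076 True
```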
Summary use of TRIM: • choice between time effects and linear trend model • include overdispersion & serial correlation in models • use GOF and Wald tests for better models and indices & to test hypotheses • choice between model and imputed indices • use multiplicative slope Using TRIM
The program of this workshop Aim: a basic understanding of TRIM • basic theory of imputation • how to use TRIM to impute missing counts and to assess indices etc. • basic theory of weighting procedure to cope with unequal sampling of areas & how to use TRIM to weight particular sites Weighting
Unequal sampling due to • stratified random site selection, with oversampling of particular strata. Weighting results in unbiased national indices • site selection by the free choice of observers, with oversampling of particular regions & attractive habitat types. Weighting reduces the bias of indices. Weighting
To cope with unequal sampling: • stratify the data, e.g. into regions and habitat types • strata are expected to have different indices & trends • weigh strata according to (1) the number of sample sites in the stratum and (2) the area surface of the stratum • or weigh by population size per stratum Weighting
Weighting factor for each stratum or 10 or 5 Weighting factor for stratum i = total area of i / area of i sampled Weighting
Another example.. 100/5 = 20 (or 4), 50/10 = 5 (or 1). Weighting factor for stratum i = total area of i / area of i sampled Weighting
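The weighting factors of this example can be computed with a short Python sketch. The stratum names and the function name are hypothetical; the areas are the ones on the slide. Dividing by the smallest factor gives the relative weights shown in brackets on the slide.

```python
def weight_factors(total_area, sampled_area):
    """Weighting factor per stratum = total area / area sampled."""
    return {s: total_area[s] / sampled_area[s] for s in total_area}


# Areas from the slide; the stratum names are made up.
total_area = {"stratum A": 100, "stratum B": 50}
sampled_area = {"stratum A": 5, "stratum B": 10}

weights = weight_factors(total_area, sampled_area)
# Only the ratio between the weights matters, so they may be rescaled.
smallest = min(weights.values())
relative = {s: w / smallest for s, w in weights.items()}

print(weights)   # {'stratum A': 20.0, 'stratum B': 5.0}
print(relative)  # {'stratum A': 4.0, 'stratum B': 1.0}
```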
Weighting in TRIM • include weight factor (different per stratum) in data file for each site and year record • weight strata and combine the results to produce a weighted total (= run TRIM with weighting = on and covariate = on) Weighting
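The effect of weighting on the total index can be illustrated as follows. This Python sketch is not TRIM code: the counts are made up, the tables are complete (no imputation involved), and the weights of 10 for dune sites and 1 for heathland sites are illustrative assumptions.

```python
def weighted_year_totals(sites):
    """Year totals in which each site's count is multiplied by its weight."""
    n_years = len(sites[0]["counts"])
    return [sum(s["weight"] * s["counts"][y] for s in sites)
            for y in range(n_years)]


def to_indices(totals, base_year=0):
    """Scale year totals so that the base year equals 100."""
    return [round(100.0 * t / totals[base_year], 1) for t in totals]


# Hypothetical sites: the dune site doubles, the heathland site is stable.
sites = [
    {"stratum": "dunes", "weight": 10, "counts": [2, 4]},
    {"stratum": "heathland", "weight": 1, "counts": [10, 10]},
]

# Setting every weight to 1 gives the unweighted totals.
unweighted = weighted_year_totals([dict(s, weight=1) for s in sites])
weighted = weighted_year_totals(sites)

print(to_indices(unweighted))  # [100.0, 116.7]
print(to_indices(weighted))    # [100.0, 166.7]
```

With the higher weight for the dune site, the total index moves towards the trend in the dunes, which is the purpose of weighting an undersampled stratum.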
Indices for Skylark unweighted (0 = total index, 1 = dunes, 2 = heathland) Weighting
Indices for Skylark with weight factor for each dune site = 10 (0 = total index, 1 = dunes, 2 = heathland) Weighting
Final remarks To facilitate the calculation of many indices on a routine basis • TRIM in batch mode, using TRIM Command Language (see manual) • Option to incorporate TRIM in your own automation system (Access or Delphi or so) (not in manual)
That’s all, but: • if you have any questions about TRIM, see the manual, the FAQs in TRIM or mail Arco van Strien (asin@cbs.nl). Good luck!