Computer Intensive Statistical Methods
Validation, Model Selection and Bootstrap
Chapman and Hall, 1994
From the preface:
Much statistical work and data analysis is now
done by computers in ways that are too complicated
for realistic analytical treatment.
Automatic model selection, for example, poses
new statistical questions which have been around
for some time, but have only recently found
working solutions. The statistical properties of
the results from these and other extensive
computations may well be different from the results
from classical analyses. These problems are present
in most regression and time series modelling and
also in classification, clustering and the like.
The new effects caused by all this computation
can be approached by a further round of
computations as we do in validation and bootstrap
methods. Probably this is the only general way to proceed.
Contents
1 Prelude 1
1.1 Background 1
1.2 About models 3
1.2.1 Notation 5
1.2.2 Some model examples 6
2 Computer intensive philosophy 12
2.1 Sources of error 14
2.2 Model selection uncertainty 15
3 Cross validation 24
3.1 Introduction 24
3.2 Cross validation as estimation method 28
3.3 Selection of variables in multiple regression 30
3.4 A theoretical difficulty and a deeper look at the model selection measure 34
3.5 Validation of model size 38
3.6 Cross Model Validation 40
3.6.1 Flow chart of Cross Model Validation for independent observations 42
3.6.2 Rationalizations 43
3.7 Some alternative selection criteria 45
3.8 A critical look at validation 47
3.9 A meteorological data set 49
3.10 CMV fortran code 50
3.11 Exercises 54
4 Validation of time series problems 57
4.1 Introduction 57
4.1.1 Terminology 58
4.2 Model examples 59
4.2.1 Scalar model, type one 59
4.2.2 Multiplicative effects 61
4.2.3 Scalar model, type two 62
4.2.4 A multivariate model 63
4.3 Forward validation 65
4.3.1 Selection procedure 67
4.3.2 Estimate of performance 67
4.3.3 Weights 69
4.3.4 Forward validation flow chart 71
4.3.5 Computer output example 71
4.4 Confusing information, large and small model sets 74
4.4.1 A multivariate data base 75
4.5 Appendix 77
4.5.1 Examples of loss functions 77
4.6 Forward validation fortran code 79
4.7 Exercises 84
5 Statistical bootstrap 85
5.1 The parameter concept 85
5.2 Classical measures and the bootstrap 88
5.3 The bootstrap method 90
5.3.1 Flow chart of bootstrapped parameter estimation 93
5.4 Numerical illustration in two versions 94
5.4.1 Median estimation by the median 94
5.4.2 Median estimation by the average 96
5.5 Double bootstrap 99
5.5.1 Flow chart of double bootstrap bias correction 100
5.6 Percentile estimation 101
5.7 Confidence intervals 103
5.7.1 Simple intervals 104
5.7.2 Studentized intervals 105
5.8 Statistical small sample properties 109
5.9 Bootstrap as a definer of functions 112
5.10 Example of bootstrap program, fortran code 124
5.11 Bootstrap exercises 127
6 Further bootstrap results 131
6.1 Parametric bootstrap 131
6.2 Basic asymptotic concepts 135
6.2.1 Boundedness, root-n-consistency, and order of convergence 135
6.2.2 Ordo terms 138
6.2.3 A reminder of some classical limit theorems 138
6.3 Convergence of resampling distributions 140
6.3.1 The finite case 140
6.3.2 A continuous continuation 142
6.4 Asymptotic results for averages and percentiles 145
6.4.1 Averages 146
6.4.2 Median and percentile estimation 149
6.5 Edgeworth expansion 151
6.5.1 Empirical Edgeworth expansion 157
6.5.2 Percentile approximation 158
6.6 Confidence interval methods 160
6.6.1 A parametric case with studentized interval 160
6.6.2 Transformation based methods 166
6.6.3 A prepivoting method 175
6.6.4 Asymptotic properties of the prepivoting 180
6.6.5 Loh's level adjustment 183
6.7 Bootstrapping regression models 185
6.7.1 Basic residual resampling 186
6.7.2 Vector resampling 187
6.7.3 Projected residuals 188
6.7.4 Nonlinear regression 190
6.7.5 Abstract residual resampling 190
6.7.6 Varying variance 194
6.7.7 Resampling in generalized linear models 195
6.7.8 Model selection and resampling 200
6.8 Bootstrap realizations of a stationary process 204
6.8.1 Residual resampling 205
6.8.2 Spectral resampling 207
6.9 Exercises 214
7 Computer intensive applications 215
7.1 Validation and bootstrap in road safety analysis 215
7.1.1 The single crossing 215
7.1.2 Several crossings 217
7.1.3 Estimating the uncertainty of a 219
7.2 Forward validation on the stock market 221
7.3 Model selection and validation in meteorology 230
7.3.1 Covariance and linear prediction 230
7.3.2 Validation 232
7.3.3 Illustration of results 233
7.3.4 Validating the need of data 236
7.3.5 Validation of rain probability forecasts 238
7.4 Bootstrapping a cost function 242
7.4.1 The replacement problem 242
7.4.2 Estimation 243
7.4.3 Bootstrap resampling 245
7.4.4 Appendix: A continuous empirical survival function 249
7.5 A backbone bootstrap 251
7.5.1 Resampling procedure 252
References 255
Index 261
Programs in the book and a few more (in Fortran):
Cross Validation, kova.f
CV of Forward Selection, kovafs.f
Forward Validation, forval.f
Adaptive FV version, forvalad.f
Bootstrap example, bo.f