pmcalibration
The code below has been
expanded into the pminternal
package - see https://stephenrho.github.io/pminternal/
pmcalibration
is focused on the external
validation of existing models on new data. Nevertheless, it is
possible to use the functions provided to perform internal
validation of a model developed on a particular data set. This
vignette focuses on internal validation via the calculation of
optimism through bootstrap resampling (see References).
First we simulate some data with two ‘true’ predictors
(x1
, x2
) that interact to predict the outcome
and we add two ‘noise’ variables (x3
, x4
)
which do not relate to the outcome. The model we consider
(m1
) is one with all 4 variables and their two way
interactions.1
library(pmcalibration)
library(data.table)
set.seed(1234)
n <- 1000
dat <- sim_dat(n, a1 = -3, a3 = .3)
# add some noise variables
dat$x3 <- rnorm(n)
dat$x4 <- rnorm(n)
mean(dat$y)
#> [1] 0.107
m1 <- glm(y ~ (x1 + x2 + x3 + x4)^2, data = dat,
family = binomial(link = "logit"))
p1 <- predict(m1, type="response")
We can use pmcalibration
to plot a calibration curve for
this model but glms are always perfectly calibrated for the data they
were developed in so this is of zero use.
(cc_app <- pmcalibration(y = dat$y, p = p1, smooth = "gam", bs="tp", ci="none"))
#> Calibration metrics based on a calibration curve estimated for a binary outcome via a generalized additive model (see ?mgcv::s) using logit transformed predicted probabilities.
#>
#> Estimate
#> Eavg 1.4e-06
#> E50 7.9e-07
#> E90 4.2e-06
#> Emax 1.4e-05
#> ECI 6.2e-10
plot(cc_app)
Internal validation can provide an unbiased assessment of the performance of a model building strategy.
The bootstrap optimism approach to internal validation of some performance metric (M) proceeds as follows:
Below is a function to implement this bootstrap optimism for calibration metrics. It requires two other functions be specified. One function that implements the entire model development process and returns the fitted model object. And a second function that takes the fitted model object and produces predicted probabilities for a given data set. The functions below work for our all-two-way-interactions model.
mod <- function(data){
# function that implements the model development procedure
glm(y ~ (x1 + x2 + x3 + x4)^2, data = data,
family = binomial(link = "logit"))
}
pred <- function(model, data){
# function to get predictions
predict.glm(object = model, newdata = data, type="response")
}
This function implements the steps described above and
should work for any sensible combination of
model_fun
and pred_fun
. It hasn’t been widely
tested though so should be used with care! y
is the name of
the outcome column in data
and B
is the number
of bootstrap replicates. This example uses the gam
smooth
in pmcalibration
though this could easily be changed.
internal_cal <- function(model_fun, data, pred_fun, y="y", B=100){
# assess apparent performance
appmodel <- model_fun(data)
p <- pred_fun(appmodel, data)
pp <- seq(min(p), max(p), length.out=100) # for plotting
apparent <- pmcalibration(y = data[[y]], p = p, smooth = "gam", ci = "none", neval = pp)
# one bootstrap resample
one_boot <- function(model, data, pred_fun){
d <- data[sample(nrow(data), replace = T), ]
# bootmodel <- eval(modelcall) # fit model on resampled data
bootmodel <- model_fun(d)
p_boot <- pred_fun(bootmodel, d) # evaluate on 'training' data (bootstrap resample)
p_orig <- pred_fun(bootmodel, data) # evaluate on 'test' (original data)
cc_boot <- pmcalibration(y = d[[y]], p = p_boot, smooth = "gam", ci = "none", neval = pp)
cc_orig <- pmcalibration(y = data[[y]], p = p_orig, smooth = "gam", ci = "none", neval = pp)
# calculate optimism and return
metrics_optimism <- cc_boot$metrics - cc_orig$metrics
plot_optimism <- cc_boot$plot$p_c_plot - cc_orig$plot$p_c_plot
return(list(metrics_optimism, plot_optimism))
}
# run B bootstrap replicates (could be sped up via, e.g., pbapply)
opt <- lapply(seq(B), function(i){
one_boot(model = model, data = data, pred_fun = pred_fun)
})
# calculate average optimism for metrics and plot (to make bias corrected curve)
metrics_opt <- rbindlist(lapply(opt, function(x) data.frame(t(x[[1]]))))
metrics_opt <- apply(metrics_opt, 2, mean)
plot_opt <- rbindlist(lapply(opt, function(x) data.frame(t(x[[2]]))))
plot_opt <- apply(plot_opt, 2, mean)
# return
out <- list(
metrics = data.frame(apparent = apparent$metrics,
optimism = metrics_opt,
bias_corrected = apparent$metrics - metrics_opt
),
plot = data.frame(p = pp,
apparent = apparent$plot$p_c_plot,
optimism = plot_opt,
bias_corrected = apparent$plot$p_c_plot - plot_opt
)
)
return(out)
}
The code below runs internal validation on calibration metrics for the two way interaction model.
iv <- internal_cal(model_fun = mod, data = dat, pred_fun = pred)
iv$metrics
#> apparent optimism bias_corrected
#> Eavg 1.410450e-06 0.000461544 -0.0004601335
#> E50 7.883345e-07 -0.001027228 0.0010280167
#> E90 4.246544e-06 0.006225747 -0.0062215007
#> Emax 1.415855e-05 0.020638791 -0.0206246329
#> ECI 6.195421e-10 0.034309109 -0.0343091083
plot(iv$plot$p, iv$plot$apparent, type="l", xlim=c(0,1), ylim=c(0,1),
xlab="Predicted", ylab="Observed")
lines(iv$plot$p, iv$plot$bias_corrected, lty=2)
legend("topleft", lty=c(1,2), legend = c("Apparent", "Bias Corrected"))
Suppose we consider another model building strategy in which we use
backwards stepwise selection of variables via AIC. This can be assessed
by simply changing the model_fun
argument (the
pred_fun
works with the returned object so doesn’t need to
be changed).
mod2 <- function(data){
# function that implements the model development procedure
m <- glm(y ~ (x1 + x2 + x3 + x4)^2, data = data,
family = binomial(link = "logit"))
step(m, direction = "backward", trace = 0)
}
iv2 <- internal_cal(model_fun = mod2, data = dat, pred_fun = pred)
iv2$metrics
#> apparent optimism bias_corrected
#> Eavg 6.466123e-07 -0.001544511 0.001545157
#> E50 3.409134e-07 -0.001428112 0.001428453
#> E90 1.785011e-06 -0.001973877 0.001975662
#> Emax 5.952555e-06 0.002651626 -0.002645673
#> ECI 1.192142e-10 0.013766006 -0.013766006
plot(iv2$plot$p, iv2$plot$apparent, type="l", xlim=c(0,1), ylim=c(0,1),
xlab="Predicted", ylab="Observed")
lines(iv2$plot$p, iv2$plot$bias_corrected, lty=2)
legend("topleft", lty=c(1,2), legend = c("Apparent", "Bias Corrected"))
Steyerberg, E. W., Bleeker, S. E., Moll, H. A., Grobbee, D. E., & Moons, K. G. (2003). Internal and external validation of predictive models: a simulation study of bias and precision in small samples. Journal of clinical epidemiology, 56(5), 441-447. https://doi.org/10.1016/s0895-4356(03)00047-7
Harrell Jr, F. E., Lee, K. L., & Mark, D. B. (1996). Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Statistics in medicine, 15(4), 361-387.
Note that this is not supposed to be an example of good model building! See the TRIPOD statement for guidance.↩︎