3  General Workflow and Application in Survival Models

Comparison of coxph versus svycoxph after multiple imputation and propensity score matching

In Chapter 3 we present a reproducible example of how to use coxph (survival package1) and svycoxph (survey package2) in combination with multiple imputation by chained equations (mice package3) and propensity score matching using the MatchThem package4.

First, we load the required R packages and some custom functions that are part of the encore.analytics R package, which is being developed to streamline the analysis of all ENCORE trial emulations (non-public package).

library(dplyr)
library(survival)
library(mice)
library(MatchThem)
library(survey)
library(here)
library(gtsummary)
library(parallelly)
library(ranger)
library(furrr)
library(cobalt)
library(gsDesign)
library(encore.analytics)
library(yaml)
library(gt)
library(ggplot2)

source(here("functions", "covariate_vectors.R"))

# track time
runtime <- tictoc::tic()

3.1 Data generation

We use the simulate_data() function to simulate a realistic oncology comparative effectiveness analytic cohort with distributions similar to FLAURA, a randomized controlled trial that evaluated the efficacy and safety of osimertinib versus standard-of-care (SoC) tyrosine kinase inhibitors (TKIs) in advanced NSCLC patients with a sensitizing EGFR mutation.

The following cohort resembles distributions observed in the EHR-derived EDB1 dataset used in ENCORE. Note: the values of some continuous covariates (labs) are displayed after log/log-log transformation.

# load example dataset with missing observations
data_miss <- simulate_data(
  n_total = 3500, 
  seed = 42, 
  include_id = FALSE, 
  imposeNA = TRUE,
  propNA = .33
  )

We can use the create_table1() function from the encore.analytics package to create a summary table of the simulated dataset. The function is a convenient wrapper around the gtsummary package5 and lets us create a table as follows:

# read YAML file
design_data <- read_yaml(here("_params.yml"))

# combine to a table
params <- bind_rows(design_data$params)

# named list
covariates <- params |> 
  dplyr::filter(dimension == "Covariate Assessment Window")

covariate_list <- as.list(setNames(covariates$label, covariates$variable))

data_miss |> 
  create_table1(
    covariates = names(covariate_list),
    covariates_labels = covariate_list
    )
Table 3.1: Summary table of the simulated dataset.
Patient characteristic | Total, N = 3500¹ | Treatment received = 0, N = 1487 (42.5%)¹ | Treatment received = 1, N = 2013 (57.5%)¹ | Difference²
Age [yrs] 69 (64, 74) 69 (64, 74) 69 (64, 74) -0.04
Sex 1,146 (33%) 494 (33%) 652 (32%) 0.02
Smoking history    0.10
    TRUE 846 (24%) 396 (27%) 450 (22%)
    FALSE 1,035 (30%) 419 (28%) 616 (31%)
    Missing 1,619 (46%) 672 (45%) 947 (47%)
Number of unique metastatic sites    -0.05
    1 1,364 (74%) 595 (75%) 769 (73%)
    2 414 (22%) 166 (21%) 248 (23%)
    3 52 (2.8%) 23 (2.9%) 29 (2.7%)
    4 16 (0.9%) 6 (0.8%) 10 (0.9%)
    Missing 1,654 697 957
ECOG Performance Status 1,643 (57%) 742 (60%) 901 (55%) 0.10
    Missing 640 257 383
Stage at initial diagnosis    -0.12
    1 27 (1.3%) 14 (1.6%) 13 (1.1%)
    2 55 (2.7%) 29 (3.3%) 26 (2.3%)
    3 424 (21%) 202 (23%) 222 (19%)
    4 1,506 (75%) 628 (72%) 878 (77%)
    Missing 1,488 614 874
Race    0.04
    Asian 849 (24%) 369 (25%) 480 (24%)
    Other 55 (1.6%) 22 (1.5%) 33 (1.6%)
    White 1,402 (40%) 601 (40%) 801 (40%)
    Missing 1,194 (34%) 495 (33%) 699 (35%)
Geographic region    0.07
    Midwest 359 (10%) 153 (10%) 206 (10%)
    Northeast 518 (15%) 209 (14%) 309 (15%)
    South 947 (27%) 397 (27%) 550 (27%)
    West 673 (19%) 278 (19%) 395 (20%)
    Missing 1,003 (29%) 450 (30%) 553 (27%)
Socioeconomic status    -0.04
    1 270 (13%) 123 (14%) 147 (12%)
    2 297 (14%) 120 (13%) 177 (14%)
    3 398 (18%) 164 (18%) 234 (19%)
    4 529 (25%) 227 (25%) 302 (24%)
    5 664 (31%) 270 (30%) 394 (31%)
    Missing 1,342 583 759
Hemoglobin [g/dL] 12.92 (12.01, 13.79) 12.83 (11.96, 13.75) 12.99 (12.04, 13.83) -0.09
    Missing 404 182 222
Urea nitrogen [mg/dL] 2.71 (2.50, 2.92) 2.72 (2.51, 2.94) 2.70 (2.49, 2.91) 0.07
    Missing 1,218 493 725
Platelets [10^9/L] 260 (221, 297) 261 (223, 298) 260 (220, 296) 0.03
    Missing 1,292 549 743
Calcium [mg/dL] 2.23 (2.20, 2.26) 2.23 (2.20, 2.26) 2.23 (2.20, 2.26) -0.06
    Missing 907 403 504
Glucose [mg/dL] 4.64 (4.57, 4.72) 4.65 (4.57, 4.72) 4.64 (4.56, 4.72) 0.07
    Missing 1,314 538 776
Lymphocyte to leukocyte ratio 2.87 (2.75, 2.99) 2.86 (2.74, 2.98) 2.89 (2.75, 3.00) -0.07
    Missing 1,650 729 921
Alkaline phosphatase [U/L] 4.48 (4.39, 4.56) 4.48 (4.39, 4.56) 4.48 (4.39, 4.57) -0.06
    Missing 592 266 326
Protein [g/L] 67.9 (65.3, 70.7) 67.9 (65.2, 70.6) 68.0 (65.3, 70.7) -0.04
    Missing 914 371 543
Alanine aminotransferase [U/L] 2.89 (2.67, 3.10) 2.90 (2.69, 3.10) 2.88 (2.66, 3.09) 0.08
    Missing 1,658 705 953
Albumin [g/L] 39.9 (37.9, 41.9) 40.0 (37.9, 41.9) 39.9 (37.9, 41.9) 0.02
    Missing 1,718 733 985
Bilirubin [mg/dL] -0.92 (-1.80, -0.07) -0.89 (-1.77, -0.05) -0.94 (-1.83, -0.07) 0.05
    Missing 378 152 226
Chloride [mmol/L] 102.03 (100.02, 104.04) 101.90 (99.91, 103.97) 102.13 (100.11, 104.18) -0.08
    Missing 934 385 549
Monocytes [10^9/L] -0.50 (-0.67, -0.33) -0.52 (-0.68, -0.34) -0.49 (-0.66, -0.32) -0.07
    Missing 1,067 447 620
Eosinophils to leukocytes ratio 0.68 (0.30, 1.08) 0.71 (0.37, 1.11) 0.67 (0.27, 1.06) 0.10
    Missing 1,602 702 900
Lactate dehydrogenase [U/L] 1.69 (1.66, 1.72) 1.69 (1.66, 1.72) 1.69 (1.66, 1.72) -0.03
    Missing 411 176 235
Heart rate [bpm] 4.43 (4.40, 4.46) 4.43 (4.40, 4.46) 4.43 (4.40, 4.45) 0.09
    Missing 1,734 751 983
Systolic blood pressure [mmHg] 4.85 (4.79, 4.91) 4.85 (4.79, 4.91) 4.85 (4.79, 4.91) 0.07
    Missing 1,669 694 975
Oxygen saturation [SpO2] 97.000 (96.994, 97.006) 97.000 (96.993, 97.006) 97.000 (96.994, 97.006) 0.01
    Missing 323 133 190
Neutrophil to lymphocyte ratio 1.29 (1.03, 1.55) 1.27 (1.02, 1.55) 1.29 (1.04, 1.56) -0.05
    Missing 995 408 587
Body mass index [kg/m^2] 3.23 (3.14, 3.32) 3.24 (3.15, 3.32) 3.22 (3.13, 3.32) 0.07
    Missing 802 359 443
Aspartate aminotransferase to alanine aminotransferase ratio 0.11 (-0.09, 0.31) 0.11 (-0.09, 0.31) 0.10 (-0.08, 0.32) -0.04
    Missing 1,605 689 916
Time from diagnosis to index date [days] 44 (32, 55) 44 (32, 56) 43 (31, 55) 0.02
    Missing 891 378 513
De novo metastatic disease at diagnosis    0.06
    TRUE 1,608 (46%) 657 (44%) 951 (47%)
    FALSE 396 (11%) 170 (11%) 226 (11%)
    Missing 1,496 (43%) 660 (44%) 836 (42%)
Height [cm] 1.65 (1.60, 1.70) 1.65 (1.60, 1.70) 1.65 (1.60, 1.70) 0.00
    Missing 1,343 561 782
Weight [kg] 69 (61, 76) 69 (61, 75) 69 (61, 76) -0.02
    Missing 1,458 633 825
Diastolic blood pressure [mmHg] 76.2 (72.0, 80.2) 76.4 (72.2, 80.1) 76.0 (71.8, 80.3) 0.02
    Missing 799 340 459
Year of index date    0.06
    <2018 3,324 (95%) 1,400 (94%) 1,924 (96%)
    2018+ 176 (5.0%) 87 (5.9%) 89 (4.4%)
¹ Median (Q1, Q3); n (%)
² Standardized Mean Difference
Abbreviation: CI = Confidence Interval

3.2 Step 1 - Multiple imputation

The first step after deriving the analytic cohort is to create multiply imputed datasets using the mice R package3.

The mice algorithm is one particular instance of fully conditional specification. The algorithm starts with random draws from the observed data and imputes the incomplete data in a variable-by-variable fashion; one iteration consists of one cycle through all incomplete variables \(Y_j\).

MICE algorithm for imputation of multivariate missing data.
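To make the variable-by-variable idea concrete, here is a toy sketch of a single FCS cycle on two artificial variables. This is illustrative only (the objects Y and miss are made up for this sketch); mice() draws imputations from proper predictive distributions and repeats the cycle over several iterations and imputed datasets.

# toy sketch of one FCS/MICE cycle (not the actual mice internals)
set.seed(1)
Y <- data.frame(y1 = rnorm(100))
Y$y2 <- Y$y1 + rnorm(100)
Y$y1[sample(100, 20)] <- NA
Y$y2[sample(100, 20)] <- NA
miss <- lapply(Y, is.na)                 # remember where the holes are

# starting values: random draws from the observed part of each variable
for (j in names(Y)) {
  Y[[j]][miss[[j]]] <- sample(Y[[j]][!miss[[j]]], sum(miss[[j]]), replace = TRUE)
}

# one iteration = one cycle through all Y_j, re-imputing each from the others
for (j in names(Y)) {
  fit <- lm(reformulate(setdiff(names(Y), j), j), data = Y[!miss[[j]], ])
  mu  <- predict(fit, newdata = Y[miss[[j]], ])
  Y[[j]][miss[[j]]] <- mu + rnorm(sum(miss[[j]]), sd = sigma(fit))  # add residual noise
}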

The number of imputed datasets \(M\) in this example is 10, but in ENCORE we follow Stef van Buuren’s advice:

[…] if calculation is not prohibitive, we may set \(M\) to the average percentage of missing data.

(Flexible Imputation of Missing Data, Section 2.8)
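If we wanted to follow that advice literally, the average percentage of missing values can be read off the analytic cohort directly; a minimal sketch (p_miss and m_advice are illustrative names, and the example below simply keeps \(M\) = 10):

# average percentage of missing cells across the analytic cohort
p_miss   <- mean(is.na(data_miss)) * 100
m_advice <- ceiling(p_miss)              # candidate number of imputed datasets
c(pct_missing = p_miss, m = m_advice)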

Following the results of various simulation studies6,7, we use a non-parametric (random forest-based) algorithm for the actual imputation.

Tip: Advantages of non-parametric imputation approaches
  • Parametric imputation models have to be correctly specified, i.e., they also have to explicitly model nonlinear and non-additive covariate relationships

  • Many imputation algorithms are not designed for mixed-type data

  • Popular: random forest-based algorithms

    • for each variable, a random forest is fit on the observed part and then used to predict the missing part

    • missForest8 provides an out-of-bag (OOB) error estimate but only single imputations

    • Alternatives: the rf and cart methods in the mice package3 (see the sketch after this list)
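As a sketch of how the imputation method could be controlled per variable (illustrative only; the actual imputation below simply passes method = "rf" for all variables):

# inspect and customize per-variable imputation methods (dry run with maxit = 0)
meth <- make.method(data_miss)            # default method chosen by mice for each variable
meth[meth != ""] <- "rf"                  # switch every incomplete variable to random forests
pred <- make.predictorMatrix(data_miss)   # which variables are used to impute which
dry  <- mice(data_miss, method = meth, predictorMatrix = pred, maxit = 0, printFlag = FALSE)
dry$method                                # methods that would be used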

Note: In this example we use the futuremice() function instead of mice() to run the imputation in parallel across multiple cores.

# impute data
data_imp <- futuremice(
  parallelseed = 42,
  n.core = parallel::detectCores()-1,
  data = data_miss,
  method = "rf",
  m = 10,
  print = FALSE
  )

The imputation step creates an object of class…

class(data_imp)
[1] "mids"

…which stands for multiply imputed data set. It contains important information on the imputation procedure as well as the imputed data themselves.
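A few ways to peek into it (illustrative):

# selected components of the mids object
data_imp$m                          # number of imputed datasets
data_imp$method                     # imputation method used per variable
head(mice::complete(data_imp, 1))   # extract the first completed dataset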

3.3 Step 2 - Propensity score matching and weighting

Propensity score matching and weighting are applied within each imputed dataset. As pointed out in Section 2.3.1, the MIte approach performed best in terms of bias, standardized differences/balance, coverage rate and variance estimation. In MatchThem this approach is referred to as the within approach (matching is performed within each imputed dataset), while the inferior MIps approach (estimating propensity scores within each dataset, averaging them across datasets, and matching on the averaged propensity scores in each dataset) is referred to as the across approach. Since MIte/within has been shown to have superior performance in most cases, only the within approach is carried forward here (a contrasting across call is sketched after the matching step below).

Let’s assume we fit the following propensity score model within each imputed dataset.

# apply propensity score matching on mids object
# covariates_for_ps: character vector of covariate names (defined in the sourced covariate_vectors.R)
ps_form <- as.formula(paste("treat ~", paste(covariates_for_ps, collapse = " + ")))
ps_form
treat ~ dem_age_index_cont + dem_sex_cont + c_smoking_history + 
    c_number_met_sites + c_hemoglobin_g_dl_cont + c_urea_nitrogen_mg_dl_cont + 
    c_platelets_10_9_l_cont + c_calcium_mg_dl_cont + c_glucose_mg_dl_cont + 
    c_lymphocyte_leukocyte_ratio_cont + c_alp_u_l_cont + c_protein_g_l_cont + 
    c_alt_u_l_cont + c_albumin_g_l_cont + c_bilirubin_mg_dl_cont + 
    c_chloride_mmol_l_cont + c_monocytes_10_9_l_cont + c_eosinophils_leukocytes_ratio_cont + 
    c_ldh_u_l_cont + c_hr_cont + c_sbp_cont + c_oxygen_cont + 
    c_ecog_cont + c_neutrophil_lymphocyte_ratio_cont + c_bmi_cont + 
    c_ast_alt_ratio_cont + c_stage_initial_dx_cont + dem_race + 
    dem_region + dem_ses + c_time_dx_to_index

The matching step is performed with the matchthem() function, which is a wrapper around the matchit() function. This function not only matches on the propensity score, but also supports (coarsened) exact matching, cardinality matching, genetic matching and more. In this example, we use simple 1:1 nearest neighbor matching on the propensity score (estimated via logistic regression) without replacement, with a caliper of 0.01 standard deviations of the propensity score.

# matching
mimids_data <- matchthem(
  formula = ps_form,
  datasets = data_imp,
  approach = 'within',
  method = 'nearest',
  distance = "glm",
  link = "logit",
  caliper = 0.01,
  ratio = 1,
  replace = FALSE
  )

# print summary for matched dataset #1
mimids_data
A `matchit` object
 - method: 1:1 nearest neighbor matching without replacement
 - distance: Propensity score [caliper]

             - estimated with logistic regression
 - caliper: <distance> (0.001)
 - number of obs.: 3500 (original), 2712 (matched)
 - target estimand: ATT
 - covariates: dem_age_index_cont, dem_sex_cont, c_smoking_history, c_number_met_sites, c_hemoglobin_g_dl_cont, c_urea_nitrogen_mg_dl_cont, c_platelets_10_9_l_cont, c_calcium_mg_dl_cont, c_glucose_mg_dl_cont, c_lymphocyte_leukocyte_ratio_cont, c_alp_u_l_cont, c_protein_g_l_cont, c_alt_u_l_cont, c_albumin_g_l_cont, c_bilirubin_mg_dl_cont, c_chloride_mmol_l_cont, c_monocytes_10_9_l_cont, c_eosinophils_leukocytes_ratio_cont, c_ldh_u_l_cont, c_hr_cont, c_sbp_cont, c_oxygen_cont, c_ecog_cont, c_neutrophil_lymphocyte_ratio_cont, c_bmi_cont, c_ast_alt_ratio_cont, c_stage_initial_dx_cont, dem_race, dem_region, dem_ses, c_time_dx_to_index

The resulting “mimids” object contains the original imputed data and the output of the calls to matchit() applied to each imputed dataset.
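For orientation only, the across (MIps) variant would differ from the call above solely in the approach argument; it is sketched here and not used further in this workflow:

# not used further: MIps/"across" matching shown only for contrast
mimids_across <- matchthem(
  formula = ps_form,
  datasets = data_imp,
  approach = 'across',   # average propensity scores across imputations, then match
  method = 'nearest',
  distance = "glm",
  link = "logit",
  caliper = 0.01,
  ratio = 1,
  replace = FALSE
  )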

The weighting step is performed very similarly using the weightthem() function. In this example we apply SMR weighting to arrive at the same ATT estimand as matching, which is indicated through the estimand = "ATT" argument. If we instead wanted to weight patients based on overlap weights, estimand = "ATO" would need to be specified (which is one of the sensitivity analyses in the FLAURA protocol).

To mitigate the risk of extreme weights, the subsequent trim() function truncates large weights by setting all weights above a given quantile (here the 95th percentile) to the weight at that quantile. Since we specify lower = TRUE, the same is done symmetrically at the 5th percentile.
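The truncation itself is simply a symmetric winsorization of the weight distribution; a toy sketch on a made-up weight vector w (trim() applies the same idea to the estimated weights within each imputed dataset):

# toy illustration of symmetric weight trimming at the 5th/95th percentiles
w <- c(0.2, 0.5, 1, 1.5, 8, 40)          # made-up weights with one extreme value
q <- quantile(w, probs = c(0.05, 0.95))
pmin(pmax(w, q[1]), q[2])                # weights winsorized at both tails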

# SMR weighting
wimids_data <- weightthem(
  formula = ps_form,
  datasets = data_imp,
  approach = 'within',
  method = "glm",
  estimand = "ATT"
  )

# trim extreme weights
wimids_data <- trim(
  x = wimids_data, 
  at = .95, 
  lower = TRUE
  )

wimids_data
A weightit object
 - method: "glm" (propensity score weighting with GLM)
 - number of obs.: 3500
 - sampling weights: none
 - treatment: 2-category
 - estimand: ATT (focal: 1)
 - covariates: dem_age_index_cont, dem_sex_cont, c_smoking_history, c_number_met_sites, c_hemoglobin_g_dl_cont, c_urea_nitrogen_mg_dl_cont, c_platelets_10_9_l_cont, c_calcium_mg_dl_cont, c_glucose_mg_dl_cont, c_lymphocyte_leukocyte_ratio_cont, c_alp_u_l_cont, c_protein_g_l_cont, c_alt_u_l_cont, c_albumin_g_l_cont, c_bilirubin_mg_dl_cont, c_chloride_mmol_l_cont, c_monocytes_10_9_l_cont, c_eosinophils_leukocytes_ratio_cont, c_ldh_u_l_cont, c_hr_cont, c_sbp_cont, c_oxygen_cont, c_ecog_cont, c_neutrophil_lymphocyte_ratio_cont, c_bmi_cont, c_ast_alt_ratio_cont, c_stage_initial_dx_cont, dem_race, dem_region, dem_ses, c_time_dx_to_index
 - weights trimmed at 5% and 95%

The resulting “wimids” object contains the original imputed data and the output of the calls to weightit() applied to each imputed dataset.

3.4 Step 3 - Balance assessment

Balance in multiply imputed and matched/weighted data can be assessed in much the same way as in a single complete dataset. For illustration we only look at the matched datasets, but exactly the same principles apply to the weighted datasets.

Table 3.2: Covariate balance table.
# create balance table
balance_table <- bal.tab(
  x = mimids_data, 
  stats = "m",
  abs = TRUE
  )

balance_table
Balance summary across all imputations
                                        Type Mean.Diff.Adj Max.Diff.Adj
distance                            Distance        0.0054       0.0058
dem_age_index_cont                   Contin.        0.0142       0.0311
dem_sex_cont                          Binary        0.0059       0.0158
c_smoking_history                     Binary        0.0055       0.0141
c_number_met_sites                   Contin.        0.0160       0.0314
c_hemoglobin_g_dl_cont               Contin.        0.0090       0.0177
c_urea_nitrogen_mg_dl_cont           Contin.        0.0182       0.0548
c_platelets_10_9_l_cont              Contin.        0.0165       0.0413
c_calcium_mg_dl_cont                 Contin.        0.0231       0.0444
c_glucose_mg_dl_cont                 Contin.        0.0132       0.0213
c_lymphocyte_leukocyte_ratio_cont    Contin.        0.0084       0.0192
c_alp_u_l_cont                       Contin.        0.0124       0.0255
c_protein_g_l_cont                   Contin.        0.0074       0.0189
c_alt_u_l_cont                       Contin.        0.0113       0.0314
c_albumin_g_l_cont                   Contin.        0.0155       0.0303
c_bilirubin_mg_dl_cont               Contin.        0.0126       0.0270
c_chloride_mmol_l_cont               Contin.        0.0214       0.0590
c_monocytes_10_9_l_cont              Contin.        0.0152       0.0331
c_eosinophils_leukocytes_ratio_cont  Contin.        0.0137       0.0304
c_ldh_u_l_cont                       Contin.        0.0097       0.0197
c_hr_cont                            Contin.        0.0122       0.0314
c_sbp_cont                           Contin.        0.0092       0.0290
c_oxygen_cont                        Contin.        0.0131       0.0341
c_ecog_cont                           Binary        0.0059       0.0188
c_neutrophil_lymphocyte_ratio_cont   Contin.        0.0116       0.0353
c_bmi_cont                           Contin.        0.0118       0.0352
c_ast_alt_ratio_cont                 Contin.        0.0150       0.0462
c_stage_initial_dx_cont              Contin.        0.0107       0.0327
dem_race_Asian                        Binary        0.0056       0.0102
dem_race_Other                        Binary        0.0021       0.0067
dem_race_White                        Binary        0.0056       0.0117
dem_region_Midwest                    Binary        0.0069       0.0156
dem_region_Northeast                  Binary        0.0041       0.0104
dem_region_South                      Binary        0.0047       0.0111
dem_region_West                       Binary        0.0047       0.0148
dem_ses                              Contin.        0.0117       0.0273
c_time_dx_to_index                   Contin.        0.0104       0.0269

Average sample sizes across imputations
               0      1
All       1487.  2013. 
Matched   1352.7 1352.7
Unmatched  134.3  660.3
love.plot(
  x = mimids_data,
  abs = TRUE,
  thresholds = 0.1, 
  drop.distance = TRUE,
  var.order = "unadjusted",
  colors = c("orange", "blue"), 
  stars = "std",
  shapes = 17, 
  size = 4, 
  grid = TRUE,
  position = "top"
  )
Figure 3.1: Covariate balance plot (love plot).
bal.plot(
  x = mimids_data,
  var.name = "distance",
  which = "both",
  which.imp = .none,
  colors = c("orange", "blue")
  )

For power calculations, we use the method proposed by Schoenfeld9 to compute the type II error rate \(\beta\) and the corresponding power \(1 - \beta\) (the underlying relationship is sketched after the following list). For this, we assume the following:

  • \(\alpha\) = 0.05 (two-sided)

  • % exposed (1:1 matching in main analysis) = 50%

  • HR (desired) = 0.8

  • Events = as observed in data
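For reference, Schoenfeld’s approximation links the number of events \(d\), the allocation proportion \(p\) (0.5 under 1:1 matching) and the detectable hazard ratio; solving it for \(z_{1-\beta}\) is essentially what nEvents() does below when the observed number of events is supplied (a sketch of the relationship, with \(z_q\) denoting standard normal quantiles and \(\Phi\) the standard normal CDF):

\[
d = \frac{\left(z_{1-\alpha/2} + z_{1-\beta}\right)^2}{p\,(1-p)\,\left(\log \mathrm{HR}\right)^2}
\quad\Longleftrightarrow\quad
z_{1-\beta} = \sqrt{d\,p\,(1-p)}\,\left|\log \mathrm{HR}\right| - z_{1-\alpha/2},
\qquad \text{power} = \Phi\left(z_{1-\beta}\right)
\]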

Since we have multiply imputed and matched datasets, we need to average the number of events across the imputed datasets before computing \(\beta\).

# make long dataset
data_long <- MatchThem::complete(
  # datasets
  data = mimids_data, 
  # produces a dataset with multiply imputed datasets stacked vertically
  action = "long", 
  # do NOT include observations with a zero estimated weight (non-matched)
  all = FALSE, 
  # do NOT include original dataset with missing values
  include = FALSE
  )

# compute average number of events
# by summing up all events
# and dividing by number of imputed datasets
avg_events <- sum(data_long$death_itt)/mimids_data$object$m

# compute beta
beta_gsDesign <- nEvents(
  alpha = 0.05, 
  sided = 2,
  n = avg_events,
  hr = .8,
  ratio = 1,
  tbl = TRUE
  )

# print results
cat("beta is", beta_gsDesign$beta, "\n")
beta is 6.929226e-05 
cat("power is", (1-beta_gsDesign$beta)*100, "% \n")
power is 99.99307 % 
# gsDesign table
beta_gsDesign
   hr      n alpha sided         beta     Power     delta ratio hr0         se
1 0.8 2675.1  0.05     2 6.929226e-05 0.9999307 0.1115718     1   1 0.03866874

3.5 Step 4 - Estimation of marginal treatment effects

Next, we compare the marginal treatment effect estimates from a Cox proportional hazards model after propensity score matching and weighting, as implemented in the coxph() and svycoxph() functions.

From the MatchThem documentation:

Important
  • with() applies the supplied model in expr to the (matched or weighted) multiply imputed datasets, automatically incorporating the (matching) weights when possible. The argument to expr should be of the form glm(y ~ z, family = quasibinomial), for example, excluding the data or weights argument, which are automatically supplied.

  • Functions from the survey package, such as svyglm(), are treated a bit differently. No svydesign object needs to be supplied because with() automatically constructs and supplies it with the imputed dataset and estimated weights. When cluster = TRUE (or with() detects that pairs should be clustered; see the cluster argument above), pair membership is supplied to the ids argument of svydesign().

  • After weighting using weightthem(), glm_weightit() should be used as the modeling function to fit generalized linear models. It correctly produces robust standard errors that account for estimation of the weights, if possible. See WeightIt::glm_weightit() for details. Otherwise, svyglm() should be used rather than glm() in order to correctly compute standard errors.

  • For Cox models, coxph() will produce approximately correct standard errors when used with weighting, but svycoxph() will produce more accurate standard errors when matching is used.

We now want to compare treatment effect estimates for treat when computed (a) using coxph (survival package) and (b) using svycoxph (survey package). More information on estimating treatment effects after matching is provided at https://kosukeimai.github.io/MatchIt/articles/estimating-effects.html#survival-outcomes

# coxph result
coxph_results <- with(
  data = mimids_data,
  expr = coxph(formula = Surv(fu_itt_months, death_itt) ~ treat, 
               weights = weights, 
               cluster = subclass,
               robust = TRUE
               )
  ) |> 
  pool() |> 
  tidy(exponentiate = TRUE, conf.int = TRUE) |> 
  mutate(package = "survival") |> 
  select(package, term, estimate, std.error, conf.low, conf.high) 

coxph_results
# svycoxph result
svycoxph_results <- with(
  data = mimids_data,
  expr = svycoxph(formula = Surv(fu_itt_months, death_itt) ~ treat),
  cluster = TRUE
  ) |> 
  pool() |> 
  tidy(exponentiate = TRUE, conf.int = TRUE) |> 
  mutate(package = "survey") |> 
  select(package, term, estimate, std.error, conf.low, conf.high)

svycoxph_results

3.5.0.1 Summary

rbind(coxph_results, svycoxph_results)
   package  term  estimate  std.error  conf.low conf.high
1 survival treat 0.7023964 0.04327771 0.6450011 0.7648989
2   survey treat 0.7023964 0.04329082 0.6450152 0.7648822
# coxph result
coxph_results <- with(
  data = wimids_data,
  expr = coxph(formula = Surv(fu_itt_months, death_itt) ~ treat,
               weights = weights, 
               robust = TRUE
               )
  ) |> 
  pool() |> 
  tidy(exponentiate = TRUE, conf.int = TRUE) |> 
  mutate(package = "survival") |> 
  select(package, term, estimate, std.error, conf.low, conf.high) 

coxph_results
# svycoxph result
svycoxph_results <- with(
  data = wimids_data,
  expr = svycoxph(formula = Surv(fu_itt_months, death_itt) ~ treat),
  cluster = TRUE
  ) |> 
  pool() |> 
  tidy(exponentiate = TRUE, conf.int = TRUE) |> 
  mutate(package = "survey") |> 
  select(package, term, estimate, std.error, conf.low, conf.high) 

svycoxph_results

3.5.0.2 Summary

rbind(coxph_results, svycoxph_results)
   package  term  estimate  std.error  conf.low conf.high
1 survival treat 0.7064623 0.03554309 0.6588960 0.7574625
2   survey treat 0.7064623 0.03554796 0.6589064 0.7574505

3.6 References

1.
Therneau TM. A package for survival analysis in R. 2024. Available at: https://CRAN.R-project.org/package=survival.
2.
Lumley T. survey: Analysis of complex survey samples. 2024.
3.
Buuren S van, Groothuis-Oudshoorn K. mice: Multivariate imputation by chained equations in R. Journal of Statistical Software 2011; 45: 1–67. doi:10.18637/jss.v045.i03.
4.
Pishgar F, Greifer N, Leyrat C, Stuart E. MatchThem: Matching and Weighting after Multiple Imputation. R Journal 2021; 13: 292–305. doi:10.32614/RJ-2021-073.
5.
Sjoberg DD, Whiting K, Curry M, Lavery JA, Larmarange J. Reproducible summary tables with the gtsummary package. R Journal 2021; 13: 570–580. doi:10.32614/RJ-2021-053.
6.
Shah AD, Bartlett JW, Carpenter J, Nicholas O, Hemingway H. Comparison of random forest and parametric imputation models for imputing missing data using MICE: a CALIBER study. American Journal of Epidemiology 2014; 179: 764–774. doi:10.1093/aje/kwt312.
7.
Weberpals J, Raman S, Shaw P, et al. A Principled Approach to Characterize and Analyze Partially Observed Confounder Data from Electronic Health Records. Clinical Epidemiology 2024; 16: 329–343. doi:10.2147/clep.s436131.
8.
Stekhoven DJ, Bühlmann P. MissForest: Non-parametric missing value imputation for mixed-type data. Bioinformatics 2012; 28: 112–118. doi:10.1093/bioinformatics/btr597.
9.
Schoenfeld DA. Sample-size formula for the proportional-hazards regression model. Biometrics 1983; 39: 499–503.

3.7 Session info

Script runtime: 1.10 minutes.

pander::pander(subset(data.frame(sessioninfo::package_info()), attached==TRUE, c(package, loadedversion)))
  package loadedversion
  cobalt 4.6.1
  dplyr 1.1.4
  encore.analytics 0.2.1
  furrr 0.3.1
  future 1.68.0
  ggplot2 4.0.1
  gsDesign 3.8.0
  gt 1.2.0
  gtsummary 2.5.0
  here 1.0.2
  MatchThem 1.2.1
  Matrix 1.7-3
  mice 3.19.0
  parallelly 1.46.0
  ranger 0.17.0
  survey 4.4-8
  survival 3.8-3
  yaml 2.3.12
pander::pander(sessionInfo())

R version 4.5.1 (2025-06-13)

Platform: x86_64-pc-linux-gnu

locale: LC_CTYPE=C.UTF-8, LC_NUMERIC=C, LC_TIME=C.UTF-8, LC_COLLATE=C.UTF-8, LC_MONETARY=C.UTF-8, LC_MESSAGES=C.UTF-8, LC_PAPER=C.UTF-8, LC_NAME=C, LC_ADDRESS=C, LC_TELEPHONE=C, LC_MEASUREMENT=C.UTF-8 and LC_IDENTIFICATION=C

attached base packages: grid, stats, graphics, grDevices, datasets, utils, methods and base

other attached packages: ggplot2(v.4.0.1), gt(v.1.2.0), yaml(v.2.3.12), encore.analytics(v.0.2.1), gsDesign(v.3.8.0), cobalt(v.4.6.1), furrr(v.0.3.1), future(v.1.68.0), ranger(v.0.17.0), parallelly(v.1.46.0), gtsummary(v.2.5.0), here(v.1.0.2), survey(v.4.4-8), Matrix(v.1.7-3), MatchThem(v.1.2.1), mice(v.3.19.0), survival(v.3.8-3) and dplyr(v.1.1.4)

loaded via a namespace (and not attached): tidyselect(v.1.2.1), farver(v.2.1.2), S7(v.0.2.1), smd(v.0.8.0), fastmap(v.1.2.0), digest(v.0.6.39), rpart(v.4.1.24), lifecycle(v.1.0.4), magrittr(v.2.0.4), compiler(v.4.5.1), sass(v.0.4.10), rlang(v.1.1.6), tools(v.4.5.1), knitr(v.1.51), labeling(v.0.4.3), htmlwidgets(v.1.6.4), xml2(v.1.5.1), r2rtf(v.1.2.0), RColorBrewer(v.1.1-3), withr(v.3.0.2), purrr(v.1.2.0), nnet(v.7.3-20), jomo(v.2.7-6), xtable(v.1.8-4), globals(v.0.18.0), scales(v.1.4.0), iterators(v.1.0.14), MASS(v.7.3-65), cli(v.3.6.5), rmarkdown(v.2.30), crayon(v.1.5.3), reformulas(v.0.4.3), generics(v.0.1.4), sessioninfo(v.1.2.3), commonmark(v.2.0.0), minqa(v.1.2.8), DBI(v.1.2.3), pander(v.0.6.6), stringr(v.1.6.0), splines(v.4.5.1), assertthat(v.0.2.1), parallel(v.4.5.1), base64enc(v.0.1-3), mitools(v.2.4), vctrs(v.0.6.5), WeightIt(v.1.5.1), boot(v.1.3-31), glmnet(v.4.1-10), sandwich(v.3.1-1), jsonlite(v.2.0.0), litedown(v.0.9), mitml(v.0.4-5), listenv(v.0.10.0), locfit(v.1.5-9.12), foreach(v.1.5.2), tidyr(v.1.3.2), glue(v.1.8.0), nloptr(v.2.2.1), pan(v.1.9), chk(v.0.10.0), codetools(v.0.2-20), stringi(v.1.8.7), shape(v.1.4.6.1), gtable(v.0.3.6), lme4(v.1.1-38), tibble(v.3.3.0), pillar(v.1.11.1), htmltools(v.0.5.9), R6(v.2.6.1), Rdpack(v.2.6.4), rprojroot(v.2.1.1), evaluate(v.1.0.5), lattice(v.0.22-7), markdown(v.2.0), cards(v.0.7.1), tictoc(v.1.2.1), rbibutils(v.2.4), backports(v.1.5.0), MatchIt(v.4.7.2), broom(v.1.0.11), simsurv(v.1.0.0), renv(v.1.1.5), cardx(v.0.3.1), Rcpp(v.1.1.0), nlme(v.3.1-168), xfun(v.0.55), forcats(v.1.0.1), fs(v.1.6.6), zoo(v.1.8-15) and pkgconfig(v.2.0.3)

pander::pander(options('repos'))
  • repos:
    CRAN: https://cran.rstudio.com
    PositPackageManager: https://packagemanager.posit.co/cran/linux/noble/latest