3  General Workflow and Application in Survival Models

Comparison of coxph versus svycoxph after multiple imputation and propensity score matching

In Chapter 3 we present a reproducible example of how to use coxph (survival package1) and svycoxph (survey package2) in combination with multiple imputation by chained equations (mice package3) and propensity score matching using the MatchThem package4.

First, we load the required R packages and some custom functions that are part of the encore.analytics R package, which is being developed to streamline the analysis of all ENCORE trial emulations (non-public package).

library(dplyr)
library(survival)
library(mice)
library(MatchThem)
library(survey)
library(here)
library(gtsummary)
library(parallelly)
library(ranger)
library(furrr)
library(cobalt)
library(gsDesign)
library(encore.analytics)
library(yaml)
library(gt)
library(ggplot2)

source(here("functions", "covariate_vectors.R"))

# track time
runtime <- tictoc::tic()

3.1 Data generation

We use the simulate_data() function to simulate a realistic oncology comparative effectiveness analytic cohort with distributions similar to FLAURA, a randomized controlled trial that evaluated the efficacy and safety of osimertinib versus standard-of-care (SoC) tyrosine kinase inhibitors (TKIs) in advanced NSCLC patients with a sensitizing EGFR mutation.

The following cohort resembles distributions observed in the EHR-derived EDB1 dataset used in ENCORE. Note: the values of some continuous covariates (labs) are displayed after log/log-log transformation.

# load example dataset with missing observations
data_miss <- simulate_data(
  n_total = 3500, 
  seed = 42, 
  include_id = FALSE, 
  imposeNA = TRUE,
  propNA = .33
  )

We can use the create_table1() function from the encore.analytics package to create a summary table of the simulated dataset. The function is a convenient wrapper around the gtsummary package5 and lets us create a table as follows:

# read YAML file
design_data <- read_yaml(here("_params.yml"))

# combine to a table
params <- bind_rows(design_data$params)

# named list
covariates <- params |> 
  dplyr::filter(dimension == "Covariate Assessment Window")

covariate_list <- as.list(setNames(covariates$label, covariates$variable))

data_miss |> 
  create_table1(
    covariates = names(covariate_list),
    covariates_labels = covariate_list
    )
Table 3.1: Summary table of the simulated dataset.
Patient characteristic | Total, N = 3500¹ | Treatment received = 0, N = 1487 (42.5%)¹ | Treatment received = 1, N = 2013 (57.5%)¹ | Difference²
Age [yrs] 69 (64, 74) 69 (64, 74) 69 (64, 74) -0.04
Sex 1,146 (33%) 494 (33%) 652 (32%) 0.02
Smoking history    0.10
    TRUE 846 (24%) 396 (27%) 450 (22%)
    FALSE 1,035 (30%) 419 (28%) 616 (31%)
    Missing 1,619 (46%) 672 (45%) 947 (47%)
Number of unique metastatic sites    -0.05
    1 1,364 (74%) 595 (75%) 769 (73%)
    2 414 (22%) 166 (21%) 248 (23%)
    3 52 (2.8%) 23 (2.9%) 29 (2.7%)
    4 16 (0.9%) 6 (0.8%) 10 (0.9%)
    Missing 1,654 697 957
ECOG Performance Status 1,643 (57%) 742 (60%) 901 (55%) 0.10
    Missing 640 257 383
Stage at initial diagnosis    -0.12
    1 27 (1.3%) 14 (1.6%) 13 (1.1%)
    2 55 (2.7%) 29 (3.3%) 26 (2.3%)
    3 424 (21%) 202 (23%) 222 (19%)
    4 1,506 (75%) 628 (72%) 878 (77%)
    Missing 1,488 614 874
Race    0.04
    Asian 849 (24%) 369 (25%) 480 (24%)
    Other 55 (1.6%) 22 (1.5%) 33 (1.6%)
    White 1,402 (40%) 601 (40%) 801 (40%)
    Missing 1,194 (34%) 495 (33%) 699 (35%)
Geographic region    0.07
    Midwest 359 (10%) 153 (10%) 206 (10%)
    Northeast 518 (15%) 209 (14%) 309 (15%)
    South 947 (27%) 397 (27%) 550 (27%)
    West 673 (19%) 278 (19%) 395 (20%)
    Missing 1,003 (29%) 450 (30%) 553 (27%)
Socioeconomic status    -0.04
    1 270 (13%) 123 (14%) 147 (12%)
    2 297 (14%) 120 (13%) 177 (14%)
    3 398 (18%) 164 (18%) 234 (19%)
    4 529 (25%) 227 (25%) 302 (24%)
    5 664 (31%) 270 (30%) 394 (31%)
    Missing 1,342 583 759
Hemoglobin [g/dL] 12.92 (12.01, 13.79) 12.83 (11.96, 13.75) 12.99 (12.04, 13.83) -0.09
    Missing 404 182 222
Urea nitrogen [mg/dL] 2.71 (2.50, 2.92) 2.72 (2.51, 2.94) 2.70 (2.49, 2.91) 0.07
    Missing 1,218 493 725
Platelets [10^9/L] 260 (221, 297) 261 (223, 298) 260 (220, 296) 0.03
    Missing 1,292 549 743
Calcium [mg/dL] 2.23 (2.20, 2.26) 2.23 (2.20, 2.26) 2.23 (2.20, 2.26) -0.06
    Missing 907 403 504
Glucose [mg/dL] 4.64 (4.57, 4.72) 4.65 (4.57, 4.72) 4.64 (4.56, 4.72) 0.07
    Missing 1,314 538 776
Lymphocyte to leukocyte ratio 2.87 (2.75, 2.99) 2.86 (2.74, 2.98) 2.89 (2.75, 3.00) -0.07
    Missing 1,650 729 921
Alkaline phosphatase [U/L] 4.48 (4.39, 4.56) 4.48 (4.39, 4.56) 4.48 (4.39, 4.57) -0.06
    Missing 592 266 326
Protein [g/L] 67.9 (65.3, 70.7) 67.9 (65.2, 70.6) 68.0 (65.3, 70.7) -0.04
    Missing 914 371 543
Alanine aminotransferase [U/L] 2.89 (2.67, 3.10) 2.90 (2.69, 3.10) 2.88 (2.66, 3.09) 0.08
    Missing 1,658 705 953
Albumin [g/L] 39.9 (37.9, 41.9) 40.0 (37.9, 41.9) 39.9 (37.9, 41.9) 0.02
    Missing 1,718 733 985
Bilirubin [mg/dL] -0.92 (-1.80, -0.07) -0.89 (-1.77, -0.05) -0.94 (-1.83, -0.07) 0.05
    Missing 378 152 226
Chloride [mmol/L] 102.03 (100.02, 104.04) 101.90 (99.91, 103.97) 102.13 (100.11, 104.18) -0.08
    Missing 934 385 549
Monocytes [10^9/L] -0.50 (-0.67, -0.33) -0.52 (-0.68, -0.34) -0.49 (-0.66, -0.32) -0.07
    Missing 1,067 447 620
Eosinophils to leukocytes ratio 0.68 (0.30, 1.08) 0.71 (0.37, 1.11) 0.67 (0.27, 1.06) 0.10
    Missing 1,602 702 900
Lactate dehydrogenase [U/L] 1.69 (1.66, 1.72) 1.69 (1.66, 1.72) 1.69 (1.66, 1.72) -0.03
    Missing 411 176 235
Heart rate [bpm] 4.43 (4.40, 4.46) 4.43 (4.40, 4.46) 4.43 (4.40, 4.45) 0.09
    Missing 1,734 751 983
Systolic blood pressure [mmHg] 4.85 (4.79, 4.91) 4.85 (4.79, 4.91) 4.85 (4.79, 4.91) 0.07
    Missing 1,669 694 975
Oxygen saturation [SpO2] 97.000 (96.994, 97.006) 97.000 (96.993, 97.006) 97.000 (96.994, 97.006) 0.01
    Missing 323 133 190
Neutrophil to lymphocyte ratio 1.29 (1.03, 1.55) 1.27 (1.02, 1.55) 1.29 (1.04, 1.56) -0.05
    Missing 995 408 587
Body mass index [kg/m^2] 3.23 (3.14, 3.32) 3.24 (3.15, 3.32) 3.22 (3.13, 3.32) 0.07
    Missing 802 359 443
Aspartate aminotransferase to alanine aminotransferase ratio 0.11 (-0.09, 0.31) 0.11 (-0.09, 0.31) 0.10 (-0.08, 0.32) -0.04
    Missing 1,605 689 916
Time from diagnosis to index date [days] 44 (32, 55) 44 (32, 56) 43 (31, 55) 0.02
    Missing 891 378 513
De novo metastatic disease at diagnosis    0.06
    TRUE 1,608 (46%) 657 (44%) 951 (47%)
    FALSE 396 (11%) 170 (11%) 226 (11%)
    Missing 1,496 (43%) 660 (44%) 836 (42%)
Height [cm] 1.65 (1.60, 1.70) 1.65 (1.60, 1.70) 1.65 (1.60, 1.70) 0.00
    Missing 1,343 561 782
Weight [kg] 69 (61, 76) 69 (61, 75) 69 (61, 76) -0.02
    Missing 1,458 633 825
Diastolic blood pressure [mmHg] 76.2 (72.0, 80.2) 76.4 (72.2, 80.1) 76.0 (71.8, 80.3) 0.02
    Missing 799 340 459
Year of index date    0.06
    <2018 3,324 (95%) 1,400 (94%) 1,924 (96%)
    2018+ 176 (5.0%) 87 (5.9%) 89 (4.4%)
¹ Median (Q1, Q3); n (%)
² Standardized Mean Difference
Abbreviation: CI = Confidence Interval

3.2 Step 1 - Multiple imputation

The first step after deriving the analytic cohort is to create multiply imputed datasets using the mice R package3.

The mice algorithm is one particular instance of fully conditional specification. The algorithm starts with random draws from the observed data and imputes the incomplete data in a variable-by-variable fashion; one iteration consists of one cycle through all incomplete variables \(Y_j\).

MICE algorithm for imputation of multivariate missing data.
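To make the variable-by-variable idea concrete, here is a toy sketch of a single FCS cycle on two artificial variables. This is illustrative only (the objects Y and miss are made up for this sketch); mice() draws imputations from proper predictive distributions and repeats the cycle over several iterations and imputed datasets.

# toy sketch of one FCS/MICE cycle (not the actual mice internals)
set.seed(1)
Y <- data.frame(y1 = rnorm(100))
Y$y2 <- Y$y1 + rnorm(100)
Y$y1[sample(100, 20)] <- NA
Y$y2[sample(100, 20)] <- NA
miss <- lapply(Y, is.na)                 # remember where the holes are

# starting values: random draws from the observed part of each variable
for (j in names(Y)) {
  Y[[j]][miss[[j]]] <- sample(Y[[j]][!miss[[j]]], sum(miss[[j]]), replace = TRUE)
}

# one iteration = one cycle through all Y_j, re-imputing each from the others
for (j in names(Y)) {
  fit <- lm(reformulate(setdiff(names(Y), j), j), data = Y[!miss[[j]], ])
  mu  <- predict(fit, newdata = Y[miss[[j]], ])
  Y[[j]][miss[[j]]] <- mu + rnorm(sum(miss[[j]]), sd = sigma(fit))  # add residual noise
}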

The number of imputed datasets \(M\) in this example is 10, but in ENCORE we follow Stef van Buuren’s advice:

[…] if calculation is not prohibitive, we may set \(M\) to the average percentage of missing data.

(Flexible Imputation of Missing Data, Section 2.8)
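If we wanted to follow that advice literally, the average percentage of missing values can be read off the analytic cohort directly; a minimal sketch (p_miss and m_advice are illustrative names, and the example below simply keeps \(M\) = 10):

# average percentage of missing cells across the analytic cohort
p_miss   <- mean(is.na(data_miss)) * 100
m_advice <- ceiling(p_miss)              # candidate number of imputed datasets
c(pct_missing = p_miss, m = m_advice)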

Following the results of various simulation studies6,7, we use a non-parametric (random forest-based) algorithm for the actual imputation.

Tip: Advantages of non-parametric imputation approaches
  • Parametric imputation models have to be correctly specified, i.e., they also have to explicitly model nonlinear and non-additive covariate relationships

  • Many imputation algorithms are not designed for mixed-type data

  • Popular: random forest-based algorithms

    • for each variable, a random forest is fit on the observed part and then used to predict the missing part

    • missForest8 provides an out-of-bag (OOB) error estimate but only single imputations

    • Alternatives: the rf and cart methods in the mice package3 (see the sketch after this list)
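As a sketch of how the imputation method could be controlled per variable (illustrative only; the actual imputation below simply passes method = "rf" for all variables):

# inspect and customize per-variable imputation methods (dry run with maxit = 0)
meth <- make.method(data_miss)            # default method chosen by mice for each variable
meth[meth != ""] <- "rf"                  # switch every incomplete variable to random forests
pred <- make.predictorMatrix(data_miss)   # which variables are used to impute which
dry  <- mice(data_miss, method = meth, predictorMatrix = pred, maxit = 0, printFlag = FALSE)
dry$method                                # methods that would be used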

Note: In this example we use the futuremice() function instead of mice() to run the imputation in parallel across multiple cores.

# impute data
data_imp <- futuremice(
  parallelseed = 42,
  n.core = parallel::detectCores()-1,
  data = data_miss,
  method = "rf",
  m = 10,
  print = FALSE
  )

The imputation step creates an object of class…

class(data_imp)
[1] "mids"

…which stands for multiply imputed data set. It contains important information on the imputation procedure as well as the imputed data themselves.
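A few ways to peek into it (illustrative):

# selected components of the mids object
data_imp$m                          # number of imputed datasets
data_imp$method                     # imputation method used per variable
head(mice::complete(data_imp, 1))   # extract the first completed dataset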

3.3 Step 2 - Propensity score matching and weighting

Propensity score matching and weighting are applied within each imputed dataset. As pointed out in Section 2.3.1, the MIte approach performed best in terms of bias, standardized differences/balance, coverage rate and variance estimation. In MatchThem this approach is referred to as the within approach (matching is performed within each imputed dataset), while the inferior MIps approach (estimating propensity scores within each dataset, averaging them across datasets, and matching on the averaged propensity scores in each dataset) is referred to as the across approach. Since MIte/within has been shown to have superior performance in most cases, only the within approach is carried forward here (a contrasting across call is sketched after the matching step below).

Let’s assume we fit the following propensity score model within each imputed dataset.

# apply propensity score matching on mids object
# covariates_for_ps: character vector of covariate names (defined in the sourced covariate_vectors.R)
ps_form <- as.formula(paste("treat ~", paste(covariates_for_ps, collapse = " + ")))
ps_form
treat ~ dem_age_index_cont + dem_sex_cont + c_smoking_history + 
    c_number_met_sites + c_hemoglobin_g_dl_cont + c_urea_nitrogen_mg_dl_cont + 
    c_platelets_10_9_l_cont + c_calcium_mg_dl_cont + c_glucose_mg_dl_cont + 
    c_lymphocyte_leukocyte_ratio_cont + c_alp_u_l_cont + c_protein_g_l_cont + 
    c_alt_u_l_cont + c_albumin_g_l_cont + c_bilirubin_mg_dl_cont + 
    c_chloride_mmol_l_cont + c_monocytes_10_9_l_cont + c_eosinophils_leukocytes_ratio_cont + 
    c_ldh_u_l_cont + c_hr_cont + c_sbp_cont + c_oxygen_cont + 
    c_ecog_cont + c_neutrophil_lymphocyte_ratio_cont + c_bmi_cont + 
    c_ast_alt_ratio_cont + c_stage_initial_dx_cont + dem_race + 
    dem_region + dem_ses + c_time_dx_to_index

The matching step is performed with the matchthem() function, which is a wrapper around the matchit() function. This function not only matches on the propensity score, but also supports (coarsened) exact matching, cardinality matching, genetic matching and more. In this example, we use simple 1:1 nearest neighbor matching on the propensity score (estimated via logistic regression) without replacement, with a caliper of 0.01 standard deviations of the propensity score.

# matching
mimids_data <- matchthem(
  formula = ps_form,
  datasets = data_imp,
  approach = 'within',
  method = 'nearest',
  distance = "glm",
  link = "logit",
  caliper = 0.01,
  ratio = 1,
  replace = FALSE
  )

# print summary for matched dataset #1
mimids_data
A `matchit` object
 - method: 1:1 nearest neighbor matching without replacement
 - distance: Propensity score [caliper]

             - estimated with logistic regression
 - caliper: <distance> (0.001)
 - number of obs.: 3500 (original), 2712 (matched)
 - target estimand: ATT
 - covariates: dem_age_index_cont, dem_sex_cont, c_smoking_history, c_number_met_sites, c_hemoglobin_g_dl_cont, c_urea_nitrogen_mg_dl_cont, c_platelets_10_9_l_cont, c_calcium_mg_dl_cont, c_glucose_mg_dl_cont, c_lymphocyte_leukocyte_ratio_cont, c_alp_u_l_cont, c_protein_g_l_cont, c_alt_u_l_cont, c_albumin_g_l_cont, c_bilirubin_mg_dl_cont, c_chloride_mmol_l_cont, c_monocytes_10_9_l_cont, c_eosinophils_leukocytes_ratio_cont, c_ldh_u_l_cont, c_hr_cont, c_sbp_cont, c_oxygen_cont, c_ecog_cont, c_neutrophil_lymphocyte_ratio_cont, c_bmi_cont, c_ast_alt_ratio_cont, c_stage_initial_dx_cont, dem_race, dem_region, dem_ses, c_time_dx_to_index

The resulting “mimids” object contains the original imputed data and the output of the calls to matchit() applied to each imputed dataset.
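For orientation only, the across (MIps) variant would differ from the call above solely in the approach argument; it is sketched here and not used further in this workflow:

# not used further: MIps/"across" matching shown only for contrast
mimids_across <- matchthem(
  formula = ps_form,
  datasets = data_imp,
  approach = 'across',   # average propensity scores across imputations, then match
  method = 'nearest',
  distance = "glm",
  link = "logit",
  caliper = 0.01,
  ratio = 1,
  replace = FALSE
  )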

The weighting step is performed very similarly using the weightthem() function. In this example we apply SMR weighting to arrive at the same ATT estimand as matching, which is indicated through the estimand = "ATT" argument. If we instead wanted to weight patients based on overlap weights, estimand = "ATO" would need to be specified (which is one of the sensitivity analyses in the FLAURA protocol).

To mitigate the risk of extreme weights, the subsequent trim() function truncates large weights by setting all weights above a given quantile (here the 95th percentile) to the weight at that quantile. Since we specify lower = TRUE, the same is done symmetrically at the 5th percentile.
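The truncation itself is simply a symmetric winsorization of the weight distribution; a toy sketch on a made-up weight vector w (trim() applies the same idea to the estimated weights within each imputed dataset):

# toy illustration of symmetric weight trimming at the 5th/95th percentiles
w <- c(0.2, 0.5, 1, 1.5, 8, 40)          # made-up weights with one extreme value
q <- quantile(w, probs = c(0.05, 0.95))
pmin(pmax(w, q[1]), q[2])                # weights winsorized at both tails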

# SMR weighting
wimids_data <- weightthem(
  formula = ps_form,
  datasets = data_imp,
  approach = 'within',
  method = "glm",
  estimand = "ATT"
  )

# trim extreme weights
wimids_data <- trim(
  x = wimids_data, 
  at = .95, 
  lower = TRUE
  )

wimids_data
A weightit object
 - method: "glm" (propensity score weighting with GLM)
 - number of obs.: 3500
 - sampling weights: none
 - treatment: 2-category
 - estimand: ATT (focal: 1)
 - covariates: dem_age_index_cont, dem_sex_cont, c_smoking_history, c_number_met_sites, c_hemoglobin_g_dl_cont, c_urea_nitrogen_mg_dl_cont, c_platelets_10_9_l_cont, c_calcium_mg_dl_cont, c_glucose_mg_dl_cont, c_lymphocyte_leukocyte_ratio_cont, c_alp_u_l_cont, c_protein_g_l_cont, c_alt_u_l_cont, c_albumin_g_l_cont, c_bilirubin_mg_dl_cont, c_chloride_mmol_l_cont, c_monocytes_10_9_l_cont, c_eosinophils_leukocytes_ratio_cont, c_ldh_u_l_cont, c_hr_cont, c_sbp_cont, c_oxygen_cont, c_ecog_cont, c_neutrophil_lymphocyte_ratio_cont, c_bmi_cont, c_ast_alt_ratio_cont, c_stage_initial_dx_cont, dem_race, dem_region, dem_ses, c_time_dx_to_index
 - weights trimmed at 5% and 95%

The resulting “wimids” object contains the original imputed data and the output of the calls to weightit() applied to each imputed dataset.

3.4 Step 3 - Balance assessment

Balance in multiply imputed and matched/weighted data can be assessed in much the same way as in a single complete dataset. For illustration we only look at the matched datasets, but exactly the same principles apply to the weighted datasets.

Table 3.2: Covariate balance table.
# create balance table
balance_table <- bal.tab(
  x = mimids_data, 
  stats = "m",
  abs = TRUE
  )

balance_table
Balance summary across all imputations
                                        Type Mean.Diff.Adj Max.Diff.Adj
distance                            Distance        0.0054       0.0058
dem_age_index_cont                   Contin.        0.0142       0.0311
dem_sex_cont                          Binary        0.0059       0.0158
c_smoking_history                     Binary        0.0055       0.0141
c_number_met_sites                   Contin.        0.0160       0.0314
c_hemoglobin_g_dl_cont               Contin.        0.0090       0.0177
c_urea_nitrogen_mg_dl_cont           Contin.        0.0182       0.0548
c_platelets_10_9_l_cont              Contin.        0.0165       0.0413
c_calcium_mg_dl_cont                 Contin.        0.0231       0.0444
c_glucose_mg_dl_cont                 Contin.        0.0132       0.0213
c_lymphocyte_leukocyte_ratio_cont    Contin.        0.0084       0.0192
c_alp_u_l_cont                       Contin.        0.0124       0.0255
c_protein_g_l_cont                   Contin.        0.0074       0.0189
c_alt_u_l_cont                       Contin.        0.0113       0.0314
c_albumin_g_l_cont                   Contin.        0.0155       0.0303
c_bilirubin_mg_dl_cont               Contin.        0.0126       0.0270
c_chloride_mmol_l_cont               Contin.        0.0214       0.0590
c_monocytes_10_9_l_cont              Contin.        0.0152       0.0331
c_eosinophils_leukocytes_ratio_cont  Contin.        0.0137       0.0304
c_ldh_u_l_cont                       Contin.        0.0097       0.0197
c_hr_cont                            Contin.        0.0122       0.0314
c_sbp_cont                           Contin.        0.0092       0.0290
c_oxygen_cont                        Contin.        0.0131       0.0341
c_ecog_cont                           Binary        0.0059       0.0188
c_neutrophil_lymphocyte_ratio_cont   Contin.        0.0116       0.0353
c_bmi_cont                           Contin.        0.0118       0.0352
c_ast_alt_ratio_cont                 Contin.        0.0150       0.0462
c_stage_initial_dx_cont              Contin.        0.0107       0.0327
dem_race_Asian                        Binary        0.0056       0.0102
dem_race_Other                        Binary        0.0021       0.0067
dem_race_White                        Binary        0.0056       0.0117
dem_region_Midwest                    Binary        0.0069       0.0156
dem_region_Northeast                  Binary        0.0041       0.0104
dem_region_South                      Binary        0.0047       0.0111
dem_region_West                       Binary        0.0047       0.0148
dem_ses                              Contin.        0.0117       0.0273
c_time_dx_to_index                   Contin.        0.0104       0.0269

Average sample sizes across imputations
               0      1
All       1487.  2013. 
Matched   1352.7 1352.7
Unmatched  134.3  660.3
love.plot(
  x = mimids_data,
  abs = TRUE,
  thresholds = 0.1, 
  drop.distance = TRUE,
  var.order = "unadjusted",
  colors = c("orange", "blue"), 
  stars = "std",
  shapes = 17, 
  size = 4, 
  grid = TRUE,
  position = "top"
  )
Figure 3.1: Covariate balance plot (love plot).
bal.plot(
  x = mimids_data,
  var.name = "distance",
  which = "both",
  which.imp = .none,
  colors = c("orange", "blue")
  )

For power calculations, we use the method proposed by Schoenfeld9 to compute the type II error rate \(\beta\) and the corresponding power \(1 - \beta\) (the underlying relationship is sketched after the following list). For this, we assume the following:

  • \(\alpha\) = 0.05 (two-sided)

  • % exposed (1:1 matching in main analysis) = 50%

  • HR (desired) = 0.8

  • Events = as observed in data
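For reference, Schoenfeld’s approximation links the number of events \(d\), the allocation proportion \(p\) (0.5 under 1:1 matching) and the detectable hazard ratio; solving it for \(z_{1-\beta}\) is essentially what nEvents() does below when the observed number of events is supplied (a sketch of the relationship, with \(z_q\) denoting standard normal quantiles and \(\Phi\) the standard normal CDF):

\[
d = \frac{\left(z_{1-\alpha/2} + z_{1-\beta}\right)^2}{p\,(1-p)\,\left(\log \mathrm{HR}\right)^2}
\quad\Longleftrightarrow\quad
z_{1-\beta} = \sqrt{d\,p\,(1-p)}\,\left|\log \mathrm{HR}\right| - z_{1-\alpha/2},
\qquad \text{power} = \Phi\left(z_{1-\beta}\right)
\]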

Since we have multiply imputed and matched datasets, we need to average the number of events across the imputed datasets before computing \(\beta\).

# make long dataset
data_long <- MatchThem::complete(
  # datasets
  data = mimids_data, 
  # produces a dataset with multiply imputed datasets stacked vertically
  action = "long", 
  # do NOT include observations with a zero estimated weight (non-matched)
  all = FALSE, 
  # do NOT include original dataset with missing values
  include = FALSE
  )

# compute average number of events
# by summing up all events
# and dividing by number of imputed datasets
avg_events <- sum(data_long$death_itt)/mimids_data$object$m

# compute beta
beta_gsDesign <- nEvents(
  alpha = 0.05, 
  sided = 2,
  n = avg_events,
  hr = .8,
  ratio = 1,
  tbl = TRUE
  )

# print results
cat("beta is", beta_gsDesign$beta, "\n")
beta is 6.929226e-05 
cat("power is", (1-beta_gsDesign$beta)*100, "% \n")
power is 99.99307 % 
# gsDesign table
beta_gsDesign
   hr      n alpha sided         beta     Power     delta ratio hr0         se
1 0.8 2675.1  0.05     2 6.929226e-05 0.9999307 0.1115718     1   1 0.03866874

3.5 Step 4 - Estimation of marginal treatment effects

Next, we compare the marginal treatment effect estimates from a Cox proportional hazards model after propensity score matching and weighting, as implemented in the coxph() and svycoxph() functions.

From the MatchThem documentation:

Important
  • with() applies the supplied model in expr to the (matched or weighted) multiply imputed datasets, automatically incorporating the (matching) weights when possible. The argument to expr should be of the form glm(y ~ z, family = quasibinomial), for example, excluding the data or weights argument, which are automatically supplied.

  • Functions from the survey package, such as svyglm(), are treated a bit differently. No svydesign object needs to be supplied because with() automatically constructs and supplies it with the imputed dataset and estimated weights. When cluster = TRUE (or with() detects that pairs should be clustered; see the cluster argument above), pair membership is supplied to the ids argument of svydesign().

  • After weighting using weightthem(), glm_weightit() should be used as the modeling function to fit generalized linear models. It correctly produces robust standard errors that account for estimation of the weights, if possible. See WeightIt::glm_weightit() for details. Otherwise, svyglm() should be used rather than glm() in order to correctly compute standard errors.

  • For Cox models, coxph() will produce approximately correct standard errors when used with weighting, but svycoxph() will produce more accurate standard errors when matching is used.

We now want to compare treatment effect estimates for treat when computed (a) using coxph (survival package) and (b) using svycoxph (survey package). More information on estimating treatment effects after matching is provided at https://kosukeimai.github.io/MatchIt/articles/estimating-effects.html#survival-outcomes

# coxph result
coxph_results <- with(
  data = mimids_data,
  expr = coxph(formula = Surv(fu_itt_months, death_itt) ~ treat, 
               weights = weights, 
               cluster = subclass,
               robust = TRUE
               )
  ) |> 
  pool() |> 
  tidy(exponentiate = TRUE, conf.int = TRUE) |> 
  mutate(package = "survival") |> 
  select(package, term, estimate, std.error, conf.low, conf.high) 

coxph_results
# svycoxph result
svycoxph_results <- with(
  data = mimids_data,
  expr = svycoxph(formula = Surv(fu_itt_months, death_itt) ~ treat),
  cluster = TRUE
  ) |> 
  pool() |> 
  tidy(exponentiate = TRUE, conf.int = TRUE) |> 
  mutate(package = "survey") |> 
  select(package, term, estimate, std.error, conf.low, conf.high)

svycoxph_results

3.5.0.1 Summary

rbind(coxph_results, svycoxph_results)
   package  term  estimate  std.error  conf.low conf.high
1 survival treat 0.7023964 0.04327771 0.6450011 0.7648989
2   survey treat 0.7023964 0.04329082 0.6450152 0.7648822
# coxph result
coxph_results <- with(
  data = wimids_data,
  expr = coxph(formula = Surv(fu_itt_months, death_itt) ~ treat,
               weights = weights, 
               robust = TRUE
               )
  ) |> 
  pool() |> 
  tidy(exponentiate = TRUE, conf.int = TRUE) |> 
  mutate(package = "survival") |> 
  select(package, term, estimate, std.error, conf.low, conf.high) 

coxph_results
# svycoxph result
svycoxph_results <- with(
  data = wimids_data,
  expr = svycoxph(formula = Surv(fu_itt_months, death_itt) ~ treat),
  cluster = TRUE
  ) |> 
  pool() |> 
  tidy(exponentiate = TRUE, conf.int = TRUE) |> 
  mutate(package = "survey") |> 
  select(package, term, estimate, std.error, conf.low, conf.high) 

svycoxph_results

3.5.0.2 Summary

rbind(coxph_results, svycoxph_results)
   package  term  estimate  std.error  conf.low conf.high
1 survival treat 0.7064623 0.03554309 0.6588960 0.7574625
2   survey treat 0.7064623 0.03554796 0.6589064 0.7574505

3.6 References

1.
Therneau TM. A package for survival analysis in R. 2024. Available at: https://CRAN.R-project.org/package=survival.
2.
Lumley T. survey: Analysis of complex survey samples. 2024.
3.
Buuren S van, Groothuis-Oudshoorn K. mice: Multivariate imputation by chained equations in R. Journal of Statistical Software 2011; 45: 1–67. doi:10.18637/jss.v045.i03.
4.
Pishgar F, Greifer N, Leyrat C, Stuart E. MatchThem: Matching and Weighting after Multiple Imputation. R Journal 2021; 13: 292–305. doi:10.32614/RJ-2021-073.
5.
Sjoberg DD, Whiting K, Curry M, Lavery JA, Larmarange J. Reproducible summary tables with the gtsummary package. R Journal 2021; 13: 570–580. doi:10.32614/RJ-2021-053.
6.
Shah AD, Bartlett JW, Carpenter J, Nicholas O, Hemingway H. Comparison of random forest and parametric imputation models for imputing missing data using MICE: a CALIBER study. American Journal of Epidemiology 2014; 179: 764–774. doi:10.1093/aje/kwt312.
7.
Weberpals J, Raman S, Shaw P, et al. A Principled Approach to Characterize and Analyze Partially Observed Confounder Data from Electronic Health Records. Clinical Epidemiology 2024; 16: 329–343. doi:10.2147/clep.s436131.
8.
Stekhoven DJ, Bühlmann P. MissForest: Non-parametric missing value imputation for mixed-type data. Bioinformatics 2012; 28: 112–118. doi:10.1093/bioinformatics/btr597.
9.
Schoenfeld DA. Sample-size formula for the proportional-hazards regression model. Biometrics 1983; 39: 499–503.

3.7 Session info

Script runtime: 1.10 minutes.

pander::pander(subset(data.frame(sessioninfo::package_info()), attached==TRUE, c(package, loadedversion)))
  package loadedversion
  cobalt 4.6.1
  dplyr 1.1.4
  encore.analytics 0.2.1
  furrr 0.3.1
  future 1.68.0
  ggplot2 4.0.1
  gsDesign 3.8.0
  gt 1.2.0
  gtsummary 2.5.0
  here 1.0.2
  MatchThem 1.2.1
  Matrix 1.7-3
  mice 3.19.0
  parallelly 1.46.0
  ranger 0.17.0
  survey 4.4-8
  survival 3.8-3
  yaml 2.3.12
pander::pander(sessionInfo())

R version 4.5.1 (2025-06-13)

Platform: x86_64-pc-linux-gnu

locale: LC_CTYPE=C.UTF-8, LC_NUMERIC=C, LC_TIME=C.UTF-8, LC_COLLATE=C.UTF-8, LC_MONETARY=C.UTF-8, LC_MESSAGES=C.UTF-8, LC_PAPER=C.UTF-8, LC_NAME=C, LC_ADDRESS=C, LC_TELEPHONE=C, LC_MEASUREMENT=C.UTF-8 and LC_IDENTIFICATION=C

attached base packages: grid, stats, graphics, grDevices, datasets, utils, methods and base

other attached packages: ggplot2(v.4.0.1), gt(v.1.2.0), yaml(v.2.3.12), encore.analytics(v.0.2.1), gsDesign(v.3.8.0), cobalt(v.4.6.1), furrr(v.0.3.1), future(v.1.68.0), ranger(v.0.17.0), parallelly(v.1.46.0), gtsummary(v.2.5.0), here(v.1.0.2), survey(v.4.4-8), Matrix(v.1.7-3), MatchThem(v.1.2.1), mice(v.3.19.0), survival(v.3.8-3) and dplyr(v.1.1.4)

loaded via a namespace (and not attached): tidyselect(v.1.2.1), farver(v.2.1.2), S7(v.0.2.1), smd(v.0.8.0), fastmap(v.1.2.0), digest(v.0.6.39), rpart(v.4.1.24), lifecycle(v.1.0.4), magrittr(v.2.0.4), compiler(v.4.5.1), sass(v.0.4.10), rlang(v.1.1.6), tools(v.4.5.1), knitr(v.1.51), labeling(v.0.4.3), htmlwidgets(v.1.6.4), xml2(v.1.5.1), r2rtf(v.1.2.0), RColorBrewer(v.1.1-3), withr(v.3.0.2), purrr(v.1.2.0), nnet(v.7.3-20), jomo(v.2.7-6), xtable(v.1.8-4), globals(v.0.18.0), scales(v.1.4.0), iterators(v.1.0.14), MASS(v.7.3-65), cli(v.3.6.5), rmarkdown(v.2.30), crayon(v.1.5.3), reformulas(v.0.4.3), generics(v.0.1.4), sessioninfo(v.1.2.3), commonmark(v.2.0.0), minqa(v.1.2.8), DBI(v.1.2.3), pander(v.0.6.6), stringr(v.1.6.0), splines(v.4.5.1), assertthat(v.0.2.1), parallel(v.4.5.1), base64enc(v.0.1-3), mitools(v.2.4), vctrs(v.0.6.5), WeightIt(v.1.5.1), boot(v.1.3-31), glmnet(v.4.1-10), sandwich(v.3.1-1), jsonlite(v.2.0.0), litedown(v.0.9), mitml(v.0.4-5), listenv(v.0.10.0), locfit(v.1.5-9.12), foreach(v.1.5.2), tidyr(v.1.3.2), glue(v.1.8.0), nloptr(v.2.2.1), pan(v.1.9), chk(v.0.10.0), codetools(v.0.2-20), stringi(v.1.8.7), shape(v.1.4.6.1), gtable(v.0.3.6), lme4(v.1.1-38), tibble(v.3.3.0), pillar(v.1.11.1), htmltools(v.0.5.9), R6(v.2.6.1), Rdpack(v.2.6.4), rprojroot(v.2.1.1), evaluate(v.1.0.5), lattice(v.0.22-7), markdown(v.2.0), cards(v.0.7.1), tictoc(v.1.2.1), rbibutils(v.2.4), backports(v.1.5.0), MatchIt(v.4.7.2), broom(v.1.0.11), simsurv(v.1.0.0), renv(v.1.1.5), cardx(v.0.3.1), Rcpp(v.1.1.0), nlme(v.3.1-168), xfun(v.0.55), forcats(v.1.0.1), fs(v.1.6.6), zoo(v.1.8-15) and pkgconfig(v.2.0.3)

pander::pander(options('repos'))
  • repos:
    CRAN: https://cran.rstudio.com
    PositPackageManager: https://packagemanager.posit.co/cran/linux/noble/latest