
Simulates an artificial EHR-derived analysis-ready oncology dataset
Source: R/simulate_data.R
Parameterized function to quickly create an EHR-derived analytic cohort for developing analysis code.
Details
This function simulates a cohort of oncology patients. Time-to-event outcomes are drawn from a Weibull distribution, and treatment assignment is drawn from a logistic distribution. Missingness can optionally be imposed on the data (imposeNA, propNA). The remaining parameters set the number of patients to simulate (n_total), the seed for reproducibility (seed), and whether to include a patient ID variable (include_id).
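To make the description above concrete, here is a minimal base R sketch of the same general idea: logistic treatment assignment, Weibull event times, and optional missingness. It is not the code in R/simulate_data.R. The function name simulate_sketch, the single age covariate, all coefficients, the 24-month censoring time, and the choice to impose missingness only on age are assumptions made purely for illustration.

```r
# Illustrative sketch only; NOT the implementation behind simulate_data().
# All names, coefficients, and the censoring rule are hypothetical.
simulate_sketch <- function(n = 1000, seed = 41, prop_na = 0.33) {
  set.seed(seed)

  # One baseline covariate (assumed): age in years
  age <- rnorm(n, mean = 65, sd = 10)

  # Treatment assignment: Bernoulli draw with a logistic (inverse-logit) probability
  p_treat <- plogis(-0.5 + 0.02 * (age - 65))
  treat <- rbinom(n, size = 1, prob = p_treat)

  # Time to event: Weibull draws whose scale depends on treatment
  time <- rweibull(n, shape = 1.2, scale = 12 * exp(0.3 * treat))

  # Administrative censoring at 24 months (assumed)
  event <- as.integer(time <= 24)
  time <- pmin(time, 24)

  dat <- data.frame(age = age, treat = treat, time = time, event = event)

  # Impose missingness completely at random on the covariate only
  dat$age[runif(n) < prop_na] <- NA
  dat
}

d <- simulate_sketch(n = 200, seed = 1)
head(d)
```

Fixing the seed inside the function means that repeated calls with the same arguments return the same data frame, which is the property the seed argument is meant to provide.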
Examples
if (FALSE) { # \dontrun{
library(encore.analytics)
# Original uniform missingness
data_uniform <- simulate_data(
  n_total = 3500,
  seed = 41,
  include_id = FALSE,
  imposeNA = TRUE,
  propNA = 0.33
)
# Now creates variable missingness across columns with average of 0.33
data_variable <- simulate_data(
  n_total = 3500,
  seed = 41,
  include_id = FALSE,
  imposeNA = TRUE,
  propNA = 0.33
)
head(data_variable)
} # }
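One way to check how missingness is spread across columns is to compute the proportion of NA values per column and then average those proportions. The sketch below uses a small toy data frame so it runs without the package. The helper name na_summary is hypothetical and is not part of encore.analytics.

```r
# Hypothetical helper: per-column missingness and its average across columns
na_summary <- function(df) {
  per_column <- colMeans(is.na(df))  # share of NA values in each column
  list(per_column = per_column, average = mean(per_column))
}

# Toy data with different amounts of missingness per column
toy <- data.frame(
  a = c(1, NA, 3, NA),
  b = c(NA, 2, 3, 4),
  c = c(1, 2, 3, 4)
)

res <- na_summary(toy)
res$per_column
res$average
```

The same helper could be applied to a simulated dataset, for example na_summary(data_variable), to compare the observed average against the requested propNA.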