6 Agreement Metrics

Agreement metrics for comparing RCT and RWE results

In this vignette, we demonstrate how to use the agreement_metrics() and smd_agreement() functions from the encore.analytics package to quantitatively assess the agreement between results from randomized controlled trials (RCTs) and their real-world evidence (RWE) emulations. These metrics are particularly important in the context of trial emulation studies, where we aim to understand how well RWE analyses can reproduce or complement RCT findings.

6.1 Background

The use of real-world evidence for regulatory and clinical decision-making has gained significant attention, particularly with the FDA’s Real World Evidence Program¹. However, a key question remains: how well do RWE studies align with RCT results when attempting to answer similar clinical questions? Recent research by Wang et al.² in the RCT-DUPLICATE initiative has shown that when RWE studies closely emulate trial design elements, they can achieve high concordance with RCT results, with correlation coefficients as high as 0.93 (95% CI: 0.79-0.97) in well-emulated studies.

6.2 Overview

The agreement_metrics() function provides a comprehensive framework to evaluate the concordance between RCT and RWE results using three complementary metrics that have been validated in large-scale emulation studies²:

Statistical significance agreement
Estimate agreement
Standardized mean difference (SMD) agreement

6.3 Methodology

6.3.1 Types of Agreement

The package implements three complementary approaches to assess agreement:

Statistical Significance Agreement: Evaluates whether the RCT and RWE results align in terms of both direction and statistical significance.
Estimate Agreement: Determines whether the RWE point estimate falls within the confidence interval of the RCT result.
SMD Agreement: Calculates a standardized mean difference that accounts for both the magnitude of difference between estimates and their uncertainty.
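As a rough illustration of the first two checks, the base R sketch below encodes one plausible reading of the definitions above. The helper names classify(), sig_agreement() and est_agreement() are hypothetical and are not part of encore.analytics; the package's exact rules (for example, how two non-significant results are treated) may differ. The numeric inputs are the hazard ratios and confidence limits used in the examples of this vignette.

```r
# Hypothetical helpers; NOT the encore.analytics implementation.
# Assumes ratio measures (e.g., hazard ratios) with a null value of 1.

# Classify a result by where its confidence interval sits relative to the null:
# "lower" (entirely below), "higher" (entirely above), or "ns" (includes the null)
classify <- function(lower, upper, null = 1) {
  ifelse(upper < null, "lower", ifelse(lower > null, "higher", "ns"))
}

# Assumed rule: RCT and RWE agree when both results fall into the same class
sig_agreement <- function(rct_lower, rct_upper, rwe_lower, rwe_upper, null = 1) {
  classify(rct_lower, rct_upper, null) == classify(rwe_lower, rwe_upper, null)
}

# Estimate agreement: the RWE point estimate lies within the RCT confidence interval
est_agreement <- function(rwe_estimate, rct_lower, rct_upper) {
  rwe_estimate >= rct_lower & rwe_estimate <= rct_upper
}

# RCT 0.87 (0.78 - 0.97) vs RWE 0.82 (0.76 - 0.87)
sig_main <- sig_agreement(0.78, 0.97, 0.76, 0.87)
est_main <- est_agreement(0.82, 0.78, 0.97)

# RCT 0.50 (0.40 - 0.60) vs RWE 2.00 (1.80 - 2.20): significant in opposite directions
sig_opposite <- sig_agreement(0.40, 0.60, 1.80, 2.20)
```

Under these assumed rules, the first comparison agrees on both checks, while the second fails the significance check because the two intervals lie on opposite sides of the null.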

6.3.2 Mathematical Details

The SMD calculation, which is particularly important for quantifying agreement, follows this methodology:

For estimates \(\theta_{RCT}\) and \(\theta_{RWE}\) (on the log scale for ratio measures such as hazard ratios) with their respective variances, the SMD is calculated as:

\[SMD = \frac{\theta_{RCT} - \theta_{RWE}}{\sqrt{Var(\theta_{RCT}) + Var(\theta_{RWE})}}\]

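where each variance is derived from the corresponding 95% confidence interval, with upper and lower denoting the log-scale confidence limits and the log-scale estimate assumed to be normally distributed: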

\[Var(\theta) = \left(\frac{upper - lower}{2 \times 1.96}\right)^2\]

The default threshold for SMD agreement is ±1.96, corresponding to a two-sided α = 0.05: results are classified as agreeing when the SMD lies within this range.
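The two formulas can be traced by hand with a few lines of base R. This is only a sketch that mirrors the equations above, not the package's own code; ci_variance() and smd_sketch() are hypothetical names, and the inputs are the hazard ratios from the simple comparison used in this vignette (RCT 0.87, 0.78 - 0.97; RWE 0.82, 0.76 - 0.87).

```r
# Hypothetical helpers mirroring the formulas above; not part of encore.analytics.

# Variance implied by a 95% confidence interval (limits already on the log scale)
ci_variance <- function(lower, upper) {
  ((upper - lower) / (2 * 1.96))^2
}

# rct and rwe: numeric vectors c(estimate, lower, upper) on the ratio scale
smd_sketch <- function(rct, rwe) {
  rct <- log(rct)  # move to the log scale before differencing
  rwe <- log(rwe)
  (rct[1] - rwe[1]) /
    sqrt(ci_variance(rct[2], rct[3]) + ci_variance(rwe[2], rwe[3]))
}

smd_value <- smd_sketch(rct = c(0.87, 0.78, 0.97), rwe = c(0.82, 0.76, 0.87))
round(smd_value, 2)           # 0.9
abs(smd_value) <= 1.96        # TRUE, i.e. within the default threshold
```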

6.4 Example Application

Let’s walk through examples of using these functions:

library(dplyr)
library(tibble)
library(encore.analytics)

6.4.1 Simple Comparison

First, let’s look at a simple comparison between one RCT and RWE result:

# Create example data
x <- tribble(
  ~Analysis, ~rct_estimate, ~rct_lower, ~rct_upper, ~rwe_estimate, ~rwe_lower, ~rwe_upper,
  "Main analysis", 0.87, 0.78, 0.97, 0.82, 0.76, 0.87
  )

# Calculate agreement metrics
agreement_metrics(x, analysis_col = "Analysis")

Analysis	RCT HR (95% CI)	RWE HR (95% CI)	Statistical significance agreement	Estimate agreement	SMD
Main analysis	0.87 (0.78 - 0.97)	0.82 (0.76 - 0.87)	Yes	Yes	Yes (0.90)
Abbreviations: CI = Confidence interval, HR = Hazard ratio, RCT = Randomized controlled trial, RWE = Real-world evidence, SMD = standardized mean difference (based on log hazard ratios)

6.4.2 Multi-Database Comparison

Now let’s examine agreement across multiple databases:

# Create multi-database example
x_multi <- tribble(
  ~Analysis, ~Database, ~rct_estimate, ~rct_lower, ~rct_upper, ~rwe_estimate, ~rwe_lower, ~rwe_upper,
  "Main analysis", "Database 1", 0.87, 0.78, 0.97, 0.82, 0.76, 0.87,
  "Main analysis", "Database 2", 0.50, 0.40, 0.60, 2.00, 1.80, 2.20,
  "Main analysis", "Database 3", 0.80, 0.70, 0.90, 1.50, 1.40, 1.60
  )

# Calculate agreement metrics with grouping
agreement_metrics(x_multi, 
                 analysis_col = "Analysis", 
                 group_col = "Database")

Database	Analysis	RCT HR (95% CI)	RWE HR (95% CI)	Statistical significance agreement	Estimate agreement	SMD
Database 1	Main analysis	0.87 (0.78 - 0.97)	0.82 (0.76 - 0.87)	Yes	Yes	Yes (0.90)
Database 2	Main analysis	0.50 (0.40 - 0.60)	2.00 (1.80 - 2.20)	No	No	No (-12.01)
Database 3	Main analysis	0.80 (0.70 - 0.90)	1.50 (1.40 - 1.60)	No	No	No (-8.66)
Abbreviations: CI = Confidence interval, HR = Hazard ratio, RCT = Randomized controlled trial, RWE = Real-world evidence, SMD = standardized mean difference (based on log hazard ratios)

6.4.3 Detailed SMD Calculation

To understand the SMD calculation in detail:

# Calculate SMD for one comparison
smd <- smd_agreement(
  rct_estimate = log(0.87),
  rct_lower = log(0.78),
  rct_upper = log(0.97),
  rwe_estimate = log(0.82),
  rwe_lower = log(0.76),
  rwe_upper = log(0.87)
  )

print(paste("SMD value:", round(smd, 2)))

[1] "SMD value: 0.9"
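Because the SMD is compared against ±1.96, it can be read like a z statistic under the normal approximation. The base R sketch below, which is an interpretation aid and not output of the package, converts the rounded SMD printed above into a two-sided p-value and shows how a threshold for a different α would be derived. Whether and how the package accepts a custom threshold is not shown here; the variable names are illustrative.

```r
# Interpretation aid under a normal approximation; not part of encore.analytics.

smd_example <- 0.90                       # rounded SMD printed above

# Two-sided p-value for the difference between the RCT and RWE estimates
p_value <- 2 * pnorm(-abs(smd_example))
p_value > 0.05                            # TRUE: no evidence of disagreement at alpha = 0.05

# Threshold corresponding to another significance level, e.g. alpha = 0.10
alpha <- 0.10
threshold_10 <- qnorm(1 - alpha / 2)      # about 1.645
```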

6.5 Interpretation

The output table from agreement_metrics() provides a comprehensive view of agreement:

RCT and RWE Estimates: Shows point estimates and 95% confidence intervals
Statistical Significance Agreement: Indicates whether the results agree in direction and significance
Estimate Agreement: Shows whether the RWE estimate falls within the RCT confidence interval
SMD Agreement: Provides the standardized difference with a threshold-based assessment

Results should be interpreted considering:

Clinical relevance of differences
Quality and characteristics of both RCT and RWE studies
Context-specific tolerance for disagreement

6.6 Technical Details

Important considerations when using these functions:

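Estimates passed to agreement_metrics() should be positive ratio measures (e.g., hazard ratios, odds ratios)
Estimates are log-transformed for the SMD calculation; in the example above, smd_agreement() is given values that are already log-transformed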
Confidence intervals are assumed to be 95% intervals
The default SMD threshold of 1.96 corresponds to α=0.05
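If a source reports an interval at a level other than 95%, one option is to recover the log-scale standard error and rebuild a 95% interval before computing the metrics. The sketch below assumes a ratio measure whose logarithm is approximately normally distributed; to_ci95() is a hypothetical helper, not a package function, and the 90% interval in the usage line is made up purely for illustration.

```r
# Hypothetical helper; not part of encore.analytics.
# Rebuilds a 95% CI from an interval reported at another confidence level,
# assuming the log of the ratio measure is approximately normal.
to_ci95 <- function(estimate, lower, upper, level) {
  stopifnot(estimate > 0, lower > 0, upper > 0, level > 0, level < 1)
  z_level <- qnorm(1 - (1 - level) / 2)                # critical value of the reported level
  se_log <- (log(upper) - log(lower)) / (2 * z_level)  # standard error on the log scale
  c(estimate = estimate,
    lower = exp(log(estimate) - 1.96 * se_log),
    upper = exp(log(estimate) + 1.96 * se_log))
}

# Illustrative (made-up) 90% interval for a hazard ratio of 0.82
ci95 <- to_ci95(estimate = 0.82, lower = 0.77, upper = 0.87, level = 0.90)
```

The rebuilt interval is wider than the reported 90% interval, as expected, and the positivity check fails early on inputs that cannot be log-transformed.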

6.7 Interpretation Guidelines

When interpreting agreement metrics, Wang et al.² suggest several important considerations:

Context Matters: Agreement thresholds may vary depending on the clinical context and the feasibility of emulating specific trial design elements.
Design Emulation: Higher agreement is typically observed when RWE studies can closely emulate key trial design elements (population, intervention, comparator, outcome, and timing).
Multiple Metrics: Using multiple agreement metrics provides a more complete picture than any single metric alone. For example:
- Statistical significance agreement captures directional alignment
- Estimate agreement ensures magnitude compatibility
- SMD provides a standardized measure of difference accounting for uncertainty
Limitations: Discrepancies between RCT and RWE results may arise from:
- Residual confounding in RWE studies
- Different patient populations
- Measurement challenges in real-world data
- Random variation

6.8 References

1. US Food and Drug Administration. Framework for FDA’s real-world evidence program. 2018. Available at: https://www.fda.gov/media/120060/download.

2. Wang SV, Schneeweiss S, Franklin JM, et al. Emulation of randomized clinical trials with nonrandomized database analyses: Results of 32 clinical trials. JAMA 2023; 329: 1376–1385. doi:10.1001/jama.2023.4221.

6.9 Session info

Script runtime: minutes.

pander::pander(subset(data.frame(sessioninfo::package_info()), attached==TRUE, c(package, loadedversion)))

package	loadedversion
dplyr	1.1.4
encore.analytics	0.2.1
tibble	3.3.0

pander::pander(sessionInfo())

R version 4.5.1 (2025-06-13)

Platform: x86_64-pc-linux-gnu

locale: LC_CTYPE=C.UTF-8, LC_NUMERIC=C, LC_TIME=C.UTF-8, LC_COLLATE=C.UTF-8, LC_MONETARY=C.UTF-8, LC_MESSAGES=C.UTF-8, LC_PAPER=C.UTF-8, LC_NAME=C, LC_ADDRESS=C, LC_TELEPHONE=C, LC_MEASUREMENT=C.UTF-8 and LC_IDENTIFICATION=C

attached base packages: stats, graphics, grDevices, datasets, utils, methods and base

other attached packages: encore.analytics(v.0.2.1), tibble(v.3.3.0) and dplyr(v.1.1.4)

loaded via a namespace (and not attached): jsonlite(v.2.0.0), compiler(v.4.5.1), renv(v.1.1.5), Rcpp(v.1.1.0), tidyselect(v.1.2.1), xml2(v.1.5.1), stringr(v.1.6.0), assertthat(v.0.2.1), tidyr(v.1.3.2), yaml(v.2.3.12), fastmap(v.1.2.0), R6(v.2.6.1), commonmark(v.2.0.0), generics(v.0.1.4), knitr(v.1.51), htmlwidgets(v.1.6.4), MASS(v.7.3-65), pander(v.0.6.6), pillar(v.1.11.1), rlang(v.1.1.6), stringi(v.1.8.7), litedown(v.0.9), xfun(v.0.55), fs(v.1.6.6), sass(v.0.4.10), smd(v.0.8.0), cli(v.3.6.5), withr(v.3.0.2), magrittr(v.2.0.4), tictoc(v.1.2.1), digest(v.0.6.39), markdown(v.2.0), base64enc(v.0.1-3), lifecycle(v.1.0.4), vctrs(v.0.6.5), evaluate(v.1.0.5), glue(v.1.8.0), sessioninfo(v.1.2.3), gt(v.1.2.0), rmarkdown(v.2.30), purrr(v.1.2.0), tools(v.4.5.1), pkgconfig(v.2.0.3) and htmltools(v.0.5.9)

pander::pander(options('repos'))

repos:

CRAN: https://cran.rstudio.com
PositPackageManager: https://packagemanager.posit.co/cran/linux/noble/latest