---
title: "Simple Two-Stage Estimation"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Simple Two-Stage Estimation}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(trtswitch)
library(dplyr, warn.conflicts = FALSE)
```

# Introduction
The two-stage estimation (TSE) approach first estimates the effect of treatment switching on survival, then uses this estimate to derive counterfactual survival times if switching had not occurred. A key requirement of the TSE approach is that treatment switching must occur at or after a disease-related "secondary baseline" time point, typically defined by disease progression.

The TSE method assumes the no unmeasured confounding condition, meaning treatment switching must be independent of potential outcomes, provided all relevant patient characteristics at the secondary baseline are measured and included in the model. The simple TSE method further assumes that no time-dependent confounding exists between the secondary baseline and the time of switching.

# Estimation of $\psi$
The simple TSE method involves applying an accelerated failure time (AFT) model that compares post-progression survival in control group switchers with post-progression survival in control group non-switchers. Prognostic variables measured at the secondary baseline are included to account for differences between the groups. The treatment effect of switching is estimated as a time ratio and used to adjust the survival times of switchers.

Although $\psi$ is estimated solely from control group patients who experienced disease progression, it will also be applied to adjust the survival times of patients who switched treatments before disease progression, under the assumption that there are only a limited number of such cases.

# Estimation of Hazard Ratio
Once $\psi$ has been estimated, we can derive an adjusted data set and fit a (potentially stratified) Cox proportional hazards model to the adjusted data set to obtain an estimate of the hazard ratio. The confidence interval for the hazard ratio can be derived by bootstrapping the entire adjustment and subsequent model-fitting process.

# Example
We start by preparing the data.
```{r data}
# the eventual survival time
shilong1 <- shilong %>%
  arrange(bras.f, id, tstop) %>%
  group_by(bras.f, id) %>%
  slice(n()) %>%
  select(-c("ps", "ttc", "tran"))

# the last value of time-dependent covariates before pd
shilong2 <- shilong %>%
  filter(pd == 0 | tstart <= dpd) %>%
  arrange(bras.f, id, tstop) %>%
  group_by(bras.f, id) %>%
  slice(n()) %>%
  select(bras.f, id, ps, ttc, tran)

# combine baseline and time-dependent covariates
shilong3 <- shilong1 %>%
  left_join(shilong2, by = c("bras.f", "id"))
```

Next we apply the simple TSE method. 
```{r analysis}
fit1 <- tsesimp(
  data = shilong3, time = "tstop", event = "event",
  treat = "bras.f", censor_time = "dcut", pd = "pd",
  pd_time = "dpd", swtrt = "co", swtrt_time = "dco",
  base_cov = c("agerand", "sex.f", "tt_Lnum", "rmh_alea.c",
                "pathway.f"),
  base2_cov = c("agerand", "sex.f", "tt_Lnum", "rmh_alea.c",
                "pathway.f", "ps", "ttc", "tran"),
  aft_dist = "weibull", alpha = 0.05,
  recensor = TRUE, swtrt_control_only = FALSE, offset = 1,
  boot = FALSE)
```

We can examine the Weibull AFT model fits and the corresponding value of $\hat{\psi}$. 
```{r aft}
# control group
fit1$fit_aft[[1]]$fit$parest[, c("param", "beta", "sebeta", "z")]

fit1$psi

# experimental group
fit1$fit_aft[[2]]$fit$parest[, c("param", "beta", "sebeta", "z")]

fit1$psi_trt
```

Now we fit the outcome Cox model and compare the treatment hazard ratio estimate with the reported. 
```{r cox}
fit1$fit_outcome$parest[, c("param", "beta", "sebeta", "z")]

c(fit1$hr, fit1$hr_CI)
```

Finally, to ensure the uncertainty is accurately represented, the entire adjustment process and subsequent survival modeling can be bootstrapped.
````{verbatim}

```{r boot}
fit2 <- tsesimp(
  data = shilong3, time = "tstop", event = "event",
  treat = "bras.f", censor_time = "dcut", pd = "pd",
  pd_time = "dpd", swtrt = "co", swtrt_time = "dco",
  base_cov = c("agerand", "sex.f", "tt_Lnum", "rmh_alea.c",
                "pathway.f"),
  base2_cov = c("agerand", "sex.f", "tt_Lnum", "rmh_alea.c",
                "pathway.f", "ps", "ttc", "tran"),
  aft_dist = "weibull", alpha = 0.05,
  recensor = TRUE, swtrt_control_only = FALSE, offset = 1,
  boot = TRUE, n_boot = 1000, seed = 12345)

c(fit2$hr, fit2$hr_CI)
```

````