%\VignetteIndexEntry{Multi-state modelling with flexsurv} %\VignetteEngine{knitr::knitr} % \documentclass[article,shortnames]{jss} \documentclass[article,shortnames,nojss,nofooter]{jss} \usepackage{bm} \usepackage{tabularx} \usepackage{graphics} \usepackage{alltt} \usepackage[utf8]{inputenc} <>= options(width=60) options(prompt="R> ") library(knitr) opts_chunk$set(fig.path="flexsurv-") render_sweave() @ %% need no \usepackage{Sweave.sty} \author{Christopher H. Jackson \\ MRC Biostatistics Unit, Cambridge, UK} \title{Flexible parametric multi-state modelling with flexsurv} \Plainauthor{Christopher Jackson, MRC Biostatistics Unit} \Address{ Christopher Jackson\\ MRC Biostatistics Unit\\ Cambridge Institute of Public Health\\ Robinson Way\\ Cambridge, CB2 0SR, U.K.\\ E-mail: \email{chris.jackson@mrc-bsu.cam.ac.uk} } \Abstract{ A \emph{multi-state model} represents how an individual moves between multiple states in continuous time. Survival analysis is a special case with two states, ``alive'' and ``dead''. \emph{Competing risks} are a further special case, where there are multiple causes of death, that is, one starting state and multiple possible destination states. This vignette describes the two forms of multi-state or competing risks models that can be implemented in \pkg{flexsurv}. Section~\ref{sec:causespec} describes models based on \emph{cause-specific hazards}. Section~\ref{sec:mixture} describes a less commonly-used approach based on \emph{mixture models}. Cause-specific hazards models tend to be faster to fit, whereas the parameters of the mixture models are easier to interpret. } \Keywords{multi-state models, multistate models, competing risks} \begin{document} \section{Overview} \label{sec:multistate} This vignette describes multi-state models for data where there are a series of event times $t_{1},\dots, t_{n}$ for an individual, corresponding to changes of state. The last of these may be an observed or right-censored event time. Note \emph{panel data} are not considered here --- that is, observations of the state of the process at an arbitrary set of times \citep{kalbfleisch:lawless}. In panel data, we do not necessarily know the time of each transition, or even whether transitions of a certain type have occurred at all between a pair of observations. Multi-state models for that type of data (and also exact event times) can be fitted with the \pkg{msm} package for \proglang{R} \citep{msmjss}, but are restricted to (piecewise) exponential event time distributions. Knowing the exact event times enables much more flexible models, which \pkg{flexsurv} can fit. The \pkg{flexsurv} package provides two general frameworks for multi-state modelling. \begin{enumerate} \item The most common framework is based on \emph{cause-specific hazards} of competing risks. This is an extension of standard survival modelling, and is explained in Section~\ref{sec:causespec}. \item An alternative approach is based on \emph{mixture models}. For example, suppose there are three competing states that an individual in state $r$ may move to next. People are then classified into three \emph{mixture components}, defined by the state the person moves to. The probabilities of each next state, and the distribution of the time of moving to each state, are estimated jointly. These quantities are easier to interpret than cause specific hazards. This approach was originally described by \citet{larson1985mixture}. In Section~\ref{sec:mixture} we explain how to implement it using the \code{flexsurvmix} function. \end{enumerate} \section{Multi-state modelling using cause-specific hazards} \label{sec:causespec} Given that an individual is in state $X(t)$ at time $t$, their next state, and the time of the change, are governed by a set of \emph{transition intensities} or \emph{transition hazards} \[q_{rs}(t,\mathbf{z}(t),\mathcal{F}_t) = \lim_{\delta t \rightarrow 0} \Prob(X(t+\delta t) = s | X(t) = r, \mathbf{z}(t), \mathcal{F}_t) / \delta t \] for states $r, s = 1,\dots,R$, which for a survival model are equivalent to the hazard $h(t)$. The intensity represents the instantaneous risk of moving from state $r$ to state $s$, and is zero if the transition is impossible. It may depend on covariates $\mathbf{z}(t)$, the time $t$ itself, and possibly also the ``history'' of the process up to that time, $\mathcal{F}_t$: the states previously visited or the length of time spent in them. \paragraph{Alternative time scales} In semi-Markov (``clock-reset'') models, $q_{rs}(t)$ is defined as a function of the time $t$ since entry into the current state. Any software to fit survival models can also fit this kind of multi-state model, as the following sections will explain. In an inhomogeneous Markov model, $t$ represents the time since the beginning of the process (that is, a ``clock-forward'' scale is used), but the intensity $q_{rs}(t)$ does not depend further on $\mathcal{F}_t$. Again, standard survival modelling software can be used, with the additional requirement that it can deal with left-truncation or \emph{counting process} data, which \code{survreg}, for example, does not currently support. These approaches are equivalent for competing risks models, since there is at most one transition for each individual, so that the time since the beginning of the process equals the time spent in the current state. Therefore no left-truncation is necessary. Note also that in a clock-reset model, the time since the beginning of the process may enter the model as a covariate. Likewise, in a clock-forward model, the time spent in the current state may enter as a covariate, in which case the model is no longer Markov. \paragraph{Example} For illustration, consider a simple three-state example, previously studied by \citet{heng:paper}. Recipients of lung transplants are are risk of bronchiolitis obliterans syndrome (BOS). This was defined as a decrease in lung function to below 80\% of a baseline value defined in the six months following transplant. A three-state ``illness-death'' model represents the risk of developing BOS, the risk of dying before developing BOS, and the risk of death after BOS. BOS is assumed to be irreversible, so there are only three allowed transitions (Figure \ref{fig:bosmsm}), each with an intensity function $q_{rs}(t)$. \begin{figure}[h] \centering \scalebox{0.6}{\includegraphics{bosmsm.pdf}} \caption{Three-state multi-state model for bronchiolitis obliterans syndrome (BOS).} \label{fig:bosmsm} \end{figure} \subsection{Representing multi-state data as survival data} \label{sec:multistate:data} \citet{andersen:keiding} and \citet{putter:mstate} explain how to implement multi-state models by manipulating the data into a suitable form for survival modelling software --- an overview is given here. For each permitted $r \rightarrow s$ transition in the multi-state model, there is a corresponding ``survival'' (time-to-event) model, with hazard rates defined by $q_{rs}(t)$. For a patient who enters state $r$ at time $t_{j}$, their next event at $t_{j+1}$ is defined by the model structure to be one of a set of competing events $s_1,\ldots,s_{n_r}$. This implies there are $n_r$ corresponding survival models for this state $r$, and $\sum_r n_r$ models over all states $r$. In the BOS example, there are $n_1=2,n_2=1$ and $n_3=0$ possible transitions from states 1, 2 and 3 respectively. The data to inform the $n_r$ models from state $r$ consists firstly of an indicator for whether the transition to the corresponding state $s_1,\ldots,s_{n_r}$ is observed or censored at $t_{j+1}$. If the individual moves to state $s_k$, the transitions to all other states in this set are censored at this time. This indicator is coupled with: \begin{itemize} \item (for a semi-Markov model) the time elapsed $dt_{j} = t_{j+1} - t_{j}$ from state $r$ entry to state $s$ entry. The ``survival'' model for the $r \rightarrow s$ transition is fitted to this time. \item (for an inhomogeneous Markov model) the start and stop time $(t_j,t_{j+1})$, as in \S\ref{sec:censtrunc}. The $r \rightarrow s$ model is fitted to the right-censored time $t_{j+1}$ from the \emph{start of the process}, but is conditional on not experiencing the $r \rightarrow s$ transition until after the state $r$ entry time. In other words, the $r \rightarrow s$ transition model is \emph{left-truncated} at the state $r$ entry time. \end{itemize} In this form, the outcomes of two patients in the BOS data are <<>>= library(flexsurv) bosms3[18:22, ] @ Each row represents an observed (\code{status = 1}) or censored (\code{status = 0}) transition time for one of three time-to-event models indicated by the categorical variable \code{trans} (defined as a factor). Times are expressed in years, with the baseline time 0 representing six months after transplant. Values of \code{trans} of 1, 2, 3 correspond to no BOS$\rightarrow$BOS, no BOS$\rightarrow$death and BOS$\rightarrow$death respectively. The first row indicates that the patient (\code{id} 7) moved from state 1 (no BOS) to state 2 (BOS) at 0.17 years, but (second row) this is also interpreted as a censored time of moving from state 1 to state 3, potential death before BOS onset. This patient then died, given by the third row with \code{status} 1 for \code{trans} 3. Patient 8 died before BOS onset, therefore at 8.2 years their potential BOS onset is censored (fourth row), but their death before BOS is observed (fifth row). The \pkg{mstate} \proglang{R} package \citep{mstate:cmpb,mstate:jss} has a utility \code{msprep} to produce data of this form from ``wide-format'' datasets where rows represent individuals, and times of different events appear in different columns. \pkg{msm} has a similar utility \code{msm2Surv} for producing the required form given longitudinal data where rows represent state observations. \subsection{Multi-state model likelihood with cause-specific hazards} \label{sec:multistate:lik} After forming survival data as described above, a multi-state model can be fitted by maximising the standard survival model likelihood~(given at the start of the main \pkg{flexsurv} vignette), $l(\bm{\theta} | \mathbf{x}) = \prod_i l_i(\bm{\theta}|x_i)$, where $\mathbf{x}$ is the data, and $i$ now indexes multiple observations for multiple individuals. This can also be written as a product over the $K=\sum_r n_r$ transitions $k$, and the $m_k$ observations $j$ pertaining to the $k$th transition. The transition type will typically enter this model as a categorical covariate --- see the examples in the next section. \begin{equation} \label{eq:lik:multistate} l(\bm{\theta} | \mathbf{x}) = \prod_{k=1}^K \prod_{j=1}^{m_k} l_{jk}(\bm{\theta}|\mathbf{x}_{jk}) \end{equation} Therefore if the parameter vector $\bm{\theta}$ can be partitioned as $(\bm{\theta}_1|\ldots|\bm{\theta}_K)$, independent components for each transition $k$, the likelihood becomes the product of $K$ independent transition-specific likelihoods \citep{andersen:keiding}. The full multi-state model can then be fitted by maximising each of these independently, using $K$ separate calls to a survival modelling function such as \code{flexsurvreg}. This can give vast computational savings over maximising the joint likelihood for $\bm{\theta}$ with a single fit. For example, \citet{francesca:smmr} used \pkg{flexsurv} to fit a parametric multi-state model with 21 transitions and 84 parameters for over 30,000 observations, which was computationally impractical via the joint likelihood, whereas it only took about a minute to perform 21 transition-specific fits. On the other hand, if any parameters are constrained between transitions (e.g. if hazards are proportional between transitions, or the effects of covariates on different transitions are the same) then it is necessary to maximise the joint likelihood~(\ref{eq:lik:multistate}) with a single call. \subsection{Fitting parametric multi-state models with cause-specific hazards} \label{sec:multistate:fitting} \paragraph{Joint likelihood} Three multi-state models are fitted to the BOS data using \code{flexsurvreg}, firstly using a single likelihood maximisation for each model. The first two use the ``clock-reset'' time scale. \code{crexp} is a simple time-homogeneous Markov model where all transition intensities are constant through time, so that the clock-forward and clock-reset scales are identical. The time to the next event is exponentially-distributed, but with a different rate $q_{rs}$ for each transition type \code{trans}. \code{crwei} is a semi-Markov model where the times to BOS onset, death without BOS and the time from BOS onset to death all have Weibull distributions, with a different shape and scale for each transition type. \code{cfwei} is a clock-forward, inhomogeneous Markov version of the Weibull model: the 1$\rightarrow$2 and 1$\rightarrow$3 transition models are the same, but the third has a different interpretation, now the time \emph{from baseline} to death with BOS has a Weibull distribution. <<>>= crexp <- flexsurvreg(Surv(years, status) ~ trans, data = bosms3, dist = "exp") crwei <- flexsurvreg(Surv(years, status) ~ trans + shape(trans), data = bosms3, dist = "weibull") cfwei <- flexsurvreg(Surv(Tstart, Tstop, status) ~ trans + shape(trans), data = bosms3, dist = "weibull") @ \paragraph{Semi-parametric equivalents} The equivalent Cox models are also fitted using \code{coxph} from the \pkg{survival} package. These specify a different baseline hazard for each transition type through a function \code{strata} in the formula, so since there are no other covariates, they are essentially non-parametric. Note that the \code{strata} function is not currently understood by \code{flexsurvreg} --- the user must say explicitly what parameters, if any, vary with the transition type, as in \code{crwei}. <<>>= crcox <- coxph(Surv(years, status) ~ strata(trans), data = bosms3) cfcox <- coxph(Surv(Tstart, Tstop, status) ~ strata(trans), data = bosms3) @ In all cases, if there were other covariates, they could simply be included in the model formula. Typically, covariate effects will vary with the transition type, so that an interaction term with \code{trans} would be included. Some post-processing might then be needed to combine the main covariate effects and interaction terms into an easily-interpretable quantity (such as the hazard ratio for the $r,s$ transition). Alternatively, \pkg{mstate} has a utility \code{expand.covs} to expand a single covariate in the data into a set of transition-specific covariates, to aid interpretation \citep[see][]{mstate:jss}. \paragraph{Transition-specific models} In this small example, the joint likelihood can be maximised easily with a single function call, but for larger models and datasets, this may be unfeasible. A more computationally-efficient approach is to fit a list of transition-specific models, as follows. <<>>= mod_nobos_bos <- flexsurvreg(Surv(years, status) ~ 1, subset=(trans==1), data = bosms3, dist = "weibull") mod_nobos_death <- flexsurvreg(Surv(years, status) ~ 1, subset=(trans==2), data = bosms3, dist = "weibull") mod_bos_death <- flexsurvreg(Surv(years, status) ~ 1, subset=(trans==3), data = bosms3, dist = "weibull") @ We then define a matrix \code{tmat} describes the transition structure of the multi-state model , as a matrix of integers whose $r,s$ entry is $i$ if the $i$th transition type is $r,s$, and has \code{NA}s on the diagonal and where the $r,s$ transition is disallowed. <<>>= tmat <- rbind(c(NA, 1, 2), c(NA, NA, 3), c(NA, NA, NA)) crfs <- fmsm(mod_nobos_bos, mod_nobos_death, mod_bos_death, trans = tmat) @ The three transition models are then grouped together, and the transition matrix is attached, using the \code{fmsm} function. This constructs an R object, \code{crfs}, that fully describes the multistate model. The \code{fmsm} object can be supplied to the output and prediction functions described in the subsequent sections, instead of a single \code{flexsurvreg} object. However, this approach is not possible if there are constraints in the parameters across transitions, such as common covariate effects. \begin{sloppypar} Any parametric distribution can be employed in a multi-state model, just as for standard survival models, with \code{flexsurvreg}. Spline models may also be fitted with \code{flexsurvspline}, and if hazards are assumed proportional, they are expected to give similar results to the Cox model. Since \pkg{flexsurv} version 2.0, different parametric families can be used for different transitions, though in earlier versions, the same family had to be used. \end{sloppypar} \subsection{Obtaining cumulative transition-specific hazards} Multi-state models can be characterised by their cumulative $r \rightarrow s$ transition-specific hazard functions $H_{rs}(t) = \int_0^t q_{rs}(u)du$. For \emph{semi-parametric} multi-state models fitted with \code{coxph}, the function \code{msfit} in \pkg{mstate} \citep{mstate:cmpb,mstate:jss} provides piecewise-constant estimates and covariances for $H_{rs}(t)$. For the Cox models for the BOS data, <<>>= require("mstate") mrcox <- msfit(crcox, trans = tmat) mfcox <- msfit(cfcox, trans = tmat) @ \pkg{flexsurv} provides an analogous function \code{msfit.flexsurvreg} to produce cumulative hazards from \emph{fully-parametric} multi-state models in the same format. This is a short wrapper around \code{summary.flexsurvreg(..., type = "cumhaz")}, previously mentioned in \S\ref{sec:plots}. The difference from \pkg{mstate}'s method is that hazard estimates can be produced for any grid of times \code{t}, at any level of detail and even beyond the range of the data, since the model is fully parametric. The argument \code{newdata} can be used in the same way to specify a desired covariate category, though in this example there are no covariates in addition to the transition type. The name of the (factor) covariate indicating the transition type can also be supplied through the \code{tvar} argument, in this case it is the default, \code{"trans"}. <<>>= tgrid <- seq(0, 14, by = 0.1) mrwei <- msfit.flexsurvreg(crwei, t = tgrid, trans = tmat) mrexp <- msfit.flexsurvreg(crexp, t = tgrid, trans = tmat) mfwei <- msfit.flexsurvreg(cfwei, t = tgrid, trans = tmat) @ These can be plotted (Figure \ref{fig:bos:cumhaz}) to show the fit of the parametric models compared to the non-parametric estimates. Both models appear to fit adequately, though give diverging extrapolations after around 6 years when the data become sparse. The Weibull clock-reset model has an improved AIC of \Sexpr{round(crwei$AIC)}, compared to \Sexpr{round(crexp$AIC)} for the exponential model. For the $2\rightarrow 3$ transition, the clock-forward and clock-reset models give slightly different hazard trajectories. <>= cols <- c("black", "#E495A5", "#86B875", "#7DB0DD") # colorspace::rainbow_hcl(3) plot(mrcox, xlab = "Years after baseline", lwd = 3, xlim = c(0, 14), cols = cols[1:3]) for (i in 1:3){ lines(tgrid, mrexp$Haz$Haz[mrexp$Haz$trans == i], col = cols[i], lty = 2, lwd = 2) lines(tgrid, mrwei$Haz$Haz[mrwei$Haz$trans == i], col = cols[i], lty = 3, lwd = 2) } lines(mfcox$Haz$time[mfcox$Haz$trans == 3], mfcox$Haz$Haz[mfcox$Haz$trans == 3], type = "s", col = cols[4], lty = 1, lwd = 2) lines(tgrid, mfwei$Haz$Haz[mfwei$Haz$trans == 3], col = cols[4], lty = 3, lwd = 2) legend("topleft", inset = c(0, 0.2), lwd = 2, col = cols[4], c("2 -> 3 (clock-forward)"), bty = "n") legend("topleft", inset = c(0, 0.3), c("Non-parametric", "Exponential", "Weibull"), lty = c(1, 2, 3), lwd = c(3, 2, 2), bty = "n") @ \begin{figure}[h] \includegraphics{flexsurv-cumhaz-1} \caption{Cumulative hazards for three transitions in the BOS multi-state model (clock-reset), under non-parametric, exponential and Weibull models. For the $2\rightarrow 3$ transition, an alternative clock-forward scale is shown for the non-parametric and Weibull models.} \label{fig:bos:cumhaz} \end{figure} \subsection{Prediction from parametric multi-state models with cause-specific hazards} The \emph{transition probabilities} of the multi-state model are the probabilities of occupying each state $s$ at time $t > t_0$, given that the individual is in state $r$ at time $t_0$. \[ P(t_0, t) = \Prob(X(t) = s | X(t_0) = r) \] \paragraph{Markov models} For a time-inhomogeneous Markov model, these are related to the transition intensities via the Kolmogorov forward equation \[ \frac{d P(t_0,t)}{dt} = P(t_0,t) Q(t) \] with initial condition $P(t_0,t_0) = I$ \citep{cox:miller}. This can be solved numerically, as in \citet{titman:nonhomog}. This is implemented in the function \code{pmatrix.fs}, using the \pkg{deSolve} package \citep{deSolve}. This returns the full transition probability matrix $P(t_0,t)$ from time $t_0=0$ to a time or set of times $t$ specified in the call. Under the Weibull model, the probability of remaining alive and free of BOS is estimated at 0.3 at 5 years and 0.09 at 10 years: <<>>= pmatrix.fs(cfwei, t = c(5, 10), trans = tmat) @ Confidence intervals can be obtained by simulation from the asymptotic distribution of the maximum likelihood estimates --- see \code{help(pmatrix.fs)} for full details. A similar function \code{totlos.fs} is provided to estimate the expected total amount of time spent in state $s$ up to time $t$ for a process that starts in state $r$, defined as $\int_{u=0}^tP(0,u)_{rs}du$. \paragraph{Semi-Markov models} For semi-Markov models, the Kolmogorov equation does not apply, since the transition intensity matrix $Q(t)$ is no longer a deterministic function of $t$, but depends on when the transitions occur between time $t_0$ and $t$. Predictions can then be made by simulation. The function \code{sim.fmsm} simulates trajectories from parametric semi-Markov models by repeatedly generating the time to the next transition until the individual reaches an absorbing state or a specified censoring time. This requires the presence of a function to generate random numbers from the underlying parametric distribution --- and is fast for built-in distributions which use vectorised functions such as \code{rweibull}. \code{pmatrix.simfs} calculates the transition probability matrix by using \code{sim.fmsm} to simulate state histories for a large number of individuals, by default 100000. Simulation-based confidence-intervals are also available in \code{pmatrix.simfs}, at an extra computational cost, and the expected total length of stay in each state is available from \code{totlos.simfs}. <>= pmatrix.simfs(crwei, trans = tmat, t = 5) pmatrix.simfs(crwei, trans = tmat, t = 10) @ \paragraph{Prediction via mstate} Alternatively, predictions can be made by supplying the cumulative transition-specific hazards, calculated with \code{msfit.flexsurvreg}, to functions in the \pkg{mstate} package. For Markov models, the solution to the Kolmogorov equation \citep[e.g.,][]{aalen:process} is given by a product integral, which can be approximated as \[ P(t_0, t) = \prod_{i=0}^{m-1} \left\{ I + Q(t_i)dt \right\} \] where a fine grid of times $t_0,t_1,\ldots,t_m=t$ is chosen to span the prediction interval, and $Q(t_i)dt$ is the increment in the cumulative hazard matrix between times $t_i$ and $t_{i+1}$. $Q$ may also depend on other covariates, as long as these are known in advance. In \pkg{mstate}, these can be calculated with the \code{probtrans} function, applied to the cumulative hazards returned by \code{msfit}. For Cox models, the time grid is naturally defined by the observed survival times, giving the Aalen-Johansen estimator \citep{andersen}. Here, the probability of remaining alive and free of BOS is estimated at 0.27 at 5 years and 0.17 at 10 years. <<>>= ptc <- probtrans(mfcox, predt = 0, direction = "forward")[[1]] round(ptc[c(165, 193),], 3) @ For parametric models, using a similar discrete-time approximation was suggested by \citet{cook:lawless}. This is achieved by passing the object returned by \code{msfit.flexsurvreg} to \code{probtrans} in \pkg{mstate}. It can be made arbitrarily accurate by choosing a finer resolution for the grid of times when calling \code{msfit.flexsurvreg}. <<>>= ptw <- probtrans(mfwei, predt = 0, direction = "forward")[[1]] round(ptw[ptw$time %in% c(5, 10),], 3) @ \code{pstate1}--\code{pstate3} are close to the first rows of the matrices returned by \code{pmatrix.fs}. The discrepancy from the Cox model is more marked at 10 years when the data are more sparse (Figure \ref{fig:bos:cumhaz}). A finer time grid would be required to achieve a similar level of accuracy to \code{pmatrix.fs} for the point estimates, at the cost of a slower run time than \code{pmatrix.fs}. However, an advantage of \code{probtrans} is that standard errors are available more cheaply. For semi-Markov models, \pkg{mstate} provides the function \code{mssample} to produce both simulated trajectories and transition probability matrices from semi-Markov models, given the estimated piecewise-constant cumulative hazards \citep{fiocco:mstatepred}, produced by \code{msfit} or \code{msfit.flexsurvreg}, though this is generally less efficient than \code{pmatrix.simfs}. In this example, 1000 samples from \code{mssample} give estimates of transition probabilities that are accurate to within around 0.02. However with \code{pmatrix.simfs}, greater precision is achieved by simulating 100 times as many trajectories in a shorter time. <>= mssample(mrcox$Haz, trans = tmat, clock = "reset", M = 1000, tvec = c(5, 10)) mssample(mrwei$Haz, trans = tmat, clock = "reset", M = 1000, tvec = c(5, 10)) @ \subsection{Next-state probabilities} \label{sec:nextstate} In a multi-state situation we are usually interested in the probability that a person in one state will move to a specific state next, rather than to the other competing states. In the BOS example we might want to estimate the probability that a patient will die before getting BOS, or get BOS before dying. In a mixture multi-state model (Section~\ref{sec:mixture}), these are explicit parameters of the model. In a multi-state model parameterised by cause-specific hazards, these probabilities are related to the hazards through the Kolmogorov forward equation. To obtain these, we consider the full multi-state model as a set of \emph{competing risks submodels}. There is one submodel for each state a person can transition from (the "transient" states of the model). In the BOS example, there is one submodel for the transitions from no BOS (with two competing risks: BOS and death), and a second submodel for the single transition from BOS to death. The probability that the the next state after $r$ is $s$ can then be obtained in practice as the transition probability $P(X(t)=s | X(t_0)=r)$, under the submodel for transient state $r$, for a very large time $t$. \code{pmatrix.fs} can be used for this. <<>>= tmat_nobos <- rbind("No BOS"=c(NA,1,2), "BOS"=c(NA,NA,NA),"Death"=c(NA,NA,NA)) crfs_nobos <- fmsm(mod_nobos_bos, mod_nobos_death, trans = tmat_nobos) pmatrix.fs(crfs_nobos, from=1, t=100000)["No BOS",c("BOS","Death")] @ In a time-homogeneous Markov model, i.e. where the times from state $r$ to state $s$ are all exponentially distributed with a constant rate $\lambda_{rs}$, the probability that the next state after $r$ is $s$ is simply $\lambda_{rs} / \sum_u \lambda_{ru}$, where the sum is taken over all possible next states after $r$. We can check the result from \code{pmatrix.fs} agrees with this: <<>>= modexp_nobos_bos <- flexsurvreg(Surv(years, status) ~ 1, subset=(trans==1), data = bosms3, dist = "exponential") modexp_nobos_death <- flexsurvreg(Surv(years, status) ~ 1, subset=(trans==2), data = bosms3, dist = "exponential") crfs_nobos <- fmsm(modexp_nobos_bos, modexp_nobos_death, trans=tmat_nobos) pmatrix.fs(crfs_nobos, from=1, t=100000)["No BOS",c("BOS","Death")] rate12 <- modexp_nobos_bos$res["rate","est"] rate13 <- modexp_nobos_death$res["rate","est"] rate12 / (rate12 + rate13) @ \subsection{Distribution of the time to the next state} \label{sec:nexttime} Under the cause-specific hazards model, the time until the next observed transition can be considered to equal the \emph{minimum} of a set of latent times --- one latent time for each potential transition whose cause-specific hazard defines the multi-state model. The distribution of this time is not generally known analytically, given a set of parametric distributions defining the cause-specific hazards. For example, if there were two competing latent event times $T_1$, $T_2$ with cause-specific hazards distributed as Gamma$(a_1,b_1)$ and Weibull$(a_2,b_2)$ respectively, then the equivalent mixture model would be specified by the conditional distributions of $T_1|T_1 T_2$, which wouldn't have a standard form. Instead this time can be computed using simulation, via \code{sim.fmsm}. The function \code{simfinal_fmsm} wraps \code{sim.fmsm} to summarise the distribution of the time to each absorbing state in a multi-state model, conditionally on that state occurring. If this is applied to a \emph{competing-risks submodel} for state $r$ of a multi-state model, this simply produces the time to the \emph{next} state in the full model, since the absorbing states of the submodel define the potential next states after state $r$. This is done here to calculate the mean, median and interquartile range for the time to BOS (given survival) and the time to death (given no BOS before death). The function also produces estimates of the probability of each of these states, though as we saw in Section~\ref{sec:nextstate}, this is available more efficiently via \code{pmatrix.fs}. Note the number of simulated individuals $M$ can be increased for more accuracy, and optionally the $B$ argument can be supplied to obtain simulation-based confidence intervals around the estimates. <<>>= crfs_nobos <- fmsm(mod_nobos_bos, mod_nobos_death, trans = tmat_nobos) simfinal_fmsm(crfs_nobos, probs = c(0.25, 0.5, 0.75), M=10000) @ \subsection{Obtaining probabilities of, and times to, a final absorbing state} \label{sec:ultimate} As the previous section mentioned, \code{simfinal_fmsm} can be applied to any multi-state model with an absorbing state, to estimate the distribution of the time from the start of the process to the absorbing state. If there are multiple absorbing states, it can also estimate the probability that the individual ends up in each one. For the BOS model, there is only one absorbing state, death, so the function returns a probability of 1 that the patient will eventually die, and summaries of the distribution of the time from transplant to death. <<>>= simfinal_fmsm(crfs, probs = c(0.25, 0.5, 0.75), M=10000) @ \code{simfinal_fmsm} requires a semi-Markov multi-state model, or a simple competing risks model, constructed through \code{fmsm}. Though for a simple competing risks model, \code{pmatrix.fs} can be used to get the probability of each competing state, as in~\ref{sec:nextstate}. \section{Multi-state modelling using mixtures} \label{sec:mixture} \subsection{Definitions} A ``mixture'' multi-state model, first described for competing risks models by \citet{larson1985mixture} is described in terms of: \begin{itemize} \item the probability $\pi_{rs}$ that the next state after state $r$ is state $s$ \item the distribution of the time $T_{rs}$ from state $r$ entry to state $s$ entry, given that the next state after $r$ is state $s$. \end{itemize} In other words, the transition intensity is defined by \[ q_{rs}(t) = \left\{ \begin{array}{ll} q^*_{rs}(t) & \mbox{if } I_{r} = s \\ 0 & \mbox{if } I_{r} \neq s\\ \end{array} \right. \] where $I_{r}$ is a latent categorical variable that determines which transition will happen next for an individual in state $r$, governed by probabilities $\pi_{r,s} = P(I_{r} = s)$, with $\sum_{s \in \mathcal{S}_r} \pi_{r,s} = 1$ over the potential next state $\mathcal{S}_r$ after state $r$. The transition intensity $q^*_{rs}(t) $ is defined by the hazard function of a parametric distribution that governs the time $S_{rs}$ from state $r$ entry until the transition to state $s$, \emph{given that this is the transition that occurs}. The mixture model describes a mechanism where the transition that happens is determined ``in advance" at time zero, whereas in the cause-specific hazards model, the risks of different events ``compete" with each other as time proceeds. However, in practice, both models can represent situations where individuals can experience any of the competing events. In the mixture model, we only specify a distribution for the time to the event \emph{that happens first} among the competing risks, and do not model events that don't occur. As discussed by \citet{cox1959analysis}, if the time-to-event distributions are specified correctly in both frameworks, then they can both represent the same process. In practice however, they will differ. As discussed in Section~\ref{sec:nexttime}, given a particular parametric form for a set of cause-specific hazards, the form of the distribution for the \emph{minimum} of the potential event times, the one that actually happens, won't be available. The main advantage of the mixture model is that it is parameterised in terms of quantities that have an easy, direct interpretation. In the mixture model, we specify a parametric distribution with density $f_{r,s}(|\theta_{r,s})$ (and CDF $F_{r,s}()$) for the time of transition to state $s$ for a person in state $r$, conditionally on this transition being the one that occurs. We have data consisting of state transitions, indexed by $j$, experienced by individuals $i$. This can either be \begin{itemize} \item \emph{an exact transition time:} $\{y_{i,j}, r_{i,j}, s_{i,j}, \delta_{i,j}=1\}$, where a transition to state $s_{i,j}$ is known to occur at a time $y_{i,j}$ after entry to state $r_{i,j}$. \item \emph{right censoring} $\{y_{i,j},r_{i,j},\delta_{i,j}=2\}$, where an individual's follow-up ends while they are in state $r_{i,j}$, at time $y_{i,j}$ after entering this state, thus the next state and the time of transition to it are unknown. \end{itemize} Therefore, for an exact time of transition to a known state $s_{i,j}$, i.e. $\delta_{i,j}=1$, the likelihood contribution is simply \[ l_{i,j} = \pi_{r_{i,j}s_{i,j}} f_{r_{i,j},s_{i,j}}(y_{i,j} | \theta_{r_{i,j},s_{i,j}}) \] For observations $j$ of right-censoring at $y_{i,j}$, the state that the person will move to, and the time of that transition, is unknown. Thus it is unknown which of the distributions $f_{r,s}$ the transition time will obey, and the likelihood contribution is of the form of a mixture model, that sums over the set $\mathcal{S}_r$ of potential next states after $r$: \[ l_{i,j} = \sum_{s \in \mathcal{S}_r} \pi_{r_{i,j}s} (1 - F_{r_{i,j},s}(y_{i,j} | \theta_{r_{i,j},s})) \] \subsection{Data for mixture competing risks models} The required data format for the mixture model is shown for the BOS data in the object \code{bosmx3}. This has \begin{itemize} \item one row for each observed event time \item one row for each ``end of follow-up" time where we know an individual was still at risk of making a transition, but we don't know which transition will occur. \end{itemize} Recall that in the data \code{bosms3} used to implement the cause-specific hazards model, a individual ``censored" at time $t$ contributed a row in the data \emph{for each} of the events that might have occurred after $t$. The difference from \code{bosmx3} is that for these individuals, \code{bosmx3} only has \emph{one} row per ``censored" individual. For example, patient 5 is censored at 11 years, when they were still at risk of either BOS or death. Also we do not include censoring times for events competing with events that were observed, e.g. when patient 4 below got BOS (state 2) at year 2, thus were ``censored" at the same time for the transition from state 1 (no BOS) to state 3 (death). The process of transforming the cause-specific dataset to the mixture dataset is shown below. Multiple censoring records, for different destination states, are condensed to a single censoring record. Each row of \code{bosmx3} then corresponds to a transition. A new column called \code{event} is created, which should be a factor that identifies the terminal event of the transition, which takes the value \code{NA} in cases of censoring where the transition that will happen is unknown. Columns not needed for the mixture model are removed. Informative labels for the factor levels of \code{event} are added to identify the states. <<>>= bosms3[bosms3$id %in% c(4,5),] bosmx3 <- bosms3 bosmx3$Tstart <- bosmx3$Tstop <- bosmx3$trans <- NULL bosmx3 <- bosmx3[!(bosmx3$status==0 & duplicated(paste(bosmx3$id, bosmx3$from))),] bosmx3$event <- ifelse(bosmx3$status==0, NA, bosmx3$to) bosmx3$event <- factor(bosmx3$event, labels=c("BOS","Death")) bosmx3$to <- NULL bosmx3[bosmx3$id %in% c(4,5),] @ \subsection{Fitting mixture competing risks models} The \code{flexsurvmix} function fits a mixture model to competing risks data. To fit a full multi-state mixture model, we fit one mixture competing risks model for each transient state in the multi-state structure. The models are then grouped together (Section~\ref{sec:fmixmsm}) to form the full multi-state model. The mixture model for the event following transplant (BOS or death before BOS, the 1-2 and 1-3 transitions in the multi-state structure) is fitted as follows. A Weibull distribution is fitted for the time to BOS given BOS onset, and an exponential distribution is fitted for the time to death given death before BOS onset. These distributions are specified in the \code{dists} argument. The same distributions as in \code{flexsurvreg} are supported (including custom distributions, in the same way). <<>>= bosfs <- flexsurvmix(Surv(years, status) ~ 1, event=event, data=bosmx3[bosmx3$from==1,], dists = c(BOS="weibull", Death="exponential")) bosfs @ The printed results object shows the two event probabilities $\pi_{rs}$ in the first two rows, and the parameters of the time-to-event distributions in the remaining rows. The maximum likelihood estimates are in column \code{est}, and estimates on a (multinomial) logit or log transformed scale are in column \code{est.t}, with standard errors on the transformed scale in \code{se}. Covariates can be included on the event probabilities through multinomial logistic regression. For illustration, we just simulate some fake covariate data in the variable \code{x}. The log odds ratio for this covariate on the odds of death (with the first category, BOS here, always taken as the baseline) is shown in the row with \code{terms} labelled \code{prob2(x)}. <<>>= set.seed(1) bosmx3$x <- rnorm(nrow(bosmx3),0,1) bosfsp <- flexsurvmix(Surv(years, status) ~ 1, event=event, data=bosmx3[bosmx3$from==1,], dists = c(BOS="weibull", Death="exponential"), pformula = ~ x) bosfsp @ Covariates may also be included on any parameter of any time-to-event distribution, as in \code{flexsurvreg}. If the covariate is named on the right hand side of the \code{Surv} formula in the first argument, then it is included on the location parameter of all distributions. <<>>= bosfst <- flexsurvmix(Surv(years, status) ~ x, event=event, data=bosmx3[bosmx3$from==1,], dists = c(BOS="weibull", Death="exponential")) @ The first argument may be a list of formulae, to indicate that different covariates are included on the location parameter for different events. <<>>= flexsurvmix(list(Surv(years, status) ~ x, Surv(years, status) ~ 1), event=event, data=bosmx3[bosmx3$from==1,], dists = c(BOS="weibull", Death="exponential")) @ An argument \code{anc} is used to supply covariates on ancillary parameters (e.g. shape parameters). This must be a list of length matching the number of competing events, each component being a named list in the format used for the \code{anc} argument in \code{flexsurvreg}. If there are no covariates for a particular component, just specify a null formula on the location parameter (as below, \code{rate=~1}). <<>>= flexsurvmix(Surv(years, status) ~ 1, event=event, data=bosmx3[bosmx3$from==1,], dists = c(BOS="weibull", Death="exponential"), anc = list(BOS=list(shape=~x), Death=list(rate=~1))) @ \subsection{Predictions from mixture competing risks models} A few functions have been supplied to extract estimates and confidence intervals for interpretable quantities, for given covariate values. Here we extract estimates of various quantities for covariate values of \code{x=0} and \code{x=1}. These values are provided in a data frame \code{nd} that will be supplied as the \code{newdata} argument to various extractor functions. The first extractor function \code{probs_flexsurvmix} gives the fitted event probabilities, by event and covariate value. This is applied here to the fitted model \code{bosfsp} that included a covariate on the event probabilities. All these functions take an argument \code{B} which specifies the number of samples to use for calculating bootstrap-like confidence limits --- increase this for more accurate limits. <<>>= nd <- data.frame(x=c(0,1)) probs_flexsurvmix(bosfsp, newdata=nd, B=50) @ \code{mean_flexsurvmix} extracts the mean time to each event conditionally on that event occurring, from the mean of the fitted time-to-event distribution. \code{quantile_flexsurvmix} extracts the quantiles of these distributions, e.g. the interquartile range in this example summarises the variability between individuals for this time. <<>>= mean_flexsurvmix(bosfst, newdata=nd) quantile_flexsurvmix(bosfst, B=50, newdata=nd, probs=c(0.25, 0.5, 0.75)) @ \code{p_flexsurvmix} extracts the probability of being in each of the states at a time $t$ in the future, shown for $t=5,10$ here. The \code{startname} argument supplies an informative name to the ``starting" state that the model \code{bosfst} describes transitions from. <<>>= p_flexsurvmix(bosfst, t=c(5, 10), B=10, startname="No BOS") @ \subsection{Forming a mixture multi-state model} \label{sec:fmixmsm} We supplement the mixture model for the event following state 1 (no BOS) with a second model \code{bosfst_bos} for the events following BOS (the other transient state in the multi-state model). Although this is fitted with \code{flexsurvmix}, there is only one potential next state after BOS, which is death, so internally this just fits a standard survival model using \code{flexsurvreg}. The two mixture models are then coupled together using the \code{fmixmsm} function, which returns an object containing the entire multi-state model, here named \code{bosfmix}. This object contains attributes listing the potential pathways that an individual can take through the states. These pathways are identified automatically by naming the arguments to \code{fmixmsm} with the label of the state that each model describes transitions from --- here the first model \code{bosfst} describes transitions from ``No BOS" and the second describes transitions from BOS. (Note we redefine the \code{event} factor so that empty levels are dropped) <<>>= bdat <- bosmx3[bosmx3$from==2,] bdat$event <- factor(bdat$event) bosfst_bos <- flexsurvmix(Surv(years, status) ~ x, event=event, data=bdat, dists = c(Death="exponential")) bosfmix <- fmixmsm("No BOS"=bosfst, "BOS"=bosfst_bos) @ Now we can predict outcomes from the full multi-state model. All these functions work by simulating a large population of individuals from the model, by default \code{M=10000}, so increase this for more accuracy. A cut-off time \code{t=1000} is also applied to the simulated data that assumes people have all reached the absorbing state by this time -- increase this if necessary. These functions currently require models to have at least one absorbing state, and no ``cycles" in their transition structure. The function \code{ppath_fmixmsm} returns the probability of following each of the potential pathways through the model. For this example, they are the same for both covariate values \code{x=0} and \code{x=1}. Setting \code{final=TRUE} aggregates the pathway probabilities by the final (absorbing) state, returning the probability of ending in each of the potential final states, which is trivially 1 in this case for the single absorbing state of death. <<>>= ppath_fmixmsm(bosfmix, B=20) ppath_fmixmsm(bosfmix, B=20, final=TRUE) @ To estimate the mean time to the final state for individuals following each pathway, use \code{meanfinal_fmixmsm}, and for the quantiles of the distribution of this time over individuals (by default the median and a 95\% interval), use \code{qfinal_fmixmsm}. Supplying \code{final=TRUE} in these functions summarises the mean time to the final state, averaged over all potential pathways (according to the probability of following those pathways). Again, covariate values can be supplied in the \code{newdata} argument if needed. <<>>= meanfinal_fmixmsm(bosfmix, B=10) qfinal_fmixmsm(bosfmix, B=10) meanfinal_fmixmsm(bosfmix, B=10, final=TRUE) @ \subsection{Goodness of fit checking} The fit of mixture competing risks models can be checked by comparing predictions of the state occupancy probabilities (as returned by \code{p_flexsurvmix}) with nonparametric estimates of these quantities \citep{aalen:johansen}. This is only supported for models with no covariates or only factor covariates (where subgroup-specific predictions are compared with stratified nonparametric estimates). The function \code{ajfit_flexsurvmix} derives these estimates from both the mixture model and the Aalen-Johansen method, and collects them in a tidy data frame ready for plotting, e.g. with the \pkg{ggplot2} package. The mixture models fit nicely here. A \code{start} argument is supplied to give an informative label to the starting state in the plots. Note we are plotting not probability of \emph{being in} a state at time $t$, but the probability of \emph{having made} the respective transition --- here we are checking the competing risks submodels one at time. <>= aj <- ajfit_flexsurvmix(bosfs, start="No BOS") bosfs_bos <- flexsurvmix(Surv(years, status) ~ 1, event=event, data=bdat, dists = c(Death="exponential")) aj2 <- ajfit_flexsurvmix(bosfs_bos, start="BOS") library(ggplot2) ggplot(aj, aes(x=time, y=val, lty=model, col=state)) + geom_line() + xlab("Years after transplant") + ylab("Probability of having moved to state") ggplot(aj2, aes(x=time, y=val, lty=model, col=state)) + xlab("Years after transplant") + ylab("Probability of having moved to state") + geom_line() @ Checking against Aalen-Johansen estimates is also supported for cause-specific hazards models that are grouped together using \code{fmsm}. Thus we can compare cause-specific hazards and mixture models that have been fitted to the same data. We do this here for the \code{fmsm} object \code{crfs_nobos} created in Section~\ref{sec:nextstate}. <>= library(dplyr) aj3 <- ajfit_fmsm(crfs_nobos) %>% filter(model=="Parametric") %>% mutate(model = "Cause specific hazards") %>% mutate(state = factor(state, labels=c("No BOS","BOS","Death"))) %>% full_join(aj) ggplot(aj3, aes(x=time, y=val, lty=model, col=state)) + xlab("Years after transplant") + ylab("Probability of having moved to state") + geom_line() @ \section{Comparison of frameworks for parametric multi-state modelling} As the mixture model is described by the probabilities of observable events and the times to those events, it is somewhat easier to interpret then the cause-specific hazards model. While those quantities can be computed under the cause-specific hazards model, they require potentially-expensive simulation. An advantage of the cause-specific hazards model is that it tends to be faster to \emph{fit}, if the transition-specific models are fitted independently. Another competing-risks modelling framework not implemented here is typically referred to as \emph{subdistribution} hazard modelling --- essentially covariate effects are applied to the probabilities of occupying different states at time $t$, rather than to the cause-specific hazards. This leads to covariate effects that are easier to interpret compared to, e.g. cause-specific hazard ratios. The method has been implemented semiparametrically~\citep{fine1999proportional} and parametrically~\citep{lambert2017flexible}. A further framework referred to as ``vertical" modelling \citep{nicolaie2010vertical} involves factorising the joint distribution of events and event times as $P(time)P(event|time)$, whereas the mixture modelling framework uses $P(event)P(time|event)$. \bibliography{flexsurv} \end{document}