---
title: "policy_eval"
output:
  rmarkdown::html_vignette:
    fig_caption: true
    toc: true
    toc_depth: 2
vignette: >
  %\VignetteIndexEntry{policy_eval}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
bibliography: ref.bib
---
```{r lib}
library(polle)
```
This vignette is a guide to `policy_eval()` and some of the associated S3 methods.
The purpose of `policy_eval()` is to estimate (evaluate) the value of a user-defined policy or a policy learning algorithm.
For details on the methodology, see the associated paper [@nordland2023policy].
As a general setup, we consider a fixed two-stage problem. We simulate data using `sim_two_stage()` and create a `policy_data` object using `policy_data()`:
```{r simdata}
d <- sim_two_stage(n = 2e3, seed = 1)
pd <- policy_data(d,
                  action = c("A_1", "A_2"),
                  baseline = c("B", "BB"),
                  covariates = list(L = c("L_1", "L_2"),
                                    C = c("C_1", "C_2")),
                  utility = c("U_1", "U_2", "U_3"))
pd
```
## Evaluating a user-defined policy
User-defined policies are created using `policy_def()`. In this case we define a simple static policy always
selecting action `'1'`:
```{r poldef}
p1 <- policy_def(policy_functions = '1', reuse = TRUE, name = "(A=1)")
```
As we want to apply the same policy function at both stages we set `reuse = TRUE`.
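For reference, stage-specific policy functions can also be supplied. A minimal sketch, assuming a list of character strings (one per stage) is accepted analogously to the single-string case; the policy name is illustrative:
```{r poldef_stagewise}
# hypothetical stage-specific policy: action '1' at stage 1 and action '0' at stage 2
p_stagewise <- policy_def(policy_functions = list('1', '0'),
                          reuse = FALSE,
                          name = "(A_1=1, A_2=0)")
```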
`policy_eval()` implements three types of policy evaluation: inverse probability weighting (IPW) estimation, outcome regression (OR) estimation, and doubly robust (DR) estimation. As doubly robust estimation combines the other two, we focus on this approach. For details on the implementation, see Algorithm 1 in [@nordland2023policy].
```{r polevalp1}
(pe1 <- policy_eval(policy_data = pd,
                    policy = p1,
                    type = "dr"))
```
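For comparison, the two other estimation types can be requested via the `type` argument. A short sketch (the type values `"ipw"` and `"or"` are assumed here):
```{r polevalp1_other, eval = FALSE}
# inverse probability weighting estimator (assumed type value "ipw")
policy_eval(policy_data = pd, policy = p1, type = "ipw")
# outcome regression estimator (assumed type value "or")
policy_eval(policy_data = pd, policy = p1, type = "or")
```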
`policy_eval()` returns an object of class `policy_eval`, which prints like a `lava::estimate` object. The policy value estimate and variance are available via `coef()` and `vcov()`:
```{r coef}
coef(pe1)
vcov(pe1)
```
### Working with `policy_eval` objects
The `policy_eval` object behaves like a `lava::estimate` object, and the underlying `estimate` object can be accessed directly using `estimate()`.
`estimate` objects make it easy to work with estimates that have an iid decomposition given by the influence curve/function; see the [estimate vignette](https://CRAN.R-project.org/package=lava/vignettes/influencefunction.html).
The influence curve is available via `IC()`:
```{r ic}
IC(pe1) |> head()
```
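As noted above, the full `estimate` object can also be extracted directly. A short sketch, assuming `estimate()` accepts the `policy_eval` object as-is:
```{r estimate_direct, eval = FALSE}
estimate(pe1)
```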
Merging `estimate` objects allows the user to get inference for transformations of the estimates via the Delta method.
Here we get inference for the average treatment effect, both as a difference and as a ratio:
```{r estimate_merge}
p0 <- policy_def(policy_functions = 0, reuse = TRUE, name = "(A=0)")
pe0 <- policy_eval(policy_data = pd,
                   policy = p0,
                   type = "dr")
(est <- merge(pe0, pe1))
estimate(est, function(x) x[2]-x[1], labels="ATE-difference")
estimate(est, function(x) x[2]/x[1], labels="ATE-ratio")
```
### Nuisance models
So far we have relied on the default generalized linear models for the nuisance g-models and Q-models.
By default, a single g-model is trained across all stages using the state/Markov-type history; see the
`policy_data` vignette. Use `get_g_functions()` to access the
fitted model:
```{r get_nuisance}
(gf <- get_g_functions(pe1))
```
The g-functions can be used as input to a new policy evaluation:
```{r pe_pol}
policy_eval(policy_data = pd,
            g_functions = gf,
            policy = p0,
            type = "dr")
```
or we can get the associated predicted values:
```{r pred_gfun}
predict(gf, new_policy_data = pd) |> head(6)
```
Similarly, we can inspect the Q-functions using `get_q_functions()`:
```{r get_qfun}
get_q_functions(pe1)
```
Note that a model is trained for each stage. Again, we can predict from the Q-models using `predict()`.
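A quick sketch of this, assuming the Q-functions share the `predict()` interface shown for the g-functions above:
```{r pred_qfun, eval = FALSE}
predict(get_q_functions(pe1), new_policy_data = pd) |> head(6)
```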
Usually, we want to specify the nuisance models ourselves using the `g_models` and `q_models` arguments:
```{r gq_models, cache = TRUE}
pe1 <- policy_eval(pd,
                   policy = p1,
                   g_models = list(
                     g_sl(formula = ~ BB + L_1, SL.library = c("SL.glm", "SL.ranger")),
                     g_sl(formula = ~ BB + L_1 + C_2, SL.library = c("SL.glm", "SL.ranger"))
                   ),
                   g_full_history = TRUE,
                   q_models = list(
                     q_glm(formula = ~ A * (B + C_1)),       # including action interactions
                     q_glm(formula = ~ A * (B + C_1 + C_2))  # including action interactions
                   ),
                   q_full_history = TRUE)
```
Here we train a super learner g-model for each stage using the full available history, and a generalized linear model for each Q-model. The `formula` argument is used to construct the model frame passed to the model for training (and prediction).
The valid formula terms, which depend on `g_full_history` and `q_full_history`, are available via `get_history_names()`:
```{r history_names}
get_history_names(pd) # state/Markov history
get_history_names(pd, stage = 1) # full history
get_history_names(pd, stage = 2) # full history
```
Remember that the action variable at the current stage is always named `A`. Some models, such as `glm`, require interactions to be specified via the model frame. Thus, for these models it is important to include action interaction terms in the Q-model formulas.
## Evaluating a policy learning algorithm
The value of a learned policy is an important performance measure, and `policy_eval()` allows for
direct evaluation of a given policy learning algorithm. For details, see Algorithm 4 in [@nordland2023policy].
In `polle`, policy learning algorithms are specified using `policy_learn()`, see the associated vignette. These functions can be directly evaluated in `policy_eval()`:
```{r pe_pl}
policy_eval(pd,
            policy_learn = policy_learn(type = "ql"))
```
In the above example we evaluate the policy estimated via Q-learning. Alternatively,
we can first learn the policy and then pass it to `policy_eval()`:
```{r pe_manual_pol_eval}
p_ql <- policy_learn(type = "ql")(pd, q_models = q_glm())
policy_eval(pd,
            policy = get_policy(p_ql))
```
## Cross-fitting
A key feature of `policy_eval()` is that it allows for easy cross-fitting of the nuisance models as well as the learned policy.
Here we specify two-fold cross-fitting via the `M` argument:
```{r crossfits}
pe_cf <- policy_eval(pd,
                     policy_learn = policy_learn(type = "ql"),
                     M = 2)
```
Specifically, both the nuisance models and the optimal policy are fitted on each training fold.
Subsequently, the doubly robust value score is calculated on the validation folds.
The `policy_eval` object now consists of a list of `policy_eval` objects associated with each fold:
```{r pe_crossfits}
pe_cf$folds$fold_1 |> head()
pe_cf$cross_fits$fold_1
```
To save memory, particularly when cross-fitting, the fitted nuisance models can be left out of the output via the
`save_g_functions` and `save_q_functions` arguments.
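A minimal sketch, assuming both arguments are logical flags:
```{r pe_no_save, eval = FALSE}
policy_eval(pd,
            policy_learn = policy_learn(type = "ql"),
            M = 2,
            save_g_functions = FALSE,
            save_q_functions = FALSE)
```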
## Parallel processing via `future.apply`
It is easy to parallelize the cross-fitting procedure via the `future.apply` package:
```{r demo_future, eval = FALSE}
library(future.apply)
plan("multisession") # local parallel procession
library("progressr") # progress bar
handlers(global = TRUE)
policy_eval(pd,
            policy_learn = policy_learn(type = "ql"),
            q_models = q_rf(),
            M = 20)
plan("sequential") # resetting to sequential processing
```
# SessionInfo
```{r sessionInfo}
sessionInfo()
```
# References