--- title: "Get Started" output: rmarkdown::html_vignette bibliography: references.bib vignette: > %\VignetteIndexEntry{Get Started} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- This vignette details how you can set up and execute a basic power analysis for a bivariate random intercept cross-lagged panel model (RI-CLPM) using the `powRICLPM` package. Throughout, an illustrating example will be used in which we wish to detect a small cross-lagged effect $\beta_{2}$ (defined here as the effect of $a_{1}^{*}$ to $b_{2}^{*}$, where $a_{1}^{*}$ and $b_{2}^{*}$ denote the latent within-unit components of $a_{1}$ and $b_{2}$, respectively) of 0.2 (standardized). For the design of our power analysis we follow the steps in the strategy as described in @mulder_power_2023. Various extensions are available for this basic power analysis, and are described in the Vignette [Extensions](https://jeroendmulder.github.io/powRICLPM/articles/extensions.html). # Step 1: Determine experimental conditions Before performing the power analysis, you must first determine the *experimental conditions* of interest. Experimental conditions (or: simulation conditions) are defined by characteristics of the study design that can impact statistical power. This includes, among others, characteristics like the *sample size* and the *number of repeated measures*. Decide on the number of repeated measures that will be used in the simulations, as well as the range of sample sizes over which you want to simulate the power. For this example, we take a sample size range from 100 to 1000 first, increasing with steps of 100. Let the numbers of repeated measures range from 3 to 5. If these experimental conditions do not lead to the desired amount of power for detecting the small cross-lagged effect, the ranges can be extended later. # Step 2: Choose population parameter values Next, determine population parameter values for generating data from the RI-CLPM. This requires the specification of: - `Phi`: Standardized autoregressive and cross-lagged effects for the within-unit components of the model. These values are collected in a matrix, *with columns representing predictors and rows representing outcomes*. - `within_cor`: A correlation for the within-unit components. - `ICC`: The proportion of variance at the between-unit level (relative to the total variance). - `RI_cor`: The correlation between the random intercepts. For our example, the parameter values are set to: ```{r step2} Phi <- matrix(c(.4, .1, .2, .3), ncol = 2, byrow = T) # The .2 refers to our standardized cross-lagged effect of interest within_cor <- 0.3 ICC <- 0.5 RI_cor <- 0.3 ``` If you are unsure if you have specified the `Phi` matrix as intended, you can use the `check_Phi()` function to give you a summary of how the effects in your `Phi` are interpreted. ```{r setup, message=FALSE} library(powRICLPM) ``` ```{r step2-check} # Check `Phi` argument check_Phi(Phi) ``` # Steps 3-5: Perform the power analysis Steps 3 to 5 are automated by the `powRICLPM()` function. As input, you must provide: - the desired power level using the `target_power` argument, - the range of sample sizes to simulate the power for using the `search_lower`, `search_upper`, and `search_step` arguments (alternatively, you can specify this directly by providing a vector of sample sizes to the `sample_size` argument), - the number of time points for the simulated data using the `time_points` argument, - the population values `Phi`, `within_cor`, `ICC`, and `RI_cor`, and - the number of Monte Carlo replications we want to perform per experimental condition in the `reps` argument. You can optionally specify: - `alpha`: A numeric value denoting the significance criterion (default: 0.05). - `seed`: An integer to control the starting point of the random number generator. This is important to use if you want to replicate the results. When no seed it specified, a random seed will be generated and reported back to you. Options to extend this basic power analysis setup are described in the Vignette [Extensions](https://jeroendmulder.github.io/powRICLPM/articles/extensions.html) Now, we can perform the power analysis by running: ```{r analysis, eval = F} # Set number of replications n_reps <- 100 output <- powRICLPM( target_power = 0.8, search_lower = 500, search_upper = 1000, search_step = 50, time_points = c(3, 4), ICC = ICC, RI_cor = RI_cor, Phi = Phi, within_cor = 0.3, reps = n_reps ) ``` ## Parallel processing using `future` Performing a Monte Carlo power analysis with a large number of replications, and across multiple experimental conditions can be time-consuming. To speed up the process, it is recommended to perform the power analysis *across simulation conditions* in parallel (i.e., on multiple cores). To this end, the `powRICLPM()` function has implemented `future`’s parallel processing capabilities. Load the `future` package, and use its `plan()` function to change the power analysis execution from *sequential* (i.e., single-core, the default), to *multisession* (i.e., multicore). Use the `workers` argument to specify how many cores you want to use. Next, run the `powRICLPM` analysis, and the power analysis will run on the specified number of cores. This can result in a significant reduction of computing time. For more information on other parallel execution strategies using futures, see `?future::plan()`. ## Progress bar using `progressr` It can be useful to get an approximation of the progress of the `powRICLPM` analysis while running the code, especially when running the analysis in parallel. `powRICLPM()` has implemented progress notifications using the `progressr` package. Simply put, there are two options through which you can get progress notifications: - You can subscribe to progress updates from a specific expression by wrapping this expression with `with_progress({...})`. - You can subscribe to progress updates from everywhere by running `handlers(global = T)`. Implementing the `with_progress({...})` option, as well as parallel execution of the `powRICLPM` analysis, results in the below code for the example: ```{r future-setup, eval = F} # Load `future` and `progressr` packages library(future) library(progressr) # Check how many cores are available future::availableCores() # Plan powRICLPM analysis to run on 1 core less than number of available cores plan(multisession, workers = 7) # For the case of 8 available cores # Run the powRICLPM analysis with_progress({ # Subscribe to progress updates output <- powRICLPM( target_power = 0.8, search_lower = 500, search_upper = 1000, search_step = 50, time_points = c(3, 4), ICC = ICC, RI_cor = RI_cor, Phi = Phi, within_cor = 0.3, reps = n_reps ) }) # Revert back to sequential execution of code plan(sequential) ``` For more information about progress notification options using `progressr` for end-users, including auditory and email updates, see [https://progressr.futureverse.org](https://progressr.futureverse.org). # Step 6: Summarize results The `powRICLPM()` function creates a `powRICLPM` object: A list with results, upon which we can call `print()`, `summary()`, `give()`, and `plot()` functions to print, summarize, extract results, and visualize the results, respectively. `print()` outputs a textual summary of the power analysis design contained within the object it was called upon. It does not output any performance metrics computed by the power analysis. `summary()` can be used in one of four ways. First, summary can be used simply like `print()` to get information about the design of the power analysis (the different experimental conditions), as well as the number of problems the occurred per condition (e.g., non-convergence, fatal estimation errors, or inadmissible results). Second, by specifying the `parameter = "..."` argument in `summary()`, the function will print the results specifically for that parameter across all experimental conditions. Third, if you specify a specific experimental condition using `summary()`'s `sample_size`, `time_points`, `ICC` and `reliability` arguments, performance measures are outputted for all parameters in that experimental condition. The interpretation of the various performance measures available is explained in the function documentation [`?summary.powRICLPM()`](https://jeroendmulder.github.io/powRICLPM/reference/summary.powRICLPM.html). ```{r summary, eval = F} # Summary of study design summary(output) # Summary of results for a specific parameter, across simulation conditions summary(output, parameter = "wB2~wA1") # Summary of all parameter for a specific simulation condition summary(output, sample_size = 500, time_points = 4, ICC = 0.5, reliability = 1) ``` `give()` extracts various bits of information from an `powRICLPM` object. The exact information to be extracted is given by the `what = "..."` argument: 1. `what = "conditions"` gives the different experimental conditions per row, where each condition is defined by a unique combination of sample size, number of time points and ICC. 2. `what = "estimation_problems"` gives the proportion of fatal errors, inadmissible values, or non-converged estimations (columns) per experimental conditions (row). 3. `what = "results"` gives the average estimate `average`, minimum estimate `minimum`, standard deviation of parameter estimates `SD`, the average standard error `SEavg`, the mean square error `MSE`, the average width of the confidence interval `accuracy`, the coverage rate `coverage`, and the proportion of times the *p*-value was lower than the significance criterion `power`. It requires setting the `parameter = "..."` argument. 4. `what = "names"` gives the parameter names contained within the `powRICLPM` object. ```{r give, eval = F} # Extract experimental conditions give(output, what = "conditions") # Extract estimation problems give(output, what = "estimation_problems") # Extract results for cross-lagged effect "wB2~wA1" give(output, what = "results", parameter = "wB2~wA1") # Extract parameter names give(output, what = "names") ``` Finally, `plot()` creates a `ggplot2`-plot for a specific parameter (specified using the `parameter = "..."` argument) with sample size on the x-axis, the simulated power on the y-axis, lines grouped by number of time-points, and plots wrapped by proportion of between-unit variance. `plot()` returns a `ggplot2` object that can be fully customized using `ggplot2` functionality. For example, you can change the scales, add titles, change geoms, etc. More information about options in the `ggplot2` framework can be found at [https://ggplot2-book.org/index.html](https://ggplot2-book.org/index.html). In the below example, I add a title and change the labels on the x-axis: ```{r plot, eval = FALSE} # Create basic plot of powRICLPM object p <- plot(output, parameter = "wB2~wA1") p # Adjust plot aesthetics p2 <- p + ggplot2::labs( title = "Power analysis for RI-CLPM", caption = paste0("Based on ", n_reps, " replications.") ) + ggplot2::scale_color_discrete("Time points") + ggplot2::guides( color = ggplot2::guide_legend(title = "Time points", nrow = 1), shape = ggplot2::guide_legend(title = "Reliability", nrow = 1), fill = "none" ) + ggplot2::scale_x_continuous( name = "Sample size", breaks = seq(500, 1000, 50), guide = ggplot2::guide_axis(n.dodge = 2) ) p2 ``` # References