Plan Sample Size

library(Keng)

The significance of the unique effect of one predictor or a set of predictors in a regression model is determined by the PRE (Proportional Reduction in Error, also called partial eta-squared in ANOVA or partial R-squared in regression), the number of parameters in the regression model, and the sample size. As a result, given the PRE, the number of parameters, the significance level (usually 0.05), and the expected statistical power (usually 0.80), we can plan the sample size that the predictor or set of predictors needs to reach that power. This is what power_lm() does.

power_lm()

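To plan the sample size for one predictor or a set of predictors, the following information is needed and passed to power_lm() as arguments: the PRE of the focal predictor(s); PC, the number of parameters in Model C (the compact model without the focal predictors); PA, the number of parameters in Model A (the augmented model with the focal predictors); the significance level; and the expected statistical power.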

Note that power_lm() follows Aberson (2019), and the planned sample size is slightly more conservative (larger) than that given by other statistical software such as G*Power. The difference is small, however.
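The logic of this kind of planning can be sketched in base R with the noncentral F distribution. This is a minimal sketch, not the implementation of power_lm(): the helpers power_sketch() and n_sketch() are hypothetical, and the noncentrality parameter f2 * (n - PA) is an assumption about the convention that makes the result more conservative than the f2 * n convention.

```r
# Hypothetical helper (not part of Keng): power of the F-test for the focal
# predictor(s), given PRE, PC, PA, and sample size n.
power_sketch <- function(PRE, PC, PA, n, sig.level = 0.05,
                         ncp_by = c("df2", "n")) {
  ncp_by <- match.arg(ncp_by)
  df1 <- PA - PC                      # number of focal parameters
  df2 <- n - PA                       # error degrees of freedom of Model A
  f2 <- PRE / (1 - PRE)               # Cohen's f-squared implied by PRE
  # Assumption: "df2" is the more conservative convention, "n" the other one.
  lambda <- if (ncp_by == "df2") f2 * df2 else f2 * n
  f_crit <- qf(1 - sig.level, df1, df2)
  pf(f_crit, df1, df2, ncp = lambda, lower.tail = FALSE)
}

# Hypothetical helper: smallest n whose power reaches the target,
# or NA if no n up to n_max does.
n_sketch <- function(PRE, PC, PA, power = 0.80, sig.level = 0.05,
                     ncp_by = "df2", n_max = 1e5) {
  n <- PA + 1
  while (n <= n_max &&
         power_sketch(PRE, PC, PA, n, sig.level, ncp_by) < power) {
    n <- n + 1
  }
  if (n > n_max) NA else n
}

# One focal predictor in a model with an intercept and one predictor
n_conservative <- n_sketch(PRE = 0.02, PC = 1, PA = 2, ncp_by = "df2")
n_other        <- n_sketch(PRE = 0.02, PC = 1, PA = 2, ncp_by = "n")
```

Under these assumptions n_conservative is never smaller than n_other, because at any given n the smaller noncentrality parameter yields lower power.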

Application

Given that the t-test and ANOVA are special cases of regression analysis, power_lm() can plan the sample size for perhaps all common research designs.

A set of predictors

You may be interested in the power and required sample size of the full regression model. In that case, treat all predictors as a set. Suppose your regression model has m predictors. Model C is then the intercept-only model, hence PC = 1 and PA = m + 1.

A continuous predictor

You may be interested in the power and required sample size of one continuous predictor. Suppose your regression model has m predictors. In this case PA = m + 1 and PC = (m + 1) - 1 = m.

Moderation model

You may be interested in the two-way moderation model. Here the focal predictor is the two-way interaction term. Suppose your regression model has m predictors (including the interaction term). In this case PA = m + 1 and PC = (m + 1) - 1 = m.

You may be interested in the three-way moderation model. Here the focal predictors are two two-way interaction terms and one three-way interaction term. Suppose your regression model has m predictors (including the interaction terms). In this case PA = m + 1 and PC = (m + 1) - 3.

One-sample t-test or an intercept-only regression

If you are interested in the difference between the mean of one group and 0, you may turn to the one-sample t-test. Alternatively, you could fit an intercept-only model, in which the focal parameter is the intercept. In this case PA = 1 and PC = 0.

Note that in this case you must use the correct PRE to obtain the correct power and planned sample size. Do not compute PRE from Cohen's one-sample d; instead, compute PRE from the t value of the one-sample t-test. You could also compute the correct PRE using the compare_lm() function.
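As an illustration, the PRE of the intercept can be obtained from the one-sample t value with the identity PRE = t^2 / (t^2 + df), where df = n - 1. The sketch below uses base R only and simulated data that are purely hypothetical; it checks the identity against the direct definition of PRE as the proportional reduction in the sum of squared errors.

```r
set.seed(1)                          # hypothetical data, for illustration only
y <- rnorm(30, mean = 0.4, sd = 1)
n <- length(y)

# Route 1: from the t value of the one-sample t-test (df = n - 1)
t_val <- unname(t.test(y, mu = 0)$statistic)
PRE_from_t <- t_val^2 / (t_val^2 + (n - 1))

# Route 2: from the error sums of squares of the two models
SSE_C <- sum(y^2)                    # Model C predicts 0 for everyone (PC = 0)
SSE_A <- sum((y - mean(y))^2)        # Model A predicts the sample mean (PA = 1)
PRE_from_sse <- (SSE_C - SSE_A) / SSE_C

all.equal(PRE_from_t, PRE_from_sse)  # the two routes should agree
```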

Two-sample t-test or a binary predictor

If you are interested in the difference between two groups (e.g., experimental vs. control), you may turn to the two-sample t-test. Alternatively, you could treat the group variable as a binary predictor and conduct a regression analysis, in which the focal predictor is the binary group predictor. Suppose your regression model has m predictors. In this case PA = m + 1 and PC = (m + 1) - 1 = m.

ANOVA or a multicategorical predictor

If you are interested in the differences among multiple groups, you may turn to ANOVA. Alternatively, you could treat the group variable as a multicategorical independent variable and code it with a coding scheme such as dummy coding. Whichever coding scheme you use, a multicategorical independent variable with j levels should be coded into (j - 1) predictors, which form the set of focal predictors. Suppose your regression model has m predictors, among which there are (j - 1) codes. In this case PA = m + 1 and PC = (m + 1) - (j - 1).
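One way to check PA and PC is to count the columns of the design matrices of Model A and Model C. The sketch below uses base R's model.matrix() on a small hypothetical data frame; the variable names x1, x2, and group are made up for illustration.

```r
# Hypothetical data: two continuous covariates and a 4-level group factor
dat <- data.frame(
  x1 = c(1.2, 0.4, 2.1, 1.7, 0.9, 1.5, 2.3, 0.8),
  x2 = c(3, 5, 4, 6, 2, 5, 3, 4),
  group = factor(rep(c("a", "b", "c", "d"), times = 2))
)

# Model A: all predictors; Model C: Model A without the (j - 1) group codes.
# Each design-matrix column corresponds to one parameter (intercept included).
PA <- ncol(model.matrix(~ x1 + x2 + group, data = dat))
PC <- ncol(model.matrix(~ x1 + x2, data = dat))

j <- nlevels(dat$group)
c(PA = PA, PC = PC, focal = PA - PC, j_minus_1 = j - 1)
```

Here m = 5 (two covariates plus three dummy codes), so PA = m + 1 and PC = (m + 1) - (j - 1), matching the rule above.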

ANOVA concerning repeated measures

If you are interested in an outcome that was measured repeatedly, you may turn to repeated-measures ANOVA. In essence, repeated-measures ANOVA computes the difference scores of interest (contrasts) and then conducts a between-subjects ANOVA on them. Similarly, you could compute the difference score of interest and then conduct a regression analysis.

A special case arises when there is no between-subjects factor. Under this circumstance, treat the difference score as the outcome and fit an intercept-only model, as in the one-sample t-test.
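The special case can be illustrated with base R: a paired t-test on two repeated measures gives the same t value as the intercept of an intercept-only model fitted to the difference score. The data below are simulated and purely hypothetical.

```r
set.seed(2)                            # hypothetical pre/post scores
pre  <- rnorm(25, mean = 10, sd = 2)
post <- pre + rnorm(25, mean = 0.8, sd = 1.5)
d <- post - pre                        # difference score of interest

# Paired t-test on the two measurements
t_paired <- unname(t.test(post, pre, paired = TRUE)$statistic)

# Intercept-only model on the difference score: PA = 1, PC = 0
fit <- lm(d ~ 1)
t_intercept <- coef(summary(fit))["(Intercept)", "t value"]

all.equal(t_paired, t_intercept)       # the two t values should agree

# PRE of the intercept, computed from its t value and error df
PRE <- t_intercept^2 / (t_intercept^2 + df.residual(fit))
```

The resulting PRE, together with PC = 0 and PA = 1, is the information needed for planning in this design.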

Reference

Aberson, C. L. (2019). Applied power analysis for the behavioral sciences. Routledge.