CRAN Task View: Mixed, Multilevel, and Hierarchical Models in R

Maintainer:	Ben Bolker, Julia Piaskowski, Emi Tanaka, Phillip Alday, Wolfgang Viechtbauer
Contact:	bolker at mcmaster.ca
Version:	2024-05-08
URL:	https://CRAN.R-project.org/view=MixedModels
Source:	https://github.com/cran-task-views/MixedModels/
Contributions:	Suggestions and improvements for this task view are very welcome and can be made through issues or pull requests on GitHub or via e-mail to the maintainer address. For further details see the Contributing guide.
Citation:	Ben Bolker, Julia Piaskowski, Emi Tanaka, Phillip Alday, Wolfgang Viechtbauer (2024). CRAN Task View: Mixed, Multilevel, and Hierarchical Models in R. Version 2024-05-08. URL https://CRAN.R-project.org/view=MixedModels.
Installation:	The packages from this task view can be installed automatically using the ctv package. For example, `ctv::install.views("MixedModels", coreOnly = TRUE)` installs all the core packages or `ctv::update.views("MixedModels")` installs all packages that are not yet installed and up-to-date. See the CRAN Task View Initiative for more details.

Contributors: Maintainers plus Michael Agronah, Matthew Fidler, Thierry Onkelinx

Mixed (or mixed-effect) models are a broad class of statistical models used to analyze data where observations can be assigned a priori to discrete groups, and where the parameters describing the differences between groups are treated as random (or latent) variables. They are one category of multilevel, or hierarchical models; longitudinal data are often analyzed in this framework. In econometrics, longitudinal or cross-sectional time series data are often referred to as panel data and are sometimes fitted with mixed models. Mixed models can be fitted in either frequentist or Bayesian frameworks.

This task view only includes models that incorporate continuous (usually although not always Gaussian) latent variables. This excludes packages that handle hidden Markov models, latent Markov models, and finite (discrete) mixture models (some of these are covered by the Cluster task view). Dynamic linear models and other state-space models that do not incorporate a discrete grouping variable are also excluded (some of these are covered by the TimeSeries task view). Bioinformatic applications of mixed models hosted on Bioconductor are mostly excluded as well.

Basic model fitting

Linear mixed models

Linear mixed models (LMMs) make the following assumptions:

The expected values of the responses are linear combinations of the fixed predictor variables and the random effects.
The conditional distribution of the responses is Gaussian (equivalently, the errors are Gaussian).
The random effects are normally distributed.
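In matrix notation, these assumptions are commonly written as below for the simplest case of independent, homoscedastic errors. The symbols (response vector y, fixed-effect design matrix X, random-effect design matrix Z, fixed effects β, random effects b, random-effect covariance Σ, residual variance σ²) are conventional notation assumed here for illustration; this task view does not define them.

```latex
\begin{aligned}
y \mid b &\sim \mathcal{N}\!\left(X\beta + Zb,\; \sigma^2 I\right), \\
b &\sim \mathcal{N}\!\left(0,\; \Sigma\right).
\end{aligned}
```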

Frequentist:

The most commonly used packages and/or functions for frequentist LMMs are:

nlme: nlme::lme() provides REML or ML estimation. Allows multiple nested random effects, and provides structures for modeling heteroscedastic and/or correlated errors. Wald estimates of parameter uncertainty.
lme4: lme4::lmer() provides REML or ML estimation. Allows multiple nested or crossed random effects; can compute profile confidence intervals and conduct parametric bootstrapping.
mbest: fits large nested LMMs using a fast moment-based approach.
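As a minimal sketch of frequentist LMM fitting, the following uses nlme::lme() with the Orthodont data set that ships with nlme. The model formula and the choice of a random intercept per Subject are illustrative assumptions, not a recommended analysis.

```r
library(nlme)  # recommended package, shipped with R

# Orthodont (from nlme): distance measured repeatedly on each Subject.
# Random intercept per Subject, fitted by REML (the default for lme()).
fit_reml <- lme(distance ~ age + Sex, random = ~ 1 | Subject,
                data = Orthodont, method = "REML")

# Same model refitted by maximum likelihood.
fit_ml <- update(fit_reml, method = "ML")

# Heteroscedastic errors: a separate residual variance for each Sex.
fit_het <- update(fit_reml, weights = varIdent(form = ~ 1 | Sex))

fixef(fit_reml)                        # fixed-effect estimates
VarCorr(fit_reml)                      # variance components
intervals(fit_reml, which = "fixed")   # approximate (Wald-type) intervals
anova(fit_reml, fit_het)               # compare the two REML fits
```

With lme4 installed, the analogous random-intercept model would be written as lme4::lmer(distance ~ age + Sex + (1 | Subject), data = Orthodont).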

Bayesian:

Most Bayesian R packages use Markov chain Monte Carlo (MCMC) estimation: MCMCglmm, rstanarm, and brms; the latter two packages use the Stan infrastructure. blme, built on lme4, uses maximum a posteriori (MAP) estimation. bamlss provides a flexible set of modular functions for Bayesian regression modeling.

Generalized linear mixed models

Generalized linear mixed models (GLMMs) can be described as hierarchical extensions of generalized linear models (GLMs), or as extensions of LMMs to different response distributions, typically in the exponential family. The random-effect distributions are typically assumed to be Gaussian on the scale of the linear predictor.

Frequentist:

MASS: MASS::glmmPQL() fits via penalized quasi-likelihood.
lme4: lme4::glmer() uses Laplace approximation and adaptive Gauss-Hermite quadrature; fits negative binomial as well as exponential-family models.
glmmTMB uses Laplace approximation; allows some correlation structures; fits some non-exponential families (Beta, COM-Poisson, etc.) and zero-inflated/hurdle models.
GLMMadaptive uses adaptive Gauss-Hermite quadrature; fits exponential family, negative binomial, beta, zero-inflated/hurdle/censored Gaussian models, user-specified log-densities.
hglm fits hierarchical GLMs using h-likelihood (sensu Nelder, Lee and Pawitan (2017)).
glmm fits GLMMs using Monte Carlo likelihood approximation.
glmmEP fits probit mixed models for binary data by expectation propagation.
mbest fits large nested GLMMs using a fast moment-based approach.
galamm fits a wide variety of models (heteroscedastic, mixed response types, factor loadings, etc.).
glmmrBase fits models with Gaussian, binomial, Poisson, Beta, and Gamma responses using MCMC and Laplace approximations, with flexible correlation structures.
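As a minimal sketch of a frequentist GLMM fit, the following uses MASS::glmmPQL() with the bacteria data set from MASS; both are available in a standard R installation. It is meant only to illustrate the interface: the model formula is an illustrative assumption, and the other packages listed above use different approximations and will generally give somewhat different estimates.

```r
library(MASS)  # recommended package; glmmPQL() relies on nlme internally

# bacteria (from MASS): presence (y/n) of bacteria in repeated samples
# taken from each child (ID) under three treatments (trt).
fit_pql <- glmmPQL(y ~ trt + I(week > 2), random = ~ 1 | ID,
                   family = binomial, data = bacteria, verbose = FALSE)

summary(fit_pql)$tTable   # fixed effects on the logit scale
nlme::VarCorr(fit_pql)    # random-intercept and residual components
```

With lme4 installed, a comparable Laplace-approximation fit would be lme4::glmer(y ~ trt + I(week > 2) + (1 | ID), family = binomial, data = MASS::bacteria).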

Bayesian:

Most Bayesian mixed model packages use some form of Markov chain Monte Carlo (or other Monte Carlo methods).

MCMCglmm: Gibbs sampling. Exponential family, multinomial, ordinal, zero-inflated/altered/hurdle, censored, multimembership, multi-response models. Pedigree (animal/kinship/phylogenetic) models.
rstanarm: Hamiltonian Monte Carlo (based on Stan); designed for lme4 compatibility.
brms: Hamiltonian Monte Carlo. Linear, robust linear, count data, survival, response times, ordinal, zero-inflated/hurdle/censored data.
bamlss: optimization and derivative-based Metropolis-Hastings/slice sampling. Wide range of distributions and link functions.

The following packages (in addition to bamlss) find maximum a posteriori fits to Bayesian (G)LMMs by optimization:

blme wraps lme4 to add prior distributions.
INLA uses integrated nested Laplace approximation to fit GLMMs using a wide range of latent models (especially for spatial estimation), priors, and distributions.
- inlabru facilitates spatial modeling using integrated nested Laplace approximation via the R-INLA package. Additionally, extends the GAM-like model class to more general nonlinear predictor expressions and implements a log-Gaussian Cox process likelihood for modeling univariate and spatial point processes based on ecological survey data.
- inlatools provides tools to set sensible priors and check the dispersion and distribution of INLA models.

vglmer estimates GLMMs by variational Bayesian methods.

Nonlinear mixed models

Nonlinear mixed models incorporate arbitrary nonlinear responses that cannot be accommodated in the framework of GLMMs. Only a few packages can accommodate generalized nonlinear mixed models (i.e., parametric nonlinear mixed models with non-Gaussian responses). However, many packages allow smooth nonparametric components (see “Additive models” below). Otherwise, users may need to implement GNLMMs themselves in a more general hierarchical modeling framework.

Frequentist:

nlme::nlme() from nlme and lme4::nlmer() from lme4 fit nonlinear mixed effects models by maximum likelihood.
nlmixr2::nlmixr2() from nlmixr2 fits nonlinear mixed effects models by first-order conditional estimation (focei) maximum likelihood approximation (a different approximation than nlme::nlme() and lme4::nlmer()), and allows generalized likelihood as well as a selection of built-in link functions.
gnlmm() and gnlmm3() from repeated fit GNLMMs by Gauss-Hermite integration.
saemix and nlmixr2 both use a stochastic approximation of the EM algorithm to fit a wide range of GNLMMs.
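As a minimal sketch of a (Gaussian) nonlinear mixed model, the following fits an asymptotic growth curve with nlme::nlme(), using the Loblolly data set from the datasets package and starting values in the style of the nlme documentation. Which parameter receives a random effect is an illustrative choice.

```r
library(nlme)

# Loblolly (datasets package): heights of pine trees by age, grouped by Seed.
# SSasymp() is a self-starting asymptotic regression model from stats.
# The grouping factor is taken from the groupedData formula of Loblolly.
fit_nlme <- nlme(height ~ SSasymp(age, Asym, R0, lrc),
                 data = Loblolly,
                 fixed = Asym + R0 + lrc ~ 1,   # population-level parameters
                 random = Asym ~ 1,             # asymptote varies among seeds
                 start = c(Asym = 103, R0 = -8.5, lrc = -3.3))

fixef(fit_nlme)     # population-level parameter estimates
VarCorr(fit_nlme)   # between-seed and residual variance components
```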

Bayesian:

brms supports GNLMMs.

Generalized estimating equations

Generalized estimating equations (GEEs) are an alternative approach to fitting clustered, longitudinal, or otherwise correlated data. These models produce estimates of the marginal effects (averaged across the group-level variation) rather than conditional effects (conditioned on group-level information).

geepack, gee, and geeM are standard GEE solvers, providing GEE estimation of the parameters in mean structures with possible correlation between the outcomes.
geesmv: GEE estimator using the original sandwich variance estimator proposed by Liang and Zeger (1986), and eight types of variance estimators for improving the finite small-sample performance.
multgee is a GEE solver for correlated nominal or ordinal multinomial responses.
glmtoolbox handles a wide variety of model types (GLMs, beta-binomial and negative binomial, zero-inflation and zero-alteration, mixed models) via GEEs.

Specialized models/tasks

Additive models (models incorporating smooth functional components such as regression splines or Gaussian processes; also known as semiparametric models): gamm4, mgcv, brms, lmeSplines, bamlss, gamlss, LMMsolver, R2BayesX, GLMMRR, glmmTMB, galamm.
Big data/distributed computation: lmmpar, mbest. See also MixedModels.jl (Julia), diamond (Python).
Bioinformatics/quantitative genetics: MCMC.qpcr, QGglmm, CpGassoc (methylation studies).
Censored data (response data known only up to lower/upper bounds): brms and nlmixr2 (general), ARpLMEC (censored Gaussian, autoregressive errors). Censored Gaussian (Tobit) responses: GLMMadaptive, MCMCglmm, gamlss.
Denominator degree-of-freedom computation: Satterthwaite and/or Kenward-Roger corrections are computed by lmerTest, pbkrtest, glmmrBase
Differential equations (fitting DEs with group-structured parameters; this category overlaps considerably with pharmacokinetic modeling): mixedsde for stochastic DEs. Ordinary DEs can be run with nlmixr2 using the “focei” or “saem” (EM) methods, or using the nlme package; see also the DifferentialEquations and Pharmacokinetics task views.
Doubly hierarchical GLMs: dhglm, mdhglm (multivariate)
Factor analytic, latent variable, and structural equation models: lavaan, nlmm, sem, piecewiseSEM, semtree, and blavaan; see also the Psychometrics task view.
Flexible correlation structures: brms, glmmTMB, sommer, glmmrBase, regress
Kinship-augmented models (responses where individuals have a known family relationship): pedigreemm, coxme, kinship2, LMMsolver, MCMCglmm, sommer, rrBLUP, BGLR, lme4GS, lme4qtl, cpgen, QTLRel.
Location-scale models: nlme, glmmTMB, brms, mgcv [with family chosen from one of the *ls/*lss options] all allow modeling of the dispersion/scale component.
Missing values: mice, CRTgeeDR, JointAI, mdmb, pan; see also the MissingData task view.
Multiple membership models: (Bayesian) MCMCglmm, brms, rmm; (frequentist) lmerMultiMember (can also fit the Bradley-Terry model)
Multinomial responses: bamlss, R2BayesX, MCMCglmm, mgcv, mclogit.
Multivariate responses/multi-trait analysis: (multiple dependent variables; the response variables may or may not be constrained to be from the same family) MCMCglmm, MegaLMM, brms, sommer, INLA. Many mixed-effect packages allow fitting of (homogeneous) multivariate responses by “melting” the data (converting to long format) and treating each observation in the original data as a cluster.
Non-Gaussian random effects: brms, repeated, spaMM.
Ordinal-valued responses (responses measured on an ordinal scale): ordinal, GLMMadaptive, multgee (frequentist); MCMCglmm, brms (Bayesian), cplm (both)
Over-dispersed models: aod, aods3.
Panel data: in econometrics, panel data typically refers to subjects (individuals or firms) that are sampled repeatedly over time. The theoretical and computational approaches used by econometricians overlap with mixed models (e.g., see here). The plm package can fit mixed-effects panel models; see also the Econometrics task view.
Quantile regression: lqmm, qrLMM, qrNLMM.
Phylogenetic models: pez, phyr, MCMCglmm, brms.
Repeated measures: (packages with specialized covariance structures for handling repeated measures) nlme, mmrm, glmmTMB, LMMsolver, repeated.
Regularized/penalized models (regularization or variable selection by ridge, lasso, or elastic net penalties): splmm fits LMMs for high-dimensional data by imposing penalty on both the fixed effects and random effects for variable selection. glmmLasso fits GLMMs with L1-penalized (LASSO) fixed effects. bamlss implements LASSO-like penalization for generalized additive models.
Robust/heavy-tailed estimation (downweighting the importance of extreme observations): robustlmm, robustBLME (Bayesian robust LME), CRTgeeDR for the doubly robust inverse probability weighted augmented GEE estimator. Some packages (brms, bamlss, GLMMadaptive, glmmTMB, mgcv with family = "scat", nlmixr2) allow heavy-tailed response distributions such as Student-t.
Skewed data/response transformation: skewlmm fits a scale mixture of skew-normal linear mixed models using expectation-maximization (EM). nlmixr2 can fit skewed data by dynamically transforming both sides of the model, using either the boxCox() or the yeoJohnson() transformation, with maximum likelihood or the EM method “saem”. bcmixed fits Box-Cox-transformed LMMs and provides inferences for differences between treatment levels. boxcoxmix fits Box-Cox-transformed LMMs and logistic mixed models.
Spatial models: nlme (with corStruct functions), CARBayesST, sphet, spind, spaMM, glmmfields, glmmTMB, inlabru (spatial point processes via log-Gaussian Cox processes), brms, LMMsolver, bamlss; see also the Spatial and SpatioTemporal CRAN task views.
Sports analytics: mvglmmRank, multivariate generalized linear mixed models for ranking sports teams.
Survival analysis: coxme.
Tree-based models: glmertree, semtree, gpboost
Weighted models: WeMix (linear and logit models with weights at multiple levels)
Zero-inflated models: (frequentist) glmmTMB, cplm, mgcv (zi Poisson only), GLMMadaptive; (Bayesian) MCMCglmm, brms, bamlss.
Zero-one inflated Beta regression: brms, zoib, glmmTMB (zero-inflated only). Ordered beta regression is an alternative framework to address the same type of data: ordbetareg, glmmTMB

Hierarchical modeling frameworks

These packages do not directly provide functions to fit mixed models, but instead implement interfaces to general-purpose sampling and optimization toolboxes that can be used to fit mixed models. While models require extra effort to set up, and often require programming in a domain-specific language other than R, these frameworks are more flexible than most of the other packages listed here.

Interfaces to JAGS/OpenBUGS: R2jags, rjags, R2OpenBUGS (BUGS language).
Interfaces to Stan (C++ extensions): rstan, cmdstanr, rethinking (ulam() function).
Other frameworks: TMB (automatic differentiation and Laplace approximation via C++ extensions), RTMB (simplified R interface to TMB), tmbstan, nimble, greta (R interface to TensorFlow).

Model diagnostics and summary statistics

Model diagnostics

general: HLMdiag (diagnostic tools for hierarchical (multilevel) linear models), rockchalk, performance, multilevelTools, merTools (for models fitted using lme4), ggResidpanel, mlmtools, DHARMa.
influential data points: influence.ME, influence.SEM.
residuals: DHARMa.

Summary statistics

Correlations: iccbeta (intraclass correlation), rptR (repeatabilities)
R² calculations: r2glmm (R² and partial R²), MuMIn (r.squaredGLMM() function), partR2, performance (r2() function), rr2, mlmtools, mlmhelpr. Note that there are many different methods for computing R² values for (G)LMMs; see, e.g., Nakagawa, Johnson and Schielzeth (2017) and Jaeger et al. (2017). Many of these packages also compute intra-class correlations.
Information criteria: cAIC4 (conditional AIC), blmeco (WAIC).
Robust variance-covariance estimates: clubSandwich, merDeriv, mlmhelpr, glmmrBase
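For a random-intercept LMM, the intraclass correlation is the share of the total variance attributable to the grouping factor, i.e. group variance / (group variance + residual variance). A minimal sketch of that calculation, done by hand from an nlme fit rather than with any of the packages above (the model is an illustrative assumption):

```r
library(nlme)

# Random-intercept model for the Orthodont data shipped with nlme.
fit <- lme(distance ~ age, random = ~ 1 | Subject, data = Orthodont)

# VarCorr() returns a character matrix of variances and standard deviations.
vc <- VarCorr(fit)
var_subject <- as.numeric(vc["(Intercept)", "Variance"])
var_resid   <- as.numeric(vc["Residual", "Variance"])

# Proportion of (conditional) variance due to differences among subjects.
icc <- var_subject / (var_subject + var_resid)
icc
```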

Derivatives

The first and second derivatives of log-likelihood with respect to parameters can be useful for various model evaluation tasks (e.g., computing sensitivities, robust variance-covariance matrices, or delta-method variances).

lmeInfo, merDeriv.

Data sets

Many packages include small example data sets (e.g., lme4, nlme). These packages provide previously described data sets often used in evaluating mixed models.

mlmRev: examples from the Multilevel Software Comparative Reviews.
SASmixed: data sets from SAS System for Mixed Models.
StroupGLMM: R scripts and data sets for Generalized Linear Mixed Models.
blmeco: Data and functions accompanying Bayesian Data Analysis in Ecology using R, BUGS and Stan.
nlmeU: Data sets, functions and scripts described in Linear Mixed-Effects Models: A Step-by-Step Approach.
VetResearchLMM: R scripts and data sets for Linear Mixed Models. An Introduction with applications in Veterinary Research.
languageR: R scripts and data sets for Analyzing Linguistic Data: A practical introduction to statistics using R.
nlmixr2data: includes the data sets for testing nlmixr2 against commercial competitors like ‘NONMEM’ and ‘Monolix’

Model presentation and prediction

Functions and frameworks for convenient tabular and graphical output of mixed model results:

Tables: huxtable, broom.mixed, rockchalk, parameters, modelsummary.
Figures/visualization: dotwhisker, sjPlot, rockchalk, mlmtools

Convenience wrappers

These functions provide convenient frameworks to fit and interpret mixed models.

Model fitting: multilevelmod, ez, mixlm, afex, and nimble.
Model summaries: broom.mixed, insight
Variable selection & model averaging: LMERConvenienceFunctions, MuMIn, mlmhelpr, glmulti (see, e.g., maintainer’s blog or here for use with mixed models).
Centering/scaling predictors at the population or group level: mlmhelpr, mlmtools, arm::standardize()

Inference and model selection

Power analysis and simulation

These topics are closely related because there are few available analytical methods for computing statistical power for mixed models; power usually needs to be estimated by simulation.

Power: longpower, pass.lme, simr, powerEQTL (powerLME function), mixedpower
Simulation: faux (archived); simulate() in lme4 (for formula arguments), glmmTMB::simulate_new(); rxode2, mrgsolve, PKPDsim (ODE/pharmacokinetic models)
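Simulation-based power estimation can be sketched in a few lines without any of the dedicated packages: simulate many data sets from an assumed model, refit the model to each, and record how often the effect of interest is detected. The design below (20 groups of 6 observations, a group-level treatment, and all effect sizes and standard deviations) is hypothetical, as is the helper name simulate_once().

```r
library(nlme)

# Simulate one data set from a random-intercept model, refit it, and
# return the p-value for the treatment effect. All defaults are hypothetical.
simulate_once <- function(n_groups = 20, n_per = 6, effect = 1,
                          sd_group = 1, sd_resid = 1) {
  group <- factor(rep(seq_len(n_groups), each = n_per))
  # Treatment is assigned at the group level, alternating 0/1 across groups.
  treat <- rep(rep(0:1, length.out = n_groups), each = n_per)
  b <- rnorm(n_groups, sd = sd_group)[as.integer(group)]
  y <- effect * treat + b + rnorm(n_groups * n_per, sd = sd_resid)
  fit <- lme(y ~ treat, random = ~ 1 | group,
             data = data.frame(y, treat, group))
  summary(fit)$tTable["treat", "p-value"]
}

set.seed(101)
p_values <- replicate(100, simulate_once())

# Estimated power: proportion of simulations rejecting at the 5% level.
power_est <- mean(p_values < 0.05)
power_est
```

In practice the number of simulations would be much larger than 100, and the power packages listed above automate this loop for fitted model objects.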

Model selection

cAIC4 (cAIC4::stepcAIC), buildmer, MuMIn, StatisticalModels (GLMERSelect).

Commercial software interfaces

Mplus: MplusAutomation.
ASReml-R: asremlPlus.
babelmixr2 allows nlmixr2 models to be translated and run in either the commercial tool Monolix or NONMEM and then reads the results back in to create a standardized nlmixr2 fit object. This fit object runs the diagnostics in nlmixr2 and compares them to the ones output in the commercial software to “validate” the fit object against the output of the commercial tool. It also interfaces with free tools such as PKNCA for automatically using observed pharmacokinetic (PK) data for initial estimates of PK models.

CRAN packages

Core:	brms, broom.mixed, geepack, glmmTMB, lavaan, lme4, MCMCglmm, multilevelmod, nlme, sommer.
Regular:	afex, aod, aods3, ARpLMEC, asremlPlus, babelmixr2, bamlss, bcmixed, BGLR, blavaan, blme, blmeco, boot.pval, boxcoxmix, buildmer, cAIC4, car, CARBayesST, CLME, clubSandwich, coxme, CpGassoc, cplm, CRTgeeDR, DHARMa, dhglm, dotwhisker, effects, emmeans, ez, galamm, gamlss, gamm4, gee, geeM, geesmv, ggeffects, ggResidpanel, glmertree, glmm, GLMMadaptive, glmmEP, glmmfields, glmmLasso, glmmrBase, GLMMRR, glmtoolbox, glmulti, gpboost, greta, hglm, HLMdiag, huxtable, iccbeta, influence.ME, influence.SEM, inlabru, insight, JointAI, kinship2, languageR, lmeInfo, LMERConvenienceFunctions, lmeresampler, lmerTest, lmeSplines, lmmpar, longpower, lqmm, marginaleffects, MarginalMediation, margins, MASS, mbest, mclogit, MCMC.qpcr, mdhglm, mdmb, merDeriv, merTools, mgcv, mice, mixedsde, mixlm, mlmhelpr, mlmRev, mlmtools, mmrm, modelsummary, MplusAutomation, mrgsolve, multgee, multilevelTools, MuMIn, mvctm, mvglmmRank, nimble, nlmeU, nlmixr2, nlmixr2data, nlmm, ordbetareg, ordinal, pan, parameters, partR2, pass.lme, pbkrtest, pedigreemm, performance, pez, phyr, piecewiseSEM, PKNCA, PKPDsim, plm, powerEQTL, QGglmm, qrLMM, qrNLMM, QTLRel, R2BayesX, r2glmm, R2jags, R2OpenBUGS, regress, repeated, rjags, RLRsim, robustBLME, robustlmm, rockchalk, rptR, rr2, rrBLUP, rstan, rstanarm, RTMB, RVAideMemoire, rxode2, saemix, SASmixed, sem, semtree, simr, sjPlot, skewlmm, spaMM, sphet, spind, splmm, StroupGLMM, TMB, tmbstan, varTestnlme, VetResearchLMM, vglmer, WeMix, zoib.
Archived:	faux.

Other resources

CRAN Task View: Cluster
CRAN Task View: DifferentialEquations
CRAN Task View: Econometrics
CRAN Task View: MissingData
CRAN Task View: Pharmacokinetics
CRAN Task View: Psychometrics
CRAN Task View: Spatial
CRAN Task View: SpatioTemporal
CRAN Task View: TimeSeries
GitHub Project: cmdstanr
GitHub Project: cpgen
GitHub Project: inlatools
GitHub Project: lme4GS
GitHub Project: lme4qtl
GitHub Project: lmerMultiMember
GitHub Project: LMMsolver
GitHub Project: MegaLMM
GitHub Project: mixedpower
GitHub Project: rethinking
GitHub Project: rmm
GitHub Project: StatisticalModels