Updated: 2024-11-12
The data generating process DGP
provides the ability to
simulate data from a variety of genetic architectures, including the
baseline model, allelic sum models, and allelic SKAT models.
The baseline allelic series model is: \[ \mathbb{E}(Y|N,X) = \sum_{l=1}^{L}N_{l}\beta_{l}
+ X'\beta_{X} \] where \(Y\)
is the phenotype, \(L\) is the number
of annotation categories, \(N_{l}\) is
the number of variants in annotation category \(l\), and \(X\) is a vector of covariates with
associated coefficient \(\beta_{X}\).
To simulate from the baseline model, the aggregation method
is set to "none"
. n
is the sample size, and
snps
in the number of variants. prop_anno
specifies the proportion of variants in each annotation category. \(L = 3\) annotation categories are adopted
by default. beta
is the per-variant effect size in each
annotation category. weights
is redundant with
beta
in the case of the baseline model, but these arguments
have distinct functions in the allelic sum and max models.
The allelic series sum model is: \[ \mathbb{E}(Y|N,X) =
\left(\sum_{l=1}^{L}N_{l}w_{l}\right)\beta + X'\beta_{X} \]
To simulate from the sum model, the aggregation method
is
set to "sum"
. In contrast to the baseline model,
beta
is now a scalar multiplier for the allelic sum burden
\(\sum_{l=1}^{L}N_{l}w_{l}\), while
weights
specifies the annotation category weights \((w_{1}, \dots, w_{L})\).
The allelic series max model is: \[ \mathbb{E}(Y|N,X) = \left(\max_{l=1}^{L}N_{l}w_{l}\right)\beta + X'\beta_{X} \]
To simulate from the max model, the aggregation method
is set to "max"
.
The generative version of the allelic series SKAT model is: \[ \begin{gathered} \mathbb{E}(Y|G,X) = \sum_{j=1}^{J}G_{j}\beta_{j} + X'\beta_{X}, \\ \beta_{j} = r_{j}\gamma_{j}\beta_{A_{j}} \end{gathered} \]
Here \(G_{j}\) is genotype at the
\(J\)th rare variant and \(\beta_{j}\) is the corresponding effect
size. The effect size of each variant are the product of a random sign
\(r_{j} \in \{-1, 1\}\), a scalar
frailty \(\gamma_{j} \sim \Gamma(\alpha,
\alpha)\) with mean 1 and variance \(\alpha^{-1}\), and \(\beta_{A_{j}}\) is the mean absolute effect
size for variant \(j\)’s annotation
category \(A_{j} \in \{1, \dots, L\}\).
To simulate from the SKAT model, the aggregation method
is
set to "none"
and the random_signs
argument is
set to TRUE
. The mean absolute effect sizes are set via
beta
. The variance of the frailty \(\gamma_{j}\) is specified with
random_var
. While the annotation category
weights
are not explicitly required for generation from the
SKAT model, weights
should be provided because the number
of annotation categories \(L\) is
inferred from the length of the weight vector.
Data can be simulated from any of the baseline, sum, max, or SKAT models simply by changing the lengths of annotation category proportions, effect sizes, and weights. These arguments should all be updated to reflect the number of annotation categories \(L\).
# Baseline model, 2 categories.
data <- AllelicSeries::DGP(
method = "none",
n = 100,
snps = 300,
prop_anno = c(0.6, 0.4),
beta = c(1, 2),
weights = c(1, 1)
)
# Baseline model, 4 categories.
data <- AllelicSeries::DGP(
method = "none",
n = 100,
snps = 300,
prop_anno = c(0.4, 0.3, 0.2, 0.1),
beta = c(1, 2, 3, 4),
weights = c(1, 1, 1, 1)
)
Real-data annotations or genotypes can be provided by specifying
anno
or geno
.
To simulate binary phenotypes from a probit model, set
binary = TRUE
. The binary phenotype is generated by first
constructing a latent normal phenotype Z
then dichotomizing
\(Y = \mathbb{I}(Z > 0)\).
The range of minor allele frequencies can be specified with
maf_range
. By default, variants have MAFs in the range of
0.1% to 0.5%. Genotypes are generated in such a way that the minor
allele count is always at least 1, guaranteeing that no empty variants
will be present.
The proportion of causal variants can be modulated with
prop_causal
. By default, all variants are causal for the
phenotype (unless beta
is set to zero). If
prop_causal < 1
, then a corresponding proportion of
variants is removed from the causal set.