---
title: "The Jacquard package"
author:
- Jan Graffelman - Dpt. of Statistics, Universitat Politecnica de Catalunya; Dpt. of Biostatistics, University of Washington
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
bibliography: Jacquard.bib
vignette: >
%\VignetteIndexEntry{The Jacquard package}
%\VignetteEncoding{UTF-8}
%\VignetteEngine{knitr::rmarkdown}
editor_options:
chunk_output_type: console
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
knitr::opts_chunk$set(fig.align = "center")
knitr::opts_chunk$set(warnings = FALSE)
knitr::opts_chunk$set(message = FALSE)
knitr::opts_chunk$set(fig.width = 6, fig.height = 6)
```
## Introduction
This document explains the basic functionality of the **Jacquard** package which provides functions for the estimation of the nine condensed Jacquard coefficients (@Jacquard) for biallelic genetic marker data using a constrained least squares approach (@Graffelman2024).
Outline:
1. [Installation](#installation)
2. [Estimation of the Jacquard coefficients](#coefficients)
3. [Estimation of derived relatedness parameters](#derived)
4. [Alternative estimation procedures](#mom)
5. [References](#refs)
## 1. Installation
The package can be installed from CRAN with the instructions
```{r preinstall}
#install.packages("Jacquard")
library(Jacquard)
```
## 2. Estimation of Jacquard the coefficients
We illustrate the estimation of the Jacquard coefficients using (0,1,2) coded genotype data available in the data set `SimulatedPedigree`, of which we show a small part
```{r data}
data(SimulatedPedigree)
SimulatedPedigree[1:5,1:10]
```
This matrix contains `r formatC(nrow(SimulatedPedigree), digits=0, format = "f")` individuals, spanning seven generations, in its rows and pedigree
information plus genotype data of 20,000 SNPs in its columns. We first
separate the pedigree information from the genotype information.
```{r}
Xped <- SimulatedPedigree[,1:5]
Xgen <- as.matrix(SimulatedPedigree[,6:ncol(SimulatedPedigree)])
```
We next determine the nine joint genotype counts for all pairs of individuals, storing these counts in a list object with nine lower triangular matrices, using function `JointGenotypeCounts`. For efficiency,
we here load the precalculated joint counts.
```{r}
#GTC <- JointGenotypeCounts(Xgen)
data(GTC)
names(GTC)
```
E.g., the counts of the major homozygote pairs for the first five individuals are
```{r}
GTC[[1]][1:5,1:5]
```
We create a vector with the minor allele frequency of each SNP.
```{r}
mafvec <- mafvector(Xgen)
mafvec[1:5]
```
We proceed to estimate the Jacquard coefficients by constrained least squares with function `Jacquard.cls`. For the sake of illustration, we take the first three founders of the pedigree, and subset their joint genotype counts
```{r echo = TRUE}
ii <- 1:3
Xped[ii,]
GTCsubset <- list(length = 9)
for (k in 1:9) {
GTCsubset[[k]] <- matrix(numeric(3^2), ncol = 3)
GTCsubset[[k]] <- GTC[[k]][ii,ii]
}
```
We set random initial values for the Jacquard coefficients
```{r}
set.seed(123)
delta.init <- runif(9)
delta.init <- delta.init/sum(delta.init)
```
And estimate the pairwise Jacquard coefficients of the three pairs.
```{r echo = TRUE}
output <- Jacquard.cls(GTCsubset,mafvec=mafvec,
eps=1e-06,
delta.init=delta.init)
Delta.cls <- output$delta
```
Convergence of the solver can be checked by looking at the field `convergence`, where 0 indicates proper convergence.
```{r}
output$convergence
```
Particular estimates of Jacquard coefficients can be extracted from the `Delta.cls` list object, which is a list of nine matrices. E.g., $\Delta_9$ of the first pair of individuals (1,2) can be extracted by
```{r}
Delta.cls[[9]][1,2]
```
All nine estimates of the Jacquard coefficients can be obtained with function `DeltaPair`
```{r}
DeltaPair(Delta.cls,1,2)
```
A pairwise list of the nine Jacquard coefficients of each pair can be obtained with the function `PairwiseList`.
```{r echo = TRUE}
PairwiseList(Delta.cls)
```
The full set of all pairwise Jacquard coefficients can be calculated
with the instructions below. This result is also available in the precalculated
data object `DeltaSimulatedPedigree`.
```{r}
#DeltaSimulatedPedigree <- Jacquard.cls(GTC,mafvec=mafvec,
# eps=1e-06,
# delta.init=delta.init)$delta
data(DeltaSimulatedPedigree)
```
Boxplots of the estimated Jacquard coefficients can be obtained with function `BoxplotDelta`. Separate boxplots are shown for the diagonals of $\Delta_1$ and
$\Delta_7$.
```{r}
BoxplotDelta(DeltaSimulatedPedigree)
```
## 3. Estimation of derived relatedness parameters
The set of five identifiable relatedness parameters dicussed by @Csuros, among them coancestry and inbreeding coefficients, can be calculated with
the function `CalculateTheta`.
```{r}
Theta <- CalculateTheta(DeltaSimulatedPedigree)
```
E.g., estimates of the kinship coefficients of the first five (founder) individuals are obtained by
```{r}
Theta[[1]][1:5,1:5]
```
Individual inbreeding coefficients can be obtained from ...
```{r}
diag(Theta[[2]])[1:5]
diag(DeltaSimulatedPedigree[[1]])[1:5]
```
Boxplots of identifiable relatedness parameters can be made with `BoxplotTheta`
```{r}
BoxplotTheta(Theta)
```
## 4. Alternative estimation procedures
There are many estimators for coancestry and inbreeding. Function `CalculateThetaMom` calculates moment estimators for the five identifiable
relatedness parameters defined by @Csuros.
```{r mom, echo = FALSE}
KS.mom <- CalculateMom(Xgen[1:10,],mafvec,verbose=FALSE)
```
E.g., moment estimators of the kinship coefficients of the first five individuals
are given by
```{r}
KS.mom[[1]][1:5,1:5]
```
## 5. References