---
title: 'clusterMI: Cluster Analysis with Missing Values by Multiple Imputation'
output:
  bookdown::html_document2:
    base_format: rmarkdown::html_vignette
bibliography: biblio.bib
vignette: >
  %\VignetteIndexEntry{clusterMI: Cluster Analysis with Missing Values by Multiple Imputation}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r,echo=FALSE}
options(digits=2)
```

```{r setup,message=FALSE, warning=FALSE}
library(clusterMI)
```

`clusterMI` is an R package to perform clustering with missing values. Missing values are addressed by multiple imputation. The package offers various multiple imputation methods dedicated to clustered individuals (@Audigier21). In addition, it allows pooling results both in terms of partition and instability (@Audigier22). Among other applications, these functionalities can be used to choose the number of clusters in the presence of missing values.

# Wine data set

The `wine` data set (@wine) is the result of a chemical analysis of 177 Italian wine samples from three different cultivars. Each wine is described by 13 continuous variables. This data set will be used to illustrate the `clusterMI` package. To this end, missing values will be added and the cultivar variable will be omitted.

## Full data set

```{r, message=FALSE}
require(stargazer)
set.seed(123456)
data(wine)
stargazer(wine, type = "text")
table(wine$cult)
```

## Adding missing values

Missing values are artificially added according to a missing completely at random mechanism, so that each value of the data set is missing with a probability of 1/3 (independently of the values themselves).
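To make the mechanism concrete, here is a minimal base R sketch on a toy matrix. It is an illustration only, not the `prodna` implementation, and the helper name `add_mcar` is hypothetical.

```r
# Hypothetical helper: delete each entry independently with probability pct (MCAR)
add_mcar <- function(x, pct) {
  x <- as.matrix(x)
  # one Bernoulli(pct) draw per cell, independent of the cell's value
  mask <- matrix(runif(length(x)) < pct, nrow = nrow(x), ncol = ncol(x))
  x[mask] <- NA
  x
}

set.seed(1)
toy <- matrix(rnorm(3000), nrow = 1000, ncol = 3)
toy.na <- add_mcar(toy, pct = 1/3)
colMeans(is.na(toy.na))  # each proportion should be close to 1/3
```

Because the mask is drawn without looking at the data, the observed entries remain a random subsample of the full data. In the vignette itself, this step is delegated to `prodna`.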
```{r}
ref <- wine$cult # "True" partition
nb.clust <- 3 # Number of clusters
wine.na <- wine
wine.na$cult <- NULL # Remove the reference partition
wine.na <- prodna(wine.na, pct = 1/3)
```

```{r}
# proportion of missing values
colMeans(is.na(wine.na))
# proportion of incomplete individuals
mean(apply(is.na(wine.na), 1, any))
```

# Multiple imputation

The `clusterMI` package offers various multiple imputation methods dedicated to clustered individuals. They can be divided into two categories: joint modelling (JM) imputation and fully conditional specification (FCS). The first assumes a joint distribution for all variables, and imputation is performed using this model. The second proceeds variable by variable, in a sequential manner, using regression models. FCS methods are more time consuming, but also more flexible, allowing a better fit of the imputation model.

Multiple imputation is performed with the `imputedata` function. We start by presenting JM approaches in Section (\@ref(JM)), while FCS approaches will be presented in Section (\@ref(FCS)).

## Joint modelling imputation {#JM}

The package proposes two JM methods: JM-GL and JM-DP. Both are based on a multivariate Gaussian mixture model.

- `JM-GL` is implemented in the `mix` package (@mixpackage). Initially, this method is dedicated to the imputation of mixed data, but it can be used by considering the partition variable as a fully incomplete categorical variable. The method assumes constant variance in each cluster (for continuous data).
- `JM-DP` is a joint modelling method implemented in the R packages `DPImputeCont` (@Kimpackage), `NPBayesImputeCat` (@NPBayesImputeCat) and `MixedDataImpute` (@Mixed) for continuous, categorical or mixed data respectively. Such a method has the advantage of automatically determining the number of clusters (but this number needs to be bounded by `nb.clust`). Furthermore, it allows the covariance matrices to vary across clusters (for continuous data).

JM-GL is the default imputation method used in `imputedata`. To perform multiple imputation with this default method, we proceed as follows:

```{r, warning=FALSE, results='hide'}
m <- 20 # Number of imputed data sets
res.imp.JM <- imputedata(data.na = wine.na,
                         nb.clust = nb.clust,
                         m = m)
```

and we specify `method = "JM-DP"` for imputation using the other JM method:

```{r, warning=FALSE, eval=FALSE}
res.imp <- imputedata(data.na = wine.na,
                      method = "JM-DP",
                      nb.clust = nb.clust,
                      m = m)
```

### Convergence {-}

Both imputation methods rely on a data-augmentation algorithm that alternates between imputing the data and drawing parameters from their posterior distribution. The `m` imputed datasets are obtained by keeping one imputed dataset every `L` iterations. The first `Lstart` iterations form a burn-in period, which is required to reach convergence to the posterior distribution (conditional on the incomplete data). The `L` iterations between successive draws are intended to make the retained imputed values approximately independent.

`Lstart` and `L` can be checked by graphical investigation. To do so, we track the between inertia for each variable over successive iterations of the data-augmentation algorithm (available in the `res.conv` output of the `imputedata` function). In practice, we run the imputation for a large number of imputed datasets `m` while keeping all intermediate imputed datasets, i.e.
at each iteration (`L = 1`) from the first (`Lstart = 1`):

```{r,echo=TRUE, results='hide'}
res.imp.JM.conv <- imputedata(data.na = wine.na,
                              method = "JM-GL",
                              nb.clust = nb.clust,
                              m = 800,
                              Lstart = 1, # number of iterations for the burn-in period
                              L = 1 # number of iterations between each draw
                              )
```

and then plot the successive between inertia values for the first four variables (as an example):

```{r,echo=TRUE, eval=FALSE}
res.conv <- res.imp.JM.conv$res.conv
res.conv.ts <- ts(t(res.conv)) # conversion as time-series object
plot(res.conv.ts[, 1:4]) # diagnostic from the first 4 variables
```

```{r,echo=FALSE, message=FALSE,fig.align='center',results='hide', fig.width=7}
res.conv <- res.imp.JM.conv$res.conv
res.conv.ts <- ts(t(res.conv)) # conversion as time-series object
plot(res.conv.ts[, 1:4], nc = 4, main="", mar.multi = c(0, 4.1, 0, .1), xlab = "L") # diagnostic from the first 4 variables
```

Here, convergence seems to be reached after about 400 iterations, meaning `Lstart = 400` is a suitable choice.

Next, the number of iterations between successive draws can be checked by visualising the autocorrelograms. Here, an autocorrelogram represents the correlation between the vector of successive between inertia values and its shifted version, for several lags. We look for a lag `L` large enough that this correlation becomes negligible.
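The idea of reading a lag off an autocorrelation function can be sketched on simulated data. The chain below is a toy AR(1) series standing in for one between inertia trajectory (it is not output from `imputedata`), and the rule used to pick the lag (the first lag falling inside the white-noise band drawn by `acf` plots) is one possible convention, assumed here for illustration.

```r
set.seed(42)
# Toy stand-in for one between inertia trajectory: AR(1) chain, coefficient 0.8
chain <- as.numeric(arima.sim(model = list(ar = 0.8), n = 5000))

# Sample autocorrelations for lags 0 to 100, without plotting
rho <- acf(chain, lag.max = 100, plot = FALSE)$acf[, 1, 1]
names(rho) <- 0:100

# Approximate 95% band for the autocorrelations of white noise
band <- qnorm(0.975) / sqrt(length(chain))

# Smallest lag at which the sample autocorrelation falls inside the band
L.candidate <- min(which(abs(rho) < band)) - 1
L.candidate
```

A stronger dependence between successive iterations pushes this lag upwards, which is why `L` has to be checked on the actual imputation run rather than fixed in advance.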
These autocorrelograms can be obtained as follows:

```{r, eval=FALSE}
Lstart <- 400
# extraction of summaries after Lstart iterations for the first 4 variables
res.conv.ts <- res.conv.ts[Lstart:nrow(res.conv.ts), 1:4]
apply(res.conv.ts, 2, acf)
```

```{r, message=FALSE,results='hide',fig.align='center',echo=FALSE,fig.width=7}
Lstart <- 400
res.conv.ts <- res.conv.ts[Lstart:nrow(res.conv.ts), 1:4] # extraction of summaries after Lstart iterations for the first 4 variables
res.acf <- apply(res.conv.ts, 2, acf,plot=FALSE)
oldpar <- par(no.readonly = TRUE)
on.exit(par(oldpar))
par(mfrow = c(1,4), mar = c(4, 2, 3, 1) + 0.1)
mapply(FUN = plot, res.acf, main = names(res.acf), MoreArgs = list(xlab = "L"))
```

Based on these plots, `L = 20` (the default value) seems sufficient. Thus, the imputation step can be rerun as follows:

```{r, warning=FALSE, results='hide'}
Lstart <- 400
L <- 20
res.imp.JM <- imputedata(data.na = wine.na,
                         nb.clust = nb.clust,
                         Lstart = Lstart,
                         L = L,
                         m = m)
```

Note that the imputation also requires a pre-specified number of clusters (`nb.clust`). Here it is set to 3 (corresponding to the number of varieties). We explain in Section \@ref(nbclust) how it can be tuned. Furthermore, the number of imputed data sets is set to `m = 20`, which is generally enough (this choice will be discussed in Section \@ref(nbtab)).

## Fully conditional specification {#FCS}

Fully conditional specification methods perform imputation variable by variable. The two fully conditional imputation methods proposed are `FCS-homo` and `FCS-hetero` (@Audigier21). They essentially differ in their assumption about the covariance in each cluster (constant or not, respectively).
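The variable-by-variable principle can be sketched with a deliberately simplified chained-regression loop in base R. This is a generic illustration under strong assumptions: toy data, no cluster structure, and no draw of the regression parameters from their posterior (a proper multiple imputation would include one). It is not the algorithm behind `FCS-homo` or `FCS-hetero`.

```r
set.seed(7)
n <- 200
# Toy data: three correlated continuous variables
x1 <- rnorm(n)
x2 <- 0.6 * x1 + rnorm(n, sd = 0.8)
x3 <- 0.5 * x2 + rnorm(n, sd = 0.8)
dat <- data.frame(x1, x2, x3)

# MCAR mask with 20% missing values
miss <- matrix(runif(n * 3) < 0.2, n, 3)
dat.na <- dat
dat.na[miss] <- NA

# Initialisation: fill each missing entry with the observed mean of its column
imp <- dat.na
for (j in seq_along(imp)) imp[miss[, j], j] <- mean(dat.na[[j]], na.rm = TRUE)

maxit <- 10
for (it in seq_len(maxit)) {
  for (j in seq_along(imp)) {
    if (!any(miss[, j])) next
    obs <- !miss[, j]
    # Regress variable j on all the others, using rows where j is observed
    fit <- lm(reformulate(names(imp)[-j], response = names(imp)[j]),
              data = imp[obs, ])
    # Replace the missing entries of j by prediction plus Gaussian noise
    pred <- predict(fit, newdata = imp[miss[, j], , drop = FALSE])
    imp[miss[, j], j] <- pred + rnorm(sum(miss[, j]), sd = summary(fit)$sigma)
  }
}
sum(is.na(imp))  # no missing value left
```

Each pass over the variables uses the current imputations of the other variables as predictors, which is why several iterations are needed before the imputed values stabilise.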
To perform multiple imputation, we proceed as follows:

```{r, warning=FALSE, results='hide'}
maxit <- 20 # Number of iterations for FCS imputation, should be larger in practice
res.imp.FCS <- imputedata(data.na = wine.na,
                          method = "FCS-homo",
                          nb.clust = nb.clust,
                          maxit = maxit,
                          m = m)
```

With FCS methods, the `imputedata` function alternates cluster analysis and imputation given the partition of individuals. Once the cluster analysis is performed, the `imputedata` function calls the `mice` function from the `mice` R package (@mice). The `mice` package proposes various methods for imputation. By default, `imputedata` uses the default method of `mice` (predictive mean matching for continuous data), but others can be specified through the `method.mice` argument. For instance, for imputation under the normal model, use

```{r, eval=FALSE}
imputedata(data.na = wine.na,
           method = "FCS-homo",
           nb.clust = nb.clust,
           maxit = maxit,
           m = m,
           method.mice = "norm")
```

`FCS-hetero` allows imputation of continuous variables according to linear mixed models (various methods are available in the `micemd` R package, @micemd). Furthermore, contrary to `FCS-homo`, `FCS-hetero` updates the partition without assuming constant variance in each cluster.

### Convergence

FCS imputation consists in imputing each variable sequentially several times. Many iterations can be required (`maxit` argument). To check convergence, the within and between inertia of each imputed variable can be plotted at each iteration, as proposed by the `choosemaxit` function:

```{r conv, fig.height = 7, fig.width = 7, fig.align = "center", results='hide'}
choosemaxit(res.imp.FCS)
```

Note that by default, only the first five imputed data sets are plotted (corresponding to the number of curves plotted for each variable). The `plotm` argument can be tuned to modify which curves should be drawn. Here, the number of iterations could be increased.
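For readers unfamiliar with the two quantities being tracked, the following base R sketch computes the between and within inertia of a single variable for a given partition. The helper `inertia` is hypothetical and unnormalised; the exact scaling used inside `choosemaxit` is not assumed here.

```r
# Hypothetical helper: between, within and total inertia of x given a partition
inertia <- function(x, part) {
  fitted <- ave(x, part)                 # cluster mean repeated for each individual
  c(between = sum((fitted - mean(x))^2), # spread of cluster means around the grand mean
    within  = sum((x - fitted)^2),       # spread of individuals around their cluster mean
    total   = sum((x - mean(x))^2))
}

set.seed(3)
part <- rep(1:3, each = 50)
x <- rnorm(150, mean = c(0, 2, 4)[part])
res <- inertia(x, part)
res
# The total inertia decomposes into between + within
all.equal(unname(res["between"] + res["within"]), unname(res["total"]))
```

When these summaries stop drifting from one iteration to the next, the imputed values and the partition have broadly stabilised.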
To increase the number of iterations, the imputation should be rerun with a larger `maxit` argument as follows:

```{r, eval = FALSE}
res.imp <- imputedata(data.na = wine.na,
                      method = "FCS-homo",
                      nb.clust = nb.clust,
                      maxit = 100,
                      m = m)
choosemaxit(res.imp)
```

For computational reasons, convergence diagnostics can be carried out with a smaller number of imputed datasets `m`. Once the number of iterations `maxit` has been chosen, multiple imputation with a larger value of `m` can be considered.

### Specifying imputation models

Fully conditional imputation methods quickly reach their limits when the number of variables is large, since the imputation models become overfitted. To address this issue, we can use penalised regression as proposed in `mice`, by specifying `method.mice = "lasso.norm"` for instance. Another way consists in specifying conditional imputation models through the `predictmat` argument. This argument is a binary matrix where each row indicates which explanatory variables (in columns) should be used for imputation.

#### The `varselbest` procedure

To tune this matrix automatically, the `varselbest` function performs variable selection following @BarHen22. Briefly, `varselbest` performs variable selection on random subsets of variables and then combines the results to recover which explanatory variables are related to the response.

More precisely, the algorithm can be outlined as follows. Consider a random subset of `sizeblock` variables among the p variables. On this subset, any variable selection scheme can be applied (lasso, stepwise and knockoff are available through the `method.select` argument). By repeating this draw `B` times, we can count how many times a variable is considered significantly related to the response and how many times it is not. A variable is then retained when the proportion of times it was selected exceeds a threshold `r` (by default, `r = 0.3`).
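The outline above can be sketched in a few lines of base R. This is a toy version on complete simulated data: the selection rule (a regression p-value below 0.05) is a simple stand-in for the lasso, stepwise or knockoff schemes, and the object names (`n.drawn`, `n.selected`) are hypothetical. It does not reproduce how `varselbest` handles missing values.

```r
set.seed(11)
n <- 300; p <- 8
X <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("x", 1:p)))
y <- 1.0 * X[, "x1"] - 0.8 * X[, "x2"] + rnorm(n)  # only x1 and x2 matter

B <- 200; sizeblock <- 4; r <- 0.3
n.drawn <- n.selected <- setNames(numeric(p), colnames(X))

for (b in seq_len(B)) {
  block <- sample(colnames(X), sizeblock)           # random subset of variables
  fit <- lm(y ~ ., data = data.frame(y = y, X[, block, drop = FALSE]))
  pval <- summary(fit)$coefficients[block, "Pr(>|t|)"]
  n.drawn[block] <- n.drawn[block] + 1               # times each variable was tested
  n.selected[block] <- n.selected[block] + (pval < 0.05)  # stand-in selection rule
}

proportion <- n.selected / n.drawn
round(proportion, 2)
selected <- names(proportion)[proportion > r]  # x1 and x2 should be among these
selected
```

Working on small random blocks keeps each selection problem low-dimensional, while the proportion aggregated over the `B` draws indicates how consistently a variable is picked up.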
The main advantage of this function is that it handles both missing values and high-dimensional data.

By default, the `varselbest` function performs variable selection by knockoff (@Barber15) based on `B` = 200 bootstrap subsets. The threshold `r` is set to 0.3, so that only very weakly predictive variables are omitted. The choices of `B` and `r` are discussed in the next sections. Since the method is time consuming, the function allows parallel computing through the `nnodes` argument.

In the next example, the imputation model for the variable `alco` is obtained using the algorithm previously described.

```{r varselecho, eval=FALSE}
nnodes <- 2 # Number of CPU cores used for parallel computation.
            # Use parallel::detectCores() to choose an appropriate number

# variable selection to impute the "alco" variable
B <- 50 # number of bootstrap subsets, should be increased in practice
res.varsel <- varselbest(res.imputedata = res.imp.FCS, B = B, listvar = "alco", nnodes = nnodes, graph = FALSE)
res.varsel$predictormatrix["alco", ]
```

```{r varsel, eval=TRUE, echo=FALSE}
nnodes <- 2 # Number of CPU cores used for parallel computation.
# Use parallel::detectCores() to choose an appropriate number # variable selection to impute the "alco" variable B <- 50 # number of bootstrap subsets, should be increased in practice # res.chooseB<-chooseB(res.varsel) # sink(file = "C:/Users/vince/OneDrive - LECNAM/Recherche/MI_clustering/Rpackage/vignettes/sink/chooseB.txt");dput(res.chooseB);sink() # # res.varsel.light<-res.varsel # res.varsel.light$res.varsel$alco$res.detail<- rep(NA,length(res.varsel.light$res.varsel$alco$res.detail)) # str(res.varsel.light$res.varsel$alco$res.detail) # res.varsel.light$call$res.imputedata<-NULL # res.varsel.light$res.varsel$alco$res$listvarblock<-NULL # res.varsel.light$res.varsel$alco$call$knockoff.arg<-NULL # res.varsel.light$res.varsel$alco$call$glmnet.arg<-NULL # res.varsel.light$res.varsel$alco$call$stepwise.arg<-NULL # sink(file = "C:/Users/vince/OneDrive - LECNAM/Recherche/MI_clustering/Rpackage/vignettes/sink/resvarsel.txt");dput(res.varsel.light);sink() res.varsel <- list(predictormatrix = structure(c(0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0), dim = c(13L, 13L), dimnames = list( c("alco", "malic", "ash", "alca", "mg", "phe", "fla", "nfla", "pro", "col", "hue", "ratio", "prol"), c("alco", "malic", "ash", "alca", "mg", "phe", "fla", "nfla", "pro", "col", "hue", "ratio", "prol"))), res.varsel = list(alco = list( res = list(garde = c(malic = 23, ash = 22, alca = 20, mg = 21, phe = 20, fla = 21, nfla = 21, pro = 20, col = 19, hue = 21, ratio = 20, prol = 22), effectif = c(malic = 6, ash = 8, alca = 9, mg = 7, phe = 
2, fla = 7, nfla = 3, pro = 9, col = 18, hue = 15, ratio = 12, prol = 8), proportion = c(malic = 0.260869565217391, ash = 0.363636363636364, alca = 0.45, mg = 0.333333333333333, phe = 0.1, fla = 0.333333333333333, nfla = 0.142857142857143, pro = 0.45, col = 0.947368421052632, hue = 0.714285714285714, ratio = 0.6, prol = 0.363636363636364), selection = c("ash", "alca", "mg", "fla", "pro", "col", "hue", "ratio", "prol" ), failure = c(malic = 0, ash = 0, alca = 0, mg = 0, phe = 0, fla = 0, nfla = 0, pro = 0, col = 0, hue = 0, ratio = 0, prol = 0)), res.detail = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), call = list( nnodes = 2, X = structure(c(1.71, NA, NA, NA, 1.87, 2.15, 1.35, NA, 1.73, 1.87, NA, 1.92, NA, 3.1, 3.8, NA, 1.6, 1.81, NA, NA, 1.9, 1.5, NA, 1.83, 1.53, 1.65, NA, 1.71, NA, NA, 3.98, NA, 4.04, 1.68, NA, NA, 1.67, 1.7, 1.97, 0.94, NA, 1.36, NA, NA, 1.21, NA, NA, 1.51, 1.67, 1.09, 1.88, NA, 3.87, NA, NA, 1.13, NA, NA, 1.61, NA, NA, 1.51, NA, NA, NA, 1.41, NA, 2.08, NA, NA, NA, 1.29, NA, 2.68, 1.39, NA, NA, 2.4, 4.43, NA, 4.31, 2.16, 2.13, 4.3, 1.35, 2.99, 3.55, 1.24, NA, 5.51, NA, 2.81, 2.56, NA, NA, 3.88, 4.61, 3.24, 2.67, NA, 5.19, 4.12, 3.03, 1.68, 1.67, NA, NA, 3.45, 2.76, NA, 2.58, 4.6, 2.39, NA, 3.91, NA, 2.59, 4.1, 2.43, 2.14, 2.67, NA, 2.45, 2.61, NA, 2.41, 2.39, 2.38, 2.7, 2.72, 2.62, 2.56, 2.65, NA, 2.52, 2.61, NA, NA, NA, NA, 2.36, NA, 2.7, 2.55, 2.51, NA, 2.12, 2.59, 2.29, 2.1, 2.44, 2.12, 2.04, NA, NA, 2.3, 2.68, 1.36, NA, NA, 2.16, 2.53, 2.56, NA, 1.75, 2.67, 2.6, 2.3, NA, NA, 2.4, 2, NA, 2.51, 2.32, 2.58, NA, 2.3, 2.32, NA, 2.26, 2.28, 2.74, 1.98, NA, 1.7, NA, NA, 2.28, 1.94, 2.7, 2.92, 2.5, 2.2, 1.99, 2.42, NA, 2.13, 2.39, 2.17, NA, 2.38, NA, 2.4, 2.36, 2.25, 2.54, 2.64, 2.61, 2.7, 2.35, 2.72, 2.35, 2.2, 2.48, NA, 2.48, 2.28, 2.32, 2.38, NA, NA, 2.64, 2.54, NA, 2.35, 2.3, NA, 2.69, NA, 
2.28, NA, 2.48, 2.26, 2.37, 2.74, 15.6, NA, NA, 21, NA, 17.6, 16, 16, 11.4, NA, NA, NA, 20, 15.2, 18.6, NA, 17.8, 20, 16.1, 17, NA, NA, 19.1, NA, NA, NA, 13.2, 16.2, 18.8, NA, NA, 17, NA, NA, 12.4, NA, NA, 16.3, 16.8, 10.6, NA, 16.8, 19, 19, 18.1, NA, 16.8, NA, 30, 21, NA, 18, NA, 19, NA, 24, 22.5, 18, 22.8, NA, NA, 22, NA, 18, 21.5, 16, 18, 17.5, 18.5, 20.5, 22.5, 19, 20, NA, NA, 21.5, 20.8, NA, NA, 21.5, 21, 21, 28.5, 22, NA, 20, 21.5, NA, NA, 25, NA, 21, NA, 23.5, 20, NA, NA, NA, 22, 18.5, 22, 19.5, NA, 25, NA, NA, NA, 18.5, NA, 22.5, 24.5, 25, 19.5, NA, 23, 20, NA, 24.5, 127, 100, 101, NA, 96, 121, NA, 89, 91, 102, 112, 120, NA, 116, 102, NA, 95, NA, 93, NA, 107, 101, 106, 104, NA, NA, NA, 117, NA, 101, NA, 107, 111, 101, 92, 111, NA, 118, 102, 88, 101, 100, 87, 104, 98, NA, 151, 86, 139, 101, NA, 112, NA, 86, NA, 78, 85, NA, 90, 70, 81, NA, NA, NA, 134, 85, NA, 97, 88, 85, 84, 92, 94, 103, 84, 85, 86, 96, 102, 86, NA, 85, 92, 80, 122, NA, 106, NA, 89, NA, 101, NA, 89, 97, NA, 112, NA, 92, NA, 98, NA, 89, 97, 98, 89, NA, 106, 106, 90, 88, 105, 112, 86, 91, 102, NA, 120, NA, 2.8, 2.65, 2.8, 2.8, 2.5, 2.6, 2.98, 2.6, 3.1, NA, 2.85, 2.8, NA, NA, 2.41, 2.61, 2.48, 2.53, NA, 2.4, 2.95, 3, NA, 2.42, 2.95, 2.45, NA, 3.15, 2.45, 3.25, 2.64, 3, 2.85, 3.1, 2.72, 3.88, NA, 3.2, 3, 1.98, 2.05, 2.02, 3.5, 1.89, NA, 2.53, NA, 2.95, NA, NA, NA, NA, 2.83, NA, 2.2, NA, 1.65, NA, NA, 2.2, 1.6, 1.45, 1.38, 3.02, NA, 2.55, NA, NA, NA, 2.2, NA, 2.36, 2.74, 1.75, 2.56, 2.46, NA, 2.9, NA, NA, 2.86, NA, 2.13, 2.1, 1.51, NA, 1.7, NA, 1.38, NA, NA, NA, 1.4, NA, 2, 1.38, 1.7, 1.93, 1.48, NA, NA, 1.8, 1.9, NA, NA, 1.83, NA, 1.39, 1.35, NA, 1.55, NA, 1.39, 1.68, NA, NA, 1.65, 2.05, NA, 2.76, NA, NA, 2.52, 2.51, NA, 2.76, NA, 3.64, 2.91, 3.14, 3.4, NA, 2.41, 2.88, 2.37, 2.61, 2.94, NA, 2.97, 3.25, 3.19, 2.69, NA, 2.43, 3.04, 3.29, 2.68, 3.56, 2.63, 3, 2.65, NA, NA, 3.74, 2.9, NA, 3.23, 0.57, NA, 1.41, NA, 1.75, 2.65, 1.3, NA, 2.86, 2.89, 2.14, 1.57, NA, NA, 2.26, 2.53, 1.58, 1.59, 2.21, 
1.69, NA, NA, 1.25, 1.46, NA, 0.99, NA, 2.99, 2.17, 1.36, 1.92, 1.76, NA, 2.92, 2.03, 2.29, 2.17, 1.6, NA, NA, NA, 3.03, 2.65, 2.24, 1.75, NA, NA, 1.2, 0.58, 0.47, NA, 0.6, 0.5, NA, 0.52, 0.8, NA, 0.65, 0.76, 1.36, 0.83, 0.63, NA, 0.58, 1.31, 1.1, NA, 0.6, NA, 0.68, 0.47, 0.84, 0.96, NA, 0.7, NA, 0.69, 0.68, 0.76, 0.28, NA, 0.3, 0.39, NA, 0.31, 0.22, 0.29, 0.43, 0.29, 0.3, NA, NA, 0.17, 0.25, 0.27, 0.26, NA, 0.34, 0.27, NA, 0.29, NA, 0.42, NA, NA, 0.2, 0.34, NA, NA, 0.32, 0.28, 0.3, 0.21, 0.17, 0.32, 0.21, 0.26, NA, 0.28, NA, 0.53, NA, 0.45, 0.37, NA, NA, NA, 0.21, 0.13, 0.34, 0.43, 0.43, 0.3, NA, 0.4, 0.61, 0.22, NA, NA, NA, 0.5, NA, 0.17, NA, NA, NA, NA, 0.29, NA, 0.48, 0.39, 0.29, NA, NA, 0.52, 0.3, 0.32, 0.43, 0.3, NA, NA, NA, NA, NA, NA, 0.17, 0.6, 0.53, 0.63, 0.53, 0.53, NA, NA, NA, NA, 0.47, 0.45, NA, 0.61, NA, NA, 0.63, 0.53, 0.52, 0.5, 0.6, 0.4, 0.41, NA, 0.39, NA, 0.48, NA, 0.43, 0.43, 0.53, NA, 2.29, 1.28, 2.81, 1.82, 1.98, 1.25, 1.85, 1.81, NA, 2.96, 1.46, NA, NA, 1.66, NA, 1.69, 1.46, 1.66, 1.45, 1.35, 1.76, 2.38, 1.95, NA, NA, 1.44, NA, NA, NA, NA, NA, 2.03, NA, 2.14, NA, 1.87, 1.62, NA, 1.66, 0.42, NA, NA, 1.87, 1.03, NA, NA, 2.5, 1.87, NA, NA, 1.15, NA, 1.95, 1.43, NA, NA, 1.62, NA, 1.56, 1.38, 1.64, 1.63, NA, 1.35, 1.56, 1.77, 2.81, 1.4, 1.35, NA, 1.63, 2.08, 2.49, NA, 1.04, NA, NA, 1.83, 1.71, NA, 2.91, 1.35, NA, NA, 0.94, NA, 0.84, 1.25, 0.8, 1.1, NA, 0.75, NA, 0.55, NA, 1.14, 0.86, 1.25, 1.26, NA, 1.55, 1.56, 1.14, 2.7, 2.29, NA, NA, 0.94, NA, 1.15, 1.54, 1.11, 0.64, 1.24, 1.41, 1.35, 1.46, 1.35, 5.64, 4.38, 5.68, NA, NA, 5.05, 7.22, 5.6, NA, 7.5, NA, NA, NA, 5.1, 4.5, NA, 3.93, 3.52, 4.8, 3.95, 4.5, 5.7, NA, NA, 5.4, 4.25, 5.1, 6.13, 4.28, 5.43, 4.36, 5.04, 5.24, 6.1, 7.2, NA, 5.85, NA, NA, NA, 3.27, 5.75, 4.45, 2.95, 4.6, 3.17, 2.85, 3.38, NA, 3.21, 3.8, NA, NA, 2.5, 3.9, NA, 4.8, 3.05, 2.45, 1.74, 2.4, 3.6, 3.05, 3.25, NA, 2.9, NA, NA, NA, 2.94, 3.3, 2.7, NA, NA, 2.9, 1.9, NA, 3.25, NA, NA, 2.8, NA, NA, NA, NA, NA, 5, 5.45, NA, 5, 4.92, NA, 
NA, 4.35, 4.4, 8.21, NA, 8.42, NA, 10.52, 7.9, NA, 7.5, 13, NA, NA, 5.58, NA, NA, 6.62, NA, 8.5, NA, 9.7, 7.3, NA, 9.3, 9.2, 1.04, 1.05, NA, NA, 1.02, NA, NA, 1.15, 1.25, 1.2, 1.28, NA, 1.13, NA, NA, 1.11, 1.09, 1.12, NA, 1.02, NA, 1.19, 1.09, NA, 1.25, NA, NA, 0.95, 0.91, 0.88, 0.82, NA, 0.87, NA, 1.12, NA, NA, 0.94, 1.07, NA, 1.25, 0.98, 1.22, NA, NA, 1.02, 1.28, 1.36, 1.31, NA, 1.23, 0.96, 1.19, 1.38, NA, 1.31, 0.84, NA, NA, 1.07, 1.08, 1.05, NA, NA, NA, NA, 1.42, 1.27, NA, 1.04, NA, 0.86, NA, NA, 0.93, 1.71, 0.95, 0.8, 0.92, 0.73, NA, 0.86, 0.97, 0.79, NA, 0.74, 0.78, 0.75, 0.75, 0.82, NA, NA, NA, 0.89, NA, 0.65, 0.54, 0.55, 0.48, 0.56, 0.6, 0.57, 0.67, 0.57, NA, 0.96, 0.87, 0.68, 0.7, 0.78, 0.74, 0.67, NA, NA, 0.7, 0.59, 0.6, 0.61, 3.92, 3.4, 3.17, 2.93, 3.58, 3.58, 3.55, 2.9, NA, 3, NA, 2.65, NA, 3.36, 3.52, 4, 3.63, NA, NA, 2.77, NA, 2.71, 2.88, 2.87, 3, NA, NA, 3.38, 3, NA, 3, 3.35, 3.33, 3.33, NA, 3.26, 3.2, NA, 2.84, 1.82, NA, 1.59, 2.87, NA, 2.3, NA, NA, NA, 3.5, 3.13, 2.14, 2.52, NA, NA, 3.14, 2.72, 2.01, NA, NA, 3.21, NA, 2.65, NA, NA, NA, 2.74, 2.83, 2.96, 2.77, 3.57, 2.42, 3.02, 3.26, 2.5, 3.19, 2.87, 3.33, 3.39, 3.12, NA, 3.64, NA, NA, NA, NA, 1.42, 1.29, 1.51, 1.27, 1.69, 2.15, NA, NA, NA, 2.05, 2, NA, 1.62, 1.47, 1.51, 1.48, 1.64, 1.73, 1.96, 1.78, NA, NA, 1.75, 1.68, 1.75, NA, NA, NA, NA, NA, 1.56, 1.62, 1.6, 1065, 1050, NA, 735, 1290, 1295, 1045, 1320, NA, 1547, 1310, 1280, NA, 845, 770, 1035, 1015, 845, 1195, 1285, 915, 1285, NA, NA, 1235, NA, 760, 795, 1035, 1095, 680, 885, 1080, 985, 1150, 1190, 1060, 970, 1270, NA, NA, NA, NA, NA, NA, NA, 718, 410, NA, 886, NA, NA, 463, 278, 714, NA, 515, NA, NA, 625, 480, 450, 495, 345, 625, 428, 406, NA, 562, 672, NA, 312, 680, NA, 385, NA, NA, NA, 365, 380, NA, 378, 466, 580, NA, NA, 600, NA, 720, 515, 590, 600, NA, 520, 550, NA, NA, NA, 480, 675, NA, 480, 880, 660, NA, 680, 570, NA, 615, NA, NA, NA, 470, NA, NA, 835, 840, NA), dim = c(118L, 12L), dimnames = list( NULL, c("V1", "V2", "V3", "V4", "V5", 
"V6", "V7", "V8", "V9", "V10", "V11", "V12"))), Y = c(14.23, 13.2, 13.16, 13.24, 14.39, 14.06, 13.86, 13.75, 14.75, 14.38, 13.63, 14.3, 13.83, 13.64, 12.93, 13.71, 12.85, 13.5, 13.39, 13.3, 13.87, 13.73, 13.58, 13.68, 13.76, 13.05, 14.22, 13.56, 13.41, 13.88, 13.24, 13.05, 14.21, 13.9, 13.05, 13.82, 13.74, 14.22, 13.29, 12.37, 12.33, 12.64, 12.37, 12.17, 12.37, 13.34, 12.21, 13.86, 12.99, 11.96, 11.66, 11.84, 12.7, 12, 12.72, 12.08, 13.05, 11.84, 12.16, 12.08, 12.08, 12, 12.69, 11.62, 11.81, 12.29, 12.29, 12.08, 12.6, 12.51, 12.72, 12.22, 11.61, 11.76, 12.08, 11.03, 11.82, 11.45, 12.42, 13.05, 11.87, 12.07, 11.79, 12.04, 12.86, 12.88, 12.7, 12.51, 12.25, 12.53, 12.84, 12.93, 13.36, 13.52, 13.62, 12.25, 12.87, 13.32, 12.79, 13.23, 13.17, 13.84, 12.45, 14.34, 13.48, 13.69, 12.85, 12.96, 13.78, 13.73, 13.58, 13.4, 12.77, 14.16, 13.4, 13.27, 13.17, 14.13), B = 50, path.outfile = NULL, methods = "knockoff", sizeblock = 5, printflag = FALSE, r = c(alco = 0.3), seed = 1234567, nb.clust = 3, modelNames = NULL))), proportion = structure(c(0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0.260869565217391, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0.363636363636364, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0.45, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0.333333333333333, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0.1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0.333333333333333, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0.142857142857143, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0.45, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0.947368421052632, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0.714285714285714, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0.6, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0.363636363636364, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0), dim = c(13L, 13L), dimnames = list(c("alco", "malic", "ash", "alca", "mg", "phe", "fla", "nfla", "pro", "col", "hue", "ratio", "prol"), c("alco", "malic", "ash", "alca", "mg", "phe", "fla", "nfla", "pro", "col", "hue", "ratio", "prol"))), call = list(data.na = NULL, listvar = "alco", nb.clust = NULL, 
nnodes = 2, sizeblock = 5, method.select = "knockoff", B = 50, r = 0.3, graph = FALSE, printflag = TRUE, path.outfile = NULL, mar = c(2.1, 4.1, 2.1, 0.6), cex.names = 0.7, modelNames = NULL))
res.varsel$predictormatrix["alco", ]
```

The function suggests considering the variables `r colnames(res.varsel$predictormatrix)[which(res.varsel$predictormatrix["alco", ]==1)]` to impute the `alco` variable.

Then, imputation can be rerun by passing the predictor matrix returned by the `varselbest` function to the `predictmat` argument:

```{r,eval=FALSE}
# multiple imputation with the new model
res.imp.select <- imputedata(data.na = wine.na,
                             method = "FCS-homo",
                             nb.clust = nb.clust,
                             maxit = maxit,
                             m = m,
                             predictmat = res.varsel$predictormatrix)
```

Note that to specify all conditional imputation models at once, you should use

```{r,eval=FALSE}
varselbest(res.imputedata = res.imp.FCS, B = B, nnodes = nnodes) # (time consuming)
```

#### Convergence {#convb}

The number of iterations `B` should be large enough for the proportion of times a variable is selected to become stable. The `chooseB` function plots this proportion against the number of iterations.
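The quantity being monitored can be thought of as a running proportion. The sketch below simulates, for a single candidate predictor, whether it was drawn in the random block and whether it was then selected; the probabilities 0.4 and 0.35 are arbitrary assumptions, and none of this is taken from the internals of `chooseB`.

```r
set.seed(5)
B <- 200
# Toy record for one candidate predictor over B iterations
drawn <- runif(B) < 0.4                 # was the variable in the random block?
selected <- drawn & (runif(B) < 0.35)   # if drawn, was it selected?

# Running proportion: times selected so far / times drawn so far
running <- cumsum(selected) / cumsum(drawn)  # NaN until the first draw
round(running[c(10, 50, 100, 200)], 2)
```

Early values jump around because they rest on a handful of draws; the curve only flattens once the variable has been drawn many times. Plotting `running` against `seq_len(B)` gives a curve of the kind shown for each variable in the diagnostic for `alco`: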
```{r,eval=FALSE} res.B <- chooseB(res.varsel) ``` ```{r convb,fig.height = 4, fig.width = 4, fig.align = "center",echo=FALSE} res.chooseB <- list(alco = structure(c(1, 1, 0.5, 0.5, 0.333333333333333, 0.333333333333333, 0.25, 0.25, 0.25, 0.2, 0.166666666666667, 0.166666666666667, 0.166666666666667, 0.142857142857143, 0.125, 0.125, 0.125, 0.125, 0.125, 0.111111111111111, 0.1, 0.1, 0.1, 0.0909090909090909, 0.0833333333333333, 0.0833333333333333, 0.0833333333333333, 0.0833333333333333, 0.0833333333333333, 0.153846153846154, 0.214285714285714, 0.214285714285714, 0.266666666666667, 0.266666666666667, 0.266666666666667, 0.25, 0.235294117647059, 0.235294117647059, 0.235294117647059, 0.277777777777778, 0.277777777777778, 0.263157894736842, 0.263157894736842, 0.25, 0.25, 0.238095238095238, 0.227272727272727, 0.227272727272727, 0.260869565217391, 0.260869565217391, 0, 0, 0, 0, 0, 0.5, 0.333333333333333, 0.333333333333333, 0.25, 0.25, 0.25, 0.2, 0.2, 0.333333333333333, 0.285714285714286, 0.285714285714286, 0.285714285714286, 0.25, 0.25, 0.25, 0.25, 0.333333333333333, 0.3, 0.3, 0.363636363636364, 0.363636363636364, 0.363636363636364, 0.333333333333333, 0.333333333333333, 0.384615384615385, 0.384615384615385, 0.428571428571429, 0.4, 0.4, 0.375, 0.375, 0.352941176470588, 0.352941176470588, 0.352941176470588, 0.352941176470588, 0.388888888888889, 0.388888888888889, 0.388888888888889, 0.368421052631579, 0.35, 0.35, 0.380952380952381, 0.380952380952381, 0.380952380952381, 0.363636363636364, 1, 1, 0.5, 0.5, 0.5, 0.5, 0.5, 0.333333333333333, 0.333333333333333, 0.5, 0.4, 0.4, 0.4, 0.5, 0.5, 0.428571428571429, 0.428571428571429, 0.428571428571429, 0.428571428571429, 0.428571428571429, 0.428571428571429, 0.5, 0.444444444444444, 0.444444444444444, 0.444444444444444, 0.444444444444444, 0.444444444444444, 0.4, 0.363636363636364, 0.363636363636364, 0.363636363636364, 0.416666666666667, 0.416666666666667, 0.384615384615385, 0.357142857142857, 0.357142857142857, 0.333333333333333, 
0.333333333333333, 0.3125, 0.3125, 0.352941176470588, 0.352941176470588, 0.352941176470588, 0.352941176470588, 0.352941176470588, 0.388888888888889, 0.421052631578947, 0.421052631578947, 0.421052631578947, 0.45, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.25, 0.2, 0.2, 0.166666666666667, 0.166666666666667, 0.166666666666667, 0.285714285714286, 0.375, 0.375, 0.375, 0.333333333333333, 0.4, 0.4, 0.4, 0.363636363636364, 0.363636363636364, 0.333333333333333, 0.333333333333333, 0.333333333333333, 0.333333333333333, 0.384615384615385, 0.384615384615385, 0.357142857142857, 0.357142857142857, 0.357142857142857, 0.357142857142857, 0.333333333333333, 0.3125, 0.3125, 0.294117647058824, 0.294117647058824, 0.333333333333333, 0.333333333333333, 0.315789473684211, 0.315789473684211, 0.3, 0.3, 0.3, 0.333333333333333, 0.333333333333333, 0.333333333333333, 0, 0, 0, 0, 0, 0, 0, 0, 0.2, 0.2, 0.2, 0.2, 0.166666666666667, 0.166666666666667, 0.166666666666667, 0.166666666666667, 0.166666666666667, 0.142857142857143, 0.125, 0.125, 0.125, 0.111111111111111, 0.111111111111111, 0.1, 0.1, 0.1, 0.1, 0.0909090909090909, 0.0909090909090909, 0.0909090909090909, 0.0833333333333333, 0.0833333333333333, 0.0833333333333333, 0.0769230769230769, 0.0769230769230769, 0.0769230769230769, 0.0769230769230769, 0.0714285714285714, 0.0714285714285714, 0.0666666666666667, 0.125, 0.125, 0.117647058823529, 0.117647058823529, 0.111111111111111, 0.111111111111111, 0.111111111111111, 0.105263157894737, 0.105263157894737, 0.1, 0, 0, 0, 0, 0, 0, 0, 0, 0.25, 0.25, 0.25, 0.4, 0.333333333333333, 0.333333333333333, 0.428571428571429, 0.428571428571429, 0.428571428571429, 0.375, 0.333333333333333, 0.333333333333333, 0.4, 0.4, 0.363636363636364, 0.363636363636364, 0.363636363636364, 0.333333333333333, 0.307692307692308, 0.307692307692308, 0.285714285714286, 0.285714285714286, 0.285714285714286, 0.285714285714286, 0.333333333333333, 0.333333333333333, 0.333333333333333, 0.3125, 0.3125, 0.294117647058824, 0.294117647058824, 
0.294117647058824, 0.294117647058824, 0.294117647058824, 0.277777777777778, 0.277777777777778, 0.315789473684211, 0.315789473684211, 0.35, 0.35, 0.333333333333333, 0.333333333333333, 0, 1, 0.5, 0.5, 0.333333333333333, 0.333333333333333, 0.25, 0.25, 0.25, 0.25, 0.2, 0.2, 0.166666666666667, 0.166666666666667, 0.142857142857143, 0.142857142857143, 0.125, 0.125, 0.125, 0.111111111111111, 0.1, 0.1, 0.1, 0.0909090909090909, 0.0909090909090909, 0.0833333333333333, 0.0769230769230769, 0.0769230769230769, 0.0769230769230769, 0.0769230769230769, 0.0714285714285714, 0.0714285714285714, 0.0714285714285714, 0.0714285714285714, 0.0666666666666667, 0.0666666666666667, 0.0666666666666667, 0.0666666666666667, 0.0666666666666667, 0.0625, 0.0625, 0.0588235294117647, 0.0588235294117647, 0.111111111111111, 0.157894736842105, 0.157894736842105, 0.15, 0.15, 0.15, 0.142857142857143, 0, 1, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.666666666666667, 0.666666666666667, 0.75, 0.6, 0.6, 0.6, 0.5, 0.5, 0.428571428571429, 0.375, 0.375, 0.375, 0.375, 0.333333333333333, 0.333333333333333, 0.333333333333333, 0.3, 0.363636363636364, 0.363636363636364, 0.333333333333333, 0.333333333333333, 0.307692307692308, 0.307692307692308, 0.357142857142857, 0.357142857142857, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4375, 0.4375, 0.470588235294118, 0.470588235294118, 0.470588235294118, 0.470588235294118, 0.444444444444444, 0.444444444444444, 0.473684210526316, 0.45, 0.45, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0.875, 0.875, 0.888888888888889, 0.888888888888889, 0.888888888888889, 0.888888888888889, 0.9, 0.9, 0.9, 0.909090909090909, 0.916666666666667, 0.916666666666667, 0.916666666666667, 0.923076923076923, 0.923076923076923, 0.928571428571429, 0.928571428571429, 0.928571428571429, 0.928571428571429, 0.933333333333333, 0.9375, 0.9375, 0.9375, 0.9375, 0.941176470588235, 0.941176470588235, 0.941176470588235, 0.944444444444444, 0.944444444444444, 0.944444444444444, 0.947368421052632, 0.947368421052632, 0, 0, 0, 0, 0, 0, 
0.333333333333333, 0.333333333333333, 0.5, 0.5, 0.5, 0.6, 0.6, 0.6, 0.666666666666667, 0.666666666666667, 0.714285714285714, 0.714285714285714, 0.714285714285714, 0.625, 0.625, 0.666666666666667, 0.666666666666667, 0.7, 0.7, 0.727272727272727, 0.75, 0.75, 0.769230769230769, 0.769230769230769, 0.785714285714286, 0.785714285714286, 0.785714285714286, 0.733333333333333, 0.75, 0.75, 0.75, 0.705882352941177, 0.666666666666667, 0.666666666666667, 0.666666666666667, 0.684210526315789, 0.684210526315789, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.714285714285714, 1, 1, 1, 0.5, 0.666666666666667, 0.666666666666667, 0.666666666666667, 0.5, 0.5, 0.5, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.5, 0.5, 0.5, 0.428571428571429, 0.428571428571429, 0.428571428571429, 0.5, 0.5, 0.555555555555556, 0.555555555555556, 0.6, 0.6, 0.6, 0.636363636363636, 0.636363636363636, 0.636363636363636, 0.666666666666667, 0.666666666666667, 0.666666666666667, 0.615384615384615, 0.642857142857143, 0.642857142857143, 0.642857142857143, 0.666666666666667, 0.666666666666667, 0.6875, 0.647058823529412, 0.647058823529412, 0.647058823529412, 0.611111111111111, 0.611111111111111, 0.631578947368421, 0.6, 0.6, 1, 1, 1, 0.5, 0.5, 0.333333333333333, 0.333333333333333, 0.25, 0.2, 0.2, 0.2, 0.333333333333333, 0.333333333333333, 0.285714285714286, 0.285714285714286, 0.25, 0.25, 0.222222222222222, 0.2, 0.2, 0.2, 0.272727272727273, 0.272727272727273, 0.272727272727273, 0.25, 0.25, 0.25, 0.230769230769231, 0.230769230769231, 0.285714285714286, 0.285714285714286, 0.333333333333333, 0.333333333333333, 0.3125, 0.3125, 0.294117647058824, 0.294117647058824, 0.277777777777778, 0.263157894736842, 0.263157894736842, 0.3, 0.3, 0.3, 0.333333333333333, 0.333333333333333, 0.333333333333333, 0.333333333333333, 0.363636363636364, 0.363636363636364, 0.363636363636364), dim = c(50L, 12L), dimnames = list( NULL, c("V1", "V2", "V3", "V4", "V5", "V6", "V7", "V8", "V9", "V10", "V11", "V12")))) 
gridB.intern<-seq(1,length(res.varsel$res.varsel$alco$res.detail),ceiling(length(res.varsel$res.varsel$alco$res.detail)/100))
matprop<-res.chooseB$alco
colnames(matprop) <- names(res.varsel$res.varsel$alco$res$proportion)
matprop.plot <- as.data.frame(matprop)
matprop.plot$id <- gridB.intern
plot_data <- reshape2::melt(matprop.plot, id.var = "id", value.name = "proportion")
linewidth <- 1;linetype <- "dotdash";xlab <- "B";ylab <- "Proportion"
res.ggplot <- ggplot2::ggplot(plot_data, ggplot2::aes(x = id, y = proportion, group = variable, colour = variable)) +
  ggplot2::geom_line(linetype = linetype, linewidth = linewidth) +
  ggplot2::labs(x = xlab, y = ylab)+
  ggplot2::labs(title = NULL)
print(res.ggplot)
```

`B=50` iterations do not seem to be enough.

#### Tuning the threshold r {#tuner}

To tune the threshold `r`, the vector of proportions can be investigated graphically.

```{r, fig.height = 4, fig.width = 4, fig.align = "center"}
# check the variable importance
round(res.varsel$proportion["alco",], 2)
barplot(sort(res.varsel$proportion["alco",], decreasing=TRUE), ylab = "proportion", main = "alco", ylim = c(0, 1), las = 2, cex.names = .5)
r <- 0.2 # a new threshold value (r = 0.3 by default)
abline(h = r, col = 2, lty = 2)
```

The predictor matrix can then easily be updated by hand as follows:

```{r, fig.height = 4, fig.width = 4, fig.align = "center"}
predictormatrix <- res.varsel$predictormatrix
predictormatrix[res.varsel$proportion>r] <- 1
predictormatrix[res.varsel$proportion<=r] <- 0
predictormatrix["alco", ]
```

Decreasing the threshold brings more variables into the imputation model.

A more automatic way to tune `r` is to use the `chooser` function, which computes an optimal threshold by K-fold cross-validation. The call to `chooser` is highly time-consuming, but the optimal value of `r` can be followed during the process through graphical outputs, so the user can stop the process early.
This can be achieved as follows:

```{r, eval=FALSE}
chooser(res.varsel = res.varsel)
```

# Analysis and pooling

After multiple imputation, cluster analysis and partition pooling can be done through the `clusterMI` function (@Audigier22). Next, k-means clustering is applied to the imputed data sets as an example.

## K-means clustering and other implemented methods

K-means is the clustering method used by default.

```{r, eval=FALSE}
# kmeans clustering
res.pool.kmeans <- clusterMI(res.imp.JM, nnodes = nnodes)
```

```{r, echo=FALSE}
res.pool.kmeans <- clusterMI(res.imp.JM, nnodes = nnodes, instability = FALSE,verbose = FALSE)
res.pool.kmeans$instability<-list(U = c(0.0181100871102134, 0.00951395025880571, 0.0225299835879308, 0.0181744729200858, 0.0254525943693978, 0.0190544123216766, 0.0331624794849135, 0.0271001136220174, 0.0181504860497412, 0.0194470395152127, 0.0241068046963767, 0.0169953288726171, 0.0383171316752935, 0.0180330766317384, 0.0179093548794344, 0.0229781593233178, 0.0348604974119429, 0.031046585027143, 0.0191200605984093, 0.0279977275596516), Ubar = 0.023103017295796, B = 0.0721092390087769, Tot = 0.0952122563045729)
```

The `clusterMI` function returns a consensus partition (`part` object) as well as an instability measure (`instability` object). The `instability` object gathers the instability of each contributory partition (`U`), their average (`Ubar`), the between instability (`B`) and the total instability (`Tot`).
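
As a quick consistency check on these quantities, the values stored in `res.pool.kmeans$instability` in this vignette satisfy the relations below. This is read off the numbers shown here rather than taken from the package documentation, and the symbols $M$, $U_m$, $\bar{U}$ and $T$ are notation introduced only for this illustration ($M = 20$ is the length of `U`, $\bar{U}$ stands for `Ubar` and $T$ for `Tot`):

$$
\bar{U} = \frac{1}{M}\sum_{m=1}^{M} U_m \approx 0.0231,
\qquad
T = \bar{U} + B \approx 0.0231 + 0.0721 = 0.0952 .
$$

In this example, the between instability makes up about three quarters of the total ($0.0721 / 0.0952 \approx 0.76$), which suggests that most of the instability comes from differences between the contributory partitions rather than from the instability of each one.
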
```{r}
part <- res.pool.kmeans$part
table(part) # compute cluster sizes
table(part, ref) # compare the partition with the reference partition
res.pool.kmeans$instability # look at instability measures
```

Among other clustering methods, `clusterMI` allows cluster analysis by k-medoids (`method.clustering = "pam"`), clustering large applications (`method.clustering = "clara"`), hierarchical clustering (`method.clustering = "hclust"`), fuzzy c-means (`method.clustering = "cmeans"`), or a model-based method (`method.clustering = "mixture"`).

```{r, eval=FALSE}
res.pool.all <- lapply(c("kmeans", "pam", "clara","hclust", "mixture", "cmeans"),
                       FUN = clusterMI,
                       nnodes = nnodes,
                       output = res.imp.JM)
```

## Custom clustering methods

The user can also use custom clustering methods. For instance, to use reduced k-means, as implemented in the R package `clustrd`, analysis and pooling can be achieved as follows:

```{r, results='hide', message=FALSE, eval=FALSE}
library(clustrd)
res.ana.rkm <- lapply(res.imp.JM$res.imp, FUN = cluspca, nclus = nb.clust, ndim = 2, method= "RKM")
# extract the set of partitions (as list)
res.ana.rkm <- lapply(res.ana.rkm, "[[", "cluster")
# pooling by NMF
res.pool.rkm <- fastnmf(res.ana.rkm, nb.clust = nb.clust)
part.rkm <- res.pool.rkm$best$clust # extract the best solution based on several initialisations
```

Note that in this case, the instability is not computed.

# Diagnostics

## Imputation model

To check whether the imputation model correctly fits the data, a classical approach is overimputation (@Blackwell15), as implemented in the `overimpute` function. Overimputation consists in imputing observed values several times (100 or more) and comparing the observed values with their imputed values. Overimputation is a time-consuming process.
To limit the time required for overimputation, the user can:

- use parallel computing by specifying the `nnodes` argument
- perform imputation for a subset of individuals by tuning the `plotinds` argument
- perform imputation for a subset of variables by tuning the `plotvars` argument

In the next example, we use parallel computing and perform overimputation on the first variable (`alco`) for only 20 individuals drawn at random.

```{r overimpecho, eval=FALSE}
# Multiple imputation is rerun with more imputed data sets (m = 100)
res.imp.over <- imputedata(data.na = wine.na,
                           nb.clust = nb.clust,
                           m = 100,
                           Lstart = Lstart,
                           L = L,
                           verbose = FALSE)
# selection of 20 complete individuals on variable "alco"
plotinds <- sample(which(!is.na(wine.na[, "alco"])), size = 20)
res.over <- overimpute(res.imp.over,
                       nnodes = nnodes,
                       plotvars = "alco",
                       plotinds = plotinds)
```

```{r overimp, fig.height = 4, fig.width = 4, fig.align = "center", warning=FALSE, echo=FALSE, results='hide'}
# Multiple imputation is rerun with more imputed data sets (m = 100)
# sink(file = "C:/Users/vince/OneDrive - LECNAM/Recherche/MI_clustering/Rpackage/vignettes/sink/overimpute.txt");dput(res.over);sink()
res.over <- list(res.plot = structure(list(var = c("alco", "alco", "alco", "alco", "alco", "alco", "alco", "alco", "alco", "alco", "alco", "alco", "alco", "alco", "alco", "alco", "alco", "alco", "alco", "alco"), trueval = c(12.7, 13.82, 13.27, 12.85, 12.72, 12, 12.04, 14.38, 12.37, 14.16, 13.58, 13.32, 11.76, 11.84, 12.93, 12.99, 12.84, 13.86, 12.6, 13.69), xbar = c(11.7738376865082, 13.2781194335305, 12.7656078455516, 12.3550918675967, 12.1103766286982, 11.3488581416796, 11.689790703598, 13.8292914203422, 11.8838061588389, 13.4927187358782, 12.8831079064593, 12.7635311892978, 11.4102766462339, 11.6100843994868, 12.3771753207685, 12.1498426103087, 12.2019534817579, 12.8181064702831, 12.0241257849814, 12.9630596788976), binf = c(10.9002453672607, 12.5906496890719, 12.1071831918227, 11.7189914123926,
11.4744279355071, 10.5966941487251, 11.047877870576, 13.0803896346415, 11.1022261939148, 12.7714298904175, 12.0665951704424, 12.0436735061878, 10.6786684003405, 10.8910995794392, 11.7081349527227, 11.0226921594602, 11.4204289240371, 11.6007267438947, 11.2195525358362, 12.0458943601711), bsup = c(12.5277787644536, 13.7651267363568, 13.2746684227263, 12.8346923847886, 12.6520903164249, 11.9232169641688, 12.1702859416548, 14.3855364579454, 12.356405814087, 14.0423416960767, 13.4498872617714, 13.1819251692097, 12.253636795827, 12.3733229011399, 12.9137110119783, 12.8684270922574, 12.7028857537259, 13.7063102013102, 12.5359610362416, 13.5852137169506), pct = c(25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25), col = c("blue", "green", "green", "blue", "#FFFF00", "green", "green", "blue", "green", "#FFFF00", "green", "green", "#FFFF00", "#FFFF00", "green", "green", "green", "green", "green", "#FFFF00")), row.names = c("134", "53", "176", "24", "82", "81", "130", "15", "60", "173", "169", "149", "113", "78", "141", "74", "140", "72", "102", "162"), class = "data.frame"), res.values = structure(c(10.9027571590876, 13.3720282195999, 11.940000220621, 12.4588572744253, 11.3847526657941, 11.0424938170906, 10.8182234941578, 13.1903500550807, 11.9867297229711, 12.5651795406012, 11.9050384287735, 12.2377169380197, 10.9071143748198, 10.7402969152648, 11.8916671774033, 10.7483687696833, 11.9543405883846, 11.8711675999938, 11.1721688171028, 12.429957988662, 11.1565466858466, 12.8809103115834, 12.2345463355942, 12.2721010736648, 11.9472111163214, 10.8626662746027, 10.8245445791266, 12.9606544401327, 11.4494033653085, 12.3324693055764, 12.2247508157112, 12.3711588105178, 10.6999039049226, 11.4301330649025, 11.942147296416, 11.1830924000069, 11.5978085388801, 11.1797190594128, 10.3189861198363, 12.3448925493478, 11.6519750870664, 12.9569636903419, 12.3971591055247, 13.3424666171312, 11.1410690108762, 10.9547395412386, 10.752583046527, 13.1753255274174, 
12.3526370463664, 12.8556473862018, 12.6389506786001, 12.5081245146474, 11.1326967435066, 11.7404285924661, 11.804289705235, 11.231969840932, 12.0644935613514, 11.3923367545699, 11.951192401443, 12.6474538555674, 11.2813390955753, 13.1568317275199, 12.1617912197155, 12.4175312612467, 11.4968467529353, 11.0735802190616, 11.4846702076383, 13.3450668311499, 11.6760747570171, 12.6259003728966, 12.2087322672767, 12.5099232030013, 10.7993155145507, 10.9446336618757, 11.5947927033276, 11.6562248489049, 11.9820537234751, 11.6136217365817, 11.3422621272226, 12.4483506938512, 11.5154559786991, 12.6617287782271, 12.3976159708104, 12.5139753530999, 10.9801009631126, 10.7272206892092, 11.6393462065375, 13.12419912975, 11.8331393781342, 12.2548919295972, 12.1628556514009, 12.3890623781872, 10.8075925631846, 11.6782327029972, 12.0454774003782, 11.7635608593472, 12.1198013060997, 11.5053532355346, 11.3427853118645, 12.2270863828367, 10.9389791121231, 12.3548805685626, 12.0804920188203, 12.3152732857473, 10.6284746679258, 10.6904719284285, 11.0495346026079, 12.7823014419389, 11.2945948786932, 12.2888226296922, 11.899856970866, 12.1292528234548, 11.0919860440238, 12.2448964036324, 11.592378092185, 10.3379373602849, 11.7483513951764, 11.5491467731465, 10.456342106865, 12.1091824959656, 10.9973071330217, 13.5565274002911, 12.1138559850733, 12.4510605498451, 11.5814711134874, 10.6786860450792, 11.321321943973, 13.6659792367535, 11.7867814410111, 13.1528951904566, 12.6934947356701, 12.6814821096393, 11.2423164537301, 11.5816002558534, 11.8583517913204, 12.3034144241126, 11.7972387880225, 11.9870278557281, 12.4595476322119, 12.6704047330414, 11.0166435151743, 13.2547846115944, 12.2530704584602, 12.295653117703, 11.5105762042341, 10.4285364572689, 11.5544516582806, 13.9761569043803, 10.8818425148185, 12.8743470690507, 12.1076348528391, 12.387423180391, 10.680099635907, 10.7893515369186, 12.1684328277367, 12.5178982030381, 12.0022366135973, 12.0210718763183, 11.243205097626, 
12.5418117406201, 11.5382475868043, 13.0066416027229, 12.305845897045, 12.5917728314711, 11.5184073509155, 11.4466924685054, 11.7229535541997, 13.4763636438223, 11.5390539356132, 13.2433830958046, 12.5688021432062, 12.4986315452676, 10.674374693641, 10.9165365900694, 12.4093832932943, 11.61356586638, 12.0575268385451, 12.5348696356165, 11.776736157167, 12.5935269524206, 12.2105515075002, 13.5386501922408, 12.6905914840328, 12.1864757827785, 12.0187226427258, 11.2115725616238, 11.8710983681916, 14.178498874744, 11.9501026587433, 13.5959424857848, 13.3549989111154, 13.1062149915217, 11.350003270902, 11.4528135157338, 12.1272222769899, 12.4407177860254, 12.2405725336217, 13.3074765131803, 11.9110017389237, 13.0671295992395, 11.7102030709377, 13.2662502777398, 12.4742158741831, 13.1605647191523, 12.3669889518452, 11.2376514645056, 11.8126181570617, 14.2256180804511, 11.7595405720386, 13.4794584482731, 12.5640469903369, 12.8442209760747, 11.0549513982734, 11.7830016227482, 12.4367275436253, 12.2142645094865, 12.1223587673968, 12.8240698702957, 11.6549866597216, 13.2509496162664, 11.0033444391701, 12.9795358818914, 11.4362905256433, 12.1560774787869, 11.8087463788187, 11.1629406478309, 10.9110534800363, 13.6156895757381, 11.3845124557189, 12.9801486306881, 12.397977044426, 12.2549320444417, 11.4757538174958, 12.0196612907792, 11.4921618532943, 11.2980943330351, 11.6047941325197, 12.078514464844, 10.9735548389632, 12.5116854968119, 11.6348500432145, 13.2500040458792, 12.5294732441164, 12.0400041544148, 11.6269768512098, 11.5179463423713, 11.3703368849021, 13.8157074371326, 11.6238996221861, 13.3207245192497, 12.6338793906785, 12.6856602111593, 11.2962267944479, 11.4549055814343, 12.1370766943646, 11.4826859437553, 12.6142281219058, 12.461455769393, 11.5025903120235, 12.6140028381399, 11.5899707159458, 12.8087589815895, 13.6301197642354, 12.5252126065991, 11.5264778651642, 11.2917551130727, 11.8993331735439, 13.5971077313598, 11.6803042381458, 13.7878312703689, 
12.8102288139545, 13.2090818374908, 11.5917602820632, 12.1193253919106, 11.9841579588378, 11.2749756756072, 11.8404247271721, 11.8176779786652, 11.8272557258861, 12.9570626709244, 11.7290907355413, 13.178446668331, 12.6199349612434, 12.3661842311701, 12.0033535113582, 11.2502761502918, 11.5851511757457, 13.5653942923956, 11.9003642162669, 13.3602312922933, 12.0160950400693, 12.8363782541617, 11.151508553576, 11.6010834134442, 12.2570910542175, 12.0714541849123, 12.1935774936786, 12.8126799630977, 10.9700918623647, 12.5832278506997, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 11.5393372050152, 12.8598755811367, 12.2655549570779, 11.9942309306618, 11.7879214676301, 11.2072814754551, 11.2832074063161, 12.9410445056666, 11.5732373736743, 13.1451743100533, 12.2351140394576, 12.5043327812142, 11.305589856707, 11.1820446405717, 11.7364705150715, 11.4724769671485, 12.023384737459, 12.2111508924653, 11.3737864894833, 12.2376297524099, 11.9434221817735, 12.7972746315102, 12.160229480043, 12.0830370357853, 11.9539766019123, 11.458629107955, 11.4986235378307, 13.6648084438481, 11.9413666581253, 12.8413416597816, 12.7047159195651, 12.8526096368328, 11.3659183919746, 11.4177042900882, 12.2733452511119, 11.4595899139866, 12.4716664424727, 12.3486328422377, 11.2313984655195, 13.2385806197948, 11.7155168167518, 13.4488008042901, 12.6768268944607, 12.1930971202147, 11.8380354697398, 11.7850179880391, 11.7551048425008, 13.8116912021305, 11.9252922922514, 13.5455986538724, 12.7995529384222, 12.4583704002942, 11.2856398576913, 11.5778880891754, 12.0491781362349, 11.7538659191687, 12.2437632042095, 12.257985685942, 11.9968723975725, 13.0763498668474, 11.928890459588, 13.3891871447889, 12.6832140375711, 12.5572164628705, 12.2948332058549, 11.3567333309965, 11.8757598789415, 14.2435914065749, 12.0367940873365, 13.1565149043356, 12.7466329968575, 12.5961641568106, 10.886527157571, 11.6105300885712, 12.2900706287732, 12.3386481464732, 12.4428581791742, 
13.497502303532, 12.1738465420928, 13.2689340332475, 12.0142353271438, 13.4361783392807, 12.4916047376682, 12.7179953765663, 12.6294632743964, 11.3785327890413, 11.5399487312372, 13.5868678972884, 11.9642442096689, 13.2122093668972, 13.0436533686635, 12.9948545084114, 11.497327645192, 11.3300445532421, 12.6093737207955, 11.9238503927387, 12.2740188998421, 13.0652787126032, 11.6137624314142, 13.1663285677321, 11.4151313222507, 13.3730722802691, 12.0236173855206, 12.3950302873615, 11.5572082969642, 10.9226861586599, 11.6236658501974, 13.87835100956, 11.8858542298763, 12.9817961767259, 12.7029780014271, 12.8727059795524, 10.6599880489515, 11.5428406582753, 11.9118793467095, 11.9924066548883, 12.4136405871024, 12.1491196150975, 11.3786088492797, 12.7508473989825, 12.1309150970234, 12.9630506336348, 12.2168036383335, 12.5120079990011, 12.592084201283, 11.467061260114, 11.7797591365148, 13.9661922156025, 11.6668055294142, 13.5092594127873, 12.6257498867552, 12.6955170819408, 11.471330604252, 11.1802632853897, 12.4436421349477, 12.5049740175941, 12.0052728752578, 12.7739993151007, 11.4886356802424, 12.9207301098653, 12.1327163080888, 13.182613847294, 12.5831900444033, 12.6231724853152, 11.6979221215998, 11.6046854137197, 11.5314221127054, 13.7454250233543, 12.2220008401973, 13.5287868857394, 12.7957887739507, 12.8520083943061, 10.8006312464122, 11.5709366052856, 12.3531042482877, 12.604325880041, 12.4725862985276, 12.9397034339565, 11.8203005635883, 13.0619338586258, 12.0059504579831, 13.3748476419394, 12.7342793931063, 12.1063936804234, 11.936954463502, 11.0938008291524, 11.8723287746691, 13.8302405992103, 11.6184238315989, 13.834266941411, 12.959064402967, 13.1078417789627, 10.9420164120196, 11.2832680928059, 12.5542489205997, 12.1647309038926, 11.8917935787814, 12.6360810485249, 11.5368692430942, 12.5283546778987, 12.278560181478, 13.4157747179979, 13.036609372142, 12.6553212089659, 12.6570797878646, 11.818569941231, 11.8095749580922, 14.2562459719411, 
12.1329654212563, 13.9660053836878, 13.3306063548038, 12.9242937867012, 11.27242881383, 11.4695720134184, 12.9145321209486, 12.6578274103185, 12.2412182876062, 13.3823046045775, 12.1455325244381, 13.488028235304, 12.2604604981762, 13.8322573779456, 12.8464852903872, 12.3851242903891, 12.5294125015219, 11.6523400525619, 11.7725478209598, 13.8968048330413, 11.9164871496124, 13.6553012626764, 13.3719003713921, 13.0993098104734, 11.5414286075817, 11.5578386068544, 12.5840269295835, 12.5005043043175, 12.4622048366594, 13.6609252976536, 12.2021373735729, 13.2810745286227, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 12.2730558037744, 13.4455380456546, 12.8580592427116, 12.4227873006222, 12.3182166527325, 11.6314185990482, 11.7366588517982, 14.4302975709319, 11.6605724196907, 13.5728036544708, 13.3434093886478, 12.8281657034197, 11.3617618378445, 11.7987227621715, 12.5842375691488, 12.3054484724115, 12.5361337178109, 13.2848308106348, 12.2650147658898, 13.3678836208855, 12.5229409289664, 13.642681531154, 12.9638388724753, 11.9679539478201, 12.4423262105083, 11.5560890296052, 11.7653391614463, 13.9039034605951, 11.9788520008251, 13.8203711764835, 13.2025549487427, 13.1051309189183, 11.5629914378106, 11.5638370521503, 12.6878928244435, 12.4830030773073, 12.4680960149703, 13.677441065139, 12.360091692288, 13.2414543064261, 10.352503187347, 12.9469351357265, 12.56784317289, 12.7944114690606, 11.6867688344442, 10.9345342131295, 12.2584473430186, 14.336828333391, 11.1297585338564, 13.0309632820284, 13.0539573124124, 12.678302186866, NA, 11.2247269982608, 12.4925946573358, 12.5413825862576, 12.2815224003922, 12.0223376932663, 12.1950262262004, 12.0615188869851, 11.9973624448043, 13.0401898026504, 12.6864141905704, 12.2538704477746, 11.8110887337087, 10.8287768994097, 11.7211658972285, 13.475954646048, 12.0256317651045, 13.0774406589333, 12.6671585944138, 12.5633273185913, 11.1970430736989, 11.4480809787562, 11.4842655965831, 11.6645491357858, 
12.2471507499981, 12.9563215513965, 11.7136749041106, 12.7757229850282, 11.9850180694918, 13.4772713622811, 13.0467007044156, 11.9638558140376, 12.2658516596762, 11.1879093954532, 11.541822430248, 13.6418339900708, 12.063344733493, 13.4880172767453, 13.1956547544198, 12.6023610377398, 11.3860709596121, 11.2633938865502, 12.373221575826, 12.3300636390463, 12.3127051984223, 13.1551183311563, 11.6122546252578, 13.6798566641997, 12.0435741255817, 13.0961548429281, 12.4125487988855, 12.4422209250598, 12.136750981623, 11.4942145931276, 11.6090612705411, 14.0525975138166, 12.0935495956422, 13.4167645205786, 12.9123209024181, 12.9137144075419, 11.2727753694184, 11.1912562864006, 12.5989748105203, 12.2053738434396, 12.4326069218954, 13.3521521531052, 12.1196037569693, 13.0275777558195, 11.5280918171315, 13.4781847966433, 13.3608510975362, 12.0450172812986, 11.7101743778738, 11.0890576135307, 12.7647894869712, 13.7992933959691, 11.7199942783209, 13.725164496452, 13.41371309537, 13.2941290545865, 12.7529671324927, 11.1380505798721, 12.1994755922483, 11.2709862189576, 12.8824736068824, 11.5327380046769, 12.1410503973127, 12.8660797634563, 11.5212651429848, 12.8253692797878, 12.4019449718659, 12.3867358911517, 12.1979194545475, 11.6379727457369, 11.6077019416126, 13.5905997850023, 12.0715804702619, 13.4188494655234, 12.506887795053, 12.7818858724817, 11.1983171287643, 11.8222000230033, 12.6806849863954, 12.3624375014284, 12.52988999623, 13.0332278507231, 11.65699765626, 12.909518373947, 12.0303770140753, 13.277885293731, 12.7599780657718, 12.2938049065559, 12.3264302022841, 11.5960528930636, 11.5001367318458, 14.0248393084405, 12.29973324089, 13.6384019492204, 13.0357337837167, 12.9382331656775, 11.5437509895706, 11.6807472501863, 12.9420323614893, 12.4559762046665, 12.5501632641266, 13.4646547050552, 12.3289598669765, 13.1892612229751, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 11.7763759972509, 13.2296990083637, 12.3683010450742, 
12.4849605810544, 12.2315201278412, 11.4543741803095, 11.687390988967, 14.092571208206, 11.8923416902844, 13.4673837778474, 12.9150257347454, 12.7251154174272, 11.3951856475693, 11.655561440774, 12.4871465295287, 12.3005584074156, 12.4431029881206, 13.1248511735857, 11.7457348988733, 12.9307927285351, 12.4332121776699, 13.8231722687901, 12.865132739627, 12.7453204053125, 12.1505714695303, 11.8775353610962, 11.7279528118611, 13.8839663750813, 11.7968040180193, 14.0152581205496, 13.3711343870114, 12.9898290415242, 11.5612285307566, 12.2084767356854, 12.5816001419491, 12.4109190903876, 12.7769744659107, 13.4039666881016, 12.0262871664718, 13.5433770572096, 12.474158795309, 13.650080892289, 13.1757120838082, 12.7643681205312, 12.4183314350571, 11.8973039417662, 11.8432467142943, 14.1574765565384, 12.0698382563766, 14.138284027763, 13.3971075685986, 13.175730839695, 11.2432820294525, 11.7466293111797, 12.6769124285635, 12.714875687724, 12.5233382044858, 13.6633016379722, 12.3843236382816, 13.4394569297175, 10.3904641527393, 12.9763629288308, 12.6938037118141, 12.1416736680642, 11.6321506663374, 10.6303972995042, 11.8395317561396, 13.9279162508224, 10.9540612751819, 13.0263508257069, 12.3688551071206, 11.8773068985806, 10.4774729299872, 10.4070885916904, 12.1234053684524, 11.5186334732531, 11.56852917772, 12.225396916979, 12.1518264649252, 12.2927227976272, 12.375993663366, 13.386098395214, 12.9673154900272, 12.5865443554421, 12.3849730096063, 11.6984736412274, 11.6392962126255, 13.9203909392538, 12.2171863933348, 13.8580139663386, 13.2031546044427, 13.0068180970987, 11.321926281871, 11.6192134131366, 12.7599387504826, 12.741833750858, 12.6440686826969, 13.5059255935867, 12.4018340133467, 13.3901014062342, 12.3735074589841, 13.7477910708867, 13.173768682418, 12.8411291779921, 12.4042052562892, 11.9335858140433, 12.0141719830504, 14.1197505973962, 12.2165518516319, 13.949680778888, 13.3103541606952, 13.1169084524435, 11.6832736010009, 11.6618547529502, 12.625724670541, 
12.6962033352154, 12.6676118165849, 13.5260337581492, 12.2597016019142, 13.528290398636, 11.399491204466, 13.3155653984668, 12.95927552562, 12.3471256299659, 12.2148401007754, 11.1163544647587, 11.5598151525434, 14.0339392526704, 11.902431689342, 13.5192752578837, 12.8641934593137, 13.0155651020213, 11.319919968889, 11.2714658180592, 12.3204076711592, 11.7566720819774, 11.9805220054309, 12.233595272881, 12.1744476006393, 13.5641147087419, 11.3660037349744, 13.1138631550319, 12.9728488510597, 12.6616579908974, 12.6149860059304, 11.3622204152128, 11.8864487804353, 13.8159815766436, 12.2887570538584, 12.866632493346, 12.491446481754, 11.8999384294254, 12.2571384401065, 12.0022759417132, 11.7893223529602, 12.0142533084167, 11.2973065294882, 12.016540604046, 11.7144822112326, 12.9991048161478, 10.6446538211167, 13.5303691510813, 11.7251395488737, 11.6799844959407, 11.7475439226942, 11.1993614630968, 11.349821988388, 13.2338132782314, 11.1321872848558, 13.133557916753, 12.291217629472, 12.1719934055026, 10.3910814678633, 11.8226559439813, 11.9289238457939, 11.4105947746351, 11.9404431664303, 12.1490687661543, 11.5464505859303, 12.661233140726, 12.3943815124513, 13.4865032146794, 13.0175498997761, 12.6205548771332, 12.2570076975105, 11.7821963621301, 11.8289534791217, 13.900266760634, 12.0065725745798, 13.8605884272343, 13.1678642470628, 13.221615851154, 11.3866048278284, 11.5779866700041, 12.6376287758126, 12.685277809404, 12.6560267550019, 13.44059050945, 12.1213850486358, 13.3900512421908, 12.5088910224167, 13.6552539110473, 13.1519365557871, 12.5424867028952, 12.6490050199557, 11.8810583860997, 11.8454346703769, 14.2560542465013, 12.6881072939419, 13.9657245749273, 13.3824480195102, 13.1314818715186, 11.5414302363924, 11.6623713591239, 12.7713355562747, 12.8722769187542, 12.5886159495194, 13.7380684766617, 12.4783921328282, 13.5973811499126, 11.3558943492333, 13.6743959467468, 13.1353824536453, 12.4400184683218, 12.3124137639866, 11.9608707866252, 11.9125354112978, 
13.6977857421606, 11.6661825599061, 13.6839582479109, 13.2233687392484, 13.1212805642233, 12.3551038888123, 11.944537976147, 12.9135057347358, 12.8148590965566, 12.3243296123081, 13.7181083537689, 11.9718224339689, 13.5364384816054, 12.3871254420644, 13.1919076988403, 12.7686021282143, 12.6389882653731, 12.6583478435844, 11.7219409096222, 11.7498036659852, 13.7449240379206, 12.1878215512353, 13.6344629747159, 12.9986291647334, 13.0217304601369, 11.4396368216168, 11.5637539427516, 12.3555073669926, 12.7708058883613, 12.5389380631649, 13.6556940528247, 12.0230102044507, 13.5388110687559, 12.5026122619153, 13.6901298655501, 13.0968327375709, 12.6607598660041, 12.4789485906501, 11.7961732201514, 11.8271889208983, 14.1550166279867, 12.3149034154595, 13.9969954199959, 13.5046403051572, 13.1592251906692, 11.5810964480693, 11.7887079535448, 12.7479571068557, 13.3334911721736, 12.5487465869157, 13.5813591573868, 12.3189509378157, 13.4591325050796, 12.5471301064026, 13.5892292602605, 13.1210771905801, 12.7435948926412, 12.650842948565, 11.9490391078218, 11.8006517770909, 14.2092919142078, 12.1444980973615, 13.9391815561078, 13.2839046444758, 13.1454813231011, 11.7322153274605, 11.7858927351905, 12.8052506984304, 12.7189716545155, 12.5568818896457, 13.5878442754417, 12.4477777359314, 13.458613276019, 11.5372694048211, 12.6658988462833, 13.1221991544681, 11.8969288496479, 12.0339560348556, 10.7204984942366, 11.0412509424485, 13.2561433626765, 12.012941116033, 13.6370073728711, 13.002197695103, 12.7885604643513, 11.0553402593453, 11.2876745189804, 12.1992697192111, 11.9848017068259, 12.278456605073, 11.9934645499774, 11.6966758795366, 12.8579605687482, 11.8289387546131, 13.4722714473599, 13.0082937484125, 12.2367344596312, 12.5730361652089, 11.469360719863, 11.6550213330642, 14.1835544602637, 11.8150404817403, 13.1355716488868, 13.3268012030075, 12.7921198485997, 11.6157693286044, 11.4375928910554, 12.5919185255578, 12.6441454762334, 11.4786698756664, 12.0568260083227, 
12.1817501237446, 13.179493828018, 12.482186916007, 13.8135383861336, 13.0877042307159, 12.8190994365235, 12.5982547210658, 11.9219181317463, 11.8412909080179, 14.2338829348871, 12.2979588611735, 14.058719661098, 13.4658672864207, 13.1614571666697, 11.6319765583958, 11.6707435299107, 12.748592603903, 12.8012422788702, 12.6888894176377, 13.650958646909, 12.4394506230388, 13.5999420316998, 12.3057574313399, 13.5129788330187, 12.9984008361046, 12.3400230854469, 11.9757830629178, 11.5598324417962, 11.662716037746, 14.1650490878365, 12.3459077129506, 13.872050080087, 13.3018628393663, 12.937836364744, 11.5189867524327, 11.4605117598805, 12.4847538730474, 12.7640734106732, 12.6057307682027, 13.44949472608, 12.4208079578057, 13.2582179349943, 12.089352006588, 13.4958146137878, 13.0671543845645, 12.6097699344219, 12.4457515338686, 11.5362860192753, 11.7393968626716, 14.166624338231, 12.0908451208575, 13.4782855624081, 13.2999955547276, 12.963695118899, 11.6117278488078, 11.5599346216334, 12.4371596873267, 12.4043638509573, 12.4600707376002, 13.66024659012, 12.1113548639499, 13.4943391605217, 12.3984179257503, 13.5589900910887, 13.0836905932385, 12.6080900673645, 12.5631134723535, 11.8327692692605, 11.7924907578967, 14.0623227726551, 12.2302315773303, 13.8347226923388, 13.2681637853064, 13.2067024872683, 11.5587464374154, 11.7251832782771, 12.6986107796993, 12.805328088034, 12.6292919914546, 13.6988770593492, 12.3137639788941, 13.4931149751946, 11.6937941724472, 12.5868096224204, 13.5923481827244, 12.9242008651621, 12.3731542459281, 10.5839508552645, 11.7418152623521, 14.1345300412501, 12.2761436452101, 13.544323241565, 12.0910282805243, 12.8164502035373, 12.0971197466652, 11.8478180090154, 12.5966568277367, 11.8405894318279, 12.1185219654355, 12.3808092066813, 12.3090561687631, 13.456426891685, 11.229218840972, 12.7722119776126, 12.6311678027325, 11.8402552956604, 11.8094751120725, 10.7669196406645, 11.6262237030047, 13.6929072773718, 11.5877420465114, 13.5147625115529, 
12.2628509658487, 12.8525339063345, 11.3170372628272, 11.7851818086191, 11.8375340275913, 11.4812912124711, 11.5480262693629, 12.2628193350118, 12.1268382715268, 12.362287228864, 12.5154454573614, 13.6890874448819, 13.1133246265706, 12.6508980062191, 13.0944384363352, 11.7444220178475, 11.788202312252, 14.2235087461741, 12.2785817243029, 13.9817058267098, 13.4413325860831, 13.0621179201736, 11.6638543730146, 11.6611111021212, 12.7792414107846, 12.9184301634218, 12.7024261860363, 13.6161052047215, 12.5368295609207, 13.5762474246579, 11.1733625350445, 12.7050756608253, 12.5548026220104, 11.7264513906262, 11.6121534773929, 11.1822515222237, 11.0680875287627, 13.4985122985148, 11.2108596207772, 13.6059979362596, 12.5672223237942, 12.2494580367023, 11.5569358921532, 11.9075395566093, 11.7755496474401, 12.5690917935476, 11.4707953685782, 12.4259070657297, 12.1889983710044, 12.0654485790551, 12.1425705261961, 13.5671392278641, 13.0353680721295, 12.3887601753491, 12.1219214700775, 11.8236817689861, 11.4285440077107, 13.9948814097669, 11.9901321729555, 13.932488091816, 13.2658360247112, 12.970114120802, 11.4392578178997, 11.7379581814753, 12.5762355274621, 12.5302386964867, 12.5277033085028, 13.3956393385222, 12.1684605138683, 13.1633048170379, 12.2722919103773, 13.5355475267526, 13.1116758697668, 12.333135690233, 12.6665440995659, 11.6689826776057, 11.663592050513, 13.8785363643371, 11.9292220396984, 13.9052476321737, 13.0080009369155, 12.9598050088917, 11.531196640227, 11.1902749867982, 12.681403227326, 12.6435488155062, 12.5519359819725, 13.4235261985135, 12.0765630018092, 13.3260076099378, 11.5996752022672, 13.5947317045457, 12.9106799892542, 12.387935276999, 12.7430349684579, 11.9284122938591, 11.9364590453286, 14.3854330557068, 11.6564773012812, 13.5651412423758, 12.7488489295768, 12.4943666633842, 12.0892282242953, 11.7047668962075, 12.2523636664427, 12.8297507190494, 12.6783708795887, 12.6351337039207, 12.5043297110043, 12.7791815381655, 11.4527573696959, 
13.4055151396477, 12.9131997093429, 12.097928047849, 11.9823572184875, 11.3984770700353, 11.9274457924076, 13.9500184988352, 12.0022004748849, 13.8114282445895, 12.8369551364249, 12.7323831349638, 12.3396808884948, 12.0149402327226, 12.6635584420436, 12.4235076014228, 11.9549570315173, 13.4409060002897, 12.9170312613068, 12.264244991164, 11.7714193855219, 13.3354739632807, 12.7699848803271, 12.5459066828092, 12.4613098098632, 11.4333995712598, 12.0283032128855, 13.7678811619782, 12.3714808849695, 13.9189683908722, 13.2473786195409, 13.1453100361181, 11.7356142470369, 12.7676277274645, 12.5599723069956, 12.4793949663034, 12.1297974556491, 13.0978111319256, 13.0392443468801, 13.2176435453941, 12.5058641783154, 13.6118705002855, 13.1611585834267, 12.7769496572479, 12.5026804918528, 11.9467350509648, 11.7310971993245, 14.1453366899402, 12.1999838753462, 14.0027231122769, 13.4497928360407, 13.1668975070217, 11.5674756010673, 12.3716812396046, 12.7966600218629, 12.6479853591291, 12.6559911338199, 13.6871629086851, 12.4330300154409, 13.5880275495595, 12.6426803637164, 13.7416795676585, 13.1198857444851, 12.757699623203, 12.6091608582507, 11.8459902573225, 11.9082550718676, 14.2084309493861, 12.2508531713337, 14.0553859191303, 13.4799928833534, 13.1738486221746, 11.5711576276644, 11.7433728261473, 12.9206736745832, 12.8427107486683, 12.7312262862604, 13.7984582994721, 12.4537499264571, 13.4577847182877, 10.9381526563824, 13.5219924975122, 13.2760046639667, 12.7215116221973, 12.4719578790736, 11.6366473375562, 12.0336316213946, 14.1510201223552, 12.2907167804277, 13.9569695575498, 12.8802270461484, 13.0514896047969, 12.1982686654553, 12.390218217141, 12.3744645393049, 12.3670512418843, 12.2707417127123, 12.6827889931887, 12.6471913741617, 12.6708185654412, 11.8045864573924, 13.6103318321474, 12.8703230948884, 12.2524883361477, 12.5030492317648, 11.722158430223, 11.7495372778026, 13.8549572508313, 12.3038544724066, 13.8144598914325, 13.124747477139, 12.9129936477778, 
11.3842958695344, 11.4613550598434, 12.6463475191885, 12.7274719065204, 12.4991210248854, 13.3795018817009, 12.0305299230504, 13.3204780449201, 12.339700379427, 13.6975041198189, 13.0449231226979, 12.495732751068, 12.4363261565451, 11.7776285409422, 11.8651059380164, 14.1864649677431, 12.1346164890333, 14.0277671261815, 13.3945472745435, 13.1450276730544, 11.60902217807, 11.6637730546006, 12.7878933762178, 12.6709337330988, 12.702720237891, 13.510591602696, 12.3044538980337, 13.5231541254579, 12.3136659630286, 13.6735626146457, 12.8976723534437, 12.3167857552449, 12.3730479718319, 11.6816215843634, 11.7892041963772, 14.369748140493, 12.118483795542, 13.7860276543208, 13.3546188628164, 12.9971212051494, 11.5685465585661, 11.4459278968422, 12.7528430716378, 12.583111723649, 12.5117894278597, 13.5065292541595, 12.1736161488926, 13.3933325534438, 12.3236321379257, 13.8925692938509, 13.0894912662911, 12.3222523271553, 12.6470677107343, 11.6693430713624, 11.8046058025688, 14.4578265066562, 12.1829264323573, 13.8854545853271, 13.2996712442842, 13.1533232373315, 11.5135475917404, 11.5558571090971, 12.8021602957608, 12.8674646356332, 12.6464109907194, 13.5359318483096, 12.4228993247371, 13.219568029273, 11.9365148739329, 13.1884262825137, 12.563023430668, 12.4024732061851, 12.1622998786858, 11.3837007892285, 11.1356249836087, 13.9292986262757, 11.8047730673392, 13.4326335000924, 13.1338287896392, 12.7183418655579, 11.3984491866873, 11.8708670648823, 12.2656116201197, 12.3821740513382, 12.2700886557593, 13.1744285699375, 11.8912376397451, 13.2754779219298, 11.848494756562, 13.4668115764945, 12.549475151538, 12.1944924829939, 11.9939688459917, 11.7548252994538, 11.6575308558746, 14.1072833283034, 12.1241498485081, 13.357396877626, 13.2275686867276, 12.7716808938313, 11.2217440956449, 11.7611948774575, 12.4249327394496, 12.4567612727625, 12.2329064234344, 13.0322616565486, 11.6655776654181, 13.0359433015693, 11.937047355669, 13.1898629482762, 12.5726422735212, 
12.5726798348933, 12.0909543082315, 11.8674681189592, 11.5311979408258, 14.1079025881184, 12.0205646248482, 13.617551938839, 13.2095395505257, 12.9596814808092, 12.2836826322891, 11.2556875619804, 12.4723201958077, 12.3117960624811, 12.3818862803597, 13.1572185353487, 12.2431256151018, 13.4922237368159, 12.1148300133654, 13.4774992830802, 12.9032351231719, 12.5261684484828, 12.1680090746111, 11.6914778611649, 11.8445761386398, 14.0738028673682, 12.3062083526974, 13.7405725325613, 13.2771431432453, 13.0830786367249, 11.2764224615401, 11.4489469916569, 12.469414263259, 12.8290298438756, 12.6246029877788, 13.4429608457847, 12.3287094060473, 13.3070916630968, 12.3156318290213, 13.3759573769028, 13.0166377641429, 12.4343373737567, 12.5458096664222, 11.8686862807039, 11.8214792031079, 14.3859500668995, 12.2846385440803, 13.9523207676255, 13.2366017445919, 12.9750017918592, 11.4315828838983, 11.4442449351301, 12.6705951448781, 12.7629229563986, 12.5700489137345, 13.4811516414523, 12.3882357513808, 13.5845102587984, 11.5101434203814, 12.9568944767887, 12.6449488971803, 11.7432696576964, 12.1606425913456, 10.8451944246058, 11.4711540418912, 13.0811178880424, 11.3466402224124, 12.8078122697977, 12.3824277155332, 11.8245051088118, 11.033878184474, 11.6134203398484, 12.1002598796603, 11.6010638601163, 11.549676087612, 12.4239063633644, 12.5357439050719, 13.4735697824421, 11.3335038359535, 13.2968940101015, 13.1068219936151, 12.1794806726248, 12.3243265581966, 11.5472077060716, 11.3483409830804, 13.0774766210381, 12.4629207831702, 14.0390806403132, 13.1596011172655, 13.0332160174326, 12.2524695810672, 12.3798895472813, 12.6235470861505, 12.1640426630125, 12.2247950254833, 12.1406323294767, 12.5228400422577, 11.8250517714264, 11.1132901093811, 12.5916097057348, 12.7682983905802, 11.8864039121137, 11.800415946291, 10.6435474468033, 11.2961694830103, 13.2417747551292, 10.7287443118104, 13.404170034173, 12.8532798644789, 12.3239886701443, 10.973512218176, 11.5864957632374, 
12.2402309354277, 11.3268467596918, 11.5474672736911, 12.2001649874358, 12.1831748852694, 12.5023904604911, 10.9034123616509, 12.3583158938657, 12.3587959337889, 11.7301769092212, 11.9442324465407, 10.8074253551814, 11.4222002342831, 13.1629365953438, 11.4363230784321, 13.3242302503518, 11.825408351783, 12.1196919865023, 11.2242419691237, 10.6983043452125, 12.0272045192824, 10.9922186890926, 11.4512095226744, 12.1385414491155, 11.7384113871431, 12.7372305172782, 10.9888921538183, 12.5188822409916, 12.4538892877753, 11.8367982320162, 11.3220428550523, 10.933896676187, 11.5720667977783, 13.565061350504, 11.3478477707327, 13.1533542126168, 12.0792202030357, 12.3439491548336, 11.2524967139418, 11.9652599132772, 11.9943533083329, 10.9670405274175, 11.232034876836, 12.0911972823639, 12.1339652204331, 12.0782768737071, 12.5514332584427, 13.565894217701, 13.0898225767442, 12.467035020547, 12.5190854862074, 11.8455961776098, 11.9199120363387, 14.1382753148858, 12.3058658550093, 13.9038980051527, 13.3503739271313, 13.1347468818123, 11.533726947342, 11.6471763690989, 12.6478858879989, 12.7649042838832, 12.6249969549332, 13.6145308119529, 12.249855820614, 13.3572000783088, 12.5206297602564, 13.6884401995588, 13.079944735895, 12.6003431342541, 12.5649432085615, 11.8381394475564, 11.8659160273488, 14.1719200458767, 12.2097078289852, 13.9183137876111, 13.4409762702364, 13.0921004117433, 11.5512224211466, 11.6979831890495, 12.6153325411278, 12.7364199530508, 12.672657414801, 13.7341092557813, 12.3840235703149, 13.5502770081792, 11.0050489030876, 12.7023018004907, 12.5637643204045, 11.1270941962539, 11.5450974837904, 10.3314594722364, 11.3360298515111, 13.6007575276288, 11.1306226931802, 13.1897734420897, 12.0915366317113, 12.0223434380893, 11.380923602636, 11.6363527174703, 12.5823228117857, 10.9131612626126, 11.18209276258, 12.2160627054488, 11.9733332212548, 11.8309676079323, 11.8529979259146, 12.7108501720844, 12.9904020236671, 11.4126749262747, 11.6900910928028, 
10.8073242283468, 12.1482455913138, 13.679670859292, 11.3647244132332, 12.985127742006, 12.4405431401608, 12.0490060232124, 11.7470607252347, 10.3178402051276, 13.0201486414481, 11.353545086395, 13.1204863728359, 12.5779113001437, 12.832790870056, 12.6479242366474, 12.0025232704845, 13.1412830628808, 12.877747515031, 11.7570340450347, 11.6051455705384, 10.1065889049367, 11.4827453953986, 13.6828461814584, 11.7332368414748, 13.7208935360265, 12.6657752786406, 12.3302022389033, 11.6014534350454, 11.425885999946, 12.5045811831183, 11.0303105270521, 11.5207074843514, 12.7776857888241, 12.0314273548905, 11.9320983536001, 11.9594632629158, 13.5167917160384, 13.2565688989119, 12.2823047562712, 11.9193618469126, 11.2849132145832, 12.5313752886791, 13.716349075719, 11.7088759586268, 14.1023868622655, 13.450264964694, 13.0639504259696, 11.388220698149, 12.5328149274463, 12.9696769533078, 11.6675409186907, 12.3530836246967, 12.7069826256088, 12.1446835229911, 13.1057643113949, 11.3881892507981, 13.030114645765, 13.1178450546645, 12.0349854340768, 12.4530706934587, 11.1318052281864, 12.1424493549755, 13.5733466271712, 10.9920968341487, 13.3673937324289, 12.8448919690779, 12.3744813875397, 11.6462105478976, 11.3803744435425, 12.8673797281983, 11.7521495572491, 11.7607384043362, 12.4308141310469, 12.3010249057781, 12.2347972789795, 12.5964841565426, 13.7027241674588, 13.2089143041522, 12.8048355040074, 12.5774568422762, 11.884061700795, 11.7978335998857, 14.1378067603058, 12.2216507777382, 13.9514631932932, 13.358050878984, 13.109211083692, 11.6056317358215, 11.663793762001, 12.804595533462, 12.9387305834113, 12.7035478170655, 13.7033606631956, 12.4255368588406, 13.6332981876328, 10.966376439193, 12.3910699854738, 12.3847361632089, 11.4782998839912, 11.9369456555451, 10.3773978003292, 11.6827095525118, 13.0227834634118, 12.0149192231809, 13.0290003842661, 11.9806483358603, 11.9778758082236, 11.0345209694614, 11.7174456170697, 11.9931807583945, 11.91013445003, 11.0777179520053, 
11.6575502862021, 11.7234253422465, 11.983396252915, 10.9401815777881, 12.8897542175903, 12.7838939031615, 11.6891514994582, 12.1761459083228, 10.7804594766759, 11.7200262669945, 13.250931943692, 11.4306099502856, 12.8627963120138, 12.3091211528353, 12.3387405488775, 10.2956639653444, 11.6109025606084, 11.9330604584893, 11.315884485154, 11.7597563595566, 12.0529326512517, 11.5204323063685, 12.5542407400717, 10.8901981999531, 12.6716565800448, 12.1850327902115, 12.1184654996523, 11.5935856373925, 10.7679568098204, 12.684661604491, 13.5322869899053, 10.8375973140203, 13.1341414383696, 12.4713608228455, 12.3996735569132, 11.2404611536777, 11.1879960215822, 11.8149699366855, 11.5263753425494, 11.550562684474, 11.881954512162, 11.960385266623, 11.9191343945325, 11.9275162461025, 14.4300776220174, 13.2743343624162, 12.5004372882726, 12.3818166726893, 11.3172066364164, 12.4489415414244, 14.5518055941913, 12.3814877434305, 13.6640347229998, 13.004009818708, 12.98641916686, 11.5971294050632, 13.0919638244835, 12.5969370593044, 12.523375648891, 11.8048025943535, 12.6716475075252, 12.2891516762832, 12.8799082897072, 12.5791235755281, 13.7530238239126, 13.2996352538429, 12.8330831864878, 12.6180370732925, 11.9213345604736, 11.9242170074846, 14.407275051282, 12.5172235205991, 14.0614967570622, 13.5507890858319, 13.3076807320896, 11.7858001461438, 11.8584538361961, 12.8273710855852, 12.9510255323964, 12.6674167292068, 13.8345005412558, 12.5315423088742, 13.5261250542701, 11.2378426019217, 13.317780686254, 12.8520367085728, 13.2314690414045, 11.6164657431659, 10.5998799720902, 11.4357432274465, 14.0417669865121, 12.1607754042512, 13.2466054940719, 12.797111452771, 12.4047191492522, 11.7833124729873, 12.097653018214, 11.7951612085733, 12.3041048876447, 11.1508412920931, 12.4501892541409, 12.4610946871315, 12.3261738189918, 10.8552645219384, 12.9975652341808, 12.6677845732241, 11.7652169789635, 11.6827253641907, 11.110919985169, 11.8897433664123, 13.3123005479684, 11.7760781785571, 
13.1638079938049, 12.7956196208381, 12.4920841796339, 11.7252484518286, 11.8268745473843, 11.3318932693566, 11.3357737895019, 11.535325758363, 12.5461021275932, 12.3602886322043, 12.8926798402372), dim = c(20L, 100L)))
par(mar = c(5, 4, 4, 2) - 1.9)
# one panel per variable: observed values against their prediction intervals
by(res.over$res.plot, INDICES = res.over$res.plot$var, FUN = function(xx) {
  plot(x = xx[, "trueval"], y = xx[, "xbar"], col = as.character(xx[, "col"]),
       xlab = "observed values", ylab = "imputed values",
       main = paste(xx[1, "var"], " (cov =", xx[1, "pct"], "%)"),
       ylim = c(min(xx[, "binf"], na.rm = TRUE), max(xx[, "bsup"], na.rm = TRUE)))
  abline(0, 1)
  segments(x0 = xx[, "trueval"], x1 = xx[, "trueval"],
           y0 = xx[, "binf"], y1 = xx[, "bsup"],
           col = as.character(xx[, "col"]))
  legend("bottomright",
         legend = c("0-0.2", "0.2-0.4", "0.4-0.6", "0.6-0.8", "0.8-1"),
         col = c("blue", "green", heat.colors(3)[c(3, 2, 1)]),
         bty = "n", lty = 1, horiz = TRUE, cex = .8, lwd = 0.4)
})
```

The plot shows the observed values (x-axis) against their 90% prediction intervals (y-axis). Colours indicate the proportion of observed values used to build each interval, which depends on the missing data pattern of the individual. If the conditional model fits the data well, about 90% of the intervals are expected to cross the first bisector, meaning that the observed value lies inside the interval. Here, the imputation model for the `alco` variable does not fit the observed data very well since the coverage is `r round(res.over$res.plot$pct[1],2)`%. The fit could be improved by investigating FCS imputation methods.

## Number of imputed data sets (`m`) {#nbtab}

The number of imputed data sets (`m`) should be large enough for the consensus partition to be accurate. The `choosem` function can be used to check whether this number is suitable. This function computes the consensus partition using only the first imputed data sets, for an increasing number of data sets. In this way, a sequence of `m` consensus partitions is obtained.
Then, the Rand index between successive partitions is computed and plotted. The Rand index measures the agreement between two partitions. If the Rand index between the last consensus partitions of the sequence reaches its maximum value (1), then adding further imputed data sets no longer modifies the consensus partition, and `m` can be considered sufficiently large.

```{r, fig.height = 4, fig.width = 4, fig.align = "center"}
res.m <- choosem(res.pool.kmeans)
```

Here, the Rand index is equal to 1 at `m = 20` imputed data sets, meaning the consensus partition remains unchanged between `m = 19` and `m = 20`, and probably beyond. Consequently, `m = 20` was a suitable choice.

## Number of clusters {#nbclust}

In practice, the number of clusters is generally unknown. One way to tune this number is to inspect the instability as a function of the number of clusters (@Fang12) and retain the most stable partition. The `choosenbclust` function browses a grid of values for the number of clusters and, for each one, imputes the data and computes the instability.

```{r,eval=FALSE}
res.nbclust <- choosenbclust(res.pool.kmeans)
```

```{r,fig.height = 4, fig.width = 4, fig.align = "center", echo=FALSE}
res.nbclust <- list(nb.clust = 3L, crit = c(`2` = 0.114077699448272, `3` = 0.0973099188526245, `4` = 0.124537091242568, `5` = 0.165533521629721))
plot(as.numeric(names(res.nbclust$crit)), res.nbclust$crit, xlab = "nb clust", ylab = "Total instability", type = "b", xaxt = "n")
axis(1, as.numeric(names(res.nbclust$crit)), as.numeric(names(res.nbclust$crit)))
```

On the wine data set, the instability is minimal for 3 clusters, so this is the suggested number of clusters.
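
As a minimal sketch, the suggested number of clusters can be read off the instability values by taking the grid value with the lowest total instability. The vector `crit` below copies the instability values reported for 2 to 5 clusters, and `best.nb.clust` is a name introduced here for illustration. Selecting the minimum is assumed to match the rule behind the `nb.clust` element; this is consistent with the values shown, but it is not a description of the internals of `choosenbclust`.

```{r}
# Instability values copied from res.nbclust$crit
# (names are the candidate numbers of clusters)
crit <- c(`2` = 0.114077699448272, `3` = 0.0973099188526245,
          `4` = 0.124537091242568, `5` = 0.165533521629721)

# Candidate number of clusters with the lowest total instability
best.nb.clust <- as.integer(names(which.min(crit)))
best.nb.clust
```

With these values, `best.nb.clust` is 3, in agreement with the `nb.clust` element of `res.nbclust`.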
# Cluster description

After building a partition with missing values, each cluster can be described variable by variable as follows:

```{r, fig.width=7,fig.height=5,fig.align="center",results=FALSE, warning=FALSE, message=FALSE}
require(reshape2)
require(ggplot2)
# long format: one row per (individual, variable) pair, keeping the cluster label
dat.m <- melt(data.frame(wine.na, part = as.factor(part)), id.vars = "part")
ggplot(dat.m, aes(part, value, col = part)) +
  facet_wrap(variable~., scales = "free_y") +
  geom_boxplot(width = 0.7)
```

or by investigating the relationships between each pair of variables:

```{r,fig.width=7,fig.height=5,fig.align="center", message=FALSE}
library(VIM)
pairsVIM(wine.na, pch = 21, bg = c("red", "green3", "blue")[part], cex = .2, gap = 0)
```

or from a multivariate point of view by using principal component analysis, as implemented in the R packages `FactoMineR` (@FactoMineR) and `missMDA` (@missMDA):

```{r, fig.width=7, fig.height=5, fig.align="center"}
library(FactoMineR)
library(missMDA)
# merge the partition variable with the incomplete data set
data.pca <- cbind.data.frame(class = factor(part, levels = seq(nb.clust)), wine.na)
# perform PCA with missing values, specifying the position of the partition variable
res.imputepca <- imputePCA(data.pca, quali.sup = 1)
res.pca <- PCA(res.imputepca$completeObs, quali.sup = 1, graph = FALSE)
plot(res.pca, habillage = 1)
```

Finally, the consensus partition can be compared with the reference partition by computing external clustering comparison indices, as provided by the `clusterCrit` R package (@clusterCrit):

```{r}
library(clusterCrit)
res.crit <- extCriteria(part, ref, crit = "all")
round(unlist(res.crit), 2)
```

# References