---
title: "MPN: Most Probable Number for Serial Dilutions"
author: "Martine Ferguson & John Ihrie"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{MPN: Most Probable Number for Serial Dilutions}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, echo = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```

## Introduction

The **MPN** package computes the Most Probable Number (i.e. microbial density)
and other microbial enumeration metrics derived from serial dilutions.


## `mpn()`

**MPN** includes the `mpn()` function to estimate the Most Probable Number
(*MPN*), its variance and confidence interval, and Blodgett's
(*2, 3, 4*) Rarity Index (*RI*).

The user inputs the number of dilutions, number of tubes, number of positive
tubes, amount of inocula, confidence level, and confidence interval method.


## Maximum Likelihood Estimation

As discussed in the references, *MPN* is estimated by maximizing likelihood.
Combining the notaton of Blodgett (*2*) and Jarvis et al. (*7*), we write the
likelihood function as:

$$L = L(\lambda; x_i, n_i, z_i, i = 1,...,k)
    = \prod_{i=1}^k \binom{n_i}{x_i} {(1-exp(-{\lambda}{z_i}))} ^ {x_i}
    {(exp(-{\lambda}{z_i}))} ^ {n_i-x_i}$$

where

- $\lambda$ is the microbial density (concentration) to be estimated
- $k$ is the number of dilution levels
- $x_i$ is the number of positive tubes at the $i^{th}$ dilution level
- $n_i$ is the total number of tubes at the $i^{th}$ dilution level
- $z_i$ is the amount of inoculum per tube at the $i^{th}$ dilution level

As an *R* function:

```{r Define likelihood R function}
#likelihood
L <- function(lambda, positive, tubes, amount) {
  binom_coef <- choose(tubes, positive)
  exp_term   <- exp(-lambda * amount)
  prod(binom_coef * ((1 - exp_term) ^ positive) * exp_term ^ (tubes - positive))
}
L_vec <- Vectorize(L, "lambda")
```

As is typical of maximum likelihood approaches, `mpn()` uses the score function
(derivative of the log-likelihood) to solve for $\hat{\lambda}$, the maximum
likelihood estimate (MLE) of $\lambda$ (i.e., the point estimate of *MPN*).
However, let's demonstrate what is happening in terms of the likelihood function
itself. Assume we have 10g of undiluted inoculum in each of 3 tubes. Now we use
a 10-fold dilution twice (i.e., the relative dilution levels are 1, .1, .01).
Also assume that exactly 1 of the 3 tubes is positive at each dilution level:

```{r Calculate MPN estimate}
#MPN calculation
library(MPN)
my_positive <- c(1, 1, 1) #xi
my_tubes    <- c(3, 3, 3) #ni
my_amount   <- 10 * c(1, .1, .01)  #zi
(my_mpn <- mpn(my_positive, my_tubes, my_amount))
```

If we plot the likelihood function, we see that $\hat{\lambda}$ maximizes the
likelihood:

```{r Plot likelihood and MPN estimate}
my_mpn$MPN
lambda <- seq(0, 0.5, by = .001)
my_L   <- L_vec(lambda, my_positive, my_tubes, my_amount)
plot(lambda, my_L, type = "l", ylab = "Likelihood", main = "Maximum Likelihood")
abline(v = my_mpn$MPN, lty = 2, col = "red")
```

If none of the tubes are positive, the MLE is zero:

```{r No positive tubes}
no_positive <- c(0, 0, 0) #xi
(mpn_no_pos <- mpn(no_positive, my_tubes, my_amount)$MPN)
L_no_pos <- L_vec(lambda, no_positive, my_tubes, my_amount)
plot(lambda, L_no_pos, type = "l", xlim = c(-0.02, 0.2), ylab = "Likelihood",
     main = "No Positives")
abline(v = mpn_no_pos, lty = 2, col = "red")
```

If all of the tubes are positive, then no finite MLE exists:

```{r All positive tubes}
all_positive <- c(3, 3, 3) #xi
mpn(my_tubes, all_positive, my_amount)$MPN
lambda <- seq(0, 200, by = .1)
L_all_pos <- L_vec(lambda, all_positive, my_tubes, my_amount)
plot(lambda, L_all_pos, type = "l", xlim = c(0, 100), ylim = c(0, 1.1),
     ylab = "Likelihood", main = "All Positives")
abline(h = 1, lty = 2)
```

From a practical perspective, if all the tubes are positive, then the scientist
should probably further dilute the sample until some tubes are negative.


## Bias Adjustment
`mpn()` also returns a bias-adjusted version (*9, 5, 6*) of the point
estimate:

```{r Bias adjustment}
my_mpn$MPN
my_mpn$MPN_adj
```


## Confidence Intervals

As discussed in the references, many different confidence intervals (CIs) can be
calculated for the Most Probable Number. Currently, `mpn()` computes a CI using
the approach of Jarvis et al. (*7*) or the likelihood ratio approach of
Ridout (*8*).
However, since these approaches rely on large-sample theory, the results are
more reliable for larger experiments.

```{r Calculate confidence intervals}
my_positive <- c(1, 1, 1)
my_tubes    <- c(3, 3, 3)
my_amount   <- 10 * c(1, .1, .01)
mpn(my_positive, my_tubes, my_amount)  #Jarvis approach
mpn(my_positive, my_tubes, my_amount, CI_method = "LR")  #likelihood ratio
```


## Rarity Index

As Jarvis (7) explains, Blodgett's (*2, 3, 4*) Rarity Index is a ratio of
two likelihoods. The likelihood in the numerator is for the actual results
(i.e., evaluated at the *MPN* point estimate). The likelihood in the denominator
is for the (hypothetical) results that would have given the largest possible
likelihood. So *RI* is larger than 0 and at most 1. Values of *RI* that are very
small are unlikely; therefore, the results should be regarded with suspicion.


## Conclusion

The **MPN** package is more versatile than static Most Probable Number tables in
that the number of tubes can vary across dilution levels, the user can choose
any number (or levels) of dilutions, and the confidence level can be changed.
Also, the Rarity Index, which quantifies the validity of the results, is
included.


---------

### References

1. Bacteriological Analytical Manual, 8th Edition, Appendix 2,
<https://www.fda.gov/food/laboratory-methods-food/bam-appendix-2-most-probable-number-serial-dilutions>

2. Blodgett RJ (2002). "Measuring improbability of outcomes from a serial
dilution test." *Communications in Statistics: Theory and Methods*, 31(12),
2209-2223.

3. Blodgett RJ (2005). "Serial dilution with a confirmation step."
*Food Microbiology*, 22(6), 547-552.

4. Blodgett RJ (2010). "Does a serial dilution experiment's model agree with its
outcome?" *Model Assisted Statistics and Applications*, 5(3), 209-215.

5. Haas CN (1989). "Estimation of microbial densities from dilution
count experiments" *Applied and Environmental Microbiology* 55(8), 1934-1942.

6. Haas CN, Rose JB, Gerba CP (2014). "Quantitative microbial risk assessment,
Second Ed." *John Wiley & Sons, Inc.*, ISBN 978-1-118-14529-6.

7. Jarvis B, Wilrich C, Wilrich P-T (2010). "Reconsideration of the derivation
of Most Probable Numbers, their standard deviations, confidence bounds and
rarity values." *Journal of Applied Microbiology*, 109, 1660-1667.

8. Ridout MS (1994). "A Comparison of Confidence Interval Methods for Dilution
Series Experiments." *Biometrics*, 50(1), 289-296.

9. Salama IA, Koch GG, Tolley DH (1978). "On the estimation of the
most probable number in a serial dilution technique." *Communications in
Statistics - Theory and Methods*, 7(13), 1267-1281.