\documentclass{chapman}
%%% copy Sweave.sty definitions
%%% keeps `sweave' from adding `\usepackage{Sweave}': DO NOT REMOVE
%\usepackage{Sweave}
\RequirePackage[T1]{fontenc}
\RequirePackage{graphicx,ae,fancyvrb}
\IfFileExists{upquote.sty}{\RequirePackage{upquote}}{}
\usepackage{relsize}
\DefineVerbatimEnvironment{Sinput}{Verbatim}{}
\DefineVerbatimEnvironment{Soutput}{Verbatim}{fontfamily=courier,
fontshape=it,
fontsize=\relsize{-1}}
\DefineVerbatimEnvironment{Scode}{Verbatim}{}
\newenvironment{Schunk}{}{}
%%% environment for raw output
\newcommand{\SchunkRaw}{\renewenvironment{Schunk}{}{}
\DefineVerbatimEnvironment{Soutput}{Verbatim}{fontfamily=courier,
fontshape=it,
fontsize=\small}
\rawSinput
}
%%% environment for labeled output
\newcommand{\nextcaption}{}
\newcommand{\SchunkLabel}{
\renewenvironment{Schunk}{\begin{figure}[ht] }{\caption{\nextcaption}
\end{figure} }
\DefineVerbatimEnvironment{Sinput}{Verbatim}{frame = topline}
\DefineVerbatimEnvironment{Soutput}{Verbatim}{frame = bottomline,
samepage = true,
fontfamily=courier,
fontshape=it,
fontsize=\relsize{-1}}
}
%%% S code with line numbers
\DefineVerbatimEnvironment{Sinput}
{Verbatim}
{
%% numbers=left
}
\newcommand{\numberSinput}{
\DefineVerbatimEnvironment{Sinput}{Verbatim}{numbers=left}
}
\newcommand{\rawSinput}{
\DefineVerbatimEnvironment{Sinput}{Verbatim}{}
}
%%% R / System symbols
\newcommand{\R}{\textsf{R}}
\newcommand{\rR}{{R}}
\renewcommand{\S}{\textsf{S}}
\newcommand{\SPLUS}{\textsf{S-PLUS}}
\newcommand{\rSPLUS}{{S-PLUS}}
\newcommand{\SPSS}{\textsf{SPSS}}
\newcommand{\EXCEL}{\textsf{Excel}}
\newcommand{\ACCESS}{\textsf{Access}}
\newcommand{\SQL}{\textsf{SQL}}
%%\newcommand{\Rpackage}[1]{\hbox{\rm\textit{#1}}}
%%\newcommand{\Robject}[1]{\hbox{\rm\texttt{#1}}}
%%\newcommand{\Rclass}[1]{\hbox{\rm\textit{#1}}}
%%\newcommand{\Rcmd}[1]{\hbox{\rm\texttt{#1}}}
\newcommand{\Rpackage}[1]{\index{#1 package@{\fontseries{b}\selectfont #1} package} {\fontseries{b}\selectfont #1}}
\newcommand{\rpackage}[1]{{\fontseries{b}\selectfont #1}}
\newcommand{\Robject}[1]{\texttt{#1}}
\newcommand{\Rclass}[1]{\index{#1 class@\textit{#1} class}\textit{#1}}
\newcommand{\Rcmd}[1]{\index{#1 function@\texttt{#1} function}\texttt{#1}}
\newcommand{\Roperator}[1]{\texttt{#1}}
\newcommand{\Rarg}[1]{\texttt{#1}}
\newcommand{\Rlevel}[1]{\texttt{#1}}
%%% other symbols
\newcommand{\file}[1]{\hbox{\rm\texttt{#1}}}
%%\newcommand{\stress}[1]{\index{#1}\textit{#1}}
\newcommand{\stress}[1]{\textit{#1}}
\newcommand{\booktitle}[1]{\textit{#1}} %%'
%%% Math symbols
\usepackage{amstext}
\usepackage{amsmath}
\newcommand{\E}{\mathsf{E}}
\newcommand{\Var}{\mathsf{Var}}
\newcommand{\Cov}{\mathsf{Cov}}
\newcommand{\Cor}{\mathsf{Cor}}
\newcommand{\x}{\mathbf{x}}
\newcommand{\y}{\mathbf{y}}
\renewcommand{\a}{\mathbf{a}}
\newcommand{\W}{\mathbf{W}}
\newcommand{\C}{\mathbf{C}}
\renewcommand{\H}{\mathbf{H}}
\newcommand{\X}{\mathbf{X}}
\newcommand{\B}{\mathbf{B}}
\newcommand{\V}{\mathbf{V}}
\newcommand{\I}{\mathbf{I}}
\newcommand{\D}{\mathbf{D}}
\newcommand{\bS}{\mathbf{S}}
\newcommand{\N}{\mathcal{N}}
\renewcommand{\L}{L}
\renewcommand{\P}{\mathsf{P}}
\newcommand{\K}{\mathbf{K}}
\newcommand{\m}{\mathbf{m}}
\newcommand{\argmin}{\operatorname{argmin}\displaylimits}
\newcommand{\argmax}{\operatorname{argmax}\displaylimits}
\newcommand{\bx}{\mathbf{x}}
\newcommand{\bbeta}{\mathbf{\beta}}
%%% links
\usepackage{hyperref}
\hypersetup{%
pdftitle = {A Handbook of Statistical Analyses Using R (3rd Edition)},
pdfsubject = {Book},
pdfauthor = {Torsten Hothorn and Brian S. Everitt},
colorlinks = {black},
linkcolor = {black},
citecolor = {black},
urlcolor = {black},
hyperindex = {true},
linktocpage = {true},
}
%%% captions & tables
%% : conflics with figure definition in chapman.cls
%%\usepackage[format=hang,margin=10pt,labelfont=bf]{caption}
%%
\usepackage{longtable}
\usepackage[figuresright]{rotating}
%%% R symbol in chapter 1
\usepackage{wrapfig}
%%% Bibliography
\usepackage[round,comma]{natbib}
\renewcommand{\refname}{References \addcontentsline{toc}{chapter}{References}}
\citeindexfalse
%%% texi2dvi complains that \newblock is undefined, hm...
\def\newblock{\hskip .11em plus .33em minus .07em}
%%% Example sections
\newcounter{exercise}[chapter]
\setcounter{exercise}{0}
\newcommand{\exercise}{\stepcounter{exercise} \item{Ex.~\arabic{chapter}.\arabic{exercise} }}
%% URLs
\newcommand{\curl}[1]{\begin{center} \url{#1} \end{center}}
%%% for manual corrections
%\renewcommand{\baselinestretch}{2}
%%% plot sizes
\setkeys{Gin}{width=0.95\textwidth}
%%% color
\usepackage{color}
%%% hyphenations
\hyphenation{drop-out}
\hyphenation{mar-gi-nal}
%%% new bidirectional quotes need
\usepackage[utf8]{inputenc}
%\usepackage{setspace}
\definecolor{sidebox_todo}{rgb}{1,1,0.2}
\newcommand{\todo}[1]{
\hspace{0pt}%
\marginpar{%
\fcolorbox{black}{sidebox_todo}{%
\parbox{\marginparwidth} {
\raggedright\sffamily\footnotesize{TODO: #1}%
}
}%
}
}
\begin{document}
%% Title page
\title{A Handbook of Statistical Analyses Using \R{} --- 3rd Edition}
\author{Torsten Hothorn and Brian S. Everitt}
\maketitle
%%\VignetteIndexEntry{Chapter Conditional Inference}
%%\VignetteDepends{coin}
\setcounter{chapter}{3}
\SweaveOpts{prefix.string=figures/HSAUR,eps=FALSE,keep.source=TRUE}
<>=
rm(list = ls())
s <- search()[-1]
s <- s[-match(c("package:base", "package:stats", "package:graphics", "package:grDevices",
"package:utils", "package:datasets", "package:methods", "Autoloads"), s)]
if (length(s) > 0) sapply(s, detach, character.only = TRUE)
if (!file.exists("tables")) dir.create("tables")
if (!file.exists("figures")) dir.create("figures")
set.seed(290875)
options(prompt = "R> ", continue = "+ ",
width = 63, # digits = 4,
show.signif.stars = FALSE,
SweaveHooks = list(leftpar = function()
par(mai = par("mai") * c(1, 1.05, 1, 1)),
bigleftpar = function()
par(mai = par("mai") * c(1, 1.7, 1, 1))))
HSAURpkg <- require("HSAUR3")
if (!HSAURpkg) stop("cannot load package ", sQuote("HSAUR3"))
rm(HSAURpkg)
### hm, R-2.4.0 --vanilla seems to need this
a <- Sys.setlocale("LC_ALL", "C")
###
book <- TRUE
refs <- cbind(c("AItR", "DAGD", "SI", "CI", "ANOVA", "MLR", "GLM",
"DE", "RP", "GAM", "SA", "ALDI", "ALDII", "SIMC", "MA", "PCA",
"MDS", "CA"), 1:18)
ch <- function(x) {
ch <- refs[which(refs[,1] == x),]
if (book) {
return(paste("Chapter~\\\\ref{", ch[1], "}", sep = ""))
} else {
return(paste("Chapter~", ch[2], sep = ""))
}
}
if (file.exists("deparse.R"))
source("deparse.R")
setHook(packageEvent("lattice", "attach"), function(...) {
lattice.options(default.theme =
function()
standard.theme("pdf", color = FALSE))
})
@
\pagestyle{headings}
<>=
book <- FALSE
@
\chapter[Conditional Inference]{Conditional Inference: Guessing Lengths, Suicides,
Gastrointestinal Damage, and Newborn Infants \label{CI}}
<>=
data("roomwidth", package = "HSAUR3")
nobs <- table(roomwidth$unit)
ties <- tapply(roomwidth$width, roomwidth$unit, function(x) length(x) - length(unique(x)))
library("coin")
@
\section{Introduction}
\section{Conditional Test Procedures}
\section{Analysis Using \R{}}
\subsection{Estimating the Width of a Room Revised}
The unconditional analysis of the room width estimated by two groups of students in
\Sexpr{ch("SI")} led to the conclusion that the estimates in meters are
slightly larger than the estimates in feet. Here, we reanalyze these data in
a conditional framework. First, we convert meters into feet and store the
vector of observations in a variable \Robject{y}:
<>=
data("roomwidth", package = "HSAUR3")
convert <- ifelse(roomwidth$unit == "feet", 1, 3.28)
feet <- roomwidth$unit == "feet"
meter <- !feet
y <- roomwidth$width * convert
@
The test statistic is simply the difference in means
<>=
T <- mean(y[feet]) - mean(y[meter])
T
@
In order to approximate the conditional distribution of the test statistic
$T$ we compute $9999$ test statistics for shuffled $y$ values. A
permutation of the $y$ vector can be obtained from the \Rcmd{sample}
function.
<>=
meandiffs <- double(9999)
for (i in 1:length(meandiffs)) {
sy <- sample(y)
meandiffs[i] <- mean(sy[feet]) - mean(sy[meter])
}
@
\begin{figure}
\begin{center}
<>=
hist(meandiffs)
abline(v = T, lty = 2)
abline(v = -T, lty = 2)
@
\caption{An approximation for the conditional distribution of the difference of mean
\Robject{roomwidth} estimates in the feet and meters group under
the null hypothesis. The vertical lines show the
negative and positive absolute value of the test statistic
$T$ obtained from the original data. \label{CI:perm}}
\end{center}
\end{figure}
The distribution of the test statistic $T$ under the null hypothesis of
independence of room width estimates and groups is depicted in Figure~\ref{CI:perm}.
Now, the value of the test statistic $T$ for the original
unshuffled data can be compared with the distribution of $T$ under the null
hypothesis (the vertical lines in Figure~\ref{CI:perm}). The $p$-value,
i.e., the proportion of
test statistics $T$ larger than \Sexpr{-round(T, 3)} or smaller than
\Sexpr{round(T, 3)}, is
<>=
greater <- abs(meandiffs) > abs(T)
mean(greater)
@
with a confidence interval of
<>=
binom.test(sum(greater), length(greater))$conf.int
@
Note that the approximated conditional $p$-value is roughly the same as the
$p$-value reported by the $t$-test in \Sexpr{ch("SI")}.
\renewcommand{\nextcaption}{\R{} output of the exact permutation test applied to
the \Robject{roomwidth} data. \label{CI-roomwidth-p-fig}}
\SchunkLabel
<>=
library("coin")
independence_test(y ~ unit, data = roomwidth,
distribution = exact())
@
\SchunkRaw
\renewcommand{\nextcaption}{\R{} output of the exact conditional Wilcoxon
rank sum test applied to the \Robject{roomwidth} data. \label{CI-roomwidth-w-fig}}
\SchunkLabel
<>=
wilcox_test(y ~ unit, data = roomwidth,
distribution = exact())
@
\SchunkRaw
\subsection{Crowds and Threatened Suicide}
\renewcommand{\nextcaption}{\R{} output of Fisher's exact test for the %'
\Robject{suicides} data.
\label{CI-suicides-fig}}
\SchunkLabel
<>=
data("suicides", package = "HSAUR3")
fisher.test(suicides)
@
\SchunkRaw
<>=
ftp <- round(fisher.test(suicides)$p.value, 3)
ctp <- round(chisq.test(suicides)$p.value, 3)
@
\subsection{Gastrointestinal Damage} \label{CI:Lanza}
Here we are interested in the comparison of two groups of patients,
where one group received a placebo and the other one Misoprostol. In the
trials shown here, the response variable is measured on an ordered scale
-- see Table~\ref{CI:scores}.
Data from four clinical studies are available and thus the
observations are naturally grouped together. From the \Rclass{data.frame}
\Robject{Lanza} we can construct a three-way table as follows:
<>=
data("Lanza", package = "HSAUR3")
xtabs(~ treatment + classification + study, data = Lanza)
@
<>=
options(width = 65)
@
For the first study, the null hypothesis of independence of treatment and
gastrointestinal damage, i.e., of no treatment effect of Misoprostol, is
tested by
<>=
library("coin")
cmh_test(classification ~ treatment, data = Lanza,
scores = list(classification = c(0, 1, 6, 17, 30)),
subset = Lanza$study == "I")
@
and, by default, the conditional distribution is approximated by the
corresponding limiting distribution. The $p$-value indicates a strong
treatment effect. For the second study, the asymptotic $p$-value is a little
bit larger:
<>=
cmh_test(classification ~ treatment, data = Lanza,
scores = list(classification = c(0, 1, 6, 17, 30)),
subset = Lanza$study == "II")
@
and we make sure that the implied decision is correct by calculating a
confidence interval for the exact $p$-value:
<>=
p <- cmh_test(classification ~ treatment, data = Lanza,
scores = list(classification = c(0, 1, 6, 17, 30)),
subset = Lanza$study == "II", distribution =
approximate(19999))
pvalue(p)
@
The third and fourth study indicate a strong treatment effect as well:
<>=
cmh_test(classification ~ treatment, data = Lanza,
scores = list(classification = c(0, 1, 6, 17, 30)),
subset = Lanza$study == "III")
cmh_test(classification ~ treatment, data = Lanza,
scores = list(classification = c(0, 1, 6, 17, 30)),
subset = Lanza$study == "IV")
@
At the end, a separate analysis for each study is unsatisfactory. Because
the design of the four studies is the same, we can use \Robject{study} as a
block variable and perform a global linear-association test investigating
the treatment effect of Misoprostol in all four studies. The block variable
can be incorporated into the \Rclass{formula} by the \texttt{|} symbol.
<>=
cmh_test(classification ~ treatment | study, data = Lanza,
scores = list(classification = c(0, 1, 6, 17, 30)))
@
Based on this result, a strong treatment effect can be established.
\subsection{Teratogenesis}
\index{Teratogenesis}
In this example, the medical doctor (MD) and the research assistant (RA)
assessed the number of anomalies ($0, 1, 2$ or $3$) for each of $395$ babies:
<>=
anomalies <- c(235, 23, 3, 0, 41, 35, 8, 0,
20, 11, 11, 1, 2, 1, 3, 1)
anomalies <- as.table(matrix(anomalies,
ncol = 4, dimnames = list(MD = 0:3, RA = 0:3)))
anomalies
@
We are interested in testing whether the number of anomalies assessed by the
medical doctor differs structurally from the number reported by the research
assistant. Because we compare \stress{paired} observations, i.e., one pair
of measurements for each newborn, a test of marginal homogeneity (a
generalization of McNemar's test, \Sexpr{ch("SI")}) needs to be applied: %%'
%\newpage
<>=
mh_test(anomalies)
@
The $p$-value indicates a deviation from the null hypothesis. However, the
levels of the response are not treated as ordered. Similar to the analysis
of the gastrointestinal damage data above, we can take this information
into account by the definition of an appropriate score. Here, the number of
anomalies is a natural choice:
<>=
mh_test(anomalies, scores = list(response = c(0, 1, 2, 3)))
@
In our case, one can conclude that the
assessment of the number of anomalies differs between the medical doctor
and the research assistant.
%%\bibliographystyle{LaTeXBibTeX/refstyle}
%%\bibliography{LaTeXBibTeX/HSAUR}
\end{document}