%% This document created by Scientific Word (R) Version 3.5
\documentclass{book}%
\usepackage{graphicx}
\usepackage{amsmath}%
\usepackage{amsfonts}%
\usepackage{amssymb}
%TCIDATA{OutputFilter=latex2.dll}
%TCIDATA{CSTFile=LaTeX Book.cst}
%TCIDATA{Created=Mon Apr 10 14:15:33 2000}
%TCIDATA{LastRevised=Tuesday, January 30, 2001 10:53:34}
%TCIDATA{}
%TCIDATA{}
%TCIDATA{Language=American English}
\newtheorem{theorem}{Theorem}
\newtheorem{acknowledgement}[theorem]{Acknowledgement}
\newtheorem{algorithm}[theorem]{Algorithm}
\newtheorem{axiom}[theorem]{Axiom}
\newtheorem{case}[theorem]{Case}
\newtheorem{claim}[theorem]{Claim}
\newtheorem{conclusion}[theorem]{Conclusion}
\newtheorem{condition}[theorem]{Condition}
\newtheorem{conjecture}[theorem]{Conjecture}
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem{criterion}[theorem]{Criterion}
\newtheorem{definition}[theorem]{Definition}
\newtheorem{example}[theorem]{Example}
\newtheorem{exercise}[theorem]{Exercise}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{notation}[theorem]{Notation}
\newtheorem{problem}[theorem]{Problem}
\newtheorem{proposition}[theorem]{Proposition}
\newtheorem{remark}[theorem]{Remark}
\newtheorem{solution}[theorem]{Solution}
\newtheorem{summary}[theorem]{Summary}
\newenvironment{proof}[1][Proof]{\textbf{#1.} }{\ \rule{0.5em}{0.5em}}
\begin{document}
\title{Graduate Lectures and Problems in Quality Control and Engineering Statistics:\\Theory and Methods\\\ \ \\\ \ \\{\normalsize To Accompany}\\\textit{{\normalsize Statistical Quality Assurance Methods for Engineers}}\\{\normalsize by}\\{\normalsize Vardeman and Jobe}\\\ \\\ }
\author{Stephen B. Vardeman}
\date{V2.0:\ January 2001\\
\ \\
\ \\
\ \\
\ \\
\ \\
\ \ \\
\ \\
\copyright\ Stephen Vardeman 2001. \ \ Permission to copy for educational
purposes granted by the author, subject to the requirement that this title
page be affixed to each copy (full or partial) produced.}
\maketitle
\tableofcontents
\mainmatter
\chapter{Measurement and Statistics}
V\&J \S2.2 presents an introduction to the topic of measurement and the
relevance of the subject of statistics to the measurement enterprise. This
chapter expands somewhat on the topics presented in V\&J and raises some
additional issues.

Note that V\&J equation (2.1) and the discussion on page 19 of V\&J are
central to the role of statistics in describing measurements in engineering
and quality assurance. Much of Stat 531 concerns ``process variation.'' The
discussion on and around page 19 points out that variation in measurements
from a process will include both components of ``real'' process variation
\textit{and} measurement variation.
\section{Theory for Range-Based Estimation of Variances}
Suppose that $X_{1},X_{2},\ldots,X_{n}$ are iid Normal $(\mu,\sigma^{2})$
random variables and let
\begin{align*}
R & = \max X_{i}-\min X_{i}\\
& = \max(X_{i}-\mu)-\min(X_{i}-\mu)\\
& = \sigma\left( \max\left( \frac{X_{i}-\mu}{\sigma}\right) -\min\left(
\frac{X_{i}-\mu}{\sigma}\right) \right) \\
& =\sigma\left( \max Z_{i}-\min Z_{i}\right)
\end{align*}
where $Z_{i}=(X_{i}-\mu)/\sigma$. Then $Z_{1},Z_{2},\ldots,Z_{n}$ are iid
standard normal random variables. So for purposes of studying the distribution
of the range of iid normal variables, it suffices to study the standard normal
case. (One can derive ``general $\sigma$'' facts from the ``$\sigma=1$'' facts
by multiplying by $\sigma$.)
Consider first the matter of finding the mean of the range of $n$ iid
standard normal variables, $Z_{1},\ldots,Z_{n}$. Let
\[
U=\min Z_{i},\quad V=\max Z_{i}\quad\text{and}\quad W=V-U\ .
\]
Then
\[
\text{E}W=\text{E}V-\text{E}U
\]
and
\[
-\text{E}U=-\text{E}\min Z_{i}=\text{E}(-\min Z_{i})=\text{E}\max(-Z_{i})\ ,
\]
where the $n$ variables $-Z_{1},-Z_{2},\ldots,-Z_{n}$ are iid standard normal.
Thus
\[
\text{E}W=\text{E}V-\text{E}U=2\text{E}V\ .
\]
Then (as is standard in the theory of order statistics) note that
\[
V\leq t\Leftrightarrow\ \text{all}\ n\ \text{values}\ Z_{i}\ \text{are}\ \leq
t\ .
\]
So with $\Phi$ the standard normal cdf,
\[
P[V\leq t]=\Phi^{n}(t)
\]
and thus a pdf for $V$ is
\[
f(v)=n\phi(v)\Phi^{n-1}(v)\ .
\]
So
\[
\text{E}V=\int_{-\infty}^{\infty}v\left( n\phi(v)\Phi^{n-1}(v)\right) dv\ ,
\]
and the evaluation of this integral becomes a (very small) problem in
numerical analysis. The value of this integral clearly depends upon $n$. It is
standard to invent a constant (whose dependence upon $n$ we will display
explicitly)
\[
d_{2}(n)\doteq\text{E}W=2\text{E}V
\]
that is tabled in Table A.1 of V\&J. With this notation, clearly
\[
\text{E}R=\sigma d_{2}(n)\ ,
\]
(and the range-based formulas in Section 2.2 of V\&J are based on this simple fact).
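As the text says, evaluating this integral is a small exercise in numerical
analysis. As an illustration only (this is certainly not how Table A.1 of V\&J
was produced, and the function names are mine), a simple trapezoid rule
recovers the tabled values of $d_{2}(n)$:

```python
import math

def phi(z):
    """Standard normal pdf."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def Phi(z):
    """Standard normal cdf, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def d2(n, lo=-8.0, hi=8.0, steps=20000):
    """d2(n) = 2*EV, where EV = integral of v * n * phi(v) * Phi(v)^(n-1) dv,
    evaluated by the trapezoid rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        v = lo + i * h
        f = v * n * phi(v) * Phi(v) ** (n - 1)
        total += f / 2 if i in (0, steps) else f
    return 2 * h * total
```

For example, this gives $d_{2}(2)\approx1.128$ and $d_{2}(5)\approx2.326$, in
agreement with Table A.1 of V\&J.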
To find more properties of $W$ (and hence $R$) requires appeal to a well-known
order statistics result giving the joint density of two order statistics. The
joint density of $U$ and $V$ is
\[
f(u,v)=\left\{
\begin{array}
[c]{ll}%
n(n-1)\phi(u)\phi(v)\left( \Phi(v)-\Phi(u)\right) ^{n-2} & \text{for}\ v>u\\
0 & \text{otherwise}\ .
\end{array}
\right.
\]
A transformation then easily shows that the joint density of $U$ and $W=V-U$
is
\[
g(u,w)=\left\{
\begin{array}
[c]{ll}%
n(n-1)\phi(u)\phi(u+w)\left( \Phi(u+w)-\Phi(u)\right) ^{n-2} &
\text{for}\ w>0\\
0 & \text{otherwise}\ .
\end{array}
\right.
\]
Then, for example, the cdf of $W$ is
\[
P[W\leq t]=\int_{0}^{t}\int_{-\infty}^{\infty}g(u,w)dudw\ ,
\]
and the mean of $W^{2}$ is
\[
\text{E}W^{2}=\int_{0}^{\infty}\int_{-\infty}^{\infty}w^{2}g(u,w)dudw\ .
\]
Note that upon computing E$W$ and E$W^{2}$, one can compute both the variance
of $W$
\[
\text{Var}\,W=\text{E}W^{2}-\left( \text{E}W\right) ^{2}%
\]
and the standard deviation of $W$, $\sqrt{\text{Var}\,W\,}$. It is common to
give this standard deviation the name $d_{3}(n)$ (where we continue to make
the dependence on $n$ explicit and again this constant is tabled in Table A.1
of V\&J). Clearly, having computed $d_{3}(n)\doteq\sqrt{\text{Var}\,W}$, one
then has
\[
\sqrt{\text{Var}\,R}=\sigma d_{3}(n)\ .
\]
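In the same spirit, $d_{3}(n)$ can be recovered by (crude) two-dimensional
quadrature against the joint density $g(u,w)$ above. Again this is only an
illustrative sketch with invented function names, not the method behind Table
A.1:

```python
import math

def phi(z):
    """Standard normal pdf."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def Phi(z):
    """Standard normal cdf, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def d3(n, grid=400):
    """d3(n) = sqrt(EW^2 - (EW)^2), with EW and EW^2 computed by a midpoint
    rule against g(u,w) = n(n-1) phi(u) phi(u+w) (Phi(u+w)-Phi(u))^(n-2)
    over u in (-8, 8) and w in (0, 16)."""
    du = dw = 16.0 / grid
    ew = ew2 = 0.0
    for i in range(grid):
        u = -8.0 + (i + 0.5) * du
        for j in range(grid):
            w = (j + 0.5) * dw
            g = n * (n - 1) * phi(u) * phi(u + w) * (Phi(u + w) - Phi(u)) ** (n - 2)
            ew += w * g * du * dw
            ew2 += w * w * g * du * dw
    return math.sqrt(ew2 - ew * ew)
```

With a $400\times400$ grid this reproduces the tabled values
$d_{3}(2)\approx0.853$ and $d_{3}(5)\approx0.864$ to a couple of decimal
places.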
\section{Theory for Sample-Variance-Based Estimation of Variances}
Continue to suppose that $X_{1},X_{2},\ldots,X_{n}$ are iid Normal
$(\mu,\sigma^{2})$ random variables and take
\[
s^{2}\doteq\frac{1}{n-1}\sum_{i=1}^{n}(X_{i}-\bar{X})^{2}\ .
\]
Standard probability theory says that
\[
\frac{(n-1)s^{2}}{\sigma^{2}}\sim\chi_{n-1}^{2}\ .
\]
Now if $U\sim\chi_{\nu}^{2}$ it is the case that E$U=\nu$ and Var$\,U=2\nu$.
It is thus immediate that
\[
\text{E}s^{2}=\text{E}\left( \frac{\sigma^{2}}{n-1}\right) \left(
\frac{(n-1)s^{2}}{\sigma^{2}}\right) =\left( \frac{\sigma^{2}}{n-1}\right)
\text{E}\left( \frac{(n-1)s^{2}}{\sigma^{2}}\right) =\sigma^{2}%
\]
and
\[
\text{Var}\,s^{2}=\text{Var}\,\left( \left( \frac{\sigma^{2}}{n-1}\right)
\left( \frac{(n-1)s^{2}}{\sigma^{2}}\right) \right) =\left( \frac
{\sigma^{2}}{n-1}\right) ^{2}\text{Var}\,\left( \frac{(n-1)s^{2}}{\sigma
^{2}}\right) =\frac{2\sigma^{4}}{n-1}%
\]
so that
\[
\sqrt{\text{Var}\,s^{2}}=\sigma^{2}\sqrt{\frac{2}{n-1}}\ .
\]
Knowing that $(n-1)s^{2}/\sigma^{2}\sim\chi_{n-1}^{2}$ also makes it easy
enough to develop properties of $s=\sqrt{s^{2}}$. For example, if
\[
f(x)=\left\{
\begin{array}
[c]{ll}%
\displaystyle\frac{1}{2^{(n-1)/2}\Gamma(\frac{n-1}{2})}\,x^{\left( \frac
{n-1}{2}\right) -1}\exp\left( -\frac{x}{2}\right) & \mbox{for }\,x>0\\
0 & \mbox{otherwise}%
\end{array}
\right.
\]
is the $\chi_{n-1}^{2}$ probability density, then
\[
\text{E}s=\text{E}\sqrt{\frac{\sigma^{2}}{n-1}}\sqrt{\frac{(n-1)s^{2}}%
{\sigma^{2}}}=\frac{\sigma}{\sqrt{n-1}}\int_{0}^{\infty}\sqrt{x}f(x)dx=\sigma
c_{4}(n)\ ,
\]
for
\[
c_{4}(n)\doteq\frac{\int_{0}^{\infty}\sqrt{x}f(x)dx}{\sqrt{n-1}}%
\]
another constant (depending upon $n$) tabled in Table A.1 of V\&J. Further,
the standard deviation of $s$ is
\[
\sqrt{\text{Var}\,s}=\sqrt{\text{E}s^{2}-\left( \text{E}s\right) ^{2}}%
=\sqrt{\sigma^{2}-\left( \sigma c_{4}(n)\right) ^{2}}=\sigma\sqrt
{1-c_{4}^{2}(n)}=\sigma c_{5}(n)
\]
for
\[
c_{5}(n)\doteq\sqrt{1-c_{4}^{2}(n)}%
\]
yet another constant tabled in Table A.1.

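Incidentally, the integral defining $c_{4}(n)$ has a well-known closed form,
since for $U\sim\chi_{\nu}^{2}$ one has $\text{E}\sqrt{U}=\sqrt{2}%
\,\Gamma\left( (\nu+1)/2\right) /\Gamma\left( \nu/2\right) $, so that
$c_{4}(n)=\sqrt{2/(n-1)}\,\Gamma(n/2)/\Gamma\left( (n-1)/2\right) $. A
two-line sketch (function names mine):

```python
import math

def c4(n):
    """c4(n) = sqrt(2/(n-1)) * Gamma(n/2) / Gamma((n-1)/2),
    from E sqrt(U) = sqrt(2) Gamma((nu+1)/2)/Gamma(nu/2) with nu = n-1."""
    return math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2) / math.gamma((n - 1) / 2)

def c5(n):
    """c5(n) = sqrt(1 - c4(n)^2)."""
    return math.sqrt(1.0 - c4(n) ** 2)
```

This gives, for example, $c_{4}(5)\approx0.9400$ and $c_{5}(5)\approx0.341$,
in agreement with Table A.1.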
The fact that sums of independent $\chi^{2}$ random variables are again
$\chi^{2}$ (with degrees of freedom equal to the sum of the component degrees
of freedom) and the kinds of relationships in this section provide means of
combining various kinds of sample variances to get ``pooled'' estimators of
variances (and variance components) and finding the means and variances of
these estimators. For example, if one pools in the usual way the sample
variances from $r$ normal samples of size $m$ to get a single pooled sample
variance, $s_{\text{pooled}}^{2}$, $r(m-1)s_{\text{pooled}}^{2}/\sigma^{2}$ is
$\chi^{2}$ with degrees of freedom $\nu=r(m-1)$. That is, all of the above can
be applied by thinking of $s_{\text{pooled}}^{2}$ as a sample variance based
on a sample of size ``$n$''$=r(m-1)+1$.
\section{Sample Variances and Gage R\&R}
The methods of gage R\&R analysis presented in V\&J \S2.2.2 are based on
ranges (and the facts in \S1.1 above). They are presented in V\&J not because
of their efficiency, but because of their computational simplicity. Better
(and analogous) methods can be based on the facts in \S1.2 above. For example,
under the two-way random effects model (2.4) of V\&J, if one pools $I\times J$
``cell'' sample variances $s_{ij}^{2}$ to get $s_{\text{pooled}}^{2}$, all of
the previous paragraph applies and gives methods of estimating the
repeatability variance component $\sigma^{2}$ (or the repeatability standard
deviation $\sigma$) and calculating means and variances of estimators based on
$s_{\text{pooled}}^{2}$.

Or, consider the problem of estimating $\sigma_{\text{reproducibility}}$
defined in display (2.5) of V\&J. With $\bar{y}_{ij}$ as defined on page 24 of
V\&J, note that for fixed $i$, the $J$ random variables $\bar{y}_{ij}%
-\alpha_{i}$ have the same sample variance as the $J$ random variables
$\bar{y}_{ij}$, namely
\[
s_{i}^{2}\doteq\frac{1}{J-1}\sum_{j}(\bar{y}_{ij}-\bar{y}_{i.})^{2}\ .
\]
But for fixed $i$ the $J$ random variables $\bar{y}_{ij}-\alpha_{i}$ are iid
normal with mean $\mu$ and variance $\sigma_{\beta}^{2}+\sigma_{\alpha\beta
}^{2}+\sigma^{2}/m$, so that
\[
\text{E}s_{i}^{2}=\sigma_{\beta}^{2}+\sigma_{\alpha\beta}^{2}+\sigma^{2}/m\ .
\]
So
\[
\frac{1}{I}\sum_{i}s_{i}^{2}%
\]
is a plausible estimator of $\sigma_{\beta}^{2}+\sigma_{\alpha\beta}%
^{2}+\sigma^{2}/m$. Hence
\[
\frac{1}{I}\sum_{i}s_{i}^{2}-\frac{s_{\text{pooled}}^{2}}{m}\ ,
\]
or better yet
\begin{equation}
\max\left( 0,\frac{1}{I}\sum_{i}s_{i}^{2}-\frac{s_{\text{pooled}}^{2}}%
{m}\right) \label{eq1.3.1}%
\end{equation}
is a plausible estimator of $\sigma_{\text{reproducibility}}^{2}$.
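As a small numerical sketch of the estimator (\ref{eq1.3.1}), with a
completely invented $3\times2$ table of cell means $\bar{y}_{ij}$ and an
invented pooled variance (none of these numbers come from V\&J):

```python
# Hypothetical gage R&R summaries: I = 3 parts, J = 2 operators, m = 4
# repeat measurements per cell; ybar[i][j] holds the cell means ybar_ij.
ybar = [[10.1, 10.4], [12.0, 12.3], [11.2, 11.0]]
s2_pooled = 0.02   # pooled within-cell sample variance (invented)
m = 4

I, J = len(ybar), len(ybar[0])

def sample_var(xs):
    """Ordinary sample variance with divisor len(xs) - 1."""
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)

# s_i^2 = sample variance of the J cell means in row i
s2_rows = [sample_var(row) for row in ybar]

# the estimator of sigma^2_reproducibility in display (1.3.1)
est = max(0.0, sum(s2_rows) / I - s2_pooled / m)
```

(The clipping at 0 matters: without it, a small measured between-operator
spread can produce a negative estimate of a variance.)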
\section{ANOVA and Gage R\&R}
Under the two-way random effects model (2.4) of V\&J, with balanced data, it
is well-known that the ANOVA mean squares
\begin{align*}
MSE & =\frac{1}{IJ(m-1)}\sum_{i,j,k}(y_{ijk}-\bar{y}_{ij})^{2}\ ,\\
MSAB & =\frac{m}{(I-1)(J-1)}\sum_{i,j}(\bar{y}_{ij}-\bar{y}_{i.}-\bar
{y}_{.j}+\bar{y}_{..})^{2}\ ,\\
MSA & =\frac{mJ}{I-1}\sum_{i}(\bar{y}_{i.}-\bar{y}_{..})^{2}\ ,\quad
\text{and}\\
MSB & =\frac{mI}{J-1}\sum_{j}(\bar{y}_{.j}-\bar{y}_{..})^{2}\ ,
\end{align*}
are independent random variables, that
\begin{align*}
\text{E}MSE & =\sigma^{2}\ ,\\
\text{E}MSAB & =\sigma^{2}+m\sigma_{\alpha\beta}^{2}\ ,\\
\text{E}MSA & =\sigma^{2}+m\sigma_{\alpha\beta}^{2}+mJ\sigma_{\alpha}%
^{2}\ ,\quad\text{and}\\
\text{E}MSB & =\sigma^{2}+m\sigma_{\alpha\beta}^{2}+mI\sigma_{\beta}^{2}\ ,
\end{align*}
and that the quantities
\[
\frac{(m-1)IJ\,MSE}{\text{E}MSE}\,\,,\,\,\frac{(I-1)(J-1)MSAB}{\text{E}%
MSAB}\,\,,\,\,\frac{(I-1)MSA}{\text{E}MSA}\,\,\text{and}\,\,\frac
{(J-1)MSB}{\text{E}MSB}%
\]
are $\chi^{2}$ random variables with respective degrees of freedom
\[
(m-1)IJ\,,\,(I-1)(J-1)\,,\,(I-1)\,\mbox{and }(J-1)\ .
\]
These facts about sums of squares and mean squares for the two-way random
effects model are often summarized in the usual (two-way random effects model)
ANOVA table, Table \ref{tab1.4.1}. (The sums of squares are simply the mean
squares multiplied by the degrees of freedom. More on the interpretation of
such tables can be found in places like \S8-4 of V.)%
%TCIMACRO{\TeXButton{B}{\begin{table}[tbp] \centering}}%
%BeginExpansion
\begin{table}[tbp] \centering
%EndExpansion
\caption{Two-way Balanced Data Random Effects Analysis ANOVA Table}%
\label{tab1.4.1}
\begin{tabular}
[c]{lcccc}%
\multicolumn{5}{c}{ANOVA Table}\\
Source & $SS$ & $df$ & $MS$ & E$MS$\\\hline
Parts & $SSA$ & $I-1$ & $MSA$ & \multicolumn{1}{l}{$\sigma^{2}+m\sigma
_{\alpha\beta}^{2}+mJ\sigma_{\alpha}^{2}$}\\
Operators & $SSB$ & $J-1$ & $MSB$ & \multicolumn{1}{l}{$\sigma^{2}%
+m\sigma_{\alpha\beta}^{2}+mI\sigma_{\beta}^{2}$}\\
Parts$\times$Operators & $SSAB$ & $(I-1)(J-1)$ & $MSAB$ &
\multicolumn{1}{l}{$\sigma^{2}+m\sigma_{\alpha\beta}^{2}$}\\
Error & $SSE$ & $(m-1)IJ$ & $MSE$ & \multicolumn{1}{l}{$\sigma^{2}$}\\\hline
Total & $SSTot$ & $mIJ-1$ & & \multicolumn{1}{l}{}%
\end{tabular}
%TCIMACRO{\TeXButton{E}{\end{table}}}%
%BeginExpansion
\end{table}%
%EndExpansion
As a matter of fact, the ANOVA error mean square is exactly $s_{\text{pooled}%
}^{2}$ from \S1.3 above. Further, the expected mean squares suggest ways of
producing sensible estimators of other parametric functions of interest in
gage R\&R contexts (see V\&J page 27 in this regard). For example, note that
\[
\sigma_{\text{reproducibility}}^{2}=\frac{1}{mI}\text{E}MSB+\frac{1}%
{m}(1-\frac{1}{I})\text{E}MSAB-\frac{1}{m}\text{E}MSE\ ,
\]
which suggests the ANOVA-based estimator
\begin{equation}
\widehat{\sigma}_{\text{reproducibility}}^{2}=\max\left( 0,\frac{1}%
{mI}MSB+\frac{1}{m}(1-\frac{1}{I})MSAB-\frac{1}{m}MSE\right) \ .
\label{eq1.4.1}%
\end{equation}
What may or may not be well known is that this estimator (\ref{eq1.4.1}) is
exactly the estimator of $\sigma_{\text{reproducibility}}^{2}$ in display
(\ref{eq1.3.1}).

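The agreement between (\ref{eq1.4.1}) and (\ref{eq1.3.1}) is easy to verify
numerically. The following sketch computes both estimators from a small
invented balanced data set (the numbers are mine) and checks that they
coincide:

```python
# Balanced two-way data y[i][j][k]: I parts, J operators, m repeats (invented)
y = [[[10.1, 10.3, 9.9], [10.6, 10.4, 10.5]],
     [[12.0, 12.2, 11.9], [12.4, 12.5, 12.2]],
     [[11.1, 11.0, 11.3], [10.9, 11.2, 11.1]]]
I, J, m = len(y), len(y[0]), len(y[0][0])

cell = [[sum(y[i][j]) / m for j in range(J)] for i in range(I)]        # ybar_ij
row = [sum(cell[i]) / J for i in range(I)]                             # ybar_i.
col = [sum(cell[i][j] for i in range(I)) / I for j in range(J)]        # ybar_.j
grand = sum(row) / I                                                   # ybar_..

MSE = sum((y[i][j][k] - cell[i][j]) ** 2
          for i in range(I) for j in range(J) for k in range(m)) / (I * J * (m - 1))
MSAB = m * sum((cell[i][j] - row[i] - col[j] + grand) ** 2
               for i in range(I) for j in range(J)) / ((I - 1) * (J - 1))
MSB = m * I * sum((col[j] - grand) ** 2 for j in range(J)) / (J - 1)

# ANOVA-based estimator (1.4.1)
est_anova = max(0.0, MSB / (m * I) + (1 - 1 / I) * MSAB / m - MSE / m)

# cell-variance-based estimator (1.3.1); note s_pooled^2 is exactly MSE
s2_rows = [sum((cell[i][j] - row[i]) ** 2 for j in range(J)) / (J - 1)
           for i in range(I)]
est_cells = max(0.0, sum(s2_rows) / I - MSE / m)

assert abs(est_anova - est_cells) < 1e-9  # the two estimators agree
```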
Since many common estimators of quantities of interest in gage R\&R studies
are functions of mean squares, it is useful to have at least some crude
standard errors for them. These can be derived from the ``delta
method''/``propagation of error''/Taylor series argument provided in the
appendix to these notes. For example, suppose that $MS_{i}$, $i=1,\ldots,k$,
are independent random variables with $\nu_{i}MS_{i}/\text{E}MS_{i}$
distributed as $\chi_{\nu_{i}}^{2}$, and consider a function of $k$ real
variables $f(x_{1},\ldots,x_{k})$ and the random variable
\[
U=f(MS_{1},MS_{2},...,MS_{k})\ .
\]
Propagation of error arguments produce the approximation
\[
\text{Var}\,U\approx\sum_{i=1}^{k}\left( \frac{\partial f}{\partial x_{i}%
}\bigg|_{\text{E}MS_{1},\text{E}MS_{2},...,\text{E}MS_{k}}\right)
^{2}\text{Var}\,MS_{i}=\sum_{i=1}^{k}\left( \frac{\partial f}{\partial x_{i}%
}\bigg|_{\text{E}MS_{1},\text{E}MS_{2},...,\text{E}MS_{k}}\right) ^{2}%
\frac{2(\text{E}MS_{i})^{2}}{\nu_{i}}\ ,
\]
and upon substituting mean squares for their expected values, one has a
standard error for $U$, namely
\begin{equation}
\sqrt{\widehat{\text{Var}\,}U}=\sqrt{2\sum_{i=1}^{k}\left( \frac{\partial
f}{\partial x_{i}}\bigg|_{MS_{1},MS_{2},\ldots,MS_{k}}\right) ^{2}%
\frac{(MS_{i})^{2}}{\nu_{i}}}\ . \label{eq1.4.2}%
\end{equation}
In the special case where the function of the mean squares of interest is
linear in them, say
\[
U=\sum_{i=1}^{k}c_{i}MS_{i}\ ,
\]
the standard error specializes to
\[
\sqrt{\widehat{\text{Var}\,}U}=\sqrt{2\sum_{i=1}^{k}\frac{c_{i}^{2}%
(MS_{i})^{2}}{\nu_{i}}}\ ,
\]
which provides at least a crude method of producing standard errors for
$\widehat{\sigma}_{\text{reproducibility}}^{2}$ and $\widehat{\sigma
}_{\text{overall}}^{2}$. Such standard errors are useful in giving some
indication of the precision with which the quantities of interest in a gage
R\&R study have been estimated.
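As a sketch of this linear-combination standard error, applied to the
coefficients appearing in the estimator (\ref{eq1.4.1}) but with invented mean
squares (the numerical values are hypothetical):

```python
import math

def se_linear(cs, mss, nus):
    """Standard error sqrt(2 * sum_i c_i^2 * MS_i^2 / nu_i)
    for the linear combination U = sum_i c_i * MS_i."""
    return math.sqrt(2 * sum(c * c * ms * ms / nu
                             for c, ms, nu in zip(cs, mss, nus)))

# e.g. U = (1/(mI)) MSB + (1/m)(1 - 1/I) MSAB - (1/m) MSE, with m = 3, I = 3, J = 2
m, I, J = 3, 3, 2
cs = [1 / (m * I), (1 - 1 / I) / m, -1 / m]
mss = [2.4, 0.5, 0.2]                      # invented MSB, MSAB, MSE
nus = [J - 1, (I - 1) * (J - 1), I * J * (m - 1)]
se = se_linear(cs, mss, nus)
```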
\section{Confidence Intervals for Gage R\&R Studies}
The parametric functions of interest in gage R\&R studies (indeed in all
random effects analyses) are functions of variance components, or
equivalently, functions of expected mean squares. It is thus possible to apply
theory for estimating such quantities to the problem of assessing precision of
estimation in a gage study. As a first (and very crude) example of this, note
that taking the point of view of \S1.4 above, where $U=f(MS_{1},MS_{2}%
,\ldots,MS_{k})$ is a sensible point estimator of an interesting function of
the variance components and $\sqrt{\widehat{\text{Var}}\,U}$ is the standard
error (\ref{eq1.4.2}), simple approximate two-sided 95\% confidence limits can
be made as
\[
U\pm1.96\sqrt{\widehat{\text{Var}}\,U}\ .
\]
These limits have the virtue of being amenable to ``hand'' calculation from
the ANOVA sums of squares, but they are not likely to be reliable (in terms of
holding their nominal/asymptotic coverage probability) for small $I$, $J$ or
$m$.

Linear models experts have done substantial research aimed at finding reliable
confidence interval formulas for important functions of expected mean squares.
For example, the book \textit{Confidence Intervals on Variance Components} by
Burdick and Graybill gives results (on the so-called ``modified large sample
method'') that can be used to make confidence intervals on various important
functions of variance components. The following is some material taken from
Sections 3.2 and 3.3 of the Burdick and Graybill book.

Suppose that $MS_{1},MS_{2},\ldots,MS_{k}$ are $k$ independent mean squares.
(The $MS_{i}$ are of the form $SS_{i}/\nu_{i}$, where $SS_{i}/$E$MS_{i}%
=\nu_{i}MS_{i}/$E$MS_{i}$ has a $\chi_{\nu_{i}}^{2}$ distribution.) For $1\leq
p<k$ consider estimating
\begin{equation}
\theta=\sum_{i=1}^{p}c_{i}\text{E}MS_{i}-\sum_{i=p+1}^{k}c_{i}\text{E}MS_{i}
\label{eq1.5.1}%
\end{equation}
for positive constants $c_{1},c_{2},\ldots,c_{k}$, using the point estimator
\[
\widehat{\theta}=\sum_{i=1}^{p}c_{i}MS_{i}-\sum_{i=p+1}^{k}c_{i}MS_{i}\ .
\]
Approximate confidence limits for $\theta$ are of the form
\[
L=\widehat{\theta}-\sqrt{V_{\mbox{L}}}\mbox{ \ and/or \ }U=\widehat{\theta
}+\sqrt{V_{\mbox{U}}}\ ,
\]
where
\[
V_{\mbox{L}}=\sum_{i=1}^{p}c_{i}^{2}MS_{i}^{2}G_{i}^{2}+\sum_{i=p+1}^{k}%
c_{i}^{2}MS_{i}^{2}H_{i}^{2}+\sum_{i=1}^{p}\,\,\sum_{j=p+1}^{k}c_{i}%
c_{j}MS_{i}MS_{j}G_{ij}+\sum_{i=1}^{p-1}\,\,\sum_{j>i}^{p}c_{i}c_{j}%
MS_{i}MS_{j}G_{ij}^{\ast}\ ,
\]
for
\[
G_{i}=1-\frac{\nu_{i}}{\chi_{\alpha:\nu_{i}}^{2}}\ ,
\]%
\[
H_{i}=\frac{\nu_{i}}{\chi_{1-\alpha:\nu_{i}}^{2}}-1\ ,
\]%
\[
G_{ij}=\frac{(F_{\alpha:\nu_{i},\nu_{j}}-1)^{2}-G_{i}^{2}F_{\alpha:\nu_{i}%
,\nu_{j}}^{2}-H_{j}^{2}}{F_{\alpha:\nu_{i},\nu_{j}}}\ ,
\]
and
\[
G_{ij}^{\ast}=\left\{
\begin{array}
[c]{ll}%
0 & \mbox{if}\ p=1\\
\displaystyle\frac{1}{p-1}\left( \left( 1-\frac{\nu_{i}+\nu_{j}}%
{\chi_{\alpha:\nu_{i}+\nu_{j}}^{2}}\right) ^{2}\frac{(\nu_{i}+\nu_{j})^{2}}%
{\nu_{i}\nu_{j}}-\frac{G_{i}^{2}\nu_{i}}{\nu_{j}}-\frac{G_{j}^{2}\nu_{j}}%
{\nu_{i}}\right) & \mbox{otherwise}\ .
\end{array}
\right.
\]
On the other hand,
\[
V_{\mbox{U}}=\sum_{i=1}^{p}c_{i}^{2}MS_{i}^{2}H_{i}^{2}+\sum_{i=p+1}^{k}%
c_{i}^{2}MS_{i}^{2}G_{i}^{2}+\sum_{i=1}^{p}\,\,\sum_{j=p+1}^{k}c_{i}%
c_{j}MS_{i}MS_{j}H_{ij}+\sum_{i=p+1}^{k-1}\,\,\sum_{j>i}^{k}c_{i}c_{j}%
MS_{i}MS_{j}H_{ij}^{\ast}\ ,
\]
for $G_{i}$ and $H_{i}$ as defined above, and
\[
H_{ij}=\frac{(1-F_{1-\alpha:\nu_{i},\nu_{j}})^{2}-H_{i}^{2}F_{1-\alpha:\nu
_{i},\nu_{j}}^{2}-G_{j}^{2}}{F_{1-\alpha:\nu_{i},\nu_{j}}}\ ,
\]
and
\[
H_{ij}^{\ast}=\left\{
\begin{array}
[c]{ll}%
0 & \mbox{if}\ k=p+1\\
\displaystyle\frac{1}{k-p-1}\left( \left( 1-\frac{\nu_{i}+\nu_{j}}%
{\chi_{\alpha:\nu_{i}+\nu_{j}}^{2}}\right) ^{2}\frac{(\nu_{i}+\nu_{j})^{2}%
}{\nu_{i}\nu_{j}}-\frac{G_{i}^{2}\nu_{i}}{\nu_{j}}-\frac{G_{j}^{2}\nu_{j}}%
{\nu_{i}}\right) & \mbox{otherwise}\ .
\end{array}
\right.
\]
One uses $(L,\infty)$ or $(-\infty,U)$ for confidence level $(1-\alpha)$ and
the interval $(L,U)$ for confidence level $(1-2\alpha)$. (Using these formulas
for ``hand'' calculation is (obviously) no picnic. The C program written by
Brandon Paris (available off the Stat 531 Web page) makes these calculations painless.)

A problem similar to the estimation of quantity (\ref{eq1.5.1}) is that of
estimating
\begin{equation}
\theta=c_{1}\text{E}MS_{1}+\cdots+c_{p}\text{E}MS_{p} \label{eq1.5.2}%
\end{equation}
for $p\geq1$ and positive constants $c_{1},c_{2},\ldots,c_{p}$. In this case
let
\[
\widehat{\theta}=c_{1}MS_{1}+\cdots+c_{p}MS_{p}\ ,
\]
and continue the $G_{i}$ and $H_{i}$ notation from above. Then approximate
confidence limits on $\theta$ given in display (\ref{eq1.5.2}) are of the
form
\[
L=\widehat{\theta}-\sqrt{\sum_{i=1}^{p}c_{i}^{2}MS_{i}^{2}G_{i}^{2}}\mbox{
\ and/or \ }U=\widehat{\theta}+\sqrt{\sum_{i=1}^{p}c_{i}^{2}MS_{i}^{2}%
H_{i}^{2}}\ .
\]
One uses $(L,\infty)$ or $(-\infty,U)$ for confidence level $(1-\alpha)$ and
the interval $(L,U)$ for confidence level $(1-2\alpha)$.

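As a sketch of the limits just above for $\theta=c_{1}\text{E}MS_{1}%
+\cdots+c_{p}\text{E}MS_{p}$: exact $\chi^{2}$ percentage points are not
available in the Python standard library, so the code below substitutes the
Wilson--Hilferty approximation (its $G_{i}$ and $H_{i}$ are therefore only
approximate, and exact-quantile routines should be preferred in real work):

```python
import math
from statistics import NormalDist

def chi2_upper(alpha, nu):
    """Approximate upper-alpha chi-square point (Wilson-Hilferty)."""
    z = NormalDist().inv_cdf(1 - alpha)
    return nu * (1 - 2 / (9 * nu) + z * math.sqrt(2 / (9 * nu))) ** 3

def mls_limits(cs, mss, nus, alpha=0.05):
    """Limits L, U for theta = sum_i c_i * EMS_i (all c_i > 0), per the
    display above, with G_i = 1 - nu_i/chi2(alpha) and
    H_i = nu_i/chi2(1-alpha) - 1."""
    G = [1 - nu / chi2_upper(alpha, nu) for nu in nus]
    H = [nu / chi2_upper(1 - alpha, nu) - 1 for nu in nus]
    theta_hat = sum(c * ms for c, ms in zip(cs, mss))
    vl = sum((c * ms * g) ** 2 for c, ms, g in zip(cs, mss, G))
    vu = sum((c * ms * h) ** 2 for c, ms, h in zip(cs, mss, H))
    return theta_hat - math.sqrt(vl), theta_hat + math.sqrt(vu)

# invented example: theta = 0.5 EMS_1 + 0.5 EMS_2, 10 and 20 degrees of freedom
L, U = mls_limits([0.5, 0.5], [1.2, 0.8], [10, 20])
```

Note the interval is not symmetric about $\widehat{\theta}$: the upper tail is
wider, reflecting the skewness of the $\chi^{2}$ distributions involved.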
The Fortran program written by Andy Chiang (available off the Stat 531 Web
page) applies Burdick and Graybill-like material and the standard errors
(\ref{eq1.4.2}) to the estimation of many parametric functions of relevance in
gage R\&R studies.

Chiang's 2000 Ph.D. dissertation work (to appear in \textit{Technometrics} in
August 2001) has provided an entirely different method of interval estimation
of functions of variance components that is a uniform improvement over the
``modified large sample'' methods presented by Burdick and Graybill. His
approach is related to ``improper Bayes'' methods with so called ``Jeffreys
priors.'' Andy has provided software for implementing his methods that, as
time permits, will be posted on the Stat 531 Web page. He can be contacted
(for preprints of his work) at stackl@nus.edu.sg at the National University of Singapore.
\section{Calibration and Regression Analysis}
The estimation of standard deviations and variance components is a
contribution of the subject of statistics to the quantification of measurement
system \textit{precision}. The subject also has contributions to make in the
matter of improving measurement \textit{accuracy}. Calibration is the business
of bringing a local measurement system in line with a standard measurement
system. One takes measurements $y$ with a gage or system of interest on test
items with ``known'' values $x$ (available because they were previously
measured using a ``gold standard'' measurement device). The data collected are
then used to create a conversion scheme for translating local measurements to
approximate gold standard measurements, thereby hopefully improving local
accuracy. In this short section we note that usual regression methodology has
implications in this kind of enterprise.

The usual polynomial regression model says that $n$ observed random values
$y_{i}$ are related to fixed values $x_{i}$ via%
\begin{equation}
y_{i}=\beta_{0}+\beta_{1}x_{i}+\beta_{2}x_{i}^{2}+\cdots+\beta_{k}x_{i}%
^{k}+\varepsilon_{i} \label{eq1.7.1}%
\end{equation}
for iid Normal $(0,\sigma^{2})$ random variables $\varepsilon_{i}$. The
parameters $\beta$ and $\sigma$ are the usual objects of inference in this
model.\ In the calibration context with $x$ a gold standard value, $\sigma$
quantifies precision for the local measurement system. Often (at least over a
limited range of $x$) 1) a low order polynomial does a good job of describing
the observed $x$-$y$ relationship between local and gold standard measurements
and 2) the usual (least squares) fitted relationship%
\[
\hat{y}=g(x)=b_{0}+b_{1}x+b_{2}x^{2}+\cdots+b_{k}x^{k}%
\]
has an inverse $g^{-1}(y)$. When such is the case, given a measurement
$y_{n+1}$ from the local measurement system, it is plausible to estimate that
a corresponding measurement from the gold standard system would be $\hat
{x}_{n+1}=$ $g^{-1}(y_{n+1})$. A reasonable question is then ``How good is
this estimate?''. That is, the matter of confidence interval estimation of
$x_{n+1}$ is important.

One general method for producing such confidence sets for $x_{n+1}$ is based
on the usual ``prediction interval'' methodology associated with the model
(\ref{eq1.7.1}). That is, for a given $x$, it is standard (see, e.g. \S 9-2 of
V or \S 9.2.4 of V\&J\#2) to produce a prediction interval of the form%
\[
\hat{y}\pm t\sqrt{s^{2}+\left( \text{std error}(\hat{y})\right) ^{2}}%
\]
for an additional corresponding $y$. And those intervals have the property
that for all choices of $x,\sigma,\beta_{0},\beta_{1},\beta_{2},...,\beta_{k}$%
\begin{align*}
& P_{x,\sigma,\beta_{0},\beta_{1},\beta_{2},...,\beta_{k}}[y\text{ is in the
prediction interval at }x]\\
\quad & =\text{desired confidence level}\\
\quad & =1-P[\text{a }t_{n-k-1}\text{ random variable exceeds }|t|]\text{ .}%
\end{align*}
But rewording only slightly, the event%
\[
\text{``}y\text{ is in the prediction interval at }x\text{''}%
\]
is the same as the event%
\[
\text{``}x\text{ produces a prediction interval including }y\text{.''}%
\]
So a confidence set for $x_{n+1}$ based on the observed value $y_{n+1}$ is%
\begin{equation}
\{x|\text{ the prediction interval corresponding to }x\text{ includes }%
y_{n+1}\}\text{ .}\label{eq1.7.2}%
\end{equation}
Conceptually, one simply makes prediction limits around the fitted
relationship $\hat{y}=g(x)=b_{0}+b_{1}x+b_{2}x^{2}+\cdots+b_{k}x^{k}$ and then
upon observing a new $y$ sees what $x$'s are consistent with that observation.
This produces a confidence set with the desired confidence level.

The only real difficulties with the above general prescription are 1) the lack
of simple explicit formulas and 2) the fact that when $\sigma$ is large (so
that the regression $\sqrt{MSE}$ tends to be large) or the fitted relationship
is very nonlinear, the method can produce (completely rational but)
unpleasant-looking confidence sets. The first ``problem'' is really of limited
consequence in a time when standard statistical software will automatically
produce plots of prediction limits associated with low order regressions. And
the second matter is really inherent in the problem.

For the (simplest) linear version of this ``inverse prediction'' problem,
there is an approximate confidence method in common use that doesn't have the
deficiencies of the method (\ref{eq1.7.2}). It is derived from a Taylor series
argument and has its own problems, but is nevertheless worth recording here
for completeness' sake. That is, under the $k=1$ version of the model
(\ref{eq1.7.1}), commonly used approximate confidence limits for $x_{n+1}$ are
(for $\hat{x}_{n+1}=(y_{n+1}-b_{0})/b_{1}$ and $\bar{x}$ the sample mean of
the gold standard measurements from the calibration experiment)%
\[
\hat{x}_{n+1}\pm t\frac{\sqrt{MSE}}{|b_{1}|}\sqrt{1+\frac{1}{n}+\frac{(\hat
{x}_{n+1}-\bar{x})^{2}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}}\text{ .}%
\]
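A sketch of these limits, with an invented five-point calibration data set
and, purely for illustration, a $t$ multiplier of 2.0 rather than an exact
upper percentage point of the $t_{n-2}$ distribution:

```python
import math

# invented calibration data: x = gold standard values, y = local measurements
xs = [2.0, 4.0, 6.0, 8.0, 10.0]
ys = [2.1, 4.3, 5.9, 8.2, 10.1]
n = len(xs)

# least squares fit of y = b0 + b1 x, and the regression MSE
xbar, ybar = sum(xs) / n, sum(ys) / n
sxx = sum((x - xbar) ** 2 for x in xs)
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
b0 = ybar - b1 * xbar
mse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2)

def inverse_limits(y_new, t=2.0):
    """x_hat = (y_new - b0)/b1 and the approximate limits of the display
    above; t = 2.0 is a stand-in for the exact t_{n-2} point."""
    x_hat = (y_new - b0) / b1
    half = (t * math.sqrt(mse) / abs(b1)
            * math.sqrt(1 + 1 / n + (x_hat - xbar) ** 2 / sxx))
    return x_hat - half, x_hat, x_hat + half

lo, x_hat, hi = inverse_limits(7.0)
```

Note that the interval widens as $\hat{x}_{n+1}$ moves away from $\bar{x}$,
exactly as one would expect for extrapolation away from the calibration data.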
\section{Crude Gaging and Statistics}
All real-world measurement is ``to the nearest something.'' Often one may
ignore this fact, treat measured values as if they were ``exact'' and
experience no real difficulty when using standard statistical methods (that
are really based on an assumption that data are exact). However, sometimes in
industrial applications gaging is ``crude'' enough that standard (e.g.
``normal theory'') formulas give nonsensical results. This section briefly
considers what can be done to appropriately model and draw inferences from
crudely gaged data. The assumption throughout is that what are available are
integer data, obtained by coding raw observations via
\[
\mathit{integer~observation}=\frac
{\mathit{raw~observation-some~reference~value}}%
{\mathit{smallest~unit~of~measurement}}%
\]
(the ``\textit{smallest unit of measurement}'' is ``the nearest something'' above).
\subsection{Distributions of Sample Means and Ranges from Integer Observations}
To begin with something simple, note first that in situations where only a few
different coded values are ever observed, rather than trying to model
observations with some continuous distribution (like a normal one) it may well
make sense to simply employ a discrete pmf, say $f$, to describe any single
measurement. In fact, suppose that a single (crudely gaged) observation $Y$
has a pmf $f(y)$ such that
\[
f(y)=0\quad\mbox{unless}\quad y=1,2,...,M\ .
\]
Then if $Y_{1},Y_{2},\ldots,Y_{n}$ are iid with this marginal discrete
distribution, one can easily approximate the distribution of a function of
these variables via simulation (using common statistical packages). And for
two of the most common statistics used in QC settings (the sample mean and
range) one can even work out exact probability distributions using
computationally feasible and very elementary methods.

To find the probability distribution of $\bar{Y}$ in this context, one can
build up the probability distributions of sums of iid $Y_{i}$'s recursively by
``adding probabilities on diagonals in two-way joint probability tables.'' For
example the $n=2$ distribution of $\bar{Y}$ can be obtained by making out a
two-way table of joint probabilities for $Y_{1}$ and $Y_{2}$ and adding on
diagonals to get probabilities for $Y_{1}+Y_{2}$. Then making a two-way table
of joint probabilities for $(Y_{1}+Y_{2})$ and $Y_{3}$ one can add on
diagonals and find a joint distribution for $Y_{1}+Y_{2}+Y_{3}$. Or noting
that the distribution of $Y_{3}+Y_{4}$ is the same as that for $Y_{1}+Y_{2}$,
it is possible to make a two-way table of joint probabilities for
$(Y_{1}+Y_{2})$ and $(Y_{3}+Y_{4})$, add on diagonals and find the
distribution of $Y_{1}+Y_{2}+Y_{3}+Y_{4}$. And so on. (Clearly, after finding
the distribution for a sum, one simply divides possible values by $n$ to get
the corresponding distribution of $\bar{Y}$.)
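The recursion just described is ordinary convolution of pmfs and takes only a
few lines of code. A sketch (function names mine), representing a pmf as a
dictionary mapping values to probabilities:

```python
def convolve(p, q):
    """Pmf of the sum of two independent integer-valued variables
    ("adding probabilities on diagonals of the joint table")."""
    out = {}
    for a, pa in p.items():
        for b, qb in q.items():
            out[a + b] = out.get(a + b, 0.0) + pa * qb
    return out

def dist_of_mean(f, n):
    """Distribution of Ybar for n iid observations with pmf f."""
    s = f
    for _ in range(n - 1):
        s = convolve(s, f)
    return {total / n: prob for total, prob in s.items()}
```

For example, for $f$ uniform on $\{1,2\}$ and $n=2$, this returns
probabilities $1/4,1/2,1/4$ for $\bar{Y}=1,1.5,2$.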
To find the probability distribution of $R=\max Y_{i}-\min Y_{i}$ (for $Y_{i}%
$'s as above) a feasible computational scheme is as follows. Let
\[
S_{kj} = \left\{
\begin{array}
[c]{ll}%
\sum_{y=k}^{j}f(y) = P[k\leq Y\leq j] & \mbox{if}\ k\leq j\\
0 & \mbox{otherwise}%
\end{array}
\right.
\]
and compute and store these for $1\leq k,j\leq M$. Then define
\[
M_{kj}=P[\min Y_{i}=k \ \mbox{and} \ \max Y_{i}=j]\ .
\]
Now the event $\{\min Y_{i}=k$ and $\max Y_{i}=j\}$ is the event $\{$all
observations are between $k$ and $j$ inclusive$\}$ less the event $\{$the
minimum is greater than $k$ or the maximum is less than $j\}$. Thus, it is
straightforward to see that
\[
M_{kj}=(S_{kj})^{n}-(S_{k+1,j})^{n}-(S_{k,j-1})^{n}+(S_{k+1,j-1})^{n}%
\]
and one may compute and store these values. Finally, note that
\[
P[R=r]=\sum_{k=1}^{M-r}M_{k,k+r}\ .
\]
These ``algorithms'' are good for any distribution $f$ on the integers
$1,2,\ldots,M$. Karen (Jensen) Hulting's ``DIST'' program (available off the
Stat 531 Web page) automates the calculations of the distributions of $\bar
{Y}$ and $R$ for certain $f$'s related to ``integer rounding of normal
observations.'' (More on this rounding idea directly.)
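This scheme translates directly into code. The following sketch (which is only
an illustration, not Hulting's DIST program) computes $P[R=r]$ for an
arbitrary pmf $f$ on $1,2,\ldots,M$:

```python
def range_dist(f, n):
    """f: list of probabilities f[0..M-1] for the values 1..M.
    Returns the list of P[R = r] for r = 0, 1, ..., M-1."""
    M = len(f)

    def S(k, j):
        """S_kj = P[k <= Y <= j], zero when k > j (values are 1-indexed)."""
        return sum(f[k - 1:j]) if k <= j else 0.0

    def Mkj(k, j):
        """M_kj = P[min = k and max = j], by inclusion-exclusion."""
        return (S(k, j) ** n - S(k + 1, j) ** n
                - S(k, j - 1) ** n + S(k + 1, j - 1) ** n)

    return [sum(Mkj(k, k + r) for k in range(1, M - r + 1)) for r in range(M)]
```

For $f$ uniform on $\{1,2,3\}$ and $n=2$, this returns $P[R=0]=1/3$,
$P[R=1]=4/9$ and $P[R=2]=2/9$.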
\subsection{Estimation Based on Integer-Rounded Normal Data}
The problem of drawing inferences from crudely gaged data is one that has a
history of at least 100 years (if one takes a view that crude gaging
essentially ``rounds'' ``exact'' values). Sheppard in the late 1800's noted
that if one rounds a continuous variable to integers, the variability in the
distribution is typically increased. He thus suggested not using the sample
standard deviation ($s$) of rounded values but instead employing what is known
as Sheppard's correction to arrive at
\begin{equation}
\sqrt{\frac{(n-1)s^{2}}{n}-\frac{1}{12}} \label{eq1.6.1}%
\end{equation}
as a suitable estimate of ``standard deviation'' for integer-rounded data.

The notion of ``interval-censoring'' of fundamentally continuous observations
provides a natural framework for the application of modern statistical theory
to the analysis of crudely gaged data. For univariate $X$ with continuous cdf
$F(x|\mbox{\boldmath$\theta$})$ depending upon some (possibly vector)
parameter $\mbox{\boldmath$\theta$}$, consider $X^{\ast}$ derived from $X$ by
rounding to the nearest integer. Then the pmf of $X^{\ast}$ is, say,
\[
g(x^{\ast}|\mbox{\boldmath$\theta$}) \doteq\left\{
\begin{array}
[c]{ll}%
F(x^{\ast}+.5|\mbox{\boldmath$\theta$})-F(x^{\ast}-.5|\mbox{\boldmath$\theta$%
}) & \mbox{for }\ x^{\ast}\ \mbox{an integer}\\
0 & \mbox{otherwise}\ .
\end{array}
\right.
\]
Rather than doing inference based on the unobservable variables $X_{1}%
,X_{2},\ldots,X_{n}$ that are iid $F(x|\mbox{\boldmath$\theta$})$, one might
consider inference based on $X_{1}^{\ast},X_{2}^{\ast},\ldots,X_{n}^{\ast}$
that are iid with pmf $g(x^{\ast}|\mbox{\boldmath$\theta$})$.

The normal version of this scenario (the integer-rounded normal data model)
makes use of
\[
g(x^{\ast}|\mu,\sigma)\doteq\left\{
\begin{array}
[c]{ll}%
\displaystyle\Phi\left( \frac{x^{\ast}+.5-\mu}{\sigma}\right) -\Phi\left(
\frac{x^{\ast}-.5-\mu}{\sigma}\right) & \mbox{for}\ x^{\ast}\ \mbox{an
integer}\\
0 & \mbox{otherwise}\ ,
\end{array}
\right.
\]
and the balance of this section will consider the use of this specific
important model. So suppose that $X_{1}^{\ast},X_{2}^{\ast},\ldots,X_{n}%
^{\ast}$ are iid integer-valued random observations (generated from underlying
normal observations by rounding). For an observed vector of integers
$(x_{1}^{\ast},x_{2}^{\ast},\ldots,x_{n}^{\ast})$ it is useful to consider the
so-called ``likelihood function'' that treats the (joint) probability assigned
to the vector $(x_{1}^{\ast},x_{2}^{\ast},\ldots,x_{n}^{\ast})$ as a function
of the parameters,
\[
L(\mu,\sigma)\doteq\prod_{i}g(x_{i}^{\ast}|\mu,\sigma)=\prod_{i}\left(
\Phi\left( \frac{x_{i}^{\ast}+.5-\mu}{\sigma}\right) -\Phi\left(
\frac{x_{i}^{\ast}-.5-\mu}{\sigma}\right) \right) \ .
\]
The log of this function of $\mu$ and $\sigma$ is (naturally enough) called
the loglikelihood and will be denoted as
\[
\mathcal{L}(\mu,\sigma)\doteq\ln L(\mu,\sigma)\ .
\]
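For concreteness, the rounded-normal pmf and loglikelihood are easy to evaluate numerically. A small Python sketch (illustrative only; $\Phi$ is computed from the error function, and benign inputs with $g>0$ are assumed):

```python
from math import erf, log, sqrt

def Phi(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def g(x_star, mu, sigma):
    """pmf of X* = X rounded to the nearest integer, X ~ N(mu, sigma^2)."""
    return Phi((x_star + 0.5 - mu) / sigma) - Phi((x_star - 0.5 - mu) / sigma)

def loglikelihood(mu, sigma, data):
    """The loglikelihood: sum of ln g(x_i* | mu, sigma) over the sample.
    (Assumes each g(...) is strictly positive.)"""
    return sum(log(g(x, mu, sigma)) for x in data)
```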
A sensible estimator of the parameter vector $(\mu,\sigma)$ is ``the point
$(\widehat{\mu},\widehat{\sigma})$ maximizing the loglikelihood.'' This
prescription for estimation is only partially complete, depending upon the
nature of the sample $x_{1}^{\ast},x_{2}^{\ast},\ldots,x_{n}^{\ast}$. There
are three cases to consider, namely:
\begin{enumerate}
\item When the sample range of $x_{1}^{\ast},x_{2}^{\ast},\ldots,x_{n}^{\ast}$ is at
least 2, $\mathcal{L}(\mu,\sigma)$ is well-behaved (nice and ``mound-shaped'')
and numerical maximization or just looking at contour plots will quickly allow
one to maximize the loglikelihood. (It is worth noting that in this
circumstance, usually $\widehat{\sigma}$ is close to the ``Sheppard
corrected'' value in display (\ref{eq1.6.1}).)
\item When the sample range of $x_{1}^{\ast},x_{2}^{\ast},\ldots,x_{n}^{\ast}$ is 1,
strictly speaking $\mathcal{L}(\mu,\sigma)$ fails to achieve a maximum.
However, with
\[
m\doteq\#[x_{i}^{\ast}=\min x_{i}^{\ast}]\ ,
\]
$(\mu,\sigma)$ pairs with $\sigma$ small and
\[
\mu\approx\min x_{i}^{\ast}+.5-\sigma\Phi^{-1}\left( \frac{m}{n}\right)
\]
will have
\[
\mathcal{L}(\mu,\sigma)\approx\mathop{\sup}\limits_{\mu,\sigma}\mathcal{L}%
(\mu,\sigma)=m\ln m+(n-m)\ln(n-m)-n\ln n\ .
\]
That is, in this case one ought to ``estimate'' that $\sigma$ is small and the
relationship between $\mu$ and $\sigma$ is such that a fraction $m/n$ of the
underlying normal distribution is to the left of $\min x_{i}^{\ast}+.5$, while
a fraction $1-m/n$ is to the right.
\item When the sample range of $x_{1}^{\ast},x_{2}^{\ast},\ldots,x_{n}^{\ast}$ is 0,
strictly speaking $\mathcal{L}(\mu,\sigma)$ fails to achieve a maximum.
However,
\[
\mathop{\sup}\limits_{\mu,\sigma}\mathcal{L}(\mu,\sigma)=0
\]
and for any $\mu\in(x_{1}^{\ast}-.5,x_{1}^{\ast}+.5)$, $\mathcal{L}(\mu
,\sigma)\rightarrow0$ as $\sigma\rightarrow0$. That is, in this case one ought
to ``estimate'' that $\sigma$ is small and $\mu\in(x_{1}^{\ast}-.5,x_{1}%
^{\ast}+.5)$.
\end{enumerate}
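In the first (range at least $2$) case, even a crude grid search will locate $(\widehat{\mu},\widehat{\sigma})$. A Python sketch (the grid-search approach and the function names are illustrative assumptions, not a prescribed algorithm):

```python
from math import erf, log, sqrt

def Phi(z):
    # standard normal cdf
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def loglik(mu, sigma, data):
    # loglikelihood for integer-rounded normal data
    total = 0.0
    for x in data:
        p = Phi((x + 0.5 - mu) / sigma) - Phi((x - 0.5 - mu) / sigma)
        if p <= 0.0:          # guard against log(0) far out in the tails
            return float("-inf")
        total += log(p)
    return total

def grid_mle(data, mu_grid, sigma_grid):
    """Crude grid maximization of the loglikelihood; adequate for the
    well-behaved sample-range >= 2 case."""
    best = max((loglik(m, s, data), m, s)
               for m in mu_grid for s in sigma_grid)
    return best[1], best[2]
```

For the symmetric sample $4,5,6,5,4,6,5$ the maximizing $\widehat{\mu}$ falls at the sample mean $5.0$ (by symmetry of the likelihood), and $\widehat{\sigma}$ should be close to the Sheppard-corrected value, as noted above.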
Beyond the making of point estimates, the loglikelihood function can provide
approximate confidence sets for the parameters $\mu$ and/or $\sigma$. Standard
``large sample'' statistical theory says that (for large $n$ and $\chi
_{\alpha:\nu}^{2}$ the upper $\alpha$ point of the $\chi_{\nu}^{2}$ distribution):
\begin{enumerate}
\item An approximate $(1-\alpha)$ level confidence set for the parameter
vector $(\mu,\sigma)$ is
\begin{equation}
\{(\mu,\sigma)|\mathcal{L}(\mu,\sigma)>\mathop{\sup}\limits_{\mu,\sigma
}\mathcal{L}(\mu,\sigma)-\frac{1}{2}\chi_{\alpha:2}^{2}\}\ . \label{eq1.6.2}%
\end{equation}
\item An approximate $(1-\alpha)$ level confidence set for the parameter $\mu$
is
\begin{equation}
\{\mu|\mathop{\sup}\limits_{\sigma}\mathcal{L}(\mu,\sigma)>\mathop{\sup
}\limits_{\mu,\sigma}\mathcal{L}(\mu,\sigma)-\frac{1}{2}\chi_{\alpha:1}%
^{2}\}\ . \label{eq1.6.3}%
\end{equation}
\item An approximate $(1-\alpha)$ level confidence set for the parameter
$\sigma$ is
\begin{equation}
\{\sigma|\mathop{\sup}\limits_{\mu}\mathcal{L}(\mu,\sigma)>\mathop{\sup
}\limits_{\mu,\sigma}\mathcal{L}(\mu,\sigma)-\frac{1}{2}\chi_{\alpha:1}%
^{2}\}\ . \label{eq1.6.4}%
\end{equation}
\end{enumerate}
Several comments and a fuller discussion are in order regarding these
confidence sets. In the first place, Karen (Jensen) Hulting's CONEST program
(available off the Stat 531 Web page) is useful in finding $\mathop{\sup
}\limits_{\mu,\sigma}\mathcal{L}(\mu,\sigma)$ and producing rough contour
plots of the (joint) sets for $(\mu,\sigma)$ in display (\ref{eq1.6.2}).
Second, it is common to call the function of $\mu$ defined by
\[
\mathcal{L}^{\ast}(\mu)=\mathop{\sup}\limits_{\sigma}\mathcal{L}(\mu,\sigma)
\]
the ``profile loglikelihood'' function for $\mu$ and the function of $\sigma$%
\[
\mathcal{L}^{\ast\ast}(\sigma)=\mathop{\sup}\limits_{\mu}\mathcal{L}%
(\mu,\sigma)
\]
the ``profile loglikelihood'' function for $\sigma$. Note that display
(\ref{eq1.6.3}) then says that the confidence set should consist of those
$\mu$'s for which the profile loglikelihood is not too much smaller than the
maximum achievable. And something entirely analogous holds for the sets in
(\ref{eq1.6.4}). Johnson Lee (in 2001 Ph.D. dissertation work) has carefully
studied these confidence interval estimation problems and determined that some
modification of methods (\ref{eq1.6.3}) and (\ref{eq1.6.4}) is necessary in
order to provide guaranteed coverage probabilities for small sample sizes. (It
is also very important to realize that contrary to naive expectations, not
even a large sample size will make the usual $t$-intervals for $\mu$ and
$\chi^{2}$-intervals for $\sigma$ hold their nominal confidence levels in the
event that $\sigma$ is small, i.e. that the rounding or crudeness of the
gaging is important. Ignoring the rounding when it is important can produce
actual confidence levels near 0 for methods with large nominal confidence levels.)
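The profile-likelihood recipe of display (\ref{eq1.6.3}) can be sketched in Python as follows (grid-based and illustrative only; $3.841$ is the upper $.05$ point of the $\chi_{1}^{2}$ distribution):

```python
from math import erf, log, sqrt

def Phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def loglik(mu, sigma, data):
    total = 0.0
    for x in data:
        p = Phi((x + 0.5 - mu) / sigma) - Phi((x - 0.5 - mu) / sigma)
        if p <= 0.0:
            return float("-inf")
        total += log(p)
    return total

def profile_interval_for_mu(data, mu_grid, sigma_grid, crit=3.841):
    """Confidence set for mu from display (1.6.3): keep those mu whose
    profile loglikelihood L*(mu) = sup_sigma L(mu, sigma) is within
    crit/2 of the overall maximum.  crit = 3.841 is chi^2_{.05:1}."""
    profile = {m: max(loglik(m, s, data) for s in sigma_grid)
               for m in mu_grid}
    top = max(profile.values())
    kept = [m for m in mu_grid if profile[m] > top - crit / 2.0]
    return min(kept), max(kept)
```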
\subsubsection{Intervals for a Normal Mean Based on Integer-Rounded Data}
Specifically regarding the sets for $\mu$ in display (\ref{eq1.6.3}), Lee (in
work to appear in the \textit{Journal of Quality Technology}) has shown that
one must replace the value $\chi_{\alpha:1}^{2}$ with something larger in
order to get small $n$ actual confidence levels not too far from nominal for
``most'' $(\mu,\sigma)$. In fact, the choice%
\[
c(n,\alpha)=n\ln\left( \frac{t_{\frac{\alpha}{2}:(n-1)}^{2}}{n-1}+1\right)
\]
(for $t_{\frac{\alpha}{2}:(n-1)}$ the upper $\frac{\alpha}{2}$ point of the
$t$ distribution with $\nu=n-1$ degrees of freedom) is appropriate.
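This critical value is easily computed. A Python sketch (the caller supplies the $t$ quantile, e.g.\ from a table or from \texttt{scipy.stats.t.ppf}; the function name is illustrative):

```python
from math import log

def c_crit(n, t_upper_half_alpha):
    """Lee's replacement c(n, alpha) for chi^2_{alpha:1} in display
    (1.6.3).  t_upper_half_alpha is the upper alpha/2 point of the t
    distribution with n - 1 degrees of freedom, supplied by the caller
    (from a t table or scipy.stats.t.ppf(1 - alpha / 2, n - 1))."""
    return n * log(t_upper_half_alpha ** 2 / (n - 1) + 1.0)
```

Note that as $n\rightarrow\infty$, $t_{\frac{\alpha}{2}:(n-1)}^{2}\rightarrow z_{\frac{\alpha}{2}}^{2}$ and $c(n,\alpha)\rightarrow\chi_{\alpha:1}^{2}$, so the modification matters most for small $n$. For example, $c(2,.05)\approx10.18$, far above $\chi_{.05:1}^{2}=3.84$.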
After replacing $\chi_{\alpha:1}^{2}$ with $c(n,\alpha)$ in display
(\ref{eq1.6.3}) there remains the numerical analysis problem of actually
finding the interval prescribed by the display. The nature of the numerical
analysis required depends upon the sample range encountered in the crudely
gaged data. Provided the range is at least $2$, $\mathcal{L}^{\ast}(\mu)$ is
well-behaved (continuous and ``mound-shaped'') and even simple trial and error
with Karen (Jensen) Hulting's CONEST program will quickly produce the
necessary interval. When the range is $0$ or $1$, $\mathcal{L}^{\ast}(\mu)$
has respectively $2$ or $1$ discontinuities and the numerical analysis is a
bit trickier. Lee has recorded the results of the numerical analysis for small
sample sizes and $\alpha=.05,.10$ and $.20$ (confidence levels respectively
$95\%,90\%$ and $80\%$).
When a sample of size $n$ produces range $0$ with, say, all observations equal
to $x^{\ast}$, the intuition that one ought to estimate $\mu\in(x^{\ast
}-.5,x^{\ast}+.5)$ is sound unless $n$ is very small. If $n$ and $\alpha$ are
as recorded in Table \ref{tab1.6.1} then display (\ref{eq1.6.3}) (modified by
the use of $c(n,\alpha)$ in place of $\chi_{\alpha:1}^{2}$) leads to the
interval $(x^{\ast}-\Delta,x^{\ast}+\Delta)$. (Otherwise it leads to
$(x^{\ast}-.5,x^{\ast}+.5)$ for these $\alpha$.)%
%TCIMACRO{\TeXButton{B}{\begin{table}[tbp] \centering}}%
%BeginExpansion
\begin{table}[tbp] \centering
%EndExpansion
\caption{$\Delta$ for 0-Range Samples Based on Very Small $n$}%
\label{tab1.6.1}
\begin{tabular}
[c]{llll}
& & $\alpha$ & \\
$n$ & \multicolumn{1}{c}{$.05$} & \multicolumn{1}{c}{$.10$} &
\multicolumn{1}{c}{$.20$}\\\hline
$2$ & \multicolumn{1}{r}{$3.084$} & \multicolumn{1}{r}{$1.547$} &
\multicolumn{1}{r}{$.785$}\\
$3$ & \multicolumn{1}{r}{$.776$} & \multicolumn{1}{r}{$.562$} &
\multicolumn{1}{r}{}\\
$4$ & \multicolumn{1}{r}{$.517$} & \multicolumn{1}{r}{} & \multicolumn{1}{r}{}%
\\\hline
\end{tabular}
\
%TCIMACRO{\TeXButton{E}{\end{table}}}%
%BeginExpansion
\end{table}%
%EndExpansion
In the case that a sample of size $n$ produces range $1$ with, say, all
observations $x^{\ast}$ or $x^{\ast}+1$, the interval prescribed by display
(\ref{eq1.6.3}) (with $c(n,\alpha)$ used in place of $\chi_{\alpha:1}^{2}$)
can be thought of as having the form $(x^{\ast}+.5-\Delta_{L},x^{\ast
}+.5+\Delta_{U})$ where $\Delta_{L}$ and $\Delta_{U}$ depend upon%
\begin{equation}
n_{x^{\ast}}=\#[\text{observations }x^{\ast}]\text{ \ and \ }n_{x^{\ast}%
+1}=\#[\text{observations }x^{\ast}+1]\text{ .} \label{eq1.6.5}%
\end{equation}
When $n_{x^{\ast}}\geq n_{x^{\ast}+1}$, it is the case that $\Delta_{L}%
\geq\Delta_{U}$. And when $n_{x^{\ast}}\leq n_{x^{\ast}+1}$, correspondingly
$\Delta_{L}\leq\Delta_{U}$. Let%
\begin{equation}
m=\max\{n_{x^{\ast}},n_{x^{\ast}+1}\} \label{eq1.6.6}%
\end{equation}
and correspondingly take%
\[
\Delta_{1}=\max\{\Delta_{L},\Delta_{U}\}\text{ and }\Delta_{2}=\min
\{\Delta_{L},\Delta_{U}\}\text{ .}%
\]
Table \ref{tab1.6.2}\ then gives values for $\Delta_{1}$ and $\Delta_{2}$ for
$n\leq10$ and $\alpha=.05,.10$ and $.2$.%
%TCIMACRO{\TeXButton{B}{\begin{table}[tbp] \centering}}%
%BeginExpansion
\begin{table}[tbp] \centering
%EndExpansion
\caption{$(\Delta_{1},\Delta_{2})$ for Range 1 Samples Based on Small
$n$}\label{tab1.6.2}
\begin{tabular}
[c]{llccc}
& & & $\alpha$ & \\
\multicolumn{1}{c}{$n$} & \multicolumn{1}{c}{$m$} & $.05$ & $.10$ &
$.20$\\\hline\hline
$2$ & $1$ & \multicolumn{1}{l}{$(6.147,6.147)$} &
\multicolumn{1}{l}{$(3.053,3.053)$} & \multicolumn{1}{l}{$(1.485,1.485)$%
}\\\hline
$3$ & $2$ & \multicolumn{1}{l}{$(1.552,1.219)$} &
\multicolumn{1}{l}{$(1.104,0.771)$} & \multicolumn{1}{l}{$(0.765,0.433)$%
}\\\hline
$4$ & $3$ & \multicolumn{1}{l}{$(1.025,0.526)$} &
\multicolumn{1}{l}{$(0.820,0.323)$} & \multicolumn{1}{l}{$(0.639,0.149)$}\\
& $2$ & \multicolumn{1}{l}{$(0.880,0.880)$} &
\multicolumn{1}{l}{$(0.646,0.646)$} & \multicolumn{1}{l}{$(0.441,0.441)$%
}\\\hline
$5$ & $4$ & \multicolumn{1}{l}{$(0.853,0.257)$} &
\multicolumn{1}{l}{$(0.721,0.132)$} & \multicolumn{1}{l}{$(0.592,0.024)$}\\
& $3$ & \multicolumn{1}{l}{$(0.748,0.548)$} &
\multicolumn{1}{l}{$(0.592,0.339)$} & \multicolumn{1}{l}{$(0.443,0.248)$%
}\\\hline
$6$ & $5$ & \multicolumn{1}{l}{$(0.772,0.116)$} &
\multicolumn{1}{l}{$(0.673,0.032)$} & \multicolumn{1}{l}{$(0.569,0.000)$}\\
& $4$ & \multicolumn{1}{l}{$(0.680,0.349)$} &
\multicolumn{1}{l}{$(0.562,0.235)$} & \multicolumn{1}{l}{$(0.444,0.126)$}\\
& $3$ & \multicolumn{1}{l}{$(0.543,0.543)$} &
\multicolumn{1}{l}{$(0.420,0.420)$} & \multicolumn{1}{l}{$(0.299,0.299)$%
}\\\hline
$7$ & $6$ & \multicolumn{1}{l}{$(0.726,0.035)$} &
\multicolumn{1}{l}{$(0.645,0.000)$} & \multicolumn{1}{l}{$(0.556,0.000)$}\\
& $5$ & \multicolumn{1}{l}{$(0.640,0.218)$} &
\multicolumn{1}{l}{$(0.545,0.130)$} & \multicolumn{1}{l}{$(0.446,0.046)$}\\
& $4$ & \multicolumn{1}{l}{$(0.534,0.393)$} &
\multicolumn{1}{l}{$(0.432,0.293)$} & \multicolumn{1}{l}{$(0.329,0.193)$%
}\\\hline
$8$ & $7$ & \multicolumn{1}{l}{$(0.698,0.000)$} &
\multicolumn{1}{l}{$(0.626,0.000)$} & \multicolumn{1}{l}{$(0.547,0.000)$}\\
& $6$ & \multicolumn{1}{l}{$(0.616,0.129)$} &
\multicolumn{1}{l}{$(0.534,0.058)$} & \multicolumn{1}{l}{$(0.446,0.000)$}\\
& $5$ & \multicolumn{1}{l}{$(0.527,0.281)$} &
\multicolumn{1}{l}{$(0.439,0.197)$} & \multicolumn{1}{l}{$(0.347,0.113)$}\\
& $4$ & \multicolumn{1}{l}{$(0.416,0.416)$} &
\multicolumn{1}{l}{$(0.327,0.327)$} & \multicolumn{1}{l}{$(0.236,0.236)$%
}\\\hline
$9$ & $8$ & \multicolumn{1}{l}{$(0.677,0.000)$} &
\multicolumn{1}{l}{$(0.613,0.000)$} & \multicolumn{1}{l}{$(0.541,0.000)$}\\
& $7$ & \multicolumn{1}{l}{$(0.599,0.065)$} &
\multicolumn{1}{l}{$(0.526,0.010)$} & \multicolumn{1}{l}{$(0.448,0.000)$}\\
& $6$ & \multicolumn{1}{l}{$(0.521,0.196)$} &
\multicolumn{1}{l}{$(0.443,0.124)$} & \multicolumn{1}{l}{$(0.361,0.054)$}\\
& $5$ & \multicolumn{1}{l}{$(0.429,0.321)$} &
\multicolumn{1}{l}{$(0.350,0.242)$} & \multicolumn{1}{l}{$(0.267,0.163)$%
}\\\hline
$10$ & $9$ & \multicolumn{1}{l}{$(0.662,0.000)$} &
\multicolumn{1}{l}{$(0.604,0.000)$} & \multicolumn{1}{l}{$(0.537,0.000)$}\\
& $8$ & \multicolumn{1}{l}{$(0.587,0.020)$} &
\multicolumn{1}{l}{$(0.521,0.000)$} & \multicolumn{1}{l}{$(0.450,0.000)$}\\
& $7$ & \multicolumn{1}{l}{$(0.515,0.129)$} &
\multicolumn{1}{l}{$(0.446,0.069)$} & \multicolumn{1}{l}{$(0.371,0.012)$}\\
& $6$ & \multicolumn{1}{l}{$(0.437,0.242)$} &
\multicolumn{1}{l}{$(0.365,0.174)$} & \multicolumn{1}{l}{$(0.289,0.105)$}\\
& $5$ & \multicolumn{1}{l}{$(0.346,0.346)$} &
\multicolumn{1}{l}{$(0.275,0.275)$} & \multicolumn{1}{l}{$(0.200,0.200)$%
}\\\hline\hline
\end{tabular}
\
%TCIMACRO{\TeXButton{E}{\end{table}}}%
%BeginExpansion
\end{table}%
%EndExpansion
\subsubsection{Intervals for a Normal Standard Deviation Based on
Integer-Rounded Data}
Specifically regarding the sets for $\sigma$ in display (\ref{eq1.6.4}), Lee
found that in order to get small $n$ actual confidence levels not too far from
nominal, one must not only replace the value $\chi_{\alpha:1}^{2}$ with
something larger, but must make an additional adjustment for samples with
ranges $0$ and $1$.
Consider first replacing $\chi_{\alpha:1}^{2}$ in display (\ref{eq1.6.4}) with
a (larger) value $d(n,\alpha)$ given in Table \ref{tab1.6.3}. Lee found that
for those $(\mu,\sigma)$ with moderate to large $\sigma$, substituting
$d(n,\alpha)$ for $\chi_{\alpha:1}^{2}$ is enough to produce an
actual confidence level approximating the nominal one. However, even this
modification is not adequate to produce an acceptable coverage probability for
$(\mu,\sigma)$ with small $\sigma$.%
%TCIMACRO{\TeXButton{B}{\begin{table}[tbp] \centering}}%
%BeginExpansion
\begin{table}[tbp] \centering
%EndExpansion
\caption{$d(n,\alpha)$ for Use in Estimating $\sigma$}\label{tab1.6.3}
\begin{tabular}
[c]{llll}
& & $\alpha$ & \\
\multicolumn{1}{c}{$n$} & \multicolumn{1}{c}{$.05$} & &
\multicolumn{1}{c}{$.10$}\\\hline
\multicolumn{1}{r}{$2$} & \multicolumn{1}{r}{$10.47$} & &
\multicolumn{1}{r}{$7.71$}\\
\multicolumn{1}{r}{$3$} & \multicolumn{1}{r}{$7.26$} & &
\multicolumn{1}{r}{$5.23$}\\
\multicolumn{1}{r}{$4$} & \multicolumn{1}{r}{$6.15$} & &
\multicolumn{1}{r}{$4.39$}\\
\multicolumn{1}{r}{$5$} & \multicolumn{1}{r}{$5.58$} & &
\multicolumn{1}{r}{$3.97$}\\
\multicolumn{1}{r}{$6$} & \multicolumn{1}{r}{$5.24$} & &
\multicolumn{1}{r}{$3.71$}\\
\multicolumn{1}{r}{$7$} & \multicolumn{1}{r}{$5.01$} & &
\multicolumn{1}{r}{$3.54$}\\
\multicolumn{1}{r}{$8$} & \multicolumn{1}{r}{$4.84$} & &
\multicolumn{1}{r}{$3.42$}\\
\multicolumn{1}{r}{$9$} & \multicolumn{1}{r}{$4.72$} & &
\multicolumn{1}{r}{$3.33$}\\
\multicolumn{1}{r}{$10$} & \multicolumn{1}{r}{$4.62$} & &
\multicolumn{1}{r}{$3.26$}\\
\multicolumn{1}{r}{$15$} & \multicolumn{1}{r}{$4.34$} & &
\multicolumn{1}{r}{$3.06$}\\
\multicolumn{1}{r}{$20$} & \multicolumn{1}{r}{$4.21$} & &
\multicolumn{1}{r}{$2.97$}\\
\multicolumn{1}{r}{$30$} & \multicolumn{1}{r}{$4.08$} & &
\multicolumn{1}{r}{$2.88$}\\
\multicolumn{1}{r}{$\infty$} & \multicolumn{1}{r}{$3.84$} & &
\multicolumn{1}{r}{$2.71$}\\\hline
\end{tabular}
\
%TCIMACRO{\TeXButton{E}{\end{table}}}%
%BeginExpansion
\end{table}%
%EndExpansion
For samples with range $0$ or $1$, formula (\ref{eq1.6.4}) prescribes
intervals of the form $(0,U)$. And reasoning that when $\sigma$ is small,
samples will typically have range $0$ or $1$, Lee was able to find (larger)
replacements for the limit $U$ prescribed by (\ref{eq1.6.4}) so that the
resulting estimation method has actual confidence level not much below the
nominal level for any $(\mu,\sigma)$ (with $\sigma$ \textit{large or small}).
That is, if a $0$-range sample is observed, estimate $\sigma$ by%
\[
(0,\Lambda_{0})
\]
where $\Lambda_{0}$ is taken from Table \ref{tab1.6.4}. If a range $1$ sample
is observed consisting, say, of values $x^{\ast}$ and $x^{\ast}+1$, and
$n_{x^{\ast}},n_{x^{\ast}+1}$ and $m$ are as in displays (\ref{eq1.6.5}) and
(\ref{eq1.6.6}), estimate $\sigma$ using%
\[
(0,\Lambda_{1,m})
\]
where $\Lambda_{1,m}$ is taken from Table \ref{tab1.6.5}.%
%TCIMACRO{\TeXButton{B}{\begin{table}[tbp] \centering}}%
%BeginExpansion
\begin{table}[tbp] \centering
%EndExpansion
\caption{$\Lambda_{0}$ for Use in Estimating $\sigma$}\label{tab1.6.4}
\begin{tabular}
[c]{cccc}
& & $\alpha$ & \\
$n$ & $.05$ & & $.10$\\\hline
\multicolumn{1}{r}{$2$} & \multicolumn{1}{r}{$5.635$} & &
\multicolumn{1}{r}{$2.807$}\\
\multicolumn{1}{r}{$3$} & \multicolumn{1}{r}{$1.325$} & &
\multicolumn{1}{r}{$0.916$}\\
\multicolumn{1}{r}{$4$} & \multicolumn{1}{r}{$0.822$} & &
\multicolumn{1}{r}{$0.653$}\\
\multicolumn{1}{r}{$5$} & \multicolumn{1}{r}{$0.666$} & &
\multicolumn{1}{r}{$0.558$}\\
\multicolumn{1}{r}{$6$} & \multicolumn{1}{r}{$0.586$} & &
\multicolumn{1}{r}{$0.502$}\\
\multicolumn{1}{r}{$7$} & \multicolumn{1}{r}{$0.533$} & &
\multicolumn{1}{r}{$0.464$}\\
\multicolumn{1}{r}{$8$} & \multicolumn{1}{r}{$0.495$} & &
\multicolumn{1}{r}{$0.435$}\\
\multicolumn{1}{r}{$9$} & \multicolumn{1}{r}{$0.466$} & &
\multicolumn{1}{r}{$0.413$}\\
\multicolumn{1}{r}{$10$} & \multicolumn{1}{r}{$0.443$} & &
\multicolumn{1}{r}{$0.396$}\\
\multicolumn{1}{r}{$11$} & \multicolumn{1}{r}{$0.425$} & &
\multicolumn{1}{r}{$0.381$}\\
\multicolumn{1}{r}{$12$} & \multicolumn{1}{r}{$0.409$} & &
\multicolumn{1}{r}{$0.369$}\\
\multicolumn{1}{r}{$13$} & \multicolumn{1}{r}{$0.396$} & &
\multicolumn{1}{r}{$0.358$}\\
\multicolumn{1}{r}{$14$} & \multicolumn{1}{r}{$0.384$} & &
\multicolumn{1}{r}{$0.349$}\\
\multicolumn{1}{r}{$15$} & \multicolumn{1}{r}{$0.374$} & &
\multicolumn{1}{r}{$0.341$}\\\hline
\end{tabular}
\
%TCIMACRO{\TeXButton{E}{\end{table}}}%
%BeginExpansion
\end{table}%
%EndExpansion%
%TCIMACRO{\TeXButton{B}{\begin{table}[tbp] \centering}}%
%BeginExpansion
\begin{table}[tbp] \centering
%EndExpansion
\caption{$\Lambda_{1,m}$ for Use in Estimating $\sigma$ ($m$ in
Parentheses)}\label{tab1.6.5}
\begin{tabular}
[c]{llllllll}
& & & & $\alpha$ & & & \\
\multicolumn{1}{c}{$n$} & \multicolumn{1}{c}{} & \multicolumn{1}{c}{$.05$} &
\multicolumn{1}{c}{} & \multicolumn{1}{c}{} & \multicolumn{1}{c}{} &
\multicolumn{1}{c}{$.10$} & \multicolumn{1}{c}{}\\\hline
\multicolumn{1}{r}{$2$} & \multicolumn{1}{r}{$16.914(1)$} &
\multicolumn{1}{r}{} & \multicolumn{1}{r}{} & \multicolumn{1}{r}{} &
\multicolumn{1}{r}{$8.439(1)$} & \multicolumn{1}{r}{} & \multicolumn{1}{r}{}\\
\multicolumn{1}{r}{$3$} & \multicolumn{1}{r}{$3.535(2)$} &
\multicolumn{1}{r}{} & \multicolumn{1}{r}{} & \multicolumn{1}{r}{} &
\multicolumn{1}{r}{$2.462(2)$} & \multicolumn{1}{r}{} & \multicolumn{1}{r}{}\\
\multicolumn{1}{r}{$4$} & \multicolumn{1}{r}{$1.699(3)$} &
\multicolumn{1}{r}{$2.034(2)$} & \multicolumn{1}{r}{} & \multicolumn{1}{r}{} &
\multicolumn{1}{r}{$1.303(3)$} & \multicolumn{1}{r}{$1.571(2)$} &
\multicolumn{1}{r}{}\\
\multicolumn{1}{r}{$5$} & \multicolumn{1}{r}{$1.143(4)$} &
\multicolumn{1}{r}{$1.516(3)$} & \multicolumn{1}{r}{} & \multicolumn{1}{r}{} &
\multicolumn{1}{r}{$0.921(4)$} & \multicolumn{1}{r}{$1.231(3)$} &
\multicolumn{1}{r}{}\\
\multicolumn{1}{r}{$6$} & \multicolumn{1}{r}{$0.897(5)$} &
\multicolumn{1}{r}{$1.153(4)$} & \multicolumn{1}{r}{$1.285(3)$} &
\multicolumn{1}{r}{} & \multicolumn{1}{r}{$0.752(5)$} &
\multicolumn{1}{r}{$0.960(4)$} & \multicolumn{1}{r}{$1.054(3)$}\\
\multicolumn{1}{r}{$7$} & \multicolumn{1}{r}{$0.768(6)$} &
\multicolumn{1}{r}{$0.944(5)$} & \multicolumn{1}{r}{$1.106(4)$} &
\multicolumn{1}{r}{} & \multicolumn{1}{r}{$0.660(6)$} &
\multicolumn{1}{r}{$0.800(5)$} & \multicolumn{1}{r}{$0.949(4)$}\\
\multicolumn{1}{r}{$8$} & \multicolumn{1}{r}{$0.687(7)$} &
\multicolumn{1}{r}{$0.819(6)$} & \multicolumn{1}{r}{$0.952(5)$} &
\multicolumn{1}{r}{} & \multicolumn{1}{r}{$0.599(7)$} &
\multicolumn{1}{r}{$0.707(6)$} & \multicolumn{1}{r}{$0.825(5)$}\\
\multicolumn{1}{r}{} & \multicolumn{1}{r}{$1.009(4)$} & \multicolumn{1}{r}{} &
\multicolumn{1}{r}{} & \multicolumn{1}{r}{} & \multicolumn{1}{r}{$0.880(4)$} &
\multicolumn{1}{r}{} & \multicolumn{1}{r}{}\\
\multicolumn{1}{r}{$9$} & \multicolumn{1}{r}{$0.629(8)$} &
\multicolumn{1}{r}{$0.736(7)$} & \multicolumn{1}{r}{$0.837(6)$} &
\multicolumn{1}{r}{} & \multicolumn{1}{r}{$0.555(8)$} &
\multicolumn{1}{r}{$0.644(7)$} & \multicolumn{1}{r}{$0.726(6)$}\\
\multicolumn{1}{r}{} & \multicolumn{1}{r}{$0.941(5)$} & \multicolumn{1}{r}{} &
\multicolumn{1}{r}{} & \multicolumn{1}{r}{} & \multicolumn{1}{r}{$0.831(5)$} &
\multicolumn{1}{r}{} & \multicolumn{1}{r}{}\\
\multicolumn{1}{r}{$10$} & \multicolumn{1}{r}{$0.585(9)$} &
\multicolumn{1}{r}{$0.677(8)$} & \multicolumn{1}{r}{$0.747(7)$} &
\multicolumn{1}{r}{} & \multicolumn{1}{r}{$0.520(9)$} &
\multicolumn{1}{r}{$0.597(8)$} & \multicolumn{1}{r}{$0.654(7)$}\\
\multicolumn{1}{r}{} & \multicolumn{1}{r}{$0.851(6)$} &
\multicolumn{1}{r}{$0.890(5)$} & \multicolumn{1}{r}{} & \multicolumn{1}{r}{} &
\multicolumn{1}{r}{$0.753(6)$} & \multicolumn{1}{r}{$0.793(5)$} &
\multicolumn{1}{r}{}\\
\multicolumn{1}{r}{$11$} & \multicolumn{1}{r}{$0.550(10)$} &
\multicolumn{1}{r}{$0.630(9)$} & \multicolumn{1}{r}{$0.690(8)$} &
\multicolumn{1}{r}{} & \multicolumn{1}{r}{$0.493(10)$} &
\multicolumn{1}{r}{$0.560(9)$} & \multicolumn{1}{r}{$0.609(8)$}\\
\multicolumn{1}{r}{} & \multicolumn{1}{r}{$0.775(7)$} &
\multicolumn{1}{r}{$0.851(6)$} & \multicolumn{1}{r}{} & \multicolumn{1}{r}{} &
\multicolumn{1}{r}{$0.685(7)$} & \multicolumn{1}{r}{$0.763(6)$} &
\multicolumn{1}{r}{}\\
\multicolumn{1}{r}{$12$} & \multicolumn{1}{r}{$0.522(11)$} &
\multicolumn{1}{r}{$0.593(10)$} & \multicolumn{1}{r}{$0.646(9)$} &
\multicolumn{1}{r}{} & \multicolumn{1}{r}{$0.470(11)$} &
\multicolumn{1}{r}{$0.531(10)$} & \multicolumn{1}{r}{$0.573(9)$}\\
\multicolumn{1}{r}{} & \multicolumn{1}{r}{$0.708(8)$} &
\multicolumn{1}{r}{$0.789(7)$} & \multicolumn{1}{r}{$0.818(6)$} &
\multicolumn{1}{r}{} & \multicolumn{1}{r}{$0.626(8)$} &
\multicolumn{1}{r}{$0.707(7)$} & \multicolumn{1}{r}{$0.738(6)$}\\
\multicolumn{1}{r}{$13$} & \multicolumn{1}{r}{$0.499(12)$} &
\multicolumn{1}{r}{$0.563(11)$} & \multicolumn{1}{r}{$0.610(10)$} &
\multicolumn{1}{r}{} & \multicolumn{1}{r}{$0.452(12)$} &
\multicolumn{1}{r}{$0.506(11)$} & \multicolumn{1}{r}{$0.544(10)$}\\
\multicolumn{1}{r}{} & \multicolumn{1}{r}{$0.658(9)$} &
\multicolumn{1}{r}{$0.733(8)$} & \multicolumn{1}{r}{$0.791(7)$} &
\multicolumn{1}{r}{} & \multicolumn{1}{r}{$0.587(9)$} &
\multicolumn{1}{r}{$0.655(8)$} & \multicolumn{1}{r}{$0.716(7)$}\\
\multicolumn{1}{r}{$14$} & \multicolumn{1}{r}{$0.479(13)$} &
\multicolumn{1}{r}{$0.537(12)$} & \multicolumn{1}{r}{$0.580(11)$} &
\multicolumn{1}{r}{} & \multicolumn{1}{r}{$0.436(13)$} &
\multicolumn{1}{r}{$0.485(12)$} & \multicolumn{1}{r}{$0.520(11)$}\\
\multicolumn{1}{r}{} & \multicolumn{1}{r}{$0.622(10)$} &
\multicolumn{1}{r}{$0.681(9)$} & \multicolumn{1}{r}{$0.745(8)$} &
\multicolumn{1}{r}{} & \multicolumn{1}{r}{$0.558(10)$} &
\multicolumn{1}{r}{$0.607(9)$} & \multicolumn{1}{r}{$0.674(8)$}\\
\multicolumn{1}{r}{} & \multicolumn{1}{r}{$0.768(7)$} & \multicolumn{1}{r}{} &
\multicolumn{1}{r}{} & \multicolumn{1}{r}{} & \multicolumn{1}{r}{$0.698(7)$} &
\multicolumn{1}{r}{} & \multicolumn{1}{r}{}\\
\multicolumn{1}{r}{$15$} & \multicolumn{1}{r}{$0.463(14)$} &
\multicolumn{1}{r}{$0.515(13)$} & \multicolumn{1}{r}{$0.555(12)$} &
\multicolumn{1}{r}{} & \multicolumn{1}{r}{$0.422(14)$} &
\multicolumn{1}{r}{$0.468(13)$} & \multicolumn{1}{r}{$0.499(12)$}\\
\multicolumn{1}{r}{} & \multicolumn{1}{r}{$0.593(11)$} &
\multicolumn{1}{r}{$0.639(10)$} & \multicolumn{1}{r}{$0.701(9)$} &
\multicolumn{1}{r}{} & \multicolumn{1}{r}{$0.534(11)$} &
\multicolumn{1}{r}{$0.574(10)$} & \multicolumn{1}{r}{$0.632(9)$}\\
\multicolumn{1}{r}{} & \multicolumn{1}{r}{$0.748(8)$} & \multicolumn{1}{r}{} &
\multicolumn{1}{r}{} & \multicolumn{1}{r}{} & \multicolumn{1}{r}{$0.682(8)$} &
\multicolumn{1}{r}{} & \multicolumn{1}{r}{}\\\hline
\end{tabular}
{\tiny \ }%
%TCIMACRO{\TeXButton{E}{\end{table}}}%
%BeginExpansion
\end{table}%
%EndExpansion
The use of these values $\Lambda_{0}$ for range $0$ samples, and
$\Lambda_{1,m}$ for range $1$ samples, and the values $d(n,\alpha)$ in place
of $\chi_{\alpha:1}^{2}$ in display (\ref{eq1.6.4}) finally produces a
reliable method of confidence interval estimation for $\sigma$ when normal
data are integer-rounded.
\chapter{Process Monitoring}
Chapters 3 and 4 of V\&J discuss methods for process monitoring. The key
concept there regarding the probabilistic description of monitoring schemes is
the run length idea introduced on page 91 and specifically in display (3.44).
Theory for describing run lengths is given in V\&J only for the very simplest
case of geometrically distributed $T$. This chapter presents some more general
tools for the analysis/comparison of run length distributions of monitoring
schemes, namely discrete time finite state Markov chains and recursions
expressed in terms of integral (and difference) equations.
\section{Some Theory for Stationary Discrete Time Finite State Markov Chains
With a Single Absorbing State}
These are probability models for random systems that at times $t=1,2,3,\ldots$
can be in one of a finite number of states
\[
\mbox{S}_{1},\mbox{S}_{2},\ldots,\mbox{S}_{m},\mbox{S}_{m+1}\ .
\]
The ``Markov'' assumption is that the conditional distribution of where the
system is at time $t+1$ given the entire history of where it has been up
through time $t$ only depends upon where it is at time $t$. (In colloquial
terms: The conditional distribution of where I'll be tomorrow given where I am
and how I got here depends only on where I am, not on how I got here.)
So-called ``stationary'' Markov Chain (MC) models employ the assumption that
movement between states from any time $t$ to time $t+1$ is governed by a
(single) matrix of (one-step) ``transition probabilities'' (that is
independent of $t$)
\[
\mathop{\mbox{\boldmath$P$}}\limits_{(m+1)\times(m+1)}=\left( p_{ij}\right)
\]
where
\[
p_{ij}=P[\mbox{system is in S}_{j}\ \mbox{at time}\ t+1\,|\,\mbox{system is in
S}_{i}\ \mbox{at time}\ t]\ .
\]
As a simple example of this, consider the transition matrix
\begin{equation}
\mathop{\mbox{\boldmath$P$}}\limits_{3\times3}\doteq\left(
\begin{array}
[c]{rrr}%
.8 & .1 & .1\\
.9 & .05 & .05\\
0 & 0 & 1
\end{array}
\right) \ . \label{eq2.1.1}%
\end{equation}
Figure \ref{fig2.1.1} is a useful schematic representation of this model.%
%TCIMACRO{\FRAME{ftbpFU}{3.2837in}{2.6126in}{0pt}{\Qcb{Schematic for a MC with
%Transition Matrix (\ref{eq2.1.1})}}{\Qlb{fig2.1.1}}{fig2-1-1.ps}%
%{\special{ language "Scientific Word"; type "GRAPHIC";
%maintain-aspect-ratio TRUE; display "USEDEF"; valid_file "F";
%width 3.2837in; height 2.6126in; depth 0pt; original-width 7.235in;
%original-height 5.7449in; cropleft "0"; croptop "1"; cropright "1";
%cropbottom "0";
%filename '../CLASS/531/Notes/fig2-1-1.ps';file-properties "XNPEU";}} }%
%BeginExpansion
\begin{figure}
[ptb]
\begin{center}
\includegraphics[
height=2.6126in,
width=3.2837in
]%
{../CLASS/531/Notes/fig2-1-1.ps}%
\caption{Schematic for a MC with Transition Matrix (\ref{eq2.1.1})}%
\label{fig2.1.1}%
\end{center}
\end{figure}
%EndExpansion
The Markov Chain represented by Figure \ref{fig2.1.1} has an interesting
property: while it is possible to move back and forth between states
1 and 2, once the system enters state 3, it is ``stuck'' there. The standard
jargon for this property is to say that S$_{3}$ is an \textit{absorbing
state}. (In general, if $p_{ii}=1$, S$_{i}$ is called an absorbing state.)
Of particular interest in applications of MCs to the description of process
monitoring schemes are chains with a single absorbing state, say S$_{m+1}$,
where it is possible to move (at least eventually) from any other state to the
absorbing state. One thing that makes these chains so useful is that it is
very easy to write down a matrix formula for a vector giving the mean number
of transitions required to reach S$_{m+1}$ from any of the other states. That
is, with
\[
L_{i}=\mbox{the mean number of transitions required to move from S}_{i}%
~\mbox{to S}_{m+1}\ ,
\]%
\[
\mathop{\mbox{\boldmath$L$}}\limits_{m\times1}=\left(
\begin{array}
[c]{c}%
L_{1}\\
L_{2}\\
\vdots\\
L_{m}%
\end{array}
\right) \ ,\quad\mathop{\mbox{\boldmath$P$}}\limits_{(m+1)\times
(m+1)}=\left(
\begin{array}
[c]{cc}%
\mathop{\mbox{\boldmath$R$}}\limits_{m\times m} & \mathop{\mbox{\boldmath$r$}%
}\limits_{m\times1}\\
\mathop{\mbox{\boldmath$0$}}\limits_{1\times m} & \mathop{1}\limits_{1\times1}%
\end{array}
\right) \ ,\quad\mbox{and}\quad\mathop{\mbox{\boldmath$1$}}\limits_{m\times
1}=\left(
\begin{array}
[c]{c}%
1\\
1\\
\vdots\\
1
\end{array}
\right)
\]
it is the case that
\begin{equation}
\mbox{\boldmath$L$}=(\mbox{\boldmath$I$}-\mbox{\boldmath$R$})^{-1}%
\mathbf{1}\ . \label{eq2.1.2}%
\end{equation}
To argue that display (\ref{eq2.1.2}) is correct, note that the following
system of $m$ equations ``clearly'' holds:
\begin{align*}
L_{1} & =(1+L_{1})p_{11}+(1+L_{2})p_{12}+\cdots+(1+L_{m})p_{1m}+1\cdot
p_{1,m+1}\\
L_{2} & =(1+L_{1})p_{21}+(1+L_{2})p_{22}+\cdots+(1+L_{m})p_{2m}+1\cdot
p_{2,m+1}\\
& \vdots\\
L_{m} & =(1+L_{1})p_{m1}+(1+L_{2})p_{m2}+\cdots+(1+L_{m})p_{mm}+1\cdot
p_{m,m+1}\ .
\end{align*}
But this set is equivalent to the set
\begin{align*}
L_{1} & =1+p_{11}L_{1}+p_{12}L_{2}+\cdots+p_{1m}L_{m}\\
L_{2} & =1+p_{21}L_{1}+p_{22}L_{2}+\cdots+p_{2m}L_{m}\\
& \vdots\\
L_{m} & =1+p_{m1}L_{1}+p_{m2}L_{2}+\cdots+p_{mm}L_{m}%
\end{align*}
and in matrix notation, this second set of equations is
\begin{equation}
\mbox{\boldmath$L$}=\mbox{\boldmath$1$}+\mbox{\boldmath$RL$}\ .
\label{eq2.1.3}%
\end{equation}
So
\[
\mbox{\boldmath$L$}-\mbox{\boldmath$RL$}=\mbox{\boldmath$1$}\ ,
\]
i.e.
\[
(\mbox{\boldmath$I$}-\mbox{\boldmath$R$})\mbox{\boldmath$L$}=\mbox
{\boldmath$1$}\ .
\]
Under the conditions of the present discussion it is the case that
$(\mbox{\boldmath$I$}-\mbox{\boldmath$R$})$ is guaranteed to be nonsingular,
so that multiplying both sides of this matrix equation by the inverse of
$(\mbox{\boldmath$I$}-\mbox{\boldmath$R$})$ one finally has equation
(\ref{eq2.1.2}).
For the simple 3-state example with transition matrix (\ref{eq2.1.1}) it is
easy enough to verify that with
\[
\mbox{\boldmath$R$}=\left(
\begin{array}
[c]{rr}%
.8 & .1\\
.9 & .05
\end{array}
\right)
\]
one has
\[
(\mbox{\boldmath$I$}-\mbox{\boldmath$R$})^{-1}\mathbf{1}=\left(
\begin{array}
[c]{c}%
10.5\\
11
\end{array}
\right) \ .
\]
That is, the mean number of transitions required for absorption (into S$_{3}$)
from S$_{1}$ is $10.5$ while the mean number required from S$_{2}$ is $11.0$.
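This sort of verification is easy to automate. A Python sketch of display (\ref{eq2.1.2}), assuming the \texttt{numpy} library (the function name is illustrative):

```python
import numpy as np

def mean_times_to_absorption(P):
    """Mean number of transitions to reach the absorbing (last) state
    from each transient state: L = (I - R)^{-1} 1, where R is P with
    its last row and column deleted (display (2.1.2))."""
    R = np.asarray(P, dtype=float)[:-1, :-1]
    m = R.shape[0]
    # solve (I - R) L = 1 rather than forming the inverse explicitly
    return np.linalg.solve(np.eye(m) - R, np.ones(m))

# the 3-state example with transition matrix (2.1.1)
P = [[0.8, 0.1, 0.1],
     [0.9, 0.05, 0.05],
     [0.0, 0.0, 1.0]]
L = mean_times_to_absorption(P)   # 10.5 and 11.0, matching the text
```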
When one is working with numerical values in $\mbox{\boldmath$P$}$ and thus
wants numerical values in $\mbox{\boldmath$L$}$, the matrix formula
(\ref{eq2.1.2}) is most convenient for use with numerical analysis software.
When, on the other hand, one has some algebraic expressions for the $p_{ij}$
and wants algebraic expressions for the $L_{i}$, it is usually most effective
to write out the system of equations represented by display (\ref{eq2.1.3})
and to look for some slick way of solving for an $L_{i}$ of interest.
It is also worth noting that while the discussion in this section has centered
on the computation of mean times to absorption, other properties of ``time to
absorption'' variables can be derived and expressed in matrix notation. For
example, Problem 2.22 shows that it is fairly easy to find the variance (or
standard deviation) of time to absorption variables.
\section{Some Applications of Markov Chains to the Analysis of Process
Monitoring Schemes}
When the ``current condition'' of a process monitoring scheme can be thought
of as a discrete random variable (with a finite number of possible values), because
\begin{enumerate}
\item the variables $Q_{1},Q_{2},\ldots$ fed into it are intrinsically discrete
(for example representing counts) and are therefore naturally modeled using a
discrete probability distribution (and the calculations prescribed by the
scheme produce only a fixed number of possible outcomes),
\item ``discretization'' of the $Q$'s has taken place as a part of the
development of the monitoring scheme (as, for example, in the ``zone test''
schemes outlined in Tables 3.5 through 3.7 of V\&J), or
\item one approximates continuous distributions for $Q$'s and/or states of the
scheme with a ``finely-discretized'' version in order to approximate exact
(continuous) run length properties,
\end{enumerate}
\noindent one can often apply the material of the previous section to the
prediction of scheme behavior. (This is possible when the evolution of the
monitoring scheme can be thought of in terms of movement between ``states''
where the conditional distribution of the next ``state'' depends only on a
distribution for the next $Q$ which itself depends only on the current
``state'' of the scheme.) This section contains four examples of what can be
done in this direction.
As an initial example, consider the simple monitoring scheme (suggested
in the book \textit{Sampling Inspection and Quality Control} by Wetherill)
that signals an alarm the first time
\begin{enumerate}
\item a single point $Q$ plots ``outside 3 sigma limits,'' or
\item two consecutive $Q$'s plot ``between 2 and 3 sigma limits.''
\end{enumerate}
\noindent(This is a simple competitor to the sets of alarm rules specified in
Tables 3.5 through 3.7 of V\&J.) Suppose that one assumes that $Q_{1}%
,Q_{2},\ldots$ are iid and lets
\[
q_{1}=P[Q_{1}\mbox{ plots outside 3 sigma limits}]
\]
and
\[
q_{2}=P[Q_{1}\,\mbox{ plots between 2 and 3 sigma limits}]\ .
\]
Then one might think of describing the evolution of the monitoring scheme with
a 3-state MC with states
\begin{align*}
\mbox{S}_{1} & =\mbox{``all is OK,''}\\
\mbox{S}_{2} & =\mbox{``no alarm yet and the current}\ Q\ \mbox{is between 2
and 3 sigma limits,'' and}\\
\mbox{S}_{3} & =\mbox{``alarm.''}%
\end{align*}
For this representation, an appropriate transition matrix is
\begin{equation}
\mbox{\boldmath$P$}=\left(
\begin{array}
[c]{ccc}%
1-q_{1}-q_{2} & q_{2} & q_{1}\\
1-q_{1}-q_{2} & 0 & q_{1}+q_{2}\\
0 & 0 & 1
\end{array}
\right) \label{eq2.2.1}%
\end{equation}
and the ARL of the scheme (under the iid model for the $Q$ sequence) is
$L_{1}$, the mean time to absorption into the alarm state from the ``all-OK''
state. Figure \ref{fig2.2.1} is a schematic representation of this scenario.%
%TCIMACRO{\FRAME{ftbpFU}{3.0139in}{2.6377in}{0pt}{\Qcb{Schematic for a MC with
%Transition Matrix (\ref{eq2.2.1})}}{\Qlb{fig2.2.1}}{fig2-2-1.ps}%
%{\special{ language "Scientific Word"; type "GRAPHIC";
%maintain-aspect-ratio TRUE; display "USEDEF"; valid_file "F";
%width 3.0139in; height 2.6377in; depth 0pt; original-width 6.6366in;
%original-height 5.7994in; cropleft "0"; croptop "1"; cropright "1";
%cropbottom "0";
%filename '../CLASS/531/Notes/fig2-2-1.ps';file-properties "XNPEU";}}}%
%BeginExpansion
\begin{figure}
[ptb]
\begin{center}
\includegraphics[
height=2.6377in,
width=3.0139in
]%
{../CLASS/531/Notes/fig2-2-1.ps}%
\caption{Schematic for a MC with Transition Matrix (\ref{eq2.2.1})}%
\label{fig2.2.1}%
\end{center}
\end{figure}
%EndExpansion
It is worth noting that a system of equations for $L_{1}$ and $L_{2}$ is
\begin{align*}
L_{1} & =1\cdot q_{1}+(1+L_{2})q_{2}+(1+L_{1})(1-q_{1}-q_{2})\\
L_{2} & =1\cdot(q_{1}+q_{2})+(1+L_{1})(1-q_{1}-q_{2})\ ,
\end{align*}
which is equivalent to
\begin{align*}
L_{1} & =1+L_{1}\cdot(1-q_{1}-q_{2})+L_{2}q_{2}\\
L_{2} & =1+L_{1}(1-q_{1}-q_{2})\ ,
\end{align*}
which is the ``non-matrix version'' of the system (\ref{eq2.1.3}) for this
example. It is easy enough to verify that this system of two linear equations
in the unknowns $L_{1}$ and $L_{2}$ has a (simultaneous) solution with
\[
L_{1}=\frac{1+q_{2}}{1-(1-q_{1}-q_{2})-q_{2}(1-q_{1}-q_{2})}\ .
\]
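As a concrete check of this closed form, suppose (purely for illustration) that $Q$ is standard normal and the limits are exact 2- and 3-sigma limits, so that $q_{1}=2(1-\Phi(3))$ and $q_{2}=2(\Phi(3)-\Phi(2))$. The formula can then be compared with a direct matrix solve based on (\ref{eq2.2.1}):

```python
import math
import numpy as np

Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# illustrative all-OK probabilities for a standard normal Q
q1 = 2.0 * (1.0 - Phi(3.0))       # outside 3 sigma limits
q2 = 2.0 * (Phi(3.0) - Phi(2.0))  # between 2 and 3 sigma limits
p = 1.0 - q1 - q2

# the closed-form ARL from the display above
arl_formula = (1.0 + q2) / (1.0 - p - q2 * p)

# matrix version: R is the transition matrix (2.2.1) without the alarm
# row and column, and the ARL is the first entry of (I - R)^{-1} 1
R = np.array([[p,  q2],
              [p, 0.0]])
arl_matrix = np.linalg.solve(np.eye(2) - R, np.ones(2))[0]
print(arl_formula, arl_matrix)  # the two agree (about 224)
```
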
As a second application of MC technology to the analysis of a process
monitoring scheme, we will consider a so-called ``Run-Sum'' scheme. To define
such a scheme, one begins with ``zones'' for the variable $Q$ as indicated in
Figure 3.9 of V\&J. Then ``scores'' are defined for various possible values of
$Q$. For $j=0,1,2$ a score of $+j$ is assigned to the eventuality that $Q$ is
in the ``positive $j$-sigma to $(j+1)$-sigma zone,'' while a score of $-j$ is
assigned to the eventuality that $Q$ is in the ``negative $j$-sigma to
$(j+1)$-sigma zone.'' A score of $+3$ is assigned to any $Q$ above the ``upper
3-sigma limit'' while a score of $-3$ is assigned to any $Q$ below the ``lower
3-sigma limit.'' Then, for the variables $Q_{1},Q_{2},\ldots$ one defines
corresponding scores $Q_{1}^{\ast},Q_{2}^{\ast},\ldots$ and ``run sums''
$R_{1},R_{2},\ldots$ where%
\begin{align*}
R_{i} & =\text{``the `sum' of scores }Q^{\ast}\text{ through time }i\text{
under the provision that a }\\
& \text{new sum is begun whenever a score is observed with a sign
different}\\
& \text{from the existing Run-Sum.''}%
\end{align*}
(Note, for example, that a new score of $Q^{\ast}=+0$ will reset a current
Run-Sum of $R=-2$ to $+0$.) The Run-Sum scheme then signals at the first $i$
for which $\left| Q_{i}^{\ast}\right| =3$ \ or \ $\left| R_{i}\right|
\geq4 $.
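The reset-and-accumulate bookkeeping can be made concrete in a few lines. The sketch below (an illustration only, with the starting run sum taken to be $+0$) represents a score or run sum as a (sign, magnitude) pair so that $-0$ and $+0$ stay distinct:

```python
def update(run, score):
    """One Run-Sum update; run and score are (sign, magnitude) pairs
    with sign +1 or -1, so that +0 and -0 are distinct."""
    if score[0] == run[0]:                  # same sign: accumulate
        return (run[0], run[1] + score[1])
    return score                            # sign change: restart the sum

# the reset example from the text: a new score of +0 resets R = -2 to +0
print(update((-1, 2), (+1, 0)))  # -> (1, 0)

# a short score path; the scheme signals when |Q*| = 3 or |R| >= 4
run = (+1, 0)
for i, s in enumerate([(+1, 0), (-1, 1), (-1, 1), (-1, 2)], start=1):
    run = update(run, s)
    if s[1] == 3 or run[1] >= 4:
        print("signal at observation", i)  # -> signal at observation 4
        break
```
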
Then define states for a Run-Sum process monitoring scheme
\begin{align*}
\mbox{S}_{1} & =\mbox{``no alarm yet and }R=-0\mbox{,''}\\
\mbox{S}_{2} & =\mbox{``no alarm yet and }R=-1\mbox{,''}\\
\mbox{S}_{3} & =\mbox{``no alarm yet and }R=-2\mbox{,''}\\
\mbox{S}_{4} & =\mbox{``no alarm yet and }R=-3\mbox{,''}\\
\mbox{S}_{5} & =\mbox{``no alarm yet and }R=+0\mbox{,''}\\
\mbox{S}_{6} & =\mbox{``no alarm yet and }R=+1\mbox{,''}\\
\mbox{S}_{7} & =\mbox{``no alarm yet and }R=+2\mbox{,''}\\
\mbox{S}_{8} & =\mbox{``no alarm yet and }R=+3\mbox{,'' and}\\
\mbox{S}_{9} & =\mbox{``alarm.''}%
\end{align*}
If one assumes that the observations $Q_{1},Q_{2},\ldots$ are iid and for
$j=-3,-2,-1,-0,$ $+0,+1,+2,+3$ lets
\[
q_{j}=P[Q_{1}^{\ast}=j]\ ,
\]
an appropriate transition matrix for describing the evolution of the scheme
is
\[
\mbox{\boldmath$P$}=\left(
\begin{array}
[c]{ccccccccc}%
q_{-0} & q_{-1} & q_{-2} & 0 & q_{+0} & q_{+1} & q_{+2} & 0 & q_{-3}+q_{+3}\\
0 & q_{-0} & q_{-1} & q_{-2} & q_{+0} & q_{+1} & q_{+2} & 0 & q_{-3}+q_{+3}\\
0 & 0 & q_{-0} & q_{-1} & q_{+0} & q_{+1} & q_{+2} & 0 & q_{-3}+q_{-2}%
+q_{+3}\\
0 & 0 & 0 & q_{-0} & q_{+0} & q_{+1} & q_{+2} & 0 & q_{-3}+q_{-2}%
+q_{-1}+q_{+3}\\
q_{-0} & q_{-1} & q_{-2} & 0 & q_{+0} & q_{+1} & q_{+2} & 0 & q_{-3}+q_{+3}\\
q_{-0} & q_{-1} & q_{-2} & 0 & 0 & q_{+0} & q_{+1} & q_{+2} & q_{-3}+q_{+3}\\
q_{-0} & q_{-1} & q_{-2} & 0 & 0 & 0 & q_{+0} & q_{+1} & q_{-3}+q_{+2}%
+q_{+3}\\
q_{-0} & q_{-1} & q_{-2} & 0 & 0 & 0 & 0 & q_{+0} & q_{-3}+q_{+1}%
+q_{+2}+q_{+3}\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{array}
\right)
\]
and the ARL for the scheme is $L_{1}=L_{5}$. (The fact that the 1st and 5th
rows of $\mbox{\boldmath$P$}$ are identical makes it clear that the mean times
to absorption from S$_{1}$ and S$_{5}$ must be the same.) It turns out that
clever manipulation with the ``non-matrix'' version of display (\ref{eq2.1.3})
in this example even produces a fairly simple expression for the scheme's ARL.
(See Problem 2.24 and Reynolds (1971 \textit{JQT}) and the references therein
in this final regard.)
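Numerically, the ARL computation is routine once the $8\times8$ transient part of $\mbox{\boldmath$P$}$ is written down. The sketch below assumes (for illustration) a standard normal $Q$, so that the zone probabilities are symmetric ($q_{-j}=q_{+j}$), and confirms that $L_{1}=L_{5}$:

```python
import math
import numpy as np

Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# symmetric zone probabilities for a standard normal Q (illustrative)
q0 = Phi(1.0) - Phi(0.0)   # q_{+0} = q_{-0}
q1 = Phi(2.0) - Phi(1.0)   # q_{+1} = q_{-1}
q2 = Phi(3.0) - Phi(2.0)   # q_{+2} = q_{-2}

# transient part of P, states S1..S8 exactly as listed in the text
R = np.array([
    [q0, q1, q2,  0, q0, q1, q2,  0],   # S1: R = -0
    [ 0, q0, q1, q2, q0, q1, q2,  0],   # S2: R = -1
    [ 0,  0, q0, q1, q0, q1, q2,  0],   # S3: R = -2
    [ 0,  0,  0, q0, q0, q1, q2,  0],   # S4: R = -3
    [q0, q1, q2,  0, q0, q1, q2,  0],   # S5: R = +0
    [q0, q1, q2,  0,  0, q0, q1, q2],   # S6: R = +1
    [q0, q1, q2,  0,  0,  0, q0, q1],   # S7: R = +2
    [q0, q1, q2,  0,  0,  0,  0, q0],   # S8: R = +3
])
L = np.linalg.solve(np.eye(8) - R, np.ones(8))
print(L[0], L[4])  # the ARL; the two all-OK starting states agree
```
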
To turn to a different type of application of the MC technology, consider the
analysis of a high side decision interval CUSUM scheme as described in \S4.2
of V\&J. Suppose that the variables $Q_{1},Q_{2},\ldots$ are iid with a
continuous distribution specified by the probability density $f(y)$. Then the
variables $Q_{1}-k_{1},Q_{2}-k_{1},Q_{3}-k_{1},\ldots$ are iid with
probability density $f^{\ast}(y)=f(y+k_{1})$. For a positive integer $m $, we
will think of replacing the variables $Q_{i}-k_{1}$ with versions of them
rounded to the nearest multiple of $h/m$ before CUSUMing. Then the CUSUM
scheme can be thought of in terms of a MC with states
\[
\mbox{S}_{i}=\mbox{``no alarm yet and the current CUSUM is }(i-1)\left(
\frac{h}{m}\right) \mbox{''}%
\]
for $i=1,2,\ldots,m$ and
\[
\mbox{S}_{m+1}=\mbox{``alarm.''}%
\]
Then let
\[
q_{-m}=\int_{-\infty}^{-h+\frac{1}{2}\left( \frac{h}{m}\right) }f^{\ast
}(y)dy=P[Q_{1}-k_{1}\leq-h+\frac{1}{2}\left( \frac{h}{m}\right) ]\ ,
\]%
\[
q_{m}=\int_{h-\frac{1}{2}\left( \frac{h}{m}\right) }^{\infty}f^{\ast
}(y)dy=P[Q_{1}-k_{1}\geq h-\frac{1}{2}\left( \frac{h}{m}\right) ]\ .
\]
\section{Some Applications of Integral Equations to the Analysis of Process
Monitoring Schemes}
Consider the analysis of an $X/MR$ scheme applied to iid individual
observations $x_{1},x_{2},\ldots$ with probability density $f$, a scheme that
signals the first time an individual plots outside control limits $LCL_{x}$
and $UCL_{x}$ or a moving range exceeds $UCL_{R}$. Let $L(y)$ stand for the
mean number of additional observations required to produce a signal, given
that there has been no alarm and the current individual is $y$. If $x_{1}$ is
extreme ($x_{1}<LCL_{x}$ or $x_{1}>UCL_{x}$) there will be an immediate
signal and the run length will be 1. If $x_{1}$ is not extreme ($LCL_{x}\leq
x_{1}\leq UCL_{x}$) one observation will have been spent and on average
another $L(x_{1})$ observations will be required in order to produce a signal.
So it is reasonable that the ARL for the $X/MR$ scheme is
\[
ARL=1\cdot\left( 1-P[LCL_{x}\leq x_{1}\leq UCL_{x}]\right) +\int_{LCL_{x}%
}^{UCL_{x}}(1+L(y))f(y)dy\ ,
\]
that is
\begin{equation}
ARL=1+\int_{LCL_{x}}^{UCL_{x}}L(y)f(y)dy\ , \label{eq2.3.7}%
\end{equation}
where it remains to find a way of computing the function $L(y)$ in order to
feed it into expression (\ref{eq2.3.7}).
In order to derive an integral equation for $L(y)$ consider the situation if
there has been no alarm and the current individual observation is $y$. There
are two possibilities of where one will be after observing one more
individual, $x$. If $x$ is extreme or too far from $y$ ($x<LCL_{x}$ or
$x>UCL_{x}$ or $|x-y|>UCL_{R}$) only one additional observation is required to
produce a signal. On the other hand, if $x$ is not extreme and not too far
from $y$ ($LCL_{x}\leq x\leq UCL_{x}$ and $|x-y|\leq UCL_{R}$) one more
observation will have been spent and on average another $L(x)$ will be
required to produce a signal. That is,
\begin{align*}
L(y) & =1\cdot\left( P[x<LCL_{x}\quad\mbox{or}\quad x>UCL_{x}\quad\mbox
{or}\quad|x-y|>UCL_{R}]\right) \\
& {}+\int_{\max\left( LCL_{x},y-UCL_{R}\right) }^{\min\left(
UCL_{x},y+UCL_{R}\right) }(1+L(x))f(x)dx\ ,
\end{align*}
that is,
\begin{align}
L(y) & =1+\int_{\max\left( LCL_{x},y-UCL_{R}\right) }^{\min\left(
UCL_{x},y+UCL_{R}\right) }L(x)f(x)dx\nonumber\\
& =1+\int_{LCL_{x}}^{UCL_{x}}I[|x-y|\leq UCL_{R}]L(x)f(x)dx\ .
\label{eq2.3.8}%
\end{align}
(The notation $I[A]$ is ``indicator function'' notation, meaning that when
$A$ holds $I[A]=1,$ and otherwise $I[A]=0$.) As in the earlier CUSUM and EWMA
examples, once one specifies a quadrature rule for definite integrals on the
interval $[LCL_{x},UCL_{x}]$, this expression (\ref{eq2.3.8}) provides a set
of $m$ linear equations for approximate values of $L(a_{i})$'s. When this
system is solved, the resulting values can be fed into a discretized version
of equation (\ref{eq2.3.7}) and an approximate ARL produced. It is worth
noting that the potential discontinuities of the integrand in equation
(\ref{eq2.3.8}) (produced by the indicator function) have the effect of making
numerical solutions of this equation much less well-behaved than those for the
other integral equations developed in this section.
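The discretization just described can be sketched in a few lines. The numerical values below ($LCL_{x}=-3$, $UCL_{x}=3$, $UCL_{R}=3.686$, a standard normal $f$, and a midpoint quadrature rule) are illustrative assumptions only:

```python
import numpy as np

# illustrative control limits for individuals and the moving range
LCLx, UCLx, UCLR = -3.0, 3.0, 3.686
m = 400  # number of midpoint quadrature nodes

f = lambda x: np.exp(-0.5 * x * x) / np.sqrt(2.0 * np.pi)  # standard normal

delta = (UCLx - LCLx) / m
a = LCLx + delta * (np.arange(m) + 0.5)   # midpoint nodes a_i

# Discretize (2.3.8), L(y) = 1 + int I[|x-y|<=UCLR] L(x) f(x) dx, into the
# linear system (I - K) L = 1, K[i, j] = I[|a_j - a_i| <= UCLR] f(a_j) delta
K = (np.abs(a[None, :] - a[:, None]) <= UCLR) * f(a) * delta
L = np.linalg.solve(np.eye(m) - K, np.ones(m))

# feed the solution into the discretized version of (2.3.7)
ARL = 1.0 + np.sum(L * f(a) * delta)
print(ARL)
```

As a sanity check, letting $UCL_{R}\rightarrow\infty$ in this code reproduces the pure $X$ chart ARL $1/(1-P[LCL_{x}\leq x\leq UCL_{x}])\approx 370.4$, and adding the moving range rule can only shorten run lengths.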
The examples of this section have dealt only with ARLs for schemes based on
(continuous) iid observations. It therefore should be said that:
\begin{enumerate}
\item The iid assumption can in some cases be relaxed to give tractable
integral equations for situations where correlated sequences $Q_{1}%
,Q_{2},\ldots$ are involved (see for example Problem 2.27),
\item Other descriptors of the run length distribution (beyond the ARL) can
often be shown to solve simple integral equations (see for example the
integral equations for CUSUM run length second moment and run length
probability function in Problem 2.31), and
\item In some cases, with discrete variables $Q$ there are difference equation
analogues of the integral equations presented here (that ultimately correspond
to the kind of MC calculations illustrated in the previous section).
\end{enumerate}
\chapter{An Introduction to Discrete Stochastic Control Theory/Minimum
Variance Control}
Section 3.6 of V\&J provides an elementary introduction to the topic of
Engineering Control and contrasts this adjustment methodology with (the
process monitoring methodology of) control charting. The last item under the
Engineering Control heading of Table 3.10 of V\&J makes reference to ``optimal
stochastic control'' theory. The object of this theory is to model system
behavior using probability tools and let the consequences of the model
assumptions help guide one in the choice of effective control/adjustment
algorithms. This chapter provides a very brief introduction to this theory.
\section{General Exposition}
Let
\[
\{\ldots,Z(-1),Z(0),Z(1),Z(2),\ldots\}
\]
stand for observations on a process \textit{assuming that no control actions
are taken}. One first needs a stochastic/probabilistic model for the sequence
$\{Z(t)\}$, and we will let
\[
\mathcal{F}%
\]
stand for such a model. $\mathcal{F}$ is a joint distribution for the $Z$'s
and might, for example, be:
\begin{enumerate}
\item a simple random walk model specified by the equation
$Z(t)=Z(t-1)+\epsilon(t)$, where the $\epsilon$'s are iid normal
($0,\sigma^{2}) $ random variables,
\item a random walk model with drift specified by the equation
$Z(t)=Z(t-1)+d+\epsilon(t)$, where $d$ is a constant and the $\epsilon$'s are
iid normal $(0,\sigma^{2})$ random variables, or
\item some Box-Jenkins ARIMA model for the $\{Z(t)\}$ sequence.
\end{enumerate}
Then let
\[
a(t)
\]
stand for a control action taken at time $t$, after observing the process. One
needs notation for the current impact of control actions taken in past
periods, so we will further let
\[
A(a,s)
\]
stand for the current impact on the process of a control action $a$ taken $s$
periods ago. In many systems, the control actions, $a$, are numerical, and
$A(a,s)=ah(s)$ where $h(s)$ is the so-called ``impulse response function''
giving the impact of a unit control action taken $s$ periods previous.
$A(a,s)$ might, for example, be:
\begin{enumerate}
\item given by $A(a,s)=a$ for $s\geq1$ in a machine tool control problem where
``$a$'' means ``move the cutting tool out $a$ units'' (and the controlled
variable is a measured dimension of a work piece),
\item given by $A(a,s)=0$ for $s\leq u$ and by $A(a,s)=a$ for $s>u$ in a
machine tool control problem where ``$a$'' means ``move the cutting tool out
$a$ units'' and there are $u$ periods of dead time, or
\item given by $A(a,s)=\left( 1-\exp\left( \frac{-sh}{\tau}\right) \right)
\,a$ for $s\geq1$ in a chemical process control problem with time constant
$\tau$ and control period $h$ seconds.
\end{enumerate}
We will then assume that what one actually observes for (controlled) process
behavior at time $t\geq1$ is
\[
Y(t)=Z(t)+\sum_{s=0}^{t-1}A(a(s),t-s)\ ,
\]
which is the sum of what would have been observed with no control and all of
the current effects of previous control actions. For $t\geq0$, $a(t)$ will be
chosen based on
\[
\{\ldots,Z(-1),Z(0),Y(1),Y(2),\ldots,Y(t)\}\ .
\]
A common objective in this context is to choose the actions so as to minimize
\[
E_{\mathcal{F}}\left( Y(t)-T(t)\right) ^{2}%
\]
or
\[
\sum_{s=1}^{t}E_{\mathcal{F}}\left( Y(s)-T(s)\right) ^{2}%
\]
for some (possibly time-dependent) target value $T(s)$. The problem of
choosing control actions to accomplish this goal is called the ``minimum
variance'' (MV) control problem, and it has a solution that can be described
in fairly (deceptively, perhaps) simple terms.
Note first that given $\{\ldots,Z(-1),Z(0),Y(1),Y(2),\ldots,Y(t)\}$ one can
recover $\{\ldots,Z(-1),Z(0),Z(1),Z(2),\ldots,Z(t)\}$. This is because
\[
Z(s)=Y(s)-\sum_{r=0}^{s-1}A(a(r),s-r)
\]
i.e., to get $Z(s)$, one simply subtracts the (known) effects of previous
control actions from $Y(s)$.
Then the model $\mathcal{F}$ (at least in theory) provides one a conditional
distribution for $Z(t+1),Z(t+2),Z(t+3),\ldots$ given the observed $Z$'s
through time $t$. The conditional distribution for $Z(t+1),Z(t+2),Z(t+3)\ldots
$ given what one can observe through time $t$, namely $\{\ldots
,Z(-1),Z(0),Y(1),Y(2),\ldots,Y(t)\}$, is then the conditional distribution one
gets for $Z(t+1),Z(t+2),Z(t+3),\ldots$ from the model $\mathcal{F}$ after
recovering $Z(1),Z(2),\ldots,Z(t)$ from the corresponding $Y$'s. Then for
$s\geq t+1$, let
\[
E_{\mathcal{F}}[Z(s)|\ldots,Z(-1),Z(0),Z(1),Z(2),\ldots,Z(t)]~~\mbox{or
just}~~E_{\mathcal{F}}[Z(s)|Z^{t}]
\]
stand for the mean of this conditional distribution of $Z(s)$ available at
time $t$.
Suppose that there are $u\geq0$ periods of dead time ($u$ could be 0). Then
the earliest $Y$ that one can hope to influence by choice of $a(t)$ is
$Y(t+u+1)$. Notice then that if one takes action $a(t)$ at time $t$, one's
most natural projection of $Y(t+u+1)$ at time $t$ is
\[
\widehat{Y}(t+u+1|t)\doteq E_{\mathcal{F}}[Z(t+u+1)|Z^{t}]+\sum_{s=0}%
^{t-1}A(a(s),t+u+1-s)+A(a(t),u+1)
\]
It is then natural (and in fact turns out to give the MV control strategy) to
try to choose $a(t)$ so that
\[
\widehat{Y}(t+u+1|t)=T(t+u+1)\ .
\]
That is, the MV strategy is to try to choose $a(t)$ so that
\[
A(a(t),u+1)=T(t+u+1)-\left\{ E_{\mathcal{F}}[Z(t+u+1)|Z^{t}]+\sum_{s=0}%
^{t-1}A(a(s),t+u+1-s)\right\} \ .
\]
A caveat here is that in practice MV control tends to be ``ragged.'' That is,
in order to exactly optimize the mean squared error, constant tweaking (often
involving fairly large adjustments) is required. By changing one's control
objective somewhat it is possible to produce ``smoother'' optimal control
policies that are nearly as effective as MV algorithms in terms of keeping a
process on target. That is, instead of trying to optimize
\[
E_{\mathcal{F}}\sum_{s=1}^{t}\left( Y(s)-T(s)\right) ^{2}\ ,
\]
in a situation where the $a$'s are numerical ($a=0$ indicating ``no
adjustment'' and the ``size'' of adjustments increasing with $|a|$) one might
for a constant $\lambda>0$ set out to minimize the alternative criterion
\[
E_{\mathcal{F}}\left( \sum_{s=1}^{t}\left( Y(s)-T(s)\right) ^{2}%
+\lambda\sum_{s=0}^{t-1}(a(s))^{2}\right) \ .
\]
Doing so will ``smooth'' the MV algorithm.
\section{An Example}
To illustrate the meaning of the preceding formalism, consider the model
($\mathcal{F}$) specified by
\begin{equation}
\left.
\begin{array}
[c]{llcl}
& Z(t)=W(t)+\epsilon(t) & \mbox{for} & t\geq0\\
\mbox{and} & W(t)=W(t-1)+d+\nu(t) & \mbox{for} & t\geq1
\end{array}
\right\} \label{eq3.2.1}%
\end{equation}
for $d$ a (known) constant, the $\epsilon$'s normal $(0,\sigma_{\epsilon}%
^{2})$, the $\nu$'s normal $(0,\sigma_{\nu}^{2})$ and all the $\epsilon$'s and
$\nu$'s independent. ($Z(t)$ is a random walk with drift observed with error.)
Under this model and an appropriate 0 mean normal initializing distribution
for $W(0)$, it is the case that each
\[
\widehat{Z}(t+1|\,t)\doteq E_{\mathcal{F}}[Z(t+1)|Z(0),\ldots,Z(t)]
\]
may be computed recursively as
\[
\widehat{Z}(t+1|t)=\alpha Z(t)+(1-\alpha)\widehat{Z}(t|t-1)+d
\]
for some constant $\alpha$ (that depends upon the known variances
$\sigma_{\epsilon}^{2}$ and $\sigma_{\nu}^{2}$).
We will find MV control policies under model (\ref{eq3.2.1}) with two
different functions $A(a,s)$. Consider first the possibility
\begin{equation}
A(a,s)=a\,\,\forall s\geq1\ , \label{eq3.2.2}%
\end{equation}
(an adjustment ``$a$'' at a given time period takes its full and permanent
effect at the next time period).
Consider the situation at time $t=0$. Available are $Z(0)$ and $\widehat
{Z}(0\mathopen|-1)$ (the prior mean of $W(0)$) and from these one may compute
the prediction
\[
\widehat{Z}(1|0)\doteq\alpha Z(0)+(1-\alpha)\widehat{Z}(0\mathopen|-1)+d\ .
\]
That means that taking control action $a(0)$, one should predict a value of
\[
\widehat{Y}(1|0)\doteq\widehat{Z}(1|0)+a(0)
\]
for the controlled process at time $t=1$, and upon setting this equal to the
target $T(1)$ and solving for $a(0)$ one should thus choose
\[
a(0)=T(1)-\widehat{Z}(1|0)\ .
\]
At time $t=1$ one has observed $Y(1)$ and may recover $Z(1)$ by noting that
\[
Y(1)=Z(1)+A(a(0),1)=Z(1)+a(0)\ ,
\]
so that
\[
Z(1)=Y(1)-a(0)\ .
\]
Then a prediction (of the uncontrolled process) one step ahead is
\[
\widehat{Z}(2|1)\doteq\alpha Z(1)+(1-\alpha)\widehat{Z}(1|0)+d\ .
\]
That means that with a target of $T(2)$ one should predict a value of the
controlled process at time $t=2$ of
\[
\widehat{Y}(2|1)\doteq\widehat{Z}(2|1)+a(0)+a(1)\ .
\]
Upon setting this value equal to $T(2)$ and solving it is clear that one
should choose
\[
a(1)=T(2)-\left( \widehat{Z}(2|1)+a(0)\right) \ .
\]
So in general under (\ref{eq3.2.2}), at time $t$ one may note that
\[
Z(t)=Y(t)-\sum_{s=0}^{t-1}a(s)
\]
and (recursively) compute
\[
\widehat{Z}(t+1|t)\doteq\alpha Z(t)+(1-\alpha)\widehat{Z}(t|t-1)+d\ .
\]
Then setting the predicted value of the controlled process equal to $T(t+1)$
and solving for $a(t)$, find the MV control action
\[
a(t)=T(t+1)-\left( \widehat{Z}(t+1|t)+\sum_{s=0}^{t-1}a(s)\right) \ .
\]
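The whole scheme is easy to simulate. In the sketch below, the smoothing constant $\alpha$ is simply fixed at an illustrative value (rather than derived from $\sigma_{\epsilon}^{2}$ and $\sigma_{\nu}^{2}$), the target is constant at $T=0$, and $W(0)=0$; the simulated $Z$'s stand in for the values one would recover from the $Y$'s:

```python
import numpy as np

rng = np.random.default_rng(0)
d, alpha, T, n = 0.5, 0.5, 0.0, 500   # drift, smoothing constant, target

# the uncontrolled process: a random walk with drift observed with error
nu = rng.normal(0.0, 1.0, n + 1)
nu[0] = 0.0                            # take W(0) = 0 for simplicity
W = np.cumsum(nu) + d * np.arange(n + 1)
Z = W + rng.normal(0.0, 1.0, n + 1)

Zhat = 0.0       # stands in for Zhat(0|-1), the prior mean of W(0)
total_a = 0.0    # running sum of past adjustments
Y = np.empty(n + 1)
Y[0] = Z[0]
for t in range(n):
    Zhat = alpha * Z[t] + (1.0 - alpha) * Zhat + d   # Zhat(t+1|t)
    a_t = T - (Zhat + total_a)                       # the MV action above
    total_a += a_t
    Y[t + 1] = Z[t + 1] + total_a                    # controlled observation

print(Y[1:].mean(), Z[-1])  # Y hovers near T = 0 while Z drifts away
```

Note that under this policy $Y(t+1)-T$ is exactly the one-step prediction error $Z(t+1)-\widehat{Z}(t+1|t)$, which is what ``minimum variance'' means here.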
Finally, consider the problem of MV control under the same model
(\ref{eq3.2.1}), but now using
\begin{equation}
A(a,s)=\left\{
\begin{array}
[c]{lll}%
0 & \mbox{if} & s=1\\
a & \mbox{for} & s=2,3,\ldots
\end{array}
\right. \label{eq3.2.3}%
\end{equation}
(a description of response to process adjustment involving one period of
delay, after which the full effect of an adjustment is immediately and
permanently felt).
Consider the situation at time $t=0$. In hand are $Z(0)$ and the prior mean of
$W(0)$, $\widehat{Z}(0\mathopen|-1)$, and the first $Y$ that one can affect by
choice of $a(0)$ is $Y(2)$. Now
\begin{align*}
Z(2) & =W(2)+\epsilon(2)\ ,\\
& =W(1)+d+\nu(2)+\epsilon(2)\ ,\\
& =Z(1)-\epsilon(1)+d+\nu(2)+\epsilon(2)
\end{align*}
so that
\begin{align*}
\widehat{Z}(2|0) & \doteq E_{\mathcal{F}}[Z(2)|Z(0)]\ ,\\
& =E_{\mathcal{F}}[Z(1)-\epsilon(1)+d+\nu(2)+\epsilon(2)|Z(0)]\ ,\\
& =\widehat{Z}(1|0)+d\ ,\\
& =\alpha Z(0)+(1-\alpha)\widehat{Z}(0\mathopen|-1)+2d
\end{align*}
is a prediction of where the uncontrolled process will be at time $t=2$. Then
a prediction for the controlled process at time $t=2$ is
\[
\widehat{Y}(2|0)\doteq\widehat{Z}(2|0)+A(a(0),2)=\widehat{Z}(2|0)+a(0)
\]
and upon setting this equal to the time $t=2$ target, $T(2)$, and solving, one
has the MV control action
\[
a(0)=T(2)-\widehat{Z}(2|0)\ .
\]
At time $t=1$ one has in hand $Y(1)=Z(1)$ and $\widehat{Z}(1|0)$ and the first
$Y$ that can be affected by the choice of $a(1)$ is $Y(3)$. Now
\begin{align*}
Z(3) & =W(3)+\epsilon(3)\ ,\\
& =W(2)+d+\nu(3)+\epsilon(3)\ ,\\
& =Z(2)-\epsilon(2)+d+\nu(3)+\epsilon(3)
\end{align*}
so that
\begin{align*}
\widehat{Z}(3|1) & \doteq E_{\mathcal{F}}[Z(3)|Z(0),Z(1)]\ ,\\
& =E_{\mathcal{F}}[Z(2)-\epsilon(2)+d+\nu(3)+\epsilon(3)|Z(0),Z(1)]\ ,\\
& =\widehat{Z}(2|1)+d\ ,\\
& =\alpha Z(1)+(1-\alpha)\widehat{Z}(1|0)+2d
\end{align*}
is a prediction of where the uncontrolled process will be at time $t=3$. Then
a prediction for the controlled process at time $t=3$ is
\[
\widehat{Y}(3|1)\doteq\widehat{Z}(3|1)+A(a(0),3)+A(a(1),2)=\widehat
{Z}(3|1)+a(0)+a(1)
\]
and upon setting this equal to the time $t=3$ target, $T(3)$, and solving, one
has the MV control action
\[
a(1)=T(3)-\left( \widehat{Z}(3|1)+a(0)\right) \ .
\]
Finally, in general under (\ref{eq3.2.3}), one may at time $t$ note that
\[
Z(t)=Y(t)-\sum_{s=0}^{t-2}a(s)
\]
and (recursively) compute
\[
\widehat{Z}(t+2|t)\doteq\alpha Z(t)+(1-\alpha)\widehat{Z}(t|t-1)+2d\ .
\]
Then setting the time $t+2$ predicted value of the controlled process equal to
$T(t+2)$ and solving for $a(t)$, we find the MV control action
\[
a(t)=T(t+2)-\left( \widehat{Z}(t+2|t)+\sum_{s=0}^{t-1}a(s)\right) \ .
\]
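The dead-time case can be simulated in exactly the same way; only the indexing of the prediction and of the adjustment's first effect changes. Again $\alpha$ is fixed at an illustrative value and $T=0$:

```python
import numpy as np

rng = np.random.default_rng(3)
d, alpha, T, n = 0.5, 0.5, 0.0, 500   # drift, smoothing constant, target

nu = rng.normal(0.0, 1.0, n + 2)
nu[0] = 0.0                            # take W(0) = 0 for simplicity
W = np.cumsum(nu) + d * np.arange(n + 2)
Z = W + rng.normal(0.0, 1.0, n + 2)

Zhat1 = 0.0      # Zhat(t|t-1), initialized at the prior mean of W(0)
total_a = 0.0    # adjustments through a(t); a(t) first affects Y(t+2)
Y = Z.copy()     # Y(0) and Y(1) are unaffected by any control action
for t in range(n):
    Zhat1 = alpha * Z[t] + (1.0 - alpha) * Zhat1 + d   # Zhat(t+1|t)
    a_t = T - (Zhat1 + d + total_a)                    # uses Zhat(t+2|t)
    total_a += a_t
    Y[t + 2] = Z[t + 2] + total_a                      # controlled value

# dead time costs precision: Y - T is now a two-step prediction error
print(np.var(Y[2:] - T))
```

Comparing the printed variance with the one-step case above illustrates the general point that dead time inflates the best achievable variance about the target.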
\chapter{Process Characterization and Capability Analysis}
Sections 5.1 through 5.3 of V\&J discuss the problem of summarizing the
behavior of a stable process. The ``bottom line'' of that discussion is that
one-sample statistical methods can be used in a straightforward manner to
characterize a process/population/universe standing behind data collected
under stable process conditions. Section 5.5 of V\&J opens a discussion of
summarizing process behavior when it is not sensible to model all data in hand
as random draws from a single/fixed universe. The notes in this chapter carry
the theme of \S5.5 of V\&J slightly further and add some theoretical detail
missing in the book.
\section{General Comments on Assessing and Dissecting ``Overall Variation''}
The questions ``How much variation is there overall?'' and ``Where is the
variation coming from?'' are fundamental to process
characterization/understanding and the guidance of improvement efforts. To
provide a framework for discussion here, suppose that in hand one has $r$
samples of data, sample $i$ of size $n_{i}$ ($i=1,\ldots,r$). Depending upon
the specific application, these $r$ samples can have many different logical
structures. For example, \S5.5 of V\&J considers the case where the $n_{i}$
are all the same and the $r$ samples are naturally thought of as having a
balanced hierarchical/tree structure. But many others (both ``regular'' and
completely ``irregular'') are possible. For example Figure \ref{fig4.1.1} is a
schematic parallel to Figure 5.16 of V\&J for a ``staggered nested data
structure.''%
%TCIMACRO{\FRAME{ftbpFU}{2.2943in}{1.7253in}{0pt}{\Qcb{Schematic of a staggered
%Nested Data Set}}{\Qlb{fig4.1.1}}{nfig4-1-1.eps}%
%{\special{ language "Scientific Word"; type "GRAPHIC";
%maintain-aspect-ratio TRUE; display "USEDEF"; valid_file "F";
%width 2.2943in; height 1.7253in; depth 0pt; original-width 5.6662in;
%original-height 4.2454in; cropleft "0"; croptop "1"; cropright "1";
%cropbottom "0";
%filename '../CLASS/531/Notes/Nfig4-1-1.eps';file-properties "XNPEU";}} }%
%BeginExpansion
\begin{figure}
[ptb]
\begin{center}
\includegraphics[
height=1.7253in,
width=2.2943in
]%
{../CLASS/531/Notes/Nfig4-1-1.eps}%
\caption{Schematic of a Staggered Nested Data Set}%
\label{fig4.1.1}%
\end{center}
\end{figure}
%EndExpansion
When data in hand represent the entire universe of interest, methods of
probability and statistical inference have no relevance to the basic questions
``How much variation is there overall?'' and ``Where is the variation coming
from?'' The problem is one of descriptive statistics only, and various
creative combinations of methods of statistical graphics and basic numerical
measures (like sample variances and ranges) can be assembled to address these
issues. And most simply, a ``grand sample variance'' is one sensible
characterization of ``overall variation.''
The tools of probability and statistical inference only become relevant when
one sees data in hand as representing something more than themselves. And
there are basically two standard routes to take in this enterprise. The first
posits some statistical model for the process standing behind the data (like
the hierarchical random effects model (5.28) of V\&J). One may then use the
data in hand in the estimation of parameters (and functions of parameters) of
that model in order to characterize process behavior, assess overall
variability and dissect that variation into interpretable pieces.
The second standard way in which probabilistic and statistical methods become
relevant (to the problems of assessing overall variation and analysis of its
components) is through the adoption of a ``finite population sampling''
perspective. That is, there are times where there is conceptually some
(possibly highly structured) concrete data set of interest and the data in
hand arise through the application (possibly in various complicated ways) of
random selection of \textit{some} of the elements of that data set. (As one
possible example, think of a warehouse that contains 100 crates, each of which
contains 4 trays, each of which in turn holds 50 individual machine parts. The
20,000 parts in the warehouse could constitute a concrete population of
interest. If one were to sample 3 crates at random, select at random 2 trays
from each and then select 5 parts from each tray at random, one has a
classical finite population sampling problem. Probability/randomness has
entered through the sampling that is necessitated because one is unwilling to
collect data on all 20,000 parts.)
Section 5.5 of V\&J introduces the first of these two approaches to assessing
and dissecting overall variation for balanced hierarchical data. But it does
not treat the finite population sampling ideas at all. The present chapter of
these notes thus extends slightly the random effects analysis ideas discussed
in \S5.5 and then presents some simple material from the theory of finite
population sampling.
\section{More on Analysis Under the Hierarchical Random Effects Model}
Consider the hierarchical random effects model with 2 levels of nesting
discussed in \S5.5.2 of V\&J. We will continue the notations $y_{ijk}%
,\ \bar{y}_{ij},\ \bar{y}_{i.}$ and $\bar{y}_{..}$ used in that section and
also adopt some additional notation. For one thing, it will be useful to
define some ranges. Let
\[
R_{ij}= \max_{k} y_{ijk} - \min_{k} y_{ijk}= \mbox{the range of the $j$th
sample within the $i$th level of A}\ ,
\]
\[
\Delta_{i} = \max_{j} \bar{y}_{ij} - \min_{j} \bar{y}_{ij} = \mbox{the range
of the $J$ sample means within the $i$th level of A}\ ,
\]
and
\[
\Gamma= \max_{i} \bar{y}_{i.} - \min_{i} \bar{y}_{i.} = \mbox{the range of the
means for the $I$ levels of A}\ .
\]
It will also be useful to consider the ANOVA sums of squares and mean squares
alluded to briefly in \S5.5.3. So let
\begin{align*}
\mathit{SSTot} & = \sum_{i,j,k}(y_{ijk}-\bar{y}_{..})^{2}\\
& = (IJK-1) \times\mbox{the grand sample variance of all $IJK$ observations}%
\ ,\\
SSC(B(A)) & = \sum_{i,j,k}(y_{ijk}-\bar{y}_{ij})^{2}\\
& = (K-1) \times\mbox{the sum of all $IJ$ ``level C'' sample variances}\ ,\\
\mathit{SSB(A)} & = K\sum_{i,j}(\bar{y}_{ij} - \bar{y}_{i.})^{2}\\
& = K(J-1) \times\mbox{the sum of all $I$ sample variances of $J$ means
$\bar{y}_{ij}$}%
\end{align*}
and
\begin{align*}
\mathit{SSA} & = KJ\sum_{i}(\bar{y}_{i.}-\bar{y}_{..})^{2}\\
& = KJ(I-1) \times\mbox{the sample variance of the $I$ means $\bar{y}_{i.}$%
}\ .
\end{align*}
Note that in the notation of \S5.5.2, $SSA=KJ(I-1)s_{\mathrm{A}}^{2}$,
$SSB(A)=K(J-1)\sum_{i=1}^{I} s_{\mathrm{B}i}^{2}$ and $SSC(B(A))=(K-1)\sum
_{i,j} s_{ij}^{2} = IJ(K-1)\widehat{\sigma}^{2}$. And it is an algebraic fact
that $\mathit{SSTot} = SSA+SSB(A)+SSC(B(A))$.
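These identities are easy to confirm numerically. The sketch below builds a balanced data set from the hierarchical random effects model (with illustrative variance components) and checks that $\mathit{SSTot}=SSA+SSB(A)+SSC(B(A))$:

```python
import numpy as np

rng = np.random.default_rng(1)
I, J, K = 5, 4, 3

# balanced hierarchical data y_ijk with illustrative variance components
a_eff = rng.normal(0.0, 2.0, I)                 # alpha_i
b_eff = rng.normal(0.0, 1.0, (I, J))            # beta_ij
y = 10.0 + a_eff[:, None, None] + b_eff[:, :, None] \
    + rng.normal(0.0, 0.5, (I, J, K))           # epsilon_ijk added

ybar_ij = y.mean(axis=2)
ybar_i = y.mean(axis=(1, 2))
ybar = y.mean()

SSTot = ((y - ybar) ** 2).sum()
SSC = ((y - ybar_ij[:, :, None]) ** 2).sum()         # SSC(B(A))
SSB = K * ((ybar_ij - ybar_i[:, None]) ** 2).sum()   # SSB(A)
SSA = K * J * ((ybar_i - ybar) ** 2).sum()

print(SSTot, SSA + SSB + SSC)   # the decomposition is exact

MSA = SSA / (I - 1)
MSB = SSB / (I * (J - 1))
MSC = SSC / (I * J * (K - 1))
print(MSA, MSB, MSC)
```
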
Mean squares are derived from these sums of squares by dividing by appropriate
degrees of freedom. That is, define
\[
MSA \doteq\frac{SSA}{I-1}\ ,
\]
\[
MSB(A) \doteq\frac{SSB(A)}{I(J-1)}\ ,
\]
and
\[
MSC(B(A)) \doteq\frac{SSC(B(A))}{IJ(K-1)}\ .
\]
Now these ranges, sums of squares and mean squares are interesting measures of
variation in their own right, but are especially helpful when used to produce
estimates of variance components and functions of variance components. For
example, it is straightforward to verify that under the hierarchical random
effects model (5.28) of V\&J
\[
\mathrm{E}R_{ij}=d_{2}(K)\sigma\ ,
\]%
\[
\text{E}\Delta_{i}=d_{2}(J)\sqrt{\sigma_{\beta}^{2}+\sigma^{2}/K}%
\]
and
\[
\mathrm{E}\Gamma=d_{2}(I)\sqrt{\sigma_{\alpha}^{2}+\sigma_{\beta}^{2}%
/J+\sigma^{2}/JK}\ .
\]
So, reasoning as in \S2.2.2 of V\&J (there in the context of two-way random
effects models and gage R\&R) reasonable range-based point estimates of the
variance components are
\[
\widehat{\sigma}^{2}=\left( \frac{\bar{R}}{d_{2}(K)}\right) ^{2}\ ,
\]%
\[
\widehat{\sigma}_{\beta}^{2}=\max\left( 0,\left( \frac{\bar{\Delta}}%
{d_{2}(J)}\right) ^{2}-\frac{\widehat{\sigma}^{2}}{K}\right)
\]
and
\[
\widehat{\sigma}_{\alpha}^{2}=\max\left( 0,\left( \frac{\Gamma}{d_{2}%
(I)}\right) ^{2}-\frac{1}{J}\left( \frac{\bar{\Delta}}{d_{2}(J)}\right)
^{2}\right) \ .
\]
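A sketch of these range-based estimates on simulated data follows. The $d_{2}$ values are the standard control chart constants for samples of sizes 3, 4 and 5, and the variance components used to generate the data are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
I, J, K = 5, 4, 3
d2 = {3: 1.693, 4: 2.059, 5: 2.326}   # standard d2 control chart constants

# data from the hierarchical random effects model, illustrative sigmas
y = 10.0 + rng.normal(0.0, 2.0, I)[:, None, None] \
    + rng.normal(0.0, 1.0, (I, J))[:, :, None] \
    + rng.normal(0.0, 0.5, (I, J, K))

Rbar = (y.max(axis=2) - y.min(axis=2)).mean()              # mean of the R_ij
ybar_ij = y.mean(axis=2)
Dbar = (ybar_ij.max(axis=1) - ybar_ij.min(axis=1)).mean()  # mean of Delta_i
ybar_i = y.mean(axis=(1, 2))
Gamma = ybar_i.max() - ybar_i.min()

# the three range-based point estimates displayed above
sigma2_hat = (Rbar / d2[K]) ** 2
sigma2_beta_hat = max(0.0, (Dbar / d2[J]) ** 2 - sigma2_hat / K)
sigma2_alpha_hat = max(0.0, (Gamma / d2[I]) ** 2 - (Dbar / d2[J]) ** 2 / J)
print(sigma2_hat, sigma2_beta_hat, sigma2_alpha_hat)
```
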
Now by applying linear model theory or reasoning from V\&J displays (5.30) and
(5.32) and the fact that E$s_{ij}^{2}=\sigma^{2}$, one can find expected
values for the mean squares above. These are
\[
\text{E}\mathit{MSA}=KJ\sigma_{\alpha}^{2}+K\sigma_{\beta}^{2}+\sigma^{2}\ ,
\]%
\[
\text{E}\mathit{MSB}(A)=K\sigma_{\beta}^{2}+\sigma^{2}%
\]
and
\[
\text{E}\mathit{MSC}(B(A))=\sigma^{2}\ .
\]
And in a fashion completely parallel to the exposition in \S1.4 of these
notes, standard linear model theory implies that the quantities
\[
\frac{IJ(K-1)MSC(B(A))}{\text{E}MSC(B(A))},\frac{I(J-1)MSB(A)}{\text{E}%
MSB(A)}\ \mbox{and}\ \frac{(I-1)MSA}{\text{E}MSA}%
\]
are independent $\chi^{2}$ random variables with respective degrees of
freedom
\[
IJ(K-1),I(J-1)\ \mbox{and}\ (I-1)\ .
\]%
\begin{table}[tbp] \centering
\caption{Balanced Data Hierarchical Random Effects Analysis ANOVA Table (2
Levels of Nesting)}\label{tab4.2.1}
\begin{tabular}
[c]{lcccc}%
\multicolumn{5}{c}{ANOVA Table}\\
Source & $SS$ & $df$ & $MS$ & E$MS$\\\hline
A & $SSA$ & $I-1$ & $MSA$ & $KJ\sigma_{\alpha}^{2}+K\sigma_{\beta}^{2}%
+\sigma^{2}$\\
B(A) & $SSB(A)$ & $I(J-1)$ & $MSB(A)$ & $K\sigma_{\beta}^{2}+\sigma^{2}$\\
C(B(A)) & $SSC(B(A))$ & $IJ(K-1)$ & $MSC(B(A))$ & $\sigma^{2}$\\\hline
Total & $\mathit{SSTot}$ & $IJK-1$ & &
\end{tabular}
\end{table}

These facts about sums of squares and mean squares for the hierarchical random
effects model are conveniently summarized in the usual (hierarchical random
effects model) ANOVA table (for two levels of nesting), Table \ref{tab4.2.1}.
Further, the fact that the expected mean squares are simple linear
combinations of the variance components $\sigma_{\alpha}^{2}$, $\sigma_{\beta
}^{2}$ and $\sigma^{2}$ motivates the use of linear combinations of mean
squares in the estimation of the variance components (as in \S5.5.3 of V\&J).
In fact (as indicated in \S5.5.3 of V\&J) the standard ANOVA-based estimators
\[
\widehat{\sigma}^{2}=\frac{SSC(B(A))}{IJ(K-1)}\ ,
\]%
\[
\widehat{\sigma}_{\beta}^{2}=\frac{1}{K}\max\left( 0,\frac{SSB(A)}%
{I(J-1)}-\widehat{\sigma}^{2}\right)
\]
and
\[
\widehat{\sigma}_{\alpha}^{2}=\frac{1}{JK}\max\left( 0,\frac{SSA}%
{(I-1)}-\frac{SSB(A)}{I(J-1)}\right)
\]
are exactly the estimators (described without using ANOVA notation) in
displays (5.29), (5.31) and (5.33) of V\&J. The virtue of describing them in
the present terms is to suggest/emphasize that all that was said in \S1.4 and
\S1.5 (in the gage R\&R context) about making standard errors for functions of
mean squares and ANOVA-based confidence intervals for functions of variance
components is equally true in the present context.
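All of the sums of squares and ANOVA-based estimators above are mechanical to compute for balanced data. The following minimal Python sketch does so; the tiny $I=J=K=2$ data set in the accompanying check is hypothetical, chosen so the answers can be verified by hand.

```python
def nested_anova(y):
    """Balanced two-stage nested ANOVA for data y[i][j][k] with I levels of A,
    J levels of B within A, and K observations per (i, j) cell.  Returns the
    sums of squares and the ANOVA-based variance component estimates."""
    I, J, K = len(y), len(y[0]), len(y[0][0])
    cell = [[sum(y[i][j]) / K for j in range(J)] for i in range(I)]
    lev = [sum(cell[i]) / J for i in range(I)]
    grand = sum(lev) / I
    SSA = K * J * sum((m - grand) ** 2 for m in lev)
    SSBA = K * sum((cell[i][j] - lev[i]) ** 2
                   for i in range(I) for j in range(J))
    SSCBA = sum((y[i][j][k] - cell[i][j]) ** 2
                for i in range(I) for j in range(J) for k in range(K))
    # ANOVA-based estimates (V&J displays (5.29), (5.31) and (5.33))
    s2 = SSCBA / (I * J * (K - 1))
    s2_beta = max(0.0, SSBA / (I * (J - 1)) - s2) / K
    s2_alpha = max(0.0, SSA / (I - 1) - SSBA / (I * (J - 1))) / (J * K)
    return SSA, SSBA, SSCBA, s2, s2_beta, s2_alpha
```

A useful sanity check on any implementation is the algebraic identity $SSTot = SSA + SSB(A) + SSC(B(A))$.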

For example, the formula (\ref{eq1.4.2}) of these notes can be applied to
derive standard errors for $\widehat{\sigma}_{\beta}^{2}$ and $\widehat
{\sigma}_{\alpha}^{2}$ immediately above. Or since
\[
\sigma_{\beta}^{2}=\frac{1}{K}\text{E}MSB(A)-\frac{1}{K}\text{E}MSC(B(A))
\]
and
\[
\sigma_{\alpha}^{2}=\frac{1}{JK}\text{E}MSA-\frac{1}{JK}\text{E}MSB(A)
\]
are both of form (\ref{eq1.5.1}), the material of \S1.5 can be used to set
confidence limits for these quantities.

As a final comment in this discussion of what is possible under the
hierarchical random effects model, it is worth noting that while the present
discussion has been confined to a ``balanced data'' framework, Problem 4.8
shows that at least point estimation of variance components can be done in a
fairly elementary fashion even in unbalanced data contexts.
\section{Finite Population Sampling and Balanced Hierarchical Structures}
This brief section is meant to illustrate the kinds of things that can be
done with finite population sampling theory in terms of estimating overall
variability in a (balanced) hierarchical concrete population of items and
dissecting that variability.
Consider first a finite population consisting of $NM$ items arranged into $N$
levels of A, with $M$ levels of B within each level of A. (For example, there
might be $N$ boxes, each containing $M$ widgets. Or there might be $N$ days,
on each of which $M$ items are manufactured.) Let
\begin{align*}
y_{ij} & = \mbox{a measurement on the item at level $i$ of A and level $j$
of B within the}\\
& \mbox{$i$th level of A (e.g. the diameter of the $j$th widget in the $i$th
box)}\ .
\end{align*}
Suppose that the quantity of interest is the (grand) variance of all $NM$
measurements,
\[
S^{2}=\frac{1}{NM-1} \sum_{i=1}^{N} \sum_{j=1}^{M}(y_{ij} - \bar{y}_{.}%
)^{2}\ .
\]
(This is clearly one quantification of overall variation.)
The usual one-way ANOVA identity applied to the $NM$ numbers making up the
population of interest shows that the population variance can be expressed as
\[
S^{2}=\frac{1}{NM-1}\,\left( M(N-1)S^{2}_{\mathrm{A}}+N(M-1)S^{2}%
_{\mathrm{B}}\right)
\]
where
\[
S^{2}_{\mathrm{A}}=\frac{1}{N-1}\, \sum_{i=1}^{N} (\bar{y}_{i}-\bar{y}%
_{.})^{2}= \mbox{the variance of the $N$ ``A level means''}%
\]
and
\[
S^{2}_{\mathrm{B}}=\frac{1}{N}\, \sum_{i=1}^{N}\left( \frac{1}{M-1}
\sum_{j=1}^{M}(y_{ij}-\bar{y}_{i})^{2}\right) = \mbox{the average of the $N$
``within A level variances.''}%
\]

Suppose that one selects a simple random sample of $n$ levels of A, and for
each level of A a simple random sample of $m$ levels of B within A. (For
example, one might sample $n$ boxes and $m$ widgets from each box.) A naive
way to estimate $S^{2}$ is to simply use the sample variance
\[
s^{2}=\frac{1}{nm-1}\sum(y_{ij}-\bar{y}_{.}^{\ast})^{2}%
\]
where the sum is over the $nm$ items selected and $\bar{y}_{.}^{\ast}$ is the
mean of those measurements. Unfortunately, this is not such a good estimator.
Material from Chapter 10 of Cochran's \textit{Sampling Techniques} can be used
to show that
\[
\text{E}s^{2}=\frac{m(n-1)}{nm-1}\,S_{\mathrm{A}}^{2}+\left( \frac
{n(m-1)}{nm-1}+\frac{m(n-1)}{nm-1}\,\left( \frac{1}{m}-\frac{1}{M}\right)
\right) \,S_{\mathrm{B}}^{2}\ ,
\]
which is not in general equal to $S^{2}$.

However, it is possible to find a linear combination of the sample versions of
$S_{\mathrm{A}}^{2}$ and $S_{\mathrm{B}}^{2}$ that has expected value equal to
the population variance. That is, let
\begin{align*}
s_{\mathrm{A}}^{2} & =\frac{1}{n-1}\sum(\bar{y}_{i}^{\ast}-\bar{y}_{.}%
^{\ast})^{2}\\
& =\mbox{the sample variance of the $n$ sample means (from the sampled levels
of A)}%
\end{align*}
and
\begin{align*}
s_{\mathrm{B}}^{2} & =\frac{1}{n}\sum\left( \frac{1}{m-1}\sum(y_{ij}%
-\bar{y}_{i}^{\ast})^{2}\right) \\
& =\mbox{the average of the $n$ sample variances (from the sampled levels of
A)}\ .
\end{align*}
Then, it turns out that
\[
\text{E}s_{\mathrm{A}}^{2}=S_{\mathrm{A}}^{2}+\left( \frac{1}{m}-\frac{1}%
{M}\right) S_{\mathrm{B}}^{2}%
\]
and
\[
\text{E}s_{\mathrm{B}}^{2}=S_{\mathrm{B}}^{2}\ .
\]
From this it follows that an unbiased estimator of $S^{2}$ is the quantity
\[
\frac{M(N-1)}{NM-1}s_{\mathrm{A}}^{2}+\left( \frac{N(M-1)}{NM-1}%
-\frac{M(N-1)}{NM-1}\left( \frac{1}{m}-\frac{1}{M}\right) \right)
s_{\mathrm{B}}^{2}\ .
\]
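For a small population this unbiasedness claim can be confirmed by brute force: enumerate every equally likely two-stage sample and average the estimator exactly. A minimal Python sketch follows; the population values (three ``boxes'' of three ``widgets'') and the sample sizes $n=m=2$ are hypothetical.

```python
from itertools import combinations, product
from statistics import mean, variance  # variance uses the (count - 1) divisor

# A small hypothetical finite population: N = 3 boxes of M = 3 widgets each
pop = [[0, 1, 5], [2, 3, 9], [4, 6, 8]]
N, M = len(pop), len(pop[0])
n, m = 2, 2  # SRS of n boxes, then SRS of m widgets within each sampled box

S2 = variance([v for box in pop for v in box])  # grand population variance
S2_A = variance([mean(box) for box in pop])     # variance of the N box means
S2_B = mean([variance(box) for box in pop])     # average within-box variance

# the one-way ANOVA identity for the population
assert abs(S2 - (M * (N - 1) * S2_A + N * (M - 1) * S2_B) / (N * M - 1)) < 1e-9

def estimator(sample):
    """The unbiased estimator displayed above; sample is n lists of m values."""
    sA2 = variance([mean(s) for s in sample])
    sB2 = mean([variance(s) for s in sample])
    cA = M * (N - 1) / (N * M - 1)
    return cA * sA2 + (N * (M - 1) / (N * M - 1) - cA * (1 / m - 1 / M)) * sB2

# average the estimator over every equally likely two-stage sample
vals = [estimator([list(w) for w in widgets])
        for boxes in combinations(range(N), n)
        for widgets in product(*[combinations(pop[b], m) for b in boxes])]
```

Averaging `vals` recovers $S^{2}$ exactly (up to rounding), which is what unbiasedness of the displayed estimator promises.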

This kind of analysis can, of course, be carried beyond the case of a single
level of nesting. For example, consider the situation with two levels of
nesting (where both the finite population and the observed values have
balanced hierarchical structure). Then in the ANOVA notation of \S4.2 above,
take
\[
s^{2}_{\mathrm{A}} = \frac{SSA}{(I-1)JK}\ ,
\]
\[
s^{2}_{\mathrm{B}} = \frac{SSB(A)}{I(J-1)K}%
\]
and
\[
s_{\mathrm{C}}^{2}=\frac{SSC(B(A))}{IJ(K-1)}\ .
\]
Let $S_{\mathrm{A}}^{2},\ S_{\mathrm{B}}^{2}$ and $S_{\mathrm{C}}^{2}$ be the
population analogs of $s_{\mathrm{A}}^{2},\ s_{\mathrm{B}}^{2}$ and
$s_{\mathrm{C}}^{2}$, and $f_{\mathrm{B}}$ and $f_{\mathrm{C}}$ be the
sampling fractions at the second and third stages of item selection. Then it
turns out that
\[
\mathrm{E}s_{\mathrm{A}}^{2}=S_{\mathrm{A}}^{2}+\frac{(1-f_{\mathrm{B}})}%
{J}S_{\mathrm{B}}^{2}+\frac{(1-f_{\mathrm{C}})}{JK}S_{\mathrm{C}}^{2}\ ,
\]
\[
\mathrm{E}s_{\mathrm{B}}^{2}=S_{\mathrm{B}}^{2}+\frac{(1-f_{\mathrm{C}})}%
{K}S_{\mathrm{C}}^{2}%
\]
and
\[
\mathrm{E}s_{\mathrm{C}}^{2}=S_{\mathrm{C}}^{2}\ .
\]
So (since the grand population variance, $S^{2}$, is expressible as a linear
combination of $S_{\mathrm{A}}^{2},\ S_{\mathrm{B}}^{2}$ and $S_{\mathrm{C}%
}^{2}$, each of which can be estimated by a linear combination of
$s_{\mathrm{A}}^{2},\ s_{\mathrm{B}}^{2}$ and $s_{\mathrm{C}}^{2}$) an
unbiased estimator of the population variance can be built as an appropriate
linear combination of $s_{\mathrm{A}}^{2},\ s_{\mathrm{B}}^{2}$ and
$s_{\mathrm{C}}^{2}$.
\chapter{Sampling Inspection}
Chapter 8 of V\&J treats the subject of sampling inspection, introducing the
basic methods of acceptance sampling and continuous inspection. This chapter
extends that discussion somewhat. We consider how (in the fraction
nonconforming context) one can move from single sampling plans to quite
general acceptance sampling plans, we provide a brief discussion of the
effects of inspection/measurement error on the real (as opposed to nominal)
statistical properties of acceptance sampling plans, and then the chapter
closes with an elaboration of \S8.5 of V\&J, providing some more details on
the matter of economic arguments in the choice of sampling inspection schemes.
\section{More on Fraction Nonconforming Acceptance Sampling}
Section 8.1 of V\&J (and for that matter \S8.2 as well) confines itself to the
discussion of single sampling plans. For those plans, a sample size is fixed
in advance at some value $n$, and lot disposal is decided on the basis of
inspection of exactly $n$ items. There are, however, often good reasons to
consider acceptance sampling plans whose ultimate sample size depends upon
``how the inspected items look'' as they are examined. (One might, for
example, want to consider a ``double sampling'' plan that inspects an initial
small sample, terminating sampling if items look especially good or especially
bad so that appropriate lot disposal seems clear, but takes an additional
larger sample if the initial one looks ``inconclusive'' regarding the likely
quality of the lot.) This section considers fraction nonconforming acceptance
sampling from the most general perspective possible and develops the OC, ASN,
AOQ and ATI for a general fraction nonconforming plan.

Consider the possibility of inspecting one item at a time from a lot of $N$,
and after inspecting each successive item deciding to 1) stop sampling and
accept the lot, 2) stop sampling and reject the lot or 3) inspect another
item. With
\[
X_{n}=\mbox{the number of nonconforming items found among the first $n$
inspected}%
\]
a helpful way of thinking about various different plans in this context is in
terms of possible paths through a grid of ordered pairs of integers
$(n,X_{n})$ with $0\leq X_{n}\leq n$. Different acceptance sampling plans then
amount to different choices of ``Accept Boundary'' and ``Reject Boundary.''
Figure \ref{fig5.1.1} is a diagram representing a single sampling plan with
$n=6$ and $c=2$, Figure \ref{fig5.1.2} is a diagram representing a ``doubly
curtailed'' version of this plan (one that recognizes that there is no need to
continue inspection after lot disposal has been determined) and Figure
\ref{fig5.1.3} illustrates a double sampling plan in these terms.%
\begin{figure}
[ptb]
\begin{center}
\includegraphics[
height=2.9559in,
width=2.9213in
]%
{../CLASS/531/Notes/Nfig5-1-1.eps}%
\caption{Diagram for the $n=6$, $c=2$ Single Sampling Plan}%
\label{fig5.1.1}%
\end{center}
\end{figure}
\begin{figure}
[ptb]
\begin{center}
\includegraphics[
height=2.1525in,
width=2.9213in
]%
{../CLASS/531/Notes/Nfig5-1-2a.eps}%
\caption{Diagram for Doubly Curtailed $n=6$, $c=2$ Single Sampling Plan}%
\label{fig5.1.2}%
\end{center}
\end{figure}
\begin{figure}
[ptb]
\begin{center}
\includegraphics[
height=2.5918in,
width=2.9213in
]%
{../CLASS/531/Notes/Nfig5-1-3a.eps}%
\caption{Diagram for a Small Double Sampling Plan}%
\label{fig5.1.3}%
\end{center}
\end{figure}

Now on a diagram like those in the figures, one may very quickly count the
number of permissible paths from $(0,0)$ to a point in the grid by (working
left to right) marking each reachable point $(n,X_{n})$ in the grid with the
\textit{sum} of the path counts at $(n-1,X_{n}-1)$ and $(n-1,X_{n})$, omitting
from the sum any of those two points that is a ``stop-sampling point.'' (No
feasible paths leave a stop-sampling point, so path counts at such points do
not contribute to path counts for any points to their right.) Figure
\ref{fig5.1.4} is a version of Figure \ref{fig5.1.2} with permissible
movements through the $(n,X_{n})$ grid marked by arrows, and path counts
indicated.%
\begin{figure}
[ptb]
\begin{center}
\includegraphics[
height=2.1525in,
width=2.9213in
]%
{../CLASS/531/Notes/Nfig5-1-4.eps}%
\caption{Diagram for the Doubly Curtailed Single Sampling Plan with Path
Counts Indicated}%
\label{fig5.1.4}%
\end{center}
\end{figure}
The reason that one cares about the path counts is that for any stop-sampling
point $(n,X_{n})$, from perspective A
\[
P[\mbox{reaching}\ (n,X_{n})]=\left( \mbox{path count from (0,0)
to}\ (n,X_{n})\right) \frac{{\binom{N-n }{Np-X_{n}}}}{{\binom{N }{Np}}}\ ,
\]
while from perspective B
\[
P[\mbox{reaching}\ (n,X_{n})] = \left( \mbox{path count from (0,0)
to}\ (n,X_{n})\right) p^{X_{n}}(1-p)^{n-X_{n}}\ .
\]
And these probabilities of reaching the various stop-sampling points are the
fundamental building blocks of the standard statistical characterizations of
an acceptance sampling plan.
For example, with A and R respectively the acceptance and rejection
boundaries, the OC for an arbitrary fraction nonconforming plan is
\begin{equation}
Pa=\sum_{(n,X_{n})\in\mathrm{A}}P[\mbox{reaching}\ (n,X_{n})]\ .
\label{eq5.1.1}%
\end{equation}
And the mean number of items sampled (the Average Sample Number) is
\begin{equation}
ASN=\sum_{(n,X_{n})\in\mathrm{A}\cup\mathrm{R}}nP[\mbox{reaching}%
\ (n,X_{n})]\ . \label{eq5.1.2}%
\end{equation}
Further, under the rectifying inspection scenario, from perspective B
\begin{equation}
AOQ=\sum_{(n,X_{n})\in\mathrm{A}}(1-\frac{n}{N})pP[\mbox{reaching}%
\ (n,X_{n})]\ , \label{eq5.1.3}%
\end{equation}
from perspective A
\begin{equation}
AOQ=\sum_{(n,X_{n})\in\mathrm{A}}(p-\frac{X_{n}}{N})P[\mbox{reaching}%
\ (n,X_{n})]\ \label{eq5.1.4}%
\end{equation}
and
\begin{equation}
ATI=N(1-Pa)+\sum_{(n,X_{n})\in\mathrm{A}}nP[\mbox{reaching}\ (n,X_{n})]\ .
\label{eq5.1.5}%
\end{equation}
These formulas are conceptually very simple and quite universal. The fact that
specializing them to any particular choice of acceptance boundary and
rejection boundary might have been unpleasant when computations had to be done
``by hand'' is largely irrelevant in today's world of plentiful fast and cheap
computing. These simple formulas and a personal computer make completely
obsolete the many, many pages of specialized formulas that at one time filled
books on acceptance sampling.
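Indeed, the path-counting scheme and displays (\ref{eq5.1.1}), (\ref{eq5.1.2}) and (\ref{eq5.1.5}) take only a few lines of code. Below is a minimal Python sketch for the doubly curtailed $n=6$, $c=2$ plan of Figure \ref{fig5.1.2}, from perspective B. Since curtailment only stops sampling after lot disposal is already determined, this plan's OC must agree with the binomial OC of the uncurtailed $n=6$, $c=2$ single sampling plan, which provides a convenient check.

```python
from math import comb

def path_counts(n_max, accept, reject, start=(0, 0)):
    """Count permissible paths from `start` to each grid point (n, X_n),
    never extending a path out of a stop-sampling point."""
    stop = accept | reject
    cnt = {start: 1}
    for n in range(start[0] + 1, n_max + 1):
        for x in range(n + 1):
            c = sum(cnt.get(prev, 0)
                    for prev in ((n - 1, x - 1), (n - 1, x))
                    if prev not in stop)
            if c:
                cnt[(n, x)] = c
    return cnt

# stop-sampling boundaries of the doubly curtailed n = 6, c = 2 plan
ACCEPT = {(4, 0), (5, 1), (6, 2)}
REJECT = {(3, 3), (4, 3), (5, 3), (6, 3)}
CNT = path_counts(6, ACCEPT, REJECT)

def p_reach(point, p):
    """Perspective B probability of reaching a given stop-sampling point."""
    n, x = point
    return CNT[point] * p ** x * (1 - p) ** (n - x)

def oc(p):   # display (5.1.1)
    return sum(p_reach(pt, p) for pt in ACCEPT)

def asn(p):  # display (5.1.2)
    return sum(pt[0] * p_reach(pt, p) for pt in ACCEPT | REJECT)

def ati(p, N):  # display (5.1.5), rectifying inspection
    return N * (1 - oc(p)) + sum(pt[0] * p_reach(pt, p) for pt in ACCEPT)
```

Replacing `ACCEPT` and `REJECT` with any other pair of boundaries handles double sampling plans, Wald-type plans and so on with no new formulas.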

Two other matters of interest remain to be raised regarding this general
approach to fraction nonconforming acceptance sampling. The first concerns the
difficult mathematical question ``What are good shapes for the accept and
reject boundaries?'' We will talk a bit in the final section of this chapter
about criteria upon which various plans might be compared and allude to how
one might try to find a ``best'' plan (``best'' shapes for the acceptance and
rejection boundaries) according to such criteria. But at this point, we wish
only to note that Abraham Wald working in the 1940s on the problem of
sequential testing, developed some approximate theory that suggests that
parallel straight line boundaries (the acceptance boundary below the rejection
boundary) have some attractive properties. He was even able to provide some
approximate two-point design criteria. That is, in order to produce a plan
whose OC curve runs approximately through the points $(p_{1},Pa_{1})$ and
$(p_{2},Pa_{2})$ (for $p_{1}<p_{2}$ and $Pa_{1}>Pa_{2}$), Wald suggested linear
stop-sampling boundaries with
\begin{equation}
\mathit{slope}=\frac{\ln\left( \frac{1-p_{1}}{1-p_{2}}\right) }{\ln\left(
\frac{p_{2}(1-p_{1})}{p_{1}(1-p_{2})}\right) }\ . \label{eq5.1.6}%
\end{equation}
An appropriate $X_{n}$-intercept for the acceptance boundary is approximately
\begin{equation}
h_{\mathrm{A}}=\frac{\ln\left( \frac{Pa_{1}}{Pa_{2}}\right) }{\ln\left(
\frac{p_{2}(1-p_{1})}{p_{1}(1-p_{2})}\right) }\ , \label{eq5.1.7}%
\end{equation}
while an appropriate $X_{n}$-intercept for the rejection boundary is
approximately
\begin{equation}
h_{\mathrm{R}}=\frac{\ln\left( \frac{1-Pa_{2}}{1-Pa_{1}}\right) }{\ln\left(
\frac{p_{2}(1-p_{1})}{p_{1}(1-p_{2})}\right) }\ . \label{eq5.1.8}%
\end{equation}
Wald actually derived formulas (\ref{eq5.1.6}) through (\ref{eq5.1.8}) under
``infinite lot size'' assumptions (that also allowed him to produce some
approximations for both the OC and ASN of his plans). Where one is thinking of
applying Wald's boundaries in acceptance sampling of a real (finite $N$) lot,
the question of exactly how to truncate the sampling (close in the right side
of the ``continue sampling region'') must be answered in some sensible
fashion. And once that is done, the basic formulas (\ref{eq5.1.1}) through
(\ref{eq5.1.5}) are of course relevant to describing the resulting plan. (See
Problem 5.4 for an example of this kind of logic in action.)
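Displays (\ref{eq5.1.6}) through (\ref{eq5.1.8}) are direct to evaluate. A minimal Python sketch follows; the design points $p_{1}=.01$, $Pa_{1}=.95$, $p_{2}=.05$, $Pa_{2}=.10$ used in the accompanying check are hypothetical.

```python
from math import log

def wald_boundaries(p1, Pa1, p2, Pa2):
    """Slope and intercept quantities of displays (5.1.6)-(5.1.8); requires
    p1 < p2 and Pa1 > Pa2."""
    G = log(p2 * (1 - p1) / (p1 * (1 - p2)))
    slope = log((1 - p1) / (1 - p2)) / G
    h_A = log(Pa1 / Pa2) / G
    h_R = log((1 - Pa2) / (1 - Pa1)) / G
    return slope, h_A, h_R
```

For the hypothetical inputs above, the slope comes out near $.025$ (between $p_{1}$ and $p_{2}$, as it always is for Wald's boundaries), with $h_{\mathrm{A}}\approx1.36$ and $h_{\mathrm{R}}\approx1.75$.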
Finally, it is an interesting side-light here (that can come into play if one
wishes to estimate $p$ based on data from something other than a single
sampling plan) that provided the stop-sampling boundary has \textit{exactly}
one more point in it than the largest possible value of $n$, the uniformly
minimum variance unbiased estimator of $p$ for both type A and type B contexts
is (for $(n,X_{n})$ a stop-sampling point)
\[
\widehat{p}\left( (n,X_{n})\right) =\frac{\mbox{path count from (1,1)
to}\ (n,X_{n})}{\mbox{path count from (0,0) to}\ (n,X_{n})}\ .
\]
For example, Figure \ref{fig5.1.5} shows the path counts from (1,1) needed (in
conjunction with the path counts indicated in Figure \ref{fig5.1.4}) to find
the uniformly minimum variance unbiased estimator of $p$ when the doubly
curtailed single sampling plan of Figure \ref{fig5.1.4} is used.%
\begin{figure}
[ptb]
\begin{center}
\includegraphics[
height=1.5947in,
width=2.5598in
]%
{../CLASS/531/Notes/Nfig5-1-5.eps}%
\caption{Path Counts from $(1,1)$ to Stop Sampling Points for the Plan of
Figure \ref{fig5.1.4}}%
\label{fig5.1.5}%
\end{center}
\end{figure}
Table \ref{tab5.1.1} lists the values of $\widehat{p}$ for the 7 points in the
stop-sampling boundary for the doubly curtailed single sampling plan with
$n=6$ and $c=2$, along with the corresponding values of $X_{n}/n$ (the maximum
likelihood estimator of $p$).%
\begin{table}[tbp] \centering
\caption{The UMVUE and MLE of $p$ for the Doubly Curtailed
Single Sampling Plan}\label{tab5.1.1}
\begin{tabular}
[c]{ccc}%
Stop-sampling point $(n,X_{n})$ & UMVUE, $\widehat{p}$ & MLE, $X_{n}%
/n$\\\hline
$(3,3)$ & $1/1$ & $3/3$\\
$(4,0)$ & $0/1$ & $0/4$\\
$(4,3)$ & $2/3$ & $3/4$\\
$(5,1)$ & $1/4$ & $1/5$\\
$(5,3)$ & $3/6$ & $3/5$\\
$(6,2)$ & $4/10$ & $2/6$\\
$(6,3)$ & $4/10$ & $3/6$\\\hline
\end{tabular}
\end{table}
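The two sets of path counts needed for the UMVUE can be generated by the same marking scheme, run once from $(0,0)$ and once from $(1,1)$. A minimal Python sketch that reproduces the entries of Table \ref{tab5.1.1}:

```python
def path_counts(n_max, stop, start):
    """Path counts from `start` to each (n, X_n), never leaving a stop point."""
    cnt = {start: 1}
    for n in range(start[0] + 1, n_max + 1):
        for x in range(n + 1):
            c = sum(cnt.get(prev, 0)
                    for prev in ((n - 1, x - 1), (n - 1, x))
                    if prev not in stop)
            if c:
                cnt[(n, x)] = c
    return cnt

# stop-sampling boundary of the doubly curtailed n = 6, c = 2 plan
STOP = {(4, 0), (5, 1), (6, 2),          # accept boundary
        (3, 3), (4, 3), (5, 3), (6, 3)}  # reject boundary
FROM_00 = path_counts(6, STOP, (0, 0))
FROM_11 = path_counts(6, STOP, (1, 1))

def umvue(point):
    """UMVUE of p at a stop-sampling point: counts from (1,1) over (0,0)."""
    return FROM_11.get(point, 0) / FROM_00[point]
```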
\section{Imperfect Inspection and Acceptance Sampling}
The nominal statistical properties of sampling inspection procedures are
``perfect inspection'' properties. The OC formulas for the attributes plans in
\S8.1 and \S8.4 of V\&J and \S5.1 above are really premised on the ability to
tell with certainty whether an inspected item is conforming or nonconforming.
And the OC formulas for the variables plans in \S8.2 of V\&J are premised on
an assumption that the measurement $x$ that determines whether an item is
conforming or nonconforming can be obtained for a given item
\textit{completely without measurement error}. But the truth is that
real-world inspection is not perfect and the nominal statistical properties of
these methods at best approximate their actual properties. The purpose of this
section is to investigate (first in the attributes context and then in the
variables context) just how far actual OC values for common acceptance
sampling plans can be from nominal ones.

Consider first the percent defective context and suppose that when a
conforming (good) item is inspected, there is a probability $w_{\mathrm{G}}$
of misclassifying it as nonconforming. Similarly, suppose that when a
nonconforming (defective) item is inspected, there is a probability
$w_{\mathrm{D}}$ of misclassifying it as conforming. Then from perspective B,
a probabilistic description of any single inspected item is given in Table
\ref{tab5.2.1}, where in that table we are using the abbreviation
\[
p^{\ast}=w_{\mathrm{G}}(1-p)+p(1-w_{\mathrm{D}})
\]
for the probability that an item (of unspecified actual condition) is
classified as nonconforming by the inspection process.%
\begin{table}[tbp] \centering
\caption{Perspective B Description of a Single Inspection
Allowing for Inspection Error}\label{tab5.2.1}
\begin{tabular}
[c]{lcccc}
& & \multicolumn{2}{c}{Inspection Result} & \\
& & G & D & \\\cline{3-4}%
Actual & G & \multicolumn{1}{|c|}{$(1-w_{\mathrm{G}})(1-p)$} &
\multicolumn{1}{c|}{$w_{\mathrm{G}}(1-p)$} & $1-p$\\\cline{3-4}%
Condition & D & \multicolumn{1}{|c|}{$pw_{\mathrm{D}}$} &
\multicolumn{1}{c|}{$p(1-w_{\mathrm{D}})$} & $p$\\\cline{3-4}
& & $1-p^{\ast}$ & $p^{\ast}$ &
\end{tabular}
\end{table}

It should thus be obvious that from perspective B in the fraction
nonconforming context, an attributes single sampling plan with sample size $n
$ and acceptance number $c$ has an actual acceptance probability that depends
not only on $p$ but on $w_{\mathrm{G}}$ and $w_{\mathrm{D}}$ as well through
the formula
\begin{equation}
Pa(p,w_{\mathrm{G}},w_{\mathrm{D}})=\sum_{x=0}^{c}{\binom{n}{x}}\left(
p^{\ast}\right) ^{x}\left( 1-p^{\ast}\right) ^{n-x}\ . \label{eq5.2.1}%
\end{equation}
On the other hand, the perspective A version of the fraction nonconforming
scenario yields the following. For an integer $x$ from $0$ to $n$, let $U_{x}
$ and $V_{x}$ be independent random variables,
\[
U_{x}\sim\mathrm{Binomial}\,(x,1-w_{\mathrm{D}})\ \ \mbox{and}\ \ V_{x}%
\sim\mathrm{Binomial}\,(n-x,w_{\mathrm{G}})\ .
\]
And let
\[
r_{x}=P[U_{x}+V_{x}\leq c]
\]
be the probability that a sample containing $x$ nonconforming items actually
passes the lot acceptance criterion. (Note that the nonstandard distribution
of $U_{x}+V_{x}$ can be generated using the same ``adding on diagonals of a
table of joint probabilities'' idea used in \S1.7.1 to generate the
distribution of $\overline{x}$.) Then it is evident that from perspective A an
attributes single sampling plan with sample size $n$ and acceptance number $c$
has an actual acceptance probability
\begin{equation}
Pa(p,w_{\mathrm{G}},w_{\mathrm{D}})=\sum_{x=0}^{n}\frac{{\binom{Np}{x}}%
{\binom{N(1-p)}{n-x}}}{{\binom{N}{n}}}\,r_{x}\ . \label{eq5.2.2}%
\end{equation}
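The nonstandard distribution of $U_{x}+V_{x}$ is easy to obtain numerically by convolving the two binomial distributions, which makes display (\ref{eq5.2.2}) straightforward to evaluate. A minimal Python sketch; the plan values $N=50$, $n=10$, $c=1$ and fraction $p=.04$ (so $Np=2$) used in the accompanying check are hypothetical.

```python
from math import comb  # comb(n, k) conveniently returns 0 when k > n

def binom_pmf(j, n, w):
    return comb(n, j) * w ** j * (1 - w) ** (n - j)

def r_x(x, n, c, wG, wD):
    """P[U_x + V_x <= c] for U_x ~ Binomial(x, 1 - wD) independent of
    V_x ~ Binomial(n - x, wG): the probability that a sample containing
    x nonconforming items actually passes the lot acceptance criterion."""
    return sum(binom_pmf(u, x, 1 - wD) * binom_pmf(v, n - x, wG)
               for u in range(x + 1)
               for v in range(n - x + 1)
               if u + v <= c)

def pa_perspective_A(p, N, n, c, wG, wD):
    """Display (5.2.2): hypergeometric mixing of the r_x (Np an integer)."""
    D = round(N * p)
    return sum(comb(D, x) * comb(N - D, n - x) / comb(N, n)
               * r_x(x, n, c, wG, wD)
               for x in range(min(n, D) + 1))
```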

It is clear that nonzero $w_{\mathrm{G}}$ or $w_{\mathrm{D}}$ changes the
nominal OC's given in displays (8.6) and (8.5) of V\&J into the possibly more
realistic versions given respectively by equations (\ref{eq5.2.1}) and
(\ref{eq5.2.2}) here. In some cases, it may be possible to determine
$w_{\mathrm{G}}$ and $w_{\mathrm{D}}$ experimentally and therefore derive both
nominal and ``real'' OC curves for a fraction nonconforming single sampling
plan. Or, if one were \textit{a priori} willing to guarantee that $0\leq
w_{\mathrm{G}}\leq a$ and that $0\leq w_{\mathrm{D}}\leq b$, it is pretty
clear that from perspective B one might then at least guarantee that
\begin{equation}
Pa(p,a,0)\leq Pa(p,w_{\mathrm{G}},w_{\mathrm{D}})\leq Pa(p,0,b)
\label{eq5.2.3}%
\end{equation}
and have an ``OC band'' in which the real OC (that depends upon the unknown
inspection efficacy) is guaranteed to lie.
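Display (\ref{eq5.2.1}) and the band (\ref{eq5.2.3}) are one-liners to evaluate. A minimal Python sketch; the plan $n=50$, $c=2$ and the bounds $a=.02$, $b=.10$ used in the accompanying check are hypothetical.

```python
from math import comb

def pa_perspective_B(p, n, c, wG, wD):
    """Display (5.2.1): the binomial OC evaluated at p* = wG(1-p) + p(1-wD)."""
    p_star = wG * (1 - p) + p * (1 - wD)
    return sum(comb(n, x) * p_star ** x * (1 - p_star) ** (n - x)
               for x in range(c + 1))

def oc_band(p, n, c, a, b):
    """The bounds of display (5.2.3) valid when 0 <= wG <= a, 0 <= wD <= b."""
    return pa_perspective_B(p, n, c, a, 0.0), pa_perspective_B(p, n, c, 0.0, b)
```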

Similar analyses can be done for nonconformities per unit contexts as follows.
Suppose that during inspection of product, real nonconformities are missed
with probability $m$ and that (independent of the occurrence and inspection of
real nonconformities) ``phantom'' nonconformities are ``observed'' according
to a Poisson process with rate $\lambda_{\mathrm{P}}$ per unit inspected. Then
from perspective B in a nonconformities per unit context, the number of
nonconformities observed on $k$ units is Poisson with mean
\[
k(\lambda(1-m)+\lambda_{\mathrm{P}})\ ,
\]
so that an actual acceptance probability corresponding to the nominal one
given in display (8.8) of V\&J is
\begin{equation}
Pa(\lambda,\lambda_{\mathrm{P}},m)=\sum_{x=0}^{c}\frac{\exp\left(
-k(\lambda(1-m)+\lambda_{\mathrm{P}})\right) \left( k(\lambda(1-m)+\lambda
_{\mathrm{P}})\right) ^{x}}{x!}\ . \label{eq5.2.4}%
\end{equation}
And from perspective A, with a realized per unit defect rate $\lambda$ on $N$
units, let $U_{\lambda,m}\sim\mbox{Binomial}\,(N\lambda,\left( \frac{k}{N}\right)
(1-m))$ be independent of $V_{\lambda_{\mathrm{P}}}\sim
\mbox{Poisson}\,(k\lambda_{\mathrm{P}})$. Then an actual acceptance
probability corresponding to the nominal one given in display (8.7) of V\&J
is
\begin{equation}
Pa(\lambda,\lambda_{\mathrm{P}},m)=P[U_{\lambda,m}+V_{\lambda_{\mathrm{P}}%
}\leq c]\ . \label{eq5.2.5}%
\end{equation}
And the same kinds of bounding ideas used above for the fraction nonconforming
context might be used with the OC (\ref{eq5.2.4}) in the mean nonconformities
per unit context. Pretty clearly, if one could guarantee that $\lambda
_{\mathrm{P}}\leq a$ and that $m\leq b$, one would have (from display
(\ref{eq5.2.4}))
\begin{equation}
Pa(\lambda,a,0)\leq Pa(\lambda,\lambda_{\mathrm{P}},m)\leq Pa(\lambda,0,b)
\label{eq5.2.6}%
\end{equation}
in the perspective B situation.

The violence done to the OC notion by the possibility of imperfect inspection
in an attributes sampling context is serious, but not completely unmanageable.
That is, where one can determine the likelihood of inspection errors
experimentally, expressions (\ref{eq5.2.1}), (\ref{eq5.2.2}), (\ref{eq5.2.4})
and (\ref{eq5.2.5}) are simple enough characterizations of real OC's. And
where $w_{\mathrm{G}}$ and $w_{\mathrm{D}}$ (or $\lambda_{\mathrm{P}}$ and
$m$) are small, bounds like (\ref{eq5.2.3}) (or (\ref{eq5.2.6})) show that
both the nominal (the $w_{\mathrm{G}}=0$ and $w_{\mathrm{D}}=0$, or
$\lambda_{\mathrm{P}}=0$ and $m=0$ case) OC and real OC are trapped in a
fairly narrow band and cannot be too different. Unfortunately, the situation
is far less happy in the variables sampling context.

The origin of the difficulty with admitting there is measurement error when it
comes to variables acceptance sampling is the fundamental fact that standard
variables plans attempt to treat all $(\mu,\sigma)$ pairs with the same value
of $p$ equally. And in short, once one admits to the possibility of
measurement error clouding the evaluation of the quantity $x$ that must say
whether a given item is conforming or nonconforming, \textit{that goal is
unattainable}. For any level of measurement error, there are $(\mu,\sigma) $
pairs (with very small $\sigma$) for which product variation can so to speak
``hide in the measurement noise.'' So some fairly bizarre real OC properties
result for standard plans.

To illustrate, consider the case of ``unknown $\sigma$'' variables acceptance
sampling with a lower specification $L$, and adopt the basic measurement model
(2.1) of V\&J for what is actually observed when an item with characteristic
$x$ is measured. Now the development in \S8.2 of V\&J deals with a normal
$(\mu,\sigma)$ distribution for observations. An important issue is ``What
observations?'' Is it the $x$'s or the $y$'s of the model (2.1)? It must be
the $x$'s, for the simple reason that $p$ is defined in terms of $\mu$ and
$\sigma$. These parameters describe what the lot is really like, NOT what it
looks like when measured with error. That is, the $\sigma$ of \S8.2 of V\&J
must be the $\sigma_{x}$ of page 19 of V\&J. But then the analysis of \S8.2 is
done essentially supposing that one has at his or her disposal $\bar{x}$ and
$s_{x}$ to use for decision making purposes, \textit{while all that is really
available are} $\bar{y}$ \textit{and} $s_{y}$!!! And that turns out to make a
huge difference in the real OC properties of the standard method put forth in
\S8.2.

That is, applying criterion (8.35) of V\&J to what can really be observed
(namely the noise-corrupted $y$'s) one accepts a lot iff
\begin{equation}
\bar{y}-L\geq ks_{y}\ . \label{eq5.2.7}%
\end{equation}
And under model (2.1) of V\&J, a given set of parameters $(\mu_{x},\sigma
_{x})$ for the $x$ distribution has corresponding fraction nonconforming
\[
p(\mu_{x},\sigma_{x})=\Phi\left( \frac{L-\mu_{x}}{\sigma_{x}}\right)
\]
and acceptance probability
\begin{align*}
Pa(\mu_{x},\sigma_{x},\beta,\sigma_{\mathrm{measurement}}) & =P\left[
\frac{\bar{y}-L}{s_{y}}\geq k\right] \\
& =P\left( \frac{\frac{\bar{y}-\mu_{y}}{\sigma_{y}/\sqrt{n}}-\frac{L-\mu
_{y}}{\sigma_{y}/\sqrt{n}}}{\frac{s_{y}}{\sigma_{y}}}\geq k\sqrt{n}\right)
\end{align*}
where $\sigma_{y}$ is given in display (2.3) of V\&J. But then let
\begin{equation}
\Delta=-\frac{L-\mu_{y}}{\sigma_{y}/\sqrt{n}}=-\frac{(L-\mu_{x})/\sigma
_{x}-\beta/\sigma_{x}}{\sqrt{1+\frac{\sigma_{\mathrm{measurement}}^{2}}%
{\sigma_{x}^{2}}}/\sqrt{n}}\ , \label{eq5.2.8}%
\end{equation}
and note that
\[
\frac{\bar{y}-\mu_{y}}{\sigma_{y}/\sqrt{n}}\sim\mbox{Normal}\,(0,1)
\]
independent of $\frac{s_{y}}{\sigma_{y}}$, which has the distribution of
$\sqrt{U/(n-1)}$ for $U$ a $\chi_{n-1}^{2}$ random variable. That is, with $W
$ a noncentral $t$ random variable with noncentrality parameter $\Delta$ given
in display (\ref{eq5.2.8}), we have
\[
Pa(\mu_{x},\sigma_{x},\beta,\sigma_{\mathrm{measurement}})=P[W\geq k\sqrt
{n}]\ .
\]
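The noncentral $t$ representation just derived makes the real acceptance
probability directly computable. The following stdlib-only Python sketch
(the quadrature settings are ad hoc, and all numeric inputs in the usage
note below are hypothetical) evaluates $Pa=P[W\geq k\sqrt{n}]$ by integrating
the normal tail against the density of $s_{y}/\sigma_{y}$:

```python
# Sketch: Pa(mu_x, sigma_x, beta, sigma_meas) = P[W >= k*sqrt(n)] for W a
# noncentral t_{n-1}(Delta) variable, with Delta the noncentrality
# parameter defined above. Stdlib-only; quadrature settings are ad hoc.
import math
from statistics import NormalDist

def noncentral_t_sf(c, df, delta, grid=4000, s_max=8.0):
    """P[W >= c] for W = (Z + delta)/S, Z ~ N(0,1) independent of
    S = sqrt(chi2_df / df).  Integrates P(Z >= c*s - delta) against the
    density of S by the midpoint rule."""
    phi = NormalDist().cdf
    log_norm = (df / 2) * math.log(df) - (df / 2 - 1) * math.log(2) \
        - math.lgamma(df / 2)
    h = s_max / grid
    total = 0.0
    for i in range(grid):
        s = (i + 0.5) * h
        # log density of S at s
        log_f = log_norm + (df - 1) * math.log(s) - df * s * s / 2
        total += math.exp(log_f) * phi(delta - c * s) * h
    return total

def accept_prob(mu_x, sigma_x, L, n, k, beta=0.0, sigma_meas=0.0):
    """Real acceptance probability of the plan 'accept iff ybar - L >= k*s_y'."""
    # Delta = -(L - mu_y)/(sigma_y/sqrt(n)), written in terms of x-parameters
    delta = -((L - mu_x) / sigma_x - beta / sigma_x) / (
        math.sqrt(1.0 + (sigma_meas / sigma_x) ** 2) / math.sqrt(n))
    return noncentral_t_sf(k * math.sqrt(n), n - 1, delta)
```

With `sigma_meas = 0`, two $(\mu_{x},\sigma_{x})$ pairs sharing the same
$z=(L-\mu_{x})/\sigma_{x}$ (for example $(12,1)$ and $(10.2,.1)$ with $L=10$)
give identical $Pa$; with `sigma_meas > 0` they do not, which is exactly the
band pictured in Figure \ref{fig5.2.1}.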
And the crux of the matter is that (even if the measurement bias $\beta$ is
$0$) $\Delta$ in display (\ref{eq5.2.8}) is not a function of
$(L-\mu_{x})/\sigma_{x}$ alone \textit{unless one assumes that} $\sigma
_{\mathrm{measurement}}$ \textit{is EXACTLY} $0$.

Even with no measurement bias, if $\sigma_{\mathrm{measurement}}\neq0$ there
are $(\mu_{x},\sigma_{x})$ pairs with
\[
\frac{L-\mu_{x}}{\sigma_{x}}=z
\]
(and therefore $p=\Phi(z)$) and $\Delta$ ranging all the way from $-z\sqrt{n}$
to $0$. Thus considering $z\leq0$ and $p\leq.5$ there are corresponding $Pa$'s
ranging from
\[
P[\mbox{a }t_{n-1}~\mbox{random variable }\geq k\sqrt{n}]
\]
to
\[
P[\mbox{a non-central }t_{n-1}(-z\sqrt{n})\mbox{ random variable }\geq
k\sqrt{n}]\ ,
\]
(the nominal OC), while considering $z\geq0$ and $p\geq.5$ there are
corresponding $Pa$'s ranging from (the nominal OC)
\[
P[\mbox{a non-central }t_{n-1}(-z\sqrt{n})\mbox{ random variable }\geq
k\sqrt{n}]\ ,
\]
to
\[
P[\mbox{a }t_{n-1}\mbox{ random variable }\geq k\sqrt{n}]\ .
\]
That is, one is confronted with the extremely unpleasant and (initially
counter-intuitive) picture of real OC indicated in Figure \ref{fig5.2.1}.%
%TCIMACRO{\FRAME{ftbpFU}{2.9334in}{1.6544in}{0pt}{\Qcb{Typical Real OC for a
%One-Sided Variables Acceptance Sampling Plan in the Presence of Nonzero
%Measurement Error}}{\Qlb{fig5.2.1}}{nfig5-2-1.eps}%
%{\special{ language "Scientific Word"; type "GRAPHIC";
%maintain-aspect-ratio TRUE; display "USEDEF"; valid_file "F";
%width 2.9334in; height 1.6544in; depth 0pt; original-width 6.4558in;
%original-height 3.614in; cropleft "0"; croptop "1"; cropright "1";
%cropbottom "0";
%filename '../CLASS/531/Notes/Nfig5-2-1.eps';file-properties "XNPEU";}} }%
%BeginExpansion
\begin{figure}
[ptb]
\begin{center}
\includegraphics[
height=1.6544in,
width=2.9334in
]%
{../CLASS/531/Notes/Nfig5-2-1.eps}%
\caption{Typical Real OC for a One-Sided Variables Acceptance Sampling Plan in
the Presence of Nonzero Measurement Error}%
\label{fig5.2.1}%
\end{center}
\end{figure}
%EndExpansion

It is important to understand the picture painted in Figure \ref{fig5.2.1}.
The situation is worse than in the attributes data case. There, if one knows
the efficacy of the inspection methodology, it is at least possible to pick a
single appropriate OC curve. (The OC ``bands'' indicated by displays
(\ref{eq5.2.3}) and (\ref{eq5.2.6}) are created only by ignorance of
inspection efficacy.) The bizarre ``OC bands'' created in the variables
context (and sketched in Figure \ref{fig5.2.1}) do not reduce to curves if one
knows the inspection bias and precision, but rather are intrinsic to the fact
that unless $\sigma_{\mathrm{measurement}}$ is exactly 0, different
$(\mu,\sigma)$ pairs with the same $p$ must have different $Pa$'s under
acceptance criterion (\ref{eq5.2.7}). And the only way that one can replace
the situation pictured in Figure \ref{fig5.2.1} with one having a thinner and
more palatable OC band (something approximating a ``curve'') is by
guaranteeing that
\[
\frac{\sigma_{x}^{2}}{\sigma_{\mathrm{measurement}}^{2}}%
\]
is of some appreciable size. That is, given a particular measurement
precision, one must agree to concern oneself only with cases where product
variation cannot hide in measurement noise. Such is the only way that one can
even come close to the variables sampling goal of treating $(\mu,\sigma)$
pairs with the same $p$ equally.
\section{Some Details Concerning the Economic Analysis of Sampling Inspection}
Section 8.5 of V\&J alludes briefly to the possibility of using
economic/decision-theoretic arguments in the choice of sampling inspection
schemes and cites the 1994 \textit{Technometrics} paper of Vander Wiel and
Vardeman. Our first objective in this section is to provide some additional
details of the Vander Wiel and Vardeman analysis. To that end, consider a
stable process fraction nonconforming situation and continue the
$w_{\mathrm{G}}$ and $w_{\mathrm{D}}$ notation used above (and also introduced
on page 493 of V\&J). Note that Table \ref{tab5.2.1} remains an appropriate
description of the results of a single inspection. We will suppose that
inspection costs are accrued on a per item basis and adopt the notation of
Table 8.16 of V\&J for the costs.

As a vehicle to a very quick demonstration of the famous ``all or none''
principle, consider facing $N$ potential inspections and employing a ``random
inspection policy'' that inspects each item independently with probability
$\pi$. Then the mean cost suffered over $N$ items is simply $N$ times that
suffered for $1$ item. And this is
\begin{align}
\text{E}\mathit{Cost} & =\pi\left( k_{\mathrm{I}}+(1-p)w_{\mathrm{G}%
}k_{\mathrm{GF}}+p(1-w_{\mathrm{D}})k_{\mathrm{DF}}+pw_{\mathrm{D}%
}k_{\mathrm{DP}}\right) +(1-\pi)pk_{\mathrm{DU}}\nonumber\\
& =\pi(k_{\mathrm{I}}+w_{\mathrm{G}}k_{\mathrm{GF}}-pK)+pk_{\mathrm{DU}}
\label{eq5.3.1}%
\end{align}
for
\[
K=(1-w_{\mathrm{D}})(k_{\mathrm{DU}}-k_{\mathrm{DF}})+w_{\mathrm{D}%
}(k_{\mathrm{DU}}-k_{\mathrm{DP}})+w_{\mathrm{G}}k_{\mathrm{GF}}%
\]
(as in display (8.50) of V\&J). Now it is clear from display (\ref{eq5.3.1})
that if $K<0$, E$\mathit{Cost}$ is minimized over choices of $\pi$ by the
choice $\pi=0$. On the other hand, if $K>0$, E$\mathit{Cost}$ is minimized
over choices of $\pi$
\[
\mbox{by the choice }\pi=0\mbox{ if }p\leq\frac{k_{\mathrm{I}}+w_{\mathrm{G}%
}k_{\mathrm{GF}}}{K}%
\]
and
\[
\mbox{by the choice }\pi=1\mbox{ if }p\geq\frac{k_{\mathrm{I}}+w_{\mathrm{G}%
}k_{\mathrm{GF}}}{K}\ .
\]
That is, if one defines
\[
p_{\mathrm{c}}=\left\{
\begin{array}
[c]{lll}%
\infty & \mbox{if} & K\leq0\\
\frac{k_{\mathrm{I}}+w_{\mathrm{G}}k_{\mathrm{GF}}}{K} & \mbox{if} & K>0
\end{array}
\right.
\]
then an optimal random inspection policy is clearly
\[
\pi=0\mbox{ (do no inspection) if }p<p_{\mathrm{c}}\mbox{ and }\pi
=1\mbox{ (do all inspection) if }p>p_{\mathrm{c}}\ .
\]
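The ``all or none'' logic is easy to verify numerically. The sketch below
(all cost values supplied by a user are hypothetical) codes the per-item
expected cost of display (\ref{eq5.3.1}) and the breakeven fraction
$p_{\mathrm{c}}$:

```python
# Sketch (hypothetical cost values): per-item expected cost of a random
# inspection policy, display (5.3.1), and the breakeven fraction p_c.
def K_const(kDF, kDP, kDU, wG, wD, kGF):
    # K as in display (8.50) of V&J
    return (1 - wD) * (kDU - kDF) + wD * (kDU - kDP) + wG * kGF

def expected_cost(pi, p, kI, kGF, kDF, kDP, kDU, wG, wD):
    # pi*(kI + wG*kGF - p*K) + p*kDU, the per-item mean cost
    K = K_const(kDF, kDP, kDU, wG, wD, kGF)
    return pi * (kI + wG * kGF - p * K) + p * kDU

def p_critical(kI, kGF, kDF, kDP, kDU, wG, wD):
    # infinite when K <= 0 (then "none" is always optimal)
    K = K_const(kDF, kDP, kDU, wG, wD, kGF)
    return float("inf") if K <= 0 else (kI + wG * kGF) / K
```

Since the expected cost is linear in $\pi$, its minimum over $\pi\in[0,1]$ is
always attained at an endpoint: $\pi=0$ when $p<p_{\mathrm{c}}$ and $\pi=1$
when $p>p_{\mathrm{c}}$.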

This development is simple and completely typical of what one gets from
economic analyses of stable process (perspective B) inspection scenarios.
Where quality is poor, all items should be inspected, and where it is good
none should be inspected. Vander Wiel and Vardeman argue that the specific
criterion developed here (and phrased in terms of $p_{\mathrm{c}}$) holds not
only as one looks for an optimal \textit{random inspection policy}, but
completely generally as one looks \textit{among all possible inspection
policies} for one that minimizes expected total cost. But it is essential to
remember that the context is a stable process/perspective B context, where
costs are accrued on a per item basis, and in order to implement the optimal
policy \textit{one must know} $p$! In other contexts, the best (minimum
expected cost) implementable/realizable policy will often turn out to not be
of the ``all or none'' variety. The remainder of this section will elaborate
on this assertion.

For the balance of the section we will consider Barlow's formulation of what
we'll call the ``Deming Inspection Problem'' (as Deming's consideration of
this problem rekindled interest in these matters and engendered considerable
controversy and confusion in the 1980s and early 1990s). That is, we'll
consider a lot of $N$ items, assume a cost structure where
\[
k_{1}=\mbox{the cost of inspecting one item (at the proposed inspection site)}%
\]
and
\[
k_{2}=\mbox{the cost of later grief caused by a defective item that is not
detected}%
\]
and suppose that inspection is without error. (This is the Vander Wiel and
Vardeman cost structure with $k_{\mathrm{I}}=k_{1},k_{\mathrm{DF}}=0$ and
$k_{\mathrm{DU}}=k_{2}$, where both $w_{\mathrm{G}}$ and $w_{\mathrm{D}}$ are
assumed to be 0.) The objective will be optimal (minimum expected cost) choice
of a ``fixed $n$ inspection plan'' (in the language of \S8.1 of V\&J, a single
sampling with rectification plan). That is, we'll consider the optimal choice
of $n$ and $c$ supposing that with
\[
X=\mbox{the number nonconforming in a sample of }n\ ,
\]
if $X\leq c$ the lot will be ``accepted'' (all nonconforming items in the
sample will be replaced with good ones and no more inspection will be done),
while if $X>c$ the lot will be ``rejected'' (all items in the lot will be
inspected and all nonconforming items replaced with good ones). (The implicit
assumption here is that replacements for nonconforming items are somehow known
to be conforming and are produced ``for free.'') And we will continue use of
the stable process or perspective B model for the generation of the items in
the lot.

In this problem, the expected total cost associated with the lot is a function
of $n$, $c$ and $p$,
\begin{align}
\text{ETC}(n,c,p) & =k_{1}n+(1-Pa(n,c,p))k_{1}(N-n)+pPa(n,c,p)k_{2}%
(N-n)\nonumber\\
& =k_{1}N\left( 1+Pa(n,c,p)\left( 1-\frac{n}{N}\right) \left(
p\frac{k_{2}}{k_{1}}-1\right) \right) \ . \label{eq5.3.2}%
\end{align}
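The rearrangement in display (\ref{eq5.3.2}) is purely algebraic and easy to
check numerically. In this sketch $Pa$ is left as an abstract number in
$[0,1]$ and all numeric inputs are illustrative:

```python
# Sketch: the two forms of ETC(n, c, p) in display (5.3.2) agree for any
# acceptance probability Pa in [0, 1]; numbers used are illustrative only.
def etc_direct(n, Pa, p, N, k1, k2):
    # sampling cost + full-inspection cost on rejection + later grief on acceptance
    return k1 * n + (1 - Pa) * k1 * (N - n) + p * Pa * k2 * (N - n)

def etc_factored(n, Pa, p, N, k1, k2):
    # the factored form k1*N*(1 + Pa*(1 - n/N)*(p*k2/k1 - 1))
    return k1 * N * (1 + Pa * (1 - n / N) * (p * k2 / k1 - 1))
```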
Optimal choice of $n$ and $c$ requires that one be in the business of
comparing the functions of $p$ defined in display (\ref{eq5.3.2}). How one
approaches that comparison depends upon what one is willing to input into the
decision process in terms of information about $p$.

First, if $p$ is fixed/known and available for use in choosing $n$ and $c$,
the optimization of criterion (\ref{eq5.3.2}) is completely straightforward.
It amounts only to the comparison of numbers (one for each $(n,c)$ pair), not
functions. And the solution is quite simple. In the case that $p>k_{1}/k_{2}$,
$\left( p\frac{k_{2}}{k_{1}}-1\right) >0$ and from examination of display
(\ref{eq5.3.2}) minimum expected total cost will be achieved if $Pa(n,c,p)=0$
or if $\left( 1-\frac{n}{N}\right) =0$. That is, ``all'' is optimal. In the
case that $p<k_{1}/k_{2}$, $\left( p\frac{k_{2}}{k_{1}}-1\right) <0$ and
minimum expected total cost will be achieved when $Pa(n,c,p)=1$ and $n=0$.
That is, ``none'' is optimal.

Second, if $p$ is not known but one is willing to describe it with a (prior)
distribution $G$, one may instead seek to minimize E$_{G}$ETC$(n,c,p)$, the
mean of criterion (\ref{eq5.3.2}) under $G$. For a given $n$, conditioning on
the observed value of $X$ shows that the lot should be rejected (the balance
of the lot inspected) exactly when
\[
\text{E}_{G}[p\,|X=x]>\frac{k_{1}}{k_{2}}\ .
\]
So, an optimal choice of $c$ is
\begin{equation}
c_{G}^{\mathrm{opt}}(n)=\max\left\{ x\mid\text{E}_{G}[p\mid X=x]\leq
\frac{k_{1}}{k_{2}}\right\} \ . \label{eq5.3.3}%
\end{equation}
(And it is perhaps comforting to know that the monotone likelihood ratio
property of the binomial distribution guarantees that E$_{G}[p\,|X=x]$ is
monotone in $x$.)

What is this saying? The assumptions 1) that $p\sim G$ and 2) that conditional
on $p$ the variable $X\sim$ Binomial $(n,p)$ together give a joint
distribution for $p$ and $X$. This in turn can be used to produce for each $x$
a conditional distribution of $p|X=x$ and therefore a conditional mean value
of $p$ given that $X=x$. The prescription (\ref{eq5.3.3}) says that one should
find the largest $x$ for which that conditional mean value of $p$ is no
larger than the critical cost ratio, and use that value for
$c_{G}^{\mathrm{opt}}(n)$. To complete the optimization of
E$_{G}$ETC$(n,c,p)$, one would then need to compute and compare (for various
$n$) the quantities
\begin{equation}
\text{E}_{G}\text{ETC}(n,c_{G}^{\mathrm{opt}}(n),p)\ . \label{eq5.3.4}%
\end{equation}
The fact is that depending upon the nature of $G$, the minimizer of quantity
(\ref{eq5.3.4}) can turn out to be anything from 0 to $N$. For example, if $G
$ puts all its probability on one side or the other of $k_{1}/k_{2}$, then the
conditional distributions of $p$ given $X=x$ must concentrate all their
probability (and therefore have their means) on that same side of the critical
cost ratio. So it follows that if $G$ puts all its probability to the left of
$k_{1}/k_{2}$, ``none'' is optimal (even though one doesn't know $p$ exactly),
while if $G$ puts all its probability to the right of $k_{1}/k_{2}$, ``all''
is optimal in terms of optimizing E$_{G}$ETC$(n,c,p)$.

On the other hand, consider an unrealistic but instructive situation where
$k_{1}=1,\,k_{2}=1000$ and $G$ places probability $\frac{1}{2}$ on the
possibility that $p=0$ and probability $\frac{1}{2}$ on the possibility that
$p=1$. Under this model the lot is either perfectly good or perfectly bad, and
\textit{a priori} one thinks these possibilities are equally likely. Here the
distribution $G$ places probability on both sides of the breakeven quantity
$k_{1}/k_{2}=.001$. Even without actually carrying through the whole
mathematical analysis, it should be clear that in this scenario the optimal
$n$ is $1$! Once one has inspected a single item, he or she knows for sure
whether $p$ is $0$ or is $1$ (and the lot can be rectified in the latter case).
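Brute force confirms this. Here is a minimal sketch of the two-point-prior
example (the lot size used in the usage note is illustrative; $c=-1$ encodes
the plan that rejects, i.e. fully inspects, every lot):

```python
# Sketch of the two-point prior example: p is 0 or 1 with probability 1/2
# each, inspection is error-free, k1 = 1 and k2 = 1000.
def etc(n, c, p, N, k1=1.0, k2=1000.0):
    # X = n*p exactly, since p is 0 or 1 and inspection is error-free
    Pa = 1.0 if n * p <= c else 0.0
    return k1 * n + (1 - Pa) * k1 * (N - n) + p * Pa * k2 * (N - n)

def mean_etc(n, c, N):
    # expected total cost under the 50/50 prior on p in {0, 1}
    return 0.5 * etc(n, c, 0, N) + 0.5 * etc(n, c, 1, N)

def best_plan(N):
    # c = -1 encodes "reject no matter what" (inspect the whole lot)
    return min(((n, c) for n in range(N + 1) for c in range(-1, n + 1)),
               key=lambda nc: mean_etc(nc[0], nc[1], N))
```

For, say, $N=50$ the enumeration returns $(n,c)=(1,0)$ with expected cost
$(N+1)/2$: a single observation settles whether the lot is perfect or must be
rectified.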

The most common mathematically nontrivial version of this whole analysis of
the Deming Inspection Problem is the case where $G$ is a Beta distribution. If
$G$ is the Beta$(\alpha,\beta)$ distribution,
\[
\text{E}_{G}[p\,|X=x]=\frac{\alpha+x}{\alpha+\beta+n}%
\]
so that $c_{G}^{\mathrm{opt}}(n)$ is the largest value of $x$ such that
\[
\frac{\alpha+x}{\alpha+\beta+n}\leq\frac{k_{1}}{k_{2}}\ .
\]
That is, in this situation, for $\lfloor y\rfloor$ the greatest integer in
$y$,
\[
c_{G}^{\mathrm{opt}}(n)=\lfloor\frac{k_{1}}{k_{2}}(\alpha+\beta+n)-\alpha
\rfloor=\lfloor\frac{k_{1}}{k_{2}}n-\alpha+\frac{k_{1}}{k_{2}}(\alpha
+\beta)\rfloor\ ,
\]
which for large $n$ is essentially $\frac{k_{1}}{k_{2}}n$. The optimal value
of $n$ can then be found by optimizing (over choice of $n$) the quantity
\[
\text{E}_{G}\left( \text{ETC}(n,c_{G}^{\mathrm{opt}}(n),p)\right) =\int
_{0}^{1}\text{ETC}(n,c_{G}^{\mathrm{opt}}(n),p)\frac{1}{B(\alpha,\beta
)}p^{\alpha-1}(1-p)^{\beta-1}dp\ .
\]
The reader can check that this exercise boils down to the minimization over
$n$ of
\[
\left( 1-\frac{n}{N}\right) \sum_{x=0}^{c_{G}^{\mathrm{opt}}(n)}{\binom
{n}{x}}\int_{0}^{1}p^{x}(1-p)^{n-x}\left( p\frac{k_{2}}{k_{1}}-1\right)
p^{\alpha-1}(1-p)^{\beta-1}dp\ .
\]
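Under the Beta prior all of the pieces above are available in closed form, so
the minimization the reader is asked to check can be carried out by direct
enumeration. In this sketch every numeric parameter value is illustrative,
and the floor formula for $c_{G}^{\mathrm{opt}}(n)$ is used directly:

```python
# Sketch: optimal fixed-n plan for the Deming inspection problem under a
# Beta(a, b) prior on p. All numeric parameter values are illustrative.
import math

def c_opt(n, a, b, k1, k2):
    # floor formula: largest x with posterior mean (a + x)/(a + b + n) <= k1/k2
    return math.floor((k1 / k2) * (a + b + n) - a)

def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def objective(n, N, a, b, k1, k2):
    """(1 - n/N) * sum_{x=0}^{c_opt(n)} C(n,x) *
    Int_0^1 p^x (1-p)^(n-x) (p*k2/k1 - 1) p^(a-1) (1-p)^(b-1) dp,
    the quantity to be minimized over n (Beta integrals in closed form)."""
    c = min(n, c_opt(n, a, b, k1, k2))
    total = 0.0
    for x in range(c + 1):  # empty when c < 0 ("all" inspection)
        total += math.comb(n, x) * (
            (k2 / k1) * math.exp(log_beta(a + x + 1, b + n - x))
            - math.exp(log_beta(a + x, b + n - x)))
    return (1 - n / N) * total

def best_n(N, a, b, k1, k2):
    return min(range(N + 1), key=lambda n: objective(n, N, a, b, k1, k2))
```

Each summand corresponds to an $x$ whose posterior mean is at most
$k_{1}/k_{2}$ and so is nonpositive; the enumeration trades these
(favorable) terms against the shrinking factor $(1-n/N)$.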
(The SAMPLE program of Lorenzen alluded to earlier actually uses a different
approach than the one discussed here to find optimal plans. That approach is
computationally more efficient, but not as illuminating in terms of laying
bare the basic structure of the problem as the route taken in this exposition.)

As two final pieces of perspective on this topic of economic analysis of
sampling inspection we offer the following. In the first place, while the
Deming Inspection Problem is not a terribly general formulation of the topic,
the results here are typical of how things turn out. Second, it needs to be
remembered that what has been described here is the finding of a cost-optimal
\textit{fixed} $n$ inspection plan. The problem of finding a plan optimal
among all possible plans (of the type discussed in \S5.1) is a more
challenging one. For $G$ placing probability on both sides of the critical
cost ratio, not only need it not be the case that ``all'' or ``none'' is
optimal, but in general an optimal plan need not be of the fixed $n$ variety.
While in principle the methodology for finding an overall best inspection plan
is well-established (involving as it does so called ``dynamic programming'' or
``backwards induction'') the details are unpleasant enough that it will not
make sense to pursue this matter further.
\chapter{Problems}
\baselineskip.2in
\renewcommand{\thesection}{\arabic{section}} \setcounter{section}{0}
\section{Measurement and Statistics}
\renewcommand{\labelenumi}{1.\arabic{enumi}.}
\begin{enumerate}
\item Suppose that a sample variance $s^{2}$ is based on a sample of size $n$
from a normal distribution. One might consider estimating $\sigma$ using $s$
or $s/c_{4}(n)$, or even some other multiple of $s$.
\begin{enumerate}
\item Since $c_{4}(n)<1$, the second of these estimators has a larger variance
than the first. But the second is unbiased (has expected value $\sigma$) while
the first is not. Which has the smaller mean squared error, E$(\widehat
{\sigma}-\sigma)^{2}$? Note that (as is standard in statistical theory),
E$(\widehat{\sigma}-\sigma)^{2}=$Var$\ \widehat{\sigma}+($E$\widehat{\sigma
}-\sigma)^{2}$. (Mean squared error is variance plus squared bias.)
\item What is an optimal (in terms of minimum mean squared error) multiple of
$s$ to use in estimating $\sigma$?
\end{enumerate}
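For numerical work on this problem (and the two that follow), $c_{4}(n)$ can
be computed from the standard closed form
$c_{4}(n)=\sqrt{2/(n-1)}\,\Gamma(n/2)/\Gamma((n-1)/2)$ (a standard fact not
restated in these notes). The helper below for comparing multiples of $s$ is
only a computational scaffold, not a solution:

```python
# c4(n) = sqrt(2/(n-1)) * Gamma(n/2) / Gamma((n-1)/2), so that
# E s = c4(n) * sigma for a sample of size n from a normal distribution.
import math

def c4(n):
    return math.sqrt(2.0 / (n - 1)) * math.exp(
        math.lgamma(n / 2) - math.lgamma((n - 1) / 2))

def mse_of_multiple(a, n, sigma=1.0):
    """Mean squared error of a*s as an estimator of sigma:
    Var(a*s) + (E(a*s) - sigma)^2, using E s = c4*sigma and E s^2 = sigma^2."""
    c = c4(n)
    var = a * a * (1 - c * c) * sigma * sigma  # Var s = (1 - c4^2) sigma^2
    bias = (a * c - 1) * sigma
    return var + bias * bias
```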
\item How do $R/d_{2}(n)$ and $s/c_{4}(n)$ compare (in terms of mean squared
error) as estimators of $\sigma$? (The assumption here is that they are both
based on a sample from a normal distribution. See Problem 1.1 for a definition
of mean squared error.)
\item Suppose that sample variances $s_{i}^{2}$, $i=1,2,\ldots,r$ are based on
independent samples of size $m$ from normal distributions with a common
standard deviation, $\sigma$. A common SQC-inspired estimator of $\sigma$ is
$\bar{s}/c_{4}(m)$. Another possibility is
\[
s_{\mathrm{pooled}}=\sqrt{\left( \frac{s_{1}^{2}+\cdots+s_{r}^{2}}{r}\right)
}%
\]
or
\[
\hat{\sigma}=s_{\mathrm{pooled}}/c_{4}((m-1)r+1)\ .
\]
Standard distribution theory says that $r(m-1)s_{\mathrm{pooled}}^{2}%
/\sigma^{2}$ has a $\chi^{2}$ distribution with $r(m-1)$ degrees of freedom.
\begin{enumerate}
\item Compare $\bar{s}/c_{4}(m)$, $s_{\mathrm{pooled}}$ and $\hat{\sigma}$ in
terms of mean squared error.
\item What is an optimal multiple of $s_{\mathrm{pooled}}$ (in terms of mean
squared error) to use in estimating $\sigma$?\bigskip
\end{enumerate}
(Note: See Vardeman (1999 \textit{IIE Transactions}) for a complete treatment
of the issues raised in Problems 1.1 through 1.3.)\bigskip
\item Set up a double integral that gives the probability that the sample
range of $n$ standard normal random variables is between .5 and 2.0. How is
this probability related to the probability that the sample range of $n$ iid
normal $(\mu,\sigma^{2})$ random variables is between .5$\sigma$ and
2.0$\sigma$?
\item It is often helpful to state ``standard errors'' (estimated standard
deviations) corresponding to point estimates of quantities of interest. In a
context where a standard deviation, $\sigma$, is to be estimated by $\bar
{R}/d_{2}(n)$ based on $r$ samples of size $n$, what is a reasonable standard
error to announce? (Be sure that your answer is computable from sample data,
i.e. doesn't involve any unknown process parameters.)
\item Consider the paper weight data in Problem (2.12) of V\&J. Assume that
the 2-way random effects model is appropriate and do the following.
\begin{enumerate}
\item Compute the $\bar{y}_{ij},\ s_{ij}$ and $R_{ij}$ for all $I\times
J=2\times5=10$ Piece$\times$Operator combinations. Then compute both row
ranges of means $\Delta_{i}$ and row sample variances of means $s_{i}^{2}$.
\item Find both range-based and sample variance-based point estimates of the
repeatability standard deviation, $\sigma$.
\item Find both range-based and sample variance-based point estimates of the
reproducibility standard deviation $\sigma_{\mathrm{reproducibility}}%
=\sqrt{\sigma_{\beta}^{2}+\sigma_{\alpha\beta}^{2}}$\ .
\item Get a statistical package to give you the 2-way ANOVA table for these
data. Verify that $s_{\mathrm{pooled}}^{2}=MSE$ and that your sample
variance-based estimate of $\sigma_{\mathrm{reproducibility}}$ from part (c)
is
\[
\sqrt{\max\left( 0,\frac{1}{mI}MSB+\frac{I-1}{mI}MSAB-\frac{1}{m}MSE\right)
}\ .
\]
\item Find a 90\% two-sided confidence interval for the parameter $\sigma$.
\item Use the material in \S1.5 and give an approximate 90\% two-sided
confidence interval for $\sigma_{\mathrm{reproducibility}}$.
\item Find a linear combination of the mean squares from (d) whose expected
value is $\sigma_{\mathrm{overall}}^{2}=\sigma_{\mathrm{reproducibility}}%
^{2}+\sigma^{2}$. All the coefficients in your linear combination will be
positive. In this case, you may use the next-to-last paragraph of \S1.5 to
come up with an approximate 90\% two-sided confidence interval for
$\sigma_{\mathrm{overall}}$. Do so.
\item The problem from which the paper weight data are drawn indicates that
specifications of approximately $\pm4$g/m$^{2}$ are common for paper of the
type used in this gage study. These translate to specifications of about
$\pm.16$g for pieces of paper of the size used here. Use these specifications
and your answer to part (g) to make an approximate 90\% confidence interval
for the gage capability ratio
\[
GCR=\frac{6\sigma_{\mathrm{overall}}}{(U-L)}\ .
\]
Used in the way it was in this study, does the scale seem adequate to check
conformance to such specifications?
\item Give (any sensible) point estimates of the fractions of the overall
measurement variance attributable to repeatability and to reproducibility.
\end{enumerate}
\item In a particular (real) thorium detection problem, measurement variation
for a particular (spectral absorption) instrument was thought to be about
$\sigma_{\mathrm{measurement}}=.002$ instrument units. (Division of a
measurement expressed in instrument units by 58.2 gave values in g/l.) Suppose
that in an environmental study, a field sample is to be measured once
(producing $y_{\mathrm{new}}$) on this instrument and the result is to be
compared to a (contemporaneous) measurement of a lab ``blank'' (producing
$y_{\mathrm{old}}$). \ If the field reading exceeds the blank reading by too
much, there will be a declaration that there is a detectable excess amount of
thorium present.
\begin{enumerate}
\item Assuming that measurements are normal, find a critical value
$L_{\mathrm{c}}$ so that the lab will run no more than a 5\% chance of a
``false positive'' result.
\item Based on your answer to (a), what is a ``lower limit of detection,''
$L_{\mathrm{d}}$, for a 90\% probability ($\gamma$) of correctly detecting
excess thorium? What, by the way, is this limit in terms of g/l?
\end{enumerate}
\item Below are $4$ hypothetical samples of size $n=3$. A little calculation
shows that ignoring the fact that there are 4 samples and simply computing
``$s$'' based on 12 observations will produce a ``standard deviation'' much
larger than $s_{\mathrm{pooled}}$. Why is this?

3,6,5 \qquad4,3,1 \qquad8,9,6 \qquad2,1,4
\item In applying ANOVA methods to gage R\&R studies, one often uses linear
combinations of independent mean squares as estimators of their expected
values. Section 1.5 of these notes shows it is possible to also produce
standard errors (estimated standard deviations) for these linear combinations.
Suppose that $MS_{1},MS_{2},\ldots,MS_{k}$ are independent random variables,
$\displaystyle\frac{\nu_{i}MS_{i}}{\mathrm{E}MS_{i}}\sim\chi_{\nu_{i}}^{2}$.
Consider the random variable
\[
U=c_{1}MS_{1}+c_{2}MS_{2}+\cdots+c_{k}MS_{k}\ .
\]
\begin{enumerate}
\item Find the standard deviation of $U$.
\item Your expression from (a) should involve the means E$MS_{i}$, that in
applications will be unknown. Propose a sensible (data-based) estimator of the
standard deviation of $U$ that does not involve these quantities.
\item Apply your result from (b) to give a sensible standard error for the
ANOVA-based estimators of $\sigma^{2}$, $\sigma_{\mathrm{reproducibility}}%
^{2}$ and $\sigma_{\mathrm{overall}}^{2}$.
\end{enumerate}
\item Section 1.7 of the notes presents ``rounded data'' likelihood methods
for normal data with the 2 parameters $\mu$ and $\sigma$. The same kind of
thing can be done for other families of distributions (which can have other
numbers of parameters). For example, the exponential distributions with means
$\theta^{-1}$ can be used. (Here there is the single parameter $\theta$.)
These exponential distributions have cdf's
\[
F_{\theta}(x)=\left\{
\begin{array}
[c]{lcl}%
1-\mbox{exp}(-\theta x) & \mbox{for} & x\geq0\\
0 & \mbox{for} & x<0\ .
\end{array}
\right.
\]
Below is a frequency table for twenty exponential observations that have been
rounded to the nearest integer.%
\begin{tabular}
[c]{cccccc}%
rounded value & 0 & 1 & 2 & 3 & 4\\\hline
frequency & 7 & 8 & 2 & 2 & 1
\end{tabular}
\begin{enumerate}
\item Write out an expression for the appropriate ``rounded data log
likelihood function'' for this problem,
\[
\mathcal{L}(\theta)=\ln L(\mathit{data}\,|\,\theta)\ .
\]
(You should be slightly careful here. Exponential random variables only take
values in the interval $(0,\infty)$.)
\item Make a plot of $\mathcal{L}(\theta)$. Use it and identify the maximum
likelihood estimate of $\theta$ based on the rounded data.
\item Use the plot from (b) and make an approximate 90\% confidence interval
for $\theta$. (The appropriate $\chi^{2}$ value has 1 associated degree of freedom.)
\end{enumerate}
\item Below are values of a critical dimension (in .0001 inch above nominal)
measured on hourly samples of size $n=5$ precision metal parts taken from the
output of a CNC (computer numerically controlled) lathe.
{\footnotesize
\begin{tabular}
[c]{c|c|c|c|c|c|c|c|c}%
sample & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8\\\hline
measurements & 4,3,3,2,3 & 2,2,3,3,2 & 4,1,0,$-$1,0 & 2,0,2,1,4 & 2,2,1,3,4 &
2, $-$2,2,1,2 & 0,0,0,2,0 & 1,$-$1,2,0,2
\end{tabular}
}
\begin{enumerate}
\item Compute for each of these samples the ``raw'' sample standard deviation
(ignoring rounding) and the ``Sheppard's correction'' standard deviation that
is appropriate for integer rounded data. How do these compare for the eight
samples above?
\item For each of the samples that have a range of at least 2, use the CONEST
program to find ``rounded normal data'' maximum likelihood estimates of the
normal parameters $\mu$ and $\sigma$. The program as written accepts
observations $\geq1$, so you will need to add an integer to each element of
some of the samples above before doing calculation with the program. (I don't
remember, but you may not be able to input a standard deviation of exactly 0
either.) How do the maximum likelihood estimates of $\mu$ compare to $\bar{x}$
values? How do the maximum likelihood estimates of $\sigma$ compare to both
the raw standard deviations and to the results of applying ``Sheppard's correction''?
\item Consider sample \#2. Make 95\% and 90\% confidence intervals for both
$\mu$ and $\sigma$ using the work of Johnson Lee.
\item Consider sample \#1. Use the CONEST program to get a few approximate
values for ${\mathcal{L}}^{\ast}(\mu)$ and some approximate values for
${\mathcal{L}}^{\ast\ast}(\sigma)$. (For example, look at a contour plot of
$\mathcal{L}$ over a narrow range of means near $\mu$ to get an approximate
value for ${\mathcal{L}}^{\ast}(\mu)$.) Sketch ${\mathcal{L}}^{\ast}(\mu) $
and ${\mathcal{L}}^{\ast\ast}(\sigma)$ and use your sketches and Lee's tables
to produce 95\% confidence intervals for $\mu$ and $\sigma$.
\item What 95\% confidence intervals for $\mu$ and $\sigma$ would result from
a 9th sample, $\{2,2,2,2,2\}$?
\end{enumerate}
\item A single operator measures a single widget diameter 15 times and obtains
a range of $R=3\times10^{-4}$ inches. Then this person measures the diameters
of 12 different widgets once each and obtains a range of $R=8\times10^{-4}$
inches. Give an estimated standard deviation of widget diameters (\textit{not}
including measurement error).
\item Cylinders of (outside) diameter $O$ must fit in ring bearings of
(inside) diameter $I$, producing clearance $C=I-O$. We would like to have some
idea of the variability in actual clearances that will be obtained by ``random
assembly'' of cylinders produced on one production line with ring bearings
produced on another. The gages used to measure $I$ and $O$ are (naturally
enough) different.
In a study using a single gage to measure outside diameters of cylinders,
$n_{O}=10$ different cylinders were measured once each, producing a sample
standard deviation $s_{O}=.001$ inch. In a subsequent study, this same gage
was used to measure the outside diameter of an additional cylinder $m_{O}=5$
times, producing a sample standard deviation $s_{O\text{gage}}=.0005$ inch.
In a study using a single gage to measure inside diameters of ring bearings,
$n_{I}=20$ different inside diameters were measured once each, producing a
sample standard deviation $s_{I}=.003$ inch. In a subsequent study, this same
gage was used to measure the inside diameter of another ring bearing
$m_{I}=10$ times, producing a sample standard deviation $s_{I\text{gage}}=.001
$ inch.
\begin{enumerate}
\item Give a sensible (point) estimate of the standard deviation of $C$
produced under random assembly.
\item Find a sensible standard error for your estimate in (a).
\end{enumerate}
\end{enumerate}
\section{Process Monitoring}
\noindent\textbf{{\Large Methods}}
\renewcommand{\labelenumi}{2.\arabic{enumi}.}
\begin{enumerate}
\item Consider the following hypothetical situation. A ``variables'' process
monitoring scheme is to be set up for a production line, and two different
measuring devices are available for data gathering purposes. Device A produces
precise and expensive measurements and device B produces less precise and less
expensive measurements. Let $\sigma_{\mathrm{measurement}}$ for the two
devices be respectively $\sigma_{\mathrm{A}}$ and $\sigma_{\mathrm{B}}$, and
suppose that the target for a particular critical diameter for widgets
produced on the line is 200.0.
\begin{enumerate}
\item A single widget produced on the line is measured $n=10$ times with each
device and $R_{\mathrm{A}}=2.0$ and $R_{\mathrm{B}}=5.0$. Give estimates of
$\sigma_{\mathrm{A}}$ and $\sigma_{\mathrm{B}}$.
\item Explain why it would not be appropriate to use one of your estimates
from (a) as a ``$\sigma$'' for setting up an $\bar{x}$ and $R$ chart pair for
monitoring the process based on measurements from one of the devices.\bigskip
\noindent Using device A, 10 consecutive widgets produced on the line (under
presumably stable conditions) have (single) measurements with $R=8.0$.\bigskip
\item Set up reasonable control limits for \textit{both} $\bar{x}$ and $R$ for
the future monitoring of the process based on samples of size $n=10$ and
measurements from device A.
\item Combining the information above about the A measurements on $10$
consecutive widgets with your answer to (a), under a model that says
\[
\mathit{observed~diameter}=\mathit{real~diameter}+\mathit{measurement~error}%
\]
where ``\textit{real diameter}'' and ``\textit{measurement error}'' are
independent, give an estimate of the standard deviation of the real diameters.
(See the discussion around page 19 of V\&J.)
\item Based on your answers to parts (a) and (d), set up reasonable control
limits for \textit{both} $\bar{x}$ and $R$ for the future monitoring of the
process based on samples of size $n=5$ and measurements from the cheaper
device, device B.
\end{enumerate}
\item The following are some data taken from a larger set in
\textit{Statistical Quality Control} by Grant and Leavenworth, giving the
drained weights (in ounces) of contents of size No.\ 2$\frac{1}{2}$ cans of
standard grade tomatoes in puree. 20 samples of three cans taken from a
canning process at regular intervals are represented.%
\begin{tabular}
[c]{cccc}%
Sample & $x_{1}$ & $x_{2}$ & $x_{3}$\\\hline
~\thinspace1 & 22.0 & 22.5 & 22.5\\
~\thinspace2 & 20.5 & 22.5 & 22.5\\
~\thinspace3 & 20.0 & 20.5 & 23.0\\
~\thinspace4 & 21.0 & 22.0 & 22.0\\
~\thinspace5 & 22.5 & 19.5 & 22.5\\
~\thinspace6 & 23.0 & 23.5 & 21.0\\
~\thinspace7 & 19.0 & 20.0 & 22.0\\
~\thinspace8 & 21.5 & 20.5 & 19.0\\
~\thinspace9 & 21.0 & 22.5 & 20.0\\
10 & 21.5 & 23.0 & 22.0\\\hline
\end{tabular}
\ \ \ \ \ \ \ \
\begin{tabular}
[c]{cccc}%
Sample & $x_{1}$ & $x_{2}$ & $x_{3}$\\\hline
11 & 20.0 & 19.5 & 21.0\\
12 & 19.0 & 21.0 & 21.0\\
13 & 19.5 & 20.5 & 21.0\\
14 & 20.0 & 21.5 & 24.0\\
15 & 22.5 & 19.5 & 21.0\\
16 & 21.5 & 20.5 & 22.0\\
17 & 19.0 & 21.5 & 23.0\\
18 & 21.0 & 20.5 & 19.5\\
19 & 20.0 & 23.5 & 24.0\\
20 & 22.0 & 20.5 & 21.0\\\hline
\end{tabular}
\begin{enumerate}
\item Suppose that standard values for the process mean and standard deviation
of drained weights ($\mu$ and $\sigma$) in this canning plant are 21.0 oz and
1.0 oz respectively. Make and interpret standards given $\bar{x}$ and $R$
charts based on these samples. What do these charts indicate about the
behavior of the filling process over the time period represented by these data?
\item As an alternative to the standards given range chart made in part (a),
make a standards given $s$ chart based on the 20 samples. How does its
appearance compare to that of the $R$ chart?\bigskip
\noindent Now suppose that no standard values for $\mu$ and $\sigma$ have been provided.\bigskip
\item Find one estimate of $\sigma$ for the filling process based on the
average of the 20 sample ranges, $\bar{R}$, and another based on the average
of 20 sample standard deviations, $\bar{s}$.
\item Use $\overset{=}{x}$ and your estimate of $\sigma$ based on $\bar{R}$
and make retrospective control charts for $\bar{x}$ and $R$. What do these
indicate about the stability of the filling process over the time period
represented by these data?
\item Use $\overset{=}{x}$ and your estimate of $\sigma$ based on $\bar{s}$
and make retrospective control charts for $\bar{x}$ and $s$. How do these
compare in appearance to the retrospective charts for process mean and
variability made in part (d)?
\end{enumerate}
\item The accompanying data are some taken from \textit{Statistical Quality
Control Methods} by I.W. Burr, giving the numbers of beverage cans found to be
defective in periodic samples of 312 cans at a bottling facility.%
\begin{tabular}
[c]{c@{\extracolsep{.3in}}c}%
Sample & Defectives\\\hline
~\thinspace1 & ~\thinspace6\\
~\thinspace2 & ~\thinspace7\\
~\thinspace3 & ~\thinspace5\\
~\thinspace4 & ~\thinspace7\\
~\thinspace5 & ~\thinspace5\\
~\thinspace6 & ~\thinspace5\\
~\thinspace7 & ~\thinspace4\\
~\thinspace8 & ~\thinspace5\\
~\thinspace9 & 12\\
10 & ~\thinspace6\\\hline
\end{tabular}
\ \ \ \ \ \ \ \
\begin{tabular}
[c]{c@{\extracolsep{.3in}}c}%
Sample & Defectives\\\hline
11 & ~\thinspace7\\
12 & ~\thinspace7\\
13 & ~\thinspace6\\
14 & ~\thinspace6\\
15 & ~\thinspace6\\
16 & ~\thinspace6\\
17 & 23\\
18 & 10\\
19 & ~\thinspace8\\
20 & ~\thinspace5\\\hline
\end{tabular}
\begin{enumerate}
\item Suppose that company standards are that on average $p=.02$ of the cans
are defective. Use this value and make a standards given $p$ chart based on
the data above. Does it appear that the process fraction defective was stable
at the $p=.02$ value over the period represented by these data?
\item Make a retrospective $p$ chart for these data. What is indicated by this
chart about the stability of the canning process?
\end{enumerate}
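Neither chart in this problem requires special software. The following Python sketch (an illustration added here, not part of Burr's original analysis; it assumes the 20 counts above and $n=312$) computes both the standards given and the retrospective limits and flags the samples plotting outside them.

```python
import math

n = 312
defectives = [6, 7, 5, 7, 5, 5, 4, 5, 12, 6,
              7, 7, 6, 6, 6, 6, 23, 10, 8, 5]

def p_chart_limits(p, n):
    """3-sigma control limits for p-hat = X/n (lower limit truncated at 0)."""
    half_width = 3 * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), p + half_width

def points_out(lcl, ucl):
    return [i + 1 for i, x in enumerate(defectives) if not lcl <= x / n <= ucl]

# (a) standards given chart, standard value p = .02
lcl_std, ucl_std = p_chart_limits(0.02, n)
out_std = points_out(lcl_std, ucl_std)

# (b) retrospective chart, limits built from the pooled fraction defective
p_bar = sum(defectives) / (len(defectives) * n)
lcl_retro, ucl_retro = p_chart_limits(p_bar, n)
out_retro = points_out(lcl_retro, ucl_retro)

print("standards given: (%.4f, %.4f)  out: %s" % (lcl_std, ucl_std, out_std))
print("retrospective:   (%.4f, %.4f)  out: %s" % (lcl_retro, ucl_retro, out_retro))
```

With these data both versions of the chart flag sample 17, whose 23 defectives correspond to a sample fraction of about .074.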
\item Modern business pressures are making standards for fractions
nonconforming in the range of 10$^{-4}$ to 10$^{-6}$ not uncommon.
\begin{enumerate}
\item What are standards given 3$\sigma$ control limits for a $p$ chart with
standard fraction nonconforming 10$^{-4}$ and sample size 100? What is the
all-OK ARL for this scheme?
\item If $p$ becomes twice the standard value (of 10$^{-4}$), what is the ARL
for the scheme from (a)? (Use your answer to (a) and the binomial distribution
for $n=100$ and $p=2\times10^{-4}$.)
\item What do (a) and (b) suggest about the feasibility of doing process
monitoring for very small fractions defective based on attributes data?
\end{enumerate}
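The binomial arithmetic behind (a) and (b) is easy to script. With standard $p=10^{-4}$ and $n=100$, the upper $3\sigma$ limit works out to about .0031, which is below $1/100$, so the chart signals whenever the sample contains any nonconforming item at all. A Python sketch:

```python
n, p0 = 100, 1e-4
ucl = p0 + 3 * (p0 * (1 - p0) / n) ** 0.5   # about .0031; the lower limit is 0

def arl(p):
    """ARL when the chart signals iff X >= 1, for X ~ Binomial(n, p)."""
    prob_signal = 1 - (1 - p) ** n
    return 1 / prob_signal

print("UCL =", ucl)
print("all-OK ARL  =", arl(p0))      # roughly 100.5
print("ARL at 2*p0 =", arl(2 * p0))  # roughly 50.5
```

The discouraging message relevant to part (c) is already visible: even an all-OK process signals about every 100 samples, and a doubling of $p$ only halves the ARL.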
\item Suppose that a dimension of parts produced on a certain machine over a
short period can be thought of as normally distributed with some mean $\mu$
and standard deviation $\sigma=.005$ inch. Suppose further, that values of
this dimension more than .0098 inch from the 1.000 inch nominal value are
considered nonconforming. Finally, suppose that hourly samples of 10 of these
parts are to be taken.
\begin{enumerate}
\item If $\mu$ is exactly on target (i.e. $\mu=1.000$ inch) about what
fraction of parts will be nonconforming? Is it possible for the fraction
nonconforming to ever be any less than this figure?
\item One could use a $p$ chart based on $n=10$ to monitor process performance
in this situation. What would be standards given 3 sigma control limits for
the $p$ chart, using your answer from part (a) as the standard value of $p$?
\item What is the probability that a particular sample of $n=10$ parts will
produce an out-of-control signal on the chart from (b) if $\mu$ remains at its
standard value of $\mu=1.000$ inch? How does this compare to the same
probability for a 3 sigma $\bar{x}$ chart for an $n=10$ setup with a center
line at 1.000? (For the $p$ chart, use a binomial probability calculation. For
the $\bar{x}$ chart, use the facts that $\mu_{\bar{x}}=\mu$ and $\sigma
_{\bar{x}}=\sigma/\sqrt{n}$.) What are the ARLs of the monitoring schemes
under these conditions?
\item Compare the probability that a particular sample of $n=10$ parts will
produce an out-of-control signal on the $p$ chart from (b) to the probability
that the sample will produce an out-of-control signal on the ($n=10$) 3 sigma
$\bar{x}$ chart first mentioned in (c), supposing that in fact $\mu=1.005$
inch. What are the ARLs of the monitoring schemes under these conditions? What
moral is told by your calculations here and in part (c)?
\end{enumerate}
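The comparisons in (c) and (d) need only the normal cdf (available through `erf`) and binomial tail sums. The sketch below (added for illustration) uses the fact that with $\sigma=.005$ the standard fraction nonconforming is $2(1-\Phi(1.96))=.05$, so the $p$ chart's upper limit falls between $2/10$ and $3/10$ and the chart signals when $X\geq3$.

```python
from math import comb, erf, sqrt

def phi(z):
    """Standard normal cdf."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def binom_tail(c, n, p):
    """P[X >= c] for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c, n + 1))

sigma, n = 0.005, 10

def frac_nonconforming(mu):
    return (1 - phi((1.0098 - mu) / sigma)) + phi((0.9902 - mu) / sigma)

p0 = frac_nonconforming(1.000)              # about .05, since .0098/.005 = 1.96
ucl = p0 + 3 * sqrt(p0 * (1 - p0) / n)
c = next(k for k in range(n + 1) if k / n > ucl)   # smallest signaling count

for mu in (1.000, 1.005):
    p_sig_p = binom_tail(c, n, frac_nonconforming(mu))
    z = (mu - 1.000) / (sigma / sqrt(n))    # standardized shift for x-bar
    p_sig_x = (1 - phi(3 - z)) + phi(-3 - z)
    print("mu=%.3f  p chart ARL=%6.2f  x-bar chart ARL=%6.2f"
          % (mu, 1 / p_sig_p, 1 / p_sig_x))
```

Comparing the printed ARLs at $\mu=1.000$ and $\mu=1.005$ suggests the moral asked about in (d).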
\item The article ``High Tech, High Touch,'' by J. Ryan, that appeared in
\textit{Quality Progress} in 1987 discusses the quality enhancement processes
used by Martin Marietta in the production of the space shuttle external
(liquid oxygen) fuel tanks. It includes a graph giving counts of major
hardware nonconformities for each of 41 tanks produced. The accompanying data
are approximate counts read from that graph for the last 35 tanks. (The first
six tanks were of a different design than the others and are thus not included here.)%
\begin{tabular}
[c]{c@{\extracolsep{.3in}}c}%
Tank & Nonconformities\\\hline
~\thinspace1 & 537\\
~\thinspace2 & 463\\
~\thinspace3 & 417\\
~\thinspace4 & 370\\
~\thinspace5 & 333\\
~\thinspace6 & 241\\
~\thinspace7 & 194\\
~\thinspace8 & 185\\
~\thinspace9 & 204\\
10 & 185\\
11 & 167\\
12 & 157\\
13 & 139\\
14 & 130\\
15 & 130\\
16 & 267\\
17 & 102\\
18 & 130\\\hline
\end{tabular}
\ \ \ \ \ \ \ \
\begin{tabular}
[c]{c@{\extracolsep{.3in}}c}%
Tank & Nonconformities\\\hline
19 & 157\\
20 & 120\\
21 & 148\\
22 & ~\thinspace65\\
23 & 130\\
24 & 111\\
25 & ~\thinspace65\\
26 & ~\thinspace74\\
27 & ~\thinspace65\\
28 & 148\\
29 & ~\thinspace74\\
30 & ~\thinspace65\\
31 & 139\\
32 & 213\\
33 & 222\\
34 & ~\thinspace93\\
35 & 194\\
& \\\hline
\end{tabular}
\begin{enumerate}
\item Make a retrospective $c$ chart for these data. Is there evidence of real
quality improvement in this series of counts of nonconformities? Explain.
\item Consider only the last 17 tanks represented above. Does it appear that
quality was stable over the production period represented by these tanks?
(Make another retrospective $c$ chart.)
\item It is possible that some of the figures read from the graph in the
original article may differ from the real figures by as much as, say, 15
nonconformities. Would this measurement error account for the apparent lack of
stability you found in (a) or (b) above? Explain.
\end{enumerate}
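A retrospective $c$ chart is just $\bar{c}\pm3\sqrt{\bar{c}}$ applied to the counts. A Python sketch (an illustration only) that handles both (a) and (b):

```python
counts = [537, 463, 417, 370, 333, 241, 194, 185, 204, 185, 167, 157,
          139, 130, 130, 267, 102, 130, 157, 120, 148, 65, 130, 111,
          65, 74, 65, 148, 74, 65, 139, 213, 222, 93, 194]

def c_chart_flags(cs):
    """Retrospective c chart: limits c-bar +/- 3*sqrt(c-bar)."""
    cbar = sum(cs) / len(cs)
    lcl, ucl = cbar - 3 * cbar ** 0.5, cbar + 3 * cbar ** 0.5
    out = [i + 1 for i, c in enumerate(cs) if not lcl <= c <= ucl]
    return cbar, lcl, ucl, out

for label, cs in [("all 35 tanks", counts), ("last 17 tanks", counts[18:])]:
    cbar, lcl, ucl, out = c_chart_flags(cs)
    print("%s: c-bar=%.1f  limits=(%.1f, %.1f)  points out: %s"
          % (label, cbar, lcl, ucl, out))
```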
\item Boulaevskaia, Fair and Seniva did a study of ``defect detection rates''
for the visual inspection of some glass vials. Vials known to be visually
identifiable as defective were marked with invisible ink, placed among other
vials, and run through a visual inspection process at 10 different time
periods. The numbers of marked defective vials that were detected/captured,
the numbers placed into the inspection process, and the corresponding ratios
for the 10 periods are below.%
\begin{tabular}
[c]{c|rrrrrrrrrr}%
$X=\mbox{number detected/captured}$ & $6$ & $10$ & $15$ & $18$ & $17$ & $2$ &
$7$ & $5$ & $6$ & $5$\\\hline
$n=\mbox{number placed}$ & $30$ & $30$ & $30$ & $30$ & $30$ & $15$ & $15$ &
$15$ & $15$ & $15$\\\hline
$X/n$ & .$2$ & .$33$ & .$5$ & .$6$ & .$57$ & .$13$ & .$47$ & .$33$ & .$4$ &
.$33$%
\end{tabular}
\noindent(Overall, 91 of the 225 marked vials placed into the inspection
process were detected/captured.)
\begin{enumerate}
\item Carefully investigate (and say clearly) whether there is evidence in
these data of instability in the defect detection rate.
\item $91/225=.404$. Do you think that the company these students worked with
was likely satisfied with the 40.4\% detection rate? What, if anything, does
your answer here have to do with the analysis in (a)?
\end{enumerate}
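Because the number of vials placed changes between periods, each point gets its own limits $\bar{p}\pm3\sqrt{\bar{p}(1-\bar{p})/n_{i}}$ built from the pooled rate $\bar{p}=91/225$. One simple way to script the check asked for in (a) (an illustration; a careful analysis might supplement this with runs rules or exact binomial calculations):

```python
from math import sqrt

detected = [6, 10, 15, 18, 17, 2, 7, 5, 6, 5]
placed   = [30, 30, 30, 30, 30, 15, 15, 15, 15, 15]

p_bar = sum(detected) / sum(placed)     # 91/225, about .404

flags = []
for i, (x, n) in enumerate(zip(detected, placed), start=1):
    half = 3 * sqrt(p_bar * (1 - p_bar) / n)   # limits depend on n_i
    if abs(x / n - p_bar) > half:
        flags.append(i)

print("pooled detection rate p-bar = %.3f" % p_bar)
print("periods plotting outside their own 3-sigma limits:", flags)
```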
\item (\textbf{Narrow Limit Gaging}) \ Parametric probability model
assumptions can sometimes be used to advantage even where one is ultimately
going to generate and use attributes data. Consider a situation where process
standards are that widget diameters are to be normally distributed with mean
$\mu=5$ and standard deviation $\sigma=1$. Engineering specifications on these
diameters are $5\pm3$.
As a process monitoring device, samples of $n=100$ of these widgets are going
to be checked with a go/no-go gage, and
$X=$ the number of diameters in a sample failing to pass the gaging test
will be counted and plotted on an $np$ chart. The design of the go/no-go gage
is up to you to choose. You may design it to pass parts with diameters in any
interval $(a,b)$ of your choosing.
\begin{enumerate}
\item One natural choice of $(a,b)$ is according to the engineering
specifications, i.e. as (2,8). With this choice of go/no-go gage, a 3$\sigma$
control chart for $X$ signals if $X\geq2$. Find the all-OK ARL for this scheme
with this gage.
\item One might, however, choose $(a,b)$ in other ways besides according to
the engineering specifications, e.g. as $(5-\delta,\ 5+\delta)$ for some
$\delta$ other than 3. Show that the choice of $\delta=2.71$ and a control
chart that signals if $X\geq3$ will have about the same all-OK ARL as the
scheme from (a).
\item Compare the schemes from (a) and (b) supposing that diameters are in
fact normally distributed with mean $\mu=6$ and standard deviation $\sigma=1$.
\end{enumerate}
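All three parts reduce to a normal probability for the gage interval followed by a binomial tail sum. A Python sketch (illustration only):

```python
from math import comb, erf, sqrt

def phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

def arl(delta, c, mu=5.0, sigma=1.0, n=100):
    """ARL of an np chart that signals when X >= c, where X counts
    diameters outside the gage interval (5 - delta, 5 + delta)."""
    p = phi((5 - delta - mu) / sigma) + 1 - phi((5 + delta - mu) / sigma)
    prob_signal = sum(comb(n, k) * p**k * (1 - p)**(n - k)
                      for k in range(c, n + 1))
    return 1 / prob_signal

# (a) gage set at the specifications, signal if X >= 2
print("all-OK ARL, delta=3.00, X>=2:", arl(3.00, 2))
# (b) narrowed gage, signal if X >= 3
print("all-OK ARL, delta=2.71, X>=3:", arl(2.71, 3))
# (c) the same two schemes when mu has shifted to 6
print("mu=6 ARLs:", arl(3.00, 2, mu=6.0), arl(2.71, 3, mu=6.0))
```

The printed values confirm that the two gages have essentially equal all-OK ARLs (about 33) and allow the comparison at $\mu=6$ asked for in (c).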
\item A one-sided upper CUSUM scheme is used to monitor
$Q=$ the number of defectives in samples of size $n=400$\ .
Suppose that one uses $k_{1}=8$ and $h_{1}=10$. Use the normal approximation
to the binomial distribution to obtain an approximate ARL for this scheme if
$p=.025$.
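One concrete way to carry out the suggested approximation: treat $Q$ as normal with mean $np$ and standard deviation $\sqrt{np(1-p)}$, and then apply an ARL approximation for a normal high side CUSUM. The sketch below uses Siegmund's well-known approximation as a stand-in for the tables or programs referenced elsewhere in these notes; it is one illustrative route, not the only legitimate one.

```python
from math import exp, sqrt

def siegmund_arl(mu, sigma, k, h):
    """Siegmund's approximation to the ARL of a one-sided (high) decision
    interval CUSUM of normal observations with mean mu and sd sigma,
    reference value k and decision interval h."""
    theta = (mu - k) / sigma         # standardized drift of the CUSUM
    b = h / sigma + 1.166            # 1.166 is Siegmund's continuity correction
    if abs(theta) < 1e-10:
        return b * b                 # limiting form as theta -> 0
    return (exp(-2 * theta * b) + 2 * theta * b - 1) / (2 * theta ** 2)

n, p = 400, 0.025
mu, sigma = n * p, sqrt(n * p * (1 - p))   # 10 and about 3.12
print("approximate ARL:", siegmund_arl(mu, sigma, k=8, h=10))
```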
\item Consider the monitoring of a process that we will assume produces
normally distributed observations $X$ with standard deviations $\sigma=.04$.
\begin{enumerate}
\item Set up both a two-sided CUSUM scheme and an EWMA scheme for monitoring
the process ($Q=X$), using a target value of .13 and a desired all-OK ARL of
roughly 370, if quickest possible detection of a change in mean of size
$\Delta=.02$ is desired.
\item Plot on the same set of axes, the logarithms of the ARLs for your charts
from (a) as functions of $\mu$, the real mean of observations being CUSUMed or
EWMAed. Also plot on this same set of axes the logarithms of ARLs for a
standard 3$\sigma$ Shewhart Chart for individuals. Comment upon how the 3 ARL
curves compare.
\end{enumerate}
\item Shear strengths of spot welds made by a certain robot are approximately
normal with a short term variability described by $\sigma=60$ lbs. The
strengths in samples of $n$ of these welds are going to be obtained and
$\bar{x}$ values CUSUMed.
\begin{enumerate}
\item Give a reference value $k_{2}$, sample size $n$ and a decision interval
$h_{2}$ so that a one-sided (lower) CUSUM scheme for the $\bar{x}$'s will have
an ARL of about 370 if $\mu=800$ lbs and an ARL of about 5 if $\mu=750$ lbs.
\item Find a sample size and a lower Shewhart control limit for $\bar{x}$, say
\#, so that if $\mu$=800 lbs, there will be about 370 samples taken before an
$\bar{x}$ will plot below \#, and if $\mu=750$ there will be on average about
5 samples taken before an $\bar{x}$ will plot below \#.
\end{enumerate}
\item You have data on the efficiency of a continuous chemical production
process. The efficiency is supposed to be about 45\%, and you will use a CUSUM
scheme to monitor the efficiency. Efficiency is computed once per shift, but
from much past data, you know that $\sigma\approx.7\%$.
\begin{enumerate}
\item If you wish quickest possible detection of a shift of .7\% (one standard
deviation) in mean efficiency, design a two-sided CUSUM scheme for this
situation with an all-OK ARL of about 500.
\item Apply your procedure from (a) to the data below. Are any alarms signaled?%
\begin{tabular}
[c]{c@{\extracolsep{.3in}}c}%
Shift & Efficiency\\\hline
~\thinspace1 & 45.7\\
~\thinspace2 & 44.6\\
~\thinspace3 & 45.0\\
~\thinspace4 & 44.4\\
~\thinspace5 & 44.4\\
~\thinspace6 & 44.2\\
~\thinspace7 & 46.1\\
~\thinspace8 & 44.6\\
~\thinspace9 & 45.7\\
10 & 44.4\\\hline
\end{tabular}
\ \ \ \ \ \ \ \
\begin{tabular}
[c]{c@{\extracolsep{.3in}}c}%
Shift & Efficiency\\\hline
11 & 45.8\\
12 & 45.4\\
13 & 46.8\\
14 & 45.5\\
15 & 45.8\\
16 & 46.4\\
17 & 46.0\\
18 & 46.3\\
19 & 45.6\\
& \\\hline
\end{tabular}
\item Make a plot of ``raw'' CUSUMs using a reference value of 45\%. From your
plot, when do you think that the mean efficiency shifted away from 45\%?
\item What are the all-OK and ``$\mu=45.7$\%'' ARLs if one employs your
procedure from (a) modified by giving both the high and low side charts ``head
starts'' of $u=v=h_{1}/2=h_{2}/2$?
\item Repeat part (a) using an EWMA scheme rather than a CUSUM scheme.
\item Apply your procedure from (e) to the data. Are any alarms signaled? Plot
your EWMA values. Based on this plot, when do you think that the mean
efficiency shifted away from 45\%?
\end{enumerate}
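A sketch of the bookkeeping in part (b). The reference values $45\pm.35$ follow the usual "halfway" recipe for quickest detection of a .7\% shift; the decision interval is set here to an assumed round value $h=5\sigma$ purely for illustration, and the value actually giving an all-OK ARL of about 500 should be taken from the tables or programs on which the design in (a) is based.

```python
target, sigma = 45.0, 0.7
eff = [45.7, 44.6, 45.0, 44.4, 44.4, 44.2, 46.1, 44.6, 45.7, 44.4,
       45.8, 45.4, 46.8, 45.5, 45.8, 46.4, 46.0, 46.3, 45.6]

k1 = target + sigma / 2   # high side reference value (45.35)
k2 = target - sigma / 2   # low side reference value (44.65)
h = 5 * sigma             # ASSUMED decision interval, for illustration only

signals = []
hi = lo = 0.0
for t, x in enumerate(eff, start=1):
    hi = max(0.0, hi + x - k1)    # high side decision interval CUSUM
    lo = min(0.0, lo + x - k2)    # low side decision interval CUSUM
    if hi > h or lo < -h:
        signals.append(t)
    print("shift %2d: x=%.1f  high=%5.2f  low=%5.2f" % (t, x, hi, lo))

print("shifts at which this (assumed-h) scheme signals:", signals)
```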
\item Consider the problem of designing an EWMA control chart for $\bar{x}$'s,
\textit{where in addition to choosing chart parameters one gets to choose the
sample size}, $n$. In such a case, one can choose monitoring parameters to
produce both a desired (large) on-target ARL and a desired (small) off-target
ARL $\delta$ units away from the target.
Suppose, for example, that a process standard deviation is $\sigma=1$ and one
wishes to design for an ARL of 370 if the process mean, $\mu$, is on target,
and an ARL of no more than 5.0 if $\mu$ is off target by as much as
$\delta=1.0$. Using $\sigma_{Q}=\sigma/\sqrt{n}$ and $\mathit{shift}%
=\delta/\sigma_{Q}$ and reading from one of the graphs in Crowder's 1989
\textit{JQT} paper, values of $\lambda^{\mathrm{opt}}$ for detecting a change
in process mean of this size using EWMAs of $\bar{x}$'s are approximately as below:%
\begin{tabular}
[c]{c|ccccccccc}%
$n$ & $1$ & $2$ & $3$ & $4$ & $5$ & $6$ & $7$ & $8$ & $9$\\\hline
$\lambda^{\mathrm{opt}}$ & .$14$ & .$08$ & .$06$ & .$05$ & .$05$ & .$04$ &
.$04$ & .$04$ & .$03$%
\end{tabular}
Use Crowder's EWMA ARL program (and some trial and error) to find values of
$\mathcal{K}$ that when used with the $\lambda$'s above will produce an
on-target ARL of 370. Then determine how large $n$ must be in order to
meet the 370 and 5.0 ARL requirements. How does this compare to what Table 4.8
says is needed for a two-sided CUSUM to meet the same criteria?
\item Consider a combination of high and low side decision interval CUSUM
schemes with $h_{1}=h_{2}=2.5$, $u=1$, $v=-1$, $k_{1}=.5$ and $k_{2}=-.5$.
\ Suppose that $Q$'s are iid normal variables with $\sigma_{Q}=1.0$. Find the
ARLs for the combined scheme if $\mu_{Q}=0$ and then if $\mu_{Q}=1.0$. (You
will need to use Gan's CUSUM ARL program and Yashchin's expression for
combining high and low side ARLs.)
\item Set up two different $X$/$MR$ monitoring chart pairs for normal
variables $Q$, in the case where the standards are $\mu_{Q}=5$ and $\sigma
_{Q}=1.715$ and the all-OK ARL desired is 250. For these combinations, what
ARLs are relevant if in fact $\mu_{Q}=5.5$ and $\sigma_{Q}=2.00$? (Run
Crowder's $X/MR$ ARL program to get these with minimum interpolation.)
\item If one has discrete or rounded data and insists on using $\bar{x}$
and/or $R$ charts, \S1.7.1 shows how these may be based on the exact all-OK
distributions of $\bar{x}$ and/or $R$ (and not on normal theory control
limits). Suppose that measurements arise from integer rounding of normal
random variables with $\mu=2.25$ and $\sigma=.5$ (so that essentially only
values 1, 2, 3 and 4 are ever seen). Compute the four probabilities
corresponding to these rounded values (and ``fudge'' them slightly so that
they total to 1.00). Then, for $n=4$ compute the probability distributions of
$\bar{x}$ and $R$ based on iid observations from this distribution. Then run
Karen (Jensen) Hulting's DIST program and compare your answers to what her
program produces.
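The exact-distribution calculation here is a small enumeration: four cell probabilities from the normal cdf, then all $4^{4}=256$ possible samples. A Python sketch (an illustration; Hulting's DIST program remains the reference point):

```python
from itertools import product
from math import erf, sqrt
from collections import defaultdict

def phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 2.25, 0.5
values = [1, 2, 3, 4]
cuts = [-float("inf"), 1.5, 2.5, 3.5, float("inf")]   # rounding cell boundaries
probs = [phi((cuts[i + 1] - mu) / sigma) - phi((cuts[i] - mu) / sigma)
         for i in range(4)]
total = sum(probs)
probs = [p / total for p in probs]    # "fudge" so the four cells total 1.00

# exact distributions of x-bar and R for n = 4 iid rounded observations
xbar_dist, r_dist = defaultdict(float), defaultdict(float)
for sample in product(range(4), repeat=4):
    p = 1.0
    for i in sample:
        p *= probs[i]
    xs = [values[i] for i in sample]
    xbar_dist[sum(xs) / 4] += p
    r_dist[max(xs) - min(xs)] += p

print("P[X=v] for v=1,2,3,4:", [round(p, 4) for p in probs])
print("R distribution:", {r: round(p, 4) for r, p in sorted(r_dist.items())})
```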
\item Suppose that standard values of process parameters are $\mu=17$ and
$\sigma=2.4$.
\begin{enumerate}
\item Using sample means $\bar{x}$ based on samples of size $n=4$, design both
a combined high and low side CUSUM scheme (with 0 head starts) and an EWMA
scheme to have an all-OK ARL of 370 and quickest possible detection of a shift
in process mean of size .6.
\item If, in fact, the process mean is $\mu=17.5$ and the process standard
deviation is $\sigma=3.0$, show how you would find the ARL associated with
your schemes from (a). (You don't need to actually interpolate in the tables,
but do compute the values you would need in order to enter the tables, and say
which tables you must employ.)
\end{enumerate}
\item A discrete variable $X$ can take only values 1, 2, 3, 4 and 5.
Nevertheless, managers decide to ``monitor process spread'' using the ranges
of samples of size $n=2$. Suppose, for sake of argument, that under standard
plant conditions observations are iid and uniform on the values $1$ through
$5$ (i.e. $P[X=1]=P[X=2]=P[X=3]=P[X=4]=P[X=5]=.2$).
\begin{enumerate}
\item Find the distribution of $R$ for this situation. (Note that $R$ has
possible values 0, 1, 2, 3 and $4$. You need to reason out the corresponding probabilities.)
\item The correct answer to part (a) has E$R=1.6$. This implies that if many
samples of size $n=2$ are taken and $\bar{R}$ computed, one can expect a mean
range near $1.6$. Find \textit{and criticize} corresponding normal theory
control limits for $R$.
\item Suppose that instead of using a normal-based Shewhart chart for $R$, one
decides to use a high side Shewhart-CUSUM scheme (for ranges) with reference
value $k_{1}=2$ and starting value $0$, that signals the first time any range
is $4$ or the CUSUM is $3$ or more. Use your answer for (a) and show how to
find the ARL for this scheme. (You need not actually carry through the
calculations, but show explicitly how to set things up.)
\end{enumerate}
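Part (a) can be checked by brute force, since there are only $5^{2}=25$ equally likely ordered pairs. A sketch using exact fractions:

```python
from collections import Counter
from fractions import Fraction

# iid uniform on {1,...,5}: enumerate all 25 equally likely ordered pairs
dist = Counter()
for x1 in range(1, 6):
    for x2 in range(1, 6):
        dist[abs(x1 - x2)] += Fraction(1, 25)

print("P[R=r]:", {r: str(p) for r, p in sorted(dist.items())})
er = sum(r * p for r, p in dist.items())
print("E R =", float(er))   # 1.6, matching the value quoted in part (b)
```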
\item SQC novices faced with the task of analyzing a sequence of (say) $m$
individual observations collected over time often do the following: Compute
``$\bar{x}$'' and ``$s$'' from the $m$ data values and apply ``control
limits'' $\bar{x}\pm3s$ to the $m$ individuals. Say why this method of
operation is essentially useless. (Compare Problem 1.8.)
\item Consider an $\bar{x}$ chart based on standards $\mu_{0}$ and $\sigma
_{0}$ and samples of size $n$, where only the ``one point outside 3$\sigma$
limits'' alarm rule is in use.
\begin{enumerate}
\item Find ARLs if in fact $\sigma=\sigma_{0}$, but $\sqrt{n}|\mu-\mu
_{0}|/\sigma$ is respectively 0, 1, 2, and 3.
\item Find ARLs if in fact $\mu=\mu_{0}$, but $\sigma/\sigma_{0}$ is
respectively .5, .8, 1, 1.5 and 2.0.
\end{enumerate}
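Both parts are one-line normal probability calculations once one standardizes. A Python sketch of the two ARL computations (illustration only):

```python
from math import erf, sqrt

def phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

def arl_mean_shift(d):
    """ARL when sigma = sigma0 and d = sqrt(n)|mu - mu0|/sigma."""
    return 1 / ((1 - phi(3 - d)) + phi(-3 - d))

def arl_sigma_ratio(r):
    """ARL when mu = mu0 and r = sigma/sigma0 (so x-bar signals
    when a standard normal exceeds 3/r in absolute value)."""
    return 1 / (2 * (1 - phi(3 / r)))

for d in (0, 1, 2, 3):
    print("d=%d: ARL=%.1f" % (d, arl_mean_shift(d)))
for r in (0.5, 0.8, 1.0, 1.5, 2.0):
    print("r=%.1f: ARL=%.1f" % (r, arl_sigma_ratio(r)))
```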
\noindent\textbf{{\Large Theory}}
\setcounter{enumi}{20}
\item Consider the problem of samples of size $n=1$ in variables control
charting contexts, and the notion of there using moving ranges for various
purposes. This problem considers a little theory that may help illustrate the
implications of using an average moving range, $\overline{MR}$, in the
estimation of $\sigma$ in such circumstances.
Suppose that $X_{1}$ and $X_{2}$ are independent normal random variables with
a common variance $\sigma^{2}$, but possibly different means $\mu_{1}$ and
$\mu_{2}$. (You may, if you wish, think of these as widget diameters made at
times 1 and 2, where the process mean has potentially shifted between the
sampling periods.)
\begin{enumerate}
\item What is the distribution of $X_{1}-X_{2}$? The distribution of
$(X_{1}-X_{2})/\sigma$?
\item For $t>0$, write out in terms of values of $\Phi$ the probability
\[
P[|(X_{1}-X_{2})/\sigma|\leq t]\ .
\]
In doing this, abbreviate $(\mu_{1}-\mu_{2})/\sigma$ as $\delta$.
\item Notice that in part (b), you have found the cumulative distribution
function for the random variable $MR/\sigma$. Differentiate your answer to (b)
to find the probability density for $MR/\sigma$ and then use this probability
density to write down an integral that gives the mean of the random variable
$MR/\sigma$, E$(MR/\sigma)$. (You may abbreviate the standard normal pdf as
$\phi$, rather than writing everything out.)\bigskip
Vardeman used his trusty HP 15C (and its definite integral routine) and
evaluated the integral in (c) for various values of $\delta$. Some values that
he obtained are below.%
\begin{tabular}
[c]{c|cccccccc}%
$\delta$ & 0 & $\pm.1$ & $\pm.2$ & $\pm.3$ & $\pm.4$ & $\pm.5$ & $\pm1.0$ &
$\pm1.5$\\\hline
E$(MR/\sigma)$ & 1.1284 & 1.1312 & 1.1396 & 1.1537 & 1.1732 & 1.198 & 1.399 &
1.710
\end{tabular}%
\begin{tabular}
[c]{cccccc}%
$\pm2.0$ & $\pm2.5$ & $\pm3.0$ & $\pm3.5$ & $\pm4.0$ & large $|\delta
|$\\\hline
2.101 & 2.544 & 3.017 & 3.506 & 4.002 & $|\delta|$%
\end{tabular}
(Notice that as expected, the $\delta=0$ value is $d_{2}$ for a sample of size
$n=2$.)\bigskip
\item Based on the information above, argue that for $n$ independent normal
random variables $X_{1},X_{2},\ldots,X_{n}$ with common standard deviation
$\sigma$, if $\mu_{1}=\mu_{2}=\cdots=\mu_{n}$ then the sample average moving
range, $\overline{MR}$, when divided by 1.1284 has expected value $\sigma$.
\item Now suppose that instead of being constant, the successive means,
$\mu_{1},\mu_{2},\ldots,\mu_{n}$ in fact exhibit a reasonably strong linear
trend. That is, suppose that $\mu_{t}=\mu_{t-1}+\sigma$. What is the expected
value of $\overline{MR}$/1.1284 in this situation? Does $\overline{MR}$/1.1284
seem like a sensible estimate of $\sigma$ here?
\item In a scenario where the means could potentially ``bounce around''
according to $\mu_{t}=\mu_{t-1}\pm k\sigma$, how large might $k$ be without
destroying the usefulness of $\overline{MR}$/1.1284 as an estimate of $\sigma
$? Defend your opinion on the basis of the information contained in the table above.
\end{enumerate}
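The integral in part (c) in fact has a closed form: $MR/\sigma$ is the absolute value of a normal variable with mean $\delta$ and variance 2, so its mean is a folded normal mean. The sketch below reproduces the values in the table above to the accuracy shown.

```python
from math import erf, exp, pi, sqrt

def phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

def mean_abs_normal(m, tau):
    """E|W| for W ~ N(m, tau^2): the folded normal mean."""
    return (tau * sqrt(2 / pi) * exp(-m * m / (2 * tau * tau))
            + m * (1 - 2 * phi(-m / tau)))

def e_mr_over_sigma(delta):
    # (X1 - X2)/sigma ~ N(delta, 2), and MR/sigma is its absolute value
    return mean_abs_normal(delta, sqrt(2))

for delta in (0, 0.5, 1.0, 2.0, 4.0):
    print("delta=%.1f  E(MR/sigma)=%.4f" % (delta, e_mr_over_sigma(delta)))
```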
\item Consider the kind of discrete time Markov Chain with a single absorbing
state used in \S2.1 to study the run length properties of process monitoring
schemes. Suppose that one wants to know not the mean times to absorption from
the nonabsorbing states, but the \textit{variances} of those times. Since for
a generic random variable $X$, Var$X=$E$X^{2}-($E$X)^{2}$, once one has mean
times to absorption (belonging to the vector $\mbox{\boldmath$L$}%
=(\mbox{\boldmath$I$}-\mbox{\boldmath$R$})^{-1}\mbox{\boldmath$1$}$) it
suffices to compute the expected squares of times to absorption. \ Let
$\mbox{\boldmath$M$}$ be an $m\times1$ vector containing expected squares of
times to absorption (from states $S_{1}$ through $S_{m}$). Set up a system of
$m$ equations for the elements of $\mbox{\boldmath$M$}$ in terms of the
elements of $\mbox{\boldmath$R$},\mbox{\boldmath$L$}$ and $\mbox{\boldmath$M$%
}$. Then show that in matrix notation
\[
\mbox{\boldmath$M$}=(\mbox{\boldmath$I$}-\mbox{\boldmath$R$})^{-1}%
(\mbox{\boldmath$I$}+2\mbox{\boldmath$R$}(\mbox{\boldmath$I$}-\mbox
{\boldmath$R$})^{-1})\mbox{\boldmath$1$}\ .
\]
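A quick sanity check on the displayed formula: with a single nonabsorbing state having self-transition probability $r$, the time to absorption is geometric with success probability $p=1-r$, and the matrix expression collapses to scalars. A sketch:

```python
def moments_one_state(r):
    """Mean and variance of the time to absorption for a chain with a single
    nonabsorbing state having self-transition probability r, computed from
    L = (I-R)^{-1} 1 and M = (I-R)^{-1}(I + 2R(I-R)^{-1}) 1, which are
    scalars in this case."""
    L = 1 / (1 - r)
    M = (1 / (1 - r)) * (1 + 2 * r * L)
    return L, M - L * L

for r in (0.1, 0.5, 0.9):
    L, var = moments_one_state(r)
    p = 1 - r
    # geometric benchmark: ET = 1/p and Var T = (1 - p)/p^2
    print("r=%.1f  ET=%7.3f (geometric %7.3f)  VarT=%8.3f (geometric %8.3f)"
          % (r, L, 1 / p, var, (1 - p) / p ** 2))
```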
\item So-called ``Stop-light Control'' or ``Target Area Control'' of a
measured characteristic $X$ proceeds as follows. One first defines ``Green''
(OK), ``Yellow'' (Marginal) and ``Red'' (Unacceptable) regions of possible
values of $X$. One then periodically samples a process according to the
following rules. At a given sampling period, a single item is measured and if
it produces a Green $X$, no further action is necessary at the time period in
question. If it produces a Red $X$, lack of control is declared. If it
produces a Yellow $X$, a second item is immediately sampled and measured. If
this second item produces a Green $X$, no further action is taken at the
period in question, but otherwise lack of control is declared.
Suppose that in fact a process under stop-light monitoring is stable and
$p_{\mbox{G}}=P[X$ is Green], $p_{\mbox{Y}}=P[X$ is Yellow] and $p_{\mbox{R}%
}=1-p_{\mbox{G}}-p_{\mbox{Y}}=P[X$ is Red].
\begin{enumerate}
\item Find the mean number of sampling periods from the beginning of
monitoring through the first out-of-control signal, in terms of the $p$'s.
\item Find the mean total number of items measured from the beginning of
monitoring through the first out-of-control signal, in terms of the $p$'s.
\end{enumerate}
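Candidate answers to (a) and (b) are easy to check by simulation. The sketch below (with illustrative values $p_{G}=.90$, $p_{Y}=.08$, $p_{R}=.02$, chosen arbitrarily) tracks both sampling periods and total items measured until the first signal.

```python
import random

def simulate(pG, pY, n_reps=20000, seed=1):
    """Simulate stop-light sampling until the first signal, recording the
    number of sampling periods and the total number of items measured."""
    rng = random.Random(seed)
    tot_periods = tot_items = 0
    for _ in range(n_reps):
        periods = items = 0
        while True:
            periods += 1
            items += 1
            u = rng.random()
            if u < pG:                    # first item Green: done this period
                continue
            if u < pG + pY:               # first item Yellow: sample again
                items += 1
                if rng.random() < pG:     # second item Green: no signal
                    continue
            break                         # Red, or Yellow followed by non-Green
        tot_periods += periods
        tot_items += items
    return tot_periods / n_reps, tot_items / n_reps

mp, mi = simulate(pG=0.90, pY=0.08)
print("mean periods to signal: %.2f   mean items measured: %.2f" % (mp, mi))
```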
\item Consider the Run-Sum control chart scheme discussed in \S2.2. In the
notes Vardeman wrote out a transition matrix for a Markov Chain analysis of
the behavior of this scheme.
\begin{enumerate}
\item Write out the corresponding system of 8 linear equations in 8 mean times
to absorption for the scheme. Note that the mean times till signal from
``$T=-0$'' and ``$T=+0$'' states are the same linear combinations of the 8
mean times and must thus be equal.
\item Find a formula for the ARL of this scheme. This can be done as follows.
Use the equations for the mean times to absorption from states ``$T=+3$'' and
``$T=+2$'' to find a constant $\kappa_{+2,+3}$ such that $L_{+3}%
=\kappa_{+2,+3}L_{+2}$. Find similar constants $\kappa_{+1,+2}$,
$\kappa_{+0,+1}$, $\kappa_{-2,-3}$, $\kappa_{-1,-2}$ and $\kappa_{-0,-1}$.
Then use these constants to write a single linear equation for $L_{+0}=L_{-0}$
that you can solve for $L_{+0}=L_{-0}$.
\end{enumerate}
\item Consider the problem of monitoring
\[
X=\mbox{the number of nonconformities on a widget}\ .
\]
Suppose the standard for $\lambda$ is so small that a usual 3$\sigma$ Shewhart
control chart will signal any time $X_{t}>0$. On intuitive grounds the
engineers involved find such a state of affairs unacceptable. The replacement
for the standard Shewhart scheme that is then being contemplated is one that
signals at time $t$ if \newline \hspace*{0.63in} i) $X_{t}\geq2$ \newline or
\hspace{0.4in} ii) $X_{t}=1$ and any of $X_{t-1}$, $X_{t-2}$, $X_{t-3}$ or
$X_{t-4}$ is also equal to 1.
Show how you could find an ARL for this scheme. (Give either a matrix equation
or system of linear equations one would need to solve. State clearly which of
the quantities in your set-up is the desired ARL.)
\item Consider a discrete distribution on the (positive and negative) integers
specified by the probability function $p(\cdot)$. This distribution will be
used below to help predict the performance of a Shewhart type monitoring
scheme that will sound an alarm the first time that an individual observation
$X_{t}$ is 3 or more in absolute value (that is, the alarm bell rings the
first time that $|X_{t}|\geq3$).
\begin{enumerate}
\item Give an expression for the ARL of the scheme in terms of values of
$p(\cdot)$, if observations $X_{1},X_{2},X_{3},\ldots$ are iid with
probability function $p(\cdot)$.
\item Carefully set up and show how you would use a transition matrix for an
appropriate Markov Chain in order to find the ARL of the scheme under a model
for the observations $X_{1},X_{2},X_{3},\dots$ specified as follows:
\hspace*{0.5in} $X_{1}$ has probability function $p(\cdot)$, and given
$X_{1},X_{2},\dots,X_{t-1}$, \newline \hspace*{0.5in} the variable $X_{t}$ has
probability function $p(\cdot-X_{t-1})$\ .
You need not carry out any matrix manipulations, but be sure to fully explain
how you would use the matrix you set up.
\end{enumerate}
\item Consider the problem of finding ARLs for a Shewhart individuals chart
supposing that observations $X_{1},X_{2},X_{3},\ldots$ are not iid, but rather
realizations from a so-called AR(1) model. That is, suppose that in fact for
some $\rho$ with $|\rho|<1$
\[
X_{t}=\rho X_{t-1}+\epsilon_{t}%
\]
for a sequence of iid normal random variables $\epsilon_{1},\epsilon
_{2},\ldots$ each with mean 0 and variance $\sigma^{2}$. Notice that under
this model the conditional distribution of $X_{t+1}$ given all previous
observations is normal with mean $\rho X_{t}$ and variance $\sigma^{2}$.
Consider plotting values $X_{t}$ on a Shewhart chart with control limits $UCL
$ and $LCL$.
\begin{enumerate}
\item For $LCL<u<UCL$, let $L(u)$ be the mean number of additional points
plotted through the first point outside the control limits, given that the
most recent observation is $X_{t}=u$. Carefully derive an integral equation
that must be satisfied by this ARL function.
\end{enumerate}
\item Consider a high side CUSUM scheme for iid continuous observations
$Q_{1},Q_{2},\ldots$ that is supplemented with a Shewhart-type rule, so that
the combination signals the first time either the CUSUM exceeds the decision
interval $h_{1}$ or an individual observation exceeds $M$, for a constant
$M>k_{1}$.
Carefully derive an integral equation similar to the one above that must be
satisfied by the ARL function of the combined Shewhart-CUSUM scheme.
\item Consider the problem of finding ARLs for CUSUM schemes where
$Q_{1},Q_{2},\ldots$ are iid exponential with mean 1. That is, suppose that
one is CUSUMing iid random variables with common probability density
\[
f(x)=\left\{
\begin{array}
[c]{ll}%
e^{-x} & \mbox{for}~x>0\\
0 & \mbox{otherwise}\ .
\end{array}
\right.
\]
\begin{enumerate}
\item Argue that the ARL function of a high side CUSUM scheme for this
situation satisfies the differential equation
\[
L^{\prime}(u)=\left\{
\begin{array}
[c]{lll}%
L(u)-L(0)-1 & \mbox{for} & 0\leq u\leq k_{1}\\
L(u)-L(u-k_{1})-1 & \mbox{for} & k_{1}\leq u\ .
\end{array}
\right.
\]
(Vardeman and Ray (\textit{Technometrics}, 1985) solve this differential
equation and a similar one for low side CUSUMs to obtain ARLs for exponential
$Q$.)
\item Suppose that one decides to approximate high side exponential CUSUM ARLs
by using simple numerical methods to solve (approximately) the integral
equation discussed in class. For the case of $k_{1}=1.5$ and $h_{1}=4.0$,
write out the $\mbox{\boldmath$R$}$ matrix (in the equation $\mbox
{\boldmath$L$}=\mbox{\boldmath$1$}+\mbox{\boldmath$RL$}$) one has using the
quadrature rule defined by $m=8$, $a_{i}=(2i-1)h_{1}/2m$ and each $w_{i}%
=h_{1}/m$.
\item Consider making a Markov Chain approximation to the ARL referred to in
part (b). For $m=8$ and the discretization discussed in class, write out the
$\mbox{\boldmath$R$}$ matrix that would be used in this case. How does this
matrix compare to the one in part (b)?
\end{enumerate}
\item Consider the problem of determining the run length properties of a high
side CUSUM scheme with head start $u$, reference value $k$ and decision
interval $h$ if iid continuous observations $Q_{1},Q_{2},\ldots$ with common
probability density $f$ and cdf $F$ are involved. Let $T$ be the run length
variable. In class, Vardeman concentrated on $L(u)=$E$T$, the ARL of the
scheme. But other features of the run length distribution might well be of
interest in some applications.
\begin{enumerate}
\item The variance of $T$, Var $T=$E$T^{2}-L^{2}(u)$ might also be of
importance in some instances. Let $M(u)=$E$T^{2}$ and argue very carefully
that $M(u)$ must satisfy the integral equation
\[
M(u)=1+\left( M(0)+2L(0)\right) F(k-u)+\int_{0}^{h}\left(
M(s)+2L(s)\right) f(s+k-u)ds\ .
\]
(Once one has found $L(u)$, this gives an integral equation that can be solved
for $M(u)$, leading to values for Var $T$, since then Var $T=M(u)-L^{2}(u)$.)
\item The probability function of $T$, $P(t,u)=Pr[T=t]$ might also be of
importance in some instances. Express $P(1,u)$ in terms of $F$. Then argue
very carefully that for $t>1$, $P(t,u)$ must satisfy the recursion
\[
P(t,u)=P(t-1,0)F(k-u)+\int_{0}^{h}P(t-1,s)f(s+k-u)ds\ .
\]
(There is thus the possibility of determining successively the function
$P(1,u)$, then the function $P(2,u)$, then the function $P(3,u)$, etc.)
\end{enumerate}
\item In \S2.2, Vardeman considered a ``two alarm rule monitoring scheme'' due
to Wetherill and showed how to find the ARL for that scheme by solving two linear
equations for quantities $L_{1}$ and $L_{2}$. It is possible to extend the
arguments presented there and find the \textit{variance} of the run length.
\begin{enumerate}
\item For a generic random variable $X$, express both Var $X$ and E$(X+1)^{2}$
in terms of E$X$ and E$X^{2}$.
\item Let $M_{1}$ be the expected square of the run length for the Wetherill
scheme and let $M_{2}$ be the expected square of the number of additional
plotted points required to produce an out-of-control signal if there has been
no signal to date and the current plotted point is between 2- and 3-sigma
limits. Set up two equations for $M_{1}$ and $M_{2}$ that are linear in
$M_{1}$, $M_{2}$, $L_{1}$ and $L_{2}$.
\item The equations from (b) can be solved simultaneously for $M_{1}$ and
$M_{2} $. Express the variance of the run length for the Wetherill scheme in
terms of $M_{1}$, $M_{2}$, $L_{1}$ and $L_{2}$.
\end{enumerate}
\item Consider a Shewhart control chart with the single extra alarm rule
``signal if 2 out of any 3 consecutive points fall between 2$\sigma$ and
3$\sigma$ limits on one side of the center line.'' Suppose that points
$Q_{1},Q_{2},Q_{3},\ldots$ are to be plotted on this chart and that the $Q$s
are iid.
Use the notation%
\begin{align*}
p_{\text{A}} & =\text{the probability }Q_{1}\text{ falls outside }%
3\sigma\text{ limits}\\
p_{\text{B}} & =\text{the probability }Q_{1}\text{ falls between
}2\sigma\text{ and }3\sigma\text{ limits above the center line}\\
p_{\text{C}} & =\text{the probability }Q_{1}\text{ falls between
}2\sigma\text{ and }3\sigma\text{ limits below the center line}\\
p_{\text{D}} & =\text{the probability }Q_{1}\text{ falls inside
}2\sigma\text{ limits}%
\end{align*}
and set up a Markov Chain that you can use to find the ARL of this scheme
under the iid model for the $Q$s. (Be sure to carefully and completely define
your state space, write out the proper transition matrix and indicate which
entry of ($\mbox{\boldmath$I$}-\mbox{\boldmath$R$})^{-1}\mbox{\boldmath$1$}$
gives the desired ARL.)
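Whatever state space one settles on, the mechanics of the last step are
generic: with $\mbox{\boldmath$R$}$ the transition matrix restricted to the
non-alarm states, the entries of
$(\mbox{\boldmath$I$}-\mbox{\boldmath$R$})^{-1}\mbox{\boldmath$1$}$ are the
mean times to alarm from each starting state. A small Python sketch of that
computation, using a made-up two-state $\mbox{\boldmath$R$}$ (deliberately
\textit{not} the chain for this problem):

```python
# Generic mean-time-to-absorption computation: ARLs are the entries of
# (I - R)^{-1} 1, where R is the transition matrix among non-alarm states.
# The 2x2 R below is purely illustrative, not the chain for this problem.
import numpy as np

R = np.array([[0.90, 0.05],
              [0.20, 0.70]])   # rows need not sum to 1: the deficit is
                               # the per-step probability of an alarm
arl = np.linalg.solve(np.eye(2) - R, np.ones(2))
print(arl)                     # mean times to alarm from each transient state
```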
\item A process has a ``good'' state and a ``bad'' state. Suppose that when in
the good state, the probability that an observation on the process plots
outside of control limits is $g$, while the corresponding probability for the
bad state is $b$. Assume further that if the process is in the good state at
time $t-1$, there is a probability $d$ of degradation to the bad state before
an observation at time $t$ is made. (Once the process moves into the bad state
it stays there until that condition is detected via process monitoring and
corrected.) Find the ``ARL''/mean time to alarm, assuming the process is in the
good state at time $t=0$ and observation starts at time $t=1$.
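An analytical answer here can be checked by brute-force simulation. The
sketch below uses the arbitrary illustrative values $g=.05$, $b=.5$, $d=.1$
(assumptions for the demonstration, not part of the problem):

```python
# Monte Carlo estimate of the mean time to alarm for the good/bad state
# model: each period the good state degrades w.p. d before the observation,
# and an observation plots outside limits w.p. g (good) or b (bad).
# The values of g, b, d are illustrative assumptions only.
import random

def mean_time_to_alarm(g=0.05, b=0.5, d=0.1, reps=50_000, seed=3):
    rng = random.Random(seed)
    total = 0
    for _ in range(reps):
        good, t = True, 0
        while True:
            t += 1
            if good and rng.random() < d:    # degradation before obs. at t
                good = False                  # bad until detected
            if rng.random() < (g if good else b):
                break                         # point plots outside limits
        total += t
    return total / reps

print(mean_time_to_alarm())   # compare with the analytical answer
```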
\item Consider the following (nonstandard) process monitoring scheme for a
variable $X$ that has ideal value 0. Suppose $h(x)>0$ is a function with
$h(x)=h(-x)$ that is decreasing in $|x|$. ($h$ has its maximum at $0$ and
decreases symmetrically as one moves away from $0$.) Then suppose that
\newline \hspace*{0.6in} i) control limits for $X_{1}$ are $\pm h(0)$,
\newline and \hspace{0.26in} ii) for $t>1$ control limits for $X_{t}$ are $\pm
h(X_{t-1})$.
(Control limits vary: the larger $|X_{t-1}|$ is, the tighter the
limits on $X_{t}$.) Discuss how you would find an ARL for this scheme for iid
$X$ with marginal probability density $f$. (Write down an appropriate integral
equation, briefly discuss how you would go about solving it and what you would
do with the solution in order to find the desired ARL.)
\item Consider the problem of monitoring integer-valued variables $Q_{1}%
,Q_{2},Q_{3},\ldots$ (we'll suppose that $Q$ can take any integer value, positive
or negative). Define%
\[
h(x)=4-|x|
\]
and consider the following definition of an alarm scheme:
\quad\quad1) alarm at time $i=1$ if $|Q_{1}|\geq4$, and
\quad\quad2) for $i\geq2$ alarm at time $i$ if $|Q_{i}|\geq h(Q_{i-1})$.
For integer $j$, let $q_{j}=P[Q_{1}=j]$ and suppose the $Q_{i}$ are iid.
Carefully describe how to find the ARL for this situation. (You don't need to
produce a formula, but you do need to set up an appropriate MC and tell me
exactly/completely what to do with it in order to get the ARL.)
\item Consider the problem of monitoring integer-valued variables $Q_{t}$
(we'll suppose that $Q$ can take any integer value, positive or negative). A
combination of individuals and moving range charts will be used according to
the scheme that at time 1, $Q_{1}$ alone will be plotted, while at time $t>1$
both $Q_{t}$ and $MR_{t}=|Q_{t}-Q_{t-1}|$ will be plotted. The alarm will ring
at the first period where $|Q_{t}|>3$ or $MR_{t}>4$. Suppose that the
variables $Q_{1},Q_{2},\ldots$ are iid and $p_{i}=P[Q_{1}=i]$. Consider the
problem of finding an average run length in this scenario.
\begin{enumerate}
\item Set up the transition matrix for an 8 state Markov Chain describing the
evolution of this charting method from $t=2$ onward, assuming that the alarm
doesn't ring at $t=1$. (State $\mbox{S}_{i}$ for $i=-3$, $-2$, $-1$, 0, 1, 2,
3 will represent the situation ``no alarm yet and the most recent observation
is $i$'' and there will be an alarm state.)
\item Given values for the $p_{i}$, one could use the transition matrix from
part (a) and solve for mean times to alarm from the states $\mbox{S}_{i}$.
Call these $L_{-3}$, $L_{-2}$, $L_{-1}$, $L_{0}$, $L_{1}$, $L_{2}$, and
$L_{3}$. Express the average run length of the whole scheme (including the
plotting at time $t=1$ when only $Q_{1}$ is plotted) in terms of the $L_{i}$
and $p_{i}$ values.
\end{enumerate}
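A direct Monte Carlo run is a useful check on the Markov Chain answer to this
problem. A sketch, assuming purely for illustration that $Q$ is uniform on the
integers $-4,\ldots,4$:

```python
# Simulation check for the individuals/moving-range scheme: alarm at the
# first t with |Q_t| > 3 or (for t > 1) MR_t = |Q_t - Q_{t-1}| > 4.
# The pmf used (Q uniform on {-4,...,4}) is an illustrative assumption.
import random

def run_length(rng):
    q_prev = None
    t = 0
    while True:
        t += 1
        q = rng.randint(-4, 4)               # a draw from the assumed pmf
        if abs(q) > 3 or (q_prev is not None and abs(q - q_prev) > 4):
            return t
        q_prev = q

rng = random.Random(11)
reps = 50_000
arl_hat = sum(run_length(rng) for _ in range(reps)) / reps
print(arl_hat)   # to be compared with the (I - R)^{-1} 1 computation
```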
\end{enumerate}
\section{Engineering Control and Stochastic Control Theory}
\renewcommand{\labelenumi}{3.\arabic{enumi}.}
\begin{enumerate}
\item Consider the use of the PI(D) controller $\Delta X(t)=.5E(t)+.25\Delta
E(t)$ in a situation where the control gain, $G$, is 1 and the target for the
controlled variable is $T(t) \doteq0$. Suppose that no control actions are
applied before the time $t=0$, but that for $t\geq0$, $E(t)$ and $\Delta E(t)$
are used to make changes in the manipulated variable, $\Delta X(t)$, according
to the above equation. Suppose further that the value of the controlled
variable, $Y(t)$, is the sum of what the process would do with no control, say
$Z(t)$, and the sum of effects at time $t$ of all changes in the manipulated
variable made in previous periods based on $E(0$), $\Delta E(0)$, $E(1)$,
$\Delta E(1)$, $E(2)$, $\Delta E(2),\ldots,E(t-1)$, $\Delta E(t-1)$.
Consider 3 possible patterns of impact at time $s$ of a change in the
manipulated variable made at time $t$, $\Delta X(t)$ :
\begin{tabbing}
\hspace{.3in} \= Pattern 1: \quad\= \kill\> Pattern 1: \> The effect on
$Y(s)$ is $1\times\Delta X(t)$ for all $s\geq t+1$ (a control action takes
its \\ \> \> full effect immediately). \\ \> Pattern 2: \> The effect on
$Y(t+1)$ is 0, but the effect on $Y(s)$ is $1\times\Delta X(t)$ for all \\ \>
\> $s\geq t+2$ (there is one period of dead time, after which a control action
\\ \> \> immediately takes its full effect). \\ \> Pattern 3: \> The effect on
$Y(s)$ is $1\times(1-2^{t-s})\Delta X(t)$ for all $s\geq t+1$ (there is an \\
\> \> exponential/geometric pattern in the way the impact of $\Delta X(t)$ is
felt, \\ \> \> the full effect only being seen for large $s$).
\end{tabbing}
Consider also 3 possible deterministic patterns of uncontrolled process
behavior, $Z(t)$:
\begin{tabbing}
\hspace{.3in} \= Pattern 1: \quad\= \kill\> Pattern A: \> $Z(t)=-3$ for all
$t\geq-1$ (the uncontrolled process would remain \\ \> \> constant, but off
target). \\ \> Pattern B: \> $Z(t)=-3$ for all $-1\leq t\leq5$, while $Z(t)=3$
for all $6\leq t$ (there is a \\ \> \> step change in where the uncontrolled
process would be). \\ \> Pattern C: \> $Z(t)=-3+t$ for all $t\geq-1$ (there is
a linear trend in where the \\ \> \> uncontrolled process would be).
\end{tabbing}
For each of the $3\times3=9$ combinations of patterns in the impact of changes
in the manipulated variable and behavior of the uncontrolled process, make up
a table giving at times $t=-1,0,1,2,\ldots,10$ the values of $Z(t)$, $E(t)$,
$\Delta E(t)$, $\Delta X(t)$ and $Y(t)$.
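The bookkeeping for any one of the nine combinations is mechanical and easy
to script. Here is a Python sketch for the combination Pattern 1 $\times$
Pattern A; the other combinations change only the two lines marked below:

```python
# Tabulating Z, E, dE, dX, Y for the controller dX(t) = .5 E(t) + .25 dE(t),
# with impact Pattern 1 (dX(t) acts in full on Y(s) for all s >= t+1) and
# uncontrolled Pattern A (Z = -3 throughout).  Target T(t) = 0; control
# actions begin at t = 0.
def simulate(T_end=10):
    rows = []
    cum_dx = 0.0              # sum of past dX, each felt fully from t+1 on
    E_prev = None
    for t in range(-1, T_end + 1):
        Z = -3.0                         # <- Pattern A; edit for B or C
        Y = Z + cum_dx                   # <- Pattern 1; edit for 2 or 3
        E = 0.0 - Y                      # error from the target T(t) = 0
        if t >= 0:
            dE = E - E_prev
            dX = 0.5 * E + 0.25 * dE
            cum_dx += dX                 # takes effect from period t+1 on
        else:
            dE, dX = None, None          # no control actions before t = 0
        rows.append((t, Z, E, dE, dX, Y))
        E_prev = E
    return rows

for row in simulate():
    print(row)
```

For this combination the first few hand-checkable values are $\Delta
X(0)=1.5$ and $Y(1)=-1.5$, after which $Y(t)$ is pulled toward the target.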
\item Consider again the PI(D) controller of Problem 3.1. Suppose that the
target is $T(t)$, where $T(t)=0$ for $t\leq5$ and $T(t)=3$ for $t>5$. For the
Pattern 1 of impact of control actions and Patterns A, B and C for $Z(t)$,
make up tables giving at times $t=-1,0,1,2,\ldots,10$ the values of $Z(t)$,
$T(t)$, $E(t)$, $\Delta E(t)$, $\Delta X(t)$ and $Y(t)$.
\item Consider again the PI(D) controller of Problem 3.1 and
\begin{tabbing}
\hspace{.3in} \= Pattern 1: \quad\= \kill\> Pattern D: \> $Z(t)=(-1)^{t}$ (the
uncontrolled process would oscillate around the \\ \> \> target).
\end{tabbing}
For the Patterns 1 and 2 of impact of control actions, make up tables giving
at times $t=-1,0,1,2,\dots,10$ the values of $Z(t)$, $T(t)$, $E(t)$, $\Delta
E(t)$, $\Delta X(t)$ and $Y(t)$.
\item There are two tables here giving some values of an uncontrolled process
$Z(t)$ that has target $T(t)\doteq0$. Suppose that a manipulated variable $X$
is available and that the simple (integral only) control algorithm
\[
\Delta X(t)=E(t)
\]
will be employed, based on an observed process $Y(t)$ that is the sum of
$Z(t)$ and the effects of all relevant changes in $X$.
Consider two different scenarios:
\begin{enumerate}
\item a change of $\Delta X$ in the manipulated variable impacts all
subsequent values of $Y(t)$ by the addition of an amount $\Delta X$, and
\item there is one period of dead time, after which a change of $\Delta X$ in
the manipulated variable impacts all subsequent values of $Y(t)$ by the
addition of an amount $\Delta X$.
\end{enumerate}
Fill in the two tables according to these two scenarios and then comment on
the lesson they seem to suggest about the impact of dead time on the
effectiveness of PID control.%
%TCIMACRO{\TeXButton{B}{\begin{table}[tbp] \centering}}%
%BeginExpansion
\begin{table}[tbp] \centering
%EndExpansion
\caption{Table for Problem 3.4(a), No Dead Time}
\begin{tabular}
[c]{|c|c|c|c|c|}\hline
$t$ & $Z(t)$ & $T(t)$ & $Y(t)$ & $E(t)=\Delta X(t)$\\\hline
$0$ & $-1$ & $0$ & $-1$ & \\\hline
$1$ & $-1$ & $0$ & & \\\hline
$2$ & $-1$ & $0$ & & \\\hline
$3$ & $-1$ & $0$ & & \\\hline
$4$ & $-1$ & $0$ & & \\\hline
$5$ & $-1$ & $0$ & & \\\hline
$6$ & $-1$ & $0$ & & \\\hline
$7$ & $-1$ & $0$ & & \\\hline
$8$ & $-1$ & $0$ & & \\\hline
$9$ & $-1$ & $0$ & & \\\hline
\end{tabular}
\
%TCIMACRO{\TeXButton{E}{\end{table}}}%
%BeginExpansion
\end{table}%
%EndExpansion%
%TCIMACRO{\TeXButton{B}{\begin{table}[tbp] \centering}}%
%BeginExpansion
\begin{table}[tbp] \centering
%EndExpansion
\caption{Table for Problem 3.4(a), One Period of Dead Time}
\begin{tabular}
[t]{|c|c|c|c|c|}\hline
$t$ & $Z(t)$ & $T(t)$ & \ $Y(t)$ \ & $E(t)=\Delta X(t)$\\\hline
$0$ & $-1$ & $0$ & $-1$ & \\\hline
$1$ & $-1$ & $0$ & & \\\hline
$2$ & $-1$ & $0$ & & \\\hline
$3$ & $-1$ & $0$ & & \\\hline
$4$ & $-1$ & $0$ & & \\\hline
$5$ & $-1$ & $0$ & & \\\hline
$6$ & $-1$ & $0$ & & \\\hline
$7$ & $-1$ & $0$ & & \\\hline
$8$ & $-1$ & $0$ & & \\\hline
$9$ & $-1$ & $0$ & & \\\hline
\end{tabular}
\
%TCIMACRO{\TeXButton{E}{\end{table}}}%
%BeginExpansion
\end{table}%
%EndExpansion
\item On pages 87 and 88, V\&J suggest that over-adjustment of a process will
increase rather than decrease variation. In this problem we will investigate
this notion mathematically. Imagine periodically sampling a widget produced by
a machine and making a measurement $y_{i}$. Conceptualize the situation as
\[
y_{i}=\mu_{i}+\epsilon_{i}%
\]
where
\begin{tabbing}
\hspace*{.8in} \= \kill\> $\mu_{i}=$ the true machine setting (or widget
diameter) at time $i$ \\ and \> $\epsilon_{i}=$ ``random'' variability at time
$i$ affecting only measurement $i$\ .
\end{tabbing}
Further, suppose that the (coded) ideal diameter is 0 and $\mu_{i}$ is the sum
of natural machine drift and adjustments applied by an operator up through
time $i$. That is, with
\begin{tabbing}
\hspace*{.8in} \= \kill\> $\gamma_{i}=$ the machine drift between time $i-1$
and time $i$ \\ and \> $\delta_{i}=$ the operator (or automatic controller's)
adjustment applied \\ \> \qquad between time $i-1$ and time $i$
\end{tabbing}
suppose that $\mu_{0}=0$ and for $j\geq1$ we have
\[
\mu_{j}=\sum_{i=1}^{j}\gamma_{i}+\sum_{i=1}^{j}\delta_{i}\ .
\]
We will here consider the (integral-only) adjustment policies for the machine
\[
\delta_{i}=-\alpha y_{i-1}\quad\mbox{for an}\quad\alpha\in\lbrack0,1]\ .
\]
It is possible to verify that for $j\geq1$
\begin{tabbing}
\hspace*{.5in} \= under policy A: \quad\= \kill\> if \quad$\alpha= 0:$ \>
$y_{j}=\sum_{i=1}^{j}\gamma_{i}+\epsilon_{j}$ \\ \> if \quad$\alpha= 1:$ \>
$y_{j}=\gamma_{j}-\epsilon_{j-1}+\epsilon_{j}$ \\ and \> if \quad$\alpha
\in(0,1):$ \> $y_{j} = \sum_{i=1}^{j}\gamma_{i}(1-\alpha)^{j-i} -\alpha
\sum_{i=1}^{j}\epsilon_{i-1}(1-\alpha)^{j-i}+\epsilon_{j}$\ .
\end{tabbing}
Model $\epsilon_{0},\epsilon_{1},\epsilon_{2},\ldots$ as independent random
variables with mean 0 and variance $\sigma^{2}$ and consider predicting the
likely effectiveness of the adjustment policies by finding $\displaystyle
\lim_{j\rightarrow\infty}$E$\mu_{j}^{2}$ . (E$\mu_{j}^{2}$ is a measure of how
close to proper adjustment the machine can be expected to be at time $j$.)
\begin{enumerate}
\item Compare choices of $\alpha$ supposing that $\gamma_{i}\doteq0$. (Here
the process is stable.)
\item Compare choices of $\alpha$ supposing that $\gamma_{i}\doteq d$, some
constant. (This is a case of deterministic linear machine drift, and might for
example be used to model tool wear over reasonably short periods.)
\item Compare choices of $\alpha$ supposing $\gamma_{1},\gamma_{2},\ldots$ is
a sequence of independent random variables with mean 0 and variance $\eta^{2}$
that is independent of the $\epsilon$ sequence. What $\alpha$ would you
recommend using if this (random walk) model seems appropriate and $\eta$ is
thought to be about one half of $\sigma$?
\end{enumerate}
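As a numerical check on part (a): with $\gamma_{i}\doteq0$ the definitions
above give the recursion $\mu_{j}=(1-\alpha)\mu_{j-1}-\alpha\epsilon_{j-1}$,
so that $\mu_{j}$ behaves like an AR(1) sequence, and one can verify that the
limiting mean square works out to $\alpha\sigma^{2}/(2-\alpha)$. A short
simulation sketch using the illustrative values $\alpha=.5$ and $\sigma=1$:

```python
# Monte Carlo sketch of the over-adjustment effect: with gamma_i = 0 the
# recursion mu_j = (1 - alpha) mu_{j-1} - alpha eps_{j-1} follows from
# delta_i = -alpha y_{i-1}, and the long-run average of mu_j^2 should
# approach alpha sigma^2 / (2 - alpha).  alpha = .5, sigma = 1 assumed.
import random

def limiting_mean_square(alpha, sigma=1.0, steps=200_000, seed=1):
    rng = random.Random(seed)
    mu, total = 0.0, 0.0
    for _ in range(steps):
        eps = rng.gauss(0.0, sigma)
        mu = (1.0 - alpha) * mu - alpha * eps   # next-period true setting
        total += mu * mu
    return total / steps                        # ergodic average of mu_j^2

print(limiting_mean_square(0.5))   # theory: .5/(2 - .5) = 1/3
```

The message is the usual one: for a stable process, any $\alpha>0$ leaves
the machine worse adjusted (in mean square) than leaving it alone.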
\item Suppose that $\ldots,\epsilon(-1),\epsilon(0),\epsilon(1),\epsilon
(2),\ldots$ are iid normal random variables with mean 0 and variance
$\sigma^{2}$ and that
\[
Z(t)=\epsilon(t-1)+\epsilon(t)\ .
\]
(Note that under this model consecutive $Z$'s are correlated, but those
separated in time by at least 2 periods are independent.) As it turns out,
under this model
\[
E_{\mathcal{F}}[Z(t+1)|Z^{t}]=\frac{1}{t+2}\sum_{j=0}^{t}(-1)^{j}%
(t+1-j)Z(t-j)
\]
while
\[
E_{\mathcal{F}}[Z(s)|Z^{t}]=0\mbox{ \ for \ }s\geq t+2\ .
\]
If $T(t)\doteq0$ find optimal (MV) control strategies for two different
situations involving numerical process adjustments $a$.
\begin{enumerate}
\item First suppose that $A(a,s)=a$ for all $s\geq1$. (Note that in the limit
as $t\rightarrow\infty$, the MV controller is a ``proportional-only'' controller.)
\item Then suppose the impact of a control action is similar to that in (a),
except there is one period of delay, i.e.
\[
A(a,s)=\left\{
\begin{array}
[c]{ll}%
a & \mbox{for}~s\geq2\\
0 & \mbox{for}~s=1
\end{array}
\right.
\]
(You should decide that $a(t)\doteq0$ is optimal.)
\item For the situation without dead time in part (a), write out $Y(t)$ in
terms of $\epsilon$'s. What are the mean and variance of $Y(t)$? How do these
compare to the mean and variance of $Z(t)$? Would you say from this comparison
that the control algorithm is effective in directing the process to the target
$T(t)=0$?
\item Again for the situation of part (a), consider the matter of process
monitoring for a change from the model of this problem (that ought to be
greeted by a revision of the control algorithm or some other appropriate
intervention). Argue that after some start-up period it makes sense to
Shewhart chart the $Y(t)$'s, treating them as essentially iid Normal
$(0,\sigma^{2})$ if ``all is OK.'' (What is the correlation between $Y(t)$ and
$Y(t-1)$?)
\end{enumerate}
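The stated formula for E$_{\mathcal{F}}[Z(t+1)|Z^{t}]$ can be checked
numerically: since everything is jointly normal, the conditional mean is the
best linear predictor $\mbox{\boldmath$\Gamma$}^{-1}\mbox{\boldmath$\gamma$}$,
with $\mbox{\boldmath$\Gamma$}$ the covariance matrix of
$(Z(0),\ldots,Z(t))$ and $\mbox{\boldmath$\gamma$}$ the vector of covariances
with $Z(t+1)$. A sketch of the check (here at $t=4$, an arbitrary choice):

```python
# Numerical check of E[Z(t+1)|Z^t] = (1/(t+2)) sum_j (-1)^j (t+1-j) Z(t-j)
# for Z(t) = eps(t-1) + eps(t):  Var Z = 2 sigma^2, lag-1 covariance
# sigma^2, all other covariances 0, so the conditional mean is the best
# linear predictor with weights Gamma^{-1} gamma.
import numpy as np

def predictor_weights(t, sigma2=1.0):
    n = t + 1
    Gamma = np.zeros((n, n))
    for i in range(n):
        Gamma[i, i] = 2 * sigma2
        if i + 1 < n:
            Gamma[i, i + 1] = Gamma[i + 1, i] = sigma2
    gamma = np.zeros(n)
    gamma[-1] = sigma2                       # Cov(Z(t+1), Z(t)) = sigma^2
    return np.linalg.solve(Gamma, gamma)     # weights on (Z(0),...,Z(t))

t = 4
claimed = np.array([(-1) ** j * (t + 1 - j) / (t + 2)
                    for j in range(t + 1)])[::-1]   # reorder to Z(0),...,Z(t)
print(np.allclose(predictor_weights(t), claimed))   # -> True
```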
\item Consider the optimal stochastic control problem as described in \S3.1
with $Z(t)$ an iid normal $(0,1)$ sequence of random variables, control
actions $a\in(-\infty,\infty)$, $A(a,s)=a$ for all $s\geq1$ and $T(s)\doteq0$
for all $s$. What do you expect the optimal (minimum variance) control
strategy to turn out to be? Why?
\item (Vander Wiel) Consider a stochastic control problem with the following
elements. The (stochastic) model, $\mathcal{F}$, for the uncontrolled process,
$Z(t)$, will be
\[
Z(t)=\phi Z(t-1)+\epsilon(t)
\]
where the $\epsilon(t)$ are iid normal $(0,\sigma^{2})$ random variables and
$\phi$ is a (known) constant with absolute value less than 1. ($Z(t)$ is a
first order autoregressive process.) For this model,
\[
E_{\mathcal{F}}[Z(t+1)|\ldots,Z(-1),Z(0),Z(1),\ldots,Z(t)]=\phi Z(t)\ .
\]
For the function $A(a,s)$ describing the effect of a control action $a$ taken
$s$ periods previous, we will use $A(a,s)=a\rho^{s-1}$ for another known
constant $0<\rho<1$ (the effect of an adjustment made at a given period dies
out geometrically).
Carefully find $a(0)$, $a(1)$, and $a(2)$ in terms of a constant target value
$T$ and $Z(0)$, $Y(1)$ and $Y(2)$. Then argue that in general
\[
a(t)=T\left( 1+(\phi-\rho)\sum_{s=0}^{t-1}\phi^{s}\right) -\phi
Y(t)-(\phi-\rho)\sum_{s=1}^{t}\phi^{s}Y(t-s)\ .
\]
For large $t$, this prescription reduces to approximately what?
\item Consider the following stochastic control problem. The stochastic model,
$\mathcal{F}$, for the uncontrolled process $Z(t)$, will be
\[
Z(t)=ct+\epsilon(t)
\]
where $c$ is a known constant and the $\epsilon(t)$'s are iid normal
$(0,\sigma^{2})$ random variables. (The $Z(t)$ process is a deterministic
linear trend seen through iid/white noise.) For the function $A(a,s)$
describing the effect of a control action $a$ taken $s$ periods previous, we
will use $A(a,s)=(1-2^{-s})a$ for all $s\geq1$. Suppose further that the
target value for the controlled process is $T=0$ and that control begins at
time 0 (after observing $Z(0)$).
\begin{enumerate}
\item Argue carefully that $\widehat{Z}(t)=$E$_{\mathcal{F}}[Z(t+1)|\ldots
,Z(-1),Z(0),Z(1),\ldots,Z(t)]=c(t+1)$.
\item Find the minimum variance control algorithm and justify your answer.
Does there seem to be a limiting form for $a(t)$?
\item According to the model here, the controlled process $Y(t)$ should have
what kind of behavior? (How would you describe the joint distribution of the
variables $Y(1),Y(2),\ldots,Y(t)$?) Suppose that you decide to set up Shewhart
type ``control limits'' to use in monitoring the $Y(t)$ sequence. What values
do you recommend for $LCL$ and $UCL$ in this situation? (These could be used
as an on-line check on the continuing validity of the assumptions that we have
made here about $\mathcal{F}$ and $A(a,s)$.)
\end{enumerate}
\item Consider the following optimal stochastic control problem. Suppose that
for some (known) appropriate constants $\alpha$ and $\beta$, the uncontrolled
process $Z(t)$ has the form
\[
Z(t)=\alpha Z(t-1)+\beta Z(t-2)+\epsilon(t)
\]
for the $\epsilon$'s iid with mean 0 and variance $\sigma^{2}$. (The
$\epsilon$'s are independent of all previous $Z$'s.) Suppose further that for
control actions $a\in(-\infty,\infty)$, $A(a,1)=0$ and $A(a,s)=a$ for all
$s\geq2$. (There is a one period delay, following which the full effect of a
control action is immediately felt.) For $s\geq1$, let $T(s)$ be an arbitrary
sequence of target values for the process.
\begin{enumerate}
\item Argue that
\[
E_{\mathcal{F}}[Z(t+1)|\ldots,Z(t-2),Z(t-1),Z(t)]=\alpha Z(t)+\beta Z(t-1)
\]
and that
\[
E_{\mathcal{F}}[Z(t+2)|\ldots,Z(t-2),Z(t-1),Z(t)]=(\alpha^{2}+\beta
)Z(t)+\alpha\beta Z(t-1)\ .
\]
\item Carefully find $a(0)$, $a(1)$ and $a(2)$ in terms of $Z(-1)$, $Z(0)$,
$Y(1)$, $Y(2)$ and the $T(s)$ sequence.
\item Finally, give a general form for the optimal control action to be taken
at time $t\geq3$ in terms of $\ldots,Z(-1),Z(0),Y(1),Y(2),\ldots,Y(t)$ and
$a(0),a(1),\ldots,a(t-1)$.
\end{enumerate}
\item Use the first order autoregressive model of Problem 3.8 and consider the
two functions $A(a,s)$ from Problem 3.6. Find the MV optimal control policies
(in terms of the $Y$'s) for the $T\doteq0$ situation. Is either of these a PID
control algorithm?
\item A process has a Good state and a Bad state. Every morning a gremlin
tosses a coin with $P[$Heads$]=u>.5$ that governs how states evolve day to
day. Let%
\[
C_{i}=P[\text{change state on day }i\text{ from that on day }i-1]\text{ .}%
\]
Each $C_{i}$ is either $u$ or $1-u$.
\begin{enumerate}
\item Before the gremlin tosses the coin on day $i$, you get to choose whether%
\[
C_{i}=u\ (\text{so that Heads}\Longrightarrow\text{change})
\]
or%
\[
C_{i}=1-u\ \ (\text{so that Heads}\Longrightarrow\text{no change})
\]
(You either apply some counter-measures or let the process evolve naturally.)
Your object is to see that the process is in the Good state as often as
possible. What is your optimal strategy? (What should you do on any morning
$i$? This needs to depend upon the state of the process from day $i-1$.)
\item If all is as described here, the evolution of the states under your
optimal strategy from (a) is easily described in probabilistic terms. Do so.
Then describe in rough/qualitative terms how you might monitor the sequence of
states to detect the possibility that the gremlin has somehow changed the
rules of process evolution on you.
\item Now suppose that there is a one-day time delay in your counter-measures.
Before the gremlin tosses his coin on day $i$, you get to choose only whether%
\[
C_{i+1}=u
\]
or%
\[
C_{i+1}=1-u.
\]
(You do not get to choose $C_{i}$ on the morning of day $i$.) Now what is your
optimal strategy? (What you should choose on the morning of day $i$ depends
upon what you already chose on the morning of day $(i-1)$ and whether the
process was in the Good state or in the Bad state on day $(i-1)$.) Show
appropriate calculations to support your answer.
\end{enumerate}
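Candidate strategies for part (a) are easy to compare by simulation. A
sketch of one natural rule (choose $C_{i}$ so that the more likely outcome,
Heads, pushes toward or keeps the Good state), with $u=.8$ assumed purely
for illustration:

```python
# Simulation of the gremlin game: each day the state changes with the
# chosen probability C_i (either u or 1-u).  The rule coded here --
# C_i = u when Bad (Heads => change), C_i = 1-u when Good (Heads => no
# change) -- is one candidate strategy; u = 0.8 is an assumed value.
import random

def fraction_good(u=0.8, days=100_000, seed=5):
    rng = random.Random(seed)
    good, count = True, 0
    for _ in range(days):
        change_prob = u if not good else 1.0 - u
        if rng.random() < change_prob:
            good = not good
        count += good
    return count / days

print(fraction_good())   # under this rule, P[Good tomorrow] = u every day
```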
\end{enumerate}
\section{Process Characterization}
\renewcommand{\labelenumi}{4.\arabic{enumi}.}
\begin{enumerate}
\item The following are depth measurements taken on $n=8$ pump end caps. The
units are inches.
\[
4.9991,\ 4.9990,\ 4.9994,\ 4.9989,\ 4.9986,\ 4.9991,\ 4.9993,\ 4.9990
\]
The specifications for this depth measurement were $4.999 \pm.001$ inches.
\begin{enumerate}
\item As a means of checking whether a normal distribution assumption is
plausible for these depth measurements, make a normal plot of these data. (Use
regular graph paper and the method of Section 5.1.) Read an estimate of
$\sigma$ from this plot.\bigskip
\noindent Regardless of the appearance of your plot from (a), henceforth
suppose that one is willing to say that the process producing these depths is
stable and that a normal distribution of depths is plausible.\bigskip
\item Give a point estimate and a 90\% two-sided confidence interval for the
``process capability,'' 6$\sigma$.
\item Give a point estimate and a 90\% two-sided confidence interval for the
process capability ratio $C_{p}$.
\item Give a point estimate and a 95\% lower confidence bound for the process
capability ratio $C_{pk}$.
\item Give a 95\% two-sided prediction interval for the next depth measurement
on a cap produced by this process.
\item Give a 99\% two-sided tolerance interval for 95\% of all depth
measurements of end caps produced by this process.
\end{enumerate}
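The point estimates in parts (b) through (d) (though not the interval
estimates, which require $\chi^{2}$ and related quantiles) can be checked
with a few lines of Python:

```python
# Point estimates for Problem 4.1: x-bar, s, the estimated "process
# capability" 6s, and the plug-in estimates of Cp and Cpk for the
# specifications 4.999 +/- .001 inches.
import statistics

depths = [4.9991, 4.9990, 4.9994, 4.9989, 4.9986, 4.9991, 4.9993, 4.9990]
LSL, USL = 4.998, 5.000

xbar = statistics.mean(depths)
s = statistics.stdev(depths)                   # divisor n - 1
capability = 6 * s                             # estimated 6*sigma
Cp = (USL - LSL) / (6 * s)
Cpk = min(USL - xbar, xbar - LSL) / (3 * s)
print(round(xbar, 5), round(s, 6), round(Cp, 2), round(Cpk, 2))
```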
\item Below are the logarithms of the amounts (in ppm by weight) of aluminum
found in 26 bihourly samples of recovered PET plastic at a Rutgers University
recycling plant taken from a \textit{JQT} paper by Susan Albin. (In this
context, aluminum is an impurity.)
5.67, 5.40, 4.83, 4.37, 4.98, 4.78, 5.50, 4.77, 5.20, 4.14, 3.40, 4.94,
4.62,\newline 4.62, 4.47, 5.21, 4.09, 5.25, 4.78, 6.24, 4.79, 5.15, 4.25,
3.40, 4.50, 4.74
\begin{enumerate}
\item Set up and plot charts for a sensible monitoring scheme for these
values. (They are in order if one reads left to right, top to bottom.)
Caution: Simply computing a mean and sample standard deviation for these
values and using ``limits'' for individuals of the form $\bar{x}\pm3s$ does
not produce a sensible scheme! Say clearly what you are doing and why.
\item Suppose that (on the basis of an analysis of the type in (a) or
otherwise) it is plausible to treat the 26 values above as a sample of size
$n=26$ from some physically stable normally distributed process. (Note
$\bar{x}\approx4.773$ and $s\approx.632$.)
\begin{enumerate}
\item Give a two-sided interval that you are ``90\% sure'' will contain the
next log aluminum content of a sample taken at this plant. Transform this to
an interval for the next raw aluminum content.
\item Give a two-sided interval that you are ``95\% sure'' will contain 90\%
of all log aluminum contents. Transform this interval to one for raw aluminum contents.
\end{enumerate}
\item Rather than adopting the ``stable process'' model alluded to in part (b)
suppose that it is only plausible to assume that the log purity process is
stable for periods of about 10 hours, but that mean purities can change
(randomly) at roughly ten hour intervals. Note that if one considers the first
25 values above to be 5 samples of size 5, some summary statistics are then
given below:%
\begin{tabular}
[c]{c@{\extracolsep{.3in}}ccccc}%
period & 1 & 2 & 3 & 4 & 5\\\hline
$\bar{x}$ & \multicolumn{1}{r}{5.050} & \multicolumn{1}{r}{4.878} &
\multicolumn{1}{r}{4.410} & \multicolumn{1}{r}{5.114} &
\multicolumn{1}{r}{4.418}\\
$s$ & \multicolumn{1}{r}{.506} & \multicolumn{1}{r}{.514} &
\multicolumn{1}{r}{.590} & \multicolumn{1}{r}{.784} & \multicolumn{1}{r}{.661}%
\\
$R$ & \multicolumn{1}{r}{1.30} & \multicolumn{1}{r}{1.36} &
\multicolumn{1}{r}{1.54} & \multicolumn{1}{r}{2.15} & \multicolumn{1}{r}{1.75}%
\end{tabular}
Based on the usual random effects model for this two-level
``nested/hierarchical'' situation, give reasonable point estimates of the
within-period standard deviation and the standard deviation governing period
to period changes in process mean.
\end{enumerate}
\item A standard (in engineering statistics) approximation due to Wallis (used
on page 468 of V\&J) says that often it is adequate to treat the variable
$\bar{x}\pm ks$ as if it were normal with mean $\mu\pm k\sigma$ and variance
\[
\sigma^{2}\left( \frac{1}{n}+\frac{k^{2}}{2n}\right) \ .
\]
Use the Wallis approximation to the distribution of $\bar{x}+ks$ and find $k$
such that for $x_{1},x_{2},\ldots,x_{26}$ iid normal random variables,
$\bar{x}+ks$ is a 99\% upper statistical tolerance bound for 95\% of the
population. (That is, your job is to choose $k$ so that $P[\Phi\left(
\frac{\bar{x}+ks-\mu}{\sigma}\right) \geq.95]\approx.99$.) How does your
approximate value compare to the exact one given in Table A.9b?
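Under the Wallis approximation the requirement reduces to solving
$(k-z_{.95})/\sqrt{1/n+k^{2}/2n}=z_{.99}$ for $k$, which is a quadratic in
$k$. A Python sketch of that computation (its answer is, of course, only as
good as the approximation):

```python
# Solving the Wallis-approximation equation for the one-sided tolerance
# factor k with n = 26, 95% content, 99% confidence:
#   (k - z_.95) / sqrt(1/n + k^2/(2n)) = z_.99 ,
# i.e. the quadratic (1 - b^2/2n) k^2 - 2 a k + (a^2 - b^2/n) = 0
# with a = z_.95, b = z_.99 (take the larger root).
from statistics import NormalDist
from math import sqrt

n = 26
a = NormalDist().inv_cdf(0.95)       # 1.645...
b = NormalDist().inv_cdf(0.99)       # 2.326...

A = 1 - b * b / (2 * n)
B = -2 * a
C = a * a - b * b / n
k = (-B + sqrt(B * B - 4 * A * C)) / (2 * A)
print(round(k, 3))                   # approximate tolerance factor k
```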
\item Consider the problem of pooling together samples of size $n$ from, say,
five different days to make inferences about all widgets produced during that
period. In particular, consider the problem of estimating the fraction of
widgets with diameters that are outside of engineering specifications. Suppose
that%
\begin{align*}
N_{i} & =\text{the number of widgets produced on day }i\\
p_{i} & =\text{the fraction of widgets produced on day }i\text{ that have}\\
& \text{diameters that are outside engineering specifications}%
\end{align*}
and%
\[
\hat{p}_{i}=\text{the fraction of the }i\text{th sample that have out-of-spec.
diameters.}%
\]
If the samples are simple random samples of the respective daily productions,
standard finite population sampling theory says that
\[
E\widehat{p}_{i}=p_{i}\quad\mbox{and}\quad\text{Var}\,\widehat{p}_{i}=\left(
\frac{N_{i}-n}{N_{i}-1}\right) \frac{p_{i}(1-p_{i})}{n}\ .
\]
Two possibly different estimators of the population fraction of diameters out
of engineering specifications,
\[
p=\frac{\displaystyle\sum_{i=1}^{5}N_{i}p_{i}}{\displaystyle\sum_{i=1}%
^{5}N_{i}}\ ,
\]
are
\[
\widehat{p}=\frac{\displaystyle\sum_{i=1}^{5}N_{i}\widehat{p}_{i}%
}{\displaystyle\sum_{i=1}^{5}N_{i}}\qquad\mbox{and}\qquad\bar{\widehat{p}%
}=\frac{1}{5}\sum_{i=1}^{5}\hat{p}_{i}\ .
\]
Show that E$\hat{p}=p$, but that E$\bar{\widehat{p}}$ need not be $p$ unless
all $N_{i}$ are the same. Assuming the independence of the $\hat{p}_{i}$, what
are the variances of $\hat{p}$ and $\bar{\widehat{p}}$ ? Note that neither of
these needs to equal
\[
\left( \frac{N-5n}{N-1}\right) \ \frac{p(1-p)}{5n}\ .
\]
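Because E$\hat{p}_{i}=p_{i}$, the expectations of both estimators follow by
linearity, and the contrast is easy to see numerically. A tiny illustration
using made-up $N_{i}$ and $p_{i}$ (all assumed values):

```python
# Illustration (hypothetical N_i, p_i) of why the weighted estimator is
# unbiased while the unweighted average need not be:
#   E p-hat = sum N_i p_i / sum N_i = p exactly, but
#   E p-bar = (1/5) sum p_i, which differs when the N_i are unequal.
N = [1000, 2000, 3000, 4000, 10000]   # assumed daily production counts
p = [0.01, 0.02, 0.03, 0.04, 0.10]    # assumed daily nonconforming fractions

pop_p = sum(Ni * pi for Ni, pi in zip(N, p)) / sum(N)   # = E of weighted est.
E_pbar = sum(p) / len(p)                                # E of unweighted avg.
print(pop_p, E_pbar)   # 0.065 versus 0.040 -- the average is biased low here
```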
\item Suppose that the hierarchical random effects model used in Section 5.5
of V\&J is a good description of how 500 widget diameters arise on each of 5
days in each of 10 weeks. (That is, suppose that the model is applicable with
$I=10$, $J=5$ and $K=500$.) Suppose further, that of interest is the grand
(sample) variance of all $10\times5\times500$ widget diameters. Use the
expected mean squares and write out an expression for the expected value of
this variance in terms of $\sigma_{\alpha}^{2}$, $\sigma_{\beta}^{2}$ and
$\sigma^{2}$.
Now suppose that one only observes 2 widget diameters each day for 5 weeks and
in fact obtains the ``data'' in the accompanying table. From these data obtain
point estimates of the variance components $\sigma_{\alpha}^{2}$,
$\sigma_{\beta}^{2}$ and $\sigma^{2}$. Use these and your formula from above
to predict the variance of all $10\times5\times500$ widget diameters. Then
make a similar prediction for the variance of the diameters from the next 10
weeks, supposing that the $\sigma_{\alpha}^{2}$ variance component could be eliminated.%
%TCIMACRO{\TeXButton{B}{\begin{table}[tbp] \centering}}%
%BeginExpansion
\begin{table}[tbp] \centering
%EndExpansion
\caption{Data for Problem 4.5}%
\begin{tabular}
[c]{l@{\extracolsep{.2in}}crrrrcr}
& Day & $k=1$ & $k=2$ & $\bar{y}_{ij}$ & $s_{ij}^{2}$ & $\bar{y}_{i.}$ &
$s_{\mbox{B}i}^{2}$\\\hline
& M & $15.5$ & $14.9$ & $15.2$ & .$18$ & \multicolumn{1}{r}{} & \\
& T & $15.2$ & $15.2$ & $15.2$ & $0$ & \multicolumn{1}{r}{} & \\
Week 1 & W & $14.2$ & $14.2$ & $14.2$ & $0$ & \multicolumn{1}{r}{$15.0$} &
.$605$\\
& R & $14.3$ & $14.3$ & $14.3$ & $0$ & \multicolumn{1}{r}{} & \\
& F & $15.8$ & $16.4$ & $16.1$ & .$18$ & \multicolumn{1}{r}{} & \\\hline
& M & $6.2$ & $7.0$ & $6.6$ & .$32$ & \multicolumn{1}{r}{} & \\
& T & $7.2$ & $8.4$ & $7.8$ & .$72$ & \multicolumn{1}{r}{} & \\
Week 2 & W & $6.6$ & $7.8$ & $7.2$ & .$72$ & \multicolumn{1}{r}{$7.0$} &
.$275$\\
& R & $6.2$ & $7.6$ & $6.9$ & .$98$ & \multicolumn{1}{r}{} & \\
& F & $5.6$ & $7.4$ & $6.5$ & $1.62$ & \multicolumn{1}{r}{} & \\\hline
& M & $15.4$ & $14.4$ & $14.9$ & .$50$ & \multicolumn{1}{r}{} & \\
& T & $13.9$ & $13.3$ & $13.6$ & .$18$ & \multicolumn{1}{r}{} & \\
Week 3 & W & $13.4$ & $14.8$ & $14.1$ & .$98$ & \multicolumn{1}{r}{$14.0$} &
.$370$\\
& R & $12.5$ & $14.1$ & $13.3$ & $1.28$ & \multicolumn{1}{r}{} & \\
& F & $13.2$ & $15.0$ & $14.1$ & $1.62$ & \multicolumn{1}{r}{} & \\\hline
& M & $10.9$ & $11.3$ & $11.1$ & .$08$ & \multicolumn{1}{r}{} & \\
& T & $12.5$ & $12.7$ & $12.6$ & .$02$ & \multicolumn{1}{r}{} & \\
Week 4 & W & $12.3$ & $11.7$ & $12.0$ & .$18$ & \multicolumn{1}{r}{$12.0$} &
.$515$\\
& R & $11.0$ & $12.0$ & $11.5$ & .$50$ & \multicolumn{1}{r}{} & \\
& F & $12.3$ & $13.3$ & $12.8$ & .$50$ & \multicolumn{1}{r}{} & \\\hline
& M & $7.5$ & $6.7$ & $7.1$ & .$32$ & \multicolumn{1}{r}{} & \\
& T & $6.7$ & $7.3$ & $7.0$ & .$18$ & \multicolumn{1}{r}{} & \\
Week 5 & W & $7.2$ & $6.0$ & $6.6$ & .$72$ & \multicolumn{1}{r}{$7.0$} &
.$155$\\
& R & $7.6$ & $7.6$ & $7.6$ & $0$ & \multicolumn{1}{r}{} & \\
& F & $6.3$ & $7.1$ & $6.7$ & .$32$ & \multicolumn{1}{r}{} & \\\hline
\end{tabular}%
%TCIMACRO{\TeXButton{E}{\end{table}}}%
%BeginExpansion
\end{table}%
%EndExpansion
\item Consider a situation in which a lot of 50,000 widgets has been packed
into 100 crates, each of which contains 500 widgets. Suppose that unbeknownst
to us, the lot consists of 25,000 widgets with diameter 5 and 25,000 widgets
with diameter 7. We wish to estimate the variance of the widget diameters in
the lot (which is 50,000/49,999). To do so, we decide to select 4 crates at
random, and from each of those, select 5 widgets to measure.
\begin{enumerate}
\item One (not so smart) way to try to estimate the population variance is to
simply compute the sample variance of the 20 widget diameters we end up with.
Find the expected value of this estimator under two different scenarios: 1st
where each of the 100 crates contains 250 widgets of diameter 5 and 250
widgets with diameter 7, and then 2nd where each crate contains widgets of
only one diameter. What, in general terms, does this suggest about when the
naive sample variance will produce decent estimates of the population variance?
\item Give the formula for an estimator of the population variance that is
unbiased (i.e. has expected value equal to the population variance).
\end{enumerate}
\item Consider the data of Table 5.8 in V\&J and the use of the hierarchical
normal random effects model to describe their generation.
\begin{enumerate}
\item Find point estimates of the parameters $\sigma_{\alpha}^{2}$ and
$\sigma^{2} $ based first on ranges and then on ANOVA mean squares.
\item Find a standard error for your ANOVA-based estimator of $\sigma_{\alpha
}^{2}$ from (a).
\item Use the material in \S1.5 and make a 90\% two sided confidence interval
for $\sigma_{\alpha}^{2}$ .
\end{enumerate}
\item All of the variance component estimation material presented in the text
is based on \textit{balanced data} assumptions. As it turns out, it is quite
possible to do point estimation (based on sample variances) from even
unbalanced data. A basic fact that enables this is the following: If
$X_{1},X_{2},\ldots,X_{n}$ are uncorrelated random variables, each with the
same mean, then
\[
Es^{2}=\frac{1}{n}\sum_{i=1}^{n}\text{Var}\,X_{i}\ .
\]
(Note that the usual fact that for iid $X_{i}$, E$s^{2}=\sigma^{2}$, is a
special case of this basic fact.)
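For completeness, here is a short derivation of this fact, using only the
common mean (so that each $X_{i}-\bar{X}$ has mean 0) and the zero
correlations (so that $\text{Cov}(X_{i},\bar{X})=\frac{1}{n}\text{Var}\,X_{i}$):

```latex
\begin{align*}
(n-1)\mathrm{E}s^{2} & =\sum_{i=1}^{n}\mathrm{E}(X_{i}-\bar{X})^{2}
=\sum_{i=1}^{n}\mathrm{Var}(X_{i}-\bar{X})\\
& =\sum_{i=1}^{n}\left( \mathrm{Var}\,X_{i}-\frac{2}{n}\mathrm{Var}\,X_{i}
+\frac{1}{n^{2}}\sum_{j=1}^{n}\mathrm{Var}\,X_{j}\right)
=\frac{n-1}{n}\sum_{i=1}^{n}\mathrm{Var}\,X_{i}\ ,
\end{align*}
```

and dividing through by $n-1$ gives the displayed fact.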
Consider the (hierarchical) random effects model used in Section 5.5 of the
text. In notation similar to that in Section 5.5 (but not assuming that data
are balanced), let%
\begin{align*}
\bar{y}_{ij}^{\ast} & =\text{the sample mean of data values at level
}i\text{ of A and level }j\text{ of B within A}\\
s_{ij}^{\ast2} & =\text{the sample variance of the data values at level
}i\text{ of A and level }j\text{ of B within A}\\
\bar{y}_{i}^{\ast} & =\text{the sample mean of the values }\bar{y}%
_{ij}^{\ast}\text{ at level }i\text{ of A}\\
s_{\text{B}i}^{\ast2} & =\text{the sample variance of the values }\bar
{y}_{ij}^{\ast}\text{ at level }i\text{ of A}%
\end{align*}
and%
\[
s_{\text{A}}^{\ast2}=\text{the sample variance of the values }\bar{y}%
_{i}^{\ast}%
\]
Suppose that instead of being furnished with balanced data, one has a data set
where 1) there are $I=2$ levels of A, 2) level 1 of A has $J_{1}=2$ levels of
B while level $2$ of A has $J_{2}=3$ levels of B, and 3) level 1 of B within
level 1 of A has $n_{11}=2$ levels of C, level 2 of B within level 1 of A has
$n_{12}=4$ levels of C, levels 1 and 2 of B within level 2 of A have
$n_{21}=n_{22}=2$ levels of C and level 3 of B within level 2 of A has
$n_{23}=3$ levels of C.\bigskip
Evaluate the following: \ E$s_{\mathrm{pooled}}^{2}$, E$\left( \frac{1}%
{5}\sum_{i,j}s_{ij}^{\ast2}\right) $, E$s_{\text{B}1}^{\ast2}$,
E$s_{\text{B}2}^{\ast2}$, E$\frac{1}{2}\left( s_{\text{B}1}^{\ast2}%
+s_{\text{B}2}^{\ast2}\right) $, E$s_{\text{A}}^{\ast2}$. Then find linear
combinations of $s_{\mathrm{pooled}}^{2}$, $\frac{1}{2}\left( s_{\text{B}%
1}^{\ast2}+s_{\text{B}2}^{\ast2}\right) $ and $s_{\text{A}}^{\ast2}$ that
could sensibly be used to estimate $\sigma_{\beta}^{2}$ and $\sigma_{\alpha}^{2}$.
\item Suppose that on $I=2$ different days (A), $J=4$ different heats (B) of
cast iron are studied, with $K=3$ tests (C) being made on each. Suppose
further that the resulting percent carbon measurements produce $SSA=.0355$,
$SSB(A)=.0081$ and $SSC(B(A))=SSE=.4088$.
\begin{enumerate}
\item If one completely ignores the hierarchical structure of the data set,
what ``sample variance'' is produced? Does this quantity estimate the variance
that would be produced if on many different days a single heat was selected
and a single test made? Explain carefully! (Find the expected value of the
grand sample variance under the hierarchical random effects model and compare
it to this variance of single measurements made on a single day.)
\item Give point estimates of the variance components $\sigma_{\alpha}%
^{2},\ \sigma_{\beta}^{2}$ and $\sigma^{2}$.
\item Your estimate of $\sigma_{\alpha}^{2}$ should involve a linear
combination of mean squares. Give the variance of that linear combination in
terms of the model parameters and $I,J$ and $K$. Use that expression and
propose a sensible estimated standard deviation (a standard error) for this
linear combination. (See \S1.4 and Problem 1.9.)
\end{enumerate}
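In connection with part (c), recall the standard fact (for balanced
normal-theory random effects models) that the ANOVA mean squares are
independent with $MS\sim(\mathrm{E}MS)\chi_{\nu}^{2}/\nu$ for the respective
degrees of freedom $\nu$, so that for constants $c_{i}$

```latex
\[
\text{Var}\left( \sum_{i}c_{i}MS_{i}\right) =\sum_{i}c_{i}^{2}\,
\frac{2\left( \mathrm{E}MS_{i}\right) ^{2}}{\nu_{i}}\ ,
\]
```

and a standard error is obtained by replacing each $\mathrm{E}MS_{i}$ with
the corresponding observed mean square.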
\item Consider the ``one variable/second order'' version of the ``propagation
of error'' ideas discussed in Section 5.4 of the text. That is, for a random
variable $X$ with mean $\mu$ and variance $\sigma^{2}$ and ``nice''
function $g$, let $Y=g(X)$ and consider approximating E$Y$ and Var\thinspace
$Y$. A second order approximation of $g$ made at the point $x=\mu$ is
\[
g(x)\approx g(\mu)+g^{\prime}(\mu)(x-\mu)+\frac{1}{2}g^{\prime\prime}%
(\mu)(x-\mu)^{2}\ .
\]
(Note that the approximating quadratic function has the same value, derivative
and second derivative as $g$ for the value $x=\mu$.) Let $\kappa_{3}=$%
E$(X-\mu)^{3}$ and $\kappa_{4}=$E$(X-\mu)^{4}$. Based on the above preamble,
carefully argue for the appropriateness of the following approximations:
\[
EY\approx g(\mu)+\frac{1}{2}g^{\prime\prime}(\mu)\sigma^{2}%
\]
and
\[
\text{Var}\,Y\approx(g^{\prime}(\mu))^{2}\sigma^{2}+g^{\prime}(\mu
)g^{\prime\prime}(\mu)\kappa_{3}+\frac{1}{4}(g^{\prime\prime}(\mu))^{2}%
(\kappa_{4}-\sigma^{4})\ .
\]
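As a sanity check on these approximations (not part of the problem), one can
compare them to exact moments in a case where those are available: if $X$ is
normal and $g(x)=e^{x}$, then $Y$ is lognormal, $\kappa_{3}=0$ and
$\kappa_{4}=3\sigma^{4}$. A minimal sketch in Python (this choice of $g$ and
the parameter values are ours):

```python
import math

# Numerical check of the second order "propagation of error" approximations
# in a case with known exact answers:  X ~ N(mu, sigma^2) and g(x) = exp(x),
# so that Y = g(X) is lognormal.
mu, sigma = 0.0, 0.1

# For normal X:  kappa3 = E(X-mu)^3 = 0  and  kappa4 = E(X-mu)^4 = 3 sigma^4
kappa3 = 0.0
kappa4 = 3.0 * sigma**4

# g, g' and g'' evaluated at mu, for g(x) = exp(x)
g = gp = gpp = math.exp(mu)

# The approximations from the problem statement
EY_approx = g + 0.5 * gpp * sigma**2
VarY_approx = (gp**2 * sigma**2 + gp * gpp * kappa3
               + 0.25 * gpp**2 * (kappa4 - sigma**4))

# Exact lognormal moments for comparison
EY_exact = math.exp(mu + sigma**2 / 2)
VarY_exact = math.exp(2 * mu + sigma**2) * (math.exp(sigma**2) - 1)
```

With $\sigma=.1$ the mean approximation agrees with the exact value to within
about $10^{-3}$ percent, and the variance approximation to within about 1 percent.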
\item (Vander Wiel) \ A certain RCL network involving 2 resistors, 2
capacitors and a single inductor has a dynamic response characterized by the
``transfer function''
\[
\frac{V_{\mathrm{out}}}{V_{\mathrm{in}}}\,(s)=\frac{s^{2}+\zeta_{1}\omega
_{1}s+\omega_{1}^{2}}{s^{2}+\zeta_{2}\omega_{2}s+\omega_{2}^{2}}\ ,
\]
where
\begin{align*}
\omega_{1} & =\left( C_{2}L\right) ^{-1/2}\ ,\\
\omega_{2} & =\left( \frac{C_{1}+C_{2}}{LC_{1}C_{2}}\right) ^{1/2}\ ,\\
\zeta_{1} & =\frac{R_{2}}{2L\omega_{1}}\ ,
\end{align*}
and
\[
\zeta_{2}=\frac{R_{1}+R_{2}}{2L\omega_{2}}\ .\qquad
\]
$R_{1}$ and $R_{2}$ are the resistances involved in ohms, $C_{1}$ and $C_{2}$
are the capacitances in Farads, and $L$ is the value of the inductance in
Henries. Standard circuit theory says that $\omega_{1}$ and $\omega_{2}$ are
the ``natural frequencies'' of this network,
\[
\omega_{1}^{2}/\omega_{2}^{2}=C_{1}/(C_{1}+C_{2})
\]
is the ``DC gain,'' and $\zeta_{1}$ and $\zeta_{2}$ determine whether the
zeros and poles are real or complex. Suppose that the circuit in question is
to be mass produced using components with the following characteristics:
\[%
\begin{array}
[c]{lccccl}%
EC_{1}=\frac{1}{399}F & & & & & \text{Var}\,C_{1}=\left( \frac{1}%
{3990}\right) ^{2}\\
& & & & & \\[-0.1in]%
ER_{1}=38\Omega & & & & & \text{Var}\,R_{1}=(3.8)^{2}\\
& & & & & \\[-0.1in]%
EC_{2}=\frac{1}{2}F & & & & & \text{Var}\,C_{2}=\left( \frac{1}%
{20}\right) ^{2}\\
& & & & & \\[-0.1in]%
ER_{2}=2\Omega & & & & & \text{Var}\,R_{2}=(.2)^{2}\\
& & & & & \\[-0.1in]%
EL=1H & & & & & \text{Var}\,L=(.1)^{2}%
\end{array}
\]
Treat $C_{1}$, $R_{1}$, $C_{2}$, $R_{2}$ and $L$ as independent random
variables and use the propagation of error approximations to do the following:
\begin{enumerate}
\item Approximate the mean and standard deviation of the DC gains of the
manufactured circuits.
\item Approximate the mean and standard deviation of the natural frequency
$\omega_{2}$.\bigskip
Now suppose that you are designing such an RCL circuit. To simplify things,
use the capacitors and the inductor described above. You may choose the
resistors, but their quality will be such that
\[
\text{Var}\,R_{1}=(ER_{1}/10)^{2}\qquad\mbox{and}\qquad\text{Var}%
\,R_{2}=(ER_{2}/10)^{2}\ .
\]
Your design goals are that $\zeta_{2}$ should be (approximately) .5, and
subject to this constraint, Var\thinspace$\zeta_{2}$ be minimum.\bigskip
\item What values of E$R_{1}$ and E$R_{2}$ satisfy (approximately) the design
goals, and what is the resulting (approximate) standard deviation of
$\zeta_{2}$?
\end{enumerate}
(Hint for part (c): The first design goal allows one to write E$R_{2}$ as a
function of E$R_{1}$. To satisfy the second design goal, use the propagation
of error idea to write the (approximate) variance of $\zeta_{2}$ as a function
of E$R_{1}$ only. By the way, the first design goal allows you to conclude
that none of the partial derivatives needed in the propagation of error work
depend on your choice of E$R_{1}$.)
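The first order propagation of error arithmetic in this problem can be
mechanized. The following sketch (the helper \texttt{propagate} is our own
construction, not from the text) applies numerical partial derivatives to the
DC gain $C_{1}/(C_{1}+C_{2})$ using the component moments given above:

```python
import math

def propagate(g, means, sds, h=1e-8):
    """First order propagation of error:  approximate the mean and standard
    deviation of g(X1,...,Xk) for independent inputs with the given means
    and standard deviations, using central difference partial derivatives."""
    base = g(*means)
    var = 0.0
    for i, (m, s) in enumerate(zip(means, sds)):
        step = h * max(1.0, abs(m))
        hi = list(means); hi[i] = m + step
        lo = list(means); lo[i] = m - step
        deriv = (g(*hi) - g(*lo)) / (2.0 * step)
        var += (deriv * s) ** 2
    return base, math.sqrt(var)

# DC gain of the network as a function of the two capacitances alone
dc_gain = lambda c1, c2: c1 / (c1 + c2)

mean_gain, sd_gain = propagate(dc_gain,
                               means=[1 / 399, 1 / 2],
                               sds=[1 / 3990, 1 / 20])
```

The same helper can be pointed at $\omega_{2}$ or $\zeta_{2}$ as functions of
all five components for the later parts.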
\item Manufacturers wish to produce autos with attractive ``fit and finish,''
part of which consists of uniform (and small) gaps between adjacent pieces of
sheet metal (like, e.g., doors and their corresponding frames). The
accompanying figure is an idealized schematic of a situation of this kind,
where we (at least temporarily) assume that edges of both a door and its frame
are linear. \ (The coordinate system on this diagram is pictured as if its
axes are ``vertical'' and ``horizontal.'' \ But the line on the body need not
be an exactly ``vertical'' line, and whatever this line's intended orientation
relative to the ground, it is used to establish the coordinate system as
indicated on the diagram.)%
%TCIMACRO{\FRAME{ftbpFU}{3.2733in}{3.2664in}{0pt}{\Qcb{Figure for Problem
%4.12}}{}{nfigprob.eps}{\special{ language "Scientific Word"; type "GRAPHIC";
%maintain-aspect-ratio TRUE; display "USEDEF"; valid_file "F";
%width 3.2733in; height 3.2664in; depth 0pt; original-width 8.1128in;
%original-height 8.0955in; cropleft "0"; croptop "1"; cropright "1";
%cropbottom "0";
%filename '../CLASS/531/Notes/Nfigprob.eps';file-properties "XNPEU";}} }%
%BeginExpansion
\begin{figure}
[ptb]
\begin{center}
\includegraphics[
height=3.2664in,
width=3.2733in
]%
{../CLASS/531/Notes/Nfigprob.eps}%
\caption{Figure for Problem 4.12}%
\end{center}
\end{figure}
%EndExpansion
On the figure, we are concerned with gaps $g_{1}$ and $g_{2}$. The first is at
the level of the top hinge of the door and the second is $d$ units ``below''
that level in the body coordinate system ($d$ units ``down'' the door frame
line from the initial measurement). People manufacturing the car body are
responsible for the dimension $w$. People stamping the doors are responsible
for the angles $\theta_{1}$ and $\theta_{2}$ and the dimension $y$. People
welding the top door hinge to the door are responsible for the dimension $x$.
And people hanging the door on the car are responsible for the angle $\phi$.
The quantities $x,y,w,\phi,\theta_{1}$ and $\theta_{2}$ are measurable and can
be used in manufacturing to verify that the various folks are ``doing their
jobs.'' A door design engineer has to set nominal values for and produce
tolerances for variation in these quantities. This problem is concerned with
how the propagation of errors method might help in this tolerancing
enterprise, through an analysis of how variation in $x,y,w,\phi,\theta_{1}$
and $\theta_{2}$ propagates to $g_{1},g_{2}$ and $g_{1}-g_{2}$.\bigskip
If I have correctly done my geometry/trigonometry, the following relationships
hold for labeled points on the diagram:
\[
\mbox{\boldmath$p$}=(-x\sin\phi,x\cos\phi)
\]%
\[
\mbox{\boldmath$q$}=\mbox{\boldmath$p$}+\left( y\cos\left( \phi+\left( \theta
_{1}-\frac{\pi}{2}\right) \right) ,y\sin\left( \phi+\left( \theta
_{1}-\frac{\pi}{2}\right) \right) \right)
\]%
\[
\mbox{\boldmath$s$}=(q_{1}+q_{2}\tan\left( \phi+\theta_{1}+\theta_{2}%
-\pi\right) ,0)
\]
and
\[
\mbox{\boldmath$u$}=(q_{1}+(q_{2}+d)\tan\left( \phi+\theta_{1}+\theta_{2}%
-\pi\right) ,-d)\ .
\]
Then for the idealized problem here (with perfectly linear edges) we have
\[
g_{1}=w-s_{1}%
\]
and
\[
g_{2}=w-u_{1}\ .
\]
Actually, in an attempt to allow for the notion of ``form error'' in the
ideally linear edges, one might propose that at a given distance ``below'' the
origin of the body coordinate system the realized edge of a real geometry is
its nominal position plus a ``form error.'' Then instead of dealing with
$g_{1}$ and $g_{2}$, one might consider the gaps
\[
g_{1}^{\ast}=g_{1}+\epsilon_{1}-\epsilon_{2}%
\]
and
\[
g_{2}^{\ast}=g_{2}+\epsilon_{3}-\epsilon_{4}\ ,
\]
for body form errors $\epsilon_{1}$ and $\epsilon_{3}$ and door form errors
$\epsilon_{2}$ and $\epsilon_{4}$. (The interpretation of additive ``form
errors'' around the line of the body door frame is perhaps fairly clear, since
``the error'' at a given level is measured perpendicular to the ``body line''
and is thus well-defined for a given realized body geometry. The
interpretation of an additive error on the right side ``door line'' is not so
clear, since in general one will not be measuring perpendicular to the line of
the door, or even at any consistent angle with it. So for a realized geometry,
what ``form error'' to associate with a given point on the ideal line or
exactly how to model it is not completely clear. We'll ignore this logical
problem and proceed using the models above.)\bigskip
We'll use $d=40$ cm, and below are two possible sets of nominal values for the
parameters of the door assembly:%
\begin{tabular}
[c]{l}%
Design A\\\hline
$x=20$ cm\\
$y=90$ cm\\
$w=90.4$ cm\\
$\phi=0$\\
$\theta_{1}=\frac{\pi}{2}$\\
$\theta_{2}=\frac{\pi}{2}$\\\hline
\end{tabular}
\ \
\begin{tabular}
[c]{l}%
Design B\\\hline
$x=20$ cm\\
$y=90$ cm\\
$w=(90\cos\frac{\pi}{10}+.4)$ cm\\
$\phi=\frac{\pi}{10}$\\
$\theta_{1}=\frac{\pi}{2}$\\
$\theta_{2}=\frac{4\pi}{10}$\\\hline
\end{tabular}
Partial derivatives of $g_{1}$ and $g_{2}$ (evaluated at the design nominal
values of $x,y,w,\phi,\theta_{1}$ and $\theta_{2}$) are:%
\begin{tabular}
[c]{l}%
Design A\\\hline
$\frac{\partial g_{1}}{\partial x}=0$\\
$\frac{\partial g_{1}}{\partial y}=-1$\\
$\frac{\partial g_{1}}{\partial w}=1$\\
$\frac{\partial g_{1}}{\partial\phi}=0$\\
$\frac{\partial g_{1}}{\partial\theta_{1}}=-20$\\
$\frac{\partial g_{1}}{\partial\theta_{2}}=-20$\\
\\
$\frac{\partial g_{2}}{\partial x}=0$\\
$\frac{\partial g_{2}}{\partial y}=-1$\\
$\frac{\partial g_{2}}{\partial w}=1$\\
$\frac{\partial g_{2}}{\partial\phi}=-40$\\
$\frac{\partial g_{2}}{\partial\theta_{1}}=-60$\\
$\frac{\partial g_{2}}{\partial\theta_{2}}=-60$\\\hline
\end{tabular}
\ \ \
\begin{tabular}
[c]{l}%
Design B\\\hline
$\frac{\partial g_{1}}{\partial x}=.309$\\
$\frac{\partial g_{1}}{\partial y}=-.951$\\
$\frac{\partial g_{1}}{\partial w}=1$\\
$\frac{\partial g_{1}}{\partial\phi}=0$\\
$\frac{\partial g_{1}}{\partial\theta_{1}}=-19.021$\\
$\frac{\partial g_{1}}{\partial\theta_{2}}=-46.833$\\
\\
$\frac{\partial g_{2}}{\partial x}=.309$\\
$\frac{\partial g_{2}}{\partial y}=-.951$\\
$\frac{\partial g_{2}}{\partial w}=1$\\
$\frac{\partial g_{2}}{\partial\phi}=-40$\\
$\frac{\partial g_{2}}{\partial\theta_{1}}=-59.02$\\
$\frac{\partial g_{2}}{\partial\theta_{2}}=-86.833$\\\hline
\end{tabular}
\begin{enumerate}
\item Suppose that a door engineer must eventually produce tolerances for
$x,y,w,\phi,\theta_{1}$ and $\theta_{2}$ that are consistent with ``$\pm.1 $
cm'' tolerances on $g_{1}$ and $g_{2}$. If we interpret ``$\pm.1$ cm''
tolerances to mean $\sigma_{g_{1}}$ and $\sigma_{g_{2}}$ are no more than
.$033$ cm, consider the set of ``sigmas''%
\begin{tabular}
[c]{l}%
$\sigma_{x}=.01$ cm\\
$\sigma_{y}=.01$ cm\\
$\sigma_{w}=.01$ cm\\
$\sigma_{\phi}=.001$ rad\\
$\sigma_{\theta_{1}}=.001$ rad\\
$\sigma_{\theta_{2}}=.001$ rad
\end{tabular}
First for Design A and then for Design B, investigate whether this set of
``sigmas'' is consistent with the necessary final tolerances on $g_{1}$ and
$g_{2}$ in two different ways. Make propagation of error approximations to
$\sigma_{g_{1}}$ and $\sigma_{g_{2}}$. Then simulate 100 values of both
$g_{1}$ and $g_{2}$ using independent normal random variables $x,y,w,\phi
,\theta_{1}$ and $\theta_{2}$ with means equal to the design nominals and
these standard deviations. (Compute the sample standard deviations of the
simulated values and compare to the .033 cm target.)
\item One of the assumptions standing behind the propagation of error
approximations is the independence of the input random variables. Briefly
discuss why independence of the variables $\theta_{1}$ and $\theta_{2}$ may
not be such a great model assumption in this problem.
\item Notice that for Design A the propagation of error formula predicts that
variation on the dimension $x$ will not much affect the gaps presently of
interest, $g_{1}$ and $g_{2}$, while the situation is different for Design B.
Argue, based on the nominal geometries, that this makes perfectly good sense.
For Design A, one might say that the gaps $g_{1}$ and $g_{2}$ are ``robust''
to variation in $x$. For this design, do you think that the entire ``fit'' of
the door to the body of the car is going to be ``robust to variation in $x$''? Explain.
(Note, by the way, that the fact that $\frac{\partial g_{1}}{\partial\phi}=0$
for Design A also makes this design look completely ``robust to variation in
$\phi$'' in terms of the gap $g_{1}$, at least by standards of the propagation
of error formula. But the situation for this variable is somewhat different
than for $x$. This partial derivative is equal to 0 because for $y,w,\phi
,\theta_{1}$ and $\theta_{2}$ at their nominal values, $g_{1}$ considered as a
function of $\phi$ alone has a local minimum at $\phi=0$. This is different
from $g_{1}$ being constant in $\phi$. A more refined ``second order''
propagation of error analysis of this problem, that essentially begins from a
quadratic approximation to $g_{1}$ instead of a linear one, would distinguish
between these two possibilities. But the ``first order'' analysis done on the
basis of formula (5.27) of the text is often helpful and adequate for
practical purposes.)
\item What does the propagation of error formula predict for variation in the
difference $g_{1}-g_{2}$, first for Design A, and then for Design B?
\item Suppose that one desires to take into account the possibility of ``form
errors'' affecting the gaps, and thus considers analysis of $g_{1}^{\ast}$ and
$g_{2}^{\ast}$ instead of $g_{1}$ and $g_{2}$. If standard deviations for the
variables $\epsilon$ are all .001 cm, what does the propagation of error
analysis predict for variability in $g_{1}^{\ast}$ and $g_{2}^{\ast}$ for
Design A?
\end{enumerate}
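For part (a), the propagation of error half of the exercise amounts to the
arithmetic below (shown for $g_{1}$ under Design A, using the tabled partial
derivatives; the simulation half would additionally require the formulas for
the points $\mbox{\boldmath$p$}$, $\mbox{\boldmath$q$}$, $\mbox{\boldmath$s$}$
and $\mbox{\boldmath$u$}$ given earlier):

```python
import math

# First order propagation of error for g1 under Design A, using the partial
# derivatives tabled above and the candidate "sigmas" from part (a)
partials_g1 = [0.0, -1.0, 1.0, 0.0, -20.0, -20.0]  # w.r.t. x, y, w, phi, th1, th2
sigmas = [0.01, 0.01, 0.01, 0.001, 0.001, 0.001]

sd_g1 = math.sqrt(sum((d * s)**2 for d, s in zip(partials_g1, sigmas)))
# sd_g1 = sqrt(.001) = .0316 cm, just under the .033 cm target
```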
\item The electrical resistivity, $\rho$, of a wire is a property of the
material involved and the temperature at which it is measured. At a given
temperature, if a cylindrical piece of wire of length $L$ and (constant)
cross-sectional area $A$ has resistance $R$, then the material's resistivity
is calculated as
\[
\rho=\frac{RA}{L}\ .
\]
In a lab exercise intended to determine the resistivity of copper at
20$^{\circ}$C, students measure the length, diameter and resistance of a wire
assumed to have circular cross-sections. Suppose the length is approximately 1
meter, the diameter is approximately $2.0\times10^{-3}$ meters and the
resistance is approximately $.54\times10^{-2}\Omega$. Suppose further that the
precisions of the measuring equipment used in the lab are such that standard
deviations $\sigma_{L}=10^{-3}$ meter, $\sigma_{D}=10^{-4}$ meter and
$\sigma_{R}=10^{-4}\Omega$ are appropriate.
\begin{enumerate}
\item Find an approximate standard deviation that might be used to describe
the precision associated with an experimentally derived value of $\rho$.
\item Imprecision in \textit{which} of the measurements appears to be the
biggest contributor to imprecision in experimentally determined values of
$\rho$? (Explain.)
\item One should probably expect the approximate standard deviation derived
here to \textit{under-predict} the kind of variation that would actually be
observed in such lab exercises over a period of years. Explain why this is so.
\end{enumerate}
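Since $\rho=RA/L$ is a product/quotient and $A=\pi D^{2}/4$ for circular
cross-sections, the first order propagation of error formula takes the
convenient relative form

```latex
\[
\left( \frac{\sigma_{\rho}}{\rho}\right) ^{2}\approx
\left( \frac{\sigma_{R}}{R}\right) ^{2}
+4\left( \frac{\sigma_{D}}{D}\right) ^{2}
+\left( \frac{\sigma_{L}}{L}\right) ^{2}\ ,
\]
```

the factor 4 arising because $D$ enters squared. (This is the standard
relative-variance rule for products of independent variables, and may help
organize parts (a) and (b).)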
\item A bullet is fired horizontally into a block (of much larger mass)
suspended by a long cord, and the impact causes the block and embedded bullet
to swing upward a distance $d$ measured vertically from the block's lowest
position. The laws of mechanics can be invoked to argue that if $d$ is
measured in feet, and before testing the block weighs $w_{1}$, while the block
and embedded bullet together weigh $w_{2}$ (in the same units), then the
velocity (in fps) of the bullet just before impact with the block is
approximately
\[
v=\left( \frac{w_{2}}{w_{2}-w_{1}}\right) \,\sqrt{64.4\cdot d}\ .
\]
Suppose that the bullet involved weighs about .05 lb, the block involved
weighs about 10.00 lb and that both $w_{1}$ and $w_{2}$ can be determined with
a standard deviation of about .005 lb. Suppose further that the distance $d$
is about .50 ft, and can be determined with a standard deviation of .03 ft.
\begin{enumerate}
\item Compute an approximate standard deviation describing the uncertainty in
an experimentally derived value of $v$.
\item Would you say that the uncertainties in the weights contribute more to
the uncertainty in $v$ than the uncertainty in the distance? Explain.
\item Say why one should probably think of calculations like those in part (a)
as only providing some kind of approximate \textit{lower bound} on the
uncertainty that should be associated with the bullet's velocity.
\end{enumerate}
\item On page 243 of V\&J there is an ANOVA table for a balanced hierarchical
data set. Use it in what follows.
\begin{enumerate}
\item Find standard errors for the usual ANOVA estimates of $\sigma_{\alpha
}^{2}$ and $\sigma^{2}$ (the ``casting'' and ``analysis'' variance components).
\item If you were to later make 100 castings, cut 4 specimens from each of
these and make a single lab analysis on each specimen, give a (numerical)
prediction of the overall sample variance of these future 400 measurements
(based on the hierarchical random effects model and the ANOVA estimates of
$\sigma_{\alpha}^{2}$, $\sigma_{\beta}^{2}$ and $\sigma^{2}$).
\end{enumerate}
\end{enumerate}
\section{Sampling Inspection}
\noindent\textbf{{\Large Methods}}
\renewcommand{\labelenumi}{5.\arabic{enumi}.}
\begin{enumerate}
\item Consider attributes single sampling.
\begin{enumerate}
\item Make type A OC curves for $N=20$, $n=5$ and $c=0$ and 1, for both
percent defective and mean defects per unit situations.
\item Make type B OC curves for $n=5$, $c=0$, 1 and 2 for both percent
defective and mean defects per unit situations.
\item Use the imperfect inspection analysis presented in \S5.2 and find OC
bands for the percent defective cases above with $c=1$ under the assumption
that $w_{\mbox{D}}\leq.1$ and $w_{\mbox{G}}\leq.1$.
\end{enumerate}
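For the type B percent defective calculations above, $Pa(p)$ is just a
binomial probability; a minimal sketch (for the mean defects per unit case one
would instead use a Poisson$(n\lambda)$ probability):

```python
from math import comb

def type_B_OC(n, c, p):
    """Type B operating characteristic of an attributes single sampling
    plan:  Pa(p) = P[X <= c] for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c + 1))

# e.g., the n = 5, c = 1 plan of part (b) at p = .2
pa = type_B_OC(5, 1, 0.2)  # = .8^5 + 5(.2)(.8)^4 = .73728
```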
\item Consider single sampling for percent defective.
\begin{enumerate}
\item Make approximate OC curves for $n=100$, $c=1$; $n=200$, $c=2$; and
$n=300$, $c=3$.
\item Make AOQ and ATI curves for a rectifying inspection scheme using a plan
with $n=200$ and $c=2$ for lots of size $N=10,000$. What is the AOQL?
\end{enumerate}
\item Find attributes single sampling plans (i.e. find $n$ and $c$) having approximately
\begin{enumerate}
\item $Pa=.95$ if $p=.01$ and $Pa=.10$ if $p=.03$.
\item $Pa=.95$ if $p=10^{-6}$ and $Pa=.10$ if $p=3\times10^{-6}$.
\end{enumerate}
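For part (a) the design can be done numerically: find the smallest $n$ for
which some $c$ meets both the $(p_{1},Pa_{1})$ and $(p_{2},Pa_{2})$
requirements under the binomial (type B) model. A sketch (the search routine
is our own; for the part (b) magnitudes the same idea works in principle, but
$n$ runs to the millions and a Poisson approximation is the practical route):

```python
from math import comb

def Pa(n, c, p):
    """Type B acceptance probability:  P[X <= c] for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c + 1))

def find_plan(p1, Pa1, p2, Pa2, n_max=2000):
    """Smallest-n attributes single sampling plan with Pa(p1) >= Pa1 and
    Pa(p2) <= Pa2, or None if no such plan exists with n <= n_max."""
    for n in range(1, n_max + 1):
        # largest acceptance number still meeting the consumer's risk point
        c = 0
        while c + 1 <= n and Pa(n, c + 1, p2) <= Pa2:
            c += 1
        if Pa(n, c, p2) <= Pa2 and Pa(n, c, p1) >= Pa1:
            return n, c
    return None

plan = find_plan(0.01, 0.95, 0.03, 0.10)
```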
\item Consider a (truncated sequential) attributes acceptance sampling plan,
that for
\[
X_{n}=\mbox{the number of defective items found through the $n$th item
inspected}%
\]
rejects the lot if it ever happens that $X_{n}\geq1.5+.5n$, accepts the lot if
it ever happens that $X_{n}\leq-1.5+.5n$, and further never samples more than
11 items. We will suppose that if sampling were extended to $n=11$, we would
accept for $X_{11}=4$ or 5 and reject for $X_{11}=6$ or 7 and thus note that
sampling can be curtailed at $n=10$ if $X_{10}=4$ or 6.
\begin{enumerate}
\item Find expressions for the OC and ASN for this plan.
\item Find formulas for the AOQ and ATI of this plan, if it is used in a
rectifying inspection scheme for lots of size $N=100$.
\end{enumerate}
\item Consider single sampling based on a normally distributed variable.
\begin{enumerate}
\item Find a single limit variables sampling plan with $L=1.000$,
$\sigma=.015$, $p_{1}=.03$, $Pa_{1}=.95$, $p_{2}=.10$ and $Pa_{2}=.10$. Sketch
the OC curve of this plan. How does $n$ compare with what would be required
for an attributes sampling plan with a comparable OC curve?
\item Find a double limits variables sampling plan with $L=.49$, $U=.51$,
$\sigma=.004$, $p_{1}=.03$, $Pa_{1}=.95$, $p_{2}=.10$ and $Pa_{2}=.10$. Sketch
the OC curve of this plan. How does $n$ compare with what would be required
for an attributes sampling plan with a comparable OC curve?
\item Use the Wallis approximation and find a single limit variables sampling
plan for $L=1.000$, $p_{1}=.03$, $Pa_{1}=.95$, $p_{2}=.10$ and $Pa_{2}=.10$.
Sketch an approximate OC curve for this plan.
\end{enumerate}
\item In contrast to what you found in Problem 5.3(b), make use of the fact
that the upper $10^{-6}$ point of the standard normal distribution is about
4.753, while the upper $3\times10^{-6}$ point is about 4.526, and find the $n$
required for a known $\sigma$ single limit variables acceptance sampling plan
to have $Pa=.95$ if $p=10^{-6}$ and $Pa=.10$ if $p=3\times10^{-6}$. What is
the Achilles heel (fatal weakness) of these calculations?
\item Consider the CSP-1 plan with $i=100$ and $f=.02$. Make AFI and AOQ plots
for this plan and find the AOQL for both cases where defectives are rectified
and where they are culled.
\item Consider the classical problem of acceptance sampling plan design.
Suppose that one wants plans whose OC ``drops'' near $p=.03$ (wants
$Pa\approx.5$ for $p=.03$) and also wants $p=.04$ to have $Pa\approx.05$.
\begin{enumerate}
\item Design an attributes single sampling plan approximately meeting the
above criteria.\bigskip
\noindent Suppose that in fact ``nonconforming'' is defined in terms of a
measured variable, $X$, being less than a lower specification $L=13$, and that
it is sensible to use a normal model for $X$.\bigskip
\item Design a ``known $\sigma$'' variables plan for the above criteria if
$\sigma=1$.
\item Design an ``unknown $\sigma$'' variables plan for the above criteria.
\end{enumerate}
\textbf{{\Large Theory}}
\setcounter{enumi}{8}
\item Consider variables acceptance sampling based on exponentially
distributed observations, supposing that there is a single lower limit
$L=.2107$.
\begin{enumerate}
\item Find means corresponding to fractions defective $p=.10$ and $p=.19$.
\item Use the Central Limit Theorem to find a number $k$ and sample size $n$
so that an acceptance sampling plan that rejects a lot if $\bar{x}k_{2}$. Adopt perspective B, i.e.
that any given incoming lot was produced under some set of stable conditions,
characterized here by probabilities $p_{\mathrm{G}}$, $p_{\mathrm{M}}$ and
$p_{\mathrm{D}}$ that any given item in that lot is respectively G, M or D.
\begin{enumerate}
\item Argue carefully that the ``All or None'' criterion is in force here and
identify the condition on the $p$'s under which ``All'' is optimal and the
condition under which ``None'' is optimal.
\item If $p_{\mathrm{G}}$, $p_{\mathrm{M}}$ and $p_{\mathrm{D}}$ are not
known, but rather are described by a joint probability distribution, $n$ other
than $N$ or 0 can turn out to be optimal. A particularly convenient
distribution to use in describing the $p$'s is the Dirichlet distribution (it
is the multivariate generalization of the Beta distribution for variables that
must add up to 1). For a Dirichlet distribution with parameters $\alpha
_{\mathrm{G}}>0$, $\alpha_{\mathrm{M}}>0$ and $\alpha_{\mathrm{D}}>0$, it
turns out that if $X_{\mathrm{G}}$, $X_{\mathrm{M}}$ and $X_{\mathrm{D}}$ are
the counts of G's, M's and D's in a sample of $n$ items, then
\[
E[p_{\mathrm{G}}|X_{\mathrm{G}},X_{\mathrm{M}}, X_{\mathrm{D}}]=\frac
{\alpha_{\mathrm{G}}+X_{\mathrm{G}}}{\alpha_{\mathrm{G}}+ \alpha_{\mathrm{M}%
}+\alpha_{\mathrm{D}}+n} \
\]
\[
E[p_{\mathrm{M}}|X_{\mathrm{G}},X_{\mathrm{M}}, X_{\mathrm{D}}]=\frac
{\alpha_{\mathrm{M}}+X_{\mathrm{M}}}{\alpha_{\mathrm{G}}+ \alpha_{\mathrm{M}%
}+\alpha_{\mathrm{D}}+n} \
\]
and
\[
E[p_{\mathrm{D}}|X_{\mathrm{G}},X_{\mathrm{M}}, X_{\mathrm{D}}]=\frac
{\alpha_{\mathrm{D}}+X_{\mathrm{D}}}{\alpha_{\mathrm{G}}+ \alpha_{\mathrm{M}%
}+\alpha_{\mathrm{D}}+n}\ .
\]
Use these expressions and describe what an optimal lot disposal (acceptance or
rejection) is, if a Dirichlet distribution is used to describe the $p$'s and a
sample of $n$ items yields counts $X_{\mathrm{G}}$, $X_{\mathrm{M}}$ and
$X_{\mathrm{D}}$.
\end{enumerate}
\item Consider the Deming Inspection Problem exactly as discussed in \S5.3.
Suppose that $k_{1}=\$50$, $k_{2}=\$500$, $N=200$ and one's \textit{a priori}
beliefs are such that one would describe $p$ with a (Beta) distribution with
mean .1 and standard deviation .090453. For what values of $n$ are
respectively $c=0$, 1 and 2 optimal? If you are brave (and either have a
pretty good calculator or are fairly quick with computing) compute the
expected total costs associated with these values of $n$ (obtained using the
corresponding $c^{\mathrm{opt}}$($n$)). From these calculations, what ($n,c$)
pair appears to be optimal?
\item Consider the problem of estimating the process fraction defective based
on the results of an ``inverse sampling plan'' that samples until 2 defective
items have been found. Find the UMVUE of $p$ in terms of the random variable
$n=$ the number of items required to find the second defective. Show directly
that this estimator of $p$ is unbiased (i.e. has expected value equal to $p$).
Write out a series giving the variance of this estimator.
\item The paper ``The Economics of Sampling Inspection'' by Bernard Smith
(that appeared in \textit{Industrial Quality Control} in 1965 and is based on
earlier theoretical work of Guthrie and Johns) gives a closed form expression
for an approximately optimal $n$ in the Deming inspection problem for cases
where $p$ has a Beta($\alpha,\beta)$ prior distribution and both $\alpha$ and
$\beta$ are integers. Smith says
\[
n^{\mathrm{opt}}\approx\sqrt{\frac{N\cdot\mathrm{B}(\alpha,\beta)p_{0}%
^{\alpha}(1-p_{0})^{\beta}}{2\left( p_{0}\mathrm{Bi}(\alpha|\alpha
+\beta-1,p_{0})-\frac{\alpha}{\alpha+\beta}\mathrm{Bi}(\alpha+1|\alpha
+\beta,p_{0})\right) }}%
\]
for $p_{0}\equiv k_{1}/k_{2}$ the break-even quantity, B($\cdot,\cdot)$ the
usual beta function and Bi$(x|n,p)$ the probability that a binomial $(n,p)$
random variable takes a value of $x$ \textit{or more}. Suppose that
$k_{1}=\$50$, $k_{2}=\$500$, $N=200$ and our \textit{a priori} beliefs about
$p$ (or the ``process curve'') are such that it is sensible to describe $p$ as
having mean .1 and standard deviation .090453. What fixed $n$ inspection plan
follows from the Smith formula?
\item Consider the Deming inspection scenario as discussed in \S5.3. Suppose
that $N=3$, $k_{1}=1.5$, $k_{2}=10$ and a prior distribution $G$ assigns
$P[p=.1]=.5$ and $P[p=.2]=.5$. Find the optimal fixed $n$ inspection plan by
doing the following.
\begin{enumerate}
\item For sample sizes $n=1$ and $n=2$, determine the corresponding optimal
acceptance numbers, $c^{\mathrm{opt}}_{G}(n)$.
\item For sample sizes $n=0$, 1, 2 and 3 find the expected total costs
associated with those sample sizes if corresponding best acceptance numbers
are used.
\end{enumerate}
\item Consider the Deming inspection scenario once again. With $N=100$,
$k_{1}=1$ and $k_{2}=10$, write out the fixed $p$ expected total cost
associated with a particular choice of $n$ and $c$. Note that ``None'' is
optimal for $p<.1$ and ``All'' is optimal for $p>.1$. So, in some sense, what
is exactly optimal is highly discontinuous in $p$. On the other hand, if $p$
is ``near'' .1, it doesn't matter much what inspection plan one adopts,
``All,'' ``None'' or anything else for that matter. To see this, write out as
a function of $p$
\[
\frac{\mbox{worst possible expected total cost}(p)-\mbox{best possible
expected total cost}(p)}{\mbox{best possible expected total cost}(p)}\ .
\]
How big can this quantity get, e.g., on the interval [.09,.11]?
\item Consider the following percent defective acceptance sampling scheme. One
will sample items one at a time up to a maximum of 8 items. If at any point in
the sampling, half or more of the items inspected are defective, sampling will
cease and the lot will be rejected. If the maximum 8 items are inspected
without rejecting the lot, the lot will be accepted.
\begin{enumerate}
\item Find expressions for the type B Operating Characteristic and the ASN of
this plan.
\item Find an expression for the type A Operating Characteristic of this plan
if lots of $N=50$ items are involved.
\item Find expressions for the type B AOQ and ATI of this plan for lots of
size $N=50$.
\item What is the (uniformly) minimum variance unbiased estimator of $p$ for
this plan? (Say what value one should estimate for every possible
stop-sampling point.)
\end{enumerate}
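The type B quantities for this stop-as-you-go plan can be checked numerically by dynamic programming over the pair (items inspected so far, defectives so far). A sketch, applying the "half or more defective so far" rule after each item:

```python
# Type B OC and ASN for the sequential plan: inspect up to 8 items,
# rejecting as soon as the defectives so far are at least half the
# number inspected so far; accept if 8 items pass without rejection.
def oc_and_asn(p, n_max=8):
    # alive: (items inspected, defectives so far) -> path probability
    alive = {(0, 0): 1.0}
    p_accept, asn = 0.0, 0.0
    while alive:
        nxt = {}
        for (i, d), prob in alive.items():
            for defect, pr in ((1, p), (0, 1 - p)):
                i2, d2, pp = i + 1, d + defect, prob * pr
                if pp == 0.0:
                    continue
                if 2 * d2 >= i2:          # half or more defective: reject
                    asn += pp * i2
                elif i2 == n_max:         # survived all 8 items: accept
                    p_accept += pp
                    asn += pp * i2
                else:
                    nxt[(i2, d2)] = nxt.get((i2, d2), 0.0) + pp
        alive = nxt
    return p_accept, asn

print(oc_and_asn(0.1))
```

The type A OC for $N=50$ follows the same recursion with the Bernoulli draws replaced by hypergeometric ones conditional on the lot content.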
\item Vardeman argued in \S5.3 that if one adopts perspective B with known $p
$ and costs are assessed as the sum of identically calculated costs associated
with individual items, either ``All'' or ``None'' inspection plans will be
optimal. Consider the following two scenarios (that lack one or the other of
these assumptions) and show that in each the ``All or None'' paradigm fails to hold.
\begin{enumerate}
\item Consider the Deming inspection scenario discussed in \S5.3, with
$k_{1}=\$1$ and $k_{2}=\$100$ and suppose lots of $N=5$ are involved. Suppose
that one adopts not perspective B, but instead perspective A, and that $p$ is
known to be .2 (a lot contains exactly 1 defective). Find the expected total
costs associated with ``All'' and then with ``None'' inspection. Then suggest
a sequential inspection plan that has smaller expected total cost than either
``All'' or ``None.'' (Find the expected total cost of your suggested plan and
verify that it is smaller than that for both ``All'' and ``None'' inspection plans.)
\item Consider perspective B with $p$ known to be .4. Suppose lots of size
$N=5$ are involved and costs are assessed as follows. Each inspection costs
\$1 and defective items are replaced with good items at no charge. If the lot
fails to contain at least one good item (and this goes undetected) a penalty
of \$1000 will be incurred, but otherwise the only costs charged are for
inspection. Find the expected total costs associated with ``All'' and then
with ``None'' inspection. Then argue convincingly that there is a better
``fixed $n$'' plan. (Say clearly what plan is superior and show that its
expected total cost is less than both ``All'' and ``None'' inspection.)
\end{enumerate}
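For scenario (b) the relevant expected costs can be verified directly. The sketch below assumes that inspected defectives are replaced with good items (so any plan with $n\geq1$ guarantees the lot contains a good item) and that the \$1000 penalty accrues only when an uninspected all-defective lot goes undetected:

```python
# Scenario (b): N = 5, p = .4 known, $1 per inspection, defectives found
# are replaced free, $1000 penalty if the lot (undetected) has no good item.
N, p, penalty = 5, 0.4, 1000.0

def expected_cost(n):
    if n == 0:
        # "None": the all-defective lot goes undetected
        return penalty * p**N
    # n >= 1: each inspected item ends up good (as found, or replaced),
    # so the lot surely contains a good item; only inspection costs remain.
    return float(n)

print([expected_cost(n) for n in range(N + 1)])
```

Under these assumptions "None" costs about \$10.24 and "All" costs \$5, while any small positive sample size costs only its inspection bill.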
\item Consider the following nonstandard ``variables'' acceptance sampling
situation. A supplier has both a high quality/low variance production line
(\#1) and a low quality/high variance production line (\#2) used to
manufacture widgets ordered by Company V. Coded values of a critical dimension
of these widgets produced on the high quality line are normally distributed
with $\mu_{1}=0$ and $\sigma_{1}=1$, while coded values of this dimension
produced on the low quality line are normally distributed with $\mu_{2}=0$ and
$\sigma_{2}=2$. Coded specifications for this dimension are $L=-3$ and $U=3$.
The supplier is known to mix output from the two lines in lots sent to Company
V. As a cost saving measure, this is acceptable to Company V, provided the
fraction of ``out-of-spec.'' widgets does not become too large. Company V
expects
\[
\pi= \mbox{the proportion of items in a lot coming from the high variance line
(\#2)}%
\]
to vary lot to lot and decides to institute a kind of incoming variables
acceptance sampling scheme. What will be done is the following. The critical
dimension, $X$, will be measured on each of $n$ items sampled from a lot. For
each measurement $X$, the value $Y=X^{2}$ will be calculated. Then, for a
properly chosen constant, $k$, the lot will be accepted if $\bar{Y}\leq k$ and
rejected if $\bar{Y}>k$. The purpose of this problem is to identify suitable
$n$ and $k$, if $Pa\approx.95$ is desired for lots with $p=.01$ and
$Pa\approx.05$ is desired for lots with $p=.03$.
\begin{enumerate}
\item Find an expression for $p$ (the long run fraction defective) as a
function of $\pi$. What values of $\pi$ correspond to $p=.01$ and $p=.03$ respectively?
\item It is possible to show (you need not do so here) that E$Y=3\pi+1$ and
Var\,$Y=-9\pi^{2}+39\pi+2$. Use these facts, your answer to (a) and the
Central Limit Theorem to help you identify suitable values of $n$ and $k$ to
use at Company V.
\end{enumerate}
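Under the mixture model, part (a) gives $p(\pi)=(1-\pi)\cdot2\Phi(-3)+\pi\cdot2\Phi(-1.5)$, and with the stated moments of $Y$ the two $Pa$ requirements become two Central Limit Theorem equations in $n$ and $k$. A sketch of the whole calculation:

```python
from math import erf, sqrt

Phi = lambda z: 0.5 * (1 + erf(z / sqrt(2)))   # standard normal cdf

# p as a function of pi for the mixture of N(0,1) and N(0,4),
# with specifications L = -3 and U = 3
p_line1 = 2 * Phi(-3.0)      # out-of-spec fraction, high quality line
p_line2 = 2 * Phi(-1.5)      # out-of-spec fraction, low quality line
pi_for_p = lambda p: (p - p_line1) / (p_line2 - p_line1)

pi1, pi2 = pi_for_p(0.01), pi_for_p(0.03)
mu = lambda pi: 3 * pi + 1                       # E Y
sd = lambda pi: sqrt(-9 * pi**2 + 39 * pi + 2)   # sqrt(Var Y)

# Want P(Ybar <= k) ~ .95 at pi1 and ~ .05 at pi2 (z_.95 = 1.645):
#   (k - mu1)/(sd1/sqrt(n)) =  1.645
#   (k - mu2)/(sd2/sqrt(n)) = -1.645
z = 1.645
root_n = z * (sd(pi1) + sd(pi2)) / (mu(pi2) - mu(pi1))
n = root_n ** 2
k = mu(pi1) + z * sd(pi1) / root_n
print(round(n), round(k, 3))   # n close to 343, k close to 1.35
```

Rounding $n$ up, roughly $n\approx343$ and $k\approx1.35$ meet the two operating requirements approximately.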
\item On what basis is it sensible to criticize the relevance of the
calculations usually employed to characterize the performance of continuous
sampling plans?
\item Individual items produced on a manufacturer's line may be graded as
``Good'' (G), ``Marginal'' (M) or ``Defective'' (D). Under stable process
conditions, each successive item is (independently) G with probability
$p_{\mathrm{G}}$, M with probability $p_{\mathrm{M}}$ and D with probability
$p_{\mathrm{D}}$, where $p_{\mathrm{G}}+p_{\mathrm{M}}+p_{\mathrm{D}}=1$.
Suppose that ultimately, defective items cause three times as much extra
expense as marginal ones.
Based on the kind of cost information alluded to above, one might give each
inspected item a ``score'' $s$ according to
\[
s= \left\{
\begin{array}
[c]{ll}%
3 & \mbox{if the item is D}\\
1 & \mbox{if the item is M}\\
0 & \mbox{if the item is G}\ .
\end{array}
\right.
\]
It is possible to argue (don't bother to do so here) that E$s=3p_{\mathrm{D}%
}+p_{\mathrm{M}}$ and Var\,$s=9p_{\mathrm{D}}(1-p_{\mathrm{D}})+p_{\mathrm{M}%
}(1-p_{\mathrm{M}})-6p_{\mathrm{D}}p_{\mathrm{M}}$.
\begin{enumerate}
\item Give formulas for standards-given Shewhart control limits for average
scores $\bar{s}$ based on samples of size $n$. Describe how you would obtain
the information necessary to calculate limits for future control of $\bar{s}$.
\item Ultimately, suppose that ``standard'' values are set at $p_{\mathrm{G}%
}=.90$, $p_{\mathrm{M}}=.07$ and $p_{\mathrm{D}}=.03$ and $n=100$ is used for
samples of a high volume product. Use a normal approximation to the
distribution of $\bar{s}$ and find an approximate ARL for your scheme from
part (a) if in fact the mix of items shifts to where $p_{\mathrm{G}%
}=.85,\,p_{\mathrm{M}}=.10$ and $p_{\mathrm{D}}=.05$.
\item Suppose that one decides to use a high side CUSUM scheme to monitor
\textit{individual} scores as they come in one at a time. Consider a scheme
with reference value $k_{1}=1$ and no head-start that signals the first time
the CUSUM of scores reaches $h_{1}=6$ or more. Set up an appropriate transition
matrix and say how you would use that matrix to find an ARL for this scheme
for an arbitrary set of probabilities $(p_{\mathrm{G}},p_{\mathrm{M}%
},p_{\mathrm{D}})$.
\item Suppose that inspecting an item costs 1/5th of the extra expense caused
by an undetected marginal item. A plausible (single sampling) acceptance
sampling plan for lots of $N=10,000$ of these items then accepts the lot if
\[
\bar{s}\leq.20\ .
\]
If rejection of the lot will result in 100\% inspection of the remainder,
consider the (``perspective B'') economic choice of sample size for plans of
this form, in particular the comparison of $n=100$ and $n=400$ plans. The
following table gives some approximate acceptance probabilities for these
plans under two sets of probabilities $\mbox{\boldmath$p$}=(p_{\mathrm{G}%
},p_{\mathrm{M}},p_{\mathrm{D}})$.%
\begin{tabular}
[c]{lcc}
& $n=100$ & $n=400$\\\cline{2-3}%
$\mbox{\boldmath$p$}=(.9,.07,.03)$ & \multicolumn{1}{|c|}{$Pa\approx.76$} &
\multicolumn{1}{|c|}{$Pa\approx.92$}\\\cline{2-3}%
$\mbox{\boldmath$p$}=(.85,.10,.05)$ & \multicolumn{1}{|c|}{$Pa\approx.24$} &
\multicolumn{1}{|c|}{$Pa\approx.08$}\\\cline{2-3}%
\end{tabular}
Find expected costs for these two plans ($n=100$ and $n=400$) if costs are
accrued on a per-item and per-inspection basis and ``prior'' probabilities of
these two sets of process conditions are respectively .8 for $\mbox
{\boldmath$p$}=(.9,.07,.03)$ and .2 for $\mbox{\boldmath$p$}=(.85,.10,.05)$.
\end{enumerate}
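For part (c), the increments $s-k_{1}$ take the values $-1$, $0$ and $2$ with probabilities $p_{\mathrm{G}}$, $p_{\mathrm{M}}$ and $p_{\mathrm{D}}$, so before a signal the CUSUM lives on the states $0,1,\ldots,5$ and the vector of ARLs solves $(I-Q)\mathbf{L}=\mathbf{1}$. A sketch using exact rational arithmetic (only the Python standard library is assumed):

```python
from fractions import Fraction

def cusum_arl(pG, pM, pD, h=6, k=1):
    # scores 3 (D), 1 (M), 0 (G); CUSUM C_t = max(0, C_{t-1} + s - k),
    # signal the first time C_t >= h.  Transient states are 0 .. h-1.
    moves = [(0 - k, Fraction(pG)), (1 - k, Fraction(pM)),
             (3 - k, Fraction(pD))]
    n = h
    # build A = I - Q over the transient states
    A = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for i in range(n):
        for d, pr in moves:
            j = max(0, i + d)
            if j < h:                      # still transient
                A[i][j] -= pr
    b = [Fraction(1)] * n
    # exact Gauss-Jordan elimination of A x = b
    for col in range(n):
        piv = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * c for a, c in zip(A[r], A[col])]
                b[r] -= f * b[col]
    L = [b[i] / A[i][i] for i in range(n)]
    return float(L[0])                     # ARL starting from C_0 = 0

print(cusum_arl(0.90, 0.07, 0.03))
```

The same routine evaluated at the shifted probabilities $(.85,.10,.05)$ shows how quickly the scheme reacts to deterioration.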
\item Consider variables acceptance sampling for a quantity $X$ that has
engineering specifications $L=3$ and $U=5$. We will further suppose that $X$
has standard deviation $\sigma=.2$.
\begin{enumerate}
\item Suppose that $X$ is \textit{uniformly distributed} with mean $\mu$. That
is, suppose that $X$ has probability density
\[
f(x) = \left\{
\begin{array}
[c]{ll}%
1.4434 & \mbox{if}~~\mu-.3464<x<\mu+.3464\\
0 & \mbox{otherwise}\ .
\end{array}
\right.
\]
\item A statistician's acceptance sampling plan for this quantity compares
$\bar{x}$ with the critical value $4.97685$. In fact, a Weibull distribution
with shape parameter
$\beta=400$ and scale parameter $\alpha$ is a better description of this
characteristic than the normal distribution the statistician used. This
alternative distribution has cdf
\[
F(x|\alpha)=\left\{
\begin{array}
[c]{ll}%
0 & \mbox{if}~x<0\\
1-\exp(-\left( \frac{x}{\alpha}\right) ^{400}) & \mbox{if}~x>0\ ,
\end{array}
\right.
\]
and mean $\mu\approx.9986\alpha$ and standard deviation $\sigma=.0032\alpha$.
Show how to obtain an approximate OC curve for the statistician's acceptance
sampling plan under this Weibull model. (Use the Central Limit Theorem.) Use
your method to find the real acceptance probability if $p=.03$.
\end{enumerate}
\item Here's a prescription for a possible fraction nonconforming attributes
acceptance sampling plan:
\qquad stop and reject the lot the first time that $X_{n}\geq2+\frac{n}{2}$
\qquad stop and accept the lot the first time that $n-X_{n}\geq2+\frac{n}{2} $
\begin{enumerate}
\item Find a formula for the OC for this ``symmetric wedge-shaped plan.'' (One
never samples more than $7$ items and there are exactly $8$ stop sampling
points prescribed by the rules above.)
\item Consider the use of this plan where lots of size $N=100$ are subjected
to rectifying inspection and inspection error is possible. (Assume that any
item inspected and classified as defective is replaced with one drawn from a
population that is in fact a fraction $p$ defective and has been inspected and
classified as good.) Use the parameters $w_{\mathrm{G}}$ and $w_{\mathrm{D}}$
defined in \S5.2 of the notes and give a formula for the real AOQ of this plan
as a function of $p$, $w_{\mathrm{G}}$ and $w_{\mathrm{D}}$.
\end{enumerate}
\item Consider a ``perspective A'' economic analysis of some fraction
defective ``fixed $n$ inspection plans.'' (Don't simply try to use the type B
calculations made in class. They aren't relevant. Work this out from first principles.)
Suppose that $N=10$, $k_{1}=1$ and $k_{2}=10$ in a ``Deming Inspection
Problem'' cost structure. Suppose further that a ``prior'' distribution for
$p$ (the actual lot fraction defective) places equal probabilities on $p=0,.1$
and $.2$. Here we will consider only plans with $n=0$, $1$ or $2$. Let%
\[
X=\text{the number of defectives in a simple random sample from the lot}%
\]
\begin{enumerate}
\item For $n=1$, find the conditional distributions of $p$ given $X=x$.\bigskip
For $n=2$, it turns out that the joint distribution of $X$ and $p$ is:%
\begin{tabular}
[c]{lllcll}
& & & $x$ & & \\
& \multicolumn{1}{c}{} & \multicolumn{1}{c}{$0$} & $1$ &
\multicolumn{1}{c}{$2$} & \multicolumn{1}{c}{}\\\cline{3-5}
& \multicolumn{1}{c}{$0$} & \multicolumn{1}{|c}{$.333$} &
\multicolumn{1}{|c}{$0$} & \multicolumn{1}{|c}{$0$} &
\multicolumn{1}{|c}{$.333$}\\\cline{3-5}%
$p$ & \multicolumn{1}{c}{$.1$} & \multicolumn{1}{|c}{$.267$} &
\multicolumn{1}{|c}{$.067$} & \multicolumn{1}{|c}{$0$} &
\multicolumn{1}{|c}{$.333$}\\\cline{3-5}
& \multicolumn{1}{c}{$.2$} & \multicolumn{1}{|c}{$.207$} &
\multicolumn{1}{|c}{$.119$} & \multicolumn{1}{|c}{$.007$} &
\multicolumn{1}{|c}{$.333$}\\\cline{3-5}
& \multicolumn{1}{c}{} & \multicolumn{1}{c}{$.807$} & $.185$ &
\multicolumn{1}{c}{$.007$} & \multicolumn{1}{c}{}%
\end{tabular}
and the conditionals of $p$ given $X=x$ are:%
\begin{tabular}
[c]{lllcl}
& & & $x$ & \\
& \multicolumn{1}{c}{} & \multicolumn{1}{c}{$0$} & $1$ &
\multicolumn{1}{c}{$2$}\\\cline{3-5}
& \multicolumn{1}{c}{$0$} & \multicolumn{1}{|c}{$.413$} &
\multicolumn{1}{||c}{$0$} & \multicolumn{1}{||c|}{$0$}\\\cline{3-5}%
$p$ & \multicolumn{1}{c}{$.1$} & \multicolumn{1}{|c}{$.330$} &
\multicolumn{1}{||c}{$.360$} & \multicolumn{1}{||c|}{$0$}\\\cline{3-5}
& \multicolumn{1}{c}{$.2$} & \multicolumn{1}{|c}{$.257$} &
\multicolumn{1}{||c}{$.640$} & \multicolumn{1}{||c|}{$1.00$}\\\cline{3-5}%
\end{tabular}
\bigskip
\item Use your answer to (a) and show that the best $n=1$ plan REJECTS if
$X=0$ and ACCEPTS if $X=1$. (Yes, this is correct!) Then use the conditionals
above for $n=2$ and show that the best $n=2$ plan REJECTS if $X=0$ and ACCEPTS
if $X=1$ or $2$.
\item Standard acceptance sampling plans REJECT FOR LARGE $X$. Explain in
qualitative terms why the best plans from (b) are not of this form.
\item Which sample size ($n=0$, $1$ or $2$) is best here? (Show calculations to
support your answer.)
\end{enumerate}
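The posterior tables in (a) and the plan comparisons in (b) and (d) can be checked numerically. A sketch, assuming the Deming cost structure above (accepting leaves each remaining defective to cost $k_{2}$; rejecting screens the remaining $N-n$ items at $k_{1}$ each), with $X$ hypergeometric given the lot content:

```python
from math import comb

# Perspective A: a lot of N = 10 contains D = 10p defectives, p in
# {0, .1, .2} with prior weight 1/3 each; k1 = 1 (inspection cost),
# k2 = 10 (cost of an undetected defective).
N, k1, k2 = 10, 1.0, 10.0
ps = [0.0, 0.1, 0.2]

def hyper_pmf(x, D, n):
    if x > D or n - x > N - D:
        return 0.0
    return comb(D, x) * comb(N - D, n - x) / comb(N, n)

def analyze(n):
    total = k1 * n
    for x in range(n + 1):
        # joint weights of (p, x), then the posterior over p given X = x
        joint = [hyper_pmf(x, round(10 * p), n) / 3 for p in ps]
        px = sum(joint)
        if px == 0:
            continue
        post = [w / px for w in joint]
        accept = sum(g * k2 * (round(10 * p) - x) for g, p in zip(post, ps))
        reject = k1 * (N - n)
        total += px * min(accept, reject)
        print(f"n={n} x={x} posterior={[round(g, 3) for g in post]} "
              f"-> {'accept' if accept < reject else 'reject'}")
    return total

for n in (0, 1, 2):
    print("expected total cost:", round(analyze(n), 3))
```

Under these assumptions the $n=1$ plan indeed rejects at $X=0$ and accepts at $X=1$, and comparing the three totals answers part (d).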
\end{enumerate}
\backmatter\appendix
\chapter{A Useful Probabilistic Approximation}
Here we present the general ``delta method'' or ``propagation of error''
approximation that stands behind several variance approximations in these
notes as well as much of \S5.4 of V\&J.\ Suppose that a $p\times1$ random
vector%
\[
\mathop{\mbox{\boldmath$X$}}=\left(
\begin{array}
[c]{c}%
X_{1}\\
X_{2}\\
\vdots\\
X_{p}%
\end{array}
\right)
\]
has a mean vector%
\[
\mathbf{\mu}=\left(
\begin{array}
[c]{c}%
EX_{1}\\
EX_{2}\\
\vdots\\
EX_{p}%
\end{array}
\right) =\left(
\begin{array}
[c]{c}%
\mu_{1}\\
\mu_{2}\\
\vdots\\
\mu_{p}%
\end{array}
\right)
\]
and $p\times p$ variance-covariance matrix%
\begin{align*}
\mathbf{\Sigma} & =\left(
\begin{array}
[c]{ccccc}%
\text{Var}X_{1} & \text{Cov}\left( X_{1},X_{2}\right) & \cdots &
\text{Cov}\left( X_{1},X_{p-1}\right) & \text{Cov}\left( X_{1},X_{p}\right)
\\
\text{Cov}\left( X_{1},X_{2}\right) & \text{Var}X_{2} & \cdots &
\text{Cov}\left( X_{2},X_{p-1}\right) & \text{Cov}\left( X_{2},X_{p}\right)
\\
\vdots & \vdots & \ddots & \vdots & \vdots\\
\text{Cov}\left( X_{1},X_{p-1}\right) & \text{Cov}\left( X_{2}%
,X_{p-1}\right) & \cdots & \text{Var}X_{p-1} & \text{Cov}\left(
X_{p-1},X_{p}\right) \\
\text{Cov}\left( X_{1},X_{p}\right) & \text{Cov}\left( X_{2},X_{p}\right)
& \cdots & \text{Cov}\left( X_{p-1},X_{p}\right) & \text{Var}X_{p}%
\end{array}
\right) \\
& =\left(
\begin{array}
[c]{ccccc}%
\sigma_{1}^{2} & \rho_{12}\sigma_{1}\sigma_{2} & \cdots & \rho_{1,p-1}%
\sigma_{1}\sigma_{p-1} & \rho_{1p}\sigma_{1}\sigma_{p}\\
\rho_{12}\sigma_{1}\sigma_{2} & \sigma_{2}^{2} & \cdots & \rho_{2,p-1}%
\sigma_{2}\sigma_{p-1} & \rho_{2p}\sigma_{2}\sigma_{p}\\
\vdots & \vdots & \ddots & \vdots & \vdots\\
\rho_{1,p-1}\sigma_{1}\sigma_{p-1} & \rho_{2,p-1}\sigma_{2}\sigma_{p-1} & \cdots &
\sigma_{p-1}^{2} & \rho_{p-1,p}\sigma_{p-1}\sigma_{p}\\
\rho_{1p}\sigma_{1}\sigma_{p} & \rho_{2p}\sigma_{2}\sigma_{p} & \cdots &
\rho_{p-1,p}\sigma_{p-1}\sigma_{p} & \sigma_{p}^{2}%
\end{array}
\right) \\
& =\left( \rho_{ij}\sigma_{i}\sigma_{j}\right)
\end{align*}
\noindent(Recall that if $X_{i}$ and $X_{j}$ are independent, $\rho_{ij}=0$.)
Then for a $k\times p$ matrix of constants%
\[
\mathop{\mbox{\boldmath$A$}}=\left( a_{ij}\right)
\]
consider the random vector%
\[
\mathop{\mbox{\boldmath$Y$}}\limits_{k\times1}=\mathop{\mbox{\boldmath$A$}%
}\limits_{k\times p}\mathop{\mbox{\boldmath$X$}}\limits_{p\times1}%
\]
It is a standard piece of probability that $\mathop{\mbox{\boldmath$Y$}}$ has
mean vector%
\[
\left(
\begin{array}
[c]{c}%
EY_{1}\\
EY_{2}\\
\vdots\\
EY_{k}%
\end{array}
\right) =\mathop{\mbox{\boldmath$A$}}\mathbf{\mu}%
\]
and variance-covariance matrix%
\[
\text{Cov}\mathop{\mbox{\boldmath$Y$}}=\mathop{\mbox{\boldmath$A$}%
}\mathbf{\Sigma}\mathop{\mbox{\boldmath$A$}'}%
\]
\noindent(The $k=1$ version of this for uncorrelated $X_{i}$ is essentially
quoted in (5.23) and (5.24) of V\&J.)
The propagation of error method says that if instead of the linear
relationship $\mathop{\mbox{\boldmath$Y$}}=\mathop{\mbox{\boldmath$A$}}\mathop
{\mbox{\boldmath$X$}}$ one is concerned with $k$ functions $g_{1}%
,g_{2},...,g_{k}$ (each mapping $\mathbf{R}^{p}$ to $\mathbf{R}$) and defines%
\[
\mathop{\mbox{\boldmath$Y$}}=\left(
\begin{array}
[c]{c}%
g_{1}(\mathop{\mbox{\boldmath$X$}})\\
g_{2}(\mathop{\mbox{\boldmath$X$}})\\
\vdots\\
g_{k}(\mathop{\mbox{\boldmath$X$}})
\end{array}
\right)
\]
then a multivariate Taylor's Theorem argument and the facts above provide
approximate mean vector and an approximate covariance matrix for
$\mathop{\mbox{\boldmath$Y$}}$. \ That is, if the functions $g_{i}$ are
differentiable, let%
\[
\mathop{\mbox{\boldmath$D$}}\limits_{k\times p}=\left( \frac{\partial g_{i}%
}{\partial x_{j}}\bigg|_{\mu_{1},\mu_{2},...,\mu_{p}}\right)
\]
A multivariate Taylor approximation says that for each $x_{i}$ near $\mu_{i} $%
\[
\mathop{\mbox{\boldmath$y$}}=\left(
\begin{array}
[c]{c}%
g_{1}(\mathop{\mbox{\boldmath$x$}})\\
g_{2}(\mathop{\mbox{\boldmath$x$}})\\
\vdots\\
g_{k}(\mathop{\mbox{\boldmath$x$}})
\end{array}
\right) \approx\left(
\begin{array}
[c]{c}%
g_{1}(\mathbf{\mu})\\
g_{2}(\mathbf{\mu})\\
\vdots\\
g_{k}(\mathbf{\mu})
\end{array}
\right) +\mathop{\mbox{\boldmath$D$}}\left( \mathop{\mbox{\boldmath$x$}%
}-\mathbf{\mu}\right)
\]
So if the variances of the $X_{i}$ are small (so that with high probability
$\mathop{\mbox{\boldmath$X$}}$ is near $\mathbf{\mu}$, that is, the linear
approximation above is usually valid) it is plausible that $\mathop
{\mbox{\boldmath$Y$}}$ has mean vector%
\[
\left(
\begin{array}
[c]{c}%
EY_{1}\\
EY_{2}\\
\vdots\\
EY_{k}%
\end{array}
\right) \approx\left(
\begin{array}
[c]{c}%
g_{1}(\mathbf{\mu})\\
g_{2}(\mathbf{\mu})\\
\vdots\\
g_{k}(\mathbf{\mu})
\end{array}
\right)
\]
and variance-covariance matrix%
\[
\text{Cov}\mathop{\mbox{\boldmath$Y$}}\approx\mathop{\mbox{\boldmath$D$}%
}\mathbf{\Sigma}\mathop{\mbox{\boldmath$D$}'}%
\]
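As a concrete check of the approximation, take $k=1$, $p=2$ and $g(x_{1},x_{2})=x_{1}/x_{2}$ with independent $X_{1}$ and $X_{2}$ (an illustrative choice, not from the notes); then $\mathop{\mbox{\boldmath$D$}}=(1/\mu_{2},\,-\mu_{1}/\mu_{2}^{2})$ and the approximate variance $\mathop{\mbox{\boldmath$D$}}\mathbf{\Sigma}\mathop{\mbox{\boldmath$D$}'}$ can be compared against simulation:

```python
import random
from math import sqrt

# Propagation of error for g(x1, x2) = x1 / x2 with independent inputs
mu1, mu2 = 4.0, 2.0          # illustrative means
s1, s2 = 0.10, 0.05          # illustrative (small) standard deviations

# D = (dg/dx1, dg/dx2) evaluated at the mean vector
d1, d2 = 1 / mu2, -mu1 / mu2**2
var_approx = d1**2 * s1**2 + d2**2 * s2**2   # D Sigma D' (Sigma diagonal)

# Monte Carlo check of the delta-method variance
random.seed(1)
n = 200_000
draws = [random.gauss(mu1, s1) / random.gauss(mu2, s2) for _ in range(n)]
m = sum(draws) / n
var_mc = sum((y - m)**2 for y in draws) / (n - 1)

print(var_approx, var_mc)    # both near .005
```

Because the standard deviations are small relative to the means, the linearization is accurate and the simulated variance agrees closely with $\mathop{\mbox{\boldmath$D$}}\mathbf{\Sigma}\mathop{\mbox{\boldmath$D$}'}$.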
\end{document}