April 19, 2004
Notes on the SPLIDA Splus GUI functions for life data analysis
Copyright 2004 W. Q. Meeker
These notes provide a brief introduction to some of the capabilities
of SPLIDA, including the SPLIDA command-line operation. Detailed
instructions for installation and operation within the SPLIDA GUI are
contained in the SPLIDA documentation (SplidaGui.pdf).
After SPLIDA is installed in the usual way, Splida will appear on the
SPLUS menu bar. The GUI menu structure was designed so that it could be
used without reference to documentation. The use of the menu
structure should be, for the most part, straightforward and
intuitive.
SPLIDA will work with S-PLUS version 6.0 or later (there is a
different, unsupported version for S-PLUS versions 4.5 and 2000, but
it is strongly recommended that SPLIDA users upgrade to S-PLUS 6.0
or later).
The most important statistical tools for the most widely used models
are available for use through the SPLIDA GUI. A few other more
sophisticated models and analyses are available only through the
command line. Some of these more sophisticated models will
eventually be available in the GUI.
The rest of these notes describe Splus command-line functions for
the analysis of censored life data and data from other nonstandard
models. For those who will use the GUI, this information is not
really necessary, but it may be of interest because it provides a
glimpse of what is happening under the GUI and indicates possibilities
for user extension and modification of SPLIDA (which Luis Escobar and I
intend to document more fully in the future). One big advantage of
working in S-PLUS is that changes to the functions are easy to
implement. Users who are familiar with the Splus language can rename
my functions and customize them to meet special needs.
The work on SPLIDA has been motivated by research problems,
consulting problems, the need for software for my Statistics 533 course,
and the need to do the examples in Meeker and Escobar (1998), Statistical
Methods for Reliability Data, published by John Wiley and Sons, Inc. I
have focused my development on providing high-quality graphical output
of the results, although the procedures also provide tabular output,
either automatically or when requested (depending on the length).
This collection of SPLIDA/S-PLUS functions can, roughly, be divided
into two different groups of functions:
1. Functions that can be used to analyze censored data with standard
models using nonparametric methods, standard life distributions, and
accelerated life test relationships (Chapters 1-8, 16-21 of Meeker
and Escobar (1998)). These functions can be used by simply giving
commands to Splus (or by sending Splus a file of commands to run in
batch mode). One does not have to know much about Splus or anything
about "programming" in Splus to use these functions.
2. Functions that allow likelihood analysis and maximum likelihood
fitting of nonstandard models, but that require the user to program
a likelihood in the Splus language. This second collection of
functions also allows the user to compute likelihood profiles (one
and two dimensional). This approach was used to fit a number of the
special models and distributions in Chapter 11 of Meeker and
Escobar. This provides a number of examples that could be easily
extended to yet other distributions and models.
These notes describe only the first set of functions. Examples of
all of these functions are given in the "echapter?.q" files in the
echapters folder.
An Splus object with a name like xxx.ld is a "life.data" object
containing the information about dataset xxx. Typically there will be a
different .ld file for each data set to be analyzed. The xxx.ld life.data
objects contain information like times, censor codes, case weights
(for ties, interval-count data or multiple censoring at a point),
units of time, information about explanatory variables (e.g., for
regression or acceleration models), data set title, etc.
SPLIDA uses some of the Splus object-oriented programming features.
This makes it very easy to do different analyses and reduces the
number of function names that must be used.
Detailed documentation for these functions has not yet been
prepared. Instead users can rely on the large number of examples
in the echapters folder. Here is a brief description of some of the
most important functions:
***************
> xxx.ld <- frame.to.ld(...)
input:
   ascii data file or data frame name
   column(s) of responses (2 columns needed if there are intervals)
   column containing censor indicators (default is no censoring)
   column containing case weights (or multiplicities) (default is that
      all cases have weight 1)
   data title (for plots and tables)
   units for response (e.g., minutes, hours, days, or cycles)
   columns containing explanatory variables (default is none)
   names for the x variables (default is x1, x2, ...)
The numerical censor codes are:
0 dummy observations (ignored in analysis)
1 exact failure time
2 right censored observation
3 left censored observation (or interval assumed to start at 0
or -infinity, depending on the support of the specified distribution)
4 interval censored observation
5 small interval around reported exact failure time
(useful when the density approximation to the likelihood is inadequate).
It is possible (and, in fact, recommended) to replace these codes
with meaningful words or symbols (like "Failed" and "Censored"), and
this has been done in the data sets distributed with SPLIDA. The lists
of allowed synonyms for the censor codes (the numerical codes are still
allowed) can be seen by using the SPLIDA commands:
failure.censor.names: GetSlidaDefault("SLIDA.FailName")
right.censor.names: GetSlidaDefault("SLIDA.RcName")
left.censor.names: GetSlidaDefault("SLIDA.LcName")
interval.censor.names: GetSlidaDefault("SLIDA.IcName")
sinterval.censor.names: GetSlidaDefault("SLIDA.DefaultSintervalCensorNames")
There are corresponding functions for other kinds of data:
> xxx.rdu <- frame.to.rd(...)
for recurrence data objects (xxx.rdu, Chapter 16),
> xxx.rmd <- frame.to.rmd(...)
for repeated measures degradation data (Chapters 13 and 21), and
> xxx.ddd <- frame.to.ddd(...)
for destructive degradation data (new since our book was
published, but see echapter 23.q in the example commands).
See the detailed examples in the echapter files and the corresponding
data in the SPLIDA_textdata folder.
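To make the meaning of the numerical censor codes concrete, here is a minimal Python sketch of how each code would typically enter a Weibull log likelihood. This is not SPLIDA code and does not claim to show how SPLIDA is implemented. The row layout (lower, upper, code, weight) and all function names are hypothetical, chosen only for illustration. Code 5 is left out because the width of the small interval is not specified in these notes.

```python
import math

def weibull_cdf(t, eta, beta):
    """Weibull cdf F(t) = 1 - exp(-(t/eta)^beta) for t > 0, and 0 otherwise."""
    if t <= 0:
        return 0.0
    return 1.0 - math.exp(-((t / eta) ** beta))

def weibull_pdf(t, eta, beta):
    """Weibull density for t > 0."""
    z = (t / eta) ** beta
    return (beta / t) * z * math.exp(-z)

def log_likelihood(rows, eta, beta):
    """Sum weighted log-likelihood contributions.

    rows: iterable of (lower, upper, code, weight); 'upper' is used only
    for interval-censored rows (code 4). This layout is an assumption.
    """
    total = 0.0
    for lower, upper, code, weight in rows:
        if code == 0:        # dummy observation: ignored
            continue
        elif code == 1:      # exact failure: density at the failure time
            contrib = weibull_pdf(lower, eta, beta)
        elif code == 2:      # right censored: P(T > lower)
            contrib = 1.0 - weibull_cdf(lower, eta, beta)
        elif code == 3:      # left censored: P(T <= lower), interval starts at 0
            contrib = weibull_cdf(lower, eta, beta)
        elif code == 4:      # interval censored: P(lower < T <= upper)
            contrib = weibull_cdf(upper, eta, beta) - weibull_cdf(lower, eta, beta)
        else:
            raise ValueError("censor code not handled in this sketch: %r" % code)
        total += weight * math.log(contrib)
    return total
```

The case weight multiplies the log contribution, which is why a single row can stand for several tied or identically censored units.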
***************
> summary(lzbearing.ld)
> print(lzbearing.ld)
The first command provides a summary of the indicated data set.
The second command prints the data set.
> summary(lzbearing.ld)
Data set name: Ball Bearing Cycles to Failure
Number of rows in data matrix = 23
Response units: Millions of Cycles
Response minimum: 17.88
Response maximum: 173.4
Number of cases in data set = 23
Number of exact failures in data set= 23
Number of right censored observations in data set= 0
Number of left censored observations in data set= 0
Number of interval censored observations in data set= 0
Number of small-interval observations in data set= 0
No explanatory variables
***************
> plot(lzbearing.ld)
Plots the empirical cdf on linear-by-linear axes with simultaneous
confidence bands (the default). To get a log scale on the x (time)
axis, use
> plot(lzbearing.ld, x.axis="log")
To see a table of the output use
> print(plot(lzbearing.ld))
To get instead a set of point-wise confidence intervals with a log time axis use:
> plot(lzbearing.ld, x.axis="log", band.type="Point-wise")
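For data with only exact failures and right-censored observations, the usual nonparametric estimate of the cdf is the product-limit (Kaplan-Meier) estimate. The following is a minimal Python sketch of that estimate, offered only to show what kind of quantity is being plotted. It is not SPLIDA's code, the function name is hypothetical, and the confidence bands are omitted.

```python
def product_limit_cdf(times, failed):
    """Return [(t, F_hat(t))] at each distinct failure time.

    times:  response values
    failed: True for an exact failure, False for a right-censored value
    """
    data = sorted(zip(times, failed))
    n_at_risk = len(data)
    surv = 1.0
    out = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect every unit (failed or censored) recorded at time t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            removed += 1
            i += 1
        if deaths > 0:
            # Conditional survival past t among units still at risk.
            surv *= 1.0 - deaths / n_at_risk
            out.append((t, 1.0 - surv))
        n_at_risk -= removed
    return out
```

With no censoring this reduces to the step function that jumps by 1/n at each failure.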
***************
To obtain a probability plot of the requested distribution, use
> plot(lzbearing.ld, distribution="Weibull")
Simultaneous confidence bands (using the method described in Vijay
Nair's 1984 Technometrics paper) are provided by default, but
pointwise nonparametric confidence intervals can be requested
instead.
> plot(lzbearing.ld, distribution="Weibull", band.type="p")
The distribution can be specified using any of the following names.
sev
weibull
Weibull
normal
Normal
lognormal
Lognormal
logistic
Logistic
loglogistic
Loglogistic
exponential
Exponential
In the commands, you can control the axes by using something like
plot(xx, x.range=c(my.min,my.max), y.range=c(my.min,my.max))
If you want SPLIDA to choose any of these limits, replace it with NA, as in:
plot(xx, x.range=c(NA,my.max), y.range=c(my.min,NA))
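The idea behind a probability plot is that the axes are transformed so that the cdf of the chosen distribution becomes a straight line. For the Weibull distribution the standard transformation is log(t) on the time axis and log(-log(1-p)) on the probability axis. The Python sketch below (not SPLIDA code; the function name and the example values eta = 100 and beta = 2 are hypothetical) checks this on exact Weibull probabilities: the slope between successive points equals the shape parameter beta.

```python
import math

def weibull_plot_coords(times, probs):
    """Map (t, p) pairs to the linearizing scales of a Weibull probability plot."""
    xs = [math.log(t) for t in times]
    ys = [math.log(-math.log(1.0 - p)) for p in probs]
    return xs, ys

# Hypothetical example: exact Weibull probabilities with eta = 100, beta = 2.
eta, beta = 100.0, 2.0
times = [20.0, 50.0, 100.0, 200.0]
probs = [1.0 - math.exp(-((t / eta) ** beta)) for t in times]

xs, ys = weibull_plot_coords(times, probs)

# Slopes between successive points; each should be close to beta.
slopes = [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]) for i in range(len(xs) - 1)]
```

On such a plot, data that follow a Weibull distribution scatter around a straight line, and curvature suggests that another distribution may fit better.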
***************
> mleprobplot(lzbearing.ld, distribution="Lognormal")
***************
Makes a probability plot of the requested distribution
and superimposes an ML fit with a set of pointwise parametric
confidence intervals on failure probabilities.
To get tabular output, use
> lzbearing.mlest.out <- mleprobplot(lzbearing.ld, distribution="Lognormal")
> print(lzbearing.mlest.out)
> quantiles(lzbearing.mlest.out)
> failure.probabilities(lzbearing.mlest.out)
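When a data set contains only exact failures, the lognormal ML estimates have a closed form, and quantiles and failure probabilities follow directly from them. The Python sketch below illustrates the point estimates behind such tabular output under that assumption. It is not SPLIDA code, the function names are hypothetical, and it gives neither confidence intervals nor any treatment of censoring.

```python
import math
from statistics import NormalDist

def lognormal_mle(times):
    """ML estimates (mu, sigma) of the lognormal distribution for complete data."""
    logs = [math.log(t) for t in times]
    n = len(logs)
    mu = sum(logs) / n
    # The ML estimate of sigma uses the divisor n, not n - 1.
    sigma = math.sqrt(sum((y - mu) ** 2 for y in logs) / n)
    return mu, sigma

def lognormal_quantile(p, mu, sigma):
    """Time by which a proportion p of units is expected to fail."""
    return math.exp(mu + NormalDist().inv_cdf(p) * sigma)

def lognormal_failure_probability(t, mu, sigma):
    """Probability of failure by time t."""
    return NormalDist().cdf((math.log(t) - mu) / sigma)
```

For censored data no closed form exists and the likelihood has to be maximized numerically.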
***************
> compare.mlprobplot(lzbearing.ld,
main.distribution="Lognormal", compare.distribution="Weibull")
***************
This is similar to mleprobplot(), but also superimposes the ML fit of the
"compare.distribution".
***************
> censored.data.plot(superalloy.ld)
***************
Plots the response versus all explanatory variables.
***************
> groupi.mleprobplot(mylarpoly.ld,distribution="Weibull")
For a set of accelerated life test data with
subexperiments at a small number of stress levels, produces a
multiple probability plot with ML fits done individually for each
subexperiment and superimposed on the plot (slopes may not be equal
because the spread parameters are not constrained to be the same).
***************
> groupm.mleprobplot(mylarpoly.ld, distribution="Weibull",
relationship="log")
For a set of accelerated life test data with subexperiments at a
small number of stress levels, produces a multiple probability plot
with the ML fit of a model tying together the subexperiments
superimposed on the plot (slopes will be equal because the spread
parameters are constrained to be the same).
SPLIDA can also fit models with nonconstant sigma. One warning here
is that a separate algorithm is used, and this algorithm is not
as robust to problems with ill-conditioned x matrices. The user has
to make sure that the input x matrix is well conditioned. For
example, the quadratic model for location for the Nelson superalloy
fatigue data in Chapter 17 can be analyzed using the
parameterization suggested in Nelson (1984), in which the log stress
variable is centered before it is squared. In Meeker and Escobar
(1998) we did not center the x variable, as we feel that today users
of statistical methods should not be forced to do such things. The
coefficients in our presentation there were worked out in a
different way that has not been programmed in general. With some
more programming effort, we could make our nonconstant-sigma
algorithm robust too, but we have not yet done this.
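The reason centering helps can be seen from a small numerical check. Over a narrow range of positive values, x and x squared are almost perfectly correlated, which makes the x matrix ill conditioned. Centering x before squaring removes most of that correlation. The Python sketch below uses hypothetical stress levels, not the superalloy data, and is only an illustration of the effect.

```python
import math

def correlation(u, v):
    """Pearson correlation of two equal-length sequences."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)

# Hypothetical stress levels, used only to illustrate the effect.
log_stress = [math.log(s) for s in (80.0, 100.0, 120.0, 140.0, 160.0)]

# Uncentered: correlation between x and x^2.
raw_corr = correlation(log_stress, [x ** 2 for x in log_stress])

# Centered: subtract the mean before squaring.
mean_x = sum(log_stress) / len(log_stress)
centered = [x - mean_x for x in log_stress]
centered_corr = correlation(centered, [x ** 2 for x in centered])
```

Comparing raw_corr with centered_corr shows how much of the near-collinearity comes from the location of x alone.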
To fit models in which there is a log-linear model for sigma, one
must generalize the explan.vars input. Instead of a vector, it must
be a list of two vectors, one for mu and one for sigma. The
following command fits the quadratic model for location and a
log-linear model for sigma to the Nelson superalloy data. The SPLIDA
data object superalloy.ld contains in its X matrix the columns:
PseudoSress LogPseudoSress LogPseudoSress2
This model is fit in SPLIDA with the following command:
gmlest(superalloy.ld,dist="Weibull",
explan.vars=list(mu.relat=c(2,3), sigma.relat=c(2)))
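As a reading aid, the model described here can be written as log(T) = mu(x) + sigma(x) * Z, where Z has a smallest extreme value distribution, mu(x) is quadratic in x, and log(sigma(x)) is linear in x. The Python sketch below evaluates a quantile under that form. It is an interpretation of the description above, not SPLIDA code: the function name, the argument layout, and any coefficient values passed to it are hypothetical.

```python
import math

def weibull_regression_quantile(p, x, mu_coef, sigma_coef):
    """p quantile of T at explanatory value x (e.g., log stress).

    mu(x)         = b0 + b1*x + b2*x^2   (quadratic model for location)
    log sigma(x)  = g0 + g1*x            (log-linear model for sigma)
    log T         = mu(x) + sigma(x) * Z, Z ~ smallest extreme value
    """
    b0, b1, b2 = mu_coef
    g0, g1 = sigma_coef
    mu = b0 + b1 * x + b2 * x * x
    sigma = math.exp(g0 + g1 * x)
    # Standard smallest extreme value quantile: log(-log(1 - p)).
    z_p = math.log(-math.log(1.0 - p))
    return math.exp(mu + sigma * z_p)
```

Setting g1 to zero gives the constant-sigma model as a special case.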
I would welcome feedback and suggestions for improvement of these
functions. I intend to continue development. Please feel free to call
or send email if you have questions.
The most up-to-date version of Splida can always be found at
http://www.public.iastate.edu/~splida
Please send email to wqmeeker@iastate.edu if you would like to be
notified when new versions have been posted.
This document and other SPLIDA materials may be freely copied for
educational purposes.
Reference:
Meeker, W. Q. and Escobar, L. A. (1998),
Statistical Methods for Reliability Data,
New York: John Wiley and Sons. (800)-526-5368
ISBN 0471143286
---------------------------------------
There is a continuing, sophisticated process for checking
computations done with SPLIDA. It is, of course, possible that bugs
exist in the software. I will try to investigate and fix any
problems that are reported to me. Because it is free, however, SPLIDA
comes with NO GUARANTEE OR WARRANTY, IMPLIED OR OTHERWISE.
---------------------------------------
William Q. Meeker
Department of Statistics
Iowa State University
Ames, IA 50011
wqmeeker@iastate.edu