April 19, 2004

Notes on the SPLIDA Splus GUI functions for life data analysis

Copyright 2004 W. Q. Meeker

These notes provide a brief introduction to some of the capabilities of SPLIDA, including SPLIDA command-line operation. Detailed instructions for installation and operation within the SPLIDA GUI are contained in the SPLIDA documentation (SplidaGui.pdf).

After SPLIDA is installed in the usual way, Splida will appear on the S-PLUS menu bar. The GUI menu structure was designed so that it could be used without reference to documentation; for the most part, using the menus should be straightforward and intuitive. SPLIDA will work with S-PLUS version 6.0 or later. There is a different, unsupported version for S-PLUS 4.5 and 2000, but SPLIDA users are strongly encouraged to upgrade to S-PLUS 6.0 or later.

The most important statistical tools for the most widely used models are available through the SPLIDA GUI. A few more sophisticated models and analyses are available only through the command line; some of these will eventually be added to the GUI.

The rest of these notes describe Splus command-line functions for the analysis of censored life data and data from other nonstandard models. Those who will use only the GUI do not really need this information, but it may be of interest as a glimpse of what is happening under the GUI and as an indication of the possibilities for user extension and modification of SPLIDA (which Luis Escobar and I intend to document more fully in the future). One big advantage of working in S-PLUS is that changes to the functions are easy to implement: users who are familiar with the Splus language can rename my functions and customize them to meet special needs.
The work on SPLIDA has been motivated by research problems, consulting problems, the need for software for my Statistics 533 course, and the need to do the examples in Meeker and Escobar (1998), Statistical Methods for Reliability Data, published by John Wiley and Sons, Inc. I have focused development on providing high-quality graphical output of the results, although the procedures also provide tabular output, either automatically or on request (depending on its length).

This collection of SPLIDA/S-PLUS functions can, roughly, be divided into two groups:

1. Functions that can be used to analyze censored data with standard models using nonparametric methods, standard life distributions, and accelerated life test relationships (Chapters 1-8 and 16-21 of Meeker and Escobar (1998)). These functions can be used by simply giving commands to Splus (or by sending Splus a file of commands to run in batch mode). One does not have to know much about Splus, or anything about "programming" in Splus, to use these functions.

2. Functions that allow likelihood analysis and maximum likelihood fitting of nonstandard models, but that require the user to program a likelihood in the Splus language. This second collection of functions also allows the user to compute one- and two-dimensional likelihood profiles. This approach was used to fit a number of the special models and distributions in Chapter 11 of Meeker and Escobar (1998), which provides several examples that could easily be extended to other distributions and models.

These notes describe only the first set of functions. Examples of all of these functions are given in the "echapter?.q" files in the echapters folder.

An Splus object with a name like xxx.ld is a "life.data" object containing the information about data set xxx. Typically there will be a different .ld object for each data set to be analyzed.
The xxx.ld life.data objects contain information such as times, censor codes, case weights (for ties, interval-count data, or multiple censoring at a point), units of time, information about explanatory variables (e.g., for regression or acceleration models), the data set title, etc. SPLIDA uses some of the Splus object-oriented programming features. This makes it very easy to do different analyses and reduces the number of function names that must be used.

Detailed documentation for these functions has not yet been prepared. Instead, users can rely on the large number of examples in the echapters folder. Here is a brief description of some of the most important functions:

***************

> xxx.ld <- frame.to.ld(...)

Input:
   ASCII data file or data frame name
   column(s) of responses (2 columns are needed if there are intervals)
   column containing censor indicators (default is no censoring)
   column containing case weights or multiplicities (default is that all cases have weight 1)
   data title (for plots and tables)
   units for the response (e.g., minutes, hours, days, or cycles)
   columns containing explanatory variables (default is none)
   names for the x variables (default is x1, x2, ...)

The numerical censor codes are:
   0  dummy observation (ignored in the analysis)
   1  exact failure time
   2  right censored observation
   3  left censored observation (or an interval assumed to start at 0 or -infinity, depending on the support of the specified distribution)
   4  interval censored observation
   5  small interval around a reported exact failure time (useful when the density approximation to the likelihood is inadequate)

It is possible (and actually recommended) to replace these codes with meaningful words or symbols (like "Failed" and "Censored"); this has been done in the data sets distributed with SPLIDA.
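To make the role of the censor codes and case weights concrete, here is a hedged Python sketch (not SPLIDA code) of how each code contributes to a log-likelihood under a Weibull model with shape beta and scale eta. The function names, the (time, upper, code, weight) record layout, and the half-width used for code 5 are assumptions chosen for illustration only.

```python
import math

# Meaning of the numerical censor codes listed above.
CENSOR_CODES = {
    0: "dummy (ignored)",
    1: "exact failure",
    2: "right censored",
    3: "left censored",
    4: "interval censored",
    5: "small interval around an exact time",
}

def weibull_cdf(t, beta, eta):
    """Weibull cdf F(t); F(t) = 0 for t <= 0 because the support starts at 0."""
    return 1.0 - math.exp(-((t / eta) ** beta)) if t > 0 else 0.0

def weibull_log_pdf(t, beta, eta):
    """log f(t) for the Weibull density, t > 0."""
    z = (t / eta) ** beta
    return math.log(beta / t) + math.log(z) - z

def loglik_contribution(code, t, beta, eta, upper=None, half_width=0.5):
    """Log-likelihood contribution of one observation.

    t is the reported time (the lower end point for code 4); upper is the
    upper end point for code 4. half_width for code 5 is a hypothetical
    choice made for this sketch, not SPLIDA's rule.
    """
    F = lambda x: weibull_cdf(x, beta, eta)
    if code == 0:
        return 0.0                                   # ignored in the analysis
    if code == 1:
        return weibull_log_pdf(t, beta, eta)         # density at t
    if code == 2:
        return -((t / eta) ** beta)                  # log(1 - F(t)), computed stably
    if code == 3:
        return math.log(F(t))                        # failed before t
    if code == 4:
        return math.log(F(upper) - F(t))             # failed in (t, upper]
    if code == 5:
        return math.log(F(t + half_width) - F(t - half_width))
    raise ValueError(f"unknown censor code: {code}")

def total_loglik(records, beta, eta):
    """records: (time, upper, code, weight) tuples; the weight is the case
    weight (multiplicity), so k tied observations count k times."""
    return sum(w * loglik_contribution(c, t, beta, eta, upper=u)
               for t, u, c, w in records)
```

Code 5 replaces the density with the probability of a short interval around the reported time; as the half-width shrinks, that probability behaves like 2 x half_width x f(t), which is why the density approximation is usually adequate.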
The lists of allowed synonyms for the censor codes (the numerical codes are still allowed) can be seen by using the following SPLIDA commands:

   failure.censor.names:    GetSlidaDefault("SLIDA.FailName")
   right.censor.names:      GetSlidaDefault("SLIDA.RcName")
   left.censor.names:       GetSlidaDefault("SLIDA.LcName")
   interval.censor.names:   GetSlidaDefault("SLIDA.IcName")
   sinterval.censor.names:  GetSlidaDefault("SLIDA.DefaultSintervalCensorNames")

There are corresponding functions for other kinds of data:

> xxx.rdu <- frame.to.rd(...)

for recurrence data objects (xxx.rdu, Chapter 16),

> xxx.rmd <- frame.to.rmd(...)

for repeated measures degradation data (Chapters 13 and 21), and

> xxx.ddd <- frame.to.ddd(...)

for destructive degradation data (new since our book was published; see echapter 23.q in the example commands). See the detailed examples in the echapter files and the corresponding data in the SPLIDA_textdata folder.

***************

> summary(lzbearing.ld)
> print(lzbearing.ld)

The first command provides a summary of the indicated data set. The second command prints the data set.
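The word-to-code mapping and the counts that summary() reports can be pictured with the following Python sketch. It is not SPLIDA code: the synonym sets are invented for illustration (the real lists come from the GetSlidaDefault calls above), and treating case weights as multiplicities in the counts, with dummy rows excluded from the number of cases, is an assumption.

```python
from collections import Counter

# Hypothetical synonym sets for illustration only; in SPLIDA the actual
# lists are returned by GetSlidaDefault("SLIDA.FailName") and related calls.
SYNONYMS = {
    1: {"failed", "failure", "exact"},
    2: {"censored", "right", "survived"},
    3: {"left"},
    4: {"interval"},
    5: {"sinterval"},
}

def censor_code(label):
    """Map a numerical code (0-5, as int or string) or a word to its code."""
    key = str(label).strip().lower()
    if key in {"0", "1", "2", "3", "4", "5"}:
        return int(key)
    for code, words in SYNONYMS.items():
        if key in words:
            return code
    raise ValueError(f"unrecognized censor label: {label!r}")

def summarize(labels, weights=None):
    """Counts in the spirit of summary(); case weights act as multiplicities."""
    if weights is None:
        weights = [1] * len(labels)
    counts = Counter()
    for label, w in zip(labels, weights):
        counts[censor_code(label)] += w
    return {
        "rows": len(labels),
        "cases": sum(w for c, w in counts.items() if c != 0),  # dummies excluded
        "exact failures": counts[1],
        "right censored": counts[2],
        "left censored": counts[3],
        "interval censored": counts[4],
        "small interval": counts[5],
    }
```

Because words and numbers map to the same codes, a data file can mix "Failed" with 2 and still be tallied consistently.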
> summary(lzbearing.ld)

Data set name: Ball Bearing Cycles to Failure
Number of rows in data matrix = 23
Response units: Millions of Cycles
Response minimum: 17.88
Response maximum: 173.4
Number of cases in data set = 23
Number of exact failures in data set= 23
Number of right censored observations in data set= 0
Number of left censored observations in data set= 0
Number of interval censored observations in data set= 0
Number of small-interval observations in data set= 0
No explanatory variables

***************

> plot(lzbearing.ld)

By default, plots the empirical cdf on linear-by-linear axes with simultaneous confidence bands. To get a log scale on the x (time) axis, use

> plot(lzbearing.ld, x.axis="log")

To see a table of the output, use

> print(plot(lzbearing.ld))

To get a set of pointwise confidence intervals instead, with a log time axis, use

> plot(lzbearing.ld, x.axis="log", band.type="Point-wise")

***************

To obtain a probability plot for the requested distribution, use

> plot(lzbearing.ld, distribution="Weibull")

Simultaneous confidence bands (using the method described in Vijay Nair's 1984 Technometrics paper) are provided by default, but pointwise nonparametric confidence intervals can be requested instead:

> plot(lzbearing.ld, distribution="Weibull", band.type="p")

The distribution can be specified using any of the following names: sev, weibull, Weibull, normal, Normal, lognormal, Lognormal, logistic, Logistic, loglogistic, Loglogistic, exponential, Exponential.

In the commands, you can control the axes by using something like

plot(xx, x.range=c(my.min,my.max), y.range=c(my.min,my.max))

If you want SPLIDA to choose any of these limits, put NA in its place, as in

plot(xx, x.range=c(NA,my.max), y.range=c(my.min,NA))

***************

> mleprobplot(lzbearing.ld, distribution="Lognormal")

***************

Makes a probability plot for the requested distribution and superimposes an ML fit with a set of pointwise parametric confidence intervals on failure probabilities.
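The probability plot and the superimposed ML line can be sketched in Python (not S-PLUS, and not SPLIDA's own code) for the Weibull case. The sketch assumes complete data for the plotting positions, using the common (i - 0.5)/n choice, and fits the Weibull distribution to exact and right-censored times by maximizing the profile log-likelihood of the shape beta, with the scale eta maximized in closed form. All function names and the search bounds for beta are hypothetical. On these axes a Weibull cdf is the straight line y = beta(log t - log eta), so the ML fit plots as a line with slope beta.

```python
import math

def plotting_positions(n):
    """Midpoints of the jumps of the empirical cdf for n exact failures
    (one common choice; assumes no censoring)."""
    return [(i - 0.5) / n for i in range(1, n + 1)]

def weibull_paper(t, p):
    """Weibull probability-paper coordinates; a Weibull cdf plots as a line."""
    return math.log(t), math.log(-math.log(1.0 - p))

def profile_loglik(beta, times, failed):
    """Weibull log-likelihood for exact (failed=1) and right-censored (failed=0)
    times, with eta maximized out in closed form. Needs at least one failure."""
    r = sum(failed)
    s = sum(t ** beta for t in times)  # sums over failures AND censored units
    sum_log_fail = sum(math.log(t) for t, f in zip(times, failed) if f)
    return r * math.log(beta) - r * math.log(s / r) + (beta - 1) * sum_log_fail - r

def weibull_mle(times, failed, lo=0.02, hi=50.0, tol=1e-10):
    """ML estimates (beta, eta) by golden-section search on log(beta)."""
    g = (math.sqrt(5) - 1) / 2
    f = lambda u: profile_loglik(math.exp(u), times, failed)
    a, b = math.log(lo), math.log(hi)
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:              # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:                    # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    beta = math.exp((a + b) / 2)
    eta = (sum(t ** beta for t in times) / sum(failed)) ** (1 / beta)
    return beta, eta

def weibull_quantile(p, beta, eta):
    """Time by which a proportion p of units is expected to fail."""
    return eta * (-math.log(1.0 - p)) ** (1 / beta)

def weibull_failure_probability(t, beta, eta):
    """Estimated F(t) = Pr(T <= t)."""
    return 1.0 - math.exp(-((t / eta) ** beta))
```

For example, weibull_mle([10.0, 20.0, 30.0, 45.0, 60.0, 80.0], [1, 1, 1, 1, 0, 0]) fits four failures and two units still running at 60 and 80; weibull_quantile(0.1, beta, eta) then gives the estimated 0.1 quantile. Profiling out eta reduces the search to one dimension, which keeps the sketch free of external optimizers.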
To get tabular output, use

> lzbearing.mlest.out <- mleprobplot(lzbearing.ld, distribution="Lognormal")
> print(lzbearing.mlest.out)
> quantiles(lzbearing.mlest.out)
> failure.probabilities(lzbearing.mlest.out)

***************

> compare.mlprobplot(lzbearing.ld, main.distribution="Lognormal", compare.distribution="Weibull")

***************

This is similar to mleprobplot(), but also superimposes the ML fit of the "compare.distribution".

***************

> censored.data.plot(superalloy.ld)

***************

Plots the response versus all explanatory variables.

***************

> groupi.mleprobplot(mylarpoly.ld, distribution="Weibull")

For a set of accelerated life test data with subexperiments at a small number of stress levels, produces a multiple probability plot with ML fits done individually to each subexperiment (the slopes may differ because the spread parameters are not constrained to be the same).

***************

> groupm.mleprobplot(mylarpoly.ld, distribution="Weibull", relationship="log")

For a set of accelerated life test data with subexperiments at a small number of stress levels, produces a multiple probability plot with the ML fit of a model that ties the subexperiments together (the slopes will be equal because the spread parameters are constrained to be the same).

SPLIDA can also fit models with nonconstant sigma. One warning: a separate algorithm is used for these models, and it is not as robust to ill-conditioned x matrices. The user has to make sure that the input x matrix is well conditioned. For example, the quadratic model for location for the Nelson superalloy fatigue data in Chapter 17 can be analyzed using the parameterization suggested in Nelson (1984), in which the log stress variable is centered before it is squared. In Meeker and Escobar (1998) we did not center the x variable because we feel that, today, users of statistical methods should not be forced to do such things.
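Centering helps because, over a narrow range of log stress, the linear and squared columns of a quadratic model are almost perfectly correlated, which makes the x matrix ill-conditioned. The Python sketch below illustrates this with hypothetical stress levels (not the superalloy data); the helper names are assumptions for illustration.

```python
import math

def corr(u, v):
    """Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

def quadratic_columns(x, center=False):
    """Linear and squared columns of a quadratic model in x, optionally
    centering x before squaring (as in Nelson's parameterization)."""
    if center:
        m = sum(x) / len(x)
        x = [a - m for a in x]
    return x, [a * a for a in x]

# Hypothetical stress levels chosen for illustration only.
stress = [80, 90, 100, 110, 120, 130, 140]
log_stress = [math.log(s) for s in stress]

raw = corr(*quadratic_columns(log_stress))                    # essentially 1
centered = corr(*quadratic_columns(log_stress, center=True))  # much smaller in magnitude
print(f"uncentered: {raw:.6f}  centered: {centered:.6f}")
```

The fitted curve is the same under either parameterization; only the coefficients and the numerical conditioning of the problem change.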
The coefficients in the Meeker and Escobar (1998) presentation were computed in a different way that has not been programmed in general. With some more programming effort we could make our nonconstant-sigma algorithm robust too, but we have not yet gotten to this.

To fit models in which there is a log-linear model for sigma, one must generalize the input explan.vars: instead of a vector, it must be a list of two vectors, one for mu and one for sigma. The following command fits the quadratic model for location and a log-linear model for sigma to the Nelson superalloy data. The SPLIDA data object superalloy.ld contains in its X matrix the columns

   PseudoSress  LogPseudoSress  LogPseudoSress2

This model is fit in SPLIDA with the following command:

> gmlest(superalloy.ld, dist="Weibull", explan.vars=list(mu.relat=c(2,3), sigma.relat=c(2)))

Here mu depends on columns 2 and 3 (LogPseudoSress and LogPseudoSress2), and log sigma depends on column 2 (LogPseudoSress).

I would welcome feedback and suggestions for improving these functions, and I intend to continue development. Please feel free to call or send email if you have questions. The most up-to-date version of Splida can always be found at

http://www.public.iastate.edu/~splida

Please send email to wqmeeker@iastate.edu if you would like to be notified when new versions have been posted. This document and other SPLIDA materials may be freely copied for educational purposes.

Reference:
Meeker, W. Q. and Escobar, L. A. (1998), Statistical Methods for Reliability Data, New York: John Wiley and Sons. (800)-526-5368, ISBN 0471143286.

---------------------------------------

There is a continuing, sophisticated process for checking computations done with SPLIDA. It is, of course, possible that bugs exist in the software. I will try to investigate and fix any problems that are reported to me. Because it is free, however, SPLIDA comes with NO GUARANTEE OR WARRANTY, IMPLIED OR OTHERWISE.

---------------------------------------

William Q. Meeker
Department of Statistics
Iowa State University
Ames, IA 50011
wqmeeker@iastate.edu