In this issue:
Michael P. Windham
Department of Mathematics and Statistics, Utah State University
Logan, UT 84322
windham@math.usu.edu
Let me repeat the questions I posed in the last issue. "We also need information from you. The Board is discussing the feasibility of putting the Newsletter and perhaps the Classification Literature Automated Search Service "on-line". These publications would be easily accessed through the Internet, with user-friendly interfaces to such facilities as Mosaic. They would possibly also be available on floppy disk. Eventually, meaning soon, no paper copies would be distributed, only magnetic and electronic, PROVIDED you can still get them yourself. So, my questions to you are
1. How do you feel about this change?
2. Do you have convenient access to the Internet? (Silence = "yes") If so, would you use it to access these publications?
3. Would it have any negative impact on your ability to use these publications?
4. Would you lose interest in the Society because of the change?
Please let me know what you think; my addresses (paper or electronic) are above, or FAX to (801) 797 1822 (paper and electronic!)." So far, the response has been somewhat limited, so I again encourage you to respond. The responses I have received suggest that people are enthusiastic about the possibility of on-line access. One person was concerned about not having the necessary facilities to use Mosaic, at least for now. So, in your response, could you state any such protocol limitations you might have now? Our goal would be to provide access to as many as possible, even if it requires several media.
Thanks to those who have responded and provided many useful suggestions.
CSNA Nominations
The affairs of the Society are managed by a Board of Directors, which includes the Society's officers and six Elected Directors. In mid-November, the Society will conduct elections by mail for Secretary/Treasurer and two Directors. The Secretary/Treasurer will serve two years from January 1, 1995. The Directors will serve three years from January 1, 1995. According to the Society's Bylaws (V.3), any member for whom the Secretary/Treasurer receives at least five written nominations from voting members shall be eligible to run for office. The Secretary/Treasurer must receive such direct nominations by 1 November 1994.
The Nominating Committee (P. Bryant, W. Day) has submitted the following candidates:
Secretary/Treasurer: Dawn Iacobucci (Northwestern University)
Directors: David Banks (Carnegie-Mellon University), Hamparsum Bozdogan (University of Tennessee), Mel Janowitz (University of Massachusetts), Dennis Johnston (M.D. Anderson Cancer Center)
Biographies of these candidates and any nominated by the members will appear in the November Newsletter. Voting will commence in mid-November.
Minutes of the 16 June 1994 meeting of the Board of Directors were sent to all regular members of the Society.
F.R. McMorris
Department of Mathematics
University of Louisville
Louisville, KY 40292
frmcmo01@homer.louisville.edu
(502)852-6826
Glenn Milligan's first article in Issue 34 stimulated the response from Boris Mirkin that follows. Anybody else out there who has something to say about classification theory and practice, please feel free to write for the new "Forum".
Again this month we have no room for random conference news. This will resume in Issue 37.
DIMACS
Rutgers University
I appreciate very much the efforts of the Editor toward helping the Newsletter become a place to discuss important subjects in metaclassification. The first article by G. Milligan in Issue 34 on applied classification seems a successful step in that direction. On the other hand, this presentation has highlighted an important, though controversial, problem of terminology. By this I mean the use of the term "ideal type" by Milligan in a context where earlier such concepts as "standard point", "center", "centroid", "kernel", and "etalon" were used. In my opinion, when introducing a new term one should check whether that term has already been used in the literature and, if so, reconcile the new meaning of the term with the existing one. Milligan's article seems to be exactly a case where no such check was made. The "ideal type" concept has been used rather intensively in several disciplines, for example in sociology, where it was introduced by Max Weber in his classical account of the "bureaucratical organization" concept. There, an "ideal type" is defined as an "exaggerated" combination of characteristics that exists only theoretically, such that any really existing entity can be evaluated by the degree of its closeness to the ideal type. This is rather different from the "ideal type" concept used by G. Milligan in his presentation, which seems to resemble the concept of "prototype" in psychology.
Thus the major question that arises is: Shall we maintain a terminology to recommend to the scientific community, or shall we use various specifications rather randomly, without regard to their usage in relevant areas?
More specifically, which pattern of behavior would we like to accept as an "ideal type": that of physicists who continue using their term "atom" even after they found that it was divisible, or that of the computer scientists who change the names of their options rather regularly, leading to sequential series like, say, "terminal/display/console/screen" or "exit to DOS/shell/prompt"?
The Ohio State University
"Issues in Applied Classification: Replication Analysis"
Once a classification has been obtained, the applied researcher is faced with the problem of validating the clustering results. The present article will focus on one approach to this problem called replication analysis. References to replication analysis include McIntyre & Blashfield (1980), and Morey, Blashfield, & Skinner (1983). Replication analysis is similar to the process of cross-validation in multiple regression. The major steps in a replication analysis follow.
(1) Obtain Two Samples of Data. Two samples are required to conduct a replication analysis. One way to accomplish this is to divide a larger data set into two separate samples. Data must exist on the same set of variables in both samples. Of course, the researcher must plan to collect an adequate amount of data ahead of time. Lack of planning is likely to make a proper replication analysis impossible.
(2) Cluster Analyze the First Sample and Determine Cluster Centroids. The first sample is subjected to a complete cluster analysis. That is, decisions regarding variable selection, weighting, standardization, clustering method, and selection of the number of clusters are to be completed. Once clusters have been identified, the centroids of the clusters are computed.
(3) Determine the Distances from the Second Sample Data Points to the Centroids. The distances from each data point in the second sample to each of the centroids obtained from the clustering of the first sample are computed.
(4) Assign Data Points from the Second Sample to Their Nearest Centroid. Each element in the second sample is assigned to the nearest centroid determined from the first sample. This produces a clustering of the second sample based on the characteristics of the first sample.
(5) Directly Cluster Analyze the Second Sample. The second sample is subjected to the same type of cluster analysis as used for the first sample. Every feature of the clustering process should be the same. This produces a second clustering of the second sample. Unlike the clustering obtained in (4) above, this second classification is based only on the data characteristics of the second sample.
(6) Compute a Measure of Agreement Between the Two Clusterings. A measure of partition agreement is computed between the clustering based on the nearest-centroid assignments and the direct clustering of the second sample. The Hubert and Arabie (1985) corrected Rand index can serve as the measure of agreement. The level of agreement between the two partitions reflects the stability of the clustering in the data.
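As an illustration only (it is not part of the article), the six steps above might be sketched in Python as follows. The sketch assumes k-means with a deterministic farthest-first start as the clustering method, squared Euclidean distance, simulated two-dimensional data, and a number of clusters k fixed in advance; none of these choices is prescribed by the text, and all function names are hypothetical. The agreement measure follows the Hubert and Arabie (1985) corrected Rand index.

```python
import random
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Hubert and Arabie (1985) corrected Rand index of two partitions."""
    n = len(labels_a)
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate case: both partitions trivial
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_centroid(points, centroids):
    """Steps 3-4: label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]


def kmeans(points, k, iters=50):
    """Assumed clustering method: k-means with a farthest-first start."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    labels = nearest_centroid(points, centroids)
    for _ in range(iters):
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old centroid if a cluster happens to be empty.
            updated.append(tuple(sum(c) / len(members) for c in zip(*members))
                           if members else centroids[j])
        centroids = updated
        new_labels = nearest_centroid(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


def make_two_samples(seed=0, per_cluster=30):
    """Step 1: simulate one data set with three clusters and split it in two."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    data = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
            for cx, cy in centers for _ in range(per_cluster)]
    rng.shuffle(data)
    half = len(data) // 2
    return data[:half], data[half:]


def replication_analysis(sample1, sample2, k):
    _, centroids1 = kmeans(sample1, k)                 # step 2
    assigned = nearest_centroid(sample2, centroids1)   # steps 3 and 4
    direct, _ = kmeans(sample2, k)                     # step 5
    return adjusted_rand_index(assigned, direct)       # step 6


if __name__ == "__main__":
    first, second = make_two_samples(seed=0)
    print("replication agreement:", replication_analysis(first, second, k=3))
```

With clusters as well separated as in this simulated example, the agreement should be at or near 1; values closer to 0 would suggest that the clustering does not replicate. In an applied setting, the clustering method, variable standardization, and number of clusters would be whatever was chosen for the first sample and reused unchanged for the second.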
The article by Breckenridge (1989) provides some insight into the effectiveness of replication analysis. In a simulation study, Breckenridge measured both the degree of replication agreement and the actual recovery of the underlying cluster structure. The average recovery for each simulation run provided an approximate upper bound on the maximum replication value to be expected. Breckenridge found that the replication means were close in value to average recovery for each clustering method. These results support the view that replication analysis can be used to help validate the results of a cluster analysis.
Breckenridge, J. N. (1989). Replicating cluster analysis: Method, consistency, and validity. Multivariate Behavioral Research, 24, 147-161.
Hubert, L. J., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2, 193-218.
McIntyre, R. M., & Blashfield, R. K. (1980). A nearest-centroid technique for evaluating the minimum-variance clustering procedure. Multivariate Behavioral Research, 15, 225-238.
Morey, L. C., Blashfield, R. K., & Skinner, H. A. (1983). A comparison of cluster analysis techniques within a sequential validation framework. Multivariate Behavioral Research, 18, 309-329.
Phipps Arabie
Editor, Journal of Classification
"The Obligatory, Compelling Substantive Illustration"
Only a few of the techniques covered in the Journal of Classification have caught on as quickly and universally as nonmetric multidimensional scaling did. One of the reasons for its success was the compelling illustrations Shepard offered (e.g., his 1963 analysis of Morse Code data) while introducing his methodology. Few authors have since provided as good examples. I occasionally receive manuscripts in which the authors' manifest boredom with the data sets analyzed exudes nearly audible sighs if not outright and unseemly snoring. When a statistician (e.g., Silverman, 1982) levels a devastating substantive criticism at an illustrative analysis, even the best of new methodology makes a poor showing.
Of course, not every data set affords the opportunity for a successful analysis and substantive interpretation. Overblown efforts at interpretation (e.g., see criticisms of Levelt, van de Geer, and Plomp, 1966, by Shepard, 1974, pp. 387-388) can be cause for embarrassment. Also, naive students first using such techniques tend to expect that their own results will routinely and effortlessly be as successful as Shepard's (1963).
Part of the problem, of course, stems from supersonic models dependent upon biplane-collected data. Daws (1990) aptly characterized many of us as people "who would rather lose a limb than collect our own data". But we have other recourses to facilitate the availability of good data sets. First, as unpaid consultants to substantively motivated colleagues and students, we should urge, when our techniques are appropriate, that the data to be collected fulfil the necessary requirements for solid analyses (e.g., enough stimuli/products/OTU's to make an analysis worthwhile or to justify the number of dimensions/components/features that substantive considerations would demand). We should caution against such shortcuts as small samples and gratuitously missing data. Second, as reviewers for publications, we should insist that the raw data, if they require only a few pages, be printed in the article where they are first introduced. Making this practice the default rather than the exception would confer major benefits both to our constituents and to those authors who have collected such valuable data sets.
References
DAWS, J. (1990), Review of Data Analysis: The Ins and Outs of Solving Real Problems, Eds. J. Janssen, F. Marcotorchino, and J. M. Proth, Journal of Classification, 7, 143-145.
LEVELT, W. J. M., VAN DE GEER, J. P., and PLOMP, R. (1966), "Triadic Comparisons of Musical Intervals," British Journal of Mathematical and Statistical Psychology, 19, 163-179.
SHEPARD, R. N. (1963), "Analysis of Proximities for the Study of Information Processing in Man," Human Factors, 5, 33-48.
SHEPARD, R. N. (1974), "Representation of Structure in Similarity Data: Problems and Prospects," Psychometrika, 39, 373-421.
SILVERMAN, B. W. (1982), "Discussion," Journal of the Royal Statistical Society, Series A, 145, 307.
Preliminary Announcement
CSNA 95
The 1995 annual meeting of the Classification Society of North America will be held in Denver, Colorado, from June 22 to June 25, 1995, at the Executive Tower Inn, 14th and Curtis Streets. The meeting is supported by the College of Business, University of Colorado at Denver. A short course is planned for Thursday, June 22. The regular meeting, including the CSNA business meeting, conference banquet, and regular paper sessions, will be scheduled from Friday morning, June 23, until Sunday noon, June 25, 1995. CSNA meetings are traditionally interdisciplinary and informal, with few (if any) parallel sessions. Abstracts of papers presented are distributed, but no formal proceedings are produced. Speakers often discuss work in progress, and both applications and methodological issues are usually represented.
Sessions tentatively planned include: Classification and Clustering in Marketing (P. Green, the Wharton School, and J. D. Carroll, Rutgers University, organizers); New Optimization and Neural Network Approaches to Discriminant Analysis, with Applications (Fred Glover, University of Colorado at Boulder, organizer); Neural Networks for Classification (Manavendra Misra, Colorado School of Mines, organizer); Model Selection Methods in Classification and Clustering; and a Graduate Student Session, in which graduate students will present their research and meet with mentors who will review the work and make suggestions.
The organizers of the meeting are particularly interested in including in the 1995 meeting a wide variety of contributed papers on applications of classification, clustering, and related methods, as well as on methodological issues. Suggestions for topics, symposia, panel discussions, or other contributions are solicited, and may be directed to the Program Chair: Peter Bryant, College of Business, University of Colorado at Denver, Campus Box 165, Denver, Colorado 80217-3364 USA, telephone (303)-628-1233, Fax (303)-628-1299, E-mail pbryant@cudnvr.denver.colorado.edu.
A formal announcement and call for papers will be issued in early 1995. Abstracts will be due in March 1995.
Rian van Blokland-Vogelesang
SWOV Institute for Road Safety Research
P.O.Box 170
2260 AD Leidschendam
The Netherlands
Blokland@SWOV.nl
M.L. Abell and P. Braselton, The Mathematical Handbook, London: Academic Press, 1994, £30.00. ISBN 0-12-041536-4.
M. Batty and P. Longley, Fractal Cities: A Geometry of Form and Functions, London: Academic Press, 1994, pp. 448, £35.00. ISBN 0-12-455570-5.
R.S. Bucy, Lectures on Discrete Time Filtering, New York: Springer-Verlag, 1994, pp. 200, DM 98.00. ISBN 3-540-94198-3.
T. Cox and M. Cox, Multidimensional Scaling, New York: Chapman & Hall, 1994, pp. 300, $56.00 (plus 3 1/2" disc). ISBN 0-412-49120-6.
E. Dietrich (Ed.), Thinking Computers & Virtual Persons: Essays on the Intentionality of Computers, London: Academic Press, 1994, pp. 400, £38.00. ISBN 0-12-215495-9.
W. Doise, A. Clemence, and F. Lorenzi-Cioldi, The Quantitative Analysis of Social Representations, Hemel Hempstead, Herts (U.K.): Harvester Wheatsheaf, 1993, pp. 200, $25.00. ISBN 7450-1348-1.
M. Falk, J. Husler, and R.-D. Reiss, Laws of Small Numbers: Extremes and Rare Events, Basel (Switzerland): Birkhauser Verlag, 1994, pp. 320, sFr. 68.00. ISBN 3-7643-5071-7.
J. Faraut and A. Koranyi, Analysis on Symmetric Cones, Oxford (U.K.): Oxford University Press, 1994, pp. 400, £30.00. ISBN 0-19-853477-9.
M. Greenacre and Jorg Blasius (Eds.), Correspondence Analysis in the Social Sciences, London: Academic Press, 1994, pp. 352, £45.00. ISBN 0-12-104570-6.
A. Harvey, Time Series Models (2nd ed.), Hemel Hempstead, Herts (U.K.): Harvester Wheatsheaf, 1993, pp. 240, $26.99. ISBN 7450-1348-1.
G.J. Hooley and M.K. Hussey (Eds.), Quantitative Methods in Marketing, London: Academic Press, 1994, pp. 288, £14.95. ISBN 0-12-355485-3.
D.G. Kleinbaum, Logistic Regression: A Self-Learning Text, Springer-Verlag, 1994, pp. 200, DM 88.-. ISBN 3-540-941142-8.
J. van Leeuwe (Ed.), Graph-Theoretic Concepts in Computer Science (19th International Workshop, WG '93), Utrecht, The Netherlands, June 16-18, 1993. Proceedings, New York: Springer-Verlag, 1994, pp. 435, Lecture Notes in Computer Science, Vol. 70, DM 80.00. ISBN 3-540-57899-4.
D.E. Lilienfield and P. D. Stolley, Foundations of Epidemiology (3rd ed.), Oxford (U.K.): Oxford University Press, 1994, pp. 368, £19.50. ISBN 0-19-505035-5.
P. Marsden, Sociological Methodology, Oxford (U.K.): Basil Blackwell, 1993, pp. 432, £50.00. ISBN 1-55786-464-0.
W.L. Neuman, Social Research Methods: Qualitative and Quantitative Approaches (2nd ed.), Hemel Hempstead, Herts (U.K.): Allyn & Bacon, 1994, pp. 600, $24.95. ISBN 205-14548-5.
L. Oxley, D.A.R. George, C.J. Roberts, and S. Sayer, Surveys in Econometrics, Oxford (U.K.): Basil Blackwell, 1994, pp. 400, £18.99. ISBN 0-631-19065-1.
D.B. Petitti, Meta-Analysis, Decision Analysis, and Cost-Effectiveness Analysis: Methods for Quantitative Synthesis in Medicine, Oxford (U.K.): Oxford University Press, 1994, pp. 288, £32.00. ISBN 0-19-507334-7.
E. Rasmussen, Games and Information: An Introduction to Game Theory (2nd ed.), Oxford (U.K.): Basil Blackwell, 1994, pp. 437, £19.99. ISBN 1-55786-502-7.
A.E. Roth, Bargaining Experiments, Hemel Hempstead, Herts (U.K.): Harvester Wheatsheaf, 1994, pp. 288, $28.95. ISBN 7450-1502-6.
A. Schimmel, The Mystery of Numbers, Oxford (U.K.): Oxford University Press, 1994, pp. 328, £10.99. ISBN 0-19-508919-7.
R.B. Wallace and R.F. Woolston, The Epidemiologic Study of the Elderly, Oxford (U.K.): Oxford University Press, 1992, pp. 398, £60.00. ISBN 0-19-506120-9.
R.J. Webster, Convexity, Oxford (U.K.): Oxford University Press, 1994, pp. 396, £45.00.