Classification Society of North America Newsletter

November 1994, Issue #37
Michael P. Windham, President
F.R. McMorris, Newsletter Editor

In this issue:

::::::: President's Corner :::::::

Michael P. Windham
Department of Mathematics and Statistics
Utah State University
Logan, UT 84322
windham@math.usu.edu

I have three things that need your attention.

1. Please VOTE. We have a small Society, so your voice is heard, but not if you do not speak. We have a fine selection of candidates, representing an interesting spectrum of interests in classification and clustering.

2. The question of putting the Newsletter and the Service on line has at last generated some opposition. We have members for whom on-line-only service would be inconvenient, if not detrimental to their participation in the Society's activities. So it is clear that we will keep hardcopies available, for the time being at least, to those who want them. Thanks for your responses. If you want to see the alternative and have World Wide Web access, try the CSNA Home Page prepared by Stephen Hirtle: http://www.pitt.edu/~hirtle/csna.html

3. The IFCS is considering publishing a bibliography on classification. Allan Gordon, IFCS President, has asked me to ask you whether you might be interested in buying such a volume. The format is not finalized, but it is envisioned as "a set of chapters, each written by a subject expert and providing an overview and annotated bibliography and related areas of data analysis." More information should be available in the IFCS Newsletter attached to this issue. So my question to you is: would you be interested in buying such a volume for a price of about DM 80?

Please note that the ballots are to be returned to me. You may, if you wish, make any comments you have on items 2 or 3 on, or along with, the ballot.

::::::: From the Business Manager :::::::


Stanley Wasserman
Department of Psychology
University of Illinois
603 E. Daniel St.
Champaign, IL 61820
csna@psych.uiuc.edu

1995 DUES

It is time to pay CSNA dues for 1995. Take a look at the address label attached to the outside of this newsletter (or, for international members, on the envelope in which this newsletter was mailed). If the date in the upper right-hand corner of the label is not 1995 or later, you have not paid your dues for 1995.

Dues rates have gone up slightly (as discussed at length in CSNA Newsletter #36). Please use the form attached to the rear of this newsletter to pay your dues, sending either a check in US$ drawn on a US bank, or a VISA or MasterCard number to the business office. Thanks!!

CSNA ELECTIONS

Regular members who paid dues for 1994 are eligible to vote in the Society's 1994 Annual Elections. A ballot is mailed with this newsletter to eligible members (attached at the end). Please mark your ballot and return it to the President by 1 January 1995. If you have not renewed your membership for 1995, you may include a membership renewal and the ballot in the same envelope.

Election of a SECRETARY/TREASURER:

The term of the present Secretary/Treasurer, Stanley Wasserman, expires on 31 December 1994. The nominating committee is pleased to propose a single candidate, Dawn Iacobucci, for Secretary/Treasurer of CSNA, for a two-year term beginning 1 January 1995.

A biosketch of the candidate follows:

Dawn Iacobucci

Dawn Iacobucci is an Associate Professor of Marketing at the Kellogg Graduate School of Management, Northwestern University. She joined Kellogg in 1987 after receiving her M.S. in Statistics, and M.A. and Ph.D. in Quantitative Psychology from the University of Illinois at Urbana-Champaign. Her two research streams focus on the modeling of dyadic interactions and social networks, and the conceptualization and measurement of customer satisfaction and service quality. She has published in a variety of journals including the Journal of Consumer Psychology, the Journal of Marketing, the Journal of Marketing Research, Psychometrika, Psychological Bulletin, and Social Networks.

Prof. Iacobucci teaches Marketing Models, Services Marketing, and Customer Satisfaction to MBA students and executives, and Multivariate Statistics and Methodological Special Topics Seminars to Ph.D. students. She was a visiting professor at the Stockholm School of Economics and the University of Uppsala in Sweden during 1992, and worked at Leo Burnett in the summer of 1986. She has performed various consulting duties for First Chicago, Hewlett-Packard, and Yamaha USA.

Election of Two ELECTED DIRECTORS:

The terms of two members of the Board of Directors, Joseph Kruskal and Fionn Murtagh, expire on 31 December 1994. The nominating committee and the membership are pleased to propose five candidates to fill these two positions on the board for three-year terms beginning 1 January 1995.

Election of members of the board will be conducted using the Hare system (as described on the ballot).

CANDIDATES FOR THE BOARD OF DIRECTORS (in alphabetical order):

David Banks

David Banks received a B.A. degree in anthropology from the University of Virginia in 1977, M.S. degrees in statistics and applied mathematics from Virginia Polytechnic Institute in 1980 and 1982, and a Ph.D. in Statistics from Virginia Polytechnic Institute in 1984. He then spent two years as a National Science Foundation postdoctoral research fellow at the Department of Statistics at the University of California at Berkeley, and one year as visiting assistant lecturer at the Department of Pure Mathematics and Mathematical Statistics at the University of Cambridge. He joined the faculty at Carnegie Mellon University in 1987, and is now an associate professor in their Department of Statistics. His research interests include nonparametric computer-intensive data analysis, complex multivariate data, social statistics, human rights data, statistical inference on geometries, and inference for graph-valued random objects. He has published in the Journal of the American Statistical Association, Biometrika, and the Journal of Classification.

Hamparsum Bozdogan

Hamparsum Bozdogan is an Associate Professor of Statistics and Courtesy Associate Professor of Mathematics at the University of Tennessee in Knoxville, Tennessee. He has been a member of CSNA since 1981, and he was the Chair of the Organizing Committee of the Second International Federation of Classification Societies (IFCS-89) at the University of Virginia in Charlottesville. He is one of the nationally and internationally renowned experts in the area of informational statistical modeling. In particular, he has developed unique measures of informational complexity in statistics for model selection and validation, and extended them in an extensive research program to a wide variety of applications, including multisample cluster analysis, mixture-model cluster analysis, ideal point discriminant analysis, and Bayesian and non-Bayesian factor analysis.

Melvin Janowitz

Melvin F. Janowitz, Professor, Department of Mathematics and Statistics, University of Massachusetts, Amherst, MA. Ph.D. in Mathematics, Wayne State University, 1963. Member of the Society since 1979. Served on Board of Directors 1991-1993. Chairman of membership committee 1991-1993. Current research interests include mathematical modelling of cluster analysis and consensus methods, properties of clustering algorithms, statistics for evaluating the validity of cluster outputs, connections between standard and fuzzy cluster algorithms, development of a theory of dissimilarity coefficients, generalizations of pyramids, clustering based on interval orders and semiorders, use of dissimilarity measures whose values are cumulative distribution functions as an aid for clustering objects that are themselves groups of other entities (for example, clustering at the species level).

Dennis Johnston

Dennis A. Johnston, Ph.D. is an Associate Professor of Biomathematics and Chief, Section of Bioengineering, Department of Biomathematics, University of Texas M.D. Anderson Cancer Center where he has been employed since 1972. He has been a member of CSNA since 1987. He was host and Co-Chair of the program committee of CSNA94/NT26 held in Houston in June, 1994. His interests in CSNA are the result of his involvement in the application of classification and clustering to radiographic diagnosis, pathologic diagnosis, and chromosome classification.

Stanley Wasserman

Stanley Wasserman (Ph.D. Harvard 1977) is Professor of Psychology, Statistics, and Sociology, and Professor, Beckman Institute for Advanced Science and Technology, at the University of Illinois. He has had appointments at Carnegie-Mellon University and the University of Minnesota. He is currently an associate editor of PSYCHOMETRIKA, JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, and JOURNAL OF QUANTITATIVE ANTHROPOLOGY. He is also the Book Review Editor of CHANCE. His research is on applied statistics (in particular categorical data analysis, classification and multivariate methods) and social networks (he is the co-author with Katherine Faust of SOCIAL NETWORK ANALYSIS: METHODS AND APPLICATIONS). He has published articles in many sociology, psychology, and statistics journals.

He is a Fellow of the American Statistical Association, and has been a Visiting Professor at a variety of institutions. A member of CSNA since the mid-1980's, he presented invited talks at the CSNA 1990 and 1992 Annual Meetings. He has experience in the operation of CSNA and in "financial matters" -- he was the Secretary/Treasurer of CSNA from 1992 to 1995.

::::::: From the Newsletter Editor :::::::

F.R. McMorris
Department of Mathematics
University of Louisville
Louisville, KY 40292
frmcmo01@homer.louisville.edu
(502)852-6826

In this issue we have Glenn Milligan's third and final contribution on "Issues in Applied Classification". Thanks, Glenn! Another feature is by Bill Day on how to access the mammoth bibliography on sequence analysis that he has compiled. This is a very impressive piece of work, and anyone interested in computational biology should enjoy browsing it.

Glenn W. Milligan
The Ohio State University

"Issues in Applied Classification: Selection of Variables to Cluster"

When clustering objects, the variables used in an applied classification are selected in the early stages of the analysis, and care must be exercised in selecting them. Unfortunately, far too many studies simply include every available variable. Only those variables that are believed to help discriminate among the clusters in the data should be included in the analysis; variables that do not help discriminate should be excluded.

The addition of only one or two irrelevant variables can dramatically interfere with the recovery of the underlying clusters. This effect was demonstrated by Milligan (1980). In the study, "masking variables" were added to data sets based on 4, 6, or 8 core dimensions. The core dimensions defined a strong clustering in the data. As each masking variable was added to the data set, recovery of the underlying clusters deteriorated rapidly for a wide variety of clustering methods. Recovery declined even though the original core variables were still present as each masking variable was added. The implications for applied analyses are clear. The inclusion of just one irrelevant variable may serve to mask the underlying clustering in the data, or outright obliterate it.
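The masking effect can be seen in a toy setting. The sketch below is not a reproduction of the Milligan (1980) design: the data, the single core dimension, the deliberately wide range of the masking variable, and the use of nearest-neighbour agreement as a crude stand-in for cluster recovery are all assumptions made for illustration.

```python
import random


def nearest_neighbour_agreement(points, labels):
    """Fraction of points whose nearest neighbour (squared Euclidean
    distance) carries the same cluster label."""
    agree = 0
    for i, p in enumerate(points):
        best_j, best_d = None, float("inf")
        for j, q in enumerate(points):
            if i == j:
                continue
            d = sum((a - b) ** 2 for a, b in zip(p, q))
            if d < best_d:
                best_j, best_d = j, d
        agree += labels[best_j] == labels[i]
    return agree / len(points)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Two well-separated clusters on one core dimension (assumed toy data):
# cluster 0 lies in [-1, 1], cluster 1 lies in [9, 11].
labels = [0] * 30 + [1] * 30
core = [(10.0 * lab + rng.uniform(-1.0, 1.0),) for lab in labels]

# Append one irrelevant "masking" variable that carries no cluster information.
# Its range is exaggerated on purpose so the effect is easy to see.
masked = [(c[0], rng.uniform(0.0, 1000.0)) for c in core]

core_score = nearest_neighbour_agreement(core, labels)
masked_score = nearest_neighbour_agreement(masked, labels)
print("core only:", core_score)
print("core + masking variable:", masked_score)
```

On the core dimension alone every point's nearest neighbour lies in its own cluster by construction; once the masking variable is appended, the distances are dominated by noise even though the core variable is still present.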

There has been some research on the problem of variable selection. For example, if Euclidean distances are used with a hierarchical clustering method, then the optimal variable weighting method of De Soete (1988) may offer protection against masking variables. The derivation and computation of De Soete's weights are complex and beyond the scope of this column. Although De Soete's procedure was not designed to detect masking variables, Milligan (1989) found evidence that the weighting system was effective at dealing with the problem. Recovery of the underlying cluster structure was greatly enhanced with the use of De Soete's weights when as many as three masking variables were added to the core cluster dimensions. The procedure exhibited a tendency to assign near-zero weights to masking variables, thus nullifying their contribution in the distance formula.
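To see why a near-zero weight nullifies a variable, consider a weighted Euclidean distance of the form d(x, y) = sqrt(sum_k w_k (x_k - y_k)^2). The sketch below assumes that form and uses hand-picked weights and made-up points; it does not compute De Soete's optimal weights, whose derivation is not given here.

```python
import math


def weighted_euclidean(x, y, weights):
    """Weighted Euclidean distance: sqrt(sum_k w_k * (x_k - y_k)**2)."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))


# Two hypothetical objects: first coordinate is a core variable,
# second coordinate is an irrelevant masking variable.
a = (0.0, 5.0)
b = (3.0, 905.0)

# Equal weights: the masking variable dominates the distance.
equal = weighted_euclidean(a, b, (0.5, 0.5))

# Zero weight on the masking variable: only the core difference remains.
downweighted = weighted_euclidean(a, b, (1.0, 0.0))

print("equal weights:", equal)
print("masking variable weighted out:", downweighted)
```

With the masking variable weighted out, the distance reduces to the difference on the core variable alone, which is the behaviour the column attributes to the near-zero weights.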

Other proposals for variable weighting have been developed. For example, DeSarbo, Carroll, and Green (1984) proposed a k-means clustering method called SYNCLUS. There is some evidence that the starting configuration used for the k-means method can be an important factor. Other approaches to the masking problem do not attempt to provide differential weighting of variables. The method of Fowlkes, Gnanadesikan, and Kettenring (1988) attempts to include or exclude variables in a manner analogous to a forward selection procedure in a regression analysis.

Gnanadesikan, Kettenring, and Tsao (1994) have reported some preliminary trials using De Soete's optimal weighting method, SYNCLUS, the forward selection procedure, and several others. Unfortunately, the authors concluded that "worry-free approaches do not yet exist." Further research on these algorithms as well as other approaches to variable weighting or selection would make a worthwhile contribution to the field of classification.

DeSarbo, W. S., Carroll, J. D., & Green, P. E. (1984). Synthesized clustering: A method for amalgamating alternative clustering bases with different weighting of variables. Psychometrika, 49, 57-78.

De Soete, G. (1988). OVWTRE: A program for optimal variable weighting for ultrametric and additive tree fitting. Journal of Classification, 5, 101-104.

Fowlkes, E. B., Gnanadesikan, R., & Kettenring, J. R. (1988). Variable selection in clustering. Journal of Classification, 5, 205-228.

Gnanadesikan, R., Kettenring, J. R., & Tsao, S. L. (1994). Weighting and selection of variables for cluster analysis. Unpublished manuscript, Bellcore, Morristown, NJ.

Milligan, G. W. (1980). An examination of the effect of six types of error perturbation on fifteen clustering algorithms. Psychometrika, 45, 325-342.

Milligan, G. W. (1989). A validation study of a variable weighting algorithm for cluster analysis. Journal of Classification, 6, 53-71.

::::::: Meeting Announcements :::::::

Preliminary Announcement
CSNA 95

The 1995 annual meeting of the Classification Society of North America will be held in Denver, Colorado from June 22 to June 25, 1995 at the Executive Tower Inn, 14th and Curtis Streets. The meeting is supported by the College of Business, University of Colorado at Denver. A short course is planned for Thursday, June 22. The regular meeting, including the CSNA business meeting, conference banquet and regular paper sessions, will be scheduled from Friday morning, June 23 until Sunday noon, June 25, 1995.

CSNA meetings are traditionally interdisciplinary and informal, with few (if any) parallel sessions. Abstracts of papers presented are distributed, but no formal proceedings are produced. Speakers often discuss work in progress, and both applications and methodological issues are usually represented.

Sessions tentatively planned include: Classification and Clustering in Marketing (P. Green, the Wharton School and J. D. Carroll, Rutgers University, organizers); New Optimization and Neural Network Approaches to Discriminant Analysis, with Applications (Fred Glover, University of Colorado at Boulder, organizer); Neural Networks for Classification (Manavendra Misra, Colorado School of Mines, organizer); Model Selection Methods in Classification and Clustering; and a Graduate Student Session, in which graduate students will present their research and will meet with mentors who will review the work and make suggestions.

The organizers of the meeting are particularly interested in including a wide variety of contributed papers in applications of classification, clustering and related methods as well as methodological issues in the 1995 meetings. Suggestions for topics, symposia, panel discussions or other contributions are solicited, and may be directed to the Program Chair: Peter Bryant, College of Business, University of Colorado at Denver, Campus Box 165, Denver, Colorado 80217-3364 USA, telephone (303)-628-1233, Fax (303)-628-1299, E-mail pbryant@cudnvr.denver.colorado.edu. A formal announcement and call for papers will be issued in early 1995. Abstracts will be due in March 1995.

Random Conference News

* MAY 17-19, 1995: International Conference on Visualization of Categorical Data, Cologne, Germany. Papers are welcome which bridge the theory of visualization techniques and their interpretation in social science applications, focusing on methodological aspects as well as empirical studies. Methods to be included at the conference are: correspondence analysis, homogeneity analysis, loglinear and association models, latent class analysis, multidimensional scaling, biplot, cluster analysis, ideal point discriminant analysis, CHAID, formal concept analysis, and graphical models. Information: Jorg Blasius, Universitat zu Koln, Bachemer Str. 40, D-50931 Koln. e-mail: blasius@ibm.za.uni-koeln.de

* JUNE 20-23, 1995: International Conference on Ordinal and Symbolic Data Analysis, organized by INRIA Rocquencourt and TELECOM Paris. This conference is in the tradition of conferences previously held in Antibes (1990) and Paris (1992) on numeric and symbolic data analysis, and in Darmstadt (1992) and Amherst (1993) on ordinal data analysis. The ordinal and symbolic approaches to data analysis have been successfully developed during these years via stimulating workshops and conferences for experts in the field. Both approaches have particularly emphasized the intentional background of data which is necessary for validation, interpretation and communication.

Since many observed or experimental data sets are of symbolic or ordinal nature, it has become increasingly clear that ordinal and symbolic data analysis has applications in a large number of areas including medicine, biology, social sciences, economics, agronomy, data retrieval, information sciences, etc. Submission Deadline: 30 November, 1994.

A two page abstract on one or more of the following topics is requested: Methods for sequential, spatial, textual and more generally numeric-symbolic, structural and ordinal data. Measurement theory, Proximities, distances and ultrametrics, Geometrical representation. Concept analysis, Aggregation and fusion. Ordinal structures (Graph models, lattices, hierarchies, pyramids, etc.). Consensus. Feature reduction and extraction. Knowledge and rule discovery from data. Philosophical foundation of class, categories, concepts, semiotic and cognitive aspects. Computer software.

Information can be obtained from INRIA Rocquencourt - Conference Secretariat: Claudie Thenault/Relations Exterieures, Domaine de Voluceau, BP 105 - 78135 Le Chesnay Cedex, France. Tel: 33 (1) 39 63 56 75, Fax: 33 (1) 39 63 56 38, E-mail: symposia@inria.fr

* JULY 4-7, 1995: 9th European Meeting of the Psychometric Society, Leiden University, Leiden, The Netherlands. Topics include (but are not limited to) Categorical Data Analysis, Classical Test Theory, Classification, Clustering, Correspondence Analysis, Exploratory Data Analysis, Factor Analysis, Graphical Models, Item Response Theory, (Generalized) Linear Models, Longitudinal Data Analysis, Multidimensional Scaling, Multivariate Analysis, Optimal Scaling, Statistical Methods, Structural Equations Models, Variance Components Analysis.

The meeting will be held at Leiden University. Leiden is located in the West of the Netherlands, a short distance from Amsterdam, The Hague and Rotterdam. It is easily reached by air from Amsterdam's Schiphol International Airport, as well as by train.

Information on submission of papers or proposals for symposia: Willem J. Heiser, Chair of the Scientific Program Committee, Department of Data Theory, Faculty of Social Sciences, Wassenaarseweg 52, P.O. Box 9555, 2300 RB Leiden, The Netherlands.

General information: Jacqueline J. Meulman, Chair of the Local Organizing Committee, Department of Data Theory, Faculty of Social Sciences, Wassenaarseweg 52, P.O. Box 9555, 2300 RB Leiden, The Netherlands, or, Susanna Verdel, Conference Secretary. phone: +31 71 273829, FAX: +31 71 273865, email: psleiden@rulfsw.leidenuniv.nl

* SEPTEMBER 25-27, 1995: Australasian Biometrics Conference, Coolangatta (Gold Coast, Queensland), Australia. This conference will have a day on agricultural statistics (particularly field variety trials), a day on medical and health statistics, and half-days on the practical application of Markov Chain Monte Carlo and environmental statistics.

Information: Kaye Basford, Department of Agriculture, The University of Queensland, Brisbane Qld 4072, Australia. Phone 61-7-3652810, Fax 61-7-3651177, E-mail k.e.basford@mailbox.uq.oz.au

::::::: Bookshelf :::::::

Rian van Blokland-Vogelesang
SWOV Institute for Road Safety Research
P.O.Box 170
2260 AD Leidschendam.
The Netherlands
Blokland@SWOV.nl

(This is a partial list. The rest will appear next issue. ---editor)

K. D. Bailey, Typologies and Taxonomies: An Introduction to Classification Techniques, London: Sage, Series: Quantitative Applications in the Social Sciences, No. 102, 1994, pp. ???, $9.50. ISBN 0-8039-5259-7.

G. Betteley, D. Wilson, N. Mettrick, and E. Sweeney, Using Statistics in Industry: Quality Improvement Through Total Process Control, Hemel Hempstead, Herts (U.K.): Prentice Hall, 1994, pp. 350, $36.00 (pbk), ISBN 13-457862-7.

G. Box, G. Jenkins, and G. Reinsel, Time Series Analysis (3rd ed.), Hemel Hempstead, Herts (U.K.): Prentice Hall, 1994, pp. 592, £79.75 (hbk), ISBN 13-060774-6.

D.N. Burghes, Applying Mathematics: Case Studies in Mathematical Modelling, Hemel Hempstead, Herts (U.K.): Ellis Horwood, 1994, pp. 200, $25.50, ISBN 13-290826-3.

R.T. Clarke, Statistical Modelling in Hydrology, New York: Wiley, 1994, pp. 440, $72.00 (hbk), ISBN 0471-95016-5.

::::::: Forum :::::::

(Material for this section was not available in HTML format. ---sch)
The WWW version of the CSNA newsletter has been prepared and formatted by Stephen Hirtle (sch@lis.pitt.edu) and is made available as a service of the Classification Society of North America. For further information on becoming a member of CSNA, please contact the Business Manager: Stanley Wasserman, Department of Psychology, University of Illinois, 603 E. Daniel St., Champaign, IL 61820, USA, csna@psych.uiuc.edu.