Classification Society of North America Newsletter

February 1995, Issue #38
Michael P. Windham, President
F.R. McMorris, Newsletter Editor

In this issue:

::::::: President's Corner :::::::

Michael P. Windham
Department of Mathematics and Statistics
Utah State University
Logan, UT 84322-3900
windham@math.usu.edu

It is my pleasure to welcome the new officers. Dawn Iacobucci is the new Secretary/Treasurer and Mel Janowitz and Stan Wasserman are the new Directors. Fionn Murtagh has been appointed by the Board as an Editor Director while he is editor of the CSNA Service. I would also like to thank Joseph Kruskal and Fionn Murtagh for their service as Directors and Stan Wasserman for being Secretary/Treasurer.

Now for the big announcement! CSNA is going "online." We believe this move will provide you with better service for less cost as we grow. For example, when an article in the Newsletter refers to a data set, it may be possible for you to obtain the data yourself with the click of a button. The latest versions of classification and clustering software could be made available and, we hope, many other services.

From now on, CSNA Newsletters and the CSNA Service will be available through the Internet, and this will be the default format. For the time being the Newsletter will be mailed to you as usual. After a time for all of us to adjust, but certainly by the end of this calendar year, the Newsletter will not be mailed except by special request and with a small surcharge. The CSNA Service will be available on disk and in a somewhat crude paper format, both by request and at an additional charge. Reminders of new issues and other important information will be sent to you by e-mail.

TO MAKE SURE YOU ARE PROPERLY INFORMED AND INVOLVED YOU MUST

1. Fill out the form "CSNA Network Information" attached to this issue and return it to the Secretary as soon as possible.

2. Get access to the World Wide Web and go to the URL http://www.pitt.edu/~hirtle/csna.html

3. If you did not understand 2, (a) don't panic and (b) read the article "Getting CSNA Online" in this issue.

The simplest and most elegant access will be through the CSNA World-Wide Web home page listed in 2. From there you can read the latest newsletters and use WAIS to search the Classification Literature Automated Search Service. It may also be possible to download files using ftp or something similar.

Our goal is to improve services and reduce costs, without losing you. So, if you have problems, let me know.

::::::: From the Secretary/Treasurer :::::::

Dawn Iacobucci
Department of Marketing
Kellogg Graduate School of Management
Northwestern University
2001 Sheridan Road
Evanston, IL 60208

Hi everybody, and thanks for entrusting me with the Secretary/Treasurer duties. Stanley Wasserman and his assistant, Moises Balassiano, have been very helpful in making this transition as smooth as possible. Moises drove up to Evanston to deliver the CSNA computer and archives and spent the day teaching me the various software so that I could become sec'y/treasurer-literate. (Thank you Moises, you have been a godsend!)

I have been asked to report that the CSNA officers are as follows:

OFFICERS

Michael Windham, President (1 Jan 1996) windham@math.usu.edu
Herman Friedman, Past President (1 Jan 1996) friedman@murray.fordham.edu
Peter Bryant, President Elect (1 Jan 1996) pbryant@cudnvr.denver.colorado.edu
Dawn Iacobucci, Sec/Treas (1 Jan 1997) dawni@nuacvm.acns.nwu.edu

DIRECTORS

Stephen Hirtle (1 Jan 1996) sch@lis.pitt.edu
Stanley Sclove (1 Jan 1996) U37331@UICVM.BITNET
Douglas Carroll (1 Jan 1997) dcarroll@gandalf.rutgers.edu
Larry Hubert (1 Jan 1997) lhubert@psych.uiuc.edu
Mel Janowitz (1 Jan 1998) melj@math.umass.edu
Stanley Wasserman (1 Jan 1998) stanwass@uiuc.edu
Phipps Arabie, Editor Dir., (ex off.) 2018001@rutvm1.rutgers.edu
F.R. McMorris, Editor Dir., (ex off.) frmcmo01@homer.louisville.edu

REPRESENTATIVES TO IFCS COUNCIL

Stephen Hirtle (1 Jan 1996) sch@lis.pitt.edu
Pierre Legendre (1 Jan 1998) legendre@ere.umontreal.ca

IFCS President, Allan Gordon adg@st-andrews.ac.uk

EDITORS

Phipps Arabie Journal of Classification
F. R. McMorris CSNA Newsletter
Fionn Murtagh CSNA Service
Stephen Hirtle Information Coordinator

P.S. Don't forget to check your address mailing label--if the upper right corner does not say "95" (or later), please consider paying your 1995 dues asap. Thanks!

::::::: From the Newsletter Editor :::::::

F.R. McMorris
Department of Mathematics
University of Louisville
Louisville, KY 40292
frmcmo01@homer.louisville.edu
(502)852-6826

In the last issue of the Newsletter I thanked Glenn Milligan for his third and final essay, but I am very happy to say that HEE'S BAAACK! Remember, he is putting his ideas out there for some food for thought. If you agree/disagree - - - write something yourself. Appearing after Glenn's column is Mike's aid to WWW surfing. If I can learn how to do it (and I did), anybody can!

Glenn W. Milligan
The Ohio State University

"Issues in Applied Classification: Variable Standardization"

Variable standardization can be a source of confusion and misunderstanding to individuals new to the practice of classification. Standardization is often viewed as a routine requirement. Some individuals conclude that it is essential when the variables possess differing variances. Otherwise, those variables with the larger variances will have an undue influence on the clustering results. However, the strategy may backfire. For example, a variable may possess enhanced variation because it nicely separates two or more clusters in the data. Routine standardization would serve to diminish the contribution of such a variable. Similarly, a variable with limited variation may provide no cluster separation information. Standardization would serve to enhance the importance of the variable in the analysis.
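The backfire scenario can be illustrated with a small sketch. The data below are hypothetical and chosen only for illustration: variable A separates two clusters (and so has a large variance), while variable B is pure noise. The `separation` measure (mean between-cluster squared distance divided by mean within-cluster squared distance) is an assumption made here, not a statistic taken from the literature cited in this column.

```python
from itertools import combinations
from statistics import mean, stdev

# Hypothetical toy data: (A, B) pairs. A separates the clusters, B is noise.
cluster_1 = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.5)]
cluster_2 = [(10.0, 0.5), (11.0, 0.0), (12.0, 1.0)]

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def separation(c1, c2):
    """Mean between-cluster squared distance over mean within-cluster squared distance."""
    between = mean(sq_dist(p, q) for p in c1 for q in c2)
    within = mean(sq_dist(p, q) for c in (c1, c2) for p, q in combinations(c, 2))
    return between / within

def z_score_columns(points):
    """Globally z-score every variable (column) of a list of points."""
    cols = list(zip(*points))
    stats = [(mean(c), stdev(c)) for c in cols]
    return [tuple((x - m) / s for x, (m, s) in zip(p, stats)) for p in points]

raw_ratio = separation(cluster_1, cluster_2)
scaled = z_score_columns(cluster_1 + cluster_2)
z_ratio = separation(scaled[:3], scaled[3:])
print(f"separation on raw data: {raw_ratio:.1f}")
print(f"separation after z-scoring: {z_ratio:.1f}")
```

On these made-up numbers the separation ratio falls sharply after global z-scoring, because the noise variable B is inflated to the same variance as the cluster-separating variable A.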

A different issue involves the form of standardization to be used in the analysis. Many researchers automatically use the traditional z-score formula. However, as documented by Sneath and Sokal (1973), there are many other approaches to variable standardization. In fact, the selection of the traditional z-score may be a poor choice indeed. Milligan and Cooper (1988) considered eight forms of standardization in a large scale simulation study. The only form that was found to perform consistently well involved standardization by range: (X - Min)/(Max - Min).
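For readers who want to compare the two formulas directly, here is a minimal sketch of the traditional z-score and of standardization by range. The function names and the sample values are illustrative assumptions, and the z-score shown uses the sample standard deviation, which is only one common convention.

```python
from statistics import mean, stdev

def z_score(values):
    """Traditional z-score: (X - mean) / (sample standard deviation)."""
    m, s = mean(values), stdev(values)
    if s == 0:
        raise ValueError("variable is constant; standard deviation is zero")
    return [(x - m) / s for x in values]

def range_standardize(values):
    """Standardization by range: (X - Min) / (Max - Min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("variable is constant; range is zero")
    return [(x - lo) / (hi - lo) for x in values]

# Hypothetical variable whose spread comes from two well-separated groups.
x = [1.0, 2.0, 3.0, 21.0, 22.0, 23.0]
z = z_score(x)            # mean 0, sample standard deviation 1
r = range_standardize(x)  # smallest value maps to 0, largest to 1
print(z)
print(r)
```

Both transformations are linear, so neither changes the shape of a single variable; they differ only in how much weight each variable ends up carrying relative to the others in a distance calculation.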

The problem of variable standardization involves other complexities as well. Several authors have noted that standardization should not be computed globally on a variable (Cormack 1971, Fleiss & Zubin 1969). Rather, standardization should be computed within each cluster separately. Of course, this creates a circularity problem in that standardization cannot be computed until the clusters are known, and the clusters cannot be determined until the standardization is completed.

Alternatives have been proposed to circumvent this dilemma. Gnanadesikan, Harvey, and Kettenring (1994) proposed using the data points with the smallest interpoint distances to estimate the variance of each variable. Logically, such data points should be found in the same cluster. However, the Gnanadesikan et al. approach assumes homogeneity of within-cluster variances on any given variable.
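The general idea of estimating variances from close pairs can be sketched as follows. This is not the Gnanadesikan et al. procedure itself, only a simplified illustration under stated assumptions: distances are plain squared Euclidean distances on the raw data, the number of pairs kept (`n_pairs`) is chosen by hand, and the estimate relies on the fact that for two points from the same cluster the expected squared difference on a variable is twice its within-cluster variance. The data and function name are hypothetical.

```python
from itertools import combinations
from statistics import variance

def close_pair_variances(points, n_pairs):
    """Estimate one within-cluster variance per variable from the n_pairs closest pairs."""
    def sq_dist(pair):
        a, b = pair
        return sum((u - v) ** 2 for u, v in zip(a, b))

    # Keep only the pairs with the smallest interpoint distances.
    closest = sorted(combinations(points, 2), key=sq_dist)[:n_pairs]
    n_vars = len(points[0])
    # Half the mean squared difference estimates sigma^2 for each variable.
    return [
        sum((a[k] - b[k]) ** 2 for a, b in closest) / (2 * len(closest))
        for k in range(n_vars)
    ]

# Hypothetical data: two clusters of three points, separated on the first variable.
points = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.5),
          (10.0, 0.5), (11.0, 0.0), (12.0, 1.0)]

est = close_pair_variances(points, n_pairs=6)
global_var = [variance(col) for col in zip(*points)]
print("close-pair estimate:", est)
print("global variance:", global_var)
```

Note that a single estimate per variable is produced, which mirrors the homogeneity assumption mentioned above. Selecting only the closest pairs can also bias the estimate downward when too few pairs are kept.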

A different approach, based on the use of a multi-stage k-means method to standardize within-cluster variances, allows for heterogeneous variances. In the first stage, raw data or globally standardized data are subjected to a k-means analysis. After the first iteration, the tentative cluster memberships can be used to standardize within-cluster. Subsequent iterations can continue to refine the within-cluster variance estimates. Preliminary results are promising, but more work is needed on the strategy. Certainly, this and other developments on the issue of variable standardization deserve further work by the classification community.
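One possible reading of the multi-stage idea is sketched below. It is an illustration under assumptions, not the method whose preliminary results are mentioned above: starting centers are supplied by hand, each cluster keeps its own per-variable standard deviation, and a point is assigned to the cluster whose center is nearest once each variable is divided by that cluster's spread. All function names and data are hypothetical.

```python
from statistics import mean, pstdev

def assign(points, centers, scales):
    """Assign each point to the nearest center, scaling by that cluster's own spread."""
    labels = []
    for p in points:
        dists = [
            sum(((x - c) / s) ** 2 for x, c, s in zip(p, center, scale))
            for center, scale in zip(centers, scales)
        ]
        labels.append(dists.index(min(dists)))
    return labels

def update(points, labels, centers, scales, floor=1e-6):
    """Recompute each cluster's center and per-variable standard deviation."""
    new_centers, new_scales = [], []
    for j in range(len(centers)):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if len(members) < 2:  # too few points: keep the previous estimates
            new_centers.append(centers[j])
            new_scales.append(scales[j])
            continue
        cols = list(zip(*members))
        new_centers.append([mean(col) for col in cols])
        new_scales.append([max(pstdev(col), floor) for col in cols])
    return new_centers, new_scales

def multistage_kmeans(points, init_centers, n_stages=10):
    """Stage 1 uses raw distances; later stages use within-cluster scaling."""
    n_vars = len(points[0])
    centers = [list(c) for c in init_centers]
    scales = [[1.0] * n_vars for _ in centers]  # no rescaling in the first stage
    labels = assign(points, centers, scales)
    for _ in range(n_stages):
        centers, scales = update(points, labels, centers, scales)
        new_labels = assign(points, centers, scales)
        if new_labels == labels:  # memberships stable: stop refining
            break
        labels = new_labels
    return labels, centers, scales

# Hypothetical data: a tight cluster near the origin and a much wider one.
points = [(0.0, 0.0), (1.0, 0.1), (0.5, -0.1), (0.2, 0.05),
          (10.0, 5.0), (14.0, -5.0), (12.0, 3.0), (8.0, -3.0)]
labels, centers, scales = multistage_kmeans(points, [(0.0, 0.0), (10.0, 0.0)])
print(labels)
print(scales)
```

A known weakness of scaling each cluster by its own spread is that a very diffuse cluster can attract points from its neighbors, so a sketch like this would need safeguards before being used on real data.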

Cormack, R. M. (1971). A review of classification. Journal of the Royal Statistical Society, Series A, 134, 321-367.

Fleiss, J. L., & Zubin, J. (1969). On the methods and theory of clustering. Multivariate Behavioral Research, 4, 235-250.

Gnanadesikan, R., Harvey, J. W., & Kettenring, J. R. (1994). Mahalanobis metrics for cluster analysis. Sankhya, in press.

Milligan, G. W., & Cooper, M. C. (1988). A study of variable standardization. Journal of Classification, 5, 181-204.

Sneath, P. H., & Sokal, R. R. (1973). Numerical Taxonomy. San Francisco: Freeman.

Getting CSNA Online

Michael P. Windham

Surfin' the net is an amazing experience. If you have not had it, now is as good a time as any to start. The basic idea with the World-Wide Web is that you have a program that begins by finding a document on some machine somewhere in the world. The document has links hidden in it to documents on, perhaps, other machines. In the simplest case, these links are indicated by numbers in the document: ask for a number, and your program knows how to go out and get the document to which the number refers. The new document possibly has links of its own, and you just keep going.

What you need: the ability to log on to other machines in the world. That's the minimum.

Of course, the more power you have the more wonderful it gets. But first, the simple way is to log on to the machine: www0.cern.ch

No user name or password is required; all this machine does is run a program that allows you to access the World-Wide Web. You will immediately notice three things.

1. At the bottom of your screen is a line where you can interact with the program. In the text above are numbers in brackets. If you enter one of them, you will go to another document related to the topic beside the number, and you are off and running. For example, there is a line saying "Places to start exploring[3]." Entering the number 3 at the prompt takes you to another document with some interesting places to go, eventually leading you, perhaps, to "Impressionist paintings at the Louvre."

2. You can find several alternatives to logging on to www0.cern.ch.

3. All this is happening incredibly slowly! After all, you and who knows how many others are logged on to a machine in Switzerland, so you need one of the alternatives.

I will describe just three of them. To get the programs, you will need a program to transfer files from a remote machine to yours.

(i). If you are a Mac or PC user, there are many books available that tell you the whole story and have software to get you started. One such is "Navigating the Internet" by Richard J. Smith and Mark Gibbs, Sams Publishing, ISBN 0-672-30485-6. I mention it only because I bought it and understood it; there are no doubt many just as good.

(ii). A copy of the program you were running on www0.cern.ch can be obtained for many kinds of machines. For example, using ftp you would log on as anonymous to ftp.w3.org and look in /pub/www/bin. It is likely that you will need a machine with a direct connection to the Internet.

(iii). Mosaic is a windows-based program that is very nice, free, and available for Mac, PC (with Windows) and many workstations, e.g. Sun. It can be obtained from ftp.ncsa.uiuc.edu. Just get there and start looking around for the machine of your choice. You will probably need some other software to take full advantage, but all the information is there.

One other thing of use to you is the meaning of URL (Uniform Resource Locator). A URL is an instruction that tells a network browsing program how to obtain a document. For example, if you were logged on to www0.cern.ch and entered the command
"go http://www.pitt.edu/~hirtle/csna.html",
you would have told the program to connect to a WWW server (http) on the machine at the University of Pittsburgh (www.pitt.edu) and obtain the document csna.html in the directory ~hirtle. You would receive the so-called CSNA home page, the document that gives you access to CSNA information and services. You might also say "ftp://ftp.ncsa.uiuc.edu/", which would start the program ftp, if you have it, and log in as anonymous on the machine ftp.ncsa.uiuc.edu. You could then obtain files from this archive site. So, you travel the net by picking numbers or clicking with your mouse in documents you already have, or, with a URL, by telling your browser exactly where to go.
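For the curious, the pieces of a URL described above can be pulled apart mechanically. The short sketch below uses Python's standard `urllib.parse` and `posixpath` modules purely as an illustration; nothing like this is needed to use a browser, and the helper name `describe` is made up for this example.

```python
import posixpath
from urllib.parse import urlsplit

def describe(url):
    """Split a URL into the protocol, the machine, the directory and the document."""
    parts = urlsplit(url)
    directory, document = posixpath.split(parts.path)
    return {
        "protocol": parts.scheme,   # e.g. http or ftp
        "machine": parts.netloc,    # the host to connect to
        "directory": directory,
        "document": document,       # empty when the URL names only a directory
    }

home_page = describe("http://www.pitt.edu/~hirtle/csna.html")
archive = describe("ftp://ftp.ncsa.uiuc.edu/")
print(home_page)
print(archive)
```

The first URL yields the protocol http, the machine www.pitt.edu, the directory /~hirtle and the document csna.html, matching the description above. The second names only a machine and its top-level directory.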

There are no doubt many other ways to get started, and I may not have told you what you really need to know. All I can tell you is to give it a try. I really did not know much of anything and understood even less. I did know how to use ftp to transfer files. I used it to get a copy of mosaic, this mystical program I had heard of. I started it up and found out that I had done everything I needed to do. Mosaic explained itself, the WWW and everything else.

::::::: Meeting Announcement :::::::

Preliminary Announcement and Call for Papers
CSNA 95

The 1995 annual meeting of the Classification Society of North America will be held in Denver, Colorado from June 22 to June 25, 1995 at the Executive Tower Inn, 14th and Curtis Streets. The meeting is supported by the College of Business, University of Colorado at Denver. A short course is planned for Thursday, June 22. The regular meeting, including the CSNA business meeting, conference banquet and regular paper sessions, will be scheduled from Friday morning, June 23 until Sunday noon, June 25, 1995.

CSNA meetings are traditionally interdisciplinary and informal, with few (if any) parallel sessions. Abstracts of papers presented are distributed, but no formal proceedings are produced. Speakers are encouraged to present work in progress. CONTRIBUTED PAPERS RELATED TO CLASSIFICATION AND CLUSTERING FROM THE PERSPECTIVES OF STATISTICS, BIOLOGY, THE PHYSICAL SCIENCES, BUSINESS, LIBRARY SCIENCE, COMPUTER SCIENCE, PSYCHOLOGY, AND OTHER FIELDS ARE WELCOME AND ARE SOLICITED. We are particularly interested in including a wide variety of contributed papers concerned with applications of classification and clustering as well as methodological issues. In addition, the following invited sessions are planned:

Classification and Clustering in Marketing (P. Green, the Wharton School and J. D. Carroll, Rutgers University, organizers); New Optimization and Neural Network Approaches to Discriminant Analysis, with Applications (Fred Glover, University of Colorado at Boulder, organizer); Neural Networks for Classification (Manavendra Misra, Colorado School of Mines, organizer); Model Selection Methods in Classification and Clustering (Hamparsum Bozdogan, University of Tennessee, organizer); Authorship Attribution (David Banks, Carnegie Mellon University, organizer). A Graduate Student Session, in which graduate students will present their research and will meet with mentors who will review and discuss the work, will also be offered.

Abstracts of papers to be considered for presentation at the meetings should be submitted as soon as possible, so as to arrive no later than March 15, 1995. Please limit abstracts to one page or less, and include appropriate keywords indicating the topics the paper addresses. We encourage you to submit abstracts via electronic mail. Abstracts submitted electronically may be in LaTeX format or in unformatted text, and should be submitted to the e-mail address listed below. Abstracts submitted in text format should avoid the use of formulae if at all possible. Written abstracts may be mailed to the program chair at the address below, or sent via FAX. Authors will be notified of acceptance of abstracts in late March or early April. If your abstract is intended for the graduate student session, please indicate this clearly.

It will help the organizers plan appropriate conference facilities if those intending to attend the meeting and/or submit abstracts inform the committee (preferably via e-mail) of those intentions as soon as possible. Further information may be obtained from the Program Chair. Detailed registration and hotel information will be distributed in late March or early April.

All inquiries, abstracts, etc. should be directed to:

Peter Bryant, CSNA-95
College of Business
University of Colorado at Denver
Campus Box 165
Denver, Colorado 80217-3364 USA
Telephone (303)-556-5833
Fax (303)-556-5899
e-mail csna95@castle.cudenver.edu

Random Conference News

* MARCH 28-30, 1995: DCC'95: Data Compression Conference. Snowbird, UT. Information: Myrna Fox; phone: (617)736-2700, e-mail: maf@cs.brandeis.edu.

* APRIL 9-13, 1995: High Performance Computing '95. Phoenix, AZ. Information: Adrian Tentner, High Performance Computing '95, Argonne National Laboratory, 9700 S. Cass Ave., Argonne, IL 60439. e-mail: tentner@pepper.ra.anl.gov.

* MAY 17-19, 1995: International Conference on Visualization of Categorical Data, Cologne, Germany. Papers are welcome which bridge the theory of visualization techniques and their interpretation in social science applications, focusing on methodological aspects as well as empirical studies. Methods to be included at the conference are: correspondence analysis, homogeneity analysis, loglinear and association models, latent class analysis, multidimensional scaling, biplot, cluster analysis, ideal point discriminant analysis, CHAID, formal concept analysis, and graphical models. Information: Jorg Blasius, Universitat zu Koln, Bachemer Str. 40, D-50931 Koln. e-mail: blasius@ibm.za.uni-koeln.de

* MAY 25-26, 1995: Seventeenth Symposium on Mathematical Programming with Data Perturbations. George Washington University, Washington, D.C. This symposium is designed to bring together practitioners who use mathematical programming optimization models and deal with questions of sensitivity analysis with researchers who are developing techniques applicable to these problems. Information: A.V. Fiacco, Department of Operations Research and the Institute for Management Science and Engineering, School of Engineering and Applied Science, The George Washington University, Washington, D.C. 20052. phone: 202-994-7511.

* JUNE 7-10, 1995: Fifth International Conference of the International Society for Scientometrics and Informetrics. Rosary College Graduate School of Library and Information Science, River Forest, Illinois. The scope of the conference can be broadly defined as those topics which treat in quantitative fashion the creation, flow, dissemination, and use of scholarly or substantive information. Information: M. Koenig, Dean, Graduate School of Library and Information Science, Rosary College, River Forest, IL 60305. phone: 708-524-6849, Fax: 708-524-6657, e-mail: roskoenigm@crf.cuis.edu.

* JUNE 20-23, 1995: International Conference on Ordinal and Symbolic Data Analysis Organized by: INRIA Rocquencourt and TELECOM Paris. This conference is in the tradition of conferences previously held in Antibes (1990), and in Paris (1992) on Numeric and symbolic data analysis, and in Darmstadt (1992) and in Amherst (1993) on Ordinal data analysis. The ordinal and symbolic approaches to data analysis have been successfully developed during these years via stimulating workshops and conferences for experts in the field. Both approaches have particularly emphasized the intentional background of data which is necessary for validation, interpretation and communication.

Since many observed or experimental data sets are of symbolic or ordinal nature, it has become increasingly clear that ordinal and symbolic data analysis has applications in a large number of areas including medicine, biology, social sciences, economics, agronomy, data retrieval, information sciences, etc. Information: INRIA Rocquencourt - Conference Secretariat: Claudie Thenault, Relations Exterieures, Domaine de Voluceau, BP 105 - 78135 Le Chesnay Cedex, France. Tel: 33 (1) 39 63 56 75, Fax: 33 (1) 39 63 56 38, e-mail: symposia@inria.fr

* JULY 4-7, 1995: 9th European Meeting of the Psychometric Society, Leiden University, Leiden, The Netherlands. Topics include (but are not limited to) Categorical Data Analysis, Classical Test Theory, Classification, Clustering, Correspondence Analysis, Exploratory Data Analysis, Factor Analysis, Graphical Models, Item Response Theory, (Generalized) Linear Models, Longitudinal Data Analysis, Multidimensional Scaling, Multivariate Analysis, Optimal Scaling, Statistical Methods, Structural Equations Models, Variance Components Analysis.

The meeting will be held at Leiden University. Leiden is located in the West of the Netherlands, a short distance from Amsterdam, The Hague and Rotterdam. It is easily reached by air from Amsterdam's Schiphol International Airport, as well as by train. Information on submission of papers or proposals for symposia: Willem J. Heiser, Chair of the Scientific Program Committee, Department of Data Theory, Faculty of Social Sciences, Wassenaarseweg 52, P.O. Box 9555, 2300 RB Leiden, The Netherlands. General information: Jacqueline J. Meulman, Chair of the Local Organizing Committee, Department of Data Theory, Faculty of Social Sciences, Wassenaarseweg 52, P.O. Box 9555, 2300 RB Leiden, The Netherlands, or, Susanna Verdel, Conference Secretary. phone: +31 71 273829, FAX: +31 71 273865, e-mail: psleiden@rulfsw.leidenuniv.nl

* SEPTEMBER 22, 1995: Statistical Symposium on Bootstrap, Discrimination and Regression, Paris, France. This symposium will feature Professor B. Efron, Stanford University, who will give talks on the Bootstrap as well as being a discussant for the complementary lectures. Information: C.I.S.I.A. Secretariat Symposium, 1 avenue Herbillon, 94160 Saint-Mande, France. Phone: (33-1) 43 74 20 20, Fax: (33-1) 43 74 17 29.

* SEPTEMBER 25-27, 1995: Australasian Biometrics Conference, Coolangatta (Gold Coast, Queensland), Australia. This conference will have a day on agricultural statistics (particularly field variety trials), a day on medical and health statistics, and half-days on the practical application of Markov Chain Monte Carlo and environmental statistics. Information: Kaye Basford, Department of Agriculture, The University of Queensland, Brisbane Qld 4072, Australia. Phone 61-7-3652810, Fax 61-7-3651177, e-mail k.e.basford@mailbox.uq.oz.au

::::::: Bookshelf :::::::

Rian van Blokland-Vogelesang
SWOV Institute for Road Safety Research
P.O.Box 170
2260 AD Leidschendam
The Netherlands
Blokland@SWOV.nl

R.D. Cook and S. Weisberg, An Introduction to Regression Graphics, New York: Wiley, Wiley Series in Statistics and Applied Probability, 1994, pp. 280, $62.95 (book/disk), ISBN 0471-95016-5.

P.J. Diggle, K.-Y. Liang, and S.L. Zeger, The Analysis of Longitudinal Data, Oxford (U.K.): Clarendon Press, pp. 272, 1994, £30.00 (hbk). ISBN 0-19-852284-3.

D.B. Ellis, Becoming a Master Student, (7th ed.), Boston: Houghton Mifflin, 1994, pp. 384, $19.95 (pbk), ISBN 0-395-69293-8.

B. Francis, M. Green, C. Payne, The GLIM System: Release 4 Manual, Oxford (U.K.): Clarendon Press, pp. 836, 1993, £55.00 (pbk). ISBN 0-19-852231-2.

Genstat Committee, Rothamsted Experimental Station, chaired by R.W. Payne, Genstat 5 Release 3, Reference Manual, Oxford (U.K.): Clarendon Press, pp. 812, 1993, £45.00 (pbk). ISBN 0-19-852312-2.

R. Gibbons, Statistical Methods for Groundwater Monitoring, New York: Wiley, 1994, pp. 320, $80.50 (hbk), ISBN 0471-58707-9.

D.R. Hill and D.E. Zitarelli, Linear Algebra Labs with Matlab, Hemel Hempstead, Herts (U.K.): Prentice Hall/Macmillan, 1994, pp. 350, £16.95, ISBN 02-354811-8.

A. Hoenig, Applied Finite Mathematics (2nd ed.), Boston: Houghton Mifflin, 1994, pp. 740, $34.95 (pbk), ISBN 0-395-63778-3.

R.I. Jennrich, An Introduction to Computational Statistics: Regression Analysis, Hemel Hempstead, Herts (U.K.): Prentice Hall, 1994, pp. 432, £16.95, ISBN 13-454810-8.

N.L. Johnson, S. Kotz, and N. Balakrishnan, Continuous Univariate Distributions, Vol. 1 (2nd ed.), New York: Wiley, Wiley Series in Probability and Mathematical Statistics, 1994, pp. 544, $93.50 (hbk), ISBN 0471-58495-9.

I.T. Jolliffe and B. Jones, Statistical Inference, Hemel Hempstead, Herts (U.K.): Ellis Horwood, 1994, pp. 200, £42.50 (hbk), ISBN 13-847260-2.

D.W. Jordan and P. Smith, Mathematical Techniques: An Introduction for the Engineering, Physical, and Mathematical Sciences, Oxford (U.K.): Oxford University Press, 1994, pp. 672, £14.95 (pbk, ISBN 0-19-856267-5), £40.00 (hbk, ISBN 0-19-856268-3).

N.T. Longford, Random Coefficient Models, Oxford (U.K.): Clarendon Press, pp. 284, 1994, £30.00 (pbk). ISBN 0-19-852264-9.

J. Lorriman and T. Kenjo, Japan's Winning Margins: Management, Training, and Education, Oxford (U.K.): Oxford University Press, 1994, pp. 232, £15.00 (hbk), ISBN 0-19-856374-4.

E. Marubini and M. Valsecchi, Survival Analysis in Biomedicine, New York: Wiley, 1994, pp. 250, $72.00 (hbk), ISBN 0471-93987-0.

I. Olkin, L. Gleser, and C. Derman, Probability Models and Applications (2nd ed.), Hemel Hempstead, Herts (U.K.): Prentice Hall/Macmillan, 1994, pp. 576, £16.95, ISBN 02-389220-X.

Sir Roger Penrose, Shadows of the Mind: A Search for the Missing Science of Consciousness, Oxford (U.K.): Oxford University Press, 1994, pp. 320, £16.99 (hbk), ISBN 0-19-853978-9.

V. Seshadri, The Inverse Gaussian Distribution: A Case Study in Exponential Families, Oxford (U.K.): Clarendon Press, pp. 268, 1994, £40.00 (pbk). ISBN 0-19-852243-6.

K. Sydstaeter and P. Hammond, Mathematical Analysis for Economists, Hemel Hempstead, Herts (U.K.): Prentice Hall, 1994, pp. 800, $38.95 (pbk), ISBN 13-112160-X.

T. Waldron, Counting the Dead: The Epidemiology of Skeletal Populations, New York: Wiley, 1994, pp. 80, $19.00, ISBN 0471-95138-2.

E.W. Williams, The CD-ROM and Optical Disc Recording Systems, Oxford (U.K.): Oxford University Press, 1994, pp. 176, £25.00 (hbk), ISBN 0-19-859373-2.


The WWW version of the CSNA newsletter is made available as a service of the Classification Society of North America. For further information on becoming a member of CSNA, please contact the CSNA Business Manager.