Classification Society of North America Newsletter

April 1996, Issue #44
Peter Bryant, President
F.R. McMorris, Newsletter Editor

::::::: President's Corner :::::::

Peter Bryant
College of Business
University of Colorado at Denver
Denver, CO 80217-3364
pbryant@castle.cudenver.edu
303-556-5833

IFCS-96 in Kobe has just been completed. Our hosts and colleagues in the Japanese Classification Society, Local Arrangements, and Program Committees set a high standard for those of us who follow them. The program included some 200 papers and presentations, as well as a fine reception and memorable banquet.

I had thought of summarizing the research presented there for this column, but the sheer diversity of the topics, styles, application areas, and emphases makes that impractical. Perhaps that was the main message: the multiple languages, interpretations, emphases, presentation and mathematical styles and level of detail presented were occasionally frustrating. Papers sometimes turned out to be about something other than what I anticipated. Yet that diversity is a mark of the vitality of the field, too, and part of its continuing interest for me. Many people with many perspectives and interests talk about many things under the general umbrella of "classification." I hope it stays that way.

I thought the conference offered a nice variety of applications, but mostly I was struck by the emphasis on validation and interpretation of clusters and classes rather than on computational or algorithmic aspects of the field. Allan Gordon's plenary talk set the stage for many of these papers, I thought, particularly his emphasis on the importance of null models, deviations from which are to be considered important or significant or in some sense valid. But by null models some of us mean the hypothesis that the labels on the tree have been randomly permuted, while others mean that the whole tree (or its links) has somehow been selected at random. Even those who seek the equivalent of a classical statistical test have differences: some suggest the null hypothesis is that there is one cluster; others, that each of the n data points is its own cluster. Some have an external "truth" to compare their results to; others must seek internal measures. Each of these null models seems plausible in its own context and hard to support in other contexts. The problems we deal with seem to continue unchanged, or perhaps they evolve. What I thought might have begun to change is that we are getting a clearer view of just how many different things we mean, and how important it is that we be specific about them. The fundamental questions of "what is a cluster?" and "why do we care?" still drive us.
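To make one of these null models concrete, here is a minimal sketch of the "randomly permuted labels" null, assuming a flat partition rather than a tree and an external set of "true" labels. The function names, the use of the Rand index as the agreement measure, and the number of permutations are illustrative assumptions, not anything presented at the conference.

```python
import random

def pair_agreement(clusters, labels):
    """Rand index: fraction of object pairs on which the two partitions agree."""
    n = len(clusters)
    agree = 0
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            same_cluster = clusters[i] == clusters[j]
            same_label = labels[i] == labels[j]
            agree += same_cluster == same_label
            total += 1
    return agree / total

def permutation_p_value(clusters, labels, n_perm=999, seed=0):
    """Compare observed agreement with agreement under randomly permuted labels."""
    rng = random.Random(seed)
    observed = pair_agreement(clusters, labels)
    shuffled = list(labels)
    at_least = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # null model: labels carry no information about clusters
        if pair_agreement(clusters, shuffled) >= observed:
            at_least += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (at_least + 1) / (n_perm + 1)
```

Calling permutation_p_value(clusters, labels) with two equal-length sequences returns the observed agreement and an estimated p-value. A different null model (one cluster, n singleton clusters, a randomly chosen tree) would need a different reference distribution, which is exactly the point about being specific.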

This is the first CSNA Newsletter to be distributed solely by electronic means, so if you are reading this, you probably are doing so electronically or you requested us to send you a hard copy (and paid the small surcharge for that). In either case, we want to know how well this system is working, so please let us know of your experiences with it. I hope to see many of you in Amherst in June.

::::::: From the Newsletter Editor :::::::

F.R. McMorris
Department of Mathematics
University of Louisville
Louisville, KY 40292
frmcmo01@homer.louisville.edu
(502)852-6826

Well, here it is, our electronic Newsletter. It's not too fancy. In fact, it's downright plain and lean.

As always, I would appreciate hearing from any of you regarding information that you would like to see in your Newsletter.

In this issue we have two special columns: one by David Banks and one by Phil Sutcliffe. Such articles, I feel, are an important part of the newsletter. Thank you, gentlemen. Due to the added length, there is no Bookshelf in this issue.

Phipps Arabie relays the following contents for the next issue of the

Journal of Classification
Volume 13 Number 1 1996
(to be mailed in early June)

Vadim Pliner - Metric Unidimensional Scaling and Global Optimization

Evangelia Simantiraki - Unidimensional Scaling: A Linear Programming Approach Minimizing Absolute Deviations

Eric W. Holman - Quantitative Properties of the Evolution and Classification of Languages

John T. Daws - The Analysis of Free-Sorting Data: Beyond Pairwise Cooccurrences

John C. Gower & Michael J. Greenacre - Unfolding a Symmetric Matrix

Pierre Hansen, Brigitte Jaumard & Bruno Simeone - Espaliers: A Generalization of Dendrograms

Olivier Gascuel & Denise Levy - A Reduction Algorithm for Approximating a (nonmetric) Dissimilarity by a Tree Distance

Zhenmin Chen & John W. Van Ness - Space-Conserving Agglomerative Algorithms

Robert Saltstone & Ken Strange - A Computer Program to Calculate Hubert and Arabie's Adjusted Rand Index

Book Reviews

:::::::: Forum ::::::::

QUESTIONING AUTHORITY

David Banks, Dept. of Statistics
Carnegie Mellon University, Pittsburgh, PA 15213
banks@stat.cmu.edu

Authorship attribution is that area of scholarship that tries to determine who wrote a given piece of text. It is a problem with a long history; one story holds that the divine authorship of the _Pentateuch_ was proved by the letter-perfect agreement of 72 independent translations of it from Hebrew into Greek (the _Septuagint_), around 250 B.C.

Statisticians work on a humbler scale, which has been emphasized by our conspicuous failure to contribute much that seems definitive to the authorship attribution arena. Generally, this area includes two major classification problems:

1) The determination of which among a small set of authors wrote a specified text.

2) The determination of whether a particular person wrote a specified text.

Also, sometimes one tries to determine whether a work was written collaboratively between two specified people, or whether a particular individual translated a given work.

The first problem is the simpler; Mosteller and Wallace (1984) addressed it in attempting to decide the authorship of certain of the _Federalist_ papers, known to have been written by either Madison or Hamilton or Jay. Expert historical opinion has congealed around Mosteller and Wallace's assignments, and their effort is now regarded as a model application of Bayesian data analysis.

The second problem is more common and harder. Examples of such cases include determining whether (1) Shakespeare wrote the ``Shall I fly, shall I die'' poem, (2) Daniel Defoe wrote a number of anonymous political and religious pamphlets, (3) Mark Twain wrote the ``Quintus Curtius Snodgrass Letters,'' and (4) Joseph Smith wrote The Book of Mormon. Without belaboring details, these examples point up both the range of the problems and the scholarly and societal importance that could be attached to definitive conclusions.

From a literary standpoint, much attention has been given to fairly obvious issues. In order to assess authorship in the second problem, one needs a long document and a thorough knowledge of the history pertaining to the document and the conjectured author. Additionally, one needs a substantial body of similar text from both the conjectured author and plausible alternative writers, all matched by genre, time period, subject, and so forth. And there is a consensus among English professors that statistical argument should defer entirely to historical clues, and take very seriously all principled aesthetic opinion (e.g., many feel that ``Shall I fly, shall I die'' is not good enough to be Shakespeare).

From a statistical standpoint, modern work is a branch of stylometrics, a discipline devoted to quantitative measurement of literary text. The early work focused on counting sentence length, word lengths, and vocabulary richness. Later, people began to look for words that were discriminatory; i.e., words used by specific authors far more commonly or far more rarely than was typical of their contemporaries. This was the core approach taken by Mosteller and Wallace. Most recently, statisticians have applied multivariate analysis, to take account of dependencies in an author's usage patterns. Also, people are looking at more subtle measures of style, such as the frequency of use of relative clauses or figurative imagery.

My own sense is that a key unresolved problem in this arena is that of variable selection. For any text, there are an enormous number of possible quantities that might be measured, but only a random few are likely to be distinctive, and the distinctive traits are different for different authors. It is necessary to find which of the variables are informative; otherwise, the analysis is swamped by noise.

Many researchers proceed by hand; they use human intuition to notice unusual stylistic features in the uncertain text or the candidate authors, and then base their analyses solely on those. This is close to what human experts do, but it courts the danger of bias; people tend to notice things that conform to their prior hopes.

Other researchers measure almost everything that is convenient, then look for variables that discriminate among texts of known authorship. But this can lead to cherry-picking. Between any two bodies of text, some of the enormously many measurable variables will be consistently similar solely by chance, even if the two texts were written by different people; conversely, some features will be consistently different, even though the works have common authorship.
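The cherry-picking danger can be illustrated with a small simulation. This is a sketch under stated assumptions: each stylistic "feature" is modelled as pure Gaussian noise measured on 20 chunks of each of two texts by the same author, and a feature is called "discriminating" when a Welch-style t statistic exceeds 2 in absolute value. The function name, the feature model, and the threshold are all hypothetical choices made for illustration.

```python
import random
import statistics

def count_spurious_features(n_features=1000, n_chunks=20, threshold=2.0, seed=1):
    """Count noise features that appear to separate two same-author texts."""
    rng = random.Random(seed)
    spurious = 0
    for _ in range(n_features):
        # Both texts draw from the same distribution: there is no real difference.
        a = [rng.gauss(0.0, 1.0) for _ in range(n_chunks)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n_chunks)]
        se = (statistics.variance(a) / n_chunks
              + statistics.variance(b) / n_chunks) ** 0.5
        t = (statistics.mean(a) - statistics.mean(b)) / se
        if abs(t) > threshold:
            spurious += 1
    return spurious

print(count_spurious_features())
```

Under these assumptions roughly one feature in twenty crosses the threshold by chance alone, so a screen over a thousand candidate features should be expected to turn up dozens of apparent discriminators even when there is nothing to find.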

To make progress, I urge people to continue efforts to identify the latent dimensions of writing. Authorship attribution will always be a high-dimensional problem, but we need to separate the real variables from the noise. For example, specific vocabulary may noisily reflect the author's recent reading, but a Latinate-vulgar axis seems plausible and consistent.

For a slightly dated survey of statistics in this area, I recommend David Holmes (1985); a current application is given in Holmes and Forsythe (1995). Holmes gave a more recent review of strategies in this area, including some wonderful principal components results, at last year's CSNA meeting.

At the 1996 CSNA meeting, we will be fortunate to hear about cutting-edge work in this area from Donald Foster, of Vassar, whose work was profiled in the _New York Times_ on Jan. 14. Foster is quickly marshalling a scholarly consensus that Shakespeare was the author of a previously unattributed elegy, using a combination of statistical and literary arguments.

References

Holmes, D. (1985). The Analysis of Literary Style---A Review, _Journal of the Royal Statistical Society, Series A_, Vol. 148, 328-341.

Holmes, D. and Forsythe, R. S. (1995). The _Federalist_ Revisited: New Directions in Authorship Attribution. _Literary and Linguistic Computing_, Vol. 10, 111-127.

Mosteller, F. and Wallace, D. L. (1984). _Applied Bayesian and Classical Inference_, 2nd ed., Springer-Verlag, New York.

**************************************************************

AN ENQUIRY INTO CURRENT UNDERSTANDINGS OF THE NOTION "CLASSIFICATION" WITH IMPLICATIONS FOR FUTURE DIRECTIONS OF RESEARCH

J. P. Sutcliffe
Department of Psychology
University of Sydney
Sydney, NSW 2006, Australia
jps@psychvax.psych.su.oz.au

In the textbooks, journals, and conference programmes of the field one finds a variety of topics listed under the rubric of "classification". (See, for example, Diday et al. (1994), and the Journal of Classification.) During 1994, instrumental to an attempt to relate contemporary to traditional conceptions of classification, an open-ended questionnaire was mailed to 66 prominent researchers - most of them working within the ideology of "numerical taxonomy" (d'apres Sokal et Sneath, 1963). Each respondent was requested to nominate a "very best instance" of classification both [i] at large, and [ii] within the more limited context of "numerical taxonomy", indicating at the same time what criterion of "best" was being used in each case. Then with respect to each nomination made, the respondent was asked (A) "by what method was the classification produced?" (classification of all objects in the chosen universe of discourse), and (B) "how is it decided whether some arbitrarily presented object is (is not) a member of a specified subclass of that classification?" (classification of any individual object relative to the previously specified sub-classes of the chosen universe of discourse).

The return was significant, both with respect to degree of response (there being very few returns despite reminders), and with respect to the quality of the few responses which were given (answers being for the most part non-specific, or inadequate when relevant, or evasive). Details follow.

From 66 persons contacted, 18 replies were received, 10 in response to the initial mailing and another 8 after a reminder. Of the 66 persons surveyed, 48 replied neither to the first nor to the second mailing. Only 7 of the 18 who responded gave appropriate answers to the questions posed. Five others gave some comment relevant to the enquiry but did not attempt to answer the questions as put. The remaining six declined to participate on various grounds: "not his field"; "does not work in the area of classification"; "currently too busy"; "did not wish to answer"; "did not know how to answer"; "questions do not apply to his area of research."

From the 7 respondents who seriously attempted to reply to the questions posed, the answers given can be summarized as follows. The classifications cited in context [i] (open terms of reference) were: [1] Aristotle's "categories" (Barnes, 1984); [2] Linnaeus's "Species Plantarum" (Linnaeus, 1753; Stearn, 1959); [3] Mendeleiev's "periodic table of the elements" (Mendeleiev, 1879, 1889, 1899; Kolodkine, 1963); [4] the Yerkes "atlas of stellar spectra" (Kellman, 1943; Keenan, 1963); [5] an "atlas of selected galaxies" (Takase et al., 1984); and [6] the "Azotobacteraceae" (Thompson and Skerman, 1979). Scientific systematization was noted as the principal merit of these classifications, but practical merit was also noted in some cases. The classifications cited in context [ii] (restricted terms of reference) were: [7] Coombs' (1964) "theory of data"; [8] Ruhlen's (1987) "languages of the world"; [9] Tannen's (1984) "conversational styles"; [10] Dyen's (1992) "dialects of Indo-European languages"; [11] Reboussin's (1992) "geographic patterns of serial rape"; [12] Sibley's (1988) "living birds of the world"; and [13] "images of landscape features constructed from satellite signals" (Rosenfeld and Kak, 1982). The last five are instances of "polythetic classification", but only the last two were claimed as examples of "numerical taxonomy", and that provenance was specifically disavowed in the other three cases.

The question (A) of how any classification was produced was answered specifically only in one case, [4]. Concerning the other three "traditional" classifications - [1], [3], and [6] - the respondents admitted that they did not know how the classifications were made. Non-specific answers were given in the nine remaining cases.

In response to the (B) question of how to decide class membership for an arbitrarily presented object, appropriate but non-specific answers were given in cases [1], [2], [4], [6], [7], [8], and [12]; respondents either did not answer or they admitted that they did not know in cases [3], [9], and [11]; and in the remaining cases it was said either [5], [10] that the question did not apply to the kind of "classification" under consideration, or [13] that the question applied to "discrimination" rather than to "classification".
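The counts reported above can be checked with a few lines of arithmetic. The sketch below only restates figures given in the text; the variable names and the grouping labels are illustrative, and the response rate is a derived figure rather than one quoted in the survey report.

```python
contacted = 66
replies = {"initial mailing": 10, "after reminder": 8}
replied = sum(replies.values())
no_reply = contacted - replied
response_rate = replied / contacted

# How the 18 respondents divided.
breakdown = {"answered as posed": 7, "commented only": 5, "declined": 6}

# The 13 nominated classifications, numbered [1] to [13].
cases = set(range(1, 14))

# Question (A): how was the classification produced?
question_a = {"specific": {4}, "did not know": {1, 3, 6}}
question_a["non-specific"] = (
    cases - question_a["specific"] - question_a["did not know"]
)

# Question (B): how is class membership decided for a new object?
question_b = {
    "non-specific": {1, 2, 4, 6, 7, 8, 12},
    "no answer or did not know": {3, 9, 11},
    "question said not to apply": {5, 10, 13},
}

def is_partition(groups, whole):
    """True when the groups are disjoint and together cover the whole set."""
    members = [c for g in groups.values() for c in g]
    return len(members) == len(set(members)) and set(members) == whole

print(replied, no_reply, round(response_rate, 3))
print(len(question_a["non-specific"]), is_partition(question_b, cases))
```

The figures are internally consistent: 18 replies and 48 non-replies out of 66 (a response rate of about 27 percent), nine non-specific answers to question (A), and a question (B) breakdown that accounts for each of the 13 cases exactly once.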

In summary: Of the minority of 18 who responded to the enquiry, most were able to give only vague and non-specific answers to the (A) and (B) questions, some said that they were not able to answer some of the questions, and some said that they were not able to answer any of the questions. Such difficulties for that minority of 18 who did respond to the enquiry most likely applied also for that majority of 48 who did not respond to the enquiry. Evidently the vast majority of researchers eminent in the field of "numerical taxonomy" are currently ill prepared to answer questions (A) and (B) which are central to the issue of what constitutes classification.

If approached from the point of view of "logical taxonomy" - the tradition going back to Aristotle and Porphyry (Latta & McBeath, 1959) - as distinct from contemporary "numerical taxonomy" (Sokal and Sneath, 1963), the questions posed in the questionnaire can be straightforwardly answered in the terms which the former's philosophy of "monotypy" provides. Under "monotypy", any class is specified by some definition, viz. that which expresses the intension of that concept of which the class being specified is the extension. (See Sutcliffe, 1995a, 1995b.) A definition by genus and differentia is a statement of concept intension, providing necessary and sufficient conditions for class membership (membership of concept extension) relative to the specified universe of discourse. Under "monotypy" one can readily explain classification in both of the senses: (A) establishing a classification, effecting that by definition - the statement of concept intension; and (B) using the classification so established to decide the membership status of individual objects, effecting that by determining whether or not an object in question satisfies the necessary and sufficient conditions stipulated in the chosen definition - the satisfaction of concept intension. It is unlikely that many (if any) "numerical taxonomists" will have had formal study of the history and logic of classification in preparation for their researches in "classification". Thus it may be lack of certain formal training which accounts for the inadequacies of the survey returns with respect to context [i].
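The two senses of classification under "monotypy" can be sketched in code. This is an illustration only, assuming objects are described by named attributes and that genus and differentia can each be written as a yes/no test; the Definition class, the establish_classification function, and the plane-figure example are hypothetical and are not drawn from the works cited.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Obj = Dict[str, object]  # an object is described by named attributes


@dataclass(frozen=True)
class Definition:
    """A definition by genus and differentia (hypothetical helper)."""
    genus: Callable[[Obj], bool]        # the wider class the object must fall under
    differentia: Callable[[Obj], bool]  # what marks off this class within the genus

    def is_member(self, obj: Obj) -> bool:
        # Sense (B): membership holds exactly when both conditions are satisfied,
        # so together they are necessary and sufficient.
        return self.genus(obj) and self.differentia(obj)


def establish_classification(universe: List[Obj],
                             definitions: Dict[str, Definition]) -> Dict[str, List[Obj]]:
    # Sense (A): the classification of the universe follows from the definitions.
    return {name: [o for o in universe if d.is_member(o)]
            for name, d in definitions.items()}


def is_polygon(o: Obj) -> bool:
    return o.get("kind") == "polygon"


definitions = {
    "triangle": Definition(is_polygon, lambda o: o.get("sides") == 3),
    "quadrilateral": Definition(is_polygon, lambda o: o.get("sides") == 4),
}
universe = [
    {"name": "t1", "kind": "polygon", "sides": 3},
    {"name": "q1", "kind": "polygon", "sides": 4},
    {"name": "c1", "kind": "circle", "sides": 0},
]
classes = establish_classification(universe, definitions)
print({name: [o["name"] for o in members] for name, members in classes.items()})
```

In this sketch the extension of each class is computed from its intension, and the same test that builds the classification also decides whether any newly presented object belongs to a given class.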

It can be argued, however, additionally if not exclusively, that it is the foundation of the "numeric taxonomic" conception of classification - the philosophy of "polytypy" itself - which is the source of the respondents' difficulties with the question posed within context [ii].

"Polythetic classes" are taken to be specified not by definitional identity but by reference to inter-object similarity. Gordon (1981), for example, talks of "classes such that objects in the same class are similar to one another and dissimilar to objects in other classes." As it is proposed by Wittgenstein (1953), Bechner (1959), Sokal and Sneath (1963), Rosch (1978), and others, the theory of "polythetic classes" (family, cluster, category, etc.) denies the need for definition: there is said to be no property common to "members" of any such polythetic class. However, to be consistent with that denial, one cannot then use terms such as "cluster" or "category" as if they were classes in the sense of "monothetic class". Lacking that sense, terms such as "class" (family, category, cluster), "kind", and "membership" then become meaningless because in discourse one can make no categorical distinctions between things. Hence there can be no classification, and hence there can be no polythetic classification. This shows that the theory of polytypy is logically incoherent. (See Sutcliffe, 1994.) Correspondingly this explains the difficulties confronting the (polytypic) numerical taxonomists surveyed: In strictly "polytypic" terms, no coherent answers can be given within context [ii] to questions (A) and (B) as stated above. If instead, while professing adherence to "polytypy" one attempted (inconsistently) to answer de facto in "monotypic" terms, then one would have to know the relevant concept intensions - the actual necessary and sufficient conditions for an object to be a member of a cluster (category, family, ...). For any cluster such (complex) conditions must be determinable, but to date, with rare exceptions (Jardine & Sibson, 1971), no explicit attention has been given to the matter. In the absence of such knowledge one could not answer questions (A) or (B) within context [ii].

Wherever one puts the emphasis upon the issues raised by this enquiry - upon ignorance of the history and logic of classification on the one hand, or upon the logical inadequacies of polythetic classification on the other - there is a case for critical reconsideration of future directions for research in classification.

References

BARNES, J. Ed., (1984), The Complete Works of Aristotle, (Aristotle, Categories, trans. J. L. Akrill), Princeton, N.J.: Princeton University Press, 1, 3-24.

BECHNER, M. (1959), The Biological Way of Thought, New York: Columbia University Press.

COOMBS, C.H. (1964), A Theory of Data. New York: Wiley.

DIDAY, E., LECHEVALLIER, Y., SCHRADER, M., BERTRAND, P. and BURTCHY, B. (1994), Eds., New Approaches in Classification and Data Analysis. Berlin: Springer-Verlag.

DYEN, I., KRUSKAL, J.B., and BLACK, P. (1992), "An IndoEuropean classification: a lexicostatistical experiment". Transactions of the American Philosophical Society, 82, part 5.

GORDON, A.D. (1981), Classification, London: Chapman and Hall.

JARDINE, N., and SIBSON, R. (1971), Mathematical Taxonomy, London: Wiley.

KEENAN, P.C. (1963), "Classification of stellar spectra", in Basic Astronomical Data, Ed., K. A. Strand, Chicago: University of Chicago Press, 78-122.

KELLMAN, E. (1943), Yerkes Atlas of Stellar Spectra, Chicago: University of Chicago Press.

KOLODKINE, P. (1963), Mendeleiev: Decouvertes de la Loi Periodique, Paris: Savants du monde entier, editions Seghers.

LATTA, R., and MCBEATH, A. (1956), The Elements of Logic, 8th edn., London: Macmillan.

LINNAEUS, C. (1753), Species Plantarum, Stockholm: Laurentius Salvius.

MENDELEIEV, D. (1879), "La loi periodique des elements chimiques", Le Moniteur Scientifique, 9, No 3, 695-787.

MENDELEIEV, D. (1889), "La loi periodique des elements chimiques", Le Moniteur Scientifique, 33, No. 2, 695-787.

MENDELEIEV, D. (1899), "Comment j'ai trouve le systeme periodique des elements", Revue Generale, No. 1, 211-214; 510-512.

REBOUSSIN, R., HAZELWOOD, R., and WARREN, J. (1992), "Classifying geographic patterns of serial rape", Paper presented at CSNA meeting, June, Michigan State University.

ROSCH, E., and LLOYD, B.B. (1978), Eds., Cognition and Categorization, Hillsdale, N.J.: Lawrence Erlbaum.

ROSENFELD, A., and KAK, A. (1982), Digital Picture Processing. New York: Academic Press.

RUHLEN, M. (1987), A Guide to the World's Languages, Volume I, Classification, Stanford: Stanford University Press.

SIBLEY, C.G., AHLQUIST, J.E., and MUNROE, B.L. (1988), "A classification of living birds of the world based on DNA-DNA hybridization", Auk, 105, 409-423.

STEARN, W.T. (1959), "The background to Linnaeus's contributions to the nomenclature and methods of systematic biology", Systematic Zoology, 8, 4-22.

SOKAL, R.R. and SNEATH, P.H.A. (1963), Principles of Numerical Taxonomy, San Francisco: W. H. Freeman.

SUTCLIFFE, J.P. (1994), "On the logical necessity and priority of a monothetic conception of class, and on the consequent inadequacy of polythetic accounts of category and categorization", in New Approaches in Classification and Data Analysis, Eds., E. Diday et al. cited above, 55-63.

SUTCLIFFE, J.P. (1995a), "Mecanisme logique pour decider de ce qui releve ou non de la classification", Bulletin de la Societe Francophone de Classification, 15 (mars 4), 5-10.

SUTCLIFFE, J.P. (1995b), "Logical machinery for deciding what is or is not classification", The Newsletter of the Classification Society of North America. Number 41, September, 2-6.

TAKASE, B., KODAIRA, K. and OKAMURA, S. (1984), An Atlas of Selected Galaxies with Illustrations of Photometric Analyses, Tokyo: University of Tokyo Press.

TANNEN, D. (1984), Conversational Style. Norwood, N.J.: Ablex Publications.

THOMPSON, J.P., and SKERMAN, V.B.D. (1979), Azotobacteraceae, London: Academic Press.

WITTGENSTEIN, L. (1953), Philosophical Investigations, G.E.M. Anscombe trans., New York: Macmillan.

:::::::: Meeting Reminder ::::::::

FINAL REMINDER FOR CSNA-NT96

This year CSNA is holding its meeting jointly with the numerical taxonomy group. The dates are June 13-16, 1996 and the place is the University of Massachusetts at Amherst. Registration forms and further information can be obtained from the local organizer M. F. Janowitz, Department of Mathematics and Statistics, University of Massachusetts, Amherst, MA 01003 USA, phone (413)-545-2871, fax (413)-545-1801, e-mail csna96@math.umass.edu. The organizer for the NT meeting is Pierre Legendre, phone (514)-343-7591, fax (514)-343-2293, e-mail legendre@ere.umontreal.ca.

If you intend to come it is important that you register as soon as possible so that appropriate housing will still be available. All talks will be held at the Campus Center Hotel. There will be dormitory housing available at approximately $25 per night; hotel rooms are discounted to a rate of $58 for a single room and $68 for a double. There is an attached enclosed parking garage.

The traditional short course on cluster analysis will occur on June 13, with instructors Stephen C. Hirtle, Pierre Legendre, and Glen Milligan. There will be a CSNA Social that evening. Friday will consist of CSNA sessions, followed by the CSNA business meeting and an evening banquet. Saturday will be mostly CSNA talks, but there will be an afternoon joint CSNA-NT session, with the CSNA portion of the conference ending late that afternoon. Following this, there will be an NT mixer. On Sunday, there will be NT sessions, the NT business meeting, and a buffet lunch. Complimentary continental breakfasts will be served Friday, Saturday and Sunday mornings.

As of this moment the following special sessions are scheduled:

General Consensus Theory (Robert C. Powers)
Pathfinder networks and proximity graphs (Donald Dearholt)
Classification in social network analysis (Stanley Wasserman)
Graduate student session (Peter Bryant)
Information retrieval and classification (Stephen Hirtle)
Image analysis and estimation (Sridhar Lakshaman and Anil K. Jain)

Software demonstrations will be held during the NT mixer by Pierre Legendre, F. James Rohlf, David Swofford, and David Wishart.

Here are the invited speakers and titles of their talks.
Donald Foster: Shakespeare's Who Done It: The Case of "A Funeral Elegy" (see the article by David Banks on Authorship that appears elsewhere in this Newsletter)
Herman Friedman: Classification and Clustering: A Perspective on Applications (after dinner speaker)
Donald Geman: Tree Structured Shape Recognition
Bruno Leclerc: A Survey on the Consensus of Classification Trees
Phillipa Pattison: Algebraic Bases for the Analysis of Binary Data
F. James Rohlf: Application of Geometric Morphometric Methods to Evolutionary Studies
David Swofford: Advances in Methodology for Reconstructing Evolutionary Trees: Bringing the Technology back to the Biologists.

Further information appears in the February, 1996 and November, 1995 CSNA Newsletters.

:::::::: Random Conference News ::::::::

* JUNE 22 - 25, 1996: The 3rd International Meeting of the Society for Social Choice and Welfare at the University of Maastricht. The meeting will cover the field of Social Choice in a broad sense, including related topics from utility theory, game theory, cost allocation, public economics, economic and political equilibrium and other areas. For information contact Prof. Hans Peters, Department of Quantitative Economics, University of Maastricht, P.O. Box 616, 6200 MD Maastricht, The Netherlands. (Phone +31-43-88-38.35, email scw@ke.rulimburg.nl )

* JUNE 27 - 30, 1996: Annual Meeting of the Psychometric Society at the Banff Centre for Conferences, Banff, Alberta, Canada. For information contact S. Nishisato, President, Psychometric Society, PS-96 Program chair, OISE, 252 Bloor Street West, Toronto, Ontario M5S 1V6, Canada. (email snishisato@oise.on.ca fax: 416-926-4725 )

* JULY 15 - 19, 1996: 11th International Workshop on Statistical Modelling in Orvieto, Italy. As in previous years this meeting will focus on the various aspects of statistical modelling, including theoretical developments, applications and computational methods. The workshop aims to concentrate on papers that are motivated by real practical problems and that make a novel contribution to the subject. Theoretical contributions addressing problems of practical importance or related to software developments are also welcome. For information contact Antonio Forcina, Dipartimento di Scienze Statistiche, Universita di Perugia via A. Pascoli, Casella Postale 1315/PG1m 06100 Perugia, Italy. (Phone: +39-75-585-5227. email wks96@stat.unipg.it )

* AUGUST 3 - 5, 1996: Second International Conference on Knowledge Discovery and Data Mining in Portland, Oregon. Knowledge Discovery in Databases (KDD), also referred to as Data Mining, is an area of common interest to researchers in machine discovery, statistics, databases, knowledge acquisition, machine learning, data visualization, high performance computing, and knowledge-based systems. The rapid growth of data and information has created a need and an opportunity for extracting knowledge from databases, and both researchers and application developers have been responding to that need. KDD applications have been developed for astronomy, biology, finance, insurance, marketing, medicine, and many other fields. For information, visit the KDD-96 WWW page at http://www-aig.jpl.nasa.gov/kdd96 or email kdd96@aig.jpl.nasa.gov.

* AUGUST 20-23, 1996: ISIS: Information, Statistics and Induction in Science in Melbourne, Australia. This conference will explore the use of computational modelling to understand and emulate inductive processes in science. The problems involved in building and using such computer models reflect methodological and foundational concerns common to a variety of academic disciplines, especially statistics, artificial intelligence (AI) and the philosophy of science. This conference aims to bring together researchers from these and related fields to present new computational technologies for supporting or analysing scientific inference and to engage in collegial debate over the merits and difficulties underlying the various approaches to automating inductive and statistical inference. For information contact Dr David Dowe, ISIS chair, Department of Computer Science, Monash University, Clayton, Victoria 3168 Australia Phone: +61-3-9 905 5226 FAX: +61-3-9 905 5146 Email: isis96@cs.monash.edu.au


The WWW version of the CSNA Newsletter is made available as a service of the Classification Society of North America. For further information on becoming a member of CSNA, please contact the CSNA Webmaster at hirtle+@pitt.edu.