Classification Society of North America Newsletter

November 1996, Issue #47
Peter Bryant, President
F.R. McMorris, Newsletter Editor

::::::: President's Corner :::::::

Peter Bryant
College of Business
University of Colorado at Denver
Denver, CO 80217-3364
pbryant@castle.cudenver.edu
303-556-5833

In the rest of this newsletter, you will find announcements of a wide variety of activities in CSNA and our affiliated societies. They are important parts of what we do as a society, and I will let them speak for themselves.

::::::: From the Secretary/Treasurer :::::::

Dawn Iacobucci
Department of Marketing
Kellogg Graduate School of Management
Northwestern University
2001 Sheridan Road
Evanston, IL 60208

By now all members should have received a letter containing the bios of the new candidates for our board positions and for that of the business manager. Please note that the letter also contains a voting ballot. Please complete it and send it to me by December 20, 1996. Please also note that it may be time to pay for your 1997 membership so that you will continue to receive the journal uninterrupted (a membership renewal form was also included in the mailing). Thank you.

::::::: From the Newsletter Editor :::::::

F.R. McMorris
Department of Mathematics
University of Louisville
Louisville, KY 40292
frmcmo01@homer.louisville.edu
(502)852-6826

We have two Forum articles this issue! Because of this, and the fact that one of the authors is our normal Bookshelf editor, there will be no Bookshelf in this issue. Anyone else out there ready to give the Forum a go?

:::::::::::::: Forum, Part 1 ::::::::::::::

QUI METABARIT IPSOS MENSORES?
(WHO WILL MEASURE THE MEASURERS?)

David Banks, Dept. of Statistics
Carnegie Mellon University, Pittsburgh, PA 15213
banks@stat.cmu.edu

The academic readers of the CSNA Newsletter classify on a regular basis, whenever they assign grades to their students. But things that are done so regularly become routine, and perhaps we do not examine the underlying issues as carefully as our training suggests we ought.

There is much prior work on educational testing; Lord and Novick (1968) is the classic, and despite W. H. Auden's dictum in _Victor_

Thou shalt not do as the dean pleases,
Thou shalt not write your doctor's thesis
On education

many others (some of whom are quite intellectually respectable) have contributed to the literature. Given this springboard, the natural place to start in assessing the accuracy of our educational assessments is to frame the problem in terms of item response curves.

The usual formulation assumes that student S_i has ability alpha_i and that question Q_j has difficulty lambda_j. Imposing the constraint that

alpha_i = lambda_j -> P[ S_i answers Q_j correctly] = .5

and injecting a small amount of mathematical tractability leads to the Rasch model:

P[S_i answers Q_j correctly] = e^{beta(alpha_i - lambda_j)} / (1 + e^{beta(alpha_i - lambda_j)})

where beta is a positive constant chosen by the modeller.

Graphing this probability as a function of alpha_i gives the item response curve for question Q_j. The curve increases monotonically, indicating that high-ability students have better chances of giving a right answer. If the curve is steep (which is controlled by beta), then the question has good resolution for separating sheep from goats; the point at which the slope is greatest is the ability level that question Q_j aims at discriminating.
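To make the role of beta concrete, here is a minimal Python sketch that evaluates the Rasch probability and the slope of the item response curve. The function names and the trial values of beta are illustrative assumptions, not part of the original formulation.

```python
import math

def rasch_probability(alpha, lam, beta=1.0):
    """P[student of ability alpha answers a question of difficulty lam correctly]."""
    return 1.0 / (1.0 + math.exp(-beta * (alpha - lam)))

def rasch_slope(alpha, lam, beta=1.0):
    """Derivative of the item response curve with respect to alpha."""
    p = rasch_probability(alpha, lam, beta)
    return beta * p * (1.0 - p)

if __name__ == "__main__":
    # Illustrative values of beta only; the text leaves the choice to the modeller.
    for beta in (1.0, 2.0, 4.0):
        print(beta, rasch_probability(0.0, 0.0, beta), rasch_slope(0.0, 0.0, beta))
```

Under this model the slope is largest where alpha_i = lambda_j, and its value there is beta/4, so doubling beta doubles the resolution of the question at its target ability level.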

In the context of classroom classification, the well-calibrated professor should (to be most efficient) write only three kinds of questions: those that distinguish D-students from C-students; those that distinguish C's from B's; and those that distinguish B's from A's. (In my classes a grade of F is punitive, rather than evaluative---a D suffices to designate dolts, without subjecting me to their company the following semester.) The numbers of each type of question could be chosen from a priori knowledge of the proportions of each category of student, or one can work out other schemes for optimizing those, but this turns out to be an issue of second-order importance.

Let us suppose that the well-calibrated professor constructs an examination with k questions of each kind. Let us further suppose that the professor is so clever an examiner that his questions have item response curves that are all wonderfully steep. Specifically, a student whose alpha_i falls exactly at the boundary between a C and a D rating has probability .5 of answering an easy question correctly, but only probability .2 of answering a B/C distinguisher and .05 of answering an A/B distinguisher. In this fashion, a table showing the probability of a correct answer (i.e., the discriminating power of the question) against ability level would be:

                 Question Type
Student Type     Easy (C/D)   Moderate (B/C)   Hard (A/B)
C/D border       .5           .2               .05
B/C border       .8           .5               .2
A/B border       .95          .8               .5

These numbers are intended to be realistic, but generous in their assessment of the professor's ability to generate good questions. The probabilities are all consistent with the Rasch model.
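As a rough check on the consistency claim, the sketch below rebuilds the table from a Rasch model in which each question type is pitched exactly at one grade border and adjacent borders are one fixed log-odds step apart. The step of log(4) is an assumption chosen so that the adjacent entries come out as .8 and .2; the names are illustrative.

```python
import math

BORDERS = ["C/D", "B/C", "A/B"]            # student ability levels (rows)
QUESTIONS = ["Easy", "Moderate", "Hard"]   # question types (columns)
STEP = math.log(4.0)  # assumed value of beta * (gap between adjacent borders)

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def rasch_table(step=STEP):
    """table[i][j] = P[student at border i answers question type j correctly]."""
    return [[logistic(step * (i - j)) for j in range(3)] for i in range(3)]

if __name__ == "__main__":
    for name, row in zip(BORDERS, rasch_table()):
        print(f"{name} border", " ".join(f"{p:.3f}" for p in row))
```

With a single beta, the two-step entries are forced to 1/17 and 16/17 (roughly .06 and .94), so the .05 and .95 in the table are best read as rounded values rather than exact ones.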

Now the professor administers the exam, and then marks each question as right or wrong (0-1 grading). The sum T of a student's points turns out to be the sufficient statistic under the Rasch model for making inference about alpha_i. One can show that the minimum risk rule (under a constant penalty for misclassifying students) is to

Classify as a D if T <= 3/4 k
Classify as a C if 3/4 k < T <= 3/2 k
Classify as a B if 3/2 k < T <= 9/4 k
Classify as an A if T > 9/4 k.
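The three cutoffs coincide with the expected score of a student sitting exactly on each grade border, which can be verified directly from the table of probabilities. The sketch below does that arithmetic in exact fractions and applies the rule; the function names are illustrative.

```python
from fractions import Fraction

# Probabilities of a correct answer at each grade border, taken from the
# table of question types: (easy, moderate, hard).
BORDER_PROBS = {
    "C/D": (Fraction(1, 2), Fraction(1, 5), Fraction(1, 20)),
    "B/C": (Fraction(4, 5), Fraction(1, 2), Fraction(1, 5)),
    "A/B": (Fraction(19, 20), Fraction(4, 5), Fraction(1, 2)),
}

def expected_score_per_k(border):
    """Expected total T divided by k for a student exactly at the given border."""
    return sum(BORDER_PROBS[border])

def classify(total, k):
    """Apply the cutoffs 3/4 k, 3/2 k and 9/4 k to a total score T."""
    if 4 * total <= 3 * k:
        return "D"
    if 2 * total <= 3 * k:
        return "C"
    if 4 * total <= 9 * k:
        return "B"
    return "A"

if __name__ == "__main__":
    for border in BORDER_PROBS:
        print(border, expected_score_per_k(border))
```

The integer comparisons in classify avoid floating-point ties when T lands exactly on a cutoff.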

If a student in the class has an ability level that is a random draw from a standard normal distribution, and if the grade levels are intended to correspond to the quartiles of the normal distribution, then one finds that the probability of misclassifying the student as a function of the number of questions on the exam is as follows:

No. of questions (3k)   P[misgrading]
48                      .147
96                      .103
192                     .066
384                     .045
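The calculation behind these figures is not spelled out here, but a Monte Carlo sketch shows how one might approximate them. It assumes that the three question types are pitched exactly at the quartile cutpoints of the standard normal and that beta is set so that a student one border above a question's level answers it with probability .8. All names, the sample size, and the seed are illustrative, and the estimates need not reproduce the table exactly.

```python
import math
import random

Z75 = 0.6744897501960817       # upper quartile of the standard normal
CUTS = (-Z75, 0.0, Z75)        # assumed ability cutpoints D|C, C|B, B|A
BETA = math.log(4.0) / Z75     # assumed: gives .8 one border above a question
GRADES = "DCBA"

def true_grade(alpha):
    """Grade implied by the student's actual ability."""
    return GRADES[sum(alpha > c for c in CUTS)]

def assigned_grade(total, k):
    """Grade assigned from the total score T by the cutoff rule."""
    if 4 * total <= 3 * k:
        return "D"
    if 2 * total <= 3 * k:
        return "C"
    if 4 * total <= 9 * k:
        return "B"
    return "A"

def simulate_misgrading(k, n_students=4000, seed=0):
    """Estimate P[misgrading] for an exam with k questions of each type."""
    rng = random.Random(seed)
    wrong = 0
    for _ in range(n_students):
        alpha = rng.gauss(0.0, 1.0)
        total = 0
        for lam in CUTS:  # one question type pitched at each cutpoint
            p = 1.0 / (1.0 + math.exp(-BETA * (alpha - lam)))
            total += sum(rng.random() < p for _ in range(k))
        wrong += assigned_grade(total, k) != true_grade(alpha)
    return wrong / n_students

if __name__ == "__main__":
    print(simulate_misgrading(16))  # 3k = 48 questions
```

Increasing n_students tightens the estimate at the cost of run time; the qualitative pattern to look for is a misgrading rate that falls slowly as the number of questions grows.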

The good news is that misgrading by more than one letter is very rare. Most of the difficulty comes from a student whose ability level lies near a grade cutpoint. The probability of having a few difficult cases increases with the size of the class. If there are 10 students and 96 questions, the probability that all assigned grades are accurate is only .337.
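The class-level figure follows from treating students as independently graded, each with the misgrading probability read from the table. A short sketch of that arithmetic (the function name is illustrative):

```python
def prob_all_correct(p_misgrade, n_students):
    """Probability that none of n independently graded students is misgraded."""
    return (1.0 - p_misgrade) ** n_students

if __name__ == "__main__":
    # 96 questions gives P[misgrading] = .103; with 10 students this is about .337.
    print(round(prob_all_correct(0.103, 10), 3))
```

The same function can be evaluated at realistic enrollments to see how quickly the chance of a fully accurate grade sheet vanishes.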

I typically give three exams and a final in my undergraduate classes (which have enrollments of between 80 and 240). The total number of questions that I ask is on the close order of 96, and I am sure I am not as well-calibrated or as clever in writing questions with steep item-response curves as the professor in this example. Thus I am quite positive that I mismark many students each semester. Knowing my own fallibility, I am not dismayed by this, but I dread the possibility that an astute student would make the kind of calculation I have shown above, and then visit me as a modern-day Spartacus, demanding justice for his peers.

Probably (at least upon reflection) no member of the CSNA is startled by these misclassification rates, but I think it salutary for us to undertake this sort of analysis. At the very least, we develop a stronger sense of how bad we are, and how undesirable it would be for students to discover this. (In fact, I am surprised that these results are not widely known among students already.)

I should, in fairness, close with another quote from _Victor_:

Thou shalt not sit with statisticians,
Nor commit a social science.

How much worse it must be to do both, as I've (in a small way) attempted in this commentary.

References:

Fienberg, S. E. (1986). "The Rasch Model," _Encyclopedia of Statistical Sciences_, Vol. 7, 627-632.

Lord, F. M. and Novick, M. R. (1968). Statistical Theories of Mental Test Scores. Addison-Wesley, Reading MA.

:::::::::::::: Forum, Part 2 ::::::::::::::

EMPIRICAL BAYESIAN METHODS IN ROAD SAFETY RESEARCH

Rian van Blokland-Vogelesang
SWOV Institute for Road Safety Research
P.O. Box 170
2260 AD Leidschendam
The Netherlands
Blokland@SWOV.nl

'Imagine now what would have happened had the Ministry hired a shaman to pray for accident reduction on road sections that had in the first year seven or more accidents. The apparent effectiveness of this 'treatment' is (...) about 30 percent.' (Hauer & Persaud, 1983).

Central topics in road safety research are: estimation of 'safety' (the frequency of accidents at crossings or intersections, among drivers, etc.); identification of unsafe entities; estimation of the effect of remedial treatment of crossings, intersections, or drivers, including prediction of future accidents; migration of accidents; and accident clustering.

Both in pure Bayes and in Empirical Bayes (EB) estimation, the parameter value (l) itself is regarded as a realization of a random variable L with distribution function G(l), the prior distribution. In the pure Bayesian approach, G(l) is not necessarily interpreted in terms of relative frequencies; in the EB approach it is given a frequency interpretation (Maritz & Lwin, 1989). The EB approach also assumes that previous data, suitable for estimating G(l), are available. The EB estimation rule generally depends on all past and current x-values.

Estimation of Safety
Safety is an attribute of a specific intersection, a driver, etc. It is estimated by assessing the numbers of accidents and their negative consequences. A count of fatalities is an estimate of unsafety, but not an unbiased one, because of the effect of 'regression to the mean': intersections that had a bad accident record one year are, on average, expected to have fewer accidents the next year.

The number of accidents x at a fixed location is Poisson distributed, and counts are aggregated over comparable sites to form a group. The means m of the separate groups also show variation. Because the accident rates at distinct sites within a group vary, the counts follow a compound ('overdispersed') Poisson distribution; when the rates are gamma distributed, this is a negative binomial distribution. This is the predictive distribution for expected numbers of accidents in the future. The EB point estimate is the mean of the posterior distribution of L for given X = x and is corrected for regression to the mean (cf. Jarrett et al., 1982; Maritz & Lwin, 1989).
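A minimal sketch of such an EB point estimate, assuming a gamma prior on the site accident rates fitted by the method of moments (one common choice, not necessarily the estimator used in the cited works). The function names are illustrative.

```python
from statistics import mean, variance

def fit_gamma_prior(counts):
    """Method-of-moments estimate of a Gamma(shape, rate) prior from site counts.

    Assumes the counts are overdispersed (sample variance > sample mean).
    """
    m = mean(counts)
    v = variance(counts)
    if v <= m:
        raise ValueError("no overdispersion: cannot fit a gamma prior by moments")
    rate = m / (v - m)
    shape = m * rate
    return shape, rate

def eb_estimate(x, shape, rate):
    """Posterior mean of a site's accident rate given x observed accidents."""
    return (shape + x) / (rate + 1.0)

if __name__ == "__main__":
    counts = [0, 0, 1, 1, 1, 2, 2, 3, 5, 9]   # hypothetical one-year site counts
    shape, rate = fit_gamma_prior(counts)
    for x in (0, 2, 9):
        print(x, round(eb_estimate(x, shape, rate), 2))
```

The estimate can be rewritten as m + (1 - m/v)(x - m), where m and v are the group mean and variance, so a site with a high count is pulled down towards the group mean and a site with a low count is pulled up.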

The assessment of the effectiveness of remedial measures is based on a comparison between accident frequency before and after treatment. However, this comparison is complicated by a number of factors, such as regression-to-mean and accident migration.

The Effect of Remedial Treatment
Bayesian analysis differs significantly from the classical analysis of accident data. The motivation for the Bayesian analysis is the desire to treat the actual accident rate at a particular site as a random variable and to use a combination of regional accident characteristics and accident history at that location to determine the probability that the location is hazardous. In a first step, accident histories are aggregated over sites (or years) to estimate the probability distribution of accident rates. In the second step this (empirical) prior distribution and the accident history at a particular site, which is assumed to be Poisson, are used to obtain an estimate of the (posterior) probability distribution at the particular site. For this, both nonparametric and Bayesian methods are available (e.g., Hauer, 1980, 1986).

Regression-to-mean
When the duration of the observation period increases, the relative size of the regression effect diminishes. Regression to the mean applies to drivers as well: drivers with one or more violations in one period are seen to have on average 60-80 percent fewer violations in the next period (Hauer & Persaud, 1983). The magnitude of the effect is much larger for drivers than for road segments. A problem is how to obtain unbiased estimates in before-and-after studies in the presence of the regression effect. Let B denote the number of accidents occurring on a system during some period before treatment, and let A denote the number of accidents expected to occur on the same system during an after period of the same duration had treatment not been applied. We have to estimate A. For this we have A(x) = B(x+1), and A(x) - A(x+1) = B(x+1) - B(x+2), where x is the number of accidents. The left-hand side of the second expression is the expected value of the difference on systems that in period 1 had x and x+1 accidents, respectively, and is obtained by summing over systems, assuming that the number of accidents on each system is Poisson distributed. Most empirical Bayesian estimators are shrinkage estimators: like the LEB (linear empirical Bayes) estimator, they 'shrink' the observed accident frequency towards the population mean. This results in a smaller mean squared error.
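One reading of the relation A(x) = B(x+1), assumed here, is that the accidents expected in the after period on all systems that recorded x accidents before can be estimated by the before-period total on the systems that recorded x+1 accidents, that is, (x+1) N(x+1), with N(x) the number of systems having x accidents. This is the classical nonparametric estimator for Poisson mixtures; the sketch below uses hypothetical function names.

```python
from collections import Counter

def expected_after_total(before_counts, x):
    """Estimate A(x): total accidents expected in the after period, without
    treatment, on the systems that recorded exactly x accidents before.

    Under the Poisson assumption this is estimated by the before-period
    total on the systems that recorded x + 1 accidents: (x + 1) * N(x + 1).
    """
    n = Counter(before_counts)
    return (x + 1) * n[x + 1]

def expected_after_per_system(before_counts, x):
    """Per-system version: (x + 1) * N(x + 1) / N(x)."""
    n = Counter(before_counts)
    if n[x] == 0:
        raise ValueError("no systems with exactly x accidents")
    return (x + 1) * n[x + 1] / n[x]

if __name__ == "__main__":
    # Hypothetical before-period counts for 100 road sections.
    before = [0] * 50 + [1] * 30 + [2] * 12 + [3] * 5 + [4] * 3
    for x in range(4):
        print(x, expected_after_per_system(before, x))
```

No prior distribution has to be specified, but the estimate becomes unstable for large x, where few systems are observed; this is one motivation for smoother shrinkage estimators such as the linear EB estimator.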

Accident migration
When an accident blackspot is treated, the reduction in accidents is often accompanied by an increase in accident numbers elsewhere, because of reduced awareness of the need for caution (Boyle et al., 1984) and risk compensation. The hypothesis of risk compensation has been criticized fiercely. Statistical explanations include the (reversed) regression-to-mean effect, conversions that bring about a degradation in safety, and positive correlation between true accident rates of adjacent sites, since traffic flow tends to be correlated at adjacent sites (Maher, 1987).

Accident clustering
While the means are equal, the variance of the negative binomial distribution is greater than that of the gamma distribution. The accident counts will therefore exhibit a higher level of clustering (or inequality) than the underlying true accident rates. Indices of accident clustering include the (mean) accident count, the coefficient of variation (CV), and indices based on information theory (Nicholson, 1995).
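The comparison can be made explicit under the gamma-Poisson model: Poisson sampling leaves the mean unchanged and adds the mean to the variance. A short sketch, with illustrative names and a Gamma(shape, rate) parameterization assumed for the true accident rates:

```python
import math

def gamma_moments(shape, rate):
    """Mean and variance of the true accident rates, Gamma(shape, rate)."""
    return shape / rate, shape / rate ** 2

def negbin_moments(shape, rate):
    """Mean and variance of the counts when Poisson rates are Gamma(shape, rate)."""
    m, v = gamma_moments(shape, rate)
    return m, v + m  # Poisson sampling adds the mean to the variance

def coefficient_of_variation(m, v):
    """CV = standard deviation divided by the mean."""
    return math.sqrt(v) / m

if __name__ == "__main__":
    shape, rate = 2.0, 1.0   # hypothetical prior parameters
    print(coefficient_of_variation(*gamma_moments(shape, rate)))
    print(coefficient_of_variation(*negbin_moments(shape, rate)))
```

The CV of the counts always exceeds the CV of the underlying rates in this model, which is the sense in which observed counts overstate clustering.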

References:

Boyle, A.J. & Wright, C.C. (1984). Accident 'migration' after remedial treatment at accident blackspots. Traff. Engng. Control, 25(5), 260-267.

Hauer, E. (1980). Bias-by-selection: overestimation of the effectiveness of safety counter measures caused by the process of selection for treatment. Acc. Anal. Prev., 12, 113-117.

Hauer, E. (1986). On the estimation of the expected number of accidents. Acc. Anal. Prev., 18, 1-12.

Hauer, E. & Persaud, B. (1983). Common bias in before-and-after comparisons and its elimination. Transp. Res. Rec., 905, 164-174.

Hauer, E. & Persaud, B. (1984). Problem of identifying hazardous locations using accident data. Transp. Res. Rec., 975, 36-43.

Jarrett, D.F., Abbess, C.R., & Wright, C.C. (1982). Bayesian methods applied to road accident blackspot studies: some recent progress. Seminar on Short-term and Area-wide Evaluation of Safety Measures, Institute for Road Safety Research (SWOV), Amsterdam, 1982.

Maher, M.J. (1987). Accident migration - a statistical explanation? Traff. Engng & Control, 28, 480-483.

Maritz, J.S. & Lwin, T. (1989). Empirical Bayes Methods (2nd ed.). New York: Chapman & Hall.

Nicholson, A. (1995). Indices of accident clustering: a re-evaluation. Traff. Engng & Control, 36, 291-295.

Vogelesang, A.W. (1996, December). Bayesian Methods in Road Safety Research: An Overview. Institute for Road Safety Research, SWOV, Leidschendam, The Netherlands.

:::::::::::::: UPDATE FROM OLGA CORDERO-BRANA AND DAVID BANKS ABOUT CSNA-97 ::::::::::::::

Here is an update on the current status of the CSNA-97 meeting. It will be held at the American University in Washington, DC, June 12-15, 1997. The first day will include at least the traditional short course on exploratory data analysis; there may also be a more advanced short course, perhaps on new nonparametric regression technology (MARS, neural nets, projection pursuit regression, and so forth). If people are interested in this (or some other) topic, please let the organizers know.

The next issue of the Newsletter will contain a more detailed announcement, together with a copy of the registration form. Incidentally, the registration form should also be available on the CSNA WWW page, or by requesting a copy from one of the organizers.

A number of special sessions are in the planning stages. Some of the key topics, and their chairs or organizers, include: Environmental Applications (Lara Wolfson), Inference on Phylogenetic Trees (David Banks), DNA Fingerprinting (Joe Gastwirth), a Graduate Student Session (Pete Bryant), Challenges in Information Science (Stephen Hirtle), Classification in Public Health Medicine (Demissie Alemayehu), New Problems in Biomedical Research (John Nolan), and Clustering Problems in Marketing (Paul Green). Invited speakers include: John Hartigan of Yale, Rob Tibshirani of the University of Toronto, Bruce Budowle of the FBI Forensics Laboratory, Mike Newton of the University of Wisconsin, and other luminaries.

The organizers of the meeting encourage the presentation of contributed papers that cover a wide range of applications and methodology that involve exploratory data analysis viewed in its broadest sense. Papers related to all kinds of classification and clustering problems are warmly solicited. Short abstracts of papers should be sent to the program chair, as well as any suggestions for symposia, special sessions, topics, panel discussions, requests for further information, or other contributions. The deadline for submission of abstracts is March 31, 1997. The Conference Chair for the meeting is Olga Cordero-Brana, Department of Mathematics and Statistics, American University, Washington DC, 20016-8050, telephone (202)-885-3130, e-mail olgacb@american.edu. The Program Chair is David Banks, Department of Statistics, Carnegie Mellon University, Pittsburgh, PA 15213 USA, telephone (412)-268-2721, fax (412)-268-7828, e-mail banks@stat.cmu.edu.

So in short, please plan to come!

:::::::::::::: Other Conference News ::::::::::::::

* MARCH 12-14, 1997: 21st Annual Conference of the GfKl, University of Potsdam, Germany. The conference program will include methods and applications of Classification, Data Analysis and Information Processing. Interdisciplinary aspects and interrelations between theory and practice will be particularly emphasized. For information contact Prof. Dr. I. Balderjahn, Lehrstuhl BWL - Marketing, Univ. Potsdam, August-Bebel-Str. 89, D-14482 Potsdam, Germany. (Phone +49 331/977-3595. Fax +49 331/977-3331. e-mail balderja@rz.uni-potsdam.de)

* MAY 4-9, 1997: ThRee-way methods in Chemistry and Psychology, Lake Chelan, Washington, USA. The meeting will focus on methods for analyzing 3-way and higher order data in the chemical sciences and psychology. The goal is to provide a relaxed atmosphere where ideas may be freely exchanged. Successes and failures will be discussed and opportunities for future research and application will be highlighted. For information contact Dr. Barry M. Wise, Eigenvector Research, Inc., 830 Wapato Lake Road, Manson, WA 98831. (Phone 509-687-2022. Fax 509-687-7033. e-mail 73633.2451@compuserve.com)

* JULY 21-24, 1998: 6th Conference of the International Federation of Classification Societies, Rome, Italy. Put it on your calendars---more information will be appearing later.

The WWW version of the CSNA Newsletter is made available as a service of the Classification Society of North America. For further information on becoming a member of CSNA, please contact the CSNA Business Manager.

Stephen Hirtle, hirtle+@pitt.edu, CSNA Webmaster.