Differences in the Coding of Spatial Relations in
Face Identification and Basic Level Object Recognition
Eric E. Cooper and Tim J. Wojan
Iowa State University
The purpose of the current investigation was to determine whether the relations among the primitives used in face identification and basic level object recognition are represented using co-ordinate or categorical relations. Two experiments used photographs of famous people's faces as stimuli in which each face had been altered to have either one of its eyes moved up from its normal position or both of its eyes moved up (but the relations between them kept constant). Participants performed either a face identification task (in which the person in the photograph was identified) or a basic level object recognition task (in which participants decided whether a stimulus was a human face). In the face identification task, one eye moved faces were easier to recognize than two eyes moved faces, while the basic level object recognition task showed the opposite pattern of results. The results suggest that face identification uses a shape representation employing co-ordinate relations in which the precise locations of visual primitives are specified while basic level object recognition uses relations which are coded categorically.
A consensus has developed that the process underlying face identification (meaning the process by which a person recognizes a visual stimulus as being "Aunt Bertha", "my mail carrier", or "Arnold Schwarzenegger") and the process underlying most forms of basic level object recognition (meaning the process by which a person recognizes a visual stimulus as being a "table", a "boat", or a "human face") are different. A number of lines of evidence showing dissociations between face identification and basic level object recognition support this conclusion. For example, faces are more difficult to identify in photographic negatives than are basic level objects (Bruce & Langton, 1994; Galper, 1970; Galper & Hochberg, 1971; Phillips, 1972), and faces show greater recognition costs when turned upside-down than do basic level objects (Carey & Diamond, 1977; Scapinello & Yarmey, 1970; Yin, 1969; see Valentine, 1988 for a review).
Additional evidence that face identification and basic level object recognition are accomplished by different processes comes from work in neuroscience. Sergent, Ohta and MacDonald (1992), using positron emission tomography (PET), found regions of the right hemisphere that become active during face identification that are not active during basic level object recognition. Further, a right hemisphere advantage for identifying faces is well documented (Davidoff, 1982 and Ellis, 1983 provide reviews) while the evidence for hemispheric specialization during basic level object recognition is far less clear with some studies finding a left hemisphere advantage (Bryden & Rainey, 1963; McKeever & Jackson, 1979; Wyke & Ettlinger, 1961; Young, Bion, & Ellis, 1980), others finding a right hemisphere advantage (Schmuller & Goodman, 1980), and still others finding no advantage for one hemisphere over the other (Biederman & Cooper, 1991; Kimura & Durnford, 1974; Levine & Banich, 1982). Perhaps the most persuasive evidence that basic level object recognition and face identification are accomplished by different processes comes from studies of brain damaged patients showing a double dissociation between the two processes. Farah (1994) found 27 cases in the literature in which a patient showed impaired face identification but intact basic level object recognition and 16 cases in which a patient showed impaired basic level object recognition but intact face identification arguing strongly that different neural substrates underlie the two tasks.
Given that face identification and basic level object recognition occur through different processes, the next logical question to consider is how the memory representations used for the two processes might differ. The most common speculation in the current literature is that faces use "configural" or "holistic" representations while basic level objects use "featural" representations. Unfortunately, this method of characterizing the differences in the representations is rather vague, and as O'Toole, Abdi, Deffenbacher, and Valentin (1995) and Bruce and Humphreys (1994) pointed out, has different meanings for different researchers. When they say that face identification uses "configural" or "holistic" representations, researchers generally mean that the interrelationships among the elements of the face are important in recognition. However, some form of "configural" representation must be used for basic level object recognition as well; otherwise, one would be able to recognize scrambled versions of objects as readily as intact versions. When researchers say that basic level objects are recognized with "featural" representations, they generally mean that some form of visual primitives are extracted from the image as part of the recognition process. However, features of some type must be extracted during face identification as well because all shape recognition systems must posit some sort of primitive features (even if they are simple pixels).
In this article, we discuss a new dissociation we have found between the processes of face identification and basic level object recognition which sheds light on how the representations used during the two processes might differ. Specifically, using altered photographs of famous faces as stimuli, we observed a dissociation between the types of alterations that most affect face identification (i.e., Whose face is this?) and those that most affect basic level object recognition (i.e., Is this object a human face?). The results suggest that one of the principal differences between the face identification and basic level object recognition processes may be the manner in which the spatial positions of the visual primitives in each process are coded.
Coding of Spatial Relations in Current Theories of Shape Recognition
Current theories of how shape is represented in memory differ on a number of dimensions (see Hummel & Stankiewicz, 1996; Tarr, 1995 for discussions). As Tarr (1995) pointed out, modern theories of shape recognition differ on the frame of reference used (e.g., viewer centered vs. object centered vs. some combination of the two), the nature of the primitives out of which the memory representation is constructed (e.g., pixels, edges, volumetric primitives, local fourier components), and the number of spatial dimensions used (e.g., 2-D vs. 3-D vs. an abstract description in which dimensionality is irrelevant).
Hummel and Stankiewicz (Hummel, 1994; Hummel & Stankiewicz, 1996) have argued that the chief distinction between modern theories of shape recognition is the manner in which the positions of the primitives in the shape representation are coded. Any plausible theory of human shape representation must, of course, posit some means of specifying the spatial positions of the primitives used in the representation as otherwise a scrambled version of the shape would be recognized as readily as an intact version. Hummel and Stankiewicz (1996) suggested that modern theories of shape recognition fall into two different theoretical camps regarding how the spatial relations among primitives are coded. Specifically, some theories of shape recognition, known as structural description theories, posit primitive-to-primitive, categorical relations (hereafter referred to as categorical relations) (e.g., Biederman, 1987; Dickenson, Pentland, & Rosenfeld, 1993; Hummel & Biederman, 1992; Sutherland, 1968; Winston, 1975) in which each primitive is related to other primitives in the representation using directional categorical descriptors (e.g., "above", "below", "side of"). In contrast, other theories posit primitive-to-reference point, co-ordinate relations (hereafter referred to as co-ordinate relations) (e.g., Bülthoff & Edelman, 1992; Edelman & Weinshall, 1991; Intrator & Cooper, 1992; Intrator & Gold, 1993; Lowe, 1987; Olhausen, Anderson & Van Essen, 1993; Poggio & Edelman, 1990; Poggio & Vetter, 1992; Siebert & Waxman, 1992; Ullman, 1989; Ullman & Basri, 1991) in which the precise distance of each primitive from a fixed reference point (or set of fixed reference points) is represented. The theoretical implications of these two different ways of coding the spatial positions of primitives are discussed below.
Categorical relations. The manner in which the spatial positions of the primitives in an object would be coded using a categorical relations scheme can be illustrated by examining the left face in Figure 1. Suppose, solely for the purposes of illustration, that the primitives used for the recognition of a face consist of the two eyes, the nose and the mouth. Other features (e.g., hairline, chin) are almost certainly used during face identification as well; however, they are not included in the example below for the sake of simplicity, nor would their inclusion alter the predictions of the experiments reported here. A theory positing categorical relations among the primitives might describe the spatial position of the left eye in the following manner: the left eye is directly to the side of the right eye, both above and to the side of the nose, and both above and to the side of the mouth. Further, the spatial positions of each of the other primitives (i.e., the right eye, the nose and the mouth) would similarly be specified using categorical descriptors (above, below, side of) relative to the other primitives in the representation. Note that in a categorical representation, the distances between the primitives are not specified; only the directions are specified.
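The categorical scheme just described can be made concrete with a minimal sketch. The positions below are invented for illustration (they are not measurements from Figure 1), the function names are hypothetical, and the three descriptors are the only ones assumed. The point of the sketch is that the resulting description records directions between pairs of primitives and nothing about the distances between them.

```python
from itertools import permutations

# Hypothetical (x, y) positions for the four illustrative primitives.
# y increases upward; the numbers are invented for this sketch.
FACE = {
    "left_eye": (-3.5, 5.0),
    "right_eye": (3.5, 5.0),
    "nose": (0.0, 2.0),
    "mouth": (0.0, 0.0),
}


def categorical_relation(a, b):
    """Return the directional descriptors relating point a to point b."""
    (ax, ay), (bx, by) = a, b
    relation = set()
    if ay > by:
        relation.add("above")
    elif ay < by:
        relation.add("below")
    if ax != bx:
        relation.add("side of")
    return frozenset(relation)


def categorical_description(primitives):
    """Describe every ordered pair of primitives by direction only."""
    return {
        (p, q): categorical_relation(primitives[p], primitives[q])
        for p, q in permutations(primitives, 2)
    }


description = categorical_description(FACE)
print(sorted(description[("left_eye", "right_eye")]))  # ['side of']
print(sorted(description[("left_eye", "nose")]))       # ['above', 'side of']
```

Because only the signs of the coordinate differences are consulted, any two faces whose primitives stand in the same directional arrangement would receive the same description under this sketch.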
What would be the computational advantages of using a representational system in which the positions of primitives are related to one another categorically? The most obvious advantage of such a scheme is that it allows one to place different specific objects having wide variations in the exact positions of their primitive elements into the same basic level category quickly. For example, if a "mug" is represented in memory as being a "cylinder to the side of a curved cylinder," such a description would allow the activation of the same representation for all the different mugs depicted in Figure 2 despite the wide variations in the precise positions of the cylinder and handle as well as the large differences in the sizes and aspect ratios of the mugs and of their individual parts (Biederman, 1987). An additional advantage of using a representational scheme with categorical relations is that, because such a scheme does not rely on determining distances precisely, greater toleration of noise is possible both in the input stimulus as well as in the recognition system itself than in systems positing the specification of precise distances among primitive elements. Further, as a viewer's perspective on an object changes, the size, position, and orientation in depth of the object can all change. However, a shape recognition system using categorical relations allows immediate recognition despite such viewpoint changes (e.g., the mug remains a "cylinder to the side of a curved cylinder" regardless of the input size, position or orientation provided that both the cylinder and curved cylinder remain in view). In contrast, many theories postulating that precise distances among primitives are represented must also posit time-consuming alignment procedures (e.g., Ullman, 1989) in which the size, position, and orientation of the input are normalized to some standard set of values prior to recognition.
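The tolerance to changes in size and position claimed for categorical relations can be illustrated with a small sketch. It assumes a hypothetical two-part "mug" with invented part positions, and it covers only uniform scaling and translation in the image plane; rotation in depth is not modelled here.

```python
def sign(value):
    """Return -1, 0 or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def directions(primitives):
    """Categorical code: (horizontal, vertical) direction for each ordered pair."""
    names = sorted(primitives)
    return {
        (p, q): (
            sign(primitives[p][0] - primitives[q][0]),
            sign(primitives[p][1] - primitives[q][1]),
        )
        for p in names
        for q in names
        if p != q
    }


def transform(primitives, scale=1.0, dx=0.0, dy=0.0):
    """Uniformly scale and translate every primitive (a size and position change)."""
    return {
        name: (x * scale + dx, y * scale + dy)
        for name, (x, y) in primitives.items()
    }


# Hypothetical mug: centres of the body cylinder and the curved handle.
mug = {"cylinder": (0.0, 0.0), "handle": (2.0, 0.0)}
smaller_and_shifted = transform(mug, scale=0.4, dx=7.0, dy=-3.0)

print(mug == smaller_and_shifted)                          # False
print(directions(mug) == directions(smaller_and_shifted))  # True
```

The coordinates of the two versions differ, yet the directional code is identical, so no normalization step is needed before the two descriptions can be compared.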
Because representations using categorical relations among the primitives allow such easy generalization over variations in both individual instances of an object and in viewing conditions, they have been most popular with theorists (e.g., Biederman, 1987; Biederman & Hummel, 1992) seeking to account for basic level object recognition (i.e., our ability to identify a particular object as a "mug" or a "suitcase" or a "human face"). In addition to coding the spatial positions of primitives categorically, such theories also posit very general primitives (such as Biederman's (1987) "geons" which are a small set of geometric primitives) so that recognition can occur despite wide variations in the sizes and aspect ratios of the parts of an object. Note, however, that because of their capacity to generalize across wide stimulus variations, such theories would be inadequate to account for our ability to recognize individual faces. Just as all the different mugs in Figure 2 would activate exactly the same structural description, different instances of human faces would also activate exactly the same representation in all extant structural description theories, making such a representation inadequate to distinguish between individual faces. Of course, a structural description theory with a richer set of primitives than those posited by current theorists (e.g., one that uses local fourier components as primitives) would be able to capture the variations needed to distinguish among human faces while still coding the positions of those primitives categorically.
Co-ordinate relations. In contrast to categorical relations, it is also possible to specify the spatial positions of the primitives in a shape representation using the co-ordinates of each of the primitives relative to a fixed reference point (or set of fixed reference points). Such a representation is illustrated in the right side of Figure 1 and is analogous to laying graph paper over the stimulus and specifying the vertical and horizontal co-ordinates of each of the primitives relative to the origin of the graph. A theory positing co-ordinate relations for the primitives might describe the spatial position of the primitive corresponding to the left eye in the following manner: the left eye is 5 units above the reference point and 3.5 units to the left of the reference point. Note that, in contrast to the categorical representations discussed earlier, the precise distances of the primitives from the reference point (or set of reference points) are specified in a co-ordinate representation.
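A co-ordinate description of the same illustrative face might be sketched as follows. The raw positions, the location of the reference point, and the mismatch measure (summed Euclidean displacement) are all assumptions made for illustration; the theories cited above differ in the primitives and metrics they actually use.

```python
import math

# Hypothetical raw image positions and an assumed reference point.
REFERENCE = (10.0, 10.0)
FACE_A = {
    "left_eye": (6.5, 15.0),
    "right_eye": (13.5, 15.0),
    "nose": (10.0, 12.0),
    "mouth": (10.0, 10.0),
}
# A second, invented face whose left eye sits slightly higher.
FACE_B = dict(FACE_A, left_eye=(6.5, 15.3))


def coordinate_description(primitives, reference=REFERENCE):
    """Offset (dx, dy) of each primitive from the reference point."""
    rx, ry = reference
    return {name: (x - rx, y - ry) for name, (x, y) in primitives.items()}


def mismatch(desc_a, desc_b):
    """Summed Euclidean displacement between two coordinate descriptions."""
    return sum(math.dist(desc_a[name], desc_b[name]) for name in desc_a)


desc_a = coordinate_description(FACE_A)
desc_b = coordinate_description(FACE_B)
print(desc_a["left_eye"])        # (-3.5, 5.0): 3.5 units left, 5 units above
print(mismatch(desc_a, desc_b))  # small but non-zero
```

Unlike a purely directional code, this description registers the small difference between the two faces, which is the property that makes it a candidate for distinguishing individuals within a basic level class.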
Compared with the categorical representations of shape discussed earlier, co-ordinate representations cannot be generalized as easily to all the variations among the individual members of a basic level class of objects and generally require more intensive computations in order to overcome changes in the size, position, and orientation in depth of the input stimulus (Cooper, Biederman, & Hummel, 1992; Hummel & Stankiewicz, 1996). However, the great virtue of a co-ordinate shape representation is its ability to capture small variations among stimuli that cannot be captured in a categorical shape representation. As such, a co-ordinate representation would be well suited to distinguish among different types of stimuli within a basic level class. For the purposes of this paper, the most interesting advantage of a co-ordinate representation is that it would be able to distinguish between the faces of different people.
The strengths and weaknesses of categorical and co-ordinate shape representations suggest that the sort of representation used may depend upon the specific demands of the task that is being performed (Hummel & Stankiewicz, 1996). Specifically, categorical shape representations would likely be very good for performing basic level object recognition in which highly disparate stimuli must be grouped into the same category while co-ordinate shape representations would be well suited for performing face identification (or, more generally, for discriminations within a basic level class of objects) in which highly similar stimuli must be distinguished from one another. The central hypothesis being tested in this paper is whether a representation using categorical relations might be used for basic level object recognition while a representation using co-ordinate relations might be used for face identification.
First and Second Order Relations
Diamond and Carey (1986) also proposed that the coding of spatial relations for different recognition tasks might change depending on the demands of the task (see also Carey, 1992; Rhodes, Brake, & Atkinson, 1993; Tanaka & Farah, 1991). Diamond & Carey (1986) proposed that recognition tasks in which distinctions must be made among stimuli that have similar features, but which, in their terms, do not "share the same configuration" (such as two different landscapes) can be distinguished by first order relations which are "the spatial relations among similar parts" (p. 110). An example of first order relations would be "the distance between a foreground rock and a background tree" (Diamond & Carey, 1986, p. 110) in a landscape scene. Recognition tasks among stimuli that "share the same configuration" (such as distinguishing between the faces of two different people) use second order relations which are "distinctive relations among the elements that define the shared configuration" (Diamond & Carey, 1986, p. 110).
Diamond and Carey's (1986) concepts of first order and second order relations bear some similarity to the categorical and co-ordinate relations being tested in this paper. Although Diamond and Carey (1986) never mention exactly how first order relations are coded in their paper (e.g., categorical coding is never mentioned), one can infer that first order relations (like categorical relations) could be somewhat less metrically specific than second order relations (which, like co-ordinate relations, have greater metric specificity). First order relations and categorical relations differ, however, in that first order relations express a distance between the primitives in a stimulus. Diamond & Carey's (1986) example of first order relations is "the distance between a foreground rock and a background tree" (emphasis added) in a landscape (p. 110). In contrast, categorical relations theories posit only that the direction ("above", "below", and "side of") between the primitives is specified (and not the distance). The computational advantage of specifying only the direction of a spatial relationship is that although the distance between the primitives in the retinal projection of an object will change depending on how close or far the viewer is from the object, the direction of the relationship between the primitives will remain constant. Thus, Diamond & Carey's (1986) theory predicts that changes in the distance between the primitives in a stimulus would be more disruptive to recognition than changes in the direction while categorical theories make the opposite prediction.
A second difference between Diamond & Carey's (1986) first order/second order relations scheme and categorical/co-ordinate relations schemes is that each theory makes different predictions about the sort of stimuli that will activate the same representations in memory. Diamond & Carey (1986) propose that first order relations are used to distinguish stimuli that do not "share a configuration" while second order relations are used to distinguish stimuli that "share a configuration." Critical to the first order/second order relations theory, then, is a definition of when two stimuli share a "configuration". Diamond & Carey (1986) say that different human faces all share a configuration because "corresponding points may be identified on any two faces, and the faces...may be averaged. The resulting figure is also recognizable as a face. This...is what is meant by sharing the same configuration" (p. 110). As Diamond and Carey (1986) point out, these criteria for establishing a configuration are the same as Rosch's (1978) superimposition test. In contrast, categorical relations schemes (e.g., Biederman, 1987) posit that two stimuli will share a configuration only if they share the same primitives (e.g., "geons" in Biederman, 1987) and if their primitives have the same "above", "below", and "side of" relationships to one another. Many stimuli that would share the same configuration by Diamond and Carey's (1986) criteria would not share the same configuration in categorical relations theories. Experiment 2 in the current paper will provide a critical test of whether it is Rosch's (1978) superimposition test (as Diamond & Carey (1986) would predict) or whether it is the "above", "below", and "side of" relationships between the features of different stimuli (as categorical relations theories would predict) that define a configuration.
The Logic of the Experiments
In the current set of experiments, we tested whether the way of coding relations in the representations for face identification and basic level object recognition is qualitatively different using altered photographs of famous people as stimuli. Specifically, we altered face photographs by moving either one eye up from its normal position on the face or by moving both eyes up from their normal position on the face while preserving the relations between them. Examples of the manipulations used in the experiment can be seen in Figure 3 (photographs (not drawings) were used in the actual experiments).
The key element of interest about the eye manipulations illustrated in Figure 3 is that the two types of shape representation, categorical and co-ordinate, make different predictions about which sort of eye movement should be most disruptive to performance. Note that in the Two Eyes Moved face, the categorical relationships between the elements of the face have been preserved (i.e., the left eye is directly to the side of the right eye and both eyes are above and to the side of both the nose and mouth). In contrast, in the One Eye Moved face, the categorical relations have been more greatly disrupted, with one of the eyes above, as well as to the side of, the other. Thus, theories positing categorical relations among primitives would predict that the Two Eyes Moved face should be more easily recognized than the One Eye Moved face, because the Two Eyes Moved face preserves the categorical relations of the original face. In contrast, less overall disruption of the locations of the elements of the face has occurred in the One Eye Moved condition than in the Two Eyes Moved condition as more of the elements of the One Eye Moved face are at their appropriate co-ordinates than in the Two Eyes Moved face. Thus, a recognition process in which shape is represented using co-ordinate relations would be able to recognize the One Eye Moved face more readily than the Two Eyes Moved face.
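The opposing predictions can be checked on a toy face. The sketch below assumes invented positions for the four illustrative primitives, an arbitrary upward shift of 1.5 units, a count of changed pairwise directions as the measure of categorical disruption, and summed displacement as the measure of co-ordinate disruption. None of these choices is taken from a specific theory; they are only meant to show that the two manipulations are ranked in opposite orders.

```python
import math
from itertools import combinations

# Hypothetical positions (y increases upward); values are invented.
FACE = {
    "left_eye": (-3.5, 5.0),
    "right_eye": (3.5, 5.0),
    "nose": (0.0, 2.0),
    "mouth": (0.0, 0.0),
}
SHIFT = 1.5  # assumed upward displacement, arbitrary units


def move_up(face, names, shift=SHIFT):
    """Return a copy of face with the named primitives moved upward."""
    return {
        name: (x, y + shift) if name in names else (x, y)
        for name, (x, y) in face.items()
    }


def direction(a, b):
    """Sign of the horizontal and vertical offsets from b to a."""
    def sign(value):
        return (value > 0) - (value < 0)
    return (sign(a[0] - b[0]), sign(a[1] - b[1]))


def categorical_changes(original, altered):
    """Number of primitive pairs whose directional relation has changed."""
    return sum(
        direction(original[p], original[q]) != direction(altered[p], altered[q])
        for p, q in combinations(sorted(original), 2)
    )


def coordinate_change(original, altered):
    """Total displacement of the primitives from their original positions."""
    return sum(math.dist(original[name], altered[name]) for name in original)


one_eye = move_up(FACE, {"left_eye"})
two_eyes = move_up(FACE, {"left_eye", "right_eye"})

for label, face in [("One Eye Moved", one_eye), ("Two Eyes Moved", two_eyes)]:
    print(label, categorical_changes(FACE, face), coordinate_change(FACE, face))
# One Eye Moved 1 1.5
# Two Eyes Moved 0 3.0
```

Under these assumptions the One Eye Moved face disturbs one categorical relation but displaces the primitives least, whereas the Two Eyes Moved face disturbs no categorical relation but doubles the total displacement.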
Given the strengths and weaknesses of the two methods of coding relations discussed earlier, we are predicting that the representation used for basic level object recognition will use categorical relations while that for identifying a particular face will use co-ordinate relations. In order to test this hypothesis, two different experimental tasks were conducted using the face stimuli with altered eye positions; in Experiment 1 participants performed a face identification task in which they had to determine the identity of a particular face, and in Experiment 2 participants performed an object recognition task in which they had to decide whether an object was or was not a human face (regardless of its identity). Because we hypothesize that face identification might use co-ordinate relations, we predicted that the "Whose face is it?" task would be better performed with the One Eye Moved faces (which have a small disruption in co-ordinate relations) than with the Two Eyes Moved faces (which have greater disruption of co-ordinate relations). On the other hand, because we hypothesize that basic level object recognition uses categorical relations, we predicted that the "Is this a face?" task would be more easily accomplished with the Two Eyes Moved faces (in which categorical relations are preserved) compared with the One Eye Moved faces (in which categorical relations have been disrupted).
Experiment 1: Whose Face Is It?
The purpose of both variations (A and B) of Experiment 1 was to determine whether face identification ("Whose face is it?") is disrupted more by moving one of the eyes of the face from its normal position or by moving both of the eyes of the face from their normal positions while keeping the relations between them constant. Shape recognition theories positing categorical relations would predict that the Two Eyes Moved faces should be easier to recognize in this task while theories positing co-ordinate relations would predict that the One Eye Moved faces should be recognized more readily.
Experiment 1A tested whether moving one eye or moving two eyes would be more disruptive to recognition using a verification task in which participants are presented with a famous person's face (which sometimes has its eyes moved), followed by a famous person's name. The participant's task was to decide, as quickly as possible, whether the name and the face matched. This verification procedure was used to assess face identification because the task required the participant to access the identity of the pictured face, but did not require participants to generate the names of the faces themselves. Previous research has established that generating names for faces is a difficult task (Cohen, 1990; Cohen & Faulkner, 1986; Young, Hay, & Ellis, 1985), and a pilot experiment in which participants were required to generate names for the faces showed that participant performance on the naming task was highly variable and error prone. The current method produced much cleaner data.
The participants were 36 undergraduate students in the subject pool at Iowa State University naive to the purpose of the current experiments. They were all native English speakers who reported normal or corrected-to-normal vision and received course credit for their participation. None participated in any of the other experiments reported here.
The experiment was controlled by a Macintosh Quadra 800 computer using Picture Perception Lab software (Kohlmeyer, 1992). Participants responded using a two button response box attached to a National Instruments NB-DIO-24 timing board that gave ±0.5 ms. response time accuracy. Stimuli were presented on an Apple 17 inch color monitor with a resolution of 832 x 624 pixels and a vertical refresh rate of 75 Hz.
Stimuli for the experiment consisted of color pictures of 120 different people's faces: 60 target faces and 60 distracter faces. The faces were scanned from color photographs at a resolution of 72 dots per inch. The faces were scanned so that their maximum extent would fit into a 512 x 512 pixel box. Given the conditions of experimental presentation, the faces would fit in a box with 8.3° x 8.3° of visual angle. The photographs were all full frontal pictures (no profile shots) in which both eyes, the nose, and the mouth were clearly visible, and each had enough of the forehead showing that the eye manipulations used in the experiment could be made without altering the face's hairline. All the faces were of people whom the authors judged would be familiar to undergraduates from Iowa. A full list of the faces used in the experiment may be found in the Appendix.
Alterations were made to the face pictures in order to create the stimuli used in the experiment. For each of the 60 target faces, three versions were created: a No Eyes Moved version (the original picture), a One Eye Moved version, in which one of the eyes was moved vertically from its original position, and a Two Eyes Moved version in which both of the eyes were moved vertically from their original positions, but the relations between them were maintained (see Figure 4 for examples of these manipulations). The eyes were moved using a computer program for digital image manipulation (Adobe Photoshop). For each eye that was to be moved, an area including the eye, the eyebrow, and 0.5 cm below the eye was cut out of the image and moved 1.5 cm vertically upward (0.7° of visual angle). The resulting blank area caused by the displacement of the eye was replaced by copying skin pixels from the surrounding area into the blank area such that the resulting face appeared as seamless as possible. The face pictures were chosen so that enough of the forehead was visible that the moved eye area would not contact the face's hair. The nose was left unaltered by these manipulations.
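The relation between a displacement on the screen and its visual angle follows the standard formula, angle = 2 * atan(size / (2 * distance)). The sketch below applies that formula to the reported 1.5 cm displacement. The viewing distance is not given in this section, so the value computed here is only an inference from the rounded figures of 1.5 cm and 0.7 of visual angle, and the function names are hypothetical.

```python
import math


def visual_angle_deg(size_cm, distance_cm):
    """Visual angle (degrees) subtended by an extent viewed head-on."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))


def viewing_distance_cm(size_cm, angle_deg):
    """Viewing distance (cm) at which an extent subtends the given angle."""
    return size_cm / (2 * math.tan(math.radians(angle_deg) / 2))


# Inferred, not reported: distance implied by a 1.5 cm shift subtending 0.7 deg.
implied_distance = viewing_distance_cm(1.5, 0.7)
print(round(implied_distance))  # on the order of 120 cm

# Round trip: at that distance the shift subtends the stated angle again.
print(visual_angle_deg(1.5, implied_distance))
```

Because both input values are rounded, the implied distance should be read as approximate; a reported angle anywhere between 0.65 and 0.75 would shift it by roughly ten centimetres in either direction.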
Of the 60 target faces, 30 were chosen randomly to have their left eye moved in the One Eye Moved condition and the rest had their right eye moved. For the 60 distracter faces, 20 were not altered (No Eyes Moved), 20 had one eye moved, and 20 had both eyes moved. For the 20 distracter faces with one eye moved, ten had the left eye moved and ten had the right eye moved.
Also used as stimuli in the experiment were 120 famous names. The names were written in 72 point Palatino bold font. Each name had the first name centered above the last name. When presented on the computer screen, the names could fit in a box 5.96° horizontal x 2.39° vertical of visual angle.
A masking stimulus was used in the experiment which consisted of numerous eyes, noses, and mouths cut from color photographs and overlapped. The masking stimulus was a square of 8.3° x 8.3° of visual angle.
Presentation of stimuli in the experiment was self-paced. Participants would click a mouse button to begin each trial. After clicking the mouse button, a fixation cue would be presented on the screen for 504 ms, followed by one of the face pictures for 140 ms, followed by the name of a famous person for 1120 ms, followed by the masking stimulus for 504 ms.
The participant was instructed that his or her task was to decide whether or not the famous name was the correct name for the picture that preceded it, ignoring any possible movement of the picture's eyes. The participant was instructed to press the left button on the response box if the name and the face matched and the right button if the name and the face did not match. Participants were informed that their response times to perform this task were being recorded and that they should try to respond as quickly as possible while attempting to get 90% of the trials correct. Participants were shown several example faces illustrating the sorts of eye positions that the stimuli might show.
During the experiment, each participant saw 120 trials: 60 positive trials in which the names and the faces matched (using the target faces) and 60 negative trials in which the names and the faces did not match (using the distracter faces). When matching improper names to faces for the negative trials, care was taken to use the names of famous people who were the same sex and race as the picture with which the name was matched. An attempt was made to match the hair color of the person whose name was presented with the face that was presented in the distracter trials, but this criterion could not always be met. See the Appendix for a list showing how names were matched with faces in the distracter trials.
The order of presentation of the stimuli was chosen randomly. Half the participants saw the stimuli in one order and the other half saw the stimuli in reversed order. All participants saw exactly the same negative (distracter) trials. For both the positive (target) trials and negative (distracter) trials, each participant saw 20 faces with no eyes moved, 20 with one eye moved and 20 with both eyes moved. Stimulus presentation was balanced such that across a set of three participants, each target face appeared once and only once in each of the three eye movement conditions. Every target face appeared equally often in all three conditions across the 36 participants.
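One assignment scheme that satisfies the balancing constraints described above is a simple rotation of conditions across participants. The sketch below is an assumption about how such balancing could be implemented; the text does not say which scheme was actually used, and the indexing of faces and participants is hypothetical.

```python
from collections import Counter

CONDITIONS = ("No Eyes Moved", "One Eye Moved", "Two Eyes Moved")
N_TARGET_FACES = 60
N_PARTICIPANTS = 36


def condition_for(participant, face):
    """Rotate the three conditions over faces, shifting by one per participant."""
    return CONDITIONS[(face + participant) % len(CONDITIONS)]


def assignment(participant):
    """Map every target face index to its condition for one participant."""
    return {face: condition_for(participant, face) for face in range(N_TARGET_FACES)}


# Each participant sees 20 target faces in each condition.
print(Counter(assignment(0).values()))

# Across all 36 participants, each face appears 12 times in each condition.
face_zero = Counter(condition_for(p, 0) for p in range(N_PARTICIPANTS))
print(face_zero)
```

With this rotation, any three consecutive participants (0 to 2, 3 to 5, and so on) jointly present each target face exactly once in each condition, so face-specific difficulty is spread evenly over the three eye movement conditions.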
Participants were presented with 20 practice trials (ten positive and ten negative) prior to the experiment. The practice trials used faces that did not appear in the experiment proper, but used the same sorts of eye movement manipulations.
The results of Experiment 1A are shown in Figures 5 and 6. The distracter trials in Experiment 1A had a mean response time of 845 ms. with 21.3% errors.
All statistical hypotheses in this paper were tested with a two-tailed α level of .05. A One Way Within Participants Analysis of Variance was conducted on the response time and error rate data from Experiment 1A with Eye Movement (No Eyes Moved vs. One Eye Moved vs. Two Eyes Moved) as the single independent variable. The data were analyzed both by participants and by faces.
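For readers who wish to see how a one-way within-participants analysis of this kind is computed, a minimal sketch follows. It is not the analysis code used for the experiments; the function names and the toy scores are hypothetical and serve only to show the partition of variance and the resulting degrees of freedom.

```python
def degrees_of_freedom(n_units, n_conditions):
    """Effect and error df for a one-way within-units design."""
    return n_conditions - 1, (n_conditions - 1) * (n_units - 1)


def rm_anova(data):
    """One-way within-participants ANOVA.

    data: one row per participant (or per face), one score per condition.
    Returns (F, df_effect, df_error, MSE).
    """
    n = len(data)       # number of participants (or faces)
    k = len(data[0])    # number of conditions
    grand = sum(sum(row) for row in data) / (n * k)
    cond_means = [sum(row[j] for row in data) / n for j in range(k)]
    unit_means = [sum(row) / k for row in data]
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_unit = k * sum((m - grand) ** 2 for m in unit_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    # Error term: variability left after removing condition and unit effects.
    ss_error = ss_total - ss_cond - ss_unit
    df_effect, df_error = degrees_of_freedom(n, k)
    mse = ss_error / df_error
    f_ratio = (ss_cond / df_effect) / mse
    return f_ratio, df_effect, df_error, mse


# Hypothetical toy scores (3 units x 3 conditions), not data from the experiment.
toy = [[1, 2, 3], [2, 3, 5], [3, 4, 7]]
f_ratio, df_effect, df_error, mse = rm_anova(toy)
print(f_ratio, df_effect, df_error, mse)
```

With 36 participants and three conditions, `degrees_of_freedom(36, 3)` gives (2, 70); with 60 target faces it gives (2, 118).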
For the response time data, a reliable overall F ratio was obtained in both the by participant analysis (F(2, 70) = 5.667, p < .01, MSE = 2240.61) as well as the by face analysis (F(2, 118) = 5.396, p < .01, MSE = 4233.30). The main comparison of interest in the data was whether the One Eye Moved condition would produce reliably different response times than the Two Eyes Moved condition. Planned comparisons between the One Eye Moved and Two Eyes Moved conditions in the response time data showed that the One Eye Moved condition produced reliably faster response times than the Two Eyes Moved condition both when analyzed by participant (F(1, 70) = 6.524, p < .02) and by face (F(1, 118) = 5.916, p < .02).
For the error rate data, a reliable overall F ratio was obtained both in the analysis by participant (F(2, 70) = 10.006, p < .0005, MSE = .005) and in the analysis by face (F(2, 118) = 8.732, p < .0005, MSE = .009). Planned comparisons revealed that there were reliably more errors in the Two Eyes Moved condition than in the One Eye Moved condition both when the analysis was done by participant (F(1, 70) = 12.171, p < .001) and by face (F(1, 118) = 10.621, p < .005).
The results showed that the One Eye Moved condition produced reliably fewer errors and faster response times than did the Two Eyes Moved condition. Because the identification performance was superior for the One Eye Moved faces, the results of Experiment 1A are consistent with theories positing that the representation underlying face identification uses co-ordinate relations. The results are inconsistent with all current theories of shape recognition positing categorical relations.
As a converging operation, and in order to make the results of the "Whose face is it?" task more directly comparable with those in the "Is it a face?" task presented in Experiment 2, a variation of the task presented in Experiment 1A was conducted in Experiment 1B. In this version of the task, participants were simultaneously provided examples of One Eye Moved and Two Eyes Moved versions of famous faces and were simply asked which one looks more like the person depicted (e.g., "Which one of these two pictures looks more like Michael Jackson?").
The participants were 20 undergraduate students from the subject pool at Iowa State University naive to the purpose of the current experiments. They were all native English speakers who reported normal or corrected-to-normal vision and received course credit for their participation. None participated in any of the other experiments reported here.
The stimuli for the experiment consisted of both the One Eye Moved and Two Eyes Moved versions of the faces of the 12 people the authors judged to be the most famous of the target faces used in Experiment 1A (Bill Clinton, Bill Cosby, Tom Cruise, Princess Diana, Harrison Ford, Mel Gibson, Michael Jackson, Madonna, Demi Moore, Eddie Murphy, Jerry Seinfeld, and Robin Williams). The twelve most famous faces were selected in order to use only faces with whom the participants in the experiment would be familiar. The One Eye Moved and Two Eyes Moved versions of each of the twelve faces were printed side by side (in 256 shade grayscale) on sheets of paper. Each sheet used in the experiment would begin with the question "Which picture looks more like X?" (where X was the name of the person depicted) followed by the One Eye Moved and Two Eyes Moved versions of the person named in the question presented side by side (the order of presentation of the faces was balanced for each participant so that half the time, the One Eye Moved version was presented on the left, and half the time it was presented on the right). Blanks were provided on each sheet above the two pictures where the participant could put an "X" indicating which of the two faces most resembled the person named.
Each participant was given a packet containing sheets depicting all 12 of the famous people used in the study presented in the manner described in the Apparatus subsection. The participants were instructed that they were going to see distorted pictures of famous people, and that for each person, they were to decide which of the two pictures presented most resembled the person named in the question above the pictures. Participants were instructed to place an "X" in the blank above the picture that they thought most resembled the person named in the question.
The results showed that of the 20 participants, 15 selected the One Eye Moved stimuli as being the best depiction of the person named for a majority of the faces (i.e., more than 6 of the 12 faces). The remaining five participants selected the Two Eyes Moved stimuli as the best depiction of the person named for a majority of the faces. A chi-square test showed that reliably more participants selected the One Eye Moved stimuli for the majority of the faces than selected the Two Eyes Moved stimuli (χ²(1) = 5.00, p < .05). Over all the comparisons (20 participants × 12 comparisons each = 240 total comparisons), the One Eye Moved faces were selected as the best depiction 172 times (71.7% of all responses) and the Two Eyes Moved faces were selected as the best depiction 68 times (28.3% of all responses). The mean percentage of times the One Eye Moved faces were selected as the best depiction was reliably greater than 50% (t(19) = 2.91, p < .01).
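The chi-square value reported above can be checked by hand. The sketch below assumes the test was a goodness-of-fit test against an equal (10 vs. 10) split of participants, which reproduces the reported statistic; the function names are introduced here for illustration only.

```python
import math


def chi_square_equal_split(counts):
    """Goodness-of-fit chi-square against equal expected frequencies."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


def p_value_df1(chi2):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# 15 participants favored One Eye Moved, 5 favored Two Eyes Moved.
chi2 = chi_square_equal_split([15, 5])
p = p_value_df1(chi2)

# Share of all 240 pairwise choices won by the One Eye Moved faces.
share_one_eye = 172 / 240

print(chi2, p, round(100 * share_one_eye, 1))
```

The statistic comes out at 5.00 with p between .02 and .03, consistent with the reported p < .05.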
The results of Experiment 1B provide converging evidence that in a "Whose face is it?" task, the One Eye Moved faces are perceived to be better depictions than are the Two Eyes Moved faces. The results thus replicate the principal findings of Experiment 1A using a converging operation, again suggesting that the representation used to recognize a particular person's face codes the positions of the primitives in a co-ordinate manner.
Some critics of this interpretation of the results of Experiment 1 have argued that participants may have adopted a strategy of simply ignoring the eye that was out of place in the One Eye Moved stimuli and only processing the features that were in their appropriate positions. Along the same lines, perhaps participants mentally "moved" the out of place eye down in the One Eye Moved stimuli so that it was directly across from the eye that was in the correct position in order to produce a face that had the eyes in the same positions as the original face. Because participants may have used one of these strategies to perform the task, these critics have argued that the results of Experiment 1 do not support the hypothesis that co-ordinate relations are used during face identification.
We would agree that participants may have used one or both of these strategies to perform the task in Experiment 1. However, both of these strategies would enhance recognition if and only if a co-ordinate representation underlies the face identification process as both strategies assume that identification will be enhanced if the face's features are at their proper co-ordinates (an assumption that is not shared by categorical relations theories). If participants actually used these strategies to improve their performance, rather than falsifying the hypothesis that the face identification process uses co-ordinate relations, it would be excellent evidence in favor of that hypothesis.
The results of Experiment 1 falsify all extant categorical relations theories (all of which use the categories "above", "below", and "side of" to code the primitives' spatial positions) as accounts of the way in which spatial relations are represented during face identification because all would predict that the Two Eyes Moved stimuli would be recognized more readily than the One Eye Moved stimuli. The categories "above", "below", and "side of" have been posited by categorical theorists because they appear to be both necessary and sufficient to account for human basic level object recognition performance (Biederman, 1987; Hummel & Biederman, 1992). Although it is possible that a different set of categorical spatial relations (i.e., other than "above", "below", and "side of") could be devised which could account for the results of Experiment 1, it is difficult to imagine what they would be. In order to account for the Experiment 1 results, the set of categorical relations would have to give the same structural description to the No Eyes Moved and One Eye Moved stimuli, but a different structural description to the Two Eyes Moved stimuli. Perhaps more importantly, a post hoc theory based on such relations would make the same predictions as the (more parsimonious) co-ordinate based account, which predicted the findings a priori. The only existing theories that can account for the results of Experiment 1 are co-ordinate relations theories.
Experiment 2: Is it a Face?
The purpose of Experiment 2 was to determine if the same pattern of results achieved in Experiment 1 (Two Eyes Moved more disruptive than One Eye Moved) would still be obtained if the task were switched from a face identification task (Whose face is it?) to a basic level object recognition task (Is this object a human face?). It is quite possible to recognize a stimulus as being a human face even if we fail to recognize the identity of the person whose face we are seeing. As pointed out in the Introduction, current structural description models of basic level object recognition (e.g., Biederman, 1987, Hummel & Biederman, 1992) would be unable to distinguish among different human faces (because all human faces would activate the same representation), and would therefore not be useful for performing face identification. However, such models would be useful for distinguishing human faces from other basic level objects. Also, as pointed out in the Introduction, a representation based on co-ordinate relations might not generalize across variations among instances of a basic level category as easily as would a categorical representation.
Experiment 2 will also provide a critical test of whether the representation used for basic level object recognition is defined by Rosch's (1978) superimposition test (as predicted by Diamond & Carey's (1986) first and second order relations theory) or whether it is the "above", "below", and "side of" relations between parts that define the representation that is activated (as categorical relations theories would predict). Recall from the Introduction that in Diamond and Carey's (1986) theory, objects will share a configuration if a) corresponding points may be identified on the two objects and b) if, when the corresponding points are averaged, the resulting figure is recognizable as an example of the object. Note that, under these criteria, the No Eyes Moved, One Eye Moved, and Two Eyes Moved faces share exactly the same configuration (i.e., corresponding points can be identified on each type of stimulus and, if averaged, they would produce a stimulus recognizable as a human face). As such, Diamond & Carey's (1986) theory predicts that One Eye Moved and Two Eyes Moved stimuli should be equally recognizable as "human faces" (because both are equally good examples of the "human face" configuration under their criteria). In contrast, categorical relations theories predict that the One Eye Moved stimuli should be more difficult to recognize as "human faces" than the Two Eyes Moved stimuli because the Two Eyes Moved stimuli preserve the "above", "below", and "side of" relations between a human face's primitives while the One Eye Moved stimuli do not.
The purpose of Experiment 2A was to determine whether moving one of the eyes of a face or both eyes together (in the same manner as in Experiment 1) would be more disruptive to recognition when participants were required to distinguish whether an object was or was not a human face.
The greatest difficulty with conducting an experiment of this sort is selecting appropriate distracter stimuli for the task. The most obvious method would be to use objects other than faces as distracters (e.g., table, telephone, car, etc.). The difficulty with using this method is that deciding whether a stimulus is a human face or another common basic level object is an extremely easy task that can be accomplished simply by looking for a distinctive feature (e.g., Does the stimulus have hair? Does it have a nose?). Participants can easily perform such a task while not attending to the eyes of the faces at all, and response times and error rates on the task are likely to exhibit a floor effect. A pilot study with 40 participants using typical basic level objects as distracters confirmed that performance on this task was at floor with extremely fast response times (mean response times less than 400 ms) and low error rates (overall less than 3%).
Given that using other basic level objects as distracters is not feasible, what other sorts of stimuli might be used to assess the type of representation used to recognize an object as being a human face? Any set of distracters that could adequately test whether co-ordinate or categorical relations are used when recognizing whether a stimulus is a human face must have the following attributes: a) they must have features similar to faces so that participants cannot use the presence of a simple feature (e.g., the presence of hair or a nose) to perform the task but must rather access the memory representation normally used to recognize an object as being a "human face," and b) they would have to activate a different structural description than the faces so that if such a representation exists, participants would be able to use it to perform the task.
The problem of finding appropriate distracters has been encountered by students of face recognition before, and we decided to use a variation of the method successfully used by Valentine and Bruce (1986) to examine basic level recognition of faces. The distracters (the non-face stimuli) in this experiment were constructed by taking the Original, One Eye Moved, and Two Eyes Moved face pictures and removing either one of the eyes, the nose, or the mouth, and replacing it with another feature from the same face. For example, some of the distracter objects would be faces that had two noses and no mouth, or three eyes and no nose, or two noses and only one eye (see Figure 7 for examples). This procedure was designed to lift performance off the floor and allow a comparison of whether One Eye Moved or Two Eyes Moved is more disruptive to basic level recognition of faces.
Although the distracter stimuli used in this experiment are not objects that people normally encounter, would using such distracters allow us to assess the nature of the representation normally used during basic level object recognition? The most parsimonious explanation for how participants would perform the task in Experiment 2A is to posit, as all shape recognition theories do, that there is a memory representation for recognizing a "human face" whose activity level increases to the extent that an input stimulus matches the recognition system's description of a "human face." The target stimuli would necessarily have to activate that representation more strongly than would the distracter stimuli because although some of the target stimuli have the positions of their facial features altered relative to a normal face, the distracter stimuli have exactly the same sort of alterations in the features' positions but additionally have alterations in the primitives. Because of this, the activation level of the memory representation for recognizing a human face must be lower for the distracter stimuli than for the target stimuli (by any theory), and thus, the activation level of this representation can easily be used to determine whether a stimulus is a target or a distracter (i.e., if the activation level exceeds some threshold, then the stimulus must be a target). Therefore, the participants' response times to recognize the three types of target stimuli (No Eyes Moved, One Eye Moved, Two Eyes Moved) can be used to determine how strongly each activates the memory representation for recognizing "human face" at the basic level.
The participants were 30 undergraduate students in the subject pool at Iowa State University. They were all native English speakers who reported normal or corrected-to-normal vision, and received course credit for their participation. All were naive to the purpose of the current experiment, and none participated in any of the other experiments reported here.
The apparatus used for collecting data was the same as in Experiment 1A. The "Face" stimuli for the experiment consisted of the faces of 40 of the 60 people used in the other experiments using the No Eyes Moved, One Eye Moved and Two Eyes Moved versions from Experiment 1A. The 40 people whose faces were used in this experiment were chosen randomly from the larger set. Only 40 faces were used in order to reduce the number of distracter stimuli that had to be created for the experiment.
A total of 40 "Non-Face" (distracter) stimuli were created by copying either an eye, a nose, or a mouth from each of the 40 target faces over one of the other features in the face (i.e., features were only duplicated within a particular face, no mixing of features from different faces occurred), resulting in a non-face that had either an extra eye, nose, or mouth. Skin pixels around the duplicated feature were smoothed in order to create as seamless a picture as possible (see Figure 7 for examples). Of the 40 distracter faces, 13 were created from stimuli with the eyes in their original positions (i.e., No Eyes Moved), 14 were created from stimuli with one eye moved, and 13 were created from stimuli with both eyes moved. Therefore, the location of the features for the distracter stimuli matched, as closely as possible, the locations of the features in the target stimuli.
When altering the features in the distracter stimuli, the left eye, right eye, mouth and nose were all substituted for one another with the exception that the left and right eyes never replaced one another. Of the 40 distracter stimuli, the breakdown of feature replacement went as follows: the nose replaced the mouth in five stimuli, the nose replaced the left eye in four stimuli, the nose replaced the right eye in five stimuli, the mouth replaced the nose in five stimuli, the mouth replaced the left eye in three stimuli, the mouth replaced the right eye in five stimuli, the left eye replaced the mouth in four stimuli, the left eye replaced the nose in four stimuli, the right eye replaced the nose in one stimulus, and the right eye replaced the mouth in four stimuli.
Figure 7. Examples of three of the distracter stimuli (Ross Perot, Harrison Ford, and Barbra Streisand) used in Experiment 2A. Note that in each picture, one of the features of the face (either the eye, nose, or mouth) has been replaced by a different feature. The pictures used in the actual experiment were in color.
Presentation of stimuli in the experiment was paced by the participants. Participants would click a mouse button to begin each trial. After clicking the mouse, a fixation cue would be presented on the screen for 504 ms followed by either a picture of one of the faces or one of the non-faces for 133 ms followed by the masking stimulus for 504 ms.
The participant was instructed that his or her task was to decide whether the picture presented was a face or a non-face. Participants were told that the face stimuli might have the positions of their eyes altered, and were shown several example faces illustrating the sorts of eye position changes that stimuli in the experiment might undergo. The participant was instructed to press the left button on the response box if the object was a face and the right button if the object was a non-face. Participants were informed that their response times to perform this task were being recorded and that they should try to respond as quickly as possible while attempting to get 90% of the trials correct. Participants were shown several example faces and non-faces illustrating the sorts of alterations that might occur in the non-face stimuli.
During the experiment, each participant saw 240 trials: 120 face trials (for each of the 40 faces, each participant saw the No Eyes Moved, One Eye Moved and the Two Eyes Moved versions) and 120 non-face trials (each participant saw each of the 40 non-face distracter stimuli three times). The order of presentation of the stimuli was chosen at random. Half the participants saw the stimuli in one order and the other half saw the stimuli in reversed order.
Participants were presented with 20 practice trials (ten faces and ten non-faces) prior to the experiment on which no data were collected. The practice trials used faces and non-faces that did not appear in the experiment proper.
The results of Experiment 2A are shown in Figures 8 and 9. The distracter trials in Experiment 2A had a mean response time of 617 ms. with 7.0% errors.
A One Way Within Participants ANOVA was conducted on the response time and error rate data from Experiment 2A with Eye Movement (No Eyes Moved vs. One Eye Moved vs. Two Eyes Moved) as the single independent variable. The data were analyzed both by participants and by faces.
For the response time data, a reliable overall F ratio was obtained in both the by participant analysis (F(2, 58) = 87.95, p < .0001, MSE = 778.63) as well as the by face analysis (F(2, 78) = 46.67, p < .0001, MSE = 2288.54). The main comparison of interest in the data was whether the One Eye Moved condition would produce reliably different response times than the Two Eyes Moved condition. Planned comparisons between the One Eye Moved and Two Eyes Moved conditions in the response time data showed that the Two Eyes Moved condition produced reliably faster response times than the One Eye Moved condition both in the by participant analysis (F(1, 58) = 36.45, p < .0001) and in the by face analysis (F(1, 78) = 22.483, p < .0001).
For the error rate data, a reliable overall F ratio was obtained both in the analysis by participant (F(2, 58) = 28.69, p < .0001, MSE = .002) and in the analysis by face (F(2, 78) = 16.34, p < .0001, MSE = .005). Planned comparisons revealed that there were reliably more errors in the One Eye Moved condition than in the Two Eyes Moved condition both when the analysis was done by participant (F(1, 58) = 32.11, p < .0001) and by face (F(1, 78) = 18.29, p < .0001).
The results of Experiment 2A are clear and dramatic; when performing a basic level (Is it a face?) recognition task, moving one eye independently was much more disruptive to recognition than moving both eyes together. The results are in direct contrast to the results achieved in the face identification task in Experiment 1 (Whose face is it?) in which moving two eyes was always more disruptive than moving one eye, and strongly suggest that a categorical coding of relations is used during basic level recognition.
One possible alternative explanation for the results is that participants were performing a serial search of each stimulus in which they looked for an "extra" feature (i.e., an extra eye, nose, or mouth), and that, for some reason, having the eyes directly across from one another (as in the No Eyes Moved and Two Eyes Moved stimuli) is a configuration that can be searched more quickly than can the One Eye Moved stimuli. This hypothesis can be tested by examining the data from the distracter trials. If the configuration in which the uppermost features are directly across from one another is easier to search, then this configuration should also be easier to search in the distracter trials and should therefore show faster response times than the configuration in which the two uppermost features are not directly across from one another. In fact, distracter trials in which the uppermost features were directly across from one another (mean response time = 626 ms., error pct. = 6.62%) were not reliably different from those in which they were not (mean response time = 619 ms., error pct. = 7.78%; t(29) = 0.47, ns for response times; t(29) = 1.10, ns for error rate) falsifying this hypothesis as an explanation for the results.
A critic of Experiment 2A might argue that the task is artificial because in the real world people never have to distinguish among the sorts of stimuli used in the experiment, and thus participants may have adopted a "specialized strategy" of some sort (which presumably does not rely on the representation normally used during basic level recognition) for performing the task. There are several responses to such a position: First, such a position is unparsimonious because, as pointed out in the Introduction to this experiment, the memory representation normally used for recognizing an object as a human face could easily be used to perform this task. Therefore, positing a specialized strategy to account for the results is unnecessary. Second, simply stating that "some sort of specialized strategy was used" without providing an alternative account of how the task was performed fails to explain the huge effects of the eye movement manipulation that were observed in Experiment 2A. Third, and most importantly, Experiment 2B will provide converging evidence of the use of categorical coding during basic level recognition.
Experiment 2B attempted to replicate the results of Experiment 2A using a converging operation. Experiment 2B is in every way identical to Experiment 1B (the task in which participants were presented with the One Eye Moved and Two Eyes Moved pictures of a person and were asked to judge which looked most like the person depicted) except that participants were asked, for each pair of pictures they saw, "Which picture looks most like a human face?" rather than, for example, "Which picture looks most like Madonna?" as in Experiment 1B.
The participants were 20 undergraduate students in the subject pool at Iowa State University naive to the purpose of the current experiments. They were all native English speakers who reported normal or corrected-to-normal vision and received course credit for their participation. None participated in any of the other experiments reported here.
The stimuli were in every way identical to those used in Experiment 1B except that each sheet of stimuli used in the experiment would begin with the question "Which picture looks more like a human face?" rather than "Which picture looks more like X?" (where X was the person depicted) as in Experiment 1B.
The procedure was identical to that of Experiment 1B except that participants were instructed to place an "X" in the blank above the picture that they thought most resembled a human face.
The results showed that of the 20 participants, two selected the One Eye Moved stimuli as being the best depiction of a human face for a majority of the faces (i.e., more than 6 of the 12 faces). The remaining 18 participants selected the Two Eyes Moved stimuli as the best depiction of a human face for a majority of the faces. A chi-square test showed that reliably more participants selected the Two Eyes Moved stimuli for the majority of the faces than selected the One Eye Moved stimuli (χ²(1) = 12.80, p < .001). Over all the comparisons (20 participants × 12 comparisons each = 240 total comparisons), the Two Eyes Moved faces were selected as the best depiction 213 times (88.75% of all responses) and the One Eye Moved faces were selected as the best depiction 27 times (11.25% of all responses). The mean percentage of times the Two Eyes Moved faces were selected as the best depiction of a human face was reliably greater than 50% (t(19) = 7.44, p < .001).
The results of Experiment 2B show that the Two Eyes Moved stimuli were perceived to be better depictions of human faces than were the One Eye Moved stimuli, thus replicating the results of Experiment 2A. Note the striking contrast between the results of the "Which face looks most like X?" task of Experiment 1B in which 75% of participants chose the One Eye Moved faces as the best depiction, and the "Which looks most like a human face?" task of Experiment 2B in which 90% chose the Two Eyes Moved faces as best. The results of both versions of Experiment 2 suggest that basic level recognition of faces is accomplished using a categorical representation. The results also suggest that the configuration of an object is defined by the "above", "below", and "side of" relations between the features of an object (as categorical theories predict) rather than by Rosch's (1978) superimposition test (as Diamond & Carey (1986) predict).
Several critics have suggested that the One Eye Moved faces were difficult to recognize as faces in Experiment 2 because real people don't have one eye placed several inches above the other. Although this manipulation is admittedly outside the range of variation for human faces, the Two Eyes Moved faces are also well outside that range. No human face exists that has both eyes in the middle of the forehead as the Two Eyes Moved faces do (if you doubt this, imagine trying to find a pair of eyeglasses that would fit the Two Eyes Moved faces shown in Figure 4). Indeed, if anything, the Two Eyes Moved faces violate more anatomical constraints than do the One Eye Moved faces because the Two Eyes Moved faces have two elements at impossible locations while the One Eye Moved faces have only one. Since both the One Eye Moved and Two Eyes Moved faces are completely outside the range of human variation, why is it that when looking at the stimuli people get such a strong perception that the Two Eyes Moved stimuli are better depictions of human faces than the One Eye Moved stimuli? The likely reason is that the basic level representation used for recognizing a human face represents the features' positions in relative terms (i.e., categorically) rather than in absolute terms (i.e., co-ordinately). The fact that the Two Eyes Moved faces look less "odd" than the One Eye Moved faces is a testament to the fact that the basic level recognition system, unlike the system used for face identification, is fairly insensitive to metric variations that do not lead to categorical changes among the positions of the primitives.
This strong sense of perceptual similarity between objects that share the same categorical relations among their parts can be demonstrated even using nonsense stimuli with which people have no familiarity. In Figure 10, which of the two lower nonsense objects (B or C) looks most like the upper object (A)? A co-ordinate coding of relations would predict that object B should be more similar to A because in object B, only the square is at different co-ordinates than in object A (while the oval and the triangle are at the same co-ordinates), but in object C, the square and the oval are at different co-ordinates than in object A. In contrast, a categorical theory would predict that object C should look more like object A because the above, below, and side of relationships between the shapes are identical in objects A and C while the above/below relationship between the oval and the square is different in object B. We suspect that most readers will perceive object C to be more like object A than object B, just as categorical theories predict.
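The contrast between the two predictions can be made concrete with a small sketch. The part positions below are hypothetical and are not measured from Figure 10; they merely mimic its logic (one part moved in B, two parts moved together in C). The two scoring functions are likewise illustrative simplifications, not implementations of any published model.

```python
from itertools import combinations

# Hypothetical (x, y) positions of the parts; NOT taken from Figure 10.
A = {"square": (-1, 2), "oval": (1, 2), "triangle": (0, 0)}
B = {"square": (-1, 3), "oval": (1, 2), "triangle": (0, 0)}  # only the square moved
C = {"square": (-1, 3), "oval": (1, 3), "triangle": (0, 0)}  # square and oval moved together


def relation(p, q):
    """Categorical vertical relation of part p with respect to part q."""
    if p[1] > q[1]:
        return "above"
    if p[1] < q[1]:
        return "below"
    return "side of"


def coordinate_mismatches(a, b):
    """Number of parts whose co-ordinates differ between two objects."""
    return sum(a[name] != b[name] for name in a)


def categorical_mismatches(a, b):
    """Number of part pairs whose categorical relation differs."""
    return sum(
        relation(a[m], a[n]) != relation(b[m], b[n])
        for m, n in combinations(sorted(a), 2)
    )


print(coordinate_mismatches(A, B), coordinate_mismatches(A, C))
print(categorical_mismatches(A, B), categorical_mismatches(A, C))
```

In this toy layout, the co-ordinate count favors B as the better match to A (one displaced part versus two), whereas the categorical count favors C (no changed relations versus one).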
The result that Two Eyes Moved stimuli are perceived to be more like human faces than the One Eye Moved stimuli holds even if the distance the eyes are moved is made more extreme. The two faces in Figure 11 were created using a drawing with an extremely high hair line and then making the One Eye Moved and Two Eyes Moved manipulations by moving the eyes as far as was possible. Despite the large distortions, 15 of 20 subjects asked which of the faces in Figure 11 looked most like a human face picked the Two Eyes Moved version (χ²(1) = 5.00, p < .05).
Some critics of Experiment 2 have argued that the Two Eyes Moved faces were perceived to be more like human faces than the One Eye Moved faces because, like unaltered human faces, the Two Eyes Moved faces were symmetrical while the One Eye moved faces were not. Thus, perhaps participants perceived the Two Eyes Moved faces to be more "face-like" because of their symmetry, not because they preserved the categorical positions of the faces' features (as we are claiming).
The hypothesis that symmetry was responsible for the results of Experiment 2 can be tested using the distorted faces shown in Figure 12. Note that the face on the left has been distorted in the manner of the One Eye Moved stimuli from Experiments 1 and 2 and is asymmetrical while the face on the right (hereafter called the "Highly Distorted face") is symmetrical. Also note that the One Eye Moved face has only one feature (the eye) at an incorrect position relative to the other features, while the Highly Distorted face has massive disruption of almost all the relative positions of the features. A symmetry account of the results of Experiment 2 would thus predict that the symmetrical, Highly Distorted face ought to be perceived as more like a human face than the asymmetrical One Eye Moved face. In contrast, if the categorical positions of the features were responsible for the results of Experiment 2, then the One Eye Moved face (small disruption of categorical positions) should be perceived to be more like a human face than the Highly Distorted face (large disruption of categorical positions).
Twenty undergraduate students, naive to the purposes of the demonstration, were shown the faces in Figure 12 (for ten participants the One Eye Moved face was on the left and for ten it was on the right) and were asked which looked more like a human face. All 20 chose the One Eye Moved face as the best depiction (χ²(1) = 20.00, p < .001), thus falsifying the hypothesis that symmetry was responsible for the results of Experiment 2. Only a categorical representation scheme predicts the obtained pattern of results when participants are asked to judge which looks most like a human face: Two Eyes Moved better than One Eye Moved better than the Highly Distorted face.
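The two statistics reported for these forced-choice demonstrations (15 of 20 and 20 of 20 choices) can be reproduced with a few lines of arithmetic. The sketch below assumes a goodness-of-fit test against an even split between the two alternatives; the text does not state the null hypothesis explicitly, but the reported values are consistent with that assumption. The function names are illustrative.

```python
import math


def chi_square_equal_split(chosen, total):
    """Goodness-of-fit chi-square against an even two-way split (df = 1)."""
    expected = total / 2
    observed = (chosen, total - chosen)
    return sum((o - expected) ** 2 / expected for o in observed)


def p_value_df1(chi2):
    """Upper-tail probability for a chi-square variate with one degree of freedom."""
    # With df = 1 the statistic is a squared standard normal deviate,
    # so the tail area is erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


figure_11 = chi_square_equal_split(15, 20)   # Two Eyes Moved chosen by 15 of 20
figure_12 = chi_square_equal_split(20, 20)   # One Eye Moved chosen by 20 of 20
```

Under the even-split assumption, `figure_11` evaluates to 5.0 and `figure_12` to 20.0, and the corresponding tail probabilities fall below .05 and .001 respectively.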
The two variations of Experiment 1 showed a pattern of results in which moving one of the eyes vertically from its normal position in a face was less disruptive to face identification (i.e., Whose face is it?) than was moving both of the eyes vertically while keeping their relations to one another intact. In contrast, the basic level object recognition task (Is this a face?) used in Experiment 2 showed the opposite pattern of results. The results suggest that two different recognition systems exist with qualitatively different ways of coding the spatial positions of their primitives: a system using categorical relations (which is used for basic level object recognition) and a system using co-ordinate relations (which is used for face identification).
Are the Results Surprising?
When we have presented this work at colloquia and professional meetings, a frequent comment of those in attendance is that they don't find the results of these experiments surprising. Indeed, when one examines the face pictures presented in Figure 4, it is intuitively obvious that the One Eye Moved faces look more like Elvis and JFK than the Two Eyes Moved faces, and that the Two Eyes Moved faces look more like "human faces" than the One Eye Moved faces. We agree that the results from the experiments reported here are highly intuitive after one has a look at the stimuli, and, indeed, regard this as a strength of our research rather than a weakness. Although the results are intuitive, the research makes a critical theoretical contribution because the idea that face identification uses a co-ordinate representation and basic level object recognition uses a categorical representation is completely unacknowledged in the literature on the differences between the two processes.
The most comprehensive recent review of the differences in the processing of faces and basic level objects (Bruce & Humphreys, 1994) makes no mention of the idea that face identification uses co-ordinate relations while basic level object recognition uses categorical relations, and we are aware of no other published review of the differences between the two processes that makes this point. Additionally, numerous students of face identification have explicitly (and recently) stated in print that the representation used to identify faces is a structural description (e.g., Ellis, 1992; Farah, 1992, 1994; George, Evans, Fiori, Davidoff, & Renault, 1996; Moscovitch, Winocur, & Behrmann, 1997). The results reported here argue that face identification is not accomplished using a structural description (no extant structural description theory can account for the results of Experiment 1), but rather appears to use a co-ordinate based system. Further, the vast majority of theories of object recognition that have been proposed in recent years have posited co-ordinate coding of relations (e.g., Bülthoff & Edelman, 1992; Edelman & Weinshall, 1991; Intrator & Cooper, 1992; Intrator & Gold, 1993; Lowe, 1987; Olshausen, Anderson, & Van Essen, 1993; Poggio & Edelman, 1990; Poggio & Vetter, 1992; Siebert & Waxman, 1992; Ullman, 1989; Ullman & Basri, 1991) rather than categorical coding (contrary to what the results of Experiment 2 suggest). Thus, extant models of both face identification and object recognition require modifications to account for the current results.
Further, the idea that two separate recognition systems (categorical and co-ordinate) exist is also relevant to understanding a number of issues in the current literature on visual cognition. If there are two separate recognition systems having qualitatively different ways of coding relations as the experiments reported here suggest, such a distinction can explain: a) laterality effects observed in recognition performance, b) the phenomenon that faces are recognized poorly relative to objects when presented upside down, c) mistakes people make when drawing pictures, and d) patterns of recognition deficits shown in neurological patients. Each of these issues will be discussed below.
Laterality Effects in Recognition
A number of researchers have examined hemispheric processing differences during the performance of visual tasks involving categorical and co-ordinate relationships between different objects (see Hellige & Michimata, 1989; Kosslyn, 1987; Kosslyn, Chabris, Marsolek, & Koenig, 1992; Kosslyn, Koenig, Barrett, Cave, Tang, & Gabrieli, 1989; Rybash & Hoyer, 1992). The general finding from all of these studies is that there exists a strong right hemisphere advantage when participants must perform a task involving a co-ordinate judgment (e.g., judging whether a dot is within 3 mm of a line). Additionally, as pointed out earlier, the right hemisphere appears to have a substantial advantage in face identification as well (Davidoff, 1982; Ellis, 1983; Sergent, Ohta, & MacDonald, 1992). In contrast, the laterality differences observed when participants perform categorical tasks (e.g., deciding whether a dot is "above" or "below" a line regardless of its distance) are far less pronounced (see Kosslyn, Chabris, Marsolek, & Koenig, 1992 for a discussion), with most research showing either a small left hemisphere advantage (e.g., Hellige & Michimata, 1989; Kosslyn et al., 1989) or no advantage for one hemisphere over the other (e.g., Rybash & Hoyer, 1992; Sergent, 1991). Similarly, the effects of laterality in basic level object recognition show exactly the same pattern, as most studies report either a left hemisphere advantage (Bryden & Rainey, 1963; McKeever & Jackson, 1979; Wyke & Ettlinger, 1961; Young, Bion, & Ellis, 1980) or no hemispheric advantage (Biederman & Cooper, 1991c; Kimura & Durnford, 1974; Levine & Banich, 1982). The data suggest that the right hemisphere may be more efficient at coding co-ordinate relations while the left hemisphere may be slightly more efficient at coding categorical relations, thus explaining the pattern of laterality effects observed for face identification and basic level object recognition.
The Face Inversion Effect
The results of the current experiments suggest a possible explanation for the well established phenomenon that turning a face upside down disrupts face identification more than it disrupts basic level object recognition. Jolicoeur (1985) found that the amount of time required to identify objects that have been rotated in the picture plane (i.e., rotated in the manner that the hands of a clock rotate) is a peculiar "M-shaped" function of the amount of angular rotation. That is, rather than showing a typical mental rotation function that peaks when stimuli are rotated 180°, the function for identifying basic level objects shows a dip at 180° where performance actually improves. This result has been replicated numerous times (Jolicoeur, 1988; Jolicoeur & Milliken, 1989; McMullen & Jolicoeur, 1990; McMullen & Farah, 1991).
Hummel and Biederman (1992) showed that a computational model of object recognition based on categorical relations can account for the M-shaped rotation function. Consider what happens to the categorical relations among the parts of the mug shown in Figure 13 as the mug's image is rotated in the picture plane. Suppose that the mug is represented categorically as being a "cylinder to the side of a curved cylinder." Note that as the mug is rotated in the picture plane, the description of the input mug will change to "cylinder above a curved cylinder" and will therefore provide weaker activation of the cup representation stored in memory. However, when the cup has been rotated completely around to 180°, the "side of" relation between the two parts will be restored (i.e., the input cup is now a "cylinder to the side of a curved cylinder" again), thus providing a good match with the stored memory representation.
Figure 13. Note that as the cup rotates around, the relations of the two parts go from a) "side of" one another to b) one "above" the other to c) "side of" one another again. Thus, the categorical relations between parts that are "side of" one another are restored when the object is rotated 180°.
Hummel and Biederman (1992) tested their computer model of object recognition (which uses categorical relations) with stimuli that had undergone planar rotation, and found that the system showed poorest performance at around 135°, with performance improving as the rotation approached 180°, thereby showing a pattern virtually identical to human recognition performance. Thus, it is possible that the M-shaped function for recognizing rotated basic level objects is a consequence of using categorical relations in the representation used for basic level recognition.
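The restoration of the "side of" relation at 180° can be illustrated with a deliberately crude sketch. This is not Hummel and Biederman's model: it uses a single hypothetical part offset, an assumed rule that labels a relation "side-of" whenever the horizontal separation is at least as large as the vertical one, and it makes no attempt to reproduce the full M-shaped function or the location of its peak.

```python
import math


def categorical_relation(dx, dy):
    """Label one part's relation to another, ignoring left/right and distance."""
    if abs(dx) >= abs(dy):
        return "side-of"
    return "above" if dy > 0 else "below"


def relation_after_rotation(angle_deg, handle_offset=(1.0, 0.0)):
    """Relation of the handle to the mug body after a picture-plane rotation.

    handle_offset is an assumed position of the handle relative to the body
    in the upright image (directly to one side).
    """
    theta = math.radians(angle_deg)
    x, y = handle_offset
    dx = x * math.cos(theta) - y * math.sin(theta)
    dy = x * math.sin(theta) + y * math.cos(theta)
    return categorical_relation(dx, dy)


for angle in (0, 90, 180):
    print(angle, relation_after_rotation(angle))
```

Under these assumptions the handle is "side-of" the body at 0°, "above" it at 90°, and "side-of" it again at 180°, so a stored "side-of" description matches the input at both 0° and 180° but not at 90°.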
In contrast, imagine a co-ordinate shape representation model in which the primitives' exact distances from some reference point (or set of reference points) are specified. The most parsimonious cost function for such a model is one in which the activation of the memory representation of a particular shape decreases monotonically as the Euclidean distances of the primitives in the representation depart from their stored co-ordinates. As the input shape is rotated in the picture plane, such a model would predict the lowest activation of the stored memory representation when the shape has been rotated 180° because it is at this rotation that the distance is greatest between the input co-ordinates and the stored co-ordinates. Consistent with the idea that faces are identified using shape representations whose activation level is lowest at 180°, Valentine and Bruce (1988) found a monotonic increase in response times to recognize faces as the angle of picture plane rotation increased from 0° to 180°.
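The claim that the co-ordinate mismatch grows up to 180° can be checked numerically. The sketch below assumes that the shape is rotated about the reference point (taken as the origin) and that mismatch is the summed Euclidean distance between each stored primitive and its rotated position; the feature positions are hypothetical and chosen only for illustration.

```python
import math


def coordinate_mismatch(points, angle_deg):
    """Summed distance between stored points and their positions after rotation."""
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    total = 0.0
    for x, y in points:
        rotated = (x * cos_t - y * sin_t, x * sin_t + y * cos_t)
        total += math.dist((x, y), rotated)
    return total


# Hypothetical primitive positions relative to the reference point:
# two eyes, a nose at the reference point itself, and a mouth.
FACE = [(-1.0, 1.0), (1.0, 1.0), (0.0, 0.0), (0.0, -1.0)]

angles = list(range(0, 181, 15))
mismatches = [coordinate_mismatch(FACE, a) for a in angles]
```

Each primitive at distance r from the reference point is displaced by the chord length 2r sin(θ/2), which increases throughout the range from 0° to 180°, so `mismatches` rises at every step and is largest at 180°. A model whose activation falls as this mismatch grows would therefore be slowest at 180°, with no dip.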
Note that in the hypothetical recognition time functions shown in Figure 14, the recognition time for systems using co-ordinate relations is maximized at 180° (because it is here that the maximum mismatch between the co-ordinates of the input features and the stored memory representation occurs). In contrast, for the categorical representation, there is a local minimum in response time at 180° because the relations between primitives with "side of" and strictly "above/below" relations in the upright object have been restored when the input stimulus reaches 180°. Therefore, the greater effects of inversion on face identification than for basic level object recognition may be a consequence of the manner in which the spatial positions of their primitives are coded (though further empirical work is needed to test this explanation).
Figure 14. Graph showing the hypothetical relationship between the amount of stimulus rotation and recognition time for systems that represent shapes using co-ordinate and categorical relations. Note that at 180°, the co-ordinate relations function hits its maximum while the categorical relations function is at a local minimum.
The results reported here can explain Rock's (1984) observation that when people untrained in drawing are asked to draw a generic "human face" they invariably place the eyes much too high above the nose (although the eyes are always placed directly across from one another). Such a mistake is to be expected if the representation used for basic level recognition of a human face involves categorical relations in which the fact that the eyes are above the nose and directly to the side of one another is coded (i.e., if categorical coding is used), but the precise distances between the features are left unspecified.
Disorders of Recognition
Farah (1994) has argued that the human visual system has two distinct shape recognition systems: one that is necessary for the recognition of words and is useful for the recognition of objects, and another that is necessary for the recognition of faces and is also useful for the recognition of objects. The basis for this conclusion is the finding that in brain damaged patients who are experiencing disorders of recognition, object recognition is found to be impaired only if either face or word recognition is also impaired. Further, Farah (1994) found only one (questionable) case in which a patient showed impaired face and word recognition, but object recognition was spared.
We would argue that the system that is necessary for words and useful for objects is the one using categorical relations while the system that is necessary for faces and useful for objects is the one using co-ordinate relations. Note that in words, as in many basic level objects, the precise distance between the primitives (letters, in the case of words) is irrelevant to the recognition process although the categorical relationships among the primitives are important (else we would be unable to distinguish the words "pat" and "tap"). Ideally, a system for recognizing words would be able to generalize over differences that would exist among different font styles in the distances between the letters, and using categorical relations to specify the relations of the letters would be one way such generalization could be accomplished. Consistent with the idea that word recognition uses categorical relations, Koriat and Norman (1989) found that the time to identify words that had undergone planar rotation showed exactly the same "M" shaped cost function (with a dip in response times at 180°) that a system using categorical coding would predict! A similar "M" shaped cost function is observed when recognizing rotated scenes containing multiple objects (Diwadkar & McNamara, 1997), suggesting that categorical relations may be used during scene recognition as well.
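The point about letter spacing can be sketched as follows. The sketch assumes, purely for illustration, that a word is coded as the set of "left-of" relations between its letters; the letter positions are invented, and the function name is hypothetical. It is not a model of word recognition, and because the code is a set of letter pairs it would need extending to handle words with repeated letters.

```python
def categorical_word_code(letter_positions):
    """Set of (left_letter, right_letter) pairs, discarding the gaps between letters."""
    ordered = sorted(letter_positions, key=lambda item: item[1])
    letters = [letter for letter, _ in ordered]
    return {(letters[i], letters[j])
            for i in range(len(letters))
            for j in range(i + 1, len(letters))}


# Hypothetical horizontal letter positions (arbitrary units) in two "fonts".
PAT_NARROW = [("p", 0.0), ("a", 1.0), ("t", 2.0)]
PAT_WIDE = [("p", 0.0), ("a", 2.5), ("t", 4.0)]
TAP_NARROW = [("t", 0.0), ("a", 1.0), ("p", 2.0)]

same_word = categorical_word_code(PAT_NARROW) == categorical_word_code(PAT_WIDE)
different_word = categorical_word_code(PAT_NARROW) != categorical_word_code(TAP_NARROW)
```

Both comparisons come out true: the two spacings of "pat" receive the same categorical code even though every letter after the first sits at different co-ordinates, while "tap" receives a different code despite sharing the narrow spacing.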
Are faces the only sort of stimuli that are identified using co-ordinate relations? There are a number of reasons to suspect that the process used for identifying faces may also be used for identifying some other types of visual stimuli. Frequently, prosopagnosics (people who have lost the ability to recognize faces) also have difficulty distinguishing between different animals, cars, foods, flowers, and buildings (Pallis, 1955; Bornstein, 1963; Cole & Perez-Cruet, 1964; Newcombe, 1979; Damasio, Damasio, & Van Hoesen, 1982; Farah, Levinson, & Klein, 1995). Similarly, Diamond and Carey (1986) found that dog experts show inversion effects for pictures of dogs similar to those that are experienced for pictures of faces. Note that in most of these cases, the stimuli that cannot be distinguished are, like human faces, classes of stimuli that would not be easily distinguished by current theories positing categorical relations (e.g., Biederman, 1987), suggesting that the co-ordinate system may be used in circumstances when a discrimination must be made among stimuli that would activate the same categorical representation. Further research is needed to clarify the issue of exactly which sorts of shape discriminations use the categorical relations system and which use the co-ordinate relations system. The only purpose of the current paper was to demonstrate the existence of the two recognition systems and to provide an example of a case in which each system is used.
We certainly would not argue that the only distinction between face identification and basic level object recognition is the manner in which the spatial positions of the primitives are coded. Note that the distinction between categorical and co-ordinate relations cannot account for the finding that faces are more difficult to recognize in photographic negatives than are basic level objects. The primitives used in the two different forms of representation almost certainly differ as well. For example, Buhmann, Lange, von der Malsburg, Vorbruggen, and Wurtz (1991) have had great success with a face identification model that uses local Fourier components as primitives (with relations coded using a co-ordinate system). However, the experiments reported here suggest that the manner in which the relations among primitives are coded in the shape representation is a major difference between the two processes.
Contributions of the Present Research
In closing, we would like to summarize the main contributions that the current paper makes to the face identification and basic level object recognition literature:
1. The research presented here is the first direct empirical evidence that face recognition uses co-ordinate relations while basic level object recognition uses categorical relations, thus establishing the existence of two separate recognition systems that use qualitatively different ways of coding relations.
2. This is the first paper to demonstrate a stimulus manipulation (the eye movements in Experiments 1 and 2) in which one version of the manipulated stimuli shows better subordinate level recognition while the other version shows better basic level recognition.
3. This paper is the first to link the literature on laterality effects in basic level object recognition with the data on laterality effects in computing categorical relations between objects (a small left hemisphere advantage appears to exist for both) thus suggesting an explanation for the laterality differences observed in face identification and basic level object recognition.
4. This paper provides an explanation for Rock's (1984) observation that when people untrained in drawing are asked to draw a generic "human face" they invariably place the eyes much too high above the nose (although the eyes are always placed directly across from one another). Such a mistake is to be expected if the relations among elements in a "basic level" face are coded categorically without preserving precise distances.
5. This paper is the first to offer an explanation for the fact that basic level object recognition is less disrupted by planar rotation than is face recognition in terms of categorical vs. co-ordinate relations.
6. Finally, this paper is the first to speculate that the "M" shaped response time function obtained for recognizing words (Koriat & Norman, 1989) and scenes (Diwadkar & McNamara, 1997) that have undergone planar rotation may be due to categorical coding of the relations between the elements in the stimuli. The "M" shaped functions suggest that one of the chief distinctions between the two recognition systems proposed by Farah (1994) may be that one system codes co-ordinate relations while the other codes categorical relations.
Biederman, I. (1987). Recognition-by-components: A theory of human image understanding. Psychological Review, 94, 115-147.
Biederman, I., & Cooper, E. E. (1991a). Evidence for complete translational and reflectional invariance in visual object priming. Perception, 20, 585-593.
Biederman, I., & Cooper, E. E. (1991b). Priming contour-deleted images: Evidence for intermediate representations in visual object recognition. Cognitive Psychology, 23, 393-419.
Biederman, I., & Cooper, E. E. (1991c). Object recognition and laterality: Null effects. Neuropsychologia, 29, 685-694.
Biederman, I., & Cooper, E. E. (1992). Size invariance in visual object priming. Journal of Experimental Psychology: Human Perception and Performance, 18, 121-133.
Bornstein, B. (1963). Prosopagnosia. In L. Halpern (Ed.), Problems of dynamic neurology. Jerusalem: Hadassah Medical Organization.
Bruce, V., & Humphreys, G. W. (1994). Recognizing objects and faces. Visual Cognition, 1, 141-180.
Bruce, V., & Langton, S. (1994). The use of pigmentation and shading information in recognizing the sex and identities of faces. Perception, 23, 803-822.
Bryden, M. P., & Rainey, C. A. (1963). Left-right differences in tachistoscopic recognition. Journal of Experimental Psychology, 66, 568-571.
Buhmann, J., Lange, J., von der Malsburg, C., Vorbruggen, J. C., & Wurtz, R. P. (1991). Object recognition in the dynamic link architecture: Parallel implementation of a transputer network. In B. Kosko (Ed.), Neural Networks for Signal Processing. Englewood Cliffs, NJ: Prentice-Hall.
Bülthoff, H., & Edelman, S. (1992). Psychophysical support for a two-dimensional view interpolation theory of object recognition. Proceedings of the National Academy of Sciences, 89, 60-64.
Carey, S., & Diamond, R. (1977). From piecemeal to configural representations of faces. Science, 195, 312-314.
Cohen, G. (1990). Why is it difficult to put names to faces? British Journal of Psychology, 81, 287-297.
Cohen, G., & Faulkner, D. (1986). Memory for proper names: Age differences in retrieval. British Journal of Developmental Psychology, 4, 187-197.
Cole, M., & Perez-Cruet, J. (1964). Prosopagnosia. Neuropsychologia, 2, 237-246.
Cooper, E. E., Biederman, I., & Hummel, J. E. (1992). Metric invariance in object recognition: A review and further evidence. Canadian Journal of Psychology, 46, 191-214.
Damasio, A. R., Damasio, H., & Van Hoesen, G. W. (1982). Prosopagnosia: Anatomical basis and behavioral mechanisms. Neurology, 32, 331-341.
Davidoff, J. (1982). Studies with non-verbal stimuli. In J. G. Beaumont (Ed.), Divided Visual Field Studies of Cerebral Organization (pp. 29-55). New York: Academic Press.
Diamond, R., & Carey, S. (1986). Why faces are and are not special: An effect of expertise. Journal of Experimental Psychology: General, 115, 107-117.
Dickinson, S. J., Pentland, A. P., & Rosenfeld, A. (1993). 3-D shape recovery using distributed aspect matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14, 174-198.
Diwadkar, V. A., & McNamara, T. P. (1997). Viewpoint dependence in scene recognition. Psychological Science, 8, 302-306.
Edelman, S. & Weinshall, D. (1991). A self-organizing multiple-view representation of 3-D objects. Biological Cybernetics, 64, 209-219.
Ellis, A. W. (1992). Cognitive mechanisms of face processing. Philosophical Transactions of the Royal Society of London, B, 335, 113-119.
Ellis, H. D. (1983). The role of the right hemisphere in face perception. In A. W. Young (Ed.), Functions of the Right Cerebral Hemisphere (pp. 33-64). New York: Academic Press.
Farah, M. J. (1992). Is an object an object an object? Cognitive and neuropsychological investigations of domain specificity in visual object recognition. Current Directions in Psychological Science, 1, 164-169.
Farah, M. J. (1994). Specialization within visual object recognition: Clues from prosopagnosia and alexia. In M. J. Farah & G. Ratcliff (Eds.), The Neuropsychology of High-level Vision (pp. 133-148). Hillsdale: Lawrence Erlbaum Associates.
Farah, M. J. (1995). Dissociable systems for visual recognition: A cognitive neuropsychology approach. In S. M. Kosslyn & D. N. Osherson (Eds.), Visual Cognition: An Invitation to Cognitive Science, Vol. 2 (pp. 101-119). Cambridge, MA: MIT Press.
Farah, M. J., Levinson, K., & Klein, K. (1995). Face perception and within-category discrimination in prosopagnosia. Neuropsychologia, 33, 661-674.
Galper, R. E. (1970). Recognition of faces in photographic negatives. Psychonomic Science, 19, 207-208.
Galper, R. E., & Hochberg, J. (1971). Recognition memory for photographs of faces. American Journal of Psychology, 84, 351-354.
George, N., Evans, J., Fiori, N., Davidoff, J., & Renault, B. (1996). Brain events related to normal and moderately scrambled faces. Cognitive Brain Research, 4, 65-76.
Hellige, J., & Michimata, C. (1989). Categorization versus distance: Hemispheric differences for processing spatial information. Memory & Cognition, 17, 770-776.
Hummel, J. E. (1994). Reference frames and relations in computational models of object recognition. Current Directions in Psychological Science, 3, 111-116.
Hummel, J. E., & Biederman, I. (1992). Dynamic binding in a neural network for shape recognition. Psychological Review, 99, 480-517.
Hummel, J. E., & Stankiewicz, B. J. (1996). Categorical relations in shape perception. Spatial Vision, 10, 201-236.
Intrator, N. & Cooper, L. N. (1992). Objective function formulation of the BCM theory of visual cortical plasticity: Statistical connections, stability conditions. Neural Networks, 5, 3-17.
Intrator, N. & Gold, J. I. (1993). Three-dimensional object recognition using an unsupervised BCM network: The usefulness of distinguishing features. Neural Computation, 5, 61-74.
Jolicoeur, P. (1985). The time to name disoriented natural objects. Memory & Cognition, 13, 289-303.
Jolicoeur, P. (1988). Mental rotation and the identification of disoriented objects. Canadian Journal of Psychology, 42, 461-478.
Jolicoeur, P., & Milliken, B. (1989). Identification of disoriented objects: Effects of context of prior presentation. Journal of Experimental Psychology: Learning, Memory, and Cognition, 15, 200-210.
Kholmeyer, S. W. (1992). Picture perception lab: A program for picture experiments on the Macintosh II. Behavior Research Methods, Instruments, and Computers, 24, 67-71.
Kimura, D., & Durnford, M. (1974). Normal studies on the function of the right hemisphere in vision. In S. J. Diamond & J. G. Beaumont (Eds.), Hemispheric Function in Human Brain (pp. 25-47). New York: Halstead.
Koriat, A., & Norman, J. (1989). Why is word recognition impaired by disorientation while identification of single letters is not? Journal of Experimental Psychology: Human Perception and Performance, 15, 153-163.
Kosslyn, S. M. (1987). Seeing and imagining in the cerebral hemispheres: A computational approach. Psychological Review, 94, 148-175.
Kosslyn, S. M., Chabris, C., Marsolek, C., & Koenig, O. (1992). Categorical versus coordinate spatial relations: Computational analyses and computer simulations. Journal of Experimental Psychology: Human Perception and Performance, 18, 562-577.
Kosslyn, S. M., Koenig, O., Barrett, A., Cave, C. B., Tang, J., & Gabrieli, J. D. E. (1989). Evidence for two types of spatial representations: Hemispheric specialization for categorical and coordinate relations. Journal of Experimental Psychology: Human Perception and Performance, 15, 723-735.
Levine, S. C., & Banich, M. T. (1982). Lateral asymmetries in the naming of words and corresponding line drawings. Brain and Language, 17, 34-45.
Loftus, G. R., & Loftus, E. F. (1988). Essence of Statistics (2nd ed.). New York: Random House.
Lowe, D. G. (1987). The viewpoint consistency constraint. International Journal of Computer Vision, 1, 57-72.
McKeever, W. F., & Jackson, T. L. (1979). Cerebral dominance assessed by object and color-naming latencies: Sex and familial sinistrality effects. Brain and Language, 7, 175-190.
McMullen, P. A., & Farah, M. J. (1991). Viewer-centered and object-centered representations in the recognition of naturalistic line drawings. Psychological Science, 2, 275-277.
McMullen, P. A., & Jolicoeur, P. (1990). The spatial frame of reference in object naming and discrimination of left-right reflections. Memory & Cognition, 18, 99-115.
Moscovitch, M., Winocur, G., & Behrmann, M. (1997). What is special about face recognition? Nineteen experiments on a person with visual object agnosia and dyslexia but normal face recognition. Journal of Cognitive Neuroscience, 9, 555-604.
Newcombe, F. (1979). The processing of visual information in prosopagnosia and acquired dyslexia: Functional versus physiological interpretation. In D. J. Osbourne, M. M. Gruneberg, & J. E. Eiser (Eds.), Research in Psychology and Medicine (pp. 315-322). London: Academic.
O'Toole, A. J., Abdi, H., Deffenbacher, K. A., & Valentin, D. (1995). A perceptual learning theory of the information in faces. In T. Valentine (Ed.), Cognitive and Computational Aspects of Face Recognition (pp. 159-182). New York: Routledge.
Olshausen, B. A., Anderson, C. H., & Van Essen, D. C. (1993). A neurobiological model of visual attention and invariant pattern recognition based on dynamic routing of information. The Journal of Neuroscience, 13, 4700-4719.
Pallis, C. A. (1955). Impaired identification of faces and places with agnosia for colors. Journal of Neurology, Neurosurgery, and Psychiatry, 18, 218-224.
Phillips, R. (1972). Why are faces hard to recognize in photographic negative? Perception & Psychophysics, 12, 425-426.
Poggio, T., & Edelman, S. (1990). A network that learns to recognize three-dimensional objects. Nature, 343, 263-266.
Poggio, T., & Vetter, T. (1992). Recognition and structure from one 2D model view: Observations on prototypes, object classes, and symmetries. MIT AI Memo 1347, Massachusetts Institute of Technology.
Reinitz, M., Morrissey, J., & Demb, J. (1994). Role of attention in face encoding. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20, 161-168.
Rhodes, G., Brake, S., & Atkinson, A. P. (1993). What's lost in inverted faces? Cognition, 47, 25-57.
Rock, I. (1984). Perception. New York: W. H. Freeman and Company.
Rosch, E. (1978). Principles of categorization. In E. Rosch & B. B. Lloyd (Eds.), Cognition and categorization (pp. 27-48). Hillsdale, NJ: Erlbaum.
Rybash, J., & Hoyer, W. (1992). Hemispheric specialization for categorical and coordinate spatial representations: A reappraisal. Memory & Cognition, 20, 271-276.
Scapinello, F. F., & Yarmey, A. D. (1970). The role of familiarity and orientation in immediate and delayed recognition of pictorial stimuli. Psychonomic Science, 21, 329-331.
Schmuller, J., & Goodman, R. (1980). Bilateral tachistoscopic perception, handedness, and laterality. Brain and Language, 11, 12-18.
Sergent, J. (1991). Judgments of relative position and distance on representations of spatial relations. Journal of Experimental Psychology: Human Perception and Performance, 17, 762-780.
Sergent, J., Ohta, S., & MacDonald, B. (1992). Functional neuroanatomy of face and object processing. Brain, 115, 15-36.
Siebert, M. & Waxman, A. M. (1992). Learning and recognizing 3D objects from multiple views in a neural system. In H. Wechsler (Ed.), Neural Networks for Perception, Volume 1, Human and Machine Perception. (pp. 427-444). Academic Press.
Sutherland, N. S. (1968). Outlines of a theory of visual pattern recognition in animals and man. Proceedings of the Royal Society of London, 171 B, 95-103.
Tanaka, J. W., & Farah, M. J. (1991). Second order relational properties and the inversion effect: Testing a theory of face perception. Perception & Psychophysics, 50, 367-372.
Tanaka, J. W., & Farah, M. J. (1993). Parts and wholes in face recognition. The Quarterly Journal of Experimental Psychology, 46A, 225-245.
Tarr, M. (1995). Rotating objects to recognize them: A case study on the role of viewpoint dependency in the recognition of three-dimensional objects. Psychonomic Bulletin and Review, 2, 55-82.
Tarr, M., & Pinker, S. (1989). Mental rotation and orientation-dependence in shape recognition. Cognitive Psychology, 21, 233-282.
Ullman, S. (1989). Aligning pictorial descriptions: An approach to object recognition. Cognition, 32, 193-254.
Ullman, S., & Basri, R. (1991). Recognition by linear combinations of models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13, 992-1006.
Valentine, T. (1988). Upside-down faces: A review of the effect of inversion upon face recognition. British Journal of Psychology, 79, 471-491.
Valentine, T., & Bruce, V. (1986). The effects of distinctiveness in recognising and classifying faces. Perception, 15, 525-535.
Winston, P. (1975). Learning structural descriptions from examples. In P. Winston (Ed.), The Psychology of Computer Vision (pp. 157-209). New York: McGraw-Hill.
Wyke, M., & Ettlinger, G. (1961). Efficiency of recognition in the left and right visual fields. Archives of Neurology, 5, 659-665.
Yin, R. (1969). Looking at upside-down faces. Journal of Experimental Psychology, 81, 141-145.
Young, A. W., Bion, P. J., & Ellis, A. W. (1980). Studies toward a model of laterality effects for picture and word naming. Brain and Language, 11, 54-65.
Young, A. W., Hay, D. C., & Ellis, A. W. (1985). The faces that launched a thousand slips: Everyday difficulties and errors in recognizing people. British Journal of Psychology, 76, 495-523.
|Arnold, Roseanne||Ali, Muhammad||Chamberlain, Wilt|
|Arnold, Tom||Allen, Woody||Donahue, Phil|
|Bergen, Candice||Bacon, Kevin||Prince Edward|
|Bush, George||Ball, Lucille||Kidman, Nicole|
|Carson, Johnny||Basinger, Kim||Long, Shelley|
|Culkin, Macaulay||Beatty, Warren||Kennedy Jr., John|
|Cher||Bronson, Charles||Linden, Hal|
|Clinton, Bill||Candy, John||Goodman, John|
|Clinton, Hillary||Carter, Jimmy||Safer, Morley|
|Connery, Sean||Chase, Chevy||Simon, Paul|
|Cosby, Bill||Cosell, Howard||Dukakis, Michael|
|Costner, Kevin||Crystal, Billy||Cassidy, David|
|Crawford, Cindy||Curtis, Jamie Lee||Lunden, Joan|
|Cruise, Tom||Derek, Bo||Seymour, Jane|
|Danson, Ted||Dole, Bob||Shapiro, Robert|
|Depp, Johnny||Douglas, Michael||Woods, James|
|Doherty, Shannen||Dukakis, Michael||Cash, Johnny|
|Fabio||Einstein, Albert||Skerritt, Tom|
|Ford, Harrison||Goldberg, Whoopi||Houston, Whitney|
|Foster, Jodie||Goodman, John||Dennehy, Brian|
|Fox, Michael J.||Gore, Al||Williams, Treat|
|Gibson, Mel||Gretzky, Wayne||Gere, Richard|
|Gorbachev, Mikhail||Hannah, Daryl||Rivers, Joan|
|Hall, Arsenio||Herman, Pee Wee||Nelson, Craig T.|
|Harrelson, Woody||Hitchcock, Alfred||DeVito, Danny|
|Hawn, Goldie||Hogan, Hulk||Castro, Fidel|
|Hoffman, Dustin||Johnson, Don||Kaelin, Kato|
|Jackson, Jesse||Jordan, Michael||Mandela, Nelson|
|Jackson, Michael||Keaton, Michael||Lovett, Lyle|
|Johnson, Magic||King, Martin Luther||Foreman, George|
|Kennedy, John F.||Lewis, Jerry||Berle, Milton|
|Lady Diana||Locklear, Heather||Garth, Jennie|
|Letterman, David||Lowe, Rob||Dillon, Matt|
|Limbaugh, Rush||McCartney, Paul||Jagger, Mick|
|Madonna||McMahon, Ed||Nixon, Richard|
|Martin, Steve||Mondale, Walter||Gotti, John|
|Monroe, Marilyn||Montana, Joe||Grammer, Kelsey|
|Moore, Demi||Moore, Dudley||Carrey, Jim|
|Murphy, Eddie||Moore, Mary Tyler||Bateman, Justine|
|Murray, Bill||Nixon, Richard||Schwarzkopf, Norman|
|Perot, Ross||Nolte, Nick||Carvey, Dana|
|Perry, Luke||O'Connor, Sinead||Young, Sean|
|Presley, Elvis||Pacino, Al||Neeson, Liam|
|Prince Charles||Parton, Dolly||Butler, Brett|
|Reagan, Nancy||Pfeiffer, Michelle||Ryan, Meg|
|Reagan, Ronald||Powell, Colin||Snipes, Wesley|
|Redford, Robert||Quayle, Dan||Bentsen, Lloyd|
|Reynolds, Burt||Rivera, Geraldo||Norris, Chuck|
|Roberts, Julia||Rourke, Mickey||Clark, Dick|
|Seinfeld, Jerry||Ryan, Meg||Locklear, Heather|
|Selleck, Tom||Shatner, William||Urich, Robert|
|Simpson, O.J.||Sheen, Charlie||Winkler, Henry|
|Slater, Christian||Shepherd, Cybill||Thurman, Uma|
|Streep, Meryl||Springsteen, Bruce||Turturro, John|
|Streisand, Barbra||Seagal, Steven||Downey Jr., Robert|
|Travolta, John||Swayze, Patrick||Eastwood, Clint|
|Walters, Barbara||Thatcher, Margaret||Hepburn, Audrey|
|Williams, Robin||Tyson, Mike||Jackson, Samuel|
|Willis, Bruce||Westheimer, Ruth||Turner, Janine|
Eric E. Cooper and Tim J. Wojan, Department of Psychology.
The authors would like to thank Veronica Dark, Luke Rosielle, Brian Brooks, Michael O'Boyle, Gary Wells, Tom Sanocki and Gillian Rhodes for careful readings of an earlier version of this manuscript. We would also like to thank Paula Semplinski, Keri Bassman, Heidi Pierce, Kate McKee, Tara King, Stacey Sentyrz, Brandi King, Deann Van Diest, Tammy Thomsen, Jen Majeski, Cindi Nolan, Ann Grienke, Kris Ghosh, Jen Lansink, Karri Buresh, Andrea Cronin, Tim Jensen, Brian Brooks, and Christina Homan for their assistance in carrying out the work described in this paper.
Portions of the research reported here were presented at the 1996 Meeting of the Association for Research in Vision and Ophthalmology, Ft. Lauderdale, FL.
Correspondence about this paper should be addressed to Eric E. Cooper, W112 Lagomarcino Hall, Ames, IA 50011, e-mail: firstname.lastname@example.org.