
Abstract

This paper provides a suite of datasets from standard multivariate distributions and simple high-dimensional geometric shapes that can be used to familiarize new users with grand tour visualizations. It contains QuickTime and GIF animations of 1-D, 2-D, 3-D, 4-D and 5-D grand tours, links for starting XGobi or XLispStat on the calibration data sets, and C code for generating a grand tour.

The purpose of the paper is two-fold: first, to provide code for the grand tour that others can pick up and modify (this version is not easy to code, which is why very few implementations are currently available), and second, to provide a variety of training datasets to help new users get a visual sense for high-dimensional data.

Introduction

The grand tour is a method for viewing multivariate data "from all sides". As originally proposed by Asimov (1985), it is a movie of data projections, where the viewer is shown a continuous sequence of d-dimensional projections of the p-dimensional data. The dimension of the projection can be 1, 2, 3, ..., p. Currently there are implementations of grand tours available in XGobi (Swayne, Cook and Buja, 1998), XLispStat (Tierney, 1991) and ExplorN (Carr, Wegman and Luo, 1996).
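
As a rough illustration of what a single frame of such a movie involves, the Python sketch below draws a random orthonormal basis for a d-dimensional subspace and projects p-dimensional points onto it. It is not the tour code distributed with this paper: the function names random_frame and project are made up for this example, and a real tour moves smoothly between frames rather than drawing them independently.

```python
import math
import random


def random_frame(p, d, rng):
    """Return d orthonormal vectors in p dimensions (assumes d <= p).

    Gram-Schmidt is applied to independent Gaussian draws.
    """
    frame = []
    while len(frame) < d:
        v = [rng.gauss(0.0, 1.0) for _ in range(p)]
        # Remove the components along the vectors already chosen.
        for u in frame:
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-12:  # redraw in the unlikely event of a degenerate vector
            frame.append([a / norm for a in v])
    return frame


def project(data, frame):
    """Project each p-dimensional row of data onto the d vectors of frame."""
    return [[sum(x * f for x, f in zip(row, vec)) for vec in frame]
            for row in data]


rng = random.Random(0)
p, d = 5, 2
data = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(100)]
frame = random_frame(p, d, rng)
view = project(data, frame)  # 100 rows, each a point in the 2-D view
```

Plotting the rows of view as a scatterplot would give one still image; a tour strings many such images together along a continuous path of frames.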

Grand tour examples

Here are some examples of a grand tour running on a small seven-dimensional dataset. This is the primeval form of the grand tour, a la Asimov (1985): the examples are purely movies, with fixed play speed and no user interaction. GIF animations of points at the corners of a nine-dimensional cube are available through the links if you are viewing this on a platform that doesn't support QuickTime.

1-D as a sequence of histograms
movie size: 90k duration: 30s
view the animated gif here

2-D as a scatterplot
movie size: 150k duration: 30s
view the animated gif here

3-D as a parallel coordinate plot
movie size: 300k duration: 30s
view the animated gif here

4-D as a parallel coordinate plot
movie size: 300k duration: 30s
view the animated gif here

5-D as a parallel coordinate plot
movie size: 300k duration: 30s
view the animated gif here

Note: The animated GIFs run through the grand tour sequence once. They should show smooth changes to the image as the animation runs, but they may appear jerky over the net. To re-run an animation you need to reload the page. The QuickTime movies used throughout this paper allow better control of each animation.

These examples illustrate tours implemented using the algorithm in Buja, Cook, Asimov and Hurley (1997). They are geodesic tours that contain no "within-projection-plane" spin, which is optimal for viewing tours where d is less than p. This is the type of tour implemented in XGobi, with the main difference being that XGobi is capable of 2-D projections only.
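
To give a feel for what moving along a geodesic means, the Python sketch below handles only the simplest case, d = 1, where a projection is a single unit vector and the geodesic between two projections is a great-circle arc on the unit sphere. This is an illustration under that simplifying assumption, not the algorithm of Buja, Cook, Asimov and Hurley (1997), which treats general d; the function name geodesic_1d is hypothetical.

```python
import math


def geodesic_1d(a, b, t):
    """Point at fraction t along the great-circle arc from unit vector a to b.

    Assumes a and b are unit vectors and are not exactly opposite.
    """
    dot = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b))))
    theta = math.acos(dot)  # angle between the two projection directions
    if theta < 1e-12:
        return list(a)  # the two directions coincide; nothing to interpolate
    s = math.sin(theta)
    wa = math.sin((1.0 - t) * theta) / s
    wb = math.sin(t * theta) / s
    return [wa * x + wb * y for x, y in zip(a, b)]


start = [1.0, 0.0, 0.0]
end = [0.0, 1.0, 0.0]
# Eleven equally spaced projection directions from start to end.
path = [geodesic_1d(start, end, k / 10.0) for k in range(11)]
```

Every vector in path has unit length and consecutive vectors are separated by the same angle, so projecting the data onto each in turn gives a constant-speed 1-D tour segment.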

Example Data Sets

Ways to view the data

If you have your web browser set up to recognize QuickTime movies then you can simply click the animation image to start downloading and viewing the movies.

If you have your web browser set up to recognize files with a .xgobi extension then you can simply click the XGobi button beside the data explanations below. (You'll need the latest version of XGobi, at least the Oct 1997 beta release for this to work correctly.)

If you have your web browser set up to recognize files with a .xli extension as XLispStat, then you can simply click the XLispStat button beside the data explanations below. This will start up a tour in XLispStat on the dataset.

Alternatively, compile the C code that computes projection vectors of arbitrary dimension for composing a grand tour, and display the results in S/S-Plus.

Samples from Standard Multivariate Distributions

Multivariate Normal Distributions

5-D Standard Normal

In 2-D projections, samples from a standard normal distribution in any dimension look like samples from a standard bivariate normal distribution. The familiar bulls-eye is visible in every projection seen.

Data file | .vgroups file | S Code for generating samples | XGobi | XLispStat

movie size: 600k
duration: 90s

5-D same variance, correlation 0.5

Samples from a normal distribution with equal variances but correlation equal to 0.5 have both circular and elliptical contours. The appearance is elliptical in most views.
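
One simple way to produce data of this kind, sketched here in Python, is to mix a shared normal draw with an independent one for each variable. This is only an illustrative construction and not the S code linked below; the function names equicorrelated_normal and correlation are made up for the example.

```python
import math
import random


def equicorrelated_normal(n, p, rho, rng):
    """n rows of p unit-variance normals with common correlation rho (0 <= rho <= 1)."""
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    rows = []
    for _ in range(n):
        shared = rng.gauss(0.0, 1.0)  # component common to all p variables
        rows.append([a * shared + b * rng.gauss(0.0, 1.0) for _ in range(p)])
    return rows


def correlation(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


sample = equicorrelated_normal(5000, 5, 0.5, random.Random(1))
first = [row[0] for row in sample]
second = [row[1] for row in sample]
r01 = correlation(first, second)  # should be close to 0.5
```

Each variable has variance rho + (1 - rho) = 1, and any two variables share only the common component, so their covariance is rho.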

Data file | .vgroups file | S Code for generating samples | XGobi | XLispStat

movie size: 500k
duration: 90s

5-D different variances, no correlation

Samples from a normal distribution with different variances but no correlation also look mostly elliptical, but you see a shrinking-expanding effect in a tour that results from variables with small variances being toured in and then out again.

Data file | .vgroups file | S Code for generating samples | XGobi | XLispStat

movie size: 300k
duration: 90s

5-D "singularity"

The first variable has almost zero variance compared to all the others. In some views the points will "collapse" into a very linear shape.

Data file | .vgroups file | S Code for generating samples | XGobi | XLispStat

movie size: 450k
duration: 90s

Note: Variables need to be scaled together (min/max over all measurements is used) in the viewing transformation so that variance differences are reflected. In XGobi, this is achieved by creating a file with the extension .vgroups with each row having a 1 in the first place and nothing else on the line. The number of rows should match the number of variables. To maintain the scale differences in the latter two datasets we have used a trick: two points are added to the top of the data files which delimit the min/max values of the variables with the largest variances. These appear as two anomalous data points floating far from other points in the grand tour. They are visually distracting, but they work to force XLispStat, and XGobi initiated from the web browser, to keep the variable scales relative to each other.
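
The difference between scaling each variable separately and scaling all variables together can be seen in a few lines of Python. This is a toy sketch of the idea only, not XGobi's or XLispStat's actual viewing transformation, and the function names scale_per_variable and scale_together are hypothetical.

```python
def scale_per_variable(data):
    """Map each column to [0, 1] using that column's own min and max."""
    cols = list(zip(*data))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(x - l) / (h - l) if h > l else 0.0
             for x, l, h in zip(row, lo, hi)]
            for row in data]


def scale_together(data):
    """Map all values to [0, 1] using one min and max over every measurement."""
    lo = min(min(row) for row in data)
    hi = max(max(row) for row in data)
    span = hi - lo
    return [[(x - lo) / span if span > 0 else 0.0 for x in row]
            for row in data]


# First variable has a tiny spread, second a large one.
data = [[0.0, -10.0], [0.1, 0.0], [0.2, 10.0]]
separate = scale_per_variable(data)
together = scale_together(data)

# Spread of the first variable after each kind of scaling.
spread_separate = max(r[0] for r in separate) - min(r[0] for r in separate)
spread_together = max(r[0] for r in together) - min(r[0] for r in together)
```

With separate scaling both variables fill the whole range, so the variance difference disappears; with joint scaling the first variable occupies only about 1% of the range, which is the behaviour the note above asks for.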

Samples from Long-Tailed Distributions

5-D Standard Cauchy

Samples from a standard Cauchy distribution in any dimension look like a mass of points in one location and a few very extreme points. If you remove the extreme points and rescale, it still looks like a mass of points in one location and a few very extreme points.
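
This persistence of extremes after trimming can be checked numerically. The Python sketch below, which is illustrative only and not the S code linked below, draws standard Cauchy values by the inverse-CDF method, discards the most extreme 1% by magnitude, and compares the largest remaining magnitude with the median remaining magnitude; the helper names standard_cauchy and spread_ratio are made up for this example.

```python
import math
import random


def standard_cauchy(n, p, rng):
    """n rows of p independent standard Cauchy draws via the inverse CDF."""
    return [[math.tan(math.pi * (rng.random() - 0.5)) for _ in range(p)]
            for _ in range(n)]


def spread_ratio(values, keep=0.99):
    """Largest kept magnitude divided by the median kept magnitude.

    The most extreme (1 - keep) share of values is trimmed first.
    """
    mags = sorted(abs(v) for v in values)
    kept = mags[: int(len(mags) * keep)]
    return kept[-1] / kept[len(kept) // 2]


rng = random.Random(2)
cauchy = [x for row in standard_cauchy(2000, 5, rng) for x in row]
normal = [rng.gauss(0.0, 1.0) for _ in range(len(cauchy))]

cauchy_ratio = spread_ratio(cauchy)  # stays large even after trimming
normal_ratio = spread_ratio(normal)  # stays small
```

For the normal values the ratio is only a few units, while for the Cauchy values it remains in the tens, so the trimmed and rescaled Cauchy sample still shows a central mass with a few far-flung points.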

Data file | .vgroups file | S Code for generating samples | XGobi | XLispStat

movie size: 300k
duration: 90s

5-D t with 30 df

Similar to a normal sample, but with tighter clustering at the center and more outlying points.

Data file | .vgroups file | S Code for generating samples | XGobi | XLispStat

movie size: 500k
duration: 90s

Samples from Skewed Distributions

5-D Standard Exponential

Samples from a standard Exponential distribution (lambda=1) in any dimension have most projections that exhibit skewness. In the pairwise plot the points mass at the (0,0) location in each plot. The grand tour views are more interesting: (1) it is clear that there is one point in 5-D that is a vertex where 5 edges merge, (2) in many projections (when all variables contribute to the projection in an averaging manner) the data look somewhat like a sample from a normal distribution.

Data file | .vgroups file | S Code for generating samples | XGobi | XLispStat

movie size: 600k
duration: 90s

Simple Geometric Shapes

The vertices of a cube up to 9-D

This data is interesting because most projections from 9-D look quite normal, except for the regularities imposed by the cube grid. The pairs plot is quite different from the grand tour views.
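
The near-normal appearance can be seen directly in one particular projection. The Python sketch below, an illustration rather than the S code linked below, enumerates the 512 vertices of a 9-D unit cube and counts how many fall at each position along the main diagonal, which is the 1-D projection onto the direction (1, ..., 1) up to a constant scale factor. The function name cube_vertices is made up for the example.

```python
from collections import Counter
from itertools import product


def cube_vertices(p):
    """All 2**p vertices of the unit cube in p dimensions."""
    return [list(v) for v in product((0, 1), repeat=p)]


vertices = cube_vertices(9)

# The coordinate sum of a vertex is its position along the main diagonal
# (up to a constant factor), so these counts form a 1-D projection histogram.
diagonal_counts = Counter(sum(v) for v in vertices)
profile = [diagonal_counts[k] for k in range(10)]
# profile == [1, 9, 36, 84, 126, 126, 84, 36, 9, 1]
```

The counts are binomial coefficients, which form a bell-shaped histogram, whereas a pairs plot of the same data shows only the four corners of a square in each panel.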

Data file | S Code for generating samples | XGobi | XLispStat

movie size: 2200k
duration: 90s

Uniform in a 5-D cube

Just looks like a box.

Data file | S Code for generating samples | XGobi | XLispStat

movie size: 1000k
duration: 90s

Uniform on a 5-D sphere

Always circular projections with sharp edges. It is interesting to watch a section tour of this data: the section is always a circle.

Data file | S Code for generating samples | XGobi | XLispStat

movie size: 700k
duration: 90s

Uniform within a 5-D sphere

Always circular projections but "fuzzy" edges. No circular sections.
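
Both sphere datasets can be generated with a standard construction, sketched here in Python as an illustration rather than as the S code linked below: normalizing a vector of independent normals gives a point uniform on the sphere's surface, and shrinking that point by a radius drawn as U^(1/p), with U uniform on [0, 1), gives a point uniform within the sphere. The function names on_sphere and within_sphere are hypothetical.

```python
import math
import random


def on_sphere(p, rng):
    """One point uniformly distributed on the surface of the unit sphere in p-D."""
    v = [rng.gauss(0.0, 1.0) for _ in range(p)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def within_sphere(p, rng):
    """One point uniformly distributed inside the unit sphere in p-D."""
    # Volume grows like r**p, so the radius is the p-th root of a uniform draw.
    r = rng.random() ** (1.0 / p)
    return [r * x for x in on_sphere(p, rng)]


rng = random.Random(3)
surface = [on_sphere(5, rng) for _ in range(500)]
ball = [within_sphere(5, rng) for _ in range(500)]
```

Every point in surface has length 1, which is what produces the sharp edge in projections, while the points in ball have varying lengths no greater than 1.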

Data file | S Code for generating samples | XGobi | XLispStat

movie size: 600k
duration: 90s

Three distinct unconnected clusters

The points follow 3 different motion patterns.

Data file | XGobi | XLispStat

movie size: 650k
duration: 90s

1-D structure embedded in 5-D

This data always looks almost linear (highly correlated) or occasionally reduces in variance to a very small blob.

Data file | XGobi | XLispStat

movie size: 400k
duration: 90s

2-D structure embedded in 5-D

This data always looks almost planar or linear (highly correlated) or occasionally reduces in variance to a very small blob.

Data file | XGobi | XLispStat

movie size: 600k
duration: 90s

1-D non-linear structure embedded in 5-D

This data always looks like a curved line rotating.

Data file | XGobi | XLispStat

movie size: 300k
duration: 90s

Challenge Data Sets

These data sets can be viewed online through an applet with the button, or downloaded to view using XGobi or XLispStat.

How many clusters in this data set? Data 1 (XGobi, XLispStat) | Data 2 (XGobi, XLispStat) | Data 3 (XGobi, XLispStat) | Data 4 (XGobi, XLispStat) | Answers
What is the distribution? Data 1 (XGobi, XLispStat) | Data 2 (XGobi, XLispStat) | Data 3 (XGobi, XLispStat) | Data 4 (XGobi, XLispStat) | Answers

Acknowledgements

This work began with the writing of code to run a grand tour with arbitrary dimensional projections for use in the C2 Virtual Reality Lab at Iowa State University. It is possible as a result of the work in Buja, Cook, Asimov and Hurley (1997) which describes the algorithm. The work here can be viewed as an adjunct to that paper.

Thanks to Dr Sigbert Klinke for valuable feedback on the material in this paper.

The author was supported by National Science Foundation grants DMS9632662 and DMS9214497.

References

Asimov, D. (1985) The Grand Tour: A Tool for Viewing Multidimensional Data, SIAM Journal on Scientific and Statistical Computing, 6(1):128-143.

Buja, A., Cook, D., Asimov, D., Hurley, C. (1997) Dynamic Projections in High-Dimensional Visualization: Theory and Computational Methods, Journal of Computational and Graphical Statistics, submitted.

Carr, D. B., Wegman, E. J., Luo, Q. (1996) ExplorN: Design Considerations Past and Present, Technical Report No. 129, Center for Computational Statistics, George Mason University.

Swayne, D. F., Cook, D., Buja, A. (1998) XGobi: Interactive Dynamic Graphics in the X Window System, Journal of Computational and Graphical Statistics, 7(1):113-130. See also www.research.att.com/areas/stat/xgobi/.

Tierney, L. (1991) LispStat: An Object-Oriented Environment for Statistical Computing and Dynamic Graphics, Wiley, New York, NY.


This paper is a revision of the paper that can be found at http://www.stat.ucla.edu/journals/jss/v02/i06/

Dianne Cook, Dept of Statistics, ISU, 325 Snedecor Hall, Ames, IA 50011-1210
Tel: (515) 294 8865, Fax: (515) 294 4040
email: dicook@iastate.edu
http://www.public.iastate.edu/~dicook/

Last modified: Tue Sep 12 05:47:40 CDT 2000