C Code for Computing a Grand Tour

Source files (also available as a tar archive): Makefile | main.c | defs_and_types.h | read.c | gt.c | gt.h | gt_util.c | gt_util.h | svd.c | util.c | util.h


Usage:
  gt datafilename proj_dim tourlen stepscale

Optional arguments:
  proj_dim = Projection Dimension, default is 2
  tourlen = The number of projections to calculate, default is 150
  stepscale = The speed factor: 1 is slow, 100 is fast, default is 10


This code takes a data file and several optional arguments and outputs a sequence of grand tour projection vectors. With small modifications, it can output the projected data instead. The algorithm is described in detail in Buja, Cook, Asimov and Hurley (1997). The code generates a grand tour similar to the one available in XGobi. The major difference is that XGobi's tour shows only 2D projections, whereas this code can generate grand tours of projections of any dimension lower than that of the data: 1D, 2D, 3D, and so on.

The main operational functions are:

run_tour() is the main driving engine of the tour. It calls the functions that compute the projections in sequence.
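
The cycle that run_tour() drives can be sketched with a deliberately tiny, hypothetical example: p = 2 variables and 1D projections, so every projection vector is (cos a, sin a) and a path is just a range of angles. The names toy_tour, toy_new_target() and toy_run_tour(), and the fixed angle per step standing in for stepscale, are assumptions for illustration, not the code in gt.c.

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>

#define TOY_PI 3.14159265358979323846

/* Hypothetical toy tour: p = 2 variables and 1D projections, so each
 * projection vector is (cos a, sin a) and is identified by its angle a. */
typedef struct {
    double current; /* angle of the current projection vector */
    double target;  /* angle of the ending projection vector */
    double step;    /* angle moved per tour step (a stand-in for stepscale) */
} toy_tour;

/* Choose a new random ending angle in [0, pi); angles a and a + pi span
 * the same line, so half a turn covers every 1D projection. */
static void toy_new_target(toy_tour *t) {
    t->target = TOY_PI * (rand() / ((double)RAND_MAX + 1.0));
}

/* Write tourlen projection angles to out[], following the cycle that
 * run_tour() drives: step along the path, and once the target is within
 * one step, land on it and pick a new target. */
static void toy_run_tour(toy_tour *t, int tourlen, double *out) {
    assert(tourlen >= 0 && t->step > 0.0);
    for (int k = 0; k < tourlen; k++) {
        double remaining = t->target - t->current;
        if (fabs(remaining) <= t->step) { /* cf. reached_target_plane() */
            t->current = t->target;       /* cf. finishing_step() */
            toy_new_target(t);            /* cf. new_random_basis(), path() */
        } else {
            t->current += (remaining > 0.0) ? t->step : -t->step;
        }
        out[k] = t->current;
    }
}
```

Seeding with srand() and printing out[k] gives one angle per tour step; in the real program each step produces a full p x d projection basis instead of a single angle.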

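preproject_data() projects the data into the space spanned by the starting and ending bases at the end of each interpolation stage. Every step of the following interpolation stage then works with these lower-dimensional coordinates rather than the full p-dimensional data, which speeds up the calculations.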
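
A minimal sketch of the projection step, assuming row-major data (data[i*p + j]) and basis vectors stored one after another (basis[k*p + j]); both layouts and the name preproject_sketch() are assumptions, not taken from gt.c. The starting and ending bases together span at most q = 2d orthonormal directions, so after this step each frame needs only q coordinates instead of p, and a tour step costs on the order of n*q*d operations rather than n*p*d.

```c
#include <assert.h>
#include <stddef.h>

/* Project n rows of p-dimensional data onto q orthonormal vectors
 * basis[k*p + j], writing the coordinates to coords[i*q + k]. */
static void preproject_sketch(const double *data, size_t n, size_t p,
                              const double *basis, size_t q, double *coords) {
    assert(q <= p);
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < q; k++) {
            double s = 0.0; /* inner product of row i with basis vector k */
            for (size_t j = 0; j < p; j++)
                s += data[i * p + j] * basis[k * p + j];
            coords[i * q + k] = s;
        }
}
```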

tour_reproject() calculates the projected data and the corresponding variable axes at each tour step.
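
The two outputs of a reprojection step can be sketched as follows, assuming the current frame is available both as d vectors w in the q-dimensional preprojection coordinates and as d vectors frame in the original p-space. These layouts and the function names are hypothetical. The axis drawn for variable j is simply the j-th component of each frame vector, that is, where the unit vector for variable j lands in the view.

```c
#include <assert.h>
#include <stddef.h>

/* Projected data for one tour step: coords (n x q, row-major) are the
 * preprojected data and w holds the d frame vectors expressed in the
 * q-dimensional preprojection basis (w[m*q + k]). Writes proj[i*d + m]. */
static void reproject_sketch(const double *coords, size_t n, size_t q,
                             const double *w, size_t d, double *proj) {
    assert(d <= q);
    for (size_t i = 0; i < n; i++)
        for (size_t m = 0; m < d; m++) {
            double s = 0.0;
            for (size_t k = 0; k < q; k++)
                s += coords[i * q + k] * w[m * q + k];
            proj[i * d + m] = s;
        }
}

/* Variable axes: the axis for variable j is the j-th component of each
 * of the d frame vectors in p-space (frame[m*p + j]). Writes axes[j*d + m]. */
static void variable_axes_sketch(const double *frame, size_t d, size_t p,
                                 double *axes) {
    for (size_t j = 0; j < p; j++)
        for (size_t m = 0; m < d; m++)
            axes[j * d + m] = frame[m * p + j];
}
```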

gen_norm_variates() generates the random normal variates from which the new random bases are built.

new_random_basis() transforms the normal random variates into random variates on the (p-1)-dimensional unit sphere in p-space, which form the new ending basis.

path() calculates the new interpolation path from the current basis to the new ending basis in several steps: first it calculates the principal angles and the corresponding principal vectors between the starting and ending bases, then it computes the preprojection basis, and finally it computes the viewing frames.

reached_target_plane() checks if the interpolation stage is finished.

finishing_step() does the final small increment needed to get to the ending basis.
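
The stopping test and the final increment can be sketched by tracking the fraction tau of the current path already covered. The interp_state type and the fixed per-step increment dtau are hypothetical; in a real tour dtau might be an angular step size divided by the path's total principal angle, but how gt.c ties the step size to stepscale is not shown here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical interpolation state: tau is the fraction of the path
 * covered (0 at the starting basis, 1 at the ending basis) and dtau is
 * the fraction added by one ordinary tour step. */
typedef struct {
    double tau;
    double dtau;
} interp_state;

/* In the spirit of reached_target_plane(): true once the rest of the
 * path is no longer than one step, so the next move is the last one. */
static bool reached_target(const interp_state *s) {
    return s->tau + s->dtau >= 1.0;
}

/* Advance one step; near the end, take the smaller finishing step that
 * lands exactly on the ending basis (tau = 1). */
static void advance(interp_state *s) {
    assert(s->dtau > 0.0);
    if (reached_target(s))
        s->tau = 1.0; /* cf. finishing_step() */
    else
        s->tau += s->dtau;
}
```

With dtau = 0.3, three ordinary steps reach about 0.9 and a fourth, shorter step lands on 1.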

mean_lgdist() finds the distance from the sample mean to the data point farthest from it. This distance is used to scale the data into the plotting window. It is more effective than using the minimum and maximum of each variable when 3 or more variables are used in a grand tour, because a projection combines the variables: for example, a corner of the box [-1, 1]^p lies sqrt(p) from the center and can project to that full distance. As the number of variables grows, the distance from the mean to the farthest point grows at a rate proportional to the square root of the number of variables. This scaling method takes that growth into account for different data sets and keeps all the data within the plotting window. There may be other or better ways to do this.

scale_into_window() scales the projected data into plotting coordinates, currently assumed to be between -1 and 1.
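
A minimal sketch, assuming the plotting scale is the largest distance from the mean found by mean_lgdist(); the exact formula in gt_util.c may differ, and the function name is hypothetical. Because an orthonormal projection never lengthens a vector, every centered projected point stays within that distance of the origin, so each coordinate lands in [-1, 1] for every projection in the tour.

```c
#include <assert.h>
#include <stddef.h>

/* Map projected data (n x d, row-major) into [-1, 1] plotting coordinates:
 * subtract the projected mean and divide by maxdist, the largest distance
 * from the sample mean in the full p-dimensional space. */
static void scale_into_window_sketch(double *proj, size_t n, size_t d,
                                     const double *proj_mean, double maxdist) {
    assert(maxdist > 0.0);
    for (size_t i = 0; i < n; i++)
        for (size_t m = 0; m < d; m++)
            proj[i * d + m] = (proj[i * d + m] - proj_mean[m]) / maxdist;
}
```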

gt_pipeline() scales the raw data into a p-dimensional box running from -1 to +1 on each variable.
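
A minimal sketch, assuming gt_pipeline() rescales each variable by its own minimum and maximum (the text says only that the raw data is scaled into a -1 to +1 box) and using the same hypothetical row-major layout as the other sketches. Constant variables have no range to stretch, so this version maps them to 0.

```c
#include <assert.h>
#include <stddef.h>

/* Rescale each of the p columns of row-major data in place so that its
 * minimum maps to -1 and its maximum to +1. Constant columns become 0. */
static void scale_to_box(double *data, size_t n, size_t p) {
    assert(n > 0);
    for (size_t j = 0; j < p; j++) {
        double lo = data[j], hi = data[j];
        for (size_t i = 1; i < n; i++) {
            double x = data[i * p + j];
            if (x < lo) lo = x;
            if (x > hi) hi = x;
        }
        for (size_t i = 0; i < n; i++)
            data[i * p + j] = (hi > lo)
                ? 2.0 * (data[i * p + j] - lo) / (hi - lo) - 1.0
                : 0.0;
    }
}
```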

Using S/S-Plus to generate pictures:

Pipe the output to a file, say "gt_proj". The following S code, one version per projection dimension, reads the file and plots the consecutive projections of the data, 25 plots per page:

1D | 2D | 3D | 4D | 5D

Dianne Cook, Dept of Statistics, ISU, 325 Snedecor Hall, Ames, IA 50011-1210
Tel: (515) 294 8865, Fax: (515) 294 4040

Last modified: Sun Sep 21 14:30:20 CDT 1997