** Usage: **

gt datafilename proj_dim tourlen stepscale

Optional arguments:

proj_dim = Projection dimension, default is 2

tourlen = The number of projections to calculate, default is 150

stepscale = The speed factor: 1 is slow, 100 is fast, default is 10

** Description: **

This code takes a data file and several optional arguments and returns a sequence of grand tour projection vectors. Small modifications can be made to have the code return the projected data. The algorithm is described in detail in Buja, Cook, Asimov, Hurley (1997). The code generates a grand tour similar to the one available in XGobi. The major difference is that in XGobi the projection is 2D, whereas this code can generate grand tours with 1D, 2D, 3D, or any other projection dimension lower than that of the data.
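To make "a projection of any dimension lower than that of the data" concrete: a d-dimensional projection of p-variable data can be represented by d orthonormal vectors of length p. The following is a minimal Python sketch, not the program's own code, of one standard way to draw such a basis at random by orthonormalising normal variates with Gram-Schmidt. The function name random_orthonormal_basis and its arguments are hypothetical and are used only for illustration.

```python
import math
import random


def random_orthonormal_basis(p, d, rng=random):
    """Return d orthonormal vectors of length p (hypothetical helper).

    Assumes 0 < d < p, matching the requirement that the projection
    dimension is lower than the dimension of the data.
    """
    if not 0 < d < p:
        raise ValueError("need 0 < d < p")
    basis = []
    while len(basis) < d:
        # Start from a vector of independent standard normal variates.
        v = [rng.gauss(0.0, 1.0) for _ in range(p)]
        # Gram-Schmidt: remove the components along vectors already chosen.
        for b in basis:
            dot = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - dot * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        # Skip the (extremely unlikely) degenerate draw and try again.
        if norm > 1e-12:
            basis.append([vi / norm for vi in v])
    return basis
```

For example, random_orthonormal_basis(5, 2, random.Random(1997)) would give two orthonormal vectors defining a 2D projection of 5-variable data. The seed is arbitrary.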

The main operational functions are:

* run_tour() * is the main driving engine of the tour. This
calls the functions that sequentially compute the projections.

* preproject_data() * preprojects the data into the space
spanned by the starting and ending bases at the end of each
interpolation stage. This speeds calculations during the interpolation
stage.

* tour_reproject() * calculates the projected data and the
corresponding variable axes at each tour step.

* gen_norm_variates() * generates random normal variates
used to generate the new random bases.

* new_random_basis() * transforms the normal random
variates into random variates from a *(p-1)*-dimensional sphere
in *p*-space, which form the new ending basis.

* path() * calculates the new interpolation path, from the
current basis to the new ending basis. Several steps are involved: the
first is calculating the principal angles and corresponding principal
vectors between the starting and ending bases, then the preprojection
basis is computed, and finally the viewing frames.

* reached_target_plane() * checks if the interpolation
stage is finished.

* finishing_step() * does the final small increment needed
to get to the ending basis.

* mean_lgdist() * finds the distance between the sample mean
and the point furthest from it. This is used to scale the data into the
plotting window. It is more effective than using the minimum and
maximum for each variable when there are 3 or more variables being used
in a grand tour. As the number of variables grows, the distance of the
point farthest from the mean grows at a rate proportional to the square
root of the number of variables. This scaling method helps to take this
into account for different data sets, to keep all the data within the
plotting window. There may be other or better ways to do this.

* scale_into_window() * scales the projected data into
plotting coordinates, currently assumed to be between -1 and 1.

* gt_pipeline() * scales the raw data into a -1 to +1
p-dimensional box.
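
The scaling idea described for mean_lgdist() and scale_into_window() can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the program's own code: the names largest_distance_from_mean and scale_to_unit_ball are hypothetical, and the sketch assumes the data are centred at the sample mean and divided by the largest distance from it. Every scaled point then has length at most 1, so its coordinates under any projection onto orthonormal vectors fall between -1 and 1.

```python
import math


def largest_distance_from_mean(rows):
    """Return (mean, largest Euclidean distance from the mean)."""
    n = len(rows)
    p = len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(p)]
    largest = max(
        math.sqrt(sum((r[j] - mean[j]) ** 2 for j in range(p)))
        for r in rows
    )
    return mean, largest


def scale_to_unit_ball(rows):
    """Centre the data and divide by the largest distance from the mean."""
    mean, largest = largest_distance_from_mean(rows)
    if largest == 0.0:
        raise ValueError("all points coincide with the sample mean")
    return [[(x - m) / largest for x, m in zip(r, mean)] for r in rows]
```

Dividing by a single distance, rather than by each variable's own range, keeps the relative scales of the variables unchanged while still bounding every projected point.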

** Using S/S-Plus to generate pictures: **

Pipe the output to a file, say "gt_proj". The following S-code will read the file, and plot the consecutive projections of the data, 25 plots per page:

Dianne Cook, Dept of Statistics, ISU, 325 Snedecor Hall, Ames, IA 50011-1210

Tel: (515) 294 8865, Fax: (515) 294 4040

email: dicook@iastate.edu

http://www.public.iastate.edu/~dicook/

Last modified: Sun Sep 21 14:30:20 CDT 1997