Overview of the Project

Limn focuses on tools for interacting with visual renderings of large, multi-dimensional data. For data of up to about 1 million cases, there is an extensive body of research, and a resulting set of proven methods, for interacting with visual renderings. The most successful approach for visualizing multi-dimensional data is called the "multiple views paradigm". Users are presented with multiple renderings, such as a histogram, scatterplot, parallel coordinate plot, or touring plot, and are provided with mechanisms for linking the information between them. The most common linking method is linked brushing, where highlighting features in one rendering highlights the corresponding features in the other renderings. This project concentrates on scaling up the multiple views paradigm from data of a million cases to arbitrary amounts of data. There are two directions of research: linking mechanisms between renderings, and generating touring plots. Much of the existing research on working with large data has focused on reducing its size, using binning or clustering-type approaches, before rendering the data. This approach may be appropriate for extracting global trends, but it will fail to extract local anomalies and deviations, which might ultimately be more interesting to know about. So we are concentrating our attention on methods to visualize the entire data set. Quite an ambitious objective!
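
To make the linking idea concrete, here is a minimal Python sketch of linked brushing. It is an illustration only, not the Limn implementation: the class name LinkedViews and its methods are hypothetical, and the only assumption is that every rendering refers to cases by the same row index, so a brush in one view can be looked up in another.

```python
from dataclasses import dataclass, field


@dataclass
class LinkedViews:
    """Hypothetical container: several views sharing one set of brushed cases."""
    data: list                                  # one tuple of variable values per case
    brushed: set = field(default_factory=set)   # row indices of the brushed cases

    def brush_rect(self, xvar, yvar, xmin, xmax, ymin, ymax):
        """Brush a rectangle in the scatterplot of variables (xvar, yvar)."""
        self.brushed = {
            i for i, case in enumerate(self.data)
            if xmin <= case[xvar] <= xmax and ymin <= case[yvar] <= ymax
        }
        return self.brushed

    def highlighted_counts(self, var, bin_edges):
        """Count brushed cases per histogram bin of `var` (the linked view).

        Bins are half-open: [edge_b, edge_b+1).
        """
        counts = [0] * (len(bin_edges) - 1)
        for i in self.brushed:
            value = self.data[i][var]
            for b in range(len(counts)):
                if bin_edges[b] <= value < bin_edges[b + 1]:
                    counts[b] += 1
                    break
        return counts


views = LinkedViews(data=[(0.1, 0.2, 5.0), (0.9, 0.8, 1.0), (0.15, 0.25, 9.0)])
views.brush_rect(0, 1, 0.0, 0.5, 0.0, 0.5)           # brush in the scatterplot
print(views.highlighted_counts(2, [0.0, 5.0, 10.0]))  # highlight in the histogram
```

The shared set of row indices is what carries the link. Any rendering that can map an index back to its own glyphs can take part.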

Most of our current work has focused on generating touring plots using movie technology. Intuitively, tours are rotations of high-dimensional data spaces. The most common implementation is a sequence of 2D projections rendered as a scatterplot, which for 3 variables is simply a 3D rotation. Tours are excellent for exploring multi-dimensional shapes, embedded subspaces, and the joint distribution of variables. The tour algorithm is especially fast: when taking linear projections, the cost of each view depends only on the number of cases n, so the algorithm is of order n, one pass through the data for each view. We are exploring the use of QuickTime to generate tour movies of the data off-line. A novel addition is interaction with the movie, which lets the user brush small subsets and have these real data points highlighted as an overlay on the movie images.
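
The "one pass through the data for each view" property can be seen in a short Python sketch of a single tour frame. This is a sketch under stated assumptions, not our production code: the function name project is hypothetical, and the projection plane is assumed to be given as a pair of orthonormal vectors.

```python
def project(data, basis):
    """Project every p-dimensional case onto a 2D plane.

    `basis` is a pair (u, v) of orthonormal p-vectors spanning the plane.
    Each case is visited exactly once, so one view costs a single pass
    over the n cases.
    """
    u, v = basis
    return [
        (sum(x * a for x, a in zip(case, u)),
         sum(x * b for x, b in zip(case, v)))
        for case in data
    ]


cases = [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)]
# Looking straight down the third variable: the view is variable 1 vs variable 2.
frame = project(cases, ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0)))
print(frame)
```

A tour movie is then a sequence of such frames, one for each plane along the tour path.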

There are three main approaches to constructing tour paths: random, guided and manual. In a random tour path, new target planes are chosen randomly from all possible projection planes, and a geodesic interpolation path is constructed to move the view from the current projection plane to the target plane in a continuous, smooth manner. The random tour path provides an overview of the data: if the user could sit watching for an extended period of time, she would eventually see all possible projections of the data. There are two other approaches for generating tours that provide an overview of the data: the coding theory tour and the little tour. The coding theory tour uses exactly that, the theory of codes and Hamiltonian circuits, to calculate a set number of planes that are roughly equally spaced in the space of all possible projections. The little tour is the path generated by interpolating from one variable directly to another. We are using both of these approaches, along with the random tour, in making tour movies.

The problem with the random tour path, and with overview tours generally, is that if the dimension of the space is large (>5) then much of the time the views seen are uninteresting, or near Gaussian. It then becomes important to change from a random path to a guided path, to show the user more of the interesting views and fewer of the uninteresting ones. To facilitate a guided tour using movie methods, we do an off-line search for interesting projections, and then use the interpolation algorithm to generate a path between these views to make a guided tour movie. Potential candidates for the search algorithms are projection pursuit, decision trees and neural networks. A manual tour, by contrast, allows the user to interactively alter the projection coefficient of a variable, to rotate a variable into or out of the view.
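
Two of the overview paths can be sketched in a few lines of Python. The function names random_plane and little_tour_step are hypothetical, and the sketch makes simplifying assumptions: the random target plane is drawn by orthonormalizing two Gaussian vectors, and the little tour is shown in its simplest form, with one axis of the view held on a fixed variable while the other rotates from one variable to another. The general geodesic interpolation between two arbitrary planes is not shown here.

```python
import math
import random


def random_plane(p, rng=random):
    """Draw a random 2D projection plane in p dimensions.

    Returns an orthonormal pair (u, v), built by Gram-Schmidt on two
    Gaussian random vectors.
    """
    u = [rng.gauss(0.0, 1.0) for _ in range(p)]
    norm_u = math.sqrt(sum(a * a for a in u))
    u = [a / norm_u for a in u]
    v = [rng.gauss(0.0, 1.0) for _ in range(p)]
    dot = sum(a * b for a, b in zip(u, v))
    v = [b - dot * a for a, b in zip(u, v)]   # remove the component along u
    norm_v = math.sqrt(sum(b * b for b in v))
    v = [b / norm_v for b in v]
    return u, v


def little_tour_step(p, fixed, start, end, t):
    """Frame at fraction t in [0, 1] of a rotation between two variables.

    The first axis stays on variable `fixed`; the second axis rotates
    from variable `start` (t = 0) to variable `end` (t = 1).
    The three variable indices are assumed to be distinct.
    """
    angle = t * math.pi / 2
    u = [0.0] * p
    u[fixed] = 1.0
    v = [0.0] * p
    v[start] = math.cos(angle)
    v[end] = math.sin(angle)
    return u, v


target = random_plane(6, random.Random(0))   # a target for a random tour
frames = [little_tour_step(4, 0, 1, 2, k / 10) for k in range(11)]
```

Each frame returned here is an orthonormal pair, so it can be used directly as the projection plane for one image of a tour movie.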

The approaches discussed above are developed and tested on AVIRIS data, provided by the USGS EROS Data Lab in Sioux Falls, South Dakota. The AVIRIS data cover selected regions of the Midwest, with each image being 512x614 pixels with 224 bands. In the initial phases of this work we concentrate on just one image, reduce the number of bands to about 25, and ignore the time and space components, to focus on the multivariate space. Later phases of the work will provide methods for incorporating time and space information, and we will extend the methods to MODIS satellite data. We will demonstrate several movies of the band space illustrating the tour approaches described above.
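
Ignoring the space component amounts to treating each pixel as one case and each band as one variable, so a single 512x614 image gives 314,368 cases. The Python sketch below shows that reshaping. The function names are hypothetical, and the band selection rule (evenly spaced bands) is an assumption made purely for illustration; it is not necessarily how the bands are reduced in this project.

```python
def evenly_spaced_bands(n_bands, n_keep):
    """Pick n_keep band indices spread evenly over n_bands (illustrative rule only)."""
    step = n_bands / n_keep
    return [int(i * step) for i in range(n_keep)]


def flatten_cube(cube, keep_bands):
    """Turn cube[row][col][band] into a list of cases, one per pixel.

    Row and column positions are dropped, so only the band space remains.
    """
    return [
        tuple(pixel[b] for b in keep_bands)
        for row in cube
        for pixel in row
    ]


# Tiny stand-in for an image: 2 rows x 3 columns x 4 bands.
cube = [[[r * 100 + c * 10 + b for b in range(4)] for c in range(3)] for r in range(2)]
cases = flatten_cube(cube, [0, 2])
print(len(cases), cases[0])

bands = evenly_spaced_bands(224, 25)   # 25 of the 224 bands
```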