Overview of the Project
Limn focuses on tools
for interacting with visual renderings of large, multi-dimensional
data. For data sets of up to about 1 million cases, there is an extensive
body of research, and a resulting set of proven methods, for interacting with
visual renderings. The most successful approach for visualizing
multi-dimensional data is called the "multiple views paradigm". Users
are presented with multiple renderings, such as a histogram,
scatterplot, parallel coordinate plot, or touring plot, and provided
with mechanisms for linking the information between them. The most
common linking method is linked brushing, where highlighting features
in one rendering similarly highlights the corresponding features in
the other renderings. This project concentrates on scaling
up this multiple views paradigm to data larger than a million cases,
and ultimately to arbitrary amounts of data. There are two directions of research:
linking mechanisms between renderings, and generating touring
plots. Much of the existing research on working with large data has
focused on reducing its size, using binning or clustering
approaches, before rendering the data. This approach may be appropriate
for extracting global trends, but it is likely to miss local
anomalies and deviations, which might ultimately be more interesting
to know about. We are therefore concentrating on methods to
visualize the entire data set. Quite an ambitious objective!
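As a minimal sketch of linked brushing (not Limn's own implementation), the idea can be expressed as a single boolean mask over cases that every view shares: brushing a rectangle in one scatterplot sets the mask, and any other view highlights the same cases. NumPy is assumed, and the function name `brush_rectangle` and the synthetic data are hypothetical.

```python
import numpy as np

def brush_rectangle(x, y, x_range, y_range):
    """Return a boolean mask of the cases whose (x, y) position lies inside the brush."""
    x = np.asarray(x)
    y = np.asarray(y)
    return ((x >= x_range[0]) & (x <= x_range[1]) &
            (y >= y_range[0]) & (y <= y_range[1]))

# Synthetic data: 1000 cases, 4 variables (a hypothetical stand-in for real data).
rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 4))

# Brush in the view of variables 0 and 1 ...
selected = brush_rectangle(data[:, 0], data[:, 1], (0.5, 2.0), (-1.0, 1.0))

# ... and the same case-level mask picks out the points to highlight
# in a second view showing variables 2 and 3.
highlight_in_view_2 = data[selected][:, [2, 3]]
```

Because the link is carried by case identity rather than by screen position, the same mask could in principle drive a histogram, a parallel coordinate plot, or a tour view.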
Most of our current work has focused on generating touring plots using
movie technology. Intuitively, tours are rotations of high-dimensional
data spaces. The most common implementation is a sequence of 2D
projections rendered as a scatterplot, which for 3 variables is simply
a 3D rotation. Tours are excellent for exploring multi-dimensional
shapes, embedded subspaces, and the joint distribution of
variables. The tour algorithm is especially fast: because each view is
a linear projection, its cost grows only linearly with the number of
cases n, one pass through the data for each view. We are exploring
the use of QuickTime to generate tour movies of the data off-line. A
novel addition is interaction with the movie: the user can brush
small subsets and have the corresponding real data points highlighted as an
overlay on the movie images.
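The linear cost can be seen in a small sketch of a single tour frame: a 2D projection is one matrix product of the n x p data with a p x 2 orthonormal basis, so each case is touched once per view. This assumes NumPy; `random_frame` and `project` are illustrative names, not part of any existing tour software.

```python
import numpy as np

def random_frame(p, d=2, rng=None):
    """Draw a random p x d matrix with orthonormal columns (a projection basis)."""
    rng = np.random.default_rng() if rng is None else rng
    q, _ = np.linalg.qr(rng.normal(size=(p, d)))
    return q

def project(data, frame):
    """Project n x p data onto the frame: one pass over the n cases."""
    return data @ frame

rng = np.random.default_rng(1)
data = rng.normal(size=(10_000, 5))   # synthetic: 10,000 cases, 5 variables
frame = random_frame(5, rng=rng)
xy = project(data, frame)             # n x 2 coordinates for one scatterplot frame
```

A tour movie is then a sequence of such frames, each rendered from a slightly different basis.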
Three approaches are needed when constructing tour paths:
random, guided, and manual. In a random tour path, new target planes
are chosen randomly from all possible projection planes, and a
geodesic interpolation path is constructed to move the view from the
current projection plane to the target plane, in a continuous, smooth
manner. The random tour path provides an overview of the data. If the
user could sit watching for an extended period of time, she would come
arbitrarily close to every possible projection of the data. There are two other approaches
for generating tours that provide an overview of the data: the coding
theory tour and the little tour. The coding theory tour uses exactly
that, the theory of codes and Hamiltonian circuits, to calculate a fixed
number of planes that are roughly equally spaced in the space of all
possible projections. The little tour is the path generated by
interpolating from one variable directly to another. We are using both
of these approaches and the random tour in making tour movies. The
problem with the random tour path, and with overview tours generally, is
that if the dimension of the space is large (>5) then much of the time
the views seen are uninteresting, that is, near Gaussian. It is then
important to change from a random path to a guided path, to show the
user more of the interesting views and fewer of the uninteresting
ones. A manual tour allows the user to interactively alter the
projection coefficient of a variable, to rotate a variable into or out
of the view. To produce a guided tour using movie methods, we run an
off-line search for interesting projections and then use the
interpolation algorithm to generate a path between these views, yielding
a guided tour movie. Candidate search algorithms
include projection pursuit, decision trees, and neural networks.
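One step of a random tour can be sketched as follows: draw a random target plane, then move from the current plane to the target along a geodesic by rotating through the principal angles between the two planes. This is a minimal sketch of the standard principal-angle construction, assuming NumPy; it is not necessarily the interpolation code used in this project, and the names `random_frame` and `geodesic_path` are hypothetical.

```python
import numpy as np

def random_frame(p, d=2, rng=None):
    """Random p x d matrix with orthonormal columns: a randomly chosen plane."""
    rng = np.random.default_rng() if rng is None else rng
    q, _ = np.linalg.qr(rng.normal(size=(p, d)))
    return q

def geodesic_path(start, target, n_steps):
    """Yield frames moving the plane spanned by `start` to the plane spanned by `target`."""
    # The SVD of start' target gives the principal directions and angles.
    va, cosines, vzt = np.linalg.svd(start.T @ target)
    angles = np.arccos(np.clip(cosines, -1.0, 1.0))
    ga = start @ va        # principal directions in the starting plane
    gz = target @ vzt.T    # matching principal directions in the target plane
    # Unit vectors orthogonal to the starting plane, pointing toward the target.
    perp = gz - ga * np.cos(angles)
    norms = np.linalg.norm(perp, axis=0)
    norms[norms < 1e-12] = 1.0   # shared directions need no rotation
    perp = perp / norms
    for t in np.linspace(0.0, 1.0, n_steps):
        frame = ga * np.cos(t * angles) + perp * np.sin(t * angles)
        # Rotate back so that the path begins exactly at `start`.
        yield frame @ va.T

rng = np.random.default_rng(3)
start = random_frame(6, rng=rng)    # current projection plane
target = random_frame(6, rng=rng)   # randomly chosen target plane
frames = list(geodesic_path(start, target, 20))
```

Every frame on the path is orthonormal, and the last frame spans the target plane, although it may be rotated within that plane. Under the same assumptions, a guided tour movie would replace the random targets with the projections found by the off-line search.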
The approaches discussed above are being developed and tested on AVIRIS
data, provided by the USGS EROS Data Lab, in Sioux Falls, South
Dakota. The AVIRIS data cover selected regions
of the Midwest, with each image being 512x614 pixels in 224 bands. In
the initial phases of this work we concentrate on just one image,
reduce the number of bands to about 25, and ignore time and space
components, to focus on the multivariate space. Later phases of the
work will provide methods for incorporating time and space
information, and we will extend the methods to MODIS satellite
data. We will demonstrate several movies of the band space
illustrating the tour approaches described above.
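Ignoring the spatial components amounts to treating every pixel as one case with one value per band. A minimal sketch, assuming the image is held as a NumPy array of shape (rows, columns, bands): the small synthetic cube and the evenly spaced choice of bands are illustrative assumptions only, since the text does not say how the roughly 25 bands are selected, and `cube_to_cases` is a hypothetical name.

```python
import numpy as np

def cube_to_cases(cube, n_bands_keep):
    """Flatten a (rows, cols, bands) image into a (pixels, kept bands) case matrix."""
    rows, cols, bands = cube.shape
    # Evenly spaced band indices: an illustrative choice, not the project's method.
    keep = np.linspace(0, bands - 1, n_bands_keep).round().astype(int)
    cases = cube.reshape(rows * cols, bands)[:, keep]
    return cases, keep

# Small synthetic stand-in; a real image would be 512 x 614 x 224.
cube = np.random.default_rng(2).random((8, 10, 224))
cases, kept_bands = cube_to_cases(cube, 25)
```

For a full 512x614 image the same reshaping would give 314,368 cases, each described by the retained bands.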