XGobi in Brief

Dianne Cook

Introduction

XGobi is a freeware data visualization system for interactive dynamic graphics written by Deborah F. Swayne, Dianne Cook and Andreas Buja. It is especially designed for the exploration of multivariate data. Its basic plot is a scatterplot, and these are some of the tools available for scatterplot display and manipulation:

XGobi has a direct manipulation interface, and all the above actions are performed using the mouse. The layout of the xgobi window is shown in Figure 1. XGobi can be used in conjunction with the S language for scientific computing and data analysis.

Starting XGobi

First add the xgobi locker:


add xgobi

Then start xgobi with the following syntax. The only mandatory argument is a data filename:


xgobi [ X options ] [ -std mmx|msd|mmd ] [ -dev std_deviation ] filename

Options


    -std mmx|msd|mmd 
         By default, the data are scaled into the plotting
         window using the minimum and maximum values of each variable
         or variable group, in such a way that the midpoint of the
         variable is at the center of the plotting window and no
         points fall outside the window. Instead, to scale using mean
         and standard deviation, specify -std msd; to scale using the
         median and median absolute deviation, specify -std mmd.  

    -dev x
         If you have specified -std msd or -std mmd, then you can also
         specify the number of standard deviations (or median absolute
         deviations) from the mean (or median) to be contained within
         the plotting window, using the argument -dev x, where x is a
         real number between 0 and 100. The default is 2.

    X options
         The standard X command line options can be used with
         XGobi. These include "-display machinename:0", used when
         running an X program on one machine and displaying its output
         on another, and "-title Title", where Title is a string you
         want to appear in the window manager titlebar.
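
As a rough illustration of how the options above combine, the following Python sketch assembles an xgobi command line. The helper name build_xgobi_command and the file name flea.dat are hypothetical, and treating the 0 to 100 range for -dev as inclusive is an assumption; only the flags documented above are used.

```python
def build_xgobi_command(filename, std=None, dev=None, display=None, title=None):
    """Assemble an xgobi argument list from the documented options.

    This is a hypothetical helper, not part of XGobi itself.
    """
    args = ["xgobi"]
    # Standard X options come first, as in the syntax line.
    if display is not None:
        args += ["-display", display]
    if title is not None:
        args += ["-title", title]
    if std is not None:
        if std not in ("mmx", "msd", "mmd"):
            raise ValueError("std must be one of mmx, msd, mmd")
        args += ["-std", std]
    if dev is not None:
        # -dev is only described for the msd and mmd scalings.
        if std not in ("msd", "mmd"):
            raise ValueError("-dev only applies with -std msd or -std mmd")
        # Assumption: the bounds 0 and 100 are allowed.
        if not 0 <= dev <= 100:
            raise ValueError("dev must lie between 0 and 100")
        args += ["-dev", str(dev)]
    # The data filename is the only mandatory argument and goes last.
    args.append(filename)
    return args


print(" ".join(build_xgobi_command("flea.dat", std="msd", dev=3)))
```

The resulting list could be passed to a process launcher on a machine where xgobi is installed.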

Files

XGobi accepts standard input, but it is most often used with files, partly because a set of files gives additional control over the plot. The data input file should be an ASCII file with the data matrix arranged in rows and columns: rows are separated by line breaks, and columns by any amount of white space. Missing values can be coded as ".", "NA" or "na". (The input file can also be a binary file, which can be produced within XGobi once the ASCII data has been read in.) XGobi also reads other information about the display of the data from files. If the data is in a file named


       filename 

         or 

       filename.dat 
 
         (either of which must be an ASCII file), or 

       filename.bin 

         (the binary version of the data), then the other files are as follows: 

       filename.row 
       filename.rowlab 
       filename.case 

         Row or case labels: a label for each row of the data matrix, which is
         displayed in the identification mode. The file should contain one 
         label per line. 

       filename.col 
       filename.collab 
       filename.column 
       filename.var 

         Column or variable labels: a label for each column of the data 
         matrix, which becomes part of the XGobi variable selection panel. 
         The file should contain one label per line. 

       filename.colors 

         Brushing colors: a color for each point in the plot, representing a 
         row or case of the data. The file should contain one color per line. 
         (It is probably best if the colors correspond to the colors used in 
         brushing; see the later section on resources.) 

       filename.glyphs 

         Brushing glyphs: a glyph type for each point in the plot, 
         representing a row or case of the data. The file should contain one 
         glyph type per line. The glyph types are as follows: 

              1 through 5: Five sizes of '+'
              6 through 10: Five sizes of 'X'
              11 through 15: Five sizes of open rectangle
              16 through 20: Five sizes of filled rectangle
              21 through 25: Five sizes of open circle
              26 through 30: Five sizes of filled circle
              31: A single-pixel point

       filename.erase 

         Erase: a column of 1s (to have a point erased on startup) and 0s 
         (to have the point plotted). There should be one value per line and 
         as many lines as there are rows in the data. 

       filename.lines 

         Line segments: specifications for the pattern of line segments which 
         connect pairs of points. The file should contain two numbers per 
         line. The pair of numbers represents the row numbers of the two 
         points that should be connected. 

       filename.linecolors 

         Line colors: a color for each line in the .lines file. The file 
         should contain one color per line. (It is best if the colors 
         correspond to the colors used in brushing; see the later
         section on resources.) 

       filename.nlinkable 

         The number of rows to be linked for brushing and identification. 
         By default, nlinkable is equal to the number of rows in the data. 
         This feature can be used to link ordinary scatterplots with plots 
         that have some decorations requiring additional points, such as
         clustering trees. 

       filename.vgroups 

         Variable groups: an integer for each column in the data. Each set 
         of columns that is represented by the same integer will be grouped 
         together for scaling and transformation.
         The file is just one long line of integers. For example, an input 
         file with four columns could have a .vgroups file containing the 
         line 1 2 2 3. The second and third columns are then grouped together.
         The range of their plotting axes is the same, and if
         column 2 is transformed, column 3 is transformed at the same time. 

       filename.missing 

         A file identical in structure to filename.dat, where non-zero
         values indicate positions with missing (or censored, or
         otherwise exceptional) values. This file represents the
         pattern of missing values in the data; it can be examined in
         a separate XGobi window by selecting Launch missing data
         XGobi... from the Tools menu.

       filename.imp 

         Multiple imputations of missing values: Each column should
         have a full set of imputed values. The number of rows needs
         to be identical to the number of non-zero values in
         filename.missing, or the number of missing codes in
         filename.dat if filename.missing is not provided. The
         imputed values should be given in their order in the data
         column by column. For example, if filename.dat looks like
         this:

           10 NA 12 -3
           98  0 10  0
           77  3 NA -5
            1  2 NA 10
           NA NA  5 -8
            0  0 10 12

         (six cases, four variables, five missing values), then
         filename.imp with two sets of imputed values could look like
         this:

           54  37
            3   2
            4   1
           11  10
           13  11 

         If the second column is selected for imputation (Select
         Impute missing values from the Tools menu), the full data
         matrix with imputations looks like this:

           10  2 12 -3
           98  0 10  0
           77  3 10 -5
            1  2 11 10
           37  1  5 -8
            0  0 10 12

       filename.resources 

         Resources: a set of datafile-specific XGobi resources, which
         specify the size of the plotting window and some
         user-selection option settings. The file is in the format of
         a standard X resource file. It can be directly edited so that
         other resources can be specified. See the later section on
         resource files for more information.

     All of the above files can be created outside of XGobi, using an
     editor or other UNIX utilities, and several of them (glyphs and
     colors, line segments and line color, resources) can be written
     out during an XGobi session, in which case they represent the
     results of interactions performed during that session.
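
The filename.imp example above can be checked with a short Python sketch. It visits the missing cells column by column, as described, and fills them from one column of the imputation table. The function name apply_imputation and the use of None for a missing cell are illustrative choices, not part of XGobi.

```python
def apply_imputation(data, imputations, column):
    """Fill missing cells (None) from one column of the imputation table.

    Missing cells are visited column by column, which is the order in
    which filename.imp lists its rows.
    """
    n_rows, n_cols = len(data), len(data[0])
    n_missing = sum(value is None for row in data for value in row)
    if n_missing != len(imputations):
        raise ValueError("need one imputation row per missing value")
    filled = [row[:] for row in data]
    k = 0  # index of the next row of the imputation table
    for j in range(n_cols):
        for i in range(n_rows):
            if data[i][j] is None:
                filled[i][j] = imputations[k][column]
                k += 1
    return filled


NA = None
data = [
    [10, NA, 12, -3],
    [98, 0, 10, 0],
    [77, 3, NA, -5],
    [1, 2, NA, 10],
    [NA, NA, 5, -8],
    [0, 0, 10, 12],
]
imputations = [[54, 37], [3, 2], [4, 1], [11, 10], [13, 11]]

# Index 1 selects the second set of imputed values.
for row in apply_imputation(data, imputations, 1):
    print(row)
```

With the second set selected, the rows printed agree with the completed matrix shown in the filename.imp entry.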

Example data files are provided in the xgobi locker /home/xgobi/data.
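
For experimenting with your own data, a small set of companion files can also be generated by a script. The Python sketch below writes a data file together with row labels, column labels, glyphs, variable groups and line segments, following the file descriptions above. The helper names, the shape labels and the stem "demo" are hypothetical, and the sketch assumes that row numbers in the .lines file count from 1.

```python
import os
import tempfile

# Shape families in the order of the glyph table; each has five sizes.
GLYPH_SHAPES = ["plus", "x", "open_rect", "filled_rect",
                "open_circle", "filled_circle"]


def glyph_code(shape, size=1):
    """Return the glyph type number for a shape label and a size 1-5."""
    if shape == "point":
        return 31  # the single-pixel point has no size
    if not 1 <= size <= 5:
        raise ValueError("size must be between 1 and 5")
    return 5 * GLYPH_SHAPES.index(shape) + size


def write_xgobi_files(stem, rows, row_labels, col_labels, glyphs, vgroups, lines):
    """Write stem.dat and a few of its companion files."""
    def write(ext, text_lines):
        with open(stem + ext, "w") as handle:
            handle.write("\n".join(text_lines) + "\n")

    # Missing values (None) are written as NA, one data row per line.
    write(".dat", [" ".join("NA" if v is None else str(v) for v in row)
                   for row in rows])
    write(".row", row_labels)                   # one label per line
    write(".col", col_labels)                   # one label per line
    write(".glyphs", [str(g) for g in glyphs])  # one glyph type per line
    write(".vgroups", [" ".join(str(g) for g in vgroups)])  # a single line
    # Assumption: the two row numbers on each line are 1-based.
    write(".lines", ["%d %d" % (a, b) for a, b in lines])


rows = [[10, None, 12, -3], [98, 0, 10, 0], [77, 3, None, -5]]
stem = os.path.join(tempfile.mkdtemp(), "demo")
write_xgobi_files(
    stem,
    rows,
    row_labels=["case a", "case b", "case c"],
    col_labels=["v1", "v2", "v3", "v4"],
    glyphs=[glyph_code("open_circle", 3), glyph_code("filled_rect", 1),
            glyph_code("point")],
    vgroups=[1, 2, 2, 3],
    lines=[(1, 2), (2, 3)],
)
with open(stem + ".dat") as handle:
    print(handle.read())
```

Running xgobi on the generated stem should then pick up the companion files automatically, since they share the data file's name.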

From S-Plus, start XGobi on a data matrix with xgobi(matrx).

Usage

XGobi is entirely driven by the mouse, and has a number of modes. Its appearance in tour mode is shown in figure 1.

The window can be resized as a whole in the usual way using the resize corners. The proportions of the bottom panel devoted to the buttons, graphics and variable selection can be adjusted by moving the tabs near the bottom of the separating lines with the LEFT mouse button (subsequently abbreviated to L, analogously M and R).

Some buttons hide menus of exclusive options, which are brought up by holding down L and releasing it when the desired option is highlighted. Others have menus which are brought up by a click. All stay-up menus and windows have a Click here to dismiss panel at the bottom; use L to do so. Many buttons are toggles: click with L to highlight them (and switch the function on) and click again to switch the function off.

Help is available on any feature of the command layout by clicking with R, which will bring up a help window. If the window has a scrollbar, clicking L will move the window down by the distance from the top to the mouse pointer, and clicking M will move the display to that point proportionally in the total display.

The top row of menus is organized as follows:

Selecting Variables

The right-hand panel selects variables. The variable name is a button which highlights if that variable is selected, and hides a menu of transformations (use L on the name). There is normally a box advising of the current functions of the buttons.

The circles below the variable names have two functions. They are used to select variables (by clicking within the circle; the precise effect depends on the mode), and the line within the circle indicates the current projection of that variable axis (shown as a dot if it is orthogonal to the view plane). The boundary of the circle is made bolder if that variable is selected (see the figures).

DotPlot mode

This displays a univariate dotplot, like a histogram plotted sideways. The Cycle button tells XGobi to cycle sequentially through the plots of all the variables, and the speed scrollbar below this button determines how quickly the next plot is shown.

XYPlot mode

This displays a scatterplot of two variables. The Cycle button tells XGobi to cycle sequentially through the plots of all the variables, and the speed scrollbar below this button determines how quickly the next plot is shown.

Rotate mode

The layout is somewhat different in Rotate mode; see figure 2. Three variables must be selected, by clicking either L or M in the variable's circle. (Initially the first three variables are selected.) There are three sub-modes, corresponding to rotating about the X axis, the Y axis or an arbitrary axis (the default), and variable selection differs in each.

There is a speed slider which can be dragged with L or clicked with L at the desired speed of rotation. Change Direction toggles the direction of rotation. Pause is a self-explanatory toggle; Reinit resets the original set of axes. Rock causes the rotation to change direction every ten steps, so the speed slider also controls the angle of rock.

Save Coeffs saves the coefficients of the current projection (in two columns, X axis then Y axis, with rows the variables in order). If invoked from S-Plus, the coefficients will be saved as an S object in the current working directory, as a vector in column-first order, which can be read by matrix(name,2). To see this vector, you will have to execute the synchronize(1) command.
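
To make the saved layout concrete, here is a minimal Python sketch of how such a coefficient vector relates to a projected point. Read in column-first order, the first half of the vector holds the X-axis coefficients and the second half the Y-axis coefficients. The function names are hypothetical, and the sketch assumes the data row is already on whatever scale XGobi used when the projection was saved.

```python
def coefficient_matrix(vector):
    """Rebuild the p-by-2 coefficient matrix from a column-first vector."""
    if len(vector) % 2:
        raise ValueError("expected an even number of coefficients")
    p = len(vector) // 2
    # Row i holds the X and Y coefficients of variable i.
    return [[vector[i], vector[p + i]] for i in range(p)]


def project(row, coeffs):
    """Project one data row onto the X and Y axes of the saved view."""
    x = sum(value * c[0] for value, c in zip(row, coeffs))
    y = sum(value * c[1] for value, c in zip(row, coeffs))
    return x, y


# Three variables: the X axis shows variable 1, the Y axis variable 2.
saved = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0]
coeffs = coefficient_matrix(saved)
print(coeffs)
print(project([2.0, 3.0, 5.0], coeffs))
```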

Oblique axis

The current target variable is marked by a small circle in the center of its disc when the mouse pointer is in the variable selection panel. Click (with either L or M) on a currently selected variable to make it the target. Click on an unselected variable to have it replace the target.

In this mode think of the points in the central window as within a transparent ball. When the rotation is paused, dragging with L or M in the central panel will mimic a reversed `trackball' action rotating that ball.

Y axis

In this mode there are two X variables and a Y variable. One of the X variables is the target, and is used for the X axis in the initial display. Click with M on a variable's circle to make it the Y variable, and with L to make it an X variable. (The last selected X variable is the target, marked by a small circle in the center of its disc when the mouse pointer is in the variable selection panel.)

Rotation is about the Y axis, and `trackball' action still works. The additional option Interpolate rocks between the two X axes.

X axis

As for Y axis, but interchange X and Y throughout.

Tour mode

This mode is used for Asimov's (1985) grand tour and for projection pursuit. Its default operation is a continuous grand tour through the space of selected variables (nothing happens if only two are selected). The tour repeatedly selects a random new projection in the p-dimensional space of the p currently selected variables, and moves continuously (by interpolation) to that new projection. When the projection is reached, another is chosen (and a slight pause may be visible).
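
The select-then-interpolate loop described above can be sketched in a few lines of Python. This is a simplified stand-in, not XGobi's own method: it blends the two frames linearly and re-orthonormalizes at each step, whereas the interpolation methods offered by XGobi are not described here. All function names are hypothetical.

```python
import math
import random


def orthonormalize(u, v):
    """Gram-Schmidt: return an orthonormal pair spanning the same plane."""
    norm_u = math.sqrt(sum(a * a for a in u))
    u = [a / norm_u for a in u]
    dot = sum(a * b for a, b in zip(u, v))
    v = [b - dot * a for a, b in zip(u, v)]
    norm_v = math.sqrt(sum(b * b for b in v))
    v = [b / norm_v for b in v]
    return u, v


def random_frame(p, rng):
    """Draw a random pair of orthonormal p-vectors (a 2-D projection)."""
    u = [rng.gauss(0.0, 1.0) for _ in range(p)]
    v = [rng.gauss(0.0, 1.0) for _ in range(p)]
    return orthonormalize(u, v)


def tour_steps(start, target, n_steps):
    """Yield frames moving from start to target.

    Linear blending plus re-orthonormalization; a simplification that
    keeps every intermediate view a valid projection.
    """
    for step in range(1, n_steps + 1):
        t = step / n_steps
        u = [(1 - t) * a + t * b for a, b in zip(start[0], target[0])]
        v = [(1 - t) * a + t * b for a, b in zip(start[1], target[1])]
        yield orthonormalize(u, v)


rng = random.Random(0)
frame = random_frame(4, rng)          # four selected variables
for _ in range(3):                    # three legs of the tour
    target = random_frame(4, rng)
    for frame in tour_steps(frame, target, 10):
        pass  # a real display would redraw the scatterplot here
```

Each frame gives the X-axis and Y-axis coefficients of the view, so a point is drawn at the two dot products of its data row with the frame vectors.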

In this mode variables are selected by clicking L within the variable's circle, and de-selected by clicking M. Any number of two or more variables can be selected, but if only two are currently selected, neither can be de-selected until a further variable is selected.

The left-hand panel controls operations. The slider at the top (see figure 1) controls the speed of rotation; drag it with L, or click with L to indicate the place you want. Pause is a self-explanatory toggle; Reinit resets the original set of axes. In Step mode, the tour stops at each new view; click L on Go to continue. If Local Scan is selected, the tour returns to the starting position after each new view.

The tour is `checkpointed' at each new view, and can be selectively re-played, using the Backtrack and F or B buttons. (Once Backtrack is selected, F/B selects the direction of replay. The number between the buttons indicates the last view number.)

The I/O button hides a menu to save the coefficients of the current projection, and read/save the checkpoint history. (See under Save Coeffs for behavior from S-Plus.)

The Interp button hides a menu giving a choice of interpolation method.

The PrnComp Basis button transforms the variables to principal components. The variables change names to PC1, PC2, ... but axes are still displayed in the original variables. An implicit re-initialize is done when this button is selected or de-selected.

The Section button switches to a section tour. Points not within a tolerance Eps (set by a slider) of the current section hyperplane are shown as a single pixel point.

Projection Pursuit tours

Selecting the ProjPrst button also selects principal component basis (necessary to `sphere' the data) and brings up another window which tracks the PP index as the tour progresses. The index is selected from a menu hidden on the PP Index button, which also has a slider indicating the number of terms in the polynomial or the bandwidth of the kernel density estimate, as appropriate. The indices are described in the help for that button; be warned that calculation of the Friedman-Tukey and Entropy indices seems very slow for large datasets.

N.B. No optimization is done until the Optimz button is selected. This will perform a (rough) gradient ascent when selecting new projections, and stop when a local maximum appears to have been found. Allowing the tour to proceed randomly between sessions of optimization may reveal other local maxima.

The bitmap button saves a small image of the picture below the track of the PP index, which can help to identify the points on the tour (including local maxima). The Return to Bitmap and Record Bitmap buttons allow one to return to bitmaps, and to output the projection used. (Note: this will be in terms of the sphered principal components, not the original variables.)

Scale

Select by clicking L on the Scale button at any time when the view is stationary. The view can be dragged with L, and stretched with M. There are also buttons to perform these shifts.

The Stdzation button hides a menu controlling how variables are scaled. There are three possibilities. The default is to rescale each variable using its minimum and maximum, so that all points fall within the plotting window. The other options are to rescale to mean zero and variance one, or to median zero and MAD (median absolute deviation) one. The slider controls how many standard deviations or MADs are within the display. These options (especially the third) can be useful when extreme outliers are present.
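
The effect of the three scalings on the visible range can be sketched as follows. This is an illustration under stated assumptions rather than XGobi's actual computation: the function name scale_limits is hypothetical, the sample standard deviation (divisor n - 1) is assumed, and the window is taken to extend the chosen number of deviations on each side of the center.

```python
import statistics


def scale_limits(values, method="mmx", dev=2.0):
    """Return the (low, high) data values mapped to the window edges."""
    if method == "mmx":
        # Min/max scaling: every point fits inside the window.
        return min(values), max(values)
    if method == "msd":
        center = statistics.mean(values)
        spread = statistics.stdev(values)  # assumption: n - 1 divisor
    elif method == "mmd":
        center = statistics.median(values)
        # Median absolute deviation from the median.
        spread = statistics.median([abs(v - center) for v in values])
    else:
        raise ValueError("method must be mmx, msd or mmd")
    return center - dev * spread, center + dev * spread


values = [1, 2, 3, 4, 100]   # one extreme outlier
for method in ("mmx", "msd", "mmd"):
    print(method, scale_limits(values, method))
```

With the outlier present, min/max scaling squeezes the four ordinary points into a small part of the window, while the median/MAD window covers only the bulk of the data and leaves the outlier outside it.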

There is an option, vgroups, on invocation which ensures that groups of variables are scaled together, in which case the standardization applies to the sample of all cases from all variables in the group. The vgroups option is specified by a file or an S-Plus argument, listing for each variable its group number.
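
As a small illustration of group scaling, the sketch below pools the values of all variables in a group before computing a common range, using min/max scaling for simplicity. The function name group_ranges is hypothetical.

```python
def group_ranges(columns, vgroups):
    """Return one (min, max) range per column, shared within each group."""
    pooled = {}
    for column, group in zip(columns, vgroups):
        low, high = min(column), max(column)
        if group in pooled:
            # Widen the group's range to cover this column as well.
            low = min(low, pooled[group][0])
            high = max(high, pooled[group][1])
        pooled[group] = (low, high)
    return [pooled[group] for group in vgroups]


# Four variables (stored as columns); the second and third share a group.
columns = [[0, 10], [1, 5], [3, 9], [100, 200]]
print(group_ranges(columns, [1, 2, 2, 3]))
```

Here the second and third variables both receive the range 1 to 9, so their plotting axes are directly comparable.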

Brush

Select by clicking L on the Brush button at any time when the view is stationary. Within the view the brush size can be changed by dragging with M. Brushing is performed by dragging the brush outline (a rectangle) with L. (The rectangle can be moved without brushing by de-selecting Brush on.)

To brush, a color or glyph (symbol) or both must be selected from the hidden menus on the Color and Glyph buttons. (The brush rectangle changes to the currently selected brushing color.) The effect can be Persistent, Transient or Undo by selecting (with L) the appropriate button.
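
The difference between persistent and transient brushing can be sketched as below, on the assumption that persistent brushing keeps the paint on every point the brush has covered, while transient brushing paints only the points currently inside the rectangle. The function names, coordinates and color names are hypothetical.

```python
def points_in_brush(points, rect):
    """Flag the points that lie inside the brush rectangle."""
    x0, y0, x1, y1 = rect
    return [x0 <= x <= x1 and y0 <= y <= y1 for x, y in points]


def paint(colors, inside, color):
    """Return a copy of colors with the flagged points repainted."""
    return [color if hit else old for old, hit in zip(colors, inside)]


points = [(0.1, 0.1), (0.5, 0.5), (0.9, 0.9)]
base = ["white", "white", "white"]

first = points_in_brush(points, (0.0, 0.0, 0.3, 0.3))   # brush position 1
second = points_in_brush(points, (0.4, 0.4, 1.0, 1.0))  # brush position 2

# Persistent: each position paints on top of the previous result.
persistent = paint(paint(base, first, "red"), second, "red")
# Transient: only the current position is painted, starting from base.
transient = paint(base, second, "red")

print(persistent)
print(transient)
```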

There are two erase buttons. The top one changes the effect of brushing to that of erasing points. The lower button hides a menu which will restore erased points, swap erased and non-erased points, and so on.

The button Make Group Var makes another variable which enumerates the current color/glyph groups. The button I/O enables the current color and glyph settings to be saved in a file.

Reset hides a menu of reset actions.

Identify

Select by clicking L on the Identify button at any time when the view is stationary. Moving the pointer over the view labels the nearest point with its row label. Clicking L makes the currently displayed label persistent.
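
Nearest-point identification amounts to a simple search, sketched below. The sketch assumes plain Euclidean distance in plot coordinates, which may differ from what XGobi actually computes; the function name and the labels are hypothetical.

```python
def nearest_label(points, labels, pointer):
    """Return the row label of the plotted point nearest the pointer."""
    px, py = pointer

    def squared_distance(i):
        x, y = points[i]
        return (x - px) ** 2 + (y - py) ** 2

    return labels[min(range(len(points)), key=squared_distance)]


points = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.5)]
labels = ["case a", "case b", "case c"]
print(nearest_label(points, labels, (1.8, 0.4)))
```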

There are two buttons, Remove Labels and Case profile, with obvious effects.

Sticky labels will be preserved in future rotations or tours.

Acknowledgments

This document has been developed from the man pages written by Deborah Swayne and a previous short introduction by Brian Ripley.

References

Asimov, 1985
Asimov, D. (1985). The Grand Tour: A Tool for Viewing Multidimensional Data. SIAM Journal on Scientific and Statistical Computing, 6(1):128-143.

Becker et al., 1988
Becker, R., Chambers, J., and Wilks, A. (1988). The New S Language - A Programming Environment for Data Analysis and Graphics. Wadsworth and Brooks/Cole, Pacific Grove, CA.

Buja et al., 1997
Buja, A., Cook, D., Asimov, D., and Hurley, C. (1997). Dynamic Projections in High-Dimensional Visualization: Theory and Computational Methods. Journal of Computational and Graphical Statistics. Submitted.

Buja et al., 1996
Buja, A., Cook, D., and Swayne, D. (1996). Interactive High-Dimensional Data Visualization. Journal of Computational and Graphical Statistics, 5(1):78-99.

Cleveland and McGill, 1988
Cleveland, W. S. and McGill, M. E., editors (1988). Dynamic Graphics for Statistics. Wadsworth, Monterey, CA.

Cook and Buja, 1997
Cook, D. and Buja, A. (1997). Manual Controls For High-Dimensional Data Projections. Journal of Computational and Graphical Statistics. Forthcoming.

Cook et al., 1993
Cook, D., Buja, A., and Cabrera, J. (1993). Projection Pursuit Indexes Based on Orthonormal Function Expansions. Journal of Computational and Graphical Statistics, 2(3):225-250.

Cook et al., 1995
Cook, D., Buja, A., Cabrera, J., and Hurley, C. (1995). Grand Tour and Projection Pursuit. Journal of Computational and Graphical Statistics, 4(3):155-172.

Inselberg, 1985
Inselberg, A. (1985). The Plane with Parallel Coordinates. The Visual Computer, 1:69-91.

Swayne et al., 1997
Swayne, D. F., Cook, D., and Buja, A. (1997). XGobi: Interactive Dynamic Graphics in the X Window System. Journal of Computational and Graphical Statistics. Forthcoming.

Tufte, 1990
Tufte, E. R. (1990). Envisioning Information. Graphics Press, Cheshire, CT.

Wegman, 1990
Wegman, E. (1990). Hyperdimensional Data Analysis Using Parallel Coordinates. Journal of the American Statistical Association, 85:664-675.


dicook@iastate.edu
Tue Sep 2 14:25:16 CDT 1997