Dianne Cook
XGobi is a freeware data visualization system for interactive dynamic graphics written by Deborah F. Swayne, Dianne Cook and Andreas Buja. It is especially designed for the exploration of multivariate data. Its basic plot is a scatterplot, and these are some of the tools available for scatterplot display and manipulation:
XGobi has a direct manipulation interface, and all the above actions are performed using the mouse. The layout of the xgobi window is shown in Figure 1. XGobi can be used in conjunction with the S language for scientific computing and data analysis.

First add the xgobi locker:
add xgobi
Then start xgobi with the following syntax. The only mandatory argument is a data filename:
xgobi [ X options ] [ -std mmx|msd|mmd ] [ -dev std_deviation ] filename
-std mmx|msd|mmd
By default, the data are scaled into the plotting
window using the minimum and maximum values of each variable
or variable group, in such a way that the midpoint of the
variable is at the center of the plotting window and no
points fall outside the window. Instead, to scale using mean
and standard deviation, specify -std msd; to scale using the
median and median absolute deviation, specify -std mmd.
-dev x
If you have specified -std msd or -std mmd, then you can also
specify the number of standard deviations (or median absolute
deviations) from the mean (or median) to be contained within
the plotting window, using the argument -dev x, where x is a
real number between 0 and 100. The default is 2.
X options
The standard X command line options can be used with
XGobi. These include "-display machinename:0", used when
running an X program on one machine and displaying its output
on another, and "-title Title", where Title is a string you
want to appear in the window manager titlebar.
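As an illustration only, the command line described above could be assembled from a script. The Python sketch below builds the argument list and does not run xgobi; the helper name build_xgobi_command is hypothetical, and its checks simply mirror the option descriptions above.

```python
def build_xgobi_command(filename, std=None, dev=None, x_options=()):
    """Build an argument list for xgobi (hypothetical helper, not part of XGobi)."""
    args = ["xgobi", *x_options]
    if std is not None:
        if std not in ("mmx", "msd", "mmd"):
            raise ValueError("std must be one of mmx, msd, mmd")
        args += ["-std", std]
    if dev is not None:
        # -dev is described only in combination with -std msd or -std mmd
        if std not in ("msd", "mmd"):
            raise ValueError("-dev requires -std msd or -std mmd")
        if not 0 <= dev <= 100:
            raise ValueError("dev must be a real number between 0 and 100")
        args += ["-dev", str(dev)]
    args.append(filename)  # the data filename is the only mandatory argument
    return args


print(build_xgobi_command("filename", std="msd", dev=3,
                          x_options=["-title", "My data"]))
```

The resulting list could be handed to a process launcher such as subprocess.run on a machine where xgobi is installed.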
XGobi accepts standard input, but is most often used with files, partly because of the additional plot control that can be achieved using a set of files. The data input file should be an ASCII file with the data matrix arranged in rows and columns; in ASCII, rows must be distinguished by carriage returns, and columns can be separated by any amount of white space. Missing values can be coded as ".", "NA" or "na". (The input file can also be a binary file, which can be produced within XGobi once the ASCII data has been read in.) XGobi accepts other input about the display of the data from files as well. If the data is in a file named
filename
or
filename.dat
(either of which must be an ASCII file), or
filename.bin
(the binary version of the data), then the other files are as follows:
filename.row
filename.rowlab
filename.case
Row or case labels: a label for each row of the data matrix, which is
displayed in the identification mode. The file should contain one
label per line.
filename.col
filename.collab
filename.column
filename.var
Column or variable labels: a label for each column of the data
matrix, which becomes part of the XGobi variable selection panel.
The file should contain one label per line.
filename.colors
Brushing colors: a color for each point in the plot, representing a
row or case of the data. The file should contain one color per line.
(It is probably best if the colors correspond to the colors used in
brushing; see the later section on resources.)
filename.glyphs
Brushing glyphs: a glyph type for each point in the plot,
representing a row or case of the data. The file should contain one
glyph type per line. The glyph types are as follows:
1 through 5: Five sizes of '+'
6 through 10: Five sizes of 'X'
11 through 15: Five sizes of open rectangle
16 through 20: Five sizes of filled rectangle
21 through 25: Five sizes of open circle
26 through 30: Five sizes of filled circle
31: A single-pixel point
filename.erase
Erase: a column of 1s (to have a point erased on startup) and 0s
(to have the point plotted). There should be one value per line and
as many lines as there are rows in the data.
filename.lines
Line segments: specifications for the pattern of line segments which
connect pairs of points. The file should contain two numbers per
line. The pair of numbers represents the row numbers of the two
points that should be connected.
filename.linecolors
Line colors: a color for each line in the .lines file. The file
should contain one color per line. (It is best if the colors
correspond to the colors used in brushing; see the later
section on resources.)
filename.nlinkable
The number of rows to be linked for brushing and identification.
By default, nlinkable is equal to the number of rows in the data.
This feature can be used to link ordinary scatterplots with plots
that have some decorations requiring additional points, such as
clustering trees.
filename.vgroups
Variable groups: an integer for each column in the data. Each set
of columns that is represented by the same integer will be grouped
together for scaling and transformation.
The file is just one long line of integers. For example, an input
file with four columns could have a .vgroups file containing the
line 1 2 2 3. The second and third columns are then grouped together.
The range of their plotting axes is the same, and if
column 2 is transformed, column 3 is transformed at the same time.
filename.missing
A file identical in structure to filename.dat, where non-zero
values indicate positions with missing (or censored, or
otherwise exceptional) values. This file represents the
pattern of missing values in the data; it can be examined in
a separate XGobi window by selecting Launch missing data
XGobi... from the Tools menu.
filename.imp
Multiple imputations of missing values: Each column should
have a full set of imputed values. The number of rows needs
to be identical to the number of non-zero values in
filename.missing, or the number of missing codes in
filename.dat if filename.missing is not provided. The
imputed values should be given in their order in the data
column by column. For example, if filename.dat looks like
this:
10 NA 12 -3
98 0 10 0
77 3 NA -5
1 2 NA 10
NA NA 5 -8
0 0 10 12
(six cases, four variables, five missing values), then
filename.imp with two sets of imputed values could look like
this:
54 37
3 2
4 1
11 10
13 11
If the second column of filename.imp is selected for imputation
(Select Impute missing values from the Tools menu), the full data
matrix with imputations looks like this:
10 2 12 -3
98 0 10 0
77 3 10 -5
1 2 11 10
37 1 5 -8
0 0 10 12
filename.resources
Resources: a set of datafile-specific XGobi resources, which
specify the size of the plotting window and some
user-selection option settings. The file is in the format of
a standard X resource file. It can be directly edited so that
other resources can be specified. See the later section on
resource files for more information.
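The column-by-column fill order described for filename.imp can be checked with a short Python sketch. This is not XGobi's own code; the function names parse_data and impute are hypothetical, and the data are the six-case example given above.

```python
MISSING_CODES = {".", "NA", "na"}  # the missing-value codes listed earlier


def parse_data(text):
    """Parse whitespace-separated rows; missing codes become None."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if fields:
            rows.append([None if f in MISSING_CODES else float(f)
                         for f in fields])
    return rows


def impute(rows, imputed_values):
    """Return a copy with missing cells filled column by column."""
    filled = [row[:] for row in rows]
    values = iter(imputed_values)
    for j in range(len(rows[0])):        # walk the columns in order
        for i in range(len(rows)):       # then the rows within a column
            if filled[i][j] is None:
                filled[i][j] = float(next(values))
    return filled


data = """10 NA 12 -3
98 0 10 0
77 3 NA -5
1 2 NA 10
NA NA 5 -8
0 0 10 12"""

second_set = [37, 2, 1, 10, 11]          # second column of filename.imp
for row in impute(parse_data(data), second_set):
    print(row)
```

The first missing value met in column order is in row 5 of column 1, which is why 37 lands there.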
All of the above files can be created outside of XGobi, using an
editor or other UNIX utilities, and several of them (glyphs and
colors, line segments and line color, resources) can be written
out during an XGobi session, in which case they represent the
results of interactions performed during that session.
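As a hedged illustration of creating these files outside XGobi, the Python sketch below writes a data file with a few of its companion files. The helper names glyph_code and write_xgobi_files and the shape names are hypothetical; the glyph numbering follows the table above, and the row numbers in the .lines file are assumed to count from 1.

```python
import os
import tempfile

# Shape names are labels invented for this sketch; the order follows the
# glyph table (five sizes of each shape, codes 1 through 30).
SHAPES = ["plus", "x", "open rectangle", "filled rectangle",
          "open circle", "filled circle"]


def glyph_code(shape, size):
    """Map a shape name and a size from 1 to 5 to a glyph type from 1 to 30."""
    if not 1 <= size <= 5:
        raise ValueError("size must be between 1 and 5")
    return SHAPES.index(shape) * 5 + size


def write_xgobi_files(basename, rows, row_labels, col_labels, glyphs, lines):
    """Write basename.dat plus .row, .col, .glyphs and .lines files."""
    def write(ext, out_lines):
        with open(basename + ext, "w") as fh:
            fh.write("\n".join(out_lines) + "\n")

    write(".dat", [" ".join(str(v) for v in row) for row in rows])
    write(".row", row_labels)                     # one label per line
    write(".col", col_labels)                     # one label per line
    write(".glyphs", [str(g) for g in glyphs])    # one glyph type per line
    write(".lines", [f"{a} {b}" for a, b in lines])  # two row numbers per line


with tempfile.TemporaryDirectory() as tmp:
    base = os.path.join(tmp, "demo")
    write_xgobi_files(
        base,
        rows=[[1.0, 2.0], [3.0, "NA"], [5.0, 6.0]],
        row_labels=["a", "b", "c"],
        col_labels=["height", "width"],
        glyphs=[glyph_code("filled circle", 3)] * 3,
        lines=[(1, 2), (2, 3)],
    )
    print(sorted(os.listdir(tmp)))
```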
Example data files are provided in the xgobi locker /home/xgobi/data.
From S-Plus, start xgobi on a data matrix matrx with xgobi(matrx).
XGobi is entirely driven by the mouse, and has a number of modes. Its appearance in tour mode is shown in figure 1.
The window can be resized as a whole in the usual way using the resize corners. The proportions of the bottom panel devoted to the buttons, graphics and variable selection can be adjusted by moving the tabs near the bottom of the separating lines with the LEFT mouse button (subsequently abbreviated to L, analogously M and R).
Some buttons hide menus of exclusive options, which are brought up by holding down L and releasing it when the desired option is highlighted. Others have menus which are brought up by a click. All stay-up menus and windows have a Click here to dismiss panel at the bottom; use L to do so. Many buttons are toggles: click with L to highlight them (and switch the function on) and click again to switch the function off.
Help is available on any feature of the command layout by clicking with R, which will bring up a help window. If the window has a scrollbar, clicking L will move the window down by the distance from the top to the mouse pointer, and clicking M will move the display to that point proportionally in the total display.
The top row of menus is organized as follows:
The right-hand panel selects variables. The variable name is a button which highlights if that variable is selected, and hides a menu of transformations (use L on the name). There is normally a box advising of the current functions of the buttons.
The circles below the variable names have two functions. They are used to select variables (by clicking within the circle; the precise effect depends on the mode), and the line within the circle indicates the current projection of that variable axis (as a dot if it is orthogonal to the view plane). The boundary of the circle is made bolder if that variable is selected (see the figures).
This displays a univariate dotplot, like a histogram plotted sideways. The Cycle button tells XGobi to cycle sequentially through the plots of all the variables, and the speed scrollbar below this button determines how quickly the next plot is shown.
This displays a scatterplot of two variables. The Cycle button tells XGobi to cycle sequentially through the plots of all the variables, and the speed scrollbar below this button determines how quickly the next plot is shown.
The layout is somewhat different in Rotate mode; see figure 2.
Three variables must be selected, by clicking either L or M
in the variable's circle.
(Initially the first three variables are selected.) There are
three sub-modes, corresponding to rotating about the X axis, the Y axis
or an arbitrary axis (the default), and variable selection differs in
each.

There is a speed slider which can be dragged with L or clicked with L at the desired speed of rotation. Change Direction toggles the direction of rotation. Pause is a self-explanatory toggle; Reinit resets the original set of axes. Rock causes the rotation to change direction every ten steps, so the speed slider also controls the angle of rock.
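The remark that the speed slider also controls the angle of rock can be illustrated with a small Python sketch. It assumes that each step turns the view by a fixed angle set by the slider; the function name rock_angles and its parameters are hypothetical.

```python
def rock_angles(step_size, n_steps, steps_per_swing=10):
    """Rotation angle after each step when the direction flips every ten steps."""
    angle, direction, angles = 0.0, 1, []
    for i in range(n_steps):
        if i and i % steps_per_swing == 0:
            direction = -direction        # Rock: reverse after every ten steps
        angle += direction * step_size    # step_size stands in for the speed
        angles.append(angle)
    return angles


slow = rock_angles(0.05, 40)
fast = rock_angles(0.10, 40)
print(max(slow), max(fast))  # the faster setting swings through a wider angle
```

Under this assumption the widest angle reached is ten times the step size, so doubling the speed doubles the swing.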
Save Coeffs saves the coefficients of the current projection (in two columns, X axis then Y axis, with rows the variables in order). If invoked from S-Plus, the coefficients will be saved as an S object in the current working directory, as a vector in column-first order, which can be read by matrix(name,2). To see this vector, you will have to execute the synchronize(1) command.
The current target variable is marked by a small circle in the center of its disc when the mouse pointer is in the variable selection panel. Click (with either L or M) on a currently selected variable to make it the target. Click on an unselected variable to have it replace the target.
In this mode think of the points in the central window as within a transparent ball. When the rotation is paused, dragging with L or M in the central panel will mimic a reversed `trackball' action rotating that ball.
In this mode there are two X variables and a Y variable. One of the X variables is the target, and is used for the X axis in the initial display. Click with M on a variable's circle to make it the Y variable, and with L to make it an X variable. (The last selected X variable is the target, marked by a small circle in the center of its disc when the mouse pointer is in the variable selection panel.)
Rotation is about the Y axis, and `trackball' action still works. The
additional option Interpolate rocks between the two X
axes.
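Geometrically, rotation about the Y axis leaves the vertical screen coordinate alone and blends the two X variables into the horizontal one. The Python sketch below shows that idea under the usual rotation convention; it is not taken from XGobi's source, and the function names are hypothetical.

```python
import math


def rotate_about_y(x1, x2, y, angle):
    """Screen position of one point after turning `angle` radians about Y."""
    horizontal = math.cos(angle) * x1 + math.sin(angle) * x2
    return horizontal, y  # the vertical coordinate is unaffected


def interpolate_angles(n_steps):
    """Angles for one swing from the first X axis to the second and back."""
    forward = [0.5 * math.pi * i / n_steps for i in range(n_steps + 1)]
    return forward + forward[-2::-1]


# At angle 0 the horizontal axis shows the first X variable; at a quarter
# turn it shows the second one.
for angle in interpolate_angles(2):
    print(rotate_about_y(1.0, 5.0, 2.0, angle))
```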
As for Y axis, but interchange X and Y throughout.
This mode is used for Asimov's (1985) grand tour and for projection
pursuit. Its default operation is a continuous grand tour through the
space of selected variables (or nothing if there are just two).
This continually selects a random new projection in the space of
currently selected variables, and moves continuously
(by interpolation) to that new projection. When the projection is
reached, another is chosen (and a slight pause may be visible).
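The loop of choosing a random projection and moving to it by interpolation can be sketched in Python as follows. This is a deliberately simplified stand-in: it blends the two projection vectors linearly and re-orthonormalizes them, which is not necessarily the interpolation XGobi uses, and all function names are hypothetical.

```python
import math
import random


def orthonormalize(a, b):
    """Gram-Schmidt: return unit vectors spanning the same plane as a and b."""
    norm_a = math.sqrt(sum(v * v for v in a))
    a = [v / norm_a for v in a]
    overlap = sum(x * y for x, y in zip(a, b))
    b = [y - overlap * x for x, y in zip(a, b)]
    norm_b = math.sqrt(sum(v * v for v in b))
    return a, [v / norm_b for v in b]


def random_frame(rng, p):
    """A random 2-D projection of p variables (two orthonormal p-vectors)."""
    a = [rng.gauss(0, 1) for _ in range(p)]
    b = [rng.gauss(0, 1) for _ in range(p)]
    return orthonormalize(a, b)


def interpolate(frame0, frame1, t):
    """Blend two frames linearly, then re-orthonormalize (a simplification)."""
    a = [(1 - t) * u + t * v for u, v in zip(frame0[0], frame1[0])]
    b = [(1 - t) * u + t * v for u, v in zip(frame0[1], frame1[1])]
    return orthonormalize(a, b)


def tour(p, n_targets, steps, seed=0):
    """Yield the projection frames of a short tour through p variables."""
    rng = random.Random(seed)
    current = random_frame(rng, p)
    for _ in range(n_targets):
        target = random_frame(rng, p)        # pick a new random projection
        for s in range(1, steps + 1):
            yield interpolate(current, target, s / steps)
        current = target                     # reached; choose another


for x_axis, y_axis in tour(p=4, n_targets=1, steps=3):
    print([round(v, 3) for v in x_axis], [round(v, 3) for v in y_axis])
```

Each yielded pair gives the coefficients of the horizontal and vertical screen axes in terms of the selected variables.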
In this mode variables are selected by clicking L within the variable's circle, and de-selected by clicking M. Any number of two or more variables can be selected, but if only two are currently selected, neither can be de-selected until a further variable is selected.
The left-hand panel controls operations. The slider at the top (see figure 1) controls the speed of rotation; drag it with L, or click with L to indicate the place you want. Pause is a self-explanatory toggle; Reinit resets the original set of axes. In Step mode, the tour stops at each new view; click L on Go to continue. If Local Scan is selected, the tour returns to the starting position after each new view.
The tour is `checkpointed' at each new view, and can be selectively re-played, using the Backtrack and F or B buttons. (Once Backtrack is selected, F/B selects the direction of replay. The number between the buttons indicates the last view number.)
The I/O button hides a menu to save the coefficients of the current projection, and read/save the checkpoint history. (See under Save Coeffs for behavior from S-Plus.)
The Interp button hides a menu giving a choice of interpolation method.
The PrnComp Basis button transforms the variables to principal components. The variables change names to PC1, PC2, ... but axes are still displayed in the original variables. An implicit re-initialize is done when this button is selected or de-selected.
The Section button switches to a section tour. Points not within a tolerance Eps (set by a slider) of the current section hyperplane are shown as a single pixel point.
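One way to picture the section test is as a distance check against the plane of the current projection. The Python sketch below assumes that the section plane passes through the origin and is spanned by the two orthonormal projection vectors; XGobi's own definition may differ in detail. The function names are hypothetical, and glyph type 31 is the single-pixel point from the glyph table.

```python
import math


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def distance_from_plane(point, a, b):
    """Distance of a point from the plane spanned by orthonormal a and b."""
    squared = dot(point, point) - dot(point, a) ** 2 - dot(point, b) ** 2
    return math.sqrt(max(squared, 0.0))  # guard against rounding below zero


def section_glyphs(points, a, b, eps, glyph=26):
    """Keep `glyph` for points within eps of the plane; use 31 otherwise."""
    return [glyph if distance_from_plane(p, a, b) <= eps else 31
            for p in points]


a, b = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)
points = [(1.0, 2.0, 0.05), (1.0, 2.0, 3.0)]
print(section_glyphs(points, a, b, eps=0.1))
```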
Selecting the ProjPrst button also selects principal component basis (necessary to `sphere' the data) and brings up another window which tracks the PP index as the tour progresses. The index is selected from a menu hidden on the PP Index button, and there will be a slider indicating the number of terms in the polynomial or the bandwidth of the kernel density estimate, as appropriate. The indices are described in the help for that button; be warned that calculation of the Friedman-Tukey and Entropy indices seems very slow for large datasets.
N.B. No optimization is done until the Optimz button is selected. This will perform a (rough) gradient ascent when selecting new projections, and stop when a local maximum appears to have been found. Allowing the tour to proceed randomly between sessions of optimization may reveal other local maxima.
The bitmap button saves a small image of the picture below the track of the PP index, which can help to identify the points on the tour (including local maxima). The Return to Bitmap and Record Bitmap buttons allow one to return to bitmaps, and to output the projection used. (Note: this will be in terms of the sphered principal components, not the original variables.)
Select by clicking L on the Scale button at any time when the view is stationary. The view can be dragged with L, and stretched with M. There are also buttons to perform these shifts.
The Stdzation button hides a menu controlling how variables are
scaled. There are three possibilities. The default is to rescale using
the minimum and maximum of each variable, so that no points fall outside
the window. Other options are to rescale to mean zero, variance one or to
median zero, MAD (median absolute deviation) one. The slider controls
how many standard deviations / MADs are within the display. These
options (especially the third) can be useful when extreme outliers are
present.
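The three standardizations can be compared with a short Python sketch. It maps each variable so that the plotting window corresponds to the interval from -1 to 1, which is an assumption made for illustration; the function names are hypothetical, the sample standard deviation is assumed, and dev plays the role of the slider (default 2, as for -dev).

```python
import statistics


def scale_minmax(values):
    """Midpoint of the range goes to 0; minimum and maximum go to -1 and 1."""
    lo, hi = min(values), max(values)
    mid, half = (lo + hi) / 2, (hi - lo) / 2
    return [(v - mid) / half for v in values]


def scale_mean_sd(values, dev=2.0):
    """Mean goes to 0; `dev` standard deviations reach the window edge."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (assumed)
    return [(v - mean) / (dev * sd) for v in values]


def scale_median_mad(values, dev=2.0):
    """Median goes to 0; `dev` median absolute deviations reach the edge."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return [(v - med) / (dev * mad) for v in values]


values = [1, 2, 3, 4, 100]  # one extreme outlier
print(scale_minmax(values))
print(scale_mean_sd(values))
print(scale_median_mad(values))
```

With the outlier present, min/max scaling squeezes the first four points together near the left edge, while median/MAD scaling spreads them across the window and lets the outlier fall outside it. A constant variable would need special handling, since its range, standard deviation and MAD are all zero.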
There is an option, vgroups, on invocation which ensures that groups of variables are scaled together, in which case the standardization applies to the sample of all cases from all variables in the group. The vgroups option is specified by a file or an S-Plus argument, listing for each variable its group number.
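As a sketch of what scaling a group together means, the Python function below pools the values of all columns that share a group number and applies min/max scaling with the pooled range. The function name scale_by_groups is hypothetical, and the mapping to the interval from -1 to 1 is an assumption for illustration.

```python
def scale_by_groups(columns, vgroups):
    """Min/max-scale columns; columns with the same group share one range."""
    pooled = {}
    for column, group in zip(columns, vgroups):
        pooled.setdefault(group, []).extend(column)  # all cases, all members
    scaled = []
    for column, group in zip(columns, vgroups):
        lo, hi = min(pooled[group]), max(pooled[group])
        mid, half = (lo + hi) / 2, (hi - lo) / 2
        scaled.append([(v - mid) / half for v in column])
    return scaled


# Four variables with group numbers 1 2 2 3: the second and third columns
# are scaled with their common range of 0 to 4.
columns = [[0, 10], [0, 1], [0, 4], [5, 7]]
print(scale_by_groups(columns, [1, 2, 2, 3]))
```

The second column ends up occupying only part of the window, because its values are small relative to the range of its group.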
Select by clicking L on the Brush button at any time when the view is stationary. Within the view the brush size can be changed by dragging with M. Brushing is performed by dragging the brush outline (a rectangle) with L. (The rectangle can be moved without brushing by de-selecting Brush on.)
To brush, a color or glyph (symbol) or both must be selected from the hidden menus on the Color and Glyph buttons. (The brush rectangle changes to the currently selected brushing color.) The effect can be Persistent, Transient or Undo by selecting (with L) the appropriate button.
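At its core, brushing is a test of which points lie inside the brush rectangle. The Python sketch below shows that test for a single brush position; the function name and the rectangle representation are hypothetical, and the Persistent, Transient and Undo effects are not modeled.

```python
def brush_colors(points, rect, colors, brush_color):
    """Return new colors: points inside the rectangle take the brush color."""
    x0, y0, x1, y1 = rect  # opposite corners, with x0 <= x1 and y0 <= y1
    return [brush_color if x0 <= x <= x1 and y0 <= y <= y1 else old
            for (x, y), old in zip(points, colors)]


points = [(0.1, 0.1), (0.5, 0.5), (0.9, 0.2)]
colors = ["white", "white", "white"]
print(brush_colors(points, (0.0, 0.0, 0.6, 0.6), colors, "red"))
```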
There are two erase buttons. The top one changes the effect of brushing to that of erasing points. The lower button hides a menu which will restore erased points, swap erased and non-erased points, and so on.
The button Make Group Var makes another variable which enumerates the current color/glyph groups. The button I/O enables the current color and glyph settings to be saved in a file.
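The idea behind Make Group Var can be sketched in a few lines of Python: every distinct combination of color and glyph receives its own integer. Numbering the groups in order of first appearance is an assumption of this sketch, and the function name is hypothetical.

```python
def make_group_variable(colors, glyphs):
    """Assign an integer to each distinct (color, glyph) combination."""
    group_numbers = {}
    variable = []
    for pair in zip(colors, glyphs):
        if pair not in group_numbers:
            group_numbers[pair] = len(group_numbers) + 1  # next free number
        variable.append(group_numbers[pair])
    return variable


colors = ["red", "red", "blue", "red"]
glyphs = [26, 26, 26, 1]
print(make_group_variable(colors, glyphs))
```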
Reset hides a menu of reset actions.
Select by clicking L on the Identify button at any time when the view is stationary. Moving the pointer over the view labels the nearest point with its row label. Clicking L makes the currently displayed label persistent.
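Identification amounts to a nearest-point search in screen coordinates. A minimal Python sketch, assuming plain Euclidean distance on the screen and using a hypothetical function name:

```python
def nearest_label(pointer, points, labels):
    """Return the row label of the plotted point closest to the pointer."""
    px, py = pointer
    index = min(range(len(points)),
                key=lambda i: (points[i][0] - px) ** 2
                              + (points[i][1] - py) ** 2)
    return labels[index]


points = [(10, 10), (50, 40), (90, 15)]
labels = ["case 1", "case 2", "case 3"]
print(nearest_label((48, 35), points, labels))
```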
There are two buttons, Remove Labels and Case profile, with obvious effects.
Sticky labels will be preserved in future rotations or tours.
This document has been developed from the man pages written by Deborah Swayne and a previous short introduction by Brian Ripley.