More Examples Using Linked Software

Dianne Cook (dicook@iastate.edu)
Jürgen Symanzik (symanzik@iastate.edu)
James J. Majure (jim@miner.com)
Noel Cressie (ncressie@iastate.edu)

This document describes the linking of two software packages to provide exploratory dynamic graphical tools directly from within a geographic information system (GIS). The GIS we have used is ArcView 2.1®, a widely used package for examining maps and images. XGobi (Swayne, Cook and Buja, 1991) is a publicly available dynamic graphics package that is also widely used for exploring multivariate data. The link cross-references each location in the map view provided by ArcView with a multitude of plot types in XGobi. This cross-referencing allows a user to interact with either the map view or the XGobi plots by painting (that is, brushing) groups of points with different colors or glyph (shape) types, identifying points, or erasing them, and to have the corresponding changes made automatically in the other view.

Examples of the different types of plots and interactions are given in the paper in both image and video format.

ArcView® is a registered trademark of Environmental Systems Research Institute, Inc., Redlands, CA, USA.

Keywords

Spatial Statistics, XGobi, ArcView, Visualization, Exploratory Data Analysis

Outline

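The topics covered in this document include: Introduction, Visualization Methods, Linking Mechanism, Hardware Requirements and Cost, Examples, Conclusions, Acknowledgements, References, and Downloading the Software.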

Introduction

The link between ArcView 2.1 and XGobi is designed to combine the strengths of both packages in the exploration and analysis of data collected in a spatial framework. ArcView 2.1 is a geographic information system (GIS) that allows spatial data in many forms to be displayed, queried, and manipulated. XGobi is a dynamic graphics program that allows multivariate data to be explored through the manipulation of scatterplots. XGobi provides tools for the exploration of multivariate data such as linked brushing, identification, the grand tour, and projection pursuit.

The link allows data collected at spatial locations (stored in ArcView as point attributes) to be dynamically passed to XGobi and explored. The link between the data points in XGobi and the locations from which they were collected is maintained through "linked brushing". Linked brushing, as used in this context, is the ability to change the glyph/size/color of the display elements (points) in either ArcView or XGobi and to see the corresponding points in the other application change simultaneously. If a user highlights points on a scatterplot, the corresponding geographic locations will be highlighted on the map.
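The one-to-one form of linked brushing described above can be sketched in a few lines. This is a minimal illustration only, not the mechanism used by ArcView or XGobi: the `View` class, its methods, and the point identifiers are all hypothetical names introduced here, and the sketch assumes that both views share the same set of point identifiers.

```python
# Hypothetical model of one-to-one linked brushing: two views share point IDs,
# and a brush applied in one view is copied to the same IDs in the other.

class View:
    def __init__(self, name, point_ids):
        self.name = name
        # Every point starts with a default color and glyph.
        self.style = {pid: ("white", "dot") for pid in point_ids}
        self.linked = []

    def link(self, other):
        """Link two views in both directions."""
        self.linked.append(other)
        other.linked.append(self)

    def brush(self, point_ids, color, glyph, _propagate=True):
        """Restyle the given points, then mirror the change in linked views."""
        for pid in point_ids:
            if pid in self.style:
                self.style[pid] = (color, glyph)
        if _propagate:
            # Propagate once only, so two linked views do not call each other forever.
            for view in self.linked:
                view.brush(point_ids, color, glyph, _propagate=False)


map_view = View("map", [1, 2, 3, 4])
scatterplot = View("scatterplot", [1, 2, 3, 4])
map_view.link(scatterplot)

# Brushing two points in the scatterplot highlights the same two map locations.
scatterplot.brush([2, 3], "red", "cross")
```

After the final call, points 2 and 3 carry the red cross in both views, while points 1 and 4 keep the default style.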

The link comes in five forms. The Basic, or multivariate, link simply passes the data associated with each spatial location into XGobi and allows it to be analyzed. The SCDF link takes data from an attribute and calculates and displays its empirical spatial cumulative distribution function (SCDF). This version of the link allows multiple regions to be specified for SCDF prediction and display. The Variogram Cloud link calculates and displays univariate and multivariate variogram cloud plots that can be interactively explored. This version of the link is used when exploring a data set for spatial dependence. The Lagged Scatterplot link creates and displays dynamic lagged scatterplots. The Multivariate Variogram Cloud link calculates and displays a special type of variogram cloud that is used to look for asymmetric, multivariate spatial dependence.

The use and implementation of linked brushing necessarily differ among the forms of the link. It is one-to-one linking in the multivariate and SCDF links. It is one-to-two linking in the other forms of the link because one point in the diagnostic plots corresponds to two geographic locations in the map view. This linking makes the combination of ArcView and XGobi a powerful tool for the exploration of spatially referenced data.
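One-to-two linking can be illustrated with a small lookup between pairs and locations. The sketch below is an assumption about how such a lookup might be organized, not a description of the actual implementation; the location labels and function names are hypothetical.

```python
from itertools import combinations

locations = ["A", "B", "C", "D"]          # hypothetical location IDs
pairs = list(combinations(locations, 2))  # one diagnostic-plot point per pair


def locations_for_pairs(brushed_pairs):
    """Brushing points in the diagnostic plot selects two locations per point."""
    return {loc for pair in brushed_pairs for loc in pair}


def pairs_for_location(loc):
    """Brushing one location on the map selects every pair that contains it."""
    return [pair for pair in pairs if loc in pair]
```

With four locations there are six pairs, so a single brushed location on the map corresponds to three points in the diagnostic plot.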

Visualization Methods

The visualization methods used in the link derive from a long history of development of tools for exploratory multivariate data analysis within the field of statistical graphics. An early example of exploratory tools can be found in Fisherkeller, Friedman, and Tukey (1974). In this software (PRIM-9), conceptual developments such as projecting multivariate data into 2 dimensions, rotating variables into and out of a projection, and erasing and masking groups of points were realized as tools for interactive data analysis.

Brushing was defined by Newton (1978) to be interactively painting a group of points using a unique color or symbol (glyph). The paint brush is commonly a rectangle for speed of computation but can take the form of a polygon or circle or any other shape.
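The computational appeal of a rectangular brush is that membership reduces to four comparisons per point. A minimal sketch, with a hypothetical function name and made-up coordinates:

```python
def rectangular_brush(points, x_min, x_max, y_min, y_max):
    """Return the indices of the points that fall inside the brush rectangle."""
    return [
        i for i, (x, y) in enumerate(points)
        if x_min <= x <= x_max and y_min <= y <= y_max
    ]


points = [(0.1, 0.2), (0.5, 0.5), (0.9, 0.4), (0.55, 0.95)]
selected = rectangular_brush(points, 0.4, 1.0, 0.3, 0.6)
```

A polygonal or circular brush would replace the four comparisons with a point-in-polygon or distance test, at a higher cost per point.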

Linked brushing was introduced by McDonald (1982) as a means for cross-referencing information in different plots. When there is a one-to-one correspondence between points displayed in one view and those displayed in another view, brushing can be conducted in one view and the corresponding points in another view will take the same color and glyph. One view might be a scatterplot of temperature versus precipitation and the other view might be a textured dotplot (Tukey and Tukey, 1990) of elevation. In theory, any sort of linking between data sets can be defined but the simplest conceptually is the one-to-one correspondence. And, of course, the linking could use labels of points as identification rather than color or glyph type.

Various developments in dynamic graphics for multivariate data have been made. The most notable is the grand tour, which can be thought of as an extension of 3-d rotation to higher-dimensional rotation. A grand tour (Asimov, 1985; Buja and Asimov, 1986) shows a viewer "the data from all sides" using a continuous space-filling algorithm to move over the space of all low-dimensional projections of the multivariate data. Details of the algorithm are provided in Buja and others (1996). Modifications to the algorithm to allow automatic guidance and manual controls can be found in Hurley and Buja (1990), Cook and others (1995), and Cook and Buja (1996).
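The idea of viewing many low-dimensional projections can be sketched as follows. This is a deliberate simplification and not the grand tour algorithm of the cited papers: it draws independent random 2-dimensional projection frames and omits the smooth interpolation between frames that makes a real tour continuous. All function names are hypothetical.

```python
import math
import random


def random_frame(p, rng):
    """Draw two orthonormal p-vectors by Gram-Schmidt on Gaussian draws (p >= 2)."""
    a = [rng.gauss(0, 1) for _ in range(p)]
    norm_a = math.sqrt(sum(v * v for v in a))
    a = [v / norm_a for v in a]
    b = [rng.gauss(0, 1) for _ in range(p)]
    dot = sum(x * y for x, y in zip(a, b))
    b = [y - dot * x for x, y in zip(a, b)]   # remove the component along a
    norm_b = math.sqrt(sum(v * v for v in b))
    b = [v / norm_b for v in b]
    return a, b


def project(data, frame):
    """Project each p-dimensional row onto the two frame vectors."""
    a, b = frame
    return [
        (sum(x * u for x, u in zip(row, a)), sum(x * u for x, u in zip(row, b)))
        for row in data
    ]


def random_views(data, n_views, seed=0):
    """Yield a sequence of 2-d scatterplot coordinates, one per random frame."""
    rng = random.Random(seed)
    p = len(data[0])
    for _ in range(n_views):
        yield project(data, random_frame(p, rng))
```

Each element produced by `random_views` is a list of 2-d coordinates that could be drawn as one scatterplot; a true tour would move between such frames in small steps.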

All of the graphical methods for multivariate data mentioned above are available in XGobi. A good general reference containing examples of using these tools is Buja, Cook, and Swayne (1996). Examples of other packages that have most of the methods discussed above are Data Desk, XLispStat, and JMP. XGobi was chosen for the link because it is unique in the variety and controls available for running tours of the data, it is publicly available, and the authors are familiar with the XGobi code so that modifications could be made as needed and as suggested for enhancing the analytical process.

Linking variable plots with geography is generally considered to be important for analyzing spatially referenced data. It has been attempted and discussed in several places. McDonald and Willis (1987) use a grand tour linked to an image to assess clustering of landscape types in the band space of a Landsat image taken over Manaus, Brazil. Carr and others (1987) and Monmonier (1989) discuss linking a map to a scatterplot matrix display. McDougall (1992) discusses an exploratory system for the Macintosh that links histograms and scatterplots with latitude and longitude (and depth) coordinates. REGARD (Unwin, Wills, and Haslett, 1990) also links a map view with histograms and scatterplots, and in addition has diagnostic plots for assessing spatial dependence (discussed below). The most recent development in linked graphics is the software cdv (Dykes, 1996), which links a variety of plots with the geography.

How the map is linked differs among the software packages. In the simplest case the spatial coordinates are shown as a point scatter, but in REGARD, Monmonier's example, and cdv the map is drawn with polygons. Our link between ArcView and XGobi is essentially an elaborate case of displaying the spatial coordinates as a scattercloud, but it uses the strengths of the GIS to overlay map features on the scattercloud.

Linking software is an approach that several others have taken to provide tools for examining spatial data. Scott (1994) discusses exploratory data analysis using STATA linked to ArcView in a PC environment. Haining, Ma and Wise (1996) discuss designing a software system for interactive exploration of spatial data by linking with Arc/Info. Anselin and Bao (1996) discuss linking ArcView with SpaceStat, and MathSoft (1996) provides S+Gislink, a bi-directional link between Arc/Info and S-Plus.

One of the features of the ArcView-XGobi link is that some of the plots are ones that have been developed for exploring spatial dependence. In particular, the variogram cloud plot is often examined before fitting a variogram model (see, for example, Cressie (1993) for a discussion). The first implementation of linking geography with the variogram cloud plot can be found in Haslett and others (1991). The lagged scatterplot (Cressie, 1984; Rossi and others, 1992) and the multivariate variogram cloud plot (Majure and Cressie, 1996) are two new types of plots for examining spatial dependence and asymmetric spatial dependence amongst multiple variables.
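The two univariate diagnostic plots mentioned above can be sketched directly from their definitions. The code assumes the common form of the variogram cloud, in which each pair of locations contributes its separation distance and the squared difference of its values (other transformations of the difference are also in use), and it assumes a lagged scatterplot that selects pairs by distance only, ignoring direction. The function names are hypothetical.

```python
import math
from itertools import combinations


def variogram_cloud(coords, values):
    """Return (i, j, distance, squared difference) for every pair of locations."""
    cloud = []
    for i, j in combinations(range(len(coords)), 2):
        h = math.dist(coords[i], coords[j])
        cloud.append((i, j, h, (values[i] - values[j]) ** 2))
    return cloud


def lagged_scatterplot(coords, values, h_min, h_max):
    """Return (Z(s_i), Z(s_j)) for pairs whose separation lies in [h_min, h_max)."""
    return [
        (values[i], values[j])
        for i, j, h, _ in variogram_cloud(coords, values)
        if h_min <= h < h_max
    ]


coords = [(0, 0), (3, 4), (6, 8)]
values = [1.0, 2.0, 4.0]
cloud = variogram_cloud(coords, values)
lagged = lagged_scatterplot(coords, values, 4.0, 6.0)
```

Keeping the indices `i` and `j` with each cloud point is what makes one-to-two linking back to the map possible.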

The spatial cumulative distribution function (SCDF), relative to a given region $D$ and expressed as a function of $\mathbf{z}$, is simply the proportion of area in $D$ where a given variable is below the cut-off $\mathbf{z}$. That is, the SCDF is $F(\mathbf{z}; D) = \frac{1}{|D|} \int_D I(\mathbf{Z}(\mathbf{s}) \le \mathbf{z}) \, d\mathbf{s}$, where $|D|$ denotes the area of $D$. It is interesting to examine the locations of the top 5% (say) of all values of one or more variables viewed. If the variable is an indicator of health of the forest, where high values are bad, then these top 5% would represent the locations with the worst health, so regions under stress can be identified. Also, the SCDFs for different spatial regions can be compared, which may be useful, for example, in examining differences in indicator values across political boundaries such as states. We assume that a set of measurements is available to researchers in the field:

$$\mathbf{Z}(\mathbf{s}_1), \ldots, \mathbf{Z}(\mathbf{s}_n)$$

obtained from sampling locations $\mathbf{s}_1, \ldots, \mathbf{s}_n$ where [condition missing in source]. In the absence of complete knowledge of $\mathbf{Z}$ at all locations, the SCDF must be predicted. We shall consider the predictor,

[predictor equation missing in source]

where associated with the sampling locations are a set of weights $w_1, \ldots, w_n$ and $D$ is the region of the study. The indicator function of a vector is defined to be [definition missing in source]. A more complete description can be found in Majure and others (1996A).
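To make the idea of a weighted SCDF predictor concrete, the sketch below assumes a normalized weighted empirical form for a single variable, $\hat{F}(z; D) = \sum_i w_i I(Z(\mathbf{s}_i) \le z) / \sum_i w_i$. This form is an assumption chosen for illustration; it is not claimed to be the predictor used in the paper, for which Majure and others (1996A) should be consulted. The function name and the data are hypothetical.

```python
def scdf_predictor(values, weights, z):
    """Weighted proportion of sampled values at or below the cut-off z.

    Assumed form: sum of w_i * I(Z(s_i) <= z), divided by the sum of w_i.
    """
    total = sum(weights)
    below = sum(w for v, w in zip(values, weights) if v <= z)
    return below / total


values = [2.0, 5.0, 3.0, 8.0]    # made-up measurements at four locations
weights = [1.0, 1.0, 2.0, 1.0]   # made-up weights, for example area represented

f_at_3 = scdf_predictor(values, weights, 3.0)
```

Here `f_at_3` is 0.6, because the locations with values 2.0 and 3.0 carry weights 1.0 and 2.0 out of a total of 5.0. With equal weights the function reduces to the ordinary empirical distribution function.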

Linking Mechanism

The link between ArcView 2.1 and XGobi is based on an interprocess communication mechanism called Remote Procedure Call (RPC). The use of RPCs is a programming technique in which a process on the local system (called the client) invokes a procedure on a remote system (called the server). The same terminology is used when client and server reside on the same system, as in our implementation. A general discussion of RPCs can be found in Stevens (1990). Security issues of RPCs are discussed in Corbin (1991). Technical details on how our link was set up can be found in Symanzik, Majure, and Cook (1996A) and Symanzik and others (1996B).
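The client and server roles can be illustrated with a small example in which both run on the same machine. The sketch uses XML-RPC from the Python standard library purely as a stand-in; it is a different RPC mechanism from the one used in the link, and the procedure `brush_points` is a hypothetical name that does not correspond to any ArcView or XGobi routine.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def brush_points(point_ids, color):
    """Hypothetical remote procedure: pretend to recolor points, return a count."""
    return len(point_ids)


def demo():
    # Port 0 asks the operating system for any free port on the local machine.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(brush_points)
    port = server.server_address[1]

    # The server runs in a background thread; the client lives in this thread.
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        client = ServerProxy(f"http://127.0.0.1:{port}")
        # This looks like a local call but is executed by the server.
        return client.brush_points([3, 7, 9], "red")
    finally:
        server.shutdown()
        server.server_close()
```

Calling `demo()` returns the number of point identifiers passed to the remote procedure, showing that the arguments travelled to the server and the result travelled back.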

Hardware Requirements and Cost

The link was originally developed for DEC alphastations. It has also been successfully compiled on Sun/Sparc workstations, SGI workstations, and workstations from Data General Corporation. We did not encounter any problems when adapting the link to these systems. However, it is more difficult to install the software on Sun/Solaris workstations. Running the software depends on the local configuration of the system. As long as the minimum configuration requirements for ArcView are met, the link has no further requirements as far as performance, storage, or speed of the hardware are concerned.

Our software is mostly intended for people who have been using ArcView for a while and have collected a large database of spatially referenced data to be analyzed within XGobi. For small and very infrequent applications, there is no need to purchase an expensive geographic information system such as ArcView. Instead, it is possible to incorporate spatial coordinates (e.g., latitude and longitude) into the data set and analyze this extended data set entirely within XGobi. However, whenever ArcView is available, this link is a useful extension for performing a dynamic graphical analysis of the spatial data.

Examples

(Each example is small; some contain GIF images and others link to video segments.)

Conclusions

Overall, the link is fairly stable software that is available and ready for use by analysts working with georeferenced data. With the link, we provide tools for exploring large scale variation or trends and features such as clustering (Basic Link), and small scale variation with established plots, such as the variogram cloud, and yet-to-be-fully-tried-and-tested plots, such as the lagged scatterplot and the multivariate variogram cloud. Part of the work involved adapting these plots to the dynamic environment, so that approaches to the analysis which might have involved iterative work in a static graphic environment could be done interactively by changing a brush size or simply moving a brush through the points in the dynamic environment. One of the most important contributions of the work is to provide a way to examine multivariate spatial data in ways other than layering several maps above each other.

As we have used the link for examining data we have discovered that some additions are desired. Two main improvements to the current link are planned: (1) having multiple links open simultaneously and (2) providing a third component to the environment which improves the statistical analysis, for example including a link to S (Becker, Chambers, and Wilks, 1988) or XploRe (Härdle, Klinke, and Turlach, 1995).

At present, only one link may be used at any time. To use another, the user has to exit the current link and start a different one. For data analysis purposes it would be desirable to have multiple links open simultaneously. Implementing this, though, requires very careful thought about the logical linking between the forms. With only one form in use at any time there is no confusion about the result of a brushing action.

Having multiple links open simultaneously requires modifications of XGobi and the ArcView code. In XGobi, we will need to introduce a new type of linked brushing, called "hierarchical" linked brushing (explained below). As described in Symanzik and others (1996A), multiple (cloned) copies of XGobi communicate with each other through the production and consumption of XEvents. Only one XGobi communicates back to ArcView 2.1. This mechanism only works well if all participating XGobis have the same number of points, arranged in the same order. However, if we have one XGobi for the attribute data and one XGobi for the univariate spatial CDF, the points are ordered differently: in the order in which they are stored in ArcView 2.1 (attribute data) and from smallest to largest (univariate spatial CDF). If we brush points labeled 1 to k in the attribute data, then the first k points of the univariate spatial CDF are simultaneously brushed, which is nonsensical.
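The ordering problem is easy to reproduce in a small example. The sketch below contrasts linking by position, which fails when one view sorts its points, with linking by a shared identifier. The identifier-based lookup is shown only to make the mismatch visible; it is not the "hierarchical" linked brushing planned by the authors, and the data and names are hypothetical.

```python
values = {"p1": 7.2, "p2": 3.1, "p3": 9.8, "p4": 4.5}   # hypothetical attribute data

attribute_order = list(values)                # order in which points are stored
scdf_order = sorted(values, key=values.get)   # smallest to largest, as in the SCDF plot

brushed_positions = [0, 1]   # the user brushes the first two stored points

# Position-based linking brushes whatever sits at the same positions in the SCDF plot.
by_position = [scdf_order[i] for i in brushed_positions]

# Identifier-based linking translates positions to IDs first, then finds those IDs.
brushed_ids = {attribute_order[i] for i in brushed_positions}
by_id = [pid for pid in scdf_order if pid in brushed_ids]
```

Position-based linking selects `p2` and `p4`, the two smallest values, although the user brushed `p1` and `p2`; the identifier-based lookup recovers the intended points.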

Also, if one XGobi displays attribute data while another XGobi displays a variogram cloud plot, then brushing one point of the attribute data should have the same effect as brushing one point in ArcView 2.1, that is, brushing all related points in the variogram cloud plot and drawing connected lines in the ArcView 2.1 view (Symanzik and others, 1996B). This is not supported in the current implementation of XGobi and the link. There are similar problems that involve the spatially lagged scatterplot and the multivariate variogram cloud plot. The idea of "hierarchical" linked brushing is to support the communication of "similar" XGobis through XEvents, while "different" XGobis have to communicate through ArcView 2.1.

The second improvement involves the addition of a link to a statistical software package such as S (Becker, Chambers, and Wilks, 1988) or XploRe (Härdle, Klinke, and Turlach, 1995) in order to further process the data during an analysis. For example, one may want to fit a model to extract the trend and then look at the residuals for spatial dependence, or to transform some variables or create new variables from combinations of existing ones. A statistical analysis package linked into the graphical environment would allow this to be performed seamlessly rather than by exporting and re-importing data.
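As an illustration of the kind of processing such a package would provide, the sketch below fits a planar trend to spatial data by ordinary least squares and returns the residuals, which could then be examined for spatial dependence. The planar trend model and the function names are assumptions made for this example; the source does not prescribe a particular trend model.

```python
def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):   # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def planar_trend_residuals(coords, values):
    """Fit z = b0 + b1*x + b2*y by least squares; return (coefficients, residuals)."""
    design = [(1.0, x, y) for x, y in coords]
    # Normal equations: (X'X) beta = X'z.
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(3)] for i in range(3)]
    xtz = [sum(r[i] * v for r, v in zip(design, values)) for i in range(3)]
    beta = solve(xtx, xtz)
    fitted = [sum(b * t for b, t in zip(beta, r)) for r in design]
    return beta, [v - f for v, f in zip(values, fitted)]


coords = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
values = [1.0 + 2.0 * x - 0.5 * y for x, y in coords]   # exact plane, so residuals vanish
beta, residuals = planar_trend_residuals(coords, values)
```

With real data the residuals would not vanish, and they could be passed to a variogram cloud or lagged scatterplot in place of the raw values.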

Acknowledgements

Inna Megretskaia has contributed substantially to the coding for several forms of the link and provided smoothing code in XGobi. Mark Kaiser and Soumendra Lahiri provided valuable comments as this research was developing. We thank the editor and reviewers for their careful reviews and for providing us with insights into closely associated work being conducted in different disciplines. The research reported in this article has been funded by the U.S. Environmental Protection Agency through Cooperative Agreement CR822919 with Iowa State University. This paper has not been subjected to the Agency's peer and administrative review. No endorsement of the contents by the Agency should be inferred. Symanzik's research was also partially supported by a German "DAAD-Doktorandenstipendium aus Mitteln des zweiten Hochschulsonderprogramms".

References

Anselin, L. and Bao, S. (1996). Exploratory Spatial Data Analysis Linking SpaceStat and ArcView. Research Paper 9618, Regional Research Institute and Department of Economics, West Virginia University.

Asimov, D. (1985). The Grand Tour: A Tool for Viewing Multidimensional Data, SIAM Journal on Scientific and Statistical Computing, 6(1):128-143.

Becker, R., Chambers, J., Wilks, A. (1988). The New S Language - A Programming Environment for Data Analysis and Graphics, Wadsworth and Brooks/Cole, Pacific Grove, CA.

Boyer, R. and Savageau, D. (1981). Places Rated Almanac, Rand McNally, Chicago, IL.

Buja, A. and Asimov, D. (1986). Grand Tour Methods: An Outline, Computing Science and Statistics, 17:63-67.

Buja, A., Cook, D., Asimov, D., Hurley, C. (1996). Dynamic Projections in High-Dimensional Visualization: Theory and Computational Methods, Journal of Computational and Graphical Statistics, Submitted.

Buja, A., Cook, D., Swayne, D. (1996). Interactive High-Dimensional Data Visualization, Journal of Computational and Graphical Statistics, 5(1):78-99.

Carr, D. B., Littlefield, R. J., Nicholson, W. L., Littlefield, J. S. (1987). Scatterplot Matrix Techniques for Large N, Journal of the American Statistical Association, 82(398):424-436.

Cook, D. and Buja, A. (1996). Manual Controls For High-Dimensional Data Projections, Journal of Computational and Graphical Statistics, Submitted.

Cook, D., Buja, A., Cabrera, J., Hurley, C. (1995). Grand Tour and Projection Pursuit, Journal of Computational and Graphical Statistics, 4(3):155-172.

Cook, D., Majure, J. J., Symanzik, J., Cressie, N. (1996). Dynamic Graphics in a GIS: Exploring and Analyzing Multivariate Spatial Data Using Linked Software, Computational Statistics, 11(4):467-480.

Corbin, J. R. (1991). The Art of Distributed Applications: Programming Techniques for Remote Procedure Calls, Springer, New York, Berlin, Heidelberg.

Cressie, N. (1984). Towards Resistant Geostatistics, In Geostatistics for Natural Resources Characterization, Part 1, Verly, G., David, M., Journel, A., Marechal, A. (Editors), Reidel, Dordrecht, 21-44.

Cressie, N. A. C. (1993). Statistics for Spatial Data (revised edition), Wiley, New York, NY.

Dykes, J. A. (1996). Dynamic Maps for Spatial Science: A Unified Approach to Cartographic Visualization. In Innovations in GIS 3, Parker, D. (Editor), Taylor & Francis, 177-187.

Fisherkeller, M., Friedman, J. H., Tukey, J. (1974). PRIM-9: An Interactive Multidimensional Data Display and Analysis System, ASA Statistical Graphics Video Lending Library (contact: dfs@bellcore.com).

Haining, R., Ma, J., Wise, S. (1996). Design of a Software System for Interactive Spatial Statistical Analysis Linked to a GIS. Computational Statistics, 11(4):449-466.

Härdle, W., Klinke, S., Turlach, B. A. (1995). XploRe: An Interactive Statistical Computing Environment, Springer, New York, Berlin, Heidelberg.

Haslett, J., Bradley, R., Craig, P., Unwin, A., Wills, G. (1991). Dynamic Graphics for Exploring Spatial Data with Application to Locating Global and Local Anomalies, The American Statistician, 45(3):234-242.

Hurley, C. and Buja, A. (1990). Analyzing High-Dimensional Data with Motion Graphics, SIAM Journal on Scientific and Statistical Computing, 11(6):1193-1211.

MathSoft (1996). S+Gislink, Seattle: MathSoft, Inc.

McDonald, J. A. (1982). Interactive Graphics for Data Analysis, Technical Report, Orion II, Statistics Department, Stanford University.

McDonald, J. A. and Willis, S. (1987). Use of the Grand Tour in Remote Sensing, ASA Statistical Graphics Video Lending Library (contact: dfs@bellcore.com).

McDougall, E. B. (1992). Exploratory Analysis, Dynamic Statistical Visualization, and Geographic Information Systems. Cartography and Geographic Information Systems, 19(4):237-246.

Majure, J., Cook, D., Cressie, N., Kaiser, M., Lahiri, S., Symanzik, J. (1996A). Spatial CDF Estimation and Visualization with Applications to Forest Health Monitoring. Computing Science and Statistics, 27:93-101.

Majure, J. J. and Cressie, N. (1996). Dynamic Graphics for Exploring Spatial Dependence in Multivariate Spatial Data, Geographical Systems, Forthcoming.

Majure, J. J., Cressie, N., Cook, D., Symanzik, J. (1996B). GIS, Spatial Statistical Graphics, and Forest Health, Third International Conference/Workshop on Integrating GIS and Environmental Modeling, Santa Fe, New Mexico, USA, January 21-25, 1996, Forthcoming.

Monmonier, M. (1989). Geographic Brushing: Enhancing Exploratory Analysis of the Scatterplot Matrix. Geographical Analysis, 21(1):81-84.

Newton, C. (1978). Graphica: From Alpha to Omega in Data Analysis. In Wang, Graphical Representation of Multivariate Data, Academic Press, New York, NY, 59-92.

Rossi, R. E., Mulla, D. J., Journel, A. G., Franz, E. H. (1992). Geostatistical Tools for Modeling and Interpreting Ecological Spatial Dependence, Ecological Monographs, 62, 277-314.

Scott, L. M. (1994). Identification of a GIS Attribute Error Using Exploratory Data Analysis. The Professional Geographer, 46:378-386.

Stevens, W. R. (1990). UNIX Network Programming, Prentice-Hall, Englewood Cliffs, NJ.

Swayne, D. F., Cook, D., Buja, A. (1991). XGobi: Interactive Dynamic Graphics in the X Window System with a Link to S. In ASA Proceedings of the Section on Statistical Graphics. American Statistical Association, Alexandria, VA, pages 1-8.

Symanzik, J., Majure, J., Cook, D. (1996A). Dynamic Graphics in a GIS: A Bidirectional Link between ArcView 2.0 and XGobi. Computing Science and Statistics, 27:299-303.

Symanzik, J., Megretskaia, I., Majure, J., Cook, D. (1996B). Implementation Issues of Variogram Cloud Plots and Spatially Lagged Scatterplots in the Linked ArcView 2.1® and XGobi Environment. Computing Science and Statistics, 28.

Symanzik, J., Majure, J. J., Cook, D. (1996C). The Linked ArcView 2.1 and XGobi Environment - GIS, Dynamic Statistical Graphics, and Spatial Data. In: Shekhar, S. and Bergougnoux, P. (Eds.), Proceedings of the Fourth ACM Workshop on Advances in Geographic Information Systems, Rockville, Maryland, November 15-16, 1996, ACM, New York, New York, pages 149-156.

Tukey, J. and Tukey, P. (1990). Strips Displaying Empirical Distributions: I. Textured Dot Strips, Technical Memorandum, Bellcore.

Unwin, A., Wills, G., Haslett, J. (1990). REGARD - Graphical Analysis of Regional Data. In ASA Proceedings of the Section on Statistical Graphics. American Statistical Association, Alexandria, VA, pages 36-41.

Downloading the Software

Documentation on obtaining and installing the software is available at http://www.gis.iastate.edu/XGobi-AV2/homepage.html
Last Revision: Fri Dec 20 11:47:49 CST 1996