Heike Hofmann
Tenure case
Selected Publications
- Hofmann H.: Multivariate Categorical
Data: Mosaic Plots. (3.5Mb) in Graphics of Large Datasets, Springer,
New York, 2006.
- Hofmann H.: Interactive biplots
for visual modelling. (3.7Mb) in Proceedings in Computational
Statistics 16th Symposium held in Prague, Czech Republic, Antoch,
Jaromir (Ed.), pp. 235–250, 2004.
- Ahn J.S., Cook D., Hofmann H.: A Projection
Pursuit Method on the multidimensional squared Contingency Table
(400k) Computational Statistics, Vol 18 (4), pp. 605–626,
2003.
This paper was originally based on a collaboration between Di and Ju Sun;
I only got involved right after I joined ISU in 2002. I helped
rewrite section 2 and fixed some of the proofs.
- Hofmann H.: Constructing and
reading mosaicplots Computational Statistics and Data Analysis
(250k), Vol 43, No. 4, pp. 565-580, 2003.
- Hofmann H.: Generalised
Odds Ratios for Visual Modelling. (1.6Mb) In Journal of Computational
and Graphical Statistics, 10, pp. 628-640, 2002.
- Unwin A., Hofmann H., Wilhelm A.: Direct
Manipulation Graphics for Data Mining. (380k) International
Journal of Image and Graphics, Vol. 2, No. 1, pp. 49-65, 2002.
The first half of the paper is part of a `real' collaboration:
those ideas stem from intensive discussions of how best to visualize
association rules. It is therefore hard to tell afterwards who first
came up with which idea.
Sections 3.2.2 to the end come directly from my own research.
Books
- Graphics of Large Datasets, with Antony Unwin and Martin Theus,
Springer, New York, 2006.
- Graphical Tools for the Exploration of Multivariate Categorical
Data, ISBN 3831116601, 2001 (in Google Books).
Creative Research
Software
- MANET. MANET is for exploring data, whether
raw data, transformed data or model residuals. MANET provides a
range of graphical tools specially designed for studying multivariate
features. Anyone involved in analysing data will find MANET useful
for gaining insights into the structure and relationships of their
data sets.
- GGobi. GGobi is software for data exploration using highly interactive statistical graphics. My main contribution is a first implementation of area charts, such as barcharts and histograms.
Prizes
Selected Talks
- Presidential Address WNAR 2003: Graphics - an Ace in the Sleeve of a Statistician (3.3 MB PDF)
Statistical Graphics have a long tradition, dating back to the late
1700s when William Playfair primped up his Commercial and
Political Atlas with plots. Success stories attest to the
fact that lives have literally been saved by statistical graphics.
When, for example, cholera struck London, the source was found using
graphics.
Graphical displays give the data analyst a unique framework for
exploration, especially as we understand more about the possibilities
and limits of human visual perception. A statistical framework
underlying graphics helps determine whether what we see is actually
there probabilistically.
Technically, the capabilities of computing systems are very much
improved from twenty years ago. The approaches are very different
too, but the demands grow at the same rate at least, if not faster. Challenges
for modern visualization are ever-increasing data sets of growing
complexity. New sources of data emerge such as in the developing
genomics and proteomics communities. Graphical displays provide
a vehicle for matching experts' knowledge with statistical tools,
and communicating to a wider audience.
Producing good graphics is an art, like a good magician's trick.
Unlike the magician, a good statistician does not want to produce
an illusion but to reveal the hidden qualities of the data. We
will show numerous famous statistical graphics examples - from the
early beginnings of statistical graphics up to modern visualization.
- DIV 2006: Mosaicplots and their Variations
(19.3 MB PDF)
Mosaicplots were introduced by Hartigan and Kleiner (1981)
as a way of visualizing contingency tables. Named for their resemblance
to the art form, mosaicplots represent cells of a contingency table
by a composition of rectangles. Both size and position of these
rectangles are meaningful for the interpretation of mosaicplots,
making them one of the more advanced plots around. With a little
practice they become an invaluable tool in the representation and
exploration of multivariate categorical data. We will be discussing
ways of constructing mosaicplots (Hofmann, 2000). Mosaicplots have
the huge advantage of preserving all information of multivariate
contingency tables while presenting an overview at the same time.
As mosaicplots follow the hierarchy of their counterpart contingency
tables exactly, the order of variables in the tables is crucial.
Finding the “right” or at least “good” ordering
is commonly found to be one of the main difficulties first-time
users experience with mosaicplots. We will discuss effects of changes
in the order and give recommendations on how to obtain “good
plots”. Modelling of multivariate categorical data is usually
done with loglinear models. It can be shown (Hofmann, 2001; Theus
and Lauer, 1999; Friendly, 1992) that mosaicplots have excellent
mathematical properties, which allow visual assessments of the
strength of interaction effects [...] tools for checking residuals and modelling
assumptions. We will discuss relationships between mosaicplots
and loglinear models. Close relatives of the mosaicplot such as
fluctuation diagrams and double decker plots (Hofmann et al., 2000)
have been found very useful in practice. We are going to have a
look into those and other important variations of mosaicplots. All
of these variations are essentially simplifications of the default
construction of mosaics. While losing some information these plots
put additional emphasis on a specific aspect of the data. From a
visualizer’s point of view, both treemaps, introduced by Shneiderman
(1992), and trellis plots (Becker et al., 1994) are generalizations
of two different [...] same structure, trellis plots are more flexible by not
necessarily displaying numbers as rectangles. Treemaps on the other
hand do show the data by rectangles, but are able to deal with more
general partitions than mosaicplots. These generalizations do not
come without losses, though. We will compare mosaicplots to these
other forms of displays in section 4 and comment on strengths and
weaknesses of each of them. Implementations of mosaicplots
are becoming more common. An implementation in R was done by Emerson
(1998). Mosaicplots in JMP (John Sall, 1989) have some limited interactive
features. Fully interactive mosaicplots are implemented e.g. in
MANET (Unwin, Hawkins, Hofmann, and Siegl, 1997), Mondrian (Theus,
2002) and KLIMT (Urbanek, 2002).
Sample lecture notes