Data sets used in the text are all on the CD enclosed with the book.
You need to look at the README file for instructions on copying and
uncompressing the files.
Data sets that have been discussed in class or are used in homework
are:
Data discussed in Aug 27 lecture.
Each column is one set of data. All four sets have the
same mean and standard deviation but differ in other characteristics.
Data for HW 1, problem 26. Zinc in
rats. Variables are: Group (A or B) and zinc level (mg/ml).
Tomato fertilizer data discussed in
class and lab, Aug 30. Each row is
data for a single plant. Variables are fertilizer (a: standard, b:
improved), and yield (lbs).
Transgenic mice weight data. Data
for HW 2, problem 1.
Weights of transgenic and non-transgenic mice. Each row is a single
mouse. Variables are tg (0: non-transgenic, 1: transgenic) and
weight (in grams).
Radon concentrations in Ramsey Co,
MN.
Data for HW 2, problem 2. Each row is data for a single house. The
value is the radon concentration (units are picoCuries/liter of air, pCi/l).
Microsatellite counts for mutagen
study. Data for the problem on HW 3. The two columns are the dose of mutagen (0 or 80) and the
count of microsatellite nuclei in 100 cells.
Bumpus weight data. Each line of data has two variables: weight (grams) and a code that
indicates whether that sparrow lived or died. Sparrows that survived have
a code of 1; sparrows that perished have a code of 2.
Bumpus sparrow humerus length data. The Bumpus sparrow data are from Case Study 2.1 (p 28). Each row
is a sparrow. The first value is the humerus length (in thousandths of an inch);
the second value is a code: 1 = perished, 2 = survived.
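The survival codes as described run in opposite directions in the two Bumpus files (weight file: 1 = survived; humerus file: 1 = perished), so it may help to translate codes to labels before comparing results. The sketch below is only an illustration in Python; the names WEIGHT_CODES, HUMERUS_CODES and survived are made up for this example and are not part of the course files.

```python
# Code-to-label maps, copied from the two descriptions above.
WEIGHT_CODES = {1: "survived", 2: "perished"}    # Bumpus weight data
HUMERUS_CODES = {1: "perished", 2: "survived"}   # Bumpus humerus length data


def survived(code, code_map):
    """Return True if `code` means the sparrow survived under `code_map`."""
    return code_map[code] == "survived"


# The same numeric code has opposite meanings in the two files.
print(survived(1, WEIGHT_CODES), survived(1, HUMERUS_CODES))
```

Checking the code map each time a file is read is a cheap guard against reversing the two groups.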
Change in blood pressure for men who
received 4 weeks of a fish oil diet
Schizophrenia data (Case Study 2.2). Each line of data
corresponds to a pair of maternal twins. The first number is the
hippocampus volume for the unaffected individual. The second number
is the hippocampus volume for the individual affected by
schizophrenia.
Bioremediation data. This is a subset of data from an experiment
evaluating ways to grow crops in areas contaminated by radioactivity
(e.g. Chernobyl fallout). The two treatments here are HiK: a
fertilizer with large amounts of potassium and Bio: a plastic barrier
to stop roots penetrating too deeply into the soil. Each line
corresponds to an experimental plot. The first variable is the
treatment; the second is the level of contamination (pCi/gm) in
collards grown in that plot.
Darwin cross/self fertilization data. In the Darwin data, each row of the data set represents a pair of plants.
The first number is some extra information not needed for this problem.
The second is the height (in inches) of the cross-fertilized plant. The
third is the height of the self-fertilized plant.
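As a rough sketch of how this three-field layout might be handled, the Python snippet below skips the first field and computes the cross-minus-self height difference for each pair, which is what a paired analysis would start from. The whitespace delimiter, the function name pair_differences, and the sample rows are assumptions made for illustration; the rows are made up and are not the real Darwin values.

```python
def pair_differences(lines):
    """Return cross-minus-self height differences (inches), one per pair."""
    diffs = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or incomplete lines
        # fields[0] is the extra information that is not needed here
        cross = float(fields[1])      # cross-fertilized plant height
        self_fert = float(fields[2])  # self-fertilized plant height
        diffs.append(cross - self_fert)
    return diffs


# Made-up rows in the assumed layout, NOT the real Darwin data.
sample = ["1 20.0 18.5", "2 15.0 16.0"]
print(pair_differences(sample))
```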
Bumpus sparrow humerus length data. The first column is the humerus
length in thousandths of an inch. The second column is whether or
not the sparrow survived: 1 = died, 2 = survived.
Bee data for chapter 3, problem
28. First column is proportion of pollen removed, 2nd column is
duration of visit (in seconds) and 3rd column is type of bee: 1 = bumblebee, 2 =
honeybee workers.
Iron supplementation data for
chapter 3, problem 31. First column is the percent retention of
iron. The second column is the form of the iron: 1 = Fe3, 2 = Fe4.
Rainfall data from Case study 3.1. Used in transf.sas (lab on 27 Sept).
The first column is the rainfall, the second is a treatment code: 1
= unseeded day, 2 = seeded day.
Diet and longevity data set (Case study 5.1), for diet.sas in lab on 11 Oct 2004.
Tyrannosaurus data. Oxygen
isotope data on bones from a single T. rex for problem 5:23. Column 1 is the oxygen
isotope value; column 2 is the bone number.
Fatty acid data for
problem 5:18. The first column is the protein level, the second
column is the treatment (numbered 1 to 6 where 1 is CPFA 50, 2 is
CPFA 150, and so on through 6 is Control), the third column is the
day (1 to 5) and the last column is 'group'. Group has 10 unique
values, one for each unique combination of treatment and day. So,
the first five treatments are each one single group. Then, each day
of the control is a unique group.
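The count of 10 groups follows from 5 non-control treatments (one group each) plus 5 control days (one group each). A minimal Python sketch of that rule is below. The numbering it produces (1 to 5 for the treatments, 6 to 10 for the control days) and the name group_code are assumptions for illustration only; the actual group values stored in the file may be labeled differently.

```python
CONTROL = 6  # treatment code for Control, per the description above


def group_code(treatment, day):
    """Hypothetical numbering: 1-5 for treatments 1-5, 6-10 for control days 1-5."""
    if treatment != CONTROL:
        return treatment       # all five days of a treatment share one group
    return CONTROL + day - 1   # each control day is its own group


# Every combination of treatment (1 to 6) and day (1 to 5).
codes = {group_code(t, d) for t in range(1, 7) for d in range(1, 6)}
print(len(codes))  # number of distinct groups
```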
Handicap data set (Case study 6.1) for HW 7. The two columns are the
score of perceived qualifications and a code for the handicap. The
codes are: 1 = None, 2 = Amputee, 3 = Crutches, 4 = Hearing, and 5 = Wheelchair.
Peanut and aflatoxin concentration data. The first column is the
percent clean peanuts. The second is the aflatoxin concentration
(ppb).
Meat pH data set (Case study 6.2), for meat.sas in lab on 25 Oct
2004.
Planet data for HW 8.
Problem 7.14, data from display 1.15. Three columns: planet name,
order from sun, and distance from sun.
Pollen data, queens only for
HW 8, problem 7.17. This is the subset of bee.txt with only queen
data in it.
First column is proportion of pollen removed, 2nd column is
duration of visit (in seconds) and 3rd column is type of bee: 1 =
bumblebee queen, 2 = honeybee workers.
Music / brain activity data for
HW 8. First column is the number of years the subject has played a
string instrument. The second is the neuronal activity index, a
measure of brain activity.
Eruption.txt. Wait time
between eruptions of the Old Faithful geyser in Yellowstone.
Columns are date (ignore for hw 9 problem), interval between
eruptions (in minutes), and duration of the interval (in minutes).
Full meat pH data set, Meat.txt from case study 6.2 with 2
additional observations at 24 hr.
wine.txt. Data on wine consumption
(liter/person/yr) and ischemic heart disease deaths (deaths/1000
people) for 18 industrialized countries. First column is the
country, second is the wine consumption and third is the mortality.
Anscombe data sets used in handout
that illustrated the need for regression diagnostics.
Brain weight data set (case study 9.2), for brain.sas in lab on
15 Nov 2004.
Flowering time data (Case study 9.1). Uses E or L to mark early
or late groups.
Flowering time data (Case study 9.1). Uses 1 or 2 to mark early
or late groups.
SAT score data (Case study 12.1). Note: Alaska is omitted.
Modified Anscombe data sets used to
illustrate Cook's distance. A small amount of random jitter is
added to all X values. Used in anscombe2.sas.
Corn data set. Used in corn.sas to
illustrate polynomial regression.
Collard data set. Used in
collard.sas.
Pygmalion experiment from
textbook. Used in pygmalion.sas.
Zinc and copper data for homework
problem 13.16.
Iridium data for HW problem 13.17.
Donner party data from
textbook (Case study 20.1). Used in donner.sas.
Excel worksheet with Donner party data from
textbook (Case study 20.1). Used in readexcel.sas.