This may seem to be a long assignment, but I believe each part will be very quick to answer. There is very little hand calculation required.
1) The data in ratwt.txt are from a
study of rat growth on 6 different diets. The 6 diets included all
possible combinations of 3 protein sources (beef, cereal and pork)
and 2 levels of protein (low or high) in the diet. Rats were
individually randomized to a specific diet. There are 10 rats per
diet and a total of 60 rats in the study. The response is the
weight gain in grams. The data file includes the level, the type,
the response and a treatment number (1 through 6, indicating each
combination of level and type).
a) Calculate the means for each of the 6 treatments and plot them in
a way that illustrates whether or not there is an interaction
between protein level and protein type.
b) Calculate the ANOVA table and test the hypotheses of no main
effect of protein level, no main effect of protein type, and no
interaction. Report the F statistics and p-values.
c) The types of protein source suggest two important contrasts:
the difference between beef and pork and the difference between
cereal and meat (the average of beef and pork). Use the marginal
means to estimate each contrast and its
standard error. Test whether each contrast = 0. Report the
estimate, s.e. and p-value for each contrast.
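For reference, the general form of a contrast estimate and its standard error can be written as below. This is only a sketch of the standard formulas, not a worked answer: c_i are the contrast coefficients, \bar{y}_i are the means being compared, n_i is the number of observations averaged into mean i, and MS_error and its d.f. are assumed to come from whichever ANOVA model you fit.

```latex
\hat{\gamma} = \sum_i c_i \,\bar{y}_i ,
\qquad
\mathrm{s.e.}(\hat{\gamma}) = \sqrt{\mathrm{MS}_{\mathrm{error}} \sum_i \frac{c_i^{2}}{n_i}} ,
\qquad
t = \frac{\hat{\gamma}}{\mathrm{s.e.}(\hat{\gamma})}
\ \text{ with the error d.f.}
```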
d) The investigators are uncomfortable with the marginal means
because the interaction p-value is close to 5%. Estimate the value
of each contrast from part c separately in the high level of protein
and the low level of protein. Report the estimate, s.e. and
p-value.
Hint: This is easiest if you start with a 1-way ANOVA using
treatment number.
e) Using the information gathered in parts a through d (and anything else
you feel is important), summarize the important effects found in
this study.
Reminder: I am interested in more than 'significant' or not.
f) Plot residuals from the two-way ANOVA model against predicted
values. Is the assumption of equal variance reasonable? Are there
any outliers?
2) The data
in carrot.txt are from a study of
carrots. This study compares two carrot stocks (T and H) and four
sowing rates (1.5, 2, 2.5, and 3 lb/ac). The response is the
marketable yield of carrots. The experiment was conducted in a
randomized complete block design. There are three blocks, each with
8 plots. The 8 treatments were randomly assigned to plots within
each block.
Note: in case it helps, the data file includes a treatment variable
which labels each of the 8 treatments as 1, 2, ... 8.
a) Calculate the ANOVA table and test the hypotheses of no main
effect of stock, no main effect of sowing rate, and no interaction.
Hint: You need to combine ideas of experimental design (blocking)
and treatment structure. The obvious way is the correct way.
b) Construct an interaction plot, i.e. plot the cell means against
the levels of one of the factors. Does the plot indicate an
interaction? If the plot and the F test for rate*stock indicate
different things,
explain why (or how) that can happen.
Hint: think about the size of the s.e. of a cell mean.
c) The investigators are most interested in the yield difference for
stock H between sowing at 2 lb/ac (current practice) and 3 lb/ac
(proposed new practice). Before they collected the data, the
investigators believed that stock H and
stock T would respond similarly to sowing rate. Which is more
appropriate to answer the investigators' question, the main effect or
the simple effect?
d) Using the most appropriate method (from part c), estimate the
yield difference between sowing rates of 2 lb/ac and 3 lb/ac.
Calculate the s.e. of that difference.
3) Reconsider the range grass fertilization study from two weeks ago.
The data are in range.txt.
Two weeks ago, you wrote out contrast coefficients for two comparisons:
Between no fertilizer and the average of the four fertilized treatments
and, between the two with PO4 and the two without.
For treatments in the order that SAS uses:
100lbN, 100lbN.P, 50lbN, 50lbN.P, No.fert
the coefficients are -0.25 -0.25 -0.25 -0.25 1 and -0.5 0.5 -0.5 0.5 0
Previously, you
estimated the differences. This week:
a) Calculate the contrast SS for each of these two contrasts.
Hint: remember that the CONTRAST command in SAS provides the SS and F test.
The syntax is exactly the same as the ESTIMATE command. Remember there are no
commas between coefficients, just spaces.
b) Show that the two contrasts in part a are orthogonal.
Remember the sample size is 5 for
each treatment.
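As a reminder of what orthogonality means numerically, here is a minimal Python sketch (not SAS, and not the assignment's contrasts). It assumes the usual definition: two contrasts are orthogonal when the sum of c1_i * c2_i / n_i is zero. The function name and the three-group coefficients are hypothetical, chosen only for illustration.

```python
def contrasts_orthogonal(c1, c2, n, tol=1e-12):
    """Return True if sum(c1_i * c2_i / n_i) is zero (within tol)."""
    if not (len(c1) == len(c2) == len(n)):
        raise ValueError("coefficients and sample sizes must have equal length")
    return abs(sum(a * b / m for a, b, m in zip(c1, c2, n))) < tol


# Hypothetical three-group example with 5 observations per group.
n = [5, 5, 5]
first = [1, -0.5, -0.5]   # group 1 vs. average of groups 2 and 3
second = [0, 1, -1]       # group 2 vs. group 3
third = [1, -1, 0]        # group 1 vs. group 2

print(contrasts_orthogonal(first, second, n))
print(contrasts_orthogonal(first, third, n))
```

With equal sample sizes the 1/n_i factor is constant, so the check reduces to whether the products of the coefficients sum to zero.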
c) There are five treatments, so the treatment SS has 4 d.f. and the
treatment SS can be written as the sum of SS from four orthogonal contrasts.
Coefficients for two additional contrasts (using the same treatment order as above)
are:
-0.5 -0.5 0.5 0.5 0 and
-1 1 1 -1 0
Compute the SS from these two contrasts. Does the sum of the four contrast SS
equal the 4 d.f. treatment SS? Should it?
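The idea behind this question can be sketched in Python with made-up numbers. The sketch assumes the standard formula SS = (sum of c_i * mean_i)^2 / sum(c_i^2 / n_i) for a single contrast; the function names, the three group means, and the coefficients are all hypothetical and are not taken from range.txt.

```python
import math


def contrast_ss(coefs, means, n):
    """Sum of squares for one contrast among group means."""
    estimate = sum(c * m for c, m in zip(coefs, means))
    return estimate ** 2 / sum(c * c / k for c, k in zip(coefs, n))


def treatment_ss(means, n):
    """Between-group sum of squares computed from group means and sizes."""
    grand = sum(k * m for m, k in zip(means, n)) / sum(n)
    return sum(k * (m - grand) ** 2 for m, k in zip(means, n))


# Hypothetical data: 3 groups (so 2 d.f. for treatments), 5 observations each.
means = [10.0, 14.0, 21.0]
n = [5, 5, 5]
c1 = [1, -0.5, -0.5]   # orthogonal to c2
c2 = [0, 1, -1]
c3 = [1, -1, 0]        # NOT orthogonal to c1

total = treatment_ss(means, n)
orthogonal_sum = contrast_ss(c1, means, n) + contrast_ss(c2, means, n)
non_orthogonal_sum = contrast_ss(c1, means, n) + contrast_ss(c3, means, n)

print(math.isclose(orthogonal_sum, total))       # orthogonal set
print(math.isclose(non_orthogonal_sum, total))   # non-orthogonal set
```

In this toy example the two orthogonal contrasts recover the full treatment SS, while swapping in a non-orthogonal contrast does not.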
d) Interpret the two additional contrasts in part c by describing,
in words, what they are estimating.
e) Imagine that the investigators now want to know whether the type
of fertilizer makes any difference. Previously, we have considered
three separate parts of this question. Now, we simply want to know
whether four of the five treatments have the same mean, i.e. do the
100lbN, 100lbN.P, 50lbN, and 50lbN.P treatments have the same mean? If we
only had these four treatments, this would be a one-way ANOVA.
However, we want to use data from all five groups because the no
fertilizer group still provides information about the pooled error.
Using orthogonal contrasts, calculate the SS to test whether the
100lbN, 100lbN.P, 50lbN, and 50lbN.P treatments have the same mean. You
don't have to calculate the F statistic.
f) Use the 'lack of fit' approach (difference between an overall SS
and a single contrast SS) to calculate the SS for the test of
whether the
100lbN, 100lbN.P, 50lbN, and 50lbN.P treatments have the same mean. Is
this the same SS as in part e? Should it be the same? Briefly explain why
or why not.
g) SAS can calculate multiple d.f. contrasts for you. That is the
purpose of the contrast statements in food2.sas that have two pieces
separated by commas. Using the orthogonal contrasts from earlier,
write out a contrast statement to calculate
the 3 d.f. test of equality of four means in part e.
Hint: You only need three contrasts.
h) One advantage of having SAS 'do the addition' is that you are not
restricted to orthogonal contrasts. Any set of contrasts that
define 'four equal means' is sufficient. SAS takes care of the
non-orthogonality for you. One set that is easy to write is: A-B =
0, A-C = 0, and A-D = 0, where A, B, C, and D are the four groups.
The following contrast statement will compute the SS for the test of
'four equal means':
contrast 'four equal means' trt 1 -1 0 0 0,
                            trt 1 0 -1 0 0,
                            trt 1 0 0 -1 0;
Again, the above assumes that the groups are in the order:
100lbN, 100lbN.P, 50lbN, 50lbN.P, No.fert
Add that contrast to your code and run it. Do you get the same SS
as in parts f and g? If not, figure out (and summarize in your
answer) what you need to do to get the same SS.
Hints for all the above parts:
1) My coefficients are for the treatment
order. If you use the trtcode variable, the order is different.
2) Remember to include block in your model. It won't change the SS
(because this is an RCBD), but it would change the MSerror and hence
F tests.
4) The data in range2.txt
are the same data used in problem 3. The treatments are labeled as
they would be for a 2 way factorial. The Nrate variable is the
nitrogen rate with 3 levels (0, 50 and 100) and the P variable is
the presence (+) or absence (-) of phosphorus.
Use proc glm; (or mixed with all fixed effects) to fit the usual
2 way ANOVA model (main effects and interaction). Use lsmeans to
estimate the marginal means for the 2 P levels and the 3 N levels.
Your output will have some unusual features. I can see three
obvious ones and there may be more. Identify at least two of these
unusual features.
5) Reconsider the millet spacing experiment from
last week. The data are in
millet.txt
There are five spacing treatments (numbered 2, 4, 6, 8, and 10) arranged in a
5 x 5 Latin Square. The treatment numbers are the distance in cm
between plants along a row. So the treatment numbered 2 has plants
2cm apart; the treatment numbered 4 has plants 4cm apart, and
similarly for the other three treatments.
a) Calculate the means for each spacing treatment and plot the mean
yield against spacing distance. Does the relationship between yield
and spacing seem close to linear (i.e. close to a straight line)?
Contrasts provide a way to test whether a linear
trend is sufficient to summarize the differences between treatment
means.
b) The coefficients of the contrast to test for a linear effect of
spacing are -2, -1, 0, 1, and 2 (for treatments in the order 2, 4,
6, 8, 10). Beware: the treatments may be sorted in a different order if treatment
is read as a categorical variable (i.e. input with a $).
Calculate the SS for that contrast and test the null
hypothesis of no linear effect of spacing. Report the F statistic
and p-value.
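The ordering warning in part b can be illustrated with a short Python sketch. It assumes that character-valued treatment labels are sorted as text, as in Python's string sort, so that '10' comes before '2'; check the class level information in your own output rather than relying on this. The variable names are hypothetical.

```python
# Spacing treatments and the linear contrast coefficients in numeric order.
spacings = [2, 4, 6, 8, 10]
linear = [-2, -1, 0, 1, 2]

# Attach each coefficient to its treatment label, stored as text.
coef_for = dict(zip((str(s) for s in spacings), linear))

# Text sorting compares character by character, so '10' sorts before '2'.
char_order = sorted(coef_for)

# Coefficients rearranged to match the text-sorted treatment order.
reordered = [coef_for[label] for label in char_order]

print(char_order)
print(reordered)
```

If the levels are sorted this way, the coefficients have to be supplied in the rearranged order; entering -2 -1 0 1 2 would then no longer describe a linear trend in spacing.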
c) The contrast in part b accounts for one part of
the treatment SS. The remaining SS (i.e. the left over part)
are large when a linear model is not sufficient to describe the
pattern in the means. Here, these left over SS
have 3 d.f. If the F statistic for the 'left over' parts is
significant, there is evidence that the response to spacing is not
linear. Calculate the 'left over' SS and construct the F test.
Report the F statistic and an approximate p-value.
Note: Calculating the left over SS is easiest by hand,
so you will probably use F tables to get the p-value.
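The hand calculation described in part c can be summarized as below. This is a sketch of the usual 'lack of fit' form: SS_trt is the 4 d.f. treatment SS, SS_linear is the contrast SS from part b, and MS_error and its d.f. (written here as \nu_error) are assumed to come from your Latin Square ANOVA table.

```latex
\mathrm{SS}_{\text{left over}} = \mathrm{SS}_{\mathrm{trt}} - \mathrm{SS}_{\mathrm{linear}},
\qquad
F = \frac{\mathrm{SS}_{\text{left over}} / 3}{\mathrm{MS}_{\mathrm{error}}}
\ \sim\ F_{3,\ \nu_{\mathrm{error}}}
\ \text{ under the null hypothesis of a linear trend.}
```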