Homework #7 - due in class, Friday, Mar 7, 2008.

This may seem to be a long assignment, but I believe each part will be very quick to answer. There is very little hand calculation required.

1) The data in ratwt.txt are from a study of rat growth on 6 different diets. The 6 diets included all possible combinations of 3 protein sources (beef, cereal and pork) and 2 levels of protein (low or high) in the diet. Rats were individually randomized to a specific diet. There are 10 rats per diet and a total of 60 rats in the study. The response is the weight gain in grams. The data file includes the level, the type, the response and a treatment number (1 through 6, indicating each combination of level and type).
a) Calculate the means for each of the 6 treatments and plot them in a way that illustrates whether or not there is an interaction between protein level and protein type.
b) Calculate the ANOVA table and test the hypotheses of no main effect of protein level, no main effect of protein type, and no interaction. Report the F statistics and p-values.
c) The types of protein source suggest two important contrasts: the difference between beef and pork and the difference between cereal and meat (the average of beef and pork). Use the marginal means to estimate each contrast and its standard error. Test whether each contrast = 0. Report the estimate, s.e. and p-value for each contrast.
d) The investigators are uncomfortable with the marginal means because the interaction p-value is close to 5%. Estimate the value of each contrast from part c separately in the high level of protein and the low level of protein. Report the estimate, s.e. and p-value.
Hint: This is easiest if you start with a 1-way ANOVA using treatment number.
e) Using the information gathered in parts a through d (and anything else you feel is important), summarize the important effects found in this study.
Reminder: I am interested in more than 'significant' or not.
f) Plot residuals from the two-way ANOVA model against predicted values. Is the assumption of equal variance reasonable? Are there any outliers?
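
As a rough illustration of the arithmetic behind part c, the sketch below estimates a contrast of marginal means and its standard error using s.e. = sqrt(MSE * sum(c_i^2 / n_i)). It is written in Python rather than SAS. The means and MSE are made-up placeholders, not results from ratwt.txt. Only the contrast coefficients and the 20 rats per protein type (10 rats x 2 levels) come from the problem description.

```python
import math

def contrast_estimate(coefs, means):
    """Estimate of a contrast: sum of coefficient * mean."""
    return sum(c * m for c, m in zip(coefs, means))

def contrast_se(coefs, ns, mse):
    """Standard error of a contrast: sqrt(MSE * sum(c_i^2 / n_i))."""
    return math.sqrt(mse * sum(c * c / n for c, n in zip(coefs, ns)))

# Hypothetical marginal means in the order beef, cereal, pork (placeholders only).
means = [90.0, 85.0, 88.0]
ns = [20, 20, 20]   # 10 rats x 2 protein levels for each protein type
mse = 200.0         # hypothetical error mean square from the two-way ANOVA

beef_vs_pork = [1.0, 0.0, -1.0]
cereal_vs_meat = [-0.5, 1.0, -0.5]

for label, coefs in [("beef - pork", beef_vs_pork),
                     ("cereal - meat", cereal_vs_meat)]:
    est = contrast_estimate(coefs, means)
    se = contrast_se(coefs, ns, mse)
    # est / se is compared to a t distribution with the error d.f. (60 - 6 = 54 here).
    print(f"{label}: estimate = {est:.2f}, s.e. = {se:.2f}, t = {est / se:.2f}")
```

The same two functions apply to part d if the six cell means (n = 10 each) replace the three marginal means and the coefficients are padded with zeros for the cells that are not involved.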

2) The data in carrot.txt are from a study of carrots. This study compares two carrot stocks (T and H) and four sowing rates (1.5, 2, 2.5, and 3 lb/ac). The response is the marketable yield of carrots. The experiment was conducted in a randomized complete block design. There are three blocks, each with 8 plots. The 8 treatments were randomly assigned to plots within each block.
Note: in case it helps, the data file includes a treatment variable which labels each of the 8 treatments as 1, 2, ... 8.
a) Calculate the ANOVA table and test the hypotheses of no main effect of stock, no main effect of sowing rate, and no interaction.
Hint: You need to combine ideas of experimental design (blocking) and treatment structure. The obvious way is the correct way.
b) Construct an interaction plot, i.e. plot the cell means against the levels of one of the factors. Does the plot indicate an interaction? If the plot and the F test for rate*stock indicate different things, explain why (or how) that can happen.
Hint: think about the size of the s.e. of a cell mean.
c) The investigators are most interested in the yield difference for stock H between sowing at 2 lb/ac (current practice) and 3 lb/ac (proposed new practice). Before they collected the data, the investigators believed that stock H and stock T would respond similarly to sowing rate. Which is more appropriate to answer the investigators' question, the main effect or the simple effect?
d) Using the most appropriate method (from part c), estimate the yield difference between sowing rates of 2 lb/ac and 3 lb/ac. Calculate the s.e. of that difference.

3) Reconsider the range grass fertilization study from two weeks ago. The data are in range.txt. Two weeks ago, you wrote out contrast coefficients for two comparisons:
Between no fertilizer and the average of the four fertilized treatments
and, between the two with PO4 and the two without.
For treatments in the order that SAS uses:
100lbN, 100lbN.P, 50lbN, 50lbN.P, No.fert
the coefficients are:
-0.25 -0.25 -0.25 -0.25 1 and
-0.5 0.5 -0.5 0.5 0
Previously, you estimated the differences. This week:

a) Calculate the contrast SS for each of these two contrasts.
Hint: remember that the CONTRAST command in SAS provides the SS and F test. The syntax is exactly the same as the ESTIMATE command. Remember there are no commas between coefficients, just spaces.
b) Show that the two contrasts in part a are orthogonal. Remember the sample size is 5 for each treatment.
c) There are five treatments, so the treatment SS has 4 d.f. and the treatment SS can be written as the sum of SS from four orthogonal contrasts. Coefficients for two additional contrasts (using the same treatment order as above) are:
-0.5 -0.5 0.5 0.5 0 and
-1 1 1 -1 0
Compute the SS from these two contrasts. Does the sum of the four contrast SS equal the 4 d.f. treatment SS? Should it?
d) Interpret the two additional contrasts in part c by describing, in words, what they are estimating.
e) Imagine that the investigators now want to know whether the type of fertilizer makes any difference. Previously, we have considered three separate parts of this question. Now, we simply want to know whether four of the five treatments have the same mean, i.e. do the 100lbN, 100lbN.P, 50lbN, and 50lbN.P treatments have the same mean? If we only had these four treatments, this would be a one-way ANOVA. However, we want to use data from all five groups because the no fertilizer group still provides information about the pooled error. Using orthogonal contrasts, calculate the SS to test whether the 100lbN, 100lbN.P, 50lbN, and 50lbN.P treatments have the same mean. You don't have to calculate the F statistic.
f) Use the 'lack of fit' approach (difference between an overall SS and a single contrast SS) to calculate the SS for the test of whether the 100lbN, 100lbN.P, 50lbN, and 50lbN.P treatments have the same mean. Is this the same SS as in part e? Should it be the same? Briefly explain why or why not.
g) SAS can calculate multiple d.f. contrasts for you. That is the purpose of the contrast statements in food2.sas that have two pieces separated by commas. Using the orthogonal contrasts from earlier, write out a contrast statement to calculate the 3 d.f. test of equality of four means in part e.
Hint: You only need three contrasts.
h) One advantage of having SAS 'do the addition' is that you are not restricted to orthogonal contrasts. Any set of contrasts that define 'four equal means' is sufficient. SAS takes care of the non-orthogonality for you. One set that is easy to write is: A-B = 0, A-C = 0, and A-D = 0, where A, B, C, and D are the four groups. The following contrast statement will compute the SS for the test of 'four equal means':
contrast 'four equal means' trt 1 -1 0 0 0, trt 1 0 -1 0 0, trt 1 0 0 -1 0;
Again, the above assumes that the groups are in the order:
100lbN, 100lbN.P, 50lbN, 50lbN.P, No.fert
Add that contrast to your code and run it. Do you get the same SS as in parts f and g? If not, figure out (and summarize in your answer) what you need to do to get the same SS.

Hints for all the above parts:
1) My coefficients are for the treatment order. If you use the trtcode variable, the order is different.
2) Remember to include block in your model. It won't change the SS (because this is an RCBD), but it would change the MSerror and hence F tests.
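
For anyone who wants to check the hand arithmetic behind contrast SS, here is a minimal sketch in Python (not SAS). It assumes equal sample sizes, so the SS for a contrast is n * (sum c_i * ybar_i)^2 / sum c_i^2, and two contrasts are orthogonal when sum c_i * d_i = 0. The treatment means are invented placeholders, not values from range.txt. The coefficients and n = 5 are the ones given in this problem.

```python
def contrast_ss(coefs, means, n):
    """Contrast SS with equal sample size n: n * (sum c_i*ybar_i)^2 / sum c_i^2."""
    est = sum(c * m for c, m in zip(coefs, means))
    return n * est ** 2 / sum(c * c for c in coefs)

def are_orthogonal(c1, c2, tol=1e-12):
    """With equal n, two contrasts are orthogonal when sum c1_i * c2_i = 0."""
    return abs(sum(a * b for a, b in zip(c1, c2))) < tol

# Treatment order: 100lbN, 100lbN.P, 50lbN, 50lbN.P, No.fert
contrasts = [
    [-0.25, -0.25, -0.25, -0.25, 1.0],
    [-0.5, 0.5, -0.5, 0.5, 0.0],
    [-0.5, -0.5, 0.5, 0.5, 0.0],
    [-1.0, 1.0, 1.0, -1.0, 0.0],
]
means = [30.0, 34.0, 25.0, 27.0, 18.0]  # hypothetical treatment means
n = 5                                   # observations per treatment

# Every pair of the four contrasts should be orthogonal.
all_orthogonal = all(
    are_orthogonal(contrasts[i], contrasts[j])
    for i in range(len(contrasts))
    for j in range(i + 1, len(contrasts))
)

# 4 d.f. treatment SS computed directly from the means.
grand = sum(means) / len(means)
trt_ss = n * sum((m - grand) ** 2 for m in means)

# Sum of the four single d.f. contrast SS.
total_contrast_ss = sum(contrast_ss(c, means, n) for c in contrasts)

print("all pairs orthogonal:", all_orthogonal)
print("treatment SS:", trt_ss)
print("sum of contrast SS:", total_contrast_ss)
```

With these placeholder means the two totals agree, which is what a complete set of four orthogonal contrasts among five treatments should give.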

4) The data in range2.txt are the same data used in problem 3. The treatments are labeled as they would be for a 2 way factorial. The Nrate variable is the nitrogen rate with 3 levels (0, 50 and 100) and the P variable is the presence (+) or absence (-) of phosphorus.
Use proc glm; (or mixed with all fixed effects) to fit the usual 2 way ANOVA model (main effects and interaction). Use lsmeans to estimate the marginal means for the 2 P levels and the 3 N levels. Your output will have some unusual features. I can see three obvious ones and there may be more. Identify at least two of these unusual features.

5) Reconsider the millet spacing experiment from last week. The data are in millet.txt. There are five spacing treatments (numbered 2, 4, 6, 8, and 10) arranged in a 5 x 5 Latin Square. The treatment numbers are the distance in cm between plants along a row. So the treatment numbered 2 has plants 2cm apart; the treatment numbered 4 has plants 4cm apart, and similarly for the other three treatments.
a) Calculate the means for each spacing treatment and plot the mean yield against spacing distance. Does the relationship between yield and spacing seem close to linear (i.e. close to a straight line)?

Contrasts provide a way to test whether a linear trend is sufficient to summarize the differences between treatment means.
b) The coefficients of the contrast to test for a linear effect of spacing are -2, -1, 0, 1, and 2 (for treatments in the order 2, 4, 6, 8, 10). Beware: the treatments may be sorted in a different order if treatment is a categorical variable, i.e. input with a $.
Calculate the SS for that contrast and test the null hypothesis of no linear effect of spacing. Report the F statistic and p-value.
c) The contrast in part b accounts for one part of the treatment SS. The remaining SS (i.e. the left over part) are large when a linear model is not sufficient to describe the pattern in the means. Here, these left over SS have 3 d.f. If the F statistic for the 'left over' parts is significant, there is evidence that the response to spacing is not linear. Calculate the 'left over' SS and construct the F test. Report the F statistic and an approximate p-value.
Note: Calculating the left over SS is easiest by hand, so you will probably use F tables to get the p-value.
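
The hand calculation in parts b and c can be sketched as follows, in Python rather than SAS. The treatment means and the MSE are hypothetical placeholders, not values from millet.txt. The linear coefficients, n = 5 plots per treatment, and the error d.f. of (5-1)(5-2) = 12 follow from the 5 x 5 Latin Square described above.

```python
def contrast_ss(coefs, means, n):
    """Contrast SS with equal sample size n: n * (sum c_i*ybar_i)^2 / sum c_i^2."""
    est = sum(c * m for c, m in zip(coefs, means))
    return n * est ** 2 / sum(c * c for c in coefs)

# Treatments in the order 2, 4, 6, 8, 10 cm.
linear = [-2.0, -1.0, 0.0, 1.0, 2.0]
means = [50.0, 46.0, 43.0, 41.0, 40.0]  # hypothetical mean yields
n = 5                                   # each treatment appears 5 times in the square
mse = 10.0                              # hypothetical error mean square
error_df = (5 - 1) * (5 - 2)            # 12 error d.f. in a 5 x 5 Latin Square

# 4 d.f. treatment SS from the means.
grand = sum(means) / len(means)
trt_ss = n * sum((m - grand) ** 2 for m in means)

# 1 d.f. linear contrast SS and the 3 d.f. 'left over' SS.
linear_ss = contrast_ss(linear, means, n)
leftover_ss = trt_ss - linear_ss
leftover_df = 3

f_linear = linear_ss / mse                       # compare to F with 1 and 12 d.f.
f_leftover = (leftover_ss / leftover_df) / mse   # compare to F with 3 and 12 d.f.

print(f"linear SS = {linear_ss:.2f}, F = {f_linear:.2f} on 1 and {error_df} d.f.")
print(f"left over SS = {leftover_ss:.2f}, F = {f_leftover:.2f} on {leftover_df} and {error_df} d.f.")
```

The p-values are deliberately left out. As the note above says, an approximate p-value from F tables is enough for the left over test.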