Hw 5 Solutions

1a)

proc glm;
  class trt;
  model weight=trt;
  output out=two residual=ehat predicted=yhat;
run;

The GLM Procedure
Dependent Variable: weight

                               Sum of
Source             DF         Squares    Mean Square   F Value   Pr > F
Model               1      115.200000     115.200000      2.15   0.1596
Error              18      963.600000      53.533333
Corrected Total    19     1078.800000

These results suggest that there is no treatment effect (F = 2.15, p-value = 0.1596).

1b)
i) Independence – This assumption is not satisfied. Within each pot are two plants. It is usually not safe to assume that two plants within a single pot are independent. Treatments were assigned to pots rather than plants, so pots are the experimental unit. When there are multiple observations corresponding to any one experimental unit, it is usually not appropriate to assume that the observations are independent.
ii) Constant variance – The assumption is roughly satisfied. The variation of residuals in the vertical direction is roughly the same for both treatments.
iii) Normal distribution of errors – This also is roughly satisfied. The residuals have a pattern that is rather uniform, but there are no outliers and most points lie along the line in the normal probability plot.

1c) The experimental unit is the pot.

1d) The observational unit is the plant.

1e) This problem asks you to compute one mean for each experimental unit (pot) and conduct an analysis of those means as if they were the original data. You could have easily computed the means by hand and typed them into SAS. It is also possible to get SAS to compute the means for you using the following code.

proc means data=one;
  var weight;
  by trt pot;
  output out=two mean=mean;
run;

proc print;
run;

Obs   trt   pot   _TYPE_   _FREQ_   mean
  1     1     1        0        2   15.0
  2     1     2        0        2   15.5
  3     1     3        0        2   15.0
  4     1     4        0        2   15.5
  5     1     5        0        2   15.0
  6     2     6        0        2   20.0
  7     2     7        0        2   20.0
  8     2     8        0        2   20.5
  9     2     9        0        2   20.0
 10     2    10        0        2   19.5

proc glm data=two;
  class trt;
  model mean=trt;
run;

The GLM Procedure
Dependent Variable: mean

                               Sum of
Source             DF         Squares    Mean Square   F Value   Pr > F
Model               1     57.60000000    57.60000000    576.00   <.0001
Error               8      0.80000000     0.10000000
Corrected Total     9     58.40000000

F = 576 with 1 and 8 df, p-value < 0.0001. We can conclude that the plants treated with the placebo had significantly greater weights than the plants treated with the fungal pathogen.

1f) The errors in the first analysis were negatively correlated; thus the variance is overestimated in the first analysis. We don't see a significant treatment effect initially because, although the true variation among experimental units treated alike is quite low, it seems high because of the negative correlation among plants within pots. The negative correlation makes it seem like there is more variation in response to treatment than there really is. Note that the negative correlation can be observed by examining the residual plot with the points labeled by pot number. The higher the residual corresponding to one plant in a pot, the lower the residual for the other plant in the same pot. The better one plant does, the worse its partner does. This can sometimes occur in field experiments where fast-growing varieties shade neighboring varieties.

2a)
i) Independence – We have no information about this in the problem, so we will assume independence holds.
ii) Constant variance – This is violated. The variation of the residuals increases with dose. This suggests that the variation of the error terms is not constant.
iii) Normality – This also is violated somewhat. There are outliers present in the lower tail.

2b) The best transformation is the cube root.
The log transformation overcorrects, leaving the low-dose observations more variable than the high-dose observations. The square root is better than the log, but it still looks like there is more variability at high doses than at low doses. The cube root transformation is a good compromise between square root and log. The SAS code is below:

proc glm data=one;
  class dose;
  model count=dose;
  output out=two residual=ehat predicted=yhat;
run;
proc univariate plot data=two;
  var ehat;
run;
proc plot data=two;
  plot ehat*yhat;
run;

data three;
  set one;
  logct=log(count);
run;
proc glm data=three;
  class dose;
  model logct=dose;
  output out=four residual=ehatlog predicted=yhatlog;
run;
proc univariate plot data=four;
  var ehatlog;
run;
proc plot data=four;
  plot ehatlog*yhatlog;
run;

data five;
  set one;
  sqrtct=count**(1/2);
run;
proc glm data=five;
  class dose;
  model sqrtct=dose;
  output out=six residual=ehatsqrt predicted=yhatsqrt;
run;
proc univariate plot data=six;
  var ehatsqrt;
run;
proc plot data=six;
  plot ehatsqrt*yhatsqrt;
run;

data seven;
  set one;
  cubertct=count**(1/3);
run;
proc glm data=seven;
  class dose;
  model cubertct=dose;
  output out=eight residual=ehatcubert predicted=yhatcubert;
run;
proc univariate plot data=eight;
  var ehatcubert;
run;
proc plot data=eight;
  plot ehatcubert*yhatcubert;
run;
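
A note on why the cube root in 2b falls between the square root and the log: all three can be viewed as members of the power (Box-Cox) family, indexed by an exponent lambda. The sketch below is not part of the assigned solution. The symbols g, lambda, mu, c and k are introduced here for illustration only, and the power-law relationship between the standard deviation and the mean is an assumption, not something stated in the problem. The display assumes the amsmath package.

```latex
% Power (Box-Cox) family; \lambda indexes the strength of the transformation.
\[
g_\lambda(y) =
\begin{cases}
  \dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0, \\[1ex]
  \log y, & \lambda = 0,
\end{cases}
\qquad
\lim_{\lambda \to 0} \frac{y^{\lambda} - 1}{\lambda} = \log y .
\]

% Transformations tried in 2b, from weakest to strongest:
%   \lambda = 1 (raw count), 1/2 (square root), 1/3 (cube root), 0 (log).

% First-order (delta-method) approximation for \lambda > 0,
% ASSUMING SD(Y) \approx c\,\mu^{k} for some constants c and k:
\[
\operatorname{SD}\!\left(Y^{\lambda}\right)
  \approx \lambda\,\mu^{\lambda - 1}\,\operatorname{SD}(Y)
  \approx c\,\lambda\,\mu^{\lambda - 1 + k},
\qquad
\text{which is constant in } \mu \text{ when } \lambda = 1 - k .
\]
```

Because g_lambda(y) is a linear function of y^lambda when lambda is positive, analyzing count**(1/2) or count**(1/3) directly, as the SAS code does, gives the same F test and the same residual pattern up to a change of scale. Under the assumed power law, the square root would stabilize the variance when k is near 1/2, the cube root when k is near 2/3, and the log when k is near 1. The residual plots described in 2b, where the square root undercorrects and the log overcorrects, are consistent with k lying between 1/2 and 1.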