HW4: You have to use R to complete
this HW – copy paste your R commands and plots-and add your notes if necessary
for each problem:
Solution:
> cerealdata<-read.table("cereals.txt",header=TRUE)
> sod<-cerealdata[,7]
> sod
[1] 130 15 260 140 200 180 125 210 200 210 220 290 210 140 180 280 290 90 180
[20] 140 80 220 140 190 125 200 0 160 240 135 45 280 140 170 75 220 250 180
[39] 170 170 260 150 180 0 95 150 150 220 190 220 170 170 200 320 0 0 135
[58] 0 210 140 0 240 290 0 0 0 70 230 15 200 190 200 250 140 230 200
[77] 200
> hist(sod,n=20,prob=T)

lines(density(sod,bw=10))

> mean(sod)
[1] 159.6753
> sd(sod)
[1] 83.8323
> quantile(sod,0.8,type=1)
80%
220
Solution:
> cal_kel=cerealdata[,4][cerealdata[,2]=="K"]
> cal_nab=cerealdata[,4][cerealdata[,2]=="N"]
> summary(cal_kel)
Min. 1st Qu. Median Mean 3rd Qu. Max.
50.0 100.0 110.0 108.7 115.0 160.0
> summary(cal_nab)
Min. 1st Qu. Median Mean 3rd Qu. Max.
70.00 82.50 90.00 86.67 90.00 100.00
> boxplot(cal_kel,cal_nab)

Overall calorie content for Nabisco cereals is much less than that of the Kellog brand.
Solution:
> b=qt(0.67,7)
> b
[1] 0.4592146
> y<-rt(1000,7)
> prob=length(y[y<=b])/length(y)
> prob
[1] 0.663
Solution:
> ceodata<-read.table("ceo.txt",header=TRUE)
> plot(ceodata[,2],ceodata[,1])

The plot does not suggest strong association.
cor(ceodata[,2],ceodata[,1])
[1] 0.1275554
The correlation coefficient also supports the guess.
> qqnorm(ceodata[,1])
> qqline(ceodata[,1])

Yes, it looks like it is normally distributed.
> x<-ceodata[,1]
> confidenceint=c(mean(x)-qnorm(0.975,0,1)*sd(x)/sqrt(length(x)),(mean(x)+qnorm(0.975,0,1)*sd(x)/sqrt(length(x))))
> confidenceint
[1] 49.25111 53.83364
> length(x)
[1] 59
This confidence interval is
calculated using the
Solution:
> tree<-read.table("darwin.txt",header=TRUE)
> tree
cross self
1 23.500 17.375
2 12.000 20.375
3 21.000 20.000
4 22.000 20.000
5 19.125 18.375
6 21.500 18.625
7 22.125 18.625
8 20.375 15.250
9 18.250 16.500
10 21.625 18.000
11 23.250 16.250
12 21.000 18.000
13 22.125 12.750
14 23.000 15.500
15 12.000 18.000
> dim(tree)
[1] 15 2
> z<-tree[,1]-tree[,2]
> z
[1] 6.125 -8.375 1.000 2.000 0.750 2.875 3.500 5.125 1.750 3.625
[11] 7.000 3.000 9.375 7.500 -6.000
> t=mean(z)/(sd(z)/sqrt(length(z)))
> t
[1] 2.147987
> t>=qt(0.95,length(z)-1)
[1] TRUE
>
> qt(0.95,length(z)-1)
[1] 1.76131
> pval=1-pt(t,length(z)-1)
> pval
[1] 0.02485147
The paired-t test gives the t-statistic value 2.14789 and at 5% level the cut off is 1.76. So we reject H0(that mean difference = 0) and accept H1(that mean difference is >0). So at 5% level, the claim that growth in cross-fert plants is more is accepted ( a p-value of [1] 0.02485147 strongly supports this claim). At 1% level however, we will accept H0.