HW4: Use R to complete this homework. For each problem, copy and paste your R commands and plots, and add notes where necessary.

 

  1. For the cereals data (story behind the data), make a histogram of the sodium content of all the cereals measured (use number of bins = 20 and prob=T). Overlay a smoothed curve on the histogram showing what the true density of this random variable might be (use bw=10). Calculate the mean, standard deviation, and 80th percentile of this variable.
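A minimal sketch of the calls involved, not a worked answer. The cereals file is not reproduced here, so the vector `sodium` below is simulated placeholder data (an assumption); replace it with the sodium column of the real data set, whose name may differ.

```r
# Placeholder data: simulated sodium values, NOT the real cereals data.
set.seed(1)
sodium <- round(pmax(rnorm(50, mean = 160, sd = 80), 0))

# Histogram on the density scale. breaks = 20 is only a suggestion:
# R may adjust it to get "pretty" cut points.
hist(sodium, breaks = 20, prob = TRUE,
     main = "Sodium content", xlab = "Sodium")

# Kernel density estimate with bandwidth 10, drawn over the histogram.
lines(density(sodium, bw = 10))

# The three summary numbers requested.
sodium_mean <- mean(sodium)
sodium_sd   <- sd(sodium)
sodium_p80  <- quantile(sodium, 0.80)
print(c(mean = sodium_mean, sd = sodium_sd, sodium_p80))
```

Because `prob = TRUE` puts the histogram on the density scale, the curve from `density()` can be overlaid directly with `lines()`.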

 

  2. For the cereals data (story behind the data), compare the calorie content of the cereals manufactured by Kellogg's and Nabisco. Give summary statistics (min, max, quartiles) for each manufacturer and draw both boxplots in a single diagram (two boxplots side by side are useful for comparison). How do the two manufacturers compare?
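One possible way to set up the comparison, sketched on placeholder data. The data frame `cereals`, its column names `mfr` and `calories`, and the manufacturer labels are all hypothetical; the real data set may code manufacturers differently.

```r
# Placeholder data frame: hypothetical column names, simulated values.
set.seed(2)
cereals <- data.frame(
  mfr      = rep(c("Kelloggs", "Nabisco", "Other"), times = c(20, 6, 10)),
  calories = round(rnorm(36, mean = 105, sd = 15))
)

# Keep only the two manufacturers being compared and drop unused levels.
two <- subset(cereals, mfr %in% c("Kelloggs", "Nabisco"))
two$mfr <- factor(two$mfr)

# min, quartiles, mean and max of calories, one summary per manufacturer.
calorie_summary <- tapply(two$calories, two$mfr, summary)
print(calorie_summary)

# Both boxplots in one diagram, on a common calorie axis.
boxplot(calories ~ mfr, data = two, ylab = "Calories")
```

The formula interface `calories ~ mfr` draws one box per manufacturer on a shared axis, which makes differences in center and spread easy to read off.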

 

  3. Find the 67th percentile of the t distribution with d.f. = 7 (call this number b). Generate 1000 samples from the t(df=7) distribution. Use these 1000 samples to estimate the probability of getting a value smaller than b.
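The steps in this problem map onto base R functions roughly as follows. This is a sketch; the seed is an arbitrary choice added only so the run is reproducible.

```r
# 67th percentile of the t distribution with 7 degrees of freedom.
b <- qt(0.67, df = 7)

# 1000 random draws from t(df = 7); the seed value is arbitrary.
set.seed(3)
x <- rt(1000, df = 7)

# Monte Carlo estimate of P(T < b): the fraction of draws below b.
p_hat <- mean(x < b)
print(c(b = b, p_hat = p_hat))
```

Since b is defined as the 67th percentile, the estimate should land near 0.67, with sampling error of about sqrt(0.67 * 0.33 / 1000), roughly 0.015.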

 

  4. For the CEO data (story behind the data), plot ages and salaries in a scatter plot. Does it show a strong association between the variables? Justify your answer by calculating the correlation coefficient. Make a normal Q-Q plot of the CEOs' ages. Do the ages look normally distributed? Find a 95% confidence interval for the mean age of CEOs.
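A rough outline of the four pieces of this problem, run on simulated placeholder vectors `age` and `salary` (both hypothetical; substitute the columns of the real CEO data). The interval shown is the usual t-based interval for a mean, which is one reasonable reading of the question.

```r
# Placeholder data: simulated ages and salaries, NOT the real CEO data.
set.seed(4)
age    <- round(rnorm(40, mean = 52, sd = 9))
salary <- round(rlnorm(40, meanlog = 6, sdlog = 0.5))

# Scatter plot of salary against age.
plot(age, salary, xlab = "Age", ylab = "Salary")

# Pearson correlation coefficient between the two variables.
r <- cor(age, salary)

# Normal Q-Q plot of the ages with a reference line.
qqnorm(age)
qqline(age)

# 95% t-based confidence interval for the mean age.
ci <- t.test(age, conf.level = 0.95)$conf.int
print(r)
print(ci)
```

Points falling close to the `qqline()` reference line suggest approximate normality, which is also what the t-based interval relies on for a small sample.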

 

  5. Darwin’s data: Charles Darwin measured heights (in inches) for 15 pairs of plants. Each pair consists of two plants of the same age, one grown from the seed of a cross-fertilized flower and the other from the seed of a self-fertilized flower. Test the hypothesis that the cross-fertilized plants grow taller than the self-fertilized ones.
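Because the plants come in matched pairs, one natural approach is a one-sided paired t-test on the within-pair differences. The sketch below assumes that test is appropriate (roughly normal differences) and uses simulated placeholder heights, not Darwin's measurements; the vector names `cross` and `self` are hypothetical.

```r
# Placeholder data: simulated heights for 15 pairs, NOT Darwin's values.
set.seed(5)
self  <- round(rnorm(15, mean = 18, sd = 2), 1)
cross <- round(self + rnorm(15, mean = 2, sd = 4), 1)

# H0: mean(cross - self) = 0   versus   H1: mean(cross - self) > 0.
res <- t.test(cross, self, paired = TRUE, alternative = "greater")
print(res)

# The same test written as a one-sample test on the differences.
d <- cross - self
res_diff <- t.test(d, mu = 0, alternative = "greater")
```

A small p-value in `res` would count as evidence that the cross-fertilized plants are taller on average. With only 15 pairs, a Q-Q plot of `d` is a sensible check on the normality assumption.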