HW4: You have to use R to complete this homework. For each problem, copy and paste
your R commands and plots, and add your notes if necessary:
- For the cereals data (story behind
the data), make a histogram of the sodium content of all
cereal types measured (use 20 bins and prob=T). Also plot a smoothed
curve on the histogram estimating the true density of this random
variable (use bw=10). Calculate the mean, standard deviation, and 80th percentile of
this variable.
- For the cereals data (story behind
the data), compare the calorie content of the cereals
manufactured by Kellogg's and Nabisco. Give their summary
statistics (min, max, quartiles) and draw both boxplots side by side
in a single diagram, which makes the comparison easier.
How do the two manufacturers compare?
- Find the 67th percentile of the t
distribution with 7 degrees of freedom (call this number b). Generate 1000
samples from the t distribution with df=7. Using these 1000 samples, estimate the
probability of getting a value smaller than b.
- For the CEO data (story behind
the data), make a scatter plot of the CEOs' ages and salaries.
Does it show a strong association between the two variables? Justify your answer by calculating
the correlation coefficient. Make a normal Q-Q plot of the ages
of the CEOs; do the ages look normally distributed? Find a 95%
confidence interval for the mean age of CEOs.
- Darwin’s Data: Charles
Darwin measured the heights (in inches) of 15 pairs of plants. Each pair
consists of two plants of the same age, one grown from the seed of a
cross-fertilized flower and the other from the seed of a self-fertilized flower. Test
the hypothesis that cross-fertilized plants grow taller than self-fertilized ones.
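For the sodium problem, here is a minimal R sketch. It assumes the cereals data has been read into a data frame called `cereals` with a numeric column `sodium`. Both names are hypothetical, so adjust them to the actual file. The helper functions `plot_sodium` and `sodium_stats` are also hypothetical names used only for illustration.

```r
# Hypothetical setup: cereals <- read.csv("cereals.csv")  (file name is an assumption)

# Histogram on the density scale with a kernel density curve on top
plot_sodium <- function(x) {
  # breaks = 20 requests about 20 bins; R treats this as a suggestion and
  # may pick nearby "pretty" break points
  hist(x, breaks = 20, prob = TRUE,
       main = "Sodium content of cereals", xlab = "Sodium")
  # Smoothed density estimate with bandwidth 10, overlaid on the histogram
  lines(density(x, bw = 10), lwd = 2)
}

# Mean, standard deviation and 80th percentile (quantile's default type 7)
sodium_stats <- function(x) {
  c(mean = mean(x),
    sd   = sd(x),
    p80  = unname(quantile(x, 0.80)))
}

# Usage, assuming cereals$sodium exists:
# plot_sodium(cereals$sodium)
# sodium_stats(cereals$sodium)
```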
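For the Kellogg's versus Nabisco calorie comparison, one possible R sketch follows. It assumes a manufacturer column `mfr` coded `"K"` for Kellogg's and `"N"` for Nabisco, plus a numeric column `calories`. These column names, the codes, and the function name `compare_calories` are assumptions, so check them against the real data. `summary()` reports the min, quartiles, mean and max. The formula interface of `boxplot()` draws one box per manufacturer in the same diagram.

```r
# Assumption: df$mfr uses "K" = Kellogg's and "N" = Nabisco; df$calories is numeric
compare_calories <- function(df) {
  # Keep only the two manufacturers of interest
  kn <- df[df$mfr %in% c("K", "N"), ]
  kn$mfr <- factor(kn$mfr, levels = c("K", "N"),
                   labels = c("Kellogg's", "Nabisco"))
  # Min, quartiles, mean and max for each manufacturer
  stats <- lapply(split(kn$calories, kn$mfr), summary)
  # Two boxplots side by side in a single diagram
  boxplot(calories ~ mfr, data = kn,
          main = "Calories by manufacturer", ylab = "Calories")
  stats
}

# Usage, assuming a data frame named cereals:
# compare_calories(cereals)
```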
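The t-distribution problem can be done entirely with base R. `qt` gives the percentile and `rt` draws the samples. The proportion of samples below b estimates P(T < b), which should be close to 0.67. The seed value is arbitrary and only makes the run reproducible.

```r
set.seed(1)                   # arbitrary seed, for reproducibility only
b <- qt(0.67, df = 7)         # 67th percentile of the t distribution, df = 7
x <- rt(1000, df = 7)         # 1000 random draws from t(7)
p_hat <- mean(x < b)          # fraction of draws below b estimates P(T < b)
b
p_hat                         # expected to be near 0.67
```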
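For the CEO problem, here is a sketch that takes the ages and salaries as two numeric vectors. The function name `analyze_ceo` and the idea of passing columns such as `ceo$age` and `ceo$salary` are hypothetical. It computes the Pearson correlation with `cor`, draws a normal Q-Q plot with `qqnorm`/`qqline`, and takes the t-based 95% confidence interval for the mean age from `t.test`.

```r
analyze_ceo <- function(age, salary) {
  # Show both plots side by side and restore the layout afterwards
  op <- par(mfrow = c(1, 2))
  on.exit(par(op))
  # Scatter plot of salary against age
  plot(age, salary, xlab = "Age", ylab = "Salary", main = "CEO age vs salary")
  # Normal Q-Q plot of the ages, with a reference line
  qqnorm(age, main = "Normal Q-Q plot of CEO ages")
  qqline(age)
  list(
    r  = cor(age, salary),                                 # Pearson correlation
    ci = as.numeric(t.test(age, conf.level = 0.95)$conf.int)  # 95% CI for mean age
  )
}

# Usage, assuming a data frame named ceo with columns age and salary:
# analyze_ceo(ceo$age, ceo$salary)
```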
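Each of Darwin's pairs shares growing conditions, so a paired test is the natural choice. A one-sided paired t-test tests H0: mean(cross - self) = 0 against H1: mean(cross - self) > 0. The sketch assumes two numeric vectors `cross` and `self` listed in the same pair order. Those names and the wrapper `darwin_test` are hypothetical. The data still has to be entered from the source.

```r
# One-sided paired t-test: is the mean of (cross - self) greater than 0?
darwin_test <- function(cross, self) {
  stopifnot(length(cross) == length(self))  # one value per plant in each pair
  t.test(cross, self, paired = TRUE, alternative = "greater")
}

# Usage, once the 15 pairs are entered in matching order:
# darwin_test(cross, self)
```

The paired test is equivalent to a one-sample t-test on the differences, `t.test(cross - self, alternative = "greater")`. A small p-value supports the claim that cross-fertilized plants grow taller.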