stat 579
Midterm II - Sample
The three questions below add up to 100 points (not counting extra credit). A passing grade will start at about 40 points; any score above 80 will receive an A.
Warm-up (15 points)
The file brfss-2009-clean.csv contains the records from the BRFSS (Behavioral Risk Factor Surveillance System) as discussed in class earlier this semester.
- For each state, find the percentage of female and male population.
- The file fips-code.csv has a list of state names by FIPS code, as used in the variable XSTATE in the BRFSS data. Use this data and the states data from the maps package to draw a map of the U.S. (mainland only) with states colored by percentage of male respondents.
Working with text (40 points + 2 + 5 points of extra credit)
- Write a function palindrome5(x) that returns TRUE if x is a palindrome of length 5 (i.e. x is five letters long and reads the same from the back and from the front, e.g. 'radar', 'level'). Extra points if you can avoid an explicit loop or recursion.
- The file course-description.txt contains a description of all the courses offered by the Statistics Department at ISU. Read this file and convert it to a data frame with columns 'Course', 'Name', 'Description'. Extra points if you successfully extract the number of credits from the description as well.
- For the following gene sequence:
sequence <-
"ATGGATTCTGGTATGTTCTAGCGCTTGCACCATCCCATTTAACTGTAAGAAGAATTGCACGGTCCCAATTGCTCGAGAGA
TTTCTCTTTTACCTTTTTTTACTATTTTTCACTCTCCCATAACCTCCTATATTGACTGATCTGTAATAACCACGATATTA
TTGGAATAAATAGGGGCTTGAAATTTGGAAAAAAAAAAAAACTGAAATATTTTCGTGATAAGTGATAGTGATATTCTTCT
TTTATTTGCTACTGTTACTAAGTCTCATGTACTAACATCGATTGCTTCATTCTTTTTGTTGCTATATTATATGTTTAGAG
GTTGCTGCTTTGGTTATTGATAACGGTTCTGGTATGTGTAAAGCCGGTTTTGCCGGTGACGACGCTCCTCGTGCTGTCTT
CCCATCTATCGTCGGTAGACCAAGACACCAAGGTATCATGGTCGGTATGGGTCAAAAAGACTCCTACGTTGGTGATGAAG
CTCAATCCAAGAGAGGTATCTTGACTTTACGTTACCCAATTGAACACGGTATTGTCACCAACTGGGACGATATGGAAAAG
ATCTGGCATCATACCTTCTACAACGAATTGAGAGTTGCCCCAGAAGAACACCCTGTTCTTTTGACTGAAGCTCCAATGAA
CCCTAAATCAAACAGAGAAAAGATGACTCAAATTATGTTTGAAACTTTCAACGTTCCAGCCTTCTACGTTTCCATCCAAG
CCGTTTTGTCCTTGTACTCTTCCGGTAGAACTACTGGTATTGTTTTGGATTCCGGTGATGGTGTTACTCACGTCGTTCCA
ATTTACGCTGGTTTCTCTCTACCTCACGCCATTTTGAGAATCGATTTGGCCGGTAGAGATTTGACTGACTACTTGATGAA
GATCTTGAGTGAACGTGGTTACTCTTTCTCCACCACTGCTGAAAGAGAAATTGTCCGTGACATCAAGGAAAAACTATGTT
ACGTCGCCTTGGACTTCGAACAAGAAATGCAAACCGCTGCTCAATCTTCTTCAATTGAAAAATCCTACGAACTTCCAGAT
GGTCAAGTCATCACTATTGGTAACGAAAGATTCAGAGCCCCAGAAGCTTTGTTCCATCCTTCTGTTTTGGGTTTGGAATC
TGCCGGTATTGACCAAACTACTTACAACTCCATCATGAAGTGTGATGTCGATGTCCGTAAGGAATTATACGGTAACATCG
TTATGTCCGGTGGTACCACCATGTTCCCAGGTATTGCCGAAAGAATGCAAAAGGAAATCACCGCTTTGGCTCCATCTTCC
ATGAAGGTCAAGATCATTGCTCCTCCAGAAAGAAAGTACTCCGTCTGGATTGGTGGTTCTATCTTGGCTTCTTTGACTAC
CTTCCAACAAATGTGGATCTCAAAACAAGAATACGACGAAAGTGGTCCATCTATCGTTCACCACAAGTGTTTCTAA"
Find all 3-letter 'words' (a word is a sequence of the letters 'A', 'G', 'C', and 'T') and the frequency of their occurrence.
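One possible way to approach two of the text tasks above is sketched below in base R. This is only an illustration under stated assumptions, not a model answer: the helper name count_words3 is hypothetical, a 'letter' is taken to mean a-z after lower-casing, and 3-letter 'words' are assumed to be overlapping windows starting at every position of the sequence (non-overlapping windows would be an equally defensible reading).

```r
# Sketch: TRUE if x is a five-letter palindrome.
# Vectorised comparisons replace an explicit loop or recursion.
palindrome5 <- function(x) {
  x <- tolower(x)
  # Assumption: "letters" means exactly five characters from a-z.
  is_five <- grepl("^[a-z]{5}$", x)
  is_five &
    substr(x, 1, 1) == substr(x, 5, 5) &
    substr(x, 2, 2) == substr(x, 4, 4)
}

# Sketch: frequency table of all 3-letter windows in a sequence string.
# Assumption: windows overlap, i.e. one window starts at every position.
# count_words3 is a hypothetical helper name, not part of the exam text.
count_words3 <- function(s) {
  # Drop line breaks and anything that is not A, C, G or T.
  s <- gsub("[^ACGT]", "", toupper(s))
  n <- nchar(s)
  if (n < 3) return(table(character(0)))
  starts <- seq_len(n - 2)
  words <- substring(s, starts, starts + 2)
  sort(table(words), decreasing = TRUE)
}
```

With these definitions, palindrome5(c("radar", "level", "hello")) evaluates each word separately, and count_words3(sequence) returns the counts for the gene sequence given above, most frequent first.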
Integration by simulation (45 points)
- Write a function integral(n, a, b, f) that estimates the area under the function f between limits a and b using n uniform random numbers. The picture below shows a description of the process. In this example, 1000 pairs of random uniform values in [0.5, 3.5] x [0, 15] were used. 592 of them fell below the curve, giving an estimate for the area under the curve of 0.592 * 3 * 15 = 26.64.
- Apply your function repeatedly (say, 20 times) to the curve f(x) = 2x(x-3)^2+5 between limits 1 and 4 using 2000 pairs of random numbers. Give an estimate for the area under the curve and a standard deviation.
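The hit-or-miss procedure described above can be sketched in R as follows. This is a minimal sketch under assumptions, not the expected solution: it assumes f is non-negative on [a, b], and it adds a hypothetical optional argument ymax for the height of the bounding box, which is approximated on a grid when not supplied. For comparison, the exact integral of f(x) = 2x(x-3)^2+5 from 1 to 4 works out to 25.5.

```r
# Sketch: hit-or-miss Monte Carlo estimate of the area under f on [a, b].
# Assumptions: f is vectorised and non-negative on [a, b]; ymax is a
# hypothetical extra argument (not in the exam's signature) giving the
# height of the sampling box.
integral <- function(n, a, b, f, ymax = NULL) {
  if (is.null(ymax)) {
    # Approximate the maximum of f on a fine grid over [a, b].
    grid <- seq(a, b, length.out = 1001)
    ymax <- max(f(grid))
  }
  x <- runif(n, a, b)      # random horizontal positions
  y <- runif(n, 0, ymax)   # random vertical positions
  # Fraction of points under the curve times the area of the box.
  mean(y <= f(x)) * (b - a) * ymax
}

f <- function(x) 2 * x * (x - 3)^2 + 5

set.seed(579)  # arbitrary seed, only to make the run reproducible
estimates <- replicate(20, integral(2000, 1, 4, f))
mean(estimates)  # estimate of the area
sd(estimates)    # spread of the 20 individual estimates
```

The standard deviation reported here describes the variability of a single 2000-point estimate; the mean of the 20 runs is correspondingly more precise.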