* Instructor: * Dr Di Cook

325 Snedecor Hall

515-294-8865

dicook@iastate.edu
www.public.iastate.edu/~dicook

* Meeting Times: *
MWF 2:10-3:00 Room TBA

* Office hours: *
TBA

Most course material is available on WebCT.

* Textbooks: *
Venables, W.N. and Ripley, B.D. (2003) "Modern Applied Statistics with S" (4th ed.) Springer

Chatfield, C. (1995) "Problem Solving: A Statistician's Guide" Chapman and Hall/CRC

* Recommended Reading: *
Bishop, C. (2006) "Pattern Recognition and Machine Learning" Springer

Hastie, T., Tibshirani, R., and Friedman, J. (2001) "The Elements of Statistical Learning", Springer

Ripley, B.D. (1996) "Pattern Recognition and Neural Networks" Cambridge University Press

* Prerequisites: * Statistics 401, 447 or 341, or permission of the
instructor

* Description: *
Approaches to finding the unexpected in data: data mining, pattern
recognition and gaining understanding. Emphasis is on data-centered,
non-inferential statistics, for large or high-dimensional data, and
topical problems. Simple graphical methods, as well as classical and
computer-intensive methods applied in an exploratory manner, and
presentation graphics.

* Objectives: *
Information in our age is exploding in amount and complexity. New
disciplines, such as data mining, are emerging to address the needs in
this area. This course is designed to provide students with the
essentials for approaching new, complex data, and arriving at
preliminary descriptive statements.

* Approach: *
This will be a data-centered course, with segments of the semester
focusing on particular data sets, each getting more complex as the
semester progresses.

| Dates | Topic | Notes | Data | Assignment | Homework | Code |
|---|---|---|---|---|---|---|
| Jan 8, 10, 12 | Introduction to R | Intro to R | Data | | | |
| Jan 17, 19 | What is data mining? | Outline, Background | | | Hwk 1 | Code |
| End of Jan | Case study 1: Tipping Data | Notes | Data | Assignment 1, Women's tennis statistics, Men's tennis statistics | | Code |
| Feb | Case study 2: Olive Oils Data | Notes | Data | | Hwk 2, Hwk 3 Music data | Code |
| Mar | Case study 3: Clustering music clips | Notes | Data | | | Code |
| | Classification of music clips | Notes | Data | | | |
| Apr | Case study 4: Hurricanes | Notes | Data | | | Code |
| Mar 12 | Exam 1 | 2005 Exam 1, Solution, Guide | | | | |
| Apr 30 | Exam 2 | 2005 Exam 2, Solution, Guide | | | | |
| Apr 11 | Project due | | | | | |
| Apr 23, 25, 27 | Project presentations | | | | | |

- Data cleaning: re-formulating variables to extract different types of information, handling missing values, fixing errors in data, transformations
- Interactive and dynamic graphics
- Classical procedures: regression, GLM, PCA, hierarchical and k-means clustering, model-based clustering
- Computationally intensive procedures: neural networks, CART, forests, support vector machines, smoothing, bootstrap, boosting, projection pursuit
- Presentation graphics: trellis/lattice/ggplot.

Copyright for the material on this page belongs to the course instructor.

Dianne Cook, Dept of Statistics, ISU, 325 Snedecor Hall, Ames, IA 50011-1210

Tel: (515) 294 8865, Fax: (515) 294 4040
