STAT 690E: Advanced Statistical Computing
Class meets: MW 11:00-13:00 in Snedecor 119
(see note at end of page)
Instructor: Ranjan Maitra (Ron-joan Moi-tro)
- Office: 123 Snedecor Hall
- Phone: (515)-294-7757
- E-mail: maitra
- Office Hours: Th 2:00-3:00 pm
Course Prerequisites:
- Good knowledge of Stat 580 or permission of instructor
- Excellent grounding in Stat 543
- Knowledge of C or interfacing a low-level programming language with C.
Grading Scheme:
- Homeworks: 50%
- Projects and Presentation (Oral/Written): 50%
Course Description: This course is designed for
Ph.D.-level students and surveys several
computer-intensive methods in statistical
inference. Because of the wide range of topics to be covered, there is
no prescribed textbook for the class. I will try to provide a
comprehensive set of notes. While the course is geared towards
statistical computation, we will also look into the theory behind
these methods. The broad outline for this course is as follows:
- Review of Simulation: Brief review of methods for sampling
from given distributions: direct methods, rejection sampling, adaptive
rejection. Variance-reduction methods in simulation, importance sampling and
use of antithetic variates.
- Expectation-Maximization Algorithm and MC-EM: Theory behind the
Expectation-Maximization (EM) algorithm, EM in exponential
families. Variance estimation for EM. Monte Carlo EM: Applications to
medical image analysis and/or statistical genetics.
- Randomization tests and the Bootstrap: Idea and techniques
of randomization tests; permutation tests, reference distributions,
re-sampling methods such as jackknife and bootstrap. Parametric and
non-parametric bootstrap. Estimation, confidence intervals and
hypothesis testing.
- Markov Chain Monte Carlo: Markov Chains and Markov Random Fields;
use of Markov Random Field priors in Bayesian Inference and Markov Chain Monte
Carlo Methods; Gibbs sampling, burn-in, exact simulation; Multi-grid
methods. Applications to Spatial Models and Image Analysis.
Homeworks: Homeworks will be handed out and will be due bi-weekly. These
will consist of proving results and applying and exploring concepts
learnt in class. A considerable part of the homework will involve
computer work. All homework turned in must be professionally presented.
Projects: There will be one very substantive final project assigned to each person during the semester. This project will involve either exploratory work on a research problem, including a detailed background literature study and analysis, or the statistical analysis of an appropriate dataset that you are interested in. You are welcome to provide this dataset in consultation with me. Another possibility would be a methodological investigation into some aspect of the topics covered in the class, or an investigation into, for instance, an engineering design problem where the methods covered may be used to answer a question quantitatively. The final project will culminate in a written report and an oral presentation at the level of a professional meeting in statistics and its applied disciplines. For advanced Ph.D. students, an appropriately chosen project may serve as a portion of your dissertation. Each report will be graded on a 15-point scale, with 5 points each for (a) the validity of the statistical analysis, (b) the scientific component, and (c) the quality of the write-up and presentation in communicating the results to an intended professional audience. You are
required to electronically provide me with all written code,
documentation and datasets used in the project. Your rights, if any,
to the data and software will be preserved.
Statistical Software:
Most of the topics covered in this class will be exhibited
using the low-level language C, the mathematical libraries LAPACK and
FFTW, and the publicly available statistical software package "R", which
can be downloaded from http://cran.r-project.org. "R"
is developed by an international team of researchers and is distributed
under the GNU General Public License, and is therefore free. It is very
similar to, though not exactly the same as, the commercially available
Splus. Most commands in Splus work with "R". All lab machines
running Windows have "R" installed. Since the software is freely
available, you may download it from the above web site and use it on
your home computer. You may use either the Windows or Unix/Linux
versions. (You may also need to install additional free packages for
"R", using install.packages() as root or super-user, or through the
graphical user interface in R.)
Please note that this is NOT a class in learning statistical
software. As such, you are welcome to use other software packages
but please be forewarned that not all packages may be
capable of doing everything connected with the class. You may, of
course, write your own code in a low-level programming language such
as C, but this should typically be a last resort.
Other: The times of the class may be adjusted to meet student
requests and preferences. Further, to allow sufficient time for work
on the final project, the lectures will be mostly concentrated in the
first half of the semester: each class period will be lengthened to 2
hours.
Course Homepage: The course homepage will be located on
the WWW at
http://www.public.iastate.edu/~maitra/stat680/fall06.html.
I will try to keep this homepage as up to date as possible. However,
you are still responsible for any announcements made in class.