STAT 690E: Advanced Statistical Computing
Class meets: TTh 11:00 am -- 12:30 pm in Sweeney 1120
(see note at end of page)
Instructor: Ranjan Maitra (Ron-joan Moi-tro)
- Office: 123 Snedecor Hall
- Phone: (515)-294-7757
- E-mail: maitra
- Office Hours: tbd
Course Prerequisites:
- Good knowledge of Stat 580 or permission of instructor
- Excellent grounding in Stat 543
- Knowledge of C or interfacing a low-level programming language with C.
Grading Scheme:
- Homeworks: 50%
- Projects and Presentation (Oral/Written): 50%
Course Description: This course is designed for
Ph.D.-level students and surveys several
computer-intensive methods in statistical
inference. Because of the wide range of topics to be covered, there is
no prescribed textbook for the class. I will try to provide a
comprehensive set of notes. While the course is geared towards
statistical computation, we will also look into the theory behind
these methods. The broad outline for this course is as follows:
- Review of Simulation: Brief review of methods for sampling
from given distributions: direct methods, rejection sampling, adaptive
rejection. Variance-reduction methods in simulation, importance sampling and
use of antithetic variates.
- Expectation-Maximization Algorithm and MC-EM: Theory behind the
Expectation-Maximization algorithm, EM in exponential
families. Variance estimation for EM. Monte Carlo EM: Applications to
medical image analysis and/or statistical genetics.
- Randomization tests and the Bootstrap: Idea and techniques
of randomization tests; permutation tests, reference distributions,
re-sampling methods such as the jackknife and bootstrap. Parametric and
non-parametric bootstrap. Estimation, confidence intervals and
hypothesis testing.
- Markov Chain Monte Carlo: Markov Chains and Markov Random Fields;
use of Markov Random Field priors in Bayesian Inference and Markov Chain Monte
Carlo Methods; Gibbs sampling, burn-in, exact simulation; Multi-grid
methods. Applications to Spatial Models and Image Analysis.
Homeworks: Homeworks will be handed out and due bi-weekly. These
will consist of proving results and applying and exploring concepts
learnt in class. A considerable part of the homework will involve
computer work. All homework turned in must be professionally presented.
Projects: There will be one very substantive final
project assigned to each person during the semester. This
project will involve either exploratory work on a research
problem, including a detailed study of the background literature, or
the statistical analysis of an appropriate dataset that interests you.
You are welcome to provide this dataset in consultation with me. The
project will address a much broader problem than a homework would, for
instance, an engineering design problem, where statistical computing
is used to answer a question quantitatively. The final project will
culminate in a written report and one or more oral presentations at
the level expected at a professional meeting in statistics and its
applied disciplines. For advanced students in the Ph.D. program, an
appropriately chosen project may serve as a portion of your dissertation.
Please note that you are individually responsible for performing the
statistical analysis, and for writing the final report. Each report
will be graded on a 15-point scale, with 5 points each for (a) the
validity of the statistical analysis, (b) the scientific
component, and (c) quality of the write-up in communicating the
results to an intended professional audience. You are
required to electronically provide me with all written code,
documentation and datasets used in the project. Your rights, if any,
to the data and software, will be preserved.
Statistical Software:
Most of the topics covered in this class will be demonstrated
using the low-level language C, the mathematical libraries LAPACK and
SLATEC or CMLIB, and the publicly available statistical software
package "R", available for download at http://cran.r-project.org. "R"
is developed by an international team of researchers and is distributed
under the GNU General Public License; it is therefore free. It is very
similar, though not identical, to the commercially available
Splus. Most commands in Splus work with "R". All lab machines
running Windows have "R" installed. Since the software is freely
available, you may download it from the above web site and use it on
your home computer. You may use either the Windows or Unix/Linux
versions. (You may also need to install additional free packages for
"R", using install.packages() as root or super-user, or the
graphical user interface in R.)
Please note that this is NOT a class in learning statistical
software. As such, you are welcome to use other software packages
but please be forewarned that not all packages may be
capable of doing everything connected with the class. You may, of
course, write your own code in a low-level programming language such
as C, but this should typically be a last-resort option.
Other: The times of the class may be adjusted to meet student
requests and preferences. Further, to allow sufficient time for work
on the final project, the lectures will be mostly concentrated in the
first half of the semester: each class period will be lengthened to 1
hour and 50 minutes.
Course Homepage: The course homepage will be located on
the WWW at
http://www.public.iastate.edu/~maitra/stat690E/fall05.html.
I will try to keep this homepage as up to date as possible. However,
you are still responsible for any announcements made in class.