STAT 580: Statistical Computing - I.  

Class meets: TTh 11:00 am -- 12:30 pm in Mol Biol 1428

Instructor: Ranjan Maitra (Ron-joan Moi-tro)

Course Prerequisites:
  1. Stat 579
  2. Stat 447 or Stat 542

Grading Scheme:

Course Syllabus:
Introduction to scientific computing for statistics using tools and concepts in R: programming tools, modern programming methodologies, modularization, design of statistical algorithms. Introduction to C programming for efficiency; interfacing R with C. Building statistical libraries. Use of algorithms in modern subroutine packages, such as LAPACK, MINPACK and QUADPACK, for computational linear algebra, optimization and integration. Implementation of simulation methods: inversion of the probability integral transform, rejection sampling, importance sampling. Monte Carlo integration.

Course Description:
This course is designed for Ph.D.-level and advanced Masters-level students. In this course, we will study the tools needed for scientific computing for statistics. We shall do this by developing the theory and methodology of simulation and estimation methods. Since an important part of developing numerical solutions is mastering how computer hardware and software work, a major emphasis of the class will be on programming concepts and methods.

Textbook:
Because the material is spread out over three to four books, there are no required textbooks for this class. However, the following books are highly recommended reading material:

Statistical Software:
The statistical software used throughout this class will be R, a comprehensive statistical software package freely available from http://www.R-project.org/. R is developed by an international team of researchers and is distributed free of charge under the GNU General Public License. It is very similar to, though not the same software as, the commercially available Splus, and most Splus commands work in R. All lab machines running Windows and Linux have R installed. Since the software is freely available, you may also download it from the above web site and use it on your home computer.
You may use either the Windows version or the Unix/Linux version. Please note that you install R at your own risk, though the department systems administrators may be able to help. You may not use Splus in lieu of R in this class.

Computer Programming:
The low-level programming language taught and used in this class will be C. C was developed by Dennis M. Ritchie in the early 1970s at AT&T's Bell Laboratories. We will be using the language as standardized by the American National Standards Institute (ANSI) in 1989, that is, ANSI C (also called C89 for short). In 1990, the International Organization for Standardization (ISO) adopted ANSI's definition of C as the international standard, whereupon the language became known as ANSI/ISO C. Changes and extensions to the C language continued to be made in response to rapid changes in computer hardware and software. A new standard for the C language, ISO/IEC 9899:1999 (C99 for short), was announced in 1999. Compilers for C99 are beginning to become available, and the GNU compilers support C99 from release 3.0 onwards. A history of the development of the C language is provided by Dennis M. Ritchie at http://cm.bell-labs.com/cm/cs/who/dmr/chist.html.

Computer Operating System:
The operating system used in this class will be Linux. In layman's terms, an operating system is a program that supervises the working of a computer. (Other popular operating systems are Microsoft Windows, Mac OS and Unix.) Like R, Linux is free: it is a Unix-type operating system originally created by Linus Torvalds with the assistance of developers around the world. Because it is developed under the GNU General Public License (GPL), the source code for Linux is freely available to everyone. More information on the Linux operating system is available at www.linux.org/info/index.html.
Like Unix, Linux is flexible and very useful for scientific computing, because one can optimize the machine's resources and gear them towards the computing that is of interest. Because the source code for Linux is freely available, it can be (and is) packaged in several forms. Each of these is called a distribution: examples are Fedora Core (sponsored by Red Hat), SuSE (now owned by Novell), Mandrake, etc. Iowa State University has site licenses for the Red Hat Enterprise Linux operating system. You may use this, or any other Linux distribution. The statistics department labs currently dual-boot into Microsoft Windows and Fedora Core 1. There is also a Linux lab at the University's Academic Information Technologies building in Durham Hall. Feel free to install Linux at home. An added advantage of using Linux is that you can log in from off-campus and run your programs on a department machine, using X terminal emulators. You will be taught how to use Linux in the department lab.

Homeworks:
Homeworks will be handed out every two weeks. They will mostly consist of applying and exploring the concepts learnt in class. Parts of the homeworks will involve theoretical derivations, and a considerable part will involve computer programming. Please note that your programs should be e-mailed to me and should be annotated. Your grade on a program will depend on its output.

Course Homepage:
The course homepage is located at http://www.public.iastate.edu/~maitra/stat580/spring2005.html. I will try to keep this homepage as up to date as possible. However, you are still responsible for any announcements made in class.