IE 483x/583x
Knowledge Discovery and Data Mining
Course Information
Objective
Upon completing this course, students will understand
how data mining can be used together with data warehousing and other knowledge
discovery technologies to create a competitive advantage for the enterprise.
In particular, students will know how to implement common data mining techniques,
including statistical methods, machine learning methods, and visualization
techniques, to extract patterns, trends, and other useful information from
databases.
Description
This course focuses on algorithmic techniques
that can be used for data mining tasks such as classification, association
rule mining, clustering, and numerical prediction. These techniques include
probabilistic and statistical methods, genetic algorithms and neural networks,
visualization techniques, and mathematical programming. We also place
such data mining within the larger picture of knowledge discovery in databases
and, in particular, examine its relationship with data warehousing. We will
consider numerous case studies from both manufacturing and service industries.
Readings
Primary textbook:
The only required text is Ian
H. Witten and Eibe Frank (2000), Data Mining: Practical Machine Learning
Tools and Techniques with Java Implementations, Morgan Kaufmann Publishers,
San Francisco, CA. This book emphasizes
algorithms for model induction, that is, the actual data mining rather
than data preparation and other steps in the knowledge discovery process.
Secondary texts and resources:
-
Other general textbooks:
-
For a more business-oriented introduction to
data mining, written primarily for data mining professionals, check out
Michael
J.A. Berry and Gordon Linoff (2000), Mastering Data Mining: The Art
and Science of Customer Relationship Management, John Wiley & Sons,
New York.
-
Another very good introductory textbook, which
takes a more database-oriented approach, is Jiawei
Han and Micheline Kamber (2001), Data Mining: Concepts and Techniques,
Morgan Kaufmann Publishers, San Francisco, CA.
-
Some algorithms that will be discussed in class
are not covered by the textbook but can be found in Tom
Mitchell (1997), Machine Learning, McGraw-Hill, Boston.
This text discusses many of the same concepts from a more traditional artificial
intelligence perspective, including artificial neural networks and genetic
algorithms.
-
Clustering:
-
Association rule discovery:
-
Optimization and data mining:
-
Web mining:
-
The book Mining the
Web by Soumen Chakrabarti (Morgan Kaufmann, 2003) covers all
of the web mining material discussed in class (and more), as well as much
of the material on complex data types and text mining.
-
Explore the web
resources for more information!