CS 580 - Data Mining
Fall 2003
Classes: TR, 6:45 pm - 8:00 pm, Maria Sanford Hall 210
Instructor: Dr. Zdravko Markov, MS 203, (860)-832-2711, http://www.cs.ccsu.edu/~markov/,
e-mail: markovz@ccsu.edu
Office hours: TR 8:00pm-9:00pm, MW 5:00pm-6:45pm, or by
appointment
Description: Data Mining studies algorithms and computational
paradigms that allow computers to find patterns and regularities in databases,
perform prediction and forecasting, and generally improve their performance
through interaction with data. It is currently regarded as the key element
of a more general process called Knowledge Discovery that deals
with extracting useful knowledge from raw data. The knowledge discovery
process includes data selection, cleaning, and coding; the application of statistical,
pattern recognition, and machine learning techniques; and the reporting and
visualization of the generated structures. The course will cover all of these
issues and will illustrate the whole process with examples of practical applications.
Important related technologies such as Data Warehousing and
On-line Analytical Processing (OLAP) will also be discussed. Students
will use recent Data Mining software.
Prerequisites: CS 501 and CS 502, basic knowledge of algebra,
discrete math and statistics.
Course Objectives
To introduce students to the basic concepts and techniques of Data Mining.
To develop skills in using recent data mining software to solve practical
problems.
To gain experience in independent study and research.
Required text: Ian H. Witten and Eibe Frank, Data Mining: Practical
Machine Learning Tools and Techniques with Java Implementations, Morgan
Kaufmann, 1999, ISBN 1-55860-552-5. Chapter 8 (Nuts And Bolts: Machine
Learning Algorithms In Java) is available online from the Weka 3 homepage.
Required software: Weka 3 Data Mining System, a free machine learning package written in Java.
Assignments, projects and grading: There will be 8 projects and 2 quizzes.
The projects require independent study and practical work with a data mining
system to solve data mining tasks. The final grade will
be based 80% on projects and 20% on quizzes. Letter grades will be
assigned according to the following table:
Letter grade | Score
A | 95-100
A- | 90-94
B+ | 87-89
B | 84-86
B- | 80-83
C+ | 77-79
C | 74-76
C- | 70-73
D+ | 67-69
D | 64-66
D- | 60-63
F | 0-59
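The grading rule above (80% projects, 20% quizzes, then a table lookup) can be sketched in a few lines of Python. This is only an illustration, not an official grade calculator: the function names are hypothetical, both averages are assumed to be on a 0-100 scale, and fractional scores are assumed to be rounded half up to the nearest integer before the lookup, since the syllabus does not say how they are handled.

```python
# Weights and cutoffs transcribed from the syllabus.
PROJECT_WEIGHT = 0.80
QUIZ_WEIGHT = 0.20

# (minimum score, letter grade) pairs, highest cutoff first.
GRADE_CUTOFFS = [
    (95, "A"), (90, "A-"), (87, "B+"), (84, "B"), (80, "B-"),
    (77, "C+"), (74, "C"), (70, "C-"), (67, "D+"), (64, "D"),
    (60, "D-"), (0, "F"),
]


def final_score(project_avg: float, quiz_avg: float) -> float:
    """Weighted final score; both inputs are assumed to be on a 0-100 scale."""
    return PROJECT_WEIGHT * project_avg + QUIZ_WEIGHT * quiz_avg


def letter_grade(score: float) -> str:
    """Map a 0-100 score to a letter grade using the table above."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    # Assumption: round half up to an integer, because the table lists
    # integer ranges only and does not say where e.g. 89.5 falls.
    rounded = int(score + 0.5)
    for minimum, letter in GRADE_CUTOFFS:
        if rounded >= minimum:
            return letter
    return "F"  # not reached for valid input; kept as a safe default
```

For example, a project average of 90 and a quiz average of 80 give a final score of 88, which falls in the 87-89 range.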
Web resources
- Data Mining
- Machine Learning
Tentative Schedule (don't print everything, keep track of the updates)
- Introduction
  - Topics:
    - What is data mining?
    - Related technologies - Machine Learning, DBMS, OLAP, Statistics
    - Data Mining Goals
    - Stages of the Data Mining Process
    - Data Mining Techniques
    - Knowledge Representation Methods
    - Applications
    - Example: weather data
  - Reading: Lecture notes - Chapter 1, Witten & Frank - Chapter 1, KDnuggets news article RE: Statisticians vs Data Miners.
  - Lecture slides: Witten & Frank, Examples of Data Mining Systems: Weka 3, DBMiner.
- Data Warehouse and OLAP
- Data preprocessing
- Data mining knowledge representation
- Attribute-oriented analysis
  - Topics:
    - Attribute generalization
    - Attribute relevance
    - Class comparison
    - Statistical measures
  - Experiments: Attribute Selection with Weka (Instance-based: ReliefF and Entropy-based: InfoGain, GainRatio)
  - Reading: Lecture notes - Chapter 5, Witten & Frank - Section 7.1.
- Quiz 1
- Data mining algorithms: Association rules
  - Topics:
    - Motivation and terminology
    - Example: mining weather data
    - Basic idea: item sets
    - Generating item sets and rules efficiently
    - Correlation analysis
    - Approximate Association Rule Mining: dealing with missing values and numeric data
  - Experiments: Association rule mining
  - Reading: Lecture notes - Chapter 6, Witten & Frank - Section 4.5.
  - Lecture slides: Witten & Frank.
  - Advanced reading: Approximate Association Rule Mining.
  - Project 4 - Association mining (due 10/16/2003)
- Data mining algorithms: Classification
  - Topics:
    - Basic learning/mining tasks
    - Inferring rudimentary rules: OneR algorithm
    - Decision trees
    - Covering rules
  - Experiments with Weka - ZeroR, OneR, J48, Prism.
  - Reading: Lecture notes - Chapter 7, Witten & Frank - Sections 4.1, 4.3, 4.4.
  - Lecture slides: Witten & Frank
  - Project 5 - Classification (due 10/31/2003)
- Data mining algorithms: Prediction
- Evaluating what's been learned
- Quiz 2
- Mining real data
- Clustering
- Advanced techniques and applications
  - Reading: Witten & Frank - Chapter 9.
  - Relational data mining: building relational concepts, using more elaborate background knowledge.
  - Text mining: extracting attributes (keywords), structural approaches (parsing, soft parsing).
  - Web Usage Analysis and User Profiling - 1999 SIGKDD Conference
  - Semantic Web Mining
    - Semantic Web
    - Semantic Web Mining
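As a small illustration of the "Basic idea: item sets" topic in the association rules unit, the sketch below computes support and confidence over a handful of transactions. Everything in it is an assumption made for illustration: the five transactions are hypothetical toy data (not the course's weather dataset), the function names are invented here, and the item set search is a brute-force enumeration rather than the efficient generation covered in the lectures or Weka's implementation.

```python
from itertools import combinations

# Hypothetical toy transactions; each one is a set of attribute=value items.
TRANSACTIONS = [
    {"outlook=sunny", "humidity=high", "play=no"},
    {"outlook=sunny", "humidity=normal", "play=yes"},
    {"outlook=rainy", "humidity=high", "play=no"},
    {"outlook=overcast", "humidity=high", "play=yes"},
    {"outlook=rainy", "humidity=normal", "play=yes"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of (antecedent and consequent) divided by support of antecedent."""
    base = support(antecedent, transactions)
    if base == 0:
        raise ValueError("antecedent never occurs in the transactions")
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / base


def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force search: test every item combination up to max_size."""
    items = sorted(set().union(*transactions))
    found = {}
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            s = support(combo, transactions)
            if s >= min_support:
                found[frozenset(combo)] = s
    return found
```

With this toy data, the rule "humidity=normal implies play=yes" holds in every transaction where humidity is normal, so its confidence is 1.0 while its support is 2 out of 5.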
Project 1 - create a data cube
Not available at this time.
Quiz 1
The quiz description will appear here when the quiz is due.
Quiz 2
The quiz description will appear here when the quiz is due.