CS 580 - Data Mining
Classes: TR, 6:45 pm - 8:00 pm, Maria Sanford Hall 210
Instructor: Dr. Zdravko Markov, MS 203, (860)-832-2711, http://www.cs.ccsu.edu/~markov/,
e-mail: markovz@ccsu.edu
Office hours: TR 8:00pm-9:00pm, MW 5:00pm-6:45pm, or by appointment
Description: Data Mining studies algorithms and computational
paradigms that allow computers to find patterns and regularities in databases,
perform prediction and forecasting, and generally improve their performance
through interaction with data. It is currently regarded as the key element
of a more general process called Knowledge Discovery that deals
with extracting useful knowledge from raw data. The knowledge discovery
process includes data selection, cleaning, coding, using different statistical,
pattern recognition and machine learning techniques, and reporting and
visualization of the generated structures. The course will cover all these
issues and will illustrate the whole process by examples of practical applications.
Important related technologies such as Data Warehousing and
On-line Analytical Processing (OLAP) will also be discussed. Students
will use recent Data Mining software.
Prerequisites: CS 501 and CS 502, basic knowledge of algebra,
discrete math and statistics.
Course Objectives
To introduce students to the basic concepts and techniques of Data Mining.
To develop skills in using recent data mining software for solving practical problems.
To gain experience in independent study and research.
Required text: Ian H. Witten and Eibe Frank, Data Mining: Practical
Machine Learning Tools and Techniques with Java Implementations, Morgan
Kaufmann, 1999, ISBN 1-55860-552-5. Chapter 8 (Nuts And Bolts: Machine
Learning Algorithms In Java) is available online from the Weka 3 homepage.
Required software: Weka 3 Data Mining System - free machine learning software written in Java.
Assignments, projects and grading: There will be 8 projects and 2 quizzes.
The projects require independent study and practical work with a data mining
system for solving data mining tasks. The final grade will be based 80% on
projects and 20% on quizzes. Letter grades will be calculated according to
the following table:
A | A- | B+ | B | B- | C+ | C | C- | D+ | D | D- | F
95-100 | 90-94 | 87-89 | 84-86 | 80-83 | 77-79 | 74-76 | 70-73 | 67-69 | 64-66 | 60-63 | 0-59
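As an illustration of the grading arithmetic described above, here is a minimal Python sketch. The 80/20 weights and the cutoffs come from this syllabus. The function names, the use of simple averages for projects and quizzes, and rounding to the nearest integer before the table lookup are assumptions made only for this example, not stated course policy.

```python
# Letter-grade cutoffs copied from the table above: (minimum score, letter).
CUTOFFS = [
    (95, "A"), (90, "A-"), (87, "B+"), (84, "B"), (80, "B-"),
    (77, "C+"), (74, "C"), (70, "C-"), (67, "D+"), (64, "D"), (60, "D-"),
    (0, "F"),
]


def final_score(project_avg, quiz_avg):
    """Weighted score on a 0-100 scale: 80% projects, 20% quizzes."""
    return 0.8 * project_avg + 0.2 * quiz_avg


def letter_grade(score):
    """Map a 0-100 score to a letter grade.

    Rounding to the nearest integer is an assumption of this sketch;
    the syllabus does not say how fractional scores are handled.
    """
    rounded = round(score)
    for minimum, letter in CUTOFFS:
        if rounded >= minimum:
            return letter
    return "F"


if __name__ == "__main__":
    score = final_score(project_avg=92, quiz_avg=85)
    print(letter_grade(score))
```

With a project average of 92 and a quiz average of 85, the weighted score is about 90.6, which falls in the 90-94 band.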
WEB resources
Data Mining
Machine Learning
Tentative Schedule (don't print everything, keep track of the updates)
What is data mining?
Related technologies - Machine Learning, DBMS, OLAP, Statistics
Data Mining Goals
Stages of the Data Mining Process
Data Mining Techniques
Knowledge Representation Methods
Example: weather data
Reading: Lecture notes - Chapter 1, Witten & Frank - Chapter 1, KDnuggets news article RE: Statisticians vs Data Miners.
Lecture slides: Witten & Frank, Examples of Data Mining Systems: Weka 3, DBMiner.
Data Warehouse and OLAP
Data preprocessing
Data mining knowledge representation
Attribute-oriented analysis
Attribute generalization
Attribute relevance
Class comparison
Statistical measures
Experiments: Attribute Selection with Weka (instance-based: ReliefF; entropy-based: InfoGain, GainRatio)
Reading: Lecture notes - Chapter 5, Witten & Frank - Section 7.1.
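To make the entropy-based attribute relevance measures listed above concrete, the following Python sketch computes information gain and gain ratio by hand. It is not Weka's implementation. The rows are the 14-instance nominal weather data set as commonly reproduced from Witten & Frank; treat them, and all function names, as assumptions of this example. ReliefF is not covered here.

```python
from collections import Counter
from math import log2

ATTRS = ["outlook", "temperature", "humidity", "windy"]

# Assumed copy of the nominal weather data; the last column is the class (play).
DATA = """
sunny hot high false no
sunny hot high true no
overcast hot high false yes
rainy mild high false yes
rainy cool normal false yes
rainy cool normal true no
overcast cool normal true yes
sunny mild high false no
sunny cool normal false yes
rainy mild normal false yes
sunny mild normal true yes
overcast mild high true yes
overcast hot normal false yes
rainy mild high true no
"""
ROWS = [line.split() for line in DATA.split("\n") if line.strip()]


def entropy(values):
    """Shannon entropy (in bits) of a list of nominal values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def info_gain(rows, col):
    """Class entropy minus the weighted class entropy after splitting on column col."""
    groups = {}
    for row in rows:
        groups.setdefault(row[col], []).append(row[-1])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[-1] for row in rows]) - remainder


def gain_ratio(rows, col):
    """Information gain divided by the entropy of the attribute's own values."""
    return info_gain(rows, col) / entropy([row[col] for row in rows])


if __name__ == "__main__":
    for col, name in enumerate(ATTRS):
        print(name, round(info_gain(ROWS, col), 3), round(gain_ratio(ROWS, col), 3))
```

On these rows, outlook should come out as the most relevant attribute by information gain (about 0.247 bits), followed by humidity, windy and temperature.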
Quiz 1
Data mining algorithms: Association rules
Motivation and terminology
Example: mining weather data
Basic idea: item sets
Generating item sets and rules efficiently
Correlation analysis
Approximate Association Rule Mining: dealing with missing values and numeric
Experiments: Association rule mining
Reading: Lecture notes - Chapter 6, Witten & Frank - Section 4.5.
Lecture slides: Witten & Frank.
Advanced reading: Approximate Association Rule Mining.
Project 4 - Association mining (due 10/16/2003)
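The item-set idea behind the association rule topics above can be sketched in a few lines of Python. This is a brute-force illustration, not Apriori and not Weka's code: it counts how many instances contain an item set (support) and derives rule confidence from two such counts. The weather rows are an assumed copy of the standard nominal data set, and every name in the code is hypothetical.

```python
from itertools import combinations

NAMES = ["outlook", "temperature", "humidity", "windy", "play"]

# Assumed copy of the nominal weather data used in the lectures.
DATA = """
sunny hot high false no
sunny hot high true no
overcast hot high false yes
rainy mild high false yes
rainy cool normal false yes
rainy cool normal true no
overcast cool normal true yes
sunny mild high false no
sunny cool normal false yes
rainy mild normal false yes
sunny mild normal true yes
overcast mild high true yes
overcast hot normal false yes
rainy mild high true no
"""

# Each instance becomes a set of "attribute=value" items.
TRANSACTIONS = [
    frozenset(f"{name}={value}" for name, value in zip(NAMES, line.split()))
    for line in DATA.split("\n") if line.strip()
]


def support(itemset, transactions):
    """Number of instances that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)


def confidence(antecedent, consequent, transactions):
    """Fraction of instances matching the antecedent that also match the consequent."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)


def frequent_item_sets(transactions, size, min_support):
    """Brute force: test every combination of `size` items against min_support."""
    items = sorted(set().union(*transactions))
    result = {}
    for combo in combinations(items, size):
        count = support(frozenset(combo), transactions)
        if count >= min_support:
            result[frozenset(combo)] = count
    return result


if __name__ == "__main__":
    rule_if = frozenset({"humidity=normal", "windy=false"})
    rule_then = frozenset({"play=yes"})
    print(support(rule_if | rule_then, TRANSACTIONS))
    print(confidence(rule_if, rule_then, TRANSACTIONS))
    print(len(frequent_item_sets(TRANSACTIONS, size=2, min_support=2)))
```

Enumerating every combination is only workable for tiny data sets like this one. Efficient generation builds larger item sets from the smaller ones that already meet the minimum support.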
Data mining algorithms: Classification
Basic learning/mining tasks
Inferring rudimentary rules: OneR algorithm
Decision trees
Covering rules
Experiments with Weka - ZeroR, OneR, J48, Prism.
Reading: Lecture notes - Chapter 7, Witten & Frank - Sections 4.1, 4.3, 4.4.
Lecture slides: Witten & Frank
Project 5 - Classification (due 10/31/2003)
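As a rough illustration of the OneR idea from the classification topics above, the Python sketch below builds a one-attribute rule set for nominal data: for each attribute it predicts the majority class per value, counts the training errors, and keeps the attribute with the fewest errors. It is a simplified stand-in for Weka's OneR (no numeric attributes, no missing values), and the data rows and names are assumptions of this example.

```python
from collections import Counter, defaultdict

ATTRS = ["outlook", "temperature", "humidity", "windy"]

# Assumed copy of the nominal weather data; the last column is the class (play).
DATA = """
sunny hot high false no
sunny hot high true no
overcast hot high false yes
rainy mild high false yes
rainy cool normal false yes
rainy cool normal true no
overcast cool normal true yes
sunny mild high false no
sunny cool normal false yes
rainy mild normal false yes
sunny mild normal true yes
overcast mild high true yes
overcast hot normal false yes
rainy mild high true no
"""
ROWS = [line.split() for line in DATA.split("\n") if line.strip()]


def one_r(rows, attrs):
    """Return (attribute name, {value: predicted class}, training errors)."""
    best = None
    for col, name in enumerate(attrs):
        # Class counts for each value of this attribute.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[col]][row[-1]] += 1
        # Predict the majority class for each value.
        rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        # Every instance outside the majority class of its value is an error.
        errors = sum(sum(c.values()) - max(c.values()) for c in counts.values())
        # Strict "<" means ties go to the attribute listed first.
        if best is None or errors < best[2]:
            best = (name, rules, errors)
    return best


if __name__ == "__main__":
    name, rules, errors = one_r(ROWS, ATTRS)
    print(name, rules, errors)
```

On these rows, outlook and humidity tie at 4 errors out of 14, and this sketch keeps outlook because it is listed first.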
Data mining algorithms: Prediction
Evaluating what's been learned
Quiz 2
Mining real data
Advanced techniques and applications
Reading: Witten & Frank - Chapter
Relational data mining: building relational concepts, using more elaborate background knowledge.
Text mining: extracting attributes (keywords), structural approaches (parsing, soft parsing).
Web Usage Analysis and User Profiling - 1999 SIGKDD
Semantic Web Mining
Semantic Web
Semantic Web Mining
Project 1 - create a data cube
Not available at this time.
Quiz 1
The quiz description will appear here when the quiz is due.
Quiz 2
The quiz description will appear here when the quiz is due.