CS 580 - Data Mining
Fall 2003
Classes: TR, 6:45 pm - 8:00 pm, Maria Sanford Hall 210
Instructor: Dr. Zdravko Markov, MS 203, (860) 832-2711, http://www.cs.ccsu.edu/~markov/,
email: markovz@ccsu.edu
Office hours: TR 8:00 pm - 9:00 pm, MW 5:00 pm - 6:45 pm, or by
appointment
Description: Data Mining studies algorithms and computational
paradigms that allow computers to find patterns and regularities in databases,
perform prediction and forecasting, and generally improve their performance
through interaction with data. It is currently regarded as the key element
of a more general process called Knowledge Discovery that deals
with extracting useful knowledge from raw data. The knowledge discovery
process includes data selection, cleaning, coding, using different statistical,
pattern recognition and machine learning techniques, and reporting and
visualization of the generated structures. The course will cover all these
issues and will illustrate the whole process by examples of practical applications.
Important related technologies such as Data Warehousing and
Online Analytical Processing (OLAP) will also be discussed. Students
will use recent Data Mining software.
Prerequisites: CS 501 and CS 502, basic knowledge of algebra,
discrete math and statistics.
Course Objectives
To introduce students to the basic concepts and techniques of Data Mining.
To develop skills in using recent data mining software to solve practical
problems.
To gain experience in independent study and research.
Required text: Ian H. Witten and Eibe Frank, Data Mining: Practical
Machine Learning Tools and Techniques with Java Implementations, Morgan
Kaufmann, 1999, ISBN 1-55860-552-5. Chapter 8 (Nuts and Bolts: Machine
Learning Algorithms in Java) is available online from the Weka 3 homepage.
Required software: Weka 3 Data Mining System, free machine learning software in Java.
Assignments, projects and grading: There will be 8 projects, which
require independent study and practical work with a data mining system
to solve data mining tasks, and 2 quizzes. The final grade will
be based 80% on projects and 20% on quizzes. Letter grades will be
assigned according to the following table:
Letter grade   Score
A              95-100
A-             90-94
B+             87-89
B              84-86
B-             80-83
C+             77-79
C              74-76
C-             70-73
D+             67-69
D              64-66
D-             60-63
F              0-59
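The 80/20 weighting and the cutoff table above combine as in the minimal sketch below. It assumes that the project and quiz averages are each on a 0-100 scale and that no rounding is applied before the letter is looked up. The syllabus does not say how fractional scores such as 89.5 are handled. The function names `final_score` and `letter_grade` are illustrative only.

```python
# Cutoffs from the grading table, highest first: (minimum score, letter).
CUTOFFS = [
    (95, "A"), (90, "A-"), (87, "B+"), (84, "B"), (80, "B-"), (77, "C+"),
    (74, "C"), (70, "C-"), (67, "D+"), (64, "D"), (60, "D-"), (0, "F"),
]


def final_score(project_avg: float, quiz_avg: float) -> float:
    """Weighted course score: 80% projects, 20% quizzes (assumed 0-100 scales)."""
    return 0.8 * project_avg + 0.2 * quiz_avg


def letter_grade(score: float) -> str:
    """Return the first letter whose minimum the score reaches (no rounding assumed)."""
    for minimum, letter in CUTOFFS:
        if score >= minimum:
            return letter
    return "F"


if __name__ == "__main__":
    s = final_score(90, 80)  # 0.8 * 90 + 0.2 * 80
    print(s, letter_grade(s))
```

For example, a 90 project average and an 80 quiz average give 72 + 16 = 88, which falls in the B+ band.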
Web resources

Data Mining

Machine Learning
Tentative Schedule (don't print everything, keep track of the updates)

Introduction

Topics:

What is data mining?

Related technologies: Machine Learning, DBMS, OLAP, Statistics

Data Mining Goals

Stages of the Data Mining Process

Data Mining Techniques

Knowledge Representation Methods

Applications

Example: weather data

Reading: Lecture notes - Chapter 1; Witten & Frank - Chapter 1; KDnuggets news article
"RE: Statisticians vs Data Miners".

Lecture slides: Witten & Frank. Examples of Data Mining Systems: Weka 3, DBMiner.

Data Warehouse and OLAP

Data preprocessing

Data mining knowledge representation

Attribute-oriented analysis

Topics:

Attribute generalization

Attribute relevance

Class comparison

Statistical measures

Experiments: Attribute Selection with Weka (instance-based: ReliefF;
entropy-based: InfoGain, GainRatio)

Reading: Lecture notes - Chapter 5; Witten & Frank - Section 7.1.
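The entropy-based measures named in the experiments (InfoGain and GainRatio) can be computed from their standard definitions, as in the sketch below. It uses the 14-instance nominal weather data from Witten & Frank, Chapter 1. This is an illustration of the formulas and not Weka's own evaluator code. The helper names `entropy`, `info_gain` and `gain_ratio` are hypothetical.

```python
from collections import Counter
from math import log2

# Nominal weather data (Witten & Frank, Ch. 1); the last column is the class "play".
ATTRS = ["outlook", "temperature", "humidity", "windy"]
DATA = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]


def entropy(values):
    """Entropy in bits of a list of symbols."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def info_gain(data, i):
    """Class entropy minus the weighted class entropy after splitting on attribute i."""
    groups = {}
    for row in data:
        groups.setdefault(row[i], []).append(row[-1])
    remainder = sum(len(g) / len(data) * entropy(g) for g in groups.values())
    return entropy([row[-1] for row in data]) - remainder


def gain_ratio(data, i):
    """Info gain divided by the split information (entropy of attribute i's values)."""
    split_info = entropy([row[i] for row in data])
    return info_gain(data, i) / split_info if split_info > 0 else 0.0


if __name__ == "__main__":
    for i, name in enumerate(ATTRS):
        print(f"{name:12s} gain={info_gain(DATA, i):.3f} ratio={gain_ratio(DATA, i):.3f}")
```

On this data, outlook has the highest information gain (about 0.247 bits), followed by humidity, windy and temperature. Gain ratio penalizes attributes with many values by dividing by the split information.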

Quiz 1

Data mining algorithms: Association rules

Topics:

Motivation and terminology

Example: mining weather data

Basic idea: item sets

Generating item sets and rules efficiently

Correlation analysis

Approximate Association Rule Mining: dealing with missing values and numeric
data

Experiments: Association rule mining

Reading: Lecture notes - Chapter 6; Witten & Frank - Section 4.5.

Lecture slides: Witten & Frank.

Advanced reading: Approximate
Association Rule Mining.

Project 4 - Association mining (due 10/16/2003)
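The basic idea of item sets, and of generating them and rules efficiently, can be sketched as a level-wise (Apriori-style) search. Item sets of size k are built only from frequent sets of size k-1. Any candidate with an infrequent subset is pruned before its support is counted. Rules are then read off each frequent set by confidence. The small market-basket TRANSACTIONS below are hypothetical and are not taken from the course. The function names are illustrative.

```python
from itertools import combinations

# Hypothetical market-basket data: each transaction is a set of items.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def apriori(transactions, min_count):
    """Return {frozenset(itemset): support count} for all frequent item sets."""
    counts = {}
    for t in transactions:  # level 1: count single items
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(frequent)
    k = 2
    while frequent:
        # Join frequent (k-1)-sets into k-sets; prune any with an infrequent (k-1)-subset.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in frequent for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Count support for the surviving candidates only.
        frequent = {}
        for cand in candidates:
            c = sum(1 for t in transactions if cand <= t)
            if c >= min_count:
                frequent[cand] = c
        result.update(frequent)
        k += 1
    return result


def association_rules(itemsets, min_conf):
    """Return (lhs, rhs, confidence) for rules lhs -> rhs with confidence >= min_conf."""
    rules = []
    for s, count in itemsets.items():
        for r in range(1, len(s)):
            for lhs in map(frozenset, combinations(s, r)):
                conf = count / itemsets[lhs]  # subsets of frequent sets are frequent
                if conf >= min_conf:
                    rules.append((lhs, s - lhs, conf))
    return rules


if __name__ == "__main__":
    freq = apriori(TRANSACTIONS, min_count=3)
    for lhs, rhs, conf in association_rules(freq, min_conf=0.9):
        print(sorted(lhs), "->", sorted(rhs), f"conf={conf:.2f}")
```

With a minimum support count of 3, {bread, milk, diapers} is counted but rejected (support 2). {bread, diapers, beer} is pruned without counting because {bread, beer} is infrequent. The only rule reaching 90% confidence is beer -> diapers.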

Data mining algorithms: Classification

Topics:

Basic learning/mining tasks

Inferring rudimentary rules: OneR algorithm

Decision trees

Covering rules

Experiments with Weka: ZeroR, OneR, J48, Prism.

Reading: Lecture notes - Chapter 7; Witten & Frank - Sections 4.1, 4.3, 4.4.

Lecture slides: Witten & Frank

Project 5 - Classification (due 10/31/2003)
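The rudimentary-rule idea behind OneR, together with the ZeroR baseline from the experiments, can be illustrated on the nominal weather data as below. The sketch handles nominal attributes only. It assumes two tie-breaking rules: when attributes make the same number of errors, the earlier one wins, and within a value the first-seen majority class wins. It is not Weka's implementation, and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Nominal weather data (Witten & Frank, Ch. 1); the last column is the class "play".
ATTRS = ["outlook", "temperature", "humidity", "windy"]
DATA = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]


def zero_r(data):
    """ZeroR baseline: always predict the majority class."""
    return Counter(row[-1] for row in data).most_common(1)[0][0]


def one_r(data, attrs):
    """OneR: map each value of an attribute to its majority class, then keep
    the attribute whose rules make the fewest errors on the training data."""
    best = None
    for i, name in enumerate(attrs):
        by_value = defaultdict(Counter)  # value -> class counts
        for row in data:
            by_value[row[i]][row[-1]] += 1
        rule = {v: counts.most_common(1)[0][0] for v, counts in by_value.items()}
        errors = sum(sum(c.values()) - c[rule[v]] for v, c in by_value.items())
        if best is None or errors < best[2]:  # strict "<": earlier attribute wins ties
            best = (name, rule, errors)
    return best


if __name__ == "__main__":
    print("ZeroR predicts:", zero_r(DATA))
    name, rule, errors = one_r(DATA, ATTRS)
    print(f"OneR uses {name}: {rule} ({errors}/{len(DATA)} errors)")
```

Here ZeroR always predicts "yes" (9 of 14 instances). Outlook and humidity each make 4 errors, so with the tie-breaking rule above OneR chooses outlook (sunny -> no, overcast -> yes, rainy -> yes).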

Data mining algorithms: Prediction

Evaluating what's been learned

Quiz 2

Mining real data

Clustering

Advanced techniques and applications

Reading: Witten & Frank - Chapter 9.

Relational data mining: building relational concepts, using more elaborate
background knowledge.

Text mining: extracting attributes (keywords), structural approaches (parsing,
soft parsing).

Web Usage Analysis and User Profiling - 1999 SIGKDD Conference

Semantic Web Mining

Semantic Web

Semantic Web Mining
Project 1 - Create a data cube
Not available at this time.
Quiz 1
The quiz description appears here at due time.
Quiz 2
The quiz description appears here at due time.