CS 580 - Data Mining


Classes: TR, 6:45 pm - 8:00 pm, Maria Sanford Hall 210
Instructor: Dr. Zdravko Markov, MS 203, (860)-832-2711, http://www.cs.ccsu.edu/~markov/, e-mail: markovz@ccsu.edu
Office hours: TR 8:00 pm - 9:00 pm, MW 5:00 pm - 6:45 pm, or by appointment

Description: Data Mining studies algorithms and computational paradigms that allow computers to find patterns and regularities in databases, perform prediction and forecasting, and generally improve their performance through interaction with data. It is currently regarded as the key element of a more general process called Knowledge Discovery, which deals with extracting useful knowledge from raw data. The knowledge discovery process includes data selection, cleaning, and coding; the use of statistical, pattern recognition, and machine learning techniques; and reporting and visualization of the generated structures. The course will cover all of these issues and will illustrate the whole process with examples of practical applications. Important related technologies such as Data Warehousing and On-line Analytical Processing (OLAP) will also be discussed. Students will use recent Data Mining software.

Prerequisites: CS 501 and CS 502, basic knowledge of algebra, discrete math and statistics.

Course Objectives

  • To introduce students to the basic concepts and techniques of Data Mining.
  • To develop skills in using recent data mining software to solve practical problems.
  • To gain experience in independent study and research.

    Required text: Ian H. Witten and Eibe Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Morgan Kaufmann, 1999, ISBN 1-55860-552-5. Chapter 8 (Nuts And Bolts: Machine Learning Algorithms In Java) is available online from the Weka 3 homepage.

    Required software: Weka 3 Data Mining System - a free Machine Learning Software in Java.

    Assignments, projects and grading: There will be 2 quizzes and 8 projects. The projects require independent study and practical work with a data mining system to solve data mining tasks. The final grade will be based 80% on projects and 20% on quizzes. Letter grades will be assigned according to the following table:
    Grade  A       A-      B+      B       B-      C+      C       C-      D+      D       D-      F
    Score  95-100  90-94   87-89   84-86   80-83   77-79   74-76   70-73   67-69   64-66   60-63   0-59
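
As an illustration only, the 80%/20% weighting and the letter-grade table can be expressed as a short Python sketch. The function names, the 0-100 scale for individual scores, the equal weighting of items within each category, and the treatment of fractional scores (a score falls into the highest band whose lower bound it reaches) are assumptions made for this example, not stated course policy.

```python
# Lower bound of each letter-grade band, taken from the grading table.
CUTOFFS = [
    (95, "A"), (90, "A-"), (87, "B+"), (84, "B"), (80, "B-"),
    (77, "C+"), (74, "C"), (70, "C-"), (67, "D+"), (64, "D"), (60, "D-"),
]


def final_score(project_scores, quiz_scores):
    """Return the weighted score: 80% project average plus 20% quiz average."""
    # Assumption: every project counts equally, every quiz counts equally,
    # and each individual score is on a 0-100 scale.
    project_avg = sum(project_scores) / len(project_scores)
    quiz_avg = sum(quiz_scores) / len(quiz_scores)
    return 0.8 * project_avg + 0.2 * quiz_avg


def letter_grade(score):
    """Map a 0-100 score to a letter grade using the table's lower bounds."""
    # Assumption: a fractional score belongs to the highest band whose
    # lower bound it reaches; the table itself lists only whole numbers.
    for minimum, letter in CUTOFFS:
        if score >= minimum:
            return letter
    return "F"


if __name__ == "__main__":
    # Hypothetical student: 92 on each of 8 projects, 85 on each of 2 quizzes.
    score = final_score([92] * 8, [85, 85])
    print(round(score, 1), letter_grade(score))
```

With these hypothetical scores the weighted result is 0.8 × 92 + 0.2 × 85 = 90.6, which falls in the 90-94 band (A-).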

    WEB resources

    1. Data Mining
    2. Machine Learning

    Tentative Schedule (do not print everything; check back for updates)

    1. Introduction
    2. Data Warehouse and OLAP
    3. Data preprocessing
    4. Data mining knowledge representation
    5. Attribute-oriented analysis
    6. Quiz 1
    7. Data mining algorithms: Association rules
    8. Data mining algorithms: Classification
    9. Data mining algorithms: Prediction
    10. Evaluating what's been learned
    11. Quiz 2
    12. Mining real data
    13. Clustering
    14. Advanced techniques and applications. Reading: Witten & Frank, Chapter 9.

    Project 1 - create a data cube

    Not available at this time.

    Quiz 1

    The quiz description will be posted here when the quiz is due.

    Quiz 2

    The quiz description will be posted here when the quiz is due.