CS570 - Topics in AI: Machine Learning

Summer/2003 (May 27 - Jun 26)

Classes: MTWR 5:30 pm - 7:30 pm, Frank J. DiLoreto Hall 012
Instructor: Dr. Zdravko Markov, MS 203, (860)-832-2711, http://www.cs.ccsu.edu/~markov/, e-mail: markovz@ccsu.edu
Office hours: MTWR 7:30pm - 8:30pm, or by appointment

Description: One of the many definitions of Machine Learning (ML) is "Any change in a system that allows it to perform better the second time on repetition of the same task or on another task drawn from the same population" (Simon, 1983). Practically, this means developing computer programs that automatically improve their performance through experience. The course covers the basic concepts and techniques of Machine Learning from both theoretical and practical perspectives. The material includes classical ML approaches such as Version Spaces and Decision Trees, newer approaches such as Inductive Logic Programming and the Minimum Description Length (MDL) principle, as well as "hot" topics such as Knowledge Discovery and Data Mining. The students will be able to experiment with implementations of almost all algorithms discussed in class and will use a recent ML system to solve practical problems.

Prerequisites: CS 501 and CS 502, basic knowledge of algebra, discrete math and statistics.

Course Objectives

  • To introduce students to the basic concepts and techniques of Machine Learning.
  • To develop skills of using recent machine learning software for solving practical problems.
  • To gain experience of doing independent study and research.

Required texts: T. Mitchell, Machine Learning, McGraw-Hill, 1997 (referenced in the schedule below as "Mitchell").

Required software: Zprolog (updated on 11/7/2001) - a free DOS window-based Prolog interpreter.

    Assignments and projects: The students will use a set of ML programs written in Prolog and a simple Prolog interpreter to do experiments and to complete their assignments. There will be 5 projects requiring independent study and practical work with the ML programs.

    Grading and grading policy: The final grade will be based 90% on projects and 10% on participation in class discussions. The letter grades will be calculated according to the following table:
     
    A       A-      B+      B       B-      C+      C       C-      D+      D       D-      F
    95-100  90-94   87-89   84-86   80-83   77-79   74-76   70-73   67-69   64-66   60-63   0-59

    Late assignments will be marked one letter grade down for every 3 days they are late. It is expected that all students will conduct themselves in an honest manner and NEVER claim work that is not their own. Violating this policy will result in a substantial grade penalty or a final grade of F.

    WEB resources

    1. Journal of Machine Learning Research
    2. Journal of Machine Learning Gossip (ML humor)
    3. Machine Learning Database Repository at UC Irvine
    4. Machine Learning subject index maintained by the Knowledge Systems Laboratory, Canada
    5. MLnet Online Information Service
    6. ML Information Services maintained by the Austrian Research Institute for Artificial Intelligence (OFAI)
    7. David Aha's list of machine learning resources
    8. Avrim Blum's Machine Learning Page
    9. The AutoClass Project
    10. UCI - Machine Learning information, software and databases
    11. UTCS Machine Learning Research Group
    12. Microsoft Bayesian Network Editor (MSBNx)
    13. Machine Learning Journal Online
    14. Weka 3 -- Machine Learning Software in Java
    15. Journal of AI Research
    16. Journal of Data Mining and Knowledge Discovery
    17. C5/See5
    18. MLC++, A Machine Learning Library in C++
    19. Web->KB project
    20. KD Nuggets Directory
    21. KDNet

    Tentative schedule of classes and assignments
    1. Introduction
    2. Inductive learning
    3. Languages for learning
    4. Project 1: Representing examples and hypotheses. Due date: 6/2/2003
    5. Version space learning
    6. Induction of Decision Trees
    7. Covering strategies
    8. Searching the generalization/specialization graph
    9. Project 2: Basic concept learning methods. Due date: 6/10/2003
    10. Inductive Logic Programming
    11. Evaluating hypotheses
    12. Project 3: Evaluating hypotheses. Due date: 6/16/2003
    13. Bayesian learning
    14. Bayesian Belief Networks
    15. Instance-based learning
    16. Project 4: Prediction. Due date: 6/23/2003
    17. Unsupervised Learning
    18. Project 5: Clustering. Due date: 6/26/2003
    19. Analytical (Explanation-based) learning
    20. Artificial Neural Networks. Reading: Mitchell - Chapter 4. Lecture slides: Mitchell - Chapter 4.
    21. Genetic Algorithms. Reading: Mitchell - Chapter 9. Lecture slides: Mitchell - Chapter 9.
    22. Reinforcement Learning. Reading: Mitchell - Chapter 13. Lecture slides: Mitchell - Chapter 13.
    23. Knowledge Discovery and Data Mining:

    Inferring rudimentary rules

    1. OneR: learns a one-level decision tree, i.e. it generates a set of rules that all test one particular attribute. Basic version (assuming nominal attributes; a code sketch follows this section):

       For each attribute,
         for each value of that attribute, make a rule as follows:
           count how often each class appears,
           find the most frequent class,
           make the rule assign that class to this attribute-value;
         calculate the error rate of the rules.
       Choose the attribute whose rules give the smallest total error.
    2. Example: evaluating the weather attributes
     
    outlook   temperature  humidity  windy  play
    sunny     hot          high      false  no
    sunny     hot          high      true   no
    overcast  hot          high      false  yes
    rainy     mild         high      false  yes
    rainy     cool         normal    false  yes
    rainy     cool         normal    true   no
    overcast  cool         normal    true   yes
    sunny     mild         high      false  no
    sunny     cool         normal    false  yes
    rainy     mild         normal    false  yes
    sunny     mild         normal    true   yes
    overcast  mild         high      true   yes
    overcast  hot          normal    false  yes
    rainy     mild         high      true   no

     
    Attribute    Rules            Errors  Total errors
    outlook      sunny -> no      2/5     4/14
                 overcast -> yes  0/4
                 rainy -> yes     2/5
    temperature  hot -> no        2/4     5/14
                 mild -> yes      2/6
                 cool -> yes      1/4
    humidity     high -> no       3/7     4/14
                 normal -> yes    1/7
    windy        false -> yes     2/8     5/14
                 true -> no       3/6
    3. Dealing with numeric attributes
    4. Discussion of OneR
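
    The following is a minimal OneR sketch in Python. The course software is Prolog-based, so this illustration (including the helper name one_r) is only an assumption for exposition. Run on the 14 weather instances above, it reproduces the total-error column of the table: 4/14 for outlook and humidity, 5/14 for temperature and windy.

    from collections import Counter

    # The 14 weather instances: (outlook, temperature, humidity, windy, play)
    data = [
        ("sunny",    "hot",  "high",   "false", "no"),
        ("sunny",    "hot",  "high",   "true",  "no"),
        ("overcast", "hot",  "high",   "false", "yes"),
        ("rainy",    "mild", "high",   "false", "yes"),
        ("rainy",    "cool", "normal", "false", "yes"),
        ("rainy",    "cool", "normal", "true",  "no"),
        ("overcast", "cool", "normal", "true",  "yes"),
        ("sunny",    "mild", "high",   "false", "no"),
        ("sunny",    "cool", "normal", "false", "yes"),
        ("rainy",    "mild", "normal", "false", "yes"),
        ("sunny",    "mild", "normal", "true",  "yes"),
        ("overcast", "mild", "high",   "true",  "yes"),
        ("overcast", "hot",  "normal", "false", "yes"),
        ("rainy",    "mild", "high",   "true",  "no"),
    ]
    attributes = ["outlook", "temperature", "humidity", "windy"]

    def one_r(data, attributes):
        best = None
        for i, attr in enumerate(attributes):
            rules, errors = {}, 0
            for v in sorted(set(row[i] for row in data)):
                # Predict the most frequent class among instances with this value.
                classes = Counter(row[-1] for row in data if row[i] == v)
                majority, hits = classes.most_common(1)[0]
                rules[v] = majority
                errors += sum(classes.values()) - hits
            print(attr, rules, "total errors:", f"{errors}/{len(data)}")
            # Keep the attribute with the fewest misclassified instances.
            if best is None or errors < best[0]:
                best = (errors, attr, rules)
        return best

    print("chosen attribute:", one_r(data, attributes)[1])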


    Covering algorithms

    1. General strategy: for each class, find a rule set that covers all instances in the class while excluding the instances not in it. This is called a covering approach because at each stage a rule is identified that covers some of the instances.
    2. General to specific rule induction (the PRISM algorithm): start with an empty rule for a class and repeatedly add the attribute-value condition with the highest accuracy p/t until the rule covers only instances of that class; then remove the covered instances and repeat (see the sketch after this list).
    3. Example: covering class "play=yes" in weather data.
    4. Specific to general rule induction: start with a most specific rule built from a seed example and generalize it step by step.
    5. Problems: rule overlapping, rule subsumption.
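
    As a sketch of the general-to-specific covering loop in item 2, here is a hedged PRISM-style implementation in Python; it reuses data and attributes from the OneR sketch above, and the function name prism is an illustrative assumption, not the course's actual code. For class "play=yes" its first rule is outlook=overcast (accuracy 4/4); it then removes the covered instances and continues.

    def prism(data, attributes, target):
        # Learn a rule set (each rule is a list of (attr_index, value)
        # conditions) covering every `target` instance and no others.
        rules, remaining = [], list(data)
        while any(row[-1] == target for row in remaining):
            rule, covered = [], remaining
            # Specialize: add the condition with the best accuracy p/t
            # until the rule covers only instances of the target class.
            while any(row[-1] != target for row in covered):
                best = None
                for i, _ in enumerate(attributes):
                    if any(i == j for j, _ in rule):
                        continue  # use each attribute at most once per rule
                    for v in set(row[i] for row in covered):
                        subset = [row for row in covered if row[i] == v]
                        p = sum(1 for row in subset if row[-1] == target)
                        # Maximize accuracy p/t; break ties by coverage p.
                        if p and (best is None or (p / len(subset), p) > best[0]):
                            best = ((p / len(subset), p), (i, v))
                if best is None:
                    break  # no attributes left to test
                i, v = best[1]
                rule.append((i, v))
                covered = [row for row in covered if row[i] == v]
            rules.append(rule)
            # Remove the instances covered by the finished rule and repeat.
            remaining = [row for row in remaining if row not in covered]
        return rules

    for rule in prism(data, attributes, "yes"):
        print("IF " + " AND ".join(f"{attributes[i]}={v}" for i, v in rule) + " THEN play=yes")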

    Last updated: 6-5-2003