Data mining algorithms: Classification

Basic learning/mining tasks

Supervised learning

  1. Learning from examples, concept learning
  2. Instance-based (lazy) learning: predict the class label of a new example from the training data directly, without building an explicit model such as rules. Popular approaches include k-nearest neighbours (a minimal sketch follows this list).
  3. Regression: learning a function that predicts a numeric class value
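
To make the lazy-learning idea concrete, here is a minimal sketch of a 1-nearest-neighbour classifier. The function names and the Euclidean distance are illustrative choices, not part of these notes:

    import math

    def euclidean(a, b):
        """Straight-line distance between two numeric attribute vectors."""
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def nearest_neighbor(train, query, distance=euclidean):
        """Predict the label of the training example closest to `query`.
        No explicit model is built: the training data itself is searched
        at prediction time, which is why the method is called lazy."""
        _, label = min(train, key=lambda ex: distance(ex[0], query))
        return label

    # Two numeric attributes, two classes:
    train = [((1.0, 1.0), "yes"), ((2.0, 1.5), "yes"), ((5.0, 4.0), "no")]
    print(nearest_neighbor(train, (1.4, 0.9)))  # -> yes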

Unsupervised learning

No class labels are given; the task is finding common patterns and grouping similar examples.

Inferring rudimentary rules

  1. OneR: learns a one-level decision tree, i.e. generates a set of rules that all test one particular attribute. Basic version (assuming nominal attributes): for each attribute, make one rule per value that assigns the value's most frequent class; compute each rule set's error rate; choose the attribute whose rules make the fewest errors. A minimal code sketch follows at the end of this subsection.
  2. Example: evaluating the weather attributes
 
outlook   temperature  humidity  windy  play
sunny     hot          high      false  no
sunny     hot          high      true   no
overcast  hot          high      false  yes
rainy     mild         high      false  yes
rainy     cool         normal    false  yes
rainy     cool         normal    true   no
overcast  cool         normal    true   yes
sunny     mild         high      false  no
sunny     cool         normal    false  yes
rainy     mild         normal    false  yes
sunny     mild         normal    true   yes
overcast  mild         high      true   yes
overcast  hot          normal    false  yes
rainy     mild         high      true   no

 
Attribute     Rules             Errors   Total errors
outlook       sunny -> no       2/5
              overcast -> yes   0/4
              rainy -> yes      2/5      4/14
temperature   hot -> no         2/4
              mild -> yes       2/6
              cool -> yes       1/4      5/14
humidity      high -> no        3/7
              normal -> yes     1/7      4/14
windy         false -> yes      2/8
              true -> no        3/5      5/14
  3. Dealing with numeric attributes: discretize them, e.g. by partitioning the value range into intervals.
  4. Discussion of OneR: despite its simplicity, OneR often performs surprisingly well; very simple rules are frequently competitive on real datasets.
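
A minimal sketch of the basic OneR procedure described above, run on the weather data from the table. Ties between classes are broken by first occurrence, so rule choices on ties may differ from other implementations:

    from collections import Counter

    # Weather data (outlook, temperature, humidity, windy, play) from the table above.
    data = [tuple(r.split()) for r in """\
    sunny hot high false no
    sunny hot high true no
    overcast hot high false yes
    rainy mild high false yes
    rainy cool normal false yes
    rainy cool normal true no
    overcast cool normal true yes
    sunny mild high false no
    sunny cool normal false yes
    rainy mild normal false yes
    sunny mild normal true yes
    overcast mild high true yes
    overcast hot normal false yes
    rainy mild high true no""".splitlines()]
    attributes = ["outlook", "temperature", "humidity", "windy"]

    def one_r(data, attributes):
        """For each attribute build one rule per value (most frequent class),
        count the rule set's errors, and keep the attribute with the fewest."""
        best = None
        for i, name in enumerate(attributes):
            by_value = {}
            for row in data:
                by_value.setdefault(row[i], Counter())[row[-1]] += 1
            rules = {v: counts.most_common(1)[0][0] for v, counts in by_value.items()}
            errors = sum(row[-1] != rules[row[i]] for row in data)
            if best is None or errors < best[2]:
                best = (name, rules, errors)
        return best

    name, rules, errors = one_r(data, attributes)
    print(name, rules, f"errors: {errors}/{len(data)}")
    # -> outlook {'sunny': 'no', 'overcast': 'yes', 'rainy': 'yes'} errors: 4/14
    # humidity also makes 4/14 errors; outlook wins because it is tried first.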

Decision tree learning

  1. Type of learning: supervised, concept learning, divide-and-conquer strategy.
  2. Strategies for concept learning:
  3. Top-down induction of decision trees (TDIDT, an old approach known from pattern recognition):
  4. A criterion for attribute selection: information gain (a worked sketch follows this list).
  5. Highly branching attributes (those with a large number of values) are favoured by information gain.
  6. The gain ratio: a modification of the information gain that reduces this bias.
  7. Decision tree pruning: avoiding overfitting (overspecialization) and fragmentation.
  8. Generating rules from decision trees
  9. Discussion
    1. The basic ideas of TDIDT were developed in the 1960s (CLS, 1966).
    2. The algorithm for top-down induction of decision trees using information gain for attribute selection (ID3) was developed by Ross Quinlan (1981).
    3. The gain ratio and other modifications and improvements led to the development of C4.5, which can deal with numeric attributes, missing values, and noisy data, and can also extract rules from the tree; it is one of the best concept learners.
    4. There are many other attribute selection criteria, but they make almost no difference to the accuracy of the results.
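
To make the attribute-selection criterion concrete, a small self-contained sketch computing the information gain and gain ratio of an attribute on the weather data. The function names are illustrative; the printed values match the standard result for outlook:

    import math
    from collections import Counter

    # Weather data (outlook, temperature, humidity, windy, play), as in the table above.
    data = [tuple(r.split()) for r in """\
    sunny hot high false no
    sunny hot high true no
    overcast hot high false yes
    rainy mild high false yes
    rainy cool normal false yes
    rainy cool normal true no
    overcast cool normal true yes
    sunny mild high false no
    sunny cool normal false yes
    rainy mild normal false yes
    sunny mild normal true yes
    overcast mild high true yes
    overcast hot normal false yes
    rainy mild high true no""".splitlines()]

    def entropy(labels):
        """info(...) in bits: -sum p * log2(p) over the class proportions."""
        n = len(labels)
        return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

    def gain_and_ratio(rows, attr_index):
        n = len(rows)
        # Partition the class labels by the attribute's values.
        parts = {}
        for r in rows:
            parts.setdefault(r[attr_index], []).append(r[-1])
        # Information gain: entropy before the split minus the
        # weighted entropy of the subsets after the split.
        gain = entropy([r[-1] for r in rows]) - sum(
            len(p) / n * entropy(p) for p in parts.values())
        # Split information: entropy of the partition sizes themselves;
        # dividing by it penalizes highly branching attributes.
        split_info = -sum(len(p) / n * math.log2(len(p) / n) for p in parts.values())
        return gain, gain / split_info

    gain, ratio = gain_and_ratio(data, 0)  # attribute 0 = outlook
    print(f"gain(outlook) = {gain:.3f}, gain ratio = {ratio:.3f}")
    # -> gain(outlook) = 0.247, gain ratio = 0.156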

Covering algorithms

  1. General strategy: for each class, find a rule set that covers all instances of that class while excluding instances of the other classes. This is called a covering approach because at each stage a rule is identified that covers some of the instances.
  2. General-to-specific rule induction (the PRISM algorithm): a minimal sketch follows this list.
  3. Example: covering the class "play = yes" in the weather data.
  4. Specific-to-general rule induction.
  5. Problems: overlapping rules and rule subsumption.
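
A minimal sketch of PRISM's general-to-specific loop for a single class, run on the weather data for "play = yes". Tie-breaking (by accuracy p/t, then by coverage p) is an assumption; the exact rules found on ties may vary, but the first rule is the classic "If outlook = overcast then yes":

    # Weather data (outlook, temperature, humidity, windy, play) again.
    data = [tuple(r.split()) for r in """\
    sunny hot high false no
    sunny hot high true no
    overcast hot high false yes
    rainy mild high false yes
    rainy cool normal false yes
    rainy cool normal true no
    overcast cool normal true yes
    sunny mild high false no
    sunny cool normal false yes
    rainy mild normal false yes
    sunny mild normal true yes
    overcast mild high true yes
    overcast hot normal false yes
    rainy mild high true no""".splitlines()]
    attributes = ["outlook", "temperature", "humidity", "windy"]

    def prism(rows, attributes, target):
        """Learn a rule set covering every instance of class `target`.
        Each rule is grown general-to-specific: keep adding the
        attribute = value test with the highest accuracy p/t on the
        instances still covered, until the rule covers only `target`."""
        rules, remaining = [], list(rows)
        while any(r[-1] == target for r in remaining):
            tests, covered = [], remaining
            while any(r[-1] != target for r in covered):
                best = None
                for i in range(len(attributes)):
                    if any(t[0] == i for t in tests):
                        continue  # each attribute tested at most once per rule
                    for value in {r[i] for r in covered}:
                        subset = [r for r in covered if r[i] == value]
                        p = sum(r[-1] == target for r in subset)
                        key = (p / len(subset), p)  # accuracy first, then coverage
                        if best is None or key > best[0]:
                            best = (key, i, value)
                _, i, value = best
                tests.append((i, value))
                covered = [r for r in covered if r[i] == value]
            rules.append(tests)
            # Remove the instances the finished rule covers, start a new rule.
            remaining = [r for r in remaining if r not in covered]
        return rules

    for rule in prism(data, attributes, "yes"):
        print(" AND ".join(f"{attributes[i]} = {v}" for i, v in rule), "-> yes")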