Machine Learning - Spring 2004
==============================
Lab experiments 8
-----------------
Program: bayes.pl
Data: loandata.pl, loandat2.pl, animals.pl
------------------------------------------
I. Simple (Naive) Bayes algorithm: basic procedures
===================================================
cond_prob(E,Class,LP) - generates a list LP of conditional probabilities of
each attribute-value pair in E, given Class.
class_prob(Class,CP) - calculates the probability of Class
(proportion of examples in Class w.r.t. the whole data set).
probs(E,LP) - generates likelihoods of E belonging to each class.
mult(LP,P) - multiplies the probabilities in LP.
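To make the procedures concrete, here is a hypothetical Python re-implementation (names and data layout are our own, not part of bayes.pl). The examples are the loandata examples exactly as listed later in this handout (training examples 1-4, 6-8, 11 and test examples 5, 9, 10, 12).

```python
# Hypothetical Python sketch of the bayes.pl procedures.
# Data reconstructed from the loandata listings in this handout.
EXAMPLES = [
    (1,  "approve", {"emp": "yes", "buy": "comp", "sex": "f", "married": "no"}),
    (2,  "reject",  {"emp": "no",  "buy": "comp", "sex": "f", "married": "yes"}),
    (3,  "approve", {"emp": "yes", "buy": "comp", "sex": "m", "married": "no"}),
    (4,  "approve", {"emp": "yes", "buy": "car",  "sex": "f", "married": "yes"}),
    (5,  "reject",  {"emp": "yes", "buy": "car",  "sex": "f", "married": "no"}),
    (6,  "approve", {"emp": "yes", "buy": "comp", "sex": "f", "married": "yes"}),
    (7,  "approve", {"emp": "yes", "buy": "comp", "sex": "f", "married": "no"}),
    (8,  "approve", {"emp": "yes", "buy": "comp", "sex": "m", "married": "no"}),
    (9,  "approve", {"emp": "yes", "buy": "comp", "sex": "m", "married": "yes"}),
    (10, "approve", {"emp": "yes", "buy": "comp", "sex": "m", "married": "yes"}),
    (11, "reject",  {"emp": "no",  "buy": "comp", "sex": "m", "married": "yes"}),
    (12, "reject",  {"emp": "no",  "buy": "car",  "sex": "f", "married": "yes"}),
]

def class_prob(cls, examples=EXAMPLES):
    """Proportion of examples in cls w.r.t. the whole data set."""
    return sum(1 for _, c, _ in examples if c == cls) / len(examples)

def cond_prob(e, cls, examples=EXAMPLES):
    """List of P(attr=val | cls) for each attribute-value pair in e."""
    in_class = [attrs for _, c, attrs in examples if c == cls]
    return [sum(1 for attrs in in_class if attrs[a] == v) / len(in_class)
            for a, v in e.items()]

def mult(lp):
    """Product of the probabilities in lp."""
    p = 1.0
    for x in lp:
        p *= x
    return p

def probs(e, examples=EXAMPLES):
    """Likelihood of e belonging to each class: P(e|class)*P(class)."""
    classes = sorted({c for _, c, _ in examples})
    return {c: mult(cond_prob(e, c, examples)) * class_prob(c, examples)
            for c in classes}
```

Running `probs` on example 1 reproduces the numbers shown in the queries below.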
II. Simple (Naive) Bayes algorithm: experiments with the basic procedures
=========================================================================
?- ['c:/prolog/bayes.pl']. % load program
?- ['c:/prolog/loandata.pl']. % load data set
?- example(N,C,E),cond_prob(E,approve,LP),mult(LP,PP),class_prob(approve,CP),PN is PP*CP.
N = 1
C = approve
E = [emp=yes, buy=comp, sex=f, married=no]
LP = [ (emp=yes)/1, (buy=comp)/0.875, (sex=f)/0.5, (married=no)/0.5]
PP = 0.21875
CP = 0.666667
PN = 0.145833
Here is what this query does:
1. cond_prob(E,approve,LP) calculates the conditional probabilities of each attribute value
in example 1, given class "approve". This means the following:
P(emp=yes|approve) = 1
P(buy=comp|approve) = 0.875
P(sex=f|approve) = 0.5
P(married=no|approve) = 0.5
2. mult(LP,PP) computes PP = 1*0.875*0.5*0.5 = 0.21875
3. class_prob(approve,CP) computes the prior probability of approve, P(approve) = 0.666667
4. Finally, PN = 0.145833 is the likelihood of example 1 belonging to class approve, i.e.
the conditional probability of the example given the class times the class prior:
P(emp=yes,buy=comp,sex=f,married=no|approve)*P(approve) = 0.21875*0.666667 = 0.145833
Let's compute now the likelihood of the alternative - example 1 belonging to class reject.
?- example(N,C,E),cond_prob(E,reject,LP),mult(LP,PP),class_prob(reject,CP),PN is PP*CP.
N = 1
C = approve
E = [emp=yes, buy=comp, sex=f, married=no]
LP = [ (emp=yes)/0.25, (buy=comp)/0.5, (sex=f)/0.75, (married=no)/0.25]
PP = 0.0234375
CP = 0.333333
PN = 0.0078125
So, the likelihood of the alternative - example 1 belonging to class reject - is 0.0078125,
that is:
P(emp=yes,buy=comp,sex=f,married=no|reject)*P(reject) = 0.0234375*0.333333 = 0.0078125
Thus, because of its higher likelihood the predicted classification of example 1 is approve,
which is the same as its actual class.
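The decision rule is simply an argmax over the two likelihoods. A minimal sketch, using the values printed by the queries above:

```python
# Likelihoods of example 1 for each class, as computed by the queries above
pn = {"approve": 0.145833, "reject": 0.0078125}

# Naive Bayes predicts the class with the higher likelihood
predicted = max(pn, key=pn.get)
```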
We can get both likelihoods with one query:
?- example(N,C,E),probs(E,P).
N = 1
C = approve
E = [emp=yes, buy=comp, sex=f, married=no]
P = [approve/0.145833, reject/0.0078125]
If we want probabilities instead of likelihoods we can apply normalization:
?- example(N,C,E),probs(E,[C1/L1,C2/L2]),P1 is L1/(L1+L2),P2 is L2/(L1+L2).
N = 1
C = approve
E = [emp=yes, buy=comp, sex=f, married=no]
C1 = approve
L1 = 0.145833
C2 = reject
L2 = 0.0078125
P1 = 0.949153
P2 = 0.050847
Now P1 and P2 are probabilities (P1+P2=1).
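The same normalization step, sketched in Python with the likelihoods from the query above:

```python
# Likelihoods from the probs/2 query above
l_approve, l_reject = 0.145833, 0.0078125

# Normalize so the two values sum to 1
total = l_approve + l_reject
p_approve = l_approve / total
p_reject = l_reject / total
```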
We can also get directly the classification according to the Naive Bayes procedure:
?- example(N,C,E),bayes(E,Predicted).
N = 1
C = approve
E = [emp=yes, buy=comp, sex=f, married=no]
Predicted = approve
III. Simple (Naive) Bayes algorithm: LOO testing
================================================
Note that the following line should be included in the data file to
allow dynamic modification of the examples in the Prolog database.
:-dynamic(example/3).
The following query does the LOO-CV:
?- retract(example(N,Actual,E)),bayes(E,Predicted),assert(example(N,Actual,E)),write(N-Actual-Predicted),nl,fail.
1-approve-approve
2-reject-reject
3-approve-approve
4-approve-reject
5-reject-approve
6-approve-approve
7-approve-approve
8-approve-approve
9-approve-approve
10-approve-approve
11-reject-reject
12-reject-reject
For 10 examples the predicted and actual classes are the same. For examples 4 and 5
however the predicted and actual classes are different. Thus the LOO-CV error
is 2/12 = 0.167.
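The error computation itself is just a count of mismatches over the printed pairs; a small sketch:

```python
# Actual/predicted pairs as printed by the LOO query above
results = {1: ("approve", "approve"), 2: ("reject", "reject"),
           3: ("approve", "approve"), 4: ("approve", "reject"),
           5: ("reject", "approve"),  6: ("approve", "approve"),
           7: ("approve", "approve"), 8: ("approve", "approve"),
           9: ("approve", "approve"), 10: ("approve", "approve"),
           11: ("reject", "reject"),  12: ("reject", "reject")}

errors = sum(actual != predicted for actual, predicted in results.values())
loo_error = errors / len(results)   # 2 mismatches out of 12
```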
Interestingly, even without removing the examples we get the same error.
?- example(N,Actual,E),bayes(E,Predicted),write(N-Actual-Predicted),nl,fail.
1-approve-approve
2-reject-reject
3-approve-approve
4-approve-reject
5-reject-approve
6-approve-approve
7-approve-approve
8-approve-approve
9-approve-approve
10-approve-approve
11-reject-reject
12-reject-reject
Here are some more details about the misclassified examples 4 and 5:
?- example(4,C,E),probs(E,P).
C = approve
E = [emp=yes, buy=car, sex=f, married=yes]
P = [approve/0.0208333, reject/0.0234375]
Yes
?- example(5,C,E),probs(E,P).
C = reject
E = [emp=yes, buy=car, sex=f, married=no]
P = [approve/0.0208333, reject/0.0078125]
Obviously, these examples do not agree with the rest of the data
and an interesting question is how to fix them. Generally there are two solutions.
The first one (the easiest) would be to change their classifications as predicted
by Naive Bayes. The other one would be to change some of the attribute values.
In the latter case we would like to know which attributes would affect the
classification most. To investigate this we can try classifying
examples with single attribute values (i.e. examples with missing values).
Naive Bayes can easily handle this. For example:
?- probs([emp=yes],P).
P = [approve/0.666667, reject/0.0833333]
?- probs([sex=f],P).
P = [approve/0.333333, reject/0.25]
?- probs([married=yes],P).
P = [approve/0.25, reject/0.333333]
These results suggest that [emp=yes] is a good feature to classify examples as approve.
But [sex=f] and [married=yes] do not discriminate the classes well - the likelihoods
of "approve" and "reject" are too close. Thus we may conclude that [emp=yes] is enough
to classify an example as approve (something we've seen in other experiments too);
however, with [sex=f] or [married=yes] we need more evidence (attribute-value pairs)
to make a decision.
Here is another interesting result:
?- probs([emp=no],P).
P = [approve/0, reject/0.25]
Can we conclude that [emp=no] is enough to decide for reject?
Why do we get the zero probability and can we use it for classification?
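The zero arises because no "approve" example has emp=no, and a single zero factor wipes out the whole product regardless of the other attribute values. One standard remedy - not implemented in bayes.pl, shown here only as a hedged sketch - is the Laplace (add-one) estimate:

```python
def laplace(count, total, n_values):
    """Add-one (Laplace) estimate of a conditional probability.

    count    - examples in the class with this attribute value
    total    - examples in the class
    n_values - number of possible values of the attribute
    Never returns 0, so one unseen value cannot zero out the product.
    """
    return (count + 1) / (total + n_values)

# P(emp=no|approve): 0 of 8 approved examples, and emp has 2 values (yes/no)
p = laplace(0, 8, 2)   # 1/10 instead of 0
```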
IV. Simple (Naive) Bayes algorithm: holdout testing
===================================================
There are two basic differences between Naive Bayes (NB) and the concept learning
algorithms (such as id3, lgg or search):
1. NB does not create a hypothesis; it just predicts the class value.
2. To make the prediction NB needs all examples.
The second difference affects the way holdout testing works with Naive Bayes.
The thing is that when testing we need the training examples too.
One solution that works for our Prolog data files is to change the name of the
examples to be used for testing. This is done in the loandat2.pl data file.
The training set still uses the name "example", but the examples from the test set
are renamed to "test".
?- ['c:/prolog/loandat2.pl']. % load both training and test data sets
Let's see what we have in the Prolog database now:
?- listing(example).
example(1, approve, [emp=yes, buy=comp, sex=f, married=no]).
example(2, reject, [emp=no, buy=comp, sex=f, married=yes]).
example(3, approve, [emp=yes, buy=comp, sex=m, married=no]).
example(4, approve, [emp=yes, buy=car, sex=f, married=yes]).
example(6, approve, [emp=yes, buy=comp, sex=f, married=yes]).
example(7, approve, [emp=yes, buy=comp, sex=f, married=no]).
example(8, approve, [emp=yes, buy=comp, sex=m, married=no]).
example(11, reject, [emp=no, buy=comp, sex=m, married=yes]).
?- listing(test).
test(5, reject, [emp=yes, buy=car, sex=f, married=no]).
test(9, approve, [emp=yes, buy=comp, sex=m, married=yes]).
test(10, approve, [emp=yes, buy=comp, sex=m, married=yes]).
test(12, reject, [emp=no, buy=car, sex=f, married=yes]).
So, our holdout split is:
- training set: [1,2,3,4,6,7,8,11]
- test set: [5,9,10,12]
Now we can start testing directly, because we don't have a learning step here
(as we do in the concept learning setting).
?- test(N,C,E),bayes(E,Class),write(N-C-Class),nl,fail.
5-reject-approve
9-approve-approve
10-approve-approve
12-reject-reject
For example 5 the actual and predicted classes differ, so the holdout
error for this split is 1/4 = 0.25.
Note that what actually happens here is that the probabilities are computed
by using the examples stored as "example" (NB uses them through "bayes(E,Class)")
and the test examples are provided by accessing the "test" structures directly
from the query.
V. Simple (Naive) Bayes algorithm: LOO-CV testing with the animals data
=======================================================================
First include the dynamic declaration in your animals.pl file.
:-dynamic(example/3).
Then load the animals.pl data.
?- ['c:/prolog/animals.pl']. % load data set
?- retract(example(N,Actual,E)),bayes(E,Predicted),assert(example(N,Actual,E)),write(N-Actual-Predicted),nl,fail.
1-mammal-reptile
2-mammal-reptile
3-mammal-reptile
4-mammal-reptile
5-fish-reptile
6-reptile-reptile
7-reptile-reptile
8-bird-reptile
9-bird-reptile
10-amphibian-reptile
The situation here seems very different from what we've seen before.
The LOO-CV error is very high: 80%. Why is that?
We may get a clue if we try it without removing examples (as we did with the loandata).
?- example(N,Actual,E),bayes(E,Predicted),write(N-Actual-Predicted),nl,fail.
1-mammal-mammal
2-mammal-mammal
3-mammal-mammal
4-mammal-mammal
5-fish-fish
6-reptile-reptile
7-reptile-reptile
8-bird-bird
9-bird-bird
10-amphibian-amphibian
Now the situation changes - the error is 0! (Strictly speaking this is the
resubstitution error, computed on the training examples, not LOO-CV.)
Look at the probabilities to find out why all this happens.