Machine Learning - Spring 2004
==============================

Lab experiments 8
-----------------

Program: bayes.pl
Data: loandata.pl, loandat2.pl, animals.pl
------------------------------------------

I. Simple (Naive) Bayes algorithm: basic procedures
===================================================

cond_prob(E,Class,LP) - generates a list LP of the conditional probabilities of each
                        attribute-value pair in E, given Class.
class_prob(Class,CP)  - calculates the probability of Class (the proportion of examples
                        in Class w.r.t. the whole data set).
probs(E,LP)           - generates the likelihoods of E belonging to each class.
mult(LP,P)            - multiplies the probabilities in LP.

II. Simple (Naive) Bayes algorithm: experiments with the basic procedures
=========================================================================

?- ['c:/prolog/bayes.pl'].     % load program
?- ['c:/prolog/loandata.pl'].  % load data set

?- example(N,C,E),cond_prob(E,approve,LP),mult(LP,PP),class_prob(approve,CP),PN is PP*CP.

N = 1
C = approve
E = [emp=yes, buy=comp, sex=f, married=no]
LP = [ (emp=yes)/1, (buy=comp)/0.875, (sex=f)/0.5, (married=no)/0.5]
PP = 0.21875
CP = 0.666667
PN = 0.145833

Here is what this query does:

1. cond_prob(E,approve,LP) calculates the conditional probability of each attribute
   value in example 1, given class "approve". This means the following:

   P(emp=yes|approve)    = 1
   P(buy=comp|approve)   = 0.875
   P(sex=f|approve)      = 0.5
   P(married=no|approve) = 0.5

2. mult(LP,PP) computes PP = 1*0.875*0.5*0.5 = 0.21875

3. class_prob(approve,CP) computes the prior probability of approve, P(approve) = 0.666667

4. Finally, PN = 0.145833 is the likelihood of example 1 belonging to class approve.
   That is:

   P(emp=yes,buy=comp,sex=f,married=no|approve) * P(approve) = 0.145833

Let's now compute the likelihood of the alternative - example 1 belonging to class reject.

?- example(N,C,E),cond_prob(E,reject,LP),mult(LP,PP),class_prob(reject,CP),PN is PP*CP.

N = 1
C = approve
E = [emp=yes, buy=comp, sex=f, married=no]
LP = [ (emp=yes)/0.25, (buy=comp)/0.5, (sex=f)/0.75, (married=no)/0.25]
PP = 0.0234375
CP = 0.333333
PN = 0.0078125

So, the likelihood of the alternative - example 1 belonging to class reject - is
0.0078125, which means that

   P(emp=yes,buy=comp,sex=f,married=no|reject) * P(reject) = 0.0078125

Thus, because of its higher likelihood, the predicted classification of example 1 is
approve, which is the same as its actual class.

We can get both likelihoods with one query:

?- example(N,C,E),probs(E,P).

N = 1
C = approve
E = [emp=yes, buy=comp, sex=f, married=no]
P = [approve/0.145833, reject/0.0078125]

If we want probabilities instead of likelihoods, we can apply normalization:

?- example(N,C,E),probs(E,[C1/L1,C2/L2]),P1 is L1/(L1+L2),P2 is L2/(L1+L2).

N = 1
C = approve
E = [emp=yes, buy=comp, sex=f, married=no]
C1 = approve
L1 = 0.145833
C2 = reject
L2 = 0.0078125
P1 = 0.949153
P2 = 0.050847

Now P1 and P2 are probabilities (P1+P2=1).

We can also get the classification directly, according to the Naive Bayes procedure:

?- example(N,C,E),bayes(E,Predicted).

N = 1
C = approve
E = [emp=yes, buy=comp, sex=f, married=no]
Predicted = approve
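To see how these pieces fit together, here is one possible way probs/2 and bayes/2 could
be defined on top of cond_prob/3, class_prob/2 and mult/2. This is only a sketch - the
actual definitions in bayes.pl may well be different - and classes/1, likelihood/3 and
best_class/3 are helper names made up for the illustration.

% classes(-Cs): collect all class values that occur in the data.
classes(Cs) :-
    setof(C, N^E^example(N,C,E), Cs).

% likelihood(+E,+Class,-L): L = P(Class) * product of P(Ai=Vi|Class).
likelihood(E, Class, L) :-
    cond_prob(E, Class, LP),
    mult(LP, PP),
    class_prob(Class, CP),
    L is PP*CP.

% probs(+E,-Ps): the likelihood of E for every class, as Class/L pairs.
probs(E, Ps) :-
    classes(Cs),
    findall(Class/L, (member(Class,Cs), likelihood(E,Class,L)), Ps).

% bayes(+E,-Predicted): the class with the highest likelihood.
bayes(E, Predicted) :-
    probs(E, [C/L|Rest]),
    best_class(Rest, C/L, Predicted).

best_class([], Best/_, Best).
best_class([C/L|Rest], _/L0, Best) :- L > L0, !, best_class(Rest, C/L, Best).
best_class([_|Rest], Acc, Best) :- best_class(Rest, Acc, Best).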
III. Simple (Naive) Bayes algorithm: LOO testing
================================================

Note that the following line should be included in the data file to allow dynamic
modification of the examples in the Prolog database:

:-dynamic(example/3).

The following query does the LOO-CV:

?- retract(example(N,Actual,E)),bayes(E,Predicted),assert(example(N,Actual,E)),write(N-Actual-Predicted),nl,fail.

1-approve-approve
2-reject-reject
3-approve-approve
4-approve-reject
5-reject-approve
6-approve-approve
7-approve-approve
8-approve-approve
9-approve-approve
10-approve-approve
11-reject-reject
12-reject-reject

For 10 examples the predicted and actual classes are the same. For examples 4 and 5,
however, the predicted and actual classes differ. Thus the LOO-CV error is 2/12 = 0.167.
(A small wrapper that automates this loop and the error count is sketched at the end of
this section.)

Interestingly, even without removing the examples we get the same error:

?- example(N,Actual,E),bayes(E,Predicted),write(N-Actual-Predicted),nl,fail.

1-approve-approve
2-reject-reject
3-approve-approve
4-approve-reject
5-reject-approve
6-approve-approve
7-approve-approve
8-approve-approve
9-approve-approve
10-approve-approve
11-reject-reject
12-reject-reject

Here are some more details about the misclassified examples 4 and 5:

?- example(4,C,E),probs(E,P).

C = approve
E = [emp=yes, buy=car, sex=f, married=yes]
P = [approve/0.0208333, reject/0.0234375]

Yes

?- example(5,C,E),probs(E,P).

C = reject
E = [emp=yes, buy=car, sex=f, married=no]
P = [approve/0.0208333, reject/0.0078125]

Obviously, these examples do not agree with the rest of the data, and an interesting
question is how to fix them. Generally there are two solutions. The first (and easiest)
would be to change their classifications to the ones predicted by Naive Bayes. The other
would be to change some of their attribute values. In the latter case we would like to
know which attributes affect the classification most. To investigate this we can try
classifying examples with single attribute values (i.e. examples with missing values).
Naive Bayes can handle this easily. For example:

?- probs([emp=yes],P).
P = [approve/0.666667, reject/0.0833333]

?- probs([sex=f],P).
P = [approve/0.333333, reject/0.25]

?- probs([married=yes],P).
P = [approve/0.25, reject/0.333333]

These results suggest that [emp=yes] is a good feature for classifying examples as
approve. However, [sex=f] and [married=yes] do not discriminate the class well - the
likelihoods of "approve" and "reject" are too close. Thus we may conclude that [emp=yes]
is enough to classify an example as approve (something we've seen in other experiments
too), while with [sex=f] or [married=yes] we need more evidence (attribute-value pairs)
to make a decision.

Here is another interesting result:

?- probs([emp=no],P).
P = [approve/0, reject/0.25]

Can we conclude that [emp=no] is enough to decide for reject? Why do we get the zero
probability, and can we use it for classification?
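As promised above, here is a small wrapper that packages the LOO-CV query and counts the
errors automatically. It relies only on example/3 (declared dynamic) and on bayes/2 from
bayes.pl; the name loo/1 is made up for this sketch and is not part of bayes.pl.

% loo(-Error): leave-one-out cross-validation error of Naive Bayes on the
% examples currently stored as example/3.
loo(Error) :-
    findall(N,
            ( retract(example(N,Actual,E)),   % take the example out,
              once(bayes(E,Predicted)),       % classify it from the rest,
              assert(example(N,Actual,E)),    % put it back,
              Actual \== Predicted ),         % and keep N if misclassified
            Wrong),
    findall(N, example(N,_,_), All),
    length(Wrong, W),
    length(All, Total),
    Error is W/Total.

With loandata.pl loaded, ?- loo(Error). should reproduce the 2/12 error computed above.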
IV. Simple (Naive) Bayes algorithm: holdout testing
===================================================

There are two basic differences between Naive Bayes (NB) and the concept learning
algorithms (such as id3, lgg or search):

1. NB does not create a hypothesis; it just predicts the class value.
2. To make the prediction, NB needs all the examples.

The second difference affects the way holdout testing works with Naive Bayes: when
testing, we need the training examples too. One solution that works for our Prolog data
files is to change the name of the examples to be used for testing. This is done in the
loandat2.pl data file. The training set still uses the name "example", but the examples
from the test set are renamed to "test".

?- ['c:/prolog/loandat2.pl'].  % load both training and test data sets

Let's see what we have in the Prolog database now:

?- listing(example).

example(1, approve, [emp=yes, buy=comp, sex=f, married=no]).
example(2, reject, [emp=no, buy=comp, sex=f, married=yes]).
example(3, approve, [emp=yes, buy=comp, sex=m, married=no]).
example(4, approve, [emp=yes, buy=car, sex=f, married=yes]).
example(6, approve, [emp=yes, buy=comp, sex=f, married=yes]).
example(7, approve, [emp=yes, buy=comp, sex=f, married=no]).
example(8, approve, [emp=yes, buy=comp, sex=m, married=no]).
example(11, reject, [emp=no, buy=comp, sex=m, married=yes]).

?- listing(test).

test(5, reject, [emp=yes, buy=car, sex=f, married=no]).
test(9, approve, [emp=yes, buy=comp, sex=m, married=yes]).
test(10, approve, [emp=yes, buy=comp, sex=m, married=yes]).
test(12, reject, [emp=no, buy=car, sex=f, married=yes]).

So, our holdout split is:

- training set: [1,2,3,4,6,7,8,11]
- test set: [5,9,10,12]

Now we can start testing directly, because there is no learning step here (as there is
in the concept learning setting).

?- test(N,C,E),bayes(E,Class),write(N-C-Class),nl,fail.

5-reject-approve
9-approve-approve
10-approve-approve
12-reject-reject

For example 5 the actual and predicted classes differ, so the holdout error for this
split is 1/4 = 0.25.

Note that what actually happens here is that the probabilities are computed from the
examples stored as "example" (NB uses them through bayes(E,Class)), while the test
examples are provided by accessing the "test" structures directly from the query.

V. Simple (Naive) Bayes algorithm: LOO-CV testing with the animals data
=======================================================================

First include the dynamic declaration in your animals.pl file:

:-dynamic(example/3).

Then load the animals.pl data:

?- ['c:/prolog/animals.pl'].  % load data set

?- retract(example(N,Actual,E)),bayes(E,Predicted),assert(example(N,Actual,E)),write(N-Actual-Predicted),nl,fail.

1-mammal-reptile
2-mammal-reptile
3-mammal-reptile
4-mammal-reptile
5-fish-reptile
6-reptile-reptile
7-reptile-reptile
8-bird-reptile
9-bird-reptile
10-amphibian-reptile

The situation here seems very different from what we've seen before. The LOO-CV error is
very high - 80%. Why is that? We may get a clue if we try it without removing the
examples (as we did with the loandata):

?- example(N,Actual,E),bayes(E,Predicted),write(N-Actual-Predicted),nl,fail.

1-mammal-mammal
2-mammal-mammal
3-mammal-mammal
4-mammal-mammal
5-fish-fish
6-reptile-reptile
7-reptile-reptile
8-bird-bird
9-bird-bird
10-amphibian-amphibian

Now the situation changes - the error is 0! Look at the probabilities to find out why
all this happens.
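One way to follow that last suggestion is a helper that temporarily removes an example
and prints the per-class likelihoods it gets from the remaining data, exactly as the LOO
loop does. The name show_probs/1 is made up here; it only assumes that example/3 is
dynamic and that probs/2 is available from bayes.pl.

% show_probs(+N): likelihoods of example N when it is left out of the data.
show_probs(N) :-
    retract(example(N,Actual,E)),   % temporarily remove example N
    probs(E,P),                     % compute its per-class likelihoods
    assert(example(N,Actual,E)),    % restore it
    write(N-Actual-P), nl.

For instance, ?- show_probs(5). shows the likelihoods example 5 (the fish) receives when
it is left out.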