CS462 - Artificial Intelligence
Spring-2005
Classes: TR 5:15 pm - 6:30 pm, RVAC 107
Instructor: Dr. Zdravko Markov, MS 203, (860)-832-2711,
http://www.cs.ccsu.edu/~markov/,
e-mail: markovz@ccsu.edu
Office hours: MW: 6:45 pm - 7:45 pm, TR: 10:00 am - 12:00 pm,
or by appointment
Catalog Description
Artificial Intelligence ~ Spring. ~ [c] ~ Prereq.: CS 253 or (for graduates) CS 501. ~ Presentation of artificial intelligence as a coherent body of ideas and methods to acquaint the student with the classic programs in the field and their underlying theory. Students will explore this through problem-solving paradigms, logic and theorem proving, language and image understanding, search and control methods, and learning.
Course Goals
- Introduce students to the basic concepts and techniques of Artificial Intelligence.
- Learn AI by doing it, i.e. develop skills in using AI algorithms to solve practical problems.
- Gain experience in doing independent study and research.
Definitions of AI
- Short
- Russell Beale, University of Birmingham, UK: AI can be defined as the attempt to get real machines to behave like the ones in the movies.
- John McCarthy, Stanford University: It is the science and engineering of making intelligent machines, especially intelligent computer programs. It is related to the similar task of using computers to understand human intelligence, but AI does not have to confine itself to methods that are biologically observable.
- Ron Brachman, AAAI: There's a lot to try to understand about the kind of reasoning and learning that people do that's very different than sort of classical, von Neumann, step-by-step computer algorithms.
- Long (Aaron Sloman, University of Birmingham, UK): Physics and chemistry study matter, energy, forces, and the various ways they can be combined and transformed. Biology, geology, medicine, and many other sciences and engineering disciplines build on this by studying more and more complex systems built from physical components. All this research requires an understanding of naturally occurring and artificial machines which operate on forces, energy of various kinds, and transform or rearrange matter. But some of the machines, both natural and artificial, also manipulate knowledge. It is now clear that a new kind of science is required for the study of the principles by which
- knowledge is acquired and used,
- goals are generated and achieved,
- information is communicated,
- collaboration is achieved,
- concepts are formed,
- languages are developed.
We could call it the science of knowledge or the science of intelligence.
Required Text
Recommended Text
Required software
- SWI-Prolog. Use the stable versions and the self-installing executable for Windows 95/98/ME/NT/2000/XP. For this course you need only the basic components, so you may uncheck all optional components.
- Prolog Tutorials
General Web Resources
Grading and grading policy
The final grade will be based on:
- Semester project (30%)
- Midterm and final exam (20%)
- Programming assignments (40%)
- Paper (10%)
More information about the semester project is available here. Descriptions of the exams and the programming assignments are included in the "Tentative schedule of classes and assignments". The paper is described here.
The letter grades will be calculated according to the following
table:
Grade | Score
A | 95-100
A- | 90-94
B+ | 87-89
B | 84-86
B- | 80-83
C+ | 77-79
C | 74-76
C- | 70-73
D+ | 67-69
D | 64-66
D- | 60-63
F | 0-59
Late assignments will be marked one letter grade down for each two classes they are late. It is expected that all students will conduct themselves in an honest manner and NEVER claim work which is not their own. Violating this policy will result in a substantial grade penalty or a final grade of F.
Tentative schedule of classes and assignments (will be updated
regularly)
- Introduction - Intelligent Agents
- Problem Solving by Search, Introduction to Prolog
- Lecture notes/slides: Problem Solving by Search
- Reading: AIMA - Chapter 3
- Additional reading: Quick Introduction to Prolog
- Programs: path.pl, kinship.pl, farmer.pl, search1.pl, graph.pl, map.pl, 8-puzzle.pl
- Lab experiments:
- Search in Prolog: use path.pl and try all versions of path with the modified graph too (add/remove arc(4,1) in two different places). Explain the results.
- Prolog as Search: use kinship.pl and do the exercises described in it.
- Solve the farmer, wolf, cabbage and goat problem by using Prolog search: farmer.pl
- Use search1.pl to find paths from node 1 to node 9 in graph.pl (explain the results from the following queries):
- ?- depth_first([[1]],9,P,N).
- ?- breadth_first([[1]],9,P,N).
- ?- iterative_deepening([[1]],9,P).
- ?- uni_cost([[1]],9,P,N).
- ?- uni_cost([[1]],9,P,N),reverse_path_cost(P,C). or ?- uni_cost([[1]],9,P,N),reverse(P,P1),path_cost(P1,C).
- Add writeln(NewQueue) after append (in depth_first and breadth_first) and after sort_queue (in uni_cost) to see all steps of the search process.
- Evaluate search algorithms:
- Complexity: time (number of explored nodes N) and space (max length of the queue).
- Completeness: compare ?- depth_bound(3,[[1]],9,P). and ?- depth_bound(4,[[1]],9,P).
- Optimality: lowest path cost
- Search the Web: use the WebSPHINX: A Personal, Customizable Web Crawler to search the web in depth-first and breadth-first fashion (run the Web Crawler here)
- Solving the 8-puzzle: board(2,3,5,0,1,4,6,7,8) =>
board(0,1,2,3,4,5,6,7,8)
- Compare breadth_first, iterative_deepening and depth_first.
- Explain why depth_first fails.
- How to find a board state that can be solved.
- Exercises:
- Find the shortest path (by number of towns passed and by distance) from Arad to Bucharest (use map.pl)
- Solve the farmer, wolf, cabbage and goat problem by using search1.pl
- Implement bubble sort by state space search
- Heuristic (Informed) Search
- Lecture notes/slides: Heuristic Search
- Reading: AIMA - Chapter 4
- Programs: search2.pl, map.pl, 8-puzzle.pl
- Lab experiments:
- Solving the 8-puzzle: board(2,3,5,0,1,4,6,7,8) =>
board(0,1,2,3,4,5,6,7,8)
- Use best_first, a_star and beam search. For beam search try n=100, 10, 1. What changes? Explain why it fails with n=1.
- Compare results with depth_first, breadth_first and
iterative deepening.
- Print the queue and see how memory complexity changes.
- Finding paths from Arad to Bucharest. Compare cost and size of paths with informed and uninformed search.
- Exercises:
- Define a non-admissible heuristic function for graph.pl and compare the results from the best_first, a_star and uni_cost algorithms.
- Find admissible heuristic functions and implement bubble sort by heuristic search.
- Constraint Satisfaction
- Lecture notes/slides: Constraint Satisfaction
- Reading: AIMA - Chapter 5
- Programs: backtrack.pl, csp.pl, 4-queens.pl
- Lab experiments: solving the map coloring problem by backtracking and constraint propagation (using heuristics)
- Exercises:
- Solve the map coloring problem by csp.pl
- Represent the list sorting problem in CSP terms and solve it by csp.pl
- Assignment #1: Problem Solving by Searching. Due on February 24.
- Games
- Lecture notes/slides: Searching Game Trees
- Reading: AIMA - Chapter 6
- Additional reading:
- Programs: minimax.pl, gametree.pl
- Lab experiments and exercises:
- The game of tic-tac-toe (xo.pl) using minimax.pl
- Find an evaluation function for the game of tic-tac-toe and implement a depth-bounded search.
- Knowledge-Based Agents - Propositional and First-Order Logic
- Inference in First-Order Logic, Logic Programming and Prolog
- Lecture notes/slides: inference.pdf
- Reading: AIMA - Chapter 9.
- Programs: clausify.pl, resolve.pl, logic.pl, wumpus.pl, agents.pl
- Lab experiments/exercises: logic.pl
(read the comment inside)
- Finite/infinite models: ?- resolve([cl([p(f(X))],[p(X)]),cl([p(a)],[])]).
- Clause subsumption
- Clausal resolution and SLD (Prolog) resolution.
- Incompleteness of Prolog: try the query ?- p(a,c). with the program:
p(a,b).
p(c,b).
p(X,Y) :- p(X,Z),p(Z,Y).
p(X,Y) :- p(Y,X).
- Completeness of SLD resolution: draw the refutation tree of P.
- Knowledge Representation
- Reading: AIMA - Chapter 10
- Programs: es.pl, cars.pl
- Lab experiments/exercises: car diagnosis with a MYCIN-like expert system shell.
- Knowledge representation for the Web: Semantic Web
- Assignment #2: Reasoning with Propositional and First-Order Logic. Due on April 5.
- Planning
- Lecture notes/slides: planning.pdf
- Reading: AIMA - Chapter 11
- Additional reading: Shakey the Robot
- Programs: planner.pl, scalc.pl
- Lab experiments/exercises:
- Comparing deductive (scalc.pl) and STRIPS (planner.pl) planning: efficiency and optimality.
- Adding constraints: e.g. put a smaller block on a bigger one.
- Solving the Towers of Hanoi by planning.
- Uncertainty and Probabilistic Reasoning
- Machine Learning - Basic Concepts, Version Space, Decision Trees
- Illustrative example/lab project: Web/text document classification (files: textmine.pl, webdata.zip, artsci.pl)
- Lecture notes/slides:
- Reading: AIMA - Chapters 18, 19
- Additional Reading: Chris Thornton, Truth from Trash: How Learning Makes Sense, MIT Press, 2000
- Programs: covering.pl, vs.pl, id3.pl
- Data: taxonomy.pl, shapes.pl, tennis.pl, animals.pl, loandata.pl
- Lab experiments:
- Machine Learning - Numeric Approaches, Clustering, Evaluation
- Learning with Background Knowledge - Explanation-Based Learning, Inductive Logic Programming
- Lecture notes/slides:
- Reading: AIMA - Chapter 19
- Programs: ebl.pl, ilp.pl, foil.pl
- Lab experiments:
- Exercises:
- Assignment #3: Probabilistic Reasoning and Learning
- Natural Language Processing
- Lecture notes/slides:
- Reading: AIMA - Chapter 22
- Programs: grammar.pl, qa.pl
- Lab experiments:
- Exercises:
- Other Topics and Philosophical Foundations
- Lecture notes/slides:
- Reading: AIMA - Chapter 26
Semester Project
The semester project is based on a suite of projects developed in the framework of a grant from the National Science Foundation (NSF CCLI-A&I Award Number 0409497), "Machine Learning Laboratory Experiences for Introducing Undergraduates to Artificial Intelligence". The student projects done for this course will be an important step in the evaluation of the NSF grant.
To do the semester projects students have to form teams of 3 people
(2-people teams should consult the instructor first). Each team chooses
one project to work on. The projects to choose from are the following:
- Web Document Classification Project
- Intelligent Web Browser Project
- Character Recognition and Learning with Neural Networks
- Clue Deduction: an introduction to satisfiability reasoning
- Solving the N-Puzzle Problem
The descriptions of all five projects are available from the WebCT course shell (Campus Pipeline/My Courses).
To complete the project students are required to:
- Choose a project and submit the project title and the names of the students on the project team. Do this no later than February 15. Note that there is a restriction that no more than two teams can work on the same project. Projects will be assigned on a first-come, first-served basis.
- Write an initial project description based on the general description provided in the WebCT course. This must include the following (may be discussed with the instructor before submission):
- A brief introduction to the area of the project
- Specific goals (must include all deliverables for projects 1-3, and for project 4 a few of the logic exercises of Section 7 and the ClueReasoner described in Section 8)
- Approaches and algorithms to be used
- Resources to be used (data, programming tools or applications)
- A plan for how to achieve the goals and evaluate the project results
- Work distribution among the students on the team and a timetable
- Submit reports on:
- the initial project description (item 2) by February 29.
- the progress made by midterm (due during the midterm week).
- the results achieved upon project completion (due during the final week). See the requirements for this report below.
- Make a presentation of the final report (during the final week).
Requirements
for the final project report:
The final report must include the following:
- General introduction to the area
- Description of the problems addressed (experimented with or solved)
- Descriptions of the approaches and algorithms used to solve the problems
- Descriptions of the software applications used or the programs implemented
- Description of the experiments done for each problem attempted or solved
- Comments on the relation of the approaches used in the project to the areas of machine learning (ML), search, and knowledge representation and reasoning (KR&R). In particular, the following questions should be answered:
- Which ML techniques have been used in the project?
- If no ML has been used explicitly, what is the relation of the approaches used in the project to ML? (any ML components used or project approaches applicable in the area of ML)
- Which search techniques have been used in the project?
- If no search has been used explicitly, what is the relation of the approaches used in the project to the area of search? (any search components used or project approaches applicable in the area of search)
- Which KR&R techniques have been used in the project?
- If no KR&R has been used explicitly, what is the relation of the approaches used in the project to the area of KR&R? (any KR&R components used or project approaches applicable in the area of KR&R)
Grading:
The project grading will be based on the project results and on the completeness (see the requirements), comprehensibility and quality of presentation of the project reports. All students on one project team will get the same grade.
Paper/Presentation
Write a paper (no more than 5 pages) describing an AI area, topic or application not covered in class. Submit the paper by e-mail through the WebCT course template by May 19.
Format of the term paper
Write the paper following the general format of a scientific paper: title, author (individual authors only, no teamwork), abstract, introduction (short description of the subject, goals, structure of the paper), main text (structured into sections and covering the AI area or topic), conclusion and references. Include all materials you use (including Web resources) in the reference section and cite them properly in the text. You may include illustrative material (graphs, charts, diagrams, pictures). Type and print the paper using a word processing system (such as Microsoft Word). An example of a scientific paper can be found here.
Policy on copying
The paper must be original, i.e. entirely written by yourself. Since the paper is a survey of an existing area or topic, you obviously have to use other materials (books, reference materials, web sources etc.). Every time you use such materials you have to include a citation and put the title and authors of the material in the reference section. Then you may rephrase or even copy a part of the material (provided that you put it in quotes). If you describe general ideas and approaches, you also have to cite their authors. Papers that include copied paragraphs and phrases without acknowledgment of their authors will get a grade of F. For more information see the CCSU Policy on Academic Misconduct.
Term paper grading
The term paper grade will be based on the paper format (according to the structure above), comprehensibility, and completeness of the material (including all major issues within the chosen area or topic).
Extra credit
An extra credit of up to 5 points will be given for preparing presentation slides (e.g. in PowerPoint format) and presenting the paper in class. The presentation will last 10-15 minutes and should be arranged with the instructor ahead of time. Presentations should be done before the final week.
Introduction to AI, Intelligent
Agents
1. Goals
- Creating intelligent (thinking) systems (machines, robots etc.) in order to:
- Model and study the natural (human) intelligence. Philosophy and psychology also study natural (human) intelligence, but their goals do not include creating intelligent systems.
- Solve problems that require intelligence.
- Questions: What problems require intelligence? What is intelligence?
2. Definitions of AI
- Making computers "think", creating machines with "brains" (Haugeland, 85)
- Studying psychology by computational models (Charniak & McDermott, 85)
- Studying computational models that can reason and act rationally (Winston, 90)
- Building machines to perform functions which require intelligence when performed by humans (Kurzweil, 90)
- How to make computers do things which we (humans) still do better (Rich & Knight, 91)
- A branch of Computer Science which deals with modeling intelligent behavior (Luger & Stubblefield, 93)
- Building intelligent agents (Russell & Norvig, 95)
3. Approaches
- Models of human reasoning - cognitive modeling (cognitive science), GPS (Newell & Simon, 1972).
- Models of human behavior - the Turing Test (1950), Loebner Prize. Natural language processing, knowledge representation, reasoning and learning.
- Models of rational thought (logical approach). Aristotle's syllogistic logic: "Socrates is a human, humans are mortal, thus Socrates is mortal". Requires 100% accurate knowledge.
- Models of rational behavior - rational agents. Includes both (2) and (3), but is more general.
4. Related Areas
- Philosophy: AI history starts with Aristotle's invention of syllogistic logic, the first formal deductive reasoning system.
- Mathematics: logic, algorithms, satisfiability, resolution.
- Psychology: experimental psychology, cognitive science.
- Computer Science: using von Neumann architectures to model non-von Neumann computation.
- Linguistics: syntactic structures, formal grammars (Chomsky, 1957), natural language translation ("The spirit is willing but the flesh is weak" - "The vodka is good, but the meat is rotten").
- Brain Science: neural networks
- Biology, Artificial Life, Evolutionary computation
5. History of AI
6. Intelligent Agents
- Agent = Architecture + Program
- Types of agents
- Reflex agent (stateless, condition-action rules)
- Model-based reflex agent (memory, knowledge representation)
- Goal-based agents (planning)
- Utility-based agents (decision making, uncertainty)
- Learning agents (machine learning)
- The role of the environment
- Web agents
7. References
- Haugeland, J. (editor). Artificial Intelligence: The Very Idea, MIT Press, Cambridge, Massachusetts, 1985.
- Charniak, E. and D. McDermott. Introduction to Artificial Intelligence, Addison-Wesley, Reading, Massachusetts, 1985.
- Winston, P.H. Artificial Intelligence, Addison-Wesley, Reading, Massachusetts, third edition, 1992.
- Kurzweil, R. The Age of Intelligent Machines, MIT Press, Cambridge, Massachusetts, 1990.
- Rich, E. and K. Knight. Artificial Intelligence, McGraw-Hill, New York, second edition, 1991.
- Luger, G.F. and W.A. Stubblefield. Artificial Intelligence: Structures and Strategies for Complex Problem Solving, Benjamin/Cummings, Redwood City, California, 2/E, 1993.
- Russell, S. and P. Norvig. Artificial Intelligence: A Modern Approach, Prentice Hall, Upper Saddle River, New Jersey, 1995.
Problem Solving by Search
1. Problem solving agents
The agent's "world" includes the agent itself and the environment it interacts with. To reach its goal the agent should perform a sequence of actions so that the world reaches the goal state. For this purpose a formal representation of the agent's world is needed:
- A set of pairs <state, action> exhaustively describing all possible states and the right action leading to the goal.
- Knowledge to model the states of the world and how the agent's actions would change them (state transitions).
- Simulation of how various agent actions change the world, so that the right sequence can be found. This is achieved by searching the state space.
2. Problem representation
- States (legal, initial)
- Operators (state transitions)
- Goal state (or test for the goal state)
3. Problem
- Given: Initial and Goal states
- Find: a sequence of operators (state transitions) which transforms the initial state into the goal state.
4. Examples
- Finding a path between two towns (path in a directed graph)
- Solving the farmer, wolf, cabbage and goat problem
- Solving the 8-puzzle problem
- Solving the 8-queens problem
- Searching the web
5. Solution
- Search
- Extending a state (node)
- Selecting a successor node
- Search tree
- Algorithms: uninformed (exhaustive, blind), informed (heuristic)
- Evaluating performance:
- Optimality: path cost
- Search complexity: time complexity - number of visited nodes; space complexity - maximal length of the queue
- Completeness
- Constraint satisfaction (Lecture #4)
6. Uninformed Search
- General approach - using a queue
- Depth-first search - adding the new node in the beginning of the
queue
- Depth-bound search
- Iterative deepening
- Breadth-first search - adding the new node at the end of the queue
- Uniform cost search (sorting the queue by the cost of the path)
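The queue discipline above is the whole story: depth-first pushes new paths at the front of the queue, breadth-first appends them at the back. A minimal Python sketch (the course's programs are in Prolog; this toy graph and the function name are made up for illustration, not graph.pl):

```python
from collections import deque

def search(graph, start, goal, mode="breadth"):
    """Generic queue-based search over a successor graph.
    Where new paths are inserted decides the strategy:
    front of the queue = depth-first, back = breadth-first."""
    queue = deque([[start]])      # queue of paths, newest node first
    explored = 0
    while queue:
        path = queue.popleft()
        explored += 1
        if path[0] == goal:
            return list(reversed(path)), explored
        successors = [[n] + path for n in graph.get(path[0], [])
                      if n not in path]              # avoid cycles
        if mode == "depth":
            queue.extendleft(reversed(successors))   # push to the front
        else:
            queue.extend(successors)                 # append to the back
    return None, explored

# A toy graph, made up for illustration (not the course's graph.pl)
g = {1: [2, 3], 2: [4], 3: [4, 5], 4: [5], 5: []}
```

On this graph depth-first finds [1, 2, 4, 5] while breadth-first finds the shorter [1, 3, 5]; uniform cost search would additionally keep the queue sorted by path cost, as in search1.pl.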
Heuristic (Informed) Search
1. Why heuristic search
- Many problems have exponential complexity
- We know the path cost from the initial state to the current one. Knowing the path cost to the goal would help in making the right decision when selecting the successor node.
- Heuristic function h(n) = an evaluation (approximation) of the path cost from node n to the goal.
2. Algorithms for heuristic search
- Best-first search: sorting the queue according to h(n). Efficient, complete, but not optimal.
- Beam search: using a parameter n. Limiting the size of the queue by choosing the n best nodes. Memory efficient, but incomplete and not optimal. If n=1: hill-climbing search.
- A* search: minimizing the total path cost. The queue is sorted by f(n) = g(n) + h(n), where g(n) is the path cost from the initial state to n. A combination of uniform cost search (based on g(n)) and best-first search (based on h(n)).
3. Properties of A*
- Completeness: guaranteed if the branching is finite and the cost of each transition is positive.
- Optimality: guaranteed when an admissible heuristic is used. h(n) is admissible if h(n) =< the actual path cost from n to the goal.
- Worst case complexity: O(b^d)
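The f(n) = g(n) + h(n) bookkeeping can be sketched in a few lines of Python (an illustration, not the course's search2.pl; the graph and heuristic values below are made up, with h chosen to never overestimate, i.e. admissible):

```python
import heapq

def a_star(graph, h, start, goal):
    """A* search: always expand the path minimizing f(n) = g(n) + h(n),
    where g(n) is the cost so far and h(n) the heuristic estimate.
    graph maps a node to a list of (neighbor, step_cost) pairs."""
    frontier = [(h(start), 0, [start])]   # entries are (f, g, path)
    best_g = {}                           # cheapest known cost to each node
    while frontier:
        f, g, path = heapq.heappop(frontier)
        node = path[-1]
        if node == goal:
            return path, g
        if node in best_g and best_g[node] <= g:
            continue                      # already reached more cheaply
        best_g[node] = g
        for nxt, cost in graph.get(node, []):
            heapq.heappush(frontier,
                           (g + cost + h(nxt), g + cost, path + [nxt]))
    return None, float("inf")

# A made-up weighted graph; h never overestimates, so it is admissible
graph = {"A": [("B", 1), ("C", 4)], "B": [("C", 2), ("D", 5)],
         "C": [("D", 1)], "D": []}
h = {"A": 3, "B": 2, "C": 1, "D": 0}.get
```

Here A* returns the optimal path A-B-C-D with cost 4, even though the direct edge A-C looks attractive to a pure best-first search.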
4. Heuristic functions
- Example: 8-puzzle problem. Average branching factor = 3, average path to the goal = 20 transitions. Number of states = 9! = 362,880. Exhaustive search explores 3^20 states. Heuristics: h1(n) = number of misplaced tiles, h2(n) = city-block distance.
- Comparing heuristics: h2 dominates h1, since h2(n) >= h1(n).
- Finding heuristics:
- Relaxing the problem restrictions (simplifying the problem), e.g. the rules for moving tiles.
- Learning heuristics from experience: learning a function as a weighted sum of features (quantitative properties of the states).
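The two 8-puzzle heuristics named above are short enough to sketch directly (illustrative Python using the flat board encoding from the lab examples, with 0 marking the blank; the function names are my own):

```python
def misplaced(state, goal):
    """h1: count of tiles (the blank 0 excluded) not on their goal cell."""
    return sum(1 for s, g in zip(state, goal) if s != 0 and s != g)

def manhattan(state, goal):
    """h2: sum of city-block distances of each tile from its goal cell;
    index i of the flat tuple is row i // 3, column i % 3."""
    total = 0
    for tile in state:
        if tile == 0:
            continue
        i, j = state.index(tile), goal.index(tile)
        total += abs(i // 3 - j // 3) + abs(i % 3 - j % 3)
    return total

goal = (0, 1, 2, 3, 4, 5, 6, 7, 8)
start = (2, 3, 5, 0, 1, 4, 6, 7, 8)   # the board from the lab experiments
```

For this board h1 = 5 and h2 = 7; h2 dominates h1 because every misplaced tile is at least one move from its goal cell.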
Nearest Neighbor Learning
- Distance or similarity function defines what's learned.
- Euclidean distance (for numeric attributes): D(X,Y) = sqrt[(x1-y1)^2 + (x2-y2)^2 + ... + (xn-yn)^2], where X = {x1, x2, ..., xn}, Y = {y1, y2, ..., yn}.
- Cosine similarity (dot product when normalized to unit length): Sim(X,Y) = x1*y1 + x2*y2 + ... + xn*yn
- Other popular metric: city-block distance. D(X,Y) = |x1-y1| + |x2-y2| + ... + |xn-yn|.
- As different attributes use different scales, normalization is required: Vnorm = (V - Vmin) / (Vmax - Vmin). Thus Vnorm is within [0,1].
- Nominal attributes: number of differences, i.e. city-block distance, where |xi-yi| = 0 (if xi=yi) or 1 (if xi<>yi).
- Missing attributes: assumed to be maximally distant (given normalized attributes).
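These metrics and the min-max normalization are easy to state in code (an illustrative Python sketch with names of my choosing, not part of the course software):

```python
import math

def minmax_normalize(values):
    """Min-max scaling of a numeric attribute into [0, 1]:
    (v - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def euclidean(x, y):
    """Euclidean distance for numeric attribute vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def city_block(x, y):
    """City-block (Manhattan) distance for numeric attribute vectors."""
    return sum(abs(a - b) for a, b in zip(x, y))

def nominal_distance(x, y):
    """City-block distance for nominal attributes: count the differences."""
    return sum(1 for a, b in zip(x, y) if a != b)
```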
- Example: weather data
ID | outlook | temp | humidity | windy | play
1 | sunny | hot | high | false | no
2 | sunny | hot | high | true | no
3 | overcast | hot | high | false | yes
4 | rainy | mild | high | false | yes
5 | rainy | cool | normal | false | yes
6 | rainy | cool | normal | true | no
7 | overcast | cool | normal | true | yes
8 | sunny | mild | high | false | no
9 | sunny | cool | normal | false | yes
10 | rainy | mild | normal | false | yes
11 | sunny | mild | normal | true | yes
12 | overcast | mild | high | true | yes
13 | overcast | hot | normal | false | yes
14 | rainy | mild | high | true | no
X | sunny | cool | high | true | ?
ID | 2 | 8 | 9 | 11
D(X, ID) | 1 | 2 | 2 | 2
play | no | no | yes | yes
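The distance row for the nearest training instances can be reproduced by counting differing nominal attributes (an illustrative Python sketch; the attribute tuples are transcribed from the weather table):

```python
# Attribute tuples (outlook, temp, humidity, windy) copied from the table
train = {
    2:  ("sunny", "hot",  "high",   "true"),
    8:  ("sunny", "mild", "high",   "false"),
    9:  ("sunny", "cool", "normal", "false"),
    11: ("sunny", "mild", "normal", "true"),
}
x = ("sunny", "cool", "high", "true")   # the new instance X

def distance(a, b):
    """Nominal city-block distance: the number of differing attributes."""
    return sum(1 for u, v in zip(a, b) if u != v)

distances = {i: distance(x, row) for i, row in train.items()}
```

Instance 2 differs from X only in temp, so it is the single nearest neighbor; 1-NN therefore predicts play = no.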
- Discussion
- Instance space: Voronoi diagram
- 1-NN is very accurate but also slow: scans entire training
data to derive a prediction (possible improvements: use a sample)
- Assumes all attributes are equally important. Remedy:
attribute selection or weights (see attribute relevance).
- Dealing with noise (wrong values of some attributes)
- Taking a majority vote over the k nearest neighbors
(k-NN).
- Removing noisy instances from dataset (difficult!)
- Numeric class attribute: take the mean of the class values of the k nearest neighbors.
- k-NN has been used by statisticians since the early 1950s. Question: k=?
- Distance weighted k-NN:
- Weight each vote or class value (for numeric) with the
distance.
- For example: instead of summing up votes, sum up 1 / D(X,Y)
or 1 / D(X,Y)2
- Then it makes sense to use all instances (k=n).
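An inverse-squared-distance vote, as described above, might look like this (illustrative Python; the function name is my own, and the test neighbors below use the distances and classes from the example table):

```python
def weighted_vote(neighbors):
    """neighbors: (distance, class) pairs with distance > 0.
    Each neighbor votes with weight 1 / distance^2; the class with the
    largest total weight wins, so closer neighbors count for more."""
    totals = {}
    for d, cls in neighbors:
        totals[cls] = totals.get(cls, 0.0) + 1.0 / (d * d)
    return max(totals, key=totals.get)
```

With the table's neighbors, "no" wins (1 + 1/4 = 1.25 against 1/4 + 1/4 = 0.5 for "yes") because the single nearest neighbor dominates.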
Naive Bayes
- Basic assumptions
- Opposite of KNN: use all examples
- Attributes are assumed to be:
- equally important: all attributes have the same relevance
to the classification task.
- statistically independent (given the class value):
knowledge about the value of a particular attribute doesn't tell us
anything about the value of another attribute (if the class is known).
- Although based on assumptions that are almost never correct,
this scheme works well in practice!
- Probabilities of weather data
outlook | temp | humidity | windy | play
sunny | hot | high | false | no
sunny | hot | high | true | no
overcast | hot | high | false | yes
rainy | mild | high | false | yes
rainy | cool | normal | false | yes
rainy | cool | normal | true | no
overcast | cool | normal | true | yes
sunny | mild | high | false | no
sunny | cool | normal | false | yes
rainy | mild | normal | false | yes
sunny | mild | normal | true | yes
overcast | mild | high | true | yes
overcast | hot | normal | false | yes
rainy | mild | high | true | no
- outlook = sunny [yes (2/9); no (3/5)];
- temperature = cool [yes (3/9); no (1/5)];
- humidity = high [yes (3/9); no (4/5)];
- windy = true [yes (3/9); no (3/5)];
- play = yes [(9/14)]
- play = no [(5/14)]
- New instance: [outlook=sunny, temp=cool, humidity=high,
windy=true, play=?]
- Likelihood of the two classes (play=yes; play=no):
- yes = (2/9)*(3/9)*(3/9)*(3/9)*(9/14) = 0.0053;
- no = (3/5)*(1/5)*(4/5)*(3/5)*(5/14) = 0.0206;
- Conversion into probabilities by normalization:
- P(yes) = 0.0053 / (0.0053 + 0.0206) = 0.205
- P(no) = 0.0206 / (0.0053 + 0.0206) = 0.795
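The arithmetic above can be checked directly (Python, reproducing the two likelihoods and their normalization; the variable names are mine):

```python
# Likelihoods for the instance [outlook=sunny, temp=cool, humidity=high,
# windy=true], using the conditional probabilities read off the table
yes = (2/9) * (3/9) * (3/9) * (3/9) * (9/14)   # likelihood of play=yes
no  = (3/5) * (1/5) * (4/5) * (3/5) * (5/14)   # likelihood of play=no

# Normalizing the likelihoods turns them into probabilities
p_yes = yes / (yes + no)
p_no  = no  / (yes + no)
```

Rounded, this gives the values quoted above: likelihoods 0.0053 and 0.0206, probabilities 0.205 and 0.795.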
- Bayes theorem (Bayes rule)
- Probability of event H, given evidence E: P(H|E) = P(E|H) *
P(H) / P(E);
- P(H): a priori probability of H (probability of event
before evidence has been seen);
- P(H|E): a posteriori (conditional) probability of H
(probability of event after evidence has been seen);
- Bayes for classification
- What is the probability of the class given an instance?
- Evidence E = instance
- Event H = class value for instance
- Naïve Bayes assumption: evidence can be split into
independent parts (attributes of the instance).
- E = [A1,A2,...,An]
- P(E|H) = P(A1|H)*P(A2|H)*...*P(An|H)
- Bayes: P(H|E) = P(A1|H)*P(A2|H)*...*P(An|H)*P(H)
/ P(E)
- Weather data:
- E = [outlook=sunny, temp=cool, humidity=high, windy=true]
- P(yes|E) = P(outlook=sunny|yes) * P(temp=cool|yes) * P(humidity=high|yes) * P(windy=true|yes) * P(yes) / P(E) = (2/9)*(3/9)*(3/9)*(3/9)*(9/14) / P(E)
- The “zero-frequency problem”
- What if an attribute value doesn't occur with every class
value (e. g. humidity = high for class yes)?
- Probability will be zero, for example
P(humidity=high|yes) = 0;
- A posteriori probability will also be zero: P(yes|E) = 0
(no matter how likely the other values are!)
- Remedy: add 1 to the count for every attribute value-class combination (the Laplace estimator: (p+1) / (n+k), where k is the number of values of the attribute).
- Result: probabilities will never be zero! (also stabilizes
probability estimates)
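One common form of the Laplace estimator adds 1 to each count and compensates the denominator by the number of attribute values, so the smoothed estimates still sum to 1 over the values (a sketch under that assumption; the exact correction used in class may differ):

```python
def laplace(count, total, n_values):
    """Smoothed estimate (count + 1) / (total + n_values): no attribute
    value-class combination can ever get probability zero."""
    return (count + 1) / (total + n_values)
```

For example, a value that never occurs with a class gets (0+1)/(9+2) = 1/11 instead of 0, so it no longer annihilates the whole product.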
- Missing values
- Calculating probabilities: instance is not included in
frequency count for attribute value-class combination.
- Classification: attribute will be omitted from calculation
- Example: [outlook=?, temp=cool, humidity=high,
windy=true, play=?]
- Likelihood of yes = (3/9)*(3/9)*(3/9)*(9/14) = 0.0238;
- Likelihood of no = (1/5)*(4/5)*(3/5)*(5/14) = 0.0343;
- P(yes) = 0.0238 / (0.0238 + 0.0343) = 0.41
- P(no) = 0.0343 / (0.0238 + 0.0343) = 0.59
- Numeric attributes
- Assumption: attributes have a normal (Gaussian) probability distribution (given the class)
- Parameters involved: mean, standard deviation, probability density function
- Discussion
- Naïve Bayes works surprisingly well (even if
independence assumption is clearly violated).
- Why? Because classification doesn't require accurate
probability estimates as long as
maximum probability is assigned to correct class.
- Adding too many redundant attributes will cause problems (e.
g. identical attributes).
- Numeric attributes are often not normally distributed.
- Yet another problem: estimating prior probability is
difficult.
- Advanced approaches: Bayesian networks.
Assignment #1: Problem Solving by Searching (max grade 10 pts.)
Do the following and submit the results by February 24
- Represent the problem of sorting a four-element list as a state space search problem:
- Use state transitions that swap two neighboring elements
- Use state transitions that swap two neighboring elements, only if they are not in the correct order (as in bubble sort).
- Show how the problem is solved in both cases by simple 3-4 step hand-solved examples.
- Write two Prolog programs that represent both types of state spaces.
- Implement an admissible heuristic function based on the number of pairs of neighboring elements that are not in the right order. Add the function to the representation (define a rule for h(Node, Value) and include it in the files with the state transitions). Explain what makes the function admissible and how it can be made inadmissible.
- Sort the list (4,3,2,1) (goal state: (1,2,3,4)) by using all uninformed (search1.pl) and heuristic (search2.pl) search algorithms: depth_first, breadth_first, iterative_deepening, uni_cost, best_first, a_star and beam (with n=10 and n=1), and collect statistics about their performance. Do this for both types of state spaces.
- Compare the performance of all those algorithms by the following criteria: time complexity (number of explored nodes), space complexity (max length of the queue) and optimality (put the results in a table and explain the reasons for getting each best and worst result for each performance criterion). Again, do this for both types of state spaces (use two separate tables).
- Now compare the two types of state representations. Which space is bigger? Why? How does this affect the search algorithms' performance?
Extra credit (max 2 pts.): Represent the 4-element sorting problem as a constraint satisfaction task and solve it by using csp.pl. Explain the representation and compare it with the state space search representation.
Documentation and submission: Write a report describing the solutions to all problems and answers to all questions and mail it as an attachment to my instructor's account for the WebCT (available through Campus Pipeline/My Courses/Artificial Intelligence).
Assignment #2: Reasoning with Propositional and First-Order Logic (max grade 10 pts. + 5 pts. extra credit)
Do the following and submit the results by April 5:
- Use the wumpus world shown in Figure 1 of logic.pdf
- Represent the upper left corner of the wumpus world (rooms: (3,1), (3,2), (4,1), (4,2)). For each room encode the knowledge about perceiving stench and breeze in the rooms neighboring the beast and the pit. Restriction: put the perceptions in the "if" part of the implications (left of "->").
- Create two versions of the representation - one in PL and one in FOL. Then for each language prove the presence of the beast in room (3,1) given the agent's perceptions in rooms (3,2), (4,1) and (4,2). Do this by:
- Refutation (using the deduction theorem) in two ways:
- through a satisfiability test using sat.pl (use this for the PL version only);
- through resolution (inferring the empty clause) using resolve.pl.
- Resolution using resolve.pl (inferring the goal clause directly as a resolvent).
- Explain the outputs of the Prolog queries in terms of PL or FOL
semantics.
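The refutation step can be illustrated outside resolve.pl with a small propositional resolution sketch in Python (the clause sets and literal names below, such as s32 for "stench in (3,2)" and w31 for "wumpus in (3,1)", are hypothetical toy examples, not the assignment's actual encoding):

```python
def resolve(c1, c2):
    """All resolvents of two clauses. A clause is a set of literal
    strings; negation is marked with a leading '-'."""
    resolvents = []
    for lit in c1:
        neg = lit[1:] if lit.startswith("-") else "-" + lit
        if neg in c2:
            resolvents.append((c1 - {lit}) | (c2 - {neg}))
    return resolvents

def refute(clauses):
    """True iff the empty clause is derivable, i.e. the set is
    unsatisfiable (the negated goal follows from the KB)."""
    clauses = [frozenset(c) for c in clauses]
    while True:
        new = set()
        for i in range(len(clauses)):
            for j in range(i + 1, len(clauses)):
                for r in resolve(clauses[i], clauses[j]):
                    if not r:          # empty clause derived
                        return True
                    new.add(frozenset(r))
        if new.issubset(set(clauses)):  # no progress: satisfiable
            return False
        clauses = list(set(clauses) | new)
```

For instance, from the clauses {-s32, w31} (encoding s32 -> w31), {s32}, and the negated goal {-w31}, the empty clause is derivable, proving w31.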
- Define the complete wumpus world (4x4 board) in FOL and translate it into clausal form (use wumpus_fol from logic.pl):
- Is the result a set of Horn clauses? If so, try to represent it as a Prolog program.
- Is it possible to represent all clauses in Prolog?
- What happens with the negative unit clauses?
- Explain the problems and represent as much of the wumpus world as possible in Prolog.
- What can be inferred by using this program? Show the queries and explain the results.
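The reason Horn clauses map so directly to Prolog is that they support simple rule chaining. A hedged Python sketch of forward chaining over propositional Horn rules (Prolog itself uses backward chaining, but the entailed facts are the same for this fragment; the rule and fact names below are toy examples):

```python
def forward_chain(rules, facts):
    """Rules are (body, head) pairs, where body is a list of atoms
    that must all hold for the head atom to be derived. Returns the
    closure of the given facts under the rules."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in facts and all(b in facts for b in body):
                facts.add(head)
                changed = True
    return facts
```

With rules "a :- b, c" and "b :- d" and facts {c, d}, the closure derives b and then a. Negative unit clauses (the problematic case in the assignment) have no head atom at all, which is exactly why they do not fit this scheme.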
Extra credit (max 5 pts.): Program an agent for the wumpus world that starts at room (1,1) facing east and visits all logically provable safe rooms. Restriction: action rules cannot have more than two actions. Explain the agent's reasoning and actions and test it with wumpus.pl.
Documentation and submission: Write a report describing the solutions to all problems and answers to all questions, and mail it as an attachment to my instructor's account for the WebCT (available through Campus Pipeline/My Courses/Artificial Intelligence).
Assignment #3: Probabilistic Reasoning and Learning (max grade 20 pts. + 5 pts. extra credit)
Due date: May 10
Use the weather (tennis) data in tennis.pl and do the following:
- Create a Bayesian network for the weather data using the approach taken in loandata.pl. Then use this network with bn.pl to:
- Decide on playing tennis on a day described as [outlook=sunny,
temp=mild, humidity=normal, wind=weak].
- Find an attribute (if possible) that may be used to decide
whether or not to
play tennis (the class prediction based on this attribute value is the
same no matter what the values of the other attributes are).
- Add taxonomies for the attributes so that they can be used as structural (see how this is done in loandata.pl). Then use version space learning (vs.pl) to create the largest possible concepts for each class ("yes" and "no") by putting an example from that class at the beginning of the set of examples. In other words, find a good order of the examples (one starting with an example from class "yes" and one starting with an example from class "no"), so that the program converges after reading as many examples as possible. Use the following approach (see also Version space learning):
- If VS stops before reaching the end of the examples, because of
inconsistency (empty G and S), reorder the examples so that the concept
is learned before reaching the inconsistency.
- If VS stops before reaching the end of the examples, because of
convergence (it finds a consistent hypothesis), add more examples so
that you reach a concept that covers as many examples as possible.
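To see what "generalizing toward convergence" means, here is a deliberately simplified Python sketch of only the specific boundary S (a Find-S-style update; vs.pl maintains both S and G, so this is an illustration of one half of the algorithm, not a replacement for it). Hypotheses are conjunctions of attribute values, with '?' matching anything:

```python
def generalize_s(examples):
    """Find-S-style update of the specific boundary S.

    examples: list of (attribute-tuple, label) pairs. Starting from
    the first positive example, each further positive example forces
    any disagreeing attribute to generalize to '?'. Negative examples
    are ignored here (in the full version space they constrain G)."""
    positives = [attrs for attrs, label in examples if label == "yes"]
    s = list(positives[0])
    for attrs in positives[1:]:
        s = [a if a == b else "?" for a, b in zip(s, attrs)]
    return s
```

For example, two positive days that differ only in temperature generalize to a hypothesis with '?' in the temperature slot.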
- Use decision tree learning (id3.pl)
with the weather data and:
- Create all possible decision trees (by varying the threshold)
and compute the total error (the proportion of misclassified training
examples) for each.
- Decide on playing tennis on a day described as [outlook=sunny,
temp=mild, humidity=normal, wind=weak] using each of the trees. Compare
the decisions.
- Compare the trees with the concepts learned with VS with respect to their coverage and whether or not they are disjunctive.
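The attribute choice at each node of a decision tree is driven by entropy and information gain; a Python sketch of those two quantities (illustrative only; id3.pl does this in Prolog, and the tiny example data below is hypothetical):

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(examples, attr):
    """examples: list of (attribute-dict, label) pairs. Gain is the
    entropy of the whole set minus the weighted entropy of the
    subsets produced by splitting on attr."""
    labels = [lab for _, lab in examples]
    n = len(examples)
    by_value = {}
    for attrs, lab in examples:
        by_value.setdefault(attrs[attr], []).append(lab)
    remainder = sum(len(ls) / n * entropy(ls) for ls in by_value.values())
    return entropy(labels) - remainder
```

An attribute that splits the examples into pure subsets has gain equal to the full entropy; varying the ID3 threshold decides how impure a subset may be before the tree stops growing.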
- Use Naive Bayes (bayes.pl) and Nearest Neighbor (knn.pl with k=1,3,5 and knnw with k=14) and:
- Compute the error of each algorithm (and each parameter for
knn) on the training data (tennis.pl).
- Compute the holdout error of each algorithm (and each parameter
for knn) by splitting the weather data into 8-example training set and
6-example test set. Use
Naive Bayes Lab, part IV as a guideline.
- Classify [outlook=sunny,
temp=mild, humidity=normal, wind=weak] with each algorithm (and each
parameter for knn).
- Compare (use a table to summarize) results and find out which
algorithm performs better.
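For intuition about what knn.pl computes, here is a hedged Python sketch of k-nearest-neighbor classification over nominal attributes. The distance measure (number of disagreeing attribute values, i.e. Hamming distance) is a common choice for data like the weather attributes, though it is an assumption here, not a description of knn.pl's internals:

```python
from collections import Counter

def knn_classify(train, query, k=1):
    """train: list of (attribute-tuple, label) pairs.
    Classify query by majority vote among its k nearest neighbors
    under Hamming distance over the attribute values."""
    def dist(a, b):
        return sum(x != y for x, y in zip(a, b))
    neighbors = sorted(train, key=lambda ex: dist(ex[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

With k equal to the full training-set size (as in knnw with k=14 on the 14 weather examples), every neighbor votes and only the distance weighting distinguishes the examples.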
Extra credit (max 5 pts.): Use agglomerative clustering (cluster.pl
with min and max parameter) and:
- Compute the total error using cluster-to-classes evaluation on the weather data (tennis.pl). For how to compute the error, see Clustering - part II.
- Add the example [outlook=sunny,
temp=mild, humidity=normal, wind=weak] to the data and classify it by
the majority class in the cluster it falls.
- Describe the approaches you use and explain the output from the
Prolog queries.
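The min/max parameter of cluster.pl corresponds to the linkage used when merging clusters. A small Python sketch of agglomerative clustering on one-dimensional points (an illustration of the idea under assumed settings, stopping at two clusters; the real lab works on the multi-attribute weather data):

```python
def agglomerative(points, linkage="min"):
    """Bottom-up clustering of 1-D points until two clusters remain.
    linkage='min' is single linkage (distance between the closest
    members of two clusters); linkage='max' is complete linkage
    (distance between the farthest members)."""
    clusters = [[p] for p in points]
    agg = min if linkage == "min" else max
    while len(clusters) > 2:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = agg(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters
```

Classifying a new example by the majority class of its cluster, as the extra-credit task asks, then amounts to looking up which of the final clusters it was merged into.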
Documentation and submission: Write a report describing the solutions to all problems, and mail it as an attachment to my instructor's account for the WebCT (available through Campus Pipeline/My Courses/Artificial Intelligence).
Midterm Test (max grade 10 pts.)
The Midterm test will be available in the WebCT course template from April 1 through April 4. There will be 10 multiple choice or short answer questions that will have to be answered within 2 hours.
The test includes the following topics:
- Uninformed and informed search algorithms
- Searching game trees
- Constraint satisfaction
- Propositional and First-order logic (languages, clausal form)
- Inference in PL and FOL (clause subsumption and resolution).
Final Test (max grade 10 pts.)
The Final test will be available in the WebCT course template from May 13 through May 19. There will be 10 multiple choice or short answer questions that will have to be answered within 2 hours.
The test includes the following topics:
- Planning
- Bayesian Reasoning
- Bayesian Networks
- Decision Tree Learning
- Representing hypotheses, generalization/specialization with taxonomies
- Version Space Learning
- Naive Bayes
- Nearest Neighbor
- Hypothesis Evaluation
- Clustering
Last updated: 5-10-2005