Syllabus: Tpcs in Data Base SysApplcn - CS-580-C01 Topic - Data Mining Summer07
| Section Information: CS-580-C01 Topic - Data Mining Summer07 |
| Course Name |
Tpcs in Data Base SysApplcn
| Section Instructor: Dr. Zdravko Markov |
| Biography |
Dr. Zdravko Markov has an M.S. in Mathematics and
Computer Science and a Ph.D. in Artificial Intelligence. He has been
teaching and doing research in the area of Machine Learning for more
than 15 years. Recently he developed a novel approach to conceptual
clustering and is studying its application to Data Mining tasks. Dr.
Markov has published 4 textbooks and more than 50 research papers in
conference proceedings and journals. His most recent book
(co-authored with Daniel Larose) is “Data Mining the Web: Uncovering
Patterns in Web Content, Structure, and Usage”, published by Wiley
in 2007. Dr. Markov’s CCSU courses are in the areas of Computer
Architecture and Design, Computing and Communication Technology,
Machine Learning, Data and Web Mining.
| Course Dates |
May 29, 2007 - July 19, 2007
| Course Description |
Data Mining studies algorithms and computational
paradigms that allow computers to find patterns and regularities in
databases, perform prediction and forecasting, and generally improve
their performance through interaction with data. It is currently
regarded as the key element of a more general process called
Knowledge Discovery that deals with extracting useful knowledge from
raw data. The knowledge discovery process includes data selection,
cleaning, coding, using different statistical and machine learning
techniques, and visualization of the generated structures. The
course will cover all these issues and will illustrate the whole
process by examples. Special emphasis will be given to Machine
Learning methods, as they provide the real knowledge discovery tools.
Important related technologies, such as data warehousing and on-line
analytical processing (OLAP), will also be discussed. Students
will use recent Data Mining software. Enrollment in this course is
limited to 20 students.
| Course Goals |
- To introduce students to the basic concepts and techniques of
  Data Mining.
- To develop skills in using recent data mining software to
  solve practical problems.
- To gain experience in independent study and research.
| Required Textbook |
Ian H. Witten and Eibe Frank, Data Mining: Practical
Machine Learning Tools and Techniques (Second Edition), Morgan
Kaufmann, 2005, ISBN: 0-12-088407-0.
| Required Software |
Weka 3 Data Mining System: free, open-source machine
learning software in Java. Available at
http://www.cs.waikato.ac.nz/~ml/weka/index.html
| Grading |
Grading will be based on six assignments (60%), two quizzes (20%),
and class participation through four scheduled discussions (20%).
Assignments and quizzes will be graded on a 100-point scale and
discussions on a 50-point scale. Thus the maximum course total
will be 1000 points (6 x 100 + 2 x 100 + 4 x 50).
The letter grade will be determined by the following grading
scale:
| Grade | Points |
| A | 950-1000 |
| A- | 900-949 |
| B+ | 870-899 |
| B | 840-869 |
| B- | 800-839 |
| C+ | 770-799 |
| C | 740-769 |
| C- | 700-739 |
| D+ | 670-699 |
| D | 640-669 |
| D- | 600-639 |
| F | 0-599 |
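As an illustration only, the point arithmetic and grading scale above can be sketched in a few lines of Python. The names `GRADE_SCALE`, `course_total`, and `letter_grade` are hypothetical and not part of the course materials. The sketch assumes whole-number point totals and does not model the late-assignment penalty.

```python
# Illustrative sketch; all names here are hypothetical, not course-provided code.
# Each entry is (minimum points, letter), taken from the grading scale above.
GRADE_SCALE = [
    (950, "A"), (900, "A-"), (870, "B+"), (840, "B"), (800, "B-"),
    (770, "C+"), (740, "C"), (700, "C-"), (670, "D+"), (640, "D"),
    (600, "D-"), (0, "F"),
]


def course_total(assignments, quizzes, discussions):
    """Add up raw points: six assignments and two quizzes (0-100 each),
    four discussions (0-50 each), for a maximum of 1000."""
    if len(assignments) != 6 or len(quizzes) != 2 or len(discussions) != 4:
        raise ValueError("expected 6 assignments, 2 quizzes, 4 discussions")
    return sum(assignments) + sum(quizzes) + sum(discussions)


def letter_grade(total):
    """Return the letter for a point total between 0 and 1000."""
    if not 0 <= total <= 1000:
        raise ValueError("total must be between 0 and 1000")
    # Walk the scale from the top; the first cutoff reached wins.
    for minimum, letter in GRADE_SCALE:
        if total >= minimum:
            return letter


if __name__ == "__main__":
    total = course_total([90, 85, 100, 95, 80, 90], [88, 92], [45, 50, 40, 48])
    print(total, letter_grade(total))
```

Because every item contributes its raw points, the six assignments account for 600 of the 1000 points, the quizzes for 200, and the discussions for 200, which matches the 60/20/20 weighting.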
Late assignments will be marked down one letter grade for every 3
days they are late.
It is expected that all students will conduct themselves in an
honest manner and NEVER claim work that is not their own. Violating
this policy will result in a substantial grade penalty or a final
grade of F.
| Course Content (12 units) |
- Introduction to Data Mining
  - What is data mining?
  - Related technologies - Machine Learning, DBMS, OLAP,
    Statistics
  - Data Mining Goals
  - Stages of the Data Mining Process
  - Data Mining Techniques
  - Knowledge Representation Methods
  - Applications
  - Example: weather data
- Data Warehouse and OLAP
  - Data Warehouse and DBMS
  - Multidimensional data model
  - OLAP operations
  - Example: loan data set
- Data preprocessing
  - Data cleaning
  - Data transformation
  - Data reduction
  - Discretization and generating concept hierarchies
  - Installing Weka 3 Data Mining System
  - Experiments with Weka - filters, discretization
- Data mining knowledge representation
  - Task relevant data
  - Background knowledge
  - Interestingness measures
  - Representing input data and output knowledge
  - Visualization techniques
  - Experiments with Weka - visualization
- Attribute-oriented analysis
  - Attribute generalization
  - Attribute relevance
  - Class comparison
  - Statistical measures
  - Experiments with Weka - using filters and statistics
- Data mining algorithms: Association rules
  - Motivation and terminology
  - Example: mining weather data
  - Basic idea: item sets
  - Generating item sets and rules efficiently
  - Correlation analysis
  - Experiments with Weka - mining association rules
- Data mining algorithms: Classification
  - Basic learning/mining tasks
  - Inferring rudimentary rules: 1R algorithm
  - Decision trees
  - Covering rules
  - Experiments with Weka - decision trees, rules
- Data mining algorithms: Prediction
  - The prediction task
  - Statistical (Bayesian) classification
  - Bayesian networks
  - Instance-based methods (nearest neighbor)
  - Linear models
  - Experiments with Weka - prediction
- Evaluating what's been learned
  - Basic issues
  - Training and testing
  - Estimating classifier accuracy (holdout, cross-validation,
    leave-one-out)
  - Combining multiple models (bagging, boosting, stacking)
  - Minimum Description Length Principle (MDL)
  - Experiments with Weka - training and testing
- Mining real data
  - Preprocessing data from a real medical domain (310 patients
    with Hepatitis C)
  - Applying various data mining techniques to create a
    comprehensive and accurate model of the data
- Clustering
  - Basic issues in clustering
  - First conceptual clustering system: Cluster/2
  - Partitioning methods: k-means, expectation maximization (EM)
  - Hierarchical methods: distance-based agglomerative and
    divisive clustering
  - Conceptual clustering: Cobweb
  - Experiments with Weka - k-means, EM, Cobweb
- Advanced techniques, Data Mining software and applications
  - Text mining: extracting attributes (keywords), structural
    approaches (parsing, soft parsing)
  - Bayesian approach to classifying text
  - Web mining: classifying web pages, extracting knowledge from
    the web
  - Data Mining software and applications