=====================================================================
                                                                       
                              ======                                   
                              README                                   
                              ======                                   
                                                                       
                             WEKA 3.2
                          14 March 2001
                                                                       
                 Java Programs for Machine Learning 
                                                                       
   Copyright (C) 1998, 1999, 2000, 2001  Eibe Frank, Leonard Trigg, Mark Hall 

                   email: wekasupport@cs.waikato.ac.nz            
                                                                       
=====================================================================

NOTE: We are following the Linux model of releases, where an even
second digit of a release number indicates a "stable" release and an
odd second digit indicates a "development" release (e.g. 3.0.x is a
stable release, and 3.1.x is a developmental release). If you are
using a developmental release, you might get to play with extra funky
features, but it is entirely possible that these features
come/go/transmogrify from one release to the next.  If you require
stability (e.g. if you are using Weka for teaching), use a stable
release.

=====================================================================

Contents:
---------

1. Installation

2. Getting started

   - Classifiers
   - Association rules
   - Filters
   - Data format
   - Experiment package 
   - GUIs               

3. Tutorial

4. Source code

5. Credits

6. Submission of code and bug reports

7. Copyright


----------------------------------------------------------------------

1. Installation:
----------------

For people familiar with their command-line interface
-----------------------------------------------------

a) Set WEKAHOME to be the directory which contains this README.
b) Add $WEKAHOME/weka.jar to your CLASSPATH environment variable.
c) Bookmark $WEKAHOME/doc/packages.html in your web browser.


To start a simple GUI for using Weka
------------------------------------

If you are using Java 2 (JDK 1.2 or equivalent), or you have Swing
1.1.1 or later installed for Java 1.1, you should be able to just
double-click on the weka.jar icon or, from a command line (assuming
you are in the directory containing weka.jar), type

java -jar weka.jar

or, if you are using Windows:

javaw -jar weka.jar

This will start a small GUI (GUIChooser) from which you can select the
SimpleCLI interface or the more sophisticated Explorer and
Experimenter interfaces (see below). SimpleCLI just acts like a simple
command shell and has been provided mainly for Mac users who don't
have their own shell :)

If you are using NT/Windows you may need to create a file association
before you can double-click on the weka.jar icon. Open the file
Explorer or a file browser window. Select View (or perhaps
Tools)->Options. Click on File Types. Click on New Type. Fill in the
Type field (put something like "java jar files"). Fill in the
Associated Extension ("jar"). Add new Action, with Action name Open,
and application as "javaw.exe -jar" (you will probably need to browse
to the location of your JRE to get the path correct for javaw---you
will find javaw in the "bin" directory of wherever your JRE is
installed).

If you are using some other Java virtual machine, you need to start
GUIChooser from within weka.jar. JDK 1.1 users can use something
like the following:

java -classpath weka.jar:$CLASSPATH weka.gui.GUIChooser

or, if you are using Windows:

javaw -classpath weka.jar;%CLASSPATH% weka.gui.GUIChooser


----------------------------------------------------------------------

2. Getting started:
-------------------

In the following, the names of files assume use of a Unix command-line
with environment variables. For other command-lines (including SimpleCLI)
you should substitute the name of the directory where weka.jar lives 
where you see $WEKAHOME. If your platform uses something other than / as
the path separator, also make the appropriate substitutions.

===========
Classifiers
===========

Try:

java weka.classifiers.j48.J48 -t $WEKAHOME/data/iris.arff

This prints out a decision tree classifier for the iris dataset 
and ten-fold cross-validation estimates of its performance. If you
don't pass any options to the classifier, WEKA will list all the 
available options. Try:

java weka.classifiers.j48.J48

The options are divided into "general" options that apply to most
classification schemes in WEKA, and scheme-specific options that 
only apply to the current scheme---in this case J48. WEKA has a
common interface to all classification methods. Any class that 
implements a classifier can be used in the same way as J48 is used
above. WEKA knows that a class implements a classifier if it 
extends the Classifier or DistributionClassifier classes in
weka.classifiers. Almost all classes in weka.classifiers fall into
this category. Try, for example:

java weka.classifiers.NaiveBayes -t $WEKAHOME/data/labor.arff

Here is a list of the most important classifiers currently 
implemented in weka.classifiers:

a) Classifiers for categorical prediction:

weka.classifiers.IBk: k-nearest neighbour learner
weka.classifiers.j48.J48: C4.5 decision trees 
weka.classifiers.j48.PART: rule learner 
weka.classifiers.NaiveBayes: naive Bayes with/without kernels
weka.classifiers.OneR: Holte's OneR
weka.classifiers.KernelDensity: kernel density classifier
weka.classifiers.SMO: support vector machines
weka.classifiers.Logistic: logistic regression
weka.classifiers.AdaBoostM1: AdaBoost
weka.classifiers.LogitBoost: logit boost
weka.classifiers.DecisionStump: decision stumps (for boosting)

b) Classifiers for numeric prediction:

weka.classifiers.LinearRegression: linear regression
weka.classifiers.m5.M5Prime: model trees
weka.classifiers.IBk: k-nearest neighbour learner
weka.classifiers.LWR: locally weighted regression
weka.classifiers.RegressionByDiscretization: uses categorical classifiers

=================
Association rules
=================

Besides classification schemes, there is some other useful stuff in
WEKA. Association rules, for example, can be extracted using the
Apriori algorithm. Try:

java weka.associations.Apriori -t $WEKAHOME/data/weather.nominal.arff

=======
Filters
=======

There are also a number of tools that allow you to manipulate a
dataset. These tools are called filters in WEKA and can be found
in weka.filters.

weka.filters.DiscretizeFilter: discretizes numeric data
weka.filters.AttributeFilter: deletes/selects attributes
etc.

Try:

java weka.filters.DiscretizeFilter -i $WEKAHOME/data/iris.arff -c last

===========
Data format
===========

Datasets in WEKA have to be formatted according to the arff 
format. Examples of arff files can be found in $WEKAHOME/data. 
What follows is a short description of the file format. 

A dataset has to start with a declaration of its name:

@relation name

followed by a list of all the attributes in the dataset (including 
the class attribute). These declarations have the form

@attribute attribute_name specification


If an attribute is nominal, specification contains a list of the 
possible attribute values in curly brackets:

@attribute nominal_attribute {first_value, second_value, third_value}


If an attribute is numeric, specification is replaced by the keyword
numeric (integer values are treated as real numbers in WEKA):

@attribute numeric_attribute numeric


In addition to these two types of attributes, there also exists a
string attribute type. This attribute provides the possibility to
store a comment or ID field for each of the instances in a dataset:

@attribute string_attribute string

After the attribute declarations, the actual data is introduced by a 

@data

tag, which is followed by a list of all the instances. The instances 
are listed in comma-separated format, with a question mark 
representing a missing value. Comments are lines starting with %.
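
As an illustration, the sketch below embeds a small made-up dataset
that uses all three attribute types, a missing-value marker and a
comment line, and counts its attribute declarations and instances. The
relation, attribute names and values are invented for this example,
and the ArffSketch class is a hypothetical helper, not WEKA's own arff
reader.

```java
// Minimal illustration only: WEKA has its own arff reader. This sketch
// just counts declarations in a small hand-written example.
final class ArffSketch {

    // A tiny invented dataset with a string, a numeric and a nominal attribute.
    static final String EXAMPLE = String.join("\n",
        "% Hypothetical example dataset",
        "@relation patients",
        "",
        "@attribute id string",
        "@attribute age numeric",
        "@attribute outcome {recovered, relapsed}",
        "",
        "@data",
        "p01, 34, recovered",
        "p02, ?, relapsed",
        "p03, 61, ?");

    /** Returns {number of attributes, number of instances} found in the text. */
    static int[] count(String arff) {
        int attributes = 0;
        int instances = 0;
        boolean inData = false;
        for (String raw : arff.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) {
                continue; // skip blank lines and comments
            }
            String lower = line.toLowerCase();
            if (inData) {
                instances++; // every remaining line is one instance
            } else if (lower.startsWith("@attribute")) {
                attributes++;
            } else if (lower.startsWith("@data")) {
                inData = true;
            }
        }
        return new int[] {attributes, instances};
    }

    public static void main(String[] args) {
        int[] c = count(EXAMPLE);
        System.out.println(c[0] + " attributes, " + c[1] + " instances");
    }
}
```

The text held in EXAMPLE can be saved as a file and compared with the
datasets in $WEKAHOME/data.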

==================
Experiment package
==================

There is now support for running experiments that involve evaluating
classifiers on repeated randomizations of datasets, over multiple
datasets (you can do much more than this, besides). The classes for
this reside in the weka.experiment package. The basic architecture is
that a ResultProducer (which generates results on some randomization
of a dataset) sends results to a ResultListener (which is responsible
for stating whether it already has the result, and otherwise storing
results).
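
The division of labour can be sketched in a few lines of Java. The
types below are simplified stand-ins invented for this README; the
real interfaces in weka.experiment have different method signatures
(see the API documentation). The point is only that the producer asks
the listener whether a result is already held before doing any work.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified stand-ins for illustration. These are NOT the interfaces
// found in weka.experiment.
final class ExperimentSketch {

    interface Listener {
        boolean alreadyHas(String key);          // is this result already stored?
        void accept(String key, double result);  // store a new result
    }

    /** A listener that keeps its results in memory. */
    static final class MapListener implements Listener {
        final Map<String, Double> results = new LinkedHashMap<>();
        public boolean alreadyHas(String key) { return results.containsKey(key); }
        public void accept(String key, double result) { results.put(key, result); }
    }

    /** A producer that generates one result per run of a dataset. */
    static final class Producer {
        int evaluations = 0; // counts how often the expensive step actually ran

        void doRuns(String dataset, int runs, Listener listener) {
            for (int run = 1; run <= runs; run++) {
                String key = dataset + "/run" + run;
                if (listener.alreadyHas(key)) {
                    continue; // nothing to do, the listener holds this result
                }
                evaluations++;
                listener.accept(key, evaluate(dataset, run));
            }
        }

        // Placeholder for training and testing on one randomization.
        private double evaluate(String dataset, int run) {
            return 0.0;
        }
    }

    public static void main(String[] args) {
        MapListener listener = new MapListener();
        Producer producer = new Producer();
        producer.doRuns("iris", 3, listener);
        producer.doRuns("iris", 5, listener); // only runs 4 and 5 are evaluated
        System.out.println(listener.results.size() + " results, "
            + producer.evaluations + " evaluations");
    }
}
```

Because the listener is asked first, an interrupted experiment can be
restarted without repeating runs whose results were already stored.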

Example ResultListeners include:
weka.experiment.CSVResultListener: outputs results as
comma-separated-value files.
weka.experiment.InstancesResultListener: converts results into a set
of Instances.
weka.experiment.DatabaseResultListener: sends results to a database
via jdbc. 

Example ResultProducers include:
weka.experiment.RandomSplitResultProducer: train/test on a % split
weka.experiment.CrossValidationResultProducer: n-fold cross-validation
weka.experiment.AveragingResultProducer: averages results from another
ResultProducer
weka.experiment.DatabaseResultProducer: acts as a cache for results,
storing them in a database.

The RandomSplitResultProducer and CrossValidationResultProducer make
use of a SplitEvaluator to obtain actual results for a particular
split. Two are provided: ClassifierSplitEvaluator (for nominal
classification) and RegressionSplitEvaluator (for numeric
prediction). Each of these uses a Classifier to generate the actual
results.

So, you might have a DatabaseResultListener that is sent results from
an AveragingResultProducer, which produces averages over the n results
produced for each run of an n-fold CrossValidationResultProducer,
which in turn is doing nominal classification through a
ClassifierSplitEvaluator, which uses OneR for prediction. Whew. But
you can combine these things together to do pretty much whatever you
want. You might want to write a LearningRateResultProducer that splits
a dataset into increasing numbers of training instances.

In terms of database connectivity, we use InstantDB, a free database
implemented entirely in Java. It is available from:

http://www.instantdb.co.uk/index.htm

From there you will also be able to find a RmiJdbc bridge which is
useful for running a server that just listens for experiment results
from other machines. When using classes that access a database, you
will probably want to create a properties file that specifies which
jdbc drivers to use, and where to find the database. This file should
reside in your home directory or the current directory and be called
"DatabaseUtils.props". An example is provided in weka/experiment; this
file is used unless it is overridden by one in your home directory or
the current directory (in that order).
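
One way to picture this lookup order is as layered
java.util.Properties objects, where each later file overrides keys
from the earlier ones. The sketch below is an assumption about how
such layering can be written, not WEKA's actual loading code; the
LayeredProps class, its methods and the example.key property are
hypothetical.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical sketch, not WEKA's own loader.
final class LayeredProps {

    /** Returns properties that fall back to base and are overridden by file, if it exists. */
    static Properties overlay(Properties base, File file) {
        Properties props = new Properties(base); // base supplies the defaults
        if (file.isFile()) {
            try (InputStream in = new FileInputStream(file)) {
                props.load(in);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return props;
    }

    /** Shipped example first, then the home directory, then the current directory. */
    static Properties load(Properties shipped, File homeDir, File currentDir, String name) {
        Properties withHome = overlay(shipped, new File(homeDir, name));
        return overlay(withHome, new File(currentDir, name));
    }

    public static void main(String[] args) {
        Properties shipped = new Properties();
        shipped.setProperty("example.key", "shipped default");
        Properties props = load(shipped,
            new File(System.getProperty("user.home")),
            new File(System.getProperty("user.dir")),
            "DatabaseUtils.props");
        System.out.println(props.getProperty("example.key"));
    }
}
```

With this layering, a key that appears in none of the override files
keeps the value from the shipped example.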

To run a simple experiment from the command line, try:

java weka.experiment.Experiment -r -T $WEKAHOME/data/iris.arff \
  -D weka.experiment.InstancesResultListener \
  -P weka.experiment.RandomSplitResultProducer -- \
  -W weka.experiment.ClassifierSplitEvaluator -- \
  -W weka.classifiers.OneR

(Try "java weka.experiment.Experiment -h" to find out what these options
mean.)

If you have your results as a set of instances, you can perform paired
t-tests using weka.experiment.PairedTTester (use the -h option to find
out what options it needs).

This is all much easier from the Experiment Environment GUI :-)


====
GUIs
====

We now have two GUIs to make using Weka a little easier: one that acts
much as the original interface to the old Weka 2 system, and one for
conducting experiments (see README_Experiment_Gui). Both of these
interfaces use Swing, so you need to either be using Java 2 or have
downloaded Swing 1.1.1 or later for your JDK 1.1. One of the
components of the GUIs is a generic object editor that requires a
configuration file called "GenericObjectEditor.props". There is an
example file in weka/gui. This file will be used unless it is
overridden by one in your home directory or the current directory (in
that order). The file simply specifies, for each superclass, which
subclasses to offer as choices: for example, which Classifiers are
offered when an object requires a property of type Classifier.


To start the Explorer:
java weka.gui.explorer.Explorer

To start the experiment editor:
java weka.gui.experiment.Experimenter

These _really_ need more documentation, but that'll do to get you
started :)

----------------------------------------------------------------------

3. Tutorial:
------------

A tutorial on how to use WEKA is in $WEKAHOME/Tutorial.pdf. However,
not everything in WEKA is covered in the Tutorial. For a complete list
you have to look at the online documentation in
$WEKAHOME/doc/packages.html.
In particular, Tutorial.pdf is a draft from the forthcoming book (see
our web page), and so only describes features in the stable 3.0
release.

----------------------------------------------------------------------

4. Source code:
---------------

The source code for WEKA is in $WEKAHOME/weka-src.jar. To expand it, 
use the jar utility that's in every Java distribution.

----------------------------------------------------------------------

5. Credits:
-----------

Len Trigg     - weka.experiment, weka.gui, weka.gui.experiment,
                weka.gui.explorer, weka.filters, weka.estimators, 
                weka.classifiers, weka.core

Eibe Frank    - weka.core, weka.classifiers, weka.classifiers.j48,
                weka.filters, weka.associations

Mark Hall     - weka.clusterers, weka.attributeSelection,
                weka.classifiers.DecisionTable, weka.gui, 
                weka.gui.explorer, weka.gui.experiment,
                weka.gui.visualize, weka.converters

Yong Wang     - weka.classifiers.m5

Ian H. Witten - weka.classifiers.OneR, weka.classifiers.Prism

Stuart Inglis - weka.classifiers.IB1

----------------------------------------------------------------------

6. Submission of code and bug reports:
--------------------------------------

If you have implemented a learning scheme, filter, application,
visualization tool, etc., using the WEKA classes, and you think it 
should be included in WEKA, send us the code, and we can put it
in the next WEKA distribution.

If you find any bugs, send a fix to wekasupport@cs.waikato.ac.nz.
If that's too hard, just send us a bug report.

-----------------------------------------------------------------------

7. Copyright:
-------------

WEKA is distributed under the GNU General Public License. Please read
the file COPYING.

-----------------------------------------------------------------------