5. clsfy: Statistical Classification

Chapter summary:
Statistical classifiers only work x% of the time. x is inversely proportional to what the theory says it should be.

clsfy contains several classes for representing, using and training statistical classifiers. Input data is represented by vnl_vector<double>. Output classes are represented by integers [0..n_classes).


5.1 Classifiers

All classifiers can classify sample vectors and estimate class probabilities. All classifiers are derived from clsfy_classifier_base.

Main functions

unsigned n_dims() const

Dimensionality of vector space of inputs.

unsigned n_classes() const

The number of possible output classes. If n_classes() == 1, this indicates a binary classifier. In this case, most functions return values associated with just the positive (1st) class. As far as the interface is concerned a binary classifier is distinct from a multiclass classifier with n_classes() == 2.

unsigned classify(x) const

The most likely class for input vector x.

void class_probabilities(vcl_vector<double> & outputs, x) const

Estimate of the a-posteriori class probabilities for vector x. If the classifier is binary (i.e. n_classes() == 1), outputs holds a single value: the probability that x belongs to class 1, also called the positive class.

double log_l(x) const

If the classifier is binary, an estimate of the a-posteriori log likelihood of being in class 1.
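
The binary convention described above can be made concrete with two small helpers. The sketch below is plain C++ and does not use clsfy. It assumes that log_l(x) is the log-odds of class 1, so that the class probability is 1 / (1 + exp(-log_l)); check that assumption against clsfy_classifier_base.h in your version of VXL. The names probability_from_log_l and expand_binary_probabilities are hypothetical and are not part of the library.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper, not part of clsfy.
// Assumes log_l is the log-odds of class 1, i.e. log(p / (1 - p)).
double probability_from_log_l(double log_l)
{
  return 1.0 / (1.0 + std::exp(-log_l));
}

// Hypothetical helper, not part of clsfy.
// A binary classifier (n_classes() == 1) reports one value, P(class 1).
// Expand it to the two-element form { P(class 0), P(class 1) }.
std::vector<double> expand_binary_probabilities(const std::vector<double>& outputs)
{
  assert(outputs.size() == 1);
  std::vector<double> both(2);
  both[0] = 1.0 - outputs[0];
  both[1] = outputs[0];
  return both;
}
```

Under that assumption, a log_l of 0 corresponds to a probability of 0.5, and positive values favour class 1.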

The classifiers all support IO via vsl_b_read, vsl_b_write, and vsl_print_summary.


5.2 Builders

The classifier training algorithms are embedded within the classes derived from clsfy_builder_base.

Main functions

clsfy_classifier_base* new_classifier()

Create a new classifier of the appropriate type on the heap and return a pointer to it. The caller is responsible for deleting it.

double build(model, training_inputs, training_outputs, n_classes)

Train the classifier model from the supplied training data.

The concrete builders have attributes that can be modified to control the training process. Each should provide default values for these attributes, so you can often build a classifier without understanding the training algorithm in detail.

The builders all support IO via vsl_b_read, vsl_b_write, and vsl_print_summary.


5.2.1 Strategy Pattern

This design is an example of the strategy pattern (Gamma et al., Design Patterns, Addison-Wesley, 1995). You can write code that builds and uses a classifier without that code knowing what sort of classifier is being used. Both builders and classifiers can be saved and loaded by base class pointer.
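
To illustrate the pattern without depending on VXL, here is a minimal self-contained sketch. The classes toy_classifier_base, toy_builder_base, toy_threshold and toy_threshold_builder are hypothetical stand-ins, not clsfy classes, and the inputs are single doubles rather than vnl_vector<double>. The training rule (a threshold midway between the two class means) is chosen only for brevity. The point is the last function, which works through base classes alone.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for clsfy_classifier_base (1-D inputs for brevity).
class toy_classifier_base
{
 public:
  virtual ~toy_classifier_base() {}
  virtual unsigned classify(double x) const = 0;
};

// Hypothetical stand-in for clsfy_builder_base.
class toy_builder_base
{
 public:
  virtual ~toy_builder_base() {}
  // Create a classifier of the matching type on the heap; the caller owns it.
  virtual toy_classifier_base* new_classifier() const = 0;
  // Train the model; returns the fraction of training samples it gets wrong.
  virtual double build(toy_classifier_base& model,
                       const std::vector<double>& inputs,
                       const std::vector<unsigned>& outputs) const = 0;
};

// One concrete strategy: class 1 if x lies above a threshold.
class toy_threshold : public toy_classifier_base
{
 public:
  toy_threshold() : t_(0.0) {}
  void set_threshold(double t) { t_ = t; }
  virtual unsigned classify(double x) const { return x > t_ ? 1u : 0u; }
 private:
  double t_;
};

// Its builder: put the threshold midway between the two class means.
// Assumes class 1 samples tend to lie above class 0 samples.
class toy_threshold_builder : public toy_builder_base
{
 public:
  virtual toy_classifier_base* new_classifier() const { return new toy_threshold; }
  virtual double build(toy_classifier_base& model,
                       const std::vector<double>& inputs,
                       const std::vector<unsigned>& outputs) const
  {
    assert(inputs.size() == outputs.size());
    toy_threshold& m = dynamic_cast<toy_threshold&>(model);
    double sum[2] = {0.0, 0.0};
    unsigned count[2] = {0, 0};
    for (std::size_t i = 0; i < inputs.size(); ++i)
    {
      assert(outputs[i] < 2);
      sum[outputs[i]] += inputs[i];
      ++count[outputs[i]];
    }
    assert(count[0] > 0 && count[1] > 0);
    m.set_threshold(0.5 * (sum[0] / count[0] + sum[1] / count[1]));

    unsigned wrong = 0;
    for (std::size_t i = 0; i < inputs.size(); ++i)
      if (m.classify(inputs[i]) != outputs[i]) ++wrong;
    return double(wrong) / inputs.size();
  }
};

// Client code: sees only the base classes, so any builder can be passed in.
double train_and_count_error(const toy_builder_base& builder,
                             const std::vector<double>& inputs,
                             const std::vector<unsigned>& outputs)
{
  toy_classifier_base* c = builder.new_classifier();
  const double err = builder.build(*c, inputs, outputs);
  delete c;
  return err;
}
```

Swapping in a different builder changes the training algorithm and the classifier type without touching train_and_count_error, which is the property the paragraph above describes.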


5.3 Derived Classes

clsfy_binary_hyperplane

Simple two-class classifier, where the class boundary is a hyperplane (a plane in three dimensions, or a line in two).

clsfy_binary_hyperplane_ls_builder

Train a hyperplane classifier using least squares.

clsfy_pdf_classifier

A binary classifier that takes a single PDF to describe the positive class (class number 1). The boundary is set on an iso-probability contour.

clsfy_k_nearest_neighbour

One of the simplest and most effective classifiers around. Don't wait until the end of your PhD before comparing your algorithm with this one.

clsfy_rbf_parzen

A Parzen window classifier that uses a radial basis kernel at each training point.

clsfy_random_classifier

Useful for testing, this classifier outputs a preferred class independent of the input data.
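
As a point of reference for the k-nearest-neighbour entry above, the following is a minimal sketch of the technique in plain C++. It is not the clsfy_k_nearest_neighbour implementation or interface: knn_classify is a hypothetical free function, inputs are std::vector<double> rather than vnl_vector<double>, and a two-class problem is passed as n_classes = 2 rather than using the binary convention of n_classes() == 1.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical free function, not the clsfy_k_nearest_neighbour interface.
// Returns the majority class among the k training points closest to x
// (squared Euclidean distance; vote ties go to the lowest class index).
unsigned knn_classify(const std::vector<std::vector<double> >& train_inputs,
                      const std::vector<unsigned>& train_outputs,
                      unsigned n_classes,
                      unsigned k,
                      const std::vector<double>& x)
{
  assert(train_inputs.size() == train_outputs.size());
  assert(k > 0 && k <= train_inputs.size());

  // Pair each squared distance with the class of that training point.
  std::vector<std::pair<double, unsigned> > dist;
  for (std::size_t i = 0; i < train_inputs.size(); ++i)
  {
    assert(train_inputs[i].size() == x.size());
    assert(train_outputs[i] < n_classes);
    double d2 = 0.0;
    for (std::size_t j = 0; j < x.size(); ++j)
    {
      const double d = train_inputs[i][j] - x[j];
      d2 += d * d;
    }
    dist.push_back(std::make_pair(d2, train_outputs[i]));
  }

  // Only the k smallest distances need to be in order.
  std::partial_sort(dist.begin(), dist.begin() + k, dist.end());

  // Vote among the k nearest neighbours.
  std::vector<unsigned> votes(n_classes, 0);
  for (unsigned i = 0; i < k; ++i)
    ++votes[dist[i].second];
  return static_cast<unsigned>(
      std::max_element(votes.begin(), votes.end()) - votes.begin());
}
```

Because there is no training step beyond storing the data, a baseline like this is cheap to set up, which is why it makes a convenient first comparison.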


5.4 Examples

Suppose we wish to train a binary classifier from a set of labelled vectors, then count how many of those training vectors it misclassifies.

 
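#include <vcl_iostream.h>
#include <vcl_vector.h>
#include <vnl/vnl_vector.h>
#include <mbl/mbl_data_array_wrapper.h>
#include <clsfy/clsfy_classifier_base.h>
#include <clsfy/clsfy_binary_hyperplane_ls_builder.h>

vcl_vector<vnl_vector<double> > data_inputs(n);
vcl_vector<unsigned> data_targets(n);
// Load in the vectors
....

// Create an iterator object to pass the data in.
// mbl_data_wrapper is an abstract base; mbl_data_array_wrapper wraps a vcl_vector.
mbl_data_array_wrapper<vnl_vector<double> > v_data(data_inputs);

// Define what type of builder to use.  In this case we want a hyperplane.
clsfy_binary_hyperplane_ls_builder builder;

// Generate model to build
clsfy_classifier_base *classifier = builder.new_classifier();
// I could have created it directly using
// clsfy_binary_hyperplane classifier;

// Build the model from the data (n_classes == 1 indicates a binary classifier).
// The argument order follows the build() summary in section 5.2; check
// clsfy_builder_base.h for the exact signature in your version.
builder.build(*classifier, v_data, data_targets, 1);
vsl_print_summary(vcl_cout, classifier);

// Now count the training errors
unsigned error = 0;
for (unsigned i=0;i<data_inputs.size();++i)
{
    if (classifier->classify(data_inputs[i]) != data_targets[i])
      error++;
}

vcl_cout << "Training error " << error << " out of " << n << " samples" << vcl_endl;

// Tidy up
delete classifier;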

This document was generated on May, 1 2013 using texi2html 1.76.