[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Chapter summary:
General utilities used by several mul libraries.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The STL iterators are great general purpose devices - however they force authors of new algorithms to write heavily templated code. This can prove difficult for the user, and impossible for some compilers in the case of extra-templated member functions. The mbl classes are templated, but on data-type, and this template parameter doesn't have the same potential to confuse.
The purpose of the mbl_data_collector
and mbl_data_wrapper
are to provide similar
purpose as STL iterators, but using polymorphism, instead of templating to handle different types
of iterator. The descendants of mbl_data_collector
provide something similar to an STL
output iterator. The descendants of mbl_data_wrapper
provide something similar to an
STL input iterator.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
When reading data from an mbl_data_wrapper
, you do not need to know which polymorphic
concrete type is being used. This way a client can pass you data from any source.
Reading data from an mbl_data_wrapper
is straight forward.
vnl_vector<double> my_sum(mbl_data_wrapper<vnl_vector<double> >& data) { data.reset(); vnl_vector<double> sum = data.current(); while (data.next()) sum += data.current(); } |
You can do random access, using the set_index(int n)
method, but there is no guarantee that
the underlying iterator supports this efficiently, so you should use sequential access where possible.
The wrappers themselves do not support IO, because they are only wrappers - they do not own the
wrapped data.
Currently there is only one concrete wrapper, mbl_data_array_wrapper
which wraps C-style data
arrays. This can be used to wrap vcl_vector
s. Wrappers have in the past been written to
read files of data directly from disk.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
When writing data to an mbl_data_collector
, you do not need to know which polymorphic
concrete type is being used. This way you can pass a client data in any way they choose.
Writing data to an mbl_data_collector
is straight forward.
void my_random_data(mbl_data_collector<vnl_vector<double> >& data) { // Create sampler vpdfl_axis_gaussian gaussian; gaussian.set(vnl_vector<double>(20,0.0), vnl_vector<double>(20,1.0)); vpdfl_sampler_base *sampler = gaussian.new_sampler(); // set up stuff data.clear(); vnl_vector<double> x; // put random samples in collector for (unsigned i=0 i<2000; ++i) { sampler->sample(x); // generate random sample data.record(x); // record sample } } |
You can store a collector using vsl
, although if you want to store a polymorphic collector variable,
you will need to cast the pointer to a mbl_data_collector_base
(1).
You can create a mbl_data_wrapper from an mbl_data_collector
by calling
collector.data_wrapper()
.
There are two concrete data collectors - mbl_data_collector_list
and
mbl_stochastic_data_collector
. The former stores all the data given to it, whilst the
latter stores a fixed size complete random subset of all the data given to it. This is useful
if you do not know how much data an algorithm is going to produce, but for space or time reasons
you can only store and use subset of it.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Rather than use vnl_sample()
, vnl_random
, provides a superior
random number generator.
The downside of vnl_random is that it is about 5 times slower than some other pseudo-random number generators. For a discussion on speed/randomness trade-off see Press et al. Numerical Recipes in C, Cambridge, 1992.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
I was using a vpdfl_axis_gaussian_sampler
to produce some test data, and
there seemed to be too many outliers. I was worried about the randomness
of the vnl_random::normal()
, so I ran some tests.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
This seems normal.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
For a seed of 123456, the numbers of outliers at 1, 2, 3, and 4 s.d. were all greater than the expected value. This bias was reduced but still evident when the number of samples was increased from 10^4 to 10^5 and 10^6.
The results for a key of 9667566 are
Got 3183 samples outside 1 sd- expected3174 Got 463 samples outside 2 sd- expected455 Got 28 samples outside 3 sd- expected27.5 Got 0 samples outside 4 sd- expected0.6 |
These seem more normal.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
One further possibility is that there might be correlation between adjacent samples. So I counted the separations (number of samples) between outliers (those that exceeded 2 standard deviations), and plotted the distribution of intervals, along with the expected distribution. The first graph is for a seed of 123456, and the second for a seed of 9667566. (Note: The key on the following two graphs is reversed.)
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
There does not appear to be a persistent bias in the number of outliers as the seed is modified.
A seed of 123456 does seem to give more outliers than expected, and in particular gives more adjacent outliers that expected - which can have significant effects on 2-D test data set generation.
Use a different seed 9667566
Ian Scott 29 November 2000
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
mbl_cloneable_ptr
Almost the opposite of a vbl_smart_ptr, this class acts like a pointer,
but performs a deep copy of the referenced object when the pointer itself is copied.
Supports IO via vsl
.
mbl_chord
Describes a horizontal line on a image.
mbl_priority_bounded_queue
This acts like a vcl_priority_queue, but only stores the n least values, where n is the configurable bound.
mbl_stats_1d, mbl_sum_1d
Calculate, store, and manipulate running statistics and sums.
mbl_lda
Calculate, manipulate and use a Linear Discriminant Analysis of data.
mbl_gamma.h
Complete and incomplete gamma functions
mbl_k_means.h
Perform k means clustering.
mbl_matrix_products.h, mbl_matxvec.h
Perform various linear algebra operations.
mbl_print.h
Commands that debuggers can run easily on demand. Include this file in your application, and you call these functions from a debugger to display common containers.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
It would be useful to have several random number generators all with a similar
interface to vnl_random
. Adding an abstract base class, would allow the
use of the strategy pattern - for instance swapping in a more random generator.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Since the random number generator has a real state, it could be useful to be able
to save that state to a vsl_b_istream
.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
With a bit more development some of this code could be usefully moved into @VXL . e.g.
mbl_chord -> vil1_chord, mbl_gamma.h -> vnl_gamma.h, mbl_priority_bounded_queue.h -> vbl_priority_bounded_queue, etc. |
[ << ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
This document was generated on May, 1 2013 using texi2html 1.76.