Chapter summary: Load images using vil_load. Access them using a vil_image_view<T>.
The vxl image library has evolved from the TargetJr and Manchester Image libraries. As with its predecessors, its primary goal is to provide flexible access to all 2D images, including those too large to fit in the address space of a single program or process, and very powerful and fast access to images in memory. In fact, both cases need similar treatment: even in-core images are assumed to be sufficiently large (say a megabyte) that special care must be taken to avoid unnecessary copying of their data. In both cases, the normal requirements of efficiency and ease-of-use apply. The system must allow:
This vil library is the second VXL image library, and is sometimes referred to as vil2. The original vxl image library vil1 is deprecated.
You can read more about the design philosophy in $VXLSRC/core/vil/notes.html
Let's look at an example of vil in use. This program makes an image from a disk file, copies it into memory, and prints the pixel at 100,100.
    #include <vcl_iostream.h>
    #include <vxl_config.h>
    #include <vil/vil_rgb.h>
    #include <vil/vil_load.h>
    #include <vil/vil_image_view.h>

    int main()
    {
      vil_image_view<vil_rgb<vxl_byte> > img;
      img = vil_load("foo.ppm");
      vcl_cerr << "Pixel 100,100 = " << img(100,100) << vcl_endl;
    }
The first interesting line declares img to be an image. vil_image_view is the basic image type. It represents an image in memory about whose structure, size and pixel type we know everything. Hence we need to specify the pixel type at this point.
Now let's skip to the end to explain the pixel access method.
    img(100,100)
This looks up the pixel at position 100,100 and returns its value. The pixel type was defined on the first line to be an rgb of bytes, and that is what will be displayed.
    [255 128 128]
Where it matters (such as when loading an image in from disk) it is assumed that the image origin is at the top left of the image.
Finally let's look at the middle line. This consists of two parts.
The vil_load function does a lot of work behind the scenes to determine what the image type is, and then load that image into memory. The second part is the assignment, which has several special properties.
A vil_image_view object is really a view of some underlying data. The view understands where the real image data is in memory and how to interpret it. When you copy a view, you merely copy this interpretation information, not the actual image data. This is important, because often images are very big, and copying is expensive. The underlying image is managed with smart pointers so when the last view to the underlying data is destroyed, the image data will be too.
vil_load by default loads the image as 3 planes, with the pixel type as vxl_byte. It is trivial to reconfigure a vil_image_view so that it views the same image data as one plane of rgb pixels. The assignment will automatically do any cheap conversion necessary. You may ask then, how is it that we know that the pixel type can be viewed as RGB of bytes? Here, we simply know that our image foo.ppm is this type. In general you can either find out what the pixel type is before you load the image, or you can force it to whatever pixel type you want. The latter may involve a relatively expensive pixel by pixel conversion, so this will not happen automatically.
Anyway, the usual next step in demonstrating an image handling library is to show thresholding, so let's have a look. This program loads an image into memory, forcing it to RGB byte format, and creates a new image where all pixels whose red, green and blue values are all below a threshold value are set to 0.
    #include <vxl_config.h>
    #include <vil/vil_rgb.h>
    #include <vil/vil_load.h>
    #include <vil/vil_save.h>
    #include <vil/vil_image_view.h>
    #include <vil/vil_convert.h>

    int main(int argc, char **argv)
    {
      vil_image_view<vil_rgb<vxl_byte> > img;
      img = vil_convert_to_component_order(
              vil_convert_to_n_planes(3,
                vil_convert_cast(vxl_byte(), vil_load(argv[1]))));

      for (unsigned j = 0; j < img.nj(); ++j)
        for (unsigned i = 0; i < img.ni(); ++i)
          if (img(i,j).r < 200 && img(i,j).g < 200 && img(i,j).b < 200)
            img(i,j) = vil_rgb<vxl_byte>(0,0,0);

      vil_save(img, argv[2]);
      return 0;
    }
The call to vil_save sends the modified image in img to disk. The choice of file format is determined automatically from the extension of the filename. If one wants more control, a string can be appended to specify the format, e.g.
    vil_save(img, argv[2], "jpeg");
Of course, if your user has chosen a name such as "foo.ppm", you'll have an oddly named image.
You should know by now that copying vil_image_view objects does not duplicate the data they point to. This allows images to be passed into and out of functions efficiently. It also means that modifying the data in one vil_image_view might change that in another. Take this example:
    ...
    vil_image_view<float> a( vil_convert_cast(float(), vil_load("x")) );
    vil_image_view<float> b = a;
    b(100,100) = 12;
    ...
After the assignment to b(100,100), both a(100,100) and b(100,100) are set to the value 12. On the other hand, if we had used vil_copy_deep, thus:
    ...
    vil_image_view<float> a( vil_convert_cast(float(), vil_load("x")) );
    vil_image_view<float> b;  // destination view, filled by vil_copy_deep
    vil_copy_deep(a, b);
    b(100,100) = 12;
    ...
or
    ...
    vil_image_view<float> a( vil_load("x") );
    vil_image_view<float> b( vil_copy_deep(a) );
    b(100,100) = 12;
    ...
then a is unchanged after the assignment to b(100,100).
Note again that the actual copying is done in vil_copy_deep; when the return value is assigned to b, there is an efficient view copy.
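The difference between copying a view and copying the pixels can be illustrated without vil at all. The following is a minimal stand-alone sketch, assuming a hypothetical FloatView type that shares its pixels through a std::shared_ptr. FloatView and copy_deep are invented for this illustration and are not part of vil; they only mimic the copy semantics described above.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a view (NOT vil_image_view): reference-counted
// pixel storage plus the image size.
struct FloatView {
  std::shared_ptr<std::vector<float>> data;  // shared pixel storage
  std::size_t ni = 0, nj = 0;

  FloatView() = default;
  FloatView(std::size_t ni_, std::size_t nj_)
      : data(std::make_shared<std::vector<float>>(ni_ * nj_, 0.0f)),
        ni(ni_), nj(nj_) {}

  // Pixel access. Copies of a view reach the same storage.
  float& operator()(std::size_t i, std::size_t j) const {
    return (*data)[j * ni + i];
  }
};

// Deep copy: allocate new storage and duplicate every pixel.
inline FloatView copy_deep(const FloatView& src) {
  FloatView dst;
  dst.ni = src.ni;
  dst.nj = src.nj;
  dst.data = std::make_shared<std::vector<float>>(*src.data);
  return dst;
}
```

Assigning one FloatView to another copies only the pointer and the sizes, so a write through either view is visible through both. A view returned by copy_deep owns separate storage, and the storage is released when the last view that refers to it is destroyed.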
Broadly there are two sorts of image one is interested in
As we have seen, the first sort of image is represented by a vil_image_view<T> on the data in memory.
For some very large images it is not possible or desirable to load them into memory. In this case it is useful to be able to load in a sub-section of the image, manipulate it, and possibly write it out again. Alternatively you may want to pass an image about, and process it without knowing its pixel type. vil supports this second sort of image using vil_image_resource. You cannot create an image resource object directly; instead you use a creation function which returns a smart pointer to the base class, vil_image_resource_sptr. When manipulating vil_image_resources it will almost entirely be in terms of vil_image_resource_sptrs.
There are several types of image resource, with different creation functions:
vil_pnm_image, vil_jpeg_image. These are created using vil_load_image_resource() and vil_new_image_resource().
vil_memory_image: Representing an image in memory. This is created using vil_new_image_resource(). Alternatively, if you want to wrap an existing view up as a vil_image_resource you can call vil_new_image_resource_of_view().
vil_crop_image_resource and vil_decimate_image_resource. These are created using the equivalent functions: vil_crop(), vil_decimate(), etc.
vil_convolve_1d_resource. These are created using the equivalent functions, e.g. vil_convolve_1d().
To actually get some image pixels you call the resource's get_view() or get_copy_view() method. For example, the vil_load() function works by creating a vil_image_resource, and then calling get_view() for the whole image.
    vil_image_view_base_sptr vil_load(char const* file)
    {
      vil_image_resource_sptr data = vil_load_image_resource(file);
      if (!data) return 0;
      return data -> get_view();
    }
To set image pixels, you call the resource's put_view().
When developing an image processing algorithm, first write your algorithm in terms of a function for vil_image_view<T>. Then, if you need it, write the vil_image_resource_sptr version, using the vil_image_view<T> version to do the actual pixel manipulation.
vil_image_view<T> is designed for playing with actual pixel values. vil_image_resource derivatives are designed to handle all the other stuff associated with images, e.g. choosing pixel types at runtime, splitting an image into blocks so that it fits in memory, dealing with the arbitrary and complex hassles of image IO.
vil_memory_image can be used to pass images around while ignoring pixel type. As explained above, you should be using vil_image_view<T> to actually manipulate your pixels. However, in some parts of your code, you may want to pass images around without having to decide the pixel type at compile time. This is a role for a vil_image_resource derivative, in particular the vil_memory_image. You can wrap an existing vil_image_view<T> in a vil_memory_image by calling vil_new_image_resource_of_view(). Reference counting keeps track of the underlying data in memory, so you can let the original view go out of scope without loss.
It may be tempting to use the vil_image_view_base_sptr for this purpose instead. That type is only intended for internal use by vil, and it will almost certainly not behave as you want.
The vil_image_resource API has been designed to allow efficient access to vil_memory_image. In the example below, if the image resource passed in is really a vil_memory_image, the get_view() returns a view to the underlying data, so no unneeded data copying happens. Similarly, a call to put_view() can return almost immediately, checking only to confirm that the view is still pointing to the same underlying data.
    void display_view(vil_image_resource_sptr &ir)
    {
      switch (ir->pixel_format())
      {
       case VIL_PIXEL_FORMAT_BYTE:
        {
          vil_image_view<vxl_byte> v1 = ir->get_view();
          display_byte(v1);
          break;  // do not fall through into the next case
        }
       case ...
      }
    }
vil_image_view uses a pointer arithmetic style of indexing. The image data is assumed to be a regularly arranged set of pixels in memory. The view keeps a pointer to the pixel at the origin. It also keeps the pointer difference to get to the next pixel to the right, the next pixel down, and the same pixel in the next plane.
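As a rough illustration of this indexing scheme, the sketch below stores an origin pointer and three pointer differences, and computes a pixel address from them. StridedView, view_interleaved and view_planar are hypothetical names invented for this example; this is not the vil implementation, only a stand-alone model of the arithmetic.

```cpp
#include <cstddef>

// Hypothetical minimal view (not vil code): origin pointer plus the pointer
// differences to the next pixel right, the next pixel down, and the next plane.
template <class T>
struct StridedView {
  T* top_left;
  unsigned ni, nj, nplanes;
  std::ptrdiff_t istep, jstep, planestep;

  T& operator()(unsigned i, unsigned j, unsigned p = 0) const {
    return top_left[static_cast<std::ptrdiff_t>(i) * istep +
                    static_cast<std::ptrdiff_t>(j) * jstep +
                    static_cast<std::ptrdiff_t>(p) * planestep];
  }
};

// Interleaved layout RGBRGBRGB...: planes are 1 apart, pixels are np apart.
template <class T>
StridedView<T> view_interleaved(T* buf, unsigned ni, unsigned nj, unsigned np) {
  StridedView<T> v;
  v.top_left = buf;
  v.ni = ni; v.nj = nj; v.nplanes = np;
  v.planestep = 1;
  v.istep = static_cast<std::ptrdiff_t>(np);
  v.jstep = static_cast<std::ptrdiff_t>(np) * ni;
  return v;
}

// Planar layout RRRR..GGGG..BBBB..: pixels are 1 apart, planes are ni*nj apart.
template <class T>
StridedView<T> view_planar(T* buf, unsigned ni, unsigned nj, unsigned np) {
  StridedView<T> v;
  v.top_left = buf;
  v.ni = ni; v.nj = nj; v.nplanes = np;
  v.istep = 1;
  v.jstep = static_cast<std::ptrdiff_t>(ni);
  v.planestep = static_cast<std::ptrdiff_t>(ni) * nj;
  return v;
}
```

The same operator() serves both layouts; only the three step values differ. For a 2 x 2 three-plane buffer, view_interleaved(buf, 2, 2, 3)(1, 0, 2) addresses buf[5], while view_planar(buf, 2, 2, 3)(1, 0, 2) addresses buf[9].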
In a general image representation a 2d image consists of multiple planes each containing multiple rasters (rows) each containing multiple pixels, and each pixel contains multiple components. The planes and the components are used for the same purpose, to represent different spectral or functional values (e.g. the red, green and blue channels of an RGB image.) In vil it is usually assumed that an image cannot have both multiple planes and multiple components per pixel. This allows vil_image_view to view the same colour image data as either a 3 plane image or a 1 plane RGB image. You can do this explicitly by calling vil_view_as_planes() or vil_view_as_rgb(). So the following example will print the same value twice.
    // Assume that x.png is an rgb byte image.
    vil_image_view<vil_rgb<vxl_byte> > im = vil_load("x.png");
    vil_image_view<vxl_byte> im2 = vil_view_as_planes(im);
    vcl_cout << (int) (im(3,4).r) << vcl_endl;   // red component of the rgb pixel
    vcl_cout << (int) (im2(3,4,0)) << vcl_endl;  // same value through plane 0
vil_view_as_planes() and vil_view_as_rgb() are actually redundant, and simple assignment will do. In the above example the conversion can be achieved by
    vil_image_view<vxl_byte> im2 = im;
You should bear in mind that the component-wise and plane-wise representations are not equal. The multi-plane representation is more general than the RGB multi-component one. If the underlying data is actually stored RRRR..GGGG..BBBB.. then it is not possible to view that image as a single plane of RGB pixels. For this reason, a lot of vil prefers to view an image as multi-plane single-component. In particular, the vil_image_resource derivatives in vil will treat all images as multi-plane, scalar component images, whether the underlying data is RGBRGBRGB... or RRRR..GGGG..BBBB.. This means that if you have a switch statement to deal with pixel types in a normal image resource, you need not worry about any types other than the following
Similarly to the planes to components conversion it is possible to perform a whole range of other manipulations. These include vil_transpose(), vil_flip_ud(), vil_decimate(), vil_crop().
One further advantage of the arithmetic indexing scheme is that it becomes easy to create a 2d slice view of a 3d image.
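To see why such manipulations need not touch any pixel data, consider the following stand-alone sketch. View2D, transpose, flip_ud and crop are hypothetical names written for this illustration, assuming a single-plane float image; they are not the vil functions, although they follow the same idea of adjusting the origin pointer, the sizes and the steps.

```cpp
#include <cstddef>

// Hypothetical single-plane view (not the vil class): a pointer plus steps.
struct View2D {
  float* top_left;
  unsigned ni, nj;
  std::ptrdiff_t istep, jstep;

  float& operator()(unsigned i, unsigned j) const {
    return top_left[static_cast<std::ptrdiff_t>(i) * istep +
                    static_cast<std::ptrdiff_t>(j) * jstep];
  }
};

// Transpose: swap the roles of i and j. No pixel is moved.
inline View2D transpose(View2D v) {
  View2D t = v;
  t.ni = v.nj;
  t.nj = v.ni;
  t.istep = v.jstep;
  t.jstep = v.istep;
  return t;
}

// Flip up/down: start at the last row and step through the rows backwards.
inline View2D flip_ud(View2D v) {
  if (v.nj == 0) return v;  // nothing to flip in an empty view
  View2D f = v;
  f.top_left = v.top_left + static_cast<std::ptrdiff_t>(v.nj - 1) * v.jstep;
  f.jstep = -v.jstep;
  return f;
}

// Crop: move the origin to (i0, j0) and shrink the size. Steps are unchanged.
// The caller is assumed to keep the window inside the original view.
inline View2D crop(View2D v, unsigned i0, unsigned n_i,
                   unsigned j0, unsigned n_j) {
  View2D c = v;
  c.top_left = &v(i0, j0);
  c.ni = n_i;
  c.nj = n_j;
  return c;
}
```

Each function returns a new view onto the same memory, so writing through a cropped or flipped view changes the original image. A 2d slice of a 3d volume can be built the same way, by fixing the third index in the origin pointer and keeping two of the three steps.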
Several image processing functions can be found in the algo subdirectory of vil. Let's look at an example of finding the image gradient using a Sobel filter.
    #include <vcl_iostream.h>
    #include <vxl_config.h> // for vxl_byte
    #include <vil/vil_image_view.h>
    #include <vil/vil_print.h>
    #include <vil/algo/vil_sobel_3x3.h>

    int main()
    {
      unsigned ni=8;
      unsigned nj=15;
      unsigned nplanes=1;
      vil_image_view<vxl_byte> image(ni,nj,nplanes);

      for (unsigned p=0;p<nplanes;++p)
        for (unsigned j=0;j<nj;++j)
          for (unsigned i=0;i<ni;++i)
            image(i,j,p) = vxl_byte(i+10*j+100*p);

      vcl_cout<<"Original image:"<<vcl_endl;
      vil_print_all(vcl_cout,image);

      // Objects to hold gradients
      vil_image_view<float> grad_i,grad_j;

      vil_sobel_3x3(image,grad_i,grad_j);

      vcl_cout<<vcl_endl
              <<"Sobel I Gradient:"<<vcl_endl;
      vil_print_all(vcl_cout,grad_i);

      vcl_cout<<vcl_endl
              <<"Sobel J Gradient:"<<vcl_endl;
      vil_print_all(vcl_cout,grad_j);

      return 0;
    }
There are also algorithms to perform image arithmetic, smoothing, general 1D and 2D convolution, morphological operations, interpolation, and much more.
This section explores the major differences between using the old vil1 and using vil, and some of the implications for converting existing code.
The first and most obvious difference is that whilst there is a broad equivalent to vil1_image, and its descendants, this class tree has been split in two. The abstract vil1_image is now replaced with a smart pointer to a vil_image_resource. The concrete vil1_memory_image_of<T> is now a vil_image_view<T>.
Whereas previously, you might have written code in terms of vil1_image, it now usually makes sense to write most image manipulations in terms of vil_image_view<T>s. With the old vil1_image, you either had to do a get_section and operate on raw memory, or do a messy switch statement to cast it to its underlying vil1_memory_image_of<T> type, or do an expensive vil1_view_as() conversion. Now with vil, the vil_image_view<T> provides a powerful view directly onto your image in memory.
The vil_image_view provides such facilities as compile-time type safety and switchable bounds checking. It also acts as a sort of canonicaliser. A wide range of actual memory layouts can all be treated identically and transparently while working through the vil_image_view.
Previously, in vil1, the image loader often needed to read an unblocked resource and to have several filters placed on top of it to do such things as re-order the raster rows and re-order the component order. vil doesn't do this, but instead uses the vil_image_view to provide a canonical view of whatever deranged image format your loader finds most efficient to use.
The second important change is that vil provides full support for planes. In many cases accessing different image planes is directly equivalent to accessing different components. Indeed, it is often preferable to view an image as multi-planar rather than multi-component. If your algorithms assume a single plane, it is however trivial to provide a wrapper function which takes a multi-planar image and passes one plane at a time to your algorithm. This can be done with virtually no loss in efficiency, and indeed is how some of the code in vil/algo is written.
To help convert existing code there is a script (core/vil/scripts/vil1tovil.pl). It converts as much code as it can. However, it can really only deal with file and identifier name changes. There are large structural differences between vil1 and vil, with many of the equivalent functions taking different parameters. The output of the conversion script can best be seen as a hint on which types and classes to use and which functions to call. You will almost certainly need to make extensive further edits to your code to get it to compile again.
If you do not want to convert any code, but would rather use an interface to convert between vil1 and vil types at runtime, then take a look at <vil1/vil1_vil.h>, which has a function for converting between vil1_memory_image_of and vil_image_view, and a class that wraps a vil1_image and exports a vil_image_resource interface.
I'm trying to load a DICOM image, but it doesn't work. vil_load.cxx prints an error message that mentions lots of image types but not .dcm. What's wrong?
The DICOM loader in VXL is not built by default, because it is large and only medical image people want it. You will need to rerun CMake and find the cache value called VXL_BUILD_DICOM. Turn it on, and rebuild -- it won't need to rebuild everything.
I'm having problems trying to use vil_image_view_base_sptr to process a loaded image without worrying about what type the pixels are.
The designers of vil recommend against using vil_image_view_base_sptr explicitly -- it is unlikely to behave the way any user might want or expect. vil never processes pixels independently of their type, and vil_image_view_base_sptr is just a smart polymorphic pointer to a concrete vil_image_view<T> with some actual pixel type T.
If you want to convert a loaded image into pixels of a particular type, use one of the vil_convert functions:
    vil_image_view<vxl_byte> view =
      vil_convert_stretch_range(vxl_byte(), vil_load(my_filename));
If you want to store an image in memory without worrying about its pixel type, see vil_memory_image.
What co-ordinate system does vil use?
Mostly vil does not assume that the i and j co-ordinates have any explicit meaning. Instead, any external meaning to the i and j directions is provided externally by the user. The choice of the letters i and j was an explicit decision to discourage any assumption of a Cartesian reference frame.
However there are a few places where further assumptions need to be made. When loading an image, the file format generally provides an explicit mapping to up/down and left/right. In such cases, vil assumes that image(0,0) is the top-left-most pixel in the image, that increasing i moves right, and that increasing j moves down. A similar assumption is used by vil_rotate to provide a direction to the rotation angle.
If you need an explicit world co-ordinate frame, within which you can embed an image, then take a look at the vimt library in vxl/contrib/mul/vimt. That provides a world-to-image co-ordinates transform that can be efficiently manipulated to provide transforms up to projective complexity.
The design of vil_image_view (being more flexible than the design of vil1) and the state of modern optimising compilers (not as good as they could be) mean that naive use of vil images may not be as fast as it should be.
The following example shows the original implementation of the image fill method.
    template<class T>
    void vil_image_view<T>::fill(T value)
    {
      for (unsigned p=0;p<nplanes_;++p)
        for (unsigned j=0;j<nj_;++j)
          for (unsigned i=0;i<ni_;++i)
            (*this)(i,j,p)= value;
    }
This implementation has the advantage of being simple, and easy to test.
In an ideal world the compiler would realise that it doesn't have to recalculate the location of each pixel each step, but instead keep a running pointer to the current pixel location. (Of course, in an ideal world we would be programming using natural language and a microphone.) We can make this optimisation explicit.
    template<class T>
    void vil_image_view<T>::fill(T value)
    {
      T* plane = top_left_;
      for (unsigned p=0; p<nplanes_; ++p, plane+=planestep_)
      {
        T* row = plane;
        for (unsigned j=0; j<nj_; ++j, row+=jstep_)
        {
          T* p = row;
          for (unsigned i=0; i<ni_; ++i, p+=istep_) *p = value;
        }
      }
    }
This can halve the run time on some compilers.
The most important rule in code optimisation is to observe how the code behaves in real life, and concentrate your efforts on where the code spends most of its time. In our example, this means the innermost loop. Now, it turns out that in many cases, istep_==1, because of the default image layout in memory. Because of this common case it would be worth having the compiler generate machine-code for the innermost loop in this special case. We can do this by explicitly testing for such a special case.
    template<class T>
    void vil_image_view<T>::fill(T value)
    {
      T* plane = top_left_;
      if (istep_==1)
      {
        for (unsigned p=0;p<nplanes_;++p,plane += planestep_)
        {
          T* row = plane-1;   // offset by one so that row[i] is pixel i-1
          for (unsigned j=0;j<nj_;++j,row += jstep_)
          {
            int i = ni_;
            while (i>0) { row[i--]=value; }   // writes row[ni_] down to row[1]
          }
        }
        return;
      }
      for (unsigned p=0;p<nplanes_;++p,plane += planestep_)
      {
        T* row = plane;
        for (unsigned j=0;j<nj_;++j,row += jstep_)
        {
          T* p = row;
          for (unsigned i=0;i<ni_;++i,p+=istep_) *p = value;
        }
      }
    }
There are two other optimisations going on here. The first is that we are using the pointer indexing operator []. Most compilers treat while (++i<n) { *(ptr++)=v; } differently from while (++i<n) { ptr[i]=v; }, with the latter often being significantly faster. This is especially true when ptr is a pointer to a character sized type.
The other optimisation makes use of the fact that it is faster to count down to 0 than count up to n. This is because it is faster to test against a constant, 0, than against a variable. Sometimes a compiler figures this out itself, but by no means always. One useful refinement that may be possible is to decrement the index counter right at the end of the loop. This allows the compiler to avoid issuing a separate test instruction, since this sort of test is automatically performed by the processor after a decrement or other arithmetic operation.
Since we are performing the same operation on every pixel independent of its absolute or relative position, there is one further optimisation that can be performed. In many cases an image will be stored as a contiguous block of memory. If this is the case, it may make sense just to operate on this block of memory as a single dimensional array. In the case of fill, this may even allow a compiler to issue a specialised single machine instruction which performs the whole fill very very fast. This gives us our final implementation.
    template<class T>
    void vil_image_view<T>::fill(T value)
    {
      T* plane = top_left_;

      if (is_contiguous())
      {
        // typename is needed because iterator is a dependent type
        typename vil_image_view<T>::iterator it = begin();
        typename vil_image_view<T>::const_iterator end_it = end();
        while (it!=end_it) { *it = value; ++it; }
        return;
      }

      if (istep_==1)
      {
        for (unsigned p=0;p<nplanes_;++p,plane += planestep_)
        {
          T* row = plane-1;   // offset by one so that row[i] is pixel i-1
          for (unsigned j=0;j<nj_;++j,row += jstep_)
          {
            int i = ni_;
            while (i>0) { row[i--]=value; }   // writes row[ni_] down to row[1]
          }
        }
        return;
      }

      for (unsigned p=0; p<nplanes_; ++p, plane+=planestep_)
      {
        T* row = plane;
        for (unsigned j=0; j<nj_; ++j, row+=jstep_)
        {
          T* p = row;
          for (unsigned i=0; i<ni_; ++i, p+=istep_) *p = value;
        }
      }
    }
This optimised version was between two and ten times faster than the original depending on the compiler, image structure, and pixel type.
It should always be borne in mind that there is a trade-off in testing for special cases. Each test takes time, and this slows the function down for the non-special cases. Limit yourself to only testing for very common cases that have very significant potential speed improvements.
Finally as with all optimisation - be rigorous in comparing the actual times for your original and optimised code. Run enough experiments to measure the statistical spread to see if your improvements are significant. It is quite common for compiler or processor quirks to make your optimised code slower than the original.
It is possible to encounter images that are much larger than available memory. For example, a commercial satellite image can easily exceed several gigabytes in size. The situation is even more dire in the case of ultra high resolution video where up to 16K X 16K pixel resolutions are feasible at two bytes per pixel. It is clearly not practical to handle these images as an in-memory vil_image_view. The use of a vil_image_resource to supply small views of the image at a time is essential; however, the overhead in extracting small views from a large image file can be substantial.
Consider the example of displaying a small image region near the center of the image where the view is zoomed in so that one pixel in the image is mapped to one pixel on the screen. The size of this image patch might be 2K X 1K pixels. In order for the image resource to supply this set of pixels it is necessary to seek past a gigabyte or more of file-resident data to the middle of the image and then pull out the several megabytes of pixels needed to construct the view for display. If the user then wants to pan over a few hundred pixels to view something just off the screen, a full seek and file read must be repeated. Under these circumstances, image viewing performance will be overwhelmingly dominated by disk io bandwidth and seek times.
To mitigate the overhead of disk access, the image can be organized as a set of contiguous rectangular blocks of pixels. Blocks may be randomly scattered within the file, but each block is a contiguous set of pixels. This way, a view can be assembled by seeking to each block in the view and then reading the block efficiently. Typical block size is 512 X 512 or 1024 X 1024 pixels, so that only a few blocks are needed to display regions of interest at full zoom. To gain even more efficiency, the blocks can be managed in a cache so that most of the pixels being displayed on the screen are already in memory. As the user pans to a new location, those blocks that are now off the screen are replaced by new blocks needed to fill in the new region. Thus, the number of blocks that have to actually be read from the file is significantly reduced.
The blocked image resource interface has the following virtual methods in addition to those already defined in the base resource class:
The block size used to store and retrieve pixels.
    unsigned size_block_i() const
    unsigned size_block_j() const
The number of blocks in the i and j directions needed to contain the image.
    unsigned n_block_i() const
    unsigned n_block_j() const
Retrieving blocks from the resource. Note that a block is a vil_image_view and thus ready for use in processing and visualization operations.
    vil_image_view_base_sptr get_block(unsigned block_index_i,
                                       unsigned block_index_j) const

    bool get_blocks(unsigned start_block_i, unsigned end_block_i,
                    unsigned start_block_j, unsigned end_block_j,
                    vcl_vector< vcl_vector< vil_image_view_base_sptr > >& blocks) const
This blocking structure is used internally to implement the basic method
    get_copy_view(unsigned i0, unsigned n_i, unsigned j0, unsigned n_j)
It is possible that i0, n_i and j0, n_j are not evenly divisible by size_block_i and size_block_j, respectively. In this case the blocks are trimmed to extract pixels belonging to the specified image view bounds. In the case of retrieving views near the boundary of the full image, e.g., n_i=ni(), n_j=nj(), blocks may lie partially outside the underlying image. In this case the pixel values in the block locations lying outside the full image bounds are undefined.
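The index arithmetic behind this trimming can be sketched as follows. This is an illustration only, assuming blocks are numbered from zero along each axis; n_blocks, blocks_covering and trim_before are hypothetical helpers written for this example, not vil functions.

```cpp
// Number of blocks needed to cover n pixels with blocks of size_block pixels.
// The last block may extend past the image edge. Assumes size_block > 0.
inline unsigned n_blocks(unsigned n, unsigned size_block) {
  return (n + size_block - 1) / size_block;  // integer division, rounded up
}

// Inclusive range of block indices along one axis.
struct BlockRange {
  unsigned first, last;
};

// Blocks touched by the pixel interval [i0, i0 + n_i).
// Assumes n_i > 0 and size_block > 0.
inline BlockRange blocks_covering(unsigned i0, unsigned n_i,
                                  unsigned size_block) {
  BlockRange r;
  r.first = i0 / size_block;
  r.last = (i0 + n_i - 1) / size_block;
  return r;
}

// Number of pixels to trim from the start of the first block so that the
// assembled view begins exactly at pixel i0.
inline unsigned trim_before(unsigned i0, unsigned size_block) {
  return i0 % size_block;
}
```

For example, with 256-pixel blocks a request for 500 pixels starting at i0 = 300 touches blocks 1 to 3, and the first 44 pixels of block 1 are discarded. The same calculation is applied independently along j.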
Similar methods are defined for inserting blocked data into the image resource.
    bool put_block(unsigned block_index_i, unsigned block_index_j,
                   vil_image_view_base const& view)

    bool put_blocks(unsigned start_block_i, unsigned end_block_i,
                    unsigned start_block_j, unsigned end_block_j,
                    vcl_vector< vcl_vector< vil_image_view_base_sptr > > const& blocks)
These methods are used internally to support the virtual
    put_view(vil_image_view_base const& im, unsigned i0, unsigned j0)
method. Note that current vil file-based resources do not support reading and writing on the same open resource. Therefore, a block-oriented image processing algorithm will have an input resource from which blocks are retrieved and an output resource where processed blocks are inserted.
Many of the advantages of blocking can be realized even if the underlying image resource is not intrinsically blocked. The vil_blocked_image_facade wraps around any image resource and provides the vil_blocked_image_resource class interface. That is, the facade is a sub-class of vil_blocked_image_resource. Internally, reading and writing facade block data is implemented using the usual get and put view methods. In this case the block view dimensions are those defined by the facade blocking geometry.
One might wonder how this simulation of a blocked image structure provides any gain in efficiency for pixel access, since the process relies on an unblocked file format. A significant gain in performance can be obtained by the addition of a cache. The vil_cached_image_resource is a sub-class of vil_blocked_image_resource and provides an in-memory store for the most recently retrieved blocks. The size of the cache (in number of blocks) is specified in the constructor:
    vil_cached_image_resource(vil_blocked_image_resource_sptr bir,
                              const unsigned cache_size)
The cache is implemented as a priority queue based on the "age" of a block. The blocks in the queue are given a timestamp as they enter the queue. If a block is retrieved from the cache, then the timestamp is reset to the current time. Otherwise, blocks age as new blocks are entered into the cache. When the cache is full, the oldest block is discarded to make room for a new block. Note that the queue does not participate in writing blocks to a resource.
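The ageing policy described above can be sketched in a few lines. The following is a simplified stand-alone illustration, not the vil_cached_image_resource implementation: BlockCache and its members are hypothetical, blocks are plain byte vectors, and the oldest entry is found by a linear scan rather than with a priority queue.

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical block cache keyed by (block_i, block_j). Not vil code.
class BlockCache {
 public:
  typedef std::pair<unsigned, unsigned> Key;
  typedef std::vector<unsigned char> Block;

  explicit BlockCache(std::size_t capacity)
      : capacity_(capacity), clock_(0) {}

  // Returns the cached block, or a null pointer on a miss.
  // A hit resets the block's timestamp to the current time.
  const Block* get(unsigned bi, unsigned bj) {
    auto it = entries_.find(Key(bi, bj));
    if (it == entries_.end()) return nullptr;
    it->second.stamp = ++clock_;
    return &it->second.block;
  }

  // Inserts a block, discarding the oldest one first if the cache is full.
  void put(unsigned bi, unsigned bj, const Block& block) {
    if (capacity_ == 0) return;
    const Key key(bi, bj);
    if (entries_.count(key) == 0 && entries_.size() >= capacity_)
      evict_oldest();
    Entry& e = entries_[key];
    e.block = block;
    e.stamp = ++clock_;
  }

  std::size_t size() const { return entries_.size(); }

 private:
  struct Entry {
    Block block;
    unsigned long stamp;  // logical time of the last insert or hit
  };

  // Linear scan for the smallest timestamp (a simplification).
  void evict_oldest() {
    auto oldest = entries_.begin();
    for (auto it = entries_.begin(); it != entries_.end(); ++it)
      if (it->second.stamp < oldest->second.stamp) oldest = it;
    if (oldest != entries_.end()) entries_.erase(oldest);
  }

  std::size_t capacity_;
  unsigned long clock_;
  std::map<Key, Entry> entries_;
};
```

A counter that increases on every access stands in for wall-clock time here, which is enough to order blocks by age. With a capacity of two, inserting blocks (0,0) and (0,1), reading (0,0), and then inserting (1,1) discards (0,1), because the read made (0,0) the younger of the two.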
The blocking capability of a resource can be determined by examining the properties of the resource using the method
    bool get_property(char const* tag, void* property_value = 0) const
Two properties are defined for blocked resources:
    vil_property_size_block_i  "size_block_i"
    vil_property_size_block_j  "size_block_j"
To test if a resource supports blocking one can examine the appropriate properties of the resource:
    vil_image_resource_sptr imgr = vil_load_image_resource("my_filename");
    ...
    unsigned sbi=0, sbj=0;
    bool is_blocked = imgr->get_property(vil_property_size_block_i, &sbi) &&
                      imgr->get_property(vil_property_size_block_j, &sbj);
    ...
If the resource is blocked then is_blocked will be true and the variables sbi and sbj contain the blocking structure for the resource.
The following example shows how to convert an image resource to a blocked file resource.
    vil_image_resource_sptr imgr = vil_load_image_resource("my_filename");
    unsigned size_block_i = 256, size_block_j = 256;
    vil_blocked_image_resource_sptr bimgr =
      vil_new_blocked_image_resource("my_blocked_filename",
                                     imgr->ni(), imgr->nj(), imgr->nplanes(),
                                     imgr->pixel_format(),
                                     size_block_i, size_block_j,
                                     "tiff");
    if (!vil_copy_deep(imgr, bimgr))
    {
      //report trouble
      ...
    }
    ...
The new resource, bimgr, will store pixels in square, 256 X 256, blocks. vil_copy_deep automatically splits the input resource into strips if the image is too large to fit in memory. However, to ensure proper handling of block boundaries it is better to wrap the input resource in a facade with the same blocking structure as the output. That is,
...
vil_blocked_image_resource_sptr facr =
  vil_new_blocked_image_facade(imgr, sbi, sbj);
if (!vil_copy_deep(facr, bimgr))
{
  //report trouble
  ...
}
...
Currently, the tiff file format and the National Imagery Transmission Format (nitf) image format provide a vil_blocked_image_resource; however, the nitf format does not yet support writing. Note that TIFF only supports block dimensions that are multiples of 16.
As in the previous section on blocked images, the motivation for constructing a pyramid image is to manage large images without having to keep the entire image in memory. Satellite images can easily exceed all available random access memory, which makes it impossible to display an overview of the whole image directly. The blocked image strategy solves the problem of panning through a large image, but it does not solve the problem of zooming between different levels of detail. Even with blocking, the display of a complete overview requires the entire image to be in memory.
The zooming problem can be solved by constructing a vil_pyramid_image_resource. This resource maintains a number of file-based copies of an image at different resolution scales. The original image is called the base image. Each reduced resolution image resource is called a level of the pyramid. Most typically, the levels of the pyramid differ by a factor of two in scale in each dimension, so each level holds one quarter of the pixels of the level below it. Relative to the base image, the total size of the pyramid as the number of levels approaches infinity is 1 + 1/4 + 1/16 + ... = 4/3 = 1 + 1/3. Thus, the worst case is 33% extra storage to represent all levels of detail.
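The storage bound can be checked numerically with a short sketch in plain C++. The function name `pyramid_pixel_count` is hypothetical, and rounding odd dimensions up when halving is an assumption made here, not a statement about how vil sizes its levels.

```cpp
#include <cassert>
#include <cstdint>

// Total number of pixels in a pyramid of nlevels levels, where level 0 is
// the ni x nj base image and each further level halves both dimensions.
// Assumption: odd dimensions are rounded up when halved.
std::uint64_t pyramid_pixel_count(std::uint64_t ni, std::uint64_t nj,
                                  unsigned nlevels) {
  assert(ni > 0 && nj > 0);
  std::uint64_t total = 0;
  for (unsigned level = 0; level < nlevels; ++level) {
    total += ni * nj;
    ni = (ni + 1) / 2;
    nj = (nj + 1) / 2;
  }
  return total;
}
```

For a 1024 x 1024 base image, eleven levels (down to a single pixel) give 1,398,101 pixels in total, just under 4/3 of the 1,048,576 pixels in the base image.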
It is not necessary to have a fixed scale difference between adjacent levels of the vil_pyramid_image_resource. When a user requests a vil_image_view at a particular scale, a view from the closest scale in the pyramid is generated. The interface for getting a view from a pyramid image is illustrated in the following code example.
In this example, the pyramid is stored as a set of image files in a directory.
#include <vil/vil_load.h>
#include <vil/vil_pyramid_image_resource.h>
...
{
  ...
  vil_pyramid_image_resource_sptr pir =
    vil_load_pyramid_resource("pyramid_dir");
  float actual_scale;
  vil_image_view<unsigned short> level_view =
    pir->get_copy_view(0.25f, actual_scale);
  ...
}
This example shows the basic use of a pyramid resource where a level view 1/4 the scale of the base image is being retrieved. If the pyramid doesn't contain a level with a scale factor of exactly 0.25, the closest level is used and its scale is returned in actual_scale. The level view only requires 1/16 the number of pixels of the base image and can likely be held entirely in memory. However, the user of the view has to keep in mind that the image has been scaled down and must manipulate it appropriately.
For example, in rendering an image to the screen, the screen display scale factor must be compared to the level scale in order to determine the correct rendering scale. Suppose for example that a display screen has 1000 x 1000 elements and the base image of the pyramid is 15,000 by 15,000 pixels. The required rendering scale factor for the base image is 1/15. Suppose that the closest scale level in the pyramid is 1/16. The resulting level view is then rendered at a scale factor of 16/15 in order to fill the screen. Note however that only one million pixels are being processed instead of 225 million.
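The arithmetic in this display example can be written out as a small sketch in plain C++. The names `LevelChoice` and `choose_level` are hypothetical, and picking the level with the smallest absolute scale difference is an assumption about what "closest" means, not a description of the vil implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Result of picking a pyramid level for a required display scale.
struct LevelChoice {
  std::size_t index;    // index of the chosen level in level_scales
  double level_scale;   // scale of that level relative to the base image
  double render_scale;  // remaining scale to apply when drawing the level
};

// Picks the level whose scale is closest to required_scale and computes
// the scale at which the level view must then be rendered.
LevelChoice choose_level(const std::vector<double>& level_scales,
                         double required_scale) {
  assert(!level_scales.empty() && required_scale > 0.0);
  std::size_t best = 0;
  for (std::size_t i = 1; i < level_scales.size(); ++i) {
    if (std::fabs(level_scales[i] - required_scale) <
        std::fabs(level_scales[best] - required_scale)) {
      best = i;
    }
  }
  LevelChoice choice;
  choice.index = best;
  choice.level_scale = level_scales[best];
  choice.render_scale = required_scale / level_scales[best];
  return choice;
}
```

For level scales 1, 1/2, 1/4, 1/8 and 1/16 and a required scale of 1/15, the 1/16 level is chosen and the render scale is (1/15) / (1/16) = 16/15, as in the text.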
Level images are formed by subsampling the original base image. In order to do this subsampling properly, it is necessary to observe the limitations imposed by the Nyquist sampling theorem. The sample rate must be greater than twice the highest spatial frequency in the image. Otherwise aliasing will occur, which appears as interference bands in the down-sampled image. The Nyquist sampling rate constraint can be achieved by spatially smoothing the image using a low pass filter. The filter is designed to remove spatial frequencies that exceed one half the sampling rate corresponding to the scale of the pyramid level.
For example, if the base image is being sampled at a scale of 0.5 (every second pixel in each image dimension), the new sampling rate is 0.5 samples per base-image pixel, so the image must be pre-smoothed to remove spatial frequencies greater than half of that rate, i.e. 0.25 cycles per pixel. A simple filter for approximating this requirement is to form the average of the 2x2 pixel neighborhood in the base image corresponding to each pixel in the downsampled image. This smoothing does not remove all the higher spatial frequencies, but they are significantly attenuated. Another common approach is to apply a Gaussian low pass smoothing kernel recursively to each level. Gaussian suppression of higher spatial frequencies is superior to block averaging. The Gaussian is cheap to compute since it is separable and can be formed by applying two 1-d convolutions.
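The 2x2 averaging filter can be sketched on a plain in-memory buffer. This is an illustration only: `GrayImage` and `decimate_2x2` are hypothetical names rather than vil types, the image is a single 8-bit plane stored row by row, averages are rounded to the nearest integer, and a trailing odd row or column is dropped.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical single-plane 8-bit image, stored row by row:
// pixel (i, j) is data[j * ni + i].
struct GrayImage {
  std::size_t ni;
  std::size_t nj;
  std::vector<unsigned char> data;
};

// Returns an image at half scale in which each pixel is the rounded
// average of the corresponding 2x2 neighborhood in src.
GrayImage decimate_2x2(const GrayImage& src) {
  assert(src.data.size() == src.ni * src.nj);
  GrayImage dst;
  dst.ni = src.ni / 2;
  dst.nj = src.nj / 2;
  dst.data.resize(dst.ni * dst.nj);
  for (std::size_t j = 0; j < dst.nj; ++j) {
    for (std::size_t i = 0; i < dst.ni; ++i) {
      const std::size_t i0 = 2 * i;
      const std::size_t j0 = 2 * j;
      const unsigned sum = src.data[j0 * src.ni + i0] +
                           src.data[j0 * src.ni + i0 + 1] +
                           src.data[(j0 + 1) * src.ni + i0] +
                           src.data[(j0 + 1) * src.ni + i0 + 1];
      dst.data[j * dst.ni + i] = static_cast<unsigned char>((sum + 2) / 4);
    }
  }
  return dst;
}
```

Applying the function repeatedly to its own output yields successive levels that are a factor of two apart in scale.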
The vil_pyramid_image_resource class provides the simple 2x2 averaging method for generating pyramid levels that are a factor of two apart in scale. The user can apply more sophisticated sampling schemes but this method is adequate for display purposes. Each level is generated by applying the static method vil_pyramid_image_resource::decimate.
#include <vil/vil_load.h>
#include <vil/vil_pyramid_image_resource.h>
...
{
  ...
  // the base_image resource was generated previously
  vil_image_resource_sptr image;
  // generate an image at 1/2 the scale
  image = vil_pyramid_image_resource::decimate(base_image,
                                               "level_filename",
                                               "tiff");
  ...
}
In the current implementation of the decimate method, the pyramid levels are generated as blocked images and so a resource file format that can support blocking must be used. This choice is primarily a matter of decimation processing efficiency and to manage level images that are still too large to fit in memory. In the example, the "tiff" file format is chosen since rectangular block structure is supported. If the input image is blocked then its native block structure is used. Otherwise a default blocking (256 x 256) structure is used.
It is necessary to have a file format that can store the multiple images required for the different resolution levels. The most obvious approach is to store the images as separate files in a directory. This format is called vil_pyramid_image_list and is designated by the vil_file_format::tag() "pyil". There is no restriction on the format of the level files, but applications of the pyramid are generally more efficient if the base image and the level files are blocked.
A second option for storing image pyramids is the vil_tiff_pyramid_resource with vil_file_format::tag() "ptif". In this case, all the pyramid levels are saved in a single tiff file. There is no assumed order to the image headers in the file. The pyramid level scales are sorted by the resource to provide the required interface. The following example shows creating an output resource of each type and inserting the level image resources into each pyramid.
#include <vcl_vector.h>
#include <vil/vil_new.h>
#include <vil/vil_image_resource.h>
#include <vil/vil_pyramid_image_resource.h>
...
{
  // a list of image resources representing the pyramid levels
  vcl_vector<vil_image_resource_sptr> rescs;
  ...
  // Generate a set of resources at multiple scales
  ...
  // Construct a new multiple file pyramid resource
  vil_pyramid_image_resource_sptr pyr_image_list =
    vil_new_pyramid_image_resource("pyramid_directory", "pyil");
  // Construct a new single file tiff pyramid resource
  vil_pyramid_image_resource_sptr pyr_tiff =
    vil_new_pyramid_image_resource("pyramid.tif", "ptif");
  // Store image_resources into the pyramids
  for ( vcl_vector<vil_image_resource_sptr>::iterator rit = rescs.begin();
        rit != rescs.end(); ++rit)
  {
    pyr_image_list->put_resource(*rit);
    pyr_tiff->put_resource(*rit);
  }
  ...
}
Two methods are provided in vil_new that generate pyramid images in either the image list or tiff format, vil_new_pyramid_image_list_from_base and vil_new_pyramid_image_from_base. The following example demonstrates the use of each pyramid builder.
#include <vil/vil_new.h>
#include <vil/vil_image_resource.h>
#include <vil/vil_pyramid_image_resource.h>
...
{
  vil_image_resource_sptr base_image;
  // base_image is loaded or constructed
  ...
  unsigned number_of_levels = 7;
  bool copy_base = true;
  // Generate a pyramid as an image_list (files in a directory)
  vil_pyramid_image_resource_sptr pyril =
    vil_new_pyramid_image_list_from_base("pyramid_directory_path",
                                         base_image,
                                         number_of_levels,
                                         copy_base,
                                         "tiff",
                                         "R");
  // Generate a pyramid as a multi-image tiff file
  vil_pyramid_image_resource_sptr pytif =
    vil_new_pyramid_image_from_base("pyramid_file.tif",
                                    base_image,
                                    number_of_levels,
                                    "ptif",
                                    "temporary_dir_path");
  ...
}
In the image list pyramid the user can specify the format of the level image resource files. In the example the tiff format is specified. The last argument specifies the base name of the pyramid files, e.g., R0, R1, ... Rn-1, in the example. The variable copy_base indicates whether or not the base image is already in the directory. If not, then base_image is copied as a blocked image resource with default blocking (256 x 256). If a different blocking structure is desired, the base image can be wrapped in a vil_blocked_image_facade resource with the new blocking structure.
For the tiff-based pyramid it is necessary to provide a temporary directory to generate pyramid levels prior to inserting them into the single tiff file. Since the pyramid level images can still be too large for memory, they are constructed as file-based resources.
The National Imagery Transmission Format (NITF) is a highly flexible and complex format for exchanging digital imagery and its support data. Our NITF implementation includes a framework for defining the "tagged record extensions" and "data extension segments" needed by your application. A framework example, along with the capabilities and current limitations, is summarized here.
The following code demonstrates how to define a tagged record extension:
vil_nitf2_tagged_record_definition::define("HISTOA", "Softcopy History")
  .field("SYSTYPE", "System Type", NITF_STR(20))
  .field("PC", "Prior Compression", NITF_STR(12))
  .field("PE", "Prior Enhancements", NITF_ENUM(4, vil_nitf2_enum_values()
    .value("EH08", "Enhanced 8bpp")
    ...
    .value("DGHC", "Digitized hardcopy")
    .value("UNKP", "Unknown")
    .value("NONE", "None")))
  .field("REMAP_FLAG", "System Specific Remap", NITF_INT(1))
  .field("LUT_ID", "Data Mapping ID from ESD", NITF_INT(2))
  .field("NEVENTS", "Number of Processing Events", NITF_INT(2))
  .repeat("NEVENTS", vil_nitf2_field_definitions()
    .field("PDATE", "Processing Date and Time", NITF_DAT(14))
    .field("PSITE", "Processing Site", NITF_STR(10))
    .field("PAS", "Softcopy Processing Application", NITF_STR(10))
    .field("NIPCOM", "Number of Image Proc. Comments", NITF_INT(1))
    .repeat("NIPCOM", vil_nitf2_field_definitions()
      .field("IPCOM", "Image Processing Comment", NITF_STR(80)))
    .field("IBPP", "Image Bit Depth (actual) ", NITF_INT(2))
    ...)
This code enables the contents of record extension "HISTOA" to be parsed; without it, the unrecognized record would be skipped. Repeating field values, such as "IPCOM" above, are represented as vectors. Conditional and variable-length fields are also supported, and C++ functors are used to evaluate expressions involving tags that specify the length or repetition of other tags.
Currently the library can only read, but not write, NITF 2.0 and 2.1 files, and includes the following capabilities:
The following capabilities are not yet implemented:
VIL can be configured to support the reading of JPEG 2000 image files as well as NITF 2.1 images that are JPEG 2000 compressed. This section describes how to set up this capability.
The decompression is handled by a third party library, ECW JPEG 2000 SDK, developed by ER Mapper. The library can be downloaded from www.ermapper.com, and is currently available under three different licensing schemes:
The VXL wrappers around this library were developed using version 3.1 beta of this SDK and have also been tested using version 3.3 RC2, the latest version available on 4 April 2006.
ER Mapper provides ECW JPEG 2000 SDK with a variety of build systems. As described in the next section, VXL has been configured to use the most common one which yields separate NCSEcw and NCSUtil libraries. Most of the testing has taken place using the dynamically linked versions of these libraries, but the static versions should work too.
Once you have installed and built the ECW JPEG 2000 SDK, you must configure VXL to find it. Specify these three CMake variables:
When CMake creates your build files, it will automatically add the appropriate source files and pre-processor definitions. Once VIL is built, test the JPEG 2000 decompression capability using the test program "test_file_format_read" in project "vil_test_all". If things are set up correctly, the test program will report that these two tests passed:
Note that if you use the dynamically linked version of the ECW JPEG 2000 SDK, your PATH environment variable must contain the /lib directory that contains NCSEcw and NCSUtil.
This document was generated on May, 1 2013 using texi2html 1.76.