7. vil: Imaging

Chapter summary: Load images using vil_load. Access them using a vil_image_view<T>.

The vxl image library has evolved from the TargetJr and Manchester Image libraries. As with its predecessors, its primary goal is to provide flexible access to all 2D images, including those too large to fit in the address space of a single program or process, and very powerful and fast access to images in memory. In fact, both cases need similar treatment: even in-core images are assumed to be sufficiently large (say a megabyte) that special care must be taken to avoid unnecessary copying of their data. In both cases, the normal requirements of efficiency and ease-of-use apply. The system must allow:

Beginners to have easy access to an image type. This image type should also be the default image type for a programmer writing image processing code. This image type must be very efficient to use.
Fast access to images on disk, at no more than a 10% speed penalty for operations on images in memory.
Fast loading of subsets of the image data. To look at a small portion of a 10000 by 10000 pixel satellite image, one should not have to load the entire 300 megabytes (at three bytes per pixel) into memory.
Efficient memory management, both automatic and programmer-mediated. Automatic management is vital during program development, when the code is changing quickly. On the other hand, release builds need the kind of optimisations that only a human can apply.

This vil library is the second VXL image library, and is sometimes referred to as vil2. The original vxl image library vil1 is deprecated.

You can read more about the design philosophy in $VXLSRC/core/vil/notes.html

7.1 Loading and saving

Let's look at an example of vil in use. This program makes an image from a disk file, copies it into memory, and prints the pixel at 100,100.

#include <vcl_iostream.h>
#include <vxl_config.h>
#include <vil/vil_rgb.h>
#include <vil/vil_load.h>
#include <vil/vil_image_view.h>

int main()
{
  vil_image_view<vil_rgb<vxl_byte> > img;
  img = vil_load("foo.ppm");
  vcl_cerr << "Pixel 100,100 = " << img(100,100) << vcl_endl;
}

The first interesting line declares img to be an image. vil_image_view is the basic image type. It represents an image in memory about whose structure, size and pixel type we know everything. Hence we need to specify the pixel type at this point.

Now let's skip to the end to explain the pixel access method.

  img(100,100)

This looks up the pixel at position 100,100 and returns its value. The pixel type was defined on the first line to be an rgb of bytes, and that is what will be displayed.

[255 128 128]

Where it matters (such as when loading an image in from disk) it is assumed that the image origin is at the top left of the image.

Finally, let's look at the middle line. This consists of two parts. The vil_load function does a lot of work behind the scenes to determine what the image type is, and then loads that image into memory. The second part is the assignment, which has several special properties.

It does not copy the actual image data. A vil_image_view object is really a view of some underlying data. The view understands where the real image data is in memory and how to interpret it. When you copy a view, you merely copy this interpretation information, not the actual image data. This is important, because often images are very big, and copying is expensive. The underlying image is managed with smart pointers so when the last view to the underlying data is destroyed, the image data will be too.
It can do cheap conversions between different views of the same image. vil_load by default loads the image as 3 planes, with the pixel type as vxl_byte. It is trivial to reconfigure a vil_image_view so that it views the same image data as one plane of rgb pixels. The assignment will automatically do any cheap conversion necessary. You may ask then, how is it that we know that the pixel type can be viewed as RGB of bytes? Here, we simply know that our image foo.ppm is this type. In general you can either find out what the pixel type is before you load the image, or you can force it to whatever pixel type you want. The latter may involve a relatively expensive pixel by pixel conversion, so this will not happen automatically.

7.1.1 Loading and saving: The threshold program

Anyway, the usual next step in demonstrating an image handling library is to show thresholding, so let's have a look. This program loads an image into memory, forcing it to RGB byte format, and sets to black every pixel whose red, green and blue values are all below a threshold of 200. The image is modified in place and then saved.

#include <vxl_config.h>
#include <vil/vil_rgb.h>
#include <vil/vil_load.h>
#include <vil/vil_save.h>
#include <vil/vil_image_view.h>
#include <vil/vil_convert.h>

int main(int argc, char **argv)
{
  // Expect an input filename and an output filename.
  if (argc < 3) return 1;

  vil_image_view<vil_rgb<vxl_byte> > img;
  img = vil_convert_to_component_order(
          vil_convert_to_n_planes(3,
                                  vil_convert_cast(vxl_byte(),
                                                   vil_load(argv[1]))));

  for (unsigned j = 0; j < img.nj(); ++j)
    for (unsigned i = 0; i < img.ni(); ++i)
      if (img(i,j).r < 200 && img(i,j).g < 200 && img(i,j).b < 200)
        img(i,j) = vil_rgb<vxl_byte>(0,0,0);

  vil_save(img, argv[2]);
  return 0;
}

The call to vil_save sends the modified image in img to disk. The choice of file format is determined automatically from the extension of the filename. If one wants more control, a string can be appended to specify the format, e.g.

  vil_save(img, argv[2], "jpeg");

Of course, if your user has chosen a name such as "foo.ppm", you'll have an oddly named image: a JPEG file with a .ppm extension.

7.2 Copying an image

You should know by now that copying vil_image_view objects does not duplicate the data they point to. This allows images to be passed into and out of functions efficiently. It also means that modifying the data in one vil_image_view might change that in another. Take this example:

...
vil_image_view<float> a( vil_convert_cast(float(), vil_load("x")) );
vil_image_view<float> b = a;
b(100,100) = 12;
...

After the assignment to b(100,100), both a(100,100) and b(100,100) are set to the value 12. On the other hand, if we had used vil_copy_deep, thus:

...
vil_image_view<float> a( vil_convert_cast(float(), vil_load("x")) );
vil_image_view<float> b;
vil_copy_deep(a, b);
b(100,100) = 12;
...

or, equivalently, the form that returns the copy:

...
vil_image_view<float> a( vil_convert_cast(float(), vil_load("x")) );
vil_image_view<float> b( vil_copy_deep(a) );
b(100,100) = 12;
...

then a is unchanged after the assignment to b(100,100). Note again that in the second form the actual copying is done in vil_copy_deep; when the return value is assigned to b, there is an efficient view copy.
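The difference between a view copy and a deep copy can be sketched without vil at all. The following is a minimal illustration only: ToyView and toy_copy_deep are hypothetical names invented for this sketch, and the real vil_image_view is considerably more general (planes, steps, typed memory chunks).

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for an image view: a small handle that shares its
// pixel buffer through a reference-counted pointer. Not vil's actual class.
struct ToyView
{
  std::shared_ptr<std::vector<float> > data;  // shared pixel storage
  unsigned ni, nj;                             // width and height

  ToyView(unsigned w, unsigned h)
    : data(std::make_shared<std::vector<float> >(std::size_t(w) * h, 0.0f)),
      ni(w), nj(h) {}

  // Pixel access; the assert plays the role of optional bounds checking.
  float& operator()(unsigned i, unsigned j) const
  {
    assert(i < ni && j < nj);
    return (*data)[std::size_t(j) * ni + i];
  }
};

// Deep copy: allocate fresh storage and duplicate every pixel.
inline ToyView toy_copy_deep(const ToyView& src)
{
  ToyView dest(src.ni, src.nj);
  *dest.data = *src.data;
  return dest;
}
```

Copying a ToyView with = copies only the handle, so both handles see later writes; toy_copy_deep gives the new handle its own storage. The buffer is released when the last handle that shares it is destroyed.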

7.3 Image resources

Broadly there are two sorts of image one is interested in:

images in memory, about which everything is known and all parts of which can be accessed directly.
external images (e.g. in files) which can only be accessed indirectly, or images about which we may currently be missing information (e.g. pixel type).

As we have seen, the first sort of image is represented by a vil_image_view<T> on the data in memory. For some very large images it is not possible or desirable to load them into memory. In this case it is useful to be able to load a sub-section of the image, manipulate it, and possibly write it out again. Alternatively you may want to pass an image about, and process it without knowing its pixel type. vil supports this second sort of image using vil_image_resource. You cannot create an image resource object directly; instead you use a creation function which returns a smart pointer to the base class, a vil_image_resource_sptr. When manipulating vil_image_resources you will work almost entirely in terms of vil_image_resource_sptrs. There are several types of image resource, with different creation functions:

Representing an image in a file: e.g. vil_pnm_image, vil_jpeg_image. These are created using vil_load_image_resource(), and vil_new_image_resource().
vil_memory_image: Representing an image in memory. This is created using vil_new_image_resource(). Alternatively, if you want to wrap an existing view up as a vil_image_resource, you can call vil_new_image_resource_of_view().
Representing a filtered version of an image in a file (without loading in memory): e.g. vil_crop_image_resource and vil_decimate_image_resource. These are created using the equivalent functions: vil_crop(), vil_decimate(), etc.
Representing the outcome of an image processing algorithm (see next section) e.g. vil_convolve_1d_resource. These are created using the equivalent functions e.g. vil_convolve_1d().

To actually get some image pixels you call the resource's get_view() or get_copy_view() method. For example, the vil_load() function works by creating a vil_image_resource, and then calling get_view() for the whole image.

vil_image_view_base_sptr vil_load(char const* file)
{
  vil_image_resource_sptr data = vil_load_image_resource(file);
  if (!data) return 0;
  return data -> get_view();
}

To set image pixels, you call the resource's put_view().

7.3.1 A rule of thumb.

When developing an image processing algorithm, first write your algorithm in terms of a function for vil_image_view<T>. Then, if you need it, write the vil_image_resource_sptr version, using the vil_image_view<T> version to do the actual pixel manipulation.

vil_image_view<T> is designed for playing with actual pixel values. vil_image_resource derivatives are designed to handle all the other stuff associated with images, e.g. choosing pixel types at runtime, splitting an image into blocks so that it fits in memory, dealing with the arbitrary and complex hassles of image IO.
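The rule of thumb can be illustrated with a toy example that does not use vil. Everything here is hypothetical (ToyResource, ToyPixelFormat and the two functions are invented for the sketch): the typed function does the pixel work, and the resource-level function only chooses the pixel type at run time and delegates.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Step 1: the pixel-level algorithm, written once for any pixel type T.
template <class T>
double toy_mean_of_pixels(const std::vector<T>& pixels)
{
  if (pixels.empty()) return 0.0;
  double sum = 0.0;
  for (std::size_t k = 0; k < pixels.size(); ++k) sum += pixels[k];
  return sum / double(pixels.size());
}

// Hypothetical run-time tag for the pixel type held by a resource.
enum ToyPixelFormat { TOY_FORMAT_BYTE, TOY_FORMAT_FLOAT };

// Hypothetical resource whose pixel type is only known at run time.
struct ToyResource
{
  ToyPixelFormat format;
  std::vector<unsigned char> bytes;  // used when format == TOY_FORMAT_BYTE
  std::vector<float> floats;         // used when format == TOY_FORMAT_FLOAT
};

// Step 2: the resource-level version. It contains no pixel arithmetic;
// it only dispatches to the typed function above.
inline double toy_mean_of_resource(const ToyResource& r)
{
  switch (r.format)
  {
    case TOY_FORMAT_BYTE:  return toy_mean_of_pixels(r.bytes);
    case TOY_FORMAT_FLOAT: return toy_mean_of_pixels(r.floats);
  }
  assert(false && "unknown pixel format");
  return 0.0;
}
```

Keeping the switch statement in one thin wrapper means the algorithm itself is tested once, on typed data, and new pixel formats only add a case.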

7.3.2 Using `vil_memory_image` to ignore pixel type.

As explained above, you should be using vil_image_view<T> to actually manipulate your pixels. However, in some parts of your code, you may want to pass images around without having to decide the pixel type at compile time. This is a role for a vil_image_resource derivative, in particular the vil_memory_image. You can wrap an existing vil_image_view<T> in a vil_memory_image by calling vil_new_image_resource_of_view(). Reference counting keeps track of the underlying data in memory, so you can let the original view go out of scope without loss.

It may be tempting to use the vil_image_view_base_sptr for this purpose instead. That type is only intended for internal use by vil, and it will almost certainly not behave as you want.

The vil_image_resource API has been designed to allow efficient access to vil_memory_image. In the example below, if the image resource passed in is really a vil_memory_image, get_view() returns a view to the underlying data, so no unneeded data copying happens. Similarly, a call to put_view() can return almost immediately, checking only to confirm that the view is still pointing to the same underlying data.

void display_view(vil_image_resource_sptr &ir)
{
  switch (ir->pixel_format())
  {
   case VIL_PIXEL_FORMAT_BYTE: {
    vil_image_view<vxl_byte> v1 = ir->get_view();
    display_byte(v1);
    break; }  // without the break, control would fall into the next case
   case ...
  }
}

7.4 Planes, components and stepping.

vil_image_view uses a pointer arithmetic style of indexing. The image data is assumed to be a regularly arranged set of pixels in memory. The view keeps a pointer to the pixel at the origin. It also keeps the pointer difference to get to the next pixel to the right, the next pixel down, and the same pixel in the next plane.
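A minimal sketch of this indexing scheme, assuming byte pixels, is shown below. StridedView and the two helper functions are hypothetical names made up for the illustration; only the idea of an origin pointer plus three pointer differences comes from the text. The same accessor serves two different memory layouts because only the steps change.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical strided view over bytes (the member names echo the text,
// but this is not vil's class).
struct StridedView
{
  const unsigned char* top_left;  // pixel (0,0) of plane 0
  unsigned ni, nj, nplanes;       // width, height, number of planes
  std::ptrdiff_t istep, jstep, planestep;

  // Address arithmetic: origin + i*istep + j*jstep + p*planestep.
  unsigned char operator()(unsigned i, unsigned j, unsigned p = 0) const
  {
    assert(i < ni && j < nj && p < nplanes);
    return top_left[std::ptrdiff_t(i) * istep +
                    std::ptrdiff_t(j) * jstep +
                    std::ptrdiff_t(p) * planestep];
  }
};

// View an interleaved RGBRGB... buffer as a three-plane image.
inline StridedView view_interleaved(const std::vector<unsigned char>& buf,
                                    unsigned ni, unsigned nj)
{
  assert(buf.size() == std::size_t(ni) * nj * 3);
  StridedView v;
  v.top_left = buf.data();
  v.ni = ni; v.nj = nj; v.nplanes = 3;
  v.istep = 3;                       // next pixel: skip one RGB triple
  v.jstep = 3 * std::ptrdiff_t(ni);  // next row: skip a row of triples
  v.planestep = 1;                   // next plane: next byte in the triple
  return v;
}

// View a planar RRR...GGG...BBB... buffer as a three-plane image.
inline StridedView view_planar(const std::vector<unsigned char>& buf,
                               unsigned ni, unsigned nj)
{
  assert(buf.size() == std::size_t(ni) * nj * 3);
  StridedView v;
  v.top_left = buf.data();
  v.ni = ni; v.nj = nj; v.nplanes = 3;
  v.istep = 1;
  v.jstep = std::ptrdiff_t(ni);
  v.planestep = std::ptrdiff_t(ni) * std::ptrdiff_t(nj);
  return v;
}
```

Code written against operator() does not need to know which of the two layouts it has been given.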

In a general image representation a 2d image consists of multiple planes each containing multiple rasters (rows) each containing multiple pixels, and each pixel contains multiple components. The planes and the components are used for the same purpose, to represent different spectral or functional values (e.g. the red, green and blue channels of an RGB image). In vil it is usually assumed that an image cannot have both multiple planes and multiple components per pixel. This allows vil_image_view to view the same colour image data as either a 3 plane image or a 1 plane RGB image. You can do this explicitly by calling vil_view_as_planes() or vil_view_as_rgb(). So the following example will print the same value twice.

// Assume that x.png is an rgb byte image.
vil_image_view<vil_rgb<vxl_byte> > im = vil_load("x.png");
vil_image_view<vxl_byte> im2 = vil_view_as_planes(im);
vcl_cout << (int) (im(3,4).r) << vcl_endl;
vcl_cout << (int) (im2(3,4,0)) << vcl_endl;

vil_view_as_planes() and vil_view_as_rgb() are actually redundant, and simple assignment will do. In the above example the conversion can be achieved by vil_image_view<vxl_byte> im2 = im;

You should bear in mind that the component-wise and plane-wise representations are not equal. The multi-plane representation is more general than the RGB multi-component one. If the underlying data is actually stored RRRR..GGGG..BBBB.. then it is not possible to view that image as a single plane of RGB pixels. For this reason, a lot of vil prefers to view an image as multi-plane single-component. In particular, the vil_image_resource derivatives in vil will treat all images as multi-plane, scalar component images, whether the underlying data is RGBRGBRGB... or RRRR..GGGG..BBBB.. This means that if you have a switch statement to deal with pixel types in a normal image resource, you need not worry about any types other than the following:

bool
vxl_byte, vxl_sbyte
vxl_int_16, vxl_uint_16
vxl_int_32, vxl_uint_32
float, double
vcl_complex<float>, vcl_complex<double>

Similarly to the planes to components conversion it is possible to perform a whole range of other manipulations. These include vil_transpose(), vil_flip_ud(), vil_decimate(), vil_crop(). One further advantage of the arithmetic indexing scheme is that it becomes easy to create a 2d slice view of a 3d image.
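The reason such manipulations are cheap is that they only rewrite the origin pointer, the sizes and the steps; no pixel is moved. The sketch below shows the idea on a hypothetical single-plane view of ints. View2D and the toy_ functions are invented for this illustration and are not the vil functions of similar name.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical single-plane strided view (not vil's class).
struct View2D
{
  int* top_left;
  unsigned ni, nj;
  std::ptrdiff_t istep, jstep;

  int& operator()(unsigned i, unsigned j) const
  {
    assert(i < ni && j < nj);
    return top_left[std::ptrdiff_t(i) * istep + std::ptrdiff_t(j) * jstep];
  }
};

// Transpose: swap the roles of i and j.
inline View2D toy_transpose(View2D v)
{
  std::swap(v.ni, v.nj);
  std::swap(v.istep, v.jstep);
  return v;
}

// Flip upside down: start at the last row and walk backwards.
inline View2D toy_flip_ud(View2D v)
{
  if (v.nj > 0) v.top_left += std::ptrdiff_t(v.nj - 1) * v.jstep;
  v.jstep = -v.jstep;
  return v;
}

// Crop: move the origin and shrink the sizes.
inline View2D toy_crop(View2D v, unsigned i0, unsigned n_i,
                       unsigned j0, unsigned n_j)
{
  assert(i0 + n_i <= v.ni && j0 + n_j <= v.nj);
  v.top_left += std::ptrdiff_t(i0) * v.istep + std::ptrdiff_t(j0) * v.jstep;
  v.ni = n_i;
  v.nj = n_j;
  return v;
}

// Decimate: keep every factor-th pixel by enlarging the steps.
inline View2D toy_decimate(View2D v, unsigned factor)
{
  assert(factor > 0);
  v.ni = (v.ni + factor - 1) / factor;
  v.nj = (v.nj + factor - 1) / factor;
  v.istep *= std::ptrdiff_t(factor);
  v.jstep *= std::ptrdiff_t(factor);
  return v;
}
```

Because every result still points into the original buffer, writing through a cropped or transposed view modifies the original image.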

7.5 Algorithms and Image Processing

Several image processing functions can be found in the algo subdirectory of vil. Let's look at an example of finding the image gradient using a Sobel filter.

#include <vcl_iostream.h>
#include <vxl_config.h> // for vxl_byte
#include <vil/vil_image_view.h>
#include <vil/vil_print.h>
#include <vil/algo/vil_sobel_3x3.h>

int main()
{
  unsigned ni=8;
  unsigned nj=15;
  unsigned nplanes=1;
  vil_image_view<vxl_byte> image(ni,nj,nplanes);

  for (unsigned p=0;p<nplanes;++p)
    for (unsigned j=0;j<nj;++j)
      for (unsigned i=0;i<ni;++i)
        image(i,j,p) = vxl_byte(i+10*j+100*p);

  vcl_cout<<"Original image:"<<vcl_endl;
  vil_print_all(vcl_cout,image);

  // Objects to hold gradients
  vil_image_view<float> grad_i,grad_j;

  vil_sobel_3x3(image,grad_i,grad_j);

  vcl_cout<<vcl_endl
          <<"Sobel I Gradient:"<<vcl_endl;
  vil_print_all(vcl_cout,grad_i);

  vcl_cout<<vcl_endl
          <<"Sobel J Gradient:"<<vcl_endl;
  vil_print_all(vcl_cout,grad_j);

  return 0;
}

There are also algorithms to perform image arithmetic, smoothing, general 1D and 2D convolution, morphological operations, interpolation, and much more.

7.6 Converting from using the old vil1 to vil.

This section explores the major differences between using the old vil1 and using vil, and some of the implications for converting existing code.

The first and most obvious difference is that whilst there is a broad equivalent to vil1_image, and its descendants, this class tree has been split in two. The abstract vil1_image is now replaced with a smart pointer to a vil_image_resource. The concrete vil1_memory_image_of<T> is now a vil_image_view<T>. Whereas previously you might have written code in terms of vil1_image, it now usually makes sense to write most image manipulations in terms of vil_image_view<T>s. With the old vil1_image, you either had to do a get_section and operate on raw memory, or do a messy switch statement to cast it to its underlying vil1_memory_image_of<T> type, or do an expensive vil1_view_as() conversion. Now with vil, the vil_image_view<T> provides a powerful view directly onto your image in memory.

The vil_image_view provides such facilities as compile-time type safety and switchable bounds checking. It also acts as a sort of canonicaliser. A wide range of actual memory layouts can all be treated identically and transparently while working through the vil_image_view. Previously, in vil1, the image loader often needed to read an unblocked resource and to have several filters placed on top of it to do such things as re-order the raster rows and re-order the components. vil doesn't do this, but instead uses the vil_image_view to provide a canonical view of whatever deranged image format your loader finds most efficient to use.

The second important change is that vil provides full support for planes. In many cases accessing different image planes is directly equivalent to accessing different components. Indeed, it is often preferable to view an image as multi-planar rather than multi-component. If your algorithms assume a single plane, it is, however, trivial to provide a wrapper function which takes a multi-planar image and passes one plane at a time to your algorithm. This can be done with virtually no loss in efficiency, and indeed is how some of the code in vil/algo is written.
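A wrapper of this kind might look like the following sketch. It is written against a hypothetical ToyPlanarView rather than vil_image_view, and toy_invert_plane stands in for whatever single-plane algorithm you already have; none of these names come from vil.

```cpp
#include <cassert>
#include <cstddef>

// An existing single-plane algorithm: invert every byte of one plane.
inline void toy_invert_plane(unsigned char* top_left, unsigned ni, unsigned nj,
                             std::ptrdiff_t istep, std::ptrdiff_t jstep)
{
  for (unsigned j = 0; j < nj; ++j)
  {
    unsigned char* row = top_left + std::ptrdiff_t(j) * jstep;
    for (unsigned i = 0; i < ni; ++i)
    {
      unsigned char* pixel = row + std::ptrdiff_t(i) * istep;
      *pixel = static_cast<unsigned char>(255 - *pixel);
    }
  }
}

// Hypothetical multi-plane view (not vil's class).
struct ToyPlanarView
{
  unsigned char* top_left;
  unsigned ni, nj, nplanes;
  std::ptrdiff_t istep, jstep, planestep;
};

// The wrapper: hand the single-plane algorithm one plane at a time.
// Each plane is just the same view with a shifted origin.
inline void toy_invert(const ToyPlanarView& v)
{
  for (unsigned p = 0; p < v.nplanes; ++p)
    toy_invert_plane(v.top_left + std::ptrdiff_t(p) * v.planestep,
                     v.ni, v.nj, v.istep, v.jstep);
}
```

The wrapper costs one extra function call per plane, which is negligible next to the per-pixel work, and it works for interleaved and planar layouts alike.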

To help convert existing code there is a script (core/vil/scripts/vil1tovil.pl). It converts as much code as it can, but it can really only deal with file and identifier name changes. There are large structural differences between vil1 and vil, with many of the equivalent functions taking different parameters. The output of the conversion script can best be seen as a hint on which types and classes to use and which functions to call. You will almost certainly need to make extensive further edits to your code to get it to compile again.

If you do not want to convert any code, but would rather use an interface to convert between vil1 and vil types at runtime, then take a look at <vil1/vil1_vil.h> which has a function for converting between vil1_memory_image_of and vil_image_view, and a class that wraps a vil1_image, and exports a vil_image_resource interface.

7.7 Frequently Asked Questions

Question 1

I'm trying to load a DICOM image, but it doesn't work. vil_load.cxx prints an error message that mentions lots of image types but not .dcm. What's wrong?

The DICOM loader in VXL is not built by default, because it is large and only medical image people want it.

You will need to rerun CMake and find the cache value called VXL_BUILD_DICOM. Turn it on, and rebuild -- it won't need to rebuild everything.

Question 2

I'm having problems trying to use vil_image_view_base_sptr to process a loaded image without worrying about what type the pixels are.

The designers of vil recommend against using vil_image_view_base_sptr explicitly -- it is unlikely to behave the way any user might want or expect. vil never processes pixels independently of their type, and vil_image_view_base_sptr is just a smart polymorphic pointer to a concrete vil_image_view<T> with some actual pixel type T. If you want to convert a loaded image into pixels of a particular type, use one of the vil_convert functions

  vil_image_view<vxl_byte> view =
    vil_convert_stretch_range (vxl_byte(), vil_load(my_filename));

If you want to store an image in memory without worrying about its pixel type, see vil_memory_image.

Question 3

What co-ordinate system does vil use?

Mostly vil does not assume that the i and j co-ordinates have any explicit meaning. Instead, any external meaning of the i and j directions is supplied by the user. The choice of the letters i and j was an explicit decision to discourage any assumption of a Cartesian reference frame.

However there are a few places where further assumptions need to be made. When loading an image, the file format generally provides an explicit mapping to up/down and left/right. In such cases, vil assumes that image(0,0) is the top-left-most pixel in the image, that increasing i moves right, and that increasing j moves down. A similar assumption is used by vil_rotate to provide a direction to the rotation angle.

If you need an explicit world co-ordinate frame, within which you can embed an image, then take a look at the vimt library in vxl/contrib/mul/vimt. That provides a world-to-image co-ordinate transform that can be efficiently manipulated to provide transforms up to projective complexity.

7.8 Optimising Image Processing Algorithms (Advanced Topic)

The design of vil_image_view (being more flexible than the design of vil1), and the state of modern optimising compilers (not as good as they could be), mean that naive use of vil images may not be as fast as it should be.

The following example shows the original implementation of the image fill method.

template<class T>
void vil_image_view<T>::fill(T value)
{
  for (unsigned p=0;p<nplanes_;++p)
    for (unsigned j=0;j<nj_;++j)
      for (unsigned i=0;i<ni_;++i)
        (*this)(i,j,p)= value;
}

This implementation has the advantage of being simple, and easy to test.

In an ideal world the compiler would realise that it doesn't have to recalculate the location of each pixel each step, but instead keep a running pointer to the current pixel location. (Of course, in an ideal world we would be programming using natural language and a microphone.) We can make this optimisation explicit.

template<class T>
void vil_image_view<T>::fill(T value)
{
  T* plane = top_left_;
  for (unsigned p=0; p<nplanes_; ++p, plane+=planestep_)
  {
    T* row = plane;
    for (unsigned j=0; j<nj_; ++j, row+=jstep_)
    {
      T* p = row;
      for (unsigned i=0; i<ni_; ++i, p+=istep_) *p = value;
    }
  }
}

This can halve the run time on some compilers.

The most important rule in code optimisation is to observe how the code behaves in real life, and concentrate your efforts on where the code spends most of its time. In our example, this means the innermost loop. Now, it turns out that in many cases istep_==1, because of the default image layout in memory. Because of this common case it would be worth having the compiler generate machine code for the innermost loop in this special case. We can do this by explicitly testing for such a special case.

template<class T>
void vil_image_view<T>::fill(T value)
{
  T* plane = top_left_;

  if (istep_==1)
  {
    for (unsigned p=0;p<nplanes_;++p,plane += planestep_)
    {
      T* row = plane-1;
      for (unsigned j=0;j<nj_;++j,row += jstep_)
      {
        int i = ni_ ;
        while (i>0) { row[i--]=value; }  // row is one before the raster, so row[ni_]..row[1] covers it
      }
    }
    return;
  }

  for (unsigned p=0;p<nplanes_;++p,plane += planestep_)
  {
    T* row = plane;
    for (unsigned j=0;j<nj_;++j,row += jstep_)
    {
      T* p = row;
      for (unsigned i=0;i<ni_;++i,p+=istep_) *p = value;
    }
  }
}

There are two other optimisations going on here. The first is that we are using the pointer indexing operator []. Most compilers treat while (++i<n) { *(ptr++)=v; } differently from while (++i<n) { ptr[i]=v; } , with the latter often being significantly faster. This is especially true when ptr is a pointer to a character sized type. The other optimisation makes use of the fact that it is faster to count down to 0 than count up to n. This is because it is faster to test against a constant, 0, than against a variable. Sometimes a compiler figures this out itself, but by no means always. One useful refinement that may be possible is to decrement the index counter right at the end of the loop. This allows the compiler to avoid issuing a separate test instruction, since this sort of test is automatically performed by the processor after a decrement or other arithmetic operation.

Since we are performing the same operation on every pixel independent of its absolute or relative position, there is one further optimisation that can be performed. In many cases an image will be stored as a contiguous block of memory. If this is the case, it may make sense just to operate on this block of memory as a single dimensional array. In the case of fill, this may even allow a compiler to issue a specialised single machine instruction which performs the whole fill very very fast. This gives us our final implementation.
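One way a contiguity test of this kind could work is sketched below. This is an independent illustration, not the code behind vil's is_contiguous(): it assumes all steps are positive, and it requires a direction of size one to carry the step that a full-sized direction would have. The names toy_is_contiguous and toy_fill are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// True when the ni x nj x nplanes elements occupy one gap-free block.
// Assumption of this sketch: views with negative steps (e.g. flipped
// images) are simply reported as non-contiguous.
inline bool toy_is_contiguous(unsigned ni, unsigned nj, unsigned nplanes,
                              std::ptrdiff_t istep, std::ptrdiff_t jstep,
                              std::ptrdiff_t planestep)
{
  // Pair each step with the number of elements along that direction.
  std::pair<std::ptrdiff_t, std::ptrdiff_t> dims[3] = {
    std::make_pair(istep, std::ptrdiff_t(ni)),
    std::make_pair(jstep, std::ptrdiff_t(nj)),
    std::make_pair(planestep, std::ptrdiff_t(nplanes)) };
  std::sort(dims, dims + 3);  // smallest step first

  // The smallest step must be 1, and each larger step must equal the
  // extent covered by all the smaller ones.
  std::ptrdiff_t expected = 1;
  for (int d = 0; d < 3; ++d)
  {
    if (dims[d].first != expected) return false;
    expected *= dims[d].second;
  }
  return true;
}

// Fill that treats contiguous data as a flat array, else walks the steps.
template <class T>
void toy_fill(T* top_left, unsigned ni, unsigned nj, unsigned nplanes,
              std::ptrdiff_t istep, std::ptrdiff_t jstep,
              std::ptrdiff_t planestep, T value)
{
  if (toy_is_contiguous(ni, nj, nplanes, istep, jstep, planestep))
  {
    std::fill(top_left, top_left + std::size_t(ni) * nj * nplanes, value);
    return;
  }
  for (unsigned p = 0; p < nplanes; ++p)
    for (unsigned j = 0; j < nj; ++j)
      for (unsigned i = 0; i < ni; ++i)
        top_left[std::ptrdiff_t(i) * istep + std::ptrdiff_t(j) * jstep +
                 std::ptrdiff_t(p) * planestep] = value;
}
```

A cropped view is the typical non-contiguous case: its row step is still the width of the parent image, so the pixels skipped at the end of each row must not be overwritten.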

template<class T>
void vil_image_view<T>::fill(T value)
{
  T* plane = top_left_;

  if (is_contiguous())
  {
    typename vil_image_view<T>::iterator it = begin();
    typename vil_image_view<T>::const_iterator end_it = end();
    while (it!=end_it) { *it = value; ++it; }
    return;
  }

  if (istep_==1)
  {
    for (unsigned p=0;p<nplanes_;++p,plane += planestep_)
    {
      T* row = plane-1;
      for (unsigned j=0;j<nj_;++j,row += jstep_)
      {
        int i = ni_;
        while (i>0) { row[i--]=value; }  // row is one before the raster, so row[ni_]..row[1] covers it
      }
    }
    return;
  }

  for (unsigned p=0; p<nplanes_; ++p, plane+=planestep_)
  {
    T* row = plane;
    for (unsigned j=0; j<nj_; ++j, row+=jstep_)
    {
      T* p = row;
      for (unsigned i=0; i<ni_; ++i, p+=istep_) *p = value;
    }
  }
}

This optimised version was between two and ten times faster than the original depending on the compiler, image structure, and pixel type.

It should always be borne in mind that there is a trade-off in testing for special cases. Each test takes time, and this slows the function down for the non-special cases. Limit yourself to only testing for very common cases that have very significant potential speed improvements.

Finally, as with all optimisation, be rigorous in comparing the actual times for your original and optimised code. Run enough experiments to measure the statistical spread to see if your improvements are significant. It is quite common for compiler or processor quirks to make your optimised code slower than the original.
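A small timing harness along these lines is sketched below. It is an assumption of this sketch that a C++11 compiler with std::chrono is available; time_runs, Spread and mean_and_stddev are hypothetical helper names, not part of vil.

```cpp
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Call f() `runs` times and return each duration in milliseconds.
template <class F>
std::vector<double> time_runs(F f, std::size_t runs)
{
  std::vector<double> ms;
  ms.reserve(runs);
  for (std::size_t r = 0; r < runs; ++r)
  {
    std::chrono::steady_clock::time_point t0 = std::chrono::steady_clock::now();
    f();
    std::chrono::steady_clock::time_point t1 = std::chrono::steady_clock::now();
    ms.push_back(std::chrono::duration<double, std::milli>(t1 - t0).count());
  }
  return ms;
}

struct Spread { double mean, stddev; };

// Mean and sample standard deviation of a set of timings.
inline Spread mean_and_stddev(const std::vector<double>& x)
{
  Spread s = { 0.0, 0.0 };
  if (x.empty()) return s;
  for (std::size_t k = 0; k < x.size(); ++k) s.mean += x[k];
  s.mean /= double(x.size());
  if (x.size() < 2) return s;
  double ss = 0.0;
  for (std::size_t k = 0; k < x.size(); ++k)
    ss += (x[k] - s.mean) * (x[k] - s.mean);
  s.stddev = std::sqrt(ss / double(x.size() - 1));
  return s;
}
```

Time the original and the optimised function with the same number of runs and compare the two means against their spreads; a difference smaller than the standard deviations is not evidence of an improvement. Make sure the timed function's result is actually used, otherwise an optimising compiler may remove the work being measured.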

7.9 Blocked Images (Advanced Topic)

7.9.1 Basics

It is possible to encounter images that are much larger than available memory. For example, a commercial satellite image can easily exceed several gigabytes in size. The situation is even more dire in the case of ultra high resolution video where up to 16K X 16K pixel resolutions are feasible at two bytes per pixel. It is clearly not practical to handle these images as an in-memory vil_image_view. The use of a vil_image_resource to supply small views of the image at a time is essential, however the overhead in extracting small views from a large image file can be substantial.

Consider the example of displaying a small image region near the center of the image where the view is zoomed in so that one pixel in the image is mapped to one pixel on the screen. The size of this image patch might be 2K X 1K pixels. In order for the image resource to supply this set of pixels it is necessary to seek past a gigabyte or more of file-resident data to the middle of the image and then pull out the several megabytes of pixels needed to construct the view for display. If the user then wants to pan over a few hundred pixels to view something just off the screen, a full seek and file read must be repeated. Under these circumstances, image viewing performance will be overwhelmingly dominated by disk I/O bandwidth and seek times.

To mitigate the overhead of disk access, the image can be organized as a set of contiguous rectangular blocks of pixels. Blocks may be randomly scattered within the file, but each block is a contiguous set of pixels. This way, a view can be assembled by seeking to each block in the view and then reading the block efficiently. Typical block sizes are 512 X 512 or 1024 X 1024 pixels, so that only a few blocks are needed to display regions of interest at full zoom. To gain even more efficiency, the blocks can be managed in a cache so that most of the pixels being displayed on the screen are already in memory. As the user pans to a new location, those blocks that are now off the screen are replaced by new blocks needed to fill in the new region. Thus, the number of blocks that have to actually be read from the file is significantly reduced.

The blocked image resource interface has the following virtual methods in addition to those already defined in the base resource class:

The block size used to store and retrieve pixels.

unsigned size_block_i() const
unsigned size_block_j() const

The number of blocks in the i and j directions needed to contain the image.

unsigned n_block_i() const
unsigned n_block_j() const

Retrieving blocks from the resource. Note that a block is a vil_image_view and thus ready for use in processing and visualization operations.

vil_image_view_base_sptr
get_block( unsigned  block_index_i, unsigned  block_index_j ) const

bool
get_blocks(unsigned start_block_i, unsigned end_block_i,
           unsigned  start_block_j, unsigned end_block_j,
           vcl_vector< vcl_vector< vil_image_view_base_sptr > >& blocks ) const

This blocking structure is used internally to implement the basic method

get_copy_view(unsigned i0, unsigned n_i, unsigned j0, unsigned n_j)

It is possible that i0, n_i and j0, n_j are not exact multiples of size_block_i and size_block_j, respectively. In this case the blocks are trimmed to extract pixels belonging to the specified image view bounds. In the case of retrieving views near the boundary of the full image, e.g., n_i=ni(), n_j=nj(), blocks may lie partially outside the underlying image. In this case the pixel values in the block locations lying outside the full image bounds are undefined.
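The arithmetic behind this trimming is the same along i and j, so it can be sketched in one dimension. The helpers below are hypothetical and only illustrate the index calculations; in particular, treating end_block as inclusive is an assumption of this sketch, not a statement about the get_blocks() interface.

```cpp
#include <cassert>

// Number of blocks needed to cover n_pixels (rounding up).
inline unsigned toy_n_blocks(unsigned n_pixels, unsigned size_block)
{
  assert(size_block > 0);
  return (n_pixels + size_block - 1) / size_block;
}

// Inclusive range of block indices touched by pixels i0 .. i0+n_i-1.
struct ToyBlockRange { unsigned start_block, end_block; };

inline ToyBlockRange toy_blocks_covering(unsigned i0, unsigned n_i,
                                         unsigned size_block)
{
  assert(size_block > 0 && n_i > 0);
  ToyBlockRange r;
  r.start_block = i0 / size_block;
  r.end_block = (i0 + n_i - 1) / size_block;
  return r;
}

// Pixels to discard from the start of the first block.
inline unsigned toy_leading_trim(unsigned i0, unsigned size_block)
{
  assert(size_block > 0);
  return i0 % size_block;
}

// Pixels to discard from the end of the last block.
inline unsigned toy_trailing_trim(unsigned i0, unsigned n_i,
                                  unsigned size_block)
{
  assert(size_block > 0 && n_i > 0);
  return size_block - 1 - (i0 + n_i - 1) % size_block;
}
```

For example, with 512-pixel blocks a 10000-pixel-wide image needs 20 blocks across, and a request for 2048 pixels starting at pixel 1000 touches blocks 1 to 5, with 488 pixels trimmed from the first block and 24 from the last.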

Similar methods are defined for inserting blocked data into the image resource.

bool put_block(unsigned  block_index_i, unsigned  block_index_j,
               vil_image_view_base const& view)

bool
put_blocks(unsigned start_block_i, unsigned end_block_i,
           unsigned  start_block_j, unsigned end_block_j,
           vcl_vector< vcl_vector< vil_image_view_base_sptr > > const& blocks)

These methods are used internally to support the virtual

put_view(vil_image_view_base const& im, unsigned i0, unsigned j0)

method. Note that current vil file-based resources do not support reading and writing on the same open resource. Therefore, a block-oriented image processing algorithm will have an input resource from which blocks are retrieved and an output resource where processed blocks are inserted.

7.9.2 The Facade and Cached Resource

Many of the advantages of blocking can be realized even if the underlying image resource is not intrinsically blocked. The vil_blocked_image_facade wraps around any image resource and provides the vil_blocked_image_resource class interface. That is, the facade is a sub-class of vil_blocked_image_resource. Internally, reading and writing facade block data is implemented using the usual get and put view methods. In this case the block view dimensions are those defined by the facade blocking geometry.

One might wonder how this simulation of a blocked image structure provides any gain in efficiency for pixel access, since the process relies on an unblocked file format. A significant gain in performance can be achieved by the addition of a cache. The vil_cached_image_resource is a sub-class of vil_blocked_image_resource and provides an in-memory store for the most recently retrieved blocks. The size of the cache (in number of blocks) is specified in the constructor:

vil_cached_image_resource(vil_blocked_image_resource_sptr bir,
                           const unsigned cache_size)

The cache is implemented as a priority queue based on the "age" of a block. The blocks in the queue are given a timestamp as they enter the queue. If a block is retrieved from the cache, then the timestamp is reset to the current time. Otherwise, blocks age as new blocks are entered into the cache. When the cache is full, the oldest block is discarded to make room for a new block. Note that the queue does not participate in writing blocks to a resource.
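The age-based replacement policy can be illustrated with a small stand-alone model. This is a sketch under stated assumptions, not the vil implementation: the class name block_cache is hypothetical, a logical counter stands in for the timestamp, and the oldest entry is found with a simple linear scan rather than a priority queue.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>

// Hypothetical model of an age-based block cache; not the vil class.
// Block is any copyable, default-constructible stand-in for a pixel block.
template <class Block>
class block_cache
{
  struct entry { Block block; unsigned long stamp; };
  typedef std::pair<unsigned, unsigned> key_type;  // (block_index_i, block_index_j)
  typedef std::map<key_type, entry> store_type;

 public:
  explicit block_cache(std::size_t capacity) : capacity_(capacity), clock_(0) {}

  // On a hit, copy the block to 'out', reset its timestamp to the
  // current time and return true. On a miss, return false.
  bool get(unsigned bi, unsigned bj, Block& out)
  {
    typename store_type::iterator it = store_.find(key_type(bi, bj));
    if (it == store_.end()) return false;
    it->second.stamp = ++clock_;
    out = it->second.block;
    return true;
  }

  // Insert a block, discarding the oldest entry first if the cache is full.
  void add(unsigned bi, unsigned bj, Block const& b)
  {
    if (capacity_ == 0) return;
    const key_type k(bi, bj);
    if (store_.find(k) == store_.end() && store_.size() >= capacity_)
      store_.erase(oldest());
    entry e;
    e.block = b;
    e.stamp = ++clock_;
    store_[k] = e;
  }

  std::size_t size() const { return store_.size(); }

 private:
  // Linear scan for the smallest timestamp. Precondition: store_ is not empty.
  typename store_type::iterator oldest()
  {
    typename store_type::iterator best = store_.begin();
    for (typename store_type::iterator it = store_.begin(); it != store_.end(); ++it)
      if (it->second.stamp < best->second.stamp) best = it;
    return best;
  }

  std::size_t capacity_;
  unsigned long clock_;
  store_type store_;
};
```

With a capacity of two, adding blocks (0,0) and (0,1), then reading (0,0), then adding (1,0) discards (0,1), because the read refreshed the timestamp of (0,0).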

7.9.3 Using Blocked File Formats

The blocking capability of a resource can be determined by examining the properties of the resource using the method

bool get_property(char const* tag, void* property_value = 0) const

Two properties are defined for blocked resources:

vil_property_size_block_i "size_block_i"
vil_property_size_block_j "size_block_j"

To test if a resource supports blocking one can examine the appropriate properties of the resource:

vil_image_resource_sptr imgr = vil_load_image_resource("my_filename");
...
unsigned sbi=0, sbj=0;
bool is_blocked =
  imgr->get_property(vil_property_size_block_i, &sbi) &&
  imgr->get_property(vil_property_size_block_j, &sbj);
...

If the resource is blocked then is_blocked will be true and the variables sbi and sbj contain the block dimensions of the resource.

The following example shows how to convert an image resource to a blocked file resource.

vil_image_resource_sptr imgr = vil_load_image_resource("my_filename");
unsigned size_block_i = 256, size_block_j = 256;
vil_blocked_image_resource_sptr bimgr =
        vil_new_blocked_image_resource("my_blocked_filename",
                                       imgr->ni(), imgr->nj(), imgr->nplanes(),
                                       imgr->pixel_format(),
                                       size_block_i, size_block_j, "tiff");
if (!vil_copy_deep(imgr, bimgr))
{ //report trouble
 ...
}
...

The new resource, bimgr, will store pixels in square, 256 x 256, blocks. vil_copy_deep automatically splits the input resource into strips if the image is too large to fit in memory. However, to ensure proper handling of block boundaries it is better to wrap the input resource in a facade with the same blocking structure as the output. That is,

...
vil_blocked_image_resource_sptr facr =
    vil_new_blocked_image_facade(imgr, size_block_i, size_block_j);

if (!vil_copy_deep(facr, bimgr))
{ //report trouble
 ...
}
...

Currently, the tiff file format and the National Imagery Transmission Format (nitf) provide a vil_blocked_image_resource. However, the nitf format does not yet support writing. Note that tiff only supports block dimensions that are multiples of 16.

7.10 Pyramid Images (Advanced Topic)

7.10.1 What are Pyramid Images?

As in the previous section on blocked images, the motivation for constructing a pyramid image is to manage large images without having to keep the entire image in memory. Satellite images can easily exceed all available random access memory, which makes it impossible to display an overview of the full image directly. The blocked image strategy solves the problem of panning through a large image, but it does not solve the problem of zooming between different levels of detail. Even with blocking, the display of a complete overview requires the entire image to be in memory.

The zooming problem can be solved by constructing a vil_pyramid_image_resource. This resource maintains a number of file-based copies of an image at different resolution scales. The original image is called the base image. Each reduced resolution image resource is called a level of the pyramid. Most typically, the levels of the pyramid differ by a factor of two in scale in each dimension, so each level holds 1/4 as many pixels as the one before it. The total size of such a pyramid, relative to the base image, approaches 1 + 1/4 + 1/16 + ... = 4/3 = 1 + 1/3 as the number of levels approaches infinity. Thus, the worst case is 33% extra storage to represent all levels of detail.
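The storage bound can be checked numerically. The function below is a hypothetical helper, not part of vil, and assumes that every level is exactly half the scale of the previous one in each dimension.

```cpp
#include <cassert>
#include <cmath>

// Total storage of a pyramid relative to the base image, assuming each
// level has 1/4 the pixels of the previous one. n_levels counts the base
// image as level 0. (Hypothetical helper, not part of vil.)
inline double pyramid_relative_size(unsigned n_levels)
{
  double total = 0.0;
  double level = 1.0;  // size of the current level relative to the base
  for (unsigned k = 0; k < n_levels; ++k)
  {
    total += level;
    level *= 0.25;
  }
  return total;
}
```

Two levels cost 1.25 times the base image, and the sum stays below 4/3 however many levels are added.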

It is not necessary to have a fixed scale difference between adjacent levels of the vil_pyramid_image_resource. When a user requests a vil_image_view at a particular scale, a view from the closest scale in the pyramid is generated. The interface for getting a view from a pyramid image is illustrated in the following code example. In this example, the pyramid is stored as a set of image files in a directory.

#include <vil/vil_load.h>
#include <vil/vil_pyramid_image_resource.h>
...
{
...
 vil_pyramid_image_resource_sptr pir =
                        vil_load_pyramid_resource("pyramid_dir");
 float actual_scale;
 vil_image_view<unsigned short> level_view =
                        pir->get_copy_view(0.25f, actual_scale);
 ...
}

This example shows the basic use of a pyramid resource where a level view 1/4 the scale of the base image is being retrieved. If the pyramid doesn't contain a level with a scale factor of exactly 0.25, a view of the level with the closest scale is returned, and the scale of that level is returned in actual_scale. The level view only requires 1/16 the number of pixels of the base image and can likely be held entirely in memory. However, the user of the view has to keep in mind that the image has been scaled down and must manipulate it appropriately.

For example, in rendering an image to the screen, the screen display scale factor must be compared to the level scale in order to determine the correct rendering scale. Suppose that a display screen has 1000 x 1000 elements and the base image of the pyramid is 15,000 by 15,000 pixels. The required rendering scale factor for the base image is 1/15. Suppose that the closest scale level in the pyramid is 1/16. The resulting level view is then rendered at a scale factor of 16/15 in order to fill the screen. Note, however, that fewer than one million pixels (about 938 x 938) are being processed instead of 225 million.
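The rendering computation in this example can be sketched as follows. Both functions are hypothetical helpers, not part of vil, and the sketch assumes that "closest" means the smallest absolute difference in scale, which may differ from the rule the library applies.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Index of the level whose scale is closest to 'requested' (assumption:
// smallest absolute difference). Precondition: level_scales is not empty.
inline std::size_t closest_level(std::vector<double> const& level_scales,
                                 double requested)
{
  assert(!level_scales.empty());
  std::size_t best = 0;
  for (std::size_t k = 1; k < level_scales.size(); ++k)
    if (std::fabs(level_scales[k] - requested) <
        std::fabs(level_scales[best] - requested))
      best = k;
  return best;
}

// Scale at which the level view must be rendered so that the result
// corresponds to the requested scale of the base image.
inline double residual_render_scale(double requested, double actual_level_scale)
{
  return requested / actual_level_scale;
}
```

With levels at scales 1, 1/2, 1/4, 1/8 and 1/16 and a requested scale of 1/15, the 1/16 level is selected and its view is rendered at 16/15.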

7.10.2 Subsampling the base image

Level images are formed by subsampling the original base image. In order to do this subsampling properly, it is necessary to observe the limitations imposed by the Nyquist sampling theorem. The sample rate must be greater than twice the highest spatial frequency in the image. Otherwise aliasing will occur, which appears as interference bands in the down-sampled image. The Nyquist sampling rate constraint can be achieved by spatially smoothing the image using a low pass filter. The filter is designed to remove spatial frequencies that exceed one half the sampling rate corresponding to the scale of the pyramid level.

For example, if the base image is being sampled at a scale of 0.5 (every second pixel in each image dimension), the new sampling rate is 1/2 sample per base-image pixel, so the image must be pre-smoothed to remove spatial frequencies greater than 1/4 cycle per pixel. A simple filter for achieving this requirement is to form the average of the 2x2 pixel neighborhood in the base image corresponding to each pixel in the downsampled image. This smoothing does not remove all the higher spatial frequencies but they are significantly attenuated. Another common approach is to apply a Gaussian low pass smoothing kernel recursively to each level. Gaussian suppression of higher spatial frequencies is superior to block averaging. The Gaussian is cheap to compute since it is separable and can be formed by applying two 1-d convolutions.
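The 2x2 averaging filter can be sketched on a minimal in-memory image. The gray_image type and the decimate_2x2 function are hypothetical and are not vil types. Dropping a trailing odd row or column is a simplification made for this sketch, not a statement about how vil treats image borders.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal single-plane image for illustration; not a vil type.
struct gray_image
{
  std::size_t ni, nj;      // width and height in pixels
  std::vector<float> pix;  // row-major storage: pix[j * ni + i]
  gray_image(std::size_t w, std::size_t h) : ni(w), nj(h), pix(w * h, 0.0f) {}
  float& at(std::size_t i, std::size_t j) { return pix[j * ni + i]; }
  float at(std::size_t i, std::size_t j) const { return pix[j * ni + i]; }
};

// Halve the resolution by averaging each 2x2 neighborhood of the source.
// A trailing odd row or column is dropped in this sketch.
inline gray_image decimate_2x2(gray_image const& src)
{
  gray_image dst(src.ni / 2, src.nj / 2);
  for (std::size_t j = 0; j < dst.nj; ++j)
    for (std::size_t i = 0; i < dst.ni; ++i)
      dst.at(i, j) = 0.25f * (src.at(2 * i,     2 * j)     +
                              src.at(2 * i + 1, 2 * j)     +
                              src.at(2 * i,     2 * j + 1) +
                              src.at(2 * i + 1, 2 * j + 1));
  return dst;
}
```

Applying decimate_2x2 repeatedly to its own output yields levels at scales 1/2, 1/4, 1/8 and so on.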

The vil_pyramid_image_resource class provides the simple 2x2 averaging method for generating pyramid levels that are a factor of two apart in scale. The user can apply more sophisticated sampling schemes but this method is adequate for display purposes. Each level is generated accordingly by applying the static method vil_pyramid_image_resource::decimate.

#include <vil/vil_load.h>
#include <vil/vil_pyramid_image_resource.h>
...
{
...
  // the base_image resource was generated previously

  // generate an image at 1/2 the scale
  vil_image_resource_sptr image =
    vil_pyramid_image_resource::decimate(base_image, "level_filename", "tiff");
...
}

In the current implementation of the decimate method, the pyramid levels are generated as blocked images and so a resource file format that can support blocking must be used. This choice is primarily a matter of decimation processing efficiency and to manage level images that are still too large to fit in memory. In the example, the "tiff" file format is chosen since rectangular block structure is supported. If the input image is blocked then its native block structure is used. Otherwise a default blocking (256 x 256) structure is used.

7.10.3 Storing the pyramid resource

It is necessary to have a file format that can store the multiple images required for the different resolution levels. The most obvious approach is to store the images as separate files in a directory. This format is called vil_pyramid_image_list and is designated by the vil_file_format::tag(), "pyil". There is no restriction on the format of the level files but applications of the pyramid are generally more efficient if the base image and the level files are blocked.

A second option for storing image pyramids is the vil_tiff_pyramid_resource with vil_file_format::tag(),"ptif". In this case, all the pyramid levels are saved in a single tiff file. There is no assumed order to the image headers in the file. The pyramid level scales are sorted by the resource to provide the required interface. The following example shows creating an output resource of each type and inserting the level image resources into each pyramid.

#include <vil/vil_new.h>
#include <vil/vil_image_resource.h>
#include <vil/vil_pyramid_image_resource.h>
...
{
// a list of image resources representing the pyramid levels

    vcl_vector<vil_image_resource_sptr> rescs;
...

// Generate a set of resources at multiple scales
...

// Construct a new multiple file pyramid resource

vil_pyramid_image_resource_sptr pyr_image_list =
        vil_new_pyramid_image_resource("pyramid_directory", "pyil");

// Construct a new single file tiff pyramid resource

vil_pyramid_image_resource_sptr pyr_tiff =
        vil_new_pyramid_image_resource("pyramid.tif", "ptif");

// Store image_resources into the pyramids

for (vcl_vector<vil_image_resource_sptr>::iterator rit = rescs.begin();
     rit != rescs.end(); ++rit)
{
  // the pyramid resources are smart pointers, so use -> to call members
  pyr_image_list->put_resource(*rit);
  pyr_tiff->put_resource(*rit);
}
...
}

Two methods are provided in vil_new that generate pyramid images in either the image list or tiff format, vil_new_pyramid_image_list_from_base and vil_new_pyramid_image_from_base. The following example demonstrates the use of each pyramid builder.

#include <vil/vil_new.h>
#include <vil/vil_image_resource.h>
#include <vil/vil_pyramid_image_resource.h>
...
{

vil_image_resource_sptr base_image;

// base_image is loaded or constructed
...

 unsigned number_of_levels = 7;
 bool copy_base = true;
// Generate a pyramid as an image_list (files in a directory)
 vil_pyramid_image_resource_sptr pyril =
     vil_new_pyramid_image_list_from_base("pyramid_directory_path",
                                          base_image,
                                          number_of_levels,
                                          copy_base,
                                          "tiff",
                                          "R");

// Generate a pyramid as a multi-image tiff file
 vil_pyramid_image_resource_sptr pytif =
    vil_new_pyramid_image_from_base("pyramid_file.tif",
                                    base_image,
                                    number_of_levels,
                                    "ptif",
                                    "temporary_dir_path");
...
}

In the image list pyramid the user can specify the format of the level image resource files. In the example the tiff format is specified. The last argument specifies the base name of the pyramid files, e.g., R0, R1, ... Rn-1, in the example. The variable copy_base indicates whether the base image must be copied into the directory. If it is true, as in the example, base_image is copied as a blocked image resource with default blocking (256 x 256). If it is false, the base image is assumed to be in the directory already. If a different blocking structure is desired, the base image can be wrapped in a vil_blocked_image_facade resource with the new blocking structure.

For the tiff-based pyramid it is necessary to provide a temporary directory to generate pyramid levels prior to inserting them into the single tiff file. Since the pyramid level images can still be too large for memory, they are constructed as file-based resources.

7.11 NITF image reading

The National Imagery Transmission Format (NITF) is a highly flexible and complex format for exchanging digital imagery and its support data. Our NITF implementation includes a framework for defining the "tagged record extensions" and "data extension segments" needed by your application. A framework example, along with the capabilities and current limitations, is summarized here.

The following code demonstrates how to define a tagged record extension:

vil_nitf2_tagged_record_definition::define("HISTOA", "Softcopy History")
  .field("SYSTYPE",    "System Type",                     NITF_STR(20))
  .field("PC",         "Prior Compression",               NITF_STR(12))
  .field("PE",         "Prior Enhancements",
     NITF_ENUM(4, vil_nitf2_enum_values()
      .value("EH08",   "Enhanced 8bpp")
                    ...
      .value("DGHC",   "Digitized hardcopy")
      .value("UNKP",   "Unknown")
      .value("NONE",   "None")))
  .field("REMAP_FLAG", "System Specific Remap",           NITF_INT(1))
  .field("LUT_ID",     "Data Mapping ID from ESD",        NITF_INT(2))
  .field("NEVENTS",    "Number of Processing Events",     NITF_INT(2))
  .repeat("NEVENTS", vil_nitf2_field_definitions()
     .field("PDATE",   "Processing Date and Time",        NITF_DAT(14))
     .field("PSITE",   "Processing Site",                 NITF_STR(10))
     .field("PAS",     "Softcopy Processing Application", NITF_STR(10))
     .field("NIPCOM",  "Number of Image Proc. Comments",  NITF_INT(1))
     .repeat("NIPCOM", vil_nitf2_field_definitions()
        .field("IPCOM", "Image Processing Comment",       NITF_STR(80)))
     .field("IBPP",     "Image Bit Depth (actual) ",      NITF_INT(2))
                                                                    ...)

This code enables the contents of record extension "HISTOA" to be parsed; without it, the unrecognized record would be skipped. Repeating field values, such as "IPCOM" above, are represented as vectors. Conditional and variable-length fields are also supported, and C++ functors are used to evaluate expressions involving tags that specify the length or repetition of other tags.

Currently the library can only read, but not write, NITF 2.0 and 2.1 files, and includes the following capabilities:

Files larger than 2GB are supported by building with flag USE_LFS turned on
All four NITF uncompressed data layouts are supported (IMODE=S, B, P, or R)
Most NITF image data types, including 8-, 16-, 32- and 64-bit signed and unsigned integers, single- and double-precision floating point numbers, and boolean data are supported. Support for complex float data is implemented but not tested.
Multiple images per file are supported, as are an arbitrary number of bands per image.
Blocked images (NBPR > 1 or NBPC > 1) are supported.
Images with look-up tables (LUTs) will read correctly, but client applications must query the image header for the LUT and apply it to the image data.
JPEG-2000-compressed imagery is supported via a plug-in, as described in the next section.

The following capabilities are not yet implemented:

writing NITF files
parsing graphic segments
parsing text segments
bounds checking of numeric field values
additional structured field formatters (e.g., some geocoordinate formats)
other compression schemes (e.g., original JPEG, bi-level compression, vector quantization)

7.12 JPEG 2000 Support

VIL can be configured to support the reading of JPEG 2000 image files as well as NITF 2.1 images that are JPEG 2000 compressed. This section describes how to set up this capability.

7.12.1 Install the library

The decompression is handled by a third party library, ECW JPEG 2000 SDK, developed by ER Mapper. The library can be downloaded from www.ermapper.com, and is currently available under three different licensing schemes:

a "free use" license (even for commercial applications) with the restriction that the compression code supports streams of length 500 MB or less (a moot restriction for VIL, which does not yet support file writing);
a GPL-like "public use" license with no limitations on reading and writing;
a "commercial use" license for commercial applications that require the ability to write JPEG 2000 streams longer than 500 MB.

The VXL wrappers around this library were developed using version 3.1 beta of this SDK and have also been tested using version 3.3 RC2, the latest version available on 4 April 2006.

ER Mapper provides ECW JPEG 2000 SDK with a variety of build systems. As described in the next section, VXL has been configured to use the most common one which yields separate NCSEcw and NCSUtil libraries. Most of the testing has taken place using the dynamically linked versions of these libraries, but the static versions should work too.

7.12.2 Configure VXL to use it

Once you have installed and built the ECW JPEG 2000 SDK, you must configure VXL to find it. Specify these three CMake variables:

ECW_INCLUDE_DIR: ECW SDK include directory
ECW_ncsecw_LIBRARY: NCSEcw library pathname
ECW_ncsutil_LIBRARY: NCSUtil library pathname

When CMake creates your build files it will automatically add the appropriate source files and pre-processor definitions. Once VIL is built, test the JPEG 2000 decompression capability using the test program "test_file_format_read" in project "vil_test_all". If things are set up correctly, the test program will report that these two tests passed:

JPEG 2000 [j2k,jpc]
NITF 2.1 [nitf] (JPEG 2000 compressed)

Note that if you use the dynamically linked version of the ECW JPEG 2000 SDK, your PATH environment variable must contain the /lib directory that contains NCSEcw and NCSUtil.

This document was generated on May, 1 2013 using texi2html 1.76.