vidl : Video Stream Library

A stream-based rewrite of the old vidl1 library

Introduction

The purpose of this library is to provide a unified interface reading and writing video in VXL. Unlike the previous vidl library, this library treats video as a data stream and builds directly on the vil image library. The two main classes are vidl_istream for input streams and vidl_ostream for output streams.

This design has the following advantages over the old vidl library that treated videos as container objects.

Streams vs. Codecs

Various stream subclasses are used to interface with different types of video streams. Note that these are not called codecs. The codec in the old vidl library was somewhat of a misnomer. Each vidl_istream and vidl_ostream subclass is a wrapper for another video API that may or may not provide encoding, and may support many different codecs. For example, vidl_ffmpeg_istream provides a input using the FFMPEG library. The FFMPEG library can open a large variety of video file formats (avi, mpeg, etc.) and decode using an even larger variety of codecs (MPEG2, MPEG4, MJPEG, H.264, Cinepak, etc.).

The video stream design also allows for streams that receive images from live video feeds. This could be from a camera, network video feed, or other source. Thus, video capture and video transcoding are both accomplished by directing an input stream into and output stream.

Stream Types

The following is a list of streams that are currently implemented

The following streams are currently under development

The following streams sound useful but need someone to work on them

Video Frames

The vidl_frame_sptr is data structure used to transport video frame data from a vidl_istream to a vidl_ostream. The vidl_frame contains a pointer to the frame buffer, the image resolution, the pixel format, and may eventually contain a time stamp. A vidl_frame is more restrictive than a vil_image_view since it requires contiguous memory and is limited in pixel data types. However, it also handles many pixels formats not supported by vil such as those with subsampled chrominance (i.e. YUV 4:2:2).

In some cases the vidl_frame data can be wrapped by a vil_image_view, but usually this is not a good idea. The vidl_frame should considered be volatile data structure. The image buffer is only guaranteed to be valid until the next call of vidl_ostream::advance(). The ostream will keep a pointer to the released frame and then invalidate it before reusing memory. Use the functions in vidl_convert.h to obtain a non-volatile vil_image_view.

Pixel Formats and Conversions

The supported pixel formats are enumerated in vidl_pixel_format. A set of template specializations is used to define a set of traits for each format. A new pixel format may be introduced by simply adding an entry in the table in vidl_pixel_format.h. The traits of a pixel format are used in the generation of conversion routines with templates.

Todo List

Todo:
Settle on the stream API. Should the redundant read_frame() be removed? Can we handle all asynchronous streams with the synchronous API and an asynchronous buffering mechanism behind the scenes? The idea being that we want a simple API.
Todo:
Add some notion of timestamps to the vidl_frame. It seems like it would be useful to know when each frame was captured (maybe relative to the first frame), especially if you are allowing for dropped frames or capturing non-uniformly in time.
Todo:
Improve conversion routines to and from vil_image_views.
Todo:
Add support for more parameters in the 1394 camera standard.
Todo:
Write a VXL book chapter for vidl (in progress). This is required to promote vidl to core and ultimately replace vidl (which has a very limited book chapter). Also, it would be good to have a user's guide other than the Doxygen generated pages.
Todo:
Decide on a standard (maybe XML?) for stream parameter files, and add code to read and write these files. This is related to the factory creation stuff. Should we push to put modern factory creation code in vbl?
Todo:
Update vidl/examples/vidl_player to include all implemented streams. We'd like to have a single GUI demo application that can feed any istream to any ostream (provided they have been compiled into vidl).
Todo:
Update comments within the code. This includes the "advance but don't acquire" remark. This was intended for the case of file parsing where one can seek to another frame without decoding it, but is incorrect for live streams. Advanced Copy & paste techniques have spread the comment around ;-).
Todo:
A DirectShow ostream should be added.
Todo:
We'd like to create a DirectShow filter that replaces the ISampleGrabber currently used in the vidl_dshow_istream family. This would remove the limitations imposed by the use of the ISampleGrabber and potentially could consolidate the file/live_istreams into a single vidl_dshow_istream.