A stream-based rewrite of the old vidl1 library
Introduction
The purpose of this library is to provide a unified interface reading and writing video in VXL. Unlike the previous vidl library, this library treats video as a data stream and builds directly on the vil image library. The two main classes are vidl_istream for input streams and vidl_ostream for output streams.
This design has the following advantages over the old vidl library that treated videos as container objects.
- Better support for large video files
- Better support for encoding options
- Support for live video streams
- Better integration with vil
Streams vs. Codecs
Various stream subclasses are used to interface with different types of video streams. Note that these are not called codecs. The codec in the old vidl library was somewhat of a misnomer. Each vidl_istream and vidl_ostream subclass is a wrapper for another video API that may or may not provide encoding, and may support many different codecs. For example, vidl_ffmpeg_istream provides a input using the FFMPEG library. The FFMPEG library can open a large variety of video file formats (avi, mpeg, etc.) and decode using an even larger variety of codecs (MPEG2, MPEG4, MJPEG, H.264, Cinepak, etc.).
The video stream design also allows for streams that receive images from live video feeds. This could be from a camera, network video feed, or other source. Thus, video capture and video transcoding are both accomplished by directing an input stream into and output stream.
Stream Types
The following is a list of streams that are currently implemented
- vidl_image_list_istream : treats an ordered list of vil_image_resource_sptr as a video stream. Also supports reading a list of images from disk given a list of filenames or a file glob.
- vidl_image_list_ostream : writes a video to disk as a sequence of numbered image files using any image file format supported by vil.
- vidl_ffmpeg_istream : uses the FFMPEG library to decode many common video file formats.
- vidl_ffmpeg_ostream : uses the FFMPEG library to encode to many common video file formats with many encoding options for video quality and compression.
- vidl_v4l_istream - use a video for Linux input stream
- vidl_dshow_live_istream - use the DirectShow API to stream video directly from camera and frame-grabber devices in Windows using native Windows codecs.
- vidl_dshow_file_istream - use the DirectShow API to encode video files in Windows using native Windows codecs
- vidl_dc1394_istream - use libdc1394 to stream video directly from IEEE 1394 (firewire) based cameras (using Linux/BSD)
The following streams are currently under development
- vidl_v4l_ostream - use a video for Linux output stream
The following streams sound useful but need someone to work on them
- vidl_cmu1394_istream - use the CMU 1394 driver to stream video directly from IEEE 1394 (firewire) based cameras (using Windows)
- vidl_dv1394_istream - use libdv1394 to stream video from DV camcorders over a 1394 DV connection (using Linux/BSD)
- vidl_dv1394_ostream - use libdv1394 to stream video over a 1394 DV connection to some other video device (using Linux/BSD)
Video Frames
The vidl_frame_sptr is data structure used to transport video frame data from a vidl_istream to a vidl_ostream. The vidl_frame contains a pointer to the frame buffer, the image resolution, the pixel format, and may eventually contain a time stamp. A vidl_frame is more restrictive than a vil_image_view since it requires contiguous memory and is limited in pixel data types. However, it also handles many pixels formats not supported by vil such as those with subsampled chrominance (i.e. YUV 4:2:2).
In some cases the vidl_frame data can be wrapped by a vil_image_view, but usually this is not a good idea. The vidl_frame should considered be volatile data structure. The image buffer is only guaranteed to be valid until the next call of vidl_ostream::advance()
. The ostream will keep a pointer to the released frame and then invalidate it before reusing memory. Use the functions in vidl_convert.h to obtain a non-volatile vil_image_view.
Pixel Formats and Conversions
The supported pixel formats are enumerated in vidl_pixel_format. A set of template specializations is used to define a set of traits for each format. A new pixel format may be introduced by simply adding an entry in the table in vidl_pixel_format.h. The traits of a pixel format are used in the generation of conversion routines with templates.
Todo List
- Todo:
- Settle on the stream API. Should the redundant read_frame() be removed? Can we handle all asynchronous streams with the synchronous API and an asynchronous buffering mechanism behind the scenes? The idea being that we want a simple API.
- Todo:
- Add some notion of timestamps to the vidl_frame. It seems like it would be useful to know when each frame was captured (maybe relative to the first frame), especially if you are allowing for dropped frames or capturing non-uniformly in time.
- Todo:
- Improve conversion routines to and from vil_image_views.
- Todo:
- Add support for more parameters in the 1394 camera standard.
- Todo:
- Write a VXL book chapter for vidl (in progress). This is required to promote vidl to core and ultimately replace vidl (which has a very limited book chapter). Also, it would be good to have a user's guide other than the Doxygen generated pages.
- Todo:
- Decide on a standard (maybe XML?) for stream parameter files, and add code to read and write these files. This is related to the factory creation stuff. Should we push to put modern factory creation code in vbl?
- Todo:
- Update vidl/examples/vidl_player to include all implemented streams. We'd like to have a single GUI demo application that can feed any istream to any ostream (provided they have been compiled into vidl).
- Todo:
- Update comments within the code. This includes the "advance but
don't acquire" remark. This was intended for the case of file parsing where one can seek to another frame without decoding it, but is incorrect for live streams. Advanced Copy & paste techniques have spread the comment around ;-).
- Todo:
- A DirectShow ostream should be added.
- Todo:
- We'd like to create a DirectShow filter that replaces the ISampleGrabber currently used in the vidl_dshow_istream family. This would remove the limitations imposed by the use of the ISampleGrabber and potentially could consolidate the file/live_istreams into a single vidl_dshow_istream.