1. Introduction

Chapter summary:
BOXM2 is a VXL library for accelerated probabilistic volumetric modeling. It is developed in C++, with a parallel implementation in OpenCL for accelerated performance. The higher-level C++ implementation is crucial for debugging and for developing new algorithms, while the OpenCL implementation is required to run algorithms efficiently in real application systems or demonstrations. The design has evolved over the past few years and draws on the existing libraries BOXM and BVXM. BOXM2 is designed mainly to overcome the limitations of these libraries and to facilitate parallel development of an accelerated library in OpenCL.

The major advantages of the BOXM2 library over its predecessors are listed below:

A common set of data structures is used for both the C++ and OpenCL implementations, which makes it easy to mix C++ code for memory-intensive jobs with OpenCL code for computation-intensive jobs.
The structure of the scene, a grid of shallow octrees encoded as bit streams, is separated from the data stored at each cell of the volume. The same scene structure can therefore be used with multiple data streams without loading all of them.
The library is designed to handle huge amounts of data by processing it as streams. The latency of disk I/O is hidden by using a) a cache with LRU or nearest-neighbor implementations and b) asynchronous I/O (brl/bbas/baio). The latency of transfers from main memory to GPU memory is hidden by double buffering.
The library also supports the representation of multi-resolution data.
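
The LRU caching policy mentioned above can be illustrated with a minimal sketch. All names here (`LruBlockCache`, `BlockId`, `Block`) are hypothetical and chosen for illustration only; this is not the BOXM2 cache interface, just one common way to keep the most recently used blocks in memory and evict the least recently used one when the cache is full.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical types for illustration; not the BOXM2 cache API.
using BlockId = int;
using Block = std::vector<unsigned char>;

class LruBlockCache {
 public:
  explicit LruBlockCache(std::size_t capacity) : capacity_(capacity) {
    assert(capacity > 0);
  }

  // Returns the cached block, or nullptr on a miss.
  // A hit moves the block to the most-recently-used position.
  const Block* get(BlockId id) {
    auto it = index_.find(id);
    if (it == index_.end()) return nullptr;
    order_.splice(order_.begin(), order_, it->second);
    return &it->second->second;
  }

  // Inserts or replaces a block; evicts the least recently used block if full.
  void put(BlockId id, Block data) {
    auto it = index_.find(id);
    if (it != index_.end()) {
      it->second->second = std::move(data);
      order_.splice(order_.begin(), order_, it->second);
      return;
    }
    if (order_.size() == capacity_) {
      index_.erase(order_.back().first);
      order_.pop_back();
    }
    order_.emplace_front(id, std::move(data));
    index_[id] = order_.begin();
  }

  bool contains(BlockId id) const { return index_.count(id) != 0; }
  std::size_t size() const { return order_.size(); }

 private:
  using Entry = std::pair<BlockId, Block>;
  std::size_t capacity_;
  std::list<Entry> order_;  // front = most recently used
  std::unordered_map<BlockId, std::list<Entry>::iterator> index_;
};
```

A caller would try `get` first and, on a miss, read the block from disk and `put` it into the cache.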

1.1 The Design and Evolution of BOXM2

The predecessors of BOXM2 are BVXM and BOXM. BVXM was developed at Brown University for modeling volumes on a fixed grid for multiple modalities. The fixed grid required a huge amount of storage, which BOXM later reduced by using octrees. BOXM modeled volumetric data with octrees but had a few limitations, such as storing the data inside the tree structure, which led to inefficient implementations.

Design goals:

Accelerated implementation: The modeling and other algorithms built on a 3-d volumetric representation are computationally intensive but highly parallel. With GPUs now widely available, such parallel implementations can achieve great efficiency, which was the main reason for including an OpenCL implementation. OpenCL was chosen over CUDA mainly because it can run on GPUs, CPUs and other hardware resources. BOXM2 provides a parallel OpenCL implementation of the algorithms written in C++. A single set of data and code structures serves both the C++ and OpenCL implementations.
Separation of data and octrees: One of the major drawbacks of BOXM was that data and octree were mixed in a single data structure, which limited how the octrees could be used. An octree had to be instantiated with known data types, which ruled out adding or removing a data type after instantiation. BOXM2 separates the octrees from the data by maintaining two different kinds of streams: (i) block streams and (ii) data streams. The block streams contain the scene structure and the octrees, with each octree cell storing an index into the data streams. The data streams are plain arrays of data, addressed by the indices stored in the block streams. This makes it possible to add, remove or selectively process data streams at any time.
Streaming capabilities: One of the foremost capabilities of this library is handling huge amounts of data (on the order of hundreds of GB). Reading that much data from hard drives would hurt the efficiency of the algorithms. BOXM2 hides this latency with asynchronous I/O combined with smart caching algorithms. The asynchronous I/O has been implemented for Windows, POSIX and Mac OS platforms. The caching algorithms can be implemented specifically or generically, depending on the application.
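
The separation of block streams and data streams can be sketched as follows. The types `BlockStream` and `SceneData`, and the stream names used in the usage note, are hypothetical and exist only for illustration; they are not the BOXM2 classes. The sketch assumes each cell stores one index that is valid in every data stream, so streams can be attached or detached without touching the tree structure.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for illustration; not the BOXM2 classes.
struct BlockStream {
  // One entry per octree cell: the index of that cell's element,
  // shared by every data stream.
  std::vector<std::size_t> cell_to_data_index;
};

class SceneData {
 public:
  explicit SceneData(BlockStream block) : block_(std::move(block)) {}

  // Attach a data stream at any time; the block stream is not modified.
  void add_stream(const std::string& name, std::vector<float> values) {
    streams_[name] = std::move(values);
  }

  // Detach a data stream; the tree structure and other streams are unaffected.
  void remove_stream(const std::string& name) { streams_.erase(name); }

  bool has_stream(const std::string& name) const {
    return streams_.count(name) != 0;
  }

  // Look up one stream's value at one cell through the shared index.
  // Throws std::out_of_range if the stream is missing or too short.
  float value_at(const std::string& name, std::size_t cell) const {
    assert(cell < block_.cell_to_data_index.size());
    const std::vector<float>& stream = streams_.at(name);
    return stream.at(block_.cell_to_data_index[cell]);
  }

 private:
  BlockStream block_;
  std::map<std::string, std::vector<float>> streams_;
};
```

For example, a scene could carry an "alpha" stream and an "intensity" stream, and an algorithm that needs only one of them would load and query just that stream.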

[Figure: library overview]

1.2 OpenCL implementation

OpenCL (Open Computing Language) is a programming framework for heterogeneous platforms consisting of CPUs, GPUs, and other hardware resources. Its functions are written in a language based on C99. OpenCL provides task-based and data-based parallelism, based on a SIMD architecture. It has been adopted by several vendors, such as AMD, Nvidia (GPU only), Intel, IBM and many more. OpenCL's biggest competitor, CUDA, was developed by Nvidia, but it is limited to GPUs and is a closed design.

BOXM2 mirrors the C++ implementation in OpenCL for efficiency. A separate library, brl/bbas/bocl, handles most of the OpenCL API calls so that users can write platform-independent code. The OpenCL implementation also includes a cache and uses a parallel scheduler to hide the latency of data transfer from CPU memory to GPU memory when running on a GPU.

1.3 Scene Data structure

The scene (or volume) to be modeled is broken down into numerous blocks. These blocks can be laid out in a sparse or dense 3-d matrix, which allows users to model chunks of the volume over a wide area. Each block in turn consists of a 3-d grid of shallow octrees. This representation could be made more efficient by using a 3-d octree of shallow octrees instead of a 3-d grid; this will be implemented in the future. The shallow octrees are encoded, unconventionally, as bit streams for efficient storage and computation.

The root of each octree stores an absolute index of its data in the respective data array. Note that all the different data elements can be accessed with the same index. This separation of data and octrees provides flexibility when working with multiple modalities.
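
To make the idea of a bit-encoded shallow octree concrete, here is a minimal sketch. The layout is an assumption made for illustration, not the documented BOXM2 encoding: bit i is 1 when cell i is split into eight children, cells are numbered breadth-first, and the tree is limited to three levels of splittable cells (1 + 8 + 64 = 73 bits). The names `TreeBits`, `child_index`, `find_leaf` and `cell_count` are hypothetical.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Assumed layout (illustration only): bit i is 1 when cell i is split.
// Cells are numbered breadth-first: the root is 0 and the children of
// cell i are 8*i + 1 .. 8*i + 8.
constexpr std::size_t kSplitBits = 73;  // 1 + 8 + 64 splittable cells
using TreeBits = std::bitset<kSplitBits>;

inline std::size_t child_index(std::size_t parent, unsigned octant) {
  assert(octant < 8);
  return 8 * parent + 1 + octant;
}

// Descend from the root to the leaf cell containing the point (x, y, z),
// given in the tree's local coordinates, each in [0, 1).
std::size_t find_leaf(const TreeBits& bits, double x, double y, double z) {
  std::size_t cell = 0;
  while (cell < kSplitBits && bits[cell]) {
    // Scale to the child's frame and pick the octant from the integer part.
    x *= 2; y *= 2; z *= 2;
    unsigned ox = x >= 1.0, oy = y >= 1.0, oz = z >= 1.0;
    x -= ox; y -= oy; z -= oz;
    cell = child_index(cell, ox | (oy << 1) | (oz << 2));
  }
  return cell;
}

// Total number of cells, assuming a well-formed tree in which a bit is set
// only if its parent's bit is set: the root plus eight cells per split.
std::size_t cell_count(const TreeBits& bits) {
  return 1 + 8 * bits.count();
}
```

With this layout an entire tree structure fits in a few machine words, and traversal needs only bit tests and integer arithmetic, which suits both CPU and GPU code.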

1.4 Asynchronous I/O

One of the major advantages of BOXM2 is that it hides the latency of disk I/O by using single-threaded asynchronous I/O. Asynchronous I/O has been implemented for Windows, POSIX and Mac OS platforms, and the platform-specific implementations are hidden behind the library "baio". The cache loads the nearest blocks asynchronously, without blocking execution of the program, which helps hide disk I/O latency.
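
As an illustration of how a cache might decide which blocks to request next, the sketch below ranks blocks by their distance from the block currently being processed. It covers only the selection step, not the asynchronous read itself, and it assumes that "nearest" means Euclidean distance between block grid indices. The names `BlockIndex`, `dist2` and `nearest_blocks` are hypothetical and are not part of baio or BOXM2.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical type: integer position of a block in the scene's 3-d grid.
using BlockIndex = std::array<int, 3>;

// Squared Euclidean distance between two block grid positions.
inline int dist2(const BlockIndex& a, const BlockIndex& b) {
  int d = 0;
  for (std::size_t k = 0; k < 3; ++k) d += (a[k] - b[k]) * (a[k] - b[k]);
  return d;
}

// Return up to `count` blocks closest to `current` (excluding `current`
// itself), nearest first. Ties keep their original relative order.
std::vector<BlockIndex> nearest_blocks(const std::vector<BlockIndex>& all,
                                       const BlockIndex& current,
                                       std::size_t count) {
  std::vector<BlockIndex> out;
  for (const BlockIndex& b : all) {
    if (b != current) out.push_back(b);
  }
  std::stable_sort(out.begin(), out.end(),
                   [&current](const BlockIndex& a, const BlockIndex& b) {
                     return dist2(a, current) < dist2(b, current);
                   });
  if (out.size() > count) out.resize(count);
  assert(out.size() <= count);
  return out;
}
```

The returned list would then be handed to whatever asynchronous read mechanism is in use, so that the blocks are likely to be in memory by the time processing reaches them.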

[Figure: streaming overview]