#102: Parallel implementation of the integral histogram

P. Bellens, K. Palaniappan, R. M. Badia, G. Seetharaman, and J. Labarta

Lecture Notes in Computer Science (ACIVS), Volume 6915, pp. 586–598, 2011

parallelization, gpu, tracking, fmv, motion, features, dod



The integral histogram is a recently proposed preprocessing technique to compute histograms of arbitrary rectangular gridded (i.e. image or volume) regions in constant time. We formulate a general parallel version of the integral histogram and analyse its implementation in Star Superscalar (StarSs). StarSs provides a uniform programming and runtime environment and facilitates the development of portable code for heterogeneous parallel architectures. In particular, we discuss the implementation for the multi-core IBM Cell Broadband Engine (Cell/B.E.) and provide extensive performance measurements and tradeoffs using two different scan orders, or histogram propagation methods. For 640×480 images, a tile (block) size of 28×28, and 16 histogram bins, the parallel algorithm exceeds real-time performance, reaching more than 200 frames per second.
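To make the constant-time query concrete, the following is a minimal sequential sketch in pure Python. It is not the StarSs or Cell/B.E. implementation described in the paper: it uses a plain raster-scan propagation, and the function names (`integral_histogram`, `region_histogram`), the half-open region convention, and the uniform binning with a `max_value` of 256 are assumptions made for illustration only.

```python
def integral_histogram(image, bins, max_value=256):
    """Build H of shape (rows+1) x (cols+1) x bins.

    H[r][c] holds the histogram of the pixels in rows 0..r-1 and
    columns 0..c-1. Row 0 and column 0 are zero padding.
    Assumes integer pixel values in the range 0..max_value-1.
    """
    rows, cols = len(image), len(image[0])
    H = [[[0] * bins for _ in range(cols + 1)] for _ in range(rows + 1)]
    for r in range(1, rows + 1):
        for c in range(1, cols + 1):
            up, left, diag, cur = H[r - 1][c], H[r][c - 1], H[r - 1][c - 1], H[r][c]
            # Propagate: everything above plus everything to the left,
            # minus the overlap that was counted twice.
            for b in range(bins):
                cur[b] = up[b] + left[b] - diag[b]
            # Add the current pixel to its bin (uniform binning, an assumption).
            cur[image[r - 1][c - 1] * bins // max_value] += 1
    return H


def region_histogram(H, top, left, bottom, right):
    """Histogram of rows top..bottom-1 and columns left..right-1.

    Uses four corner lookups per bin, so the cost depends on the
    number of bins but not on the size of the region.
    """
    bins = len(H[0][0])
    return [
        H[bottom][right][b] - H[top][right][b] - H[bottom][left][b] + H[top][left][b]
        for b in range(bins)
    ]
```

Once `H` is built, any rectangular region is answered by `region_histogram(H, top, left, bottom, right)` without revisiting pixels. The raster-scan loop above carries a dependency on the upper and left neighbours at every position, which is why the choice of scan order matters when the propagation is split across tiles and cores.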