Lecture Notes in Computer Science (ACIVS),
Volume 6915,
pgs. 586--598,
2011
The integral histogram is a recently proposed preprocessing technique to compute histograms of arbitrary rectangular gridded (i.e. image or volume) regions in constant time. We formulate a general parallel version of the the integral histogram and analyse its implementation in Star Superscalar (StarSs). StarSs provides a uniform programming and runtime environment and facilitates the development of portable code for heterogeneous parallel architectures. In particular, we discuss the implementation for the multi-core IBM Cell Broadband Engine (Cell/B.E.) and provide extensive performance measurements and tradeoffs using two different scan orders or histogram propagation methods. For 640×480 images, a tile or block size of 28×28 and 16 histogram bins the parallel algorithm is able to reach greater than real-time performance of more than 200 frames per second.