#160: Efficient GPU implementation of the integral histogram

M. Poostchi, K. Palaniappan, F. Bunyak, M. Becchi, and G. Seetharaman

Lecture Notes in Computer Science (ACCV Workshop on Developer-Centered Computer Vision), Volume 7728, pgs. 266--278, 2012

parallelization, gpu, tracking, fmv, motion, features, dod


The integral histogram for images is an efficient preprocessing method for speeding up diverse computer vision algorithms including object detection, appearance-based tracking, recognition and segmentation. Our proposed Graphics Processing Unit (GPU) implementation uses parallel prefix sums on row and column histograms in a cross-weave scan with high GPU utilization and communication-aware data transfer between CPU and GPU memories. Two different data structures and communication models were evaluated: a 3-D array to store binned histograms for each pixel and an equivalent linearized 1-D array, each with distinctive data movement patterns. Using the 3-D array with many kernel invocations and a low workload per kernel was inefficient, highlighting the necessity for careful mapping of sequential algorithms onto the GPU. The reorganized 1-D array, with a single data transfer to the GPU and high GPU utilization, was 60 times faster than the CPU version for a 1K × 1K image, reaching 49 frames/sec, and 21 times faster for 512 × 512 images, reaching 194 frames/sec. The integral histogram module is applied as part of the likelihood of features tracking (LOFT) system for video object tracking using fusion of multiple cues.
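
To make the cross-weave scan concrete, here is a minimal sequential sketch in plain Python. It is not the paper's GPU code. It only illustrates the idea of a horizontal prefix sum followed by a vertical prefix sum over a linearized 1-D array, after which the histogram of any rectangle comes from four lookups per bin. The function names, the bin-major memory layout (`(b * h + y) * w + x`), and the uniform binning rule are assumptions made for this illustration.

```python
def integral_histogram(image, bins, max_value=256):
    """Build an integral histogram in a flat 1-D list.

    Assumed layout (illustrative): element (bin b, row y, col x)
    lives at index (b * h + y) * w + x.
    """
    h, w = len(image), len(image[0])
    ih = [0] * (bins * h * w)

    # Initialization: each pixel casts one vote in its own bin plane.
    for y in range(h):
        for x in range(w):
            b = image[y][x] * bins // max_value
            ih[(b * h + y) * w + x] = 1

    # Horizontal scan: inclusive prefix sum along every row of every bin plane.
    for b in range(bins):
        for y in range(h):
            base = (b * h + y) * w
            for x in range(1, w):
                ih[base + x] += ih[base + x - 1]

    # Vertical scan: inclusive prefix sum down every column of every bin plane.
    for b in range(bins):
        for x in range(w):
            for y in range(1, h):
                ih[(b * h + y) * w + x] += ih[(b * h + y - 1) * w + x]

    return ih, h, w


def region_histogram(ih, h, w, bins, top, left, bottom, right):
    """Histogram of the inclusive rectangle rows top..bottom, cols left..right."""
    def at(b, y, x):
        # Positions just outside the top/left border contribute zero.
        if y < 0 or x < 0:
            return 0
        return ih[(b * h + y) * w + x]

    return [
        at(b, bottom, right) - at(b, top - 1, right)
        - at(b, bottom, left - 1) + at(b, top - 1, left - 1)
        for b in range(bins)
    ]


if __name__ == "__main__":
    img = [
        [0, 64, 128, 255],
        [10, 200, 70, 130],
        [255, 0, 190, 65],
    ]
    ih, h, w = integral_histogram(img, bins=4)
    # Histogram of the 2 x 2 block covering rows 1..2 and columns 1..2.
    print(region_histogram(ih, h, w, 4, 1, 1, 2, 2))
```

In this sketch every row scan is independent of the others, and likewise every column scan, which is the property a parallel prefix-sum implementation can exploit. Keeping all bin planes in one contiguous array also fits the single-transfer model the abstract describes.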