DCT-Based Local Descriptor for Robust Matching and Feature Tracking in Wide Area Motion Imagery


Abstract

We introduce a novel discrete cosine transform-based feature (DCTF) descriptor designed both for robustly matching features in aerial video and for tracking features across wide-baseline oblique views in aerial wide area motion imagery (WAMI). The DCTF descriptor preserves local structure more compactly in the frequency domain by exploiting the mathematical properties of the discrete cosine transform (DCT), and it outperforms widely used spatial-domain feature extraction methods, such as speeded up robust features (SURF) and scale-invariant feature transform (SIFT). The DCTF descriptor can be used in combination with other feature detectors, such as SURF and features from accelerated segment test (FAST), for which we provide experimental results. The performance of DCTF for image matching and feature tracking is evaluated on two city-scale aerial WAMI data sets (ABQ-215 and LA-351) and on a synthetic aerial drone video data set, Rochester Institute of Technology digital imaging and remote sensing image generation (RIT-DIRSIG). DCTF is a compact 120-D descriptor, less than half the dimensionality of state-of-the-art deep learning-based approaches such as SuperPoint, LF-Net, and DeepCompare; it requires no learning and is domain-independent. Despite its small size, the DCTF descriptor produces the highest image matching accuracy (F1 = 0.76 on ABQ-215), the longest maximum and average feature track lengths, and the lowest tracking error (0.3 pixel on LA-351) compared with both handcrafted and deep learning features.
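To make the idea of a compact frequency-domain descriptor concrete, the following is a minimal sketch and not the DCTF construction itself, which this abstract does not specify. It assumes a square grayscale patch, applies an orthonormal 2-D DCT-II, and keeps the 120 lowest-frequency coefficients. The function names, the anti-diagonal ordering, the omission of the DC term, and the L2 normalization are all illustrative assumptions.

```python
import math


def dct2(patch):
    """Orthonormal 2-D DCT-II of a square patch given as a list of rows."""
    n = len(patch)

    def alpha(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    # basis[k][i] = alpha(k) * cos(pi * (2i + 1) * k / (2n))
    basis = [
        [alpha(k) * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)]
        for k in range(n)
    ]
    # Separable transform: 1-D DCT along each row, then along each column.
    rows = [
        [sum(basis[v][j] * patch[i][j] for j in range(n)) for v in range(n)]
        for i in range(n)
    ]
    return [
        [sum(basis[u][i] * rows[i][v] for i in range(n)) for v in range(n)]
        for u in range(n)
    ]


def low_frequency_descriptor(patch, length=120):
    """Hypothetical descriptor: the first `length` AC coefficients, L2-normalized.

    Assumptions (not from the paper): coefficients are ordered by
    anti-diagonal u + v, the DC term is skipped, and the vector is
    normalized to unit length.
    """
    n = len(patch)
    coeffs = dct2(patch)
    order = sorted(
        ((u, v) for u in range(n) for v in range(n)),
        key=lambda p: (p[0] + p[1], p[0]),
    )
    vec = [coeffs[u][v] for (u, v) in order[1 : length + 1]]  # order[0] is DC
    norm = math.sqrt(sum(c * c for c in vec)) or 1.0
    return [c / norm for c in vec]


# Usage: a synthetic 16x16 patch yields a 120-D unit-length vector.
patch = [[(3 * i + 5 * j) % 17 for j in range(16)] for i in range(16)]
descriptor = low_frequency_descriptor(patch)
```

Because the DC coefficient is dropped in this sketch, adding a constant brightness offset to the patch leaves the vector essentially unchanged. That property is a consequence of the assumptions above and is not a result claimed by the paper.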