2020 IEEE Applied Imagery Pattern Recognition Workshop (AIPR),
pp. 1--6,
2020
We describe Spatial Voxel-Net (SVNet) and Multi-View Voxel-Net (MVNet), a cascade of two novel deep learning architectures for calibrated multi-view stereopsis that reconstructs complicated outdoor 3D models accurately. Both networks take as input a sequence of RGB images ordered by camera pose and operate in a coarse-to-fine fashion. SVNet extracts summarized features and analyzes the spatial relationships among a block of 3D voxels using 3D convolutions, then predicts block-level occupancy information. MVNet then receives this occupancy information together with the RGB images and predicts the final voxel-level occupancy information. The combined cascade, SMVNet, is an end-to-end trainable network that can reconstruct complex outdoor 3D models and can be applied to large-scale datasets in parallel, without the need to estimate or fuse multiple depth maps as is typical of other approaches. We evaluated SMVNet on the complex outdoor Tanks and Temples dataset, on which it outperformed two well-known state-of-the-art MVS algorithms.
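
The coarse-to-fine idea above (block-level occupancy first, then voxel-level occupancy) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the learned networks are replaced by placeholder score arrays, and the function names `block_occupancy` and `refine_voxels`, the threshold of 0.5, and the block size of 4 are all assumptions made for illustration only.

```python
import numpy as np


def block_occupancy(block_scores, threshold=0.5):
    """Coarse stage (stand-in for SVNet output): mark blocks scored above the threshold."""
    return block_scores > threshold


def refine_voxels(voxel_scores, block_mask, block_size, threshold=0.5):
    """Fine stage (stand-in for MVNet output): keep voxels only inside occupied blocks."""
    # Upsample the block mask to voxel resolution by repeating it along each axis.
    mask = block_mask
    for axis in range(3):
        mask = np.repeat(mask, block_size, axis=axis)
    # A voxel is occupied if its own score passes and its parent block is occupied.
    return (voxel_scores > threshold) & mask


# Toy data (hypothetical): a 2x2x2 grid of blocks, each covering 4x4x4 voxels.
rng = np.random.default_rng(0)
block_scores = np.zeros((2, 2, 2))
block_scores[0, 0, 0] = 0.9            # only one block is predicted as occupied
voxel_scores = rng.random((8, 8, 8))   # placeholder per-voxel scores

block_mask = block_occupancy(block_scores)
occupancy = refine_voxels(voxel_scores, block_mask, block_size=4)
```

In this sketch the fine stage only has to consider voxels inside blocks that the coarse stage marked as occupied. Because each block is handled independently of the others, it also suggests one way such a scheme could be applied to large scenes in parallel.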