#298: Deep realistic novel view generation for city-scale aerial images


Abstract

In this paper we introduce a novel end-to-end framework for generating large, realistic, city-scale aerial synthetic image sequences with accurate and precise camera metadata. This data serves two main purposes: (i) enabling objective, quantitative evaluation of computer vision algorithms and methods, such as feature detection, description, and matching, or of full pipelines such as 3D reconstruction; and (ii) supplying large amounts of high-quality training data for deep-learning-based computer vision methods. The proposed framework consists of three main modules: a 3D voxel renderer for data generation, a deep neural network for artifact removal, and a quantitative evaluation module, demonstrated here on Multi-View Stereo (MVS). The 3D voxel renderer generates seen or unseen views of a scene from arbitrary camera poses with accurate camera metadata. The artifact removal module is a novel edge-augmented deep network with an explicit edge-map processing stream that removes image artifacts while preserving and recovering scene structure, yielding more realistic results. Experiments on two urban, city-scale aerial datasets, covering Albuquerque (ABQ), NM and Los Angeles (LA), CA, show promising results in terms of structural similarity to real data and accuracy of the reconstructed 3D point clouds.
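As a minimal illustration of the "structural similarity" criterion mentioned above, the sketch below computes a single-window SSIM score between a rendered image and a real reference image using only NumPy. This is not the paper's evaluation code; production use would apply a sliding-window SSIM (e.g. scikit-image's `structural_similarity`), and the function name `global_ssim` is a placeholder chosen for this example.

```python
import numpy as np

def global_ssim(x: np.ndarray, y: np.ndarray, data_range: float = 255.0) -> float:
    """Single-window SSIM over the whole image (no sliding window).

    Uses the standard SSIM formula with the usual stabilizing constants
    C1 = (0.01 * L)^2 and C2 = (0.03 * L)^2, where L is the data range.
    """
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    mx, my = x.mean(), y.mean()          # luminance terms
    vx, vy = x.var(), y.var()            # contrast terms
    cov = ((x - mx) * (y - my)).mean()   # structure (covariance) term
    num = (2.0 * mx * my + c1) * (2.0 * cov + c2)
    den = (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    return num / den
```

Identical images score 1.0, and the score decreases as rendering artifacts (noise, missing structure) push the synthetic image away from the real one.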