TY - GEN
T1 - ATLAS-MVSNet: Attention Layers for Feature Extraction and Cost Volume Regularization in Multi-View Stereo
AU - Weilharter, Rafael
AU - Fraundorfer, Friedrich
PY - 2022/8/25
Y1 - 2022/8/25
N2 - We present ATLAS-MVSNet, an end-to-end deep learning architecture relying on local attention layers for depth map inference from multi-view images. Distinct from existing works, we introduce a novel module design for neural networks, which we term the hybrid attention block, that utilizes the latest insights into attention in vision models. We are able to reap the benefits of attention in both the carefully designed multi-stage feature extraction network and the cost volume regularization network. Our new approach displays significant improvement over its counterpart based purely on convolutions. While many state-of-the-art methods need multiple high-end GPUs in the training phase, we are able to train our network on a single consumer-grade GPU. ATLAS-MVSNet exhibits excellent performance, especially in terms of accuracy, on the DTU dataset. Furthermore, ATLAS-MVSNet ranks amongst the top published methods on the online Tanks and Temples benchmark.
KW - Training
KW - Three-dimensional displays
KW - Costs
KW - Memory management
KW - Neural networks
KW - Graphics processing units
KW - Deep architecture
UR - http://www.scopus.com/inward/record.url?scp=85143627578&partnerID=8YFLogxK
U2 - 10.1109/ICPR56361.2022.9956633
DO - 10.1109/ICPR56361.2022.9956633
M3 - Conference paper
SN - 978-1-6654-9063-4
T3 - Proceedings - International Conference on Pattern Recognition
SP - 3557
EP - 3563
BT - 2022 26th International Conference on Pattern Recognition, ICPR 2022
PB - IEEE
T2 - 26th International Conference on Pattern Recognition
Y2 - 21 August 2022 through 25 August 2022
ER -