HighRes-MVSNet: A Fast Multi-View Stereo Network for Dense 3D Reconstruction from High-Resolution Images

Rafael Weilharter; Friedrich Fraundorfer

doi:10.1109/ACCESS.2021.3050556


Rafael Weilharter^*, Friedrich Fraundorfer

^*Corresponding author for this work

Institute of Computer Graphics and Vision (7100)

Research output: Contribution to journal › Article › peer-review

Abstract

We propose an end-to-end deep learning architecture for 3D reconstruction from high-resolution images. While many approaches focus on improving reconstruction quality alone, we primarily focus on decreasing memory requirements in order to exploit the abundant information provided by modern high-resolution cameras. Towards this end, we present HighRes-MVSNet, a convolutional neural network with a pyramid encoder-decoder structure searching for depth correspondences incrementally over a coarse-to-fine hierarchy. The first stage of our network encodes the image features to a much smaller resolution in order to significantly reduce the memory requirements. Additionally, we limit the depth search range in every hierarchy level to the vicinity of the previous prediction. In this manner, we are able to produce highly accurate 3D models while only using a fraction of the GPU memory and runtime of previous methods. Although our method is aimed at much higher resolution images, we are still able to produce state-of-the-art results on the Tanks and Temples benchmark and achieve outstanding scores on the DTU benchmark.
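The coarse-to-fine idea described in the abstract, where each hierarchy level searches for depth only in the vicinity of the previous prediction, can be illustrated with a minimal single-pixel sketch. This is not the authors' network: the learned cost volume is replaced by a hypothetical per-depth cost function, and the names `depth_hypotheses`, `refine_depth`, `num_planes`, `levels` and `shrink` are assumptions made for illustration only.

```python
def depth_hypotheses(center, half_range, num_planes):
    """Evenly spaced depth candidates in [center - half_range, center + half_range]."""
    if num_planes < 2:
        return [center]
    step = 2.0 * half_range / (num_planes - 1)
    return [center - half_range + i * step for i in range(num_planes)]


def refine_depth(cost_fn, d_min, d_max, num_planes=9, levels=4, shrink=0.25):
    """Coarse-to-fine depth search for a single pixel (illustrative only).

    cost_fn is a hypothetical matching cost: lower means a better depth.
    Each level keeps the number of candidates fixed but narrows the search
    range around the previous level's best candidate.
    """
    center = 0.5 * (d_min + d_max)
    half_range = 0.5 * (d_max - d_min)
    for _ in range(levels):
        candidates = depth_hypotheses(center, half_range, num_planes)
        center = min(candidates, key=cost_fn)  # best depth at this level
        half_range *= shrink                   # search closer around it next time
    return center


# Toy cost with its minimum at a depth of 3.7 (assumed for the example).
estimate = refine_depth(lambda d: abs(d - 3.7), d_min=0.0, d_max=10.0)
```

With a fixed number of candidates per level, the sampling interval shrinks at every level, so the sketch reaches a fine depth resolution without ever evaluating a dense set of candidates over the full range. That is the general reason a narrowed search range can save memory and runtime; the actual savings reported for HighRes-MVSNet come from the paper, not from this sketch.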

Original language	English
Article number	9319163
Pages (from-to)	11306-11315
Number of pages	10
Journal	IEEE Access
Volume	9
DOIs	https://doi.org/10.1109/ACCESS.2021.3050556
Publication status	Published - 2021

Keywords

Convolutional neural network
dense 3D reconstruction
multi-view stereo

ASJC Scopus subject areas

Computer Science(all)
Materials Science(all)
Engineering(all)

Access to Document

10.1109/ACCESS.2021.3050556

Licence: CC BY 4.0

Cite this

@article{1b04dc110d8c437eb551269a10996b15,

title = "HighRes-MVSNet: A Fast Multi-View Stereo Network for Dense 3D Reconstruction from High-Resolution Images",

abstract = "We propose an end-to-end deep learning architecture for 3D reconstruction from high-resolution images. While many approaches focus on improving reconstruction quality alone, we primarily focus on decreasing memory requirements in order to exploit the abundant information provided by modern high-resolution cameras. Towards this end, we present HighRes-MVSNet, a convolutional neural network with a pyramid encoder-decoder structure searching for depth correspondences incrementally over a coarse-to-fine hierarchy. The first stage of our network encodes the image features to a much smaller resolution in order to significantly reduce the memory requirements. Additionally, we limit the depth search range in every hierarchy level to the vicinity of the previous prediction. In this manner, we are able to produce highly accurate 3D models while only using a fraction of the GPU memory and runtime of previous methods. Although our method is aimed at much higher resolution images, we are still able to produce state-of-the-art results on the Tanks and Temples benchmark and achieve outstanding scores on the DTU benchmark.",

keywords = "Convolutional neural network, dense 3D reconstruction, multi-view stereo",

author = "Rafael Weilharter and Friedrich Fraundorfer",

note = "Publisher Copyright: {\textcopyright} 2013 IEEE.",

year = "2021",

doi = "10.1109/ACCESS.2021.3050556",

language = "English",

volume = "9",

pages = "11306--11315",

journal = "IEEE Access",

issn = "2169-3536",

publisher = "IEEE Publications",

}

TY - JOUR

T1 - HighRes-MVSNet

T2 - A Fast Multi-View Stereo Network for Dense 3D Reconstruction from High-Resolution Images

AU - Weilharter, Rafael

AU - Fraundorfer, Friedrich

PY - 2021

Y1 - 2021

N2 - We propose an end-to-end deep learning architecture for 3D reconstruction from high-resolution images. While many approaches focus on improving reconstruction quality alone, we primarily focus on decreasing memory requirements in order to exploit the abundant information provided by modern high-resolution cameras. Towards this end, we present HighRes-MVSNet, a convolutional neural network with a pyramid encoder-decoder structure searching for depth correspondences incrementally over a coarse-to-fine hierarchy. The first stage of our network encodes the image features to a much smaller resolution in order to significantly reduce the memory requirements. Additionally, we limit the depth search range in every hierarchy level to the vicinity of the previous prediction. In this manner, we are able to produce highly accurate 3D models while only using a fraction of the GPU memory and runtime of previous methods. Although our method is aimed at much higher resolution images, we are still able to produce state-of-the-art results on the Tanks and Temples benchmark and achieve outstanding scores on the DTU benchmark.

AB - We propose an end-to-end deep learning architecture for 3D reconstruction from high-resolution images. While many approaches focus on improving reconstruction quality alone, we primarily focus on decreasing memory requirements in order to exploit the abundant information provided by modern high-resolution cameras. Towards this end, we present HighRes-MVSNet, a convolutional neural network with a pyramid encoder-decoder structure searching for depth correspondences incrementally over a coarse-to-fine hierarchy. The first stage of our network encodes the image features to a much smaller resolution in order to significantly reduce the memory requirements. Additionally, we limit the depth search range in every hierarchy level to the vicinity of the previous prediction. In this manner, we are able to produce highly accurate 3D models while only using a fraction of the GPU memory and runtime of previous methods. Although our method is aimed at much higher resolution images, we are still able to produce state-of-the-art results on the Tanks and Temples benchmark and achieve outstanding scores on the DTU benchmark.

KW - Convolutional neural network

KW - dense 3D reconstruction

KW - multi-view stereo

UR - http://www.scopus.com/inward/record.url?scp=85099570524&partnerID=8YFLogxK

U2 - 10.1109/ACCESS.2021.3050556

DO - 10.1109/ACCESS.2021.3050556

M3 - Article

AN - SCOPUS:85099570524

SN - 2169-3536

VL - 9

SP - 11306

EP - 11315

JO - IEEE Access

JF - IEEE Access

M1 - 9319163

ER -
