Learning to Align Semantic Segmentation and 2.5D Maps for Geolocalization

Anil Armagan; Martin Hirzer; Peter M. Roth; Vincent Lepetit

Institute of Computer Graphics and Vision (7100)

Research output: Chapter in Book/Report/Conference proceeding › Conference paper › peer-review

Abstract

We present an efficient method for geolocalization in urban environments starting from a coarse estimate of the location provided by a GPS and using a simple untextured 2.5D model of the surrounding buildings. Our key contribution is a novel efficient and robust method to optimize the pose: We train a Deep Network to predict the best direction to improve a pose estimate, given a semantic segmentation of the input image and a rendering of the buildings from this estimate. We then iteratively apply this CNN until converging to a good pose. This approach avoids the use of reference images of the surroundings, which are difficult to acquire and match, while 2.5D models are broadly available. We can therefore apply it to places unseen during training.
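The iterative refinement described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the pose parameterization, the set of discrete directions, the unit step sizes, the "stop" label, and the toy predictor are all assumptions made for illustration. In the described method, the direction would come from a CNN that compares the semantic segmentation of the input image with a rendering of the 2.5D building model at the current pose estimate; here a stand-in function with a known target pose plays that role.

```python
from typing import Callable, Tuple

# Hypothetical pose parameterization: (x, y, heading).
Pose = Tuple[float, float, float]

# Hypothetical discrete update directions with unit steps.
# The abstract does not specify the actual direction set or step sizes.
STEPS = {
    "x+": (1.0, 0.0, 0.0),
    "x-": (-1.0, 0.0, 0.0),
    "y+": (0.0, 1.0, 0.0),
    "y-": (0.0, -1.0, 0.0),
    "rot+": (0.0, 0.0, 1.0),
    "rot-": (0.0, 0.0, -1.0),
}


def refine_pose(
    initial: Pose,
    predict_direction: Callable[[Pose], str],
    max_iters: int = 100,
) -> Pose:
    """Repeatedly move the pose one step in the predicted direction."""
    pose = initial
    for _ in range(max_iters):
        direction = predict_direction(pose)
        if direction == "stop":  # predictor signals convergence
            break
        dx, dy, dt = STEPS[direction]
        pose = (pose[0] + dx, pose[1] + dy, pose[2] + dt)
    return pose


def make_toy_predictor(target: Pose) -> Callable[[Pose], str]:
    """Stand-in for the trained network: it knows the target pose.

    A real predictor would instead render the 2.5D model at `pose`
    and compare the rendering with the image's semantic segmentation.
    """
    names = ("x", "y", "rot")

    def predict(pose: Pose) -> str:
        errors = [t - p for t, p in zip(target, pose)]
        # Pick the pose component with the largest remaining error.
        axis = max(range(3), key=lambda i: abs(errors[i]))
        if abs(errors[axis]) < 0.5:
            return "stop"
        return names[axis] + ("+" if errors[axis] > 0 else "-")

    return predict


if __name__ == "__main__":
    coarse_estimate = (0.0, 0.0, 0.0)  # e.g. from GPS and compass
    predictor = make_toy_predictor((3.0, -2.0, 4.0))
    print(refine_pose(coarse_estimate, predictor))
```

The loop structure is the point of the sketch: the predictor is queried once per iteration and only ever returns a direction, so the pose is reached through many small steps rather than regressed in one shot.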

Original language	English
Title of host publication	Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Publication status	Published - 2017
Event	2017 IEEE Conference on Computer Vision and Pattern Recognition: CVPR 2017 - Honolulu, United States
Duration	21 Jul 2017 → 26 Jul 2017

Conference

Conference	2017 IEEE Conference on Computer Vision and Pattern Recognition
Abbreviated title	CVPR 2017
Country/Territory	United States
City	Honolulu
Period	21/07/17 → 26/07/17

Cite this

@inproceedings{98985536ebc64a82a3828357da43a409,
  title = "Learning to Align Semantic Segmentation and 2.5D Maps for Geolocalization",
  abstract = "We present an efficient method for geolocalization in urban environments starting from a coarse estimate of the location provided by a GPS and using a simple untextured 2.5D model of the surrounding buildings. Our key contribution is a novel efficient and robust method to optimize the pose: We train a Deep Network to predict the best direction to improve a pose estimate, given a semantic segmentation of the input image and a rendering of the buildings from this estimate. We then iteratively apply this CNN until converging to a good pose. This approach avoids the use of reference images of the surroundings, which are difficult to acquire and match, while 2.5D models are broadly available. We can therefore apply it to places unseen during training.",
  author = "Anil Armagan and Martin Hirzer and Peter M. Roth and Vincent Lepetit",
  year = "2017",
  language = "English",
  booktitle = "Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)",
  note = "2017 IEEE Conference on Computer Vision and Pattern Recognition: CVPR 2017; Conference date: 21-07-2017 through 26-07-2017",
}

TY - GEN
T1 - Learning to Align Semantic Segmentation and 2.5D Maps for Geolocalization
AU - Armagan, Anil
AU - Hirzer, Martin
AU - Roth, Peter M.
AU - Lepetit, Vincent
PY - 2017
Y1 - 2017
N2 - We present an efficient method for geolocalization in urban environments starting from a coarse estimate of the location provided by a GPS and using a simple untextured 2.5D model of the surrounding buildings. Our key contribution is a novel efficient and robust method to optimize the pose: We train a Deep Network to predict the best direction to improve a pose estimate, given a semantic segmentation of the input image and a rendering of the buildings from this estimate. We then iteratively apply this CNN until converging to a good pose. This approach avoids the use of reference images of the surroundings, which are difficult to acquire and match, while 2.5D models are broadly available. We can therefore apply it to places unseen during training.
M3 - Conference paper
BT - Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
T2 - 2017 IEEE Conference on Computer Vision and Pattern Recognition
Y2 - 21 July 2017 through 26 July 2017
ER -
