Modern spaceborne synthetic aperture radar (SAR) sensors, such as TerraSAR-X/TanDEM-X and COSMO-SkyMed, can deliver very high resolution (VHR) data beyond the inherent spatial scales (on the order of 1 m) of buildings, constituting an invaluable data source for large-scale urban mapping. Processing these VHR data with advanced interferometric techniques, such as SAR tomography (TomoSAR), enables the generation of 3-D (or even 4-D) TomoSAR point clouds from space. In this paper, we present a novel and generic workflow that exploits these TomoSAR point clouds to automatically produce benchmark annotated (buildings/nonbuildings) SAR datasets. These annotated datasets (building masks) have been utilized to construct and train state-of-the-art deep fully convolutional neural networks, with an additional conditional random field represented as a recurrent neural network, to detect building regions in a single VHR SAR image. The building detection results are illustrated and validated over a TerraSAR-X VHR spotlight SAR image covering approximately 39 km² (almost the whole city of Berlin), with a mean pixel accuracy of around 93.84%.