🦅👁️ RendBEV: Semantic Novel View Synthesis for Self-Supervised Bird's Eye View Segmentation

Henrique Piñeiro Monteagudo (Verizon Connect, University of Bologna)

Leonardo Taccari (Verizon Connect)

Aurel Pjetri (Verizon Connect, University of Florence)

Francesco Sambo (Verizon Connect)

Samuele Salti (University of Bologna)

2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

GIF showing an input sequence and the results of a model trained with our method only, and with our method followed by fine-tuning on GT.
Left: RendBEV training only, no GT. Right: fine-tuned on 100% of the GT after pretraining with RendBEV.

Abstract

Bird’s Eye View (BEV) semantic maps have recently garnered a lot of attention as a useful representation of the environment to tackle assisted and autonomous driving tasks. However, most of the existing work focuses on the fully supervised setting, training networks on large annotated datasets. In this work, we present RendBEV, a new method for the self-supervised training of BEV semantic segmentation networks, leveraging differentiable volumetric rendering to receive supervision from semantic perspective views computed by a 2D semantic segmentation model. Our method enables zero-shot BEV semantic segmentation, and already delivers competitive results in this challenging setting. When used as pretraining to then fine-tune on labeled BEV ground truth, our method significantly boosts performance in low-annotation regimes, and sets a new state of the art when fine-tuning on all available labels.

Method

RendBEV, our method for the self-supervised training of BEV semantic segmentation models: we perform a forward pass with a reference view $I^r$ as input to the BEV network. We render the semantic segmentation of another view $\hat{S}^k$, with class probability values $l^k_{\mathbf{x}_i}$ sampled from the BEV prediction $\hat{B}^r$ and densities $\sigma_{\mathbf{x}_i}$ queried from a pretrained frozen model $\omega$ that receives the target frame $I^k$ as input. We supervise the network with a cross-entropy loss computed between the rendered semantic segmentation $\hat{S}^k$ and the target semantic segmentation $S^k$.

Visualization of the RendBEV pipeline.
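
For illustration, below is a minimal PyTorch-style sketch of this training step under stated assumptions: bev_net (the BEV segmentation network), omega (the frozen density model), and the helper project_to_bev are hypothetical placeholders, and the ray sampling and depth range are illustrative choices, not our exact implementation.

    import torch
    import torch.nn.functional as F

    def project_to_bev(pts, extent=50.0):
        """Hypothetical helper: drop height and map metric (x, z) ground-plane
        coordinates to normalized [-1, 1] BEV grid coordinates."""
        return torch.stack([pts[..., 0], pts[..., 2]], dim=-1) / extent

    def rendbev_step(bev_net, omega, I_r, I_k, S_k, rays_o, rays_d, n_samples=64):
        """One self-supervised RendBEV training step (sketch).

        I_r: reference image fed to the BEV network.
        I_k: target image fed to the frozen density model omega.
        S_k: (R,) teacher semantic labels for R sampled target-view rays.
        rays_o, rays_d: (R, 3) ray origins and directions in world coordinates.
        """
        B_r = bev_net(I_r)  # (C, H, W) BEV class logits

        # Sample points x_i = o + t_i * d along each target-view ray.
        t = torch.linspace(0.5, 50.0, n_samples, device=rays_o.device)
        pts = rays_o[:, None, :] + t[None, :, None] * rays_d[:, None, :]  # (R, N, 3)

        # Densities sigma_{x_i} from the frozen pretrained model omega (no gradient).
        with torch.no_grad():
            sigma = omega(I_k, pts)  # (R, N)

        # Class probabilities l_{x_i}: bilinearly sample the BEV prediction.
        grid = project_to_bev(pts)  # (R, N, 2) in [-1, 1]
        probs = F.softmax(B_r, dim=0)  # (C, H, W)
        l = F.grid_sample(probs[None], grid[None], align_corners=True)[0]  # (C, R, N)

        # Standard volumetric rendering weights w_i = T_i * (1 - exp(-sigma_i * delta_i)).
        delta = t[1:] - t[:-1]
        delta = torch.cat([delta, delta[-1:]]).expand_as(sigma)  # (R, N)
        alpha = 1.0 - torch.exp(-sigma * delta)
        T = torch.cumprod(
            torch.cat([torch.ones_like(alpha[:, :1]), 1.0 - alpha[:, :-1]], dim=1),
            dim=1,
        )
        w = alpha * T  # (R, N)

        # Rendered per-ray class probabilities S_hat^k, supervised with cross entropy.
        S_hat = (w[None] * l).sum(dim=-1).permute(1, 0)  # (R, C)
        return F.nll_loss(torch.log(S_hat.clamp_min(1e-6)), S_k)

Note that in this sketch gradients flow only through the class probabilities sampled from the BEV prediction; the densities from the frozen model $\omega$ are detached, so the supervision from the 2D teacher segmentation reaches the BEV network alone.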

Qualitative Results

Qualitative results of RendBEV on the Waymo Open Dataset.
Qualitative results of RendBEV on the KITTI-360 dataset with different GT data regimes.

Quantitative Results

Quantitative results of RendBEV on the Waymo Open Dataset.
Quantitative results of RendBEV and other methods on the KITTI-360 dataset with different GT data regimes.

Data

We use the KITTI-360 dataset for training and evaluation, accessible here. We use the Waymo Open Dataset for evaluation only, accessible here. We employ the BEV semantic labels for these datasets provided by the authors of SkyEye. We thank the creators of these datasets for making them publicly accessible.
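
To make this data setup concrete, a hypothetical configuration sketch is given below; every path, key, and split name is an illustrative placeholder, not the actual layout of these datasets or of our code.

    # Hypothetical data configuration mirroring the setup above; all paths,
    # keys, and split names are illustrative placeholders.
    DATA_CONFIG = {
        "kitti_360": {
            "root": "/data/KITTI-360",               # images and poses
            "bev_labels": "/data/KITTI-360/bev_gt",  # BEV GT from the SkyEye authors
            "splits": ["train", "val"],              # training and evaluation
        },
        "waymo": {
            "root": "/data/waymo_open",
            "bev_labels": "/data/waymo_open/bev_gt",
            "splits": ["val"],                       # evaluation only
        },
    }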

Acknowledgements

Henrique Piñeiro Monteagudo is supported by the SMARTHEP project, funded by the European Union's Horizon 2020 research and innovation programme, call H2020-MSCA-ITN-2020, under Grant Agreement n. 956086.

BibTeX citation

    @inproceedings{pineiro25rendbev,
      author    = {Piñeiro Monteagudo, Henrique and Taccari, Leonardo and Pjetri, Aurel and Sambo, Francesco and Salti, Samuele},
      booktitle = {2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
      title     = {RendBEV: Semantic Novel View Synthesis for Self-Supervised Bird's Eye View Segmentation},
      year      = {2025},
    }