Effective Multi-sensor Conditioning for Street-view Novel-view Synthesis

June 2026

Overall framework

Abstract

We introduce StreetNVS, a street-view novel-view synthesis framework that jointly conditions video diffusion on sparse LiDAR reprojections, surround-view reference imagery, and calibrated camera poses. The method uses reference-enhanced camera attention with ray-level positional encoding and a staged curriculum for sparse LiDAR conditioning, enabling coherent synthesis along challenging out-of-trajectory camera paths.
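As a rough illustration of what a sparse LiDAR reprojection can look like, the sketch below projects 3D points into a target camera to form a sparse depth map. It is not taken from StreetNVS: the function name `project_lidar_to_depth`, the pinhole intrinsics `K`, the 4x4 extrinsic `T_cam_from_world`, and the nearest-point-per-pixel rule are all assumptions made for this example.

```python
import numpy as np

def project_lidar_to_depth(points_world, K, T_cam_from_world, height, width):
    """Project Nx3 world points into a sparse depth map (0 = no LiDAR return).

    Hypothetical helper for illustration; assumes an ideal pinhole camera
    with no lens distortion.
    """
    points_world = np.asarray(points_world, dtype=np.float64)

    # Homogeneous coordinates, then world -> camera frame.
    ones = np.ones((points_world.shape[0], 1))
    pts_h = np.concatenate([points_world, ones], axis=1)
    pts_cam = (T_cam_from_world @ pts_h.T).T[:, :3]

    # Keep only points in front of the camera.
    pts_cam = pts_cam[pts_cam[:, 2] > 1e-6]

    # Pinhole projection to integer pixel coordinates.
    uvw = (K @ pts_cam.T).T
    u = np.floor(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.floor(uvw[:, 1] / uvw[:, 2]).astype(int)
    z = pts_cam[:, 2].astype(np.float32)

    # Discard points that land outside the image.
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z = u[inside], v[inside], z[inside]

    # Keep the nearest point per pixel (a simple z-buffer).
    depth = np.full((height, width), np.inf, dtype=np.float32)
    np.minimum.at(depth, (v, u), z)
    depth[np.isinf(depth)] = 0.0
    return depth


# Toy usage: two points on the optical axis, one behind the camera,
# and one far outside the field of view.
K = np.array([[100.0, 0.0, 32.0],
              [0.0, 100.0, 24.0],
              [0.0, 0.0, 1.0]])
T = np.eye(4)
points = [[0.0, 0.0, 5.0], [0.0, 0.0, 2.0], [0.0, 0.0, -1.0], [10.0, 0.0, 1.0]]
depth_map = project_lidar_to_depth(points, K, T, height=48, width=64)
```

Most pixels in the resulting map are empty, which is the sparsity that a conditioning signal of this kind has to cope with when the target camera leaves the recorded trajectory.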

Type

Conference paper

Publication

arXiv preprint
