Effective Multi-sensor Conditioning for Street-view Novel-view Synthesis

Overall framework

Abstract

We introduce StreetNVS, a street-view novel-view synthesis framework that jointly conditions video diffusion on sparse LiDAR reprojections, surround-view reference imagery, and calibrated camera poses. The method uses reference-enhanced camera attention with ray-level positional encoding and a staged curriculum for sparse LiDAR conditioning, enabling coherent synthesis along challenging out-of-trajectory camera paths.

Publication
arXiv preprint
Tong WU 吴桐
Tong WU 吴桐
Assistant Professor @ Fudan

My research interests include 3d vision, long-tailed recognition, and robustness.

Related