Overall framework
We introduce StreetNVS, a street-view novel-view synthesis framework that jointly conditions video diffusion on sparse LiDAR reprojections, surround-view reference imagery, and calibrated camera poses. The method uses reference-enhanced camera attention with ray-level positional encoding and a staged curriculum for sparse LiDAR conditioning, enabling coherent synthesis along challenging out-of-trajectory camera paths.