ViSAudio: End-to-End Video-Driven Binaural Spatial Audio Generation

December 2025

[Figure: Overall framework]

Abstract

ViSAudio tackles end-to-end binaural spatial audio generation directly from silent video. It introduces the BiAudio dataset and a conditional flow matching architecture with dual audio branches and a conditional spacetime module for spatially consistent audio generation.
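To make the term "conditional flow matching" concrete, the following is a minimal sketch of a common linear-path formulation of the training objective. It is not taken from the paper and does not claim to be ViSAudio's implementation. The names `cfm_training_pair`, `cfm_loss`, `binaural_latent`, and `video_features` are hypothetical, as are the tensor shapes and the placeholder model. The only link to the abstract is that the target has two channels (left and right) and that the velocity predictor receives a video-derived condition.

```python
import numpy as np

def cfm_training_pair(x1, rng):
    """Build one training pair on a straight noise-to-data path (assumed formulation)."""
    x0 = rng.standard_normal(x1.shape)   # noise sample with the same shape as the target
    t = rng.uniform(0.0, 1.0)            # random time in [0, 1]
    x_t = (1.0 - t) * x0 + t * x1        # interpolated point on the path
    v_target = x1 - x0                   # constant velocity along the straight path
    return x_t, t, v_target

def cfm_loss(predict_velocity, x1, cond, rng):
    """Mean squared error between predicted and target velocity."""
    x_t, t, v_target = cfm_training_pair(x1, rng)
    v_pred = predict_velocity(x_t, t, cond)
    return float(np.mean((v_pred - v_target) ** 2))

rng = np.random.default_rng(0)
# Hypothetical shapes: 2 channels (left, right) x 16 latent frames.
binaural_latent = rng.standard_normal((2, 16))
# Hypothetical conditioning vector standing in for video features.
video_features = rng.standard_normal(8)

def zero_model(x_t, t, cond):
    """Placeholder predictor; a real model would be a neural network using cond."""
    return np.zeros_like(x_t)

loss = cfm_loss(zero_model, binaural_latent, video_features, rng)
```

In this sketch, a trained predictor would be integrated from noise at t = 0 to a sample at t = 1. How the dual audio branches and the conditional spacetime module enter the predictor is specific to the paper and is not modeled here.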

Type

Conference paper

Publication

arXiv preprint
