Overall framework
SS4D is a native 4D generative model that synthesizes dynamic 3D objects from monocular video. It represents motion with structured spacetime latents, inherits spatial consistency from image-to-3D priors, adds temporal reasoning layers to keep the generated frames coherent over time, and compresses long latent sequences so that 4D generation stays efficient.
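To make the idea of compressing a long latent sequence concrete, here is a minimal sketch of temporal downsampling by windowed averaging. It is an illustration under stated assumptions, not SS4D's actual compression module: the function name `compress_temporally`, the `stride` parameter, and the choice to flatten each frame's latent into a plain vector are all hypothetical, and simple averaging stands in for whatever learned operator the model uses.

```python
from typing import List

# Assumption: each frame's latent is flattened to a plain vector of floats.
Frame = List[float]


def compress_temporally(frames: List[Frame], stride: int = 2) -> List[Frame]:
    """Shorten a latent sequence by averaging non-overlapping windows of frames.

    A sequence of T frames becomes ceil(T / stride) frames. The last window
    may hold fewer than `stride` frames and is averaged over what it has.
    """
    if stride < 1:
        raise ValueError("stride must be at least 1")
    if not frames:
        return []
    dim = len(frames[0])
    if any(len(frame) != dim for frame in frames):
        raise ValueError("all frames must have the same latent dimension")

    compressed: List[Frame] = []
    for start in range(0, len(frames), stride):
        window = frames[start:start + stride]
        # Element-wise mean over the frames in this temporal window.
        compressed.append(
            [sum(frame[i] for frame in window) / len(window) for i in range(dim)]
        )
    return compressed


if __name__ == "__main__":
    sequence = [[0.0, 2.0], [2.0, 4.0], [4.0, 6.0]]
    print(compress_temporally(sequence, stride=2))
```

With a stride of 2, any layers that operate on the compressed sequence see roughly half as many time steps, which is the general reason temporal compression lowers the cost of processing long videos.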