Overall framework
UniHand formulates both hand motion estimation and hand motion generation as a single task: conditional motion synthesis. It aligns heterogeneous inputs, such as MANO parameters, 2D skeletons, and visual observations, in a shared latent space, then uses a latent diffusion model to generate consistent hand motion sequences conditioned on whichever of these inputs are available.
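To make the data flow concrete, the following is a minimal sketch of the two stages described above: per-modality encoders that project into one shared latent space, followed by a DDPM-style reverse sampling loop conditioned on the fused embedding. Everything in it is an assumption made for illustration and not taken from UniHand itself: the names (`make_projection`, `encode`, `fuse`, `toy_denoiser`, `sample`), the latent width, the input sizes, the averaging fusion, and the noise schedule. The random linear maps stand in for learned encoders, and `toy_denoiser` is an untrained stand-in that simply treats the condition as the clean sample, so the loop shows only the mechanics of conditional sampling.

```python
import math
import random

LATENT_DIM = 8  # hypothetical width of the shared latent space


def make_projection(in_dim, out_dim, rng):
    """Random linear map standing in for a learned modality encoder."""
    scale = 1.0 / math.sqrt(in_dim)
    return [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(out_dim)]


def encode(frames, weight):
    """Project every frame (a list of floats) into the shared latent space."""
    return [[sum(w * x for w, x in zip(row, frame)) for row in weight]
            for frame in frames]


def fuse(embeddings):
    """Average the available modality embeddings, frame by frame."""
    n = len(embeddings)
    return [[sum(vals) / n for vals in zip(*frame_group)]
            for frame_group in zip(*embeddings)]


def toy_denoiser(x_t, cond, alpha_bar):
    """Stand-in for a trained network: the noise implied if `cond` were x_0."""
    return (x_t - math.sqrt(alpha_bar) * cond) / math.sqrt(1.0 - alpha_bar)


def sample(cond, steps, rng):
    """DDPM-style ancestral sampling in latent space, conditioned on `cond`."""
    betas = [1e-4 + (0.02 - 1e-4) * i / (steps - 1) for i in range(steps)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)

    # Start from pure Gaussian noise with the same shape as the condition.
    x = [[rng.gauss(0.0, 1.0) for _ in frame] for frame in cond]
    for t in reversed(range(steps)):
        a, ab, b = alphas[t], alpha_bars[t], betas[t]
        sigma = math.sqrt(b) if t > 0 else 0.0  # no noise on the final step
        x = [[(xi - b / math.sqrt(1.0 - ab) * toy_denoiser(xi, ci, ab)) / math.sqrt(a)
              + sigma * rng.gauss(0.0, 1.0)
              for xi, ci in zip(x_frame, c_frame)]
             for x_frame, c_frame in zip(x, cond)]
    return x


rng = random.Random(0)
num_frames = 4

# Illustrative input sizes only: 58 values per frame for MANO (48 pose + 10 shape)
# and 42 values per frame for a 2D skeleton (21 joints x 2 coordinates).
mano = [[rng.uniform(-1.0, 1.0) for _ in range(58)] for _ in range(num_frames)]
skeleton = [[rng.uniform(-1.0, 1.0) for _ in range(42)] for _ in range(num_frames)]

w_mano = make_projection(58, LATENT_DIM, rng)
w_skeleton = make_projection(42, LATENT_DIM, rng)

# Both modalities land in the same space, so they can be fused into one condition.
cond = fuse([encode(mano, w_mano), encode(skeleton, w_skeleton)])
latent_motion = sample(cond, steps=50, rng=rng)
```

Because the stand-in denoiser always predicts the condition as the clean sample, the final latent sequence coincides with `cond` up to floating-point error. A trained denoiser would instead produce a distribution of plausible motions, and a decoder (omitted here) would map the sampled latents back to hand motion. Dropping a modality from the list passed to `fuse` mimics sampling under a different set of controls.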