Adding expressive sound to motion graphics typically requires both audio expertise and tedious manual work. MoSound is an interactive tool that streamlines this process. A vision-language model analyzes the video to detect visual events and suggest sound descriptions. A motion-tracking interface lets users map object movement (position, velocity, size, and similar properties) to audio properties such as stereo panning and volume. This mapping produces a guide signal that anchors the final generated sound effect both spatially and temporally.
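To make the motion-to-audio mapping concrete, here is a minimal sketch of how tracked object movement could become a stereo gain envelope. It is not MoSound's implementation. The `TrackPoint` format, the `velocities` and `guide_envelope` helpers, and the use of constant-power panning are all hypothetical choices made for illustration. In this sketch, horizontal position drives left/right panning, and either object size or horizontal speed drives loudness, normalized to the track's peak.

```python
import math
from dataclasses import dataclass


@dataclass
class TrackPoint:
    """One tracked sample of an object (hypothetical format, for illustration only)."""
    t: float     # time in seconds
    x: float     # horizontal position: 0.0 = left edge, 1.0 = right edge
    size: float  # bounding-box area as a fraction of the frame, 0..1


def velocities(track):
    """Horizontal speed per sample via finite differences.

    The first sample reuses the first difference so the output
    has the same length as the track.
    """
    v = []
    for prev, cur in zip(track, track[1:]):
        dt = cur.t - prev.t
        v.append(abs(cur.x - prev.x) / dt if dt > 0 else 0.0)
    return [v[0]] + v if v else [0.0] * len(track)


def guide_envelope(track, volume_source="size"):
    """Map motion to per-sample (left_gain, right_gain) pairs.

    Pan follows x using constant-power panning (an assumed choice);
    loudness follows size or speed, scaled so the loudest sample is 1.0.
    """
    if not track:
        return []
    if volume_source == "size":
        raw = [p.size for p in track]
    elif volume_source == "velocity":
        raw = velocities(track)
    else:
        raise ValueError(f"unknown volume_source: {volume_source!r}")

    peak = max(raw) or 1.0  # avoid dividing by zero for a silent track
    envelope = []
    for point, value in zip(track, raw):
        x = min(max(point.x, 0.0), 1.0)   # clamp position to the frame
        theta = x * math.pi / 2           # 0 = hard left, pi/2 = hard right
        level = value / peak
        # cos/sin gains keep left^2 + right^2 == level^2 at every pan position
        envelope.append((level * math.cos(theta), level * math.sin(theta)))
    return envelope
```

For example, an object that moves from the left edge to the right edge while growing would pan from left to right and get louder over time:

```python
track = [TrackPoint(0.0, 0.0, 0.1), TrackPoint(0.5, 0.5, 0.2), TrackPoint(1.0, 1.0, 0.4)]
env = guide_envelope(track, volume_source="size")
```

Constant-power panning is used here because it keeps perceived loudness roughly steady as a sound crosses the stereo field; a linear crossfade would dip in the center.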