VMT
Video-Music Transformer (VMT) is an attention-based multimodal model that generates piano music for a given video.
Paper
https://arxiv.org/abs/2112.15320
Demo
Here is an example video fragment from our dataset. Note that we do not apply any post-production: each file combines the original video with a WAVE file converted from the MIDI output of the model.
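The README does not say which tools were used to build the demo files. As a minimal sketch of the steps described above, assuming FluidSynth renders the MIDI to WAVE and FFmpeg attaches the audio to the original video, the commands could be assembled as follows. The function names, the SoundFont, and all file paths are hypothetical placeholders, not part of this repository.

```python
import subprocess
from pathlib import Path
from typing import List


def midi_to_wav_cmd(midi: Path, wav: Path, soundfont: Path, rate: int = 44100) -> List[str]:
    """Build a FluidSynth command that renders a MIDI file to a WAVE file."""
    return [
        "fluidsynth",
        "-ni",              # no MIDI input driver, no interactive shell
        "-F", str(wav),     # render to this file instead of playing live
        "-r", str(rate),    # sample rate in Hz
        str(soundfont),     # assumption: any piano SoundFont (.sf2)
        str(midi),
    ]


def mux_cmd(video: Path, wav: Path, out: Path) -> List[str]:
    """Build an FFmpeg command that replaces the video's audio with the WAVE file."""
    return [
        "ffmpeg", "-y",
        "-i", str(video),   # input 0: original video
        "-i", str(wav),     # input 1: rendered model output
        "-map", "0:v:0",    # keep the video stream from input 0
        "-map", "1:a:0",    # take the audio stream from input 1
        "-c:v", "copy",     # do not re-encode the video
        "-c:a", "aac",      # encode audio to a codec MP4 players support
        "-shortest",        # stop at the end of the shorter stream
        str(out),
    ]


def build_demo(midi: Path, video: Path, soundfont: Path, out: Path) -> None:
    """Run both steps; requires fluidsynth and ffmpeg on PATH."""
    wav = out.with_suffix(".wav")
    subprocess.run(midi_to_wav_cmd(midi, wav, soundfont), check=True)
    subprocess.run(mux_cmd(video, wav, out), check=True)
```

Calling `build_demo(Path("output.mid"), Path("clip.mp4"), Path("piano.sf2"), Path("clip_vmt.mp4"))` would write the intermediate `clip_vmt.wav` and the final `clip_vmt.mp4`. Copying the video stream unchanged keeps the picture identical to the source, which matches the statement that no post-production is applied.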
Original
The original music of the video.
100-001_original.mp4
VMT
The music generated by our VMT model.
100-001_vmt.mp4
Seq2Seq
The music generated by the baseline Seq2Seq model.