MPT
A Multi-Modal Perception Tracker (MPT) for speaker tracking that uses both audio and visual modalities.
Implementation of our AAAI 2022 paper: Multi-Modal Perception Attention Network with Self-Supervised Learning for Audio-Visual Speaker Tracking.
Our paper and code will be released soon.