Look Who’s Talking: Active Speaker Detection in the Wild

Clova AI Research

Last update: Dec 8, 2022

Dependencies

pip install -r requirements.txt

In addition to the Python dependencies, ffmpeg must be installed on the system.
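Before running anything, it can help to confirm that ffmpeg is actually reachable from the shell. The following is a minimal sketch, not part of the repository: the helper name `tool_available` is hypothetical, and it only checks that an executable called `ffmpeg` is on the PATH, not that its version or build options suit the scripts.

```python
import shutil


def tool_available(name: str) -> bool:
    """Return True if an executable called `name` is found on PATH.

    Hypothetical helper for illustration; it is not part of this repository.
    """
    return shutil.which(name) is not None


if __name__ == "__main__":
    if tool_available("ffmpeg"):
        print("ffmpeg found at", shutil.which("ffmpeg"))
    else:
        print("ffmpeg not found on PATH; install it before running the scripts")
```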

Instructions

First, download the videos to $DATA_DIR/original.

Run the following to convert the videos and visualise the labels.

python3 run_convert.py --data_dir $DATA_DIR
python3 run_visualize.py --data_dir $DATA_DIR
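The two steps can also be chained from Python, stopping early if the download step was skipped. This is a sketch under stated assumptions: the function names `has_downloaded_videos`, `build_commands`, and `run_pipeline` are hypothetical, the script is assumed to be launched from the repository root, and `sys.executable` stands in for `python3`. Only the `--data_dir` flag shown above is used.

```python
import subprocess
import sys
from pathlib import Path


def has_downloaded_videos(data_dir) -> bool:
    """True if <data_dir>/original exists and contains at least one entry."""
    original = Path(data_dir) / "original"
    return original.is_dir() and any(original.iterdir())


def build_commands(data_dir):
    """Return the two commands from the instructions as argument lists."""
    return [
        [sys.executable, "run_convert.py", "--data_dir", str(data_dir)],
        [sys.executable, "run_visualize.py", "--data_dir", str(data_dir)],
    ]


def run_pipeline(data_dir) -> None:
    """Run conversion, then visualisation; raise if either step fails."""
    if not has_downloaded_videos(data_dir):
        raise FileNotFoundError(f"no videos found in {Path(data_dir) / 'original'}")
    for cmd in build_commands(data_dir):
        # check=True raises CalledProcessError on a non-zero exit code,
        # so visualisation is not attempted after a failed conversion.
        subprocess.run(cmd, check=True)
```

A typical call would be `run_pipeline(os.environ["DATA_DIR"])` after `import os`, with DATA_DIR set as in the commands above.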

Citation

Please cite the following if you make use of the code.

@inproceedings{kim2021you,
  title={Look Who's Talking: Active Speaker Detection in the Wild},
  author={Kim, You Jin and Heo, Hee-Soo and Choe, Soyeon and Chung, Soo-Whan and Kwon, Yoohwan and Lee, Bong-Jin and Kwon, Youngki and Chung, Joon Son},
  booktitle={Interspeech},
  year={2021}
}

License

Copyright (c) 2021-present NAVER Corp.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
