UNITER-Based Situated Coreference Resolution with Rich Multimodal Input: MMCoref_cleaned
Code for the MMCoref task of the SIMMC 2.0 dataset.
Pretrained vision-language models adapted from Transformers-VQA.
Zero-shot visual feature extraction using CLIP and BUTD.
Zero-shot extraction of non-visual prefab features (flattened into strings) using BERT and SBERT.
Dependencies
Listed in requirements.txt; install with pip install -r requirements.txt.
Download the data and pretrained/trained model checkpoints
- Data: Put the data in ./data. Unpack all images into ./data/all_images and all scene JSONs (including the teststd split) into ./data/simmc2_scene_jsons_dstc10_public/public.
- Pretrained models: Checkpoints go in ./pretrained and ./model/Transformers-VQA-master/models/pretrained. Download links are in the placeholder.txt files in these folders.
- Trained models: Checkpoints go in ./trained. Download links are in ./trained/placeholder.txt.
Preprocess
- Convert the JSON files using ./scripts/converter.py. (Currently not working; someone managed to lose the latest converter.py. Download the processed data instead.)
- Get BERT/SBERT embeddings of the non-visual prefab features using ./scripts/{get_KB_embedding, get_KB_embedding_SBERT, get_KB_embedding_no_duplicate}.py (see the first sketch after this list).
- Get CLIP/BUTD embeddings for the images using ./scripts/get-visual-features-{CLIP, RCNN}.ipynb (see the second sketch after this list).
- Or just download everything via the links in ./processed/placeholder.txt.
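
For reference, a minimal sketch of the SBERT embedding step, assuming the sentence-transformers package. The model choice, prefab strings, and output path below are illustrative, not the exact ones used by get_KB_embedding_SBERT.py:

```python
# Minimal sketch: embed flattened non-visual prefab features with SBERT.
# Assumes the sentence-transformers package; model name, prefab strings,
# and output path are illustrative (not the repo's exact choices).
import json
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # model choice is an assumption

# Hypothetical flattened prefab metadata strings.
prefabs = {
    "prefab_001": "type: jacket, color: black, pattern: plain",
    "prefab_002": "type: sofa, color: grey, materials: wood",
}

# Encode each flattened string into a fixed-size vector (zero-shot, no fine-tuning).
embeddings = {pid: model.encode(text).tolist() for pid, text in prefabs.items()}

with open("KB_embedding_SBERT.json", "w") as f:
    json.dump(embeddings, f)
```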
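And a sketch of the zero-shot CLIP image-feature step, assuming OpenAI's clip package; the image path is illustrative:

```python
# Minimal sketch: zero-shot CLIP image feature extraction.
# Assumes the openai/CLIP package; the image path is illustrative.
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Preprocess one image and encode it into a feature vector.
image = preprocess(Image.open("data/all_images/example.png")).unsqueeze(0).to(device)
with torch.no_grad():
    features = model.encode_image(image)  # shape (1, 512) for ViT-B/32
```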
Train
- Training scripts are under ./sh/train. See each script's arguments for the inputs used.
Inference and evaluation
- Scripts are under ./sh/infer_eval (devtest split) and ./sh/infer_eval_dev (dev split).
- Outputs are written to ./output (same format as the original dialogue JSON).
- Logits are written to ./output/logit in the format {dialogue_idx: {round_idx: [[logit, label], ...]}} (see the sketch after this list).
- Run ./scripts/output_filter_error.py to select and reformat error cases.
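
A minimal sketch of consuming a logit file, assuming the format above; the file name is hypothetical, and the 0-threshold corresponds to sigmoid(logit) > 0.5:

```python
# Minimal sketch: read a logit file in the format
# {dialogue_idx: {round_idx: [[logit, label], ...]}} and threshold at 0
# (sigmoid(logit) > 0.5 is equivalent to logit > 0).
# The file name below is hypothetical.
import json

with open("output/logit/some_model_devtest.json") as f:
    data = json.load(f)

correct = total = 0
for rounds in data.values():
    for pairs in rounds.values():
        for logit, label in pairs:
            pred = int(logit > 0)          # binary coreference prediction
            correct += int(pred == label)
            total += 1

print(f"object-level accuracy: {correct / total:.4f}")
```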
Ensemble
```
cd scripts
python ensemble.py --method optuna
```
- Output saved to ./output/logit/blended_devtest.json.
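
The ensemble step blends per-model logits with weights tuned by Optuna. A minimal sketch of the idea, assuming the logit format above; the model file names, single blending weight, and accuracy objective are illustrative, not the repo's exact implementation:

```python
# Minimal sketch: Optuna-tuned blending of two models' logits
# (not the repo's exact code). Assumes both hypothetical logit files
# cover the same examples in the same order.
import json
import optuna

def load_flat(path):
    """Flatten a {dialogue: {round: [[logit, label], ...]}} file into lists."""
    with open(path) as f:
        data = json.load(f)
    logits, labels = [], []
    for rounds in data.values():
        for pairs in rounds.values():
            for logit, label in pairs:
                logits.append(logit)
                labels.append(label)
    return logits, labels

logits_a, labels = load_flat("output/logit/model_a_devtest.json")  # hypothetical
logits_b, _ = load_flat("output/logit/model_b_devtest.json")       # hypothetical

def objective(trial):
    # One blending weight for model A; model B gets the complement.
    w = trial.suggest_float("w", 0.0, 1.0)
    correct = sum(
        int((w * a + (1 - w) * b > 0) == bool(y))
        for a, b, y in zip(logits_a, logits_b, labels)
    )
    return correct / len(labels)  # maximize accuracy of the blended logits

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=100)
print(study.best_params, study.best_value)
```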