Official code repository for "Exploring Neural Models for Query-Focused Summarization"

Salesforce

Last update: Dec 18, 2022

Related tags

Deep Learning query-focused-sum

Overview

Query-Focused Summarization

This is a work in progress. Expect additional updates.

Running two-stage models

See the extractors directory.

Running Segment Encoder models

See the multiencoder directory.

Official repository for the ICLR 2021 paper Evaluating the Disentanglement of Deep Generative Models with Manifold Topology

Official repository for the ICLR 2021 paper Evaluating the Disentanglement of Deep Generative Models with Manifold Topology Sharon Zhou, Eric Zelikman

34 Nov 16, 2022

The repository offers the official implementation of our paper in PyTorch.

Cloth Interactive Transformer (CIT) Cloth Interactive Transformer for Virtual Try-On Bin Ren1, Hao Tang1, Fanyang Meng2, Runwei Ding3, Ling Shao4, Phi

49 Dec 1, 2022

Official repository for "Action-Based Conversations Dataset: A Corpus for Building More In-Depth Task-Oriented Dialogue Systems"

Action-Based Conversations Dataset (ABCD) This repository contains the code and data for ABCD (Chen et al., 2021) Introduction Whereas existing goal-

49 Oct 9, 2022

Official repository for HOTR: End-to-End Human-Object Interaction Detection with Transformers (CVPR'21, Oral Presentation)

Official PyTorch Implementation for HOTR: End-to-End Human-Object Interaction Detection with Transformers (CVPR'2021, Oral Presentation) HOTR: End-to-

114 Nov 28, 2022

Comments

failed to run preprocess.py, missing meeting_id and meeting_transcripts expects list of str

Hello there, thank you for a great paper and piece of work!

I tried to train multiencoder, but the raw data from https://github.com/Yale-LILY/QMSum seems to have a slightly different format.

preprocess.py fails to run: meeting_id is missing, and meeting_transcripts is expected to be a list of str, but the original data has a list of dict.

I can hack around this by changing the format to introduce a dummy meeting_id and make the data look as expected, but I wanted to first check whether I am missing something or whether there is a cleaner way to do so.
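For concreteness, the workaround could look roughly like the sketch below. It assumes each raw transcript turn is a dict with `speaker` and `content` keys, as in the QMSum JSON files. The helper names, the dummy `meeting_id` scheme, and the "speaker: content" string layout are illustrative guesses, not something confirmed by preprocess.py.

```python
import json


def to_expected_record(meeting, meeting_id):
    """Return a copy of one raw meeting with a meeting_id and string utterances."""
    converted = dict(meeting)  # shallow copy; the raw record is left untouched
    converted["meeting_id"] = meeting_id  # dummy identifier (assumption)
    converted["meeting_transcripts"] = [
        # Assumed target format: one "speaker: content" string per turn.
        turn if isinstance(turn, str) else f"{turn['speaker']}: {turn['content']}"
        for turn in meeting["meeting_transcripts"]
    ]
    return converted


def convert_jsonl_lines(lines, id_prefix="meeting"):
    """Convert an iterable of JSON lines, numbering the meetings in order."""
    records = (json.loads(line) for line in lines if line.strip())
    return [
        json.dumps(to_expected_record(record, f"{id_prefix}_{i}"))
        for i, record in enumerate(records)
    ]
```

Passing the lines of a raw jsonl file to `convert_jsonl_lines` would give one converted JSON string per meeting. Whether this matches what the training code needs is exactly what the question below asks.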

The question is: before running preprocess.py, should one just get the jsonl files from https://github.com/Yale-LILY/QMSum, or is additional or different data expected beyond a simple transform of the original data?

Thank you in advance!

opened by md-experiments 2

Bump nltk from 3.6.5 to 3.6.6 in /multiencoder

Bumps nltk from 3.6.5 to 3.6.6.

Changelog

Sourced from nltk's changelog.

Version 3.6.7 2021-12-28

Resolve IndexError in sent_tokenize and word_tokenize (#2922)

Version 3.6.6 2021-12-21

Refactor gensim.doctest to work for gensim 4.0.0 and up (#2914)

Add Precision, Recall, F-measure, Confusion Matrix to Taggers (#2862)

Added warnings if .zip files exist without any corresponding .csv files. (#2908)

Fix FileNotFoundError when the download_dir is a non-existing nested folder (#2910)

Rename omw to omw-1.4 (#2907)

Resolve ReDoS opportunity by fixing incorrectly specified regex (#2906)

Support OMW 1.4 (#2899)

Deprecate Tree get and set node methods (#2900)

Fix broken inaugural test case (#2903)

Use Multilingual Wordnet Data from OMW with newer Wordnet versions (#2889)

Keep NLTKs "tokenize" module working with pathlib (#2896)

Make prettyprinter to be more readable (#2893)

Update links to the nltk book (#2895)

Add CITATION.cff to nltk (#2880)

Resolve serious ReDoS in PunktSentenceTokenizer (#2869)

Delete old CI config files (#2881)

Improve Tokenize documentation + add TokenizerI as superclass for TweetTokenizer (#2878)

Fix expected value for BLEU score doctest after changes from #2572

Add multi Bleu functionality and tests (#2793)

Deprecate 'return_str' parameter in NLTKWordTokenizer and TreebankWordTokenizer (#2883)

Allow empty string in CFG's + more (#2888)

Partition tree.py module into tree package + pickle fix (#2863)

Fix several TreebankWordTokenizer and NLTKWordTokenizer bugs (#2877)

Rewind Wordnet data file after each lookup (#2868)

Correct init call for SyntaxCorpusReader subclasses (#2872)

Documentation fixes (#2873)

Fix Levenshtein distance for duplicated letters (#2849)

Support alternative Wordnet versions (#2860)

Remove hundreds of formatting warnings for nltk.org (#2859)

Modernize nltk.org/howto pages (#2856)

Fix Bleu Score smoothing function from taking log(0) (#2839)

Update third party tools to newer versions and removing MaltParser fixed version (#2832)

Fix TypeError: _pretty() takes 1 positional argument but 2 were given in sem/drt.py (#2854)

Replace http with https in most URLs (#2852)

Thanks to the following contributors to 3.6.6 Adam Hawley, BatMrE, Danny Sepler, Eric Kafe, Gavish Poddar, Panagiotis Simakis, RnDevelover, Robby Horvath, Tom Aarsen, Yuta Nakamura, Mohaned Mashaly

Version 3.6.5 2021-10-11

modernised nltk.org website

addressed LGTM.com issues

support ZWJ sequences emoji and skin tone modifier emoji in TweetTokenizer

... (truncated)

Commits

4862b09 updates for 3.6.6

6b60213 Refactor gensim.doctest to work for gensim 4.0.0 and up (#2914)

59aa3fb Fix decode error for bllip parser (#2897)

a28d256 Add Precision, Recall, F-measure, Confusion Matrix to Taggers (#2862)

72d9885 Added warnings if .zip files exist without any corresponding .csv files. (#2908)

dea7b44 Fix FileNotFoundError when the download_dir is a non-existing nested fold...

abbe86b Undo #2909 due to unexpected test failure

c075dab Allow commits with /nocache to not use the cache (#2909)

d6d513d Renamed omw to omw-1.4 (#2907)

2a50a3e Resolve ReDoS opportunity by fixing incorrectly specified regex (#2906)

Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR

@dependabot recreate will recreate this PR, overwriting any edits that have been made to it

@dependabot merge will merge this PR after your CI passes on it

@dependabot squash and merge will squash and merge this PR after your CI passes on it

@dependabot cancel merge will cancel a previously requested merge and block automerging

@dependabot reopen will reopen this PR if it is closed

@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually

@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

@dependabot use these labels will set the current labels as the default for future PRs for this repo and language

@dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language

@dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language

@dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the Security Alerts page.

dependencies

opened by dependabot[bot] 0

Bump nltk from 3.6.3 to 3.6.5 in /multiencoder

Bumps nltk from 3.6.3 to 3.6.5.

Changelog

Sourced from nltk's changelog.

Version 3.6.5 2021-10-11

modernised nltk.org website

addressed LGTM.com issues

support ZWJ sequences emoji and skin tone modifier emoji in TweetTokenizer

METEOR evaluation now requires pre-tokenized input

Code linting and type hinting

implement get_refs function for DrtLambdaExpression

Enable automated CoreNLP, Senna, Prover9/Mace4, Megam, MaltParser CI tests

specify minimum regex version that supports regex.Pattern

avoid re.Pattern and regex.Pattern which fail for Python 3.6, 3.7

Thanks to the following contributors to 3.6.5 Tom Aarsen, Saibo Geng, Mohaned Mashaly, Dimitri Papadopoulos, Danny Sepler, Ahmet Yildirim, RnDevelover, yutanakamura

Version 3.6.4 2021-10-01

deprecate nltk.usage(obj) in favor of help(obj)

resolve ReDoS vulnerability in Corpus Reader

solidify performance tests

improve phone number recognition in tweet tokenizer

refactored CISTEM stemmer for German

identify NLTK Team as the author

replace travis badge with github actions badge

add SECURITY.md

Thanks to the following contributors to 3.6.4 Tom Aarsen, Mohaned Mashaly, Dimitri Papadopoulos Orfanos, purificant, Danny Sepler

Version 3.6.3 2021-09-19

Dropped support for Python 3.5

Run CI tests on Windows, too

Moved from Travis CI to GitHub Actions

Code and comment cleanups

Visualize WordNet relation graphs using Graphviz

Fixed large error in METEOR score

Apply isort, pyupgrade, black, added as pre-commit hooks

Prevent debug_decisions in Punkt from throwing IndexError

Resolved ZeroDivisionError in RIBES with dissimilar sentences

Initialize WordNet IC total counts with smoothing value

Fixed AttributeError for Arabic ARLSTem2 stemmer

Many fixes and improvements to lm language model package

Fix bug in nltk.metrics.aline, C_skip = -10

Improvements to TweetTokenizer

Optional show arg for FreqDist.plot, ConditionalFreqDist.plot

edit_distance now computes Damerau-Levenshtein edit-distance

Thanks to the following contributors to 3.6.3 Tom Aarsen, Abhijnan Bajpai, Michael Wayne Goodman, Michał Górny, Maarten ter Huurne,

... (truncated)

Commits

b422364 updates for 3.6.5

03e4b4e Modernised nltk.org website (#2845)

9f468d3 Merge pull request #2851 from DimitriPapadopoulos/lgtm_errors

8ce97b2 Add a unit test, fix typos

2538164 Enhancement: Add ZWJ sequences Emoji and Skin Tone Modifier Emoji support to ...

836b98e Accept pre-tokenized references & hypothesis for METEOR calculation (#2822)

82ceb20 refactor: perfom linting for punkt.py (#2830)

c05b0e7 use latest version of pip (#2846)

6d39c90 Implement get_refs function for DrtLambdaExpression (#2847)

f554129 LGTM.com error: Wrong number of arguments in a class instantiation

Additional commits viewable in compare view

dependencies

opened by dependabot[bot] 0

Preprocessed AquaMuse dataset?

Hi, thanks for the great work!

Are the preprocessed AquaMuse documents (and summaries) also available for download, similar to the QMSum dataset? It would be really helpful for us to get identical datasets (and save the overhead of processing the data from commoncrawl).

Thanks!

opened by apoorvumang 0

Owner

Salesforce

CVPR 2021 - Official code repository for the paper: On Self-Contact and Human Pose.

Official repository with code and data accompanying the NAACL 2021 paper "Hurdles to Progress in Long-form Question Answering" (https://arxiv.org/abs/2103.06332).

This repository contains the official implementation code of the paper Improving Multimodal Fusion with Hierarchical Mutual Information Maximization for Multimodal Sentiment Analysis, accepted at EMNLP 2021.

This is the official code repository for A Simple Long-Tailed Recognition Baseline via Vision-Language Model.

This is the official implementation code repository of Underwater Light Field Retention: Neural Rendering for Underwater Imaging (accepted by CVPR Workshop 2022 NTIRE)

Official repository for "PAIR: Planning and Iterative Refinement in Pre-trained Transformers for Long Text Generation"

Official repository for Few-shot Image Generation via Cross-domain Correspondence (CVPR '21)

Official repository for Jia, Raghunathan, Göksel, and Liang, "Certified Robustness to Adversarial Word Substitutions" (EMNLP 2019)