Database Reasoning Over Text project for ACL paper

Overview

Database Reasoning over Text

This repository contains the code for the Database Reasoning Over Text paper, to appear at ACL 2021. The work was performed in collaboration with James Thorne, Majid Yazdani, Marzieh Saeidi, Fabrizio Silvestri, Sebastian Riedel, and Alon Halevy.

Overview Image

Data

The completed NeuralDB datasets can be downloaded here and are released under a CC BY-SA 3.0 license.

The dataset includes entity names from Wikidata, which are released under a CC BY-SA 3.0 license. It also includes sentences from the KELM corpus, which is released under a CC BY-SA 2.0 license.

Repository Structure

The repository is structured into three sub-folders:

  • Tools for mapping the KELM data to Wikidata identifiers are provided in the dataset-construction folder.
  • The information retrieval system for the support set generator is provided in the ssg folder.
  • The models for Neural SPJ, the baseline retrievers (TF-IDF and DPR), and the evaluation scripts are provided in the modelling folder.

Instructions for running each component are provided in the README files in the respective sub-folders.

Setup

Each sub-folder uses its own Python environment. The requirements for each environment can be installed with pip:

pip install -r requirements.txt

In the dataset-construction and modelling folders, the src folder should be added to the Python path:

export PYTHONPATH=src
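Putting the two steps together, a per-folder setup might look like the following sketch (guarded so it is safe to run from any directory; folder names are those listed above):

```shell
# Sketch of the per-folder setup described above.
# Repeat for dataset-construction, ssg, and modelling.
cd dataset-construction 2>/dev/null || true
if [ -f requirements.txt ]; then
    pip install -r requirements.txt
fi
# dataset-construction and modelling additionally need src on the Python path:
export PYTHONPATH=src
```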

License

The code in this repository is released under the Apache 2.0 license.

Comments
  • FiD Missing

Line 51 in modelling/neuraldb/run.py has the following import: from neuraldb.modelling.fusion_in_decoder import T5MergeForConditionalGeneration, but the module is not in the repository. Is it possible to add it?
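For context: fusion-in-decoder models encode each (question, support fact) pair independently and concatenate the encoder states so the decoder can attend over all of them jointly. The following is only a shape-level NumPy sketch of that fusion step, not the missing T5MergeForConditionalGeneration implementation:

```python
import numpy as np

def fid_fuse(encoder_states):
    """Concatenate per-passage encoder states along the sequence axis,
    so a decoder could cross-attend over all passages at once.
    encoder_states: (batch, n_passages, seq_len, hidden)."""
    b, n, s, h = encoder_states.shape
    return encoder_states.reshape(b, n * s, h)

# Toy shapes: 2 questions, 3 support facts each, 5 tokens, hidden size 4.
states = np.random.rand(2, 3, 5, 4)
fused = fid_fuse(states)
assert fused.shape == (2, 15, 4)
```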

    opened by OhadRubin 3
  • How to obtain <path_to_mapped_kelm>.jsonl

    Hi, regarding this:

    1.2 KELM

    Importing the mappings from KELM is quite fast and is done through running the kelm_data.py script

    python -m ndb_data.data_import.kelm_data <path_to_mapped_kelm>.jsonl
    

How do I obtain the <path_to_mapped_kelm>.jsonl? I don't think it is described in the README. I guess it has something to do with the map_kelm.py script, but I wasn't able to figure out the details. For example, downloading KELM from https://github.com/google-research-datasets/KELM-corpus and feeding it into either map_kelm.py or kelm_data.py doesn't seem to work.

    What am I missing?

    opened by urbanmatthias 2
  • Adding Code of Conduct file

This pull request was created automatically because we noticed your project was missing a Code of Conduct file.

    Code of Conduct files facilitate respectful and constructive communities by establishing expected behaviors for project contributors.

    This PR was crafted with love by Facebook's Open Source Team.

    CLA Signed 
    opened by facebook-github-bot 0
  • Adding Contributing file

This pull request was created automatically because we noticed your project was missing a Contributing file.

    CONTRIBUTING files explain how a developer can contribute to the project - which you should actively encourage.

    This PR was crafted with love by Facebook's Open Source Team.

    CLA Signed 
    opened by facebook-github-bot 0
  • Project dependencies may have API risk issues

Hi, in NeuralDB, inappropriate dependency version constraints can introduce risks.

Below are the dependencies and version constraints that the project uses:

    tqdm
    pymongo
    numpy
    

The version constraint == introduces a risk of dependency conflicts because the dependency scope is too strict. Constraints with no upper bound (or *) introduce a risk of missing-API errors, because the latest version of a dependency may remove some APIs.

After further analysis, the version constraint of tqdm could be changed to >=4.42.0,<=4.64.0, and that of pymongo to >=2.9,<=4.1.1.

These modifications would reduce dependency conflicts as much as possible while still allowing recent versions that do not raise errors in this project.
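Expressed as a requirements.txt fragment, the suggested constraints would look like this (the version ranges are the bot's analysis, not independently verified here):

```text
tqdm>=4.42.0,<=4.64.0
pymongo>=2.9,<=4.1.1
```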

The current project invokes all of the following methods.

The methods called from tqdm:
    itertools.product
    tqdm.tqdm
    
The methods called from pymongo:
    pymongo.MongoClient
    pymongo.UpdateOne
    
The methods called from all modules:
    loaded.extend
    global_obs.append
    search_key.split
    qh2_filtered.append
    r.split
    json.load.keys
    pathlib.Path
    transformers.integrations.deepspeed_init
    special_tokens.extend
    instance.strip.rstrip
    write_updates
    os.environ.get
    collections.defaultdict.items
    torch.utils.data.DataLoader
    self.context_tokenizer.items
    get_longest
    self.RankArgs
    fact.split.detok.detokenize.replace.replace.replace
    pathlib.Path.exists
    s.startswith
    try_recovery
    collections.defaultdict
    tmp_heights.append
    idx.by_idx.append
    logging.getLogger.info
    self.callback_handler.on_prediction_step
    model
    maybe_split
    matplotlib.pyplot.xticks
    nltk.tokenize.treebank.TreebankWordDetokenizer.detokenize
    random.choice.replace
    map_triples_to_facts
    f.write
    resolve_redirect
    grp.tuple.all_unique.append
    collections.defaultdict.add
    all_series.append
    collections.Counter.most_common
    torch.stack
    item.replace.strip.replace
    neuraldb.modelling.neuraldb_trainer.NeuralDBTrainer.save_metrics
    EvalPredictionWithMetadata
    get_unit
    name.replace.replace.replace
    resolve_later_ref.append
    read_questions_into_dict.items
    torch.utils.data.sampler.WeightedRandomSampler
    collections.defaultdict.values
    all_grams.extend
    matplotlib.pyplot.savefig
    sampled.extend
    refs.split.replace
    convert_numeric_hypothesis
    hdr.replace
    name.replace.replace.replace.replace.replace.replace.replace
    json.load.get
    ax.plot
    map
    question_template.split
    self.model.generate
    context_outputs.pooler_output.T.question_outputs.pooler_output.torch.matmul.cpu
    get_generator
    sizes.append
    r.final_templates.keys
    relation_id.additional_subjects.get.keys.set.union
    name.replace.replace
    float
    Exception
    glob.glob
    collections.defaultdict.keys
    generate_positive_question
    list.append
    v.to
    matplotlib.pyplot.plot
    subj.by_relation.extend
    self.tokenizer.convert_tokens_to_ids
    next_actions.tolist.tolist
    long_questions.extend
    new_search.append
    stripped_template.replace.replace
    is_valid_file
    population.pop
    find_matches
    r.source_mutations.keys
    generate_answers
    resolve_first_ref
    self.maybe_tokenize_db
    result.strip
    self.database_reader.load_instances
    loaded.append
    self.model.prepare_decoder_input_ids_from_labels
    logging.getLogger.debug
    relation_id.additional_objects.get.keys.set.union
    question_template.split.strip
    aggr.update
    numpy.concatenate
    pandas.pivot_table
    logging.getLogger
    transformers.integrations.is_deepspeed_zero3_enabled
    f.read
    transformers.trainer_utils.is_main_process
    hasattr
    subject_name.q.replace.replace
    pydash.get
    pandas.DataFrame
    join
    normalize_subject
    resolve_later_ref.split
    object_id.startswith
    model.half.to.half
    added_instances.append
    matplotlib.pyplot.subplots
    subj.by_subject.extend
    rel.subj.by_sub_rel.append
    join_decoded
    sro.startswith
    resolve_later_ref
    feature.items
    hdr.replace.replace
    os.path.exists
    bool
    build_questions_for_db
    batch.append
    k.replace
    os.path.isdir
    tmp_rels.append
    instance.strip.rstrip.replace.replace.split
    merge_type
    read_csv.items
    self._prepare_inputs
    rel.split.split_by_relation.extend
    torch.nn.Softmax
    partition_questions
    statement.replace
    all
    predicted.split
    answer_sizes.append
    dataset.append
    print
    threshold.cos_scores.np.nonzero.squeeze
    map_triples_to_facts.keys
    torch.no_grad
    transformers.trainer_utils.denumpify_detensorize.pop
    isinstance
    collections.OrderedDict
    found_sro.hf.set.union
    generate_derivations.append
    TFIDFRetriever.closest_docs
    answer.strip
    query.active_questions.append
    search_toks.index
    element.split
    tokenizer.pad_token.tokenizer.pad_token.tokenizer.eos_token.tokenizer.eos_token.tokenizer.bos_token.tokenizer.bos_token.label.replace.replace.replace.strip
    nltk.word_tokenize
    itertools.chain
    ordering.index
    subject_name.question_template.replace.replace
    str
    all_losses.mean.item
    clean_title
    actual.split
    transformers.trainer_pt_utils.find_batch_size
    dev_examples.append
    partition_subject
    k.strip.answers.append
    derivation.strip.split
    json.dumps
    question_template.replace.replace
    qh1_filtered.append
    stds.extend
    sentence_transformers.util.pytorch_cos_sim
    self._wrap_model
    operator.itemgetter
    outputs.outputs.dict.outputs.isinstance.mean
    sample_databases
    final_questions.append
    random.uniform
    logging.getLogger.warning
    range
    subj.by_object.extend
    evaluate_ndb_with_ssg
    os.getenv
    repr
    o.by_object.append
    statement.replace.replace
    context_tokens.append
    pandas.set_option
    itertools.repeat
    get_instances_from_file
    precision
    ax.fill_between
    csv.DictReader
    linearize
    transformers.HfArgumentParser.parse_args_into_dataclasses
    item.replace.strip
    tokenizer.pad_token.tokenizer.pad_token.tokenizer.eos_token.tokenizer.eos_token.tokenizer.bos_token.tokenizer.bos_token.label.replace.replace.replace.strip.split
    matplotlib.pyplot.legend
    dpr.context_model.eval
    logging.getLogger.critical
    all_metadata.extend
    self.collection.find_one
    wikidata_common.wikpedia.Wikipedia
    medium_questions.extend
    tokenizer.batch_decode
    group_derivations
    b.count
    copy.copy
    process_lists
    key.startswith
    setuptools.find_packages
    json.loads.split
    partition_relation
    q.strip
    numpy.mean
    question_template.replace
    object_name.replace
    TFIDFRetriever.lookup
    question.split.strip
    features.keys
    nltk.ngrams
    fact.split.detok.detokenize.replace.replace.split
    tokenizer.add_special_tokens
    transformers.DPRQuestionEncoder.from_pretrained
    collections.Counter.update
    list
    model.half.to
    kvp.split
    hdr.replace.replace.replace.replace.replace
    v.torch.LongTensor.to
    new_states.append
    object_name.replace.replace
    template.keys.set.difference
    search_str.clean.strip
    example.append
    read_csv
    fact.split.detok.detokenize.replace.replace
    torch.matmul
    dataclasses.field
    query_obj.range.set.difference
    majority_vote
    ndb_data.util.log_helper.setup_logging
    numpy.argmax
    tmp_positive_answers.append
    neuraldb.modelling.neuraldb_trainer.NeuralDBTrainer.train
    self._process_query
    v.len.by_len.append
    logging.root.addHandler
    generate_facts_for_db.append
    transformers.DPRContextEncoder.from_pretrained
    wikidata_common.wikpedia.Wikipedia.resolve_redirect
    set
    ndb_data.wikidata_common.wikidata.Wikidata
    prediction.replace.split
    filter
    neuraldb.modelling.neuraldb_trainer.NeuralDBTrainer.save_state
    neuraldb.modelling.neuraldb_trainer.NeuralDBTrainer.save_model
    setup_logging
    min
    lookup_entity
    all_subjects.set.union
    self.subsampler.maybe_drop_sample
    generate_joins_extra
    neuraldb.modelling.neuraldb_trainer.NeuralDBTrainer.log_metrics
    tokenizer.eos_token.tokenizer.eos_token.tokenizer.bos_token.tokenizer.bos_token.label.replace.replace.replace
    subj.by_sub_rel.append
    round
    sampled.append
    neuraldb.dataset.neuraldb_parser.NeuralDBParser.load_instances
    sampled_fact.strip
    dpr.question_model.eval
    in_dict.items
    dict_flatten
    self._prepare_inputs.items
    collection.insert_many
    pymongo.UpdateOne
    numpy.count_nonzero
    num_fact_used.append
    get_bool_breakdown
    reader_cls.read
    sentence_transformers.losses.ContrastiveLoss
    transformers.DPRContextEncoderTokenizer.from_pretrained
    neuraldb.evaluation.scoring_functions.average_score
    numpy.std
    sorted
    read_dump
    matplotlib.pyplot.hlines
    v.split.strip.strip
    q.by_qid.append
    find_longest_match
    ssg_data.append
    tokenizer.eos_token.tokenizer.eos_token.tokenizer.bos_token.tokenizer.bos_token.pred.replace.replace.replace
    generate_joins_filter
    transformers.DPRQuestionEncoderTokenizer.from_pretrained
    matplotlib.pyplot.ylabel
    self.context_model
    DPRRetriever.lookup
    i.values
    transformers.trainer_pt_utils.nested_truncate
    ssg_output.remove
    relation_id.extra_subjects.get.keys
    matplotlib.pyplot.style.use
    db.split.rsplit
    neuraldb.evaluation.scoring_functions.breakdown_score
    delattr
    qid.split
    open.read
    self.tokenizer.pad.append
    tokenizer.pad_token.tokenizer.pad_token.tokenizer.eos_token.tokenizer.eos_token.tokenizer.bos_token.tokenizer.bos_token.pred.replace.replace.replace.strip.split
    local_obs.append
    neuraldb.dataset.instance_generator.subsampler.Subsampler
    fact.split.detok.detokenize.replace
    torch.cat
    prediction.split.replace
    transformers.AutoTokenizer.from_pretrained.tokenize
    scoring_function
    self.tokenizer.tokenize
    short_questions.extend
    self.features.keys
    numpy.sum
    self._generate
    self._process_answer
    transformers.DPRQuestionEncoder.from_pretrained.to
    tok.startswith
    k.strip
    math.pow
    subject.get
    numpy.concatenate.mean
    generate_negative_bool
    q.startswith
    s.strip
    additional_ids.difference.difference
    question.strip
    instance.strip.rstrip.replace
    random.choices
    matplotlib.pyplot.xlabel
    ndb_data.wikidata_common.wikidata.Wikidata.get_by_id_or_uri
    set.append
    reader_cls
    normalize_subject.replace
    generate_derivations
    self.concatenate_answer
    transformers.trainer_utils.denumpify_detensorize.keys
    actual.join_decoded.lower
    rel.by_sub_rel.append
    state.copy.append
    self.context_tokenizer
    self.prediction_step
    transformers.AutoTokenizer.from_pretrained.decode
    check_match
    derivation.strip.startswith
    self.compute_metrics
    random.choice
    derivation.rsplit
    len.keys
    inputs.outputs.self.label_smoother.mean
    make_symmetric
    ndb_data.construction.make_database_initial.normalize_subject
    transformers.AutoModelForSeq2SeqLM.from_pretrained.resize_token_embeddings
    super.__init__
    matplotlib.pyplot.title
    question_template.replace.replace.replace
    self.collection.find
    get_size_bin
    b.strip.first_bit.strip
    template.keys
    claim.pydash.get.values
    ef.write
    q_idx.db_idx.questions_answers.append
    len
    random.random
    numpy.min
    prediction.replace.replace
    k.rel_avgs.append
    get_bool_ans
    argparse.ArgumentParser.parse_args
    prediction.split
    json.loads.replace
    neuraldb.dataset.seq2seq_dataset.Seq2SeqDataset
    ValueError
    prediction.split.replace.replace
    self.validation_file.split
    all_stds.append
    tmp_derivations.append
    sentence_transformers.evaluation.BinaryClassificationEvaluator.from_input_examples
    torch.cuda.amp.autocast
    try_numeric
    self.test_file.split
    generate_joins
    self.instance_generator.generate
    clean.append
    name.replace.replace.replace.replace.replace.replace
    neuraldb.dataset.data_collator_seq2seq.DataCollatorForSeq2SeqAllowMetadata
    context_outputs.pooler_output.T.question_outputs.pooler_output.torch.matmul.cpu.detach.numpy.argsort
    pred.replace
    key.added_q_type_bin.append
    by_subj.keys.set.difference
    logging.StreamHandler
    next
    extended_question_answers.append
    sentence_transformers.SentenceTransformer
    self.answer_delimiter.join
    numpy.where
    wikidata_common.wikidata.Wikidata.find_custom
    torch.zeros
    logging.getLogger.error
    instance.update
    format
    collection.find
    super.compute_loss
    context_outputs.pooler_output.T.question_outputs.pooler_output.torch.matmul.cpu.detach
    ndb_data.wikidata_common.kelm.KELMMongo
    logging.root.setLevel
    get_indexable
    load_experiment
    drqascripts.retriever.build_tfidf_lines.OnlineTfidfDocRanker
    extra_negative_facts.append
    collections.Counter
    ndb_data.wikidata_common.kelm.KELMMongo.find_entity_rel
    torch.LongTensor
    any
    pydash.get.items
    sum
    setuptools.setup
    wikidata_common.wikidata.Wikidata
    model.half.to.eval
    clean
    second.nested.n_count.add
    lengths.append
    hyp.original_for.append
    self._nested_gather
    question_template.replace.replace.replace.replace.replace
    os.path.basename
    index_dump
    bulks.append
    matplotlib.pyplot.fill_between
    zip
    search_key.result.n_count.add
    json.loads.append
    generate_db_facts
    self._pad_tensors_to_max_len
    singleton_questions.extend
    elem.to
    transformers.AutoTokenizer.from_pretrained
    statement.replace.replace.replace
    compute_f1
    set.add
    transformers.AutoTokenizer.from_pretrained.encode
    bz2.open
    similarity.normalized_levenshtein.NormalizedLevenshtein
    argparse.ArgumentParser.add_argument
    transformers.trainer_utils.EvalLoopOutput
    generate_derivations.extend
    s.copy
    context_outputs.pooler_output.T.question_outputs.pooler_output.torch.matmul.cpu.detach.numpy.argsort.tolist
    generate_hypotheses
    hdr.replace.replace.replace.replace
    sentence_transformers.SentencesDataset
    retokenize
    self.context_delimiter.join
    out_file.write
    self._maybe_sample
    label.replace
    json.loads
    transformers.set_seed
    self.tokenizer.encode_plus
    type
    question.strip.replace
    neuraldb.evaluation.scoring_functions.f1
    json.load.items
    additional_ids.difference.update
    sentence_transformers.InputExample
    o.startswith
    random.shuffle
    transformers.AutoConfig.from_pretrained
    sentence_transformers.SentenceTransformer.encode
    re.match
    prediction.split.replace.replace.lower
    math.floor
    additional_subjects.keys.set.union
    derivation.split
    datasets.tqdm
    self._pad_across_processes
    instance.questions.append
    name.replace
    question.replace
    self.tokenizer.as_target_tokenizer
    question_template.replace.replace.replace.replace
    recall
    read_databases
    candidate_negatives_1.append
    plot.append
    question_template.startswith
    numpy.max
    nltk.tokenize.treebank.TreebankWordDetokenizer
    ctx.insert
    isinstance.items
    self._load_instances
    local_f.append
    qbin.qtype.all_questions_binned.append
    generate_facts_for_db
    outputs.outputs.dict.outputs.isinstance.mean.detach
    get_numeric_value
    tmp_types.append
    normalize_subject.split
    self.question_types.values
    random.choice.startswith
    context_outputs.pooler_output.T.question_outputs.pooler_output.torch.matmul.cpu.detach.numpy
    collection.bulk_write
    ndb_data.wikidata_common.kelm.KELMMongo.close
    transformers.trainer_utils.denumpify_detensorize
    ssg_utils.read_NDB
    instance.split.TreebankWordDetokenizer.detokenize.replace.replace
    data_files.items
    instance.copy.strip
    super
    generator_cls
    subject_name.question.replace.replace
    r.final_templates.keys.set.difference
    self.tokenizer.add_tokens
    dataset.extend
    all_experiments.append
    plot.sort
    self.num_examples
    json.dump
    object.get
    self._prepare_inputs.pop
    v.split.strip
    collection.estimated_document_count
    expt.update
    neuraldb.dataset.neuraldb_parser.NeuralDBParser
    q.replace
    keys.split
    final_sets.append
    db.extend
    tmp_fact_ids.append
    hyp.extra_kelm_for.append
    self.train_file.split
    kwargs.get
    argparse.ArgumentParser.error
    numpy.nonzero
    similarity.normalized_levenshtein.NormalizedLevenshtein.similarity
    argparse.ArgumentParser
    config_kwargs.update
    refs.split.split
    startptr.toks.join.clean.split
    keys.split.strip
    db_idx.to_add.append
    ndb_data.generation.question_to_db.generate_answers
    tuple
    transformers.utils.logging.set_verbosity_info
    shutil.rmtree
    additional_objects.keys.set.union
    pandas.pivot_table.to_records
    NotImplementedError
    a.strip
    super.prediction_step
    itertools.product
    tqdm.tqdm
    derivation.strip
    subject_name.islower
    random.choice.split
    post_process_instances
    property.property_entity.append
    matplotlib.pyplot.show
    os.unlink
    final_period
    self.tokenizer.decode
    instance.strip.rstrip.replace.replace
    swap_so
    question.split
    v.strip.answers.append
    re.match.group
    outputs.outputs.dict.outputs.isinstance.mean.detach.repeat
    relation_id.additional_objects.get.keys
    line.rstrip
    instance.split.TreebankWordDetokenizer.detokenize.replace
    self.tokenizer.pad
    numpy.cumsum
    copy.copy.extend
    tuple.startswith
    partition_idx
    partition_subject.keys
    functools.reduce
    states.pop.copy
    hyp.hypotheses_facts.append
    transformers.utils.logging.enable_default_handler
    others.append
    read_questions_into_dict
    self.maybe_decorate_with_metadata
    obj.startswith
    v.split.strip.split
    neuraldb.modelling.neuraldb_trainer.NeuralDBTrainer.evaluate
    object_name.islower
    cos_scores.cpu.cpu
    transformers.trainer_pt_utils.nested_concat
    json.load
    b.strip
    self.question_tokenizer
    series.extend
    partition_subject_relation
    transformers.trainer_pt_utils.nested_numpify
    get_file_stats
    bring_extra_facts
    pymongo.MongoClient
    read_csv.lower
    json.loads.strip
    postprocess_text
    numpy.percentile
    transformers.utils.logging.enable_explicit_format
    os.listdir
    random.sample
    os.makedirs
    derivation.split.strip
    hf.keys
    self.label_smoother
    sentence_transformers.SentenceTransformer.fit
    is_valid_folder
    read_questions_into_dict.keys
    tokenizer.pad_token.tokenizer.pad_token.tokenizer.eos_token.tokenizer.eos_token.tokenizer.bos_token.tokenizer.bos_token.pred.replace.replace.replace.strip
    TFIDFRetriever
    transformers.trainer_utils.get_last_checkpoint
    collections.Counter.items
    subject_name.modifier.is_subject.q.replace.replace
    transformers.DPRContextEncoder.from_pretrained.to
    out.extend
    subject_name.fact.replace.replace
    extract_operator
    inputs.outputs.self.label_smoother.mean.detach
    ssg_utils.create_dataset
    self.question_tokenizer.items
    lookup_relation
    start.toks.join.startswith
    r.by_relation.append
    derv.split
    max
    derv.tokenizer.encode.tokenizer.decode.strip
    tokenizer.bos_token.tokenizer.bos_token.label.replace.replace
    int
    remove_lst.append
    random.randint
    transformers.AutoModelForSeq2SeqLM.from_pretrained
    neuraldb.modelling.neuraldb_trainer.NeuralDBTrainer
    example.self.tokenizer.convert_tokens_to_ids.self.tokenizer.decode.split
    open
    set.update
    derivation.rsplit.strip
    hdr.replace.replace.replace
    enumerate
    k.startswith
    pandas.DataFrame.select_dtypes
    states.pop
    weights.append
    v.items
    DPRRetriever
    name.replace.replace.replace.replace.replace
    tmp_questions.append
    of.write
    main
    tokenizer.bos_token.tokenizer.bos_token.pred.replace.replace
    predicted.join_decoded.lower
    unit_uri.replace
    relation_id.additional_subjects.get.keys
    flatten_dicts
    transformers.HfArgumentParser
    self.maybe_tokenize_answer
    convert_comparable
    self._prepend_prediction_type_answer
    self.question_model
    q.q_heights.append
    train_examples.append
    name.replace.replace.replace.replace
    dict
    set.items
    evaluation_metrics
    pydash.get.values
    virtual_features.extend
    s.by_subject.append
    batch_update.append
    relation_id.extra_objects.get.keys
    name.replace.replace.replace.replace.replace.replace.replace.replace
    neuraldb.util.log_helper.setup_logging
    search_toks.append
    wikidata_common.wikidata.Wikidata.get_by_id_or_uri
    

@developer Could you please help me check this issue? May I open a pull request to fix it? Thank you very much.

    opened by PyDeps 0
  • questions about spj_rand

1. What does spj_rand mean? Is spj_rand the same as ssg+spj? If not, how can I run ssg+spj?

2. I'm trying to reproduce the results of the ACL paper. I have executed 'bash scripts/experiments_ours.sh v2.4_25' and 'bash scripts/experiments_baseline.sh v2.4_25'. After executing 'python -m neuraldb.final_scoring' I got the results shown in the attached screenshot.

The results only cover minmax/set/bool/count. How do I output the results for the atomic and join query types?

    opened by WenxiongLiao 1
  • Some fix in dataset construction

    Hi,

I am working with NeuralDB and would like to contribute these small fixes to the original project:

    • I have improved requirements.txt, since some packages were missing.
    • I have added a setup.sh that installs the requirements and the NLTK punkt dependency.
    • I have removed the logger from make_database_initial.py, since ndb_data.util.log_helper is missing.

    Kind regards, Andrea Bacciu

    CLA Signed 
    opened by andreabac3 0
  • Performance problem combination of SSG + SPJ

    Hey,

My team and I are facing an issue with the combination of SSG and SPJ. We trained both components, and their performance is quite good taken separately. But as soon as we test the NDB end to end, performance drops. We get 0.55 and 0.87 precision and recall for the SSG, and a 0.89 F1 score for the SPJ, but 0.131 for the combination of SSG + SPJ. Based on the table in the Neural Databases article, we expected better results. Do you have any idea why this is happening?

    Why can't we predict the other types of questions for SSG + SPJ ... ?
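As a rough illustration of why per-stage scores do not simply carry over (a back-of-the-envelope sketch, not the paper's evaluation; the independence assumption is purely illustrative): if the SPJ can only answer correctly when the SSG retrieves every needed fact, and the stages failed independently, the end-to-end score would be roughly bounded by the product of the stage scores quoted above.

```python
# Naive independence sketch using the scores quoted in this issue.
ssg_score = 0.55  # SSG precision reported above
spj_f1 = 0.89     # SPJ F1 in isolation

# Under independent failures, the end-to-end score is roughly bounded
# by the product of the stage scores.
bound = ssg_score * spj_f1
print(f"naive end-to-end bound: {bound:.2f}")  # prints 0.49
```

The observed 0.131 is well below even this crude bound, which suggests the stage errors are correlated, for example aggregation-style queries failing whenever the support set is missing a single fact.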

(screenshots of the result tables attached)

    Thanks :)

    opened by MargauxParentAubay 6
  • "kelm_file.jsonl" is missing

Hey, I downloaded the Mongo Wikidata dump from Google Drive and restored the MongoDB dump successfully. The database is created with the collection wiki_graph (with the indexes wikidata_id, english_name, english_wiki, and sitelinks.title) and the collection wiki_redirect (with the index title).

I am stuck at section 1.2 of the README in the dataset-construction folder, having followed all of its steps. Please help me with the location of "kelm_file.jsonl" so I can download it and reproduce the results.

    Thanks.

    opened by ajaysh2193 2
  • HuggingFace model for inference testing

    Hey team,

    Is there a chance of releasing the final finetuned models on HuggingFace or in-general, for inference testing purposes on the downloaded NeuralDB dataset?

    Thanks.

    opened by Rock-Anderson 0
Owner
Facebook Research