Overview

Task-Informed Meta-Learning

This repository contains examples of Task-Informed Meta-Learning (paper).

We consider two tasks, each of which acts as its own self-contained codebase; for more details on running the experiments, please check their respective READMEs.

Getting started

For both tasks, Anaconda running Python 3.6 is used as the package manager. To get set up with an environment, install Anaconda and (from either of the task directories) run

conda env create -f environment.yml

Once the environment is activated, the main script for training the models is deep_learning.py, with the model configurations controlled by config.py.
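
For example, a training run might look like this (the environment name timl is an assumption; the actual name is defined in environment.yml):

conda activate timl
python deep_learning.py
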

Comments
  • Inference server for timl

    This PR adds timl inference deployment! 🚀

    Essentially this means that files in crop_classification/models/*.pt will be deployed to Google Cloud on each main build.

    Components:

    • A torchserve handler for running deployed timl predictions (see the sketch below)
    • A Dockerfile for housing the torchserve handler
    • A Google Cloud function for triggering inference when a new file is uploaded to a specified bucket
    • A script for deploying the docker container and the Google Cloud function to Google Cloud
    • An addition to ci.yml to run the above deployment script during CI
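
    As a rough, hypothetical sketch (not the repository's code), the handler component could look like the following; BaseHandler already loads the .pt model, so only request and response shaping is shown, and the JSON input format is an assumption:

        # Hypothetical TorchServe handler sketch for a TIML model
        import json

        import torch
        from ts.torch_handler.base_handler import BaseHandler

        class TIMLHandler(BaseHandler):
            def preprocess(self, data):
                # Each request row is assumed to carry a JSON-encoded
                # 2-D array of shape (timesteps, bands)
                rows = []
                for record in data:
                    body = record.get("data") or record.get("body")
                    if isinstance(body, (bytes, bytearray)):
                        body = body.decode("utf-8")
                    rows.append(json.loads(body))
                return torch.tensor(rows, dtype=torch.float32)

            def postprocess(self, outputs):
                # One crop probability per input instance, as plain floats
                return outputs.squeeze(-1).tolist()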

    Completes remaining tasks in #4

    opened by ivanzvonkov 5
  • Finetuning & model saving script

    • [x] Finetune a TIML model for any country and target label, and save it
    • [x] Include the normalizing dictionary when saving the model

    cc: https://github.com/nasaharvest/timl/pull/6#issuecomment-1067171438

    opened by gabrieltseng 3
  • Inference for timl

    Part of #4

    This PR introduces a standalone inference class which can be used to make predictions with a ckpt or jit model. This makes it possible to run predictions inside a torchserve server without many dependencies (you'll only need the inference class).

    I've left # TODO comments where some discussion would be useful.
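
    A minimal, hypothetical sketch of such a class (the actual class in this PR may differ; the ckpt branch assumes the checkpoint deserializes to a full model rather than a state dict):

        from pathlib import Path

        import torch

        class Inference:
            # Hypothetical standalone wrapper around a jit or ckpt model
            def __init__(self, model_path: str):
                path = Path(model_path)
                if path.suffix == ".pt":
                    # A jit model needs no model code to load
                    self.model = torch.jit.load(str(path))
                else:
                    # Assumes the checkpoint holds a full pickled model
                    self.model = torch.load(str(path))
                self.model.eval()

            @torch.no_grad()
            def predict(self, x: torch.Tensor) -> torch.Tensor:
                return self.model(x)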

    opened by ivanzvonkov 3
  • Ensure the right datasets are used for the fine-tuning

    • ✅ Ensure create_benchmark_datasets returns tasks from which classification labels can be retrieved
    • ✅ Remove the initializer function for TIMLCropHarvest, since it's identical to the original one
    • ✅ Use the right dataset in the deep learning script
    opened by gabrieltseng 0
  • Pin google-github-actions/setup-gcloud to v0 instead of master

    See the following GitHub Actions warnings for more information:

    Warning: google-github-actions/setup-gcloud is pinned at "master". We strongly advise against pinning to "@master" as it may be unstable. Please update your GitHub Action YAML from:
    
        uses: 'google-github-actions/setup-gcloud@master'
    
    to:
    
        uses: 'google-github-actions/setup-gcloud@v0'
    
    Alternatively, you can pin to any git tag or git SHA in the repository.
    Error: On 2022-04-05, the default branch will be renamed from "master" to "main". Your action is currently pinned to "@master". Even though GitHub creates redirects for renamed branches, testing found that this rename breaks existing GitHub Actions workflows that are pinned to the old branch name.
    
    We strongly advise updating your GitHub Action YAML from:
    
        uses: 'google-github-actions/setup-gcloud@master'
    
    to:
    
        uses: 'google-github-actions/setup-gcloud@v0'
    
    opened by gabrieltseng 0
  • `torch.jit` for the crop classifier

    To better integrate this into our inference pipeline, save the model as a torch.jit model.

    • [ ] Function to save the model after finetuning
    • [ ] Test to ensure the jit model has the same outputs as the unjitted model

    cc @ivanzvonkov
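
    A sketch of both steps with a stand-in model (the classifier, input shape, and save path are all assumptions for illustration):

        import torch
        from torch import nn

        # Stand-in for the finetuned TIML classifier (illustration only)
        model = nn.Sequential(nn.Flatten(), nn.Linear(12 * 18, 1), nn.Sigmoid()).eval()

        # Save the model as a torch.jit model
        jit_model = torch.jit.script(model)
        jit_model.save("model.pt")  # deployed models live in crop_classification/models/

        # Check the jit model has the same outputs as the unjitted model
        example = torch.randn(1, 12, 18)  # (batch, timesteps, bands), shape assumed
        with torch.no_grad():
            assert torch.allclose(model(example), jit_model(example), atol=1e-6)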

    opened by gabrieltseng 0
  • Deploy to Google Cloud

    To make predictions with TIML at scale, we'll deploy the model to Google Cloud. Deployment will consist of:

    • [x] jit model (https://github.com/nasaharvest/timl/pull/3)
    • [ ] Inference class for making predictions with jit model (https://github.com/nasaharvest/timl/pull/5)
    • [ ] Inference class working inside torchserve server
    • [ ] Docker container for housing torchserve server (deploy to Google Cloud Run)
    • [ ] Google Cloud Function to begin data export
    • [ ] Google Cloud Function to trigger inference (sketched below)
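
    As a hypothetical sketch of the trigger function, a background Cloud Function fired by a bucket upload could forward the new file to the torchserve server (the endpoint URL and payload format are assumptions):

        import requests

        # Assumed Cloud Run URL for the deployed torchserve container
        TORCHSERVE_URL = "https://timl-inference.a.run.app/predictions/timl"

        def trigger_inference(event, context):
            # Fired on google.storage.object.finalize for the watched bucket
            uri = f"gs://{event['bucket']}/{event['name']}"
            response = requests.post(TORCHSERVE_URL, json={"uri": uri})
            response.raise_for_status()
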
    opened by ivanzvonkov 0
Using BERT as the backbone model for LIME, designed for NLP task explanation (sentence-pair text classification)

Lime Comparing a deep contextualized model on the sentence-highlighting task, combining the classic explanation model LIME with a bert-base model

JHJu 2 Jan 18, 2022
DomainWordsDict, a Chinese word dictionary covering more than 68 domains, usable for text classification and knowledge-enhancement tasks

DomainWordsDict, a Chinese word dictionary covering more than 68 domains: a professional lexicon knowledge base of 9.16 million words in total, usable for NLP applications such as text classification, knowledge enhancement, and domain-vocabulary expansion

liuhuanyong 357 Dec 24, 2022
This repository contains all the source code that is needed for the project: An Efficient Pipeline For Bloom's Taxonomy Using Natural Language Processing and Deep Learning

Pipeline For NLP with Bloom's Taxonomy Using Improved Question Classification and Question Generation using Deep Learning. This repository contains all the source code needed for the project.

Rohan Mathur 9 Jul 17, 2021
PyTorch implementation of "data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language" from Meta AI

data2vec-pytorch PyTorch implementation of "data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language" from Meta AI

Aryan Shekarlaban 105 Jan 4, 2023
Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning

GenSen Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning. Sandeep Subramanian, Adam Trischler, Yoshua Bengio, and Christopher J. Pal

Maluuba Inc. 309 Oct 19, 2022
This repository contains the code for "Generating Datasets with Pretrained Language Models".

Datasets from Instructions (DINO) This repository contains the code for Generating Datasets with Pretrained Language Models.

Timo Schick 154 Jan 1, 2023
This repository contains data used in the NAACL 2021 Paper - Proteno: Text Normalization with Limited Data for Fast Deployment in Text to Speech Systems

Proteno This is the data release associated with the corresponding NAACL 2021 Paper - Proteno: Text Normalization with Limited Data for Fast Deployment in Text to Speech Systems

null 37 Dec 4, 2022
This repository contains the code for EMNLP-2021 paper "Word-Level Coreference Resolution"

Word-Level Coreference Resolution This is a repository with the code to reproduce the experiments described in the EMNLP 2021 paper of the same name.

null 79 Dec 27, 2022
This repository contains Python scripts for extracting linguistic features from Filipino texts.

Filipino Text Linguistic Feature Extractors This repository contains scripts for extracting linguistic features from Filipino texts.

Joseph Imperial 1 Oct 5, 2021
PyTorch Implementation of Meta-StyleSpeech: Multi-Speaker Adaptive Text-to-Speech Generation

StyleSpeech - PyTorch Implementation of Meta-StyleSpeech: Multi-Speaker Adaptive Text-to-Speech Generation.

Keon Lee 142 Jan 6, 2023
Official implementation of Meta-StyleSpeech and StyleSpeech

Meta-StyleSpeech: Multi-Speaker Adaptive Text-to-Speech Generation. Dongchan Min, Dong Bok Lee, Eunho Yang, and Sung Ju Hwang. This is the official code release.

min95 169 Jan 5, 2023
Code examples for my Write Better Python Code series on YouTube.

Write Better Python Code This repository contains the code examples used in my Write Better Python Code series published on YouTube.

null 858 Dec 29, 2022
PhoNLP: A BERT-based multi-task learning toolkit for part-of-speech tagging, named entity recognition and dependency parsing

PhoNLP is a multi-task learning model for joint part-of-speech (POS) tagging, named entity recognition (NER) and dependency parsing. Experiments on Vietnamese benchmark datasets show that PhoNLP produces state-of-the-art results, outperforming a single-task learning approach that fine-tunes the pre-trained Vietnamese language model PhoBERT for each task independently.

VinAI Research 109 Dec 2, 2022
A simple recipe for training and inferencing Transformer architecture for Multi-Task Learning on custom datasets. You can find two approaches for achieving this in this repo.

multitask-learning-transformers A simple recipe for training and inferencing Transformer architecture for Multi-Task Learning on custom datasets.

Shahrukh Khan 48 Jan 2, 2023
Task-based datasets, preprocessing, and evaluation for sequence models.

SeqIO: Task-based datasets, preprocessing, and evaluation for sequence models. SeqIO is a library for processing sequential data to be fed into downstream sequence models.

Google 290 Dec 26, 2022
Deduplication is the task of combining different representations of the same real-world entity.

Deduplication is the task of combining different representations of the same real-world entity. This package implements deduplication using active learning, which allows for rapid training without having to provide a large, manually labelled dataset.

null 63 Nov 17, 2022