Label data using HuggingFace's transformers and automatically get a prediction service

Overview

Label Studio for Hugging Face's Transformers



Transfer learning for NLP models by annotating your textual data without any additional coding.

This package provides a ready-to-use container that links together Label Studio as the annotation frontend and Hugging Face Transformers models as the machine learning backend.


Quick Usage

Install Label Studio and other dependencies

pip install -r requirements.txt
Create an ML backend with a BERT classifier
label-studio-ml init my-ml-backend --script models/bert_classifier.py
cp models/utils.py my-ml-backend/utils.py

# Start ML backend at http://localhost:9090
label-studio-ml start my-ml-backend

# Start Label Studio in a new terminal with the same Python environment
label-studio start
  1. Create a project with Choices and Text tags in the labeling config.
  2. Connect the ML backend in the Project settings with http://localhost:9090
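For reference, a minimal labeling config for the classifier case might look like the following sketch. The tag names and choice values here are illustrative, and the value attribute ($text) must match the key of the text field in your imported tasks:

```xml
<View>
  <!-- The text to classify; $text must match the task data key -->
  <Text name="text" value="$text"/>
  <!-- Classification choices; values are placeholders for your own classes -->
  <Choices name="sentiment" toName="text" choice="single">
    <Choice value="Positive"/>
    <Choice value="Negative"/>
  </Choices>
</View>
```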
Create an ML backend with a BERT named entity recognizer
label-studio-ml init my-ml-backend --script models/ner.py
cp models/utils.py my-ml-backend/utils.py

# Start ML backend at http://localhost:9090
label-studio-ml start my-ml-backend

# Start Label Studio in a new terminal with the same Python environment
label-studio start
  1. Create a project with Labels and Text tags in the labeling config.
  2. Connect the ML backend in the Project settings with http://localhost:9090
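Likewise, a minimal labeling config for the NER case might look like this sketch; the label values are illustrative, and the value attribute must match your task data key:

```xml
<View>
  <!-- Entity labels; values are placeholders for your own entity types -->
  <Labels name="label" toName="text">
    <Label value="PER"/>
    <Label value="ORG"/>
    <Label value="LOC"/>
  </Labels>
  <!-- The text to annotate; $text must match the task data key -->
  <Text name="text" value="$text"/>
</View>
```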

Training and inference

The browser opens at http://localhost:8080. Upload your data on the Import page, then annotate it on the Labeling page. Once you've annotated a sufficient amount of data, go to the Model page and press the Start Training button. When training finishes, the model automatically starts serving predictions to Label Studio, and you'll find all model checkpoints inside the my-ml-backend/ directory.
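For orientation, a single NER prediction returned to Label Studio follows its standard result format, roughly like this sketch (the span, text, and score values are illustrative):

```json
{
  "result": [
    {
      "from_name": "label",
      "to_name": "text",
      "type": "labels",
      "value": {
        "start": 54,
        "end": 74,
        "text": "sequence-to-sequence",
        "labels": ["ORG"]
      }
    }
  ],
  "score": 0.87
}
```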

Read more about how to use the Machine Learning backend and how to build Human-in-the-Loop pipelines in the Label Studio documentation.

License

This software is licensed under the Apache 2.0 License © 2020 Heartex.

Comments
  • How to use the prediction service?

    Hi, thanks for this great tool. However, I couldn't find detailed instructions for using the prediction service either here or at https://github.com/heartexlabs/label-studio/blob/master/docs/source/guide/tasks.md. I'd like to generate NER annotations after training my model, select the uncertain predictions, and then continue labeling. Thanks in advance.

    opened by tienduccao 11
  • AttributeError: 'TransformersBasedTagger' object has no attribute '_tokenizer'

    predict run into error: label-studio-ml-backend/ner-ml-backend/ner.py", line 368, in predict predict_set = SpanLabeledTextDataset(texts, tokenizer=self._tokenizer, **self._dataset_params_dict) AttributeError: 'TransformersBasedTagger' object has no attribute '_tokenizer'

    opened by whisere 10
  • --ml-backend-url option changed and where should i put --ml-backend-name ?

    The current label-studio start command in the docker-compose.yml contains these options. https://github.com/heartexlabs/label-studio-transformers/blob/9450322/docker-compose.yml#L15-L16

          --ml-backend-url http://label-studio-ml-backend:9090
          --ml-backend-name my_model
    

    but the current label-studio doesn't have them.

    I ran label-studio start -h and found it changed to --ml-backend. I fixed this and could see localhost:8200.

    but where should I put a model name?

    opened by miyamonz 4
  • label-studio requirement incorrect, also getting old

    The README example does not work as is -- label-studio==1.0.0 does not provide the command label-studio-ml, and does not expose LabelStudioMLBase.

    It works OK with label-studio==0.7, but that's not what's specified in requirements.txt.

    (NB that it also doesn't work with the current head of label-studio-ml-backend).

    opened by RevelaAutumn 3
  • Can't make predictions: ML backend returns an error (ner.py)

    Steps to reproduce:

    1. Used docker to start the server: docker-compose up --build
    2. Imported a sample with three tasks: [{"text":"To have faith is to trust yourself to the water"},{"text":"To have faith is to trust yourself to the water"},{"text":"To have faith is to trust yourself to the water"}]
    3. Completed two tasks and trained the huggingface transformer from ner.py.
    4. Went to the UI for the third task's prediction.
    5. No prediction.

    Requirements: torch==1.5.0 transformers==2.4.1 tensorboardX==1.9 label-studio>=0.7.0

    Full logs are here:

    [2020-08-31 15:07:49,882] [ERROR] [label_studio.utils.models::make_predictions::528] Can't make predictions: ML backend returns an error: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
    <title>500 Internal Server Error</title>
    <h1>Internal Server Error</h1>
    <p>The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.</p>
    

    Can you help me please with this issue?

    opened by Ecclesiast 3
  • Error with ner.py

    When using the quick start for BERT NER: label-studio-ml init my-ml-backend --script models/ner.py

    This error occurs: AttributeError: type object 'BertConfig' has no attribute 'pretrained_config_archive_map'

    opened by somejonus 2
  • KeyError: 'ner' when predicting NER data using the NER samples

    When predicting NER data, the following error occurred. I printed the tasks and found that where the key should be 'ner', the data contained '$undefined$'.

    tasks data:

    tasks:[{'id': 17, 'data': {'$undefined$': 'This work proposes a novel adaptation of a pretrained sequence-to-sequence model to the task of document ranking.'}, 'meta': {}, 'created_at': '2021-07-05T02:30:36.230799Z', 'updated_at': '2021-07-05T02:30:36.230834Z', 'is_labeled': True, 'overlap': 1, 'project': 10, 'file_upload': 6, 'annotations': [{'id': 17, 'created_username': ' [email protected], 1', 'created_ago': '0\xa0minutes', 'completed_by': 1, 'result': [{'value': {'start': 54, 'end': 74, 'text': 'sequence-to-sequence', 'labels': ['ORG']}, 'id': 'FA_HTHugoH', 'from_name': 'label', 'to_name': 'text', 'type': 'labels'}], 'was_cancelled': False, 'ground_truth': False, 'created_at': '2021-07-05T02:38:39.940285Z', 'updated_at': '2021-07-05T02:38:39.940321Z', 'lead_time': 11.491, 'task': 17}], 'predictions': []}]
    

    error:

    [2021-07-05 10:38:40,048] [ERROR] [label_studio_ml.exceptions::exception_f::53] Traceback (most recent call last):
      File "/workspace/label-studio/label-studio-ml-backend/label_studio_ml/exceptions.py", line 39, in exception_f
        return f(*args, **kwargs)
      File "/workspace/label-studio/label-studio-ml-backend/label_studio_ml/api.py", line 31, in _predict
        predictions, model = _manager.predict(tasks, project, label_config, force_reload, try_fetch, **params)
      File "/workspace/label-studio/label-studio-ml-backend/label_studio_ml/model.py", line 274, in predict
        predictions = m.model.predict(tasks, **kwargs)
      File "/workspace/label-studio/label-studio-transformers/ner-backend-test/ner.py", line 369, in predict
        texts = [task['data'][self.value] for task in tasks]
      File "/workspace/label-studio/label-studio-transformers/ner-backend-test/ner.py", line 369, in <listcomp>
        texts = [task['data'][self.value] for task in tasks]
    KeyError: 'ner'
    
    opened by gherao 1
  • No module named label_studio_ml.api while starting ML backend

    Was able to create ML backends successfully based on bert_classifier.py with: "label-studio-ml init my-ml-backend-bert --script models/bert_classifier.py"

    but while starting it with the command "label-studio-ml start my-ml-backend-bert" I'm getting the following error:

    File "././my-ml-backend-bert/_wsgi.py", line 30: from label_studio_ml.api import init_app. ImportError: No module named label_studio_ml.api

    Also tried with other classifiers from this source "https://github.com/heartexlabs/label-studio-ml-backend/tree/master/label_studio_ml/examples" but each of them gives me the same error while starting.

    opened by wojnarabc 1
  • How to load trained bert ner model in python and do prediction on a new text?

    Hi, I have trained a BERT NER model through the ML backend. I would like to share the trained model with my colleagues so they can use it to make predictions on new text data. How can we load the trained model in Python and run prediction on new text data?

    opened by zephyrwang 0
  • Fix NER script:

    'ALL_MODELS' is not needed and 'pretrained_config_archive_map' is deprecated.

    Fixed an error in predictions for label ends which led to one label running into the next one without accounting for 'O'-labels between them.

    opened by somejonus 0
  • not showing predictions after training

    Describe the bug There are several problems:

    1. After training the ML backend model, I cannot find the model predictions in the UI when labelling.
    2. Often cannot train all 100 epochs; the system crashes midway (at 30-70 epochs) although the dataset is small (50) and a GPU is available.
    3. Error shows that: get latest job results from work dir doesn't exist
    4. Sometimes, when 3 does not occur, the other issue is: unable to load weights from the pytorch checkpoint file.

    To reproduce Steps to reproduce the behaviour

    1. Import pre annotated data
    2. Manually label some of them
    3. Go to the ML UI in Settings, connect the model (BERT classifier) and start training
    4. After finishing, come back to Label UI. In prediction tab, only the pre annotated predictions are shown.

    Expected behaviour ML training should be completed and new predictions should be shown in UI

    opened by hienvantran 6
  • Ner.py pretrained_config_archive_map not found for any model

    During the initialization process label-studio-ml init smdia-backend-ner --script models/ner.py --force

    I'm receiving this error for all the models: AttributeError: type object 'BertConfig' has no attribute 'pretrained_config_archive_map'

    
    Traceback (most recent call last):
      File "/usr/local/bin/label-studio-ml", line 8, in <module>
        sys.exit(main())
      File "/usr/local/lib/python3.6/dist-packages/label_studio_ml/server.py", line 119, in main
        create_dir(args)
      File "/usr/local/lib/python3.6/dist-packages/label_studio_ml/server.py", line 73, in create_dir
        model_classes = get_all_classes_inherited_LabelStudioMLBase(script_path)
      File "/usr/local/lib/python3.6/dist-packages/label_studio_ml/utils.py", line 29, in get_all_classes_inherited_LabelStudioMLBase
        module = importlib.import_module(module_name)
      File "/usr/lib/python3.6/importlib/__init__.py", line 126, in import_module
        return _bootstrap._gcd_import(name[level:], package, level)
      File "<frozen importlib._bootstrap>", line 994, in _gcd_import
      File "<frozen importlib._bootstrap>", line 971, in _find_and_load
      File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
      File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
      File "<frozen importlib._bootstrap_external>", line 678, in exec_module
      File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
      File "/labelstudio/label-studio-transformers/models/ner.py", line 36, in <module>
        [list(conf.pretrained_config_archive_map.keys()) for conf in (BertConfig,CamembertConfig, RobertaConfig, DistilBertConfig)],
      File "/labelstudio/label-studio-transformers/models/ner.py", line 36, in <listcomp>
        [list(conf.pretrained_config_archive_map.keys()) for conf in (BertConfig,CamembertConfig, RobertaConfig, DistilBertConfig)],
    AttributeError: type object 'BertConfig' has no attribute 'pretrained_config_archive_map'
    

    I tried to downgrade transformers to 2.0.0, but then the transformers import fails.

    Could someone check this issue?

    opened by info2000 3
  • Fixing a 'key' error and an import error

    When I tried to reproduce the results in ner.py file, I had to fix two errors. One of them was a typo. Specifically, each completion (item) in fit method expects key 'annotations' (not 'completions') and that's what the author had intended. The other fix I had to make was not so obvious. Specifically, I see an import error with LabelStudioMLBase and changing the module it was importing form fixed the error. I learned this by seeing a couple of other label-studio code examples. I've not used bert.py file, and can't speak for these fixes in that file.

    opened by HAMZA310 0
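One of the comments above asks how to load a trained BERT NER checkpoint in Python outside of Label Studio. A minimal sketch, assuming the checkpoint was written with transformers' save_pretrained (the directory path below is a placeholder, not the backend's actual layout):

```python
# Sketch: load a transformers NER checkpoint and build an inference
# pipeline. This is not an official backend API; point checkpoint_dir
# at the directory where your ML backend saved its model files.
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline


def load_ner_pipeline(checkpoint_dir: str):
    """Build a token-classification pipeline from a saved checkpoint directory."""
    tokenizer = AutoTokenizer.from_pretrained(checkpoint_dir)
    model = AutoModelForTokenClassification.from_pretrained(checkpoint_dir)
    return pipeline("ner", model=model, tokenizer=tokenizer)


# Usage (the path is an assumption -- substitute your actual checkpoint dir):
# ner = load_ner_pipeline("my-ml-backend/<work_dir>/<checkpoint>")
# print(ner("To have faith is to trust yourself to the water"))
```

Colleagues can then run predictions on new text by calling the returned pipeline directly, without a running Label Studio instance.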