Overview

Explore, label, and monitor data for AI projects

Rubrix is a free and open-source tool for exploring and iterating on data for artificial intelligence projects.

Rubrix focuses on enabling novel, human-in-the-loop workflows involving data scientists, subject matter experts, and ML/data engineers.

With Rubrix, you can:

  • Monitor the predictions of deployed models.
  • Label data for starting up or evolving an existing project.
  • Iterate on ground-truth and predictions to debug, track and improve your data and models over time.
  • Build custom applications and dashboards on top of your model predictions.

We've tried to make working with Rubrix easy and fun, while keeping it scalable and flexible.

Rubrix is composed of:

  • a Python library to bridge data and models, which you can install via pip.
  • a web application to explore and label data, which you can launch using Docker or directly with Python.

This is an example of the Rubrix UI in annotation mode:

[Screenshot: Rubrix annotation mode]

📖 For more information, visit the documentation. If you want to get started, keep reading.

Get started

To get started, you need to follow three steps:

  1. Install the Python client
  2. Launch the web app
  3. Start logging data

1. Install the Python client

You can install the Python client with pip:

pip install rubrix

2. Launch the web app

There are two ways to launch the web app:

  • Using docker-compose (recommended).
  • Executing the server code manually.

Using docker-compose (recommended)

Create a folder:

mkdir rubrix && cd rubrix

and launch the Docker-contained web app with the following command:

wget -O docker-compose.yml https://raw.githubusercontent.com/recognai/rubrix/master/docker-compose.yaml && docker-compose up

This is the recommended way because it automatically includes an Elasticsearch instance, Rubrix's main persistence layer.

Executing the server code manually

When executing the server code manually you need to provide an Elasticsearch instance yourself.

  1. First you need to install Elasticsearch (we recommend version 7.10) and launch an Elasticsearch instance. For macOS and Windows there are Homebrew formulae and an MSI package, respectively.
  2. Install the Rubrix Python library together with its server dependencies:
pip install rubrix[server]
  3. Launch a local instance of the Rubrix web app:
python -m rubrix.server

By default, the Rubrix server will look for your Elasticsearch endpoint at http://localhost:9200. If you want to customize this, you can set the ELASTICSEARCH environment variable pointing to your endpoint.
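
For example, to point the server at a non-default endpoint (hypothetical host shown) before launching it:

export ELASTICSEARCH=http://my-elastic-host:9200
python -m rubrix.server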

3. Start logging data

The following code will log one record into the example-dataset dataset:

import rubrix as rb

rb.log(
    rb.TextClassificationRecord(inputs="my first rubrix example"),
    name='example-dataset'
)
# BulkResponse(dataset='example-dataset', processed=1, failed=0)

If you go to your Rubrix app at http://localhost:6900/, you should see your first dataset.

Congratulations! You are ready to start working with Rubrix with your own data.

To better understand what's possible, take a look at Rubrix's Cookbook.
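
For instance, you can read the logged records back into Python. A minimal sketch, reusing the dataset logged above:

import rubrix as rb

# Read the records of "example-dataset" back into Python
dataset = rb.load(name="example-dataset")
print(dataset)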

Community

As a new open-source project, we are eager to hear your thoughts, fix bugs, and help you get started. Feel free to use the Discussion forum or open an Issue, and we'll be pleased to help out.

Comments
  • Add monitoring examples with FastAPI: Hugging Face and spaCy

    The idea would be to add a guide (as a Jupyter Notebook) to be included under docs/guides. This Jupyter notebook will showcase the RubrixLogHTTPMiddleware for monitoring the predictions of a FastAPI inference endpoint. Here is the example with Hugging Face + FastAPI:

    from fastapi import FastAPI
    from typing import List
    from transformers import pipeline
    from rubrix.client.asgi import RubrixLogHTTPMiddleware
    
    classifier = pipeline("sentiment-analysis", return_all_scores=True)
    
    app = FastAPI()
    
    # define the middleware for logging predictions into a Rubrix Dataset
    app.add_middleware(
        RubrixLogHTTPMiddleware,
        api_endpoint="/predict",
        dataset="monitoring_dataset_v1",
        # you could post-process the predict output with a custom record_mapper function
        # record_mapper=custom_text_classification_mapper,
    )
    
    # prediction endpoint
    @app.post("/predict")
    def predict_batch(batch: List[str]):
        predictions = classifier(batch)
        return [
            {
                "labels": [p["label"] for p in prediction],
                "probabilities": [p["score"] for p in prediction],
            }
            for prediction in predictions
        ]
    

    The steps would be to:

    1. Create a notebook and include the above example
    2. Add an example with a pre-trained transformer TokenClassifier (for example: https://huggingface.co/dslim/bert-base-NER)
    3. Add an example with a spaCy NER pipeline (see the sketch after this list).
    4. (Optionally) Include an example dashboard with Kibana (screenshots, gif or video)
    5. (Optionally) Include an example with ray serve
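
    For step 3, a minimal sketch of what the spaCy variant might look like (the model and dataset names are illustrative, not from the issue):

    import spacy
    from typing import List
    from fastapi import FastAPI
    from rubrix.client.asgi import RubrixLogHTTPMiddleware

    nlp = spacy.load("en_core_web_sm")

    app = FastAPI()

    # log every request/response pair of the endpoint into a Rubrix dataset
    app.add_middleware(
        RubrixLogHTTPMiddleware,
        api_endpoint="/predict",
        dataset="monitoring_spacy_ner",
    )

    @app.post("/predict")
    def predict_batch(batch: List[str]):
        # return the predicted entities for each text in the batch
        return [
            [{"entity": ent.text, "label": ent.label_} for ent in nlp(text).ents]
            for text in batch
        ]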
    documentation good first issue help wanted 
    opened by dvsrepo 19
  • updated readme with `conda` install instruction

    This closes #781.

    • [x] added conda installation instruction (rubrix is available on conda-forge channel)
    • [x] added badges:
      • [x] conda-forge/rubrix (with version)
      • [x] conda-forge/rubrix (with platform specification): example -- "noarch"
      • [x] docs badge
    opened by sugatoray 14
  • [NER Fine tuning] content selection

    Multi word

    Actual state: (VIEW SS) 1- I select various words, the highlight is grey and in a solid block (highlight/words). 2- When the selection is done, the highlight is split and the label selector appears.

    • [x] Should be:

    1- I select various words, the highlight is grey and split (highlight/word). 2- When the selection is done, the highlight is a solid block and the label selector appears.

    Delete labelling

    • [x] Make the whole tooltip clickable to delete

    Selection on a searched word

    • [x] Selection highlight should not be cut (SS)
    • [x] When the selection contains a searched word, the label selector does not appear (currently it only works in the right-to-left direction)
    • [x] In general: change the appearance of results: instead of an orange highlight, show the text in bold

    Cursor

    • [x] Active "hand" cursor (pointer) on piece of text already annotated/Predicted
    • [x] Active "Text Select" cursor on the rest of record
    • [x] Enlarge the hover state to the whole area : (record + annotated tooltip + empty space between them) (record + predicted tooltip + empty space between us)

    New Select label modal

    • [x] Integrate the new UI modal
    • [x] In case of a unique label, don't show the modal, and just apply the label after selecting text
    • [x] Add logic to show the last label used first and preselected
    • [x] Add the following keyboard shortcuts: Enter to validate the preselected label, and the vertical arrow keys or a number to validate the other labels
    opened by Amelie-V 13
  • Add text2text example (e.g., text summarisation)

    Add a text summarisation fine-tuning tutorial, similar to the sentiment classifier fine-tuning tutorial:

    https://rubrix.readthedocs.io/en/stable/tutorials/06-labeling-finetuning.html#3.-Fine-tune-the-pre-trained-mode

    documentation good first issue help wanted 
    opened by frascuchon 13
  • fix: Compute predicted properly for token classification [NEEDS_DATA_UPGRADE]

    This PR fixes the way predicted ok/ko info is computed for token classification records.

    To apply this fix to already created datasets, you must re-log the records. Otherwise, the stored info won't be updated.
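
    A minimal re-logging sketch, assuming the rubrix client and a hypothetical dataset name:

    import rubrix as rb

    # Load the existing records and log them back so the fix is applied
    records = rb.load("my_token_classification_ds", as_pandas=False)
    rb.log(records, name="my_token_classification_ds")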

    Closes #1955

    opened by frascuchon 12
  • [Workspaces] Users without personal datasets

    Should users without personal datasets, but who belong to one or more workspaces that have datasets, automatically be switched to one of those workspaces?

    Or would it be better to show all datasets from all workspaces in the datasets list, allowing filtering by workspace?

    question app 
    opened by frascuchon 11
  • [Text Class] Optimize long records view *Priority*

    • [x] Show the label buttons area above the fold.

    • [x] Create an action to open/close the full record on click, in the same view

    • [x] Copy: "Show full record" / "Show less"

    • [ ] I would grab the opportunity to update the "View more"/"View less" on the Metrics modal to "Show more"/"Show less" and apply the same style there

    enhancement 
    opened by Amelie-V 11
  • [Search] Improve and normalize the search data model

    Things to keep in mind:

    • Normalize text input fields: text, inputs, words must be normalized and use a common pattern for all tasks
    • Several ES analyzers for text fields: standard and whitespace(?) for fine-tuning searches. Default to standard
    • What about text fields in metadata? For now, only term queries are supported. This means that metadata fields with large content cannot be queried as full-text search.
    • Created indices should contain mapping info only for their fields. A text classification index should not include mapping info for tokens or predicted text (text2text).
    • Review filter fields and align them with UI names (if any)
    • What about nested fields, like token or metrics info for token classification, or a label and its score for text classification? By default, the query string DSL does not support nested queries, but it could be nice to include some minimal support for that kind of query.

    @dvsrepo @dcfidalgo Anything to include here?

    Tasks

    To do this work, we need to tackle the following tasks (which will be created as separate issues and linked here):

    1. [Datasets] Avoid using global template for all indices
    2. [Datasets] Dataset migration mechanisms for each release
    3. [Datasets] New es document model per task with backward compatibility fields
    4. [Datasets] Apply migration to new es doc model
    5. [Datasets] Build searches and aggregations using new doc model
    enhancement server 
    opened by frascuchon 11
  • Devise workflow to test the tutorials via a github action

    The idea here is to devise a workflow to test our tutorials in a semi-automatic way. Ideally, we'd have a workflow that we can launch manually, say every two weeks or so, to test our tutorials. Maybe we can use nbmake for this and follow this blog post. The tricky part is that for some tutorials we need to change/add/delete a few cells to be able to run them in an automated way ...

    documentation good first issue help wanted 
    opened by dcfidalgo 10
  • [Weak supervision] Rules numbers by label

    For instance:

    Sci/tech 2
    Sports 1
    Business 4
    Politics 0
    World 0

    This feature could be used for two things:

    • Help to know how the rule definition is going
    • See the full label list (in "define rules" we don't have this list by default)
    ui 
    opened by Amelie-V 9
  • Any plan to support no-whitespace language?

    I am planning to use Rubrix for Japanese text data. The search functionality doesn't seem to work well on this language. I think it would be better if we could customize the tokenizer used in Elasticsearch instead of the hardcoded "whitespace" tokenizer.

    opened by faisalron 9
  • feat: get keywords `metric` from Python client

    Is your feature request related to a problem? Please describe.
    The keywords metric is not retrievable via the Python client.

    Describe the solution you'd like
    argilla.metrics.commons.keywords

    Describe alternatives you've considered
    N.A.

    Additional context
    N.A.
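
    A sketch of the proposed call, mirroring the pattern of the existing client metrics. Note this API is the feature request itself, not something implemented at the time of writing:

    # Hypothetical API from this feature request (not implemented yet)
    from argilla.metrics.commons import keywords

    summary = keywords(name="my_dataset")
    summary.visualize()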

    enhancement 
    opened by davidberenstein1957 0
  • feat: annotator specific `metrics`

    Is your feature request related to a problem? Please describe.
    N.A.

    Describe the solution you'd like
    Sometimes I want to see metrics for a specific annotator:

    • alignment with predictions
    • alignment with other annotators
    • distribution of labels assigned
    • n_labels assigned
      • multi-label TextClassification
      • Token Classification
    • records discarded
    • annotation speed
    • time spent annotating

    Describe alternatives you've considered
    N.A.

    Additional context
    Add any other context or screenshots about the feature request here.

    enhancement 
    opened by davidberenstein1957 0
  • feat: `n-gram` keywords `metrics`

    Is your feature request related to a problem? Please describe.
    Sometimes, single keywords don't capture enough information.

    Describe the solution you'd like
    I think it might be interesting to also allow for n-grams within the keywords metric. It might be interesting to be able to distinguish between: "not good" vs "good" vs "very good" vs "not very good".

    Describe alternatives you've considered
    N.A.

    Additional context
    N.A.

    enhancement 
    opened by davidberenstein1957 0
  • `prepare_for_training` does not work for multi-label dataset

    Describe the bug
    I cannot use multi-label dataset.prepare_for_training directly.

    To Reproduce
    Steps to reproduce the behavior:

    1. Go to any multi-label dataset.
    2. Export the dataset via prepare_for_training.
    3. Use it for training directly.

    Expected behavior
    Multi-label datasets ought to be delivered with https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MultiLabelBinarizer.html applied, or they cannot be used for training.
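
    A minimal sketch of the expected behavior, using scikit-learn's MultiLabelBinarizer to turn lists of labels into binary vectors:

    from sklearn.preprocessing import MultiLabelBinarizer

    # Each record carries a (possibly empty) list of labels
    annotations = [["sports", "politics"], ["tech"], []]

    mlb = MultiLabelBinarizer()
    y = mlb.fit_transform(annotations)  # one binary column per label, rows aligned with records
    print(mlb.classes_)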

    Screenshots
    N.A.

    Environment (please complete the following information):

    • OS [e.g. iOS]: N.A.
    • Browser [e.g. chrome, safari]: N.A.
    • Argilla Version [e.g. 1.0.0]: 1.1.1
    • ElasticSearch Version [e.g. 7.10.2]: N.A.
    • Docker Image (optional) [e.g. argilla:v1.0.0]: N.A.

    Additional context
    N.A.

    bug 
    opened by davidberenstein1957 0
  • add `prepare_for_training` for `sparknlp`

    Is your feature request related to a problem? Please describe.
    There is no integration with sparknlp.

    Describe the solution you'd like
    I would like to see a better integration of sparknlp with Argilla. "David Berenstein, Daniel Vila Suero, hey... you can probably integrate your solution with Spark NLP pipelines as well; please see this blogpost to see several deployment solutions supported https://medium.com/spark-nlp/deploying-spark-nlp-for-healthcare-from-zero-to-hero-88949b0c866d and here are all the healthcare NLP related notebooks https://github.com/JohnSnowLabs/spark-nlp-workshop/tree/master/tutorials/Certification_Trainings/Healthcare"

    Describe alternatives you've considered
    N.A.

    Additional context
    https://www.linkedin.com/feed/update/urn:li:activity:7016072187870646272/?commentUrn=urn%3Ali%3Acomment%3A(activity%3A7016072187870646272%2C7016076058416295936)&dashCommentUrn=urn%3Ali%3Afsd_comment%3A(7016076058416295936%2Curn%3Ali%3Aactivity%3A7016072187870646272)&dashReplyUrn=urn%3Ali%3Afsd_comment%3A(7016095045325856768%2Curn%3Ali%3Aactivity%3A7016072187870646272)&replyUrn=urn%3Ali%3Acomment%3A(activity%3A7016072187870646272%2C7016095045325856768)

    enhancement 
    opened by davidberenstein1957 0
  • change password via UI

    I have tried EXPORT $ARGILLA... and followed the example closely, but to no avail. I am wondering whether it would be better to just allow people to change their password/add users via a web form.

    enhancement 
    opened by alanpaulkwan 1
Releases (v1.1.1)
  • v1.1.1(Nov 29, 2022)

  • v1.1.0(Nov 24, 2022)

    1.1.0 (2022-11-24)

    Highlights

    Add, update, and delete rules from a Dataset using the Python client

    You can now manage rules programmatically and reflect them in Argilla Datasets so you can iterate on labeling rules from both Python and the UI. This is especially useful for leveraging linguistic resources (such as terminological lists) and making the rules available in the UI for domain experts to refine them.

    import pandas as pd
    # Import path assumed from the Argilla Python client
    from argilla.labeling.text_classification import Rule, add_rules

    # Read a file with keywords or phrases
    labeling_rules_df = pd.read_csv("../../_static/datasets/weak_supervision_tutorial/labeling_rules.csv")
    
    # Create rules
    predefined_labeling_rules = []
    for index, row in labeling_rules_df.iterrows():
        predefined_labeling_rules.append(
            Rule(row["query"], row["label"])
        )
    
    # Add the rules to the weak_supervision_yt dataset. The rules will be manageable from the UI
    add_rules(dataset="weak_supervision_yt", rules=predefined_labeling_rules)
    

    You can find more info about this feature in the deep dive guide: https://docs.argilla.io/en/latest/guides/techniques/weak_supervision.html#3.-Building-and-analyzing-weak-labels
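
    The same client module also exposes helpers to update and delete rules. A hedged sketch, assuming delete_rules mirrors the calling convention of add_rules above:

    from argilla.labeling.text_classification import delete_rules

    # Remove the previously added rules from the dataset
    delete_rules(dataset="weak_supervision_yt", rules=predefined_labeling_rules)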

    Sort by timestamp fields in the UI

    Users can now sort the records by last_updated and other timestamp fields to improve the labeling and review processes.

    Features

    • #1929 add warning about using wrong hostnames (#1930) (a3bc554)
    • Add, delete and edit labeling rules from Python client (#1884) (d534a29), closes #1855
    • Added more explicit error message regarding dataset name validation (#1933) (c25a225), closes #1931 #1918
    • Allow sort records by event_timestamp or last_updated fields (#1924) (1c08c36), closes #1835
    • Create a contextual help to support the user in the different dataset views (#1913) (8e3851e)
    • Enable metadata length field config by environment variable (#1923) (0ff2de7), closes #1761
    • Update error page (#1932) (caeb7d4), closes #1894
    • Using new top_k_mentions metrics instead of entity_consistency (#1880) (42f702d), closes #1834

    Bug Fixes

    Documentation

    As always, thanks to our amazing contributors!

    • docs: Link key features (#1805) (#1809) by @chschroeder
    • View Docs link in frontend header users.vue (#1915) by @bengsoon
    • fix: Change method for Doc creation by spacy.Language (#1891) by @jamnicki
    Source code(tar.gz)
    Source code(zip)
  • v1.0.1(Nov 4, 2022)

  • v0.19.0(Oct 24, 2022)

  • v0.18.0(Oct 5, 2022)

    0.18.0 (2022-10-05)

    ⚡ Highlights

    Better validation of token classification records

    When working with Token Classification records, there are very often misalignment problems between the entity spans and provided tokens. Before this release, it was difficult to understand and fix these errors because validation happened on the server side.

    With this release, records are validated during instantiation, giving you a clear error message which can help you to fix/ignore problematic records.

    For example, the following record:

    import rubrix as rb
    
    rb.TokenClassificationRecord(
        tokens=["I", "love", "Paris"],
        text="I love Paris!",
        prediction=[("LOC",7,13)]
    )
    

    Will give you the following error message:

    ValueError: Following entity spans are not aligned with provided tokenization
    Spans:
    - [Paris!] defined in ...love Paris!
    Tokens:
    ['I', 'love', 'Paris']
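
    One way to make the example record pass validation (a sketch; the error output suggests spans are end-exclusive) is to tokenize the exclamation mark and trim the span to the token boundary:

    import rubrix as rb

    rb.TokenClassificationRecord(
        tokens=["I", "love", "Paris", "!"],
        text="I love Paris!",
        prediction=[("LOC", 7, 12)]  # covers exactly "Paris"
    )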
    

    Delete records by query

    Now it's possible to delete specific records, either by ids or by a query using Lucene's syntax. This is useful for cleanup and better dataset maintenance:

    import rubrix as rb
    
    ## Delete by id
    rb.delete_records(name="example-dataset", ids=[1,3,5])
    
    ## Discard records by query
    rb.delete_records(name="example-dataset", query="metadata.code=33", discard_only=True)
    

    New tutorials

    We have two new tutorials!

    Few-shot classification with SetFit and a custom dataset: https://rubrix.readthedocs.io/en/stable/tutorials/few-shot-classification-with-setfit.html

    Analyzing predictions with model explainability methods: https://rubrix.readthedocs.io/en/stable/tutorials/nlp_model_explainability.html

    Features

    Bug Fixes

    Visual enhancements

    Documentation

    • Add interpret tutorial with Transformers (#1728) (c3fa079), closes #1729
    • Adds tutorial about custom few-shot classification with SetFit (#1739) (4f15ee6), closes #1741
    • fixing the active learning tutorial with small-text (#1726) (909efdf), closes #1693
    • raise small-text version to 1.1.0 and adapt tutorial (#1744) (16f19b7), closes #1693
    • Resolve many typos in documentation, comments and tutorials (#1701) (f05e1c1)
    • using official token class. mapper since is compatible now (#1738) (e82fd13), closes #482

    As always, thanks to our amazing contributors!

    • refactor: accept flat text as input for token classification mapper (#1686) by @Ankush-Chander
    • feat(Client): improve httpx errors handling (#1662) by @Ankush-Chander
    • fix: 'MajorityVoter.score' when using multi-labels (#1678) by @dcfidalgo
    • docs: raise small-text version to 1.1.0 and adapt tutorial (#1744) by @chschroeder
    • refactor: Incompatible attribute type fixed (#1675) by @luca-digrazia
    • docs: Resolve many typos in documentation, comments and tutorials (#1701) by @tomaarsen
    • refactor: Collection of changes, primarily regarding test suite and its coverage (#1702) by @tomaarsen
    Source code(tar.gz)
    Source code(zip)
  • v0.17.0(Aug 22, 2022)

    0.17.0 (2022-08-22)

    ⚡ Highlights

    Preparing a training set in the spaCy DocBin format

    prepare_for_training is a method that prepares a dataset for training. Previously, it only prepared the data for easily training Hugging Face Transformers.

    Now, you can prepare your training data for spaCy NER pipelines, thanks to our great community contributor @ignacioct!

    With the example below, you can export your Rubrix dataset into a DocBin, save it to disk, and then use it with the spacy train command.

    import spacy
    import rubrix as rb
    
    # Load the annotated dataset from Rubrix
    rb_dataset = rb.load("ner_dataset")
    
    # Load a blank spaCy language model to create the DocBin, as it works faster
    nlp = spacy.blank("en")
    
    # After this line, the file will be stored on disk
    rb_dataset.prepare_for_training(framework="spacy", lang=nlp).to_disk("train.spacy")
    

    You can find a full example at: https://rubrix.readthedocs.io/en/v0.17.0/guides/cookbook.html#Train-a-spaCy-model-by-exporting-to-Docbin

    Load large datasets using batches

    Before this release, the rb.load method to read datasets from Python retrieved the full dataset. For large datasets, this could cause high memory consumption, network timeouts, and the inability to read datasets larger than the available memory.

    Thanks to the awesome work by @maxserras, it's now possible to optimize memory consumption and avoid network timeouts when working with large datasets. To that end, a simple batch iteration over the whole dataset can be done using the id_from parameter of the rb.load method.

    An example of reading the first 1000 records and the next batch of up to 1000 records:

    import rubrix as rb
    dataset_batch_1 = rb.load(name="example-dataset", limit=1000)
    dataset_batch_2 = rb.load(name="example-dataset", limit=1000, id_from=dataset_batch_1[-1].id)
    

    The reference to the rb.load method can be found at: https://rubrix.readthedocs.io/en/v0.17.0/reference/python/python_client.html#rubrix.load
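
    A minimal sketch generalizing this into a loop over the whole dataset, assuming rb.load returns an empty result once the last id has been passed:

    import rubrix as rb

    records = []
    batch = rb.load(name="example-dataset", limit=1000)
    while len(batch) > 0:
        records.extend(batch)
        batch = rb.load(name="example-dataset", limit=1000, id_from=batch[-1].id)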

    Larger pagination sizes for faster bulk review and annotation

    Using filters and search for data annotation and review, some users are able to filter and quickly review dozens of records in one go. To serve those users, it's now possible to see and bulk-annotate 50 or 100 records per page.

    Copy record text to clipboard

    Sometimes it is useful to copy the text in records to inspect it or process it with another application. Now, this is possible thanks to the feature request by our great community member and contributor @Ankush-Chander!

    Better error logging for generic errors

    Thanks to work done by @Ankush-Chander and @frascuchon, we now have more meaningful messages for generic server errors!

    Features

    • Add new pagination size ranges (#1667) (5b4f1f2), closes #1578
    • Allow rb.load fetch records in batches passing the from_id argument (3e6344a)
    • Copy to clipboard the record text (#1625) (d634a7b), closes #1616
    • Error Logging: send error detail in response for generic server errors (#1648) (ad17631)
    • Listeners: allow using query params in the condition through search parameter (#1627) (a0a245d), closes #1622
    • prepare_for_training supports spacy (#1635) (8587808)

    Bug Fixes

    Documentation

    Visual enhancements

    You can see all work included in the release here

    • fix: Update progress bar when refreshing after adding new records (#1666) by @leiyre
    • chore: configure miniconda for readthedocs builder by @frascuchon
    • style: Small visual adjustments for Text2Text record card (#1632) by @leiyre
    • feat: Copy to clipboard the record text (#1625) by @leiyre
    • docs: Add Slack support link in README's get started (#1688) by @dvsrepo
    • chore: update version by @frascuchon
    • feat: Add new pagination size ranges (#1667) by @leiyre
    • fix: handle stream api connection errors gracefully (#1636) by @Ankush-Chander
    • feat: allow rb.load fetch records in batches passing the from_id argument by @maxserras
    • fix(Client): reusing the inner httpx client (#1640) by @frascuchon
    • feat(Error Logging): send error detail in response for generic server errors (#1648) by @frascuchon
    • docs: spacy DocBin cookbook (#1642) by @ignacioct
    • feat: prepare_for_training supports spacy (#1635) by @frascuchon
    • style: Improve card spacing (#1638) by @leiyre
    • docs: Adding Elasticsearch persistence to docker compose section (#1643) by @maxserras
    • chore: remove old rubrix client class (#1639) by @frascuchon
    • feat(Listeners): allow using query params in the condition through search parameter (#1627) by @frascuchon
    • doc: show metric graphs in documentation (#1669) by @leiyre
    • fix(docker-compose.yaml): default volume and disable disk threshold (#1656) by @frascuchon
    • fix: Encode rule name in Weak Labeling API requests (#1649) by @leiyre
    Source code(tar.gz)
    Source code(zip)
  • v0.16.1(Jul 22, 2022)

    0.16.1 (2022-07-22)

    Bug Fixes

    • 'WeakMultiLabels.summary' and 'show_records' after extending the weak label matrix (#1633) (3cb4c07), closes #1631
    • Display metadata in Text2Text dataset (#1626) (0089e0a), closes #1623
    • Show predicted OK/KO when predictions exist (#1620) (ef66e9c), closes #1619

    Documentation

    You can see all work included in the release here

    • fix: 'WeakMultiLabels.summary' and 'show_records' after extending the weak label matrix (#1633) by @dcfidalgo
    • fix: Display metadata in Text2Text dataset (#1626) by @leiyre
    • chore: set version by @dcfidalgo
    • docs: Fix typo in Getting Started -> Concepts (#1618) by @dcfidalgo
    • fix: Show predicted OK/KO when predictions exist (#1620) by @leiyre
    Source code(tar.gz)
    Source code(zip)
  • v0.16.0(Jul 8, 2022)

    0.16.0 (2022-07-08)

    Highlights

    👂 Listeners: enable more interactive workflows between client and server

    Listeners enable you to define functions that get executed under certain conditions when something changes in a dataset. There are many use cases for this: monitoring annotation jobs, monitoring model predictions, enabling active learning workflows, and many more.

    You can find the Python API reference docs here: https://rubrix.readthedocs.io/en/stable/reference/python/python_listeners.html#python-listeners

    We will be documenting these use cases with practical examples, but for this release, we've included a new tutorial for using this with active learning: https://rubrix.readthedocs.io/en/stable/tutorials/active_learning_with_small_text.html. This tutorial includes the following listener function, which implements the active learning loop:

    from rubrix.listeners import listener
    from sklearn.metrics import accuracy_score
    
    # Define some helper variables
    LABEL2INT = trec["train"].features["label-coarse"].str2int
    ACCURACIES = []
    
    # Set up the active learning loop with the listener decorator
    @listener(
        dataset=DATASET_NAME,
        query="status:Validated AND metadata.batch_id:{batch_id}",
        condition=lambda search: search.total==NUM_SAMPLES,
        execution_interval_in_seconds=3,
        batch_id=0
    )
    def active_learning_loop(records, ctx):
    
        # 1. Update active learner
        print(f"Updating with batch_id {ctx.query_params['batch_id']} ...")
        y = np.array([LABEL2INT(rec.annotation) for rec in records])
    
        # initial update
        if ctx.query_params["batch_id"] == 0:
            indices = np.array([rec.id for rec in records])
            active_learner.initialize_data(indices, y)
        # update with the prior queried indices
        else:
            active_learner.update(y)
        print("Done!")
    
        # 2. Query active learner
        print("Querying new data points ...")
        queried_indices = active_learner.query(num_samples=NUM_SAMPLES)
        ctx.query_params["batch_id"] += 1
        new_records = [
            rb.TextClassificationRecord(
                text=trec["train"]["text"][idx],
                metadata={"batch_id": ctx.query_params["batch_id"]},
                id=idx,
            )
            for idx in queried_indices
        ]
    
        # 3. Log the batch to Rubrix
        rb.log(new_records, DATASET_NAME)
    
        # 4. Evaluate current classifier on the test set
        print("Evaluating current classifier ...")
        accuracy = accuracy_score(
            dataset_test.y,
            active_learner.classifier.predict(dataset_test),
        )
        ACCURACIES.append(accuracy)
        print("Done!")
    
        print("Waiting for annotations ...")
    

    📖 New docs!

    https://rubrix.readthedocs.io/

    🧱 extend_matrix: Weak label augmentation using embeddings

    This release includes an exciting feature to augment the coverage of your weak labels using embeddings. You can find a practical tutorial here: https://rubrix.readthedocs.io/en/stable/tutorials/extend_weak_labels_with_embeddings.html
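
    A hedged sketch of the idea, assuming a sentence-transformers model for the embeddings, that records carry a plain text field, and that extend_matrix takes one similarity threshold per rule:

    from rubrix.labeling.text_classification import WeakLabels
    from sentence_transformers import SentenceTransformer

    # Build the weak label matrix from the rules stored in the dataset
    weak_labels = WeakLabels(dataset="weak_supervision_ds")

    # Embed the records with any sentence embedding model
    model = SentenceTransformer("all-MiniLM-L6-v2")
    embeddings = model.encode([record.text for record in weak_labels.records()])

    # Records whose embedding is similar enough to a record matched by a rule inherit that rule's label
    weak_labels.extend_matrix([0.8] * len(weak_labels.rules), embeddings=embeddings)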

    Features

    Bug Fixes

    Documentation

    • #1512: change theme to furo (#1564, #1604) (98869d2), closes #1512
    • add 'how to prepare your data for training' to basics (#1589) (a21bcf3)
    • add active learning with small text and listener tutorial (#1585, #1609) (d59573f), closes #1601 #421
    • Add MajorityVoter to references + Add comments about multi-label support of the label models (#1582) (ab481c7)
    • add pip version and dockertag as parameter in the build process (#1560) (73a31e2)

    You can see all work included in the release here

    • chore(docs): remove by @frascuchon
    • docs: add active learning with small text and listener tutorial (#1585, #1609) by @dcfidalgo
    • docs(#1512): change theme to furo (#1564, #1604) by @frascuchon
    • chore: set version by @frascuchon
    • feat(token-class): adjust token spans spaces (#1599) by @frascuchon
    • feat(#1602): new rubrix dataset listeners (#1507, #1586, #1583, #1596) by @frascuchon
    • docs: add 'how to prepare your data for training' to basics (#1589) by @dcfidalgo
    • test: configure numpy to disable multi threading (#1593) by @frascuchon
    • docs: Add MajorityVoter to references + Add comments about multi-label support of the label models (#1582) by @dcfidalgo
    • feat(#1561): standardize icons (#1565) by @leiyre
    • Feat: Improve from datasets (#1567) by @dcfidalgo
    • feat: Add 'extend_matrix' to the WeakMultiLabel class (#1577) by @dcfidalgo
    • docs: add pip version and dockertag as parameter in the build process (#1560) by @frascuchon
    • refactor: remove words references in searches (#1571) by @frascuchon
    • ci: check conda env cache (#1570) by @frascuchon
    • fix(#1264): discard first space after a token (#1591) by @frascuchon
    • ci(package): regenerate view snapshot (#1600) by @frascuchon
    • fix(#1574): search highlighting for a single dot (#1592) by @leiyre
    • fix(#1575): show predicted ok/ko in Text Classifier explore mode (#1576) by @leiyre
    • fix(#1548): access datasets for superusers when workspace is not provided (#1572, #1608) by @frascuchon
    • fix(#1551): don't show error traces for EntityNotFoundError's (#1569) by @frascuchon
    • fix: compatibility with new dataset version (#1566) by @dcfidalgo
    • fix(#1557): allow text editing when clicking the "edit" button (#1558) by @leiyre
    • fix(#1545): highlight words with accents (#1550) by @leiyre
    Source code(tar.gz)
    Source code(zip)
  • v0.15.0(Jun 8, 2022)

    0.15.0 (2022-06-08)

    🔆 Highlights

    🏷️ Configure datasets with a labeling scheme

    You can now predefine and change the label schema of your datasets. This is useful for fixing a set of labels for you and your annotation teams.

    import rubrix as rb
    
    # Define labeling schema
    settings = rb.TextClassificationSettings(label_schema=["A", "B", "C"])
    
    # Apply settings to a new or already existing dataset
    rb.configure_dataset(name="my_dataset", settings=settings)
    
    # Logging to the newly created dataset triggers the validation checks
    rb.log(rb.TextClassificationRecord(text="text", annotation="D"), "my_dataset")
    #BadRequestApiError: Rubrix server returned an error with http status: 400
    

    Read the docs: https://rubrix.readthedocs.io/en/stable/guides/dataset_settings.html

    🧱 Weak label matrix augmentation using embeddings

    You can now use an augmentation technique inspired by https://github.com/HazyResearch/epoxy to augment the coverage of your rules using embeddings (e.g., sentence transformers). This is useful for improving the recall of your labeling rules.

    Read the tutorial: https://rubrix.readthedocs.io/en/stable/tutorials/extend_weak_labels_with_embeddings.html

    🏛️ Tutorial Gallery

    Tutorials are now organized into different categories and with a new gallery design!

    Read the docs: https://rubrix.readthedocs.io/en/stable/tutorials/introductory.html

    🏁 Basics guide

    This is the first version of the basics guide. It will show you how to perform the most basic actions with Rubrix, such as uploading data or annotating it.

    Read the docs: https://rubrix.readthedocs.io/en/stable/getting_started/basics.html

    Features

    • #1134: Allow extending the weak label matrix with embeddings (#1487) (4d54994), closes #1134
    • #1432: configure datasets with a label schema (21e48c0), closes #1432
    • #1446: copy icon position in datasets list (#1448) (7c9fa52), closes #1446
    • #1460: include text hyphenation (#1469) (ec23b2d), closes #1460
    • #1463: change icon position in table header (#1473) (5172324), closes #1463
    • #1467: include animation delay for last progress bar track (#1462) (c772b74), closes #1467
    • configuraton: add elasticsearch ca_cert path variable (#1502) (f0eda12)
    • UI: improve access to actions in metadata and sort dropdowns (#1510) (8d33090), closes #1435

    Bug Fixes

    • #1522: dates metadata fields accessible for sorting (#1529) (a576ceb), closes #1522
    • #1527: check agents instead labels for predicted computation (#1528) (2f2ee2e), closes #1527
    • #1532: correct domain for filter score histogram (#1540) (7478d6c), closes #1532
    • #1533: restrict highlighted fields (3a8b8a9), closes #1533
    • #1534: fix progress in the metrics sidebar when page is refreshed (#1536) (1b572c4)
    • #1539: checkbox behavior with value 0 (#1541) (7a0ab63), closes #1539
    • metrics: compute f1 for text classification (#1530) (147d38a)
    • search: highlight only textual input fields (8b83a82), closes #1538 #1544

    New contributors

    @RafaelBod made his first contribution in https://github.com/recognai/rubrix/pull/1413

    Source code(tar.gz)
    Source code(zip)
  • v0.14.2(May 31, 2022)

    0.14.2 (2022-05-31)

    Bug Fixes

    • #1514: allow ent score None and change default value to 0.0 (#1521) (0a02c70), closes #1514
    • #1516: restore read-only to copied dataset (#1520) (5b9cf0e), closes #1516
    • #1517: stop background task when something happens to main thread (#1519) (0304f40), closes #1517
    • #1518: disable global actions checkbox when no data was found (#1525) (bf35e72), closes #1518
    • UI: remove selected metadata fields for sortable fields dropdown (#1513) (bb9482b)
    Source code(tar.gz)
    Source code(zip)
  • v0.14.1(May 20, 2022)

    0.14.1 (2022-05-20)

    Bug Fixes

    • #1447: change agent when validating records with annotation but default status (#1480) (126e6f4), closes #1447
    • #1472: hide scrollbar in scrollable components (#1490) (b056e4e), closes #1472
    • #1483: close global actions "Annotate as" selector after deselect records checkbox (#1485) (a88f8cb)
    • #1503: Count filter values when loading a dataset with a route query (#1506) (43be9b8), closes #1503
    • documentation: fix user management guide (#1511) (63f7bee), closes #1501
    • filters: sort filter values by count (#1488) (0987167), closes #1484
    Source code(tar.gz)
    Source code(zip)
  • v0.14.0(May 10, 2022)

    0.14.0 (2022-05-10)

    Async version of rb.log

    You can now use the parameter background in the rb.log method to log records without blocking the main process. The main use case is prediction monitoring in production pipelines. Here's an example with BentoML (you can find the full example in the updated Monitoring guide):

    from bentoml import BentoService, api, artifacts, env
    from bentoml.adapters import JsonInput
    from bentoml.frameworks.spacy import SpacyModelArtifact
    
    import rubrix as rb
    
    import spacy
    
    nlp = spacy.load("en_core_web_sm")
    
    
    @env(infer_pip_packages=True)
    @artifacts([SpacyModelArtifact("nlp")])
    class SpacyNERService(BentoService):
    
        @api(input=JsonInput(), batch=True)
        def predict(self, parsed_json_list):
            result, rb_records = ([], [])
            for index, parsed_json in enumerate(parsed_json_list):
                doc = self.artifacts.nlp(parsed_json["text"])
                prediction = [{"entity": ent.text, "label": ent.label_} for ent in doc.ents]
                rb_records.append(
                    rb.TokenClassificationRecord(
                        text=doc.text,
                        tokens=[t.text for t in doc],
                        prediction=[
                            (ent.label_, ent.start_char, ent.end_char) for ent in doc.ents
                        ],
                    )
                )
                result.append(prediction)
    
            rb.log(
                name="monitor-for-spacy-ner",
                records=rb_records,
                tags={"framework": "bentoml"},
                background=True,
                verbose=False
        ) # By using background=True, the model latency won't be affected
    
            return result
    

    Confidence scores in Token Classification (NER)

    To store entity predictions you can attach a score using the last position of the entity tuple (label, char_start, char_end, score). Let's see an example:

    import rubrix as rb
    
    text = "Rubrix is a data science tool"
    
    record = rb.TokenClassificationRecord(
        text=text, 
        tokens=text.split(" "), 
        prediction=[("PRODUCT",  0, 6, 0.99)]
    )
    
    rb.log(record, "ner_with_scores")
    

    Then, in the web application, you and your team can use the score filter to find potentially problematic entities, like in the screenshot below:

    [Screenshot: the score filter in the web app]

    If you want to see this in action, check this blog post by David Berenstein:

    https://www.rubrix.ml/blog/concise-concepts-rubrix/

    Rule metrics sidebar

    We have a fresh new sidebar for the weak labeling mode, where you can see your overall rule metrics as you define new rules.

    This sidebar should help you quickly understand your progress:

    [Screenshot: rule metrics sidebar]

    See the updated user guide here: https://rubrix.readthedocs.io/en/v0.14.0/reference/webapp/define_rules.html

    Features

    Bug Fixes

    Source code(tar.gz)
    Source code(zip)
  • v0.13.3(Apr 27, 2022)

  • v0.13.2(Apr 12, 2022)

    0.13.2 (2022-04-12)

    Bug Fixes

    Source code(tar.gz)
    Source code(zip)
  • v0.13.1(Apr 1, 2022)

  • v0.13.0(Mar 30, 2022)

    0.13.0 (2022-03-30)

    🗂 Multilabel weak supervision

    You can now build multilabel text classification datasets using query-based rules.

    If you want to get started, check out this tutorial.

    https://user-images.githubusercontent.com/1107111/160930404-7b909f1e-b871-4e4c-b1c8-ea9eabfcad21.mp4

    🤗 Reading Hugging Face datasets from the Hub

    You can now read ANY text classification, NER, or text2text dataset directly from the Hub and load it into Rubrix.

    To understand how Rubrix datasets work, check out this guide.

    👥 Redesigned team workspaces

    Organizing teams and datasets is a key Rubrix feature. After several rounds of feedback with early users, we've completely redesigned the user experience. Let us know what you think.

    You can get started and configure users and workspaces following this guide

    🔎 Guide for the query language and model

    We have included a new in-depth guide about the Lucene-based query language and data model used for search, weak labeling, loading subsets of data, and metrics.

    Features

    Bug Fixes

    Source code(tar.gz)
    Source code(zip)
  • v0.12.1(Mar 11, 2022)

  • v0.11.1(Mar 11, 2022)

  • v0.12.0(Mar 8, 2022)

    0.12.0 (2022-03-08)

    Features

    Bug Fixes

    Source code(tar.gz)
    Source code(zip)
  • v0.11.0(Feb 20, 2022)

    0.11.0 (2022-02-19)

    Highlights

    Introducing rb.Dataset* and 🤗 Hub integration

    The Dataset classes are lightweight containers for Rubrix records. These classes facilitate importing from and exporting to different formats (e.g., pandas.DataFrame, datasets.Dataset) as well as sharing and versioning Rubrix datasets using the Hugging Face Hub.

    With this release, Rubrix users and teams can use the Hugging Face Hub to share and read both public and private Rubrix datasets for TextClassification, TokenClassification, and Text2Text datasets. This opens up a whole new world of possibilities for data reproducibility and sharing. Let's see an example:

    import rubrix as rb
    from datasets import load_dataset
    
    # 👧🏻 🏷️ Leire has labeled a text classification dataset using a local Rubrix instance
    dataset_rb = rb.load("text_classification_ds", as_pandas=False)
    
    # 👧🏻 exports a Rubrix Dataset to a hf Dataset
    dataset_ds = dataset_rb.to_datasets()
    
    # 👧🏻 🚀 Leire shares the labelled dataset with the world 
    dataset_ds.push_to_hub("text_classification_ds")
    
    # 👨 John downloads the dataset from the Hugging Face Hub
    dataset_ds = load_dataset("leire/text_classification_ds", split="train")
    
    # 👨 reads in dataset
    dataset_rb = rb.read_datasets(dataset_ds, task="TextClassification")
    
    # 👨 🏷️ logs the dataset and continues labeling with his own Rubrix instance
    rb.log(dataset_rb, "john_text_classification_ds")
    

    You can read more at https://rubrix.readthedocs.io/en/stable/guides/datasets.html

    For each record type, there’s a corresponding Dataset class called DatasetFor<RecordType>. You can look up their API in the reference section.

    Improving NER UI and UX

    The UI for Token Classification has been completely redesigned to provide a better user experience for exploration and annotation. This is the first of a set of changes focusing on annotation productivity for token classification.

    Features

    Bug Fixes

    • #1140: fix/make client models more consistent (#1147) (926bb16), closes #1140
    • client: parse unauthorized api error properly (#1164) (1a5a08d)
    • search: prevent metrics computation breaks searches (#1175) (9f2adc9)
    Source code(tar.gz)
    Source code(zip)
  • v0.10.0(Feb 20, 2022)

    0.10.0 (2022-02-12)

    Now you can use filters in the Define Rules mode (weak labeling). These filters are useful for seeing the impact of rules on specific dataset subpopulations/subsets (e.g., with certain metadata fields, annotated records, etc.):

    [Screenshot: filters in the Define Rules mode]

    Features

    Bug Fixes

    • #1054: reduce collapsable area. Optimize for annotation (#1106) (48024ba), closes #1054
    • #1054: remove old scroll padlock button (a1d6444), closes #1054
    • #1094: remove computed record fields returned in API results (#1095) (cd61d1e), closes #1094
    • #831: Remove sort field when only one is applied (#1116) (36b276b), closes #831
    • convert pd.NaT to None for event_timestamp (#1105) (21e78e4)
    Source code(tar.gz)
    Source code(zip)
  • v0.9.0(Feb 4, 2022)

    🎉 0.9.0 (2022-02-02)

    • Improve logging
    • Small improvements to the labelling module and weak labeling mode
    • Better setup documentation (python -m rubrix)

    Features

    • #932: label models now modify the prediction_agent when calling LabelModel.predict (#1049) (4a024ee), closes #932
    • #953: add additional metrics to LabelModel.score method (#979) (2887907), closes #953
    • #955: add default for rules in WeakLabels (#976) (34389d3), closes #955 #1011

    Bug Fixes

    Source code(tar.gz)
    Source code(zip)
  • v0.8.2(Jan 31, 2022)

    0.8.2 (2022-01-31)

    Features

    • #1036: remove prediction ok/ko in labelling rules (#1037) (672b852), closes #1036
    • #735: add warning when agent but no prediction/annotation is provided (#987) (ba88c34), closes #735

    Bug Fixes

    Source code(tar.gz)
    Source code(zip)
  • v0.8.1(Jan 20, 2022)

    0.8.1 (2022-01-20)

    Bug Fixes

    • #1002: Show 0 records overall metrics when no rules defined (#1013) (a8a5c79), closes #1002 #1002
    • Breadcrumbs: copy workspace from the breadcrumbs when dataset loading has errors #1003 (33e372d), closes #844
    • statics: handle 404 errors for static files (#1006) (f4b656a)
    • #800: compute common aggregations one by one (#990) (8cf420a), closes #800
    • #800: limit number of metadata fields (#993) (bb6b76b), closes #800
    • #905: copy dataset with rules (#948) (8597b83), closes #905
    • #974: display the dropdown in the last record of the scroll (#986) (e5f8d53), closes #974
    • #977: Remove redirection when accessing login (#996) (b3fe2cb), closes #977
    Source code(tar.gz)
    Source code(zip)
  • v0.8.1-alpha.3(Jan 20, 2022)

  • v0.8.1-alpha.2(Jan 20, 2022)

  • v0.8.1-alpha.1(Jan 19, 2022)

  • v0.8.1-alpha.0(Jan 19, 2022)

  • v0.8.0(Jan 12, 2022)

    Introducing interactive Weak labeling (Define rules mode) 🚀

    We are glad to introduce the most important feature to date: now it's possible to iterate on labeling queries directly in the UI with initial support for multi-class text classification. Multilabel and token classification support is coming soon.

    See the video for the recommended workflow:

    https://user-images.githubusercontent.com/1107111/149346471-93cbd7ee-96a2-451a-8f5e-f9e26b246407.mp4

    Check the updated tutorial: https://rubrix.readthedocs.io/en/master/tutorials/weak-supervision-with-rubrix.html

    What's changed

    • [WeakSupervision] Change load_rules import path in guide and tutorial (#939)
    • fix links to new web app reference (#936)
    • Bugfixes/avoid infinite loop when dataset loading (#934)
    • show nan instead of 0 for precision in summary (#930)
    • fix(api): include_metrics param only for search endponts (#929)
    • [Documentation] Update title page video for docs (#928)
    • update skweak tutorial (#922)
    • [Documentation] Updating the web app docu (#827)
    • publish python package to test.pypi for master and releases branches (#927)
    • [WeakLabels] Align WeakLabels.summary() with web app (#925)
    • UI: show rules without precision properly (#919)
    • chore(build): build docker images for release branches (#921)
    • Docs: Updates readme front video (#923)
    • Docs: Updates weak supervision resources (#920)
    • feat(rules): compute total & ann. coverage before label selection (#916)
    • fix(rules): compute annotated coverage when no label properly (#915)
    • Tutorial: Human-in-the-loop weak supervision with skweak (#869)
    • UI: include affected #records to overall coverage/ann. coverage metrics (#914)
    • fix lint build (#913)
    • UI: manage precision and rules without annotation coverage (#909)
    • fix(#876): process 400 response detail properly (#889)
    • feat(rules): allow compute partial query rule metrics (#907)
    • fix(security): providing default workspace should pass check (#911)
    • UI: reset filters from define rules view (#908)
    • UI: Show number of created rules in rules management view (#910)
    • UI: drop access to rule name field (#904)
    • fix(rules): prevent lost rules with dataset updates (#892)
    • fix(datasets): process owner as part of dataset id (#870)
    • (UI) Rules summary metrics format (#888)
    • UI: Improve code snippet for empty workspace (#886)
    • fix(UI): Remove case sensitive when filtering labels (#882)
    • Docs: Updates Flair zeroshot tutorial (#887)
    • removing wrong video (#885)
    • Update readme (#883)
    • fix(UI) Metrics value by default if no metric (#875)
    • feat(metrics): add token level metrics for token classification from client (#849)
    • UI: New rule metrics layout (#861)
    • chore: expose load_rules from base module (#866)
    • Docs: Regenerates graphs metrics guide (#865)
    • updating loss video (#864)
    • Docs: Update weak supervision guide (#863)
    • Update README.md (#862)
    • Fix: Link loss tutorial (#859)
    • Docs: Improve loss tutorial (#858)
    • Docs: Improve AL and ws tutorials (#857)
    • chore(ci): Include component testing configuration (#839)
    • fix/loss video updated (#853)
    • Docs: Weak supervision guide update (#855)
    • chore(app): upgrade lint dependencies (#841)
    • feat: weak supervision mode (#814)
    • Docs: Review hf tutorial (#852)
    • fix: error link to workspace home (#845)
    • fix(metrics): compute token length for each token (#850)
    • add streaming (#851)
    • fix(rules): prevent division by 0 for overall metrics (#848)
    • small change
    • [Tutorials] Update media structure, remove TLDR heading (#847)
    • Updating videos and images for sentiment classification tutorial (#846)
    • fix(rules): prevent division by zero (#843)
    • new folder and videos for model loss tutorial (#805)
    • feat(token class): add metrics at token level (#838)
    • new folder and images for active learning tutorial (#796)
    • [Tutorials] Typo fix in find label errors tutorial (#842)
    • [Tutorials] Add the new find_label_errors tutorial (#833)
    • [Rule] Modify the client API to the server's weak supervision feature (#840)
    • [LabelModel] Improve Snorkel to not modify the passed in WeakLabels object (#836)
    • feat (search): allow to filtering record metrics fields in search (#837)
    • fix(ui): remove workspace home from code snippet api url (#834)
    • ui: Hide validate button for binary cases in Text classifier (#830)
    • fix print message (#829)
    • feat: Include workspace in url path (#820)
    • fix(ui): align records and global action layouts #825
    • fix(ui): Show labels as selected after validate (#826)
    • feat(labeling rule): implements api endpoint to fetch a single rule (#817)
    • [LabelErrors] Add find_label_errors method (#775)
    • fix(ui): Fix styles in Safari (#815)
    • docs: Add contributors to readme (#822)
    • add missing rubrix import (#819)
    • new folder and images for spacy tutorial (#794)
    • feat(labeling rules): allow edition for rule label and description (#813)
    • refactor(labeling rules): optional label for rule metrics (#811)
    • Fix token alignment on CreationTokenClassificationRecord (#812)
    • feat(server): add overall dataset labeling rules metrics (#807)
    • feat(labeling rules): add coverage for annotated records (#806)
    • fix(ui): Unique ID for scroll state to avoid same state for different dataset records (#809)
    • new folder and images for zeroshot ner tutorial (#804)
    • new folder and images for zeroshot data annotation tutorial (#803)
    • fix(log): check multi-label integrity without search aggregations (#802)
    • updated images, added folder for fastapi tutorial (#801)
    • added folder for weak supervision tutorial (#795)
    • feat(weak supervision): client labeling rules from server (#799)
    • feat(server): labeling rule metrics (#790)
    • fix/edit zero-shot tutorial (#774)
    • fix/edited fastapi tutorial (#773)
    • Fix/edit ner flair tutorial (#766)
    • Fix/edit weaksupervision tutorial (#759)
    • fix(ui): Little changes in fonts (#793)
    • fix(ui): Allow open dataset in new tab from datasets list (#792)
    • feat(server): rubrix namespaces for elasticsearch indices (#789)
    • fix(ui): Show annotation after global validation (#786)
    • remove reload arg launching server using python (#787)
    • updated readme with conda install instruction (#788)
    • fix(ui): Hide scroller component when loading or paginate (#784)
    • fix(ui): allow remove metadata filter from record metadata modal (#772)
    • fix(ui): Token Classifier: validate record without annotation or prediction (#782)
    • Fix/edit active learning tutorial (#760)
    • Docs:minor changes to loss tutorial (#778)
    • Fix/edit model loss tutorial (#767)
    • fix(server): missing deprecated dep (#777)
    • fix(ui): Global validate for records without annotation or prediction (#746)
    • Fix/edit spacy tutorial (#758)
    • Fix/edit labeling tutorial (#750)
    • fix(server) - misaligned entity mentions on CreationTokenClassificationRecord (#771)
    • [Requirements] Require python>=3.7 (#770)
    • [Labeling] Add FlyingSquid label model (#755)
    • Update README.md (#769)
    • Adds Flair example to guide (#762)
    • docs: Updates huggingface examples and adds monitor for Flair (#761)
    • feat(search): show boolean values in metadata (#753)
    • feat(server): allow handle labeling rules for datasets from API (#744)
    • fix(imports): import monitoring with spacy<3.0 fails (#754)
    • [UI] new fonts families (#751)
    • fix(scroll): using new scroll component (#710)
    • fix(ui): filter "validatable" records for global action validate button (#741)
    • feat(monitor): flair ner auto-monitor (#738)

    New Contributors

    • @sugatoray made their first contribution
    • @ruanchaves made their first contribution
    Source code(tar.gz)
    Source code(zip)
  • v0.8.0-alpha.1(Jan 11, 2022)

    • Bugfixes/avoid infinite loop when dataset loading (#934)
    • show nan instead of 0 for precision in summary (#930)
    • fix(api): include_metrics param only for search endponts (#929)
    • [Documentation] Update title page video for docs (#928)
    • update skweak tutorial (#922)
    • [Documentation] Updating the web app docu (#827)
    • revert test.pypi publish
    • publish python package to test.pypi for master and releases branches (#927)
    • [WeakLabels] Align WeakLabels.summary() with web app (#925)
    • UI: show rules without precision properly (#919)
    • chore(build): build docker images for release branches (#921)
    • Docs: Updates readme front video (#923)
    • Docs: Updates weak supervision resources (#920)
    • feat(rules): compute total & ann. coverage before label selection (#916)
    • fix(rules): compute annotated coverage when no label properly (#915)
    • Tutorial: Human-in-the-loop weak supervision with skweak (#869)
    • UI: include affected #records to overall coverage/ann. coverage metrics (#914)
    • fix lint build (#913)
    • UI: manage precision and rules without annotation coverage (#909)
    • fix(#876): process 400 response detail properly (#889)
    • feat(rules): allow computing partial query rule metrics (#907)
    • fix(security): providing default workspace should pass check (#911)
    • UI: reset filters from define rules view (#908)
    • UI: Show number of created rules in rules management view (#910)
    • UI: drop access to rule name field (#904)
    • fix(rules): prevent lost rules with dataset updates (#892)
    • fix(datasets): process owner as part of dataset id (#870)
    • (UI) Rules summary metrics format (#888)
    • UI: Improve code snippet for empty workspace (#886)
    • fix(UI): Remove case sensitivity when filtering labels (#882)
    • Docs: Updates Flair zeroshot tutorial (#887)
    • removing wrong video (#885)
    • Update readme (#883)
    • fix(UI): Metrics value by default if no metric (#875)
    • feat(metrics): add token level metrics for token classification from client (#849)
    • UI: New rule metrics layout (#861)
    • chore: expose load_rules from base module (#866) (see the weak supervision sketch below, after the Full Changelog link)
    • Docs: Regenerates graphs metrics guide (#865)
    • updating loss video (#864)
    • Docs: Update weak supervision guide (#863)
    • Update README.md (#862)
    • Fix: Link loss tutorial (#859)
    • Docs: Improve loss tutorial (#858)
    • Docs: Improve AL and ws tutorials (#857)
    • chore(ci): Include component testing configuration (#839)
    • fix/loss video updated (#853)
    • Docs: Weak supervision guide update (#855)
    • chore(app): upgrade lint dependencies (#841)
    • feat: weak supervision mode (#814)
    • Docs: Review hf tutorial (#852)
    • fix: error link to workspace home (#845)
    • fix(metrics): compute token length for each token (#850)
    • chore: improve dockerignore files
    • add streaming (#851)
    • fix(rules): prevent division by 0 for overall metrics (#848)
    • small change
    • [Tutorials] Update media structure, remove TLDR heading (#847)
    • Updating videos and images for sentiment classification tutorial (#846)
    • fix(rules): prevent division by zero (#843)
    • new folder and videos for model loss tutorial (#805)
    • feat(token class): add metrics at token level (#838)
    • new folder and images for active learning tutorial (#796)
    • [Tutorials] Typo fix in find label errors tutorial (#842)
    • [Tutorials] Add the new find_label_errors tutorial (#833)
    • [Rule] Modify the client API to the server's weak supervision feature (#840)
    • [LabelModel] Improve Snorkel to not modify the passed in WeakLabels object (#836)
    • feat(search): allow filtering record metrics fields in search (#837)
    • fix(ui): remove workspace home from code snippet api url (#834)
    • ui: Hide validate button for binary cases in Text classifier (#830)
    • fix print message (#829)
    • feat: Include workspace in url path (#820)
    • fix(ui): align records and global action layouts (#825)
    • fix(ui): Show labels as selected after validate (#826)
    • feat(labeling rule): implements api endpoint to fetch a single rule (#817)
    • [LabelErrors] Add find_label_errors method (#775) (see the sketch after this list)
    • fix(ui): Fix styles in Safari (#815)
    • docs: Add contributors to readme (#822)
    • add missing rubrix import (#819)
    • new folder and images for spacy tutorial (#794)
    • feat(labeling rules): allow editing rule label and description (#813)

    Full Changelog: https://github.com/recognai/rubrix/compare/v0.7.0...v0.8.0-alpha.0
