Finetuner allows one to tune the weights of any deep neural network for better embeddings on search tasks

Jina AI

Last update: Dec 31, 2022


Overview

Finetuning any deep neural network for better embedding on neural search tasks

Finetuner allows one to tune the weights of any deep neural network for better embeddings on search tasks. It accompanies Jina to deliver the last mile of performance for domain-specific neural search applications.

🎛 Designed for finetuning: a human-in-the-loop deep learning tool for leveling up your pretrained models in domain-specific neural search applications.

🔱 Powerful yet intuitive: all you need is finetuner.fit() - a one-liner that unlocks rich features such as siamese/triplet network, interactive labeling, layer pruning, weights freezing, dimensionality reduction.

⚛️ Framework-agnostic: promises an identical API and user experience on the PyTorch, TensorFlow/Keras, and PaddlePaddle deep learning backends.

🧈 Jina integration: buttery smooth integration with Jina, reducing the cost of context-switch between experiment and production.

How does it work

Install

Requires Python 3.7+ and one of PyTorch (>=1.9), TensorFlow (>=2.5), or PaddlePaddle, installed on Linux or macOS.

pip install finetuner

Documentation

Usage

	Have an embedding model	No embedding model
Have labeled data	🟠	🟡
No labeled data	🟢	🔵

🟠 Have embedding model and labeled data

Perfect! Since you already have both embed_model and labeled_data, simply run:

import finetuner

finetuner.fit(
    embed_model,
    train_data=labeled_data
)

🟢 Have embedding model and unlabeled data

You have an embed_model to use, but no labeled data for finetuning it. No worries, that's good enough already! You can use Finetuner to interactively label data and train embed_model as below:

import finetuner

finetuner.fit(
    embed_model,
    train_data=unlabeled_data,
    interactive=True
)

🟡 Have general model and labeled data

You have a general_model which does not output embeddings. Luckily you provide some labeled_data for training. No worries, Finetuner can convert your model into an embedding model and train it via:

import finetuner

finetuner.fit(
    general_model,
    train_data=labeled_data,
    to_embedding_model=True,
    output_dim=100
)

🔵 Have general model and unlabeled data

You have a general_model which is not for embeddings. Meanwhile, you don't have labeled data for training. But no worries, Finetuner can help you train an embedding model with interactive labeling on-the-fly:

import finetuner

finetuner.fit(
    general_model,
    train_data=unlabeled_data,
    interactive=True,
    to_embedding_model=True,
    output_dim=100
)
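The four cases above differ only in which optional flags are passed. As a minimal sketch, the flags can be derived from the two questions in the table. The helper `build_fit_kwargs` is hypothetical and not part of Finetuner, and `output_dim=100` simply mirrors the value used in the examples above:

```python
def build_fit_kwargs(has_embedding_model, has_labeled_data, output_dim=100):
    """Return the optional keyword arguments used in each of the four cases.

    Hypothetical helper for illustration only; it does not call Finetuner.
    """
    kwargs = {}
    if not has_labeled_data:
        # No labels: ask for interactive labeling.
        kwargs['interactive'] = True
    if not has_embedding_model:
        # General model: ask for conversion to an embedding model.
        kwargs['to_embedding_model'] = True
        kwargs['output_dim'] = output_dim
    return kwargs


orange = build_fit_kwargs(has_embedding_model=True, has_labeled_data=True)
green = build_fit_kwargs(has_embedding_model=True, has_labeled_data=False)
yellow = build_fit_kwargs(has_embedding_model=False, has_labeled_data=True)
blue = build_fit_kwargs(has_embedding_model=False, has_labeled_data=False)
```

The resulting dictionary would then be unpacked into the call, for example `finetuner.fit(model, train_data=data, **blue)`.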

Finetuning ResNet50 on CelebA

⚡ To get the best experience, you will need a GPU machine for this example. For CPU users, we provide examples of finetuning an MLP on FashionMNIST and finetuning a Bi-LSTM on CovidQA that run out of the box on low-profile machines. Check out more examples in our docs!

Download the CelebA-small dataset (7.7MB) and decompress it to './img_align_celeba'. The full dataset can be found here.

Finetuner accepts Jina DocumentArray/DocumentArrayMemmap, so we load the CelebA images into this format using a generator:

from jina.types.document.generators import from_files

def data_gen():
    for d in from_files('./img_align_celeba/*.jpg', size=100, to_dataturi=True):
        d.convert_image_datauri_to_blob(color_axis=0)  # `color_axis=-1` for TF/Keras users
        yield d

Load pretrained ResNet50 using PyTorch/Keras/Paddle:

PyTorch

import torchvision
model = torchvision.models.resnet50(pretrained=True)

Keras

import tensorflow as tf
model = tf.keras.applications.resnet50.ResNet50(weights='imagenet')

Paddle

import paddle
model = paddle.vision.models.resnet50(pretrained=True)

Start the Finetuner:

import finetuner

finetuner.fit(
    model=model,
    interactive=True,
    train_data=data_gen,
    freeze=True,
    to_embedding_model=True,
    input_size=(3, 224, 224),
    output_dim=100
)

After downloading the model and loading the data (takes ~20s depending on your network/CPU/GPU), your browser will open the Labeler UI as below. You can now label the relevance of celebrity faces via mouse/keyboard. The ResNet50 model will get finetuned and improved as you are labeling. If you are running this example on a CPU machine, it may take up to 20 seconds for each labeling round.

Support

Use Discussions to talk about your use cases, questions, and support queries.
Join our Slack community and chat with other Jina community members about ideas.
Join our Engineering All Hands meet-up to discuss your use case and learn Jina's new features.
- When? The second Tuesday of every month
- Where? Zoom (see our public events calendar/.ical) and live stream on YouTube
Subscribe to the latest video tutorials on our YouTube channel

Join Us

Finetuner is backed by Jina AI and licensed under Apache-2.0. We are actively hiring AI engineers and solution engineers to build the next neural search ecosystem in open source.

Comments

docs: add colab column
text-to-text with bert

image-to-image with resnet

text-to-image with clip

Removed all three examples and replaced with three google colabs (links above). Embed three google colabs into the documentation page in order to make sure we only maintain a single notebook per task. How to use?

Update google colab.

Export google colab as ipynb, download to docs/notebooks folder.

Run make notebook in the docs folder; this generates user-friendly markdown from the notebook using jupytext.

Run make dirhtml locally to see generated notebooks.

This allows us to potentially integration test all the colabs (if we can login) end-to-end periodically using nbsphinx.

review it here

in docs:

in readme:

[ ] This PR references an open issue

[x] I have added a line about this change to CHANGELOG

area/housekeeping area/cicd size/xl area/docs area/setup
opened by bwanglzu 13
Login Error

Hi @ZiniuYu , I am trying to log in to Finetuner using finetuner.login() command.

Although, it displays 🔐 Successfully login to Jina Ecosystem! on the terminal and get a login successful message on the browser, soon after that I get the below error on the terminal.

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/dist-packages/finetuner/__init__.py", line 31, in login
    ft.login()
  File "/usr/local/lib/python3.8/dist-packages/finetuner/finetuner.py", line 29, in login
    self._default_experiment = self._get_default_experiment()
  File "/usr/local/lib/python3.8/dist-packages/finetuner/finetuner.py", line 43, in _get_default_experiment
    return self.create_experiment(name=experiment_name)
  File "/usr/local/lib/python3.8/dist-packages/finetuner/finetuner.py", line 54, in create_experiment
    experiment_info = self._client.create_experiment(name=name)
  File "/usr/local/lib/python3.8/dist-packages/finetuner/client/client.py", line 37, in create_experiment
    return self._handle_request(
  File "/usr/local/lib/python3.8/dist-packages/finetuner/client/base.py", line 77, in _handle_request
    raise FinetunerServerError(
finetuner.exception.FinetunerServerError: Unprocessable Entity (422): [{'loc': ['body', 'name'], 'msg': 'Use a non-empty name for your experiment.', 'type': 'value_error'}]

Any help on this?

opened by RaiAmanRai 8
feat: do not send embedding model but definition + checkpoint

** This is a proposal and is in progress**

I am trying to get working around the finetuner, and one thing I would like to see is to remove this dependency of sharing the executor in a different thread.

I think that with this approach we can bypass this information. I am sure paddle has the same interfaces but I did not find. I will see if I take time to dig deeper.

Otherwise this is more of an inspiration work

opened by JoanFM 8
Tagging of specific class based on seed instance.

Say I want to quickly label many examples of a single class. Is there a way I can use a seed example of that class and use a combination of the nearest neighbour technique and my input to quickly label several 100?

opened by GeorgePearse 7
docs: copyedit readme
Ticket: https://github.com/jina-ai/team-tech-content/issues/37

General copyediting and language fixes to the README.md file.

[X] This PR references an open issue

[ ] I have added a line about this change to CHANGELOG

size/s
opened by scott-martens 6
docs: how it works
Added how it works section and the documentation structure:

see it lively here: https://ft-docs-how-it-works--jina-docs.netlify.app/docs-how-it-works/

please review:

index page

how it works section

documentation structure

size/m area/docs
opened by bwanglzu 5
Message jina.RequestProto exceeds maximum protobuf size

Hey, thanks for the great package! I'm currently testing the fine-tuning for some image detection. I'm not passing images to the model but embeddings (I'm training an extension only), so my document looks like this (embedding is here just a 128-dim vector). My model is PyTorch 2 linear layers that inference on that embedding and spit out 128 dimensional vector.

The UI works great but after doing a couple rounds of annotations I repeatedly run into the following error:

Any idea what could be responsible for that error?
type/bug priority/important-soon

opened by LemurPwned 5
Introduce catalog + ndcg
This PR contains the following changes:

add catalog to Labeler and tuner API

for Labeler: catalog performs best if it is a DocumentArrayMemmap, since no copying of data is necessary.

for tuner: no special requirements of the datatype

changed how the frontend requests new Documents from the backend. The decision, which Documents are shown next lies with the backend now.

toydata (both QA and FMNIST) return now a data generator and a catalog. If pre_init_generator is set to False, it will return a callable, which will return the generator. The precalculation of the catalog takes a little longer than before. This makes test take longer.

Tuner can now compute metrics. For now, hits and NDCG are implemented.

the result of Tuner.fit will now be a TunerStats object. This object allows easy printing and saving to file.

fixed a whole lot of tests in order to respect the new interfaces.

TODO:

docstrings

area/testing area/core component/tuner component/misc component/labeler size/l area/docs
opened by maximilianwerk 5
tailor: unify all test models
unify all test models, make sure the model has exactly the same structure, including dense, simple_cnn, vgg and lstm, add bert to test models.

more robust test on lstm.

given the same test models produced by 1, make sure three tailors produce exactly the same output model.
opened by bwanglzu 5
feat: add csv parsing for meshes and tutorial
Add csv parsing for meshes and tutorial

[x] This PR references an open issue https://github.com/jina-ai/finetuner-core/issues/420

[x] I have added a line about this change to CHANGELOG

area/testing area/core area/entrypoint size/l area/docs area/setup
opened by guenthermi 4
Add before/after comparisons at the end of notebooks

Each of our notebooks currently describe the process of finetuning a model, but don't provide any direct comparisons of the search results between the zero-shot and the finetuned models. A before/after section should be added to each notebook, they should show example queries and the top result(s) returned by both the zero-shot and the pre-trained model

opened by LMMilliken 4
skip malformed training data

As was suggested by @Callum, we do not have any error handling on the server side. It is better to skip malformed training data during training to increase the robustness of the training pipeline.

opened by bwanglzu 0
CI: communicate status of core-ci with core
This pr adds a step at the start and end of the remote-ci workflow that updates a comment in the pr of the branch that is being tested, updating the progress and outcome of the job.

[x] This PR references an open issue

[ ] I have added a line about this change to CHANGELOG

size/s area/housekeeping area/cicd area/docs
opened by LMMilliken 0
ci: authenticate repo-dispatch
Currently, the core-ci workflow will always run the core tests, regardless of who triggered the workflow. This PR requires that a GitHub token be passed in the body of the repository dispatch that triggers the core-ci, and this token is then used when pulling core to test. If the token is not valid, then core won't be pulled and the tests won't run.

[ ] This PR references an open issue

[x] I have added a line about this change to CHANGELOG

size/s area/housekeeping area/cicd area/docs
opened by LMMilliken 1
update the numbers in documentation after bumping docarray
for all the notebook, we need to run the experiments again, collect new data and get it updated. This will offer:

more metrics include recall

before-after comparison, the numbers.

This should happen before 0.7 release
opened by bwanglzu 0
Add support for CSVs based on binary relevance judgement
Currently, training/evaluation data for finetuning can only be passed as CSV files if the rows either consist of pairs of similar items, or an item followed by a label. CSVs should also be able to be passed in the following format:

query, document, relevancy
"my example query", "a candidate retrieval results", 1
"my example query", "another candidate retrieval result", 0
...
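To make the requested format concrete, here is a minimal sketch of reading such rows with Python's standard `csv` module. It does not use or describe Finetuner's own CSV loader; the function name `read_relevance_rows` is hypothetical, and `skipinitialspace=True` is needed only because the sample rows put a space after each comma:

```python
import csv
import io

# Sample data copied from the format proposed above.
raw = '''query, document, relevancy
"my example query", "a candidate retrieval results", 1
"my example query", "another candidate retrieval result", 0
'''


def read_relevance_rows(text):
    """Parse (query, document, relevancy) rows into typed tuples."""
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return [
        (row['query'], row['document'], int(row['relevancy']))
        for row in reader
    ]


rows = read_relevance_rows(raw)
positives = [r for r in rows if r[2] == 1]
```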
opened by LMMilliken 0

Releases(v0.6.7)

v0.6.7(Nov 25, 2022)
Release Note Finetuner 0.6.7

This release contains 4 new features.

🆕 Features

Add support for cross-modal evaluation in the EvaluationCallback (#615)

In previous versions of Finetuner, when using the EvaluationCallback to calculate IR metrics, you could only use a single model to encode both the query and the index data. This means that for training multiple models at the same time, like in CLIP fine-tuning, you could only use one encoder for evaluation. It is now possible to do cross-modal evaluation, where you use one model for encoding the query data and a second model for encoding the index data. This is useful in multi-modal tasks like text-to-image.

For doing the cross-modal evaluation, all you need to do is specify the model and index_model arguments in the EvaluationCallback, like so:

import finetuner
from finetuner.callback import EvaluationCallback

run = finetuner.fit(
    model='openai/clip-vit-base-patch32',
    train_data=train_data,
    eval_data=eval_data,
    loss='CLIPLoss',
    callbacks=[
        EvaluationCallback(
            query_data=query_data,
            index_data=index_data,
            model='clip-text',
            index_model='clip-vision'
        )
    ]
)

See the EvaluationCallback section of the Finetuner documentation for details on using this callback. See also the sections Text-to-Image Search via CLIP and Multilingual Text-to-Image search with MultilingualCLIP for concrete examples of cross-modal evaluation.
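To illustrate the idea behind cross-modal evaluation, the following toy sketch ranks index items against queries when the two sets of embeddings come from different encoders. It is not Finetuner's implementation: the vectors are made up, and `cosine` and `hit_at_k` are hypothetical helpers written in plain Python:

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def hit_at_k(query_embs, index_embs, relevant, k=1):
    """Fraction of queries whose relevant item appears in the top-k results."""
    hits = 0
    for query_id, query_vec in query_embs.items():
        ranked = sorted(
            index_embs,
            key=lambda doc_id: cosine(query_vec, index_embs[doc_id]),
            reverse=True,
        )
        if relevant[query_id] in ranked[:k]:
            hits += 1
    return hits / len(query_embs)


# Made-up embeddings: queries stand in for text-encoder output,
# index items stand in for image-encoder output.
query_embs = {'q_cat': [0.9, 0.1], 'q_dog': [0.2, 0.8]}
index_embs = {'img_cat': [1.0, 0.0], 'img_dog': [0.0, 1.0]}
relevant = {'q_cat': 'img_cat', 'q_dog': 'img_dog'}

score = hit_at_k(query_embs, index_embs, relevant, k=1)
```

The only requirement this sketch highlights is that both encoders map into the same embedding space, which is what CLIP-style training aims for.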

Add support for Multilingual CLIP (#611)

Finetuner now supports a Multilingual CLIP model from the OpenCLIP project. Multilingual CLIP models are trained on large text and image datasets from different languages using the CLIP contrastive learning approach.

They are a good fit for text-to-image applications where texts are in languages other than English.

The currently supported Multilingual CLIP model - xlm-roberta-base-ViT-B-32::laion5b_s13b_b90k - uses a ViT Base32 image encoder and an XLM Roberta Base text encoder.

You can find details on how to fine-tune this specific model in the Multilingual Text-to-Image search with MultilingualCLIP section of the documentation.

import finetuner

run = finetuner.fit(
    model='xlm-roberta-base-ViT-B-32::laion5b_s13b_b90k',
    train_data=train_data,
    eval_data=eval_data,
    epochs=5,
    learning_rate=1e-6,
    loss='CLIPLoss',
    device='cuda',
)

Filter models by task in finetuner.describe_models() (#610)

The finetuner.describe_models() function, which provides an overview of supported model backbones, now accepts an optional task argument that filters the models by task.

To display all models you can omit the argument.

import finetuner

finetuner.describe_models()

To filter based on task, you need to provide a valid task name. For example:

finetuner.describe_models(task='image-to-image')

or

finetuner.describe_models(task='text-to-image')

Currently valid task names are text-to-text, text-to-image and image-to-image.

Configure the num_items_per_class argument in finetuner.fit() (#614)

The finetuner.fit() method now includes a new argument num_items_per_class that allows you to set the number of items per label that will be included in each batch. This gives the user the ability to further tailor batch construction to their liking. If not set, this argument has a default value of 4, compatible with the previous versions of Finetuner.

You can easily set this when calling finetuner.fit():

import finetuner

run = finetuner.fit(
    model='efficientnet_b0',
    train_data=train_data,
    eval_data=eval_data,
    batch_size=128,
    num_items_per_class=8,
)

⚠️ The batch size needs to be a multiple of the number of items per class, in other words batch_size % num_items_per_class == 0. Otherwise Finetuner cannot respect the given num_items_per_class and throws an error.
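The constraint can be checked locally before submitting a run. This is a minimal sketch, not Finetuner's own validation: `check_batch_config` is a hypothetical helper, and the interpretation of the quotient as the number of classes per batch is an inference from the description above:

```python
def check_batch_config(batch_size, num_items_per_class=4):
    """Return how many classes fit in one batch, or raise if sizes are incompatible.

    The default of 4 matches the default stated in the release note.
    """
    if batch_size % num_items_per_class != 0:
        raise ValueError(
            f'batch_size ({batch_size}) must be a multiple of '
            f'num_items_per_class ({num_items_per_class})'
        )
    return batch_size // num_items_per_class


# The configuration from the example above: 128 items, 8 per class.
classes_per_batch = check_batch_config(batch_size=128, num_items_per_class=8)
```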

🤟 Contributors

We would like to thank all contributors to this release:

Wang Bo (@bwanglzu)

Michael Günther (@guenthermi)

Louis Milliken (@LMMilliken)

George Mastrapas (@gmastrapas)

v0.6.5(Nov 11, 2022)
Release Note Finetuner 0.6.5

This release contains 6 new features, 1 bug fix, 2 refactorings, and 2 documentation improvements.

🆕 Features

Support loading training data and evaluation data from CSV files (#592)

We now support CSV files in the finetuner.fit() method. This simplifies training because it is no longer necessary to construct a DocumentArray object to contain training data. Instead, you can use a CSV file that contains the training data or pointers (i.e. URIs) to the relevant data objects.

train_data = '/path/to/data.csv'

run = finetuner.fit(
    model='efficientnet_b0',
    train_data=train_data,
)

See the Finetuner documentation page for preparing CSV files for more information.

You can also provide CSV files for evaluation data, as well as for query and index data when using EvaluationCallback. See the EvaluationCallback page in the Finetuner documentation for more information.

import finetuner
from finetuner.callback import EvaluationCallback

finetuner.fit(
    model='efficientnet_b0',
    train_data='/path/to/train.csv',
    eval_data='/path/to/eval.csv',
    callbacks=[
        EvaluationCallback(
            query_data='/path/to/query.csv',
            index_data='/path/to/index.csv',
        )
    ]
)

Support for data in lists when encoding (#598)

The finetuner.encode() method now takes lists of texts or image URIs as well as DocumentArray objects as inputs. This simplifies encoding because it is no longer necessary to construct a DocumentArray object to contain data.

model = finetuner.get_model('/path/to/YOUR-MODEL.zip')
texts = ['some text to encode']
embeddings = finetuner.encode(model=model, data=texts)

See the Finetuner documentation page for encoding documents for more information.

Artifact sharing (#602)

Users can now share their model artifacts with anyone who has access to Jina and has the artifact ID by adding the public=True flag to finetuner.fit(). By default, artifacts are set to private, equivalent to public=False.

finetuner.fit(
    model=model_name,
    train_data=data,
    public=True,
)

See the Finetuner documentation for advanced job options for more information.

Allow access_paths for FinetunerExecutor

The FinetunerExecutor now takes an optional argument access_paths that allows users to specify a traversal path through an array of nested Document instances. The executor only processes those document chunks specified by the traversal path.

See the FinetunerExecutor documentation and the DocArray documentation for information on constructing document paths.

Allow logger callback for Weights & Biases during Finetuner runs

You can now use the Weights & Biases logger callback to track metrics for your finetuner run, using anonymous mode. After finetuning runs are finished, users receive a URL in the logs that points to a Weights & Biases web page with the tracked metrics of the run. This log is temporary (automatically deleted after seven days if unclaimed), and users can claim it by logging in with their Weights & Biases account credentials.

wandb: Currently logged in as: anony-mouse-279369. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.13.5
wandb: Run data is saved locally in [YOUR-PATH]
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run cool-wildflower-2
wandb: View project at https://wandb.ai/anony-mouse-279369/[YOUR-PROJECT-URL]
wandb: View run at https://wandb.ai/anony-mouse-279369/[YOUR-RUN-URL]

See the Finetuner documentation page on callbacks for more information.

Support for image blobs

We now support DocumentArray image blobs in Finetuner. It is no longer necessary to directly convert images into tensors before sending them to the cloud.

You can convert image filepaths or URIs to blobs with the Document.load_uri_to_blob() method.

This saves a lot of memory and bandwidth since blobs are stored in their native, typically compressed format. Blobs are usually as small as 10% of the size of their corresponding tensor.

from docarray import Document

d = Document(uri='tests/resources/lena.png')
d.load_uri_to_blob()

If you use CSV to input local image files to Finetuner, this conversion happens automatically by default.

⚙ Refactoring

Bump Hubble SDK version to 0.23.3 (#594)

We have updated Finetuner to the latest version of Hubble, improving functionality and particularly improving access from code running in notebooks.

We will deprecate the method finetuner.notebook_login() starting from version 0.7 of Finetuner. Inside notebooks, finetuner.login() will now detect the environment automatically.

Remove connect function (#596)

We have removed the finetuner.connect() method, since Finetuner no longer requires you to log in to Jina again if you are already logged in.

🐞 Bug Fixes

Fix executor _finetuner import

This bug caused the Finetuner executor to fail to start, and we have fixed the underlying issue.

📗 Documentation Improvements

Document the force argument to finetuner.login() (#596)

We documented the force parameter to finetuner.login(), which forces users to log in to Jina again, even if already logged in.

Update Image-to-Image example (#599)

We have changed the configuration and training sets in the examples in the Image-to-Image Search via ResNet50 documentation page.

🤟 Contributors

We would like to thank all contributors to this release:

Wang Bo (@bwanglzu)

Michael Günther (@guenthermi)

Louis Milliken (@LMMilliken)

Isabelle Mohr (@violenil)

George Mastrapas (@gmastrapas)

v0.6.4(Oct 27, 2022)
Release Note Finetuner 0.6.4

This release contains 6 new features, 1 bug fix and 1 documentation improvement.

🆕 Features

User-friendly login from Python notebooks (#576)

We've added the method finetuner.notebook_login() as a new method for logging in from notebooks like Jupyter in a more user-friendly way.

Change device specification argument in finetuner.fit() (#577)

We've deprecated the cpu argument to the finetuner.fit() method, replacing it with the device argument.

Instead of specifying cpu=False, for a GPU run, you should now use device='cuda'; and for a CPU run, instead of cpu=True, use device='cpu'.

The default is equivalent to device='cuda'. Unless you're certain that your Finetuner job will run quickly on a CPU, you should use the default argument.

We expect to remove the cpu argument entirely in version 0.7, which will break any old code still using it.
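For code that still passes the old flag, the mapping described above can be applied mechanically. The sketch below is a hypothetical migration helper (`migrate_fit_kwargs` is not part of Finetuner) that rewrites a dictionary of keyword arguments and leaves an explicit `device` untouched if one is already present:

```python
def migrate_fit_kwargs(kwargs):
    """Replace the deprecated `cpu` flag with the equivalent `device` value."""
    migrated = dict(kwargs)  # do not mutate the caller's dictionary
    if 'cpu' in migrated:
        use_cpu = migrated.pop('cpu')
        # cpu=True -> device='cpu'; cpu=False -> device='cuda'
        migrated.setdefault('device', 'cpu' if use_cpu else 'cuda')
    return migrated


old_gpu_call = {'cpu': False, 'epochs': 5}
old_cpu_call = {'cpu': True}
new_gpu_call = migrate_fit_kwargs(old_gpu_call)
new_cpu_call = migrate_fit_kwargs(old_cpu_call)
```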

Validate Finetuner run arguments on the client side (#579)

The Finetuner client now checks that the arguments to Finetuner runs are coherent and at least partially valid, before transmitting them to the cloud infrastructure. Not all arguments can be validated on the client-side, but the Finetuner client now checks all the ones that can.

Update names of OpenCLIP models (#580)

We have changed the names of open-access CLIP models available via Finetuner to be compatible with CLIP-as-Service. For example, the model previously referenced as ViT-B-16#openai is now ViT-B-16::openai.
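As an illustration of the renaming, the sketch below converts an old-style name to the new style. It assumes the only difference is the separator (`#` replaced by `::`), which is what the single example above shows; other models may have been renamed differently, and `to_new_model_name` is a hypothetical helper:

```python
def to_new_model_name(name):
    """Convert an old-style 'arch#source' name to the 'arch::source' style."""
    return name.replace('#', '::', 1)


new_name = to_new_model_name('ViT-B-16#openai')
```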

Add method finetuner.build_model() to load pre-trained models without fine-tuning (#584)

Previously, it was not possible to load a pre-trained model via Finetuner without performing some retraining or 'fine-tuning' on it. Now it is possible to get a pre-trained model, as is, and use it via Finetuner immediately.

For example, to use a BERT model in the finetuner without any fine-tuning:

import finetuner
from docarray import Document, DocumentArray

model = finetuner.build_model('bert-base-cased')  # load pre-trained model
documents = DocumentArray([Document(text='example text 1'), Document(text='example text 2')])
finetuner.encode(model=model, data=documents)  # encode texts without having done any fine-tuning
assert documents.embeddings.shape == (2, 768)

Show progress while encoding documents (#586)

You will now see a progress bar when using finetuner.encode().

🐞 Bug Fixes

Fix GPU-availability issues

We have observed some problems with GPU availability in Finetuner's use of Jina AI's cloud infrastructure. We've fully analyzed and repaired these issues.

📗 Documentation Improvements

Add Colab links to Finetuning Tasks pages (#583)

We have added runnable Google Colab notebooks for the examples in the Finetuning Tasks documentation pages: Text-to-Text, Image-to-Image, and Text-to-Image.

🤟 Contributors

We would like to thank all contributors to this release:

Wang Bo (@bwanglzu)

Michael Günther (@guenthermi)

George Mastrapas (@gmastrapas)

Louis Milliken (@LMMilliken)

v0.6.3(Oct 13, 2022)
Release Note

This release contains 2 new features, 2 bug fixes, and 1 documentation improvement.

🆕 Features

Allocate more GPU memory in GPU environments

Previously, the run scheduler was allocating 16GB of VRAM for GPU runs. Now, it allocates 24GB.

Users can now fine-tune significantly larger models and use larger batch sizes.

Add WiSE-FT to CLIP finetuning (#571)

WiSE-FT is a recent development that has proven to be an effective way to fine-tune models with a strong zero-shot capability, such as CLIP. We have added it to Finetuner along with documentation on its use.

Finetuner allows you to apply WiSE-FT easily using WiSEFTCallback. Finetuner will trigger the callback when the fine-tuning job finishes and merge the weights of the pre-trained model and the fine-tuned model:

import finetuner
from finetuner.callback import WiSEFTCallback

run = finetuner.fit(
    model='ViT-B-32#openai',
    ...,
    loss='CLIPLoss',
    callbacks=[WiSEFTCallback(alpha=0.5)],
)

See the documentation for advice on how to set alpha.
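The weight merge can be pictured as a linear interpolation between the two sets of weights. The sketch below is a plain-Python illustration, not Finetuner's code. It assumes the convention from the WiSE-FT paper, where `alpha=0` keeps the pre-trained weights and `alpha=1` keeps the fine-tuned weights; the function `wise_ft_merge` and the toy weight dictionaries are hypothetical:

```python
def wise_ft_merge(pretrained, finetuned, alpha=0.5):
    """Interpolate two weight dictionaries: (1 - alpha) * pretrained + alpha * finetuned."""
    if pretrained.keys() != finetuned.keys():
        raise ValueError('both models must have the same parameter names')
    return {
        name: [
            (1 - alpha) * p + alpha * f
            for p, f in zip(pretrained[name], finetuned[name])
        ]
        for name in pretrained
    }


# Toy weights standing in for real model parameters.
pretrained = {'layer.weight': [0.0, 2.0]}
finetuned = {'layer.weight': [1.0, 4.0]}

merged = wise_ft_merge(pretrained, finetuned, alpha=0.5)
```

Intermediate values of `alpha` trade off the zero-shot behaviour of the pre-trained model against the task-specific gains of the fine-tuned one.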

🐞 Bug Fixes

Fix Image Normalization for CLIP Models (#569)

Finetuner's image processing was not identical to that used by OpenAI for training CLIP, potentially leading to inconsistent results.

The new version fixes the bug and matches OpenAI's preprocessing.

Add open_clip to FinetunerExecutor requirements

The previous version of FinetunerExecutor failed to include the open_clip package in its requirements, forcing users to add it manually to their executors. This has now been repaired.

📗 Documentation Improvements

Add callbacks documentation (#564)

There is now full documentation for using callbacks with the Finetuner.

🤟 Contributors

We would like to thank all contributors to this release:

Wang Bo (@bwanglzu)

Louis Milliken (@LMMilliken)

Michael Günther (@guenthermi)

George Mastrapas (@gmastrapas)

v0.6.2(Sep 29, 2022)
Release Note

Finetuner makes neural network fine-tuning easier and faster by streamlining the workflow and handling all the complexity and infrastructure requirements in the cloud. With Finetuner, one can easily enhance the performance of pre-trained models and make them production-ready without expensive hardware.

What's in this Release?

This release covers Finetuner version 0.6.2, including dependencies finetuner-api 0.4.1 and finetuner-core 0.10.2.

It contains 3 new features and 1 bug fix.

🆕 Features

Finetuner can now produce PyTorch models

Previously, Finetuner only produced ONNX models. Users can now choose between ONNX and PyTorch models.

⚠️ PyTorch is now the default format for Finetuner output.

To select ONNX you must add the to_onnx flag to calls to finetuner.fit():

run = finetuner.fit(
    ...,
    to_onnx=True,
)

You must also add the flag to calls to finetuner.get_model() to use an ONNX model directly with DocArray:

model = finetuner.get_model(..., is_onnx=True)

To use an ONNX model inside a Jina Flow:

f = Flow().add(uses='jinahub+docker://FinetunerExecutor/v0.10.2', uses_with={'is_onnx': True})

Resubmit jobs automatically

Previously, when submitting a request for Finetuner to use cloud computing resources, if the request failed, the job would fail and the user would have to resubmit it. Now, the job will be resubmitted automatically up to five times, before failing completely.
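The behaviour described above resembles a bounded retry loop. The following sketch is only an illustration of that pattern and says nothing about how Finetuner implements it. It assumes "up to five times" means one initial attempt plus at most five resubmissions; `submit_with_retries` and `flaky_submit` are hypothetical names:

```python
def submit_with_retries(submit, max_resubmissions=5):
    """Call `submit`, resubmitting on failure up to `max_resubmissions` times."""
    last_error = None
    for _ in range(1 + max_resubmissions):  # initial attempt + resubmissions
        try:
            return submit()
        except RuntimeError as error:
            last_error = error
    # Every attempt failed: surface the last error to the caller.
    raise last_error


# A stand-in for a job submission that fails twice before succeeding.
calls = {'count': 0}


def flaky_submit():
    calls['count'] += 1
    if calls['count'] < 3:
        raise RuntimeError('no resources available')
    return 'submitted'


result = submit_with_retries(flaky_submit)
```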

Concise and more readable log messages

We have improved the logging in Finetuner to provide fewer and more readable messages for users.

🐞 Bug Fixes

Require ONNX runtime version > 1.11.1

This bug was causing version incompatibility errors for users of Python 3.10.

The new version fixes the bug and makes Finetuner fully compatible with the latest Python releases.

🤟 Contributors

We would like to thank all contributors to this release:

Michael Günther(@guenthermi)

Zhaofeng Miao(@mapleeit)

George Mastrapas(@gmastrapas)

Wang Bo(@bwanglzu)

v0.6.1(Sep 27, 2022)
[0.6.1] - 2022-09-27

Added

Add finetuner_version equal to the stubs version in the create run request. (#552)

Removed

Changed

Bump hubble client version. (#546)

Fixed

Preserve request headers in redirects to the same domain. (#552)

Docs

Improve example and install documentation. (#534)

Update finetuner executor version in docs. (#543)

v0.6.0(Sep 9, 2022)
[0.6.0] - 2022-09-09

Added

Add get_model and encode method to encode docarray. (#522)

Add connect function to package (#532)

Removed

Changed

Incorporate commons and stubs to use shared components. (#522)

Improve usability of stream_logs. (#522)

Improve describe_models with open-clip models. (#528)

Use stream logging in the README example (#532)

Fixed

Print logs before run status is STARTED. (#531)

Docs

Add inference session in examples. (#529)

v0.5.2(Aug 31, 2022)
[0.5.2] - 2022-08-31

Added

Enable wandb callback. (#494)

Support log streaming in finetuner client. (#504)

Support optimizer and miner options #517

Removed

Changed

Mark fit as login required. (#494)

Fixed

Replace the artifact name from dot to dash. (#519)

Docs

Fix google analytics Id for docs. (#499)

Update sphinx-markdown-table to v0.0.16 to get this fix (#499)

Place install instructions in the documentation more prominent (#518)

v0.5.1(Jul 15, 2022)
[0.5.1] - 2022-07-15

Added

Add artifact id and token interface to improve usability. (#485)

Removed

Changed

save_artifact should show progress while downloading. (#483)

Give more flexibility on dependency versions. (#483)

Bump jina-hubble-sdk to 0.8.1. (#488)

Improve integration section in documentation. (#492)

Bump docarray to 0.13.31. (#492)

Fixed

Use uri to represent image content in documentation creating training data code snippet. (#484)

Remove out-dated CLIP-specific documentation. (#491)

v0.5.0(Jun 30, 2022)
[0.5.0] - 2022-06-30

Added

Merge dev to main. (#477)

Docs 0.4.1 backup. (#462)

Add CD back with semantic release. (#472)

Removed

Changed

Refactor the guide for image to image search. (#458)

Refactor the guide for text to image search. (#459)

Refactor the default hyper-params and docstring format. (#465)

Various updates on style, how-to and templates. (#462)

Remove time column from Readme table. (#468)

Change release trigger to push to main branch. (#478)

Fixed

Use finetuner docs links in docs instead of netlify. (#475)

Use twine for the PyPI release. (#480)

v0.4.1(Feb 17, 2022)
Release Note (0.4.1)

Release time: 2022-02-17 15:25:53

🙇 We'd like to thank all contributors for this new release! In particular, Jie Fu, Michael Günther, Aziz Belaweid, CatStark, Wang Bo, Yanlong Wang, Tadej Svetina, Florian Hönicke, Jina Dev Bot, 🙇

🐞 Bug fixes

[6314a0dd] - use small batch size by default (#366) (Wang Bo)

[cb0e3d5e] - shuffle batches (#351) (Florian Hönicke)

🧼 Code Refactoring

[fb0dc916] - add device option to tailor (Michael Günther)

📗 Documentation

[f4807162] - fix code mesh tutorial (#372) (Aziz Belaweid)

[9edcbe68] - fix typos in tll tutorial (#370) (CatStark)

[ae655e9b] - use new docsqa server address (#364) (Yanlong Wang)

[dc9d306f] - change ResNet18 to ResNet50 in README example (#362) (Michael Günther)

🏁 Unit Test and CICD

[7aeb8861] - fix gpu (#365) (Tadej Svetina)

🍹 Other Improvements

[adf7b2ee] - Docs onnx tutorial (#373) (Jie Fu)

[40aa8087] - version: the next version will be 0.4.1 (Jina Dev Bot)

v0.4.0(Jan 27, 2022)
Release Note (0.4.0)

Release time: 2022-01-27 15:16:47

🙇 We'd like to thank all contributors for this new release! In particular, Wang Bo, George Mastrapas, Aziz Belaweid, Tadej Svetina, Yanlong Wang, Gregor von Dulong, Han Xiao, Nan Wang, Jina Dev Bot, 🙇

🆕 New Features

[82238b14] - use da.evaluate in Evaluator + configurable metrics (#352) (George Mastrapas)

[ffce20cd] - use docarray package and remove labeler (#338) (Tadej Svetina)

[e3543f7b] - self supervised learning integration (#336) (Wang Bo)

[fc326873] - tuner: onnx model dump load 280 (#308) (Gregor von Dulong)

[b760da02] - add NTXent loss (#326) (Tadej Svetina)

[554878ea] - tuner: add default projection head for ssl (#316) (Wang Bo)

[bc25c379] - add default preprocess fn for ssl (#331) (Wang Bo)

[2e1c5b7c] - evaluator integration (#284) (George Mastrapas)

[4b245eb6] - add unlabeled data classes (#320) (Tadej Svetina)

[fddf0bf7] - model checkpoint (#249) (Aziz Belaweid)

[382ebd71] - early stop callback (#266) (Aziz Belaweid)
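
Of the features listed above, the NTXent loss (#326) is the one whose name says least about what it computes. NT-Xent is the normalized temperature-scaled cross-entropy loss commonly used in contrastive self-supervised learning: each sample should be more similar to its positive partner (for example, another augmented view of the same item) than to every other sample in the batch. The following is a minimal pure-Python sketch of that idea, not Finetuner's implementation; the function names, the pair_index argument and the default temperature are assumptions made for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent_loss(embeddings, pair_index, temperature=0.5):
    """Mean NT-Xent loss over a batch (illustrative sketch).

    embeddings: list of vectors, one per sample.
    pair_index: pair_index[i] is the index of the positive partner of sample i.
    temperature: hypothetical default; smaller values sharpen the softmax.
    """
    n = len(embeddings)
    total = 0.0
    for i in range(n):
        # Temperature-scaled similarity of sample i to every other sample.
        logits = {
            j: cosine_similarity(embeddings[i], embeddings[j]) / temperature
            for j in range(n)
            if j != i
        }
        # Numerically stable log-sum-exp over all non-self samples.
        max_logit = max(logits.values())
        log_denominator = max_logit + math.log(
            sum(math.exp(value - max_logit) for value in logits.values())
        )
        # Cross-entropy with the positive partner as the target class.
        total += log_denominator - logits[pair_index[i]]
    return total / n


if __name__ == "__main__":
    # Two items, two views each: samples 0/1 and 2/3 are positive pairs.
    views = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
    print(nt_xent_loss(views, pair_index=[1, 0, 3, 2]))
```

The loss shrinks as positive pairs move closer together relative to the rest of the batch, which is why it works without class labels.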

🐞 Bug fixes

[36ad4a27] - module level PyTorch collate all function (#354) (George Mastrapas)

[6ef74f13] - do not normalize float images (#342) (Wang Bo)

[68f2ea2c] - use replace sort with sorted (#341) (Wang Bo)

[ec28b04f] - add normalization to preprocessor (#340) (Wang Bo)

[86c5f283] - remove double freeze from tutorial (#324) (Gregor von Dulong)

[7d3c05a5] - cd tests (#318) (Tadej Svetina)

[8ff56499] - docs: fix bottom github link (#310) (Yanlong Wang)

[0eeb4b70] - tuner: logging (#303) (Tadej Svetina)

[c75b6701] - make sure logging is correct (#296) (Aziz Belaweid)

[d1b8bd91] - qa-bot: fix link reference and width style (#292) (Yanlong Wang)

[9a51e1a1] - handle exceptions in callbacks (#286) (Tadej Svetina)

🧼 Code Refactoring

[0fd54bf0] - adjust readme after remove labeler (#350) (Wang Bo)

[a3b0a1db] - doc-bot: migrate to <jina-qa-bot> (#283) (Yanlong Wang)

📗 Documentation

[bb8e974c] - clean up readme and ndcg (#359) (Wang Bo)

[79581b47] - add text tutorial (#357) (George Mastrapas)

[7c6a9f51] - fix labeler docs adjust quick start (#358) (Wang Bo)

[e8058192] - 3d mesh finetuning tutorial (#345) (Aziz Belaweid)

[4f9f4a32] - rename bottleneck to projection head (#356) (Wang Bo)

[ec170afe] - clean up docs (#355) (Wang Bo)

[42ca1498] - bump qabot (#330) (Yanlong Wang)

[0eae3206] - add checkpoints to documentation (#312) (Tadej Svetina)

[7fef9df6] - fix section title in docs (Han Xiao)

[a5f956ad] - adjust readme based on new release (#302) (Wang Bo)

[7cc242f4] - add tll tutorial (#285) (Wang Bo)

🍹 Other Improvements

[b86c0e64] - cd: fix cd if condition (Han Xiao)

[49813c95] - docs: add docarray link to sidebar (Han Xiao)

[b4a10f7e] - remove doc building from CD (#317) (Tadej Svetina)

[954acce7] - CI improvements (#305) (Tadej Svetina)

[c796bdd4] - fix typos in covid-qa (#294) (Nan Wang)

[c51b044a] - adapt to the latest changes (#267) (Nan Wang)

[61e99141] - version: the next version will be 0.3.1 (Jina Dev Bot)

v0.3.0(Dec 16, 2021)
Minor release 0.3.0

New features

Tailor now lets you freeze weights by layer name with freeze=['layer1', 'layer2'] and attach a custom bottleneck module with bottleneck_net=MLP() on top of your embedding model #230, #238.

Finetuner now supports callbacks. Callbacks are triggered during model training, and several built-in callbacks are included, such as WandBLogger, which logs your training progress to Weights & Biases #231, #237.

Built-in mining strategies, such as hard negative mining, can be plugged into loss functions, for example TripletLoss(miner=TripletEasyHardMiner(neg_strategy='semihard')) #236.

Learning rate schedulers can step at batch or epoch level, controlled by scheduler_step #248.

Multiprocess data loading is now supported with the PyTorch and PaddlePaddle backends via num_workers #263.

Built-in Evaluator with support for different metrics, such as precision, recall, mAP and nDCG #223, #224.
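
To make the metrics named in the Evaluator item concrete, here is a minimal pure-Python sketch of precision@k, recall@k, average precision and nDCG@k for a single query with binary relevance. It is not the Evaluator's code, and the function names and signatures are assumptions made for illustration. mAP is the mean of average_precision over all queries.

```python
import math


def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k results that are relevant."""
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / k


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of all relevant items that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)


def average_precision(ranked_ids, relevant_ids):
    """Mean of precision@rank taken at each rank where a relevant item occurs."""
    if not relevant_ids:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant_ids)


def ndcg_at_k(ranked_ids, relevant_ids, k):
    """Discounted cumulative gain of the top-k, normalised by the ideal ordering."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked_ids[:k], start=1)
        if doc_id in relevant_ids
    )
    ideal_hits = min(k, len(relevant_ids))
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


if __name__ == "__main__":
    ranked = ["a", "b", "c", "d"]   # search results, best match first
    relevant = {"a", "c"}           # ground-truth matches for the query
    print(precision_at_k(ranked, relevant, k=2))
    print(recall_at_k(ranked, relevant, k=2))
    print(average_precision(ranked, relevant))
    print(ndcg_at_k(ranked, relevant, k=4))
```

Precision and recall ignore where in the top-k a match appears, while average precision and nDCG reward rankings that place relevant items earlier.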

Bug fixes & Refactoring & Testing

Make blob property writable with the PyTorch backend #244.

The reserved tag for Finetuner is now finetuner_label #251.

Code consistency improvement in embed and preprocessing #256, #255.

Minor bug fixes include type casting #268, unit/integration test improvements #264, #253, and DocArray import refactoring after docarray was split out as a separate project #277, #275.

🙇 We'd like to thank all contributors for this new release! In particular, Tadej Svetina, Wang Bo, George Mastrapas, Gregor von Dulong, Aziz Belaweid, Han Xiao, Mohammad Kalim Akram, Deepankar Mahapatro, Nan Wang, Maximilian Werk, Roshan Jossy, Jina Dev Bot, 🙇
v0.2.4(Nov 24, 2021)
Release Note (0.2.4)

Release time: 2021-11-24 16:13:58

🙇 We'd like to thank all contributors for this new release! In particular, Han Xiao, Jina Dev Bot, 🙇

📗 Documentation

[b7ff2920] - fix doc gen (Han Xiao)

🍹 Other Improvements

[67c66fe4] - bump jina min req. version (Han Xiao)

[c39f2a2b] - version: the next version will be 0.2.4 (Jina Dev Bot)

v0.2.3(Nov 24, 2021)
Release Note (0.2.3)

Release time: 2021-11-24 14:08:12

🙇 We'd like to thank all contributors for this new release! In particular, Han Xiao, Deepankar Mahapatro, Yanlong Wang, Tadej Svetina, Jina Dev Bot, 🙇

🐞 Bug fixes

[88f37a29] - docbot: feedback tooltip ui style (#222) (Yanlong Wang)

🧼 Code Refactoring

[2d9e9d72] - dataset: make preprocess_fn return any (#217) (Han Xiao)

📗 Documentation

[62214aa2] - fix css layout of versions (Han Xiao)

[08336e87] - dataset: restructure docs on datasets (#226) (Han Xiao)

[6e5934ba] - versioning (#225) (Deepankar Mahapatro)

[97639dac] - improve docstring for preprocess_fn (#221) (Tadej Svetina)

🍹 Other Improvements

[670adbe0] - version: the next version will be 0.2.3 (Jina Dev Bot)

v0.2.2(Nov 21, 2021)
Release Note (0.2.2)

Release time: 2021-11-21 21:14:37

🙇 We'd like to thank all contributors for this new release! In particular, Yanlong Wang, Han Xiao, Jina Dev Bot, 🙇

🐞 Bug fixes

[89511dc9] - docbot overflow and scrolling (#216) (Yanlong Wang)

🧼 Code Refactoring

[7778855e] - dataset: make preprocess_fn work on document (#215) (Han Xiao)

🍹 Other Improvements

[55e0888e] - fix readme (Han Xiao)

[dc526452] - version: the next version will be 0.2.2 (Jina Dev Bot)

v0.2.1(Nov 20, 2021)
Release Note (0.2.1)

Release time: 2021-11-20 19:39:37

🙇 We'd like to thank all contributors for this new release! In particular, Han Xiao, Tadej Svetina, Jina Dev Bot, 🙇

🧼 Code Refactoring

[d70546ac] - sampling: make num_items_per_class optional (#214) (Han Xiao)

📗 Documentation

[acc6e388] - tutorial: add swiss roll tutorial (Han Xiao)

[8748e9ee] - labeler: fix docstring (#213) (Tadej Svetina)

🍹 Other Improvements

[23d8ca80] - remove notebook from static (Han Xiao)

[5b0b9a1d] - version: the next version will be 0.2.1 (Jina Dev Bot)

v0.2.0(Nov 19, 2021)
Release Note (0.2.0)

Release time: 2021-11-19 14:22:57

🙇 We'd like to thank all contributors for this new release! In particular, Han Xiao, Yanlong Wang, Tadej Svetina, Wang Bo, Jina Dev Bot, 🙇

🆕 New Features

[f920fe25] - reformat pipeline (#192) (Tadej Svetina)

🐞 Bug fixes

[fe67bb92] - docs celeba (#211) (Tadej Svetina)

[e0f81474] - make get_framework robust (#207) (Tadej Svetina)

[376f4028] - tailor: fix to embedding model (#196) (Wang Bo)

📗 Documentation

[0a67481d] - fix doc-bot style during load (#212) (Yanlong Wang)

🍹 Other Improvements

[c6041fde] - version: set next version to 0.2.0 (Han Xiao)

[20eb41c2] - style: fix coding style optimize imports (Han Xiao)

[6539237b] - version: the next version will be 0.1.6 (Jina Dev Bot)

v0.1.5(Nov 8, 2021)
Release Note (0.1.5)

Release time: 2021-11-08 10:20:47

🙇 We'd like to thank all contributors for this new release! In particular, Roshan Jossy, Han Xiao, Wang Bo, Tadej Svetina, Jina Dev Bot, 🙇

🆕 New Features

[531d9052] - tuner: add miner for session dataset (#184) (Wang Bo)

[77df7676] - reformat data loading (#181) (Tadej Svetina)

🐞 Bug fixes

[3d5fc769] - embedding: fix embedding train/eval time behavior (#190) (Han Xiao)

🏁 Unit Test and CICD

[d80a4d0f] - embedding: add test for #190 (Han Xiao)

[fd1fe384] - upgrade tf version (#189) (Wang Bo)

[5059e202] - pin framework version (#188) (Wang Bo)

🍹 Other Improvements

[e1a73434] - labeler: add component for audio matches (#185) (Roshan Jossy)

[717f06a0] - version: the next version will be 0.1.5 (Jina Dev Bot)

v0.1.4(Nov 2, 2021)
Release Note (0.1.4)

Release time: 2021-11-02 21:06:01

🙇 We'd like to thank all contributors for this new release! In particular, Han Xiao, Wang Bo, Aziz Belaweid, Jina Dev Bot, 🙇

🆕 New Features

[1e4a1aee] - tuner: add miner v1 (#180) (Wang Bo)

[ae8e3990] - helper: add batch_size to embed fn (#183) (Han Xiao)

📗 Documentation

[d21345a3] - update according to new jina api (Han Xiao)

[7e9c04fa] - added resize to fix keras shape error (#174) (Aziz Belaweid)

🍹 Other Improvements

[1ce3d8e1] - bump jina requirements (Han Xiao)

[43d62f06] - readme: update logo (Han Xiao)

[489014ee] - version: the next version will be 0.1.4 (Jina Dev Bot)

v0.1.3(Oct 27, 2021)
Release Note (0.1.3)

Release time: 2021-10-27 07:27:34

🙇 We'd like to thank all contributors for this new release! In particular, Han Xiao, Jina Dev Bot, 🙇

🧼 Code Refactoring

[1ae201a0] - embedding: level up embed method to top API add docs (#178) (Han Xiao)

🍹 Other Improvements

[bf07ab12] - version: the next version will be 0.1.3 (Jina Dev Bot)

v0.1.2(Oct 26, 2021)
Release Note (0.1.2)

Release time: 2021-10-26 19:03:12

🙇 We'd like to thank all contributors for this new release! In particular, Han Xiao, Jina Dev Bot, 🙇

🆕 New Features

[df192645] - labeler: gently terminate the labeler UI from frontend (#177) (Han Xiao)

[115a0aa4] - tuner: add plot function for tuner.summary (#167) (Han Xiao)

🐞 Bug fixes

[40261d47] - api: levelup save and display to top-level (#176) (Han Xiao)

[320ec5df] - api: return model and summary in highlevel fit (#175) (Han Xiao)

🍹 Other Improvements

[ebb9c8d5] - setup: update jina minimum requirement for new block() (Han Xiao)

[1c5d00cd] - version: the next version will be 0.1.2 (Jina Dev Bot)

v0.1.1(Oct 24, 2021)
Release Note (0.1.1)

Release time: 2021-10-24 11:03:40

🙇 We'd like to thank all contributors for this new release! In particular, Han Xiao, Wang Bo, Deepankar Mahapatro, Mohammad Kalim Akram, Jina Dev Bot, 🙇

🆕 New Features

[43480cc3] - helper: set_embedding function for all frameworks (#163) (Han Xiao)

[fddc57dc] - labeler: allow user fixing the question (#159) (Han Xiao)

🐞 Bug fixes

[1e07e34c] - reset toggle on reload (#154) (Mohammad Kalim Akram)

🧼 Code Refactoring

[d8d875ff] - labeler: use set_embeddings in labeler (#165) (Han Xiao)

📗 Documentation

[d1a9396d] - remind user again change the data pth (#158) (Wang Bo)

[b92df7de] - enable docbot for finetuner (#153) (Deepankar Mahapatro)

🏁 Unit Test and CICD

[0d8e0b58] - add gpu test for set embedding (#164) (Wang Bo)

🍹 Other Improvements

[87cdc133] - fix docs css styling (Han Xiao)

[8e3b1fbb] - fix styling (Han Xiao)

[870c5a23] - fill missing docstrings (#162) (Wang Bo)

[67896b97] - fix readme (Han Xiao)

[838ebe35] - update readme (Han Xiao)

[ccf6de1a] - docs: fix docs banner (Han Xiao)

[9e4af657] - version: the next version will be 0.1.1 (Jina Dev Bot)

v0.1.0(Oct 20, 2021)
Release Note (0.1.0)

Release time: 2021-10-20 09:04:47

🙇 We'd like to thank all contributors for this new release! In particular, Han Xiao, Jina Dev Bot, 🙇

🐞 Bug fixes

[f6ba40d0] - setup: add MANIFEST.in (Han Xiao)

🍹 Other Improvements

[377959a1] - version: the next version will be 0.0.5 (Jina Dev Bot)

v0.0.4(Oct 20, 2021)
Release Note (0.0.4)

Release time: 2021-10-20 08:53:48

🙇 We'd like to thank all contributors for this new release! In particular, Han Xiao, Jina Dev Bot, 🙇

📗 Documentation

[6854ba0b] - fix ecosystem sidebar (Han Xiao)

🍹 Other Improvements

[0007fd84] - fix logos (Han Xiao)

[400e8070] - update readme (Han Xiao)

[73421284] - fix setup.py (Han Xiao)

[db3757d4] - fix readme (Han Xiao)

[1a3002b6] - version: the next version will be 0.0.4 (Jina Dev Bot)

v0.0.3(Oct 19, 2021)
Release Note (0.0.3)

Release time: 2021-10-19 23:04:12

🙇 We'd like to thank all contributors for this new release! In particular, Han Xiao, Wang Bo, Maximilian Werk, Tadej Svetina, Alex Cureton-Griffiths, Roshan Jossy, Jina Dev Bot, 🙇

🆕 New Features

[84585bee] - refactor head layers (#130) (Tadej Svetina)

[b624a62a] - tuner: allow adjustment of optimizer (#128) (Tadej Svetina)

[98c584e4] - enable saving of models in backend (#115) (Maximilian Werk)

[2296ca09] - tuner: add gpu for paddle and tf (#121) (Wang Bo)

[c60ec838] - tuner: add gpu support for pytorch (#122) (Tadej Svetina)

[c971f824] - logging of train and eval better aligned (#105) (Maximilian Werk)

[a6d16ff2] - tailor: add display and refactor summary (#112) (Han Xiao)

[bd4cfff4] - fit: add tailor to top-level fit function (#108) (Han Xiao)

[82c2cc8d] - tailor: attach a dense layer to tailor (#96) (Wang Bo)

[04de292a] - tailor: add high-level framework-agnostic convert (#97) (Han Xiao)

⚡ Performance Improvements

[68fc7839] - tuner: inference mode for torch evaluation (#89) (Tadej Svetina)

🐞 Bug fixes

[b951426d] - change helper function to private (#151) (Han Xiao)

[bc8b36ef] - demo: fix celeba example docs, logic, code (#145) (Han Xiao)

[ed6d8c67] - frontend layout tweaks (#142) (Han Xiao)

[02852803] - overfit test (#137) (Tadej Svetina)

[5a25a729] - helper: add real progressbar for training (#136) (Han Xiao)

[5196ce2a] - api: add kwargs to fit (#95) (Han Xiao)

[1a8272ca] - threading also for gateway (#83) (Maximilian Werk)

[e170d95b] - cd: fix prerelease script (Han Xiao)

🧼 Code Refactoring

[2916e9f5] - tuner: revert some catalog change before release (#150) (Han Xiao)

[635cd4c2] - adjust type hints (#149) (Wang Bo)

[b67ab1a5] - helper: move get_tailor and get_tunner to helper (#134) (Han Xiao)

[052adbb2] - helper: move get_tailor and get_tunner to helper (#131) (Han Xiao)

[d8ff3a5b] - labeler UI: js file into components (#101) (Roshan Jossy)

[80b5a2a1] - tailor: rename convert function to_embedding_model (#103) (Han Xiao)

[c06292cb] - tailor: use different trim logic (#100) (Han Xiao)

[1956a3d3] - tailor: fix type hint in tailor (#88) (Han Xiao)

[91587d88] - tailor: improve interface (#82) (Han Xiao)

[56eb5e8f] - api: move fit into top-most init (#84) (Han Xiao)

📗 Documentation

[c2584876] - add catalog to docs (#147) (Maximilian Werk)

[6fd3e1ea] - tuner: add docstrings (#148) (Tadej Svetina)

[177a78dd] - fix generate docs (#144) (Maximilian Werk)

[ac2d23de] - polish (#146) (Alex Cureton-Griffiths)

[b0da1bf6] - add celeba example (#143) (Wang Bo)

[475c1d8b] - tuner: add loss function explain for tuner (#138) (Han Xiao)

[f47e27a3] - update banner hide design (Han Xiao)

[11a6a8b9] - add interactive selector (Han Xiao)

[08ba5e06] - add tailor feature image (Han Xiao)

[528c80d5] - tailor: add docs for tailor (#119) (Han Xiao)

[04c22f74] - tailor: add first draft on tailor (Han Xiao)

[e62f77ea] - helper: add docstring for types (#98) (Han Xiao)

🏁 Unit Test and CICD

[6b8eca8c] - use jina git source as test dependencies (#135) (Han Xiao)

[f91f39f5] - add tailor plus tuner integration test (#124) (Wang Bo)

[56c13e59] - add pr labeler (#123) (Han Xiao)

[562c65f5] - tuner: add test for overfitting (#109) (Tadej Svetina)

[b448a611] - tailor: assure weights are preserved after calling to_embedding_model (#106) (Wang Bo)

[47b7a55d] - tailor: add test for name is none (#87) (Wang Bo)

🍹 Other Improvements

[370e5fba] - cd: add tag and release note script (Han Xiao)

[33b1c90b] - update readme (Han Xiao)

[0be69a45] - Introduce catalog + ndcg (#120) (Maximilian Werk)

[8bba726e] - update svg (Han Xiao)

[dfc334f7] - fix emoji (Han Xiao)

[a589a016] - docs: add note from get_framework (Han Xiao)

[d970a2b6] - fix styling (Han Xiao)

[62a0da7e] - version: the next version will be 0.0.3 (Jina Dev Bot)

