Hub is a dataset format with a simple API for creating, storing, and collaborating on AI datasets of any size.

Overview


Dataset Format for AI

Documentation · Getting Started · API Reference · Examples · Blog · Slack Community · Twitter

About Hub

Hub is a dataset format with a simple API for creating, storing, and collaborating on AI datasets of any size. The hub data layout enables rapid transformations and streaming of data while training models at scale. Hub is used by Google, Waymo, Red Cross, Oxford University, and Omdena.

Hub includes the following features:

  • Storage agnostic API: Use the same API to upload, download, and stream datasets to/from AWS S3/S3-compatible storage, GCP, Activeloop cloud, local storage, or in-memory storage.
  • Compressed storage: Store images, audio, and videos in their native compression, decompressing them only when needed, e.g., when training a model.
  • Lazy NumPy-like slicing: Treat your S3 or GCP datasets as if they are a collection of NumPy arrays in your system's memory. Slice them, index them, or iterate through them. Only the bytes you ask for will be downloaded!
  • Dataset version control: Commits, branches, checkout - concepts you are already familiar with from your code repositories can now be applied to your datasets as well (see the sketch after this list).
  • Third-party integrations: Hub comes with built-in integrations for PyTorch and TensorFlow. Train your model with a few lines of code - we even take care of dataset shuffling. :)
  • Distributed transforms: Rapidly apply transformations on your datasets using multi-threading, multi-processing, or our built-in Ray integration.
  • Instant visualization support: Hub datasets are instantly visualized with bounding boxes, masks, annotations, etc. in Activeloop Platform (see below).
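
As a rough sketch of how a couple of these features look in code (the CIFAR-10 path is a public Hub dataset; the local path, tensor name, branch name, and appended value below are illustrative):

import hub

ds = hub.load('hub://activeloop/cifar10-train')  # streamed, nothing downloaded yet
batch = ds.images[0:16].numpy()                  # lazy slicing: only these bytes are fetched

# Version control on a throwaway local dataset
ds2 = hub.empty('./vc_demo', overwrite=True)
ds2.create_tensor('labels')
with ds2:
    ds2.labels.append(1)
commit_id = ds2.commit('add first label')        # snapshot the current state
ds2.checkout('experiment', create=True)          # create and switch to a new branch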

Getting Started with Hub

🚀 How to install Hub

Hub is written in 100% Python and can be quickly installed using pip.

pip3 install hub
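
To verify the installation, you can print the installed package version (a quick, optional check):

python3 -c "import hub; print(hub.__version__)"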

🧠 Training a PyTorch model on a Hub dataset

Load CIFAR-10, one of the readily available datasets in Hub:

import hub
import torch
from torchvision import transforms, models

ds = hub.load('hub://activeloop/cifar10-train')

Inspect tensors in the dataset:

ds.tensors.keys()    # dict_keys(['images', 'labels'])
ds.labels[0].numpy() # array([6], dtype=uint32)
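
The class names behind those labels are stored as tensor metadata; they are used further below to size the model's output layer (list abbreviated here):

ds.labels.info.class_names  # ['airplane', 'automobile', ..., 'truck']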

Train a PyTorch model on the CIFAR-10 dataset without needing to download it

First, define a transform for the images and use Hub's built-in PyTorch one-line dataloader to connect the data to the compute:

tform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5]),
])

hub_loader = ds.pytorch(num_workers=0, batch_size=4, transform={
                        'images': tform, 'labels': None}, shuffle=True)
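
Before training, it can help to pull a single batch and sanity-check the shapes (the sizes shown assume CIFAR-10's 32x32 RGB images and the batch_size=4 set above):

batch = next(iter(hub_loader))
batch['images'].shape  # torch.Size([4, 3, 32, 32])
batch['labels'].shape  # torch.Size([4, 1])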

Next, define the model, loss and optimizer:

net = models.resnet18(pretrained=False)
net.fc = torch.nn.Linear(net.fc.in_features, len(ds.labels.info.class_names))
    
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

Finally, the training loop for 2 epochs:

for epoch in range(2):
    running_loss = 0.0
    for i, data in enumerate(hub_loader):
        images, labels = data['images'], data['labels']
        
        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(images)
        loss = criterion(outputs, labels.reshape(-1))
        loss.backward()
        optimizer.step()
        
        # print statistics
        running_loss += loss.item()
        if i % 100 == 99:    # print every 100 mini-batches
            print('[%d, %5d] loss: %.3f' %
                (epoch + 1, i + 1, running_loss / 100))
            running_loss = 0.0

🏗️ How to create a Hub Dataset

A Hub dataset can be created in various locations (storage providers). Here is how the path for each of them looks:

Storage provider         Example path
Activeloop cloud         hub://user_name/dataset_name
AWS S3 / S3-compatible   s3://bucket_name/dataset_name
GCP                      gcp://bucket_name/dataset_name
Local storage            path to local directory
In-memory                mem://dataset_name
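
For example, a minimal sketch of initializing datasets at two of these locations (the bucket and folder names are placeholders, and S3 credentials are assumed to be configured in your environment):

import hub

ds_local = hub.empty("./my_local_dataset")         # local storage
ds_cloud = hub.empty("s3://my-bucket/my_dataset")  # AWS S3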

Let's create a dataset in the Activeloop cloud. Activeloop cloud provides free storage up to 300 GB per user (more info here). Create a new account with Hub from the terminal using activeloop register if you haven't already. You will be asked for a user name, email ID, and password. The user name you enter here will be used in the dataset path.

$ activeloop register
Enter your details. Your password must be at least 6 characters long.
Username:
Email:
Password:

Initialize an empty dataset in the Activeloop Cloud:

import hub

ds = hub.empty("hub://<username>/test-dataset")

Next, create a tensor to hold images in the dataset we just initialized:

images = ds.create_tensor("images", htype="image", sample_compression="jpg")
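
Additional tensors are created in the same way; for instance, a hedged sketch of a class_label tensor for annotations (the tensor name and class names here are illustrative, not part of the walkthrough):

labels = ds.create_tensor("labels", htype="class_label", class_names=["cat", "dog"])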

Assuming you have a list of image file paths, let's upload them to the dataset:

image_paths = ...
with ds:
    for image_path in image_paths:
        image = hub.read(image_path)
        ds.images.append(image)

Alternatively, you can also upload NumPy arrays. Since the images tensor was created with sample_compression="jpg", the arrays will be compressed using JPEG compression.

import numpy as np

with ds:
    for _ in range(1000):  # 1000 random images
        random_image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)  # 100x100 uint8 image with 3 channels (uint8 is expected for jpg compression)
        ds.images.append(random_image)

🚀 How to load a Hub Dataset

You can load the dataset you just created with a single line of code:

import hub

ds = hub.load("hub://<username>/test-dataset")

You can also access other publicly available Hub datasets, not just the ones you created. Here is how you would load the Objectron Bikes Dataset:

import hub

ds = hub.load('hub://activeloop/objectron_bike_train')

To get the first image in the Objectron Bikes dataset in numpy format:

image_arr = ds.image[0].numpy()
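
If samples vary in shape, they can be fetched as a list of arrays instead of a single stacked array (a sketch using the aslist option of numpy()):

image_arrs = ds.image[0:5].numpy(aslist=True)  # list of 5 arrays; shapes may differ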

📚 Documentation

Getting started guides, examples, tutorials, API reference, and other useful information can be found on our documentation page.

🎓 For Students and Educators

Hub users can access and visualize a variety of popular datasets through a free integration with Activeloop's Platform. Users can also create and store their own datasets and make them available to the public. Free storage of up to 300 GB is available for students and educators:

Storage for public datasets hosted by Activeloop     200GB Free
Storage for private datasets hosted by Activeloop    100GB Free

👩‍💻 Comparisons to Familiar Tools

Hub vs DVC

Hub and DVC offer dataset version control similar to git for data, but their methods for storing data differ significantly. Hub converts and stores data as chunked compressed arrays, which enables rapid streaming to ML models, whereas DVC operates on top of data stored in less efficient traditional file structures. The Hub format makes dataset versioning significantly easier than the traditional file structures used by DVC when datasets are composed of many files (i.e., many images). An additional distinction is that DVC primarily uses a command-line interface, whereas Hub is a Python package. Lastly, Hub offers an API to easily connect datasets to ML frameworks and other common ML tools, and enables instant dataset visualization through Activeloop's visualization tool.

Activeloop Hub vs TensorFlow Datasets (TFDS)

Hub and TFDS seamlessly connect popular datasets to ML frameworks. Hub datasets are compatible with both PyTorch and TensorFlow, whereas TFDS datasets are only compatible with TensorFlow. A key difference between Hub and TFDS is that Hub datasets are designed for streaming from the cloud, whereas TFDS must be downloaded locally prior to use. As a result, with Hub, one can import datasets directly from TensorFlow Datasets and stream them either to PyTorch or TensorFlow. In addition to providing access to popular publicly available datasets, Hub also offers powerful tools for creating custom datasets, storing them on a variety of cloud storage providers, and collaborating with others via a simple API. TFDS is primarily focused on giving the public easy access to commonly available datasets, and management of custom datasets is not its primary focus. A full comparison article can be found here.

Activeloop Hub vs HuggingFace

Hub and HuggingFace both offer access to popular datasets, but Hub primarily focuses on computer vision, whereas HuggingFace focuses on natural language processing. HuggingFace Transformers and other computational tools for NLP are not analogous to features offered by Hub.

Community

Join our Slack community to learn more about unstructured dataset management using Hub and to get help from the Activeloop team and other users.

We'd love your feedback by completing our 3-minute survey.

As always, thanks to our amazing contributors!

Please read CONTRIBUTING.md to get started with making contributions to Hub.

README Badge

Using Hub? Add a README badge to let everyone know:

[![hub](https://img.shields.io/badge/powered%20by-hub%20-ff5a1f.svg)](https://github.com/activeloopai/Hub)

Disclaimers

Dataset Licenses

Hub users may have access to a variety of publicly available datasets. We do not host or distribute these datasets, vouch for their quality or fairness, or claim that you have a license to use the datasets. It is your responsibility to determine whether you have permission to use the datasets under their license.

If you're a dataset owner and do not want your dataset to be included in this library, please get in touch through a GitHub issue. Thank you for your contribution to the ML community!

Usage Tracking

By default, we collect usage data using Bugout (here's the code that does it). It does not collect user data other than anonymized IP address data, and it only logs the Hub library's own actions. This helps our team understand how the tool is used and how to build features that matter to you! After you register with Activeloop, data is no longer anonymous. You can always opt-out of reporting using the CLI command below:

activeloop reporting --off

Acknowledgment

This technology was inspired by our research work at Princeton University. We would like to thank William Silversmith @SeungLab for his awesome cloud-volume tool.

Comments
  • [2.0] writing/reading fixed-shape arrays to chunks

    support chunked writing (not appending) for np.arrays -> storage providers with the following qualities:

    • batched/unbatched
    • fixed shape (all samples have same shape)

    also adds:

    • in .circleci/config.yaml run pytest & pytest-benchmark separately (with --benchmark-skip & --benchmark-only flags)

    things this branch does not do:

    • appending (writing arrays to a key that already had arrays written to it)
    • caching
    • index map chunking
    • compression
    enhancement 
    opened by nollied 415
  • Create a tutorial on Colab

    Users should be able to load a dataset, train a model, and upload the dataset. Feel free to start from a small example and then make the example comprehensive.

    good first issue hacktoberfest 
    opened by davidbuniat 33
  • v1-alpha candidate

    1. Ability to modify datasets on the fly. Datasets are no longer immutable and can be modified over time.
    2. Larger datasets can now be uploaded, as we removed some RAM-limiting components from Hub.
    3. Caching is introduced to improve IO performance.
    4. Dynamic shaping enables support for very large images/data. You can have large images/data stored in Hub.
    opened by edogrigqv2 31
  • [FEATURE] Adding FFHQ dataset

    I have the 1024 and 128 scale pngs from the FFHQ dataset. I'd like to upload this as a hub:// dataset so that you can copy it to the activeloop namespace.

    Currently I am considering how to structure the dataset, and what splits it should be uploaded as.

    Below is the schema I have used so far. It includes all of the metadata from the original dataset including the URLs to the original files, and the pixel_md5 hashes match when looping back over the dataset and recomputing them.

    ds = hub.empty("./ffhq-1024", overwrite=True)
    
    with ds:
        ds.create_tensor("metadata/author", htype="text")
        ds.create_tensor("metadata/country", htype="text")
        ds.create_tensor("metadata/date_crawled", htype="text")
        ds.create_tensor("metadata/date_uploaded", htype="text")
        ds.create_tensor("metadata/license", htype="text")
        ds.create_tensor("metadata/license_url", htype="text")
        ds.create_tensor("metadata/photo_title", htype="text")
        ds.create_tensor("metadata/photo_url", htype="text")
    
        ds.create_tensor("images/image", htype="image", sample_compression="png")
        ds.create_tensor("images/face_landmarks", dtype=np.float32)
        ds.create_tensor("images/file_md5", htype="text")
        ds.create_tensor("images/file_path", htype="text")
        ds.create_tensor("images/file_url", htype="text")
        ds.create_tensor("images/file_size", dtype=np.int32)
        ds.create_tensor("images/pixel_md5", htype="text")
    
        ds.create_tensor("thumbs/image", htype="image", sample_compression="png")
        ds.create_tensor("thumbs/face_landmarks", dtype=np.float32)
        ds.create_tensor("thumbs/file_md5", htype="text")
        ds.create_tensor("thumbs/file_path", htype="text")
        ds.create_tensor("thumbs/file_url", htype="text")
        ds.create_tensor("thumbs/file_size", dtype=np.int32)
        ds.create_tensor("thumbs/pixel_md5", htype="text")
    
        ds.create_tensor("wilds/face_landmarks", dtype=np.float32)
        ds.create_tensor("wilds/face_rect", dtype=np.float32)
        ds.create_tensor("wilds/file_md5", htype="text")
        ds.create_tensor("wilds/file_path", htype="text")
        ds.create_tensor("wilds/file_url", htype="text")
        ds.create_tensor("wilds/file_size", dtype=np.int32)
        ds.create_tensor("wilds/pixel_md5", htype="text")
        ds.create_tensor("wilds/pixel_size", dtype=np.int32)
    

    Does this structure abide by Hub best practices?

    Would it be a good idea to also upload a "ffhq-128" without the 1024 images, and "ffhq-meta" without the 128 images also?

    >>> next(ds.tensorflow().as_numpy_iterator())
    {
      'metadata/author': array([b'Jeremy Frumkin'], dtype=object), 
      'metadata/country': array([b''], dtype=object), 
      'metadata/date_crawled': array([b'2018-10-10'], dtype=object), 
      'metadata/date_uploaded': array([b'2007-08-16'], dtype=object), 
      'metadata/license': array([b'Attribution-NonCommercial License'], dtype=object), 
      'metadata/license_url': array([b'https://creativecommons.org/licenses/by-nc/2.0/'], dtype=object), 
      'metadata/photo_title': array([b'DSCF0899.JPG'], dtype=object), 
      'metadata/photo_url': array([b'https://www.flickr.com/photos/frumkin/1133484654/'], dtype=object), 
      
      'images/image': array([[[  0, 133, 147], ..., [132, 157, 164]]], dtype=uint8), 
      'images/face_landmarks': array([[131.62, 453.8 ], ..., [521.04, 715.26]], dtype=float32), 
      'images/file_md5': array([b'ddeaeea6ce59569643715759d537fd1b'], dtype=object), 
      'images/file_path': array([b'images1024x1024/00000/00000.png'], dtype=object), 
      'images/file_size': array([1488194], dtype=int32), 
      'images/file_url': array([b'https://drive.google.com/uc?id=1xJYS4u3p0wMmDtvUE13fOkxFaUGBoH42'], dtype=object), 
      'images/pixel_md5': array([b'47238b44dfb87644460cbdcc4607e289'], dtype=object), 
      
      'thumbs/image': array([[[  0, 130, 146], ..., [134, 157, 163]]], dtype=uint8), 
      'thumbs/face_landmarks': array([[ 16.4525 ,  56.725  ], ..., [ 65.13   ,  89.4075 ]], dtype=float32), 
      'thumbs/file_md5': array([b'bd3e40b2ba20f76b55dc282907b89cd1'], dtype=object), 
      'thumbs/file_path': array([b'thumbnails128x128/00000/00000.png'], dtype=object), 
      'thumbs/file_size': array([29050], dtype=int32), 
      'thumbs/file_url': array([b'https://drive.google.com/uc?id=1fUMlLrNuh5NdcnMsOpSJpKcDfYLG6_7E'], dtype=object), 
      'thumbs/pixel_md5': array([b'38d7e93eb9a796d0e65f8c64de8ba161'], dtype=object), 
      
      'wilds/face_landmarks': array([[ 562.5,  697.5], ..., [1060.5,  996.5]], dtype=float32), 
      'wilds/face_rect': array([ 667.,  410., 1438., 1181.], dtype=float32), 
      'wilds/file_md5': array([b'1dc0287e73e485efb0516a80ce9d42b4'], dtype=object), 
      'wilds/file_path': array([b'in-the-wild-images/00000/00000.png'], dtype=object), 
      'wilds/file_size': array([3991569], dtype=int32), 
      'wilds/file_url': array([b'https://drive.google.com/uc?id=1yT9RlvypPefGnREEbuHLE6zDXEQofw-m'], dtype=object), 
      'wilds/pixel_md5': array([b'86b3470c42e33235d76b979161fb2327'], dtype=object), 
      'wilds/pixel_size': array([2016, 1512], dtype=int32)
    }
    

    Getting the 900GB Wilds images, along with the TFRecords that are pre-resized for each intermediate scale, is proving harder to acquire. But just hosting the 1024-scale images would already be a huge improvement in making the dataset accessible.

    enhancement 
    opened by JossWhittle 28
  • [Feature] pretty prints of objects

    🚨🚨 Feature Request

    If your feature will improve HUB

    To explore the structure of a dataset, it is convenient to have nicer and more informative prints of dataset objects and samples.

    Description of the possible solution

    1) show ds

    now

    > ds
    Dataset(path='hub://activeloop/abalone_full_dataset', tensors=['length', 'diameter', 'height', 'weight'])
    

    Something along these lines would work (taken from SQLite):

    > ds
    path: "hub://activeloop/abalone_full_dataset", samples:  1532596
    
    tensor    htype        dtype    shape       compression
    ------    ------       ------   ------      -----------
    length    image        uint8    256x256x3   jpeg
    diameter  image        float32  512x512x3   zstd
    height    image        float32  512x512x3   zstd
    weight    class_label  int32    32          None
    
    

    and in a Jupyter notebook, shown as a table similar to pandas.

    2) show ds.tensor

    now

    > ds.height
    Tensor(key='Length')
    

    At least provide full information about the tensor:

    > ds.height
    Tensor(
        key='height', 
        htype='image', 
        dtype='uint8', 
        shape=(256, 256, 3), 
        sample_compression='jpeg'
    )
    

    or, to make it consistent with 1):

    > ds.height
    tensor    htype    dtype     shape       compression
    ------    ------   ------    ------      -----------
    height    image    float32   512x512x3   zstd
    

    3) show ds[0:5] sample

    > ds[0:5]
        length    diameter     height     weight
        ------    --------     ------     ------
    0      0.5    [[0.,...,0]] "sent.."      dog   
    0      0.5    [[0.,...,0]] "text a"      dog   
    0      0.5    [[0.,...,0]] "text b"      dog   
    

    and in a Jupyter notebook, visualize images (and other htypes).

    Notes

    • [ ] Feel free to provide a better format for printing dataset, tensor, and sample classes
    • [ ] Feel free to suggest other important classes/objects that need to be printed properly for exploring the structure
    enhancement good first issue 
    opened by davidbuniat 25
  • [FEATURE] Benchmarking memory

    🚨🚨 Feature Request

    • [ ] Related to an existing Issue
    • [X] A new implementation (Improvement, Extension)

    We should benchmark memory usage when fetching from a Hub dataset.

    If your feature will improve HUB

    In the near term, well-scoped memory benchmarks will help assess new features. In the long term, they can be used to compare performance with other libraries such as Zarr and Tile.

    Description of the possible solution

    We could start with a client-side benchmark reading from a local volume, perhaps with memory-profiler.

    help wanted good first issue 
    opened by mynameisvinn 25
  • [BUG] Tests fail in Windows Environment specifically

    🐛🐛 Bug Report

    In the current test sequence, 11 tests fail with the error AttributeError: module 'numcodecs' has no attribute 'MsgPack'; however, this error does not occur in Colab environments.

    ⚗️ Current Behavior

    When pytest . is run on a Windows 10 environment, 11 tests fail and 6 of them have the error message as described above.

    Expected behavior/code: These errors should not be thrown.

    ⚙️ Environment

    • Python version(s):
      • Python 3.7.9
    • OS: Windows 10

    🖼 Additional context/Screenshots (optional)

    Add any other context about the problem here. If applicable, add screenshots to help explain.

    opened by DebadityaPal 21
  • MPII Human Pose Dataset

    Describe the dataset

    Add the MPII Human Pose Dataset to Hub, so this would work:

    import hub
    ds = hub.load("username/mpii-human-pose-dataset")
    

    Steps

    1. Please take a look at the docs on uploading datasets.

    2. The uploading script should be added to the examples folder.

    Example

    You can find an example of large dataset loading and upload here:

    • https://github.com/activeloopai/Hub/blob/master/examples/coco/upload_coco2017.py
    good first issue hacktoberfest dataset 
    opened by kristinagrig06 21
  • [FEATURE] Append MPL headers on source

    🚨🚨 Feature Request

    • [x] A new implementation (Improvement, Extension)

    Is your feature request related to a problem?

    Hub currently uses Mozilla Public License (MPL), which requires the following header (from Exhibit A of the license) to be attached to source.

    This Source Code Form is subject to the terms of the Mozilla Public License, v. 2.0. If a copy of the MPL was not distributed with this file, You can obtain one at https://mozilla.org/MPL/2.0/.
    

    We need help appending MPL headers on source (where appropriate).

    good first issue 
    opened by mynameisvinn 20
  • hub-2.0 chunk generator

    This is an essential part of the chunk engine. This contribution is narrow in scope (it does not implement the whole chunk engine). I also added explicit type checking during pytests using the pytest-mypy package.

    This contribution converts bytes -> chunks and has tests covering as many edge cases as possible.

    Note: this chunk generator chunks with respect to the primary axis. It does not support slicing, but I came up with a modification that will support it.

    Let's merge this into release/2.0 first to get the ball rolling, and I will make another PR with the modification to support slicing.

    enhancement v2 
    opened by nollied 19
  • Add the Fine-Grained Visual Categorization IMET 2020 dataset

    Describe the dataset

    Add the IMET 2020 FGVC7 dataset to Hub, so this would work:

    import hub
    ds = hub.load("username/imet-2020-fgvc7")
    

    Steps

    1. Please take a look at the docs on uploading datasets.

    2. The uploading script should be added to the examples folder.

    Example

    You can find an example of large dataset loading and upload here:

    • https://github.com/activeloopai/Hub/blob/master/examples/coco/upload_coco2017.py
    good first issue hacktoberfest dataset 
    opened by mikayelh 19
  • [DL-943] Nones + transform fix

    🚀 🚀 Pull Request

    Checklist:

    • [ ] My code follows the style guidelines of this project and the Contributing document
    • [ ] I have commented my code, particularly in hard-to-understand areas
    • [ ] I have kept the coverage-rate up
    • [ ] I have performed a self-review of my own code and resolved any problems
    • [ ] I have checked to ensure there aren't any other open Pull Requests for the same change
    • [ ] I have described and made corresponding changes to the relevant documentation
    • [ ] New and existing unit tests pass locally with my changes

    Changes

    opened by farizrahman4u 1
  • [BUG] pytorch dataloader index error

    🐛🐛 Bug Report

    I'm trying to understand an issue that is making the PyTorch data loader from deeplake throw an index error for some samples unexpectedly. When I try to fetch the data directly from the dataset, the behaviour is not reproducible.

    The error first appeared during model training. I was able to reproduce it with the following code:

    import numpy as np
    import torchvision.transforms as T  # assumed import for T.ToTensor() below

    def deeplake_transform(sample_in, patch_size: int, num_seg_classes: int):
        seg_indices = sample_in["masks/label"]
        partial_mask = sample_in["masks/mask"].astype("float32")
        full_mask = np.zeros((num_seg_classes, patch_size, patch_size), dtype=np.float32)
        for i, idx in enumerate(seg_indices):
            full_mask[idx] = partial_mask[i]

        return dict(
            inputs=dict(image=T.ToTensor()(sample_in["images"])),
            targets=dict(
                segmentations=full_mask,
                classifications=sample_in["labels"].astype("float32"),
            ),
        )

    data_loader = ds.pytorch(
        transform=deeplake_transform,
        decode_method={"images": "numpy"},
        batch_size=1,
        num_workers=1,
        transform_kwargs={"num_seg_classes": 67, "patch_size": 512},
    )
    iter_loader = iter(data_loader)

    idx = 0  # sample counter
    while True:
        try:
            sample = next(iter_loader)
        except Exception as e:
            print(e)
            break

        idx += 1
        if idx == len(ds):
            print("finished")
            break
    

    The following error is thrown without much context.

    Caught IndexError in DataLoader worker process 0.
    Original Traceback (most recent call last):
      File ".venv/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
        data = fetcher.fetch(index)
      File ".venv/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 34, in fetch
        data.append(next(self.dataset_iter))
      File "/home/test/.venv/lib/python3.8/site-packages/deeplake/integrations/pytorch/dataset.py", line 472, in __iter__
        for data in stream:
      File ".venv/lib/python3.8/site-packages/deeplake/core/io.py", line 311, in read
        yield from self.stream(block)
      File "/home/test/.venv/lib/python3.8/site-packages/deeplake/core/io.py", line 355, in stream
        data = engine.read_sample_from_chunk(
      File ".venv/lib/python3.8/site-packages/deeplake/core/chunk_engine.py", line 1528, in read_sample_from_chunk
        return chunk.read_sample(
      File ".venv/lib/python3.8/site-packages/deeplake/core/chunk/uncompressed_chunk.py", line 213, in read_sample
        sb, eb = bps[local_index]
      File ".venv/lib/python3.8/site-packages/deeplake/core/meta/encode/base_encoder.py", line 247, in __getitem__
        self._encoded[row_index], row_index, local_sample_index
    IndexError: index 7133 is out of bounds for axis 0 with size 7133
    

    But the following code produces no errors and exhausts the iterator.

    for sample in ds:
        try: # try to read all the data that is used in the 
            sample["images"].data()['value']
            sample["masks/mask"].data()['value']
            sample["masks/label"].data()['value']
            sample["labels"].data()['value']
        except:
            break
        
    

    I'm looking for help here since it may be related to the chunk_engine behaviour. It could help if the internal exception handler were more explicit about the error.

    ⚙️ Environment

    • Python version(s): 3.8.10
    • OS: Ubuntu 18.04
    • IDE: VS-Code
    • Packages: [torch==1.13.1, deeplake==3.1.7]
    bug 
    opened by lspinheiro 2
  • Tweaks to readme

    🚀 🚀 Pull Request

    Checklist:

    • [ ] My code follows the style guidelines of this project and the Contributing document
    • [ ] I have commented my code, particularly in hard-to-understand areas
    • [ ] I have kept the coverage-rate up
    • [ ] I have performed a self-review of my own code and resolved any problems
    • [ ] I have checked to ensure there aren't any other open Pull Requests for the same change
    • [ ] I have described and made corresponding changes to the relevant documentation
    • [ ] New and existing unit tests pass locally with my changes

    Changes

    opened by istranic 1
  • Parquet reader

    🚀 🚀 Pull Request

    Checklist:

    • [ ] My code follows the style guidelines of this project and the Contributing document
    • [ ] I have commented my code, particularly in hard-to-understand areas
    • [ ] I have kept the coverage-rate up
    • [ ] I have performed a self-review of my own code and resolved any problems
    • [ ] I have checked to ensure there aren't any other open Pull Requests for the same change
    • [ ] I have described and made corresponding changes to the relevant documentation
    • [ ] New and existing unit tests pass locally with my changes

    Changes

    opened by farizrahman4u 0
  • Add support for saving query in query.json

    🚀 🚀 Pull Request

    Checklist:

    • [ ] My code follows the style guidelines of this project and the Contributing document
    • [ ] I have commented my code, particularly in hard-to-understand areas
    • [ ] I have kept the coverage-rate up
    • [ ] I have performed a self-review of my own code and resolved any problems
    • [ ] I have checked to ensure there aren't any other open Pull Requests for the same change
    • [ ] I have described and made corresponding changes to the relevant documentation
    • [ ] New and existing unit tests pass locally with my changes

    Changes

    opened by adolkhan 0
  • Added support for audio/video support in hub.ingest

    🚀 🚀 Pull Request

    Checklist:

    • [ ] My code follows the style guidelines of this project and the Contributing document
    • [ ] I have commented my code, particularly in hard-to-understand areas
    • [ ] I have kept the coverage-rate up
    • [ ] I have performed a self-review of my own code and resolved any problems
    • [ ] I have checked to ensure there aren't any other open Pull Requests for the same change
    • [ ] I have described and made corresponding changes to the relevant documentation
    • [ ] New and existing unit tests pass locally with my changes

    Changes

    Resolves #1556

    opened by aadityasinha-dotcom 1
Releases(v3.1.7)
  • v3.1.7(Dec 30, 2022)

    🧭 What's Changed

    • [AL-2069] Adds tensorflow support to enterprise dataloader (#2079) @AbhinavTuli
    • Removed pandas dependency (#2085) @adolkhan
    • Fix Random split + views issue (#2084) @AbhinavTuli
    • [CUS-64] Enterprise dataloader support kwargs, fixes issue with pytorch lightning (#2080) @AbhinavTuli
    • [BUGFIX] [CUS-62] Transform append with empty samples (#2077) @FayazRahman

    🗂 Documentation

    • Added ingest_coco to docs (#2082) @ProgerDav

    ⚙️ Who Contributes

    @AbhinavTuli, @FayazRahman, @ProgerDav and @adolkhan

  • v3.1.6(Dec 28, 2022)

    🧭 What's Changed

    • [AL-2067] Add NIFTI support (#2076) @FayazRahman
    • Print hint to forward the visualizer port. (#2069) @khustup
    • Remove torch dependency (#2074) @levongh

    🚀 New

    • [DL-824] Ingestion for COCO format (#2027) @ProgerDav

    ⚙️ Who Contributes

    @AbhinavTuli, @FayazRahman, @ProgerDav, @khustup and @levongh

  • v3.1.5(Dec 22, 2022)

    🧭 What's Changed

    • Tests fix for python 3.8 after numpy update (#2070) @farizrahman4u
    • [BUGFIX] Fix PIL decode method with multiple workers and shuffling (#2068) @FayazRahman
    • [AL-2078] Switch random split doc section (#2064) @AbhinavTuli
    • [AL-1976] Adds downsampling support (#2034) @AbhinavTuli
    • mmdet_test_fix (#2067) @adolkhan
    • [CUS-50] MMdet Mask Fix (#2052) @adolkhan

    ⚙️ Who Contributes

    @AbhinavTuli, @FayazRahman, @adolkhan, @farizrahman4u and @istranic

  • v3.1.4(Dec 15, 2022)

    🧭 What's Changed

    • [Bug Fix] Pickling fix for DDP + Enterprise loader (#2059) @AbhinavTuli
    • [AL-1995] Adds ability to randomly split Deep Lake datasets (#2035) @AbhinavTuli
    • [CUS-48] MMDet DDP test fix (#2040) @farizrahman4u
    • [AL-2045] Fix corruption caused by pop (#2057) @farizrahman4u
    • Remove pandas imports (#2053) @farizrahman4u
    • MMDet + DDP progressbar fix (#2050) @farizrahman4u
    • [AL-2054] Rechunk bug fix and speedup (#2056) @FayazRahman
    • [AL-2037] Print error that sequences are not allowed with the pytorch dataloader (#2046) @farizrahman4u
    • [AL-2053] Log .dataloader instead of .numpy and .pytorch (#2054) @AbhinavTuli
    • [DL-920] Better version control for views (#2032) @FayazRahman
    • [DL-805] Groups + Loader fixes (#2045) @farizrahman4u

    ⚙️ Who Contributes

    @AbhinavTuli, @FayazRahman and @farizrahman4u

  • v3.1.3(Dec 9, 2022)

    🧭 What's Changed

    • Fix _temp_tensors attribute error (#2044) @FayazRahman
    • [CUS-57] [CUS-58] In place ds connect (#2041) @ProgerDav
    • [AL-2002] Cache libdeeplake dataset to speed up repeated use (#2036) @AbhinavTuli
    • Fix transform readonly tests (#2047) @AbhinavTuli
    • [DL-761] mesh htype support (#1940) @adolkhan
    • [DL-815] Unifying src_token and dest_token to token (#2038) @adolkhan
    • Fixing torch import(#2042) @adolkhan
    • [CUS-56] Restrict characters in dataset names (#2037) @FayazRahman
    • [DL-793][CUS-46] Add wandb logging to indra loader (#2039) @farizrahman4u
    • Allow the use of compute functions on read-only datasets (#2019) @daniel-falk
    • MMDet Augmentations Fix (#2033) @adolkhan

    ⚙️ Who Contributes

    @AbhinavTuli, @FayazRahman, @ProgerDav, @adolkhan, @daniel-falk and @farizrahman4u

  • v3.1.2(Dec 1, 2022)

    🧭 What's Changed

    • [DL-888] Dataset copying speedup and fixes (#2005) @FayazRahman
    • Do not hide S3 access errors (#1884) @daniel-falk
    • [DL-905] [DL-916] Consistent progressbar arg + example for decode_method (#2021) @FayazRahman

    ⚙️ Who Contributes

    @FayazRahman and @daniel-falk

  • v3.1.1(Nov 29, 2022)

    🧭 What's Changed

    • Mmdet integration (#2026) @adolkhan
    • Allow persistent workers in dataloader (#2028) @AbhinavTuli
    • [AL-2012] speedup pop element from dataset (#2024) @levongh
    • [AL-2036] remove tiled image extraction (#2017) @levongh
    • Handle repeated samples in shuffle (#2018) @AbhinavTuli
    • [DL-910] Tensorflow iteration fix (#2013) @FayazRahman

    ⚙️ Who Contributes

    @AbhinavTuli, @FayazRahman, @adolkhan and @levongh

  • v3.1.0(Nov 17, 2022)

    🧭 What's Changed

    • [DL-896] pip install deeplake[enterprise] (#2008) @farizrahman4u
    • [AL-2017] Add decode method to Pytorch API (#1991) @AbhinavTuli
    • [DL-885] Fix iteration warnings (#1989) @FayazRahman
    • [CUS-35] Fix merging class labels when class names aren't populated (#2007) @AbhinavTuli
    • Allow np.array as sampler weights. Update docs. (#1999) @khustup
    • [DL-893] Fast UUID + speedup sample id tensor (#1988) @farizrahman4u
    • [AL-2024] Add MPL license to Deep Lake in Pypi (#1998) @AbhinavTuli

    ⚙️ Who Contributes

    @AbhinavTuli, @FayazRahman, @farizrahman4u and @khustup

  • v3.0.18(Nov 11, 2022)

    🧭 What's Changed

    • Bump libdeeplake version to fix issue with dataloader crashing over multiple epochs(#2000) @AbhinavTuli
    • [DL-811] [DL-857] API reference updates (#1977) @FayazRahman

    ⚙️ Who Contributes

    @AbhinavTuli and @FayazRahman

  • v3.0.17(Nov 10, 2022)

    🧭 What's Changed

    • [CUS-32] Fix dataloader behaviour for json and list tensors (#1995) @AbhinavTuli
    • [CUS-30] Add support for bytes in json tensors (#1994) @AbhinavTuli
    • Add timeout to Pypi version check (#1996) @AbhinavTuli

    ⚙️ Who Contributes

    @AbhinavTuli

  • v3.0.16(Nov 9, 2022)

    🧭 What's Changed

    • Libdeeplake update to fix issue with linked tensors on certain systems (#1992) @levongh
    • [AL-1850] [CUS-29] Version control diff and merge improvements (#1862) @AbhinavTuli
    • Adds support for sampling. (#1987) @khustup
    • [DL-879] Improve download API (#1986) @FayazRahman
    • [AL-1992] [CUS-18] Fixes token expiration issue using hub:// datasets (#1983) @AbhinavTuli
    • Mesh & Point Cloud htype's docs (#1979) @adolkhan

    ⚙️ Who Contributes

    @AbhinavTuli, @FayazRahman, @adolkhan, @khustup and @levongh

  • v3.0.15(Nov 4, 2022)

    🧭 What's Changed

    • Serve link creds for non deeplake datasets in ds.visualize (#1974) @khustup
    • [DL-790] Speedup extend (#1936) @farizrahman4u

    ⚙️ Who Contributes

    @farizrahman4u and @khustup

  • v3.0.14(Nov 1, 2022)

    🧭 What's Changed

    • [AL-2010] Fixes verification of linked samples during rechunking (#1980) @AbhinavTuli
    • No Wheels (fix for pip install on Windows) (#1976) @farizrahman4u
    • [AL-2011] Fixes a bug with popping samples (#1975) @AbhinavTuli
    • [AL-1964] Expose path for linked tensors (#1963) @AbhinavTuli
    • [DL-759] Deeplake connect (#1951) @ProgerDav

    ⚙️ Who Contributes

    @AbhinavTuli, @ProgerDav and @farizrahman4u

  • v3.0.13(Oct 28, 2022)

    🧭 What's Changed

    • Update libdeeplake version (#1970) @AbhinavTuli
    • Update shuffle buffer to handle bytes (#1968) @AbhinavTuli

    ⚙️ Who Contributes

    @AbhinavTuli

  • v3.0.12(Oct 27, 2022)

    🧭 What's Changed

    • Libdeeplake fixes and improvements (#1964) @AbhinavTuli
      • Greatly improves performance when working with compressed jpeg and png data
      • Experimental dataloader transforms now receive PIL images instead of numpy arrays, ToPILImage transform should not be included
      • Fixes deadlocking issue when multiple nested dataloaders are created
      • Fixed unexpected segmentation faults
      • Added wheels for centOS
      • Added wheels for arm64 and x86_64 (fixed linking errors during lib import)
    • [DL-819] Add error messages related to user not being logged in (#1955) @adolkhan
    • [DL-804] Dont support group.info (#1960) @FayazRahman
    • [DL-782] Delete temp tensors in case append fails during transforms (#1924) @FayazRahman
    • Improves experimental dataloader performance for tensors with jpeg and png images (#1961) @AbhinavTuli
    • [AL-1999] [Bug fix] Info not being updated after using Deep Lake compute on dataset. (#1956) @AbhinavTuli
    • Fixed shape polygon fix (#1959) @FayazRahman
    • [DL-821] Fix allowing commit on views (#1953) @farizrahman4u
    • [DL-814][CUS-14][CUS-17] Pytorch fixes (#1949) @farizrahman4u
    • [CUS-22] Update query and htypes api reference (#1948) @FayazRahman
    • [CUS-24] Fix polygons bug with fixed shape inputs (#1950) @farizrahman4u
    • [DL-756] Log loading creds except in transforms (#1937) @FayazRahman
    • [Dl 706] Improve speed of materialization (#1902) @adolkhan
    • [AL-1990] add shuffle argument to .shuffle for experimental dataloader(#1942) @levongh
    • [DL-726][DL-789] Ignore corrupt tensors + fetch_chunks for .data(), .text() etc (#1932) @farizrahman4u
    • [DL-798] Fix partial read skip for chunk compressed chunks (#1939) @farizrahman4u

    ⚙️ Who Contributes

    @AbhinavTuli, @FayazRahman, @adolkhan, @davidbuniat, @farizrahman4u, @istranic and @levongh

  • v3.0.10(Oct 13, 2022)

    🧭 What's Changed

    • libdeeplake upgrade (#1938) @davidbuniat
      • Query shape(image) bug fixed
      • Query regex for the contains function deployed. Example: SELECT * WHERE contains(labels, 'an') on ImageNet will return all samples whose class names contain 'an'. Two wildcards are supported: * (any number of characters, including 0) and ? (exactly one character).
    • fix read for wav compressed audio (#1935) @gorinars
    • [DL-730] Make sure hub.list does not report the token to bugout (#1917) @adolkhan
    • Update Deep Lake version after release (#1934) @AbhinavTuli

    ⚙️ Who Contributes

    @AbhinavTuli, @adolkhan, @davidbuniat, @gorinars and [email protected]

  • v3.0.9(Oct 11, 2022)

    🧭 What's Changed

    • Update libdeeplake version (#1933) @AbhinavTuli
    • [DL-764] API reference updates (#1929) @FayazRahman
    • Fix region issue with activeloop storage datasets (#1930) @AbhinavTuli
    • [DL-755] Specify transform kwargs in ds.pytorch call (#1925) @farizrahman4u
    • [DL-783] Rich compatibility (#1926) @farizrahman4u

    ⚙️ Who Contributes

    @AbhinavTuli, @FayazRahman and @farizrahman4u

  • v3.0.8(Oct 6, 2022)

    🧭 What's Changed

    • libdeeplake update to fix memory issues (#1927) @AbhinavTuli
    • [DL-777] Polygons bug fix (#1922) @farizrahman4u
    • Variable local cache prefix (#1839) @GMW99
    • [DL-763] Locking fix (#1921) @farizrahman4u
    • [DL-701] Columnar views (#1912) @farizrahman4u

    ⚙️ Who Contributes

    @AbhinavTuli, @GMW99 and @farizrahman4u

  • v3.0.7(Oct 5, 2022)

    🧭 What's Changed

    • Updated libdeeplake version, removes torch as dependency, fixes issue with strings in dataloader (#1919) @AbhinavTuli
    • [DL-753] [DL-722] Fix appending linked data with verify=False (#1914) @FayazRahman
    • Allow tensorflow dataset to fetch chunks (#1887) @daniel-falk
    • [DL-754] Add reporting for W&B integration (#1918) @FayazRahman

    ⚙️ Who Contributes

    @AbhinavTuli, @FayazRahman, @daniel-falk, @davidbuniat and @mikayelh

  • v3.0.6(Sep 30, 2022)

    🧭 What's Changed

    • Update libdeeplake version to fix issue with distributed mode (#1915) @AbhinavTuli
    • [AL-1967] Fixes issue with readonly mode error raised despite not trying to write to dataset (#1911) @AbhinavTuli

    ⚙️ Who Contributes

    @AbhinavTuli and @davidbuniat

  • v3.0.5(Sep 29, 2022)

    Introducing Deep Lake

    We are more than excited to transition into Deep Lake, the data lake for deep learning applications. Furthermore, we released:

    • an academic paper describing all technical details: https://arxiv.org/pdf/2209.10785.pdf
    • a business white paper, which you can find at https://deeplake.ai
    • the API reference, which has moved to https://docs.deeplake.ai/en/latest/

    Behind the scenes, these are the 5 key stepping stones of Deep Lake:

    1. Version Control: Git for data
    2. Visualize: In-browser visualization engine
    3. Query: Rapid queries with Tensor Query language
    4. Materialize: Format native to deep learning
    5. Stream: Streaming Data Loaders

    If you wonder...

    • Why did we rename Hub to Deep Lake?

    Hub originally was a chunked array format that naturally grew version control, a streaming engine, and query capabilities while iterating with community members. The name had become too generic to describe the tool, often leading to confusion with dataset hubs. Inspired by A. Pinhassi's blog post, we renamed the package from hub to deeplake:

     > pip3 install deeplake
    
    • Where does the Deep Lakehouse come into the picture?

    The format, including versioning and lineage, is fully open-source, while the query, streaming, and visualization engines built in C++ are as yet closed source. They are accessible to all users through a Python interface. While committed to open-source principles, we plan to open-source the high-performance engines as they commoditize.

    🧭 What's Changed

    • Update README.zh-cn.md (#1910) @tatevikh
    • Update README.md (#1909) @istranic
    • Staging 3.0.5 (#1908) @farizrahman4u
    • Tiling Fix (#1907) @farizrahman4u
    • 3.0.3 (#1906) @farizrahman4u
    • [DL-746] hub->deeplake (#1895) @farizrahman4u
    • [DL-747] API Reference updates: new compressions + new Htypes page (#1892) @FayazRahman
    • Tensor Query Language documentation (#1896) @FayazRahman
    • Added more file formats for compression (#1597) @aadityasinha-dotcom
    • Indra import fix (#1891) @farizrahman4u
    • API Reference updates (#1886) @FayazRahman
    • Update version to 2.8.6 (#1889) @AbhinavTuli

    🐛 Bug Fixes

    • Passing token down (#1903) @ProgerDav

    ⚙️ Who Contributes

    @AbhinavTuli, @FayazRahman, @ProgerDav, @aadityasinha-dotcom, @artgish, @davidbuniat, @farizrahman4u, @istranic, @mikayelh and @tatevikh

  • v2.8.5(Sep 20, 2022)

    🧭 What's Changed

    • [DL-717] Add installation instructions to API reference (#1882) @FayazRahman
    • [DL-702] API reference updates (#1883) @FayazRahman
    • [DL-711] Allow view optimization when read_only=True (#1865) @farizrahman4u
    • Fixes bug with is_sequence (#1880) @AbhinavTuli
    • [DL-714] Add Ellipsis support for indexing (#1878) @farizrahman4u
    • [DL-645] Fix memory leak in transforms (#1871) @adolkhan
    • [DL-715] Fix wandb integration path issue (#1879) @farizrahman4u
    • Add docstrings for experimental features(#1876) @levongh
    • [DL-693] Disable label sync for dataset copy transform (#1875) @FayazRahman
    • [DL-709] Docker build fix (#1860) @farizrahman4u
    • Improve indra error message in case of missing dependencies (#1873) @farizrahman4u
    • [DL-710] Fix locking issue with deepcopy (#1864) @farizrahman4u

    ⚙️ Who Contributes

    @AbhinavTuli, @FayazRahman, @adolkhan, @davidbuniat, @farizrahman4u and @levongh

  • v2.8.4(Sep 15, 2022)

    🧭 What's Changed

    • Fixes import issue on Python 3.10 (#1867) @adolkhan
    • Big speedup for experimental dataloader initialization (#1869) @AbhinavTuli
    • Adds docstrings for experimental features (#1868) @levongh

    ⚙️ Who Contributes

    @AbhinavTuli, @FayazRahman, @adolkhan, @davidbuniat and @levongh

  • v2.8.3(Sep 14, 2022)

    🧭 What's Changed

    • Fixes type mismatch for expiration(#1858) @levongh
    • Flag to disable wandb integration (#1863) @farizrahman4u
    • Fixes wandb+local datasets (#1861) @hakanardo
    • [DL-668] Make pytorch() work with views (#1855) @farizrahman4u
    • [AL-1949] Make experimental pytorch dataloader consistent with existing implementation (#1853) @AbhinavTuli
    • [DL-650] Better error handling when not passing a tensor name to ds.append (#1817) @adolkhan
    • Update docs URL in readme (#1857) @FayazRahman
    • Speedup conversion of hub storage datasets->deeplake for experimental features (#1856) @levongh
    • [DL-611] New API reference (#1830) @FayazRahman
    • Wandb update: report datasets created with deepcopy (#1848) @farizrahman4u
    • [Bugfix] 1828 raising UserNotLoggedInException when invalid path is provided (#1829) @adolkhan
    • [DL-655] Added min and max length options (#1841) @adolkhan

    ⚙️ Who Contributes

    @AbhinavTuli, @FayazRahman, @adolkhan, @davidbuniat, @farizrahman4u, @hakanardo and @levongh

  • v2.8.1(Sep 9, 2022)

    🧭 What's Changed

    • Ensure that new format for chunk id isn't used for encoders with version <= 2.7.6 (#1850) @AbhinavTuli

    ⚙️ Who Contributes

    @AbhinavTuli and @davidbuniat

  • v2.8.0(Sep 7, 2022)

    🧭 What's Changed

    • Release Candidate 0 for new experimental dataloader and queries (#1819) @AbhinavTuli
    • [AL-1946] Fix delete group + reset bug (#1843) @AbhinavTuli
    • [DL-652] Add append_empty arg to ds.append (#1846) @farizrahman4u
    • Avoid printing syncing labels message when no labels were added (#1845) @FayazRahman
    • [DL-684] Fix ds.reset bug with local datasets (#1842) @FayazRahman
    • Use staging visualizer in tests. Correct dev visualizer url. (#1838) @khustup
    • Changes default chunk id size to 8 bits from 4 bits to reduce possibility of collisions (#1835) @AbhinavTuli
    • wandb integration (#1739) @farizrahman4u

    ⚙️ Who Contributes

    @AbhinavTuli, @FayazRahman, @farizrahman4u and @khustup

  • v2.7.5(Aug 24, 2022)

    🧭 What's Changed

    • [AL-1775] Point Cloud htype (#1685) @adolkhan
    • [AL-1912] Don't allow generic htypes with link (#1824) @AbhinavTuli
    • [Bugfix] Fixes rechunking with hub link + cloud paths (#1825) @AbhinavTuli
    • Enable progressbar for syncing labels (#1820) @FayazRahman
    • [Bug fix] Ensure None/"ENV" isn't added to used_creds_keys for linked data (#1823) @AbhinavTuli

    ⚙️ Who Contributes

    @AbhinavTuli, @FayazRahman and @adolkhan

  • v2.7.4(Aug 15, 2022)

    🧭 What's Changed

    • Fix get_incompatible_dtype bug (#1814) @farizrahman4u
    • [AL-1888] Enable rechunking for text like htypes (#1815) @AbhinavTuli
    • [AL-1858] Treat empty list as None (#1813) @AbhinavTuli
    • Older reporting configurations were not properly handling username (#1806) @zomglings

    ⚙️ Who Contributes

    @AbhinavTuli, @farizrahman4u and @zomglings

  • v2.7.3(Aug 10, 2022)

    🧭 What's Changed

    • [AL-1884] Fixes bug with ds.reset for newly added/deleted tensors (#1797) @AbhinavTuli
    • [DL-618] Appending to class labels with text using multiple workers (#1794) @FayazRahman
    • [AL-1848] New agreements handling (#1796) @AbhinavTuli
    • [DL-590] S3: Always show retry warnings (#1807) @farizrahman4u
    • [DL-620] Prevent saving of dataset views for public datasets when user is not logged in (#1803) @farizrahman4u

    ⚙️ Who Contributes

    @AbhinavTuli, @FayazRahman and @farizrahman4u

  • v2.7.2(Jul 26, 2022)

    🧭 What's Changed

    • [DL-593] Bugout correctly identifying the user's username when tokens are used (#1792) @adolkhan
    • Fix double indexing when saving strided views (#1793) @farizrahman4u

    🚀 New

    • Gcp support for connected datasets (#1736) @ProgerDav

    ⚙️ Who Contributes

    @ProgerDav, @adolkhan, @davidbuniat and @farizrahman4u
