Eto Labs Python SDK
This is the Python SDK for Eto, the AI-focused data platform for teams bringing AI models to production. The Python SDK makes it easy to integrate Eto's features into your AI training and analysis workflows.
Installation
The Eto Python SDK is available on PyPI and can be installed via pip:
pip install etosdk
The Eto SDK is compatible with Python 3.7+.
Setup
Before using the SDK for the first time, you must configure it with your Eto API URL and API token.
import eto
eto.configure(url='<eto-api-url>', token='<api-token>')
The configuration function above creates a configuration file under $XDG_CONFIG_HOME/eto/eto.conf, which is usually ~/.config/eto/eto.conf.
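If you need to locate that file programmatically, for example to check whether the SDK has already been configured, the path can be derived the same way. A minimal sketch, assuming only the standard XDG fallback behavior described above:

import os

# Resolve the expected location of eto.conf, falling back to ~/.config
# when XDG_CONFIG_HOME is not set (matching the behavior described above).
config_home = os.environ.get('XDG_CONFIG_HOME', os.path.expanduser('~/.config'))
conf_path = os.path.join(config_home, 'eto', 'eto.conf')
print(conf_path, os.path.exists(conf_path))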
Ingesting data
To create an ingestion job that converts raw data in COCO format into a new dataset:
import eto
job = eto.ingest_coco('<dataset_name>',
                      {'image_dir': '<path/to/images>',
                       'annotations': '<path/to/annotations>',
                       'extras': {'key': 'value'}})
The ingestion job runs asynchronously server-side and converts the data to Rikai (parquet) format. Once the job completes, the new dataset appears in the data registry:
import eto
eto.list_datasets() # list all datasets
eto.get_dataset('<dataset_name>') # get information about a single dataset
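For example, to confirm that ingestion produced the dataset you expect, you can combine the two calls above. This is a sketch only: the return types are not documented here, so the values are simply printed.

import eto

# Print every dataset in the registry, then look up the new one by name.
for ds in eto.list_datasets():
    print(ds)

print(eto.get_dataset('<dataset_name>'))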
Analysis
Accessing a particular dataset is easy via pandas:
import eto
import pandas as pd
df = pd.read_eto('<dataset_name>') # Eto SDK adds a pandas extension
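The result is an ordinary pandas DataFrame, so the usual exploration tools apply:

df.head()     # preview the first few rows
df.columns    # inspect the available columns
len(df)       # number of records in the dataset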
Training
To train a PyTorch model, you can use the Dataset/DataLoader classes in Rikai:
import eto
from rikai.torch.vision import Dataset
dataset = Dataset('<dataset_name>') # Eto SDK adds an extension to Rikai to resolve dataset references
for next_record in dataset:
    # training loop
    pass
A plain PyTorch DataLoader is also available from rikai.torch.data.DataLoader.
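A minimal sketch of using it, assuming it resolves dataset references the same way Dataset does and accepts a batch_size keyword in the usual PyTorch convention (both are assumptions, not confirmed here):

from rikai.torch.data import DataLoader

# Assumed interface: a dataset reference plus a batch_size keyword.
loader = DataLoader('<dataset_name>', batch_size=32)
for batch in loader:
    pass  # training step goes here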
Local Spark configuration
For now, the Eto SDK relies on PySpark locally to read some of the custom Rikai types like annotations. While PySpark should be automatically installed as a transitive dependency, you may find that you need to change the Spark configurations to suit your local setup.
Your $SPARK_HOME/conf/spark-defaults.conf file should look something like the following:
spark.sql.extensions ai.eto.rikai.sql.spark.RikaiSparkSessionExtensions
spark.jars.packages ai.eto:rikai_2.12:0.0.13,org.apache.hadoop:hadoop-aws:3.2.0
# AWS
spark.executor.extraJavaOptions -Dcom.amazonaws.services.s3.enableV4=true -Dio.netty.tryReflectionSetAccessible=true
spark.driver.extraJavaOptions -Dcom.amazonaws.services.s3.enableV4=true -Dio.netty.tryReflectionSetAccessible=true
spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version 2
spark.hadoop.com.amazonaws.services.s3.enableV4 true
spark.hadoop.fs.AbstractFileSystem.s3a.impl org.apache.hadoop.fs.s3a.S3A
spark.hadoop.fs.s3a.impl org.apache.hadoop.fs.s3a.S3AFileSystem
spark.hadoop.fs.s3a.aws.credentials.provider com.amazonaws.auth.InstanceProfileCredentialsProvider,com.amazonaws.auth.DefaultAWSCredentialsProviderChain
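To sanity-check that Spark picks up these settings, you can start a local session and read the resolved configuration back. A quick sketch using the standard PySpark API:

from pyspark.sql import SparkSession

# Start (or reuse) a local session and confirm that the settings from
# spark-defaults.conf were applied.
spark = SparkSession.builder.getOrCreate()
print(spark.sparkContext.getConf().get('spark.sql.extensions'))
print(spark.sparkContext.getConf().get('spark.jars.packages'))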