Build tensorflow keras model pipelines in a single line of code. Created by Ram Seshadri. Collaborators welcome. Permission granted upon request.

Overview

deep_autoviml

Build keras pipelines and models in a single line of code!

Motivation

deep_autoviml is a powerful new deep learning library with a very simple design goal:

      Make it as easy as possible for novices and 
      experts alike to experiment with and build tensorflow.keras
      preprocessing pipelines and models in as few lines of code
      as possible.

Watch YouTube Video for Demo of Deep_AutoViML

deep_autoviml is a tensorflow >2.4-enabled, keras-ready model and pipeline building utility. deep_autoviml is meant for data engineers, data scientists and ML engineers to quickly prototype and build tensorflow 2.4.1+ models and pipelines for any data set, of any size, using a single line of code. It can build models for structured data, NLP and image datasets. It can also handle time series data sets. You can either let deep_autoviml automatically build a custom Tensorflow model, or you can "bring your own model" (the "BYOM" option) and attach keras data pipelines to it. Additionally, you can choose any Tensorflow Hub (TFHub) model to train on your data. See the instructions in the "Tips" section below.

These are the main features that distinguish deep_autoviml from other libraries:

  • It uses keras preprocessing layers which are more intuitive, and are included inside your model to simplify deployment
  • The pipeline is available to you to use as inputs in your own functional model (if you so wish - you must specify that option in the input - see below for "pipeline")
  • It can import any csv, txt or gzip file or file patterns (that fit multiple files) and it can scale to any data set of any size due to tf.data.Dataset's superior data pipelining features (such as cache, prefetch, batch, etc.)
  • It uses an amazing new tuner called STORM tuner that quickly searches for the best hyperparameters for your keras model in fewer than 25 trials
  • If you want to fine tune your model even further, you can adjust a wide variety of model options or keras options using **kwargs-like dictionaries
  • You can import your own custom Sequential model and watch deep_autoviml transform it into a functional model with additional preprocessing and output layers, then train the model with your data
  • You can save the model on your local machine or copy it to any cloud provider's storage bucket and serve it from there using tensorflow Serving (TF.Serving)
  • Since your model contains built-in preprocessing layers, you just need to provide your Tensorflow serving model with raw data to test and get back predictions in the same format as your training labels.
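
Because preprocessing lives inside the saved model, a serving request can carry raw feature values. The following is a minimal sketch of building (not sending) a TensorFlow Serving REST predict request. It assumes the standard `/v1/models/<name>:predict` endpoint on the default REST port 8501. The host, model name and feature names are hypothetical, and the exact payload shape depends on the serving signature of your saved model.

```python
import json


def build_predict_request(host, model_name, rows, port=8501):
    """Return (url, body) for a TensorFlow Serving REST predict call.

    `rows` is a list of dicts mapping raw feature names to raw values.
    """
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": rows})
    return url, body


# Hypothetical host, model name and feature names, for illustration only.
url, body = build_predict_request(
    "localhost",
    "deep_autoviml_model",
    [{"age": 42, "city": "Paris"}],
)
print(url)
print(body)
```

The returned URL and JSON body can be posted with any HTTP client once a TF Serving instance is running with your saved model.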

Technology

deep_autoviml uses the latest in tensorflow (2.4.1+) tf.data.Datasets and tf.keras preprocessing technologies: the Keras preprocessing layers enable you to encapsulate feature engineering and preprocessing into the model itself. This makes the process for training and predictions the same: just feed input data (in the form of files or dataframes) and the model will take care of all preprocessing before predictions.

To perform its preprocessing on the model itself, deep_autoviml uses tensorflow (TF 2.4.1 and later versions) and tf.keras experimental preprocessing layers: these layers are part of your saved model. They become part of the model's computational graph, which can be optimized and executed on any device including GPUs and TPUs. By packaging everything as a single unit, we save the effort of reimplementing the preprocessing logic on the production server. The new model can take raw tabular data with numeric and categorical variables or text strings directly, without any preprocessing. This avoids missing or incorrectly configured preprocessing layers in production.

In addition, to select the best hyper parameters for the model, it uses a new open source library:

  • storm-tuner - storm-tuner is an amazing new library that enables us to quickly fine tune our keras sequential models with hyperparameters and find a performant model within a few trials.

Install

deep_autoviml requires tensorflow v2.4.1+ and storm-tuner to run. Don't worry! We will install these libraries when you install deep_autoviml.

pip install deep_autoviml

For your own conda environment...

conda create -n <your_env_name> python=3.7 anaconda
conda activate <your_env_name> # older conda versions: `source activate <your_env_name>` (Linux/macOS) or `activate <your_env_name>` (Windows)
pip install deep_autoviml
or
pip install git+https://github.com/AutoViML/deep_autoviml.git
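
After installing, it can be useful to confirm that the tensorflow version in your environment meets the 2.4.1 minimum stated above. The following is a small sketch using only the Python standard library (`importlib.metadata`, so it assumes Python 3.8 or later). The helper names are hypothetical and are not part of deep_autoviml.

```python
import re
from importlib import metadata


def version_tuple(version):
    """Turn '2.4.1' or '2.10.0rc1' into a tuple of ints like (2, 4, 1)."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed, minimum="2.4.1"):
    """True when the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


def installed_version(package):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


tf_version = installed_version("tensorflow")
if tf_version is None:
    print("tensorflow is not installed")
else:
    status = "ok" if meets_minimum(tf_version) else "older than 2.4.1"
    print("tensorflow", tf_version, status)
```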

Usage

deep_autoviml can be invoked with a simple import and run statement:

from deep_autoviml import deep_autoviml as deepauto

Load a data set (any .csv or .gzip or .gz or .txt file) into deep_autoviml and it will split it into Train and Validation datasets internally. You only need to provide a target variable and a project_name (the folder where files are stored on your local machine), and leave the rest to defaults:

model, cat_vocab_dict = deepauto.fit(train, target, keras_model_type="auto",
            project_name="deep_autoviml", keras_options={}, model_options={}, 
            save_model_flag=True, use_my_model='', model_use_case='', verbose=0)

Once deep_autoviml writes your saved model and cat_vocab_dict files to disk in the project_name directory, you can load them from anywhere (including the cloud) for predictions, using the model and cat_vocab_dict generated above.

There are two kinds of predictions. This is the usual (typical) format:

predictions = deepauto.predict(model, project_name, test_dataset=test,
            keras_model_type=keras_model_type, cat_vocab_dict=cat_vocab_dict)

In case you are performing image classification, then you need to use deepauto.predict_images() for making predictions. See the Image section below for more details.

API

Arguments

deep_autoviml requires only a single line of code to get started. You can, however, fine tune the model we build through two dictionaries named "model_options" and "keras_options". These two dictionaries act like python **kwargs and let you fine tune hyperparameters for building the tf.keras model. Instructions on how to use them are provided below.

  • train: can be a path and file name, or a pandas dataframe. deep_autoviml even handles gz or gzip files. You must specify the full path and file name for it to find and load the file.
  • target: name of the target variable in the data set.
  • keras_model_type: default is "auto". But always try "fast1" first, then "fast2", then "auto". If you want to run NLP, use "BERT", and if you want to do image classification, set it to "image". In most structured data sets, your best results will come from "fast", "fast2" and "auto", in that order.
  • project_name: must be a string. Name of the folder where we will save your keras saved model and logs for tensorboard
  • model_options: must be a dictionary. For example: {'max_trials':5} sets the number of trials to run Storm-Tuner to search for the best hyper parameters for your keras model.
  • keras_options: must be a dictionary. You can use it for changing any keras model option you want such as "epochs", "kernel_initializer", "activation", "loss", "metrics", etc.
  • model_use_case: must be a string. You can use it to tell deep_autoviml what kind of use case you have, such as "time series" or "seq2seq" modeling. This option is currently not used, but watch this space for more model announcements.
  • save_model_flag: must be True or False. The model will be saved in keras model format.
  • use_my_model: This is where "bring your own model" (BYOM) option comes into play. This BYOM model must be a keras Sequential model with NO input layers and output layers! You can define it and send it as input here. We will add input and preprocessing layers to it automatically. Your custom defined model must contain only hidden layers (Dense, Conv1D, Conv2D, etc.), and dropouts, activations, etc. The default for this argument is "" (empty string) which means we will build your model. If you provide your custom model object here, we will use it instead.
  • verbose: must be 0, 1 or 2. Can also be True or False. You can see more and more outputs as you increase the verbose level. If you want to see a chart of your model, use verbose = 2. But you must have graphviz and pydot installed in your machine to see the model plot.
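
The type requirements in the list above can be checked before calling deepauto.fit. The helper below is a hypothetical sketch, not part of deep_autoviml, that only encodes the rules stated in this section. The "epochs" and "max_trials" keys are the ones mentioned above, and the values are illustrative.

```python
def check_fit_arguments(project_name, model_options, keras_options,
                        save_model_flag, verbose):
    """Return a list of problems with the arguments (empty if none)."""
    problems = []
    if not isinstance(project_name, str):
        problems.append("project_name must be a string")
    if not isinstance(model_options, dict):
        problems.append("model_options must be a dictionary")
    if not isinstance(keras_options, dict):
        problems.append("keras_options must be a dictionary")
    if not isinstance(save_model_flag, bool):
        problems.append("save_model_flag must be True or False")
    # In Python, True == 1 and False == 0, so booleans pass this check too.
    if verbose not in (0, 1, 2):
        problems.append("verbose must be 0, 1 or 2 (or True/False)")
    return problems


model_options = {"max_trials": 5}   # number of Storm-Tuner trials
keras_options = {"epochs": 10}      # illustrative value
problems = check_fit_arguments("deep_autoviml", model_options,
                               keras_options, True, 0)
print(problems)  # prints []
```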

Image

Leaf images referred to here are from Kaggle (Kaggle Leaf Image Classification) and are copyright of Kaggle. They are shown for illustrative purposes.

deep_autoviml can do image classification. All you need to do is organize your image_dir folder into train, validation and test sub-folders. The train folder, for example, can contain a sub-folder of images for each label. All you need to provide is the name of the image directory, for example "leaf_classification", and deep_autoviml will automatically read the images and assign them the correct labels and the correct dataset (train, test, etc.).
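
The folder layout described above can be sketched with the standard library. The label names ("maple", "oak") are made up for illustration. The text only states that the train folder holds one sub-folder per label, so label sub-folders are created for train alone by default; whether validation and test need them too is an assumption you can change through the hypothetical `labelled_splits` parameter.

```python
import tempfile
from pathlib import Path


def make_image_layout(root, image_dir, labels, labelled_splits=("train",)):
    """Create image_dir/{train,validation,test} under root.

    Splits listed in labelled_splits also get one sub-folder per label.
    """
    base = Path(root) / image_dir
    for split in ("train", "validation", "test"):
        (base / split).mkdir(parents=True, exist_ok=True)
        if split in labelled_splits:
            for label in labels:
                (base / split / label).mkdir(exist_ok=True)
    return base


# Build the layout in a throwaway directory and list what was created.
with tempfile.TemporaryDirectory() as tmp:
    base = make_image_layout(tmp, "leaf_classification", ["maple", "oak"])
    for path in sorted(base.rglob("*")):
        print(path.relative_to(tmp).as_posix())
```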

image_dir = "leaf_classification"

You also need to provide the height and width of each image as well as the number of channels for each image.

img_height = 224
img_width = 224
img_channels = 3

You then need to set the keras model type argument as "image".

keras_model_type = "image"

You also need to send in the above arguments as model options as follows:

model_options = {'image_directory': image_dir, 'image_height': img_height,
            'image_width': img_width, 'image_channels': img_channels}

You can then call deep_autoviml for training the model as usual with these inputs:

model, dicti = deepauto.fit(trainfile, target, keras_model_type=keras_model_type,
            project_name='leaf_classification', save_model_flag=False,
            model_options=model_options, keras_options=keras_options,
            use_my_model='', verbose=0)

To make predictions, you need to provide the dictionary ("dicti") from above and the trained model. You also need to provide where the test images are stored as follows.

test_image_dir = 'leaf_classification/test'
predictions = deepauto.predict_images(test_image_dir, model, dicti)

NLP

deep_autoviml can also do NLP text classification. There are two ways to do NLP:

  • 1. Using folders and sub-folders
  • All you need to do is organize your text_dir folder into train, validation and test sub-folders. The train folder, for example, can contain a sub-folder of text files for each label. All you have to do is set:

    keras_model_type as "BERT" or keras_model_type as "USE", and it will use either BERT or Universal Sentence Encoder to preprocess and transform your text into embeddings to feed to a model.

  • 2. Using CSV file
  • Just provide a CSV file with column names and text. If you have multiple text columns, it will handle all of them automatically. If you want to mix numeric and text columns, you can do so in the same CSV file. deep_autoviml will automatically detect which columns are text (NLP) and which columns are numeric and do preprocessing automatically. You can specify whether to use:

    keras_model_type as "BERT" or keras_model_type as "USE", and it will use either BERT or Universal Sentence Encoder, as specified, on your text columns. If you want to use neither of them, you can just specify:

    keras_model_type as "auto" and deep_autoviml will automatically choose the best embedding for your model.
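
To give a feel for what automatic column detection on a mixed CSV might involve, here is an illustrative sketch. It is not deep_autoviml's actual detection logic, which this document does not describe. It assumes one plausible rule: a column whose values all parse as numbers is numeric, and a string column is treated as text when its longest value exceeds a character limit (the Tips section mentions an nlp_char_limit option with a default of 30). The sample data and function names are hypothetical.

```python
import csv
import io


def _is_number(value):
    """True if the string parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def split_columns(rows, char_limit=30):
    """Group column names into 'numeric', 'text' and 'categorical'."""
    groups = {"numeric": [], "text": [], "categorical": []}
    for name in rows[0]:
        values = [row[name] for row in rows]
        if all(_is_number(v) for v in values):
            groups["numeric"].append(name)
        elif max(len(v) for v in values) > char_limit:
            groups["text"].append(name)      # long strings: treat as NLP text
        else:
            groups["categorical"].append(name)
    return groups


# Made-up sample with one numeric, one categorical and one text column.
sample = '''price,color,review
9.5,red,"Arrived quickly and works exactly as described in the listing"
12.0,blue,"Stopped working after two days, would not buy this again"
'''
rows = list(csv.DictReader(io.StringIO(sample)))
print(split_columns(rows))
```

Lowering `char_limit` moves more string columns into the text group, which mirrors the advice in the Tips section to set the limit low when you want a column handled as NLP.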

    Tips

    You can use the following arguments in your input to make deep_autoviml work best for you:

    • model_options = {"model_use_case":'pipeline'}: If you only want keras preprocessing layers (i.e. a keras pipeline), set the model_use_case input to "pipeline" and deep_autoviml will not build a model but just return the keras input and preprocessing layers. You can attach these input and output layers to any sequential model you choose and build your own custom model.
    • model_options = {'max_trials':5}: Always start with a small number of max_trials in model_options dictionary or a dataframe. Start with 5 trials and increase it by 20 each time to see if performance improves. Stop when performance of the model doesn't improve any more. This takes time.
    • model_options = {'cat_feat_cross_flag':True}: default is False but change it to True and see if adding feature crosses with your categorical features helps improve the model. However, do not do this for a large data set! This will explode the number of features in your model. Be careful!
    • model_options = {'nlp_char_limit':20}: If you want to run NLP Text preprocessing on any column, set this character limit low and deep_autoviml will then detect that column as an NLP column automatically. The default is 30 chars.
    • keras_options = {"patience":30}: If you want early stopping to wait longer before halting training, increase the patience to 30 or higher. Your model will train longer but you might get better performance.
    • use_my_model = my_sequential_model: If you want to bring your own custom model for training, define a Keras Sequential model (you can name it anything; for example purposes, we have named it my_sequential_model) but don't include input or output layers! Just define your hidden layers! deep_autoviml will automatically add input and output layers to your model and train it. It will also save your model after training. You can use this model for predictions.
    • keras_model_type = "image": If you want to build a model for image classification, then you can use this option. But you must add the following additional options in model_options dictionary: model_options = {"image_height":__, "image_width": __, "image_channels": __, "image_directory": __}.
    • model_options = {"tf_hub_model": "URL"}: If you want to use a pre-trained Tensorflow Hub model such as BERT or a feature extractor for image classification, then you can use its TF Hub model URL by providing it in model_options dictionary as follows: model_options = {"tf_hub_model": "URL of TF hub model"}
    • keras_model_type = "BERT" or keras_model_type = "USE": If you want to use a default BERT model or a Universal Sentence Encoder model, just set this option to either "BERT" or "USE" and we will load a default small pre-trained model from TF Hub, train it on your dataset and give you back a pipeline with BERT/USE in it! If you want to use some other BERT model than the one we have chosen, please go to Tensorflow Hub and find your model's URL and set model_options = {"tf_hub_model": "URL of TF hub model"} and we will train whatever BERT model you have chosen with your data.
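
The max_trials advice above (start at 5, add 20 each time, stop when performance no longer improves) can be sketched as a small loop. The `train_and_score` callable is hypothetical: in practice it would wrap a call to deepauto.fit with the given model_options and return a validation score where higher is better. A made-up stand-in is used here so the sketch runs on its own.

```python
def search_max_trials(train_and_score, start=5, step=20, limit=105):
    """Raise max_trials until the score stops improving.

    Returns (best_max_trials, best_score).
    """
    best_trials, best_score = None, float("-inf")
    max_trials = start
    while max_trials <= limit:
        score = train_and_score({"max_trials": max_trials})
        if score <= best_score:
            break                      # no improvement: stop searching
        best_trials, best_score = max_trials, score
        max_trials += step
    return best_trials, best_score


def fake_train_and_score(model_options):
    """Stand-in scorer with invented numbers that plateau at 45 trials."""
    scores = {5: 0.81, 25: 0.86, 45: 0.90}
    return scores.get(model_options["max_trials"], 0.90)


print(search_max_trials(fake_train_and_score))  # prints (45, 0.9)
```

The `limit` argument is a safety cap, since every step retrains the model and, as noted above, this takes time.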

    Maintainers

    Contributing

    See the contributing file!

    PRs accepted.

    License

    Apache License 2.0 © 2020 Ram Seshadri

    DISCLAIMER

    This project is not an official Google project. It is not supported by Google and Google specifically disclaims all warranties as to its quality, merchantability, or fitness for a particular purpose.

    Comments
    • test_size should be either positive and smaller than the number of samples

      I am having an error in training a model, it says:

      test_size=-22.10536044362292 should be either positive and smaller than the number of samples 4328 or a float in the (0, 1) range

      opened by miggytrinidad 4
    • Need a clarification on AutoViML

      Hi, I am currently doing research on Automated ML. I went through Auto-TS and it is really good, but I have a doubt: can we use AutoViML for time series, and will it perform well?

      It would be helpful to get an expert opinion. Thank you.

      opened by Rohith616 2
    • Include license in source distribution

      The PyPI source does not have any license file in it (v0.0.78.dev1). This PR ensures inclusion of the license file (LICENSE) in the source distribution (*.tar.gz) file.

      Closes #16

      opened by sugatoray 2
    • ValueError: not enough values to unpack (expected 2, got 0)

      If we pass only the text column to deep_autoviml, I get this error:

      ValueError: not enough values to unpack (expected 2, got 0)

      Any clues?

      opened by NeyoxDrago 1
    • TypeError: cannot unpack non-iterable NoneType object

      Following the tutorial for Image Classification, I am getting below error

      Screen Shot 2021-10-30 at 12 05 15 AM

      My dataset: https://drive.google.com/drive/folders/1sQpk94eAhq4ADBHpCh3yc2D4YbJxBpCn?usp=sharing

      opened by jocelynbaduria 1
    • Anaconda environment name with spaces causes issues

      Running deepauto.fit, I'm seeing some issues relating to my environment name having spaces in them. Example:

      #################################################################################
      ########### C R E A T I N G A K E R A S M O D E L ############
      #################################################################################

      Creating a keras Function model...
      number of outputs = 1, output_activation = sigmoid
      loss function: SparseCategoricalCrossentropy
      initial learning rate = 0.05
      initial optimizer = SGD
      Recommended hidden layers (with units in each Dense Layer) = (96, 64, 32)

      creating auto model body...
      

      "dot" with args ['-Tps', 'C:\Users\JIMMY~1.LIA\AppData\Local\Temp\1\tmp53d9l6vd'] returned code: 1

      stdout, stderr: b'' b"'C:\Users\jimmy.liang\Anaconda3\envs\Special' is not recognized as an internal or external command,\r\noperable program or batch file.\r\n"

      Model plot not saved due to error. Continuing...

      opened by jimmyland22 1
    • Seems too memory intensive?

      I'm trying to follow the tutorials to try image and NLP classification. On Colab, the runtime keeps crashing due to running out of memory, so I moved it into my local machine. On a sample Kaggle NLP dataset, I'm getting this message

      MemoryError: Unable to allocate 72.6 GiB for an array with shape (18679895,) and data type <U1043

      Seems like quite a demand. The data I'm trying to use is https://www.kaggle.com/clmentbisaillon/fake-and-real-news-dataset

      opened by jimmyland22 1
    • predict function

      if I'm loading the model from another instance. Can I use the predict function or since my model is a multiclass BERT model do I need to use predict_text? Sorry if this is a very basic question as I'm new to this.

      I tried loading the model like this:

      from deep_autoviml import deep_autoviml as deepauto
      
      model_or_model_path = './Bert_tag_Classifier_IS/BERT/model_2022_02_12-20_44_53'
      project_name='Bert_tag_Classifier_IS'
      test = df
      keras_model_type = 'BERT'
      predictions = deepauto.predict(model_or_model_path,
                                     project_name,
                                     test_dataset=test,
                                     keras_model_type=keras_model_type)
      
      and keep getting this error;
      

      File "C:\Users\yanie\anaconda3\envs\Deep_autoviml_env\lib\site-packages\deep_autoviml\modeling\predict_model.py", line 311, in predict
        model, cat_vocab_dict = load_model_dict(model_or_model_path, cat_vocab_dict, project_name, keras_model_type)

      TypeError: cannot unpack non-iterable NoneType object

      any thoughts on this?

      opened by BenGraWarBuf 3
    • Feature: Support for Seq2Seq (LSTM) model for next word prediction

      Issue

      I would like to take up the task to implement Seq2Seq models on deep_AutoViML. This will allow this library to perform operations like next word prediction, Text summarization etc.

      Proposed approach

      
      keras_model_type = "next_word_prediction" 
      deepauto.fit(train_datafile, target, keras_model_type=keras_model_type,
      		project_name=project_name, keras_options=keras_options, model_options=model_options, 
      		save_model_flag=False, verbose=1)
      
      

      We can use keras_model_type in deep_autoviml.py to check for the string "next word prediction". The data will then be preprocessed and an appropriate model will be chosen. After this, the chosen model will be trained on the given data. Users can either supply their own early stopping, epochs and other settings or use the defaults.

      if keras_model_type.lower() in ['image', 'images', "image_classification"]:
         pass  # Train Image classification
      
      elif keras_model_type.lower() in ['text classification', "text_classification"]:
         pass  # Train for Text classification
      
      elif keras_model_type.lower() in ['next word prediction', "next_word_prediction"]:
         pass  # Train for next word prediction
      
      

      Similarly, We can create a model for time series prediction.

      @AutoViML: If you have a better approach to solving this problem let me know

      opened by chekoduadarsh 5