ktrain is a Python library that makes deep learning and AI more accessible and easier to apply

Overview | Tutorials | Examples | Installation | FAQ | How to Cite

Welcome to ktrain

News and Announcements

  • 2020-11-08:
    • ktrain v0.25.x is released and includes out-of-the-box support for text extraction via the textract package. This, for example, can be used in the SimpleQA.index_from_folder method to perform Question-Answering on large collections of PDFs, MS Word documents, or PowerPoint files. See the Question-Answering example notebook for more information.
# End-to-End Question-Answering in ktrain

# index documents of different types into a built-in search engine
from ktrain import text
INDEXDIR = '/tmp/myindex'
text.SimpleQA.initialize_index(INDEXDIR)
corpus_path = '/my/folder/of/documents' # contains .pdf, .docx, .pptx files in addition to .txt files
text.SimpleQA.index_from_folder(corpus_path, INDEXDIR, use_text_extraction=True, # enable text extraction
                                multisegment=True, procs=4, # these args speed up indexing
                                breakup_docs=True)          # this slows indexing but speeds up answer retrieval

# ask questions (setting higher batch size can further speed up answer retrieval)
qa = text.SimpleQA(INDEXDIR)
answers = qa.ask('What is ktrain?', batch_size=8)

# top answer snippet extracted from https://arxiv.org/abs/2004.10703:
#   "ktrain is a low-code platform for machine learning"
  • 2020-11-04
  • 2020-10-16:
    • ktrain v0.23.x is released with updates for compatibility with the upcoming release of TensorFlow 2.4.

Overview

ktrain is a lightweight wrapper for the deep learning library TensorFlow Keras (and other libraries) to help build, train, and deploy neural networks and other machine learning models. Inspired by ML framework extensions like fastai and ludwig, ktrain is designed to make deep learning and AI more accessible and easier to apply for both newcomers and experienced practitioners. With only a few lines of code, ktrain allows you to easily and quickly:

  • employ fast, accurate, and easy-to-use pre-canned models for text, vision, graph, and tabular data

  • estimate an optimal learning rate for your model given your data using a Learning Rate Finder

  • utilize learning rate schedules such as the triangular policy, the 1cycle policy, and SGDR to effectively minimize loss and improve generalization

  • build text classifiers for any language (e.g., Arabic Sentiment Analysis with BERT, Chinese Sentiment Analysis with NBSVM)

  • easily train NER models for any language (e.g., Dutch NER)

  • load and preprocess text and image data from a variety of formats

  • inspect data points that were misclassified and provide explanations to help improve your model

  • leverage a simple prediction API for saving and deploying both models and data-preprocessing steps to make predictions on new raw data
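
For instance, the prediction API in the last bullet bundles a trained model and its preprocessing steps into a single object. A minimal sketch (here, learner and preproc are placeholders for objects created as in the examples below):

import ktrain

# bundle a trained model with its data-preprocessing steps
predictor = ktrain.get_predictor(learner.model, preproc)

# predict on new raw data (e.g., a raw text string for a text model)
predictor.predict('Some new document to classify')

# save to disk and reload later for deployment
predictor.save('/tmp/my_predictor')
reloaded_predictor = ktrain.load_predictor('/tmp/my_predictor')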

Tutorials

Please see the following tutorial notebooks for a guide on how to use ktrain on your projects:

Some blog tutorials about ktrain are shown below:

ktrain: A Lightweight Wrapper for Keras to Help Train Neural Networks

BERT Text Classification in 3 Lines of Code

Text Classification with Hugging Face Transformers in TensorFlow 2 (Without Tears)

Build an Open-Domain Question-Answering System With BERT in 3 Lines of Code

Finetuning BERT using ktrain for Disaster Tweets Classification by Hamiz Ahmed

Examples

Tasks such as text classification and image classification can be accomplished easily with only a few lines of code.

Example: Text Classification of IMDb Movie Reviews Using BERT [see notebook]

import ktrain
from ktrain import text as txt

# load data
(x_train, y_train), (x_test, y_test), preproc = txt.texts_from_folder('data/aclImdb', maxlen=500, 
                                                                     preprocess_mode='bert',
                                                                     train_test_names=['train', 'test'],
                                                                     classes=['pos', 'neg'])

# load model
model = txt.text_classifier('bert', (x_train, y_train), preproc=preproc)

# wrap model and data in ktrain.Learner object
learner = ktrain.get_learner(model, 
                             train_data=(x_train, y_train), 
                             val_data=(x_test, y_test), 
                             batch_size=6)

# find good learning rate
learner.lr_find()             # briefly simulate training to find good learning rate
learner.lr_plot()             # visually identify best learning rate

# train using 1cycle learning rate schedule for 3 epochs
learner.fit_onecycle(2e-5, 3) 
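
After training, the model and its BERT preprocessing steps can be wrapped in a predictor to classify raw movie reviews. A brief sketch (the review text is made up):

# make predictions on raw text with the trained classifier
predictor = ktrain.get_predictor(learner.model, preproc)
predictor.predict('This movie was a complete waste of two hours.')  # e.g., returns 'neg'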

Example: Classifying Images of Dogs and Cats Using a Pretrained ResNet50 model [see notebook]

import ktrain
from ktrain import vision as vis

# load data
(train_data, val_data, preproc) = vis.images_from_folder(
                                              datadir='data/dogscats',
                                              data_aug = vis.get_data_aug(horizontal_flip=True),
                                              train_test_names=['train', 'valid'], 
                                              target_size=(224,224), color_mode='rgb')

# load model
model = vis.image_classifier('pretrained_resnet50', train_data, val_data, freeze_layers=80)

# wrap model and data in ktrain.Learner object
learner = ktrain.get_learner(model=model, train_data=train_data, val_data=val_data, 
                             workers=8, use_multiprocessing=False, batch_size=64)

# find good learning rate
learner.lr_find()             # briefly simulate training to find good learning rate
learner.lr_plot()             # visually identify best learning rate

# train using triangular policy with ModelCheckpoint and implicit ReduceLROnPlateau and EarlyStopping
learner.autofit(1e-4, checkpoint_folder='/tmp/saved_weights') 
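
As with the text example, a predictor can classify new images straight from disk. A short sketch (the file path is illustrative):

# classify a single image file
predictor = ktrain.get_predictor(learner.model, preproc)
predictor.predict_filename('data/dogscats/valid/cats/cat.1.jpg')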

Example: Sequence Labeling for Named Entity Recognition using a randomly initialized Bidirectional LSTM CRF model [see notebook]

import ktrain
from ktrain import text as txt

# load data
(trn, val, preproc) = txt.entities_from_txt('data/ner_dataset.csv',
                                            sentence_column='Sentence #',
                                            word_column='Word',
                                            tag_column='Tag', 
                                            data_format='gmb',
                                            use_char=True) # enable character embeddings

# load model
model = txt.sequence_tagger('bilstm-crf', preproc)

# wrap model and data in ktrain.Learner object
learner = ktrain.get_learner(model, train_data=trn, val_data=val)


# conventional training for 1 epoch using a learning rate of 0.001 (Keras default for Adam optimizer)
learner.fit(1e-3, 1) 
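
The trained tagger can then extract entities from new sentences via the same predictor API. A small sketch (the sentence is made up; the result is a list of (token, tag) pairs):

# extract named entities from new text
predictor = ktrain.get_predictor(learner.model, preproc)
predictor.predict('Paul Newman is the CEO of a company in London.')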

Example: Node Classification on Cora Citation Graph using a GraphSAGE model [see notebook]

import ktrain
from ktrain import graph as gr

# load data with supervision ratio of 10%
(trn, val, preproc) = gr.graph_nodes_from_csv(
                                              'cora.content', # node attributes/labels
                                              'cora.cites',   # edge list
                                              sample_size=20,
                                              holdout_pct=None,
                                              holdout_for_inductive=False,
                                              train_pct=0.1, sep='\t')

# load model
model = gr.graph_node_classifier('graphsage', trn)

# wrap model and data in ktrain.Learner object
learner = ktrain.get_learner(model, train_data=trn, val_data=val, batch_size=64)


# find good learning rate
learner.lr_find(max_epochs=100) # briefly simulate training to find good learning rate
learner.lr_plot()               # visually identify best learning rate

# train using triangular policy with ModelCheckpoint and implicit ReduceLROnPlateau and EarlyStopping
learner.autofit(0.01, checkpoint_folder='/tmp/saved_weights')
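
After training, performance on the validation nodes can be checked with learner.validate, as in the text examples. A minimal sketch:

# show a classification report for the validation nodes
learner.validate(class_names=preproc.get_classes())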

Example: Text Classification with Hugging Face Transformers on 20 Newsgroups Dataset Using DistilBERT [see notebook]

# load text data
categories = ['alt.atheism', 'soc.religion.christian','comp.graphics', 'sci.med']
from sklearn.datasets import fetch_20newsgroups
train_b = fetch_20newsgroups(subset='train', categories=categories, shuffle=True)
test_b = fetch_20newsgroups(subset='test', categories=categories, shuffle=True)
(x_train, y_train) = (train_b.data, train_b.target)
(x_test, y_test) = (test_b.data, test_b.target)

# build, train, and validate model (Transformer is a wrapper around the transformers library)
import ktrain
from ktrain import text
MODEL_NAME = 'distilbert-base-uncased'
t = text.Transformer(MODEL_NAME, maxlen=500, class_names=train_b.target_names)
trn = t.preprocess_train(x_train, y_train)
val = t.preprocess_test(x_test, y_test)
model = t.get_classifier()
learner = ktrain.get_learner(model, train_data=trn, val_data=val, batch_size=6)
learner.fit_onecycle(5e-5, 4)
learner.validate(class_names=t.get_classes()) # class_names must be string values

# Output from learner.validate()
#                        precision    recall  f1-score   support
#
#           alt.atheism       0.92      0.93      0.93       319
#         comp.graphics       0.97      0.97      0.97       389
#               sci.med       0.97      0.95      0.96       396
#soc.religion.christian       0.96      0.96      0.96       398
#
#              accuracy                           0.96      1502
#             macro avg       0.95      0.96      0.95      1502
#          weighted avg       0.96      0.96      0.96      1502
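
The fitted model can likewise be wrapped in a predictor to classify raw text and saved for later deployment. A brief sketch (the example sentence is made up):

# classify raw text and persist the predictor
predictor = ktrain.get_predictor(learner.model, preproc=t)
predictor.predict('Jesus Christ is the central figure of Christianity.')  # e.g., 'soc.religion.christian'
predictor.save('/tmp/my_predictor')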

Example: Tabular Classification for Titanic Survival Prediction Using an MLP [see notebook]

import ktrain
from ktrain import tabular
import pandas as pd
train_df = pd.read_csv('train.csv', index_col=0)
train_df = train_df.drop(['Name', 'Ticket', 'Cabin'], axis=1)
trn, val, preproc = tabular.tabular_from_df(train_df, label_columns=['Survived'], random_state=42)
learner = ktrain.get_learner(tabular.tabular_classifier('mlp', trn), train_data=trn, val_data=val)
learner.lr_find(show_plot=True, max_epochs=5) # estimate learning rate
learner.fit_onecycle(5e-3, 10)

# evaluate held-out labeled test set
tst = preproc.preprocess_test(pd.read_csv('heldout.csv', index_col=0))
learner.evaluate(tst, class_names=preproc.get_classes())
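
Predictions for new passengers can also be made directly from a DataFrame through the predictor API. A minimal sketch (reusing heldout.csv from the evaluation step above):

# predict survival for new passengers from raw tabular data
predictor = ktrain.get_predictor(learner.model, preproc)
preds = predictor.predict(pd.read_csv('heldout.csv', index_col=0))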

Using ktrain on Google Colab? See these Colab examples:

Additional examples can be found here.

Installation

  1. Make sure pip is up-to-date with: pip install -U pip

  2. Install TensorFlow 2 if it is not already installed (e.g., pip install tensorflow)

  3. Install ktrain: pip install ktrain

The above should be all you need on Linux systems and cloud computing environments like Google Colab and AWS EC2. If you are using ktrain on a Windows computer, you can follow these more detailed instructions that include some extra steps.

Some important things to note about installation:

  • If using ktrain with tensorflow<=2.1, you must also downgrade the transformers library to transformers==3.1.
  • As of v0.21.x, ktrain no longer installs TensorFlow 2 automatically. As indicated above, you should install TensorFlow 2 yourself before installing and using ktrain. On Google Colab, TensorFlow 2 should already be installed. You should be able to use ktrain with any version of TensorFlow 2. Note, however, that there is a bug in TensorFlow 2.2 and 2.3 that affects the Learning-Rate-Finder and will not be fixed until TensorFlow 2.4. The bug causes the Learning-Rate-Finder to complete all epochs even after the loss has diverged (i.e., no automatic stopping).
  • If using ktrain on a local machine with a GPU (versus Google Colab, for example), you'll need to install GPU support for TensorFlow 2.
  • Since some ktrain dependencies have not yet been migrated to tf.keras in TensorFlow 2 (or may have other issues), ktrain is temporarily using forked versions of some libraries. Specifically, ktrain uses forked versions of the eli5 and stellargraph libraries. If not installed, ktrain will complain when a method or function needing either of these libraries is invoked. To install these forked versions, you can do the following:
pip install git+https://github.com/amaiya/eli5@tfkeras_0_10_1
pip install git+https://github.com/amaiya/stellargraph@no_tf_dep_082
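
As a quick sanity check (not part of the official instructions), you can confirm the environment from Python:

import tensorflow as tf
import ktrain

print(tf.__version__)                          # should print a 2.x version
print(tf.config.list_physical_devices('GPU'))  # non-empty list if GPU support is active
print(ktrain.__version__)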

This code was tested on Ubuntu 18.04 LTS using TensorFlow 2.3.1 and Python 3.6.9.

How to Cite

Please cite the following paper when using ktrain:

@article{maiya2020ktrain,
    title={ktrain: A Low-Code Library for Augmented Machine Learning},
    author={Arun S. Maiya},
    year={2020},
    eprint={2004.10703},
    archivePrefix={arXiv},
    primaryClass={cs.LG},
    journal={arXiv preprint arXiv:2004.10703},
}


Creator: Arun S. Maiya

Email: arun [at] maiya [dot] net

Comments
  • No support for loading a pretrained Hugging Face Transformers model from a local path?

    Failed to load a pretrained Hugging Face Transformers model from my local machine. It seems only the hard-coded models in the code can be loaded.

    MODEL_NAME = "D:\programming\models\tf_rbtl"
    t = text.Transformer(MODEL_NAME, maxlen=500,  
                         classes=["0", "1"])
    
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-15-e20b30887588> in <module>
          1 t = text.Transformer(MODEL_NAME, maxlen=500,  
    ----> 2                      classes=["0", "1"])
    
    d:\anaconda3.5\envs\adverse\lib\site-packages\ktrain\text\preprocessor.py in __init__(self, model_name, maxlen, classes, batch_size, multilabel, use_with_learner)
        838             raise ValueError('classes argument is required when multilabel=True')
        839         super().__init__(model_name,
    --> 840                          maxlen, max_features=10000, classes=classes, multilabel=multilabel)
        841         self.batch_size = batch_size
        842         self.use_with_learner = use_with_learner
    
    d:\anaconda3.5\envs\adverse\lib\site-packages\ktrain\text\preprocessor.py in __init__(self, model_name, maxlen, max_features, classes, lang, ngram_range, multilabel)
        719         self.name = model_name.split('-')[0]
        720         if self.name not in TRANSFORMER_MODELS:
    --> 721             raise ValueError('uknown model name %s' % (model_name))
        722         self.model_type = TRANSFORMER_MODELS[self.name][1]
        723         self.tokenizer_type = TRANSFORMER_MODELS[self.name][2]
    
    ValueError: uknown model name D:\programming\models\tf_rbtl
    
    opened by WangHexie 22
  • How to save a trained SimpleQA model?

    Hi,

    I've tried the provided sample for SimpleQA. In my output, it gave me:

    <IPython.core.display.HTML object>

    which I assume is the:

    #qa.display_answers(answers[:5])

    If I re-run the sample code, it complains that there's already a directory where it tries to create the index (good). If I leave everything else out and rerun:

    qa = text.SimpleQA(INDEXDIR)

    it starts training again... another three hours :(

    This is my code now so I should get some output:

    # load 20newsgroups dataset into an array
    #from sklearn.datasets import fetch_20newsgroups
    #remove = ('headers', 'footers', 'quotes')
    #newsgroups_train = fetch_20newsgroups(subset='train', remove=remove)
    #newsgroups_test = fetch_20newsgroups(subset='test', remove=remove)
    #docs = newsgroups_train.data + newsgroups_test.data

    import ktrain
    from ktrain import text

    INDEXDIR = '/tmp/qa'

    #text.SimpleQA.initialize_index(INDEXDIR)
    #text.SimpleQA.index_from_folder('./Philosophy', INDEXDIR)

    qa = text.SimpleQA(INDEXDIR)

    answers = qa.ask('Why are we here?')
    top_answer = answers[0]['answer']
    print(top_answer)
    top_answer = answers[1]['answer']
    print(top_answer)
    top_answer = answers[2]['answer']
    print(top_answer)
    top_answer = answers[3]['answer']
    print(top_answer)
    top_answer = answers[4]['answer']
    print(top_answer)

    #qa.display_answers(answers[:5])

    How do I reload my already-trained model?

    opened by staccDOTsol 19
  • Load and use trained model with ktrain

    I have a trained model in .h5 format along with a .preproc file for racial recognition using the ktrain library. How do I load and use the trained model at a later time?

    user question 
    opened by bulioses 18
  • Is it possible to use a RoBERTa-like model (Microsoft/codebert-base) for NER sequence-tagging?

    Right now I am getting a tensor shape error, and I feel this has to be because the model expects BERT-like input and this goes wrong. Could this be the case?

    enhancement 
    opened by Niekvdplas 17
  • Error with text.Transformer and roberta-base

    Hi, I am trying to train a model based on "roberta-base". I am running it on EC2 (p3.16xl), and I got this error:

    Traceback (most recent call last):
      File "ktrain_transformer_training.py", line 119, in <module>
        learner, preproc = train_transformer(x_train, y_train, x_test, y_test)
      File "ktrain_transformer_training.py", line 98, in train_transformer
        model = t.get_classifier()
      File "/home/ubuntu/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/ktrain/text/preprocessor.py", line 1041, in get_classifier
        model = self._load_pretrained(mname, num_labels)
      File "/home/ubuntu/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/ktrain/text/preprocessor.py", line 1006, in _load_pretrained
        raise ValueError('could not load pretrained model %s using both from_pt=False and from_pt=True' % (mname))
    ValueError: could not load pretrained model roberta-base using both from_pt=False and from_pt=True
    

    However, the same code runs perfectly on my local machine. Can you help with this issue? Thanks

    user question 
    opened by Liranbz 16
  • unhashable type: 'numpy.ndarray'

    TypeError: unhashable type: 'numpy.ndarray'

    When running this:

    history = learner.fit(LR, n_cycles=EPOCHS, checkpoint_folder='./')

    based on:

    # Preprocess training and validation data
    train_set = t.preprocess_train(X_train, y_train)
    val_set = t.preprocess_test(X_test, y_test)

    where X_train and X_test are lists, while y_train and y_test are arrays.

    user question 
    opened by realonbebeto 13
  • Models trained with ktrain do not work with Flask, uWSGI and NGINX

    Hi, I am using ktrain to replace some of my Keras-built models in production deployment. I have noticed a problem with models trained with ktrain. I think I might need to set some extra params in my NGINX and uWSGI configurations while using ktrain, because I can't simply replace the older models with ktrain models.

    I am using Flask, uWSGI, and NGINX to deploy my models. This setup is already in place with models trained with traditional Keras and TF2. But if I replace my text classification model with a ktrain one, it stops working.

    I have checked it individually with Flask and uWSGI, and it works fine. But as soon as I add the NGINX server setup, it stops working. Something happening inside the ktrain APIs is breaking it, because if I do not use the ktrain APIs, everything works perfectly with the full setup.

    At the front end, it reports a Server Timeout Error. I have checked the internal logs to identify the issue, and it is happening because uWSGI is not returning anything to NGINX. If I run only Flask and uWSGI with the ktrain model, it runs. I also tried increasing the timeout to 5m, 10m, even 30m, but the connection still times out. It happens only when I call an API that uses ktrain models; other APIs that do not use ktrain models work perfectly fine.

    I had one more problem with ktrain on the Flask server, but I resolved that by turning off auto-reload on file change. It seems ktrain downloads or writes something in the local directory, which is why Flask was reloading when I called prediction. That part is resolved.

    I have tried BERT, DistilBERT, fastText, GRU, and many other approaches to figure out why it is not working with NGINX. Can you add your thoughts about what could be the reason?

    I have also created a [User Guide](https://drive.google.com/file/d/1VW421zkmXkiQdoO1NWhVe21QOgVhVUnc/view?usp=sharing) to show you the server settings. It will help identify the exact problem. If you can look at it and add your thoughts, it would be a really great help. I want to use ktrain in production but got stuck here.

    opened by laxmimerit 13
  • Regarding Deployment on Flask

    Hi, I have an issue regarding deployment: I am not able to deploy a ktrain multi text classification model. I tried to load the model and .preproc file, but it does not work.

    user question 
    opened by ianuragbhatt 13
  • Cannot get learner from iterator zip object

    get_learner fails when the training data is a zip of iterators such as when it is used for image segmentation tasks (while augmenting images and masks together).

    EDIT:

    It works by hacking together a custom Iterator class, but it's not a particularly elegant hack...

    image_gen and mask_gen below are keras.preprocessing.image.ImageDataGenerator.flow_from_directory() objects.

    
    class Iterator():
        
        def __init__(self, image_gen, mask_gen):
            self.image_gen = image_gen
            self.mask_gen = mask_gen
            self.batch_size = image_gen.batch_size
            self.target_size = image_gen.target_size
            self.color_mode = image_gen.color_mode
            self.class_mode = image_gen.class_mode
            self.n = image_gen.n
            self.seed = image_gen.seed
            self.total_batches_seen = image_gen.total_batches_seen
        
        def __iter__(self):
            return self
        
        def __next__(self):
            return next(self.image_gen), next(self.mask_gen)
        
        def __getitem__(self, key):
            return self.image_gen[key], self.mask_gen[key]
    

    Any ideas how we could make this more elegant?

    enhancement 
    opened by gkaissis 13
  • Support for long-sequence classification (transformer models)

    It would be great if support could be added for long sequences over ~500 word pieces for the transformers models.

    Possible methods:

    1. sliding window over each sequence to generate sub-sequences, whose outputs are averaged before being fed to the classifier layer
    2. sliding window over each sequence to generate sub-sequences, whose outputs are fed to an LSTM layer
    enhancement 
    opened by mdavis95 12
  • Question about external tutorial/example link

    First of all, thank you for writing this library. Some time ago I finished my Bachelor's thesis, which evaluates ktrain on the text domain; I released some of the code/Jupyter notebooks at https://github.com/ilos-vigil/ktrain-assessment-study.

    My question is: would you like to include my repository as a tutorial/example in ktrain's README.md?

    user question 
    opened by ilos-vigil 11
  • Only one GPU is used during training on a multi-GPU machine with mirrored_strategy

    @amaiya As per the example in https://github.com/amaiya/ktrain/issues/78, I am trying to train the NER model on a custom CoNLL dataset (5.5 million rows):

    • Number of sentences: 228051
    • Number of words in the dataset: 232634
    • Number of Labels: 29
    • Longest sentence: 106 words

    Model

    with mirrored_strategy.scope():
        model = txt.sequence_tagger('bilstm-transformer', preproc, wv_path_or_url='/home/user1/ktrain_data/cc.en.300.vec')
    learner = ktrain.get_learner(model, train_data=trn, val_data=val, batch_size=128)
    learner.fit(0.003, 5, cycle_len=1)
    

    Only the 1st GPU seems to be used, as reported by nvidia-smi, with utilization at 100% most of the time. The other GPUs stay in single-digit utilization, if they are used at all. What is missing? How do I take advantage of all the GPUs?

    opened by amir1m 4
Releases (v0.32.3)
  • v0.32.3(Dec 13, 2022)

    0.32.3 (2022-12-12)

    new:

    • N/A

    changed

    • N/A

    fixed:

    • Changed NMF to accept optional parameters nmf_alpha_W and nmf_alpha_H based on changes in scikit-learn==1.2.0.
    • Change ktrain.utils to check for TensorFlow before doing a version check, so that ktrain can be imported without TensorFlow being installed.
  • v0.32.2(Dec 12, 2022)

    0.32.2 (2022-12-12)

    new:

    • N/A

    changed

    • N/A

    fixed:

    • Changed call to NMF to use alpha_W instead of alpha, as alpha parameter was removed in scikit-learn==1.2. (#470)
  • v0.32.1(Dec 12, 2022)

    0.32.1 (2022-12-11)

    new:

    • N/A

    changed

    • N/A

    fixed:

    • In TensorFlow 2.11, the tf.optimizers.Optimizer base class points to the new Keras optimizer, which seems to have problems. Users should use the legacy optimizers in tf.keras.optimizers.legacy with ktrain (which evidently will never be deleted). In TF 2.11, supplying a string representation of an optimizer like "adam" to model.compile uses the new optimizer instead of the legacy one. In these cases, ktrain will issue a warning and automatically recompile the model with the default tf.keras.optimizers.legacy.Adam optimizer.
  • v0.32.0(Dec 9, 2022)

    0.32.0 (2022-12-08)

    new:

    • Support for TensorFlow 2.11. For now, as recommended in the TF release notes, ktrain has been changed to use the legacy optimizers in tf.keras.optimizers.legacy. This means that, when compiling Keras models, you should supply tf.keras.optimizers.legacy.Adam() instead of the string "adam".
    • Support for Python 3.10. Changed references from CountVectorizer.get_feature_names to CountVectorizer.get_feature_names_out. Updated supported versions in setup.py.

    changed

    • N/A

    fixed:

    • fixed error in docs
  • v0.31.10(Oct 1, 2022)

  • v0.31.9(Sep 24, 2022)

  • v0.31.8(Sep 8, 2022)

  • v0.31.7(Aug 4, 2022)

    0.31.7 (2022-08-04)

    new:

    • N/A

    changed

    • re-arranged dep warnings for TF
    • ktrain now pinned to transformers==4.17.0. Python 3.6 users can downgrade to transformers==4.10.3 and still use ktrain.

    fixed:

    • N/A
  • v0.31.6(Aug 2, 2022)

    0.31.6 (2022-08-02)

    new:

    • N/A

    changed

    • updated dependencies to work with newer versions (but temporarily continue pinning to transformers==4.10.1)

    fixed:

    • fixes for newer networkx
  • v0.31.5(Aug 1, 2022)

  • v0.31.4(Aug 1, 2022)

    0.31.4 (2022-08-01)

    new:

    • N/A

    changed

    • TextPredictor.explain and ImagePredictor.explain now use a different fork of eli5: pip install https://github.com/amaiya/eli5-tf/archive/refs/heads/master.zip

    fixed:

    • Fixed loss_fn_from_model function to work with DISABLE_V2_BEHAVIOR properly
    • TextPredictor.explain and ImagePredictor.explain now work with tensorflow>=2.9 and scipy>=1.9 (due to new eli5-tf fork -- see above)
  • v0.31.3(Jul 16, 2022)

    0.31.3 (2022-07-15)

    new:

    • N/A

    changed

    • added alnum check and period check to KeywordExtractor

    fixed:

    • fixed bug in text.qa.core caused by previous refactoring of paragraph_tokenize and tokenize
  • v0.31.2(May 20, 2022)

    0.31.2 (2022-05-20)

    new:

    • N/A

    changed

    • added truncate_to argument (default: 5000) and minchars argument (default: 3) to the KeywordExtractor.extract_keywords method.
    • added score_by argument to KeywordExtractor.extract_keywords. Default is freqpos, which means keywords are now ranked by a combination of frequency and position in document.

    fixed:

    • N/A
  • v0.31.1(May 17, 2022)

    0.31.1 (2022-05-17)

    new:

    • N/A

    changed

    • Allow for returning prediction probabilities when merging tokens in sequence-tagging (PR #445)
    • added basic ML pipeline test to workflow using latest TensorFlow

    fixed:

    • N/A
  • v0.31.0(May 7, 2022)

    0.31.0 (2022-05-07)

    new:

    • The text.ner.models.sequence_tagger now supports word embeddings from non-BERT transformer models (e.g., roberta-base, codebert). Thanks to @Niekvdplas.
    • Custom tokenization can now be used in sequence-tagging even when using transformer word embeddings. See custom_tokenizer argument to NERPredictor.predict.

    changed

    • [breaking change] In the text.ner.models.sequence_tagger function, the bilstm-bert model is now called bilstm-transformer and the bert_model parameter has been renamed to transformer_model.
    • [breaking change] The syntok package is now used as the default tokenizer for NERPredictor (sequence-tagging prediction). To use the tokenization scheme from older versions of ktrain, you can import the re and string packages and supply this function to the custom_tokenizer argument: lambda s: re.compile(f"([{string.punctuation}“”¨«»®´·º½¾¿¡§£₤‘’])").sub(r" \1 ", s).split().
    • Code base was reformatted using black and isort
    • ktrain now supports TIKA for text extraction in the text.textractor.TextExtractor package with the use_tika=True argument as default. To use the old-style text extraction based on the textract package, you can supply use_tika=False to TextExtractor.
    • removed warning about sentence pair classification to avoid confusion

    fixed:

    • N/A
  • v0.30.0(Mar 28, 2022)

    0.30.0 (2022-03-28)

    new:

    • ktrain now supports simple, fast, and robust keyphrase extraction with the ktrain.text.kw.KeywordExtractor module
    • ktrain now only issues a warning if TensorFlow is not installed, instead of halting and preventing further use. This means that pre-trained PyTorch models (e.g., text.zsl.ZeroShotClassifier) and sklearn models (e.g., text.eda.TopicModel) in ktrain can now be used without having TensorFlow installed.
    • text.qa.SimpleQA and text.qa.AnswerExtractor now both support PyTorch with optional quantization (use framework='pt' for PyTorch version)
    • text.zsl.ZeroShotClassifier, text.translation.Translator, and text.translation.EnglishTranslator all support a quantize argument.
    • pretrained image-captioning and object-detection via transformers are now supported

    changed

    • reorganized imports
    • localized seqeval
    • The half parameter to text.translation.Translator, and text.translation.EnglishTranslator was changed to quantize and now supports both CPU and GPU.
    • TFDataset and SequenceDataset classes must now be imported as ktrain.dataset.TFDataset and ktrain.dataset.SequenceDataset.

    fixed:

    • N/A
  • v0.29.3(Mar 9, 2022)

    0.29.3 (2022-03-09)

    new:

    • NERPredictor.predict now includes a return_offsets parameter. If True, the results will include character offsets of predicted entities.

    changed

    • In eda.TopicModel, changed lda_max_iter to max_iter and nmf_alpha to alpha
    • Added show_counts parameter to TopicModel.get_topics method
    • Changed qa.core._process_question to qa.core.process_question
    • In qa.core, added remove_english_stopwords and and_np parameters to process_question
    • The valley learning rate suggestion is now returned in learner.lr_estimate and learner.lr_plot (when suggest=True supplied to learner.lr_plot)

    fixed:

    • save TransformerEmbedding model, tokenizer, and configuration when saving NERPredictor and reset te_model to facilitate loading NERPredictors with BERT embeddings offline (#423)
    • switched from keras2onnx to tf2onnx, which supports newer versions of TensorFlow
  • v0.29.2(Feb 9, 2022)

    0.29.2 (2022-02-09)

    new:

    • N/A

    changed

    • N/A

    fixed:

    • added get_tokenizer call to TransformersPreprocessor._load_pretrained to address issue #416
  • v0.29.1(Feb 8, 2022)

    0.29.1 (2022-02-08)

    new:

    • N/A

    changed

    • pin to sklearn==0.24.2 due to breaking changes. This scikit-learn version change only really affects TextPredictor.explain. The eli5 fork supporting tf.keras updated for scikit-learn 0.24.2. To use scikit-learn==0.24.2, users must uninstall and re-install the eli5 fork with: pip install https://github.com/amaiya/eli5/archive/refs/heads/tfkeras_0_10_1.zip.

    fixed:

    • N/A
  • v0.29.0(Jan 29, 2022)

    0.29.0 (2022-01-28)

    new:

    • New vision models: added MobileNetV3-Small and EfficientNet. Thanks to @ilos-vigil.

    changed

    • core.Learner.plot now supports plotting of any value that exists in the training History object (e.g., mae if previously specified as metric). Thanks to @ilos-vigil.
    • added raw_confidence parameter to QA.ask method to return raw confidence scores. Thanks to @ilos-vigil.

    fixed:

    • pin to transformers==4.10.3 due to Issue #398
    • pin to syntok==1.3.3 due to bug with syntok==1.4.1 causing paragraph tokenization in qa module to break
    • properly suppress TF/CUDA warnings by default
    • ensure document fed to keras_bert tokenizer to avoid this issue
  • v0.28.3(Nov 5, 2021)

  • v0.28.2(Oct 18, 2021)

  • v0.28.1(Oct 18, 2021)

    0.28.1 (2021-10-17)

    New:

    • N/A

    Changed

    • added extra_requirements to setup.py
    • changed imports for summarization, translation, qa, and zsl in notebooks and tests

    Fixed:

    • N/A
  • v0.28.0(Oct 13, 2021)

    0.28.0 (2021-10-13)

    New:

    • text.AnswerExtractor is a universal information extractor powered by a Question-Answering module and capable of extracting user-specified information from texts.
    • text.TextExtractor is a text extraction pipeline (e.g., convert PDFs to plain text)

    Changed

    • changed transformers pin to transformers>=4.0.0,<=4.10.3

    Fixed:

    • N/A
  • v0.27.3(Sep 3, 2021)

    0.27.3 (2021-09-03)

    New:

    • N/A

    Changed

    • N/A

    Fixed:

    • SimpleQA now can load PyTorch question-answering checkpoints
    • change API call to support newest causalnlp
  • v0.27.2(Jul 28, 2021)

    0.27.2 (2021-07-28)

    New:

    • N/A

    Changed

    • N/A

    Fixed:

    • check for logits attribute when predicting using transformers
    • change raised Exception to warning for longer sequence lengths for transformers
  • v0.27.1(Jul 20, 2021)

  • v0.27.0(Jul 20, 2021)

  • v0.26.5(Jul 16, 2021)

    0.26.5 (2021-07-15)

    New:

    • N/A

    Changed

    • added query parameter to SimpleQA.ask so that an alternative query can be used to retrieve contexts from corpus
    • added chardet as dependency for stellargraph

    Fixed:

    • fixed issue with TopicModel.build when threshold=None
  • v0.26.4(Jun 23, 2021)

    0.26.4 (2021-06-23)

    New:

    • API documentation index

    Changed

    • Added warning when a TensorFlow version of selected transformers model is not available and the PyTorch version is being downloaded and converted instead using from_pt=True.

    Fixed:

    • Fixed utils.metrics_from_model to support alternative metrics
    • Check for AUC in the ktrain.utils "inspect" function