Header-only library for using Keras models in C++.

Tobias Hermann

Last update: Jan 5, 2023

Overview

frugally-deep

Use Keras models in C++ with ease

Introduction
Usage
Performance
Requirements and Installation
FAQ

Introduction

Would you like to build/train a model using Keras/Python? And would you like to run the prediction (forward pass) on your model in C++ without linking your application against TensorFlow? Then frugally-deep is exactly for you.

frugally-deep

is a small header-only library written in modern and pure C++.
is very easy to integrate and use.
depends only on FunctionalPlus, Eigen and json - also header-only libraries.
supports inference (model.predict) not only for sequential models but also for computational graphs with a more complex topology, created with the functional API.
re-implements a (small) subset of TensorFlow, i.e., the operations needed to support prediction.
results in a much smaller binary size than linking against TensorFlow.
works out-of-the-box also when compiled into a 32-bit executable. (Of course, 64 bit is fine too.)
utterly ignores even the most powerful GPU in your system and uses only one CPU core per prediction. ;-)
but is quite fast on one CPU core compared to TensorFlow, and you can run multiple predictions in parallel, thus utilizing as many CPUs as you like to improve the overall prediction throughput of your application/pipeline.
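
The last point, running several predictions in parallel, can be sketched with the standard library alone. This is a minimal illustration, not part of the library: `predict_in_parallel` is a hypothetical helper name, and `predict` stands in for any callable that performs one forward pass (for example a lambda wrapping `model.predict`). It assumes the callable is safe to invoke concurrently.

```cpp
#include <future>
#include <type_traits>
#include <vector>

// Runs `predict` on every input in its own asynchronous task and returns
// the results in input order. `predict` must be safe to call concurrently.
// Note: one task (typically one thread) is started per input.
template <typename Input, typename Predict>
auto predict_in_parallel(const std::vector<Input>& inputs, Predict predict)
{
    using Output = std::decay_t<decltype(predict(inputs.front()))>;
    std::vector<std::future<Output>> pending;
    pending.reserve(inputs.size());
    for (const auto& input : inputs)
    {
        pending.push_back(std::async(std::launch::async,
            [&predict, &input]() { return predict(input); }));
    }
    std::vector<Output> results;
    results.reserve(pending.size());
    for (auto& task : pending)
    {
        results.push_back(task.get()); // blocks until this task is done
    }
    return results;
}
```

For large batches, splitting the inputs into as many chunks as there are CPU cores would likely be a better fit than one task per input.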

Supported layer types

Layer types typically used in image recognition/generation are supported, making many popular model architectures possible (see Performance section).

Add, Concatenate, Subtract, Multiply, Average, Maximum
AveragePooling1D/2D, GlobalAveragePooling1D/2D
Bidirectional, TimeDistributed, GRU, LSTM, CuDNNGRU, CuDNNLSTM
Conv1D/2D, SeparableConv2D, DepthwiseConv2D
Cropping1D/2D, ZeroPadding1D/2D
BatchNormalization, Dense, Flatten, Normalization
Dropout, AlphaDropout, GaussianDropout, GaussianNoise
SpatialDropout1D, SpatialDropout2D, SpatialDropout3D
RandomContrast, RandomFlip, RandomHeight
RandomRotation, RandomTranslation, RandomWidth, RandomZoom
MaxPooling1D/2D, GlobalMaxPooling1D/2D
ELU, LeakyReLU, ReLU, SeLU, PReLU
Sigmoid, Softmax, Softplus, Tanh
Exponential, GELU, Softsign
UpSampling1D/2D
Reshape, Permute, RepeatVector
Embedding

Also supported

multiple inputs and outputs
nested models
residual connections
shared layers
variable input shapes
arbitrary complex model architectures / computational graphs
custom layers (by passing custom factory functions to load_model)

Currently not supported are the following:

ActivityRegularization, AveragePooling3D, Conv2DTranspose (why), Conv3D, ConvLSTM2D, Cropping3D, Dot, GRUCell, LocallyConnected1D, LocallyConnected2D, LSTMCell, Masking, MaxPooling3D, RNN, SimpleRNN, SimpleRNNCell, StackedRNNCells, ThresholdedReLU, UpSampling3D, temporal models

Usage

Use Keras/Python to build (model.compile(...)), train (model.fit(...)) and test (model.evaluate(...)) your model as usual. Then save it to a single HDF5 file using model.save('....h5', include_optimizer=False). The image_data_format in your model must be channels_last, which is the default when using the TensorFlow backend. Models created with a different image_data_format and other backends are not supported.
Now convert it to the frugally-deep file format with keras_export/convert_model.py
Finally load it in C++ (fdeep::load_model(...)) and use model.predict(...) to invoke a forward pass with your data.

The following minimal example shows the full workflow:

# create_model.py
import numpy as np
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

inputs = Input(shape=(4,))
x = Dense(5, activation='relu')(inputs)
predictions = Dense(3, activation='softmax')(x)
model = Model(inputs=inputs, outputs=predictions)
model.compile(loss='categorical_crossentropy', optimizer='nadam')

model.fit(
    np.asarray([[1, 2, 3, 4], [2, 3, 4, 5]]),
    np.asarray([[1, 0, 0], [0, 0, 1]]), epochs=10)

model.save('keras_model.h5', include_optimizer=False)

python3 keras_export/convert_model.py keras_model.h5 fdeep_model.json

// main.cpp
#include <fdeep/fdeep.hpp>
#include <iostream>
#include <vector>
int main()
{
    const auto model = fdeep::load_model("fdeep_model.json");
    const auto result = model.predict(
        {fdeep::tensor(fdeep::tensor_shape(static_cast<std::size_t>(4)),
        std::vector<float>{1, 2, 3, 4})});
    std::cout << fdeep::show_tensors(result) << std::endl;
}

When using convert_model.py a test case (input and corresponding output values) is generated automatically and saved along with your model. fdeep::load_model runs this test to make sure the results of a forward pass in frugally-deep are the same as in Keras.
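
The idea behind such a self-test can be illustrated with a small comparison helper. This is only a sketch: the function name `all_close` and the choice of an absolute tolerance are assumptions made for illustration, and the check that fdeep::load_model actually performs may differ.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Returns true if both vectors have the same length and every pair of
// corresponding values differs by at most `epsilon`.
inline bool all_close(const std::vector<float>& expected,
                      const std::vector<float>& actual,
                      float epsilon)
{
    if (expected.size() != actual.size())
    {
        return false;
    }
    for (std::size_t i = 0; i < expected.size(); ++i)
    {
        if (std::fabs(expected[i] - actual[i]) > epsilon)
        {
            return false;
        }
    }
    return true;
}
```

Floating-point results from two different implementations rarely match bit for bit, which is why a tolerance is used instead of an exact comparison.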

For more integration examples please have a look at the FAQ.

Performance

Below you can find the average durations of multiple consecutive forward passes for some popular models run on a single core of an Intel Core i5-6600 CPU @ 3.30GHz. frugally-deep and TensorFlow were compiled (GCC ver. 7.1) with g++ -O3 -march=native. The processes were started with CUDA_VISIBLE_DEVICES='' taskset --cpu-list 1 ... to disable the GPU and to only allow usage of one CPU (see used Dockerfile).

Model	Keras + TF	frugally-deep
`DenseNet121`	0.12 s	0.25 s
`DenseNet169`	0.13 s	0.28 s
`DenseNet201`	0.16 s	0.39 s
`InceptionV3`	0.21 s	0.32 s
`MobileNet`	0.05 s	0.15 s
`MobileNetV2`	0.05 s	0.17 s
`NASNetLarge`	0.83 s	4.03 s
`NASNetMobile`	0.08 s	0.32 s
`ResNet101`	0.22 s	0.45 s
`ResNet101V2`	0.21 s	0.42 s
`ResNet152`	0.31 s	0.65 s
`ResNet152V2`	0.29 s	0.61 s
`ResNet50`	0.13 s	0.26 s
`ResNet50V2`	0.12 s	0.22 s
`VGG16`	0.40 s	0.56 s
`VGG19`	0.49 s	0.68 s
`Xception`	0.25 s	1.20 s
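
Averages over consecutive forward passes, like those in the table above, can be measured with a small timing helper. The following is a sketch, not the benchmark code used for the table: `average_seconds` is a hypothetical name, and `run` stands in for one call such as `model.predict(...)`.

```cpp
#include <chrono>
#include <cstddef>

// Calls `run` `repetitions` times back to back and returns the mean
// wall-clock duration of a single call in seconds.
template <typename Run>
double average_seconds(Run run, std::size_t repetitions)
{
    using clock = std::chrono::steady_clock;
    if (repetitions == 0)
    {
        return 0.0;
    }
    const auto start = clock::now();
    for (std::size_t i = 0; i < repetitions; ++i)
    {
        run();
    }
    const std::chrono::duration<double> elapsed = clock::now() - start;
    return elapsed.count() / static_cast<double>(repetitions);
}
```

When timing a forward pass this way, make sure the result of each call is actually used (for example, printed or accumulated), so the compiler cannot optimize the work away.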

Requirements and Installation

A C++14-compatible compiler: Compilers from these versions on are fine: GCC 4.9, Clang 3.7 (libc++ 3.7) and Visual C++ 2015
Python 3.7 or higher
TensorFlow and Keras 2.7.0 (This is the tested version, but somewhat older ones might work too.)

Guides for different ways to install frugally-deep can be found in INSTALL.md.

FAQ

See FAQ.md

Disclaimer

The API of this library still might change in the future. If you have any suggestions, find errors or want to give general feedback/criticism, I'd love to hear from you. Of course, contributions are also very welcome.

License

Distributed under the MIT License. (See accompanying file LICENSE or at https://opensource.org/licenses/MIT)

Comments

Problem with results of siamese CNN using EfficientNet

Hi there,

First of all let me thank you for this fantastic library!

Recently I got stuck on converting a siamese network that utilizes functional model and EfficientNetB0 architecture. I'm strictly following this repo for my development: https://github.com/sajadamouei/Person-Re-ID-with-light-weight-network. Since EfficientNetB0 uses FixedDropout and reduce layers that shrink the dimensionality (requires multiplying tensors by 1x1xDEPTH Conv) I had to implement them myself in the library. When I convert EfficientNetB0 on its own and load it in my C++ app, the output is EXACTLY as expected on both python and C++ side - no problems there. However, When I try to create siamese network out of them like presented here: https://github.com/sajadamouei/Person-Re-ID-with-light-weight-network/blob/master/model.py I get totally different results. In anticipation to your question - yes, I made super sure that the inputs to the network are EXACTLY the same on both sides - python and C++. I've tried everything to fix this and concluded that there must be something wrong with either the way frugally-deep deals with functional models OR the converter itself. What I also noticed is that tensors look completely different when they reach both Flatten layers in the architecture. Any ideas why this may be happening? Please look at the below screenshots to better understand the problem.

opened by pavel123 37
Hash value for json/net loaded?

I think it would be handy for us to have a hash over a loaded model (so I could store, together with the results, some indication of how they were generated - particularly handy for encodings, which tend to be incompatible). I could simply calculate a hash over the file/string used to initialise the net, but since many files could potentially result in the same net it would be nicer if the net itself could provide such a hash. Is such a function implemented or, if not, do you see an easy way to get such a hash?

Thanks

Sven

opened by utcke 36
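
The simple variant mentioned in this issue, hashing the file or string used to initialise the net, could look like the sketch below. It uses the 64-bit FNV-1a hash purely as an example; `fnv1a_64` is a hypothetical helper and not a function of the library. As noted in the issue, two different files describing the same net would still produce different hashes.

```cpp
#include <cstdint>
#include <string>

// 64-bit FNV-1a hash over the bytes of `data`
// (e.g., the full content of the model's JSON file).
inline std::uint64_t fnv1a_64(const std::string& data)
{
    std::uint64_t hash = 14695981039346656037ULL; // FNV offset basis
    for (const unsigned char byte : data)
    {
        hash ^= byte;
        hash *= 1099511628211ULL; // FNV prime
    }
    return hash;
}
```

FNV-1a is not cryptographically secure. If the hash must resist deliberate collisions, a cryptographic hash would be the safer choice.
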
How to convert model with "relu6" layer?

My Keras model uses a "relu6" layer. How do I change convert_model.py to produce the json file? And are there any examples for adding a custom layer in fdeep::load_model?

Thank you very much!

opened by binlbl 32
Using Eigen Unsupported modules to improve convolutions

I noticed that Eigen 3.3 has unsupported modules, including modules for Tensors and gemm operations.

https://bitbucket.org/eigen/eigen/src/9b065de03d016d802a25366ff5f0055df6318121/unsupported/Eigen/CXX11/src/Tensor/README.md?at=default#markdown-header-convolutions

I noticed you implement your own gemm operation in fdeep/convolution.hpp in function convolve_im2col. This could be improved by using gemm functions from the eigen unsupported modules.

I ran a test by inferring the UNet model from pix2pix in frugally deep. It took 18s compared to a model converted from onnx and inferred in OpenCV which took 3s. I think this shows that convolutions in frugally could be improved.

Thanks

opened by pfeatherstone 32
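
For readers unfamiliar with the im2col approach discussed in this issue, here is a heavily simplified sketch. It is not the library's convolve_im2col: it handles only a single channel, a single filter, stride 1 and no padding, and it uses a plain loop where a real implementation would call an optimized matrix product (gemm). The names `matrix`, `im2col` and `convolve_im2col_sketch` are chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major 2D matrix of floats.
struct matrix
{
    std::size_t rows;
    std::size_t cols;
    std::vector<float> values;
    float at(std::size_t r, std::size_t c) const { return values[r * cols + c]; }
};

// Unrolls every kh x kw patch of `image` (stride 1, no padding) into one
// column. The result has kh*kw rows and out_h*out_w columns.
inline matrix im2col(const matrix& image, std::size_t kh, std::size_t kw)
{
    assert(kh >= 1 && kw >= 1 && kh <= image.rows && kw <= image.cols);
    const std::size_t out_h = image.rows - kh + 1;
    const std::size_t out_w = image.cols - kw + 1;
    matrix columns{kh * kw, out_h * out_w,
        std::vector<float>(kh * kw * out_h * out_w)};
    for (std::size_t y = 0; y < out_h; ++y)
        for (std::size_t x = 0; x < out_w; ++x)
            for (std::size_t ky = 0; ky < kh; ++ky)
                for (std::size_t kx = 0; kx < kw; ++kx)
                    columns.values[(ky * kw + kx) * columns.cols + (y * out_w + x)]
                        = image.at(y + ky, x + kx);
    return columns;
}

// Convolution (cross-correlation, as used in neural networks) written as
// one product: flattened kernel (1 x kh*kw) times the im2col matrix.
inline matrix convolve_im2col_sketch(const matrix& image, const matrix& kernel)
{
    const matrix columns = im2col(image, kernel.rows, kernel.cols);
    matrix result{image.rows - kernel.rows + 1, image.cols - kernel.cols + 1,
        std::vector<float>(columns.cols, 0.0f)};
    for (std::size_t k = 0; k < columns.rows; ++k)
        for (std::size_t j = 0; j < columns.cols; ++j)
            result.values[j] += kernel.values[k] * columns.at(k, j);
    return result;
}
```

The benefit of this layout is that the inner product becomes a dense matrix multiplication, which is exactly the part a tuned gemm routine can speed up.
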
Slow-ish run time on MSVC

Hi!

First of all thank you for this great library! :-) I've got a fairly small model (18 layers) for real-time applications, basically mainly consisting of 5 blocks of Conv2D/ReLu/MaxPool2D, and input size 64x64x3. I'm unfortunately seeing some speed problems with fdeep. A forward pass takes around 11ms in Keras, and it's taking 60ms in fdeep. (I've measured by calling predict 100x in a for-loop and then averaging - a bit crude but should do the trick for this purpose). I've compiled with the latest VS2017 15.5.5, Release mode, and default compiler flags (/O2). If I enable AVX2 and intrinsics, it goes down to 50ms, but still way too slow. (I've tried without im2col but it's even slower, around >10x).

I've run the VS profiler, but I'm not 100% sure I'm interpreting the results correctly. I think around 30%+5% of the total time is spent in Eigen's gebp and gemm functions, where we probably can't do much. Except maybe: I think I've seen you're using RowMajor storage for the Eigen matrices. Eigen is supposedly more optimised for its default, ColMajor storage. Would it be hard to change that in fdeep? Another 30% seems to be spent in convolve_im2col. But I'm not 100% sure where. I first thought it was the memcpy in eigen_mat_to_values but eigen_mat_to_values itself contains very few profiler samples only. There's also a lot of internal::transform and std::transform showing up in the profiler as well (internal::transform<ContainerOut>(reuse_t{}, f, std::forward<ContainerIn>(xs));) but I couldn't really figure out what the actual code is that this executes. I also saw that I think you pre-instantiate some convolution functions for common kernels. Most of my convolution kernels are 3x3, and it looks like you only instantiate n x m kernels for n and m equals 1 and 2. Could it help adding 3x3 there? So yea I'm really not sure about all of it. If indeed the majority of time is spent in Eigen's functions, then the RowMajor thing could indeed be a major problem.

I'm happy to send you the model and an example input via email if you wanted to have a look.

Here's some screenshots of the profiler:

Thank you very much!
enhancement

opened by patrikhuber 32
Input to model
If I have an RGB image and want to pass it to the model, what should I do?

What I've done is flatten the input image into a vector of floats: I appended the r, g, b values one after another to get a single vector called "input_vector".

This is the next step:

typedef fplus::shared_ref<std::vector<float>> shared_float_vec;
shared_float_vec x(fplus::make_shared_ref<std::vector<float>>(std::move(input_vector)));
const auto result = decision_model.predict({fdeep::tensor3(fdeep::shape3(3,60,60),x)});

The output is then incorrect. What should I do, or what have I done wrong?
opened by rmmal 32
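
The Usage section states that models must use channels_last. If the vector in the question above is planar (all R values, then all G, then all B), one possible cause of the wrong output is the data layout. This is not a confirmed diagnosis. The sketch below shows how planar data could be reordered into interleaved channels_last order; `planar_to_channels_last` is a hypothetical helper, not a library function.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Converts planar data (all values of channel 0, then channel 1, ...)
// into interleaved channels_last order (all channels of pixel 0, then
// all channels of pixel 1, ...).
inline std::vector<float> planar_to_channels_last(
    const std::vector<float>& planar,
    std::size_t height, std::size_t width, std::size_t channels)
{
    if (planar.size() != height * width * channels)
    {
        throw std::invalid_argument("size does not match height*width*channels");
    }
    std::vector<float> interleaved(planar.size());
    const std::size_t pixels = height * width;
    for (std::size_t c = 0; c < channels; ++c)
    {
        for (std::size_t p = 0; p < pixels; ++p)
        {
            interleaved[p * channels + c] = planar[c * pixels + p];
        }
    }
    return interleaved;
}
```

With data in this order, the tensor shape would presumably be given as height, width, channels (60, 60, 3 in the example above) rather than 3, 60, 60.
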
lambda layer using tf.image

I am using Lambda layer which includes this function to extract patches in image

patch_one = tf.image.extract_glimpse(inputs[0], [26, 26], inputs[1][:, j, :], centered=False, normalized=False, noise='zero')

Is it possible to implement this custom layer in your library and load model?

opened by katmatus 30

Stop at the 'Loading json ...'

Hi Tobias, thanks for this great library! I trained a ResNet50 network using Keras. I was able to convert the .h5 model to a .json. However, when I run the program as follows:

#include <fdeep/fdeep.hpp>
#include <opencv2/opencv.hpp>

int main()
{
	const cv::Mat image = cv::imread("Image_1_2.jpg");
	cv::cvtColor(image, image, cv::COLOR_BGR2RGB);
	assert(image.isContinuous());
	const auto model = fdeep::load_model("train7.json");
	// Use the correct scaling, i.e., low and high.
	const auto input = fdeep::tensor5_from_bytes(image.ptr(),
		static_cast<std::size_t>(image.rows),
		static_cast<std::size_t>(image.cols),
		static_cast<std::size_t>(image.channels()),
		0.0f, 1.0f);
	const auto result = model.predict_class({ input });
	std::cout << result << std::endl;
	system("pause");
}

This is like the example in the FAQ ("How to use images loaded with OpenCV as input for a model?"), but it doesn't work with my Keras model. It just spent about 236 s loading the json and then stopped there. My CPU is a Core i5-3230M, which is not a good CPU. My model classifies 7 kinds of algae cells and was built with transfer learning based on ResNet50.
The Python program for training the model is as follows:

import numpy as np
import matplotlib.pyplot as plt
import keras
from keras.preprocessing import image
from keras.preprocessing.image import ImageDataGenerator
from keras.applications import ResNet50
from keras.applications.resnet50 import preprocess_input
from keras import Model, layers
from keras.models import load_model

input_path = "data/LvsRod/"

train_datagen = ImageDataGenerator(
    rescale=1. / 255,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    preprocessing_function=preprocess_input)

train_generator = train_datagen.flow_from_directory(
    input_path + 'train',
    batch_size=10,
    class_mode='binary',
    target_size=(224, 224))

validation_datagen = ImageDataGenerator(
    rescale=1. / 255,
    preprocessing_function=preprocess_input)

validation_generator = validation_datagen.flow_from_directory(
    input_path + 'validation',
    shuffle=False,
    class_mode='binary',
    target_size=(224, 224))

conv_base = ResNet50(include_top=False, weights='imagenet', input_shape=(224, 224, 3))

for layer in conv_base.layers:
    layer.trainable = False

x = conv_base.output
x = layers.Flatten()(x)
x = layers.Dense(256, activation='relu')(x)
x = layers.Dropout(0.5)(x)
predictions = layers.Dense(7, activation='softmax')(x)
model = Model(conv_base.input, predictions)

optimizer = keras.optimizers.SGD(lr=1e-4, momentum=0.9)
model.compile(loss='sparse_categorical_crossentropy',
              optimizer=optimizer,
              metrics=['accuracy'])

history = model.fit_generator(generator=train_generator,
                              steps_per_epoch=10,  # added in Kaggle
                              epochs=30,
                              validation_data=validation_generator,
                              validation_steps=10  # added in Kaggle
                             )

# save
model.save('train7.h5')

The h5 model can be downloaded here:

URL: https://pan.baidu.com/s/1YkuBHBkjjUs2dcpc8XTLqA
Extraction code: 1od1

The file is too big to upload here. I would really like to know how to solve this problem.

opened by callmefish 28

Bad performance.

Hi Tobias,

I am getting bad performance when using frugally-deep and I wanted to ask you for some advice. Of course I've read the FAQ about performance, so I've got that covered.

Here is what I've tested so far:

| Environment | Description | Time |
|-------------|:------------|-----:|
| Python | Default settings (GPU ON) | 35ms |
| Python | os.environ['CUDA_VISIBLE_DEVICES']='-1' | 45ms |
| Python | NO GPU and tf.config.threading.set_intra_op_parallelism_threads(1) | 75ms |
| Visual Studio 2017 | Default (Release -O2, whole program optimization) | 310ms |
| Visual Studio 2017 | Compiled with AVX2 | 280ms |

It is quite interesting that a single switch (AVX2) gave me a 10% boost! But it is still far, very far from what you have advocated.

I did run a benchmark and here is what I've got:

Any ideas? Could I send you my model and example code? (privately as this is for the job, I will be happy to support you if I get paid for the project :) ).

opened by TrueWodzu 27
Cannot load InceptionV3 model

So, I successfully loaded some models and ran predictions with them.

Yet, when I tried to load an InceptionV3 model, I got an error. There were no errors when I converted the model from 'h5' to 'json', but the code below does not work.

The error I got

opened by Terminou 25
Frugally LSTM Encoder-Decoder results different from Keras/Tensorflow LSTM Encoder-Decoder (missing support for initial_state)
Hi @Dobiasd

I have been working on the Encoder-Decoder model for Vehicle Path Forecasting since you added support for returned_states and show_tensor5 on LSTM-based models. The workflow of the project was described on this past issue. After some experiments, the LSTM-Based encoder and decoder models are not giving me any problem related to returned_states = True or show_tensor5, confirming frugally-deep fixes worked. However, I have been trying to replicate the results I obtained using the Keras/Tensorflow models without success.

The frugally-deep fdeep_encoder_model_NT is returning the exact same encoder_hidden_state and encoder_cell_state states compared to its Tf + Keras counterparts using the encoder_model.hdf5. However, the fdeep_decoder_model_NT is not giving me the same decoder_hidden_state and decoder_cell_state output states (compared to the results using Tf + Keras encoder_model.hdf5) :(

Specifically, I develop the decoder inference model using TF + Keras (please refer yourself to past comments in this issue to see the corresponding code), and then converted it from .hdf5 to .json, ready to be ported into the C++ application (same as with the encoder model). Validating the encoder states: However, both frugally-deep decoder_hidden_state and decoder_cell_state differ from corresponding Keras-based decoder_hidden_state and decoder_cell_state: Resulting, as expected, in a wrong bounding box prediction: which does not match with the corresponding Keras Results: I do not really know about what is happening with fdeep_decoder_model_NT, so I have various options in mind:

I have trained another model using LSTM instead CuDNNLSTM layers in order to check if the problem is with CuDNNLSTM layer implementation. However, the problem is still present when using other LSTM-based cells like CuDNNLSTM and LSTM. The fdeep_encoder_model works well but fdeep_decoder_model is still making wrong predictions (both at states returned and next bbox prediction).

Now I am working in the main.cpp file. Maybe the problem is inside my internal manipulation of fdeep::tensor5 and fdeep::tensor5s when feeding the data into the ported models. However, both models are working well, except that the decoder's model is making (inaccurate) predictions of future bounding boxes, but it did not crash in any step of the script execution.

I am puzzled about the following fact: At main.cpp the decoders predictions is made with the following command: auto decoder_outputs = decoder_model.predict({target_seq, encoder_states.at(0), encoder_states.at(1)});, where encoder_states.at(0) and encoder_states.at(1) represent h_enc and c_enc respectively. However, I tried by interchanging the encoder states at the input of the decoder prediction line like this: auto decoder_outputs = decoder_model.predict({target_seq, encoder_states.at(1), encoder_states.at(0)}); and obtaining the exact same predicted_next_box (even though I interchanged the input order of decoder_states at the prediction function).

Finally, apart from the wrong values of h_dec and c_dec returned by fdeep_decoder_model, I noticed both h_dec hidden states (from frugally AND Keras) are in the range [-1, 1], but that does not occur to c_dec hidden states. In Keras, c_dec have values from [-11, 11] but, in frugally, c_dec takes values from [-1, 1]. In addition, based on your suggestion about internal scaling causing this kind of issues, by inspecting the fdeep_encoder_model.json, there are some initializers parameters that are using Variance_Scaling parameter inside that maybe are the cause of errors at inference-time. I think maybe this at the root of the problem but I have no idea of how to get the correct h_enc and c_enc, both between the same ranges used in Keras and with the correct values as well.

Here is the main.cpp file I am running to test the results. Any comment or suggestion about the code would be welcomed!

#include <fdeep/fdeep.hpp>
#include <vector>
#include <fstream>
#include <iostream>

int main()
{
    // Loading the previously trained models
    const auto encoder_model = fdeep::load_model("fdeep_encoder_model_NT.json");
    std::cout << "Encoder Model Loaded!" << std::endl;
    const auto decoder_model = fdeep::load_model("fdeep_decoder_model_NT.json");
    std::cout << "Decoder Model Loaded!" << std::endl;

    // Batch_size = 1, num_timesteps = 10 and num_features = 4
    fdeep::shape5 in_traj_shape(1,1,1,10,4);

    // Loading a sample sequence trajectory into tensor5 data structure
    const std::vector<float> src_traj = {
        1728, 715, 191, 221, 1717, 710, 202, 215, 1706, 704, 206, 198,
        1695, 700, 217, 196, 1687, 696, 228, 183, 1680, 689, 240, 181,
        1668, 668, 240, 198, 1661, 668, 243, 194, 1650, 664, 251, 189,
        1635, 660, 266, 181};

    // Input trajectory from vector to tensor5 data structure
    const fdeep::shared_float_vec shared_traj(fplus::make_shared_ref<fdeep::float_vec>(src_traj));
    const fdeep::tensor5 encoder_inputs(in_traj_shape, shared_traj);
    std::cout << "Trajectory #0!" << fdeep::show_tensor5(encoder_inputs) << std::endl;

    // Using loaded encoder model to predict encoder output states
    // Then encoder_states can be fed as input tensors into decoder_model
    const auto encoder_states = encoder_model.predict({encoder_inputs});

    // Printing for debugging purposes
    std::cout << "h_enc: "<< fdeep::show_tensor5(encoder_states.at(0)) << std::endl;
    std::cout << "c_enc: "<< fdeep::show_tensor5(encoder_states.at(1)) << std::endl;

    // Creating a SOS input sequence token to signal decoder model to start making predictions
    fdeep::shape5 bbox_shape(1,1,1,1,4);

    // Loading a sample sequence trajectory into tensor5 data structure
    const std::vector<float> SOS_token = {9999.0, 9999.0, 9999.0, 9999.0};
    const fdeep::shared_float_vec shared_SOS_token(fplus::make_shared_ref<fdeep::float_vec>(SOS_token));
    fdeep::tensor5 target_seq(bbox_shape, shared_SOS_token);

    // In Python we have: Prediction, h, c = decoder_model.predict([target_seq] + state)
    auto decoder_outputs = decoder_model.predict({target_seq, encoder_states.at(1), encoder_states.at(0)});

    // Printing for debugging purposes
    std::cout << "h_dec: "<< fdeep::show_tensor5(decoder_outputs.at(1)) << std::endl;
    std::cout << "c_dec: "<< fdeep::show_tensor5(decoder_outputs.at(2)) << std::endl;
    std::cout << "Predicted next bounding box!" << fdeep::show_tensor5(decoder_outputs.at(0)) << std::endl;
}

The fdeep_encoder_model_NT.json model imported into the C++ application is available to download and inspect from this past comment. The fdeep_decoder_model_NT.json can be downloaded from the following link: Decoder model: https://drive.google.com/open?id=1hwrjcnNfWaqQI0o8TmJKtfsAwj6zd9aq I would really appreciate any help with this issue. I am puzzled because the encoder model is working perfectly but the decoder model is not. Specifically, the results of the Keras and frugally-deep decoder models differ, giving me wrong output predictions that cannot be used at all.
opened by MarlonCajamarca 25
`visualize_layers.py` uses `scipy.misc.imsave` which no longer exists

The documentation suggests switching to imageio.imwrite instead: https://docs.scipy.org/doc/scipy-1.2.1/reference/generated/scipy.misc.imsave.html

There's even a migration guide: https://imageio.readthedocs.io/en/v2.6.1/scipy.html

Another alternative would be keras.preprocessing.image.save_img.

opened by torokati44 0
Modify Unit Tests CmakeLists and INSTALL.md

Modify the unit tests' CMakeLists.txt to let CMake detect Python instead of invoking "python3 xxxx" directly, because not all users can run Python scripts via "python3". The command to convert h5 to json may fail because of the "python3" command. I added find_package to detect Python, and checks for pip.exe, pip3.exe, etc., which run "pip show tensorflow" to make sure the user has installed TensorFlow. The Python and TensorFlow requirements are written in INSTALL.md.

opened by sirius-william 4
Thanks !

Thank the project author very much! My graduate design project is a one-dimensional convolutional neural network. After training with Python's TensorFlow 2.10, I have been looking for ways to deploy the model in my Qt project. I have tried to compile TensorFlow C++ (compilation always fails), the TensorFlow C API (TensorFlow 2.10 is not supported), TensorRT (AMD graphics driver is not supported), and OpenVINO (the network architecture I chose is not supported). By chance, I found this library on Google. It is easy to use and does not require many dependencies. It only requires header files. It perfectly solves my project needs. Thank you! PS: In the tests, the Python script is executed from CMakeLists.txt using python3 xxxx. However, not all users can run Python scripts through the command 'python3'. It is recommended to find Python in CMakeLists.txt, or let users specify the Python path. In addition, MinGW reports "Fatal error: can't write 286 bytes to section .text" when compiling the unit tests. It is recommended to add: target_compile_options(PROJECT_NAME PRIVATE $<$<CXX_COMPILER_ID:MSVC>:/bigobj> $<$<CXX_COMPILER_ID:GNU>:-Wa,-mbig-obj>) This problem also arises when the library is used in other projects.

opened by sirius-william 2
Consider having different convolution implementations available and choosing the fastest one at runtime
Different convolution implementations might perform differently depending on the convolution settings (input size/depth, kernel size/count) and depending on the hardware (mostly CPU/memory) used.

Right now, for example, we have a special implementation used for 2D convolutions in case strides = (1, 1) (which is utilized not only by the Conv2D layer, but also by DepthwiseConv2D and SeparableConv2D).

I wonder if it would make sense to provide a function to the user that, when called on a model, tries out different implementations and remembers which one performed best for future calls of model.predict. (Maybe in some settings, even a naive non-im2col convolution is the fastest one.)

Pros:

potentially faster forward passes

Cons:

increased code complexity

potentially wrong settings in case the background load on the user's machine varies too much during the evaluation
opened by Dobiasd 0
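
The selection step proposed in this issue could be sketched roughly as follows. This is an illustration of the idea only, not an existing library feature: `pick_fastest` is a hypothetical name, and each candidate stands in for one convolution implementation applied to a representative input.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Runs every candidate `trials` times and returns the index of the one
// with the lowest total wall-clock time. Returns 0 if `candidates` is
// empty, so callers should check for that case themselves.
inline std::size_t pick_fastest(
    const std::vector<std::function<void()>>& candidates,
    std::size_t trials)
{
    using clock = std::chrono::steady_clock;
    std::size_t best_index = 0;
    auto best_time = clock::duration::max();
    for (std::size_t i = 0; i < candidates.size(); ++i)
    {
        const auto start = clock::now();
        for (std::size_t t = 0; t < trials; ++t)
        {
            candidates[i]();
        }
        const auto elapsed = clock::now() - start;
        if (elapsed < best_time)
        {
            best_time = elapsed;
            best_index = i;
        }
    }
    return best_index;
}
```

The second drawback listed above applies directly to this sketch: if the background load changes while the candidates are being timed, the remembered choice may be the wrong one.
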
Feature Suggestion: Support Transformer Models

First off, I would like to say that this is a really great piece of work! I have been using it with LSTMs for time-series data and have found frugally-deep to be invaluable. I am starting to investigate Transformers in order to see how they stack up to LSTMs and it would be wonderful if support for Transformer models could be added. I am in the early stages of working with Transformers, but the specific layers that I currently do not see supported are: MultiHeadAttention and LayerNormalization.
help wanted

opened by jonathan-lazzaro-nnl 11
Feature suggestion: Support ONNX models?

How about supporting ONNX in frugally? You could have a protobuf importer for ONNX models or add a tool which converts ONNX to the JSON format you use? Just a thought. A header only ONNX inference engine would be very very useful.

opened by pfeatherstone 24

Releases(v0.15.19-p0)

v0.15.19-p0(Jul 22, 2022)
Fixes interpretation of axis value in Normalization layer (issue 357)

v0.15.18-p0(Jun 11, 2022)
Improved performance (with strides==(1, 1)) of Conv2D, DepthwiseConv2D, and SeparableConv2D.

JSON export: Reduced memory usage and output file size.

v0.15.17-p0(May 12, 2022)
Added support for the Rescaling layer.

v0.15.16-p0(Mar 21, 2022)
Fixed missing inline keyword.

v0.15.15-p0(Mar 20, 2022)
Improved performance of SeparableConv2D and DepthwiseConv2D layers

v0.15.14-p0(Mar 13, 2022)
improved checks and docs

added support for FixedDropout (noop)

added support for tensor expansion in Multiply layer

v0.15.13-p0(Nov 27, 2021)
Added support for Normalization layer.

v0.15.12-p0(Aug 26, 2021)
Added support for negative_slope and threshold in ReLU layers

v0.15.11-p0(Aug 26, 2021)
Added support for relu6 activation

v0.15.10-p0(Aug 11, 2021)
Added support for layers RandomContrast, RandomFlip, RandomHeight, RandomTranslation, RandomWidth, and RandomZoom (all no-ops during prediction)

v0.15.9-p0(Aug 7, 2021)
Added support for activations exponential, gelu, and softsign

v0.15.8-p0(Aug 1, 2021)
Added support for RepeatVector layer

v0.15.7-p0(Jul 8, 2021)
Added support for swish activation.

Added support for full models inside TimeDistributed layers.

v0.15.6-p0(Jul 7, 2021)
Removed the broken workaround (for training=true) introduced in release v0.15.5-p0

v0.15.5-p0(Jul 6, 2021)
Added compatibility for models including layers with training=True by ignoring the flag.

v0.15.4-p0(Jul 3, 2021)
Added support for using BatchNormalization layers as the inner layer in TimeDistributed layers. :zany_face:

v0.15.3-p0(Jul 2, 2021)
Added support for shape inference in Reshape layers.

Improved some docs.

v0.15.2-p0(Feb 23, 2021)
Fix average_pooling_2d_layer and max_pooling_2d_layer for channels_first

Fix MSVC Compiler Warning C4701 "potentially uninitialized local variable" in time_distributed_layer.hpp

Add support for RandomRotation layer (no-op in prediction)

Improved documentation

v0.15.1-p0(Aug 15, 2020)
Adds support for duplicate layer names in nested models (see issue #237)

v0.15.0-p0(Aug 13, 2020)

Update TensorFlow to version 2.3
v0.14.4-p0(Jul 2, 2020)
Use the latest version of FunctionalPlus (0.2.8).

Some CMake improvements.

v0.14.3-p0(Jun 2, 2020)

Further performance improvements for 2d convolutions with big input tensors (see issue 226 and issue 227), while also reducing memory usage.
v0.14.2-p0(May 30, 2020)

Fix some edge case for very small convolutions.
v0.14.1-p0(May 30, 2020)
Improved performance of convolution (Conv2D) on large input tensors, while also reducing memory usage.

v0.14.0-p0(May 21, 2020)
Tensors are now stored in aligned memory blocks according to Eigen::aligned_allocator<T> for performance.

v0.13.1-p0(Apr 21, 2020)
Improve performance of LSTM and GRU

v0.13.0-p0(Apr 10, 2020)
Update Tensorflow to version 2.1.

Subsequent adjustments of RNN layers.

v0.12.1-p0(Feb 28, 2020)

Improved performance for LSTM and GRU.
v0.12.0-p0(Feb 26, 2020)
Tensor shapes and positions now explicitly track the tensor's rank.

breaking changes:

fdeep::tensor5 has been renamed to fdeep::tensor

fdeep::tensor5_pos has been renamed to fdeep::tensor_pos

fdeep::shape5 has been renamed to fdeep::tensor_shape

dropped support for shape inference in reshape layers

deprecated functions (will likely be removed from the API soon)

float_type fdeep::tensor5::get(std::size_t, std::size_t, std::size_t, std::size_t, std::size_t) const: Please use float_type fdeep::tensor5::get(const tensor_pos&) const or float_type fdeep::tensor5::get_ignore_rank(const tensor_pos&) const instead.

void fdeep::tensor5::set(std::size_t, std::size_t, std::size_t, std::size_t, std::size_t, float_type): Please use float_type fdeep::tensor5::set(const tensor_pos, float_type) or float_type fdeep::tensor5::set_ignore_rank(const tensor_pos&, float_type) instead.

v0.11.1-p0(Dec 17, 2019)
Support for batch normalization on arbitrary axes

Improved error messages
