🔮 A refreshing functional take on deep learning, compatible with your favorite libraries

Overview

From the makers of spaCy, Prodigy and FastAPI

Thinc is a lightweight deep learning library that offers an elegant, type-checked, functional-programming API for composing models, with support for layers defined in other frameworks such as PyTorch, TensorFlow and MXNet. You can use Thinc as an interface layer, a standalone toolkit or a flexible way to develop new models. Previous versions of Thinc have been running quietly in production in thousands of companies, via both spaCy and Prodigy. We wrote the new version to let users compose, configure and deploy custom models built with their favorite framework.

🔥 Features

  • Type-check your model definitions with custom types and a mypy plugin.
  • Wrap PyTorch, TensorFlow and MXNet models for use in your network.
  • Concise functional-programming approach to model definition, using composition rather than inheritance.
  • Optional custom infix notation via operator overloading.
  • Integrated config system to describe trees of objects and hyperparameters.
  • Choice of extensible backends.
  • Read more →

🚀 Quickstart

Thinc is compatible with Python 3.6+ and runs on Linux, macOS and Windows. The latest releases with binary wheels are available from pip. Before you install Thinc and its dependencies, make sure that your pip, setuptools and wheel are up to date. For the most recent releases, pip 19.3 or newer is recommended.

pip install -U pip setuptools wheel
pip install thinc --pre

⚠️ Note that Thinc 8.0 is currently in alpha preview and not necessarily ready for production yet.

See the extended installation docs for details on optional dependencies for different backends and GPU. You might also want to set up static type checking to take advantage of Thinc's type system.

⚠️ If you have installed PyTorch and you are using Python 3.7+, uninstall the package dataclasses with pip uninstall dataclasses, since it may have been installed by PyTorch and is incompatible with Python 3.7+.

📓 Selected examples and notebooks

Also see the /examples directory and usage documentation for more examples. Most examples are Jupyter notebooks – to launch them on Google Colab (with GPU support!) click on the button next to the notebook name.

Notebook Description
intro_to_thinc Everything you need to know to get started. Composing and training a model on the MNIST data, using config files, registering custom functions and wrapping PyTorch, TensorFlow and MXNet models.
transformers_tagger_bert How to use Thinc, transformers and PyTorch to train a part-of-speech tagger. From model definition and config to the training loop.
pos_tagger_basic_cnn Implementing and training a basic CNN part-of-speech tagging model without external dependencies, using different levels of Thinc's config system.
parallel_training_ray How to set up synchronous and asynchronous parameter server training with Thinc and Ray.

View more →

📖 Documentation & usage guides

Introduction Everything you need to know.
Concept & Design Thinc's conceptual model and how it works.
Defining and using models How to compose models and update state.
Configuration system Thinc's config system and function registry.
Integrating PyTorch, TensorFlow & MXNet Interoperability with machine learning frameworks.
Layers API Weights layers, transforms, combinators and wrappers.
Type Checking Type-check your model definitions and more.

🗺 What's where

Module Description
thinc.api User-facing API. All classes and functions should be imported from here.
thinc.types Custom types and dataclasses.
thinc.model The Model class. All Thinc models are an instance (not a subclass) of Model.
thinc.layers The layers. Each layer is implemented in its own module.
thinc.shims Interface for external models implemented in PyTorch, TensorFlow etc.
thinc.loss Functions to calculate losses.
thinc.optimizers Functions to create optimizers. Currently supports "vanilla" SGD, Adam and RAdam.
thinc.schedules Generators for different rates, schedules, decays or series.
thinc.backends Backends for numpy and cupy.
thinc.config Config parsing and validation and function registry system.
thinc.util Utilities and helper functions.

🐍 Development notes

Thinc uses black for auto-formatting, flake8 for linting and mypy for type checking. All code is written to be compatible with Python 3.6+, with type hints wherever possible. See the type reference for more details on Thinc's custom types.

👷‍♀️ Building Thinc from source

Building Thinc from source requires the full dependencies listed in requirements.txt to be installed. You'll also need a compiler to build the C extensions.

git clone https://github.com/explosion/thinc
cd thinc
python -m venv .env
source .env/bin/activate
pip install -U pip setuptools wheel
pip install -r requirements.txt
pip install --no-build-isolation .

Alternatively, install in editable mode:

pip install -r requirements.txt
pip install --no-build-isolation --editable .

Or by setting PYTHONPATH:

export PYTHONPATH=`pwd`
pip install -r requirements.txt
python setup.py build_ext --inplace

🚦 Running tests

Thinc comes with an extensive test suite. The following should all pass and not report any warnings or errors:

python -m pytest thinc    # test suite
python -m mypy thinc      # type checks
python -m flake8 thinc    # linting

To view test coverage, you can run python -m pytest thinc --cov=thinc. We aim for 100% test coverage. This doesn't mean that we meticulously write tests for every single line – we ignore blocks that are not relevant or difficult to test and make sure that the tests execute all code paths.

Comments
  • thinc_gpu_ops not built properly

    windows 10, cuda10, spacy100, python3.6

    I checked the folder as described in https://github.com/explosion/thinc/issues/79#issuecomment-461230144 and did not see the cpython file.

    Then I proceeded to build gpu_ops by running "pip install --force-reinstall --no-binary :all: thinc-gpu-ops"

    But still no cpython file is generated.

    thinc_gpu_ops.AVAILABLE evaluates to False

    Thank you, D

    opened by ghost 39
  • module 'thinc_gpu_ops' has no attribute 'hash'

    On Win10, thinc 6.12.0, spacy 2.0.16, cupy 4.1.0

    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    <ipython-input-5-2432a7701a48> in <module>()
          1 # defining doc
    ----> 2 doc = nlp("Jill laughed at John Johnson.")
          3 spacy.displacy.render(doc, style='dep', options={'distance' : 140}, jupyter=True)
    
    E:\Anaconda3python\lib\site-packages\spacy\language.py in __call__(self, text, disable)
        344             if not hasattr(proc, '__call__'):
        345                 raise ValueError(Errors.E003.format(component=type(proc), name=name))
    --> 346             doc = proc(doc)
        347             if doc is None:
        348                 raise ValueError(Errors.E005.format(name=name))
    
    pipeline.pyx in spacy.pipeline.Tagger.__call__()
    
    pipeline.pyx in spacy.pipeline.Tagger.predict()
    
    E:\Anaconda3python\lib\site-packages\thinc\neural\_classes\model.py in __call__(self, x)
        159             Must match expected shape
        160         '''
    --> 161         return self.predict(x)
        162 
        163     def pipe(self, stream, batch_size=128):
    
    E:\Anaconda3python\lib\site-packages\thinc\api.py in predict(self, X)
         53     def predict(self, X):
         54         for layer in self._layers:
    ---> 55             X = layer(X)
         56         return X
         57 
    
    E:\Anaconda3python\lib\site-packages\thinc\neural\_classes\model.py in __call__(self, x)
        159             Must match expected shape
        160         '''
    --> 161         return self.predict(x)
        162 
        163     def pipe(self, stream, batch_size=128):
    
    E:\Anaconda3python\lib\site-packages\thinc\api.py in predict(seqs_in)
        291     def predict(seqs_in):
        292         lengths = layer.ops.asarray([len(seq) for seq in seqs_in])
    --> 293         X = layer(layer.ops.flatten(seqs_in, pad=pad))
        294         return layer.ops.unflatten(X, lengths, pad=pad)
        295 
    
    E:\Anaconda3python\lib\site-packages\thinc\neural\_classes\model.py in __call__(self, x)
        159             Must match expected shape
        160         '''
    --> 161         return self.predict(x)
        162 
        163     def pipe(self, stream, batch_size=128):
    
    E:\Anaconda3python\lib\site-packages\thinc\api.py in predict(self, X)
         53     def predict(self, X):
         54         for layer in self._layers:
    ---> 55             X = layer(X)
         56         return X
         57 
    
    E:\Anaconda3python\lib\site-packages\thinc\neural\_classes\model.py in __call__(self, x)
        159             Must match expected shape
        160         '''
    --> 161         return self.predict(x)
        162 
        163     def pipe(self, stream, batch_size=128):
    
    E:\Anaconda3python\lib\site-packages\thinc\neural\_classes\model.py in predict(self, X)
        123 
        124     def predict(self, X):
    --> 125         y, _ = self.begin_update(X)
        126         return y
        127 
    
    E:\Anaconda3python\lib\site-packages\thinc\api.py in uniqued_fwd(X, drop)
        372                                                     return_counts=True)
        373         X_uniq = layer.ops.xp.ascontiguousarray(X[ind])
    --> 374         Y_uniq, bp_Y_uniq = layer.begin_update(X_uniq, drop=drop)
        375         Y = Y_uniq[inv].reshape((X.shape[0],) + Y_uniq.shape[1:])
        376         def uniqued_bwd(dY, sgd=None):
    
    E:\Anaconda3python\lib\site-packages\thinc\api.py in begin_update(self, X, drop)
         59         callbacks = []
         60         for layer in self._layers:
    ---> 61             X, inc_layer_grad = layer.begin_update(X, drop=drop)
         62             callbacks.append(inc_layer_grad)
         63         def continue_update(gradient, sgd=None):
    
    E:\Anaconda3python\lib\site-packages\thinc\api.py in begin_update(X, *a, **k)
        174     def begin_update(X, *a, **k):
        175         forward, backward = split_backward(layers)
    --> 176         values = [fwd(X, *a, **k) for fwd in forward]
        177 
        178         output = ops.xp.hstack(values)
    
    E:\Anaconda3python\lib\site-packages\thinc\api.py in <listcomp>(.0)
        174     def begin_update(X, *a, **k):
        175         forward, backward = split_backward(layers)
    --> 176         values = [fwd(X, *a, **k) for fwd in forward]
        177 
        178         output = ops.xp.hstack(values)
    
    E:\Anaconda3python\lib\site-packages\thinc\api.py in wrap(*args, **kwargs)
        256     '''
        257     def wrap(*args, **kwargs):
    --> 258         output = func(*args, **kwargs)
        259         if splitter is None:
        260             to_keep, to_sink = output
    
    E:\Anaconda3python\lib\site-packages\thinc\api.py in begin_update(X, *a, **k)
        174     def begin_update(X, *a, **k):
        175         forward, backward = split_backward(layers)
    --> 176         values = [fwd(X, *a, **k) for fwd in forward]
        177 
        178         output = ops.xp.hstack(values)
    
    E:\Anaconda3python\lib\site-packages\thinc\api.py in <listcomp>(.0)
        174     def begin_update(X, *a, **k):
        175         forward, backward = split_backward(layers)
    --> 176         values = [fwd(X, *a, **k) for fwd in forward]
        177 
        178         output = ops.xp.hstack(values)
    
    E:\Anaconda3python\lib\site-packages\thinc\api.py in wrap(*args, **kwargs)
        256     '''
        257     def wrap(*args, **kwargs):
    --> 258         output = func(*args, **kwargs)
        259         if splitter is None:
        260             to_keep, to_sink = output
    
    E:\Anaconda3python\lib\site-packages\thinc\api.py in begin_update(X, *a, **k)
        174     def begin_update(X, *a, **k):
        175         forward, backward = split_backward(layers)
    --> 176         values = [fwd(X, *a, **k) for fwd in forward]
        177 
        178         output = ops.xp.hstack(values)
    
    E:\Anaconda3python\lib\site-packages\thinc\api.py in <listcomp>(.0)
        174     def begin_update(X, *a, **k):
        175         forward, backward = split_backward(layers)
    --> 176         values = [fwd(X, *a, **k) for fwd in forward]
        177 
        178         output = ops.xp.hstack(values)
    
    E:\Anaconda3python\lib\site-packages\thinc\api.py in wrap(*args, **kwargs)
        256     '''
        257     def wrap(*args, **kwargs):
    --> 258         output = func(*args, **kwargs)
        259         if splitter is None:
        260             to_keep, to_sink = output
    
    E:\Anaconda3python\lib\site-packages\thinc\api.py in begin_update(X, *a, **k)
        174     def begin_update(X, *a, **k):
        175         forward, backward = split_backward(layers)
    --> 176         values = [fwd(X, *a, **k) for fwd in forward]
        177 
        178         output = ops.xp.hstack(values)
    
    E:\Anaconda3python\lib\site-packages\thinc\api.py in <listcomp>(.0)
        174     def begin_update(X, *a, **k):
        175         forward, backward = split_backward(layers)
    --> 176         values = [fwd(X, *a, **k) for fwd in forward]
        177 
        178         output = ops.xp.hstack(values)
    
    E:\Anaconda3python\lib\site-packages\thinc\api.py in wrap(*args, **kwargs)
        256     '''
        257     def wrap(*args, **kwargs):
    --> 258         output = func(*args, **kwargs)
        259         if splitter is None:
        260             to_keep, to_sink = output
    
    E:\Anaconda3python\lib\site-packages\thinc\neural\_classes\hash_embed.py in begin_update(self, ids, drop)
         49         if ids.ndim >= 2:
         50             ids = self.ops.xp.ascontiguousarray(ids[:, self.column], dtype='uint64')
    ---> 51         keys = self.ops.hash(ids, self.seed) % self.nV
         52         vectors = self.vectors[keys].sum(axis=1)
         53         mask = self.ops.get_dropout_mask((vectors.shape[1],), drop)
    
    ops.pyx in thinc.neural.ops.CupyOps.hash()
    
    AttributeError: module 'thinc_gpu_ops' has no attribute 'hash'
    
    opened by j2l 32
  • More general remap_ids

    The remap_ids layer in spaCy is never used as far as I can tell, but I am implementing a MultiEmbed version of MultiHashEmbed. This uses the more conventional Embed layer instead of the HashEmbed.

    Before running Embed it runs remap_ids to convert token.i into positions in the embedding table: chain(remap_ids(), Embed()).

    To be able to use it as a replacement for MultiHashEmbed, a couple of modifications had to be made. HashEmbed takes a column attribute, because in spaCy the FeatureExtractor returns an Ints2d of shape (number of tokens, number of features). To be able to run the same idea with Embed, remap_ids now also implements column: chain(remap_ids(column=column), Embed(column=0)).

    Furthermore, in the common case where the mapping_table is a Dict[int, int], there was a somewhat strange error: with numpy, {3: 'q'}[np.array([3])[0]] works, whereas with cupy we get:

    ----> 1 {3: 'q'}[cp.array([3])[0]]
    
    TypeError: unhashable type: 'cupy._core.core.ndarray'
    

    In the new version when we detect a cupy integer array we cast the elements to int.
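The cast described above can be pictured with a small sketch. It does not use cupy. `FakeDeviceScalar` is a hypothetical stand-in for an element of a cupy integer array (convertible to `int`, but unhashable), and `remap` is an illustrative helper, not the layer's actual code.

```python
from typing import Dict, Iterable, List


class FakeDeviceScalar:
    """Hypothetical stand-in for a 0-d cupy array: int-convertible but unhashable."""

    __hash__ = None  # dict lookups with this object raise TypeError

    def __init__(self, value: int) -> None:
        self._value = value

    def __int__(self) -> int:
        return self._value


def remap(ids: Iterable, mapping_table: Dict[int, int], default: int = 0) -> List[int]:
    # Casting every element to a plain Python int makes it a valid dict key,
    # regardless of which array library produced it.
    return [mapping_table.get(int(i), default) for i in ids]


table = {3: 0, 7: 1}
ids = [FakeDeviceScalar(3), FakeDeviceScalar(7), FakeDeviceScalar(9)]
print(remap(ids, table))  # [0, 1, 0]
```

Looking up `table[ids[0]]` directly fails with a `TypeError`, which mirrors the cupy error shown above.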

    Finally, the backprop used to return [], but when the input is an xp array this can lead to confusing errors, e.g. if remap_ids is within a with_array block, as is the case with Embed. Now it returns an empty array instead.

    enhancement feat / layers 
    opened by kadarakos 21
  • Add `SparseLinear_v2`, fixing indexing issues

    Introduce SparseLinear_v2 to fix indexing issues. SparseLinear does not correctly index the gradient/weight matrix (#752). This change fixes the indexing, so that the full matrix is used.

    To retain compatibility with existing models that use SparseLinear, which works relatively well if there are not too many hash collisions, the fixed version is renamed to SparseLinear_v2.


    While at it, fix another indexing-related bug:

    The outputs of the MurmurHash function were mapped to array indices as follows:

    idx = hash & (nr_weight-1)
    

    This works well when nr_weight is a power of two. For instance, if we have 16 buckets:

    idx = hash & 15
    idx = hash & 0b1111
    

    However, when the user uses a bucket count that is not a power of two, this breaks down. For instance, if we have 15 buckets:

    idx = hash & 14
    idx = hash & 0b1110
    

    This would mask out all odd indices. We fix this by using the modulus instead. To preserve compatibility with existing models, this change is only added to SparseLinear_v2.
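The difference between the two index schemes is easy to check directly. The sketch below uses plain integers in place of real hash values. The function names are illustrative only.

```python
def mask_index(hash_value: int, nr_weight: int) -> int:
    # Bit-mask scheme: only covers every bucket when nr_weight is a power of two.
    return hash_value & (nr_weight - 1)


def mod_index(hash_value: int, nr_weight: int) -> int:
    # Modulus scheme: covers every bucket for any positive nr_weight.
    return hash_value % nr_weight


hashes = range(1000)
masked = sorted({mask_index(h, 15) for h in hashes})
modded = sorted({mod_index(h, 15) for h in hashes})
print(masked)  # [0, 2, 4, 6, 8, 10, 12, 14]
print(modded)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
```

With 15 buckets the mask reaches only 8 of them, while the modulus reaches all 15. For a power of two such as 16, both schemes give the same index for non-negative hashes.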

    bug feat / layers 
    opened by danieldk 16
  • Add assert that size hasn't changed for reduce mean backprop

    Behavior of this with numpyops seems wrong, as instead of giving an error it produces nans, as though it's accessing out of bounds memory or something. See https://github.com/explosion/spaCy/pull/9669.
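To make the kind of check under discussion concrete, here is a pure-Python sketch of a mean reduction over a flattened batch with a size check in the backward pass. It assumes the invariant is "one gradient row per sequence" and raises `ValueError` rather than using a bare assert. The function is illustrative and is not Thinc's implementation.

```python
from typing import Callable, List, Tuple

Rows = List[List[float]]


def reduce_mean(X: Rows, lengths: List[int]) -> Tuple[Rows, Callable[[Rows], Rows]]:
    """Average each sequence in a flattened batch; assumes every length is > 0."""
    if sum(lengths) != len(X):
        raise ValueError(f"lengths sum to {sum(lengths)}, but X has {len(X)} rows")
    means: Rows = []
    start = 0
    for n in lengths:
        seq = X[start : start + n]
        means.append([sum(col) / n for col in zip(*seq)])
        start += n

    def backprop(d_means: Rows) -> Rows:
        # Hypothetical size check: one gradient row per input sequence.
        if len(d_means) != len(lengths):
            raise ValueError(
                f"expected {len(lengths)} gradient rows, got {len(d_means)}"
            )
        dX: Rows = []
        for d_row, n in zip(d_means, lengths):
            # Each of the n rows contributed 1/n to the mean.
            dX.extend([g / n for g in d_row] for _ in range(n))
        return dX

    return means, backprop


X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
means, backprop = reduce_mean(X, [2, 1])
print(means)  # [[2.0, 3.0], [5.0, 6.0]]
```

Passing a gradient with the wrong number of rows to `backprop` then fails loudly instead of silently producing garbage values.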

    Some more general issues / questions around this:

    1. Do we want to add this kind of check for all cases where appropriate?
    2. Is there a better place (like in numpyops) for this check or an equivalent check?
    3. Other asserts in Thinc seem to not use messages, should the message here be removed?
    feat / layers feat / ux 
    opened by polm 14
  • numpy.ndarray size changed   Expected 88 from C header, got 80 from PyObject

    Error shown in the terminal:
    File "thinc/backends/numpy_ops.pyx", line 1, in init thinc.backends.numpy_ops
    ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject

    numpy version 1.19.0 is installed on my server. Could you please show me how to solve this problem? Thanks!

    install 
    opened by hzg456 13
  • from_dlpack in spacy-transformers

    The test suite for spacy-transformers breaks in my local environment since PR https://github.com/explosion/thinc/pull/686

    numpy                             1.23.1
    spacy                             3.4.1        (current master)
    spacy-transformers                1.1.2        (current master)
    thinc                             8.1.0        (current master)
    torch                             1.9.0+cu111
    

    e.g. when I run test_model_sequence_classification, it crashes at https://github.com/explosion/thinc/blob/master/thinc/util.py#L365:

    >           torch_tensor = torch.utils.dlpack.from_dlpack(xp_tensor)
    E           RuntimeError: from_dlpack received an invalid capsule. Note that DLTensor capsules can be consumed only once, so you might have already constructed a tensor from it once.
    

    But, when I look at the code behind from_dlpack and run this directly (having confirmed my system doesn't hit the if statement checking the device):

    dlpack = xp_tensor.__dlpack__()
    torch_tensor = torch._C._from_dlpack(dlpack)
    

    Then the test flies through.

    Can anyone reproduce the failing tests on the spacy-transformers test suite with a relatively new numpy?

    opened by svlandeg 10
  • Cross entropy fix

    This PR makes the CategoricalCrossentropy loss more strict, only allowing guesses and truths that represent exclusive classes, and it fixes the computation of the loss value. It also changes all the tests that involve CategoricalCrossentropy and SequenceCategoricalCrossentropy.

    It also separates CategoricalCrossentropy from SparseCategoricalCrossentropy: CategoricalCrossentropy takes only Floats2d as truths, whereas SparseCategoricalCrossentropy takes Union[Ints1d, Sequence[int], Sequence[str]].
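The split between dense and sparse truths can be illustrated with a stdlib-only sketch. It makes several assumptions: "exclusive classes" is read as "each row sums to 1", the loss is averaged over the batch, and all guesses are strictly positive. The function names are illustrative and are not the Thinc classes.

```python
import math
from typing import List, Sequence

Rows = Sequence[Sequence[float]]


def _check_rows_sum_to_one(rows: Rows, name: str) -> None:
    # Assumed reading of "exclusive classes": each row is a distribution.
    for row in rows:
        if not math.isclose(sum(row), 1.0, abs_tol=1e-6):
            raise ValueError(f"each row of {name} must sum to 1, got {list(row)}")


def categorical_crossentropy(guesses: Rows, truths: Rows) -> float:
    """Dense variant: truths are rows of class probabilities (e.g. one-hot)."""
    _check_rows_sum_to_one(guesses, "guesses")
    _check_rows_sum_to_one(truths, "truths")
    total = 0.0
    for g_row, t_row in zip(guesses, truths):
        total -= sum(t * math.log(g) for g, t in zip(g_row, t_row) if t > 0)
    return total / len(guesses)


def to_one_hot(labels: Sequence[int], n_classes: int) -> List[List[float]]:
    return [[1.0 if i == label else 0.0 for i in range(n_classes)] for label in labels]


def sparse_categorical_crossentropy(guesses: Rows, labels: Sequence[int]) -> float:
    """Sparse variant: truths are integer class indices."""
    return categorical_crossentropy(guesses, to_one_hot(labels, len(guesses[0])))


guesses = [[0.5, 0.5], [0.25, 0.75]]
dense = categorical_crossentropy(guesses, [[1.0, 0.0], [0.0, 1.0]])
sparse = sparse_categorical_crossentropy(guesses, [0, 1])
print(math.isclose(dense, sparse))  # True
```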

    It also adds a way of handling legacy modules in thinc.

    bug feat / loss 
    opened by kadarakos 10
  • Using list2padded for sequences

    I have referred to https://github.com/explosion/thinc/blob/master/examples/03_pos_tagger_basic_cnn.ipynb to get a better understanding of the Thinc layers. The model in the example is as follows:

    model = strings2arrays() >> with_array(
            HashEmbed(nO=width, nV=vector_width, column=0)
            >> expand_window(window_size=1)
            >> ReLu(nO=width, nI=width * 3)
            >> ReLu(nO=width, nI=width)
            >> Softmax(nO=nr_classes, nI=width)
        )
    

    Can someone please explain what strings2arrays does? The documentation says that it takes a sequence of sequences of strings and produces a List[Ints2d].

    The input X_train is something like [["this","is","awesome"],["thinc","is","cool"]]. What does strings2arrays transform this example into? I can't work out exactly what strings2arrays does and how it transforms the input from a List[List] (2D lists) to a List[Ints2d] (technically 3D lists/sequences).
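One way to picture the transformation is the sketch below. It assumes that strings2arrays hashes every string to an integer and returns one single-column array per sequence. Nested lists stand in for arrays, and CRC32 stands in for the real hash function only because it ships with Python. `strings_to_arrays` is an illustrative name.

```python
import zlib
from typing import List, Sequence


def hash_string(word: str) -> int:
    # Stand-in hash for illustration only; the real layer uses its own hash.
    return zlib.crc32(word.encode("utf8"))


def strings_to_arrays(docs: Sequence[Sequence[str]]) -> List[List[List[int]]]:
    # One "array" of shape (n_tokens, 1) per document: each token becomes a
    # row holding a single integer ID.
    return [[[hash_string(word)] for word in doc] for doc in docs]


X_train = [["this", "is", "awesome"], ["thinc", "is", "cool"]]
arrays = strings_to_arrays(X_train)
for doc in arrays:
    print(len(doc), "rows x", len(doc[0]), "column")
```

Under these assumptions each sentence becomes a 3 x 1 table of integer IDs, and the same word ("is") gets the same ID in both sentences. The single column is what a layer such as HashEmbed(column=0) would then read.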

    feat / layers 
    opened by naveenjafer 10
  • Fixes for slow tests

    Will leave this in draft mode until explosionbot is updated to support the Buildkite thinc slow test suite.

    Requirements

    • explosion/explosion-bot#3
    • #672
    tests 
    opened by shadeMe 9
  • Fix reductions when applied to zero-length sequences

    • A bug was introduced in NumpyOps that caused the output pointer not to be incremented for zero-length sequences.
    • When using uninitialized arrays, the sum or mean for a zero-length array was not correctly set to zero in Ops.

    We should also do something about zero-length sequences for reduce_max. However, it's unclear what the corresponding `which` should be set to (since 0 is not a valid index for a zero-sized sequence). Maybe we should throw an exception when trying to apply reduce_max to a zero-length sequence?
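The two behaviours can be sketched in plain Python: a sum reduction that always writes a zero row for an empty sequence, and a max reduction that raises on one. The latter is only the option floated above, not a confirmed design. Both functions are illustrative and take the row width explicitly so that empty input still has a defined shape.

```python
from typing import List

Rows = List[List[float]]


def reduce_sum(X: Rows, lengths: List[int], width: int) -> Rows:
    sums: Rows = []
    start = 0
    for n in lengths:
        row = [0.0] * width  # explicit zero-init, so empty sequences yield zeros
        for x in X[start : start + n]:
            for j, value in enumerate(x):
                row[j] += value
        sums.append(row)  # always emit an output row, even when n == 0
        start += n
    return sums


def reduce_max(X: Rows, lengths: List[int]) -> Rows:
    maxes: Rows = []
    start = 0
    for n in lengths:
        if n == 0:
            # No valid "which" index exists for an empty sequence.
            raise ValueError("cannot apply reduce_max to a zero-length sequence")
        seq = X[start : start + n]
        maxes.append([max(col) for col in zip(*seq)])
        start += n
    return maxes


X = [[1.0, 2.0], [3.0, 4.0]]
print(reduce_sum(X, [1, 0, 1], width=2))  # [[1.0, 2.0], [0.0, 0.0], [3.0, 4.0]]
print(reduce_max(X, [2]))  # [[3.0, 4.0]]
```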

    bug feat / ops 
    opened by danieldk 9
  • Fix required maximum version of typing-extensions

    This PR fixes the required maximum version of typing-extensions.

    Currently it is bounded to <4.2.0: typing_extensions>=3.7.4.1,<4.2.0; python_version < "3.8"

    This PR sets the upper bound to all compatible versions, until the next major release <5.0.0.

    Required:

    • [ ] https://github.com/explosion/confection/pull/20

    See:

    • https://github.com/explosion/spaCy/issues/12034

    See issue in pydantic:

    • https://github.com/pydantic/pydantic/issues/4885

    See fixing PR in pydantic (typing-extensions>=4.2.0), which will be incompatible with your requirement typing_extensions>=3.7.4,<4.2.0; python_version < "3.8":

    • https://github.com/pydantic/pydantic/pull/4886
    install 
    opened by albertvillanova 2
  • add class weights

    I closed the old class-weights PR #614, because it was way outdated compared to the completely refactored v9 cross-entropy. This new PR adds the same functionality and a bit more flexibility.

    All cross-entropy loss functions now take a class_weights argument, which is used to compute sample_weights for each sample, i.e. the gradient and loss for each sample are multiplied by the class_weight corresponding to the target class.

    The CategoricalCrossentropy can take class_weights as a Dict[int, float] or a Floats1d; if it's a Dict, it checks the types and converts to Floats1d. The SparseCategoricalCrossentropy additionally allows Dict[str, float], in which case it requires names to be provided and checks against them.

    It also includes a convenience function to compute class_weights based on a commonly used formula e.g. also used in sklearn. I also went for a dynamic approach here where it takes label_data: Dict[Hashable, int], where the keys are the labels and values are the label counts. It returns Dict[Hashable, float] the weight for each label.
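As an illustration of such a helper, the sketch below assumes the formula meant is the "balanced" heuristic used in scikit-learn, n_samples / (n_classes * count). The function name `compute_class_weights` is hypothetical.

```python
from typing import Dict, Hashable


def compute_class_weights(label_counts: Dict[Hashable, int]) -> Dict[Hashable, float]:
    """Map each label to n_samples / (n_classes * count), so rare labels weigh more."""
    n_samples = sum(label_counts.values())
    n_classes = len(label_counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in label_counts.items()
    }


counts = {"POS": 90, "NEG": 10}
weights = compute_class_weights(counts)
print(weights["NEG"])  # 5.0
```

With these counts the rare label gets weight 5.0 and the frequent one a weight below 1, so each class contributes equally to the total weighted loss.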

    It's not expected for class_weights to always be useful, but they can be helpful when there is label imbalance and the relevant performance metric is balance-sensitive. I think it makes spaCy a more complete library if we can engage this functionality.

    feat / loss 
    opened by kadarakos 2
  • Smooth one hot fix

    The smooth_one_hot function in thinc.util is being called in the CategoricalCrossentropy to apply label smoothing. This PR fixes the check for valid label-smoothing parameter, which was fixed before for to_categorical, but not for smooth_one_hot. Also added a test for smooth_one_hot which was also missing.
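For readers unfamiliar with label smoothing, here is a stdlib-only sketch of the operation and of one plausible validity check. It assumes the smoothing mass is spread evenly over the other classes and must stay below (n_classes - 1) / n_classes, so that the true class keeps the largest value. `smooth_one_hot_sketch` is an illustrative name, not the function in thinc.util.

```python
from typing import List, Sequence


def smooth_one_hot_sketch(
    one_hot: Sequence[Sequence[int]], label_smoothing: float
) -> List[List[float]]:
    n_classes = len(one_hot[0])
    if n_classes < 2:
        raise ValueError("label smoothing needs at least two classes")
    # At (n_classes - 1) / n_classes every class would get the same value.
    max_smooth = (n_classes - 1) / n_classes
    if not 0.0 <= label_smoothing < max_smooth:
        raise ValueError(f"label_smoothing must be in [0, {max_smooth:.3f})")
    on_value = 1.0 - label_smoothing
    off_value = label_smoothing / (n_classes - 1)
    return [[on_value if v == 1 else off_value for v in row] for row in one_hot]


smoothed = smooth_one_hot_sketch([[0, 1, 0]], 0.1)
print(smoothed)
```

Each smoothed row still sums to 1. In this sketch, a value of 0.7 with three classes is rejected because it exceeds the 2/3 limit.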

    feat / loss 
    opened by kadarakos 1
  • `with_flatten`: make the layer input/output types symmetric

    The idea of the with_flatten layer is that it flattens a nested sequence, passes it to the wrapped layer, and then unflattens the output of the wrapped layer.

    However, the layer was asymmetric in that it passes a list to the wrapped layer, but expects back an XP array. This breaks composition with other Thinc layers, such as with_array.

    This change makes with_flatten symmetric, in that the inputs/outputs of the with_flatten and the wrapped layer are symmetric.
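The symmetric behaviour can be sketched as a higher-order function over plain lists. This covers the forward pass only and ignores backprop. The wrapped "layer" is any callable that maps a flat list to a flat list of the same length. It is an illustration of the idea, not Thinc's layer.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def with_flatten(
    layer: Callable[[List[T]], List[U]]
) -> Callable[[Sequence[Sequence[T]]], List[List[U]]]:
    def forward(nested: Sequence[Sequence[T]]) -> List[List[U]]:
        lengths = [len(seq) for seq in nested]
        flat = [item for seq in nested for item in seq]
        flat_out = layer(flat)  # list in, list out: symmetric types
        if len(flat_out) != len(flat):
            raise ValueError("wrapped layer must preserve the number of items")
        # Restore the original nesting from the recorded lengths.
        out: List[List[U]] = []
        start = 0
        for n in lengths:
            out.append(list(flat_out[start : start + n]))
            start += n
        return out

    return forward


double = with_flatten(lambda xs: [2 * x for x in xs])
print(double([[1, 2], [], [3]]))  # [[2, 4], [], [6]]
```

Because the wrapped callable receives and returns the same kind of container, wrappers of this shape can be nested inside one another.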

    It seems that this layer is not used in Thinc or spaCy, so maybe it never worked correctly? At any rate, I needed to flatten a nested list in distillation with with_flatten(with_array(...)) and found that it doesn't actually work.

    opened by danieldk 0
  • Make resizable layer work with textcat and transformers

    This PR is an attempted fix of the issue outlined in https://github.com/explosion/spaCy/issues/11968. I don't completely understand how the resizable textcat is supposed to work yet, and am not sure if this is the right approach, but locally it got rid of errors and I was able to train a model with reasonable performance.

    There are also no tests yet.

    bug feat / layers feat / transformer 
    opened by polm 3
Releases(v8.1.6)
  • v8.1.6(Dec 20, 2022)

    ✨ New features and improvements

    • Update to mypy 0.990 (#801).
    • Extend to wasabi v1.1 (#813).
    • Add SparseLinear.v2, to fix indexing issues (#754).
    • Add TorchScriptWrapper_v1 (#802).
    • Add callbacks to facilitate lazy-loading models in PyTorchShim (#796).
    • Make all layer defaults serializable (#808).

    🔴 Bug fixes

    • Add missing packaging requirement (#799).
    • Correct sequence length error messages for reduce_first/last (#807).
    • Update CupyOps.asarray to always copy cupy arrays to the current device (#812).
    • Fix types for sequences passed to Ops.asarray* (#819).

    👥 Contributors

    @adrianeboyd, @danieldk, @frobnitzem, @honnibal, @ines, @richardpaulhudson, @ryndaniels, @shadeMe, @svlandeg

  • v8.1.5(Oct 19, 2022)

    ✨ New features and improvements

    • Updates and binary wheels for Python 3.11 (#793).
    • Make __all__ static to support type checking (#780).

    👥 Contributors

    @adrianeboyd, @honnibal, @ines, @rmitsch

  • v7.4.6(Oct 18, 2022)

    ✨ New features and improvements

    • Updates for Python 3.10 and 3.11 (#791):
      • Update vendored wrapt to v1.14.1.
      • Update dev requirements.
      • Add wheels for Python 3.10 and 3.11.

    👥 Contributors

    @adrianeboyd, @honnibal, @ines

  • v8.1.4(Oct 12, 2022)

    🔴 Bug fixes

    • Fix issue #785: Revert change to return type for Ops.alloc from #779.

    👥 Contributors

    @adrianeboyd, @honnibal, @ines, @svlandeg

  • v8.1.3(Oct 7, 2022)

    ✨ New features and improvements

    • Extend pydantic support to v1.10.x (#778).
    • Support mypy 0.98x, drop mypy support for Python 3.6 (#776).

    🔴 Bug fixes

    • Fix issue #775: Fix fix_random_seed entry point in setup.cfg.

    👥 Contributors

    @adrianeboyd, @honnibal, @ines, @pawamoy, @svlandeg

  • v8.1.2(Sep 27, 2022)

    ✨ New features and improvements

    • Update CuPy extras to add cuda116, cuda117, cuda11x and cuda-autodetect, which uses the new cupy-wheel package (#740).
    • Add a pytest-randomly entry point for fix_random_seed (#748).

    🔴 Bug fixes

    • Fix issue #772: Restrict supported blis versions to ~=0.7.8 to avoid bugs in BLIS 0.9.0.

    👥 Contributors

    @adrianeboyd, @honnibal, @ines, @rmitsch, @svlandeg, @willfrey

  • v8.1.1(Sep 9, 2022)

    ✨ New features and improvements

    • Use confection for configurations (#745).
    • Add the Dish activation function and layer (#719).
    • Add the with_signpost_interval layer to support layer profiling with macOS Instruments (#711).
    • Add remap_ids.v2 layer which allows more types of inputs (#726).
    • Extend BLIS support to version 0.9.x (#736).
    • Improve performance when gradient scaling is used (#746).
    • Improve MaxOut performance by unrolling argmax in maxout (#702).

    🔴 Bug fixes

    • Fix issue #720: Improve type inference by replacing FloatsType in Ops by a TypeVar.
    • Fix issue #739: Fix typing of Ops.asarrayDf methods.
    • Fix issue #757: Improve compatibility with supported Tensorflow versions.

    👥 Contributors

    @adrianeboyd, @cclauss, @danieldk, @honnibal, @ines, @kadarakos, @polm, @rmitsch, @shadeMe

  • v8.1.0(Jul 8, 2022)

    ✨ New features and improvements

    • Added support for mypy 0.950 and pydantic v1.9.0, added bound types throughout layers and ops (#599).
    • Made all NumpyOps CPU kernels generic (#627).
    • Made all custom CUDA kernels generic (#603).
    • Added bounds checks for NumpyOps (#618).
    • Fixed out-of-bounds writes in NumpyOps and CupyOps (#664).
    • Reduced unnecessary zero-init allocations (#632).
    • Fixed reductions when applied to zero-length sequences (#637).
    • Added NumpyOps.cblas to get a table of C BLAS functions (#643, #700).
    • Improved type-casting in NumpyOps.asarray (#656).
    • Simplified CupyOps.asarray (#661).
    • Fixed Model.copy() for layers used more than once (#659).
    • Fixed potential race in Shim (#677).
    • Convert numpy arrays using dlpack in xp2tensorflow and xp2torch when possible (#686).
    • Improved speed of HashEmbed by avoiding large temporary arrays (#696).
    • Added Ops.reduce_last and Ops.reduce_first (#710).
    • Numerous test suite improvements.
    • Experimental: Add support for Metal Performance Shaders with PyTorch nightlies (#685).

    🔴 Bug fixes

    • Fix issue #707: Fix label smoothing threshold for to_categorical.

    ⚠️ Backwards incompatibilities

    • In most cases the typing updates allow many casts and ignores to be removed, but types may also need minor modifications following the updates for mypy and pydantic.

    • get_array_module now returns None for non-numpy/cupy array input rather than returning numpy by default.

    • The prefer_gpu and require_gpu functions no longer set the default PyTorch torch.Tensor type to torch.cuda.FloatTensor. This means that wrapped PyTorch models cannot assume that Tensors are allocated on a CUDA GPU after calling these functions. For example:

      # Before Thinc v8.1.0, this Tensor would be allocated on the GPU after
      # {prefer,require}_gpu. Now it will be allocated as a CPU tensor by default.
      token_mask = torch.arange(max_seq_len)
      
      # To ensure correct allocation, specify the device where the Tensor should be allocated. 
      # `input` refers to the input of the model. 
      token_mask = torch.arange(max_seq_len, device=input.device) 
      

      This change brings Thinc's behavior in line with how device memory allocation is normally handled in PyTorch.

    👥 Contributors

    @adrianeboyd, @danieldk, @honnibal, @ines, @kadarakos, @koaning, @richardpaulhudson, @shadeMe, @svlandeg

  • v8.0.17(Jun 2, 2022)

    ✨ New features and improvements

    • Extend support for typing_extensions up to v4.1.x (for Python 3.7 and earlier).
    • Various fixes in the test suite.

    👥 Contributors

    @adrianeboyd, @danieldk, @honnibal, @ines, @shadeMe

  • v8.0.16(May 19, 2022)

    ✨ New features and improvements

    🔴 Bug fixes

    • Fix issue #624: Support CPU inference for models trained with gradient scaling.
    • Fix issue #633: Fix invalid indexing in Beam when no states have valid transitions.
    • Fix issue #639: Improve PyTorch Tensor handling in CupyOps.asarray.
    • Fix issue #649: Clamp inputs in Ops.sigmoid to prevent overflow.
    • Fix issue #651: Fix type safety issue with model ID assignment.
    • Fix issue #653: Correctly handle TensorFlow GPU tensors in tests.
    • Fix issue #660: Make is_torch_array work without PyTorch installed.
    • Fix issue #664: Fix out-of-bounds writes in CupyOps.adam and NumpyOps.adam.
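
The clamping fix for the sigmoid (#649) can be illustrated with a scalar sketch in plain Python. This is not the Ops.sigmoid code: the function name and the limit of 20.0 are assumptions chosen for illustration. The point is that limiting the input before exponentiating keeps exp from overflowing while changing the result only negligibly.

```python
import math


def clamped_sigmoid(x: float, limit: float = 20.0) -> float:
    """Logistic sigmoid with the input clamped to [-limit, limit]."""
    # Without the clamp, math.exp(-x) raises OverflowError for very
    # negative x (for example x = -1000.0).
    x = max(-limit, min(limit, x))
    return 1.0 / (1.0 + math.exp(-x))


low = clamped_sigmoid(-1000.0)
mid = clamped_sigmoid(0.0)
high = clamped_sigmoid(1000.0)
```

Inputs beyond the limit saturate close to 0 or 1 instead of raising an error.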

    ⚠️ Backwards incompatibilities

    • The init implementations for layers no longer return Model.

    📖 Documentation and examples

    👥 Contributors

    @adrianeboyd, @danieldk, @honnibal, @ines, @kadarakos, @koaning, @notplus, @richardpaulhudson, @shadeMe

  • v8.0.15(Mar 15, 2022)

  • v8.0.14(Mar 14, 2022)

    ✨ New features and improvements

    🔴 Bug fixes

    • Fix issue #552: Do not backpropagate Inf/NaN out of PyTorch layers when using mixed-precision training.
    • Fix issue #578: Correctly cast the threshold argument of CupyOps.mish and correct an equation in Ops.backprop_mish.
    • Fix issue #587: Correct invariant checks in CategoricalCrossentropy.get_grad.
    • Fix issue #592: Update murmurhash requirement.
    • Fix issue #594: Do not sort positional arguments in Config.
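
The mixed-precision fix (#552) deals with gradients that become Inf or NaN under loss scaling. The sketch below shows the general technique in plain Python: unscale the gradients, check that they are finite, and signal the caller to skip the step otherwise. The function name and the use of None as the skip signal are hypothetical, and this is not a description of how the PyTorch shim implements it.

```python
import math
from typing import List, Optional


def unscale_gradients(scaled_grads: List[float], scale: float) -> Optional[List[float]]:
    """Undo loss scaling; return None if any gradient is Inf or NaN."""
    grads = [g / scale for g in scaled_grads]
    if not all(math.isfinite(g) for g in grads):
        # Signal the caller to skip this step instead of passing
        # non-finite values further back through the network.
        return None
    return grads


ok = unscale_gradients([2.0, 4.0], scale=2.0)
overflowed = unscale_gradients([float("inf"), 1.0], scale=2.0)
```

A caller would typically lower the scale and retry on a later batch when the result is None.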

    ⚠️ Backwards incompatibilities

    • The out keyword argument of Ops.mish and Ops.backprop_mish is replaced by inplace for consistency with other activations.

    📖 Documentation and examples

    👥 Contributors

    @adrianeboyd, @andrewsi-z, @danieldk, @honnibal, @ines, @Jette16, @kadarakos, @kianmeng, @polm, @svlandeg, @thatbudakguy

  • v8.0.12(Oct 28, 2021)

    🔴 Bug fixes

    • Fix issue #553: Switch torch tensor type with set_ops and use_ops.
    • Fix issue #554: Always restore original ops after use_ops.
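
Restoring state after a temporary switch (#554) is commonly done with a context manager that uses try/finally. The sketch below shows that pattern with a hypothetical module-level backend name. The functions get_backend, set_backend and use_backend are illustrative stand-ins, not Thinc functions, and whether the actual fix uses this exact construct is an assumption.

```python
from contextlib import contextmanager
from typing import Iterator

_current_backend = "numpy"  # hypothetical global state


def get_backend() -> str:
    return _current_backend


def set_backend(name: str) -> None:
    global _current_backend
    _current_backend = name


@contextmanager
def use_backend(name: str) -> Iterator[None]:
    """Temporarily switch the backend and always restore the previous one."""
    previous = get_backend()
    set_backend(name)
    try:
        yield
    finally:
        # Runs even if the body raises, so the original backend is restored.
        set_backend(previous)


try:
    with use_backend("cupy"):
        inside = get_backend()
        raise RuntimeError("simulated failure inside the block")
except RuntimeError:
    pass
after = get_backend()
```

Without the finally clause, the exception raised inside the block would leave the temporary backend active.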

    👥 Contributors

    @adrianeboyd, @danieldk, @ryndaniels, @svlandeg

  • v8.0.11(Oct 20, 2021)

    ✨ New features and improvements

    • Speed up GPU training by up to ~25% by using cuBLAS to compute Frobenius norms in gradient clipping.
    • Give preference to AppleOps (if available) when calling get_ops("cpu").
    • Support missing values in CategoricalCrossEntropy when the labels are integers.
    • Provide the option to run model.walk with depth-first traversal.
    • Wrap forward/init callbacks of a Model in with_debug and with_nvtx_range to facilitate recursively instrumenting models.
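
For readers unfamiliar with the role of the Frobenius norm in gradient clipping, here is a pure-Python sketch of clipping by norm. It is only an illustration of the arithmetic: the function name is hypothetical, and the actual speed-up comes from computing the norm with cuBLAS on the GPU, which this sketch does not attempt.

```python
import math
from typing import List


def clip_gradient(gradient: List[List[float]], threshold: float) -> List[List[float]]:
    """Rescale a 2d gradient so its Frobenius norm is at most `threshold`."""
    # Frobenius norm: square root of the sum of all squared entries.
    norm = math.sqrt(sum(v * v for row in gradient for v in row))
    if norm <= threshold:
        return [row[:] for row in gradient]
    scale = threshold / norm
    return [[v * scale for v in row] for row in gradient]


clipped = clip_gradient([[3.0, 4.0]], threshold=2.5)
unchanged = clip_gradient([[3.0, 4.0]], threshold=10.0)
```

The example gradient has norm 5.0, so a threshold of 2.5 halves every entry while a threshold of 10.0 leaves it untouched.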

    🔴 Bug fixes

    • Fix issue #537: Fix replace_node on nodes with indirect node refs.

    👥 Contributors

    @adrianeboyd, @danieldk, @honnibal, @ines, @svlandeg

  • v8.0.10(Sep 7, 2021)

  • v8.0.9(Sep 3, 2021)

    ✨ New features and improvements

    • Add ops registry.
    • Enable config overrides to add new keys.
    • Allow newer releases of nbconvert and nbformat.
    • Layer for marking NVTX ranges.
    • Support mixed-precision training in the PyTorch shim (experimental).

    🔴 Bug fixes

    • Fix issue #521: Fix numpy_ops gemm output.
    • Fix issue #525: Fix mypy plugin crash on variadic arguments.

    👥 Contributors

    @adrianeboyd, @connorbrinton, @danieldk, @honnibal, @ines, @svlandeg

  • v8.0.8(Jul 19, 2021)

  • v8.0.7(Jul 1, 2021)

  • v8.0.6(Jul 1, 2021)

  • v8.0.5(Jun 16, 2021)

  • v8.0.4(Jun 11, 2021)

    ✨ New features and improvements

    • Add tuplify layer.
    • More generic implementation of the concatenate layer.
    • Add resizable layer.
    • Introduce force parameter for model.set_dim().
    • Improve UX when setting the GPU allocator.

    🔴 Bug fixes

    • Fix issue #492: Fix backpropagation in with_getitem.
    • Fix issue #494: Resolve forward refs issue with Pydantic.
    • Fix issue #496: Avoid Pydantic versions with security vulnerabilities.

    👥 Contributors

    @adrianeboyd, @honnibal, @ines, @kludex, @polm, @svlandeg, @thomashacker

  • v8.0.3(Apr 19, 2021)

    🔴 Bug fixes

    • Fix issue #486: Fix expand_window for empty docs on GPU
    • Fix issue #487: Require catalogue>=2.0.3 due to performance regressions related to importlib-metadata
    • Fix issue #488: Fix config override & interpolate interaction
  • v8.0.2(Mar 9, 2021)

    ✨ New features and improvements

    • Add map_list layer (#472)

    🔴 Bug fixes

    • Fix issue #465: Fix saving models to Pathy paths
    • Fix issue #466: Avoid initializing with Y if X is set
    • Fix issue #470: Reset torch tensor type in require_cpu
    • Fix issue #484: Ensure consistency of nO dim for BiLSTM
  • v8.0.1(Mar 9, 2021)

  • v8.0.0(Jan 24, 2021)

    🔮 This version of Thinc has been rewritten from the ground up and will be used to power the upcoming spaCy v3.0. The new Thinc v8.0 is a lightweight deep learning library that offers an elegant, type-checked, functional-programming API for composing models, with support for layers defined in other frameworks such as PyTorch, TensorFlow or MXNet. You can use Thinc as an interface layer, a standalone toolkit or a flexible way to develop new models. For more details, see the documentation.

    ✨ New features and improvements

    • Use any framework: Switch between PyTorch, TensorFlow and MXNet models without changing your application, or even create mutant hybrids using zero-copy array interchange.
    • Type checking: Develop faster and catch bugs sooner with sophisticated type checking. Trying to pass a 1-dimensional array into a model that expects two dimensions? That’s a type error. Your editor can pick it up as the code leaves your fingers.
    • Config system: Configuration is a major pain for ML. Thinc lets you describe trees of objects with references to your own functions, so you can stop passing around blobs of settings. It's simple, clean, and it works for both research and production.
    • Super lightweight: Small and easy to install with very few required dependencies, available on pip and conda for Linux, macOS and Windows. Simple source with a consistent API.
    • Concise functional-programming approach to model definition using composition rather than inheritance.
    • First-class support for variable-length sequences: multiple built-in sequence representations and your layers can use any object.
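
As a rough illustration of "composition rather than inheritance", the sketch below builds a tiny model from plain functions. It deliberately does not use Thinc's API: the layer representation (a function returning an output plus a backprop callback) and the names scale, square and chain are assumptions made for this example.

```python
from typing import Callable, List, Tuple

# A layer maps an input to (output, backprop), where backprop maps the
# gradient of the output to the gradient of the input.
Backprop = Callable[[float], float]
Layer = Callable[[float], Tuple[float, Backprop]]


def scale(factor: float) -> Layer:
    def forward(x: float) -> Tuple[float, Backprop]:
        def backprop(d_out: float) -> float:
            return d_out * factor
        return x * factor, backprop
    return forward


def square() -> Layer:
    def forward(x: float) -> Tuple[float, Backprop]:
        def backprop(d_out: float) -> float:
            return d_out * 2.0 * x
        return x * x, backprop
    return forward


def chain(*layers: Layer) -> Layer:
    """Compose layers into one layer that runs them in order."""
    def forward(x: float) -> Tuple[float, Backprop]:
        callbacks: List[Backprop] = []
        for layer in layers:
            x, callback = layer(x)
            callbacks.append(callback)

        def backprop(d_out: float) -> float:
            # Apply the callbacks in reverse order (chain rule).
            for callback in reversed(callbacks):
                d_out = callback(d_out)
            return d_out
        return x, backprop
    return forward


model = chain(scale(3.0), square())  # computes (3x)^2
y, backprop = model(2.0)
dx = backprop(1.0)
```

The composed model is itself just another layer, so it can be passed to chain again without defining any subclass.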
  • v7.4.5(Dec 11, 2020)

  • v7.4.4(Dec 10, 2020)

    🔴 Bug fixes

    • Update for compatibility with cupy v8.
    • Remove f-strings from PyTorchWrapper.
    • Remove detailed numpy build constraints from pyproject.toml.
    • Update Cython extension setup.
  • v7.4.3(Dec 10, 2020)

    ✨ New features and improvements

    • Add seed argument to ParametricAttention.
    • Dynamically include numpy headers and add numpy build constraints.
    • Update tests to support hypothesis v5.

    🔴 Bug fixes

    • Fix memory leak in Beam.
  • v7.4.2(Dec 10, 2020)

  • v7.4.1(May 24, 2020)

    🔴 Bug fixes

    • Use 0-vector for OOV in StaticVectors to fix similarity bug in spaCy
    • Fix murmurhash on platforms where long type was not 64 bit
Owner
Explosion
A software company specializing in developer tools for Artificial Intelligence and Natural Language Processing