🍏 Make Thinc faster on macOS by calling into Apple's native Accelerate library

Explosion

Last update: Nov 26, 2022


Overview

thinc-apple-ops

Make spaCy and Thinc up to 8 × faster on macOS by calling into Apple's native libraries.

⏳ Install

Make sure you have Xcode installed and then install with pip:

pip install thinc-apple-ops

🏫 Motivation

Matrix multiplication is one of the primary operations in machine learning. Since matrix multiplication is computationally expensive, using a fast matrix multiplication implementation can speed up training and prediction significantly.

Most linear algebra libraries provide matrix multiplication in the form of the standardized BLAS gemm functions. The work behind the scenes is done by a set of matrix multiplication kernels that are meticulously tuned for specific architectures. These kernels use architecture-specific SIMD instructions for data-level parallelism and can take factors such as cache sizes and instruction latency into account. Thinc uses the BLIS linear algebra library, which provides optimized matrix multiplication kernels for most x86_64 and some ARM CPUs.
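In NumPy terms, a BLAS gemm call computes C := alpha * op(A) @ op(B) + beta * C, where op optionally transposes its argument. The sketch below is a slow, readable reference for that contract; the function name `gemm_reference` and its parameter names are illustrative assumptions, not the BLIS or Accelerate API. A tuned kernel produces the same numbers but gets there through cache blocking and SIMD instructions.

```python
import numpy as np


def gemm_reference(A, B, C=None, alpha=1.0, beta=0.0, trans_a=False, trans_b=False):
    """Readable reference for BLAS gemm: alpha * op(A) @ op(B) + beta * C.

    op(X) is X, or X transposed when the matching trans_* flag is set.
    Names and defaults are illustrative, not a real library's API.
    """
    op_a = A.T if trans_a else A
    op_b = B.T if trans_b else B
    if op_a.shape[1] != op_b.shape[0]:
        raise ValueError(f"inner dimensions differ: {op_a.shape} x {op_b.shape}")
    product = alpha * (op_a @ op_b)
    if C is None or beta == 0.0:
        # As in BLAS, C is not read when beta is zero.
        return product
    return product + beta * C


A = np.arange(6, dtype=np.float32).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]
B = np.ones((2, 3), dtype=np.float32)
print(gemm_reference(A, B, trans_b=True))  # A @ B.T: a 2x2 matrix of row sums of A
```

The transpose flags matter in practice: they let a caller multiply by a weight matrix stored as W and use W.T without materializing a transposed copy, which is how the `gemm(..., W, trans2=True)` call in the traceback further down is written.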

Recent Apple Silicon CPUs, such as the M-series used in Macs, differ from traditional x86_64 and ARM CPUs in that they have one or more separate matrix co-processors called AMX. Since AMX is not well documented, it is unclear how many AMX units Apple M CPUs have. It is certain that the (single) performance cluster of the M1 has an AMX unit, and there is empirical evidence that both performance clusters of the M1 Pro/Max have an AMX unit.

Even though AMX units use a set of undocumented instructions, they can be used through Apple's Accelerate linear algebra library. Since Accelerate implements the BLAS interface, it can be used as a replacement for the BLIS library that Thinc uses. This is where the thinc-apple-ops package comes in: it extends the default Thinc ops so that gemm matrix multiplication from Accelerate is used in place of the BLIS implementation of gemm. As a result, matrix multiplication in Thinc is performed on the fast AMX unit(s).
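The "extend the default ops, replace only gemm" design can be sketched with plain Python subclassing. Everything here is hypothetical: `BaseOps` stands in for Thinc's default ops, `AcceleratedOps` for an ops class like the one this package provides, and the injected `fast_gemm` callable for a call into Accelerate. This is not the thinc-apple-ops source.

```python
import numpy as np


class BaseOps:
    """Hypothetical stand-in for a default ops class."""

    name = "base"

    def gemm(self, x, y, trans1=False, trans2=False):
        a = x.T if trans1 else x
        b = y.T if trans2 else y
        return a @ b

    def relu(self, X):
        return np.maximum(X, 0)


class AcceleratedOps(BaseOps):
    """Hypothetical subclass: overrides gemm only; relu etc. are inherited unchanged."""

    name = "accelerated"

    def __init__(self, fast_gemm=None):
        # fast_gemm(x, y, trans1, trans2) stands in for a native BLAS call.
        self._fast_gemm = fast_gemm

    def gemm(self, x, y, trans1=False, trans2=False):
        if self._fast_gemm is None:
            # No native backend configured: use the inherited implementation.
            return super().gemm(x, y, trans1=trans1, trans2=trans2)
        return self._fast_gemm(x, y, trans1, trans2)
```

Because only `gemm` is overridden, every other operation behaves exactly as in the parent class, which keeps the surface that has to be checked against the native library small.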

⏱ Benchmarks

Using thinc-apple-ops leads to large speedups in prediction and training on Apple Silicon Macs, as shown by the benchmarks below.

Prediction

This first benchmark compares the prediction speed of the de_core_news_lg spaCy model on the M1 with and without thinc-apple-ops. Results for an Intel MacBook Air (Core i5, 2020) and an AMD Ryzen 5900X are also provided for comparison. Results are in words per second. In this prediction benchmark, thinc-apple-ops speeds up the M1 by a factor of 4.3.

CPU	BLIS (words/s)	thinc-apple-ops (words/s)	Package power (W)
Mac Mini (M1)	6492	27676	5
MacBook Air Core i5 2020	9790	10983	9
AMD Ryzen 5900X	22568	N/A	52

Training

In the second benchmark, we compare the training speed of the de_core_news_lg spaCy model (without NER). The results are in training iterations per second. On the M1, thinc-apple-ops speeds up training by a factor of 3.0.

CPU	BLIS (iterations/s)	thinc-apple-ops (iterations/s)	Package power (W)
Mac Mini M1 2020	3.34	10.07	5
MacBook Air Core i5 2020	3.10	3.27	10
AMD Ryzen 5900X	6.53	N/A	53
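The speedup factors quoted for these benchmarks are throughput ratios taken from the two tables. This short Python check recomputes them from the table values; the Ryzen rows are left out because they have no thinc-apple-ops figure, and the dictionary layout is just an illustrative choice.

```python
# Throughput numbers copied from the two benchmark tables:
# (BLIS, thinc-apple-ops) for each machine that has both results.
BENCHMARKS = {
    "prediction (words/s)": {
        "Mac Mini (M1)": (6492, 27676),
        "MacBook Air Core i5 2020": (9790, 10983),
    },
    "training (iterations/s)": {
        "Mac Mini M1 2020": (3.34, 10.07),
        "MacBook Air Core i5 2020": (3.10, 3.27),
    },
}


def speedup(blis: float, apple_ops: float) -> float:
    """Throughput with thinc-apple-ops divided by throughput with BLIS."""
    return apple_ops / blis


for task, rows in BENCHMARKS.items():
    for machine, (blis, apple_ops) in rows.items():
        print(f"{task:<24} {machine:<26} {speedup(blis, apple_ops):.1f}x")
```

For the M1 this gives 4.3x for prediction and 3.0x for training, matching the text; the Intel MacBook Air gains only about 1.1x in both benchmarks.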

Comments

Pass through Accelerate sgemm/saxpy in Ops.cblas

This can be used by e.g. the parser in spaCy 3.4 to use Accelerate's implementations.
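For reference, the two routines named in this PR are standard single-precision BLAS operations: saxpy computes y := alpha * x + y and sgemm computes C := alpha * A @ B + beta * C. The NumPy functions below are reference sketches of those contracts, written for illustration; they are not the signatures of Thinc's `Ops.cblas` interface or of Accelerate.

```python
import numpy as np


def saxpy(alpha, x, y):
    """Reference BLAS saxpy: y := alpha * x + y for float32 vectors, updating y in place."""
    if x.shape != y.shape:
        raise ValueError(f"shape mismatch: {x.shape} vs {y.shape}")
    y += np.float32(alpha) * x
    return y


def sgemm(alpha, A, B, beta, C):
    """Reference BLAS sgemm without transposes: C := alpha * A @ B + beta * C, in place."""
    m, k = A.shape
    k2, n = B.shape
    if k != k2 or C.shape != (m, n):
        raise ValueError(f"incompatible shapes: {A.shape}, {B.shape}, {C.shape}")
    if beta == 0:
        C[...] = 0  # BLAS does not read C when beta is zero
    else:
        C *= np.float32(beta)
    C += np.float32(alpha) * (A @ B)
    return C
```

Both routines update an existing buffer rather than returning a new one, which is why callers such as a parser can route their hot loops through them without extra allocations.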

I am not sure how to handle this dependency-wise, since this requires Thinc 8.1, but we still want people to be able to use thinc-apple-ops with Thinc 8.0.x and spaCy < 3.4. Do we need another minor release that sets thinc < 8.1.0?

opened by danieldk 5

IndexError: Out of bounds on buffer access (axis 1)

Hi, I tried to use this awesome package and I am getting this error. Not sure what it means; maybe you guys could help me?

I should mention that my data is quite big and I am also using some swap space. Could this be the reason for this error?

[2021-09-28 21:09:01,238] [INFO] Set up nlp object from config
[2021-09-28 21:09:01,500] [INFO] Pipeline: ['tok2vec', 'ner', 'sentencizer', 'entity_linker']
[2021-09-28 21:09:01,505] [INFO] Created vocabulary
[2021-09-28 21:09:01,505] [INFO] Finished initializing nlp object
Traceback (most recent call last):
  File "/Users/joozty/Documents/kolurbo/venv/bin/spacy", line 8, in <module>
    sys.exit(setup_cli())
  File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/spacy/cli/_util.py", line 69, in setup_cli
    command(prog_name=COMMAND)
  File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/click/core.py", line 1137, in __call__
    return self.main(*args, **kwargs)
  File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/click/core.py", line 1062, in main
    rv = self.invoke(ctx)
  File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/click/core.py", line 1668, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/click/core.py", line 763, in invoke
    return __callback(*args, **kwargs)
  File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/typer/main.py", line 500, in wrapper
    return callback(**use_params)  # type: ignore
  File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/spacy/cli/train.py", line 60, in train_cli
    nlp = init_nlp(config, use_gpu=use_gpu)
  File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/spacy/training/initialize.py", line 84, in init_nlp
    nlp.initialize(lambda: train_corpus(nlp), sgd=optimizer)
  File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/spacy/language.py", line 1272, in initialize
    proc.initialize(get_examples, nlp=self, **p_settings)
  File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/spacy/pipeline/tok2vec.py", line 216, in initialize
    self.model.initialize(X=doc_sample)
  File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/thinc/model.py", line 299, in initialize
    self.init(self, X=X, Y=Y)
  File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/thinc/layers/chain.py", line 86, in init
    layer.initialize(X=curr_input, Y=Y)
  File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/thinc/model.py", line 299, in initialize
    self.init(self, X=X, Y=Y)
  File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/thinc/layers/chain.py", line 90, in init
    curr_input = layer.predict(curr_input)
  File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/thinc/model.py", line 315, in predict
    return self._func(self, X, is_train=False)[0]
  File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/thinc/layers/concatenate.py", line 44, in forward
    Ys, callbacks = zip(*[layer(X, is_train=is_train) for layer in model.layers])
  File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/thinc/layers/concatenate.py", line 44, in <listcomp>
    Ys, callbacks = zip(*[layer(X, is_train=is_train) for layer in model.layers])
  File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/thinc/model.py", line 291, in __call__
    return self._func(self, X, is_train=is_train)
  File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/spacy/ml/staticvectors.py", line 46, in forward
    vectors_data = model.ops.gemm(model.ops.as_contig(V[rows]), W, trans2=True)
  File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/thinc_apple_ops/ops.py", line 25, in gemm
    C = blas.gemm(x, y, trans1=trans1, trans2=trans2)
  File "thinc_apple_ops/blas.pyx", line 37, in thinc_apple_ops.blas.gemm
  File "thinc_apple_ops/blas.pyx", line 53, in thinc_apple_ops.blas.gemm
IndexError: Out of bounds on buffer access (axis 1)

Info about spaCy

spaCy version: 3.1.3
Platform: macOS-11.6-arm64-arm-64bit
Python version: 3.9.7
Pipelines: en_core_web_sm (3.1.0), en_core_web_md (3.1.0)

opened by Joozty 2

Can't compile thinc on Macbook Air M1

Hello, I find myself unable to compile this otherwise magnificent tool! Please help, if you can!

I am on macOS 12.1 (kernel version 21.2.0) and have installed the latest Python (3.10.2).

Here is the error message I get after trying to install with pip (apparently it can't find the Accelerate libraries, especially the Accelerate.h header):

ERROR: Command errored out with exit status 1:
command: /Library/Frameworks/Python.framework/Versions/3.10/bin/python3.10 /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pip/_vendor/pep517/in_process/_in_process.py build_wheel /var/folders/n7/t2plqm6n2jq4khmj0bckswg40000gq/T/tmp0bhlw2sh
cwd: /private/var/folders/n7/t2plqm6n2jq4khmj0bckswg40000gq/T/pip-install-wgga78t9/thinc-apple-ops_f5b38888c7a149cd9f99fd524c2bd340
Complete output (34 lines):
running bdist_wheel
running build
running build_py
creating build
creating build/lib.macosx-10.9-universal2-3.10
creating build/lib.macosx-10.9-universal2-3.10/thinc_apple_ops
copying thinc_apple_ops/__init__.py -> build/lib.macosx-10.9-universal2-3.10/thinc_apple_ops
copying thinc_apple_ops/ops.py -> build/lib.macosx-10.9-universal2-3.10/thinc_apple_ops
creating build/lib.macosx-10.9-universal2-3.10/thinc_apple_ops/tests
copying thinc_apple_ops/tests/__init__.py -> build/lib.macosx-10.9-universal2-3.10/thinc_apple_ops/tests
copying thinc_apple_ops/tests/test_gemm.py -> build/lib.macosx-10.9-universal2-3.10/thinc_apple_ops/tests
running egg_info
warning: no files found matching '*.pxd' under directory 'thinc_apple_ops'
warning: no files found matching '*.txt' under directory 'thinc_apple_ops'
writing manifest file 'thinc_apple_ops.egg-info/SOURCES.txt'
copying thinc_apple_ops/blas.pyx -> build/lib.macosx-10.9-universal2-3.10/thinc_apple_ops
copying thinc_apple_ops/py.typed -> build/lib.macosx-10.9-universal2-3.10/thinc_apple_ops
running build_ext
creating build/temp.macosx-10.9-universal2-3.10
creating build/temp.macosx-10.9-universal2-3.10/thinc_apple_ops
clang -Wno-unused-result -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -arch arm64 -arch x86_64 -g -I/private/var/folders/n7/t2plqm6n2jq4khmj0bckswg40000gq/T/pip-build-env-b0flamc2/overlay/lib/python3.10/site-packages/numpy/core/include -I/Library/Frameworks/Python.framework/Versions/3.10/include/python3.10 -c thinc_apple_ops/blas.c -o build/temp.macosx-10.9-universal2-3.10/thinc_apple_ops/blas.o
In file included from thinc_apple_ops/blas.c:706:
In file included from /private/var/folders/n7/t2plqm6n2jq4khmj0bckswg40000gq/T/pip-build-env-b0flamc2/overlay/lib/python3.10/site-packages/numpy/core/include/numpy/arrayobject.h:5:
In file included from /private/var/folders/n7/t2plqm6n2jq4khmj0bckswg40000gq/T/pip-build-env-b0flamc2/overlay/lib/python3.10/site-packages/numpy/core/include/numpy/ndarrayobject.h:12:
In file included from /private/var/folders/n7/t2plqm6n2jq4khmj0bckswg40000gq/T/pip-build-env-b0flamc2/overlay/lib/python3.10/site-packages/numpy/core/include/numpy/ndarraytypes.h:1960:
/private/var/folders/n7/t2plqm6n2jq4khmj0bckswg40000gq/T/pip-build-env-b0flamc2/overlay/lib/python3.10/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:17:2: warning: "Using deprecated NumPy API, disable it with " "#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-W#warnings]
#warning "Using deprecated NumPy API, disable it with "
^
thinc_apple_ops/blas.c:714:10: fatal error: 'Accelerate/Accelerate.h' file not found
#include "Accelerate/Accelerate.h"
^~~~~~~~~~~~~~~~~~~~~~~~~
thinc_apple_ops/blas.c:714:10: note: did not find header 'Accelerate.h' in framework 'Accelerate' (loaded from '/System/Library/Frameworks')
1 warning and 1 error generated.
error: command '/Library/Developer/CommandLineTools/usr/bin/clang' failed with exit code 1

ERROR: Failed building wheel for thinc-apple-ops
Failed to build thinc-apple-ops
ERROR: Could not build wheels for thinc-apple-ops, which is required to install pyproject.toml-based projects

------------------------------------------ END---------------------------------------------------------------------------

Any help would be greatly appreciated, thanks!
duplicate

opened by amal1us 1
AppleOps.gemm: write in-place when `output` is given

NumpyOps.gemm (with BLIS) writes the result of the matrix multiplication in place when the output argument is given. This PR changes AppleOps.gemm to do the same, avoiding the allocation of a temporary array.
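The in-place behaviour can be illustrated in NumPy, whose `np.matmul` accepts an `out` argument. This is a sketch under assumptions, not the AppleOps code: `trans1` and `trans2` mirror the `gemm(x, y, trans1=trans1, trans2=trans2)` call visible in the traceback above, while the name `out` for the output argument is assumed here.

```python
import numpy as np


def gemm(x, y, out=None, trans1=False, trans2=False):
    """Sketch: multiply op(x) @ op(y), writing into `out` when it is provided."""
    a = x.T if trans1 else x
    b = y.T if trans2 else y
    if out is None:
        # No buffer supplied: allocate a new result array.
        return np.matmul(a, b)
    expected = (a.shape[0], b.shape[1])
    if out.shape != expected:
        raise ValueError(f"out has shape {out.shape}, expected {expected}")
    # The product is written into the caller's buffer instead of a fresh array.
    np.matmul(a, b, out=out)
    return out
```

Reusing one preallocated `out` buffer across repeated calls of the same shape, for example in a loop over equally sized batches, avoids allocating a new result array on every call.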
enhancement

opened by danieldk 0
Change thinc upper bound to <8.1.0

thinc-apple-ops will require thinc >= 8.1.0 in the future for the CBLAS passthrough functionality. As discussed in #15, we should first do another minor thinc-apple-ops release specifically for thinc <8.1.0.

Also bump the version to v0.0.7 to prepare for the release.

opened by danieldk 0
Fix 0-size arrays

Our bit of Cython code uses memory buffers, which apparently perform a bounds check when acquiring the pointer, and that check fails when the size is 0. In contrast, in other parts of the code we often acquire the buffer by casting the array.data pointer, which has no such bounds check. This led to an IndexError being raised when arrays with zero-size shapes were passed through.
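This PR fixes the bounds check inside the Cython code itself. A complementary approach, shown here purely as an illustration, is to validate shapes and answer empty products in Python before native code is reached at all. The function `checked_gemm` and its `native_gemm` parameter are hypothetical and not part of thinc-apple-ops.

```python
import numpy as np


def checked_gemm(x, y, native_gemm):
    """Hypothetical guard around a native gemm that works on raw buffer pointers.

    Shapes are validated in Python, and empty products are answered without
    calling native_gemm, so it never has to take a pointer to a 0-size buffer.
    """
    if x.ndim != 2 or y.ndim != 2:
        raise ValueError("gemm expects 2D arrays")
    m, k = x.shape
    k2, n = y.shape
    if k != k2:
        raise ValueError(f"inner dimensions differ: {x.shape} @ {y.shape}")
    if m == 0 or n == 0 or k == 0:
        # Result has shape (m, n); with k == 0 each entry is an empty sum, i.e. 0.
        return np.zeros((m, n), dtype=np.result_type(x, y))
    return native_gemm(x, y)
```

Note the k == 0 case: the result is not empty but an (m, n) matrix of zeros, so a guard that simply returned an empty array would be wrong there.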

opened by honnibal 0
Require thinc with ops registry

Technically it doesn't require a currently unreleased version of thinc to run, but if people install it into an existing venv, it's better to require the newer thinc version so that thinc is upgraded and the ops are detected and used.
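Why the version requirement matters can be seen with a toy plugin registry. This is a hypothetical sketch; it does not reproduce Thinc's actual registry or how thinc-apple-ops registers itself, and the names `register_ops`, `get_preferred_ops` and `FallbackOps` are invented here.

```python
from typing import Callable, Dict, Sequence

# Hypothetical registry mapping an ops name to the class that implements it.
OPS_REGISTRY: Dict[str, Callable[[], object]] = {}


def register_ops(name: str):
    """Class decorator that records an ops class under `name`."""
    def decorator(cls):
        OPS_REGISTRY[name] = cls
        return cls
    return decorator


@register_ops("numpy")
class FallbackOps:
    name = "numpy"


def get_preferred_ops(preference: Sequence[str] = ("apple", "numpy")):
    """Instantiate the first ops class in `preference` that has been registered."""
    for name in preference:
        if name in OPS_REGISTRY:
            return OPS_REGISTRY[name]()
    raise LookupError(f"none of {list(preference)} is registered")
```

With only the fallback registered, `get_preferred_ops()` returns a `FallbackOps` instance; once a plugin registers its own class under "apple", the same call picks it up. A host library without such a lookup would leave the plugin installed but unused, which is the situation a minimum-version requirement guards against.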

opened by adrianeboyd 0

Releases(v0.1.3)

v0.1.3(Dec 16, 2022)

Relax Thinc upper bound to <9.1.0 to support current Thinc 9.0.0 development builds.
v0.1.2(Oct 17, 2022)
Updates and binary wheels for Python 3.11.

v0.1.1(Sep 27, 2022)
🔴 Bug fixes

Fix issue #27: Add numpy build constraints for PyPI wheels.

v0.0.8(Sep 27, 2022)
🔴 Bug fixes

Fix issue #27: Add numpy build constraints for PyPI wheels.

v0.1.0(Jul 19, 2022)
✨ New features and improvements

Pass through Accelerate's saxpy/sgemm in AppleOps.cblas (#15, #21).

Write in-place in AppleOps.gemm when the output argument is given (#19).

🔴 Bug fixes

Fix issue #17: avoid cyclic imports in Thinc.

v0.0.7(May 27, 2022)
Restrict Thinc to v8.0.x in preparation for Thinc v8.1.

v0.0.6(May 18, 2022)
Fix issue #12: Check shape in AppleOps.gemm.


Owner

Explosion

A software company specializing in developer tools for Artificial Intelligence and Natural Language Processing

GitHub https://github.com/explosion/thinc
