Overview

Description

xFormers is a modular and field-agnostic library for flexibly generating transformer architectures from interoperable and optimized building blocks.

Getting started

The full documentation contains instructions for getting started, deep dives, and tutorials about the various APIs. If in doubt, please check out the HOWTO; only some general considerations are laid out in this README.

Installation

To install xFormers, we recommend using a dedicated virtual environment, as is often done with Python, for instance through python-virtualenv or conda. There are two ways you can install it:

Directly from the pip package

You can also fetch the latest release from PyPI. Note that the published package does not contain the sparse attention kernels; for those you will need to build from source.

conda create --name xformer_env
conda activate xformer_env
pip install xformers
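
You can then check that the package is importable with a one-liner (assuming the install succeeded; xformers exposes a version string):

python -c "import xformers; print(xformers.__version__)"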

Build from source (dev mode)

These commands fetch the latest version of the code, create a dedicated conda environment, activate it, then install xFormers from source. If you want to build the sparse attention CUDA kernels, please make sure that the next point is covered before running these instructions.

git clone git@github.com:fairinternal/xformers.git
conda create --name xformer_env python=3.8
conda activate xformer_env
cd xformers
pip install -r requirements.txt
pip install -e .

Sparse attention kernels

Installing the CUDA-based sparse attention kernels may require extra care, as it mobilizes the CUDA toolchain. As a reminder, these kernels are built when you run pip install -e . and the CUDA build chain (the NVCC compiler) is available. Rebuilding can for instance be done via python3 setup.py clean && python3 setup.py develop; similarly, you can wipe the build folder and run pip install -e . again.
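
For instance, either of these sequences forces a rebuild of the kernels (the build folder path assumes the default in-tree setuptools layout):

python3 setup.py clean && python3 setup.py develop

rm -rf build && pip install -e .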

Some advice related to building these CUDA-specific components, tentatively addressing common pitfalls. Please make sure that:

  • NVCC and the current CUDA runtime match (a quick check is shown after this list). Depending on your setup, you may be able to change the CUDA runtime with module unload cuda; module load cuda/xx.x, possibly also switching nvcc
  • the version of GCC that you're using matches the current NVCC capabilities
  • the TORCH_CUDA_ARCH_LIST env variable is set to the architectures that you want to support. A suggested setup (slow to build but comprehensive) is export TORCH_CUDA_ARCH_LIST="6.0;6.1;6.2;7.0;7.2;8.0;8.6"
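
A quick way to check the first point (assuming a standard PyTorch install):

nvcc --version
python -c "import torch; print(torch.version.cuda)"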

Triton

Some parts of xFormers use Triton, and will only expose themselves if Triton is installed and a compatible GPU is present (an NVIDIA GPU with tensor cores). If Triton was not installed as part of the testing procedure, you can install it directly by running pip install triton. You can optionally check that the installation is successful by running one of the Triton-related benchmarks, for instance python3 xformers/benchmarks/benchmark_triton_softmax.py

Triton will cache the compiled kernels to /tmp/triton by default. If this becomes an issue, this path can be specified through the TRITON_CACHE_DIR environment variable.
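
For instance (the destination path here is just an example):

export TRITON_CACHE_DIR=~/.cache/triton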

Testing the installation

This step is optional: it runs a benchmark of the attention mechanisms exposed by xFormers and generates a runtime and memory plot. If it concludes without errors, the installation is successful. You will need some extra dependencies first: pip install -r requirements-benchmark.txt.

Once this is done, you can run this particular benchmark as follows:

python3 xformers/benchmarks/benchmark_encoder.py --activations relu  --plot -emb 256 -bs 32 -heads 16

Using xFormers

Transformers key concepts

Let's start from a classical overview of the Transformer architecture (illustration from Lin et al., "A Survey of Transformers").

You'll find the key repository boundaries in this illustration: a Transformer is generally made of a collection of attention mechanisms, embeddings to encode some positional information, feed-forward blocks and a residual path (typically referred to as pre- or post-layer norm). These boundaries do not work for all models, but we found in practice that, given some accommodations, they capture most of the state of the art.

Models are thus not implemented in monolithic files, which are typically complicated to handle and modify. Most of the concepts present in the above illustration correspond to an abstraction level, and when variants are present for a given sub-block it should always be possible to select any of them. You can focus on a given encapsulation level and modify it as needed.
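
As a hedged sketch of what this enables in practice (the import path is an assumption; the config fields mirror the factory example quoted in the comments further down this page):

from xformers.factory import xFormer, xFormerConfig

my_config = [{
    "block_type": "encoder",
    "num_layers": 2,  # this block is repeated twice
    "dim_model": 128,
    "position_encoding_config": {"name": "vocab", "seq_len": 64, "vocab_size": 32},
    "multi_head_config": {
        "num_heads": 4,
        "residual_dropout": 0.0,
        # swap "linformer" for any other registered attention, everything else stays put
        "attention": {"name": "linformer", "dropout": 0.0, "causal": False, "seq_len": 64},
    },
    "feedforward_config": {"name": "MLP", "dropout": 0.0, "activation": "relu", "hidden_layer_multiplier": 4},
}]
model = xFormer.from_config(xFormerConfig(my_config))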

Repo map

├── components                  # Parts zoo, any of which can be used directly
│   ├── attention
│   │    └ ...                  # all the supported attentions
│   ├── feedforward             #
│   │    └ ...                  # all the supported feedforwards
│   ├── positional_embedding    #
│   │    └ ...                  # all the supported positional embeddings
│   ├── activations.py          #
│   └── multi_head_dispatch.py  # (optional) multihead wrap
├── factory
│   ├── block_factory.py        # (optional) helper to programmatically generate layers
│   └── model_factory.py        # (optional) helper to programmatically generate models
├── models
...                             # Full models, ready to be used

Attention mechanisms

Feed forward mechanisms

Positional embedding

Key Features

  1. Many attention mechanisms, interchangeable
  2. Optimized building blocks, beyond PyTorch primitives
    1. sparse attention
    2. block-sparse attention
    3. fused softmax
    4. fused linear layer
    5. fused layer norm
  3. Benchmarking and testing tools
    1. micro benchmarks
    2. transformer block benchmark
    3. LRA, with SLURM support
  4. Programmatic and sweep-friendly layer and model construction
  5. Hackable
    1. Not using monolithic CUDA kernels, composable building blocks
    2. Using Triton for some optimized parts, explicit, pythonic and user-accessible
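
As an illustration of point 1, attention variants can be built directly from the parts zoo (a minimal sketch; the registry key and call convention are assumptions based on the components layout above):

import torch
from xformers.components.attention import build_attention

# instantiate a registered attention mechanism by name
attention = build_attention({"name": "scaled_dot_product", "dropout": 0.0, "causal": False})

q = k = v = torch.randn(2, 64, 32)  # (batch, sequence, per-head dim)
out = attention(q, k, v)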

FAQ

We've tried to collect a relatively exhaustive list of explanations in the HOWTO.

License

xFormers has a BSD-style license, as found in the LICENSE file.

Citing xFormers

If you use xFormers in your publication, please cite it by using the following BibTeX entry.

@Misc{xFormers2021,
  author =       {Benjamin Lefaudeux and Francisco Massa and Diana Liskovich and Min Xu and Jieru Hu and Marta Tintore and Susan Zhang},
  title =        {xFormers: A modular and hackable Transformer modelling library},
  howpublished = {\url{https://github.com/facebookresearch/xformers}},
  year =         {2021}
}
Comments
  • [feat] Dropout(Activation(x+bias)), now with partial BW fusion

    What does this PR do?

    This was a long time in the making: fusing the BW part of the activation/bias/dropout kernel. Not quite perfect, but in some places the speed really goes bananas (like 3x or 4x the naive calls). Fusing this implied flipping the whole problem upside down: basically the seeds have to be per column, and the kernels (FW and BW) also work that way. This allows us to fuse the bias gradient computation, since it's a sum over that direction.
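
    For intuition, the bias gradient of Dropout(Activation(x + bias)) is a sum over the rows of each column, which is why giving each column its own seed lets the BW kernel fuse that reduction. A tiny illustrative PyTorch check (not the actual Triton kernel):

    import torch

    x = torch.randn(128, 512, requires_grad=True)
    bias = torch.zeros(512, requires_grad=True)
    y = torch.nn.functional.dropout(torch.relu(x + bias), p=0.1)
    y.sum().backward()
    # the bias gradient reduces over the row dimension: one value per column
    print(bias.grad.shape)  # torch.Size([512])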

    TODO:

    • [x] add more unit tests to check that the dropout drops are respected on average
    • [x] possibly make sure that the rand mask does not repeat (may or may not be a big deal). Ok this is doable by making the kernels cooperate on the same col, like Phil does on LayerNorm
    • [x] improve on the scheduling for small buffers
    • [x] Fix the atomic add funkiness (works for now but this does not look completely right, num_warps dependent)

    Before submitting

    • [x] Did you have fun?
      • Make sure you had fun coding 🙃
    • [x] Did you read the contributor guideline?
    • [x] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
      • [ ] N/A
    • [ ] Did you make sure to update the docs?
      • [ ] N/A
    • [x] Did you write any new necessary tests?
      • [ ] N/A
    • [x] Did you update the changelog? (if needed)
      • [ ] N/A

    PR review

    Anyone in the community is free to review the PR once the tests have passed. If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

    CLA Signed 
    opened by blefaudeux 26
  • Support Windows and ideally build wheels for it

    🚀 Feature

    Supporting Windows in xformers.

    Motivation

    xformers provides excellent tools to speed up inference, for example close to 2x in Stable Diffusion. Sadly, it lacks Windows support. This has barred us from using it on https://github.com/AUTOMATIC1111/stable-diffusion-webui, as most users and developers (including myself) use Windows.

    Pitch

    Currently, xformers fails to compile on Windows with a multitude of errors, some of which are trivial but most of which are not. Enabling Windows usage by fixing these errors, and ideally distributing Windows wheels, would allow projects to make xformers a required dependency and use it.

    Alternatives

    Additional context

    cc. @fmassa

    opened by C43H66N12O12S2 23
  • triton 2.0 changes

    What does this PR do?

    Fixes triton to work with version 2.0.0.

    TODOs:

    • [x] Move the syntax to triton2
    • [x] Fix fused dropout
    • [ ] Fix the blocksparse op API having changed
    • [x] Fix fused linear layer
    • [x] Update the benchmarks

    Before submitting

    • [x] Did you have fun?
      • Make sure you had fun coding 🙃
    • [ ] Did you read the contributor guideline?
    • [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
      • [ ] N/A
    • [ ] Did you make sure to update the docs?
      • [ ] N/A
    • [ ] Did you write any new necessary tests?
      • [ ] N/A
    • [ ] Did you update the changelog? (if needed)
      • [ ] N/A

    PR review

    Anyone in the community is free to review the PR once the tests have passed. If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

    CLA Signed 
    opened by kashif 23
  • Pip installation fails, `CUTLASS` not found

    🐛 Bug

    pip installation fails in a docker container: CUTLASS is not found because git submodule update --init --recursive was not executed

    To Reproduce

    Dockerfile

    FROM pytorch/pytorch:1.12.1-cuda11.3-cudnn8-runtime
    RUN pip install xformers
    

    then

    docker build .
    

    Error Trace

    #1 [internal] load build definition from Dockerfile
    #1 sha256:bc3772a9760c6470030d3506e7afa0b9caa2a77f63376fe30fc296a334d5c980
    #1 transferring dockerfile: 116B done
    #1 DONE 0.0s
    
    #2 [internal] load .dockerignore
    #2 sha256:5b674e66e988c8852edbf605c0d0921ac6eed40841cd55d9112e0d92242091a1
    #2 transferring context: 2B done
    #2 DONE 0.0s
    
    #3 [internal] load metadata for docker.io/pytorch/pytorch:1.12.1-cuda11.3-cudnn8-runtime
    #3 sha256:409f78a4f3551ef4b6d7a4b064ff72bb54f0677d599351b4d0dcdff08b926834
    #3 DONE 0.8s
    
    #4 [1/2] FROM docker.io/pytorch/pytorch:1.12.1-cuda11.3-cudnn8-runtime@sha256:0bc0971dc8ae319af610d493aced87df46255c9508a8b9e9bc365f11a56e7b75
    #4 sha256:2e3e89abd93f2e7b42b070196f0e6be4ce38a2d360c98232440e1d90189bdb02
    #4 CACHED
    
    #5 [2/2] RUN pip install xformers
    #5 sha256:ef3133015f56a22d509f2aa1ef730afdcaa2591838105ba332650ff73ceb9ff9
    #5 1.012 Collecting xformers
    #5 1.313   Downloading xformers-0.0.13.tar.gz (292 kB)
    #5 1.429      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 292.5/292.5 kB 2.6 MB/s eta 0:00:00
    #5 1.534   Preparing metadata (setup.py): started
    #5 2.952   Preparing metadata (setup.py): finished with status 'error'
    #5 2.961   error: subprocess-exited-with-error
    #5 2.961
    #5 2.961   × python setup.py egg_info did not run successfully.
    #5 2.961   │ exit code: 1
    #5 2.961   ╰─> [8 lines of output]
    #5 2.961       Traceback (most recent call last):
    #5 2.961         File "<string>", line 36, in <module>
    #5 2.961         File "<pip-setuptools-caller>", line 34, in <module>
    #5 2.961         File "/tmp/pip-install-94ty405p/xformers_31debcecca1f46019eadae6eead5cc3f/setup.py", line 239, in <module>
    #5 2.961           ext_modules=get_extensions(),
    #5 2.961         File "/tmp/pip-install-94ty405p/xformers_31debcecca1f46019eadae6eead5cc3f/setup.py", line 158, in get_extensions
    #5 2.961           "CUTLASS submodule not found. Did you forget "
    #5 2.961       RuntimeError: CUTLASS submodule not found. Did you forget to run `git submodule update --init --recursive` ?
    #5 2.961       [end of output]
    #5 2.961
    #5 2.961   note: This error originates from a subprocess, and is likely not a problem with pip.
    #5 2.965 error: metadata-generation-failed
    #5 2.965
    #5 2.965 × Encountered error while generating package metadata.
    #5 2.965 ╰─> See above for output.
    #5 2.965
    #5 2.965 note: This is an issue with the package mentioned above, not pip.
    #5 2.965 hint: See above for details.
    #5 ERROR: executor failed running [/bin/sh -c pip install xformers]: exit code: 1
    ------
     > [2/2] RUN pip install xformers:
    ------
    executor failed running [/bin/sh -c pip install xformers]: exit code: 1
    

    Expected behavior

    Installation should work.

    Environment

    In the container, running Docker on Windows.

    PyTorch version: 1.12.1
    Is debug build: False
    CUDA used to build PyTorch: 11.3
    ROCM used to build PyTorch: N/A 
    
    OS: Ubuntu 18.04.6 LTS (x86_64) 
    GCC version: Could not collect  
    Clang version: Could not collect
    CMake version: Could not collect
    Libc version: glibc-2.17
    
    Python version: 3.7.13 (default, Mar 29 2022, 02:18:16)  [GCC 7.5.0] (64-bit runtime)
    Python platform: Linux-5.10.16.3-microsoft-standard-WSL2-x86_64-with-debian-buster-sid
    Is CUDA available: True
    CUDA runtime version: Could not collect
    GPU models and configuration: GPU 0: NVIDIA GeForce GTX 1060
    Nvidia driver version: 517.48
    cuDNN version: Could not collect
    HIP runtime version: N/A
    MIOpen runtime version: N/A
    Is XNNPACK available: True
    
    Versions of relevant libraries:
    [pip3] numpy==1.21.5
    [pip3] torch==1.12.1
    [pip3] torchtext==0.13.1
    [pip3] torchvision==0.13.1
    [conda] blas                      1.0                         mkl
    [conda] cudatoolkit               11.3.1               ha36c431_9    nvidia
    [conda] ffmpeg                    4.3                  hf484d3e_0    pytorch
    [conda] mkl-service               2.4.0            py37h7f8727e_0
    [conda] mkl_fft                   1.3.1            py37hd3c417c_0
    [conda] mkl_random                1.2.2            py37h51133e4_0
    [conda] numpy                     1.21.5           py37he7a7128_2
    [conda] numpy-base                1.21.5           py37hf524024_2
    [conda] pytorch                   1.12.1          py3.7_cuda11.3_cudnn8.3.2_0    pytorch
    [conda] pytorch-mutex             1.0                        cuda    pytorch
    [conda] torchtext                 0.13.1                     py37    pytorch
    [conda] torchvision               0.13.1               py37_cu113    pytorch
    

    Additional context

    I don't think this problem has anything to do with the OS/Python/PyTorch/CUDA/NVCC versions; the setup.py seems to be tailored for a local / manual install, and fails in this context.

    opened by AbdBarho 20
  • [feat] add split_dim arg to reversible, remove retain_grad, add benchmark_reversible

    This PR removes the repeated chunk and cat operations in xformers' RevNet code. This way, the RevNet implementation will become a little bit faster.
    I'd strongly recommend calling a library like MemCNN or RevLib directly as they make it easier to switch the coupling function and generally give the user more freedom.

    Unfortunately, I can't sign the CLA at the moment, as it keeps saying

    Sorry, something went wrong. We're working on getting this fixed as soon as we can.

    CLA Signed 
    opened by ClashLuke 20
  • Added SmeLU

    What does this PR do?

    Fixes #262 .

    Before submitting

    • [x] Did you have fun?
      • Make sure you had fun coding 🙃
    • [x] Did you read the contributor guideline?
    • [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
      • [x] N/A
    • [ ] Did you make sure to update the docs?
      • [ ] N/A
    • [ ] Did you write any new necessary tests?
      • [ ] N/A
    • [ ] Did you update the changelog? (if needed)
      • [ ] N/A

    PR review

    Anyone in the community is free to review the PR once the tests have passed. If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

    CLA Signed 
    opened by kashif 17
  • [chore] release v0.0.13

    What does this PR do?

    Bump the dev version number to be able to release v0.0.13; see #402.

    Before submitting

    • [ ] Did you have fun?
      • Make sure you had fun coding 🙃
    • [ ] Did you read the contributor guideline?
    • [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
      • [ ] N/A
    • [ ] Did you make sure to update the docs?
      • [ ] N/A
    • [ ] Did you write any new necessary tests?
      • [ ] N/A
    • [ ] Did you update the changelog? (if needed)
      • [ ] N/A

    PR review

    Anyone in the community is free to review the PR once the tests have passed. If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

    CLA Signed 
    opened by blefaudeux 14
  • [feat] Compositional attention

    What does this PR do?

    Implements Compositional Attention (based on the reference implementation), as mentioned in https://github.com/facebookresearch/xformers/issues/41

    Paper

    TODOs

    • [x] Sane defaults
    • [x] Speedup wherever possible. Looks like it takes a lot of memory also at the moment, probably some dummy mistakes
    • [x] Maybe self-attention optimization (single proj) -> doable if moving the projections within the attention to the inproj class, worth it?
    • [x] Add a lot of explanations/documentations
    • [ ] Some IR results? -> that would probably be for another task

    cc @sarthmit if interested

    Before submitting

    • [x] Did you have fun?
      • Make sure you had fun coding 🙃
    • [x] Did you read the contributor guideline?
    • [x] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
      • [ ] N/A
    • [x] Did you make sure to update the docs?
      • [ ] N/A
    • [x] Did you write any new necessary tests?
      • [ ] N/A
    • [x] Did you update the changelog? (if needed)
      • [ ] N/A

    PR review

    Anyone in the community is free to review the PR once the tests have passed. If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

    CLA Signed 
    opened by blefaudeux 14
  • Does xformers still not support CUDA 12.0?

    ❓ Questions and Help

    I got the following error while installing...

    Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com/
    Obtaining file:///F:/Stable_Diffusion/stable-diffusion-webui-master/repositories/xformers
    Preparing metadata (setup.py) ... error
    error: subprocess-exited-with-error

    × python setup.py egg_info did not run successfully.
    │ exit code: 1
    ╰─> [9 lines of output]
        No CUDA runtime is found, using CUDA_HOME='C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.0'
        Traceback (most recent call last):
          File "", line 2, in
          File "", line 34, in
          File "F:\Stable_Diffusion\stable-diffusion-webui-master\repositories\xformers\setup.py", line 293, in
            symlink_package(
          File "F:\Stable_Diffusion\stable-diffusion-webui-master\repositories\xformers\setup.py", line 83, in symlink_package
            os.symlink(src=path_from, dst=path_to)
        OSError: [WinError 1314] A required privilege is not held by the client: 'F:\Stable_Diffusion\stable-diffusion-webui-master\repositories\xformers\third_party\flash-attention\flash_attn' -> 'F:\Stable_Diffusion\stable-diffusion-webui-master\repositories\xformers\xformers_flash_attn'
        [end of output]

    note: This error originates from a subprocess, and is likely not a problem with pip.
    error: metadata-generation-failed

    × Encountered error while generating package metadata.
    ╰─> See above for output.

    Is it because of CUDA 12? Should I downgrade the CUDA version?

    Or what is the problem? Can anyone help?

    opened by debdip 13
  • Cannot install xformers on linux server

    ❓ Questions and Help

    When I try either pip install or building from source, I get this issue:

     × python setup.py egg_info did not run successfully.
      │ exit code: 1
      ╰─> [18 lines of output]
          Traceback (most recent call last):
            File "<string>", line 2, in <module>
            File "<pip-setuptools-caller>", line 34, in <module>
            File "/home/username/xformers/setup.py", line 239, in <module>
              ext_modules=get_extensions(),
            File "/home/username/xformers/setup.py", line 187, in get_extensions
              cuda_version = get_cuda_version(CUDA_HOME)
            File "/home/username/xformers/setup.py", line 51, in get_cuda_version
              raw_output = subprocess.check_output([nvcc_bin, "-V"], universal_newlines=True)
            File "/home/username/anaconda3/envs/test_env/lib/python3.9/subprocess.py", line 424, in check_output
              return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
            File "/home/username/anaconda3/envs/test_env/lib/python3.9/subprocess.py", line 505, in run
              with Popen(*popenargs, **kwargs) as process:
            File "/home/username/anaconda3/envs/test_env/lib/python3.9/subprocess.py", line 951, in __init__
              self._execute_child(args, executable, preexec_fn, close_fds,
            File "/home/username/anaconda3/envs/test_env/lib/python3.9/subprocess.py", line 1821, in _execute_child
              raise child_exception_type(errno_num, err_msg, err_filename)
          FileNotFoundError: [Errno 2] No such file or directory: '/home/username/anaconda3/envs/test_env/bin/nvcc'
          [end of output]
    

    Here's the output of nvcc --version:

    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2022 NVIDIA Corporation
    Built on Tue_Mar__8_18:18:20_PST_2022
    Cuda compilation tools, release 11.6, V11.6.124
    Build cuda_11.6.r11.6/compiler.31057947_0
    

    As additional information, I was able to install PyTorch the usual way and verify that CUDA is available.

    opened by fedshyvana 13
  • Encoder-decoder arch doesn't work when sequence lengths are different

    🐛 Bug

    I get an error when the sequence lengths to the encoder and decoder are different, e.g. in the code snippet below:

    Command

    EMB = 384
    SEQ_ENC = 128
    SEQ_DEC = 64
    BATCH = 16
    VOCAB = 64
    
    my_config = [
        # A list of the encoder or decoder blocks which constitute the Transformer.
        # Note that a sequence of different encoder blocks can be used, same for decoders
        {
            "reversible": False,  # Optionally make these layers reversible, to save memory
                "block_type": "encoder",
                "num_layers": 3,  # Optional, this means that this config will repeat N times
                "dim_model": EMB,
                "layer_norm_style": "pre",  # Optional, pre/post
                "position_encoding_config": {
                    "name": "vocab",  # whatever position encodinhg makes sense
                    "seq_len": SEQ_ENC,
                    "vocab_size": VOCAB,
                },
                "multi_head_config": {
                    "num_heads": 4,
                    "residual_dropout": 0,
                    "attention": {
                        "name": "linformer",  # whatever attention mechanism
                        "dropout": 0,
                        "causal": False,
                        "seq_len": SEQ_ENC,
                    },
                },
                "feedforward_config": {
                    "name": "MLP",
                    "dropout": 0,
                    "activation": "relu",
                    "hidden_layer_multiplier": 4,
                },
            },
        {
            "reversible": False,  # Optionally make these layers reversible, to save memory
    
                "block_type": "decoder",
                "num_layers": 3,  # Optional, this means that this config will repeat N times
                "dim_model": EMB,
                "layer_norm_style": "pre",  # Optional, pre/post
                "position_encoding_config": {
                    "name": "vocab",  # whatever position encodinhg makes sense
                    "seq_len": SEQ_DEC,
                    "vocab_size": VOCAB,
                },
                "multi_head_config_masked": {
                    "num_heads": 4,
                    "residual_dropout": 0,
                    "attention": {
                        "name": "nystrom",  # whatever attention mechanism
                        "dropout": 0,
                        "causal": True,
                        "seq_len": SEQ_DEC,
                    },
                },
                "multi_head_config_cross": {
                    "num_heads": 4,
                    "residual_dropout": 0,
                    "attention": {
                        "name": "favor",  # whatever attention mechanism
                        "dropout": 0,
                        "causal": True,
                        "seq_len": SEQ_DEC,
                    },
                },
                "feedforward_config": {
                    "name": "MLP",
                    "dropout": 0,
                    "activation": "relu",
                    "hidden_layer_multiplier": 4,
                },
            },
    ]
    
    # This part of xFormers is entirely type checked and needs a config object,
    # could be changed in the future
    config = xFormerConfig(my_config)
    model = xFormer.from_config(config)
    
    #  Test out with dummy inputs
    src = (torch.rand((BATCH, SEQ_ENC)) * VOCAB).abs().to(torch.int)
    tgt = (torch.rand((BATCH, SEQ_DEC)) * VOCAB).abs().to(torch.int)
    y = model(src=src, tgt=tgt)
    
    print(y.shape)
    

    Expected behavior

    torch.Size([16, 64, 384])
    

    however, I get:

    RuntimeError: einsum(): operands do not broadcast with remapped shapes [original->remapped]: [64, 128, 96, 96]->[64, 128, 96, 96] [64, 64, 96]->[64, 64, 1, 96]
    
    ongoing 
    opened by kashif 13
  • How to keep random seeds fixed

    ❓ Questions and Help

    Different results occur when I run the same code twice, even though the set_seed function runs before everything else.

    import os
    import random

    import numpy as np
    import torch

    def set_seed(seed, cudnn_benchmark=False, cudnn_deterministic=True):
        random.seed(seed)  # Python random module.
        np.random.seed(seed)  # Numpy module.
        os.environ['PYTHONHASHSEED'] = str(seed)
        torch.random.manual_seed(seed)
        torch.manual_seed(seed)
        torch.cuda.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # if you are using multi-GPU.
        torch.backends.cudnn.benchmark = cudnn_benchmark
        torch.backends.cudnn.deterministic = cudnn_deterministic
    
    opened by scp92 1
  • Allowing decoder only definition

    🚀 Feature

    Allow only a decoder config to be defined.

    Motivation

    I want to define only a decoder and pass in a memory vector from another source.

    Pitch

    I tried this change locally and it allows me to do what I want it to do: https://github.com/facebookresearch/xformers/compare/main...nh2liu:xformers:patch-1

    Not sure if this has wider implications, because it seems this code has been around for a while, but the comment # If decoder: either use the encoder ouput, or just decode, both options are possible indicates that this may be a bug.

    Alternatives

    • NOOP encoder will also allow this functionality.
    opened by nh2liu 2
  • build from source failed

    🐛 Bug

    Command

    pip install ninja
    pip install -v -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers

    ERROR INFO

    /tmp/pip-install-zruba6fo/xformers_c3f4b10bded7460eaa800569194ec7d2/xformers/csrc/attention/cuda/fmha/attention.cu:650:66:   required from ‘void _GLOBAL__N__7fac2228_12_attention_cu_724ba955_12677::launch_attention(at::Tensor&, at::Tensor&, const at::Tensor&, const at::Tensor&, const at::Tensor&, const at::Tensor&, float, at::PhiloxCudaState) [with bool compute_logsumexp = true]’
    /tmp/pip-install-zruba6fo/xformers_c3f4b10bded7460eaa800569194ec7d2/xformers/csrc/attention/cuda/fmha/attention.cu:793:92:   required from here
    /tmp/pip-install-zruba6fo/xformers_c3f4b10bded7460eaa800569194ec7d2/xformers/csrc/attention/cuda/fmha/attention.cu:612:58: warning: ‘at::GenericPackedTensorAccessor<T, N, PtrTraits, index_t> at::Tensor::packed_accessor() const & [with T = float; long unsigned int N = 3; PtrTraits = at::DefaultPtrTraits; index_t = long int]’ is deprecated: packed_accessor is deprecated, use packed_accessor32 or packed_accessor64 instead [-Wdeprecated-declarations]
      612 |     return attn_bias.packed_accessor<scalar_t, 3>();
          |                                                          ^
    /usr/local/lib/python3.8/dist-packages/torch/include/ATen/core/TensorBody.h:247:1: note: declared here
      247 |   GenericPackedTensorAccessor<T,N,PtrTraits,index_t> packed_accessor() const & {
          | ^ ~~~~~~~~~~~~~
    ninja: build stopped: subcommand failed.
    Traceback (most recent call last):
      File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 1901, in _run_ninja_build
        subprocess.run(
      File "/usr/lib/python3.8/subprocess.py", line 516, in run
        raise CalledProcessError(retcode, process.args,
    subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
    
    The above exception was the direct cause of the following exception:
    
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-zruba6fo/xformers_c3f4b10bded7460eaa800569194ec7d2/setup.py", line 301, in <module>
        setuptools.setup(
      File "/usr/local/lib/python3.8/dist-packages/setuptools/__init__.py", line 153, in setup
        return distutils.core.setup(**attrs)
      File "/usr/lib/python3.8/distutils/core.py", line 148, in setup
        dist.run_commands()
      File "/usr/lib/python3.8/distutils/dist.py", line 966, in run_commands
        self.run_command(cmd)
      File "/usr/lib/python3.8/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "/usr/local/lib/python3.8/dist-packages/setuptools/command/install.py", line 68, in run
        return orig.install.run(self)
      File "/usr/lib/python3.8/distutils/command/install.py", line 589, in run
        self.run_command('build')
      File "/usr/lib/python3.8/distutils/cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "/usr/lib/python3.8/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "/usr/lib/python3.8/distutils/command/build.py", line 135, in run
        self.run_command(cmd_name)
      File "/usr/lib/python3.8/distutils/cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "/usr/lib/python3.8/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "/usr/local/lib/python3.8/dist-packages/setuptools/command/build_ext.py", line 79, in run
        _build_ext.run(self)
      File "/usr/local/lib/python3.8/dist-packages/Cython/Distutils/old_build_ext.py", line 186, in run
        _build_ext.build_ext.run(self)
      File "/usr/lib/python3.8/distutils/command/build_ext.py", line 340, in run
        self.build_extensions()
      File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 843, in build_extensions
        build_ext.build_extensions(self)
      File "/usr/local/lib/python3.8/dist-packages/Cython/Distutils/old_build_ext.py", line 195, in build_extensions
        _build_ext.build_ext.build_extensions(self)
      File "/usr/lib/python3.8/distutils/command/build_ext.py", line 449, in build_extensions
        self._build_extensions_serial()
      File "/usr/lib/python3.8/distutils/command/build_ext.py", line 474, in _build_extensions_serial
        self.build_extension(ext)
      File "/usr/local/lib/python3.8/dist-packages/setuptools/command/build_ext.py", line 202, in build_extension
        _build_ext.build_extension(self, ext)
      File "/usr/lib/python3.8/distutils/command/build_ext.py", line 528, in build_extension
        objects = self.compiler.compile(sources,
      File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 658, in unix_wrap_ninja_compile
        _write_ninja_file_and_compile_objects(
      File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 1573, in _write_ninja_file_and_compile_objects
        _run_ninja_build(
      File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 1917, in _run_ninja_build
        raise RuntimeError(message) from e
    RuntimeError: Error compiling objects for extension
    Running setup.py install for xformers: finished with status 'error'
    

    ERROR: Command errored out with exit status 1: /usr/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-zruba6fo/xformers_c3f4b10bded7460eaa800569194ec7d2/setup.py'"'"'; file='"'"'/tmp/pip-install-zruba6fo/xformers_c3f4b10bded7460eaa800569194ec7d2/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(file) if os.path.exists(file) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record /tmp/pip-record-lj6j_c0s/install-record.txt --single-version-externally-managed --user --prefix= --compile --install-headers /home/oppoer/.local/include/python3.8/xformers Check the logs for full command output.
    WARNING: You are using pip version 21.2.4; however, version 22.3.1 is available.
    You should consider upgrading via the '/usr/bin/python -m pip install --upgrade pip' command.

    Environment

    My docker image is: nvcr.io/nvidia/pytorch:22.06-py3

    Collecting environment information...
    PyTorch version: 1.13.0a0+936e930
    Is debug build: False
    CUDA used to build PyTorch: 11.8
    ROCM used to build PyTorch: N/A

    OS: Ubuntu 20.04.5 LTS (x86_64)
    GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
    Clang version: Could not collect
    CMake version: version 3.24.1
    Libc version: glibc-2.31

    Python version: 3.8.10 (default, Jun 22 2022, 20:18:18) [GCC 9.4.0] (64-bit runtime)
    Python platform: Linux-3.10.0-957.27.2.el7.x86_64-x86_64-with-glibc2.29
    Is CUDA available: True
    CUDA runtime version: 11.8.89
    GPU models and configuration: GPU 0: NVIDIA A100-SXM4-80GB
    Nvidia driver version: 470.129.06
    cuDNN version: Probably one of the following:
    /usr/lib/x86_64-linux-gnu/libcudnn.so.8.7.0
    /usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.7.0
    /usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.7.0
    /usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.7.0
    /usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.7.0
    /usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.7.0
    /usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.7.0
    HIP runtime version: N/A
    MIOpen runtime version: N/A
    Is XNNPACK available: True

    Versions of relevant libraries:
    [pip3] functorch==1.13.0a0+936e930
    [pip3] mypy-extensions==0.4.3
    [pip3] numpy==1.22.2
    [pip3] pytorch-quantization==2.1.2
    [pip3] torch==1.13.0a0+936e930
    [pip3] torch-tensorrt==1.3.0a0
    [pip3] torchtext==0.13.0a0+fae8e8c
    [pip3] torchvision==0.15.0a0
    [conda] Could not collect

    opened by GxjGit 7
  • Unable to Build from latest

    🐛 Bug

    Command

    cd xformers
    git pull
    git submodule update --recursive --remote
    pip install -e .
    

    To Reproduce

    Steps to reproduce the behavior:

    1. pull latest from git (at hash f82722f61f972c02ebc54431e3e4717f21b3e9b9)
    2. pull latest submodules
    3. build

    Expected behavior

    Building to run successfully

    Environment

    Collecting environment information...
    PyTorch version: 1.12.1+cu116
    Is debug build: False
    CUDA used to build PyTorch: 11.6
    ROCM used to build PyTorch: N/A
    
    OS: Ubuntu 20.04.5 LTS (x86_64)
    GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
    Clang version: 10.0.0-4ubuntu1
    CMake version: version 3.25.0
    Libc version: glibc-2.31
    
    Python version: 3.8.13 (default, Mar 28 2022, 11:38:47)  [GCC 7.5.0] (64-bit runtime)
    Python platform: Linux-5.10.102.1-microsoft-standard-WSL2-x86_64-with-glibc2.17
    Is CUDA available: True
    CUDA runtime version: 11.6.124
    GPU models and configuration: GPU 0: NVIDIA GeForce RTX 2060 SUPER
    Nvidia driver version: 526.47
    cuDNN version: Probably one of the following:
    /usr/lib/x86_64-linux-gnu/libcudnn.so.8.4.1
    /usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.4.1
    /usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.4.1
    /usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.4.1
    /usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.4.1
    /usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.4.1
    /usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.4.1
    HIP runtime version: N/A
    MIOpen runtime version: N/A
    Is XNNPACK available: True
    
    Versions of relevant libraries:
    [pip3] mypy-extensions==0.4.3
    [pip3] numpy==1.23.2
    [pip3] pytorch-lightning==1.7.5
    [pip3] torch==1.12.1+cu116
    [pip3] torchaudio==0.12.1+cu116
    [pip3] torchdynamo==1.12.0
    [pip3] torchmetrics==0.9.3
    [pip3] torchvision==0.13.1+cu116
    [conda] numpy                     1.23.2                   pypi_0    pypi
    [conda] pytorch-lightning         1.7.5                    pypi_0    pypi
    [conda] torch                     1.12.1+cu116             pypi_0    pypi
    [conda] torchaudio                0.12.1+cu116             pypi_0    pypi
    [conda] torchdynamo               1.12.0                   pypi_0    pypi
    [conda] torchmetrics              0.9.3                    pypi_0    pypi
    [conda] torchvision               0.13.1+cu116             pypi_0    pypi
    
    • PyTorch Version (e.g., 1.0): 1.12.1+cu116
    • OS (e.g., Linux): WSL
    • How you installed PyTorch (conda, pip, source): pip install -e .
    • Build command you used (if compiling from source): pip install -e .
    • Python version: 3.8.13
    • CUDA/cuDNN version: 11.6
    • GPU models and configuration: NVIDIA GeForce RTX 2060 SUPER
    • Any other relevant information: It worked on a previous commit

    Additional context

    Error message from compiler:

        /home/jonno/xformers/third_party/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_simt.h(350): error: namespace "cutlass::gemm::warp" has no member "WarpSize"
    
        /home/jonno/xformers/third_party/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_simt.h(350): error: type name is not allowed
    
        /home/jonno/xformers/third_party/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_simt.h(350): error: the global scope has no "value"
    
        3 errors detected in the compilation of "/home/jonno/xformers/xformers/csrc/attention/cuda/fmha/attention_forward_generic.cu".
        /home/jonno/anaconda3/envs/dyn/lib/python3.8/site-packages/torch/utils/cpp_extension.py:820: UserWarning: There are no g++ version bounds defined for CUDA version 11.6
          warnings.warn(f'There are no {compiler_name} version bounds defined for CUDA version {cuda_str_version}')
        error: command '/usr/local/cuda/bin/nvcc' failed with exit code 255
        [end of output]
    
    note: This error originates from a subprocess, and is likely not a problem with pip.
    
    opened by JonnoFTW 3
Releases (v0.0.13)
  • v0.0.13(Sep 26, 2022)

  • v0.0.12(Aug 8, 2022)

    [0.0.12] - 2022-08-08

    Fixed

    • Removed duplicated biases in the FusedMLP layers [#317]
    • Rotary embeddings respecting input types [#326]
    • Poolformer style instantiating useless projection layers [#349]
    • Fix layer position not being properly tracked, causing extra layernorms for programmatic xformers [#348]
    • Pass use_triton flag to LayerNorm module [#336]

    Added

    • Four blocksparsity layouts from DeepSpeed [#320]
    • Support several initialization options [#312]
    • Conv2DFeedforward feedforward part [#321]
    • VisualAttention [#329]
    • Automatic blocksparse for causal attention [#334]
    • Better hierarchical transformer generation [#345]
    • Fused operations with AOTAutograd/NVFuser, integration into MLP [#357]
    • Refactor LRA code to use Pytorch Lightning [#343]
    Source code(tar.gz)
    Source code(zip)
  • v0.0.11(May 30, 2022)

    [0.0.11] - 2022-05-30

    Fixed

    • Fix some torchscriptability [#246]
    • Fix FourierMix being compatible with AMP [#258]
    • Better asserts on QKV dimensions [#264]
    • Better perfs for FusedMLP and FusedLinearLayer [#283]
    • Deepnorm init missing self-attention [#284]

    Added

    • Simplicial Embeddings [#259]
    • Mem efficient attention, FW pass [#267]
    • MHA benchmark
    • MLP benchmark
    • Move all triton kernels to triton v2 [#272]
    • Mem efficient attention, BW pass [#281]
    • Metaformer support [#294]
    Source code(tar.gz)
    Source code(zip)
  • v0.0.10(Mar 15, 2022)

    Fixed

    • Expose bias flag for feedforwards, same default as Timm [#220]
    • Update eps value for layernormm, same default as torch [#221]
    • PreNorm bugfix, only one input was normalized [#233]

    Added

    • Add DeepNet (DeepNorm) residual path and init [#227]
    Source code(tar.gz)
    Source code(zip)
  • v0.0.9(Feb 9, 2022)

    Added

    • Compositional Attention [#41]
    • Experimental Ragged attention [#189]
    • Mixture of Experts [#181]
    • BlockSparseTensor [#202]
    • nd-tensor support for triton softmax [#210]

    Fixed

    • bugfix Favor, single feature map [#183]
    • sanity check blocksparse settings [#207]
    • fixed some pickability [#204]
    Source code(tar.gz)
    Source code(zip)
  • v0.0.8(Jan 7, 2022)

  • v0.0.7(Nov 30, 2021)

  • v0.0.6(Nov 24, 2021)

    Fixed

    • Fix self attention optimization not being triggered, broken residual path [#119]
    • Improve speed by not using contiguous Tensors when not needed [#119]

    Added

    • Attention mask wrapper [#113]
    • ViT comparison benchmark [#117]
    Source code(tar.gz)
    Source code(zip)
  • v0.0.5(Nov 18, 2021)

  • v0.0.4(Nov 17, 2021)

    • Fixing causality not being respected by the scaled dot product attention
    • Fixing Favor causal trainability
    • Enabling FusedLayerNorm by default if Triton is available
    • Fixing Favor with fp16
    Source code(tar.gz)
    Source code(zip)
  • v0.0.3(Nov 5, 2021)

  • v0.0.2(Nov 1, 2021)

    [0.0.2] - 2021-11-01

    Fixed

    • More robust blocksparse [#24]

    Added

    • Rotary embeddings [#32]
    • More flexible layernorm [#50]
    • More flexible blockfactory config (key deduplication)
    Source code(tar.gz)
    Source code(zip)