Run Effective Large-Batch Contrastive Learning on a Limited-Memory GPU

Overview

Gradient Cache

Gradient Cache is a simple technique for scaling contrastive learning batches far beyond the GPU memory constraint. This means training that used to require heavy hardware, e.g. 8 V100 GPUs, can be done on a single GPU. In addition, Gradient Cache allows users to replace large-memory GPUs with much more cost-efficient, high-FLOP, low-memory cards.

This repo holds a generic PyTorch implementation of Gradient Cache, described in our paper Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup.

@inproceedings{gao2021scaling,
     title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
     author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
     booktitle={Proceedings of the 6th Workshop on Representation Learning for NLP},
     year={2021},
}

Gradient Cache has also been integrated into dense passage retrieval (DPR). Check out our GC-DPR toolkit.

Installation

The package depends only on pytorch>=1.6. To install, clone this repo and run pip.

git clone https://github.com/luyug/GradCache
cd GradCache
pip install .

For development,

pip install --editable .

Usage

Gradient caching functionality is implemented in the GradCache class. If you are developing a new project instead of patching an old one, also check out the functional approach below, which can reduce integration effort.

Initialization

The class's __init__ method defines the cache and has several functional parameters (*_fn) for easy adjustment of model behaviors. Alternatively, you can also subclass GradCache.

grad_cache.GradCache(  
  models: List[nn.Module],  
  chunk_sizes: Union[int, List[int]],  
  loss_fn: Callable[..., Tensor],  
  split_input_fn: Callable[[Any, int], Any] = None,  
  get_rep_fn: Callable[..., Tensor] = None,  
  fp16: bool = False,  
  scaler: GradScaler = None,  
)

models - A list of encoder models to be updated with the Gradient Cache.

chunk_sizes - An integer indicating the chunk size, or a list of integers giving a chunk size for each model. This controls, for each model, the sub-batch size used for the forward-backward pass and should be set based on available GPU memory. A value that is too small will leave the GPU under-utilized.

loss_fn - A loss function that takes as many representation tensors as there are models in models, plus an arbitrary number of keyword arguments. It should compute the loss from the input tensors and must in no case modify the input tensors' relations in the autograd graph, which are later relied upon to create the gradient cache.
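For illustration, such a loss function might look like the minimal sketch below (an in-batch softmax contrastive loss; the reduction keyword is an example of an argument forwarded through loss_kwargs; the library also ships ready-made losses in grad_cache.loss):

import torch
import torch.nn.functional as F

def simple_contrastive_loss(x_reps, y_reps, reduction='mean'):
    # x_reps, y_reps: [batch, dim] representations from the two encoders.
    # The i-th x is assumed to match the i-th y; all other pairs are in-batch negatives.
    target = torch.arange(x_reps.size(0), device=x_reps.device)
    scores = torch.matmul(x_reps, y_reps.transpose(0, 1))
    return F.cross_entropy(scores, target, reduction=reduction)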

split_input_fn - An optional function that splits a generic model input into chunks based on the defined chunk_sizes. If not provided, the class will try its best to split inputs of supported types. See the split_inputs function.

get_rep_fn - An optional function that takes the generic model output and returns the representation tensor. If not provided, the generic output is assumed to be the representation tensor.

fp16 - If True, run mixed precision training, which requires scaler to also be set.

scaler - A GradScaler object for automatic mixed precision training.
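As an example, enabling mixed precision might look like the following sketch (assuming the standard torch.cuda.amp workflow; encoder1, encoder2 and loss_fn are placeholders defined as in the examples below):

from torch.cuda.amp import GradScaler
from grad_cache import GradCache

scaler = GradScaler()
gc = GradCache(
    models=[encoder1, encoder2],
    chunk_sizes=2,
    loss_fn=loss_fn,
    fp16=True,
    scaler=scaler,
)

The optimizer update then goes through the same scaler, i.e. scaler.step(optimizer) followed by scaler.update().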

Cache Gradient Step

To run a cached gradient computation step, call the cache_step function,

cache_step(  
  *model_inputs,  
  no_sync_except_last: bool = False,  
  **loss_kwargs  
)

Runs a single gradient cache step. Upon function return, gradients are populated on the weights of each model in self.models, as if the model_inputs had been run as one huge batch on sufficiently large hardware. Calling a GradCache object via __call__ will also invoke this function.

model_inputs - List of inputs to each encoder model. Should be in the same order as self.models.

no_sync_except_last - If True, under a distributed setup, gradient reduction across processes is triggered only for the last sub-batch's forward-backward pass of each model. This comes in handy when dealing with a) large models and/or b) a non-trivial number of sub-batches.

loss_kwargs - Additional keyword arguments passed to the loss function loss_fn. This is intended to enable flexible loss computation (thanks to PyTorch's dynamic graph), such as reduction, weighting, etc. Using loss_kwargs, you can potentially incorporate outputs from encoder models not tracked by the cache.

Return - loss, the current step's loss as a scalar tensor (detached from the graph).
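For intuition, the idea behind a cached step can be summarized by the simplified sketch below. This is not the library's actual code (which additionally handles random states, multiple models, input splitting, etc.), and model, chunks, chunk_size and loss_fn are placeholders; it only shows the core two-pass pattern:

# Simplified sketch of one gradient cache step for a single model.
reps = []
with torch.no_grad():                          # pass 1: build full-batch representations
    for chunk in chunks:
        reps.append(model(chunk))
reps = torch.cat(reps).requires_grad_()        # treat representations as graph leaves
loss = loss_fn(reps)                           # loss over the full (large) batch
loss.backward()                                # populates reps.grad -- the gradient cache
for chunk, rep_grad in zip(chunks, reps.grad.split(chunk_size)):
    model(chunk).backward(gradient=rep_grad)   # pass 2: back-prop cached gradients per chunk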

Natively Supported Input Types

  • x: Tensor - will be passed in as model(x)
  • x: List[Tensor] - will be passed in as model(*x)
  • x: Dict[str, Tensor] (or UserDict[str, Tensor]) - will be passed in as model(**x)
  • x: Tuple[List[Tensor], Dict[str, Tensor]] - will be passed in as model(*x[0], **x[1])

Other generic inputs are not fully supported; we perform the model call using the following heuristics,

  • x: List[Any] - will be passed in as model(*x)
  • x: Dict[str, Any] - will be passed in as model(**x)
  • x: Tuple[List[Any], Dict[str, Any]] - will be passed in as model(*x[0], **x[1])

To run with them, split_input_fn should be specified during cache initialization to break these inputs into smaller batches. In some rare cases, you may also need to override get_input_tensors when its heuristic cannot grab enough tensors to cover all the CUDA devices that hold tensors in the input.
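For example, a custom split_input_fn for a plain dict of tensors could be sketched as follows (illustrative only; inputs of the natively supported types are already handled by the built-in split_inputs):

def split_dict_input(model_input, chunk_size):
    # Split every tensor along dim 0 and regroup into a list of sub-batch dicts.
    keys = list(model_input.keys())
    chunked = [t.split(chunk_size, dim=0) for t in model_input.values()]
    return [dict(zip(keys, vals)) for vals in zip(*chunked)]

It would then be passed as split_input_fn=split_dict_input when constructing GradCache.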

Example Usage with Huggingface Transformers

Learning a Bi-encoder

Say we want to learn an embedding space of labels and text. Consider the following four pairs. (In practice, you will have many more and much longer text entries.)

labels = ['fruit', 'meat', 'school', 'company']
texts = [
  'this is an apple', 
  'steak should be cooked medium rare', 
  'cmu is pittsburgh', 
  'apple sells laptop'
]

Initialize our encoder models,

from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder1 = AutoModel.from_pretrained("bert-base-uncased").cuda()
encoder2 = AutoModel.from_pretrained("bert-base-uncased").cuda()

Initialize the GradCache object,

from grad_cache import GradCache
from grad_cache.loss import SimpleContrastiveLoss

loss_fn = SimpleContrastiveLoss()
gc = GradCache(
  models=[encoder1, encoder2], 
  chunk_sizes=2, 
  loss_fn=loss_fn, 
  get_rep_fn=lambda v: v.pooler_output
)

Here we use the get_rep_fn argument to specify a function that takes the generic Huggingface model output and returns the actual representation tensor.

Create model input,

xx = tokenizer(labels, return_tensors='pt', padding=True)
yy = tokenizer(texts, return_tensors='pt', padding=True)

Run a cache step,

gc(xx, yy, reduction='mean')

Here we pass reduction='mean' through loss_kwargs to control the loss behavior. With a defined optimizer, the full gradient update can be done as,

optimizer.zero_grad()
gc(xx, yy, reduction='mean')
optimizer.step()
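Wrapped in a loop over a DataLoader, the whole thing might look like the sketch below (loader, optimizer and device movement are assumptions outside this example):

for batch_labels, batch_texts in loader:       # hypothetical DataLoader yielding (labels, texts)
    xx = tokenizer(batch_labels, return_tensors='pt', padding=True)
    yy = tokenizer(batch_texts, return_tensors='pt', padding=True)
    # move xx / yy to the encoders' device here if needed

    optimizer.zero_grad()
    loss = gc(xx, yy, reduction='mean')        # cached forward-backward over the full batch
    optimizer.step()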

Use Tied Encoder?

This is naturally handled by the (magic of) dynamic graph. You pass shallow copies of the same encoder model to the GradCache init method.

tied_encoder = AutoModel.from_pretrained("bert-base-uncased").cuda()
gc = GradCache(
  models=[tied_encoder, tied_encoder], 
  chunk_sizes=2, 
  loss_fn=loss_fn, 
  get_rep_fn=lambda v: v.pooler_output
)

Under the hood, distinct hooks will be registered to ensure correct gradient computation.

Distributed Training with Multiple GPUs?

We expect cross-process communication of representations to be handled by the loss_fn.

from grad_cache.loss import DistributedContrastiveLoss
loss_fn_dist = DistributedContrastiveLoss()

Properly wrap the encoder models for gradient reduction,

from torch.nn.parallel import DistributedDataParallel

encoder1_ddp = DistributedDataParallel(
    encoder1, device_ids=[local_rank], output_device=local_rank, find_unused_parameters=True)
encoder2_ddp = DistributedDataParallel(
    encoder2, device_ids=[local_rank], output_device=local_rank, find_unused_parameters=True)
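Here local_rank is the process's local GPU index; a typical setup run before the wrapping above, assuming the standard torchrun launch workflow, is sketched below:

import os
import torch
import torch.distributed as dist

dist.init_process_group(backend='nccl')        # one process per GPU, launched e.g. via torchrun
local_rank = int(os.environ['LOCAL_RANK'])     # set by the launcher
torch.cuda.set_device(local_rank)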

You can then initialize the cache using the distributed loss and the DDP-wrapped models,

gc = GradCache(
  models=[encoder1_ddp, encoder2_ddp], 
  chunk_sizes=2, 
  loss_fn=loss_fn_dist, 
  get_rep_fn=lambda v: v.pooler_output
)

Run a cache step,

gc(xx, yy, no_sync_except_last=True, reduction='mean')

Set no_sync_except_last=True to avoid unnecessary gradient reduction.

Functional Approach

Decorators

If you are developing a new project, we recommend also checking out the decorators we provide for creating higher-order functions for caching.

grad_cache.functional.cached(func: Callable[..., Tensor])

A decorator that turns a model call function into a cache-compatible version.

func - A function that calls the model and returns the representation tensor.

Return - A function that returns 1) the representation leaf tensors for cache construction and 2) a closure function for the second forward and the cached backward pass. Call 2) with 1) as its argument after calling backward on the loss tensor.

grad_cache.functional.cat_input_tensor(func: Callable[..., Tensor])

A decorator that concatenates positional and keyword arguments of type List[Tensor] into a single Tensor along the 0th dimension. This can come in handy when dealing with representation tensors produced by multiple cached forward passes.

func - A loss function

Return - Decorated loss function for cached results.

Usage

The functional decorators are particularly useful if your data loader emits small batches from which you can construct the big batch. Say you also want to use automatic mixed precision; we first define the model call function and the loss function,

from grad_cache.functional import cached, cat_input_tensor

import torch
import torch.nn.functional as F
from torch.cuda.amp import autocast

@cached
@autocast()
def call_model(model, input):
    return model(**input).pooler_output

@cat_input_tensor
@autocast()
def contrastive_loss(x, y):
    # Step through y so that the i-th x is matched with the first entry of the i-th group of y
    # (this also covers the plain 1-to-1 case where x and y have the same size).
    target = torch.arange(0, y.size(0), int(y.size(0) / x.size(0)), device=x.device)
    scores = torch.matmul(x, y.transpose(0, 1))
    return F.cross_entropy(scores, target=target)

Say you have a DataLoader loader emitting small batches of tuples (xx, yy) of size (M * N), and you want to train by aggregating 16 small batches to get a batch of (16M * 16N),

cache_x = []
cache_y = []
closures_x = []
closures_y = []

for step, sub_batch in enumerate(loader):  
    xx, yy = sub_batch
    rx, cx = call_model(bert, xx)
    ry, cy = call_model(bert, yy)
    
    cache_x.append(rx)
    cache_y.append(ry)
    closures_x.append(cx)
    closures_y.append(cy)

    if (step + 1) % 16 == 0:
        loss = contrastive_loss(cache_x, cache_y)
        scaler.scale(loss).backward()

        for f, r in zip(closures_x, cache_x):
            f(r)
        for f, r in zip(closures_y, cache_y):
            f(r)

        cache_x = []
        cache_y = []
        closures_x = []
        closures_y = []

        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad()

Code Structure

grad_cache/grad_cache.py - Defines the GradCache class. The code is under 300 lines including comments. For development, we encourage you to read through it.

grad_cache/functional.py - Defines decorators that create higher-order functions for gradient caching from ordinary model call functions and loss functions.

Comments
  • TypeError at grad_cache/functional.py:39

    I tried to follow your example in the functional approach.

    But, I got the following TypeError.


    TypeError                                 Traceback (most recent call last)
    Input In [102], in <cell line: 1>()
          1 for f, r in zip(closures_x, cache_x):
    ----> 2     f(r)

    File ~/git/GradCache/src/grad_cache/functional.py:39, in cached.<locals>.cache_func.<locals>.forward_backward_func(cache_reps)
         36     cache_reps = (cache_reps,)
         37 assert len(reps) == len(cache_reps)
    ---> 39 surrogate = sum(map(lambda u, v: torch.dot(u.flatten(), v.grad.flatten()), zip(reps, cache_reps)), 0)
         40 surrogate.backward()

    TypeError: <lambda>() missing 1 required positional argument: 'v'

    Is it right to change the 39th line like this? surrogate = sum(map(lambda u, v: torch.dot(u.flatten(), v.grad.flatten()), reps, cache_reps), 0)

    opened by syoungbaak 4
  • Compatibility with Huggingface Trainer

    Hi,

    First, congratulations on your nice and clear work.

    I just wonder whether this code could be used with the Huggingface Trainer.

    I think it is a bit tricky.

    Thanks!

    opened by sh0416 2
  • AttributeError: 'GCTrainer' object has no attribute 'scaler'

    Hi @luyug, any idea on how to fix this?

    04/14/2022 15:48:04 - INFO - tevatron.trainer - Initializing Gradient Cache Trainer
    Traceback (most recent call last):
      File "/home/odunayo/anaconda3/envs/tevatron_env/lib/python3.9/runpy.py", line 197, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "/home/odunayo/anaconda3/envs/tevatron_env/lib/python3.9/runpy.py", line 87, in _run_code
        exec(code, run_globals)
      File "/lustre07/scratch/odunayo/tevatron/src/tevatron/driver/train.py", line 103, in <module>
        main()
      File "/lustre07/scratch/odunayo/tevatron/src/tevatron/driver/train.py", line 84, in main
        trainer = trainer_cls(
      File "/lustre07/scratch/odunayo/tevatron/src/tevatron/trainer.py", line 105, in __init__
        scaler=self.scaler
    AttributeError: 'GCTrainer' object has no attribute 'scaler'

    opened by ToluClassics 1
  • Requirements of the python env?

    Hi Luyu,

    thank you very much for the very nice and clean repository! :-)

    I was playing with GradCache to get it working with a CLIP setup, and I needed some trial and error to find the right Python version. With 3.6, from contextlib import nullcontext didn't work, but with 3.8 I was able to get it to run.

    Therefore, I just wanted to ask you, if you have more information on your development environment, e.g., python version, etc., so I can compare it to mine.

    Kind regards Michael

    opened by MicPie 1
  • Great work! Helped creating sota embeddings

    Just wanted to thank you for your great work! I used GradCache to build state-of-the-art sentence embeddings (https://arxiv.org/abs/2202.08904). Thanks to GradCache, I could scale up batch sizes from 48 to 1024 for the model trained on NLI, improving its average performance on USEB by ~4%.

    opened by Muennighoff 0
  • Tiny numerical differences, Weight updates not perfectly matching

    Hi, thanks for this amazing library.

    I saw one tiny issue, which is that the final weights of the model are different when training with multiple sub_batches per step vs 1 big_batch per step. I'm not sure if such numerical differences are expected when using this library.

    I'm using CLIP with a contrastive loss. Here's my quick experimental code, which I made sure to run multiple times, and it results in exactly the same output each time (note: I'm using CLIP with 151 million parameters and a dataset of only 32 samples for experimental purposes):

    model1 = train_clip_normally(epochs=1, batch_size=16)
    model2 = train_clip_gradcache(epochs=1, batch_size=8, batches_per_backward=2)
    print(calc_model_param_difference(model1, model2)) # RETURN: 0.3163
    

    Above we see that training on two sub_batches of 8 vs training on 1 batch of 16 gives a tiny difference in the norm of the weights of the two models.

    model1 = train_clip_normally(epochs=1, batch_size=16)
    model2 = train_clip_gradcache(epochs=1, batch_size=16, batches_per_backward=1)
    print(calc_model_param_difference(model1, model2)) # RETURN: 0
    

    Above we see that the models are equivalent when making gradcache perform a backward every batch

    model1 = train_clip_gradcache(epochs=1, batch_size=4, batches_per_backward=4)
    model2 = train_clip_gradcache(epochs=1, batch_size=8, batches_per_backward=2)
    print(calc_model_param_difference(model1, model2)) # RETURN: 0.3105
    

    Above we see the difference still exists for two different gradcache batch sizes

    However, this library is still working amazingly: if I compare it with normal training using whatever maximum batch size fits in my GPU, I get a huge difference (which is expected and exactly why I need this library), as seen below.

    model1 = train_clip_normally(epochs=1, batch_size=8)
    model2 = train_clip_gradcache(epochs=1, batch_size=8, batches_per_backward=2)
    print(calc_model_param_difference(model1, model2)) # RETURN: 363.2708
    

    Below is my code in case the problem is with it:

    def train_clip_normally(epochs, batch_size):
        dl = torch.utils.data.DataLoader(d, batch_size=batch_size, shuffle=False)
        model = MyCLIPModel("openai/clip-vit-base-patch32").to('cuda:1')
        optimizer = torch.optim.Adam(model.parameters())
        for e in range(epochs):
            cliptrain.train_epoch(model, optimizer, processor, dl)
        return model
    
    def train_clip_gradcache(epochs, batch_size, batches_per_backward):
        dl = torch.utils.data.DataLoader(d, batch_size=batch_size, shuffle=False)
        model = MyCLIPModel("openai/clip-vit-base-patch32").to('cuda:1')
        optimizer = torch.optim.Adam(model.parameters())
        for e in range(epochs):
            cliptrain.ClipModelClone.grad_cache_train(model, optimizer, processor, dl, batches_per_backward=batches_per_backward)
        return model
    
    def calc_model_param_difference(model1, model2):
        diff = 0
        for p1, p2 in zip(model1.parameters(), model2.parameters()):
            diff += torch.norm(p1.data - p2.data)
        return diff
    
    from grad_cache.functional import cached, cat_input_tensor
    
    def grad_cache_train(model, optimizer, processor, dataloader, batches_per_backward):
        cache_x = []
        cache_y = []
        closures_x = []
        closures_y = []
    
        for step, sub_batch in enumerate(dataloader):  
            inputs = processor(text=sub_batch['text'], return_tensors="pt", padding=True, truncation=True)
            inputs['input_ids'] = inputs['input_ids'].to(model.device)
            inputs['attention_mask'] = inputs['attention_mask'].to(model.device)
            inputs['pixel_values'] = sub_batch['image'].to(model.device)
            inputs['return_loss'] = True
    
            print('step', step)
            rx, cx = call_text_model(model, inputs)
            ry, cy = call_vision_model(model, inputs)
            
            cache_x.append(rx)
            cache_y.append(ry)
            closures_x.append(cx)
            closures_y.append(cy)
            
            if (step + 1) % batches_per_backward == 0:
                print('BACKWARD!')
                loss = grad_cat_loss(cache_x, cache_y, model.logit_scale)
                loss.backward()
                
                for f, r in zip(closures_x, cache_x):
                    f(r)
                for f, r in zip(closures_y, cache_y):
                    f(r)
    
                cache_x = []
                cache_y = []
                closures_x = []
                closures_y = []
            
                optimizer.step()
                optimizer.zero_grad()
    
    @cat_input_tensor
    def grad_cat_loss(text_embeds, image_embeds, logit_scale):
        sim = torch.matmul(text_embeds, image_embeds.t()) * logit_scale.exp()
        return clip_loss(sim)
    
    @cached
    def call_text_model(model, input):
        return model.forward_text(**input)
    
    @cached
    def call_vision_model(model, input):
        return model.forward_visual(**input)
    
    def clip_loss(similarity: torch.Tensor) -> torch.Tensor:
        caption_loss = contrastive_loss(similarity)
        image_loss = contrastive_loss(similarity.t())
        return (caption_loss + image_loss) / 2.0
    
    def contrastive_loss(logits: torch.Tensor) -> torch.Tensor:
        return torch.nn.functional.cross_entropy(logits, torch.arange(len(logits), device=logits.device))
    
    
    opened by Ar-Kareem 0
  • the batchsize with the gradcache

    Dear writer, your work has been very helpful to me. I want to combine it with SimCLR, but I don't know how, because GradCache does not expose the batch size, while SimCLR's loss computation needs the batch size, so I don't know how to deal with this problem. Please give me some solutions or tips if you are free.

    Thanks in advance! Anyway, thanks for your work; it solved a difficulty for me!

    opened by here101 8
  • How does this provide the same gradient as a larger batch size?

    Looking through the code, I notice that there are mini-batches consisting of just negative examples that appear to be ignored entirely. If the code ignores certain combinations, how does using GradCache do the same thing as running larger batches on larger GPUs?

    I also ran an experiment where I developed an image-text contrastive learning example with a batch size of 64. I tested using the batch size of 64 directly, and tested using GradCache with a mini-batch of 16. The batch size of 64 directly had a much better performance via linear eval than using GradCache.

    opened by sameerkhanna786 6
Owner
Luyu Gao
NLP Research Master @ LTI, CMU