This is the open-source implementation of the ICLR 2022 paper "StyleNeRF: A Style-based 3D-Aware Generator for High-resolution Image Synthesis".

Overview

StyleNeRF: A Style-based 3D-Aware Generator for High-resolution Image Synthesis
Jiatao Gu, Lingjie Liu, Peng Wang, Christian Theobalt

Project Page | Video | Demo | Paper | Data

Abstract: We propose StyleNeRF, a 3D-aware generative model for photo-realistic high-resolution image synthesis with high multi-view consistency, which can be trained on unstructured 2D images. Existing approaches either cannot synthesize high-resolution images with fine details or yield noticeable 3D-inconsistent artifacts. In addition, many of them lack control over style attributes and explicit 3D camera poses. StyleNeRF integrates the neural radiance field (NeRF) into a style-based generator to tackle the aforementioned challenges, i.e., improving rendering efficiency and 3D consistency for high-resolution image generation. We perform volume rendering only to produce a low-resolution feature map and progressively apply upsampling in 2D to address the first issue. To mitigate the inconsistencies caused by 2D upsampling, we propose multiple designs, including a better upsampler and a new regularization loss. With these designs, StyleNeRF can synthesize high-resolution images at interactive rates while preserving 3D consistency at high quality. StyleNeRF also enables control of camera poses and different levels of styles, which can generalize to unseen views. It also supports challenging tasks, including zoom-in and zoom-out, style mixing, inversion, and semantic editing.
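To make the two-stage design concrete, here is a schematic sketch of the rendering path described above; every name in it (render_frame, nerf, upsamplers, to_rgb) is an illustrative placeholder, not the actual StyleNeRF code:

def render_frame(mapping, nerf, upsamplers, to_rgb, z, camera):
    # Map the latent code to style vectors, as in any style-based generator.
    ws = mapping(z)
    # Step 1: volume rendering only at low resolution, producing a feature map
    # rather than a final RGB image (this is what keeps rendering affordable).
    features = nerf(ws, camera)              # e.g. [batch, channels, 32, 32]
    # Step 2: progressive 2D upsampling of the feature map to the target
    # resolution; the improved upsampler and the regularization loss are what
    # mitigate the 3D inconsistencies this step would otherwise introduce.
    for upsample in upsamplers:
        features = upsample(features, ws)    # 32 -> 64 -> ... -> 1024
    # Convert the final feature map to RGB.
    return to_rgb(features, ws)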

Requirements

The codebase is tested on

  • Python 3.7
  • PyTorch 1.7.1
  • 8 Nvidia GPUs (Tesla V100, 32GB) with CUDA version 11.0

For additional Python libraries, please install them with:

pip install -r requirements.txt

Please refer to https://github.com/NVlabs/stylegan2-ada-pytorch for additional software/hardware requirements.

Dataset

We follow the same dataset format as StyleGAN2-ADA, which can be either an image folder or a zip archive.
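To build such a zip archive from a folder of raw images, you can use the dataset_tool.py that this codebase inherits from stylegan2-ada-pytorch (the paths below are placeholders; --width/--height optionally resize while packing):

python dataset_tool.py --source=${IMAGE_FOLDER} --dest=${OUTDIR}/dataset.zip --width=512 --height=512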

Pretrained Checkpoints

You can download the pre-trained checkpoints (used in our paper) and some recent variants trained with the current codebase as follows:

Dataset   Resolution   #Params (M)   Config    Download
FFHQ      256          128           Default   Hugging Face 🤗
FFHQ      512          148           Default   Hugging Face 🤗
FFHQ      1024         184           Default   Hugging Face 🤗

(I am slowly adding more checkpoints. Thanks for your very kind patience!)
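If you prefer to fetch a checkpoint programmatically, the huggingface_hub client can download files from a model repo. The repo_id and filename below are placeholders; substitute the values from the Hugging Face pages linked above:

from huggingface_hub import hf_hub_download

# Placeholder identifiers: take the real repo_id/filename from the model page.
checkpoint_path = hf_hub_download(repo_id='facebook/StyleNeRF', filename='ffhq_512.pkl')
print(checkpoint_path)  # local cache path of the downloaded checkpoint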

Train a new StyleNeRF model

python run_train.py outdir=${OUTDIR} data=${DATASET} spec=paper512 model=stylenerf_ffhq

It will automatically detect all usable GPUs.

Please check the configuration files under conf/model and conf/spec. You can always add your own model config. For more details on using hydra configurations, please follow https://hydra.cc/docs/intro/.
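Any value in those config files can also be overridden directly on the command line using hydra's key=value syntax; the spec.gamma key below is a hypothetical example of the pattern (check conf/spec for the real keys), and CUDA_VISIBLE_DEVICES restricts which GPUs are picked up:

CUDA_VISIBLE_DEVICES=0,1 python run_train.py outdir=${OUTDIR} data=${DATASET} spec=paper512 model=stylenerf_ffhq spec.gamma=10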

Render the pretrained model

python generate.py --outdir=${OUTDIR} --trunc=0.7 --seeds=${SEEDS} --network=${CHECKPOINT_PATH} --render-program="rotation_camera"

It supports different rotation trajectories for rendering new videos.
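If you want to script generation yourself rather than call generate.py, the checkpoints can be loaded with the StyleGAN2-ADA-style loader bundled in this codebase. The sketch below covers loading and latent sampling only; it omits the camera-conditioned synthesis call, whose exact signature should be taken from generate.py:

import torch
import dnnlib
import legacy  # ships with the codebase

device = torch.device('cuda')
with dnnlib.util.open_url('${CHECKPOINT_PATH}') as f:
    G = legacy.load_network_pkl(f)['G_ema'].to(device)  # EMA generator weights

z = torch.randn([1, G.z_dim], device=device)   # random latent code
ws = G.mapping(z, None, truncation_psi=0.7)    # style codes with truncation
# Synthesis additionally takes camera parameters in StyleNeRF; see generate.py.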

Run a demo page

python web_demo.py 21111

By default, it runs a Gradio-powered demo at https://localhost:21111
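Under the hood this is ordinary Gradio usage; below is a minimal sketch of the pattern (the generate_image stub and argument handling are illustrative assumptions, not the actual web_demo.py):

import sys
import numpy as np
import gradio as gr

def generate_image(seed):
    # Stub: the real demo would sample a latent from the seed and render it
    # with the StyleNeRF generator; here we just return random pixels.
    rng = np.random.RandomState(int(seed))
    return rng.randint(0, 255, size=(256, 256, 3), dtype=np.uint8)

gr.Interface(fn=generate_image, inputs='number', outputs='image').launch(server_port=int(sys.argv[1]))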

[NEW] The demo is also integrated into Hugging Face Spaces 🤗 using Gradio. Try out the web demo: Hugging Face Spaces

Run a GUI visualizer

python visualizer.py

An interactive application will show up for users to play with.

Citation

@inproceedings{
    gu2022stylenerf,
    title={StyleNeRF: A Style-based 3D Aware Generator for High-resolution Image Synthesis},
    author={Jiatao Gu and Lingjie Liu and Peng Wang and Christian Theobalt},
    booktitle={International Conference on Learning Representations},
    year={2022},
    url={https://openreview.net/forum?id=iUuzzTMUw9K}
}

License

Copyright © Facebook, Inc. All Rights Reserved.

The majority of StyleNeRF is licensed under CC-BY-NC; however, portions of this project are available under separate license terms: all code used or modified from stylegan2-ada-pytorch is under the Nvidia Source Code License.

Issues
  • Training fails on multi-gpu setup

    Hello StyleNeRF folks, thank you so much for releasing the code!

I am trying to train the model on an 8×A6000 box with no success so far.

    python run_train.py outdir=/root/out data=/root/256.zip spec=paper256 model=stylenerf_ffhq

I have validated that a single A6000 GPU does work, and I've also used the provided configs.

I am running Ubuntu 20.04.3 LTS with PyTorch LTS (1.8.2) and CUDA 11.1 (which is necessary for A6000 support, AFAIK).

    Here is the stack trace I am getting, lmk if I can provide any additional information:

    Error executing job with overrides: ['outdir=/root/out', 'data=/root/P256.zip', 'spec=paper256', 'model=stylenerf_ffhq']
    Traceback (most recent call last):
      File "run_train.py", line 396, in <module>
        main() # pylint: disable=no-value-for-parameter
      File "/usr/local/lib/python3.8/dist-packages/hydra/main.py", line 49, in decorated_main
        _run_hydra(
      File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 367, in _run_hydra
        run_and_report(
      File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 214, in run_and_report
        raise ex
      File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 211, in run_and_report
        return func()
      File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 368, in <lambda>
        lambda: hydra.run(
      File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/hydra.py", line 110, in run
        _ = ret.return_value
      File "/usr/local/lib/python3.8/dist-packages/hydra/core/utils.py", line 233, in return_value
        raise self._return_value
      File "/usr/local/lib/python3.8/dist-packages/hydra/core/utils.py", line 160, in run_job
        ret.return_value = task_function(task_cfg)
      File "run_train.py", line 378, in main
        torch.multiprocessing.spawn(fn=subprocess_fn, args=(args,), nprocs=args.num_gpus)
      File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 230, in spawn
        return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
      File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
        while not context.join():
      File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 150, in join
        raise ProcessRaisedException(msg, error_index, failed_process.pid)
    torch.multiprocessing.spawn.ProcessRaisedException:
    
    -- Process 5 terminated with the following error:
    Traceback (most recent call last):
      File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
        fn(i, *args)
      File "/root/StyleNeRF/run_train.py", line 302, in subprocess_fn
        training_loop.training_loop(**args)
      File "/root/StyleNeRF/training/training_loop.py", line 221, in training_loop
        module = torch.nn.parallel.DistributedDataParallel(
      File "/usr/local/lib/python3.8/dist-packages/torch/nn/parallel/distributed.py", line 448, in __init__
        self._ddp_init_helper()
      File "/usr/local/lib/python3.8/dist-packages/torch/nn/parallel/distributed.py", line 603, in _ddp_init_helper
        self.reducer = dist.Reducer(
    RuntimeError: replicas[0][0] in this process with strides [60, 1, 1, 1] appears not to match strides of the same param in process 0.
    
    opened by mike-athene 7
  • some error when inference

Hi! Thanks for your amazing work! I tried to render your pretrained model as described in https://github.com/facebookresearch/StyleNeRF/#render-the-pretrained-model using your ffhq_512.pkl, but it failed. The error message is as follows:

    legacy.py", line 21, in load_network_pkl
        data = _LegacyUnpickler(f).load()
    _pickle.UnpicklingError: invalid load key, 'v'.
    

Is there anything wrong with your Hugging Face pretrained model, or is it something else? Looking forward to your reply!

    opened by 41xu 4
  • Can't get anything to work after installation seemed to go smoothly

So I'm a little bit of a noob when it comes to installing things like this (although I have done so successfully in the past). I followed all of the instructions, and everything seemed to install fine with no issues.

    After installing the requirements, none of the other commands seem to work.

    Is there a detailed guide anywhere as to how to get everything set up correctly?

    opened by Wythneth 4
  • Request to release pretrained models

We're glad you finally released the code; this is great work. Could you release the pretrained models, especially the CompCars model, for a better experience?

    opened by huangqiusheng 4
  • Difference between Generator in run_train and the one used to train pretrained checkpoints

I get this error when loading the pretrained network; the size of a layer was changed: Error loading: synthesis.fg_nerf.feat_out.weight torch.Size([64, 128, 1, 1]) torch.Size([256, 128, 1, 1])

Where can I modify the structure of the generator to match the one in the pretrained checkpoint?

    opened by KyriaAnnwyn 3
  • Bug? Wrong shape

    https://github.com/facebookresearch/StyleNeRF/blob/03d3800500385fffeaa2df09fca649edb001b0bb/apps/inversion.py#L119

If we set encoder_z=True, the shape of the zs output from E is [1, 17, 512], but the mapping network can only accept 2-dimensional input ([1, 512]). Unlike StyleGAN, the per-layer z vectors in zs are not all the same (zs[:,0,:] != zs[:,1,:]), so we cannot squeeze zs from [1, 17, 512] into [1, 512].

    opened by lelechen63 2
  • Training config of EG3D

Thanks for releasing the code! I tried training EG3D from scratch following the config file "stylenerf_ffhq_eg3d", but it does not converge. How should I change the config file?

    opened by MrTornado24 2
  • How to get the high resolution result

Hello, I have an issue when training my own model with my dataset. The resolution of my dataset is 1024×1024, but the output of my model turned out to be 32×32, and after about 10 hours of training it reached 64×64. I don't understand this resolution growth; how can I get a high-resolution result at the same resolution as my training dataset? Can you help me? Thank you!

    opened by apriljt 1
  • StyleGan3

Hello! I'm trying to use the web app with a StyleGAN3 generator, but I get an error: TypeError: __init__() got an unexpected keyword argument 'channel_base'.

    opened by hadhoryth 1
  • Change Facial expressions?

I was wondering if the visualizer allows you to change the facial expressions? I messed around with the demo and it's fine, but 90 percent of the faces are smiling or showing teeth, which is not what I want. I tried to look at the image of the visualizer GUI, but I couldn't tell if there was any input to change the facial expression, so I figured I would ask before I go through all the effort to install StyleNeRF. Thanks, everyone.

    opened by Echolink50 1
  • Question for Upsample operation (Equation 7 in paper)

Thanks for the great work. However, I have a question about the upsample operation. In the released code, the Upsample operation appears to be the following:

    https://github.com/facebookresearch/StyleNeRF/blob/03d3800500385fffeaa2df09fca649edb001b0bb/training/networks.py#L481-L491

Why is the above code equivalent to the upsample operation described in the paper? Looking forward to your response.

    opened by LeoXing1996 1
  • small dataset

Hi, @MultiPath. Good job! When I trained StyleGAN2-ADA on a small dataset, the result was acceptable, but the same did not hold when I used StyleNeRF.

Which solution do you recommend to get good performance? I mean, is a huge dataset always strongly required, or is there an appropriate training configuration for small datasets? Thank you in advance.

    opened by Harry-KIT 0
  • Wrong parameter count for StyleNeRF checkpoints?

The README indicates parameter counts of 128M, 153M, and 184M for the FFHQ models at 256, 512, and 1024 resolution, respectively. But when I load the checkpoints in Colab, I see that the 256-resolution model has only 5.2 million parameters. What is the cause of this discrepancy?

    opened by ksagoog 0
  • No block_kwargs for freezed layers

In run_train.py, line 249: args.D_kwargs.block_kwargs.freeze_layers = cfg.freezed

I'm getting the error omegaconf.errors.ConfigAttributeError: Missing key block_kwargs when I set a non-zero value for freezed layers.

    How can I freeze some layers?

    opened by KyriaAnnwyn 0
  • TypeError: cannot serialize '_io.BufferedReader' object

Hi, first of all, thanks to the authors for such excellent work. Now I want to do some experiments on top of your work.

However, when I rewrote the Dataset class, a baffling exception occurred:

    Traceback (most recent call last):
      File "run_train.py", line 377, in main
        torch.multiprocessing.spawn(fn=subprocess_fn, args=(args,), nprocs=args.num_gpus)
      File "/home/notebook/code/personal/80299039/conda/envs/StyleNeRF/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 199, in spawn
        return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
      File "/home/notebook/code/personal/80299039/conda/envs/StyleNeRF/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 157, in start_processes
        while not context.join():
      File "/home/notebook/code/personal/80299039/conda/envs/StyleNeRF/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 118, in join
        raise Exception(msg)
    Exception: 
    
    -- Process 0 terminated with the following error:
    Traceback (most recent call last):
      File "/home/notebook/code/personal/80299039/conda/envs/StyleNeRF/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
        fn(i, *args)
      File "/home/notebook/code/personal/80299039/MoFaStyleNeRF/run_train.py", line 301, in subprocess_fn
        training_loop.training_loop(**args)
      File "/home/notebook/code/personal/80299039/MoFaStyleNeRF/training/training_loop.py", line 150, in training_loop
        dataset=training_set, sampler=training_set_sampler, batch_size=batch_size//world_size, **data_loader_kwargs))
      File "/home/notebook/code/personal/80299039/conda/envs/StyleNeRF/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 352, in __iter__
        return self._get_iterator()
      File "/home/notebook/code/personal/80299039/conda/envs/StyleNeRF/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 294, in _get_iterator
        return _MultiProcessingDataLoaderIter(self)
      File "/home/notebook/code/personal/80299039/conda/envs/StyleNeRF/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 801, in __init__
        w.start()
      File "/home/notebook/code/personal/80299039/conda/envs/StyleNeRF/lib/python3.7/multiprocessing/process.py", line 112, in start
        self._popen = self._Popen(self)
      File "/home/notebook/code/personal/80299039/conda/envs/StyleNeRF/lib/python3.7/multiprocessing/context.py", line 223, in _Popen
        return _default_context.get_context().Process._Popen(process_obj)
      File "/home/notebook/code/personal/80299039/conda/envs/StyleNeRF/lib/python3.7/multiprocessing/context.py", line 284, in _Popen
        return Popen(process_obj)
      File "/home/notebook/code/personal/80299039/conda/envs/StyleNeRF/lib/python3.7/multiprocessing/popen_spawn_posix.py", line 32, in __init__
        super().__init__(process_obj)
      File "/home/notebook/code/personal/80299039/conda/envs/StyleNeRF/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
        self._launch(process_obj)
      File "/home/notebook/code/personal/80299039/conda/envs/StyleNeRF/lib/python3.7/multiprocessing/popen_spawn_posix.py", line 47, in _launch
        reduction.dump(process_obj, fp)
      File "/home/notebook/code/personal/80299039/conda/envs/StyleNeRF/lib/python3.7/multiprocessing/reduction.py", line 60, in dump
        ForkingPickler(file, protocol).dump(obj)
    TypeError: cannot serialize '_io.BufferedReader' object
    

Note that I can run your code without a hitch, regardless of whether I use a single GPU or 8 GPUs, so I guess the problem comes from the Dataset class I have rewritten. Here is the rewritten Dataset:

    class ImageParamFolderDataset(Dataset):
        def __init__(self,
            path,                   # Image path to directory or zip.
            param_path,             # Param path to directory or zip.
            resolution      = None, # Ensure specific resolution, None = highest available.
            **super_kwargs,         # Additional arguments for the Dataset base class.
        ):
            self._path = path
            self._param_path = param_path
            self._zipfile = None
            self._param_zipfile = None
    
            if os.path.isdir(self._path):
                self._type = 'dir'
                self._all_fnames = {os.path.relpath(os.path.join(root, fname), start=self._path) for root, _dirs, files in os.walk(self._path) for fname in files}
                self._all_pnames = {os.path.relpath(os.path.join(root, fname), start=self._param_path) for root, _dirs, files in os.walk(self._param_path) for fname in files}
            elif self._file_ext(self._path) == '.zip':
                self._type = 'zip'
                self._all_fnames = set(self._get_zipfile().namelist())
                self._all_pnames = set(self._get_param_zipfile().namelist())
            else:
                raise IOError('Path must point to a directory or zip')
    
            PIL.Image.init()
            self._image_fnames = sorted(fname for fname in self._all_fnames if self._file_ext(fname) in PIL.Image.EXTENSION)
            self._param_fnames = sorted(pname for pname in self._all_pnames if self._file_ext(pname) == '.mat')
            if len(self._image_fnames) == 0:
                raise IOError('No image files found in the specified path')
            if len(self._param_fnames) == 0:
                raise IOError('No param files found in the specified path')
            if len(self._image_fnames) != len(self._param_fnames):
                raise IOError('Num of image files and num of param files are not equal')
    
            name = os.path.splitext(os.path.basename(self._path))[0]
            raw_shape = [len(self._image_fnames)] + list(self._load_raw_image_param(0)[0].shape)
            if resolution is not None:
                raw_shape[2] = raw_shape[3] = resolution
            # if resolution is not None and (raw_shape[2] != resolution or raw_shape[3] != resolution):
            #     raise IOError('Image files do not match the specified resolution')
            super().__init__(name=name, raw_shape=raw_shape, **super_kwargs)
    
        @staticmethod
        def _file_ext(fname):
            return os.path.splitext(fname)[1].lower()
    
        def _get_zipfile(self):
            assert self._type == 'zip'
            if self._zipfile is None:
                self._zipfile = zipfile.ZipFile(self._path)
            return self._zipfile
    
        def _get_param_zipfile(self):
            assert self._type == 'zip'
            if self._param_zipfile is None:
                self._param_zipfile = zipfile.ZipFile(self._param_path)
            return self._param_zipfile
    
        def _open_file(self, fname):
            if self._type == 'dir':
                return open(os.path.join(self._path, fname), 'rb')
            if self._type == 'zip':
                return self._get_zipfile().open(fname, 'r')
            return None
    
        def _open_param_file(self, fname):
            if self._type == 'dir':
                return open(os.path.join(self._param_path, fname), 'rb')
            if self._type == 'zip':
                return self._get_param_zipfile().open(fname, 'r')
            return None
    
        def close(self):
            try:
                if self._zipfile is not None:
                    self._zipfile.close()
                if self._param_zipfile is not None:
                    self._param_zipfile.close()
            finally:
                self._zipfile = None
                self._param_zipfile = None
    
        def __getstate__(self):
            # Both zip handles wrap open file objects, which cannot be pickled
            # when DataLoader workers are spawned; clear both, not just _zipfile.
            return dict(super().__getstate__(), _zipfile=None, _param_zipfile=None)
    
        def __getitem__(self, idx):
            image, param = self._load_raw_image_param(self._raw_idx[idx])
            assert isinstance(image, np.ndarray)
            assert list(image.shape) == self.image_shape
            assert image.dtype == np.uint8
            if self._xflip[idx]:
                assert image.ndim == 3 # CHW
                image = image[:, :, ::-1]
            return image.copy(), param, self.get_label(idx), idx
    
        def _load_raw_image_param(self, raw_idx):
            fname = self._image_fnames[raw_idx]
            pname = self._param_fnames[raw_idx]
            assert os.path.splitext(fname)[0] == os.path.splitext(pname)[0], 'Path of image and param must be the same'
            with self._open_file(fname) as f:
                if pyspng is not None and self._file_ext(fname) == '.png':
                    image = pyspng.load(f.read())
                else:
                    image = np.array(PIL.Image.open(f))
            with self._open_param_file(pname) as f:
                param_dict = sio.loadmat(f)
                param = self._process_param_dict(param_dict)
            if image.ndim == 2:
                image = image[:, :, np.newaxis] # HW => HWC
            if hasattr(self, '_raw_shape') and image.shape[0] != self.resolution:  # resize input image
                image = cv2.resize(image, (self.resolution, self.resolution), interpolation=cv2.INTER_AREA)
            image = image.transpose(2, 0, 1) # HWC => CHW
            return image, param
    
        def _process_param_dict(self, param_dict):
            id = param_dict['id']; exp = param_dict['exp']
            tex = param_dict['tex']; gamma = param_dict['gamma']
            angle = param_dict['angle']; trans = param_dict['trans']
            return np.concatenate((id, exp, tex, gamma, angle, trans), axis=None)
    
        def _load_raw_labels(self):
            fname = 'dataset.json'
            if fname not in self._all_fnames:
                return None
            with self._open_file(fname) as f:
                labels = json.load(f)['labels']
            if labels is None:
                return None
            labels = dict(labels)
            labels = [labels[fname.replace('\\', '/')] for fname in self._image_fnames]
            labels = np.array(labels)
            labels = labels.astype({1: np.int64, 2: np.float32}[labels.ndim])
            return labels
    
        def get_dali_dataloader(self, batch_size, world_size, rank, gpu):  # TODO
            from nvidia.dali import pipeline_def, Pipeline
            import nvidia.dali.fn as fn
            import nvidia.dali.types as types
            from nvidia.dali.plugin.pytorch import DALIGenericIterator
            
            @pipeline_def
            def pipeline():
                jpegs, _ = fn.readers.file(
                    file_root=self._path,
                    files=list(self._all_fnames),
                    random_shuffle=True,
                    shard_id=rank, 
                    num_shards=world_size, 
                    name='reader')
                images = fn.decoders.image(jpegs, device='mixed')
                mirror = fn.random.coin_flip(probability=0.5) if self.xflip else False
                images = fn.crop_mirror_normalize(
                    images.gpu(), output_layout="CHW", dtype=types.UINT8, mirror=mirror)
                labels = np.zeros([1, 0], dtype=np.float32)
                return images, labels
            
            dali_pipe = pipeline(batch_size=batch_size//world_size, num_threads=2, device_id=gpu)
            dali_pipe.build()
            training_set_iterator = DALIGenericIterator([dali_pipe], ['img', 'label'])
            for data in training_set_iterator:
                yield data[0]['img'], data[0]['label']
    

    Could you please give some suggestions about how to fix it? Thanks in advance!

    opened by YunjieYu 1
  • How can I finetune only stylegan part and leave nerf part as is?

I want to finetune ffhq_512 on my face database, so that the generated faces are closer to my base examples and not to FFHQ, but leave the NeRF block as it is in the pretrained model. How can I freeze this part?

    opened by KyriaAnnwyn 0