Python script to download the celebA-HQ dataset from Google Drive

Overview

download-celebA-HQ

Python script to download and create the celebA-HQ dataset.

WARNING from the author: I believe this script has been broken for a few months (I have not tried it for a while). I am really sorry about that. If you fix it, please share your solution in a PR so that everyone can benefit from it.

To get the celebA-HQ dataset, you need to a) download the celebA dataset (download_celebA.py), b) download some extra files (download_celebA_HQ.py), and c) do some processing to get the HQ images (make_HQ_images.py).

The size of the final dataset is 89 GB. However, you will need a bit more storage than that to be able to run the scripts.
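Before starting, it can be worth checking free space programmatically; a minimal sketch (the 150 GB figure is an assumed margin for intermediate archives and .dat files, not an exact requirement):

```python
import shutil

# The final dataset is ~89 GB; intermediate archives and .dat files need
# extra room on top of that, so the margin below is a deliberate over-estimate.
required_bytes = 150 * 10**9
free_bytes = shutil.disk_usage('.').free
print('enough space:', free_bytes >= required_bytes)
```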

Usage

  1. Clone the repository
git clone https://github.com/nperraud/download-celebA-HQ.git
cd download-celebA-HQ
  2. Install the necessary packages (because specific versions are required, Conda is recommended)
conda create -n celebaHQ python=3
source activate celebaHQ
  • Install the packages
conda install jpeg=8d tqdm requests pillow==3.1.1 urllib3 numpy cryptography scipy
pip install opencv-python==3.4.0.12 cryptography==2.1.4
  • Install 7zip (On Ubuntu)
sudo apt-get install p7zip-full
  3. Run the scripts
python download_celebA.py ./
python download_celebA_HQ.py ./
python make_HQ_images.py ./

where ./ is the directory where you wish the data to be saved.

  4. Go watch a movie; these scripts will take a few hours to run depending on your internet connection and your CPU power. The final HQ images will be saved as .npy files in the ./celebA-HQ folder.
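Once the scripts finish, each image can be read back with numpy.load. A minimal self-contained sketch (the filename and the (1024, 1024, 3) layout here are assumptions; check the shape against your own output):

```python
import numpy as np

# Simulate one saved HQ image so this snippet is self-contained; a real file
# from ./celebA-HQ would be loaded the same way (the filename is illustrative).
fake = np.zeros((1024, 1024, 3), dtype=np.uint8)
np.save('imgHQ00000.npy', fake)

img = np.load('imgHQ00000.npy')
print(img.shape, img.dtype)  # (1024, 1024, 3) uint8
```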

Windows

The script may work on Windows, though I have not tested this solution personally.

Step 2 becomes

conda create -n celebaHQ python=3
source activate celebaHQ
  • Install the packages
conda install -c anaconda jpeg=8d tqdm requests pillow==3.1.1 urllib3 numpy cryptography scipy
  • Install 7zip

The rest should be unchanged.

Docker

If you have Docker installed, skip the previous installation steps and run the following command from the root directory of this project:

docker build -t celeba . && docker run -it -v $(pwd):/data celeba

By default, this will create the dataset in the same directory. To put it elsewhere, replace $(pwd) with the absolute path to the desired output directory.

Outliers

It seems that the dataset has a few outliers. A list of problematic images is stored in bad_images.txt. Please report if you find other outliers.
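If you want to drop those images programmatically, here is a small sketch (it assumes bad_images.txt holds one filename per line, which may not match the actual file format):

```python
def load_bad_list(path):
    # One filename per line is assumed; adjust if bad_images.txt differs.
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}

# Self-contained demo with a tiny example list.
with open('bad_images.txt', 'w') as f:
    f.write('imgHQ00070.npy\nimgHQ02815.npy\n')

bad = load_bad_list('bad_images.txt')
files = ['imgHQ00069.npy', 'imgHQ00070.npy', 'imgHQ02815.npy']
kept = [name for name in files if name not in bad]
print(kept)  # ['imgHQ00069.npy']
```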

Remark

This script is likely to break somewhere, but if it executes until the end, you should obtain the correct dataset.

Sources

This code is inspired by these files

Citing the dataset

You probably want to cite the paper "Progressive Growing of GANs for Improved Quality, Stability, and Variation" that was submitted to ICLR 2018 by Tero Karras (NVIDIA), Timo Aila (NVIDIA), Samuli Laine (NVIDIA), Jaakko Lehtinen (NVIDIA and Aalto University).

Comments
  • celebA-HQ output is 3*3 image replication with 1024*1024?

    Hi nperraud, thanks for your code. I'm running make_HQ_images.py without any problem, but the result is not a single-subject 1024×1024 image but a 3×3 replication of the subject (and the image is in grayscale). Is this normal, or did I miss some steps? Here is an example: celebhq_test

    opened by HavenFeng 6
  • Windows 10 Compatibility

    To get this to work on Windows 10 I had to do the following:

    Create a new environment

    conda activate celebaHQ
    

    Install the packages

    conda install -c anaconda jpeg=8d tqdm requests pillow==3.1.1 urllib3 numpy cryptography scipy
    

    I had to add an if __name__ == '__main__': guard just before num_workers.

    if __name__ == '__main__':
        num_workers = mp.cpu_count() - 1
        print('Starting a pool with {} workers'.format(num_workers))
        with mp.Pool(processes=num_workers) as pool:
            pool.map(do_the_work, list(range(expected_dat)))
        if len(glob.glob(os.path.join(delta_dir, '*.npy'))) != 30000:
            raise ValueError('Expected to find {} npy files\n Something went wrong!'.format(30000))
        # Remove the dat files
        for filepath in glob.glob(os.path.join(delta_dir, '*.dat')):
            os.remove(filepath)
        print('All done! Congratulations!')
    
    opened by p-funk 3
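    The guard matters because Windows starts workers with the spawn method, re-importing the main module in each child; a minimal self-contained illustration of the same pattern:

```python
import multiprocessing as mp

def square(x):
    return x * x

# On Windows (spawn start method), the pool must be created under this guard;
# otherwise each re-imported worker would try to launch its own pool.
if __name__ == '__main__':
    with mp.Pool(processes=2) as pool:
        print(pool.map(square, [1, 2, 3]))  # [1, 4, 9]
```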
  • Error in make_HQ_images.py OSError: 3145728 requested and 1671040 written

    Script was working until this error:

    Traceback (most recent call last):
      File "/workspace/make_HQ_images.py", line 164, in <module>
        pool.map(do_the_work, list(range(expected_dat)))
      File "/opt/conda/lib/python3.5/multiprocessing/pool.py", line 266, in map
        return self._map_async(func, iterable, mapstar, chunksize).get()
      File "/opt/conda/lib/python3.5/multiprocessing/pool.py", line 644, in get
        raise self._value
    OSError: 3145728 requested and 1671040 written

    opened by TianaCo 3
  • Error in make_HQ_images.py: float() argument must be a string or a number, not 'Image'

    Hi,

    Thank you for this code! I have an error in make_HQ_images.py:

    Traceback (most recent call last):
      File "/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/python/3.5.4/lib/python3.5/multiprocessing/pool.py", line 119, in worker
        result = (True, func(*args, **kwds))
      File "/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/python/3.5.4/lib/python3.5/multiprocessing/pool.py", line 44, in mapstar
        return list(map(*args))
      File "make_HQ_images.py", line 157, in do_the_work
        img = process_func(img_num)
      File "make_HQ_images.py", line 115, in process_func
        img = np.pad(np.float32(img), ((pad[1], pad[3]), (pad[0], pad[2]), (0, 0)), 'reflect')
    TypeError: float() argument must be a string or a number, not 'Image'
    """

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "make_HQ_images.py", line 167, in <module>
        pool.map(do_the_work, list(range(expected_dat)))
      File "/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/python/3.5.4/lib/python3.5/multiprocessing/pool.py", line 266, in map
        return self._map_async(func, iterable, mapstar, chunksize).get()
      File "/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/python/3.5.4/lib/python3.5/multiprocessing/pool.py", line 644, in get
        raise self._value
    TypeError: float() argument must be a string or a number, not 'Image'

    Do you know where it might come from?

    Thanks in advance

    opened by tldoan 3
  • Flaws images 70 && 2815

    Hello and thanks for sharing your code! :smile:

    I've been browsing through the generated images, and most of them are of high quality. However, I found that images 70 and 2815 are not that great.

    Image 70 - half of the face is cut imghq00070

    Image 2815 - there is an eye in the mouth imghq02815

    Is it the same for you? Is it the same in the original repo (NVIDIA)? Did you find other images with small flaws?

    opened by nicolastah 3
  • sum simplification code

    sum([1 for file in os.listdir(img_dir) if file[-4:] == '.jpg'])
    

    could be simplified in a more efficient way:

    from glob import glob
    len(glob(os.path.join(img_dir,'*.jpg')))
    
    opened by Arsey 3
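    Both counting approaches agree; a quick self-contained check in a temporary directory:

```python
import os
import tempfile
from glob import glob

# Create a throwaway directory with two .jpg files and one .png file.
with tempfile.TemporaryDirectory() as img_dir:
    for name in ('a.jpg', 'b.jpg', 'c.png'):
        open(os.path.join(img_dir, name), 'w').close()
    n_listdir = sum(1 for file in os.listdir(img_dir) if file[-4:] == '.jpg')
    n_glob = len(glob(os.path.join(img_dir, '*.jpg')))
    print(n_listdir, n_glob)  # 2 2
```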
  • Convert to HQ some part of celeba dataset

    I want the celebA-HQ dataset for a task in which I only need 2000 images. I have downloaded the celebA dataset, but please tell me which deltas I should download (because I have a slow internet connection) to convert them to HQ, and what the process will be.

    opened by mabdullahrafique 2
  • Docker issue "FileNotFoundError: [Errno 2] No such file or directory: 'image_list.txt'"

    It successfully loaded the CelebA and CelebA-HQ deltas, then crashed because "image_list.txt" doesn't exist:

    Loading CelebA from ./celebA/Img/img_celeba
    Loading CelebA-HQ deltas from ./celebA-HQ
    Traceback (most recent call last):
      File "/workspace/make_HQ_images.py", line 45, in <module>
        with open(os.path.join('image_list.txt'), 'rt') as file:
    FileNotFoundError: [Errno 2] No such file or directory: 'image_list.txt'

    opened by TianaCo 2
  • Total size of the dataset

    Hello, and thanks for your work! Could you tell me what the total size of the dataset is? I need to know what kind of hard drive I should run the program on. Maybe it would be useful information to display somewhere in the README.

    opened by RomeoDespres 1
  • Dropbox links not working currently

    Hello, I am currently unable to use the Dropbox links in the download_celebA.py file. Are there any alternatives, or how else can I get the dataset?

    Thank you.

    opened by unlut 0
  • docker fails

    I was trying to run this with Docker; it didn't work. After cloning:

    $ docker build -t celeba_hq . && docker run -it -v $(pwd):/data celeba_hq
    ...
    PackagesNotFoundError: The following packages are not available from current channels:

    • pillow==3.1.1
    • jpeg=8d

    Current channels:

    • https://repo.anaconda.com/pkgs/main/linux-64
    • https://repo.anaconda.com/pkgs/main/noarch
    • https://repo.anaconda.com/pkgs/r/linux-64
    • https://repo.anaconda.com/pkgs/r/noarch

    To search for alternate channels that may provide the conda package you're looking for, navigate to

    https://anaconda.org
    

    and use the search bar at the top of the page. ...

    opened by gogobd 0
  • download_celebA.py: Checksum doesn't match

    Command: cd download-celebA-HQ/ && python download_celebA.py ./

    Output:

    Downloading img_align_celeba.zip to ./celebA/img_align_celeba.zip
    ./celebA/img_align_celeba.zip: 1.00B [00:00, 847B/s]
    Done!
    Check SHA1 ./celebA/img_align_celeba.zip
    Traceback (most recent call last):
      File "download_celebA.py", line 219, in <module>
        download_celabA(dataset_dir)
      File "download_celebA.py", line 183, in download_celabA
        filepaths = download_and_check(_ALIGNED_IMGS_DRIVE, dataset_dir)
      File "download_celebA.py", line 106, in download_and_check
        raise RuntimeError('Checksum mismatch for %s.' % save_path)
    RuntimeError: Checksum mismatch for ./celebA/img_align_celeba.zip.

    opened by madaanpulkit 3
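    The 1.00B progress line shows the "download" was a one-byte response (usually a Google Drive quota page), so the checksum can never match. The same kind of SHA1 verification can be reproduced in isolation (the helper name and the digest below are for the example bytes only, not the real archive):

```python
import hashlib

def sha1_of(path, chunk=1 << 20):
    # Stream the file so large archives don't need to fit in memory.
    h = hashlib.sha1()
    with open(path, 'rb') as f:
        for block in iter(lambda: f.read(chunk), b''):
            h.update(block)
    return h.hexdigest()

# Demo on known bytes: a tiny truncated "download" like the one above
# will never match the expected digest of the full archive.
with open('demo.bin', 'wb') as f:
    f.write(b'hello')
print(sha1_of('demo.bin'))  # aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d
```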
  • zipfile.BadZipFile: File is not a zip file

    The above exception was raised when running download_celebA_HQ.py.

    Command:

    python download_celebA_HQ.py ./img_align_celeba

    Output log:

    Deal with file: deltas00000.zip
    [*] ./img_align_celeba/deltas00000.zip already exists
    Traceback (most recent call last):
      File "download_celebA_HQ.py", line 93, in <module>
        with zipfile.ZipFile(save_path) as zf:
      File "/u/jliang/anaconda3/envs/tf_gpu/lib/python3.6/zipfile.py", line 1108, in __init__
        self._RealGetContents()
      File "/u/jliang/anaconda3/envs/tf_gpu/lib/python3.6/zipfile.py", line 1175, in _RealGetContents
        raise BadZipFile("File is not a zip file")
    zipfile.BadZipFile: File is not a zip file

    Given that this issue hasn't seemed to occur previously, is this an issue associated with the file itself? Or is it due to my setup?

    opened by JL321 3
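    A truncated or HTML-error-page download left on disk is the usual cause of BadZipFile. One way to check a delta before opening it (a sketch; the helper name is mine):

```python
import zipfile

def is_valid_zip(path):
    # BadZipFile means the local file is corrupt or incomplete;
    # deleting and re-downloading the delta is the usual fix.
    try:
        with zipfile.ZipFile(path) as zf:
            return zf.testzip() is None
    except zipfile.BadZipFile:
        return False

# Demo: a file full of non-zip bytes, like a saved error page.
with open('deltas_demo.zip', 'w') as f:
    f.write('not actually zip data')
print(is_valid_zip('deltas_demo.zip'))  # False
```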
  • Only Linux and Mac OS X support .7z

    I got the following exception during downloading:

    if os.name != 'posix':
        raise NotImplementedError('Only Linux and Mac OS X support .7z '

    It downloads and unpacks the first 200K images easily but when it comes to archives like img_celeba.7z.001 it raises the exception.

    What should I do with Windows 10 then? 7z is already installed.

    opened by DenisDiachkov 2
  • Error downloading `jpeg` & `pillow` with conda

    I get stuck at the same error whether I try the Docker approach or follow the instructions myself.

    PackagesNotFoundError: The following packages are not available from current channels:
    
      - pillow==3.1.1
      - jpeg=8d 
    

    Am I doing something wrong? Or does the script need to be updated?

    opened by vgunjal 14
  • make_HQ_images.py needs modification

    conda install jpeg=8d tqdm requests pillow==3.1.1 urllib3 numpy cryptography scipy results in a dependency conflict, and conda install pillow==3.1.1 alone causes anaconda to demand switching to Python 2.

    Doing conda install -c conda-forge tqdm (the first anaconda result after googling 'conda tqdm'), and the same for pillow (without pinning a version), numpy, cryptography, and scipy, and also commenting out the warnings, works.

    I am using Ubuntu 19. I ran the first two Python files, then found that I hadn't set up the conda environment correctly and was installing into the default env, so I removed anaconda and reinstalled, then roughly followed the above procedure and it worked.

    opened by jimmy-academia 2