Python script to download the celebA-HQ dataset from Google Drive

Last update: Dec 21, 2022

download-celebA-HQ

Python script to download and create the celebA-HQ dataset.

WARNING from the author: I believe this script has been broken for a few months (I have not tried it for a while). I am really sorry about that. If you fix it, please share your solution in a PR so that everyone can benefit from it.

To get the celebA-HQ dataset, you need to a) download the celebA dataset (download_celebA.py), b) download some extra files (download_celebA_HQ.py), and c) do some processing to get the HQ images (make_HQ_images.py).

The final dataset is 89 GB. However, you will need somewhat more storage than that to run the scripts.
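
Since the scripts run for hours, it may be worth checking free disk space before starting. The following is a minimal sketch, not part of the repository: the helper names free_space_gb and has_room are hypothetical, and the 89 GB threshold is only the final dataset size quoted above (the extra working space the scripts need is not specified).

```python
import shutil

# Final dataset size stated in this README; the extra headroom needed
# while the scripts run is not specified, so treat this as a lower bound.
REQUIRED_GB = 89


def free_space_gb(path="."):
    """Return the free space, in GB (10**9 bytes), on the filesystem holding `path`."""
    return shutil.disk_usage(path).free / 1e9


def has_room(path=".", required_gb=REQUIRED_GB):
    """Return True if the filesystem holding `path` has more than `required_gb` free."""
    return free_space_gb(path) > required_gb


if __name__ == "__main__":
    print("Free space: {:.1f} GB".format(free_space_gb("./")))
    if not has_room("./"):
        print("Warning: less than {} GB free in the target directory.".format(REQUIRED_GB))
```

Pass the same directory you intend to give to the three scripts, since that is where the data will be written.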

Usage

Clone the repository

git clone https://github.com/nperraud/download-celebA-HQ.git
cd download-celebA-HQ

Install the necessary packages (because specific versions are required, Conda is recommended)

Install miniconda https://conda.io/miniconda.html
Create a new environment

conda create -n celebaHQ python=3
source activate celebaHQ

Install the packages

conda install jpeg=8d tqdm requests pillow==3.1.1 urllib3 numpy cryptography scipy
pip install opencv-python==3.4.0.12 cryptography==2.1.4

Install 7zip (On Ubuntu)

sudo apt-get install p7zip-full

Run the scripts

python download_celebA.py ./
python download_celebA_HQ.py ./
python make_HQ_images.py ./

where ./ is the directory where you wish the data to be saved.

Go watch a movie: these scripts will take a few hours to run, depending on your internet connection and your CPU power. The final HQ images will be saved as .npy files in the ./celebA-HQ folder.
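
Once the scripts finish, the output can be inspected with NumPy. The sketch below is an illustration only: the helper names list_hq_files and describe are hypothetical, and because this README does not state the array layout, the code reports the shape and dtype instead of assuming them.

```python
import glob
import os

import numpy as np


def list_hq_files(hq_dir="./celebA-HQ"):
    """Return the sorted list of .npy files found in `hq_dir`."""
    return sorted(glob.glob(os.path.join(hq_dir, "*.npy")))


def describe(npy_path):
    """Load one array and return its (shape, dtype) without assuming a layout."""
    arr = np.load(npy_path)
    return arr.shape, arr.dtype


if __name__ == "__main__":
    files = list_hq_files()
    print("Found {} .npy files".format(len(files)))
    if files:
        shape, dtype = describe(files[0])
        print(files[0], shape, dtype)
```

Checking the shape of the first file is a quick way to confirm that the processing step produced what you expect before loading the whole dataset.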

Windows

The scripts may work on Windows, though I have not tested this solution personally.

Step 2 becomes

Install miniconda https://conda.io/miniconda.html or anaconda
Create a new environment

conda create -n celebaHQ python=3
source activate celebaHQ

Install the packages

conda install -c anaconda jpeg=8d tqdm requests pillow==3.1.1 urllib3 numpy cryptography scipy

Install 7zip

The rest should be unchanged.

Docker

If you have Docker installed, skip the previous installation steps and run the following command from the root directory of this project:

docker build -t celeba . && docker run -it -v $(pwd):/data celeba

By default, this will create the dataset in the same directory. To put it elsewhere, replace $(pwd) with the absolute path to the desired output directory.

Outliers

It seems that the dataset has a few outliers. A list of problematic images is stored in bad_images.txt. Please report any other outliers you find.
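
To skip the listed outliers when loading the dataset, something like the following sketch could be used. It assumes that bad_images.txt holds one image identifier per line, which this README does not confirm; the function names parse_bad_list, read_bad_images and drop_outliers are hypothetical.

```python
def parse_bad_list(text):
    """Parse the contents of the outlier list into a set of identifiers.

    Assumption: one identifier per line; blank lines are ignored.
    """
    return {line.strip() for line in text.splitlines() if line.strip()}


def read_bad_images(path="bad_images.txt"):
    """Read the outlier list from disk (same one-per-line assumption)."""
    with open(path, "rt") as f:
        return parse_bad_list(f.read())


def drop_outliers(names, bad):
    """Return `names` without any entry that appears in `bad`, keeping order."""
    return [name for name in names if name not in bad]
```

Adjust parse_bad_list if the file turns out to use a different format, for example indices rather than file names.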

Remark

This script is likely to break somewhere, but if it executes until the end, you should obtain the correct dataset.

Sources

This code is inspired by these files

Citing the dataset

You probably want to cite the paper "Progressive Growing of GANs for Improved Quality, Stability, and Variation" that was submitted to ICLR 2018 by Tero Karras (NVIDIA), Timo Aila (NVIDIA), Samuli Laine (NVIDIA), Jaakko Lehtinen (NVIDIA and Aalto University).

Comments

celebA-HQ output is a 3*3 image replication at 1024*1024?

Hi nperraud, thanks for your code. I'm running make_HQ_images.py without any problem, but the result is not a single-subject 1024-by-1024 image but a 3-by-3 replication of the subject (and the image is in grayscale). Is this normal, or did I miss some steps? Here is an example:

opened by HavenFeng 6

Windows 10 Compatibility

To get this to work on Windows 10 I had to do the following:

Create a new environment

conda activate celebaHQ

Install the packages

conda install -c anaconda jpeg=8d tqdm requests pillow==3.1.1 urllib3 numpy cryptography scipy

I had to add an if __name__ == '__main__': line just before num_workers.

if __name__ == '__main__':

    num_workers = mp.cpu_count() - 1
    print('Starting a pool with {} workers'.format(num_workers))
    with mp.Pool(processes=num_workers) as pool:
        pool.map(do_the_work, list(range(expected_dat)))
    if len(glob.glob(os.path.join(delta_dir, '*.npy'))) != 30000:
        raise ValueError('Expected to find {} npy files\n Something went wrong!'.format(30000))
    # Remove the dat files
    for filepath in glob.glob(os.path.join(delta_dir, '*.dat')):
        os.remove(filepath)
    print('All done! Congratulations!')
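
The guard matters because Windows starts worker processes with the "spawn" method, which re-imports the main module in every worker; without the guard, each worker would try to create its own pool when it imports the script. A minimal, self-contained sketch of the pattern follows. The function square is a hypothetical stand-in for do_the_work, and run_pool is likewise a name chosen for this example.

```python
import multiprocessing as mp


def square(x):
    """Stand-in for the per-item work; defined at module level so workers can import it."""
    return x * x


def run_pool(values, num_workers=2):
    """Map `square` over `values` using a pool of worker processes."""
    with mp.Pool(processes=num_workers) as pool:
        return pool.map(square, list(values))


if __name__ == "__main__":
    # Only the parent process runs this block. Spawned workers re-import the
    # module with a different __name__, so they skip it and do not start
    # pools of their own.
    print(run_pool(range(5)))
```

On Linux the default "fork" start method does not re-import the module, which would explain why the unguarded script runs there.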

opened by p-funk 3

Error in make_HQ_images.py OSError: 3145728 requested and 1671040 written

Script was working until this error:

Traceback (most recent call last):
  File "/workspace/make_HQ_images.py", line 164, in <module>
    pool.map(do_the_work, list(range(expected_dat)))
  File "/opt/conda/lib/python3.5/multiprocessing/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/opt/conda/lib/python3.5/multiprocessing/pool.py", line 644, in get
    raise self._value
OSError: 3145728 requested and 1671040 written

opened by TianaCo 3

Error in make_HQ_images.py: float() argument must be a string or a number, not 'Image'

Hi,

Thank you for this code! I have an error in make_HQ_images.py:

Traceback (most recent call last):
  File "/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/python/3.5.4/lib/python3.5/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/python/3.5.4/lib/python3.5/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "make_HQ_images.py", line 157, in do_the_work
    img = process_func(img_num)
  File "make_HQ_images.py", line 115, in process_func
    img = np.pad(np.float32(img), ((pad[1], pad[3]), (pad[0], pad[2]), (0, 0)), 'reflect')
TypeError: float() argument must be a string or a number, not 'Image'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "make_HQ_images.py", line 167, in <module>
    pool.map(do_the_work, list(range(expected_dat)))
  File "/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/python/3.5.4/lib/python3.5/multiprocessing/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/python/3.5.4/lib/python3.5/multiprocessing/pool.py", line 644, in get
    raise self._value
TypeError: float() argument must be a string or a number, not 'Image'

Do you know where this might come from?

Thanks in advance

opened by tldoan 3

Flawed images 70 and 2815

Hello and thanks for sharing your code! :smile:

I've been browsing in the generated images and most of them are of high quality. However, I found that images 70 and 2815 are not that great.

Image 70 - half of the face is cut

Image 2815 - there is an eye in the mouth

Is it the same for you? Is it the same in the original repo (NVIDIA)? Did you find other images with small flaws?

opened by nicolastah 3

sum simplification code

sum([1 for file in os.listdir(img_dir) if file[-4:] == '.jpg'])

could be simplified in a slightly more efficient way:

from glob import glob
len(glob(os.path.join(img_dir,'*.jpg')))

opened by Arsey 3

Convert to HQ some part of celeba dataset

I need the celebA-HQ dataset for a task that requires only 2000 images. I have downloaded the celebA dataset, but because I have a slow internet connection, please tell me which delta I should download to convert it to HQ, and what the process would be.

opened by mabdullahrafique 2

Docker issue "FileNotFoundError: [Errno 2] No such file or directory: 'image_list.txt'"

It successfully loaded Celeba and Celeba-HQ deltas, then it crashed because "image_list.txt" doesn't exist

Loading CelebA from ./celebA/Img/img_celeba
Loading CelebA-HQ deltas from ./celebA-HQ
Traceback (most recent call last):
  File "/workspace/make_HQ_images.py", line 45, in <module>
    with open(os.path.join('image_list.txt'), 'rt') as file:
FileNotFoundError: [Errno 2] No such file or directory: 'image_list.txt'

opened by TianaCo 2

Total size of the dataset

Hello, and thanks for your work! Could you tell me what the total size of the dataset is? I need to know what kind of hard drive I should run the program on. Maybe it would be useful information to display somewhere in the README.

opened by RomeoDespres 1

Dropbox links not working currently

Hello, I am currently unable to use the Dropbox links in the download_celebA.py file. Are there any alternatives, or how else can I get the dataset?

Thank you.

opened by unlut 0

docker fails

I was trying to run this with docker, but it didn't work.

After clone:

$ docker build -t celeba_hq . && docker run -it -v $(pwd):/data celeba_hq
...
PackagesNotFoundError: The following packages are not available from current channels:

  - pillow==3.1.1
  - jpeg=8d

Current channels:

  - https://repo.anaconda.com/pkgs/main/linux-64
  - https://repo.anaconda.com/pkgs/main/noarch
  - https://repo.anaconda.com/pkgs/r/linux-64
  - https://repo.anaconda.com/pkgs/r/noarch

To search for alternate channels that may provide the conda package you're looking for, navigate to

    https://anaconda.org

and use the search bar at the top of the page.
...

opened by gogobd 0

download_celebA.py: Checksum doesn't match

Command: cd download-celebA-HQ/ && python download_celebA.py ./

Output:

Downloading img_align_celeba.zip to ./celebA/img_align_celeba.zip
./celebA/img_align_celeba.zip: 1.00B [00:00, 847B/s]
Done!
Check SHA1 ./celebA/img_align_celeba.zip
Traceback (most recent call last):
  File "download_celebA.py", line 219, in <module>
    download_celabA(dataset_dir)
  File "download_celebA.py", line 183, in download_celabA
    filepaths = download_and_check(_ALIGNED_IMGS_DRIVE, dataset_dir)
  File "download_celebA.py", line 106, in download_and_check
    raise RuntimeError('Checksum mismatch for %s.' % save_path)
RuntimeError: Checksum mismatch for ./celebA/img_align_celeba.zip.
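
The progress line in this log reports only 1.00B transferred, so the file on disk is presumably not the real archive and a checksum mismatch is the expected outcome. To check a downloaded file by hand, a SHA-1 digest can be computed as in the sketch below. This is not the repository's download_and_check function; the names sha1_of_file and matches_checksum are hypothetical, and the expected digest must come from the script itself.

```python
import hashlib


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of the file at `path`, read in 1 MiB chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_sha1):
    """Compare the file's digest with an expected hex digest, ignoring case."""
    return sha1_of_file(path) == expected_sha1.lower()
```

Reading in chunks keeps memory use flat even for multi-gigabyte archives.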

opened by madaanpulkit 3

zipfile.BadZipFile: File is not a zip file

The above exception was raised when running download_celebA_HQ.py.

Command:

python download_celebA_HQ.py ./img_align_celeba

Output log:

Deal with file: deltas00000.zip
[*] ./img_align_celeba/deltas00000.zip already exists
Traceback (most recent call last):
  File "download_celebA_HQ.py", line 93, in <module>
    with zipfile.ZipFile(save_path) as zf:
  File "/u/jliang/anaconda3/envs/tf_gpu/lib/python3.6/zipfile.py", line 1108, in __init__
    self._RealGetContents()
  File "/u/jliang/anaconda3/envs/tf_gpu/lib/python3.6/zipfile.py", line 1175, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file

Given that this issue doesn't seem to have occurred previously, is it a problem with the file itself, or is it due to my setup?
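
The log line "already exists" suggests the script reused a file left over from an earlier run, which may have been an incomplete or failed download. One way to tell the two cases apart is to test the file before opening it. The sketch below uses the standard library's zipfile.is_zipfile; the helper names is_usable_zip and remove_if_corrupt are hypothetical and not part of the repository.

```python
import os
import zipfile


def is_usable_zip(path):
    """Return True if `path` exists and passes zipfile's signature check."""
    return os.path.isfile(path) and zipfile.is_zipfile(path)


def remove_if_corrupt(path):
    """Delete `path` if it exists but is not a valid zip; return True if deleted."""
    if os.path.isfile(path) and not zipfile.is_zipfile(path):
        os.remove(path)
        return True
    return False
```

If remove_if_corrupt deletes the file, rerunning the download script should fetch it again, assuming the remote link still works.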

opened by JL321 3

Only Linux and Mac OS X support .7z

I got the following exception while downloading:

if os.name != 'posix':
    raise NotImplementedError('Only Linux and Mac OS X support .7z '

It downloads and unpacks the first 200K images easily, but when it comes to archives like img_celeba.7z.001 it raises the exception.

What should I do on Windows 10, then? 7z is already installed.

opened by DenisDiachkov 2

Error downloading `jpeg` & `pillow` with conda

I get stuck at the same error whether I try the Docker approach or follow the instructions myself.

PackagesNotFoundError: The following packages are not available from current channels:

  - pillow==3.1.1
  - jpeg=8d

Am I doing something wrong? Or does the script need to be updated?

opened by vgunjal 14

make_HQ_images.py needs modification

Running conda install jpeg=8d tqdm requests pillow==3.1.1 urllib3 numpy cryptography scipy results in a dependency conflict. Running conda install pillow==3.1.1 alone causes Anaconda to demand switching to Python 2.

What works is running conda install -c conda-forge tqdm (the first Anaconda result after googling 'conda tqdm'), doing the same for pillow (without setting a version), numpy, cryptography, and scipy, and also commenting out the warnings.

I am using Ubuntu 19. I ran the first two Python files, then found that I hadn't set up the conda environment correctly and was installing into the default env, so I removed Anaconda and reinstalled it, then roughly followed the above procedure, and it worked.

opened by jimmy-academia 2
