gpv2-web10k
This repository contains the script to download images from the Web-10K dataset. Given a list of queries, the script sends each one to Bing Image Search and downloads the returned thumbnail images to an Amazon S3 bucket that you specify. To use the script, you need a Bing Image Search API key.
Setup
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
Adding the Bing Search API Key and Amazon S3 Bucket name
Add your API key to get_api_key() in tasks.py on Line 45.
Add the bucket name to tasks.py on Line 21. The images will be downloaded to this bucket.
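If you would rather not paste the key into the source file, one option is to have get_api_key() read it from the environment. The following is a minimal sketch, not the repository's actual implementation; the environment variable name BING_SEARCH_API_KEY is a hypothetical choice.

```python
import os


def get_api_key() -> str:
    """Return the Bing Image Search API key.

    Assumption: the key lives in an environment variable named
    BING_SEARCH_API_KEY (a hypothetical name, not defined by this repository).
    """
    key = os.environ.get("BING_SEARCH_API_KEY", "").strip()
    if not key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError(
            "Set the BING_SEARCH_API_KEY environment variable to your API key."
        )
    return key
```

With this variant you would run `export BING_SEARCH_API_KEY=...` in the shell before calling any of the invoke tasks.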
Running the script
invoke query query_sample.json # to query Bing Image Search with the queries listed in query_sample.json
invoke print-query-results "mt. everest" # to print the results of a specific query
invoke generate-html # to generate an HTML page containing the returned images
invoke download-images # to download the images to an Amazon S3 bucket
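For orientation, the query and download steps can be sketched roughly as follows. This is an illustrative sketch using only the standard library, not the code in tasks.py. The helper names (build_search_request, thumbnail_urls, s3_key_for), the endpoint URL, the key layout, and the .jpg extension are all assumptions; the `Ocp-Apim-Subscription-Key` header and the `value` / `thumbnailUrl` response fields follow the Bing Image Search v7 documentation linked below.

```python
import hashlib
import json
import re
import urllib.parse
import urllib.request

# Assumed v7 endpoint; confirm it against the endpoint shown for your API key.
ENDPOINT = "https://api.bing.microsoft.com/v7.0/images/search"


def build_search_request(query, api_key, count=50):
    """Build (but do not send) a Bing Image Search request for one query."""
    params = urllib.parse.urlencode({"q": query, "count": count})
    return urllib.request.Request(
        f"{ENDPOINT}?{params}",
        headers={"Ocp-Apim-Subscription-Key": api_key},
    )


def search(query, api_key):
    """Send the request and return the decoded JSON response (needs network)."""
    with urllib.request.urlopen(build_search_request(query, api_key)) as resp:
        return json.load(resp)


def thumbnail_urls(response):
    """Collect the thumbnail URL of every image in a response dict."""
    return [
        image["thumbnailUrl"]
        for image in response.get("value", [])
        if "thumbnailUrl" in image
    ]


def s3_key_for(query, url):
    """Derive a hypothetical S3 object key: <query slug>/<sha1 of url>.jpg."""
    slug = re.sub(r"[^A-Za-z0-9]+", "_", query).strip("_")
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return f"{slug}/{digest}.jpg"
```

Each thumbnail could then be fetched and written to the bucket under the derived key, for example with boto3's `put_object(Bucket=..., Key=..., Body=...)`.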
Useful links:
Bing Image Search API Pricing (for ~40K queries using an S3-tier instance, we paid about $160)
Bing Image Search API v7 query parameters (to change the returned response content)
Bing Image Search APIs v7 response objects (to understand the returned objects)