Python code to crawl computer vision papers from top CV conferences. Currently it supports CVPR, ICCV, ECCV, NeurIPS, ICML, ICLR, SIGGRAPH

Overview

Crawling-CV-Conference-Papers

News

  • 2021-6-21 Support CVPR-2021

Download all CVPR-2021 papers in one click. Just set the local download directory in download_cvpr2021.py and run it! Don't forget to have your chrome driver ready (i.e., corresponding version to your Chrome browser)

  • 2021-6-20 Support continuation of downloading from where the program encounters interruption. (prevent re-downloading from scratch)

Introduction

Python code to crawl computer vision papers from top CV conferences. Currently it supports CVPR, ICCV, ECCV, NeurIPS, ICML, ICLR, SIGGRAPH. It leverages selenium, a website testing framework to crawl the titles and pdf urls from the conference website, and download them one by one with some simple anti-anti-crawler tricks.

Websites for older conferences are not guaranteed to be bug-free, since this project is based on newest website structure.

Recommend to work with Mendeley. You will get a juicy academic corpus.

Currently only single-thread downloading is implemented. Therefore the downloading for thousands of papers would be slow (takes several hours). It is suggested that you run the script before bed and it would be finished when you get to work again :)

Multi-thread downloading will be coming soon!

Requirements

pip install selenium, slugify

Besides, downlowd chromedriver.exe from the link to any local path you favour.

Usage

To execute the crawler, you could run download.py or download.ipynb (Basically the same). Before the execution, some paths need to be set up, including:

conference = 'neurips'
conference_url = "https://papers.nips.cc/paper/2019" # the conference url to download papers from
chromedriver_path = '.../chromedriver.exe' # the chromedriver.exe path
root = './NeurIPS-2019-ALL' # file path to save the downloaded papers

Here are some conference url examples:

cvpr: https://openaccess.thecvf.com/CVPR2020 (CVPR 2020)
eccv: https://openaccess.thecvf.com/ECCV2018 (ECCV 2018) (changed in 2020)
eccv: https://www.ecva.net/papers.php (ECCV 2020) 
iccv: https://openaccess.thecvf.com/ICCV2019 (ICCV 2019)
icml: http://proceedings.mlr.press/v119/ (ICML 2020)
neurips: https://papers.nips.cc/paper/2020 (NeurIPS 2020)
iclr: https://openreview.net/group?id=ICLR.cc/2021/Conference (ICLR 2021)
siggraph: https://dl.acm.org/toc/tog/2020/39/4 (SIGGRAPH 2020)

Replace the url and the conference names with your choice.

If you want to crawl papers from other conference website, all you need to do is to write a retrieve function like the ones in retrieve_titles_urls_from_websites.py, to parse html code and retrieve the paper titles and pdf urls into two lists.

Others

Warnings: It is heard that crawling from conference websites might cause a banning of your IP (hasn't happened to me so far). Not sure of the risk.

Warnings: This project is for learning purpose only. Do not crawl the same website frequently, which will burden the server.

Welcome to submit a pull request if there is any bugs or if you would like to add support to other conferences!

Maintainer

Xiaoyang Huang

Email: [email protected]

You might also like...
code for paper
code for paper"3D reconstruction method based on a generative model in continuous latent space"

PyTorch implementation of 3D-VGT(3D-VAE-GAN-Transformer) This repository contains the source code for the paper "3D reconstruction method based on a g

Code for "Adversarial Motion Priors Make Good Substitutes for Complex Reward Functions"

Adversarial Motion Priors Make Good Substitutes for Complex Reward Functions Codebase for the "Adversarial Motion Priors Make Good Substitutes for Com

Code for
Code for "Temporal Difference Learning for Model Predictive Control"

Temporal Difference Learning for Model Predictive Control Original PyTorch implementation of TD-MPC from Temporal Difference Learning for Model Predic

Python-Youtube-Downloader - An Open Source Python Youtube Downloader
Python-Youtube-Downloader - An Open Source Python Youtube Downloader

Python-Youtube-Downloader Hello There This Is An Open Source Python Youtube Down

Youtube-downloader-using-Python - Youtube downloader using Python

Youtube-downloader-using-Python Hii guys !! Fancy to see here Welcome! built by

AkShare is an elegant and simple financial data interface library for Python, built for human beings! 开源财经数据接口库
AkShare is an elegant and simple financial data interface library for Python, built for human beings! 开源财经数据接口库

Overview AkShare requires Python(64 bit) 3.7 or greater, aims to make fetch financial data as convenient as possible. Write less, get more! Documentat

A tool written in Python to download all Snapmaps content from a specific location.
A tool written in Python to download all Snapmaps content from a specific location.

snapmap-archiver A tool written in Python to download all Snapmaps content from a specific location.

The free and open-source Download Manager written in pure Python
The free and open-source Download Manager written in pure Python

The free and open-source Download Manager written in pure Python

FireDM is a python open source (Internet Download Manager) with multi-connections, high speed engine, it downloads general files and videos from youtube and tons of other streaming websites .
FireDM is a python open source (Internet Download Manager) with multi-connections, high speed engine, it downloads general files and videos from youtube and tons of other streaming websites .

python open source (Internet Download Manager) with multi-connections, high speed engine, based on python, LibCurl, and youtube_dl https://github.com/firedm/FireDM

Owner
Xiaoyang Huang
Xiaoyang Huang
Automatically download multiple papers by keywords in CVPR

Automatically download multiple papers by keywords in CVPR

null 46 Jun 8, 2022
the best video downloader for terminals (currently only compatible with Linux and Windows)

the best video downloader for terminals (currently only compatible with Linux and Windows)

Amaral 2 Oct 14, 2021
Fetch papers and metadata.

Fetch PubMed Central for open-access papers as well as Sci-Hub

null 4 Oct 31, 2022
Programmers-quest - Programmer's Quest! An open source MMO built on top of the Panda3D game engine and Astron server

Programmer's Quest! Programmer's Quest! The open source Python 3 2D MMORPG showc

Jordan Maxwell 5 Oct 7, 2022
python code used to download all images contained in a facebook uid , the uid can be profile,group,fanpage

python code used to download all images contained in a facebook uid , the uid can be profile,group,fanpage

VVHai 2 Dec 21, 2021
GTK4 + Python tutorial with code examples

Taiko's GTK4 Python tutorial Wanna make apps for Linux but not sure how to start with GTK? This guide will hopefully help! The intent is to show you h

null 190 Jan 8, 2023
Write reproducible code for getting and processing ChEMBL

chembl_downloader Don't worry about downloading/extracting ChEMBL or versioning - just use chembl_downloader to write code that knows how to download

Charles Tapley Hoyt 34 Dec 25, 2022
Code to scrape , download and upload to youtube daily

Youtube_Automated_Channel Code to scrape , download and upload to youtube daily INSTRUCTIONS Download the Github Repository Download and install Pytho

Atsiksdong 2 Dec 19, 2021
This repository contains code for a youtube-dl GUI written in PyQt.

youtube-dl-GUI This repository contains code for a youtube-dl GUI written in PyQt. It is based on youtube-dl which is a Video downloading script maint

M.Yasoob Ullah Khalid ☺ 191 Jan 2, 2023
Source code of paper: "HRegNet: A Hierarchical Network for Efficient and Accurate Outdoor LiDAR Point Cloud Registration".

HRegNet: A Hierarchical Network for Efficient and Accurate Outdoor LiDAR Point Cloud Registration Environments The code mainly requires the following

Intelligent Sensing, Perception and Computing Group 3 Oct 6, 2022