Visualize ICLR 2022 OpenReview Data
ICLR 2022 Paper submission analysis from https://openreview.net/group?id=ICLR.cc/2022/Conference
Requirements
pip install wordcloud nltk pandas imageio selenium tqdm
Then download the required NLTK packages:
import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('wordnet')
nltk.download('stopwords')
If calling webdriver.Edge('msedgedriver.exe') fails, try the following:
- Delete msedgedriver.exe, since it may only work on my computer (Windows).
- Install Microsoft Edge (Chromium). To confirm that it is installed, go to edge://settings/help in the browser and verify that the version is 75 or later.
- Download Microsoft Edge Driver:
  - Go to edge://settings/help to find your Edge version.
  - Navigate to the Microsoft Edge Driver downloads page and download the driver that matches that version number.
These steps are adapted from https://stackoverflow.com/questions/63529124/how-to-open-up-microsoft-edge-using-selenium-and-python
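A version mismatch between the browser and the driver is a common cause of this failure, so it can help to compare the two version strings before retrying. The following is only a minimal sketch: the helper names and the sample version strings are hypothetical, and it assumes that agreement on the major version is what needs checking (downloading the driver for your exact Edge version remains the safest option).

```python
def major_version(version: str) -> int:
    """Return the leading (major) component of a dotted version string."""
    return int(version.strip().split(".")[0])


def driver_matches_browser(browser_version: str, driver_version: str) -> bool:
    """Return True when the driver and the browser share a major version."""
    return major_version(browser_version) == major_version(driver_version)


# Hypothetical version strings: the first as it might appear on
# edge://settings/help, the second as reported by a downloaded driver.
print(driver_matches_browser("96.0.1054.43", "96.0.1054.29"))
print(driver_matches_browser("96.0.1054.43", "95.0.1020.53"))
```

If the check fails, replace msedgedriver.exe with the build that corresponds to your installed Edge version.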
Crawl Data
- Run crawl_paperlist.py to crawl the list of papers (~0.5h).
Paper List (3,407 submissions in total)
crawl_paperlist.py only crawls 3,000 papers, although there are 3,407 submissions in total. The full paper list is as follows:
Visualization
Keywords Frequency
The 50 most common keywords (case-insensitive) and their frequencies:
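For readers who want to reproduce such counts, here is a minimal sketch of case-insensitive keyword counting. It is not the repository's own script: the function name and the sample data are made up for illustration, and it assumes each paper's keywords are available as a list of strings and that a keyword is counted at most once per paper.

```python
from collections import Counter


def keyword_frequencies(papers, top_n=50):
    """Count keywords case-insensitively and return the top_n (keyword, count) pairs."""
    counts = Counter()
    for keywords in papers:
        # Normalize case and whitespace; the set counts a keyword once per paper.
        normalized = {kw.strip().lower() for kw in keywords if kw.strip()}
        counts.update(normalized)
    return counts.most_common(top_n)


# Made-up sample data: one list of keywords per submission.
sample = [
    ["Deep Learning", "reinforcement learning"],
    ["deep learning", "Graph Neural Network"],
    ["Representation Learning", "Deep learning "],
]

for keyword, count in keyword_frequencies(sample, top_n=3):
    print(keyword, count)
```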
Keywords Cloud
The word clouds formed from submission keywords show the hot topics, including deep learning, reinforcement learning, representation learning, and graph neural networks.
Title Keywords Frequency
The 50 most common title keywords (case-insensitive) and their frequencies:
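Title keywords have to be extracted from free text before they can be counted. The NLTK packages listed under Requirements suggest a pipeline of tokenizing, tagging, lemmatizing, and stop-word removal; the sketch below deliberately swaps that for a plain regular expression and a tiny hand-written stop-word set so that it runs without any downloads. The function name, the stop-word set, and the sample titles are all hypothetical.

```python
import re
from collections import Counter

# A tiny illustrative stop-word set; a real run would likely use NLTK's list.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "via", "with"}


def title_keyword_frequencies(titles, top_n=50):
    """Count lowercase title words, skipping stop words, and return the top_n pairs."""
    counts = Counter()
    for title in titles:
        # Keep alphabetic tokens, allowing inner hyphens (e.g. "self-supervised").
        tokens = re.findall(r"[a-z][a-z\-]*", title.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(top_n)


# Made-up titles, used only to exercise the function.
sample_titles = [
    "Learning Representations for Graphs",
    "Self-Supervised Learning of Visual Representations",
    "On the Convergence of Adam",
]

for word, count in title_keyword_frequencies(sample_titles, top_n=5):
    print(word, count)
```

Unlike a lemmatizing pipeline, this simplified version treats "representation" and "representations" as different words, so its counts would differ from ones produced with NLTK.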
Title Keywords Cloud
The word cloud formed from the keywords in submission titles:
Acknowledgment
Inspired by this repo: https://github.com/evanzd/ICLR2021-OpenReviewData