Machine Learning Research
1. Project Topic
1.1. Existing Research
- Benchmarks
- ACL Anthology for NLP papers: https://aclanthology.org/
- Online proceedings of major ML conferences:
  - NeurIPS: https://papers.neurips.cc/
  - ICML, ICLR, CVPR, EMNLP, NAACL
- Online preprint servers, e.g. arXiv: https://arxiv.org/
- Top papers mentioned on Twitter
- Others
1.2. Datasets and Tasks
- Hugging Face Datasets: https://huggingface.co/datasets (see the loading sketch after this list)
- Kaggle has many datasets, though some of them are too small for deep learning: https://www.kaggle.com/datasets
- State-of-the-art results for NLP and many other ML tasks, tracked by Papers with Code: https://paperswithcode.com/sota
- A small list of well-known standard datasets for common NLP tasks: https://machinelearningmastery.com/datasets-natural-language-processing/
- An alphabetical list of free or public-domain text datasets
- Wikipedia has a list of machine learning text datasets, tabulated with useful information such as dataset size: https://en.wikipedia.org/wiki/List_of_datasets_for_machine-learning_research#Text_data
- Datahub has lots of datasets, though not all of them are machine learning focused: https://datahub.io/
- Microsoft Research has a collection of datasets (look under the ‘Dataset directory’ tab): https://www.microsoft.com/en-us/research/academic-program/data-science-microsoft-research/?from=http%3A%2F%2Fresearch.microsoft.com%2Fen-us%2Fprojects%2Fdata-science-initiative%2F%20datasets.aspx#!dataset-directory
- A script to search arXiv papers for a keyword and extract important information such as performance metrics on a task (a sketch of such a script appears after this list)
- Datasets for machine translation
- Syntactic corpora for many languages
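To make the Hugging Face Datasets entry above concrete, here is a minimal loading sketch. It assumes the `datasets` package is installed (`pip install datasets`), and the dataset name "imdb" is only an illustrative choice, not one prescribed by this list.

```python
# Minimal sketch: load a dataset from the Hugging Face Hub with the `datasets` library.
from datasets import load_dataset

# "imdb" is just an example; browse https://huggingface.co/datasets for others.
dataset = load_dataset("imdb")

print(dataset)                             # available splits (train/test/...)
print(dataset["train"].features)           # column names and types
print(dataset["train"][0]["text"][:200])   # peek at the first training example
```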
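The arXiv-search script itself is not linked above, so the sketch below only shows what such a script might look like, using the public arXiv Atom API and the third-party `feedparser` package (`pip install feedparser`). The metric "extraction" here is a deliberately crude regex over abstracts; a real script would need more careful parsing.

```python
# Hedged sketch: keyword search over arXiv via its public Atom API.
import re
import urllib.parse

import feedparser  # third-party: pip install feedparser

def search_arxiv(keyword, max_results=10):
    """Query the arXiv API for a keyword and return parsed feed entries."""
    query = urllib.parse.quote(f"all:{keyword}")
    url = ("http://export.arxiv.org/api/query"
           f"?search_query={query}&start=0&max_results={max_results}")
    return feedparser.parse(url).entries

if __name__ == "__main__":
    for entry in search_arxiv("machine translation"):
        print(entry.title)
        # Crude heuristic: flag abstract sentences that mention common metrics.
        for sentence in entry.summary.split(". "):
            if re.search(r"\b(BLEU|F1|accuracy|ROUGE)\b", sentence, re.IGNORECASE):
                print("  metric mention:", sentence.strip())
```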
2. Project Advice
2.1. Processing Data
- StanfordNLP, a Python library providing tokenization, tagging, parsing, and other capabilities: https://stanfordnlp.github.io/stanfordnlp/
- Other software from the Stanford NLP group: http://nlp.stanford.edu/software/index.shtml
- NLTK, a lightweight Natural Language Toolkit package in Python: http://nltk.org/
- spaCy, another Python package that can do preprocessing but also includes neural models (e.g. language models): https://spacy.io/ (see the usage sketch after this list)
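As a quick illustration of the tools above, here is a minimal preprocessing sketch combining NLTK tokenization with a spaCy pipeline. The spaCy model name `en_core_web_sm` (its small English model, installed via `python -m spacy download en_core_web_sm`) and the example sentence are assumptions for the demo.

```python
# Minimal sketch: tokenize with NLTK, then tag and parse with spaCy.
import nltk
import spacy

nltk.download("punkt")  # NLTK tokenizer data; newer NLTK versions may also need "punkt_tab"

text = "Stanford NLP tools make preprocessing easy. They also parse sentences."

# NLTK: lightweight word tokenization.
print(nltk.word_tokenize(text))

# spaCy: tokenization plus part-of-speech tagging and dependency parsing.
nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed
for token in nlp(text):
    print(token.text, token.pos_, token.dep_)
```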
3. Top-Tier ML & AI Conferences
- NeurIPS: Neural Information Processing Systems (formerly abbreviated NIPS). NeurIPS has gotten huge over the past few years as AI has become so important. It has a focus on neural networks, but not exclusively.
- ICML: International Conference on Machine Learning. Has a general machine learning focus.
- ICLR: International Conference on Learning Representations. ICLR was really the first conference focused on deep learning. It’s called “learning representations” because the motivation behind deep learning is to automatically learn higher-level features, or representations, that summarize data in useful ways. Deep learning describes the structure of our current best solution to the problem of learning these representations.
- AAAI: Association for the Advancement of Artificial Intelligence. AAAI is a little more applications-focused, and a little less theoretical, than some of the other AI conferences.
- CVPR: Computer Vision and Pattern Recognition.
- ICCV: International Conference on Computer Vision.