Replication Code for 'Moving On' - Investigating Inventors' Ethnic Origins Using Supervised Learning
This repository contains the code to replicate the paper Moving On - Investigating Inventors' Ethnic Origins Using Supervised Learning.
Repository Structure
Datasets created in this analysis can be found in the folder 00_data_and_model. The trained and tuned LSTM classification model used for the analysis in this paper is stored in this folder as well and can be accessed under 00_data_and_model/model/name_origin_lstm.h5. The folder 01_create_training_dataset contains the replication files used to construct the dataset of labeled names used to train the LSTM classification model. The folder 02_model_training features the code to train the LSTM classifier. Lastly, the code for the descriptive analysis (using a random subsample of the paper's dataset) can be found in the folder 03_inventor_composition_analysis.
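As a minimal sketch of how the stored model might be loaded from Python, assuming the script runs from the repository root and that the .h5 file loads with the standard Keras loader without custom objects (neither is confirmed by this README). The helper name load_name_origin_model is hypothetical and not part of the repository; how names must be encoded before prediction is defined by the code in 02_model_training and is not shown here.

```python
from pathlib import Path

# Relative path of the trained model, as stated in this README.
MODEL_PATH = Path("00_data_and_model") / "model" / "name_origin_lstm.h5"


def load_name_origin_model(repo_root="."):
    """Load the trained LSTM classifier from a checkout of this repository.

    Hypothetical helper: it only wraps the standard Keras loader.
    """
    model_file = Path(repo_root) / MODEL_PATH
    if not model_file.is_file():
        raise FileNotFoundError(f"Model file not found: {model_file}")
    # Imported lazily so the path check works even without TensorFlow.
    from tensorflow import keras
    # Assumption: the model needs no custom_objects to deserialize.
    return keras.models.load_model(str(model_file))


if __name__ == "__main__":
    try:
        model = load_name_origin_model()
        model.summary()  # prints the layer structure of the loaded model
    except FileNotFoundError as err:
        print(err)
```

If loading fails with a version-related error, installing the pinned TensorFlow version listed under Dependencies is a reasonable first thing to try.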
Dependencies
Python (3.7)
- joblib==1.0.1
- matplotlib==3.3.1
- numpy==1.19.2
- pandas==1.1.3
- pyreadr==0.3.5
- scikit-learn==0.23.2
- scipy==1.4.1
- tensorflow==2.2.0
- xgboost==0.90
Installing a virtual environment using the environment.yml or requirements.txt files is recommended.
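After setting up the environment, a short check like the following can confirm that the installed Python packages match the pinned versions above. This is only a sketch: the functions find_mismatches and installed_version are hypothetical helpers, not part of the repository, and the pins are copied by hand from the list in this README rather than read from requirements.txt.

```python
# Pins copied from the Python dependency list in this README.
PINNED = {
    "joblib": "1.0.1",
    "matplotlib": "3.3.1",
    "numpy": "1.19.2",
    "pandas": "1.1.3",
    "pyreadr": "0.3.5",
    "scikit-learn": "0.23.2",
    "scipy": "1.4.1",
    "tensorflow": "2.2.0",
    "xgboost": "0.90",
}


def find_mismatches(pinned, get_version):
    """Return {package: (expected, found)} for every deviating package.

    `found` is None when the version lookup fails (package not installed).
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            found = get_version(name)
        except Exception:
            found = None
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


def installed_version(name):
    """Look up the installed version of a distribution by name."""
    try:
        from importlib.metadata import version  # Python 3.8+
    except ImportError:
        # Fallback for Python 3.7, assuming setuptools is available.
        from pkg_resources import get_distribution
        return get_distribution(name).version
    return version(name)


if __name__ == "__main__":
    problems = find_mismatches(PINNED, installed_version)
    for name, (expected, found) in problems.items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
    if not problems:
        print("All pinned packages match.")
```

Passing the version lookup in as an argument keeps the comparison logic independent of how versions are discovered, which also makes it easy to test.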
R (4.0.1)
- tidyverse
- data.table
- reticulate
- tensorflow
- keras
- stringi
- jsonlite
- countrycode
- viridis
References & Contact
Niggli, M. (2022), 'Moving On' -- Investigating Inventors' Ethnic Origins Using Supervised Learning, arXiv:2201.00578
If you have questions, please contact [email protected].