EarthGAN - Earth Mantle Surrogate Modeling
Can a surrogate model of the Earth's Mantle Convection data set be built such that it can be readily run in a web browser and produce high-fidelity results? We're trying to do just that with a generative adversarial network (GAN), which we call EarthGAN. This project is under active research.
See how EarthGAN currently works! Open up the Colab notebook and create results from the preliminary generator.
Progress updates, along with my thoughts, can be found in the devlog. The preliminary results were presented at VIS 2021 as part of the SciVis contest. The paper is available on arXiv.
This is active research. If you have any thoughts, suggestions, or would like to collaborate, please reach out! You can also post questions/ideas in the discussions section.
Current Approach
We're leveraging the excellent work of Li et al., who have implemented a GAN for creating super-resolution cosmological simulations. The general method is in their map2map repository. We've used their GAN implementation because it works on 3D data. Please cite their work if you find it useful!
The current approach is based on the StyleGAN2 model. In addition, a conditional GAN (cGAN) is used to produce results that are partially deterministic.
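To make "partially deterministic" concrete, here is a toy sketch in plain Python. It is not EarthGAN's generator or the map2map code: `toy_conditional_generator`, the `2.0 * c` mapping, and `noise_scale` are all made up for illustration. The only point is that a conditional generator takes both a condition and random noise, so the condition fixes part of the output while the noise varies the rest.

```python
import random


def toy_conditional_generator(condition, seed, noise_scale=0.1):
    """Toy stand-in for a conditional generator G(condition, z).

    Hypothetical illustration only: a real cGAN generator is a trained
    neural network, not a hand-written formula.
    """
    rng = random.Random(seed)  # the seed plays the role of the latent noise z
    output = []
    for c in condition:
        deterministic_part = 2.0 * c                   # fixed by the condition alone
        stochastic_part = rng.gauss(0.0, noise_scale)  # changes with the noise
        output.append(deterministic_part + stochastic_part)
    return output


# Same condition, different noise: the samples differ in detail
# but share the structure imposed by the condition.
condition = [0.0, 0.5, 1.0]
sample_a = toy_conditional_generator(condition, seed=1)
sample_b = toy_conditional_generator(condition, seed=2)
print(sample_a)
print(sample_b)
```

Reusing the same seed with the same condition reproduces a sample exactly, which is handy when comparing outputs.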
Setup
Works best if you are in an HPC environment (I used Compute Canada). Also tested locally on Linux (macOS should also work). If you run Windows, you'll have to do much of the environment setup and data download/preprocessing manually.
To reproduce the data pipeline and begin training: *
- Clone this repo: `git clone https://github.com/tvhahn/EarthGAN.git`
- Create the virtual environment. Assumes that Conda is installed when on a local computer.
  - HPC: `make create_environment` will detect the HPC environment and automatically create the environment from `make_hpc_venv.sh`. Tested on Compute Canada. Modify `make_hpc_venv.sh` for your own HPC cluster.
  - Linux/macOS: use the command from the Makefile, `make create_environment`.
- Download the raw data.
  - HPC: use `make download`. Will automatically detect the HPC environment.
  - Linux/macOS: use `make download`. Will automatically download to the appropriate `data/raw` directory.
- Extract the raw data.
  - HPC: use `make extract`. Will automatically detect the HPC environment. Again, modify for your HPC cluster.
  - Linux/macOS: use `make extract`. Will automatically extract to the appropriate `data/raw` directory.
- Ensure the virtual environment is activated: `conda activate earth`
- From the root directory of `EarthGAN`, run `pip install -e .`. This gives the Python scripts access to the `src` folders.
- Create the processed data that will be used for training.
  - HPC: use `make data`. Will automatically detect the HPC environment and create the processed data. Note: you will have to modify `make_hpc_data.sh` in the `./bash_scripts/` folder to match the requirements of your HPC environment.
  - Linux/macOS: use `make data`.
- Copy the processed data to the `scratch` folder if you're on the HPC. Modify `copy_processed_data_to_scratch.sh` in the `./bash_scripts/` folder.
- Train!
  - HPC: use `make train`. Again, modify for your HPC cluster. Not yet optimized for multi-GPU training, so be warned, it will be SLOW!
  - Linux/macOS: use `make train`.
* Let me know if you run into any problems! This is still in development.
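Before starting, it can help to confirm that the command-line tools used in the steps above are on your PATH. This is a small optional sketch, not part of the repo: `missing_tools` and `REQUIRED_TOOLS` are hypothetical names, and the tool list assumes a local Linux/macOS setup with Conda (on an HPC cluster, `conda` may not apply).

```python
import shutil

# Tools invoked by the setup steps on a local machine.
# Assumption: this list is inferred from the commands in the steps,
# not taken from the repo's Makefile.
REQUIRED_TOOLS = ("git", "make", "conda", "pip")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools found.")
```

An empty list means every tool was found; anything listed should be installed (or loaded as a module on your cluster) before running the `make` targets.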
Project Organization
├── Makefile           <- Makefile with commands like `make data` or `make train`
│
├── bash_scripts       <- Bash scripts used for training models or setting up the environment
│   ├── train_model_hpc.sh    <- Bash/SLURM script used to train models on HPC (you will need to modify this to work on your HPC). Called with `make train`
│   └── train_model_local.sh  <- Bash script used to train models locally. Called with `make train`
│
├── data
│   ├── interim        <- Intermediate data before we've applied any scaling.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- Original data from Earth Mantle Convection simulation.
│
├── models             <- Trained and serialized models, model predictions, or model summaries
│   ├── interim        <- Interim models and summaries
│   └── final          <- Final, canonical models
│
├── notebooks          <- Jupyter notebooks. Generally used for explaining various components
│   │                     of the code base.
│   └── scratch        <- Rough-draft notebooks, of questionable quality. Be warned!
│
├── references         <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures        <- Generated graphics and figures to be used in reporting
│
├── requirements.txt   <- Recommend using `make create_environment`. However, can use this file
│                         to recreate the environment with pip
├── envearth.yml       <- Used to create conda environment. Use `make create_environment` when
│                         on a local computer
│
├── setup.py           <- Makes project pip installable (pip install -e .) so src can be imported
├── src                <- Source code for use in this project.
│   ├── __init__.py    <- Makes src a Python module
│   │
│   ├── data           <- Scripts to download or generate data
│   │   ├── make_dataset.py     <- Script for making downsampled data from the original
│   │   ├── data_prep_utils.py  <- Misc functions used in data prep
│   │   ├── download.sh         <- Bash script to download entire Earth Mantle data set
│   │   │                          (used when `make download` called)
│   │   └── download.sh         <- Bash script to extract all Earth Mantle data set files
│   │                              from zip (used when `make extract` called)
│   │
│   ├── models         <- Scripts to train models and then use trained models to make
│   │   │                 predictions
│   │   │
│   │   └── train_model.py
│   │
│   └── visualization  <- Scripts to create exploratory and results oriented visualizations
│       └── visualize.py
│
├── LICENSE
└── README.md          <- README describing project.
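If you want a quick sanity check that the data folders from the layout above are in place after running the download and processing steps, something like the following could work. It is a hypothetical helper, not a script from the repo: `missing_data_dirs` and `EXPECTED_DATA_DIRS` are illustrative names, and it assumes the three `data` subfolders exist as plain directories under the repo root.

```python
from pathlib import Path

# Data directories named in the project layout.
EXPECTED_DATA_DIRS = ("data/raw", "data/interim", "data/processed")


def missing_data_dirs(repo_root):
    """Return the expected data directories that do not exist under repo_root."""
    root = Path(repo_root)
    return [d for d in EXPECTED_DATA_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    # Assumes the script is run from the root of the cloned repo.
    missing = missing_data_dirs(".")
    print("Missing:", missing if missing else "none")
```

This only checks that the directories exist, not that the downloaded or processed files inside them are complete.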