G-Research-Crypto-Competition
A project prepared for the ML exam. The dataset comes from the Kaggle competition G-Research Crypto Forecasting: https://www.kaggle.com/c/g-research-crypto-forecasting
This repository provides an example of using Snakemake to solve ML tasks.
The workflow automation system Snakemake is a tool for creating reproducible and scalable pipelines. Pipelines are described using a human-readable Python-based language. They can be easily scaled for server, cluster, network and cloud environments without having to change the workflow definition. Finally, Snakemake workflows can include a description of the necessary software that will be automatically deployed in any runtime environment.
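As a rough illustration of what such a workflow definition can look like, here is a minimal Snakefile sketch. It is not the Snakefile shipped in this repository: the rule names, the file paths, and the `src.features.build` and `src.models.train` modules are hypothetical and only show how rules connect inputs to outputs.

```snakemake
# Hypothetical sketch, not the actual workflow of this repository.
# All rule names, paths, and Python modules below are illustrative assumptions.

# Default target: Snakemake works backwards from this file to find needed rules.
rule all:
    input:
        "models/model.pkl"

# Turn raw data into a feature table (assumed paths and module name).
rule make_features:
    input:
        "data/raw/train.csv"
    output:
        "data/processed/features.csv"
    conda:
        # Resolved relative to the Snakefile that defines the rule.
        "envs/default.yaml"
    shell:
        "python -m src.features.build {input} {output}"

# Train a model from the feature table (assumed paths and module name).
rule train_model:
    input:
        "data/processed/features.csv"
    output:
        "models/model.pkl"
    threads: 8
    shell:
        "python -m src.models.train {input} {output}"
```

Snakemake infers the execution order from matching `input` and `output` paths, so `train_model` would run only after `make_features` has produced its output.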
Getting Started:
- Create virtual environment for development:
$ conda env create -f devenv.yaml
- Activate virtual environment:
$ conda activate G-Research-Crypto-Competition
- Run the Snakemake pipelines using 8 cores:
$ snakemake --cores 8
Project Organization
├── README.rst         <- The top-level readme for developers.
│
├── data
│   ├── external       <- Data from third party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
│
├── docs               <- Technical documentation.
│
├── models             <- Trained and serialized models, model predictions,
│                         or model summaries.
│
├── notebooks          <- Jupyter notebooks. Naming convention is a number
│                         (for ordering), the creator's initials, and a
│                         short `-` delimited description, e.g.
│                         `001-jqp-initial-data-exploration`.
│
├── references         <- Data dictionaries, manuals, and all other
│                         explanatory materials.
│
├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures        <- Generated graphics and figures to be used in
│                         reporting.
│
├── devenv.yaml        <- The environment file for reproducing the analysis
│                         environment, e.g. generated with
│                         `conda env export --from-history > devenv.yaml`
│
├── src                <- Source code for use in this project.
│   ├── __init__.py    <- Makes src a Python package.
│   │
│   ├── data           <- Scripts to download or generate data.
│   │
│   ├── features       <- Scripts to turn raw data into features for
│   │                     modeling.
│   │
│   ├── models         <- Scripts to train models and then use trained
│   │                     models to make predictions.
│   │
│   └── reports        <- Scripts to create exploratory reports and
│                         results-oriented visualizations.
│
├── workflow           <- Snakemake workflow storage.
│   ├── envs           <- Conda environments for Snakemake rules.
│   │   └── default.yaml
│   │
│   ├── rules          <- Rules as modules.
│   │   └── clean.smk
│   │
│   ├── scripts        <- Support functions for use in the Snakemake workflow.
│   │
│   ├── config.yaml    <- Parameters for the workflow in YAML format.
│   │
│   └── Snakefile      <- Entrypoint of the workflow (it will be
│                         automatically discovered when running snakemake
│                         from the root of the above structure).
│
└── .env.example       <- Example file for environment variables, such as
                          MinIO access or PostgreSQL credentials.
Create a `.env` file based on `.env.example`.
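How the project code reads the `.env` file is not described here. As a minimal sketch, assuming the file holds plain `KEY=VALUE` lines, the values could be loaded with the standard library alone. The function names `load_env_file` and `export_env` are hypothetical and not part of this repository.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse KEY=VALUE lines from a dotenv-style file into a dict.

    Hypothetical helper: assumes one assignment per line, with '#' comments.
    """
    values = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def export_env(values):
    """Copy parsed values into os.environ without overriding existing ones."""
    for key, value in values.items():
        os.environ.setdefault(key, value)
```

Calling `export_env(load_env_file(".env"))` at the start of a script would make the credentials available through `os.environ`. Variables already set in the shell take precedence because `setdefault` does not overwrite them.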