Effective Testing for Machine Learning Projects
Code for PyData Global 2021 Presentation by @edublancas. Slides available here.
The project is developed using Ploomber; check it out! :)
If you have questions, ping me on Slack.
Blog post series
Follow @ploomber on Twitter, or subscribe to our newsletter for more amazing content!
Organization
The talk describes five levels of testing, from the most basic to the most robust. The idea is to make continuous progress, adding more robust tests over time. Each branch of this repository corresponds to one level, so you can move through the branches to see how the project becomes more robust as we add tests and modularize the code. Here are the links for each level:
- Smoke testing (1-smoke-testing)
- Integration and unit testing (2-integration-and-unit)
- Variable distributions and inference pipeline (3-distribution-and-inference)
- Training-serving skew (4-train-serve-skew)
- Model quality (5-model-quality)
Tests are run automatically on each push using GitHub Actions; you can see the configuration file at .github/workflows/ci.yml
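To make the levels more concrete, here is a minimal sketch of what a unit test (level 2) and a variable distribution check (level 3) might look like. It is not taken from the repository: the function names (drop_invalid_ages, check_age_distribution), the "age" column, and the thresholds are all hypothetical, chosen only for illustration.

```python
from statistics import mean


def drop_invalid_ages(records):
    """Hypothetical cleaning step: keep rows whose age is a plausible number."""
    return [
        r for r in records
        if r.get("age") is not None and 0 <= r["age"] <= 120
    ]


def test_drop_invalid_ages_removes_bad_rows():
    # Unit test (level 2): a small hand-written input with a known answer.
    records = [{"age": 30}, {"age": -1}, {"age": None}, {"age": 45}]
    assert drop_invalid_ages(records) == [{"age": 30}, {"age": 45}]


def check_age_distribution(records, low=18, high=90):
    """Distribution check (level 3): fail if the mean leaves the expected range.

    The default range is an assumption; in practice it would come from
    what you know about the training data.
    """
    avg = mean(r["age"] for r in records)
    if not low <= avg <= high:
        raise ValueError(
            f"mean age {avg:.1f} outside expected range [{low}, {high}]"
        )
    return avg
```

If saved in a file that pytest collects (for example, one named test_*.py), the test function would run with the pytest command. A check like check_age_distribution could be called on a task's output inside the pipeline, so that a shift in the data stops the build instead of silently producing a worse model.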
Setup
# get the code
git clone https://github.com/edublancas/ml-testing
cd ml-testing
# move to one of the branches
git checkout branch-name
# example
git checkout 1-smoke-testing
# install dependencies (use either conda or pip)
# option 1: conda
conda env create -f environment.yml
# option 2: pip
pip install -r requirements.txt
# build the pipeline
ploomber build
# run unit tests (added on level 2)
pytest