Overview

Cookiecutter Data Science

A logical, reasonably standardized, but flexible project structure for doing and sharing data science work.

Project homepage

Requirements to use the cookiecutter template:


  • Python 2.7 or 3.5+
  • Cookiecutter Python package >= 1.4.0: this can be installed with pip or conda, depending on how you manage your Python packages:
$ pip install cookiecutter

or

$ conda config --add channels conda-forge
$ conda install cookiecutter

To start a new project, run:


cookiecutter -c v1 https://github.com/drivendata/cookiecutter-data-science


New version of Cookiecutter Data Science


Cookiecutter Data Science is moving to v2 soon, which will entail using the command ccds ... rather than cookiecutter .... The cookiecutter command will continue to work, and this version of the template will still be available; to use the legacy template, you will need to explicitly pass -c v1 to select it. Please update any scripts/automation you have to append the -c v1 option (as above), which is available now.
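
For quick reference, the two invocations side by side (the ccds line is a sketch of the upcoming v2 flow and assumes the new CLI has been installed, e.g. via pip install cookiecutter-data-science):

# legacy v1 template (works today)
cookiecutter -c v1 https://github.com/drivendata/cookiecutter-data-science

# upcoming v2 flow (assumes the ccds CLI is installed)
ccds https://github.com/drivendata/cookiecutter-data-science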

The resulting directory structure


The directory structure of your new project looks like this:

├── LICENSE
├── Makefile           <- Makefile with commands like `make data` or `make train`
├── README.md          <- The top-level README for developers using this project.
├── data
│   ├── external       <- Data from third party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
│
├── docs               <- A default Sphinx project; see sphinx-doc.org for details
│
├── models             <- Trained and serialized models, model predictions, or model summaries
│
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                         `1.0-jqp-initial-data-exploration`.
│
├── references         <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures        <- Generated graphics and figures to be used in reporting
│
├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
│                         generated with `pip freeze > requirements.txt`
│
├── setup.py           <- makes project pip installable (pip install -e .) so src can be imported
├── src                <- Source code for use in this project.
│   ├── __init__.py    <- Makes src a Python module
│   │
│   ├── data           <- Scripts to download or generate data
│   │   └── make_dataset.py
│   │
│   ├── features       <- Scripts to turn raw data into features for modeling
│   │   └── build_features.py
│   │
│   ├── models         <- Scripts to train models and then use trained models to make
│   │   │                 predictions
│   │   ├── predict_model.py
│   │   └── train_model.py
│   │
│   └── visualization  <- Scripts to create exploratory and results oriented visualizations
│       └── visualize.py
│
└── tox.ini            <- tox file with settings for running tox; see tox.readthedocs.io

Contributing

We welcome contributions! See the docs for guidelines.

Installing development requirements


pip install -r requirements.txt

Running the tests


py.test tests

Comments
  • docker support

    docker support

    Hi, any chance someone can add Docker support + SQL Docker support like in this cookiecutter Django project?

    The benefits are:

    1. reproducible environment for running the code -> easier to deploy
    2. reproducible database (if needed)

    I am new to Docker & Cookiecutter, otherwise I would do this myself.

    needs-option 
    opened by chananshgong 22
  • Notebooks from environments cannot find src

    Notebooks from environments cannot find src

    Related to #143 #76

    Configuration WSL

    Name Version Build Channel alabaster 0.7.12 pypi_0 pypi arrow 0.13.2 py36_0 conda-forge asn1crypto 0.24.0 py36_1003 conda-forge attrs 19.1.0 py_0 conda-forge awscli 1.16.164 pypi_0 pypi babel 2.6.0 pypi_0 pypi backcall 0.1.0 py_0 conda-forge binaryornot 0.4.4 py_1 conda-forge bleach 3.1.0 py_0 conda-forge botocore 1.12.154 pypi_0 pypi bzip2 1.0.6 h14c3975_1002 conda-forge ca-certificates 2019.3.9 hecc5488_0 conda-forge certifi 2019.3.9 py36_0 conda-forge cffi 1.12.3 py36h8022711_0 conda-forge chardet 3.0.4 py36_1003 conda-forge click 7.0 py_0 conda-forge colorama 0.3.9 pypi_0 pypi conda 4.6.14 py36_0 conda-forge conda-env 2.6.0 1 conda-forge cookiecutter 1.6.0 py36_1000 conda-forge coverage 4.5.3 pypi_0 pypi cryptography 2.6.1 py36h72c5cf5_0 conda-forge cryptography-vectors 2.6.1 py_0 conda-forge cycler 0.10.0 pypi_0 pypi dbus 1.13.6 he372182_0 conda-forge decorator 4.4.0 py_0 conda-forge defusedxml 0.5.0 py_1 conda-forge docutils 0.14 pypi_0 pypi entrypoints 0.3 py36_1000 conda-forge expat 2.2.5 hf484d3e_1002 conda-forge flake8 3.7.7 pypi_0 pypi fontconfig 2.13.1 he4413a7_1000 conda-forge freetype 2.10.0 he983fc9_0 conda-forge future 0.17.1 py36_1000 conda-forge gettext 0.19.8.1 hc5be6a0_1002 conda-forge glib 2.58.3 hf63aee3_1001 conda-forge gmp 6.1.2 hf484d3e_1000 conda-forge gst-plugins-base 1.14.4 hdf3bae2_1001 conda-forge gstreamer 1.14.4 h66beb1c_1001 conda-forge icu 58.2 hf484d3e_1000 conda-forge idna 2.8 py36_1000 conda-forge imagesize 1.1.0 pypi_0 pypi ipykernel 5.1.1 py36h24bf2e0_0 conda-forge ipython 7.5.0 py36h24bf2e0_0 conda-forge ipython_genutils 0.2.0 py_1 conda-forge ipywidgets 7.4.2 py_0 conda-forge isort 4.3.20 py36_0 conda-forge jedi 0.13.3 py36_0 conda-forge jinja2 2.10.1 py_0 conda-forge jinja2-time 0.2.0 py_2 conda-forge jmespath 0.9.4 pypi_0 pypi jpeg 9c h14c3975_1001 conda-forge jsonschema 3.0.1 py36_0 conda-forge jupyter 1.0.0 py_2 conda-forge jupyter-contrib-core 0.3.3 pypi_0 pypi jupyter-contrib-nbextensions 0.5.1 pypi_0 pypi jupyter-highlight-selected-word 0.2.0 pypi_0 pypi jupyter-latex-envs 1.4.6 pypi_0 pypi jupyter-nbextensions-configurator 0.4.1 pypi_0 pypi jupyter_client 5.2.4 py_3 conda-forge jupyter_console 6.0.0 py_0 conda-forge jupyter_contrib_core 0.3.3 py_2 conda-forge jupyter_core 4.4.0 py_0 conda-forge jupyter_highlight_selected_word 0.2.0 py36_1000 conda-forge jupyter_latex_envs 1.4.4 py36_1000 conda-forge jupyterlab 0.35.6 py36_0 conda-forge jupyterlab_server 0.2.0 py_0 conda-forge jupyterthemes 0.20.0 pypi_0 pypi kiwisolver 1.0.1 pypi_0 pypi lesscpy 0.13.0 pypi_0 pypi libedit 3.1.20170329 hf8c457e_1001 conda-forge libffi 3.2.1 he1b5a44_1006 conda-forge libgcc-ng 8.2.0 hdf63c60_1
    libiconv 1.15 h516909a_1005 conda-forge libpng 1.6.37 hed695b0_0 conda-forge libsodium 1.0.16 h14c3975_1001 conda-forge libstdcxx-ng 8.2.0 hdf63c60_1
    libuuid 2.32.1 h14c3975_1000 conda-forge libxcb 1.13 h14c3975_1002 conda-forge libxml2 2.9.9 h13577e0_0 conda-forge libxslt 1.1.32 h4785a14_1002 conda-forge lxml 4.3.0 pypi_0 pypi markupsafe 1.1.1 py36h14c3975_0 conda-forge matplotlib 3.0.2 pypi_0 pypi mccabe 0.6.1 pypi_0 pypi mistune 0.8.4 py36h14c3975_1000 conda-forge nb_conda 2.2.1 py36_2 conda-forge nb_conda_kernels 2.2.2 py36_0 conda-forge nbconvert 5.5.0 py_0 conda-forge nbformat 4.4.0 py_1 conda-forge nbstripout 0.3.5 py_0 conda-forge ncurses 6.1 hf484d3e_1002 conda-forge nodejs 11.14.0 he1b5a44_1 conda-forge notebook 5.7.8 py36_0 conda-forge numpy 1.16.0 pypi_0 pypi openssl 1.1.1b h14c3975_1 conda-forge packaging 19.0 pypi_0 pypi pandoc 2.7.2 0 conda-forge pandocfilters 1.4.2 py_1 conda-forge parso 0.4.0 py_0 conda-forge pcre 8.41 hf484d3e_1003 conda-forge pexpect 4.7.0 py36_0 conda-forge pickleshare 0.7.5 py36_1000 conda-forge pip 19.1.1 pypi_0 pypi ply 3.11 pypi_0 pypi poyo 0.4.2 py_0 conda-forge prometheus_client 0.6.0 py_0 conda-forge prompt_toolkit 2.0.9 py_0 conda-forge pthread-stubs 0.4 h14c3975_1001 conda-forge ptyprocess 0.6.0 py_1001 conda-forge pyasn1 0.4.5 pypi_0 pypi pycodestyle 2.5.0 pypi_0 pypi pycosat 0.6.3 py36h14c3975_1001 conda-forge pycparser 2.19 py36_1 conda-forge pyflakes 2.1.1 pypi_0 pypi pygments 2.4.0 py_0 conda-forge pyopenssl 19.0.0 py36_0 conda-forge pyparsing 2.3.1 pypi_0 pypi pyqt 5.9.2 py36hcca6a23_0 conda-forge pyrsistent 0.15.2 py36h516909a_0 conda-forge pysocks 1.7.0 py36_0 conda-forge python 3.6.7 h381d211_1004 conda-forge python-dateutil 2.8.0 py_0 conda-forge python-dotenv 0.10.2 pypi_0 pypi pytz 2019.1 pypi_0 pypi pyyaml 3.13 pypi_0 pypi pyzmq 18.0.1 py36hc4ba49a_1 conda-forge qt 5.9.7 h52cfd70_1 conda-forge qtconsole 4.4.4 py_0 conda-forge readline 7.0 hf8c457e_1001 conda-forge requests 2.22.0 py36_0 conda-forge rsa 3.4.2 pypi_0 pypi ruamel_yaml 0.15.71 py36h14c3975_1000 conda-forge s3transfer 0.2.0 pypi_0 pypi send2trash 1.5.0 py_0 conda-forge setuptools 41.0.1 py36_0 conda-forge simplegeneric 0.8.1 py_1 conda-forge sip 4.19.8 py36hf484d3e_1000 conda-forge six 1.12.0 py36_1000 conda-forge snowballstemmer 1.2.1 pypi_0 pypi sphinx 2.0.1 pypi_0 pypi sphinxcontrib-applehelp 1.0.1 pypi_0 pypi sphinxcontrib-devhelp 1.0.1 pypi_0 pypi sphinxcontrib-htmlhelp 1.0.2 pypi_0 pypi sphinxcontrib-jsmath 1.0.1 pypi_0 pypi sphinxcontrib-qthelp 1.0.2 pypi_0 pypi sphinxcontrib-serializinghtml 1.1.3 pypi_0 pypi sqlite 3.28.0 h8b20d00_0 conda-forge src 0.1.0 dev_0 terminado 0.8.2 py36_0 conda-forge testpath 0.4.2 py_1001 conda-forge tk 8.6.9 h84994c4_1001 conda-forge tornado 6.0.2 py36h516909a_0 conda-forge tqdm 4.31.1 pypi_0 pypi traitlets 4.3.2 py36_1000 conda-forge urllib3 1.24.3 py36_0 conda-forge wcwidth 0.1.7 py_1 conda-forge webencodings 0.5.1 py_1 conda-forge wheel 0.33.4 py36_0 conda-forge whichcraft 0.5.2 py_1 conda-forge widgetsnbextension 3.4.2 py36_1000 conda-forge xorg-libxau 1.0.9 h14c3975_0 conda-forge xorg-libxdmcp 1.1.3 h516909a_0 conda-forge xz 5.2.4 h14c3975_1001 conda-forge yaml 0.1.7 h14c3975_1001 conda-forge yapf 0.27.0 py_0 conda-forge zeromq 4.3.1 hf484d3e_1000 conda-forge zlib 1.2.11 h14c3975_1004 conda-forge

    Steps to reproduce

    • Install Miniconda
    • Install Jupyter lab
    • Install cookiecutter
    • Install nb_conda_kernels
    • Create conda environment with conda create -n yourenvname python=3
    • Configure environment with
    conda activate yourenvname
    conda install ...
    
    • Return to base environment with conda deactivate
    • Create new project with cookiecutter https://github.com/drivendata/cookiecutter-data-science
    • Execute make data
    • Confirm success with
    make test_environment  
    python3 test_environment.py  
    Development environment passes all tests!  
    
    • Execute jupyter lab
    • In Jupyter Lab, navigate to myproject>notebooks
    • From jupyter lab launcher create a new notebook using the conda environment yourenvname
    • Execute first cell with
    # OPTIONAL: Load the "autoreload" extension so that code can change
    %load_ext autoreload
    
    # OPTIONAL: always reload modules so that as you change code in src, it gets loaded
    %autoreload 2
    
    from src.data import make_dataset
    

    Error

    ---------------------------------------------------------------------------
    ModuleNotFoundError                       Traceback (most recent call last)
    <ipython-input-1-f47036c946b3> in <module>
          5 get_ipython().run_line_magic('autoreload', '2')
          6 
    ----> 7 from src.data import make_dataset
    
    ModuleNotFoundError: No module named 'src'
    

    The error does not occur when using a python3 notebook instead of a notebook with the environment kernel.

    It appears that the project package is not recognized by the notebook because it is not installed in the environment kernel.

    I saw the recommendation to install jupyter in the environment, but that seems to be contrary to the design of jupyter lab/notebook and environments. One wants to have one jupyter install in the base environment so one can traverse all projects but still isolate notebooks within kernels.

    This issue plus the conversations on #164 #118 #83 suggest that environments are a source of confusion and complexity. It would be nice if there were a way to let the user choose the package manager and then have the environment handling encapsulated by that package manager. That would make the maintenance problem easier than either trying to do complex conditional logic in the Makefile or forcing the user to debug incorrect assumptions about their starting environment.

    In the meantime, perhaps someone could suggest a command to import src into an existing environment.
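
    For what it's worth, one workaround that matches the template's own setup.py (the directory layout above notes that pip install -e . is exactly what setup.py is for) is to install the project in editable mode inside the environment that backs the notebook kernel. A minimal sketch, assuming the environment is named yourenvname and the commands are run from the project root:

    conda activate yourenvname
    pip install -e .    # uses the template's setup.py so `src` becomes importable
    conda deactivate

    After restarting the kernel, from src.data import make_dataset should resolve; the exact steps may differ depending on how the kernel was registered.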

    opened by jraviotta 17
  • Use environment.yml mechanism for specifying conda dependencies

    Use environment.yml mechanism for specifying conda dependencies

    Anaconda has a YAML-based specification file (environment.yml) for declaring dependencies. We should use this instead of pip's requirements.txt when conda is used to manage virtual environments.

    • Add a configuration-time question to choose between conda and virtualenv explicitly
    • Use an environment.yml to specify the conda dependencies
    • Install dependencies at venv-creation time, since we already have this info

    As an added bonus, this makes it a little easier to add pipenv support by extending the VIRTUALENV choices
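
    For illustration, a minimal sketch of what such an environment.yml could look like (the name and pins below are hypothetical, not what the template ships):

    name: my_project
    channels:
      - conda-forge
    dependencies:
      - python=3.8
      - pip
      - pip:
          - -e .    # install the project's own package (src) in editable mode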

    opened by hackalog 15
  • Workflow for working with figures and reports

    Workflow for working with figures and reports

    I just started using this cookiecutter and I'm wondering how people are using this directory structure in order to generate figures and reports.

    Here's what I'm doing currently:

    • do analysis and generate interesting figure, save them to /reports/figures/
    • write up the final Jupyter notebook report from within /notebooks/reports/; any references to figures are going to be ../../reports/figures/fig.png
    • export the report as report.html and place in /reports/

    The issue now is that when I view report.html, the figures don't have the proper path. How are people getting around this?

    opened by ConstantinoSchillebeeckx 9
  • Compatibility of pack to create api driven projects

    Compatibility of pack to create api driven projects

    Hi there! Loved the project, this really reflects the maturity of data science projects and where we are standing. So good!

    I raise this issue because I was wondering whether the current structure can be adapted to an API-driven project, that is, a project in which the analysis and data flow may be tied to an API definition.

    If so, what would it look like, so we can document it (or point me to where it is already documented)? If not, why not? Some books recommend having an API flow for analysis and processing so that our results and analyses are available to our colleagues in engineering, even allowing for an easy scale-up.

    Thank you so much!

    folder-layout needs-discussion 
    opened by ricalanis 8
  • [WIP] New version with cleaner options

    [WIP] New version with cleaner options

    We've seen a lot of potential features for this where we need to handle forking paths gracefully. By default cookiecutter can't do this (see https://github.com/audreyr/cookiecutter/pull/848). It's been years, so we can't reasonably expect this to change upstream...

    This implements a monkey-patching workaround to enable this behavior. It introduces a couple of major changes, so here are my recommendations.

    Here are the big differences for a user:

    • we now need to run ccds <path to repo> instead of cookiecutter <path to repo>
    • the options and their defaults are changing
    • as a consequence of supporting more environments/dependency managers at setup time, we stop supporting them at execution time. this means that teams will have to pick one of each and stick to it across developers. I think this is pretty widespread already. (i.e. make create_environment will only support one of the options rather than multiple like it does now)

    Implementation details are:

    • Add setup.py to make this package installable and give it a CLI
    • add monkey_patch.py to patch the cookiecutter functions that we need to handle our use case
    • support a list of dictionaries in cookiecutter.json that lets a user pick an option and then sub-options
    • use post_gen_project.py to create the environment file based on a standard list of libraries. (#5)

    There is also work for a number of longstanding items in this branch as well:

    • we now support multiple storage providers (#120)
    • we allow user to choose package manager (#118)
    • we allow user to choose a dependency format
    • src is now {{ cookiecutter.module_name }} (#140)
    • remove test_environment.py which is just cruft IMO

    Done:

    • [x] tag current master as v1 so anyone relying on the current flow/structure can continue to use it easily
    • [x] implement the rails for the high priority items
    • [x] make comprehensive tests run on CI/CD to de-risk
    • [x] support multiple dependency formats (environment.yml, requirements.txt, pipenv)
    • [x] default pydata dependencies options
    • [x] add support for azure/google cloud
    • [x] add ccds command and make cookiecutter-data-science a proper package

    Remaining items

    Cookiecutter default structure

    • [ ] user supplied config files
    • [ ] revise generated python package boilerplate (make optional)
    • [ ] mkdocs in place of sphinx
    • [ ] add lint command to Makefile

    Cookiecutter options

    • [ ] add deon - WIP #244
    • [ ] add nbautoexport - WIP #244
    • [ ] formatting commands / configuration (e.g., black)
    • [ ] test suite configuration (e.g., pytest install and make commands)
    • [ ] add

    Infrastructure

    • [ ] tests passing on windows
    • [ ] release command and make PyPI release

    Docs

    • [ ] add documentation for new installation (pip install cookiecutter-data-science) and new initiation (ccds <path to repo>)
    • [ ] add table of options with links to the project documentation in our docs
    • [ ] update screencast and add screenshots ( #197 ) of the new flow
    • [ ] add options for
    opened by pjbull 7
  • Expand Setup Tests

    Expand Setup Tests

    Part of our Data Days for Good initiative was to work on this repo. One of the requested additions was fleshing out the setup tests a little bit more to test what would happen when user input is introduced. This was mentioned by Isaac. I have separated out the fixture and put it in its own file, which is discoverable by pytest and mentioned in the docs. I have also increased the scope of creating the temp directory to class scope so it doesn't have to be created and torn down after every function call. Lastly, I also test for certain conditions given user input.

    opened by dmitrypolo 7
  • Integration with dvc

    Integration with dvc

    Adding DVC support can benefit this framework in two ways:

    1. Provide a version control backend for data files - including intermediate results, models, etc. As you know, git is not the best solution for storing data files. Storing and versioning data is crucial for reproducibility.
    2. Serve as an easier-to-learn alternative to Makefiles. DVC allows for tracking execution steps, dependencies between them, etc., all from the CLI, with no need to learn a new language (make).

    I think both projects can benefit and get more visibility from this. Let me know what you think.
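
    For context, the DVC side of such an integration is only a few commands; a rough sketch (the file path and bucket name are hypothetical):

    dvc init                                    # set up DVC alongside git in the project root
    dvc add data/raw/dataset.csv                # track a data file with DVC instead of git
    dvc remote add -d storage s3://my-bucket/dvc-store    # add a default remote for data
    dvc push                                    # upload tracked data to the remote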

    opened by shcheklein 6
  • Makefile stores S3 bucket name in plaintext!

    Makefile stores S3 bucket name in plaintext!

    How to reproduce:

    1. cookiecutter https://github.com/drivendata/cookiecutter-data-science
    2. Give it your bucket name
    3. Look at line 8 of the Makefile - there's your bucket name in plaintext.

    :-( 👎

    opened by znmeb 6
  • Added a rule to build the folders of data/

    Added a rule to build the folders of data/

    Hi,

    Since the data folders are not included in the GitHub repository, they are not there when the project is cloned, but the Makefile contains commands related to data/ (and scripts might as well). I included a rule for creating the missing folders, sketched below.
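
    A sketch of what such a rule could look like (the target name is an assumption; the actual PR may differ, and the recipe line must be tab-indented in a real Makefile):

    ## Create the data directory skeleton if it is missing
    data_dirs:
        mkdir -p data/external data/interim data/processed data/raw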

    opened by PetitLepton 6
  • Provide a PyScaffold extension for a Data Science Project

    Provide a PyScaffold extension for a Data Science Project

    Besides Cookiecutter, there is also PyScaffold, which provides a standardised project scaffold for Python. Since version 3.0, PyScaffold has had an extension system, so it would be easy to provide this cool template for PyScaffold as well.

    I see several advantages over Cookiecutter:

    • Currently src seems to be a Python package, which is not good practice if you want to combine several data science projects that use this template: one would end up with packages overwriting each other.
    • PyScaffold is best for having a standard-compliant Python project scaffold. One could thus focus on providing additional folders and structure within this scaffold.
    • One would have a modern, descriptive way of configuration with the help of setup.cfg instead of setup.py.
    • PyScaffold would also allow combining a data science structure with other extensions, for instance Tox support for unit testing, pre-commit for automated tasks, etc.
    • PyScaffold allows updating scaffolds, which is important in the fast-changing world of Python packaging.
    needs-discussion 
    opened by FlorianWilhelm 5
  • make gsutil rsync recursive

    make gsutil rsync recursive

    Small change to allow gsutil to do rsync recursively, syncing all the folders.

    Context: We had some issues using it with gcloud, since make sync_data_up would not upload all the data folders. Adding the -r flag fixes this issue.
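
    For illustration, the change amounts to adding -r to the gsutil call in the sync recipes; a sketch (variable names are assumptions, not necessarily the template's exact ones, and the recipe line must be tab-indented in a real Makefile):

    sync_data_up:
        gsutil rsync -r data/ gs://$(BUCKET)/data/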

    opened by KBodolai 0
  • Fix v2 aws sync commands

    Fix v2 aws sync commands

    https://github.com/drivendata/cookiecutter-data-science/blob/b4c0c12243653c493c188239af14c835b9768fbc/%7B%7B%20cookiecutter.repo_name%20%7D%7D/Makefile#L62

    In v2, sync_data_up incorrectly lists the bucket as the source directory rather than the local data/ folder. The order should be flipped.

    In addition, there is inconsistent use of environment variables. sync_data_down uses the templatized AWS profile from the cookiecutter form:

    https://github.com/drivendata/cookiecutter-data-science/blob/b4c0c12243653c493c188239af14c835b9768fbc/%7B%7B%20cookiecutter.repo_name%20%7D%7D/Makefile#L51

    while sync_data_up uses the PROFILE environment variable, which is not set in the Makefile.

    https://github.com/drivendata/cookiecutter-data-science/blob/b4c0c12243653c493c188239af14c835b9768fbc/%7B%7B%20cookiecutter.repo_name%20%7D%7D/Makefile#L63
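
    A sketch of the intended fix, with the local data/ folder as the source for uploads and a single profile variable used consistently (variable names follow the issue text but are assumptions; recipe lines must be tab-indented in a real Makefile):

    sync_data_up:
        aws s3 sync data/ s3://$(BUCKET)/data/ --profile $(PROFILE)

    sync_data_down:
        aws s3 sync s3://$(BUCKET)/data/ data/ --profile $(PROFILE)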

    bug 
    opened by chrisjkuch 0
  • Allow users to select mkdocs, sphinx, or none for code documentation

    Allow users to select mkdocs, sphinx, or none for code documentation

    This PR allows a ccds user to select mkdocs, sphinx, or none as their default code documentation tool.

    If sphinx is chosen, provides the output of sphinx-quickstart in /docs/ (this is the current behavior)

    If mkdocs is chosen, provides the output of mkdocs new {{ project_name }} in /docs/

    If none is chosen, provides no default documentation tool, but keeps an empty /docs/ folder.

    Notes

    Adding tests for this brings the number of configurations to 60. With the number of possible configurations continuing to grow exponentially, we may want to consider how we generate test configs and what is appropriate.

    enhancement 
    opened by chrisjkuch 0
  • pip 2020 resolver no longer valid

    pip 2020 resolver no longer valid

    Attempting to run a build on the latest v2 branch yields the following:

    option --use-feature: invalid choice: '2020-resolver' (choose from 'fast-deps', 'truststore', 'no-binary-enable-wheel-cache')
    

    pip 22.3 removes that option because this resolver is now the default.
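
    The fix is simply to drop the flag from the pip invocation in the generated Makefile; a sketch of the kind of change (the surrounding recipe is abbreviated and partly hypothetical):

    # before (fails on pip >= 22.3):
    #   $(PYTHON_INTERPRETER) -m pip install -r requirements.txt --use-feature=2020-resolver
    # after:
    $(PYTHON_INTERPRETER) -m pip install -r requirements.txt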

    bug 
    opened by chrisjkuch 0
  • Optionally remove boilerplate in initial setup

    Optionally remove boilerplate in initial setup

    Closes #49

    Provides an additional prompt to make the boilerplate Python module files and folders disappear, with the exception of __init__.py, which keeps the folder a Python module.

    enhancement 
    opened by chrisjkuch 0
  • Work with an existing repo

    Work with an existing repo

    The current workflow assumes that the user has not created a directory for the project and does not already have a repo for it.

    I think I'm not alone in preferring to create the repo on GitHub, clone it, and then run ccds. It would be good to support that.

    Or at least provide instructions for how to work with an existing repo.

    v2-bug 
    opened by AllenDowney 1
Releases(v1)
  • v1(Mar 20, 2021)

    Snapshot of the legacy version of Cookiecutter Data Science that will work in perpetuity with the command:

    cookiecutter -c v1 https://github.com/drivendata/cookiecutter-data-science
    

    Breaking changes coming in #246 for cookiecutter version 2 🎉

    Source code(tar.gz)
    Source code(zip)