Improving your data science workflows with

Kjell Wooding

Last update: Dec 23, 2022

Overview

Make Better Defaults

Author: Kjell Wooding

This is the git repo for "Makefiles: One great trick for making your conda environments more manageable", a PyData Global 2021 talk given on October 28, 2021 by Kjell Wooding.

Getting Started

To get started, type "make".

To follow along, watch the video once it's posted.

To learn more about Easydata, the framework that generated this repo, see the Getting Started Guide.

The Tips

Use git and virtual environments. Always.
Good workflow trumps good tooling.
Good workflow means not having to remember things.
Use one virtual environment per git repo. Give them both the same name.
Maintain virtual environments as code.
Use lockfiles: Separate "what you want" from "what you need".
Auto-document your workflow.
Don't be afraid to "Nuke it from orbit".
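
The lockfile tip can be illustrated with the same timestamp rule that make applies to a target and its prerequisite: regenerate the lockfile when it is missing or older than the file that states what you want. The following is a minimal sketch in Python, not the code from this repo. The function name `lockfile_is_stale` and the file names `environment.yml` and `environment.lock.yml` are assumptions chosen for illustration, as are the package pins written into the demo files.

```python
import os
import tempfile
from pathlib import Path


def lockfile_is_stale(spec: Path, lock: Path) -> bool:
    """Return True when the lockfile should be regenerated.

    Same rule make uses for a target and its prerequisite: rebuild if the
    target (lock) is missing or older than the prerequisite (spec).
    """
    if not lock.exists():
        return True
    return lock.stat().st_mtime < spec.stat().st_mtime


# Demo in a scratch directory; both file names are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    spec = Path(tmp) / "environment.yml"       # "what you want"
    lock = Path(tmp) / "environment.lock.yml"  # "what you need"

    spec.write_text("dependencies:\n  - pandas\n")
    missing = lockfile_is_stale(spec, lock)    # no lockfile yet

    lock.write_text("dependencies:\n  - pandas=1.5.2\n")
    os.utime(spec, (1_000, 1_000))             # spec edited first...
    os.utime(lock, (2_000, 2_000))             # ...lockfile written later
    fresh = lockfile_is_stale(spec, lock)

    os.utime(spec, (3_000, 3_000))             # spec edited again
    stale = lockfile_is_stale(spec, lock)

print(missing, fresh, stale)
```

Because the check depends only on the two files, deleting the environment and rebuilding it from the lockfile (the "nuke it from orbit" tip) does not change the answer.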

The Implementation

See https://github.com/hackalog/make_better_defaults
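
The repository linked above holds the actual implementation. As a separate, rough illustration of the "auto-document your workflow" tip, the sketch below uses a common convention in which each Makefile target carries a trailing `## description` comment that a help command can collect. It is written in Python so it can be run without make. The function `extract_help`, the regular expression, and every target name in the sample are assumptions made for this example and are not taken from the repo.

```python
import re

# Matches lines of the form "target: prerequisites ## description".
HELP_LINE = re.compile(r"^([A-Za-z0-9_.-]+):[^#\n]*##\s*(.+)$")


def extract_help(makefile_text: str) -> list[tuple[str, str]]:
    """Collect (target, description) pairs from '##' comments on target lines."""
    entries = []
    for line in makefile_text.splitlines():
        match = HELP_LINE.match(line)
        if match:
            entries.append((match.group(1), match.group(2).strip()))
    return entries


# Hypothetical Makefile text; targets and descriptions are invented.
SAMPLE = """\
create_environment: environment.lock.yml ## Create the conda environment
update_environment: environment.yml ## Update the environment and lockfile
delete_environment: ## Remove the environment ("nuke it from orbit")
clean:
\trm -rf build
"""

help_entries = extract_help(SAMPLE)
for target, description in help_entries:
    # Targets without a '##' comment (such as 'clean') are left out.
    print(f"{target:<20} {description}")
```

The design choice here is that the documentation lives on the same line as the target it describes, so the help listing cannot drift out of date without someone editing that line.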

Directory Structure

See Project Organization for details on how this project is organized on disk.

This project was built using Easydata, a Python framework aimed at making your data science workflow reproducible.
