List of Data Science Cheatsheets to rule the world

Data Science Cheatsheets

Table of Contents


Business Science

Business Science Problem Framework (PDF)

Data Science with Python Workflow (PDF)

Data Science with R Workflow (PDF)

Python

Datacamp

Python Crash Course

Dataquest

Others

R

Datacamp

xts (PDF)

RStudio

Math and Calculus

From @afshinea, @stat110 and @wzchen:

Big Data

Python

R

Machine Learning

Python

R

H2O (PDF)

Supervised Learning

From @afshinea:

Unsupervised Learning

From @afshinea:

Hacks, tricks and tips

From @afshinea:

Choosing the right model

Deep Learning

Neural Nets

R

Python

Keras (PDF)

From @afshinea:

SQL

Data Visualization

Python

  • Comprehensive Guide to Data Visualization in Python

R

Data Science in General and Others

By @ml874

Contributors:

Favio Vázquez

Comments
  • Create a clickable index for cheatsheets

    I think a very useful feature would be clickable links that take the user to specific sections of the cheatsheet list. For example, if I wanted to browse the pandas cheatsheets, I could click a link at the very top of the page instead of scrolling through the whole page to find them.

    opened by ballcap231 2
  • rename Begginers-Python-Cheat-Sheet.pdf

    Can you rename https://github.com/FavioVazquez/ds-cheatsheets/blob/master/Python/Python_Crash_Course/Begginers-Python-Cheat-Sheet.pdf and https://github.com/FavioVazquez/ds-cheatsheets/blob/master/Python/Python_Crash_Course/Begginers-Python-Cheat-Sheet.png to Beginners-Python-Cheat-Sheet.{png,pdf}? See #22

    opened by 0xflotus 2
  • Add Regular Expressions for Python

    There is a regex cheatsheet for R but not for Python, so maybe this one or this one would be appropriate.

    Thank you for an amazing resource, by the way.

    opened by artem-mateush 2
  • Data Transformation with Dplyr

    Hello Favio,

    Thank you for creating this list!

    I've noticed that the image for Data Transformation with Dplyr is the same as the one for Data Import with readr. This PR fixes the URL so it points to the correct image.

    opened by euljr 1
  • Ignore .DS_Store

    Thank you for the great cheatsheet collection!

    .DS_Store is a file that stores custom attributes of its containing folder, such as the position of icons or the choice of a background image. It shouldn't be in the repository.

    opened by orsinium 1
  • Core Visualization Principle

    This file covers the core details of visualization. Please check it and merge. The README file is also updated with a reference link. One more suggestion: giving a URL as the reference in the README is not efficient; check my way of referencing, it's easier and better.

    opened by chandru1003 0
  • Added Python regex (Dataquest) and BSU's regression and clustering cheatsheets

    As suggested in #16: added the PDF and PNG, and updated the README to reflect this.

    Additionally:

    • Added Business Science University (BSU) Clustering and segmentation cheatsheet to Machine Learning > Unsupervised.
    • Added BSU's Regression cheatsheet to Machine Learning > Supervised.
    opened by shrysr 0
  • Regarding inclusion of probability and stats cheatsheets

    Hi.

    We have seen a lot of requests for cheatsheets on useful probability and statistics concepts, hence this issue. I suggest including Seeing Theory's mini book. It is very readable and precisely written.

    opened by sayakpaul 0
  • Added Business Science + RStudio submodules and updated Readme

    Business Science submodule:

    • Added the Business Science repo as a submodule referencing the master branch
    • Retained the existing img folder for screenshots on README
    • Updated README links to PDFs so they point to the PDFs in the Business Science repo.

    RStudio submodule

    • Added RStudio's cheatsheet repo as a submodule referencing the master branch
    • Updated PDF links in the README to point to the PDFs in RStudio's repo
    • Updated PNG links in the README to point to the pngs folder in the RStudio repo

    This is in line with the discussion in issue #8, and should hopefully let the latest cheatsheets be referenced via the README or by fetching the latest commit in the submodules.

    One disadvantage is that RStudio's cheatsheet repo is a relatively large download.

    opened by shrysr 0
  • Add Engineering Practices for Data Scientists Cheatsheet

    Hey @FavioVazquez, I had a cheatsheet that might be interesting for the folks looking at this repo so I decided to do a PR.

    Let me know what you think.

    opened by skogstrom 0
  • o

    Is your feature request related to a problem? Please describe. A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

    Describe the solution you'd like A clear and concise description of what you want to happen.

    Describe alternatives you've considered A clear and concise description of any alternative solutions or features you've considered.

    Additional context Add any other context or screenshots about the feature request here.

    opened by Thepartner168 0
  • Additional Cross Validate function for SKlearn v0.20.3

    For folks using scikit-learn 0.20.3, the cross-validation function in the cheatsheet (ds-cheatsheets/Python/Datacamp/scikit-learn.pdf) should come from sklearn.model_selection.cross_validate, if I'm not mistaken. I was running a linear regression with sklearn 0.20.3: sklearn.cross_validation.cross_val_score was not recognized, but the aforementioned function was, and my program ran with no issues.

    Perhaps I am wrong, just a friendly addition to the cheatsheet :P

    opened by techieslayj 0
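The cross-validation issue concerns the `sklearn.cross_validation` module, which scikit-learn deprecated in 0.18 and removed in 0.20; its helpers now live in `sklearn.model_selection`, where `cross_val_score` still exists alongside `cross_validate`. The sketch below is a minimal illustration, assuming a recent scikit-learn; the synthetic dataset from `make_regression` is an assumption for demonstration and does not come from the cheatsheet.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
# Old import, removed in scikit-learn 0.20:
# from sklearn.cross_validation import cross_val_score
from sklearn.model_selection import cross_val_score, cross_validate

# Synthetic, nearly noise-free linear data (assumption: demo data only)
X, y = make_regression(n_samples=200, n_features=3, noise=0.1, random_state=0)
model = LinearRegression()

# One score per fold; for regressors the default scorer is R^2
scores = cross_val_score(model, X, y, cv=5)

# cross_validate returns a dict with fit/score times and per-fold test scores
results = cross_validate(model, X, y, cv=5, scoring="r2")

print("cross_val_score:", scores.round(4))
print("cross_validate test_score:", results["test_score"].round(4))
```

With `cv=5` each of the five folds is held out once. `cross_validate` can also evaluate several metrics in one call (for example `scoring=["r2", "neg_mean_absolute_error"]`), which `cross_val_score` does not support.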
Owner
Favio André Vázquez
Physicist and computational engineer. I have a passion for science, philosophy, programming, and Lacanian psychoanalysis. Working on cosmology and big data.