Efficient Python Tricks and Tools for Data Scientists

Overview

View on GitHub View Book

Efficient Python Tricks and Tools for Data Scientists

Why efficient Python? Because using Python more efficiently will make your code more readable and run more efficiently.

Why for data scientist? Because Python has a wide application. The Python tools used in the data science field are not necessarily useful for other fields such as web development.

The goal of this book is to spread the awareness of efficient ways to do Python. They include:

  • efficient built-in methods and libraries to work with iterator, dictionary, function, and class
  • efficient methods to work with popular data science libraries such as pandas and NumPy
  • efficient tools to incorporate in a data science project
  • efficient tools to incorporate in any project
  • efficient tools to work with Jupyter Notebook.

image

What Should You Expect From This Book?

This book expects you to have some basic knowledge of Python and data science.

You should also expect bite-size code snippets for each section. This will allow you to obtain multiple pieces of knowledge in fewer than one minute. I included the link to the resources for every tools introduced in case you want to explore them further.

About This Book

This book includes more than 300 tips and tools I have shared daily on my website, Data Science Simplified. If you want to get the updated of new tips on your mailbox, you can subscribe to my website.

About The Author

image

Khuyen Tran is a data science writer at NVIDIA and a data science intern at Ocelot Consulting. She wrote over 150 data science articles with 100k+ views per month on Towards Data Science. She also wrote 300+ daily data science tips at Data Science Simplified. Her current mission is to make open-source more accessible to the data science community.

Supporters

Special thanks to these supporters for supporting this project!

You might also like...
Statsmodels: statistical modeling and econometrics in Python

About statsmodels statsmodels is a Python package that provides a complement to scipy for statistical computations including descriptive statistics an

CONCEPT (COsmological N-body CodE in PyThon) is a free and open-source simulation code for cosmological structure formation
CONCEPT (COsmological N-body CodE in PyThon) is a free and open-source simulation code for cosmological structure formation

CONCEPT (COsmological N-body CodE in PyThon) is a free and open-source simulation code for cosmological structure formation. The code should run on any Linux system, from massively parallel computer clusters to laptops.

Datamol is a python library to work with molecules
Datamol is a python library to work with molecules

Datamol is a python library to work with molecules. It's a layer built on top of RDKit and aims to be as light as possible.

Karate Club: An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs (CIKM 2020)
Karate Club: An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs (CIKM 2020)

Karate Club is an unsupervised machine learning extension library for NetworkX. Please look at the Documentation, relevant Paper, Promo Video, and Ext

A computer algebra system written in pure Python

SymPy See the AUTHORS file for the list of authors. And many more people helped on the SymPy mailing list, reported bugs, helped organize SymPy's part

PennyLane is a cross-platform Python library for differentiable programming of quantum computers.
PennyLane is a cross-platform Python library for differentiable programming of quantum computers.

PennyLane is a cross-platform Python library for differentiable programming of quantum computers. Train a quantum computer the same way as a neural network.

SCICO is a Python package for solving the inverse problems that arise in scientific imaging applications.

Scientific Computational Imaging COde (SCICO) SCICO is a Python package for solving the inverse problems that arise in scientific imaging applications

Float2Binary - A simple python class which finds the binary representation of a floating-point number.

Float2Binary A simple python class which finds the binary representation of a floating-point number. You can find a class in IEEE754.py file with the

A simple computer program made with Python on the brachistochrone curve.
A simple computer program made with Python on the brachistochrone curve.

Brachistochrone-curve This is a simple computer program made with Python on the brachistochrone curve. I decided to write it after a physics lesson on

Comments
Owner
Khuyen Tran
Data Scientist | Data Science Writer at NVIDIA & Towards Data Science
Khuyen Tran
CS 506 - Computational Tools for Data Science

CS 506 - Computational Tools for Data Science Code, slides, and notes for Boston University CS506 Fall 2021 The Final Project Repository can be found

Lance Galletti 14 Mar 23, 2022
Kedro is an open-source Python framework for creating reproducible, maintainable and modular data science code

A Python framework for creating reproducible, maintainable and modular data science code.

QuantumBlack Labs 7.9k Jan 1, 2023
3D visualization of scientific data in Python

Mayavi: 3D visualization of scientific data in Python Mayavi docs: http://docs.enthought.com/mayavi/mayavi/ TVTK docs: http://docs.enthought.com/mayav

Enthought, Inc. 1.1k Jan 6, 2023
A logical, reasonably standardized, but flexible project structure for doing and sharing data science work.

Cookiecutter Data Science A logical, reasonably standardized, but flexible project structure for doing and sharing data science work. Project homepage

Jon C Cline 0 Sep 5, 2021
An interactive explorer for single-cell transcriptomics data

an interactive explorer for single-cell transcriptomics data cellxgene (pronounced "cell-by-gene") is an interactive data explorer for single-cell tra

Chan Zuckerberg Initiative 424 Dec 15, 2022
Data intensive science for everyone.

The latest information about Galaxy can be found on the Galaxy Community Hub. Community support is available at Galaxy Help. Galaxy Quickstart Galaxy

Galaxy Project 1k Jan 8, 2023
🍊 :bar_chart: :bulb: Orange: Interactive data analysis

Orange Data Mining Orange is a data mining and visualization toolbox for novice and expert alike. To explore data with Orange, one requires no program

Bioinformatics Laboratory 3.9k Jan 5, 2023
A framework for feature exploration in Data Science

Beehive A framework for feature exploration in Data Science Background What do we do when we finish one episode of feature exploration in a jupyter no

Steven IJ 1 Jan 3, 2022
Incubator for useful bioinformatics code, primarily in Python and R

Collection of useful code related to biological analysis. Much of this is discussed with examples at Blue collar bioinformatics. All code, images and

Brad Chapman 560 Dec 24, 2022
Probabilistic Programming in Python: Bayesian Modeling and Probabilistic Machine Learning with Aesara

PyMC3 is a Python package for Bayesian statistical modeling and Probabilistic Machine Learning focusing on advanced Markov chain Monte Carlo (MCMC) an

PyMC 7.2k Dec 30, 2022