Jupyter notebook and datasets from the pandas Q&A video series

Overview

Python pandas Q&A video series

Read about the series, and view all of the videos on one page: Easier data analysis in Python with pandas.

Jupyter Notebooks

Videos (playlist)

  1. What is pandas? (Introduction to the Q&A series) (6:24)
  2. How do I read a tabular data file into pandas? (8:54)
  3. How do I select a pandas Series from a DataFrame? (11:10)
  4. Why do some pandas commands end with parentheses (and others don't)? (8:45)
  5. How do I rename columns in a pandas DataFrame? (9:36)
  6. How do I remove columns from a pandas DataFrame? (6:35)
  7. How do I sort a pandas DataFrame or a Series? (8:56)
  8. How do I filter rows of a pandas DataFrame by column value? (13:44)
  9. How do I apply multiple filter criteria to a pandas DataFrame? (9:51)
  10. Your pandas questions answered! (9:06)
  11. How do I use the "axis" parameter in pandas? (8:33)
  12. How do I use string methods in pandas? (6:16)
  13. How do I change the data type of a pandas Series? (7:28)
  14. When should I use a "groupby" in pandas? (8:24)
  15. How do I explore a pandas Series? (9:50)
  16. How do I handle missing values in pandas? (14:27)
  17. What do I need to know about the pandas index? (Part 1) (13:36)
  18. What do I need to know about the pandas index? (Part 2) (10:38)
  19. How do I select multiple rows and columns from a pandas DataFrame? (21:46)
  20. When should I use the "inplace" parameter in pandas? (10:18)
  21. How do I make my pandas DataFrame smaller and faster? (19:05)
  22. How do I use pandas with scikit-learn to create Kaggle submissions? (13:25)
  23. More of your pandas questions answered! (19:23)
  24. How do I create dummy variables in pandas? (13:13)
  25. How do I work with dates and times in pandas? (10:20)
  26. How do I find and remove duplicate rows in pandas? (9:47)
  27. How do I avoid a SettingWithCopyWarning in pandas? (13:29)
  28. How do I change display options in pandas? (14:55)
  29. How do I create a pandas DataFrame from another object? (14:25)
  30. How do I apply a function to a pandas Series or DataFrame? (17:57)
  31. Bonus: How do I use the MultiIndex in pandas? (25:00)
  32. Bonus: How do I merge DataFrames in pandas? (21:48)
  33. Bonus: 4 new time-saving tricks in pandas (14:50)
  34. Bonus: 5 new changes in pandas you need to know about (20:54)
  35. Bonus: My top 25 pandas tricks (27:37)
  36. Bonus: Data Science Best Practices with pandas (PyCon 2019) (1:44:16)
  37. Bonus: Your pandas questions answered! (webcast) (1:56:01)

Datasets

| Filename | Description | Raw File | Original Source | Other |
|---|---|---|---|---|
| chipotle.tsv | Online orders from the Chipotle restaurant chain | bit.ly/chiporders | The Upshot | Upshot article |
| drinks.csv | Alcohol consumption by country | bit.ly/drinksbycountry | FiveThirtyEight | FiveThirtyEight article |
| imdb_1000.csv | Top rated movies from IMDb | bit.ly/imdbratings | IMDb | Web scraping script |
| stocks.csv | Small dataset of stock prices | bit.ly/smallstocks | DataCamp | |
| titanic_test.csv | Testing set from Kaggle's Titanic competition | bit.ly/kaggletest | Kaggle | Data dictionary |
| titanic_train.csv | Training set from Kaggle's Titanic competition | bit.ly/kaggletrain | Kaggle | Data dictionary |
| u.data | Movie ratings by MovieLens users | bit.ly/movielensdata | GroupLens | Data dictionary |
| u.item | Movie information from MovieLens | bit.ly/movieitems | GroupLens | Data dictionary |
| u.user | Demographic information about MovieLens users | bit.ly/movieusers | GroupLens | Data dictionary |
| ufo.csv | Reports of UFO sightings from 1930-2000 | bit.ly/uforeports | National UFO Reporting Center | Web scraping script |

Comments
  • How to add a new time column with new calculated time data?

    Hi Just Markham,

    I want to add 8 hours to every existing time and save the result in a new column of the DataFrame. So far I have only figured out how to create the new column; I don't know how to add my time delta to its values.

    Electronically yours, Mr. Biggles

    from datetime import timedelta
    import pandas as pd
    
    ufo = pd.read_csv('http://bit.ly/uforeports')
    
    print("")
    ufo['Time'] = pd.to_datetime(ufo.Time)
    print(ufo)
    
    print("")
    print("_______________________________________________________________________")
    print("")
    
    print("ufo.dtypes")
    print(ufo.dtypes)
    
    print("")
    print("_______________________________________________________________________")
    print("")
    
    # Add a new column with new data.
    
    delta = timedelta(hours=-8)
    # calc_utc_time = data_cell + delta
    
    ufo['utc_time'] = ufo.Time
    print(ufo)
    
    print("")
    print("_______________________________________________________________________")
    print("")
    
    print("ufo.dtypes")
    print(ufo.dtypes)
    
    print("")
    print("_______________________________________________________________________")
    print("")
    
    opened by mrbiggleswirth 2
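One possible way to approach the question above is to add a `pd.Timedelta` directly to the datetime column, which shifts every row at once. This is a minimal sketch: it uses a small in-memory DataFrame as a stand-in for the UFO file, and it assumes the +8 hours named in the question text (the posted script uses `hours=-8`, so the sign may need to be flipped).

```python
import pandas as pd

# Small stand-in for the UFO data (assumed to share the real file's 'Time' column).
ufo = pd.DataFrame({
    'City': ['Ithaca', 'Willingboro'],
    'Time': ['6/1/1930 22:00', '6/30/1930 20:00'],
})

# Parse the strings into datetime values.
ufo['Time'] = pd.to_datetime(ufo['Time'], format='%m/%d/%Y %H:%M')

# Adding a Timedelta to a datetime Series applies the shift to every row.
delta = pd.Timedelta(hours=8)
ufo['utc_time'] = ufo['Time'] + delta

print(ufo)
```

The `timedelta` from the standard library that the question imports should behave the same way in this addition; `pd.Timedelta` is used here only to keep the sketch to one import.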
  • add custom groupby function trick

    First, thank you very much for the video! Also, your YT channel is amazing and has helped me a lot. I always recommend it!

    I found out at work that it is possible to apply a custom function to a groupby operation in pandas.

    I added a code snippet that does this to trick 19.

    I am the one who has left a comment on your YT video with this trick \m/

    opened by joaopcnogueira 2
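The snippet contributed in this comment is not reproduced on this page, so the following is only an illustrative sketch of the general idea: passing a custom function to `agg` after a `groupby`. The data and the `price_range` helper are hypothetical.

```python
import pandas as pd

# Hypothetical data for illustration only.
orders = pd.DataFrame({
    'item': ['bowl', 'bowl', 'burrito', 'burrito', 'burrito'],
    'price': [8.0, 10.0, 7.0, 9.0, 11.0],
})

def price_range(prices):
    """Return the spread between the highest and lowest price in a group."""
    return prices.max() - prices.min()

# agg accepts built-in aggregation names and custom callables side by side.
summary = orders.groupby('item')['price'].agg(['mean', price_range])

print(summary)
```

The resulting column takes its name from the function, which is one reason to prefer a named function over a lambda here.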
  • Use raw string for regex

    Hi :wave: - couple of nitpicks:

    • if you're using a regular expression in .str.replace, it's probably safer to make it a raw string;
    • there's an unnecessary set of brackets in the call to pd.concat

    (found via nbqa pyupgrade pandas.ipynb --py36-plus --nbqa-mutate)

    opened by MarcoGorelli 1
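The two nitpicks above can be sketched as follows. The exact notebook lines being corrected are not shown on this page, so the data and variable names here are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical price strings for illustration.
prices = pd.Series(['$2.39', '$16.98'])

# A raw string passes the backslash through to the regex engine unchanged.
# regex=True is stated explicitly because the default has differed across pandas versions.
cleaned = prices.str.replace(r'\$', '', regex=True).astype(float)

# pd.concat takes a single list of objects; no extra brackets around the list are needed.
top = pd.DataFrame({'a': [1]})
bottom = pd.DataFrame({'a': [2]})
combined = pd.concat([top, bottom], ignore_index=True)

print(cleaned)
print(combined)
```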
  • Converting pandas.Timestamp into date only by removing timestamp

    You've made a good video, but it leaves one thing unexplained. The code below shows my situation.

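    import json
    
    import numpy as np
    import pandas as pd
    from django.core.files.storage import default_storage
    from django.core.serializers.json import DjangoJSONEncoder
    
    class NumpyEncoder(DjangoJSONEncoder):
        def default(self, obj):
            if isinstance(obj, np.ndarray):
                return obj.tolist()
            elif isinstance(obj, pd.Timestamp):
                return str(obj)
            # Defer to the parent encoder for every other type.
            return super().default(obj)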
    
    with pd.ExcelFile(default_storage.open(path)) as xls:
        sheet_names = xls.sheet_names
        sheets_df = [pd.read_excel(xls, sheet_name, header=None)
                     for sheet_name in sheet_names]
        worksheets_columns = [list(sheet_df.columns) for sheet_df in sheets_df]
        worksheets_data = [json.loads(json.dumps(
            sheet_df.fillna('').applymap(str).to_numpy(), cls=NumpyEncoder))
                           for sheet_df in sheets_df]
    

    I need to convert the pandas.Timestamp values in the code above to dates only. Currently the output also includes the time.

    Thanks in advance.

    opened by shksajawal 1
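One way to get a date-only string out of a `pd.Timestamp` during JSON encoding is to call `.date().isoformat()` in the encoder. The sketch below uses the standard library's `json.JSONEncoder` rather than `DjangoJSONEncoder` so that it runs without Django; `DateOnlyEncoder` is a hypothetical name, not something from the question.

```python
import json

import pandas as pd

class DateOnlyEncoder(json.JSONEncoder):
    """Hypothetical encoder that writes pandas Timestamps as YYYY-MM-DD strings."""

    def default(self, obj):
        if isinstance(obj, pd.Timestamp):
            # .date() drops the time part; isoformat() renders it as text.
            return obj.date().isoformat()
        return super().default(obj)

stamp = pd.Timestamp('2021-02-12 14:30:00')
encoded = json.dumps({'when': stamp}, cls=DateOnlyEncoder)

print(encoded)
```

In the posted snippet, `applymap(str)` appears to turn every cell into a string before the encoder runs, so the Timestamp branch may never be reached there. If that is the case, the conversion would need to happen on the DataFrame before that step, for example with `.dt.strftime('%Y-%m-%d')` on the datetime columns.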
  • How to remove the text Timestamp when saving DataFrame as Dictionary?

    Hi Sensei Markham, @justmarkham

    I need to save my DataFrame as a dictionary in order to push my data into my database using Flask SQLAlchemy. But even when I change the data type of the Time column from datetime to object, my dictionary still contains the text Timestamp. How can I remove this text and the parentheses from my dictionary?

    ufo_dict_2

    {0: {'City': 'Ithaca', 'Colors Reported': nan, 'Shape Reported': 'TRIANGLE', 'State': 'NY', 'Time': Timestamp('1930-06-01 22:00:00')}}

    x13_change_DataType_for_Series.py

    import pandas as pd
    
    #_______________________________________________________________________________
    
    ufo = pd.read_csv('http://bit.ly/uforeports')
    
    print(ufo.head())
    print("")
    print(type(ufo))
    print("")
    print("ufo.dtypes")
    print(ufo.dtypes)
    
    print("")
    print("_______________________________________________________________________")
    print("")
    
    #_______________________________________________________________________________
    
    # Convert Time datatype from Object to Datetime.
    
    ufo['Time'] = pd.to_datetime(ufo.Time)
    
    print(ufo.head())
    print("")
    print(type(ufo))
    print("")
    print("ufo.dtypes")
    print(ufo.dtypes)
    
    print("")
    print("_______________________________________________________________________")
    print("")
    
    #_______________________________________________________________________________
    
    # Convert DataFrame to Dictionary.
    
    ufo_dict_1 = ufo.to_dict('index')
    
    print("ufo_dict_1")
    print("")
    print(dict(list(ufo_dict_1.items())[0:3]))
    print("")
    print(type(ufo_dict_1))
    
    print("")
    print("_______________________________________________________________________")
    print("")
    
    #_______________________________________________________________________________
    
    # Convert Time datatype from Datetime to Object.
    
    ufo['Time'] = ufo.Time.astype(object)
    
    print(ufo.head())
    print("")
    print(type(ufo))
    print("")
    print("ufo.dtypes")
    print(ufo.dtypes)
    
    print("")
    print("_______________________________________________________________________")
    print("")
    
    #_______________________________________________________________________________
    
    # Convert DataFrame to Dictionary.
    
    ufo_dict_2 = ufo.to_dict('index')
    
    print("ufo_dict_2")
    print("")
    print(dict(list(ufo_dict_2.items())[0:3]))
    print("")
    print(type(ufo_dict_2))
    
    print("")
    print("_______________________________________________________________________")
    print("")
    
    
    opened by mrbiggleswirth 1
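The `Timestamp(...)` text in the dictionary above is the printed form of a pandas Timestamp object, and `astype(object)` keeps those objects as they are. A possible workaround, sketched here on a one-row stand-in for the UFO data, is to format the column as text before calling `to_dict`. The format string is an assumption; any `strftime` pattern could be used.

```python
import pandas as pd

# One-row stand-in for the UFO data used in the question.
ufo = pd.DataFrame({
    'City': ['Ithaca'],
    'Time': pd.to_datetime(['1930-06-01 22:00:00']),
})

# Format the datetimes as plain strings first, leaving the original frame untouched.
as_text = ufo.assign(Time=ufo['Time'].dt.strftime('%Y-%m-%d %H:%M:%S'))
ufo_dict = as_text.to_dict('index')

print(ufo_dict)
```

Whether strings are the right choice depends on the database column type. A Timestamp is a subclass of Python's `datetime`, so it may already be accepted as-is by a datetime column.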
  • Notebook in text format

    Hi,

    I just created my account on GitHub, so pardon the simple question.

    Is it possible to download the notebooks that you have provided in text format?

    Here's the link of the one I'm looking at: http://nbviewer.jupyter.org/github/justmarkham/pandas-videos/blob/master/pandas.ipynb#25.-How-do-I-work-with-dates-and-times-in-pandas%3F-%28video%29

    I'm planning to work with the notebooks without having to type the code one by one. If it's possible to do this please teach me how! Thank you!

    opened by vinertia 1
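A notebook file is JSON, so one possible answer to the question above is to read it and join the source of its code cells. The sketch below works on a tiny in-memory notebook standing in for `pandas.ipynb`; `notebook_code` is a hypothetical helper name.

```python
import json

def notebook_code(nb):
    """Join the source of every code cell in a parsed .ipynb document."""
    chunks = []
    for cell in nb.get('cells', []):
        if cell.get('cell_type') == 'code':
            source = cell.get('source', '')
            # The notebook format stores source as one string or a list of lines.
            if isinstance(source, list):
                source = ''.join(source)
            chunks.append(source)
    return '\n\n'.join(chunks)

# Tiny in-memory notebook used in place of a downloaded file.
sample = json.loads('''
{"cells": [
  {"cell_type": "markdown", "source": ["# Title"]},
  {"cell_type": "code", "source": ["import pandas as pd\\n", "pd.__version__"]}
], "nbformat": 4, "nbformat_minor": 5, "metadata": {}}
''')

print(notebook_code(sample))
```

For a downloaded file, the dictionary would come from `json.load` on the opened `.ipynb` file instead of the sample string. If Jupyter is installed, running `jupyter nbconvert --to script pandas.ipynb` should produce a similar plain-text script without any custom code.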