Python for Data Analysis, 2nd Edition

Overview

Materials and IPython notebooks for "Python for Data Analysis" by Wes McKinney, published by O'Reilly Media

Buy the book on Amazon

Follow Wes on Twitter

1st Edition Readers

If you are reading the 1st Edition (published in 2012), please find the reorganized book materials on the 1st-edition branch.

Translations

IPython Notebooks:

License

Code

The code in this repository, including all code samples in the notebooks listed above, is released under the MIT license. Read more at the Open Source Initiative.

Comments
  • What would everyone like to see in the 2nd edition?

    I've started working on the revised 2nd Edition of Python for Data Analysis. The agenda / table of contents is not set in stone, though!

    Any comments on the existing content or requests for new content would be welcome here. I can't make any promises, but since I know how useful the book has been for many people the last 3.5 years, I would like to make sure the 2nd edition is just as useful (if not more so!) in the following 3.5 years (which will put us all the way to 2020, if you can believe it).

    Thank you all in advance for the support.

    opened by wesm 42
  • Ch2 p18 JSON error

    Still having trouble using JSON in Python 3

    I have added 'rb' to open():

    records = [json.loads(line) for line in open(path, 'rb')]

    Now getting error in json.loads

    TypeError: the JSON object must be str, not 'bytes'

    Any help appreciated. Thanks.
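    A likely fix, sketched here against an in-memory stand-in for the usagov file (the sample JSON lines below are made up): either decode each bytes line before calling json.loads, or simply open the file in text mode, which is the default in Python 3.

```python
import json
from io import BytesIO

# Stand-in for a few lines of the usagov file opened with 'rb'
raw = BytesIO(b'{"a": 0, "tz": "America/New_York"}\n{"a": 1, "tz": ""}\n')

# Reading in binary mode yields bytes, so decode before json.loads;
# alternatively, drop the 'rb' and open the file in text mode
records = [json.loads(line.decode('utf-8')) for line in raw]
print(records[0]['tz'])
```

    Opening in text mode avoids the decode step entirely, since iterating a text-mode file yields str lines.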

    opened by TerrySnow1963 30
  • Can't get it to work

    I'm new to your book and trying like crazy, but I seem to be missing a lot somewhere, somehow. I hope you can help me. I have tried to work through the usa.gov data from bit.ly and I always get "no such file" errors. The same goes for the MovieLens data: no matter what I try, I get the same errors. Am I supposed to download those files separately? I've scoured the net looking for better code, but I figured you could point me in the right direction or tell me what I'm doing wrong.

    Please help. I would hate for this book to have been a terrible waste!!!

    opened by jharbert1 10
  • DataFrame constructor did not accept a list for the index keyword argument

    This is in the ch05.ipynb

    Changed pd.DataFrame(pop, index=[2001, 2002, 2003]) to pd.DataFrame(pop, index=pd.Series([2001, 2002, 2003])) in order to get this cell to run.
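    A small sketch (using a dict of dicts in the shape of ch05's pop, with made-up values) suggests recent pandas versions accept a plain list for index, so the Series wrapper is only needed as a workaround on the pandas version that rejected it:

```python
import pandas as pd

# Dict of dicts in the shape of ch05's pop (outer keys -> columns)
pop = {'Nevada': {2001: 2.4, 2002: 2.9},
       'Ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6}}

# Both forms build the same frame in recent pandas; rows absent from
# the data (2003 here) come out as NaN
df1 = pd.DataFrame(pop, index=[2001, 2002, 2003])
df2 = pd.DataFrame(pop, index=pd.Series([2001, 2002, 2003]))
print(df1.equals(df2))
```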

    opened by madenu 9
  • ch7 7.3 String Manipulation

    Vectorized String Functions in pandas

    I can't figure out the meaning of the following code, can you explain it further:

    In [176]: matches.str.get(1)
    Out[176]:
    Dave     NaN
    Rob      NaN
    Steve    NaN
    Wes      NaN
    dtype: float64

    In [177]: matches.str[0]
    Out[177]:
    Dave     NaN
    Rob      NaN
    Steve    NaN
    Wes      NaN
    dtype: float64
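    The NaNs appear because .str.get(i) and .str[i] index into each element of the Series, and there is nothing at that position in these values. A small sketch with list elements (hypothetical data, not the book's matches) shows the behavior:

```python
import pandas as pd

s = pd.Series([['dave', 'google'], ['rob']], index=['Dave', 'Rob'])

# Both forms index into each element; where position i does not
# exist (the second list has no element 1), the result is NaN
print(s.str.get(1))
print(s.str[0])
```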

    opened by GengBin414 7
  • Issue Opening Files

    I have downloaded all the files in this repository. When I try to work with them, for example using the code list(open('examples/ex3.txt')), I consistently receive the following error:

    FileNotFoundError: [Errno 2] No such file or directory: 'ex3.txt'

    Do you know of any potential causes for this issue? Does it have to do with the directory that I have saved my download to?
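    Most likely, yes: the book's code uses paths relative to the repository root, so Python must be started from (or switched to) the directory that contains examples/. A quick check, with a placeholder path you would replace with your own clone:

```python
import os

print(os.getcwd())                  # where relative paths are resolved
# os.chdir('/path/to/pydata-book')  # placeholder: point at your clone
print(os.path.exists('examples/ex3.txt'))  # True once the cwd is right
```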

    opened by nabs825 6
  • Problem with parsing Movie Lens data using code in book

    Hi,

    I am working through the Ch02 material - and have a problem with the initial reading of the movie lens data. I am running the initial code as in the book:

    import pandas as pd
    import os
    encoding = 'latin1'
    
    
    upath = os.path.expanduser('pydata-book-master/ch02/movielens/users.dat')
    rpath = os.path.expanduser('pydata-book-master/ch02/movielens/ratings.dat')
    mpath = os.path.expanduser('pydata-book-master/ch02/movielens/movies.dat')
    
    unames = ['user_id', 'gender', 'age', 'occupation', 'zip']
    rnames = ['user_id', 'movie_id', 'rating', 'timestamp']
    mnames = ['movie_id', 'title', 'genres']
    

    (with paths amended to work for where I have the files)

    but when I run the line:

    users = pd.read_csv(upath, sep='::', header=None, names=unames, encoding=encoding)
    

    I get the message:

    /Users/Chris/anaconda3/lib/python3.6/site-packages/ipykernel/__main__.py:1: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.
      if __name__ == '__main__':
    

    I have tried switching to a Python2 kernel - but get an equivalent message.

    What is the root of this issue? As far as I can interpret it, pandas is having a problem with the multicharacter '::' data separator. But I don't really understand how to correct this. How should I fix it to avoid similar issues with this and future code in the book?

    Many thanks
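    The warning is harmless: pandas is saying that its default C engine cannot handle a multi-character separator like '::', so it falls back to the slower Python engine. Passing engine='python' explicitly silences it. A sketch with a small in-memory sample (made-up rows, not the real users.dat):

```python
import io
import pandas as pd

unames = ['user_id', 'gender', 'age', 'occupation', 'zip']
sample = io.StringIO("1::F::1::10::48067\n2::M::56::16::70072\n")

# engine='python' handles the two-character '::' separator and
# suppresses the ParserWarning that the C-engine fallback produces
users = pd.read_csv(sample, sep='::', header=None, names=unames,
                    engine='python')
print(users)
```

    The same engine='python' argument works unchanged with the real file paths.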

    opened by chrisrb10 5
  • Error on page 352.

    The 2 code examples are the same.

    In[216]: ts.resample('5min', closed='right').sum()

    In[217]: ts.resample('5min', closed='right').sum()

    In [216] should be without closed='right'.
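    For reference, a short sketch (with a made-up minute-frequency series, not the book's ts) of what the two cells should contrast:

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2000-01-01', periods=12, freq='min')
ts = pd.Series(np.arange(12), index=rng)

# Default: bins are closed on the left, so the first bin
# sums the values at minutes 00:00 through 00:04
left = ts.resample('5min').sum()

# closed='right': the right edge is inclusive instead, shifting
# the bin boundaries so the first bin holds only minute 00:00
right = ts.resample('5min', closed='right').sum()
```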

    opened by charbelsako 4
  • Chp 2 Pg 38 “prop_cumsum” error

    Hi Guys,

    I'm facing an error when running "prop_cumsum". I posted on Stack Overflow but can't seem to find a solution there.

    I did everything correctly prior to hitting this error, including:

    • changing Row to Index and Cols to Columns from the previous commands.

    The error message is on this link below

    http://stackoverflow.com/questions/41298599/python-for-data-analysis-chp-2-pg-38-prop-cumsum-error?noredirect=1#comment69866535_41298599

    Thanks guys

    opened by Ohnonononono 4
  • Delete line index #495 - something is wrong with this record

    Hi,

    Something is wrong with record 495 from ch02/usagov_bitly_data2012-03-16-1331923249.txt.

    It causes an error when running these statements:

    import json
    path = 'ch02/usagov_bitly_data2012-03-16-1331923249.txt'
    records = [json.loads(line) for line in open(path)]
    

    Fixed version:

    import json
    path = 'ch02/usagov_bitly_data2012-03-16-1331923249.txt'
    lines = [line for line in open(path)]
    lines.pop(495)         # delete the problematic record
    records = [json.loads(line) for line in lines]
    
    opened by vbnk 4
  • Third version typo on the website

    Chapter 5, Paragraph 2, at this link: https://wesmckinney.com/book/pandas-basics.html

    While pandas adopts many coding idioms from NumPy, the biggestabout difference is that pandas is designed for working with tabular or heterogeneous data.

    The word biggestabout seems to be a typo, where it should just be biggest?

    opened by yibenhuang 3
  • 3rd Edition typo on p. 169

    Since MSFT is a valid Python variable name, we can also select these columns using more concise syntax: In [285]: returns["MSFT"].corr(returns["IBM"])

    should be as in 2nd Edition: returns.MSFT.corr(returns.IBM)

    Otherwise there is no difference between the [283] code and the [285] code. Thanks.

    opened by christianvye 3
  • Chapter 2 - Running the IPython Shell

    The first two lines are Python code statements; the second statement creates a variable named data that refers to a newly created Python dictionary. The last line prints the value of data in the console.

    CORRECTION: data is a list, not a dictionary.

    In [5]: import numpy as np
    In [6]: data = [np.random.standard_normal() for i in range(7)]
    In [7]: data

    opened by tkanngiesser 1
Owner
Wes McKinney
CTO of https://voltrondata.com. Creator of Python pandas. Co-creator Apache Arrow. @apache Member and Apache Parquet PMC