Resources: Python (pandas) for Data Analysis

I’ve been using Python with pandas on and off to automate analysis of reading-test data at the non-profit where I work. If you’re regularly analysing data, even if it’s just running means and standard deviations on Excel files, you can save huge amounts of time and frustration by automating things with Python.
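
To give a rough idea of what that automation can look like, here is a minimal sketch. The scores and the column names (`grade`, `score`) are made up for illustration, and in practice the table would more likely be loaded from a spreadsheet with `pd.read_excel` than typed in by hand.

```python
import pandas as pd

# Hypothetical reading-test scores, invented for this example.
# With real data you might load a spreadsheet instead, e.g.
# scores = pd.read_excel("scores.xlsx")  # file name is an assumption
scores = pd.DataFrame(
    {
        "grade": [3, 3, 4, 4],
        "score": [10.0, 14.0, 20.0, 24.0],
    }
)

# Mean and (sample) standard deviation of the scores for each grade.
summary = scores.groupby("grade")["score"].agg(["mean", "std"])
print(summary)
```

Once a script like this exists, rerunning it on next term's file is a one-line change rather than an afternoon of copying formulas around.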

These are (amazing, free) resources for pandas, most of which I came across via Tom Augspurger’s site. I’ve dipped in and out of them, and it would have saved me a lot of time if I’d found them earlier and been a bit more systematic in learning the basics of pandas.

Some familiarity with Python or another programming language is a big help before starting on these resources (PY4E or Think Python is enough).

The list:

  1. Wes McKinney’s Python for Data Analysis, 3rd ed., open access version
  2. Greg Reda’s Intro to Pandas Data Structures
  3. Tom Augspurger’s Modern Pandas
  4. Easier data analysis in Python with pandas (video series)
  5. Think Stats (Allen Downey)
  6. DataCarpentry.org Ecology Workshop (this seems to be the best one – a good place to start for thinking about what good data looks like and how to format and handle it)

I'd love to hear your thoughts and recommended resources...