One month ago, I got an email from a colleague:
The subject of his email was a single word: PyData.
The email had a single EventBrite.com URL.
Intrigued, I clicked… and that’s when my adventure into PyData began.
Keynote on Friday (Jul-24)
One-sentence summary? An interesting 1-hour plug for code.org, but oddly no mention of Python or much about data analysis.
What was the key message? The key message for me was the Hour of Code. Students (or in some cases entire schools) commit to a one-hour coding event to get a taste of the awesome world of computing.
Simplified statistics through simulation
Who was the speaker? Justin Bozonier gave this tutorial. Great speaker and actively engaged with the audience without forcing the audience to participate (personally, I don’t like forced participation).
What was the key message? If you’d like to explore some statistics or probability theory, simulate it using Python so you can see the concepts come to life.
Where can I find out more? All the tutorial material is uploaded to Justin’s GitHub PyData2015 folder.
His PyData2015 IPython Notebook is a great way to experiment with the tutorial, which covers:
- Monte Carlo Simulation
- Simulating a function
- Simulating split tests
- Simulating a probability puzzle
- …and a few bonuses.
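To give a flavor of the simulate-it-and-see idea, here is a minimal sketch of my own (it is not taken from Justin’s notebook). It estimates the chance that two dice sum to seven by rolling them many times; the function names, the trial count, and the seed are all just illustrative choices.

```python
import random


def estimate_probability(event, trials=100_000, seed=0):
    """Estimate P(event) by running the experiment `trials` times.

    `event` is any function that takes a random.Random instance and
    returns True when the outcome of interest happened.
    """
    rng = random.Random(seed)  # fixed seed so reruns give the same estimate
    hits = sum(1 for _ in range(trials) if event(rng))
    return hits / trials


def two_dice_sum_to_seven(rng):
    """One simulated experiment: roll two fair dice and check the total."""
    return rng.randint(1, 6) + rng.randint(1, 6) == 7


estimate = estimate_probability(two_dice_sum_to_seven)
print(f"simulated: {estimate:.4f}  exact: {1 / 6:.4f}")
```

The simulated value should land close to the exact answer of 1/6, and swapping in a different `event` function lets you poke at other puzzles the same way.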
A brief introduction to Distributed Computing with PySpark
Who was the speaker? Holden Karau gave this hands-on tutorial. I found her to be an engaging and knowledgeable speaker! She explained concepts in understandable language (especially for those new to Spark, like me!).
What was the key message? The key message for me was that PySpark makes working with Spark for large-scale data processing super intuitive and straightforward.
Where can I find out more? The tutorial slides are on SlideShare.
I especially liked the coverage of:
- Comparison to Hadoop
- Different parts of Spark
- RDDs + lazy evaluation
- Data frames and working with tweets
Learn to Build an App to Find Similar Images using Deep Learning
What was the key message? My key takeaway is that Dato’s GraphLab platform makes it super easy to play with and inspect the complexities of deep learning. I also enjoyed the history, motivation and difficulties of deep learning that Piotr went over.
Where can I find out more? The tutorial materials are available from Dato.
The IPython Notebooks (check out step 2 from the tutorial materials) from the tutorial are fantastic.
I really enjoyed the 3 exercises:
- Hand-written digit recognition of the MNIST dataset
- Implementing a dress recommender
- Deploying the dress recommender as a service
Now it’s your turn…
Ready to rock with these PyData tutorials?
Then download the tutorials to get rolling right away.
The simulation tutorial is probably the easiest to get set up. The deep learning one is slightly more challenging, but super rewarding!