Avoiding common pitfalls and working effectively with PySpark

by Jonathan Rioux

Machine Learning & Data Science Tools, Testing, and Practices

For a Python developer, using PySpark may feel foreign, like driving a race car in sandals. This talk shares battle stories from using PySpark, from development to production, and shows how my many errors can lead to better code. In no particular order, I'll discuss speeding up development, avoiding 'friendly enemies' and testing code. You'll learn to avoid mistakes by watching me make them, and you'll leave a more insightful PySpark developer.


About the Author

Jonathan is the data science practice lead for EPAM Canada, a global engineering consultancy. He has worked in insurance, analytics and data science for a little over a decade. He is passionate about programming languages and how they allow us to map increasingly complex ideas. Jonathan is the author of PySpark in Action (Manning, scheduled for 2020).


Talk Details

Date: Sunday Nov. 17

Location: Round Room (PyData Track)

Begin time: 11:00

Duration: 25 minutes