33: Katharine Jarmul - Testing in Data Science

Test & Code - Software Testing, Development, Python show

Summary: <p>A discussion with Katharine Jarmul, aka kjam, about some of the challenges of data science with respect to testing.</p> <p>Some of the topics we discuss:</p> <ul> <li>experimentation vs testing</li> <li>testing pipelines and pipeline changes</li> <li>automating data validation</li> <li>property based testing</li> <li>schema validation and detecting schema changes</li> <li>using unit test techniques to test data pipeline stages</li> <li>testing nodes and transitions in DAGs</li> <li>testing expected and unexpected data</li> <li>missing data and non-signals</li> <li>corrupting a dataset with noise</li> <li>fuzz testing for both data pipelines and web APIs</li> <li>datafuzz</li> <li>hypothesis</li> <li>testing internal interfaces</li> <li>documenting and sharing domain expertise to build good reasonableness</li> <li>intermediary data and stages</li> <li>neural networks</li> <li>speaking at conferences</li> </ul><p>Special Guest: Katharine Jarmul.</p><p>Sponsored By:</p><ul> <li> <a rel="nofollow" href="http://amzn.to/2E6cYZ9">Python Testing with pytest</a>: <a rel="nofollow" href="http://amzn.to/2E6cYZ9">Simple, Rapid, Effective, and Scalable. The fastest way to learn pytest. From 0 to expert in under 200 pages.</a> </li> <li> <a rel="nofollow" href="https://www.patreon.com/testpodcast">Patreon Supporters</a>: <a rel="nofollow" href="https://www.patreon.com/testpodcast">Help support the show with as little as $1 per month. Funds help pay for expenses associated with the show.</a> </li> </ul><p><a rel="payment" href="https://www.patreon.com/testpodcast">Support Test &amp; Code - Software Testing, Development, Python</a></p><p>Links:</p><ul> <li> <a title="@kjam on Twitter" rel="nofollow" href="https://twitter.com/kjam">@kjam on Twitter</a>: Data Magic and Computer Sorcery</li> <li><a title="Kjamistan: Data Science" rel="nofollow" href="http://kjamistan.com/">Kjamistan: Data Science</a></li> <li> <a title="datafuzz’s Python library" rel="nofollow" href="https://datafuzz.readthedocs.io/en/latest/">datafuzz’s Python library</a>: The goal of datafuzz is to give you the ability to test your data science code and models with BAD data.</li> <li> <a title="Hypothesis Python library" rel="nofollow" href="https://hypothesis.readthedocs.io/en/latest/">Hypothesis Python library</a>: Hypothesis is a Python library for finding edge cases in your code you wouldn’t have thought to look for.</li> </ul>