O'Reilly Data Show Podcast

Summary: The O'Reilly Data Show Podcast explores the opportunities and techniques driving big data, data science, and AI.

Podcasts:

 Enabling enterprise adoption of AI technologies | File Type: audio/mpeg | Duration: 00:34:30

In this episode of the O’Reilly Data Show, I spoke with Jana Eggers, CEO of Nara Logics. Eggers’ involvement with AI dates back to her days as a researcher at the Los Alamos National Laboratory. Most recently she has been helping companies across many industries adopt AI technologies as a way to enable a range of intelligent data applications.

 Using Agile development techniques for data science projects | File Type: audio/mpeg | Duration: 00:44:18

In this episode of the O’Reilly Data Show, I spoke with John Akred, cofounder and CTO of Silicon Valley Data Science. Akred and his colleagues teach two of the more popular Strata + Hadoop World tutorials—“Developing a Modern Enterprise Data Strategy” and “Architecting a Data Platform.” We talked about his career in data science and consulting, and his penchant for bringing emerging technologies and tools into large enterprises.

 Commercial speech recognition systems in the age of big data and deep learning | File Type: audio/mpeg | Duration: 00:42:19

In this episode of the O’Reilly Data Show, I spoke with Yishay Carmiel, president of Spoken Labs. As voice becomes a common user interface, the need for accurate and intelligent speech technologies has grown. And although computer vision is a common entry point for deep learning, some of the most interesting commercial applications of deep neural networks are in speech recognition. Carmiel has spent several years building commercial speech applications, and along the way he has witnessed (and helped architect) massive improvements in speech technologies.

 Building intelligent applications with deep learning and TensorFlow | File Type: audio/mpeg | Duration: 00:38:44

In this episode of the O’Reilly Data Show, I spoke with Rajat Monga, who serves as a director of engineering at Google and manages the TensorFlow engineering team. We talked about how he ended up working on deep learning, the current state of TensorFlow, and the applications of deep learning to products at Google and other companies.

 Hybrid transactional/analytic systems and the quest for database nirvana | File Type: audio/mpeg | Duration: 00:37:08

In this episode of the O’Reilly Data Show, I spoke with data management industry veteran Rohit Jain, currently the CTO of Esgyn. We talked about his years at HP Labs, and his recent project to bring hybrid transactional/analytic technologies into the Hadoop ecosystem.

 Using AI to build a comprehensive database of knowledge | File Type: audio/mpeg | Duration: 00:39:31

Extracting structured information from semi-structured or unstructured data sources (“dark data”) is an important problem. One can take it a step further by attempting to automatically build a knowledge graph from the same data sources. Knowledge databases and graphs are built using (semi-supervised) machine learning, and are then used to power intelligent systems that form the basis of AI applications. The more advanced messaging and chat bots you’ve encountered rely on these knowledge stores to interact with users. In this episode of the Data Show, I spoke with Mike Tung, founder and CEO of Diffbot, a company dedicated to building large-scale knowledge databases. Diffbot is at the heart of many web applications, and it’s starting to power a wide array of intelligent applications. We talked about the challenges of building a web-scale platform for highly accurate, semi-supervised, structured data extraction. We also took a tour through the AI landscape and the early days of self-driving cars.

 Structured streaming comes to Apache Spark 2.0 | File Type: audio/mpeg | Duration: 00:44:44

With the release of Spark version 2.0, streaming becomes much more accessible to users. By adopting a continuous processing model (on an infinite table), the developers of Spark have enabled users of its SQL or DataFrame APIs to extend their analytic capabilities to unbounded streams. Within the Spark community, Databricks engineer Michael Armbrust is well known for having led the long-term project to move Spark’s interactive analytics engine from Shark to Spark SQL. (Full disclosure: I’m an advisor to Databricks.) Most recently he has turned his efforts to introducing a much simpler stream processing model to Spark Streaming (“structured streaming”).

 Building and deploying large-scale machine learning applications | File Type: audio/mpeg | Duration: 00:40:30

In this episode of the O’Reilly Data Show, I spoke with Danny Bickson, co-founder and VP at Dato, and the principal organizer of the Data Science Summit (full disclosure: I’m a member of the conference organizing committee). Among machine learning students and practitioners, recommender systems have become something of a canonical use case. One of the early and popular building blocks was GraphLab’s collaborative filtering toolkit, a library originally written and maintained by Bickson. He continues to keep tabs on the latest developments in recommenders and helps organize workshops on related topics throughout the world.

 Semi-supervised, unsupervised, and adaptive algorithms for large-scale time series | File Type: audio/mpeg | Duration: 00:48:05

In this episode of the O’Reilly Data Show, I spoke with Ira Cohen, co-founder and chief data scientist at Anodot (full disclosure: I’m an advisor to Anodot). Since my days in quantitative finance, I’ve had a longstanding interest in time-series analysis. Back then, I used statistical (and data mining) techniques on relatively small volumes of financial time series. Today’s applications and use cases involve data volumes and speeds that require a new set of tools for data management, collection, and simple analysis.

 Practical machine learning techniques for building intelligent applications | File Type: audio/mpeg | Duration: 00:42:59

In this episode of the O’Reilly Data Show, I spoke with Mikio Braun, delivery lead and data scientist at Zalando. After years in academia, Braun recently decided to switch to industry. He shared some observations about building large-scale systems, particularly deploying data applications in production systems. Given his longstanding background as a machine learning researcher and practitioner, I wanted to get his take on topics like deep learning, hybrid systems, feature engineering, and AI applications.

 Cameron Turner on the sound of data | File Type: audio/mpeg | Duration: 00:28:41

How can sound be used both to generate data and to express data? In this episode of the Hardware Podcast, we talk with Cameron Turner, co-founder and principal at The Data Guild. Turner is the author of the new O’Reilly report “Finding Profit in Your Organization’s Data: Examples and Best Practices.”

 Democratizing business analytics | File Type: audio/mpeg | Duration: 00:41:57

In this episode of the O’Reilly Data Show, I spoke with one of Strata + Hadoop World’s most popular teachers—Duncan Ross, data and analytics director at TES Global. In his long career in data, Ross has seen several stages of the evolution of tools, techniques, and training programs, and along the way he has interacted with business managers in many countries and regions. In keeping with his wide-ranging interests, we discussed many topics, including business analytics, data science training programs, data philanthropy and data for good, and university rankings.

 Stream processing and messaging systems for the IoT age | File Type: audio/mpeg | Duration: 00:58:24

In this episode of the O’Reilly Data Show, I spoke with M.C. Srivas, co-founder of MapR and currently chief architect for data at Uber. We discussed his long career in data management and his experience building a variety of distributed systems. In the course of his career, Srivas has architected key components that now make up many data platforms (distributed file system, database, query engine, messaging system, etc.).

 Using Apache Spark to predict attack vectors among billions of users and trillions of events | File Type: audio/mpeg | Duration: 00:33:39

In this episode of the O'Reilly Data Show, I spoke with Fang Yu, co-founder and CTO of DataVisor. We discussed her days as a researcher at Microsoft, the application of data science and distributed computing to security, and hiring and training data scientists and engineers for the security domain.

 Rachel Kalmar on data ecosystems | File Type: audio/mpeg | Duration: 00:51:35

In this new episode of the Hardware Podcast, David Cranor and I talk with data scientist Rachel Kalmar, formerly with Misfit Wearables and the founder and organizer of the Sensored Meetup in San Francisco. She shares insights from her work at the intersection of data, hardware, and health care.
