O'Reilly Data Show Podcast

Summary: The O'Reilly Data Show Podcast explores the opportunities and techniques driving big data, data science, and AI.

Podcasts:

 Deep learning for Apache Spark | File Type: audio/mpeg | Duration: 00:31:00

In this episode of the Data Show, I spoke with Jason Dai, CTO of big data technologies at Intel and co-chair of Strata + Hadoop World Beijing. Dai and his team are prolific, longstanding contributors to the Apache Spark project. Their early contributions to Spark tended to be on the systems side and included Netty-based shuffle, a fair scheduler, and the “yarn-client” mode. More recently, they have been contributing tools for advanced analytics. In partnership with major cloud providers in China, they’ve written implementations of algorithmic building blocks and machine learning models that let Apache Spark users scale to extremely high-dimensional models and large data sets. They achieve this scalability by exploiting data sparsity and by using Intel’s Math Kernel Library (MKL). Along the way, they’ve gained valuable experience and insight into how companies deploy machine learning models in real-world applications.

 The key to building deep learning solutions for large enterprises | File Type: audio/mpeg | Duration: 00:35:07

As data scientists add deep learning to their arsenals, they need tools that integrate with existing platforms and frameworks. This is particularly important for those who work in large enterprises. In this episode of the Data Show, I spoke with Adam Gibson, co-founder and CTO of Skymind, and co-creator of Deeplearning4J (DL4J). Gibson has spent the last few years developing the DL4J library and community, while simultaneously building deep learning solutions and products for large enterprises.

 How big compute is powering the deep learning rocket ship | File Type: audio/mpeg | Duration: 00:55:29

Specialists describe deep learning as akin to a rocket ship that needs a really big engine (a model) and a lot of fuel (the data) in order to go anywhere interesting. To better understand the issues involved in building compute systems for deep learning, I spoke with one of the foremost experts on the subject: Greg Diamos, senior researcher at Baidu. Diamos has long worked to combine advances in software and hardware to make computers run faster. In recent years, he has focused on scaling deep learning to help advance the state of the art in areas like speech recognition.

 2017 will be the year the data science and big data community engage with AI technologies | File Type: audio/mpeg | Duration: 00:16:16

This episode consists of excerpts from a recent talk I gave at a conference commemorating the end of the UC Berkeley AMPLab project. This section pertained to some recent trends in Data and AI. For a complete list of trends we’re watching in 2017, as well as regular doses of highly curated resources, subscribe to our Data and AI newsletters.

 Data is only as valuable as the decisions it enables | File Type: audio/mpeg | Duration: 00:50:12

In this episode I spoke with Ion Stoica, cofounder and chairman of Databricks. Stoica is also a professor of computer science at UC Berkeley, where he serves as director of the new RISE Lab (the successor to AMPLab). Fresh off the incredible success of AMPLab, RISE seeks to build tools and platforms that enable sophisticated real-time applications on live data, while maintaining strong security. As Stoica points out, users will increasingly expect security guarantees on systems that rely on online machine learning algorithms that make use of personal or proprietary data.

 Introducing model-based thinking into AI systems | File Type: audio/mpeg | Duration: 00:44:31

In this episode I spoke with Vikash Mansinghka, research scientist at MIT, where he leads the Probabilistic Computing Project, and co-founder of Empirical Systems. I’ve long wanted to introduce listeners to recent developments in probabilistic programming, and I found the perfect guide in Mansinghka.

 Building the next-generation big data analytics stack | File Type: audio/mpeg | Duration: 00:53:07

In this episode I spoke with Michael Franklin, co-director of UC Berkeley’s AMPLab and chair of the Department of Computer Science at the University of Chicago. AMPLab is well-known in the data community for having originated Apache Spark, Alluxio (formerly Tachyon) and many other open source tools. Today marks the start of a two-day symposium commemorating the end of AMPLab, and we took the opportunity to reflect on its impressive accomplishments.

 Visual tools for overcoming information overload | File Type: audio/mpeg | Duration: 00:50:15

In this special two-segment episode of the Data Show, I spoke with Dafna Shahaf, assistant professor at the School of Computer Science and Engineering at the Hebrew University of Jerusalem. Her research focuses on tools and techniques for overcoming information overload, an area of growing importance in an attention economy. With the U.S. presidential election right around the corner, I also included a conversation between Jenn Webb, host of the O’Reilly Radar Podcast, and Sam Wang, co-founder of the Princeton Election Consortium and professor of neuroscience and molecular biology at Princeton University.

 Why businesses should pay attention to deep learning | File Type: audio/mpeg | Duration: 00:51:24

In this episode of the O’Reilly Data Show, I spoke with Christopher Nguyen, CEO and co-founder of Arimo. Nguyen and Arimo were among the first adopters and proponents of Apache Spark, Alluxio, and other open source technologies. Most recently, Arimo’s suite of analytic products has relied on deep learning to address a range of business problems.

 Understanding predictive analytics | File Type: audio/mpeg | Duration: 00:31:22

In this episode of the O’Reilly Data Show, O’Reilly’s online managing editor Jenn Webb speaks with Natalino Busa on the topic of predictive analytics, the challenges of feature engineering, and a new class of techniques that is enabling features to emerge from patterns within the data. They also discuss the relationship between predictive techniques and high-quality microservices, and how machine learning is being used to improve financial services.

 The technology behind self-driving vehicles | File Type: audio/mpeg | Duration: 00:34:49

Ask a random person for an example of an AI system and chances are he or she will name self-driving vehicles. In this episode of the O’Reilly Data Show, I sat down with Shaoshan Liu, co-founder of PerceptIn and previously the senior architect (autonomous driving) at Baidu USA. We talked about the technology behind self-driving vehicles, their reliance on rule-based decision engines, and deploying large-scale deep learning systems.

 Data architectures for streaming applications | File Type: audio/mpeg | Duration: 00:41:00

In this episode of the O’Reilly Data Show I sat down with O’Reilly author Dean Wampler, big data architect at Lightbend. We talked about new architectures for stream processing, Scala, and cloud computing.

 Data science for humans and data science for machines | File Type: audio/mpeg | Duration: 00:42:12

In this episode of the O’Reilly Data Show, I spoke with Michael Li, cofounder and CEO of the Data Incubator. We discussed the current state of data science and data engineering training programs, Apache Spark, quantitative finance, and the misunderstanding around the term “data science.”

 The importance of emotion in AI systems | File Type: audio/mpeg | Duration: 00:32:13

While I was in Beijing for Strata + Hadoop World, several people reminded me of the chatbot Xiaoice, one of the most popular accounts on the Chinese social media site Weibo. Developed by Microsoft researchers, Xiaoice comes with a personality and can engage users in extended conversations on Weibo. Capabilities like these highlight that, in an attention economy, systems able to forge an emotional connection will earn more loyalty and engagement from users. In this episode of the O’Reilly Data Show, I sat down with Rana el Kaliouby, co-founder and CEO of Affectiva and one of the leading experts on emotion-sensing systems. We talked about the impact of deep learning and computer vision, Affectiva’s large facial expression database, and privacy and ethics in an era of multimodal systems.

 Building human-assisted AI applications | File Type: audio/mpeg | Duration: 00:43:55

In this episode of the O’Reilly Data Show, I spoke with Adam Marcus, co-founder and CTO of B12, a startup focused on building human-in-the-loop intelligent applications. We talked about Orchestra, an open source platform for coordinating human-in-the-loop projects; the current wave of human-assisted AI applications; best practices for reviewing and scoring experts; and flash teams.