O'Reilly Data Show Podcast
Summary: The O'Reilly Data Show Podcast explores the opportunities and techniques driving big data, data science, and AI.
- Artist: O'Reilly Media
- Copyright: © 2020 O'Reilly Media
Podcasts:
In this episode of the Data Show, I spoke with Jason Dai, CTO of big data technologies at Intel, and co-chair of Strata + Hadoop World Beijing. Dai and his team are prolific and longstanding contributors to the Apache Spark project. Their early contributions to Spark tended to be on the systems side and included Netty-based shuffle, a fair scheduler, and the “yarn-client” mode. Recently, they have been contributing tools for advanced analytics. In partnership with major cloud providers in China, they’ve written implementations of algorithmic building blocks and machine learning models that let Apache Spark users scale to extremely high-dimensional models and large data sets. They achieve scalability by taking advantage of data sparsity and Intel’s MKL software. Along the way, they’ve gained valuable experience and insight into how companies deploy machine learning models in real-world applications.
As data scientists add deep learning to their arsenals, they need tools that integrate with existing platforms and frameworks. This is particularly important for those who work in large enterprises. In this episode of the Data Show, I spoke with Adam Gibson, co-founder and CTO of Skymind, and co-creator of Deeplearning4J (DL4J). Gibson has spent the last few years developing the DL4J library and community, while simultaneously building deep learning solutions and products for large enterprises.
Specialists describe deep learning as akin to a rocketship that needs a really big engine (a model) and a lot of fuel (the data) in order to go anywhere interesting. To get a better understanding of the issues involved in building compute systems for deep learning, I spoke with one of the foremost experts on this subject: Greg Diamos, senior researcher at Baidu. Diamos has long worked to combine advances in software and hardware to make computers run faster. In recent years, he has focused on scaling deep learning to help advance the state-of-the-art in areas like speech recognition.
This episode consists of excerpts from a recent talk I gave at a conference commemorating the end of the UC Berkeley AMPLab project. This section pertained to some recent trends in Data and AI. For a complete list of trends we’re watching in 2017, as well as regular doses of highly curated resources, subscribe to our Data and AI newsletters.
In this episode I spoke with Ion Stoica, cofounder and chairman of Databricks. Stoica is also a professor of computer science at UC Berkeley, where he serves as director of the new RISE Lab (the successor to AMPLab). Fresh off the incredible success of AMPLab, RISE seeks to build tools and platforms that enable sophisticated real-time applications on live data, while maintaining strong security. As Stoica points out, users will increasingly expect security guarantees on systems that rely on online machine learning algorithms that make use of personal or proprietary data.
In this episode I spoke with Vikash Mansinghka, research scientist at MIT, where he leads the Probabilistic Computing Project, and co-founder of Empirical Systems. I’ve long wanted to introduce listeners to recent developments in probabilistic programming, and I found the perfect guide in Mansinghka.
In this episode I spoke with Michael Franklin, co-director of UC Berkeley’s AMPLab and chair of the Department of Computer Science at the University of Chicago. AMPLab is well-known in the data community for having originated Apache Spark, Alluxio (formerly Tachyon) and many other open source tools. Today marks the start of a two-day symposium commemorating the end of AMPLab, and we took the opportunity to reflect on its impressive accomplishments.
In this special two-segment episode of the Data Show, I spoke with Dafna Shahaf, assistant professor at the School of Computer Science and Engineering at the Hebrew University of Jerusalem. Her area of research is focused on tools and techniques for overcoming information overload, an area of increasing importance in an attention economy. With the upcoming U.S. Presidential Elections right around the corner, I included a conversation between Jenn Webb, host of the O’Reilly Radar Podcast, and Sam Wang, co-founder of the Princeton Election Consortium and professor of neuroscience and molecular biology at Princeton University.
In this episode of the O’Reilly Data Show, I spoke with Christopher Nguyen, CEO and co-founder of Arimo. Nguyen and Arimo were among the first adopters and proponents of Apache Spark, Alluxio, and other open source technologies. Most recently, Arimo’s suite of analytic products has relied on deep learning to address a range of business problems.
In this episode of the O’Reilly Data Show, O’Reilly’s online managing editor Jenn Webb speaks with Natalino Busa on the topic of predictive analytics, the challenges of feature engineering, and a new class of techniques that is enabling features to emerge from patterns within the data. They also discuss the relationship between predictive techniques and high-quality microservices, and how machine learning is being used to improve financial services.
Ask a random person for an example of an AI system and chances are he or she will name self-driving vehicles. In this episode of the O’Reilly Data Show, I sat down with Shaoshan Liu, co-founder of PerceptIn and previously the senior architect (autonomous driving) at Baidu USA. We talked about the technology behind self-driving vehicles, their reliance on rule-based decision engines, and deploying large-scale deep learning systems.
In this episode of the O’Reilly Data Show, I sat down with O’Reilly author Dean Wampler, big data architect at Lightbend. We talked about new architectures for stream processing, Scala, and cloud computing.
In this episode of the O’Reilly Data Show, I spoke with Michael Li, cofounder and CEO of the Data Incubator. We discussed the current state of data science and data engineering training programs, Apache Spark, quantitative finance, and the misunderstanding around the term “data science.”
While I was in Beijing for Strata + Hadoop World, several people reminded me of the chatbot Xiaoice—one of the most popular accounts on the Chinese social media site Weibo. Developed by Microsoft researchers, Xiaoice comes with a personality and is able to engage users in extended conversations on Weibo. These types of capabilities highlight that in an attention economy, systems that are able to forge an emotional connection will garner more loyalty and engagement from users. In this episode of the O’Reilly Data Show, I sat down with Rana el Kaliouby, co-founder and CEO of Affectiva, one of the leading experts in emotion sensing systems. We talked about the impact of deep learning and computer vision, Affectiva’s large facial expression database, and privacy and ethics in an era of multimodal systems.
In this episode of the O’Reilly Data Show, I spoke with Adam Marcus, co-founder and CTO of B12, a startup focused on building human-in-the-loop intelligent applications. We talked about Orchestra, an open source platform for coordinating human-in-the-loop projects; the current wave of human-assisted AI applications; best practices for reviewing and scoring experts; and flash teams.