Data Podcast show


Summary: Our podcast includes both technical & non-technical discussions on BigData, DataScience, BI, AI, DW, Business Intelligence, TDWI, SqlServer, SQL, NoSql, AWS, Azure, R, Python. Hosts: Rajib Bahar, Shabnam Khan



 Joe Sack (@JoeSackMSFT) - Adaptive Query Processing in SQL Server 2017 Engine | File Type: audio/mpeg | Duration: 00:21:23

Joe Sack is a Principal Program Manager in the Azure SQL Database and SQL Server product team at Microsoft, with a focus on the Query Processor. Joe is an author and speaker with over 20 years of experience in the industry, specializing in performance tuning, high availability and disaster recovery.
Interviewer: Rajib Bahar, Shabnam Khan
Agenda:
RB - Your team created Adaptive Query Processing, or Adaptive QP. It is new in SQL Server 2017 and Azure SQL Database. As we know, SQL Server uses a query plan internally to run T-SQL statements. Sometimes the plan chosen by the query optimizer is not optimal, for reasons such as incorrect cardinality estimates and various other issues. What are some other pain points Adaptive QP is meant to cure?
SB - Adaptive QP's strength lies in batch mode memory grant feedback, batch mode adaptive joins, and interleaved execution... How do they work internally?
RB - What are the steps to enable Adaptive QP, and what are some best practices? Can you tell us what's in the pipeline for upcoming enhancements?
SB - How do we connect with you on social media?

 Bill Inmon, Father of Datawarehouse discusses history & relevance of DW in age of Big Data | File Type: audio/mpeg | Duration: 00:22:01

Bill Inmon – the “father of the data warehouse” – has written 57 books published in nine languages. Bill’s latest adventure is the building of technology known as textual disambiguation – technology that reads raw text in a narrative format and allows the text to be placed in a conventional database so that it can be analyzed by standard analytical technology, thereby creating unique business value for Big Data/unstructured data. Bill was named by ComputerWorld as one of the ten most influential people in the history of the computer profession. Bill lives in Castle Rock, Colorado.
Interviewer: Rajib Bahar, Shabnam Khan
Agenda:
SB - In the 1970s, you coined the term "Datawarehouse". Countless data gurus refer to you as the father of "Datawarehousing". We are curious: how did your journey start? What did you envision a "Datawarehouse" to be back then, and now?
RB - Who were the earliest adopters? What were some interesting discoveries back then? How has the industry evolved?
SB - In the current state of the data industry, do you think Datawarehousing is relevant in this hyped-up age of Big Data and Data Science? Do these technologies simply complement existing data practices? What are your thoughts on it?
RB - One of your projects in the data space is called Textual ETL... What is it about? Is it a theoretical concept? Are there any tools in the industry that meet the standard?
SB - Your recent publications are on Taxonomies and Textual Analytics... Our knowledge of them is quite limited. Please enlighten us about the use cases for which they are relevant.
RB - How do we connect with you on social media such as Twitter or your blog?

 Varun Bhartia (@VBhartia) - BeeHyve.IO, A learning platform for Computer Scientists | File Type: audio/mpeg | Duration: 00:20:07

Varun Bhartia is the cofounder of an online learning platform for computer science students around the world, helping students connect with each other and with the best career opportunities. He has spent his entire career working in technology, at NASA, Microsoft, Facebook, and Uber. He has an undergraduate degree from the University of Arizona and an MBA from Harvard.
Interviewer: Rajib Bahar, Shabnam Khan
Agenda:
RB - You have served some of the most interesting and awesome organizations... As a product manager, what is the most valuable lesson you have learned?
SB - Recently, you left Uber to launch a startup venture called Beehyve so that college students can use it as a portal to find exams and homework from prior years... I can see why students would love it... If they can predict the questions on the next test, it'll definitely add value to their academic career. On the other hand, don't you think professors would hate this idea if they had to do additional homework coming up with a unique test each year? Won't that add risk to your venture? How did you go about critically analyzing all the risks and benefits? What's your vision behind it?
RB - I understand you are working on a Data Vertical. How do you plan to achieve it? Is Cloud computing or Big Data involved in any way?
SB - Lately, IoT is getting the same kind of positive attention that Data Science, Cloud Computing, and Big Data are receiving. Weren't you involved in an IoT competition in Minnesota? Can you tell us your thoughts on it?
RB - How do we connect with you on Twitter, LinkedIn or your blog?

 Michael Ludwig - Graph Database, Apache Gremlin, & Tinkerpop | File Type: audio/mpeg | Duration: 00:20:37

Michael Ludwig is a Data Solution Architect at Microsoft, where he works on Machine Learning, Big Data and Blockchain applications on the Azure platform. Prior to joining Microsoft, Michael worked at Silver Bay, designing and optimizing geographical and financial statistical analysis solutions (mostly regression analysis and clustering). Before that, he was a database architect and then the lead systems architect of a multi-tenant cloud-based Internet-of-Things application for LogicPD, in Minneapolis.
Interviewer: Rajib Bahar, Shabnam Khan
Agenda:
RB - What is the purpose of a graph database? Why do we use it?
SB - Does Google Maps use graph databases in its application?
RB - What are the major graph systems out there? How does Apache Gremlin fit into that?
SB - SQL Server 2017 has support for graph tables. How do you implement it?
RB - How is this similar or dissimilar to graph computing solutions implemented in vendor-agnostic tools such as Apache TinkerPop?
SB - Who can get involved in the TinkerPop coding community?
RB - How do we connect with you professionally?

 Frank La Vigne (@TableTeer) - Microsoft Data Science Certification & Data Driven podcast | File Type: audio/mpeg | Duration: 00:30:48

Frank La Vigne leads the Data & Analytics practice at Wintellect and co-hosts the DataDriven podcast. He blogs regularly, and you can watch him on his YouTube channel, “Frank’s World TV” (FranksWorld.TV).
Interviewer: Rajib Bahar, Shabnam Khan
Agenda:
RB - You have recently gone through Microsoft's Professional Certification for Data Scientists. Also, you are training others in this area. What are the 4 units of this Data Science certification program, and where do its modules overlap with Microsoft's Big Data certification program?
SB - Can you tell us a little bit about the Cortana Intelligence Capstone project in the Data Science certification? What sort of time commitment and technical knowledge are required?
RB - We often see questionable studies stating something like coffee is unhealthy, followed by a counter-study contradicting it. Do statistics or overfitting a data model play a role in it?
SB - One of the cool things you do is co-hosting the "Data Driven" podcast with Andy Leonard. He was our guest in the past. On your Facebook page for the "Data Driven" podcast, your listeners also get to become your viewers and see live videos from Data Science, SQL Server, & other technology-related conferences. What are some insights from recent big conferences?
RB - How do we connect with you on Twitter, blogs, or social media in general?

 Curtis Seare (@DataCrunchPod) - Co-Host of Data Crunch Podcast discusses his journey & IoT Use cases | File Type: audio/mpeg | Duration: 00:21:24

Curtis Seare is a co-host of the Data Crunch podcast, a Tableau and Trifacta instructor, and the Director of Analytics at Shelfbucks, a retail analytics startup in Austin, Texas. He’s worked for almost a decade in the data-science field across multiple companies and industries. He’s solved problems spanning IoT, retail, marketing, sales, competitive intelligence, nonprofit donations, and product development, among others. Bringing organizational change and innovation in analytical processes has been the center of his work.
Interviewer: Rajib Bahar, Shabnam Khan
Agenda:
RB - Please give us a little background on the Data Crunch podcast's history.
SB - We have listened to your Data Crunch episodes highlighting some really interesting applications of analytics, such as preventing honey bee fallout, eradicating malaria in Zambia, etc. Please enlighten us more on what you have discovered in your research.
RB - What are some top applications of IoT that retailers find useful?
SB - One of the buzzwords associated with IoT is streaming analytics. How is this different from the standard analytics that we know or understand?
RB - In our lifetime, we may find ourselves in a situation where we over-analyze a problem, leading to analysis paralysis. Is there a methodology you follow to keep solutions simple on complex analytics projects?
SB - How do we connect with you on Twitter, social media, or your blog?

 Alteryx (@Alteryx) - Co-founder Libby, & product manager Nick on Gartner wins in Analytics | File Type: audio/mpeg | Duration: 00:29:49

Libby Duane is the Chief Customer Officer and a founding partner of Alteryx. In this role, Libby is responsible for overseeing and maximizing the complete Alteryx customer experience, from engagement to on-boarding, communications, performance, and retention. She has interacted with nearly every Alteryx customer, giving her a holistic perspective of the overall experience from implementation to adoption success. Nick Jewell is a Technology Evangelist for Alteryx. He started his career with a PhD in Data Science before Data Science was a sexy term! His background is in Chemical Information Science, where he got to work on some exciting data projects around drug design. He made the jump into ‘big finance’ and spent over a decade learning and developing BI, Big Data and Analytics solutions before the perfect opportunity presented itself to join Alteryx as part of their solutions team.
Interviewer: Rajib Bahar, Shabnam Khan
Agenda:
SB - Alteryx is a platform for Self-Service Data Analytics. What is the mission and vision behind it?
RB - In the Gartner 2017 Magic Quadrant for Data Science Platforms, Alteryx was positioned as a Challenger. Also, it's at the top of the niche players in Business Intelligence and analytics platforms. How did your organization achieve it?
SB - There are studies out there on Forbes stating that the majority of a Data Scientist's time is spent on preparation of data. What advantage does Alteryx give in that regard?
RB - What kind of Machine Learning or Deep Learning algorithms can Alteryx implement? Please name a few of them. Is it possible to customize them to fit a specific scenario?
SB - Does Alteryx Designer's workflow output directly to dashboard applications such as Tableau and PowerBI?
RB - There was a major conference, namely #Alteryx17, recently. We would like to learn some inside scoop from there. How is it organized and what kind of learning opportunities are available? What kind of audience does it cater to? Techies, business experts, or both?
SB - What kind of Alteryx learning opportunities are in Minnesota? Is there a user group?
RB - How can we utilize your site to learn about Alteryx?

 Guy in a Cube (@GuyInACube), Chat with Adam Saxton & Patrick LeBlanc on PowerBI | File Type: audio/mpeg | Duration: 00:23:32

"Guy in a Cube" is a YouTube channel for Power BI, which is a data visualization application. Currently, Patrick LeBlanc and Adam Saxton are producing content there. Adam Saxton is just a guy in a cube doing the work! He is on the Power BI team, at Microsoft, working on documentation for Power BI and Reporting Services. He is based in Texas, and started with Microsoft supporting SQL Server connectivity and Reporting Services in 2005. Adam has worked with Power BI since the beginning, on the support side, and now helps to produce content for these products. In addition to documentation, he produces weekly videos for the "Guy in a Cube" YouTube channel. Patrick LeBlanc is currently a Data Platform Solutions Architect at Microsoft and a contributing partner to Guy in a Cube. Along with his 15+ years’ experience in IT, he holds a Master of Science degree from Louisiana State University. He is the author and co-author of five SQL Server books. Prior to joining Microsoft, he was awarded the Microsoft MVP award for his contributions to the community. Patrick is a regular speaker at many SQL Server conferences.
Interviewer: Rajib Bahar, Shabnam Khan
Agenda:
RB - Both of you make highly engaging content touching on Power BI technology on your YouTube channel. That channel has helped us learn concepts we weren't aware of. One thing I enjoyed was the Family Guy-esque randomness with video clips. Please tell us why you chose that format.
SB - There is a video where Adam states the concept behind "Guy in a Cube". Do you both share the same philosophy? Would you like to add more to that history?
RB - What's the latest with Power BI Report Server? What was the driving factor behind it? It used to reside primarily in the SQL platform.
SB - Patrick showed us how to enable Cortana to connect with Power BI. That was a super amazing demo. What is the idea behind it? Is it fair to say the use case would be to broadly distribute dashboards so that users won't have to log in to a specific site?
RB - How is security different in Power BI as opposed to SSRS? Can you share a few pointers about RLS, aka "Row Level Security"?
SB - Is RLS in Power BI similar to RLS in SSAS?
RB - Will the Power BI team prevent the issue that causes RLS settings to reset when data is refreshed?
SB - Patrick, there aren't many dashboarding applications that have native features to utilize SQL Server Availability Groups. It makes Power BI stand out. For those of us who don't know or understand the High Availability feature, please tell us what it's about, and how Power BI adds value there.
RB - Adam, you have recently been to Israel to visit your team of engineers there. I have been to presentations by Arina & Aviv on Excel's integration with Power BI. Have you worked with their team? What kind of exciting challenges and insights have they given you?
SB - Please share any interesting stories from one of the recent conferences you both have attended.
RB - Are you both going to PASS Summit or any other interesting conferences in the coming months?
SB - How do we connect with you on social media?

 (@datadotworld) - Joe Boutros & Alex Zelenak talk about this innovative Open Data Portal | File Type: audio/mpeg | Duration: 00:31:34

Joe Boutros is a Director of Product Engineering, overseeing the product development and user research programs. He has spent the last 15 years working on early-stage consumer-facing technology problems as a software engineer, product manager, entrepreneur, and consultant. Joe focuses deeply on data-informed product development: the magic at the intersection of measurement and testing, and hands-on user research. Previously, Joe was the founder of Indeed Labs, the innovation team inside the #1 job site in the world. His team was responsible for envisioning the future of the company's product suite via invention and rapid prototyping and was the genesis of their entrance into new product categories. Alex Zelenak is a Product and User Experience Designer, currently helping people collaborate around data. Alex has spent over a decade designing and building products on behalf of agencies, enterprises, and startups. Whether through a mobile app, analytics portal, or social platform, Alex has a passion for translating ideas into positive outcomes.
Interviewer: Rajib Bahar, Shabnam Khan
Agenda:
RB - Most data experts at one point or another have been to one of the 2700 open data portals. This one has made its own space. Unlike other open data portals, it is a crowd-sourced data collection site aiming to have quality data. Without it, I would not have known facts such as 18 million open datasets in the world or the existence of 2.4 million websites at Google’s launch in 1998. We have seen some interesting data collection efforts when Hurricane Harvey hit. Please tell us more about this portal.
SB - The Datasets subreddit is a place where many data-hungry pros request datasets. What are its limitations? And what are the 4 common problems related to researching Open Data that you are trying to cure?
RB - Tell us more about your collaborative efforts with around 200 data experts in various disciplines.
SB - Based upon our understanding, a data project in Data.World is when you're ready to share your collected data with the whole world. Why do you make a distinction between a Data Project and a DataSet?
RB - How do we use the workspace in Data.World? Is that some sort of integrated development environment with a data focus?
SB - In a dataset, do you import data or only have a live connection to it?
RB - Has anyone raised privacy or other related concerns? How do you make sure someone is not sharing sensitive data that may fall into PHI or another relevant category?
SB - How do we connect with you both on Twitter, LinkedIn or your blog?

 Dharma Shukla (@DharmaShukla) - Founder of Azure Cosmos DB shares cool insights | File Type: audio/mpeg | Duration: 00:31:33

Dharma Shukla is a Distinguished Engineer at Microsoft. He is the General Manager of Azure Cosmos DB. He is the founder of Azure Cosmos DB, which was launched officially in May 2017. It is Microsoft’s "globally-distributed, multi-model database service for managing data at planet-scale". One interesting fact about him is that he holds more than 60 patents in the technology industry and is a long-distance runner.
Interviewer: Rajib Bahar & Shabnam Khan
Agenda:
SB - Why the name Cosmos? How did Cosmos DB start at Microsoft? Or why did you decide to build Cosmos DB? We heard that it is used extensively within Microsoft. Is that true?
RB - What is the programming language in which Cosmos DB is written?
SB - What makes Cosmos DB special? Can you give us more insights into its capabilities?
RB - When we heard your podcast episode on Data Skeptic, there was a mention of "Auto Index". You explained quite well, at a high level, how it frees developers from worries related to indexing as their applications scale up. Our follow-up question is: how does this auto-indexing work internally? Does Cosmos keep track of the most used data internally in some kind of table/tree structure to determine this? Is this based on an existing algorithm in the Computer Science realm or something proprietary?
SB - Can you tell us more about the new capabilities and features your team is working on?
RB - We see that Cosmos DB keeps shipping new features every few weeks. Can you tell us how you roll out new features?
SB - How do we connect with you on Twitter or other professional networks or blogs?

 Gregory Piatetsky-Shapiro, KDnuggets President(@KDnuggets), a top Big Data & Data Science Influencer | File Type: audio/mpeg | Duration: 00:28:39

Gregory Piatetsky-Shapiro, PhD, is the President of KDnuggets, a leading site for Analytics, Big Data, Data Science, and Machine Learning. Gregory is a co-founder of KDD (Knowledge Discovery and Data mining conferences), a top research conference in the field. He is also a co-founder and past chair of ACM SIGKDD, a professional association for Data Mining and Data Science, and a well-known Data Scientist.
Interviewer: Rajib Bahar, Shabnam Khan
Agenda:
RB - According to Forbes, you're one of the top Big Data influencers... Your KDnuggets site has well over 60 awards and mentions as a leading publication. I often find myself reading and tweeting your articles from KDnuggets. There are few places to find exciting articles on Data Science, AI, Big Data... Please tell us a brief history of KDnuggets...
SB - I understand you have transitioned from a researcher role to a high-level editor role. What do you enjoy about it?
RB - What are your thoughts on global trends in Machine Learning, AI, & Big Data?
SB - Where does automation of Data Science come into play? Is that a helpful process or a distraction from useful analysis? How do you implement it?
RB - We all have bias. Does Big Data suffer from any bias, such as implicit bias?
SB - Who are these so-called Citizen Data Scientists? Why are they important? How can they serve society at large?
RB - How do we connect with you on Twitter & social media?
Additional reference materials from Gregory for our listeners:
Data Science Automation: Data Scientists Automated and Unemployed by 2025? The Current State of Automated Machine Learning Trends: Machine Learning overtaking Big Data Optimism about AI improving society is high, but drops with experience developing AI systems Bias in Big Data Mirage of a Citizen Data Scientist: Citizen Data Scientist Cartoon: Overfitting The Cardinal Sin of Data Mining and Data Science: Overfitting

 Jen Underwood (@idigdata) - Natural Language Generation, NLG vs NLP, Automation Analytics | File Type: audio/mpeg | Duration: 00:26:21

Jen Underwood, founder of Impact Analytix, LLC, is a recognized analytics industry expert. She has a unique blend of product management, design and over 20 years of “hands-on” development of data warehouses, reporting, visualization and advanced analytics solutions. In addition to keeping a constant pulse on industry trends, she enjoys digging into oceans of data. Jen is honored to be an IBM Analytics Insider, SAS contributor, former Tableau Zen Master, and active analytics community member. In the past, Jen has held worldwide product management roles at Microsoft and served as a technical lead for system implementation firms. She has launched new analytics products and turned around failed projects. Today she provides industry thought leadership, advisory, strategy, and market research. She also writes for InformationWeek, O’Reilly Media and other tech industry publications. Jen has a Bachelor of Business Administration – Marketing, Cum Laude from the University of Wisconsin, Milwaukee and a post-graduate certificate in Computer Science – Data Mining from the University of California, San Diego.
Interviewer: Rajib Bahar, Shabnam Khan
Agenda:
- WSJ had an article on automation analytics recently. As if we don't have enough terms to keep track of, such as descriptive analytics, predictive analytics, prescriptive analytics... What is the deal with automation analytics? Are they calling automatically scheduled jobs automation analytics? Or is this concept completely different?
- According to Gartner, “By 2019, natural-language generation will be a standard feature of 90% of modern BI and analytics platforms.” NLG was also cited by Forbes in 2017 as a Top 10 Hot AI technology. What is natural-language generation? How does this subfield of AI differ from natural language processing, or NLP?
- Recently, you released a white paper on "Humanizing Enterprise Application Software with Natural Language". Would you like to share the lessons you have learned?
- What major forces are currently driving demand for Advanced NLG?
- How do Basic & Advanced NLG work?
- Are there any benefits of embedding NLG into applications?
- Is Quill by Narrative Science the only NLG product in this area? How does it compare to the competition? Please share its pros and cons relative to other similar platforms.
- How do we connect with you on Twitter or other professional networking sites?

 Dan English (@DEnglishBI) - Data Platform MVP, PASS Business Analytics Leader (@passbavc) | File Type: audio/mpeg | Duration: 00:16:40

Dan English is a Microsoft Data Platform MVP, author, speaker, community leader, husband, and father. He is the Group Leader for the PASS Business Analytics Virtual Group. He is also a Business Intelligence Architect and Community Leader with a strong passion for Microsoft technologies. He specializes in Business Intelligence, Microsoft BI Toolset, Analysis Services (SSAS), BI Semantic Model (BISM), Datazen, Excel, Integration Services (SSIS), OLAP, PerformancePoint (PPS), Power BI, Power BI Desktop, Power Map, Power Query, Power View, Power Pivot, ProClarity (PAS), Pyramid Analytics (BI Office), Reporting Services (SSRS), Report Builder (RB), SharePoint Server, SQL Server.
Interviewer: Rajib Bahar, Shabnam Khan
Agenda:
RB - I used to work on your team... You would do your part in getting the team involved in the technical community. As I recall, at the time you were involved with the PASS BA Virtual Group. What was the motivation behind its origin? Why is it important?
SB - Let's talk about the SQL Saturday program, which provides the tools and knowledge needed for groups and event leaders to organize and host a free day of training for SQL Server professionals... Several years ago you interviewed several experts local to Minnesota's technology community. Those interviews were later broadcast on KFAI radio. What were some of the most interesting things you learned from that experience?
RB - You were involved in co-authoring and technical editing of multiple books. Most technologists we talked to found the book writing process to be a painful set of activities... What was your experience like?
SB - Microsoft runs the MVP program, which recognizes notable professionals in their area of expertise. You are a Data Platform MVP. What is the MVP program about? Is it based on technical knowledge or on sharing such knowledge?
SB - Any thoughts on Gartner's ranking of current Business Intelligence tools, practices, and platforms?
RB - When you're not a technologist, you're a father and often coach your kids' soccer team. Do you bring your work to your coaching experience, your coaching experience to work, or both?
SB - How do we connect with you on Twitter, your blog, or LinkedIn?

 Dave Borean (@daveborean) - MDM Data Architect, Customer Intelligence 360 in AllSight | File Type: audio/mpeg | Duration: 00:15:30

Dave leads the technology strategy for AllSight and works with clients on their initial implementations of AllSight. He has many years of experience working on the architecture and development of mission-critical enterprise applications in both the product and custom solution spaces. Dave came from IBM, where he was a Senior Technical Staff Member responsible for leading the InfoSphere MDM portfolio of products and working directly with clients around the world.
Interviewer: Rajib Bahar
Agenda:
- In your previous role at IBM, you were an MDM Data Architect. How has that role related to your Customer Intelligence 360 initiative?
- What's Entity Resolution in the Big Data scene?
- I'm interested in learning about techniques for stitching together and de-duplicating customer data on a Big Data platform. What operational and analytical use cases do they have?
- As it relates to Customer Intelligence 360, what are some Machine Learning and Graph DB solutions out there?
- How do we connect with you on Twitter / social media?

 Brad Rubin (@bradrubin) - Director of the Center of Excellence for Big Data, Hadoop User Group | File Type: audio/mpeg | Duration: 00:12:45

Brad Rubin has been a professor in the Graduate Programs in Software department in the School of Engineering at the University of St. Thomas for the past 13 years. He is also Director of the Center of Excellence for Big Data, and teaches a course in Big Data Architecture, along with courses in Computer Security and Scala. He co-leads the Twin Cities Spark and Hadoop User Group. Previously, he spent most of his industry career at IBM in Rochester, MN. Brad has degrees in Computer and Electrical Engineering from the University of Illinois, Urbana and a doctorate in Computer Science from the University of Wisconsin, Madison.
Interviewer: Rajib Bahar
Agenda:
- You are one of the organizers of the TC Spark & Hadoop User Group. What motivated you to start it? When did it start? We would love to hear about its history.
- In your role as Director of the "Center of Excellence for Big Data" in the Graduate Programs at the University of St. Thomas, how do you make sure your program is actually relevant to the industry? In the early 2000s, computer science curricula were generally behind industry trends. Do you find your program having a similar struggle?
- What is the difference between your Data Science Certificate & Master of Data Science program?
- Is your Big Data curriculum technology agnostic? I am just wondering... Does it employ only open source frameworks, or both commercial and open source?
- How many Big Data and Data Science experts have graduated from the University of St. Thomas's program?
- Now, back to the TC Spark & Hadoop User Group... What has been the most interesting presentation?
- How can we get connected with you online or on social media?

