Distributed Data Show Episode 41: Graph-based Genealogy with Dave Bechberger





Summary: Migrating from a relational application to a graph-based application is an undertaking that takes forethought, planning and the right use case. Taking a team used to working in a relational world and transitioning it to a distributed, eventually consistent system based on a graph poses many challenges. In this episode we talk to Dave Bechberger, Chief Software Architect at Gene by Gene, a bioinformatics company specializing in genetic genealogy. Dave shares his experience and lessons learned from migrating their relational application to a graph-based one.

Highlights!

0:15 - We welcome Dave Bechberger to the show and learn how he got into big data technologies like Apache Cassandra and DSE Graph.
1:38 - Dave introduces his current work with Gene by Gene applying graph technology to genealogy applications.
3:43 - The performance of legacy systems was database bound, so they began to look at using non-relational databases, especially graph, starting with a family tree application.
6:19 - Dave describes how his team identified a graph datastore as the best approach for the family tree functionality. Family tree queries can be very recursive - for example: how are these two people related?
7:43 - The biggest challenge in adopting graph technology was training - Gremlin queries require a different way of thinking. At the same time, they were also migrating to a microservice architecture style based on asynchronous event passing.
10:04 - The benefits of all this change outweighed the cost. The biggest “do different” they identified was to invest in upfront training on pragmatic approaches to graph.
11:24 - Why specialization can be beneficial to a team learning multiple new technologies.
13:06 - The graph schema evolved over time - while their initial schema was based on an industry standard, they ended up adding vertex and edge types, as well as indexes, to help optimize queries for both analytical and transactional use cases.
15:31 - Dave’s team iterated on both their data model and their graph traversals. Denormalization is a key technique to work around vertices with many edges.
16:42 - Dave’s advice for scaling up on a graph database is similar to familiar guidance for Cassandra: know your data and how you’re going to query it.
18:13 - Tooling is a major area of growth for graph databases that will help spur adoption.
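
To make the "how are these two people related?" question from 6:19 concrete, here is a minimal Python sketch of that kind of query as a breadth-first search over parent/child edges. It is only an illustration and not the traversal Dave's team wrote: the people, the `PARENT_OF` edge list, and the function names are all hypothetical, and a real DSE Graph deployment would express this as a Gremlin traversal rather than an in-memory search.

```python
from collections import deque

# Hypothetical family tree: each tuple is (parent, child).
PARENT_OF = [
    ("Ann", "Bob"),
    ("Ann", "Carol"),
    ("Bob", "Dan"),
    ("Carol", "Eve"),
]


def build_adjacency(edges):
    """Index every person by labeled edges in both directions."""
    adj = {}
    for parent, child in edges:
        adj.setdefault(parent, []).append(("parent_of", child))
        adj.setdefault(child, []).append(("child_of", parent))
    return adj


def relationship_path(adj, start, goal):
    """Return the shortest chain of (person, label, relative) steps
    from start to goal, [] if they are the same person, or None if
    the two people are not connected."""
    if start == goal:
        return []
    queue = deque([start])
    came_from = {start: None}  # person -> (previous person, edge label)
    while queue:
        person = queue.popleft()
        for label, neighbor in adj.get(person, []):
            if neighbor in came_from:
                continue  # already reached by an equal or shorter route
            came_from[neighbor] = (person, label)
            if neighbor == goal:
                # Walk the back-pointers to rebuild the path.
                path = []
                node = goal
                while came_from[node] is not None:
                    prev, lbl = came_from[node]
                    path.append((prev, lbl, node))
                    node = prev
                return list(reversed(path))
            queue.append(neighbor)
    return None


adj = build_adjacency(PARENT_OF)
path = relationship_path(adj, "Dan", "Eve")
for person, label, relative in path:
    print(person, label, relative)
```

In this toy data Dan and Eve are first cousins, so the path climbs two generations to Ann and comes back down two. In a relational schema the same question typically needs a recursive query or repeated self-joins, which is the recursion the episode refers to.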