Instagram's Cassandra Layer with Michaël Figuière | Ep. 122 Distributed Data Show




Distributed Data Show

Summary: Instagram engineer Michaël Figuière talks with Jeff Carpenter about the Cassandra abstraction layer his team maintains, how it helps development teams move faster, and when companies should consider creating their own abstractions.

Highlights:
0:00 - Welcoming Michaël to the show and recapping past conversations on Cassandra usage at Instagram, including the RocksDB storage engine we discussed with Dikang Gu and the geographic replication approach that Andrew Whang presented at the 2019 DataStax Accelerate conference.
1:50 - Michaël joined DataStax in 2012 to lead the driver team, with a particular focus on Java. Since then he's worked for some large Cassandra users, including Instagram.
3:54 - Michaël encourages users to provide feedback to the DataStax drivers team. It's a great thing when companies in the Cassandra community help each other out!
5:45 - Michaël is now at Instagram, one of the biggest users of Cassandra, with more than 1 billion monthly active users spread across the globe.
7:09 - Instagram has built an abstraction layer in front of Cassandra that limits which CQL features are exposed. This stateless API is based on a simplified version of the original Cassandra Thrift API, since Thrift is the main RPC interface style in use at Instagram.
9:22 - Having an abstraction layer provides an insertion point for doing double writes to main and shadow clusters for testing or hardware migration purposes.
11:38 - The abstraction layer helps teams move faster by making it simpler to experiment with new client-side configurations for load balancing, traffic optimization, and monitoring.
14:01 - One thing they haven't tried to abstract away is the Cassandra data modeling best practice of creating a denormalized table per query. Michaël's team provides a UI that lets internal customers define their tables, including desired quality of service and anticipated growth.
15:56 - Instagram uses a homegrown graph database built on MySQL as a significant part of their data infrastructure alongside Cassandra.
16:57 - Challenges in building the Cassandra abstraction layer included the migration from the Thrift API to CQL and optimizing load balancing given Instagram's unique networking requirements.
18:25 - Companies may want to investigate creating their own Cassandra abstraction layers once their usage reaches a certain size.
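
For readers curious about the double-write idea mentioned at 9:22, here is a minimal Python sketch of how an abstraction layer can act as the insertion point. It is not Instagram's implementation: the names `InMemoryStore` and `DualWriteLayer` are hypothetical, the stores are in-memory stand-ins rather than real Cassandra clients, and the shadow write is done synchronously for simplicity (a production layer would likely do it asynchronously).

```python
from typing import Dict, List, Optional


class InMemoryStore:
    """Hypothetical stand-in for a cluster client (not a real Cassandra driver)."""

    def __init__(self, fail_writes: bool = False) -> None:
        self.rows: Dict[str, bytes] = {}
        self.fail_writes = fail_writes

    def put(self, key: str, value: bytes) -> None:
        if self.fail_writes:
            raise RuntimeError("simulated write failure")
        self.rows[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self.rows.get(key)


class DualWriteLayer:
    """Hypothetical abstraction layer: callers see one API, writes go to two clusters."""

    def __init__(self, main: InMemoryStore, shadow: Optional[InMemoryStore] = None) -> None:
        self.main = main
        self.shadow = shadow
        self.shadow_errors: List[str] = []

    def put(self, key: str, value: bytes) -> None:
        # The main cluster is the source of truth: its failures propagate to the caller.
        self.main.put(key, value)
        # The shadow cluster is best effort: failures are recorded, never raised.
        if self.shadow is not None:
            try:
                self.shadow.put(key, value)
            except Exception as exc:
                self.shadow_errors.append(f"{key}: {exc}")

    def get(self, key: str) -> Optional[bytes]:
        # Reads are served only from the main cluster.
        return self.main.get(key)


if __name__ == "__main__":
    layer = DualWriteLayer(InMemoryStore(), InMemoryStore())
    layer.put("user:1", b"profile")
    print(layer.get("user:1"))
```

Because callers only talk to the layer, the shadow cluster can be attached or removed without any change to application code, which is the property that makes this useful for testing and hardware migrations.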