Cassandra Day Tokyo Reflections | Ep. 115 Distributed Data Show




Distributed Data Show

Summary: Patrick and Jeff cover some of the universal questions that come up on "day 2" for Cassandra users around batches, lightweight transactions, secondary indexes, and materialized views. They also challenge some of the biases around using Cassandra for use cases such as banking.

Highlights:
0:23 - Recapping a great day hosted by Yahoo! Japan
1:20 - Introducing some of the universal questions we hear no matter where we are in the world
4:11 - Many of the questions we hear most frequently have to do with skills vs. knowledge: knowing when to apply a technique depending on your use case
5:55 - The big 4 we get asked about: when it is OK to use batches, lightweight transactions, secondary indexes, and materialized views
6:35 - Batches in Cassandra are not the same as in Oracle, where they refer to bulk loading. In Cassandra, batches are a useful way to group writes to multiple denormalized tables. The key thing to think about is the total volume of data being written.
9:34 - Materialized views are an acceptable way to manage index tables. If you have a lot of writes, write amplification can be a problem. Repairing materialized views can also be a challenge in Cassandra 3.x; there are some reliability improvements in 4.0.
11:43 - Cassandra has historically provided a lot of flexibility and features, which in some cases could be misused. Make sure to test your approach at scale as much as possible.
13:22 - Secondary indexes in Cassandra are for convenience, not for speed as they would be in a relational database. Inequality searches on a single partition are one case where these indexes will scale well.
16:08 - Lightweight transactions in Cassandra are not ACID. They exist to address a particular set of race conditions in distributed systems: the uniqueness check on creation and check-and-set.
19:04 - Banking is often cited as a use case for which Cassandra is not well suited, but people often forget that these systems have historically been built using a ledger data model and reconciliation.
21:15 - Hey listeners, send us your ideas for future episode topics, including use cases where you wonder whether Cassandra will be a good fit.
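
For readers who want a concrete picture of the two race conditions mentioned at 16:08 (uniqueness check on creation and check-and-set), here is a minimal sketch in Python. It is a toy in-memory model, not Cassandra and not a Cassandra driver: the class `ToyTable`, its method names, and the sample data are all hypothetical, and a local lock stands in for the coordination a real cluster would have to perform across nodes.

```python
import threading


class ToyTable:
    """Hypothetical in-memory stand-in for one table; NOT a Cassandra client."""

    def __init__(self):
        self._rows = {}
        # Assumption: a local lock stands in for cluster-wide coordination.
        self._lock = threading.Lock()

    def insert_if_not_exists(self, key, value):
        """Uniqueness check on creation: write only if the key is absent."""
        with self._lock:
            if key in self._rows:
                return False  # someone else created it first
            self._rows[key] = value
            return True

    def update_if_equals(self, key, expected, new_value):
        """Check-and-set: write only if the stored value still equals `expected`."""
        with self._lock:
            if key not in self._rows or self._rows[key] != expected:
                return False  # the value changed (or never existed)
            self._rows[key] = new_value
            return True

    def get(self, key):
        with self._lock:
            return self._rows.get(key)


users = ToyTable()

# Two clients race to claim the same username; only the first succeeds.
first = users.insert_if_not_exists("pat", "pat@example.com")
second = users.insert_if_not_exists("pat", "other@example.com")

# Two clients try to change the email based on the same value they read earlier;
# the second one is working from stale data and is rejected.
swapped = users.update_if_equals("pat", "pat@example.com", "new@example.com")
stale = users.update_if_equals("pat", "pat@example.com", "late@example.com")

print(first, second, swapped, stale, users.get("pat"))
```

In both cases the write is conditional on what is already stored, and the caller learns whether it was applied. That is a much narrower guarantee than a multi-statement ACID transaction, which is the distinction drawn in the episode.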