
Cassandra on Nimble: Switching From Local Commodity Drives is Worth It

Blog post created by tdau (Employee) on Dec 9, 2014

Everyone wants to know — management, customers, and prospects alike — whether it really makes sense to run a Cassandra database on Nimble Storage: how efficient is it really, when you look at the whole solution? I mean, everybody knows that Nimble can bring a lot of value to the table, but can it really offset the incremental cost to switch from local commodity drives for a Big Data application?

 

I’m here to set the record straight: Absolutely. And when I spend time breaking it down for customers and prospects, explaining the efficiency, resiliency, and performance benefits against the investment, they are able to reach the same conclusion. But first, a little history.

 

In the early days of NoSQL databases like Cassandra, the storage design was left intentionally simple and inexpensive. Storage would be configured as local disk on the compute nodes, which were typically just low-cost commodity servers. Until recently, there hasn’t been much change in this part of the Big Data stack.

 

A common node setup comprises 8 to 16 CPU cores, 8 to 16 GB of memory, and a few 10K or 15K RPM drives. Since the data is stored on local disk, Cassandra provides application-level compression to save storage capacity. To prevent a single point of failure (SPOF) in a single-data-center cluster, a replication strategy such as SimpleStrategy might be used; for clusters deployed across multiple data centers, a strategy such as NetworkTopologyStrategy can be used. This works well for a small cluster with a few nodes. But in a large cluster (hundreds or thousands of nodes), is local storage cost-effective?
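
To make the replication piece concrete, here is a minimal sketch using the DataStax Python driver (cassandra-driver); the contact point, keyspace names, and data-center names are placeholders:

```python
# Minimal sketch: defining replication strategies via CQL.
# Contact point, keyspace names, and DC names are placeholders.
from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])   # any reachable node in the cluster
session = cluster.connect()

# Single data center: SimpleStrategy with replication factor 3
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS demo_single_dc
    WITH replication = {'class': 'SimpleStrategy',
                        'replication_factor': 3}
""")

# Multiple data centers: NetworkTopologyStrategy with a per-DC factor
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS demo_multi_dc
    WITH replication = {'class': 'NetworkTopologyStrategy',
                        'DC1': 3, 'DC2': 3}
""")

cluster.shutdown()
```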

 

In order to architect a cost-effective and performant Cassandra cluster with high availability and data protection, you need to understand the requirements of the application and translate those into resource requirements. You might run some tests and then try to extrapolate to a required level of CPU, memory, and storage capacity. Then, you might apply some multiplier for redundancy.
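
As a toy illustration of that extrapolation exercise, here is a sketch that assumes throughput scales roughly linearly with node count; every number in it is a made-up placeholder:

```python
# Toy sizing extrapolation under a linear-scaling assumption.
# All figures are made-up placeholders, not measurements.
import math

measured_ops_per_node = 8_000      # ops/s observed on one test node
target_ops            = 250_000    # ops/s the application needs
usable_tb_per_node    = 2.4        # usable disk per node
dataset_tb            = 500        # application data before replication
replication_factor    = 3          # redundancy multiplier

nodes_for_perf = math.ceil(target_ops / measured_ops_per_node)
nodes_for_cap  = math.ceil(dataset_tb * replication_factor / usable_tb_per_node)
print(f"nodes needed: {max(nodes_for_perf, nodes_for_cap)}")
```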

 

The inefficiencies of local storage become apparent as you work through this exercise and your requirements. It’s a mess because resources can’t be sized or scaled independently – not even close. Data protection alone can require significantly more compute and storage.

 

For example, how much capacity do you need? Take what you need for your application. Now, double it for HA. Oh wait – triple it, since replication has to protect cluster availability, data availability, and data integrity. With Nimble, the array itself protects against drive failure, so you can often drop your replication factor by one and let replication cover node failures only. So let’s say at this point that with local disk, you’ll need 3/2 the node count you would with Nimble.

 

You can use application-level compression to reduce the capacity requirement… right? Well, theoretically, except now your host cores will spend time doing compression, meaning you need more nodes to maintain performance. In fact, using Cassandra compression vs. Nimble compression can reduce your performance by 50 percent. So to maintain performance, you would need twice the nodes. And since each node comes with a fixed amount of capacity, you’ll end up with even more capacity than you started with, unless you reduce the density of the HDDs you buy. In any event, let’s say that now with local disk, you’ll have 3X the node count you might with Nimble.
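
For illustration, here is what turning Cassandra’s table-level compression off looks like in the 2.x releases current as of this writing; the keyspace and table names are placeholders:

```python
# Disabling Cassandra's per-table compression (Cassandra 2.x syntax) so the
# storage layer can compress inline instead. Names are placeholders.
from cassandra.cluster import Cluster

session = Cluster(['127.0.0.1']).connect()

# New table with compression off:
session.execute("""
    CREATE TABLE IF NOT EXISTS demo_single_dc.events (
        id uuid PRIMARY KEY, payload text
    ) WITH compression = {'sstable_compression': ''}
""")

# Or turn it off on an existing table. Newly written SSTables will be
# uncompressed; existing ones are rewritten as they compact (or via
# nodetool upgradesstables -a).
session.execute("""
    ALTER TABLE demo_single_dc.events
    WITH compression = {'sstable_compression': ''}
""")
```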

 

Now, how do you back up a petabyte of data? Carefully, it turns out. If you want to run backups off your main cluster, you will have to over-provision that cluster again so that the backups don’t destroy performance. You could create a hot standby to run backups from, or you could move that hot standby cluster off-site, providing real DR. In any of these cases, however, the outcome is the same: once again, multiply your node count by 2X. So now with local disk, you’ll have 6X the node count you might with Nimble.
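
As a rough sketch of the snapshot-based alternative: flush memtables to disk before taking an array-level snapshot so the on-disk SSTables are complete. The hostnames, volume name, and take_array_snapshot() helper below are hypothetical placeholders, since the actual snapshot call depends on your array’s API:

```python
# Sketch: coordinate an array-level snapshot with Cassandra. Flushing
# memtables first means the SSTables on disk are complete at snapshot time.
# Hostnames, volume name, and take_array_snapshot() are placeholders.
import subprocess

def flush_node(host):
    # nodetool flush writes all memtables out to SSTables on disk
    subprocess.check_call(['nodetool', '-h', host, 'flush'])

def take_array_snapshot(volume, name):
    # Placeholder: call the storage array's snapshot API/CLI here
    print(f"snapshot volume={volume} name={name}")

for host in ['cass-1', 'cass-2', 'cass-3']:  # placeholder node names
    flush_node(host)

take_array_snapshot(volume='cassandra-data', name='nightly')
```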

 

Want to use this cluster to provide interactive query sessions for users, or back a customer-facing website with it? Well, if you’re using Nimble, the hot blocks and metadata will be placed in the SSD cache automatically, providing the IOPS you’ll need for a pleasant interactive experience. The same performance might take 5X-10X the number of spindles, depending on your specific implementation details. For now, let’s just use another 2X multiplier. So now with local disk, you’re looking at a requirement for 10X+ the node count you might with Nimble.
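
Putting the back-of-the-envelope math together (these are the illustrative multipliers from the paragraphs above, not benchmark measurements):

```python
# Cumulative node-count multipliers from the discussion above.
# These are the post's illustrative figures, not measurements.
replication = 3 / 2  # RF=3 on local disk vs. RF=2 when the array protects drives
compression = 2      # app-level compression costs CPU -> ~2x nodes for same perf
backup      = 2      # over-provision (or run a standby cluster) for backups
interactive = 2      # conservative stand-in for the 5X-10X spindle difference

multiplier = replication * compression * backup * interactive
print(f"Local disk needs ~{multiplier:g}x the nodes of a Nimble-backed cluster")
# -> 12x, which the post rounds down to "10X+"
```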

 

Let’s discuss using shared storage in a Cassandra environment. I know that hardcore NoSQL/Hadoop users are not big fans of shared storage. NoSQL/Hadoop started out on local storage and commodity servers because they are cheap, and that seemed to work great, since legacy shared storage was too expensive and did not always work well in a NoSQL/Hadoop environment. Here is an analogy. We all know that a motorcycle has two wheels and that everybody loves motorcycles: the feeling of quick acceleration, popping wheelies, and hugging corners so low that the foot peg scrapes the road and makes sparks (I did this myself).

 

But did anyone ever think of riding a 3-wheel motorcycle? I don’t think so. You would ask, “How can I hug corners or pop wheelies on a 3-wheel motorcycle?” It’s the same with shared storage: not all shared storage is created equal. Some shared storage works better for one type of environment (e.g., popping wheelies or hugging corners), while other shared storage works better for other types of environments (e.g., comfortable cruising or safety). At Nimble, you’ll find a superior product from a price/performance perspective, one geared toward Big Data acceleration.

 

I can’t speak for other shared storage, but at Nimble I’ve had the opportunity to test Cassandra, along with other NoSQL databases and Hadoop, to find out how our Nimble Adaptive Flash and Cache Accelerated Sequential Layout (CASL™) architectures provide benefit. I have been honored to work with our Chief Data Scientist, Larry Lancaster, to learn in depth about Nimble product performance in these environments. Together, we’ve worked closely in the Big Data space to dissect the I/O patterns of different workloads, not only from the storage perspective but also at the application and operating system levels.

 

We have concluded that running Cassandra on Nimble yields the best total cost of ownership (TCO). Nimble delivers performance, space efficiency, data protection, availability, and an order-of-magnitude density increase at a very low price-point.

 

As we’ve discussed here, Nimble lets you:

  • Leverage Nimble Adaptive Flash for random I/O and CASL for sequential I/O, for better performance.
  • Use Nimble inline compression and turn off Cassandra compression, freeing CPU cycles on the cluster nodes for more operations per second.
  • Use Nimble snapshots for cluster backup and recovery (at no extra cost).
  • Use Nimble replication to replicate the cluster to another data center for disaster recovery or test/dev (at no extra cost).
  • Use a Cassandra replication factor of 2 for node redundancy only, saving storage (see the sketch after this list).
  • Scale storage independently of compute nodes, and vice versa.
  • Lower data center footprint, power, and cooling costs by using denser compute nodes.
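
Below is a minimal sketch of the keyspace and table settings the last few bullets describe; the keyspace, table, and data-center names are again placeholders:

```python
# Sketch of a Nimble-backed configuration: RF=2 for node redundancy only
# (the array handles drive-level protection) and table compression left
# to the array. All names are placeholders.
from cassandra.cluster import Cluster

session = Cluster(['127.0.0.1']).connect()
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS demo_on_nimble
    WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 2}
""")
session.execute("""
    CREATE TABLE IF NOT EXISTS demo_on_nimble.events (
        id uuid PRIMARY KEY, payload text
    ) WITH compression = {'sstable_compression': ''}
""")
```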

 

The picture below shows the difference between using Nimble and local disks.

[Figure: architecture.png, comparing a Cassandra cluster on Nimble with one on local disks]

I would like to thank our Chief Data Scientist Larry Lancaster for his contribution to this blog.

[Photo: Larry Lancaster]
