"The wind and the waves are always on the side of the ablest navigator."
--Edward Gibbon, Historian
As an IT professional focused on both critical applications and storage, you’re probably sick of warnings about the torrent of bits and bytes predicted to overwhelm your organization. No doubt about it, we’re in the midst of a data deluge. This three-part blog series offers comprehensive guidance on choosing – as well as managing – a storage system with the level of scalability that can keep your organization floating above a churning sea of data.
With storage, you will be faced with navigating the seven C’s of scalability: Capacity, Compute, Cache, Clustering, Configuration, Continuity – and the C most important to those in upper management: Cost Efficiency.
The seven C’s of storage scalability are all interconnected. With the right storage platform, you can navigate them successfully and support your organization as it faces uncertain data growth patterns and the changing performance demands of current and future critical applications.
Smooth Scaling, Part 1: Scaling Methodology and the Seven C’s
No storage project should begin on a low note, but here it is: Even with the best tools and the most comprehensive input from groups across your organization, it’s impossible to accurately predict how much your performance and capacity needs will change over time. That’s why scalability is so important. A truly scalable storage solution can easily and cost-effectively adapt to a wide range of workloads, accommodate more end users, and house a greater amount of data. The following framework will allow you to determine whether a particular storage product offers the degree of flexibility you need.
The methodology is this: First, scale performance and capacity within a single array, and then both, simultaneously, via a scale-out cluster.
Almost every storage vendor will tell you how big their system can grow, and how quickly. But how that scale is achieved varies significantly. The likelihood that you’ll smoothly scale across all C’s increases dramatically once you get a look at what’s under the hood of the array, and at the underlying file system and its set of scale-out capabilities.
The first C’s to consider are Capacity (predominantly disk-based), Cache, and Compute.
Yesterday’s storage architectures (which still underlie many of today’s storage products) offer only limited scalability. The reason is that with legacy storage systems, including those with bolted-on flash, disk speed and spindle count drive performance. Generating more IOPS means connecting more disk drives or shelves to the head unit. This type of scaling is simple, but ultimately wasteful. That’s because capacity also scales – whether you need extra space or not. And you’ll also need more power and cooling, which add cost. What started out as a simple upgrade can result in a swell of unused capacity and wasted OPEX.
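To make the coupling concrete, here is a minimal back-of-the-envelope sketch in Python. The per-drive figures (150 IOPS and 4 TB per drive) and the function name are illustrative assumptions, not numbers from any vendor; the point is only that when spindles are the sole source of IOPS, capacity comes along for the ride.

```python
import math

def legacy_scale_for_iops(target_iops, needed_tb, iops_per_drive=150, tb_per_drive=4.0):
    """Size a spindle-bound array: drive count is dictated by IOPS alone.

    iops_per_drive and tb_per_drive are hypothetical values for illustration.
    Returns (drives, raw_tb, stranded_tb).
    """
    drives = math.ceil(target_iops / iops_per_drive)   # spindles needed for performance
    raw_tb = drives * tb_per_drive                     # capacity that arrives with them
    stranded_tb = max(0.0, raw_tb - needed_tb)         # capacity bought but not needed
    return drives, raw_tb, stranded_tb

drives, raw_tb, stranded_tb = legacy_scale_for_iops(target_iops=30_000, needed_tb=100)
print(f"{drives} drives, {raw_tb:.0f} TB raw, {stranded_tb:.0f} TB beyond what was needed")
```

Under these assumed figures, a 30,000 IOPS target forces 200 drives and 800 TB of raw capacity onto a workload that needed only 100 TB, before counting the extra power and cooling.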
The latest generation of storage is built on some combination of disk (Capacity), flash (Cache), and controllers (Compute) driven by multi-core CPUs. Hybrid storage (flash combined with disk) and all-flash storage each take a different approach to leveraging flash. And, each approach impacts scalability. The most crucial factor to consider is a product’s ability to scale performance and capacity – independently. This guarantees storage resources will be deployed with a minimum of waste. (Cost efficiency considerations will be addressed in greater detail later on.)
As mentioned earlier, performance and capacity are conjoined in legacy storage architectures. There’s no getting around it: Yesterday’s file systems simply can’t make up for disk’s cripplingly slow random write speeds, or lay out data for better performance. On the other hand, all-flash arrays deliver massive performance, and offer flash and controller CPU upgrades. However, scaling capacity within an all-flash infrastructure can be expensive, depending on the effectiveness of data footprint reduction methods, such as compression and deduplication.
Nimble Storage’s CASL™ (Cache-Accelerated Sequential Layout) architecture enables independent performance and capacity scaling because it decouples performance from disk. Although it incorporates flash and disk in a single system, CASL is a CPU-driven architecture. Its unique data layout sequentializes random write data and writes it to disk in full stripes, taking full advantage of disk’s excellent sequential write performance. And, it makes efficient use of disk space, which is easily expanded through the addition of disk shelves. CASL leverages flash as a cache for active data to accelerate read operations. It, too, can flexibly scale to accommodate entire working sets across different applications by adding higher-density SSDs or even an all-flash expansion shelf. Controllers can be upgraded without disruption to those with more CPU cores to scale overall IOPS. (See Nimble’s product portfolio.)
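The general idea of sequentializing random writes can be sketched in a few lines of Python. This is a toy illustration of write coalescing, not Nimble’s implementation: the class name, the four-block stripe size, and the in-memory index are all assumptions made for the example.

```python
class StripeWriter:
    """Toy model: buffer random writes, then flush them as one full stripe."""

    def __init__(self, blocks_per_stripe=4):
        self.blocks_per_stripe = blocks_per_stripe  # hypothetical stripe width
        self.buffer = []    # pending (logical_block, data) pairs, in arrival order
        self.stripes = []   # full stripes "written to disk", in sequential order
        self.index = {}     # logical_block -> (stripe_no, slot) for later reads

    def write(self, logical_block, data):
        self.buffer.append((logical_block, data))
        if len(self.buffer) == self.blocks_per_stripe:
            self._flush()

    def _flush(self):
        # One sequential write of a whole stripe replaces several random writes.
        stripe_no = len(self.stripes)
        self.stripes.append([data for _, data in self.buffer])
        for slot, (logical_block, _) in enumerate(self.buffer):
            self.index[logical_block] = (stripe_no, slot)  # newest location wins
        self.buffer = []

    def read(self, logical_block):
        # Data still waiting in the buffer is newer than anything on disk.
        for pending_block, data in reversed(self.buffer):
            if pending_block == logical_block:
                return data
        stripe_no, slot = self.index[logical_block]
        return self.stripes[stripe_no][slot]

writer = StripeWriter(blocks_per_stripe=4)
for block in [907, 12, 455, 3, 88]:        # scattered logical addresses
    writer.write(block, f"data-{block}")
print(len(writer.stripes), "full stripe written;", len(writer.buffer), "write still buffered")
```

The scattered addresses land next to each other on disk in arrival order, and the index remembers where each logical block ended up so reads can still find it.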
A storage solution with best-in-class scalability allows you to upgrade what you need, when you need it.
The table below summarizes the initial set of scalability characteristics that should be carefully considered when evaluating a particular solution’s scalability.
CPU-driven storage architecture
Cost-efficient capacity scaling
Common scaling / upgrade hardware across multiple storage product families
Smooth Scaling, Part 2: Clustering, Configuration and Continuity
This installment of Smooth Scaling focuses on scale out, and addresses some of the key clustering, configuration and continuity considerations for selecting a storage solution for higher levels of scalability.
In fast-growing organizations, where near-term capacity growth patterns and performance demands are ambiguous at best, creating a scale-out cluster is a great strategy to mitigate those uncertainties while consolidating multiple workloads onto a common storage infrastructure. It also offers significant cost benefits, which are addressed in greater detail in Part 3 of this series.
The upside to scaling out can be substantial, but without the right type of scale-out storage solution, IT teams risk drowning in configuration complexity, excessive performance overhead, or worse: having most of their waking hours consumed by manual data migrations.
Building a storage cluster shouldn’t require the services of a network engineering team; the scale-out cluster should be easy for the storage team to set up and configure. Smooth scaling would involve a solution that requires no dedicated backplane, automates network connections and simplifies creating and defining storage pools (a pool is a set of arrays in a cluster over which data of resident volumes is striped and automatically rebalanced). And, since smooth scaling requires flexibility, the right scale-out solution can allow for mixing and matching of arrays across product families within the same cluster. For example, Nimble Storage’s scale-out architecture allows for up to four of any Nimble CS-Series Arrays to be clustered together.
It is not uncommon for IT organizations to periodically modify clusters. For example, capacity, compute or cache within individual array nodes might need to be upgraded, or an array might be repurposed as a replication target and replaced with a higher-performing one. Regardless, it is essential that the applications supported on the cluster continue to run without any disruption. Not only should scale-out storage facilitate seamless upgrades to individual nodes, it should allow arrays to be easily added or removed, with the resulting data migrations handled in a robust, automated fashion.
Again, Nimble Storage is a great example of the kind of scale-out architecture that expertly automates data management across a cluster as it changes. For example, if an array is to be removed from the cluster, Nimble transparently (and non-disruptively) migrates the data volumes off of the outgoing array to the remaining array(s) in the pool.
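What an automated evacuation has to accomplish can be sketched as follows. This is a simplified illustration under stated assumptions, not Nimble’s algorithm: the pool is modeled as a dictionary of per-array capacity and usage in GB, data moves in fixed-size chunks, and each chunk goes to whichever remaining array has the most free space. The function and field names are hypothetical.

```python
def evacuate_array(pool, outgoing, chunk_gb=1024):
    """Move all data off `outgoing` onto the remaining arrays in the pool.

    pool: {name: {"capacity_gb": int, "used_gb": int}} (hypothetical model).
    Returns a new pool dict without the outgoing array; the input is not modified.
    """
    remaining = {name: dict(info) for name, info in pool.items() if name != outgoing}

    def free(info):
        return info["capacity_gb"] - info["used_gb"]

    to_move = pool[outgoing]["used_gb"]
    if to_move > sum(free(info) for info in remaining.values()):
        raise ValueError("remaining arrays cannot absorb the outgoing array's data")

    while to_move > 0:
        target = max(remaining.values(), key=free)     # emptiest array gets the next chunk
        chunk = min(chunk_gb, to_move, free(target))   # never overfill the target
        target["used_gb"] += chunk
        to_move -= chunk
    return remaining

pool = {
    "A": {"capacity_gb": 100_000, "used_gb": 40_000},
    "B": {"capacity_gb": 100_000, "used_gb": 70_000},
    "C": {"capacity_gb": 50_000, "used_gb": 30_000},
}
after = evacuate_array(pool, outgoing="C", chunk_gb=10_000)
print(after)
```

In this example all of array C’s data ends up on array A, because A has the most free space at every step; a real system would also have to keep serving IO while the chunks move.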
Some storage clusters are even configured specifically to simplify storage management, taking advantage of the ability to migrate volumes between arrays without disruption. Though automated data migrations can sometimes take hours to complete, this practice frees up IT teams for more productive endeavors.
Scaling Capacity and Performance Within a Cluster
Here’s what to expect in terms of how performance and capacity scale through clustering.
Naturally, performance and capacity scale simultaneously when configuring multiple arrays in a scale-out cluster. While capacity scales to the sum of the individual arrays’ effective capacities, performance scaling through clustering is less straightforward, as some level of performance overhead is incurred. In many current scale-out storage solutions, data has to be forwarded between arrays in order to fulfill an IO request, primarily because the host lacks intelligence as to which array in the cluster holds a given piece of data. Forwarding data between nodes adds considerable latency and unnecessary compute load.
Nimble Storage’s scale-out architecture successfully scales performance while minimizing scale-out performance overhead. Firstly, it performs fine-grained striping of data across arrays (cluster nodes), allowing the volumes that span those arrays to fully leverage the combined cache and compute resources. Secondly, Nimble employs an intelligent Multi-Path IO (MPIO) module at the host, which figures out to which array in the cluster a piece of data should be directed. IO load is dynamically balanced at the host, minimizing impact to cluster performance. The overall result is performance that scales in a linear fashion.
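The difference between forwarding and host-side routing can be illustrated with a small Python model. The round-robin placement, the 1 MiB stripe unit, and the array names below are assumptions chosen for simplicity; the article does not describe the actual placement scheme, only that the host-side module knows which array owns each piece of data.

```python
ARRAYS = ["array-1", "array-2", "array-3"]   # hypothetical cluster nodes
STRIPE_UNIT = 1024 * 1024                    # assumed 1 MiB stripe unit

def owner_of(offset_bytes, arrays=ARRAYS, stripe_unit=STRIPE_UNIT):
    """Map a volume offset to the array holding it (assumed round-robin striping)."""
    return arrays[(offset_bytes // stripe_unit) % len(arrays)]

def hops_with_forwarding(offset_bytes, entry_array):
    """Host always sends to one entry array, which forwards if it is not the owner."""
    return 1 if owner_of(offset_bytes) == entry_array else 2

def hops_with_host_routing(offset_bytes):
    """Host computes the owner itself and sends the request there directly."""
    return 1

offsets = [i * STRIPE_UNIT for i in range(9)]
forwarded = sum(hops_with_forwarding(o, "array-1") for o in offsets)
direct = sum(hops_with_host_routing(o) for o in offsets)
print(f"forwarding: {forwarded} hops, host routing: {direct} hops for {len(offsets)} requests")
```

In this model, two of every three requests sent through a fixed entry array need a second hop, while host-side routing reaches the owning array in one hop every time. That extra hop is the latency and compute overhead described above.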
Here is a summary of the key scale-out characteristics, and their benefits, that should be carefully considered:
| Scale-Out Storage Characteristic | Benefits |
| --- | --- |
| Mix/Match: Cluster arrays from different product families within the vendor catalog | Flexibility in scaling; extension of useful life of older arrays |
| Automated network configuration | No dedicated resources or specialized networking expertise required; rapid setup and connection of the cluster to multiple hosts |
| Striping of data across multiple arrays | Allows volumes to leverage the combined hardware resources of pooled arrays; minimizes forwarding of data between cluster nodes, which can negatively impact latency and IOPS scaling |
| Non-disruptive configuration and upgrades | Critical applications can keep running while storage is adapted to support changing needs; gain operational efficiency through managing all storage hardware as a single entity |
Smooth Scaling, Part 3: Scalable Storage’s Seventh ‘C’: Cost Efficiency
“Doing more with less” has become the reigning mantra among IT professionals across both technical and management functions. Makes sense given today’s perfect storm of exponential data growth and lean budgeting.
Here’s a brief recap of the scalability characteristics that are a must-have in order to maximize storage ROI and streamline management processes to free up IT teams from tedious manual processes.
| Scalable Storage Characteristic | ROI and Cost Benefits |
| --- | --- |
| Independent scaling of performance and capacity | Scale to meet immediate needs at the lowest incremental cost |
| Scale Out, with mix/match arrays from different product families within a cluster | Extend storage ROI by continuing use of arrays whose IOPS and/or capacity might otherwise be exhausted; consolidate multiple critical applications on one storage infrastructure, managed efficiently as a single entity |
| Cluster configuration with simplified networking (no dedicated backplane required) | No special resources/equipment required, fast time to cluster deployment |
| Non-disruptive upgrades and seamless cluster configuration | Avoidance of costly application downtime |
| Automated data migration across the cluster | No valuable time taken away by laborious manual processes |
Given the diversity of use cases and environments today, a detailed analysis is required to estimate scalable storage-driven cost savings. However, a good storage vendor should be able to present you with a realistic (and compelling) total cost of ownership (TCO).
Let’s look at a scenario that highlights two approaches to scaling via which a best-in-class scalable storage solution can deliver ROI benefits.
Imagine a rapidly growing small-enterprise technology company with 6,000 current employees worldwide. The company’s primary critical applications include VDI (4,500 desktops), MS-Exchange (6,000 mailboxes) and 70+ SQL databases. A small (and busy) IT team manages the company’s entire storage infrastructure, working closely with the staff who administer virtualization, database platforms and e-mail.
The company’s MS-Exchange installation is the sole tenant of a Nimble Storage CS500 Array (~70TB effective capacity with ES1-H45 disk expansion shelf). Using InfoSight, Nimble Storage’s cloud-connected support and management system, the team can see that the Exchange volumes currently occupy 92% of the array’s total effective capacity, with the remaining space projected to be used up in 8 weeks. However, the array is performing well below its maximum throughput, with IOPS and cache to spare.
For the company’s rapidly expanding SQL platform, it will deploy another Nimble Storage array, a CS300 (minus expansion shelf). The database team’s revised performance spec lists 40,000 IOPS as the minimum requirement. However, SQL will only require 35% of the array’s effective capacity.
One array needs more capacity; the other, more IOPS. The storage team can follow one of two viable scaling approaches.
OPTION 1: Upgrade Individual Arrays
Since Nimble Storage enables independent scaling of performance and capacity, the problem can be solved with a simple upgrade to each array. For the MS-Exchange array, capacity can be easily expanded by adding an ES1 disk expansion shelf, a 3U appliance that connects to the array and grows effective capacity by up to 68 TB with a single shelf. For the SQL deployment, the Nimble CS300’s throughput can be tripled with a simple controller upgrade, converting it to a CS500 without any downtime. The benefit of this approach is that performance and capacity upgrades can be tailored to individual needs, quickly resolving bottlenecks without disruption, and at the lowest incremental cost.
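The capacity side of Option 1 can be checked with some quick arithmetic, sketched here in Python. The inputs come from the scenario (roughly 70 TB effective, 92% used, 8 weeks to full, up to 68 TB per added shelf); the assumption that growth stays linear at the rate implied by the 8-week projection is ours, and real growth may well differ.

```python
def weeks_until_full(capacity_tb, used_tb, growth_tb_per_week):
    """Weeks of headroom left, assuming constant (linear) growth."""
    return (capacity_tb - used_tb) / growth_tb_per_week

capacity_tb = 70.0                       # approximate effective capacity from the scenario
used_tb = 0.92 * capacity_tb             # Exchange volumes occupy 92%
growth_tb_per_week = (capacity_tb - used_tb) / 8   # rate implied by the 8-week projection
shelf_tb = 68.0                          # upper bound for one added expansion shelf

runway_after = weeks_until_full(capacity_tb + shelf_tb, used_tb, growth_tb_per_week)
print(f"used: {used_tb:.1f} TB, growth: {growth_tb_per_week:.2f} TB/week")
print(f"runway after adding one shelf: {runway_after:.0f} weeks")
```

Under these assumptions the Exchange data is growing by about 0.7 TB per week, and a single shelf would stretch the runway from 8 weeks to roughly two years. Since the shelf figure is an upper bound, treat the result as a best case.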
OPTION 2: Cluster Arrays in a Scale-Out Configuration
Pooling all storage resources into a cluster gives the Exchange array access to the SQL array’s excess capacity, and SQL can leverage the majority of the combined IOPS. All of that can be achieved without spending a dime. Configuring the two-node cluster is slightly more involved, since host connections must be configured and each volume rebalanced across the newly defined storage pool, but the time required is short. In return, clustering greatly extends the ROI of the current storage infrastructure while deferring upgrade costs for several quarters. The consolidation of storage resources also eliminates the previous silos, making everything easier to manage from a single console.
Nimble is an example of highly flexible storage scalability, but it can sometimes be difficult to determine the optimal approach. With its powerful data sciences engine, InfoSight generates guidance on the optimal approach to scaling cache, compute and capacity within a single array, or within a scale-out cluster. That approach is presented to the customer with product-level specifics on upgrades and expansion shelf additions. Whether scaling involves individual performance and capacity upgrades or clustering arrays, critical applications will continue to run without disruption.
In conclusion, the storage solution that scales seamlessly, simply and most cost-effectively is the one that will keep you and your IT team (as well as your entire organization) safely afloat in what’s become a vast and turbulent sea of data.