In my 20 years of experience managing Information Technology products and projects, I've seen many waves of technology emerge, have an impact, and then be superseded by even newer technologies. This is a great time to be focused on enterprise data storage, because new waves of technologies like virtualization (server and now desktop) and hybrid storage arrays are disrupting old-school incumbents. What is this transition, and why is it important for pretty much all medium- to large-sized IT shops?

Old-School Storage

Let’s start with the “old school” enterprise storage vendors that have emerged over the past two decades. Storage Area Networks (SANs) were invented to solve a problem: IT managers desperately needed more capacity and higher performance than their existing storage systems could deliver. In the old scheme, called direct-attached storage (DAS), drives resided within a compute server, so adding more capacity was more complicated than just attaching another hard drive.


Performance was another problem: new software applications needed a higher rate of performance (measured in input/output operations per second, or IOPS) than legacy storage systems were able to deliver. The typical solution was adding more hard drives (the idea being that if two drives are better than one, then 16 drives are faster still).


These SANs were designed to make storage systems more than just the sum of their parts. They cached reads and writes, spreading input/output (I/O) operations across spindles (i.e., physical disk drives) to eliminate hot spots. In other words, the system shares the work, and overall performance benefits as a result.


But SANs using traditional hard disks are constrained by a severe design tradeoff: they’re optimized either for performance or for capacity, because hard drives are not immune to the laws of physics. Capacity shrinks as rotational velocity increases: a 15,000 revolutions-per-minute (RPM) drive holds around 600 gigabytes (GB), while one that spins at 7,200 RPM can hold 1, 2, 3 or even 4 terabytes (TB) of data. Historically, SAN vendors have dealt with the performance-versus-capacity challenge by caching and tiering data between different drives. For example, “hot data,” the data most recently read and written, is stored on the faster drives. As demand for it cools, SAN software copies that data to larger (and slower) drives. That’s fine until the data (now on the slower drives) gets hot again. Returning it to a faster disk for quick access can take minutes – sometimes hours.
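The hot/cold tiering behavior described above can be sketched in a few lines. This is a deliberately simplified toy model (class and method names are illustrative, not any vendor's actual implementation): new writes land on a fast tier, a background job demotes data that has cooled, and a read of cold data triggers the costly promotion back to the fast tier.

```python
import time

class TieredStore:
    """Toy model of SAN tiering: hot data lives on fast (small) drives,
    cold data is demoted to slow (large) drives."""

    def __init__(self, cool_after_seconds=60):
        self.fast_tier = {}   # key -> (value, last_access_time)
        self.slow_tier = {}   # key -> value
        self.cool_after = cool_after_seconds

    def write(self, key, value):
        # New writes land on the fast tier ("hot data").
        self.fast_tier[key] = (value, time.time())

    def read(self, key):
        if key in self.fast_tier:
            value, _ = self.fast_tier[key]
            self.fast_tier[key] = (value, time.time())  # refresh hotness
            return value
        # Reading cold data promotes it back to the fast tier --
        # the step that can take minutes or hours on a real SAN.
        value = self.slow_tier.pop(key)
        self.fast_tier[key] = (value, time.time())
        return value

    def demote_cold_data(self):
        # Background job: move data that has cooled to the slow tier.
        now = time.time()
        for key in list(self.fast_tier):
            value, last_access = self.fast_tier[key]
            if now - last_access > self.cool_after:
                self.slow_tier[key] = value
                del self.fast_tier[key]
```

The point of the sketch is the asymmetry: writes and hot reads are cheap, but a cold read pays the promotion penalty, which is exactly where tiered hard-disk SANs fall down.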


Addressing the Problems

Given the real-world limitations of early storage area networks, vendors rolled out a number of new features, including:

  • Replication: The ability to send a copy of data as it’s written to another SAN. Sending copies to a storage system miles away protects it in the case of a system failure, or other disaster, at the site of origin.
  • Data Compression: SAN compression eases the processing load of the host system by off-loading data compression to a dedicated piece of hardware.
  • Deduplication: First used to reduce backup storage’s capacity needs, it creates a catalog of byte patterns (already safely stored on backup media) so that, instead of being saved again, duplicate data can be referenced by a pointer.
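The deduplication idea in that last bullet can be sketched concretely. In this toy version (a hypothetical simplification, not any product's actual algorithm), the "catalog of byte patterns" is a dictionary keyed by a hash of each chunk: the first time a chunk is seen its bytes are stored, and every repeat is recorded as just a pointer.

```python
import hashlib

def dedupe(chunks, catalog=None):
    """Toy deduplication: store each unique chunk once, keyed by its
    hash; repeated chunks become pointers into the catalog."""
    if catalog is None:
        catalog = {}
    pointers = []
    for chunk in chunks:
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in catalog:
            catalog[digest] = chunk   # first sighting: store the bytes
        pointers.append(digest)       # every occurrence: store a pointer
    return pointers, catalog
```

For backup workloads, where most of each night's data is identical to the night before, the catalog grows slowly while the pointer list stays cheap, which is why dedup first took hold there.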


Backup is Crucial

Appreciating the role backup storage plays in the data center is crucial to understanding the newest generation of primary storage solutions. Inside any computer or server, active data is stored in random access memory (RAM). Hard drives store tons of inactive data until it needs to be sent to RAM. Over weeks, months and years, some of that data will inevitably be lost or corrupted. Backup copies ensure that missing data can be accurately restored.


Until recently, copies were stored on tapes, which for years were much less expensive than hard disks. The downside is that pinpointing a file’s location requires winding and rewinding the tape. As hard disks dropped in cost to become almost as cheap as tape, a nifty algorithm eliminated the need for multiple copies. Deduplication algorithms can detect minute differences between copies; rather than storing an entire document again, the system stores just the changes and points back to the original.


A New Wave Emerges

Starting around the year 2000, data storage technology was significantly disrupted by the emergence of flash solid state drives (SSDs). Although ideal for some consumer applications, such as smartphones, flash storage was at first an unlikely replacement for spinning disk drives, primarily due to cost and technical limitations.


Flash eliminates moving parts, a great advantage when it comes to read performance. But writing data to flash cuts short a flash chip’s already brief lifespan, and flash offers far less capacity than hard disk for the same cost. That doesn’t mean flash doesn’t have a role in the data center; it means that it needs to be used appropriately. In a well-designed system, flash and disk are complementary. It simply comes down to understanding the challenges of each and using the right technology for the right job. That’s why the “new school” technology that’s shaking up enterprise data storage is the hybrid array, combining the best features of spinning disk and solid-state flash to provide BOTH performance and capacity.


Testing New-School Storage

So, if hybrid data storage is the hot new technology, does it really matter which vendor you choose? Yes, it does; many flash-and-disk hybrid storage systems on the market today are nothing more than an old-school storage system with a flash cache. Some vendors tout the impressive IOPS performance of these systems, but look more closely and you’ll find the numbers don’t hold up.

Specifically, an IOPS metric means nothing without context. You need to ask: how big is the operation being tested? The smallest practical IOPS test should use operations no smaller than 4 KB, because almost no applications read and write data in smaller chunks.
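A quick back-of-the-envelope calculation shows why the operation size matters: throughput is simply IOPS multiplied by the I/O size, so the same headline IOPS number implies wildly different real-world performance depending on the block size behind it.

```python
def throughput_mb_s(iops, io_size_kb):
    """Throughput (MB/s) implied by an IOPS figure at a given I/O size."""
    return iops * io_size_kb / 1024

# The same 100,000 IOPS claim means very different things:
small = throughput_mb_s(100_000, 0.5)  # 512-byte ops -> ~48.8 MB/s
real  = throughput_mb_s(100_000, 4)    # 4 KB ops     -> ~390.6 MB/s
```

A vendor benchmarking with 512-byte operations can post an IOPS number eight times higher than the same hardware doing realistic 4 KB work, which is exactly why the number needs context.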


What specific demands are being placed on the storage system? What is the mix of read and write operations? (We use a 50/50 mix at Nimble.) Be sure not to rely on metrics based on read-heavy operations, where an all-flash cache will have an advantage that may not reflect your environment. Ask about write performance – it’s likely to be far less impressive, but it’s critical to how the system will perform as your storage needs grow.

Finally, use the right tools. That means using a storage-performance testing tool like IOmeter (2010 edition) with the “Random” test pattern. You should also test for a minimum of 60 minutes, looking at a system’s performance over the course of an hour or more. The following performance chart shows the dramatic drop-off of a competitor’s supposedly “superior” performance: neither read nor write performance can be sustained over time (one hour in this case). Within a few minutes, performance degrades rapidly and never recovers.


Competitor Epic Performance Failure

(White = Total IOPS, Green = Reads, Red = Writes)
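The testing methodology above – random access pattern, 4 KB operations, a 50/50 read/write mix, measured interval by interval over a sustained run – can be sketched as a crude driver script. This is only an illustration of the method (file name and parameters are made up); for real benchmarking use a proper tool like IOmeter.

```python
import os
import random
import time

def random_io_test(path, file_size=64 * 2**20, io_size=4096,
                   duration_s=10, write_fraction=0.5):
    """Crude random-I/O driver: 50/50 read/write mix, 4 KB operations,
    IOPS reported per one-second interval so drop-off over time is
    visible.  Illustrative only -- not a real benchmark tool."""
    with open(path, "wb") as f:
        f.truncate(file_size)          # preallocate the test file
    results = []
    fd = os.open(path, os.O_RDWR)
    try:
        end = time.time() + duration_s
        while time.time() < end:
            interval_end = time.time() + 1.0
            ops = 0
            while time.time() < interval_end:
                # Random 4 KB-aligned offset -- the "Random" pattern.
                offset = random.randrange(file_size // io_size) * io_size
                os.lseek(fd, offset, os.SEEK_SET)
                if random.random() < write_fraction:
                    os.write(fd, b"\0" * io_size)
                else:
                    os.read(fd, io_size)
                ops += 1
            results.append(ops)        # IOPS for this interval
    finally:
        os.close(fd)
        os.remove(path)
    return results
```

Plotting the returned per-interval IOPS over an hour-long run is what exposes the kind of degradation shown in the chart: a cache-fronted system looks fast for the first few minutes, then falls off once the cache is exhausted.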


Storage systems are like high-rise buildings: they require a foundation. That’s why incumbent storage vendors are having a hard time competing against the upstarts – their file system infrastructure prevents them from embracing new technology. In the old days, expanding storage capacity was all about adding more spindles, whereas today’s new systems are no longer spindle-bound.

Similarly, until recently most processor chips contained a single “core,” but today multi-core processors are the new standard. That’s why Nimble Storage was able to achieve such incredible performance numbers right from the start: there’s more CPU power under the hood.


As for capacity, some storage vendors have seen performance decline quickly as storage space becomes limited. For instance, users might experience a significant drop-off in performance after their system reaches 50 percent capacity. But Nimble’s CASL (Cache Accelerated Sequential Layout) file system was designed from the ground up to use flash and hard disk storage differently, each to its best advantage.


The magic’s in the software: CASL determines how data is laid out on cache and on disk so that it is both easily accessible and protected. And that’s the biggest difference between yesterday’s “old school” vendors and today’s latest generation of enterprise data storage systems.