“Snapshots are NOT backup!”
I’ll admit I was a little surprised to hear that from a prospect recently, because in my experience the usefulness of snapshots always depends on how they’ve been implemented. So I asked him why he felt that way, and whether he could describe a situation where he had seen this play out.
Here’s what I learned:
- The customer had been trying to restore a SQL database that had apparently become corrupted silently at some point, yet still appeared functional and normal in production.
- They had configured daily application-consistent snapshots of the volume on which the SQL database resided.
- Some time later, they discovered that the SQL database was corrupt and needed to be restored to a version from before the corruption. Easy enough, or so they thought.
- The incumbent storage vendor’s support folks then tried everything they could to restore and mount previous snapshots, hoping to find at least one snapshot taken before the data were corrupted.
- It proved futile: when they finally reached the oldest snapshot, they discovered that it too held a corrupted copy of the SQL database.
At this point the exasperated customer and vendor support folks agreed that restoring from a tape backup was the only solution. This had always been an option, but they knew that restoring from slow tape devices was a painful and lengthy operation, not to mention that they’d have to engage the backup support folks to schedule and arrange the restore. After multiple attempts and several more days of tape-restore madness, it finally succeeded.
As part of a post-mortem, the vendor’s support specialist, account rep and sales engineer told the customer that he shouldn’t have relied on snapshots in the first place, as they are not a substitute for backup. In fact, they proposed upgrading the existing backup software and aging tape devices right away, before the next corruption occurred. Pressing further along these lines, the vendor even quoted and sold the customer a dedupe-enabled VTL (virtual tape library) to speed up backup and restore, promising a faster recovery the next time. In the end, the customer came away from the experience convinced that “snapshots are not backup”.
After hearing this story, I asked the customer how far back they had to go to find a good version of the SQL database. His answer: roughly 20 days. Was that from a snapshot, I asked? No, that was the point they had restored from tape; the snapshots only went back about 10 days.
That’s the moment the little light bulb appeared above my head. Is it possible, I prodded, that you could have restored from snapshots if your snapshot retention had gone back 20 days?
He looked at me strangely and said sure, then added that you can’t keep snapshots going back that far. Even if you could, they would consume far too much disk space, and the performance of the volume hosting the SQL database would nose-dive. And, he continued, if the storage array fails or is destroyed outright, all your snapshots are toast too. On top of that, his support folks had recommended against this approach.
That all makes sense. But what, I asked, if you could retain snapshots going back 60 or even 90 days, consuming no more space than the compressed, incremental changes to the SQL database, and without any performance degradation on the volume or the array? He said that sounded like magic, or marketing hype.
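To put rough numbers on that claim, here’s a back-of-the-envelope model in Python. It assumes each retained daily snapshot pins at most one day’s worth of changed data, stored compressed, and compares that with keeping a full copy per day. The function names and every input (a 500 GB volume, 10 GB of daily change, 2:1 compression) are hypothetical illustrations, not figures from this customer or from any array, and the model ignores metadata overhead and performance entirely.

```python
def snapshot_space_upper_bound_gb(daily_change_gb: float,
                                  retention_days: int,
                                  compression_ratio: float) -> float:
    """Rough upper bound on the extra capacity held by daily incremental
    snapshots: each retained snapshot pins at most one day's worth of
    overwritten data, stored compressed (a simplifying assumption)."""
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return daily_change_gb * retention_days / compression_ratio


def full_copies_gb(volume_gb: float, retention_days: int) -> float:
    """Capacity needed if every daily recovery point were a full copy."""
    return volume_gb * retention_days


# Hypothetical inputs only: a 500 GB database volume, 10 GB of changed
# data per day, 2:1 compression, 90 days of daily snapshots.
print(snapshot_space_upper_bound_gb(10, 90, 2.0))  # 450.0
print(full_copies_gb(500, 90))                      # 45000
```

Under those made-up inputs, 90 days of incremental snapshots needs at most about 450 GB, versus 45,000 GB for 90 full copies; real numbers depend on the actual change rate and how well the data compresses.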
Pushing on, I asked: what if you had a byte-for-byte copy of that data replicated efficiently offsite, using the same compressed, incremental changes the snapshots use, and could extend its retention separately, for a longer period, as a DR (disaster recovery) strategy? And what if, I added, you could still hang on to those slow, aging tape devices for occasional backups to meet long-term archival and legal retention requirements?
At this point the customer wanted to know more, so as a proof-point I showed him statistics based on daily, real-world telemetry collected from our customers’ production arrays. He seemed impressed that customers were not only using snapshots for SQL backups, but also achieving RPOs (recovery point objectives) as granular as one hour, with retention schedules spanning more than a month. (We actually have some customers with RPOs as granular as 5 minutes, spanning several months, but I didn’t want to risk blowing his mind at this point.)
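The lesson from the opening story comes down to simple arithmetic: a snapshot schedule can only rescue you if its retention window reaches back past the last known-good copy. The minimal Python sketch below checks that condition using the story’s figures (about 10 days of snapshots, a clean copy roughly 20 days old) and counts how many snapshots a given RPO and retention period keep on hand. The helper names are hypothetical, and the coverage check is a simplification that ignores where within a snapshot interval the corruption actually began.

```python
from datetime import timedelta


def snapshots_retained(interval: timedelta, retention: timedelta) -> int:
    """Hypothetical helper: how many snapshots a schedule keeps on hand
    when one is taken every `interval` and each is kept for `retention`."""
    return retention // interval  # timedelta // timedelta yields an int


def can_restore_from_snapshot(retention: timedelta,
                              clean_point_age: timedelta) -> bool:
    """Hypothetical helper: True if the retention window reaches back at
    least as far as the last known-good copy of the data. Simplification:
    ignores the snapshot interval's granularity."""
    return retention >= clean_point_age


# Figures from the story: snapshots went back ~10 days, but the last
# clean copy of the database was ~20 days old.
clean_point = timedelta(days=20)
print(can_restore_from_snapshot(timedelta(days=10), clean_point))  # False
print(can_restore_from_snapshot(timedelta(days=60), clean_point))  # True

# Recovery points kept: daily for 60 days, then hourly for 30 days.
print(snapshots_retained(timedelta(days=1), timedelta(days=60)))   # 60
print(snapshots_retained(timedelta(hours=1), timedelta(days=30)))  # 720
```

Hourly snapshots kept for 30 days work out to 720 recovery points, and a 5-minute RPO held for several months runs well past ten thousand, which is why the space and performance cost of each snapshot matters so much.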
As that telemetry shows, the lion’s share of Nimble Storage’s customers actively use snapshots for backup, and by extension use replication for DR. I suppose that for some storage vendors, insisting that “snapshots are not backup” opens the door to upselling add-on products that merely treat the symptom. Nimble Storage has shattered outdated perceptions of snapshots and how they can be leveraged to replace old backup paradigms.