
Performance & Data Protection




In a universe not long ago in a data center near you, storage benchmarking was a fairly easy affair.  Hook up some hosts you have lying around, load up IOmeter, and press go.  In most cases you would be able to easily saturate the backend storage and understand how the array performed under pressure.  With the industry-wide flash revolution, this has become more difficult.  Not only are the arrays more performant, but they have a wide variety of data reduction features that can complicate testing.  This has led to the popularity of more complicated test tools such as Vdbench, as well as a need for more horsepower in benchmarking rigs.  In this blog I will be examining some of the most frequently seen bottlenecks in testing, as well as providing guidance on how to avoid them.





The first step in successful testing is setting your expectations correctly.  Make sure you understand the equipment you are testing, and where the limits might be.  For example, I would not expect an entry-level array to perform at the same level as a high-end solution.  As such, I would first consult with product experts to determine a ballpark figure for the testing.  I have seen this pendulum swing in both directions: expectations set too high, followed by disappointment with the results, and people ecstatic with results when the system was only performing at 50% of its actual ability.  Setting expectations, and testing to them, helps to keep things realistic and to customize systems to meet your needs.


Block sizes come to mind immediately when considering expectations.  One misconception I frequently see is that if an array can perform X amount of operations a second, it should be able to meet that number regardless of block size.  The truth of the matter is that you should be taking into consideration the testing conditions used to arrive at X.  If this measurement was taken using a 4k block size, it would be misguided to assume you could meet the same IOPS count with 128k blocks.  When setting expectations around performance, always seek to understand the conditions your reference numbers were based on.
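
To put numbers on this, here is a minimal sketch of the arithmetic.  The 100,000 IOPS reference figure and the function names are made up for illustration, not taken from any datasheet.  It simply shows how much throughput the same IOPS count would demand at a larger block size.

```python
KIB = 1024


def throughput_mb_s(iops: float, block_size_bytes: int) -> float:
    """Throughput in decimal MB/s implied by an IOPS figure at a block size."""
    return iops * block_size_bytes / 1_000_000


def bandwidth_multiplier(reference_block: int, target_block: int) -> float:
    """How many times more bandwidth the same IOPS needs at target_block."""
    return target_block / reference_block


if __name__ == "__main__":
    reference_iops = 100_000  # hypothetical figure measured with 4k blocks
    for block in (4 * KIB, 128 * KIB):
        mb_s = throughput_mb_s(reference_iops, block)
        print(f"{block // KIB}k: {mb_s:,.1f} MB/s")
    # 128k blocks need 32 times the bandwidth to hold the same IOPS
    print(bandwidth_multiplier(4 * KIB, 128 * KIB))
```

Holding a 4k IOPS figure at 128k would take 32 times the bandwidth, which is why the reference conditions matter as much as the number itself.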





The infrastructure you are using to benchmark is just as important as the end device you are testing.  Do you use a 10-year-old PC to run your applications that require high performance?  Then why would you expect it to be able to push a modern storage system to its limits?


Be aware of the tool you are using to benchmark and the overhead it will require to generate IO.  While some tools such as Vdbench are lightweight and easily generate IO, other tools that perform tasks such as sending IO transactions through a database stack will require more resources.  Consider the tool you are using and the effect it will have on your testing hosts.  Generally speaking, you do not want to push the test host past 90% utilization of CPU or memory.  If you experience unexpected results during testing, verify the system resource usage and be sure it is within acceptable boundaries.


Transport infrastructure is also a very important piece of the stack.  Having the right number of connections at the right speed can contribute greatly to the results.  If your storage array is capable of multiple GB/s of throughput, how are you going to get the data from point A to point B?  Be aware of the theoretical limitations of the technology you are using.  Fibre Channel connectivity is known for being fast with low latency, but a single 8Gb port is only capable of driving ~800MB/s of throughput.  The same applies to 10Gb Ethernet: a single connection will likely yield around 1,000MB/s.
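
As a rough sizing aid, the sketch below turns a target throughput into a minimum link count using the per-link figures quoted above.  The function name and the 3,000MB/s target are hypothetical, and real links rarely sustain their nominal ceiling, so treat the result as a lower bound.

```python
import math

# Approximate usable throughput per link in MB/s, from the figures above.
LINK_MB_S = {"8Gb FC": 800, "10GbE": 1000}


def links_needed(target_mb_s: float, link_type: str) -> int:
    """Minimum number of links of link_type needed to carry target_mb_s."""
    return math.ceil(target_mb_s / LINK_MB_S[link_type])


if __name__ == "__main__":
    target = 3000  # hypothetical array throughput target in MB/s
    print(links_needed(target, "8Gb FC"))  # 4
    print(links_needed(target, "10GbE"))   # 3
```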


Understanding the topology of the transport network is also very important.  Always evaluate how many switches or pieces of equipment there are in the link, as well as the uplinks between them before you begin testing.  Eight connections into the transport will not matter if you only have two uplinks.
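
The uplink point can be sketched as a simple minimum across every stage of the path.  The stage layout and link speeds below are illustrative assumptions, not a description of any particular fabric.

```python
from typing import Iterable, Tuple


def path_ceiling_mb_s(stages: Iterable[Tuple[int, float]]) -> float:
    """Return the end-to-end throughput ceiling in MB/s.

    Each stage is (link_count, mb_s_per_link).  The narrowest stage sets
    the ceiling for the whole path.
    """
    return min(count * speed for count, speed in stages)


if __name__ == "__main__":
    stages = [
        (8, 1000),  # eight 10GbE host connections into the first switch
        (2, 1000),  # only two uplinks between switches
        (4, 1000),  # four array ports
    ]
    print(path_ceiling_mb_s(stages))  # 2000
```

In this example the eight host connections can never deliver more than the two uplinks allow.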





With multiple links configured for the transport, understanding how and when each of these links is utilized is key.  Whether using Fibre Channel or iSCSI, hosts will have a method of determining the best path to send IO.  Windows uses a native multipathing stack with Device Specific Modules (DSMs), and Linux typically uses the built-in DM-Multipath stack with options set in /etc/multipath.conf.  Virtualization hosts will also have settings for controlling multipath, such as VMware's Path Selection Policies.


In an optimally configured environment, you should see traffic flowing across all active paths evenly.  If you see IO using only one path, or if paths appear to carry uneven amounts of traffic, the first place to look is your multipath settings.  Make sure that the multipath configuration allows traffic across all of the active paths using either a round robin or least queue depth methodology.  Consult your vendor's implementation guides for specific settings.
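
If you can export per-path IO counters from your host, a quick check like the following can flag uneven distribution.  This is only a sketch: the path names, the counter values, and the 1.25 tolerance are assumptions for illustration, and how you collect the counters depends on your OS and multipath stack.

```python
from typing import Mapping


def busiest_to_mean_ratio(io_per_path: Mapping[str, int]) -> float:
    """Ratio of the busiest path's IO count to the mean across all paths.

    Returns 0.0 when no paths or no traffic were recorded.
    """
    total = sum(io_per_path.values())
    if not io_per_path or total == 0:
        return 0.0
    mean = total / len(io_per_path)
    return max(io_per_path.values()) / mean


def is_balanced(io_per_path: Mapping[str, int], tolerance: float = 1.25) -> bool:
    """True when the busiest path is within tolerance of the mean (assumed threshold)."""
    return busiest_to_mean_ratio(io_per_path) <= tolerance


if __name__ == "__main__":
    even = {"path0": 1000, "path1": 980, "path2": 1010, "path3": 1010}
    single = {"path0": 4000, "path1": 0, "path2": 0, "path3": 0}
    print(is_balanced(even))    # True
    print(is_balanced(single))  # False
```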





You have your beefy test host set up, your transport squared away, your test harness ready... You press go and are immediately disappointed.  You checked with your vendor and they assured you the array would perform to an expectation, but it just isn't getting there.  What should you check first?




iSCSI:


Start with the basics.  Double-check that the methodology matches the way your expectation was measured.  Verify block sizes, access patterns, read/write ratio, thread count, and data reduction settings.


Check your host utilization and make sure all resources are below 90% utilized.


Check your transport and be sure all links are up and negotiating to the right speed, duplex, and MTU.  This is also a good time to double-check topology and uplinks.


Check your number of iSCSI connections per volume.  By default this is typically 2, but more connections will be needed to drive maximum performance; I recommend 8 as a starting point.  With Nimble arrays this can be configured using Nimble Connection Manager, Nimble Linux Toolkit, or the VMware host plugin.


Verify multipathing and confirm that all paths are equally utilized.


If high latency is observed, compare array-reported latency with host-reported latency.  If the two differ by more than 0.5 ms, investigate host-side queues/thread count and transport latency.
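
The latency comparison in the last step is easy to script once you have both numbers.  This sketch assumes the host-side figure includes the array's service time plus everything in between, so the gap is host minus array; the function names and sample values are hypothetical.

```python
def latency_gap_ms(host_ms: float, array_ms: float) -> float:
    """Latency added between the array and the host, in milliseconds."""
    return host_ms - array_ms


def needs_host_side_check(host_ms: float, array_ms: float,
                          threshold_ms: float = 0.5) -> bool:
    """True when the gap exceeds the 0.5 ms rule of thumb from the checklist."""
    return latency_gap_ms(host_ms, array_ms) > threshold_ms


if __name__ == "__main__":
    print(needs_host_side_check(host_ms=1.4, array_ms=0.6))  # True
    print(needs_host_side_check(host_ms=0.9, array_ms=0.6))  # False
```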


Fibre Channel:


Start with the basics.  Double-check that the methodology matches the way your expectation was measured.  Verify block sizes, access patterns, read/write ratio, thread count, and data reduction settings.


Check your host utilization and make sure all resources are below 90% utilized.


Check your transport and be sure all links are up and negotiating to the right speed.  This is also a good time to double-check topology and uplinks.


Check that your FC queue depths are properly configured for benchmarking.  Typically you want these as high as possible.  Check your HBA vendor's instructions for how to increase this count in their drivers.  Some are set to as low as 32 by default.


Verify multipathing and confirm that all paths are equally utilized.


If high latency is observed, compare array-reported latency with host-reported latency.  If the two differ by more than 0.5 ms, investigate host-side queues/thread count and transport latency.
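
To see why a low queue depth caps results, Little's Law (outstanding IOs = IOPS x latency) gives a quick ceiling.  This is a back-of-the-envelope sketch: the 0.5 ms latency is an assumed value, and whether the depth applies per LUN or per port depends on your HBA driver.

```python
def iops_ceiling(queue_depth: int, latency_ms: float) -> float:
    """Maximum IOPS a queue can sustain at a given average latency.

    Rearranged from Little's Law: IOPS = outstanding IOs / latency.
    """
    return queue_depth * 1000 / latency_ms


if __name__ == "__main__":
    assumed_latency_ms = 0.5  # illustrative round-trip latency
    print(iops_ceiling(32, assumed_latency_ms))   # 64000.0
    print(iops_ceiling(128, assumed_latency_ms))  # 256000.0
```

At a default depth of 32, that queue cannot exceed roughly 64,000 IOPS at this latency no matter how fast the array is.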




When all else fails, double check the settings again methodically.  With all of the different places in the stack to look, it can be very easy to overlook a setting.

Following on with the setup guide of the Nimble Secondary Flash Array, I am going to go through the deployment options, and the settings needed for implementation with Veeam Backup and Replication.

What will be covered in this blog post?

  • Quick overview of the SFA
  • Deployment Options
    • Utilizing features of Veeam with the SFA
    • Using a backup repository LUN
  • Best practices to use as backup repository
    • Veeam Proxy – Direct SAN Access
    • Creating your LUN on the SFA for use as a backup repository
    • Setting up your backup repository in Veeam
    • vPower NFS Service on the mount server
    • Backup Job settings
    • SureBackup / SureReplica
    • Backup Job – Nimble Storage Primary Snapshot – Configure Secondary destinations for this job
    • Encryption – Don’t do it in Veeam!
  • Viewing data reduction savings on the Nimble Secondary Storage
  • Summary

My test lab looks similar to the below diagram provided by Veeam (Benefits of using Nimble SFA with Veeam).

Quick overview of the SFA

The SFA is essentially the same as the Nimble Storage devices before it, with the same hardware and software. There is one key difference: the software has been optimized for data reduction and space-saving efficiency rather than for performance. That means you would purchase the Nimble CS/AF range for production workloads that need high IOPS and low latency, and use the SFA for your DR environment or backup solution, where it provides the same low latency to allow for high-speed recovery, along with long-term archival of data.

Deployment options

With the deployment of an SFA, you are looking at roughly the same deployment options as the CS/AF array for use with Veeam (This blog, Veeam Blog). However, with the high dedupe expectancy, you are able to store a hell of a lot more data!

So the options are as follows:

  1. iSCSI or FC LUN to your server as a Veeam Backup Repo.
    • Instant VM Recovery
    • Backup Repository
    • SureBackup / SureReplica
    • Virtual Labs
  2. Replication Target for an existing Nimble.
    • Utilizing Veeam Storage Integration
      • Backup VMs from Secondary Storage Snapshot
      • Control Nimble Storage Snapshot schedules and replication of volumes


Continue reading this article over at my personal blog.

I was speaking about the new Secondary Flash Array (SFA) with one of our people in Asia-Pacific, and this is what he shared about the data reduction capability: “18:1 is a game changer for sizing and pricing.”


I thought back to when we were first seeing results out of our labs, which were in the area of 8x.  Even at this level of data reduction we were getting incredibly positive feedback. Our new Adaptive Flash (i.e. hybrid)-based SF-Series was doubling the data efficiency of what we were typically seeing on the All-Flash series. And with the SF-300 model having a planned 200TB capacity, we were excited to promote that we could deliver over a Petabyte and a half of effective capacity within just a 4U system. This was energizing the field.


Well, fast-forward to May when the SF-Series went GA, and with more testing under our belt along with a half-dozen new customers, a surprising thing happened.  We discovered our data was wrong.


The data reduction was even greater.


We’re now seeing expected data reduction in the area of 18-to-1!


Data Efficiency explored

So how does the Nimble dedupe work?  The data reduction on the SFA is based on always-on, inline data deduplication and compression. The SFA dedupes per pool, using 4k blocks.  The approach leverages the patented CASL (Cache Accelerated Sequential Layout) architecture. The Nimble Storage operating environment, which encompasses both NimbleOS and CASL, employs a flash-first design in which all writes are acknowledged from an enhanced-capacity NVDIMM layer before they are coalesced and written to spinning disk in large stripes. All writes land on the SSD tier, are deduped and compressed, and are then written to disk.


If this sounds familiar (especially for Nimble customers), it should. The SFA still uses NimbleOS, and the approach is similar to what we are doing on the All-Flash AF-Series arrays.  There are a couple of key differences, however, due to the engineering that was done specifically for the SFA:


  1. Reduced resource (RAM and compute) requirements to dedupe same capacity
  2. SFA tuned to minimize going into burst mode (switching off dedupe temporarily)


So theoretically, for very active data, the data efficiency ratios may converge between the SF and AF platforms. However, the Adaptive Flash economics and validated design work around the SFA have optimized that platform for less active data storage, and thus the typical reduction rates will differ significantly between the two platforms.


Estimating Data Reduction on the SFA

A key difference between our early testing and now is that we’ve had more time to simulate typical backup retention schedules. Though the basic appeal of the SFA is around ‘active backup’ or using your backup for more than just rare operational recovery, we are still seeing the majority of the data being inactive backup data. So the data reduction is primarily a factor of full backups over time. Here’s a way to estimate data savings:


  1. Take the number of full backups (synthetic or active) being retained. All full backups will dedupe well against each other; if there is no new data written between full backups, the new full backup will completely dedupe.  This data reduction benefit extends to all the full backups retained over time, such as a typical 12-week schedule.
  2. Multiply by the reduction on the data set. This could be comparable to the reduction you would see on a Nimble All-Flash array. This is workload dependent: some data sets, like VDI, dedupe much better than, say, video data. We can expect to see at the very least an average 1.5x compression savings on any data, and perhaps greater dedupe savings beyond that.


So a conservative data reduction estimate works out as follows: 12 weekly full backups that mostly dedupe against each other, multiplied by 1.5x compression, give an expected 18:1 data reduction.
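
The estimate above is simple enough to script. This sketch assumes the best case described in step 1, where retained full backups dedupe almost entirely against each other; the function names are illustrative, and the 200TB input is the planned SF-300 capacity mentioned earlier.

```python
def estimated_reduction(full_backups_retained: int, dataset_reduction: float) -> float:
    """Reduction ratio: retained fulls multiplied by per-data-set reduction."""
    return full_backups_retained * dataset_reduction


def effective_capacity_tb(usable_tb: float, reduction_ratio: float) -> float:
    """Effective capacity in TB for a given usable capacity and reduction ratio."""
    return usable_tb * reduction_ratio


if __name__ == "__main__":
    ratio = estimated_reduction(full_backups_retained=12, dataset_reduction=1.5)
    print(ratio)                              # 18.0
    print(effective_capacity_tb(200, ratio))  # 3600.0 TB, i.e. 3.6PB
    print(effective_capacity_tb(200, 8))      # 1600 TB at the earlier 8x figure
```

Plug in your own retention count and a reduction ratio measured on your data; fewer retained fulls or a high change rate between them will lower the result.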


Change the Game with SFA

Put your backup data to work, and start realizing the increased data efficiency and cost savings of the Nimble Secondary Flash Array.  Change your data protection and secondary storage game today by getting up to 3.6PB of effective capacity in just a 4U footprint, and a list cost per effective GB as low as 10 cents!


Get more information or request a demo at

Since the GA on May 11, the Nimble Secondary Flash Array has been getting attention, both positive and otherwise. On the plus side, multiple deals have been closed and a pipeline of demand is growing behind that. Recently, however, we’re seeing controversy over “mis-naming”.


Let me address the issues brought up in today’s Register story:


  1. The new Secondary Flash Array does in fact use disk. It’s built on Nimble’s pioneering hybrid flash technology. Thus it’s a mix of spinning disk and SSD, delivering affordable performance plus capacity-optimizing data deduplication. Naming it “Secondary Hybrid Flash” seemed too long. We had considered “Secondary Storage Array”, which would have been less informative regarding what it’s made of. And the solution-specific name that was suggested, “Secondary Hybrid Integrated To Zerto”, had its own issue.
  2. The CS Series is still available. As was the case with Mark Twain, the reports of the death of the CS-Series are much exaggerated. The CS-Series adaptive flash arrays are still at the center of our product line, with the IOPS-maximizing All-Flash AF-Series on one end, and now the capacity-oriented SF-Series on the other. Our entire installed base of Nimble arrays is currently seeing a measured 99.9999% availability.
  3. “We should not view the SF Series as comparable to all-flash…” I totally agree. The Secondary Flash Array does not have the same IOPS as a Nimble All-Flash array, with scale-up performance topping 1.4 million IOPS, or even that of the CS-Series. I’d also add that the SF-Series is not the same as a purpose-built backup appliance, such as StoreOnce, made just for serious backup workloads with support for VTL, a very broad selection of application plug-ins, or very high levels of throughput (hundreds of TBs/hour). The Secondary Flash Array serves an emerging need for customers with typical backup requirements who want to put their backup data to work.


I hope this clears up things – thanks again for all of the overwhelming response on this topic, as well as for the interest in the new Nimble Secondary Flash Array.

How to Identify Secondary Storage

Veteran storage admins will have no problem identifying and understanding Secondary Storage.  They’re not confused by some classic computer science definition that distinguishes system RAM attached to CPU vs. nonvolatile disk storage, etc.  Here we’re referencing the colloquial distinction of the class of data storage used for more inactive data.


As an industry, we’re pretty obsessed with Primary storage: the systems housing the more transactional data associated with top tier applications: Databases.  ERP systems.  Financials and General Ledger systems.  In contrast, Secondary storage is a distinct tier of storage living beneath that very active Primary tier and extending down towards the ‘cold storage’ or Archive tier.  Typical uses include Data Protection as well as less business-critical tasks like Dev/Test, QA, and Analytics.  Secondary storage is typically less expensive, less performant, has fewer features, lower availability characteristics, and is often more scale-oriented, than Primary. 


The nature of Secondary storage is evolving though.  One example is in the area of Data Backup.  Traditionally, it was sufficient to have storage that could ingest and retain backup data relatively quickly and cheaply. The need to restore data was rare and it didn’t matter if it was slow. And there was no expectation to be able to actually access the stored data for anything other than those infrequent times when a file was lost or a power surge wrecked a system. But companies are now looking to restore data faster. And access backup data to use for testing or analytics. And get multiple uses out of previously single-use devices. And streamline and simplify any and all on-premises IT infrastructure investments.


Field Markings of Modern Secondary Storage

The trademark of today’s Secondary Storage system is its ability to let you put your backup data to work.  There are at least three traits that can be easily identified:

  1. Flash Storage enabled – Secondary Storage includes a mix of Flash SSDs that help ensure fast data access, whether to recover information, or to enable the use of the data directly from the storage system, at near-Primary Storage speed.
  2. Capacity Optimized – The system provides significant data reduction capabilities to maximize the data capacity and keep the effective cost of the system low. Also the data reduction is an inline feature, rather than a post-process that requires twice as much capacity to make it work.
  3. IoT Connected intelligence – Secondary Storage has built in sensors connected back to a machine learning-enabled predictive analytics capability that anticipates and prevents issues for trouble-free operations.


The Nimble Secondary Flash Array

The Nimble Secondary Flash Array is shipping this month, and is an industry-first, effectively using Flash storage to create a data backup platform that lets companies do real work with their backup.  The Secondary Flash array will also be delivered as part of an integrated data protection solution with Veeam.


The Nimble Secondary Flash Array (SFA) is optimized for both Capacity and Performance. It adds high performance flash storage to a capacity-optimized storage architecture for a unique data backup platform. It’s one device optimized for multiple uses: data backup, Disaster Recovery, local archiving, as well as secondary data storage uses such as Dev/Test, QA and Analytics. Users can instantly back up and recover data from any primary storage system, and with our integration with Veeam and other leading data protection software, it simplifies data lifecycle management and provides a path to cloud storage. As part of the Nimble Storage product line, it comes with InfoSight Predictive Analytics that enable 99.9999% availability across more than 10,000 deployed storage systems.


Where to Find a Secondary Storage Array

The new Nimble Secondary Flash Array will be taking residence within leading IT resellers around the globe this month.  The SFA is going to change the nature of Secondary Storage by helping IT put its backup data to work.  More information about the SFA and the joint solution with Veeam is available through the new Secondary Flash Array webpages on

I’ve been sharing with you my observations around data backup and the growing indication that customers are expecting more from their backup systems.  We’ve seen it in an increasing number of deals, we’re hearing it from partners talking up ‘Copy Data Management’, and it’s being confirmed by analysts who predict 30% of orgs will leverage backup for more than just recovery.


To satisfy my curiosity, I conducted a quick survey on the subject.  Admittedly just a handful of respondents, but the findings were in line with everything we’re seeing:


1. Everyone wants to use their Backup more than just in the event of data loss or corruption

Who wouldn’t, right?  Two out of three respondents weren’t able to, but they all had ideas of how they would.

The interesting thing for me is that people have a nuanced take on what this means – specifically what they want to do with the backup Data vs. the backup Systems.  For instance, respondents wanted to be able to use the Data for Analytics and as Test data.  Whereas, they wanted to use free System resources to verify DR readiness, and as a repository for data mining.


2. Everyone wants Ease of Use

When asked what specifically they look for in a Backup, DR or Archive solution, “Ease of Use” kept coming up.  This was an unprompted, open response.  Of course no one wants to fight with a difficult, or overly complex product.  But I’m seeing this correspond to a shift away from IT teams having deep on-staff expertise, and instead having generalists rely upon very capable, highly usable data storage offerings, whether on-premise or in the cloud.

3. DR is a Bet-the-Business exercise

The seriousness of Disaster Recovery is obvious, but it was made even more clear in the responses. For Backup and Archive, “Integration” and “Cost” came up as top purchase criteria.   But in the case of a Disaster Recovery solution, “Company Stability/Reputation” was an answer that jumped out. It spoke to the strategic nature of the purchase, and the importance of a Vendor track record and longevity. 


Share your Insights about Data Backup

Everyone has a moving story to tell about how they are changing their approach to data management and storage – whether it’s in terms of scale, new data products, formats, or in migrating data to (or back from) the cloud.  Share one or two of your biggest insights here in the comment section below.

It has been estimated that of the 30,000 traffic fatalities that occur each year, 94% are caused by human error. Self-driving cars can potentially end this carnage by removing the human element from the equation. In addition to making cars far safer, the communication systems in and between vehicles and the roads they travel on will likely put an end to traffic snarls and inner city gridlock. In short, autonomous vehicles will revolutionize the transportation industry and create a safer and, just as importantly, much more predictable user experience.

Photo Credit: Star Rapid


In the digital economy, a predictable user experience is also key. End users are conditioned to having data and applications at their fingertips 24x7x365. Think of your smartphone. You have access to thousands of apps that provide great performance and are typically always-on. And if for some reason mobile app performance grinds to a crawl or goes down, you’re only a click away from a competing service.


To compete in the digital economy, organizations need ways to ensure predictable application performance and ensure the availability of key business services. Failure to do so could result in the digital equivalent of a fiery car wreck.




It is estimated that businesses lose up to $16M annually due to the failure to provide the kind of service levels that end users have come to expect and demand. In fact, 84% of IT decision makers admit they have an availability gap. In addition to lost revenue opportunities, the bigger risk is a long-term hit to your company brand and prestige, as well as a loss in customer confidence and trust.


We recently presented at the Nimble vConference and showed how to close the door on the availability gap while enhancing and maintaining consistent application performance across all application workloads. By driving very high levels of performance and availability, the combined Nimble and Veeam solution helps deliver a more predictable user experience, while simplifying and automating IT operational management.


You can watch the full session here: How Veeam and Nimble Enhance Application Performance and Simplify Availability


We will also be at NimbleConnect Live! Don’t let your data center transformation take a detour down a digital dirt road. Register now for Nimble Connect Live from June 19-21 in the City of Angels and put your business in the fast lane to success!




Colm Keegan, Product Marketing Manager, Veeam Software


The Big Game is here once again, and this year it’s the Patriots squared off against the Falcons. Both teams had impressive seasons, but one of the Tales of the Tape is the lack of balance in the Falcons’ game: a 2nd-ranked offense, yet a 27th-ranked defense.  In life it’s typically important to seek balance, whether it’s ethical (see: Aristotle’s Golden Mean), your work-life, or your football game.  And it’s the same with your data storage.


Looking back to the beginning of the 2000’s, traditional data storage backup devices were developed to be optimized just for data ingest, which is to say, they were designed to accept data as fast as possible to not become a bottleneck for the upstream systems, and not cause a long ‘backup window’. Coupled with efficiency technology like data deduplication, the industry succeeded in achieving a class of devices that could deliver ingress throughput, but it created a legacy of data siloes with costly tradeoffs: slow restores, trapped information, the need for specialized expertise, and even lost data.  The backup appliance became all about getting data into the device, and not so much about getting it back out again. (It became all Defense and no Offense – the opposite of the Falcons!)


Fast forward to today’s IT world, where innovations in data storage are bringing a big change to the area of data backup.  According to Gartner, by 2020, 30% of organizations will leverage backup for more than just operational recovery, up from less than 10% at the beginning of 2016.


Flash becomes the Star player of Backup

A new breed of offerings is enabling capabilities like near-instant backup and restore, the ability to use backup data for real work like Dev/Test and Analytics, and a higher level of availability than has ever been seen before. The key change in the game plan is around the use of Flash storage. With Flash storage, data can be accessed 100x faster than on magnetic hard drives. It provides breakthroughs in data recovery speed and, because of the high level of IOPS delivered, even provides options for recovery such as the ability to run workloads on the backup device itself, rather than waiting to completely restore before restarting applications.


Having Flash storage as part of the backup architecture allows for the ability to use backup data for real business work. In traditional designs, the data starts within a ‘landing zone’ of storage, then undergoes offline processing for things like deduplication, then sits trapped within another tier of storage.  This approach has redundant capacity and makes data restores slow.  In contrast, a modern flash-enabled backup system manages the data in a streamlined way, accepting data and performing important processes like deduplication and verification right when the data is being stored.  It’s faster, more effective, and more space-efficient.


Getting the most Yardage out of your Data

The other key benefit of this new game plan is how it delivers more value out of the backup data. Traditionally the backup appliance was a one-way trip for the data – getting stored, deduped and forgotten.  But now, IT teams can get more use out of the backup data and the backup system, thanks to the inclusion of Flash. They not only get instant access to data when it’s needed in the rare case of outages, but can access it anytime for everyday tasks like Dev/Test, QA, patch testing, analytics and reporting.  Modern backup solutions offer space-efficient cloning that makes instant copies of the backup data, which can then be worked with at no impact to the performance of the upstream systems.


Write your Winning Backup Game plan

In the same way football teams need a balanced plan to win the big game, you need a backup solution that balances fast data input with fast data access and restore.  Draw up an approach that uses new generation flash-enabled backup products that are becoming available this year.



What’s your biggest challenge when it comes to backing up?  Speed, cost, something else?  Share your issues and thoughts here.

Flash in the data center has enabled faster access to data, which has led to a great deal of infrastructure consolidation. Consolidation leads to better utilization and reduces cost. However, it also brings together diverse applications contending for the same set of resources. These applications usually have different Service Level Agreements (SLAs). For example, some of the challenges an administrator faces are: how do you retrieve that most important email for the CEO from a past snapshot without disturbing a several-hour-long Extract-Transform-Load (ETL) job running on the same storage array? Or how do you take a backup during normal business hours without disrupting access to home directories?


A modern flash optimized storage array needs to satisfy the SLAs of many consumers such as application Input Output operations (IOs), snapshot creation, replication, garbage collection, metadata processing etc. Some of these consumers are throughput demanding while others are sensitive to latency. Some are opportunistic and some have stricter deadlines.


The problem of meeting the expectations of consumers translates to allocating storage system performance to them. In a well-designed storage software architecture, performance is usually Central Processing Unit (CPU) bound. It is better to be bound by CPU speed than by storage media speed, because increasing the amount of compute can increase the performance of storage software and we can take advantage of Moore's law. However, at times, due to the application IO profile or a degraded mode of operation, such as a rebuild after a disk or flash drive failure, performance can also be limited by available disk or flash drive throughput. Overall, ensuring that the SLAs of consumers are met depends on how the saturated resources are allocated to the consumers.


What are the requirements of an App centric storage QoS?


To find the answer to this question, we turned to the vast amount of data collected by InfoSight from our install base of over 9000 customers. We carefully analyzed application behavior at times of contention. We profiled storage resource consumption to find hot spots. We modeled changes in application behavior by experimenting with various resource allocation schemes. We also spoke to numerous customers about their pain points in deploying consolidated infrastructure and ensuring QoS for their applications. Based on our analysis and what we heard from our customers, it became clear to us that a good storage QoS solution needs to have the following properties:


A) Fair sharing of resources between consumers


At times of congestion, there are a couple of ways to ensure fairness:


Fairness in outcome:

Under this scheme, the system ensures that each application gets its fair share of performance, measured either as Input/Output Operations Per Second (IOPS) or megabytes per second (MB/s).


Typically, small random IOs consume more resources per byte, while large sequential IOs consume more resources per IO. Because of this difference in IO characteristics, a system that equalizes outcomes can still be unfair to applications.


For example, if the system were to ensure fair share of IOPS, then an application doing large sequential IO would win more resources. On the other hand, if the system were to ensure fair share of MB/s, then an application doing small random IO would win more resources.


Fairness in opportunity:

Under this scheme, the system ensures that each consumer gets its fair share of resources. The goal is to focus not on the outcome but on the consumption. An application that uses resources efficiently will have consumed the same amount of resources as an un-optimized application. However, due to its efficiency, it is able to produce better results.


For example, an application sending large sequential IOs will get the same resources – such as CPU cycles – as an application sending small random IOs. But due to its efficiency, it will be able to produce better throughput measured in MB/s.


Under the first scheme, performance of applications is determined by the most un-optimized application in the system. However, the second scheme ensures that application performance is not influenced by other applications, providing isolation between them.


In other words, fair sharing of resources between consumers is the key to providing isolation from noisy neighbors.
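To make the difference between the two schemes concrete, here is a minimal sketch in Python. It assumes a hypothetical cost model that is not taken from any real array: every IO is charged a fixed overhead plus a per-KiB cost. The constants `PER_IO_COST`, `PER_KIB_COST` and `BUDGET`, and the two example applications, are made-up values chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical cost model (illustrative numbers, not measured on any array):
PER_IO_COST = 10.0    # resource units charged per IO, regardless of size
PER_KIB_COST = 0.5    # resource units charged per KiB transferred
BUDGET = 100_000.0    # resource units the system can spend per second


@dataclass(frozen=True)
class App:
    name: str
    io_kib: int  # IO size in KiB

    @property
    def cost_per_io(self) -> float:
        return PER_IO_COST + PER_KIB_COST * self.io_kib


def report(apps, iops_by_app):
    """Summarize IOPS, MiB/s and share of the resource budget per app."""
    return {
        a.name: {
            "iops": iops_by_app[a.name],
            "mib_s": iops_by_app[a.name] * a.io_kib / 1024,
            "share": iops_by_app[a.name] * a.cost_per_io / BUDGET,
        }
        for a in apps
    }


def equal_iops(apps):
    """Fairness in outcome: every app gets the same IOPS."""
    iops = BUDGET / sum(a.cost_per_io for a in apps)
    return report(apps, {a.name: iops for a in apps})


def equal_throughput(apps):
    """Fairness in outcome: every app gets the same KiB/s."""
    kib_s = BUDGET / sum(a.cost_per_io / a.io_kib for a in apps)
    return report(apps, {a.name: kib_s / a.io_kib for a in apps})


def equal_resources(apps):
    """Fairness in opportunity: every app gets the same resource share."""
    share = BUDGET / len(apps)
    return report(apps, {a.name: share / a.cost_per_io for a in apps})


apps = [App("small_random", 8), App("large_sequential", 128)]
for scheme in (equal_iops, equal_throughput, equal_resources):
    print(scheme.__name__)
    for name, stats in scheme(apps).items():
        print(f"  {name:17s} iops={stats['iops']:8.0f} "
              f"MiB/s={stats['mib_s']:6.1f} share={stats['share']:.0%}")
```

With these assumed costs, equal IOPS hands most of the budget to the large sequential application, equal MB/s hands most of it to the small random application, and equal resources gives each half of the budget while the large sequential application still achieves about three times the MiB/s.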


B) Work conserving


In computing and communication systems, a work-conserving scheduler is a scheduler that always tries to keep the scheduled resource(s) busy.


Under this requirement, if a resource is available and there is demand for it, the demand is always satisfied. Moreover, the system lets a consumer use surplus resources when they are available, and does not penalize it for that consumption when contention arises later.


For example, when a backup job is the only application running on the system, it can consume all the resources available. However, when a production application - such as a database - starts doing IO, resources are automatically fair shared. The backup job is not throttled because of its past consumption.
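The backup and database example can be sketched as a max-min fair allocation, a standard textbook technique used here purely for illustration and not a description of any vendor's scheduler. The function name `fair_share`, the capacity of 100 units and the demand figures are assumptions. The allocation depends only on current demand, so nothing is remembered about past consumption.

```python
def fair_share(capacity, demands):
    """Max-min fair, work-conserving split of `capacity` among `demands`.

    No consumer gets more than it asks for, and capacity is left idle
    only when every demand has been fully satisfied.
    """
    alloc = {name: 0.0 for name in demands}
    active = {name for name, demand in demands.items() if demand > 0}
    remaining = float(capacity)
    while active and remaining > 1e-9:
        share = remaining / len(active)
        # Consumers whose outstanding demand fits within an equal share.
        satisfied = {n for n in active if demands[n] - alloc[n] <= share}
        if not satisfied:
            # Everyone wants more than an equal share: split evenly and stop.
            for n in active:
                alloc[n] += share
            remaining = 0.0
            break
        for n in satisfied:
            remaining -= demands[n] - alloc[n]
            alloc[n] = float(demands[n])
        active -= satisfied
    return alloc


# Backup alone: it may use the whole system.
print(fair_share(100, {"backup": 100}))
# Database arrives with a modest demand: it is served in full,
# and the backup keeps everything that is left over.
print(fair_share(100, {"backup": 100, "database": 30}))
# Both want everything: the system splits evenly.
print(fair_share(100, {"backup": 100, "database": 80}))
```

Because the function takes only the current demands as input, the backup job's allocation in the second call is the same whether or not it previously consumed the whole system.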


C) Balancing demands and SLAs between consumers


This requirement is about accommodating the differing and often conflicting needs of applications. In the presence of throughput intensive applications, latency sensitive ones should not suffer. While ensuring fair sharing of resources, the system should also absorb bursts and variations in application demand. Above all, it should consistently meet applications’ performance expectations.


D) Minimal user input in setting performance expectations


This is the most stringent requirement of all. The best solution is the one that works out of the box, with as little tinkering as possible. By reducing administrative overhead, it allows administrators to focus on their applications and users instead of infrastructure.


Note that this does not eliminate every case in which an administrator needs to input exact performance specifications for applications – such as IOPS, MB/s or performance classes – to convey the administrator's intent to the system. However, in most cases, where the requirements are isolating noisy neighbors, providing performance insulation and protecting latency sensitive applications, an exact performance specification is a burden on the administrator.


A review of existing storage QoS solutions in the market


When we explore existing storage QoS solutions, we see that many variants of the following designs are prevalent:


1. Slow down admission to the speed of the slowest bottleneck


As per the M/M/1 queuing model, with exponentially distributed inter-arrival and service times, the response time for requests increases non-linearly as resources approach saturation. When the system has multiple resources involved in request processing, the speed of the slowest resource determines the response time. However, at the time of request admission, it might not be possible to determine upfront which resources are needed. In order to avoid non-deterministic response times, this design throttles admission based on the speed of the slowest resource.
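The non-linear growth follows from the standard M/M/1 result that the mean response time is W = 1 / (μ − λ), where λ is the arrival rate and μ the service rate. The short sketch below simply evaluates that formula; the service rate of 10,000 requests per second is an assumed figure for illustration.

```python
def mm1_response_time(arrival_rate: float, service_rate: float) -> float:
    """Mean response time (queueing + service) of an M/M/1 queue, in seconds.

    W = 1 / (mu - lambda), valid only while lambda < mu.
    """
    if not 0 <= arrival_rate < service_rate:
        raise ValueError("M/M/1 is only stable when 0 <= arrival_rate < service_rate")
    return 1.0 / (service_rate - arrival_rate)


SERVICE_RATE = 10_000.0  # requests/second the resource can serve (assumed)

for utilization in (0.50, 0.80, 0.90, 0.95, 0.99):
    w = mm1_response_time(utilization * SERVICE_RATE, SERVICE_RATE)
    print(f"utilization {utilization:4.0%}: mean response time {w * 1000:6.2f} ms")
```

In this model, going from 50% to 99% utilization multiplies the mean response time by 50, which is why designs in this family try to keep the slowest resource away from saturation.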


Clearly, this design is wasteful. Since the upper limit on consumption is based on the slowest bottleneck in the system, consumption of the remaining resources cannot be maximized. For example, application IO requests that can be served purely from memory may be throttled to the speed of IOs that need to be served from storage media.


To give a real life analogy, this solution is similar to restricting the admission to a museum based on the capacity of the smallest gallery. Patrons not interested in visiting that particular gallery have no option but to wait in that single long line.


Comparing this design to requirements of QoS, we can see that this design does not meet requirement B (Work conserving).


2. Limit resource consumption


In order to ensure QoS, this design puts a limit on application throughput (IOPS or MB/s), so that each application's resource consumption is limited and none of the resources are saturated. There are a few problems with this:


  1. Since limiting consumption is the means of resource allocation, this design requires limits to be set on every object/Logical Unit Number (LUN)/volume. This is simply an administrative headache.
  2. It is not work conserving. Since the upper limit on consumption is purely arbitrary, the underlying resource utilization may be far from saturation. Yet applications cannot use those resources.
  3. If the limit is too high, it may be ineffective.


In real life, we experience throttling of video streaming by some Internet Service Providers as a strategy to avoid network congestion. However, throttling is in effect even when the network is not congested.
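The same effect shows up in a small sketch of static limits. The function name `serve_with_limits`, the capacity and the per-application caps are all hypothetical values chosen for illustration.

```python
def serve_with_limits(capacity, demands, limits):
    """Serve each app min(demand, limit); return (served, idle_capacity).

    Assumes the limits were sized so that their sum never exceeds capacity.
    """
    if sum(limits.values()) > capacity:
        raise ValueError("limits oversubscribe the system")
    served = {name: min(demand, limits[name]) for name, demand in demands.items()}
    return served, capacity - sum(served.values())


CAPACITY = 100_000  # IOPS the system can deliver (assumed)
limits = {"backup": 20_000, "database": 60_000}

# Off-hours: the database is nearly idle while the backup wants everything.
demands = {"backup": 90_000, "database": 5_000}
served, idle = serve_with_limits(CAPACITY, demands, limits)
print(served, "idle:", idle)
```

Here the backup stays capped at 20,000 IOPS while 75,000 IOPS of capacity sits idle, which is exactly what a work-conserving scheduler avoids.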


Comparing this design to requirements of QoS, we can see that this design does not meet requirements A (Fair sharing), B (Work conserving) and D (Minimal user input).


3. Guarantees on resource consumption


To avoid the problem of limiting every consumer, this solution gives a guarantee (or reservation) of a certain amount of resource consumption to a few consumers. Those who have guarantees can consume up to the guarantee even during resource contention. When there is surplus, they can consume more than the guarantee. In order to ensure that unimportant applications do not drive the system to saturation, limits still need to be set on those.


There are a couple of disadvantages with this design:

  1. To ensure guarantees are met, this design forces a pessimistic estimation of performance headroom. As a result, the amount of reservable or guarantee-able consumption is much lower than the system's capabilities.
  2. Though it eliminates the need to configure limits on every LUN or object, explicit guarantees still need to be configured for those that need them.


This design meets requirements A (Fair sharing), B (Work conserving) and C (Balancing competing SLAs) most of the time, if we ignore instances when applications are constrained due to limits set on them. However, it doesn't meet requirement D (Minimal input).


A review of Nimble’s storage QoS solution


Keeping the requirements in mind and understanding the limitations of existing solutions, we designed our QoS solution with the goal of having a fair sharing scheduler in front of each bottlenecked resource in the system. These schedulers do not restrict or limit consumption at times of contention, but instead encourage and ensure responsible sharing. Applications that don't need a certain resource won't have to be throttled, should contention for that resource arise.
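To give a feel for what a fair sharing scheduler in front of a single resource can look like, here is a sketch using deficit round robin, a well-known fair queuing technique. This is not a description of Nimble's implementation, which this post does not detail. The per-request costs (14 and 74 units), the quantum and the application names are made-up values.

```python
from collections import deque


def drr_schedule(queues, quantum, rounds):
    """Deficit round robin over per-app queues of request costs.

    Each round, every backlogged app earns `quantum` units of credit and
    dispatches requests while it can pay for the one at the head of its
    queue. Returns the dispatch order as a list of (app, cost) tuples.
    """
    deficits = {app: 0.0 for app in queues}
    pending = {app: deque(costs) for app, costs in queues.items()}
    order = []
    for _ in range(rounds):
        for app, queue in pending.items():
            if not queue:
                deficits[app] = 0.0  # idle apps do not bank credit
                continue
            deficits[app] += quantum
            while queue and queue[0] <= deficits[app]:
                cost = queue.popleft()
                deficits[app] -= cost
                order.append((app, cost))
    return order


# Hypothetical costs: a small random IO costs 14 units, a large one 74.
queues = {"vdi": [14] * 100, "vmotion": [74] * 100}
order = drr_schedule(queues, quantum=74, rounds=10)

totals = {app: 0 for app in queues}
counts = {app: 0 for app in queues}
for app, cost in order:
    totals[app] += cost
    counts[app] += 1
print("resource units consumed:", totals)
print("requests dispatched:   ", counts)
```

Both applications end up consuming nearly the same amount of the resource, even though the number of requests each one gets through differs widely. An application with an empty queue is skipped, so it neither holds up the others nor accumulates credit.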


Here is how our solution fares against the requirements:


A) Fair sharing of resources


We tested our design for fairness using two diverse workloads: Virtual Desktop Infrastructure (VDI) and VMware’s Storage vMotion. While VDI is light on resource consumption with small random IOs (8K to 16K IO sizes), Storage vMotion sends large IOs (128K or more) that are resource hungry.




In the chart above, App1 and App2 represent two different workloads. While App1 represents VDI traffic throughout the test, App2 starts out as VDI and then transitions to VDI + Storage vMotion traffic during the test.


At Interval 1, both App1 and App2 represent VDI traffic, sending an equal amount of load. As you can see in the graph, the performance of App1 and App2 is nearly identical even with fair sharing turned off.


At Interval 2, App2 transitions to VDI + Storage vMotion and starts sending large sequential Storage vMotion IOs in addition to small random VDI IOs. With fair sharing disabled in this interval, we can see App2’s IOs consume most of the resources, effectively starving App1’s IOs. As a result, the performance of App1 is impacted. This is a typical ‘noisy neighbor’ scenario.


At Interval 3, we enabled fair sharing. As a result, we can see that App1’s performance rises back to the level at Interval 1, when both App1 and App2 were using equal resources. In effect, App1’s performance is isolated from noisy neighbors. App2 still enjoys higher performance because it is more efficient with larger IOs.


B) Work conserving


To test the work-conserving nature of our QoS solution, we ran an experiment that involves a backup job and a database workload. The backup job is long running and sends large sequential IOs, while the database workload sends intermittent random IOs that are latency sensitive.




In the chart above, App1 represents the backup job and App2 represents the database workload.


At Interval 1, App1 is sending large sequential IOs from the backup job. Since nothing else in the system is competing for resources, it can consume all the resources to get the maximum throughput.


At Interval 2, App2 starts sending intermittent random IOs from the database workload. At this time, due to resource contention, QoS kicks in to ensure fair allocation of resources between the two workloads.


At Interval 3, App2 stops and App1 starts consuming all the resources again. We can see:

  1. At no point did we have to cap the throughput of App1.
  2. At Interval 1 and Interval 3, App1 was allowed to consume all available resources.
  3. At Interval 2, App1 was not punished for the resources it had consumed earlier, when there was no contention.


C) Balancing demands and SLAs


To verify that our design meets the diverse SLAs of applications, we ran a latency sensitive database query along with a throughput intensive backup job.




In the chart above, App1 represents the database query workload and App2 represents the backup job.


At Interval 1, we have QoS turned off. App1 sends small random database query IOs. App2 sends large sequential backup IOs. Since there is no QoS, we can see that App1 suffers high latency.


At Interval 2, we have QoS enabled. We can see that App1 enjoys latency that is 5x lower.


The QoS mechanism automatically identified that App1’s IOs are latency sensitive and ensured lower latency without any explicit configuration.


D) Minimal user input


In none of the experiments above did we have to explicitly specify application demands, or put a limit on their IOs to avoid contention. The system was able to identify contention and allocate resources according to the demands. There were no user-facing knobs to control.


As we can see, the result of meeting all these requirements is an adaptive system that automatically satisfies application requirements and isolates applications from greedy or resource hungry neighbors (i.e., noisy neighbor isolation) without needing the administrator to intervene. This way, administrators can focus more on delivering service to their customers and less on the infrastructure. This is what makes an infrastructure app centric.

Flash storage became the big info tech news of 2016, not only because of how it delivered higher levels of performance to the data center, but because of how it’s proliferated across the storage market.  Industry publications have highlighted the ‘coolest’ flash storage products of the year which span arrays, appliances, solid-state drives and even a software defined version of an all-flash storage array.

With the addition of new technologies like NVMe, new product form factors and new licensing schemes, there is no shortage of innovation and excitement within the flash storage sector. A key storyline in all this is the diffusion of flash technology benefits across the product segments, from one end of the product spectrum to the other.  A specific example of this is how flash technology has now dramatically enhanced the area of Data Backup.


Put your Backup Data to Work

With the gains in read and write speed for data storage arrays, users are discovering that what was previously a ‘backup’ storage system – only intended to be used to restore data on the rare occasion of primary system data loss – can actually be used for much more. These additional uses include Test/Dev, DevOps and analytics.  This market shift was documented by Gartner, who wrote: “By 2020, 30% of organizations will leverage backup for more than just operational recovery”.  This topic was also introduced within a recent Nimble blog post.


Flash Generation Backup

Flash-enabled backup systems deliver fast backup and restores, and don’t require twice as much capacity to keep up with host systems, as with purely hard-disk drive (HDD) based systems. Flash also provides the speed to let users quickly test and verify backups as they go, providing peace of mind.

Active backup systems eliminate backup windows and also restore windows. Administrators get quick access to files, VMs, applications or entire systems and are able to rapidly copy them back to the primary storage.  Or, they have the option to not wait to restore at all, but instead ‘live mount’ production workloads at full speed on the flash-enabled secondary storage array, and restore data in parallel. Flash-enabled reads that are 100x faster than those of traditional HDD-based appliances put an end to the ‘Hotel California’ syndrome of traditional backup.


Consolidate and save with Flash

Another area of benefits being realized with these ‘Secondary Flash’ systems is the ability to eliminate redundant storage systems.  With the addition of Flash to backup systems, IT teams are able to converge backup and other previously separate secondary storage systems within a single solution.  These new flash-enabled backup systems deliver both performance and capacity optimization, so there’s no longer a need to maintain storage systems to support your Backup separate from, say, Test/Dev.  Organizations can now use the ‘backup system’ for Analytics without investing in a separate ‘data lake’.  And with continuing declines in flash cost per gigabyte, it could also affordably function as the local archive.


Look for new ‘active backup’ products in 2017 that will shed IT cost and let IT do more with backup data.


Speaking of 2017, what are you planning in terms of Data Backup in the new year?  Do you expect to spend more or less?  Are you moving more data to the Cloud?  Share your comments and insight here.

I recently became aware of a new talent joining the San Jose Sharks lineup, Timo Meier.  Called up from the Sharks’ AHL affiliate, the Barracuda, he surprised fans by scoring his first NHL goal on his first shot against arguably the league’s best goalie, Carey Price of the Montreal Canadiens.


Surprises can come in all shapes and sizes, and in the IT space it happens a lot.


Gartner recently published a surprising stat: By 2020, 30% of organizations will leverage backup for more than just operational recovery, up from less than 10% at the beginning of 2016.  They cite specific uses beyond just backup to include disaster recovery, test/development and DevOps.  This leading, global technology analyst sees change coming for secondary storage.  We see a key reason being the diffusion of Flash memory benefits across the entire storage product category, similar to how key innovations are shared across other familiar product categories.


Innovations Crossing Product Segments

One example I like to use is how in automobiles it doesn’t take long for new features – whether for performance, convenience, efficiency, etc. – to cross over from $70,000 sedans down to $13,000 compacts.  We take it for granted, especially when it pertains to Safety features.  You can see this today in the Ford Fiesta.


Considered one of the “cheapest cars of the year”, the Ford Fiesta comes with the expected standard equipment for a popular compact car, but also key features that compare with top of the line models.  Supposedly they can be ordered with Turbocharging, Direct fuel injection, automated Parking assistance and even Key-less entry.  All features you would also find in a top of the line sedan.


Similarly, I’m expecting to see Flash memory performance benefits cascading across the segments of familiar IT products in the new year.


Put your Backup to Work

Experts have already predicted in past years the ascension of Flash memory in enterprise storage.  This was made simple by the relative differences in the slopes of the hard-drive and flash drive cost curves, with the mechanically-based hard disk drive unable to achieve the same cost decline that Flash could, with its more semiconductor-like economies of scale.  It was only a matter of predicting “when” HDD/SSD price per GB would approach parity, not “if”.


Back to the Gartner prediction, specifically about how organizations will leverage backup for more than just operational recovery, we are presented with a future of what we term “Active Backup”. This is the idea that backup data, traditionally trapped in a system optimized for ingest but not recovery, should also be accessible for more active duty than just being a dormant insurance policy against primary system outages or accidental data deletions.  We believe the key to putting the ‘active’ into ‘backup’ is Flash memory.  The same technology that is enabling primary storage to top 1 million IOPS is the innovation that will allow IT teams to leverage backup data for everything from patch testing to Test/Dev to analytics and reporting.

We expect the leverage of flash across the range of data storage offerings, from Primary to Secondary, to unlock new value from previously trapped data, speed up administrative tasks and increase overall IT operational efficiency.


What is your favorite recent Enterprise Storage innovation?  Better yet, what new data center innovations do you expect to really stand out in the coming year?  Share your comments and predictions.

Very exciting news from our friends at Veeam today: they've announced that Veeam 9.5 (due for release later this year) will have full integration with Nimble Storage SmartSnap, SmartReplicate and SmartClone!


Here's the VEEAM Press Release: PRESS RELEASE


Here's a blog detailing what native Storage Integration can provide for backup and recovery: BLOG


VEEAM are running a joint webinar with us on May 25th @ 7pm UTC+1 (3pm EST, 2pm CST, 12pm PDT) to discuss the integration functionality in more detail. You can sign up for the Webinar here: WEBINAR REGISTRATION




All that's needed from Nimble is for the system to be on NimbleOS 2.3 or above - so if you're a VEEAM customer but on an earlier release, it may be wise to start planning that upgrade!


I haven't personally seen the integration just yet, so I'm very excited to see what this will bring.


I recently came across a situation where a customer wanted to replicate volumes from their production Nimble Storage group to their secondary group at a Disaster Recovery (DR) site.  Nothing unusual in that of course; it’s very straightforward to configure, and a good percentage of Nimble customers replicate between disparate array groups. However, in this case the customer had implemented a scale out cluster at the DR site with two discrete storage pools configured, and they wanted to specify the target pool that a particular volume collection should be replicated into.

The first thing we should note is if the customer didn’t have any specific requirement to maintain separate pools at their DR site, then a simple solution would be to non-disruptively merge the pools to create a single destination pool for replication to DR.  However the premise of this article is that in this instance, the customer wished to maintain separate pools in order to provide data segregation.

When a replication partnership is first configured on a group that contains more than one storage pool, the user is prompted to select the default destination pool in which incoming replicas will be stored.  Once this default has been set, all volume collections replicated from the upstream partner will be placed into this pool.


So, what do we do if we want a particular volume collection to be replicated into another pool on the downstream group which isn’t the configured default?

Worry not, where there’s a will, there’s a way.  On initial replication, the replica will always be created in the default destination pool as specified by the replication partnership.  However, once replication has been established, it is possible to select the replica volume collection on the downstream array and initiate a move operation to migrate the replica to the desired destination pool. This starts a background data migration process to relocate the replica volume collection from the default pool to the newly specified pool.  This operation doesn’t require incoming replication to be suspended, so replication from the upstream partner can continue throughout the move operation. Once the move is complete, you now have the downstream replica in the desired pool!



Tip: Don’t forget that for any scale out configuration it is mandatory to have Nimble Connection Manager (NCM) installed on all Windows and VMware hosts that will connect to the scale out group in order to support volume striping and non-disruptive volume moves.

It should also be noted that the Volume Move operation is available through the API, so it can be readily scripted and automated if required.


I've been doing some digging around over the last couple of weeks after being asked by more than one customer, “Can I use my Nimble array as a backup target?”


Why wouldn't you want to have amazingly fast storage, compression, and the ability to restore files?  The thing is, if you can use free, no-impact snaps, use them, but not everyone can move everything to their Nimble array. Some customers buy 5 years of support on arrays and can’t just throw the garbage out. So the answer is a resounding – Yes!


For this blog I looked at the existing storage partners for CommVault (CV) and found the solutions to be a 20 year old architecture with one twist - dedicated flash drives.  The basic configuration for de-duplication storage is lots of SATA disk for the main pool, and a few SSD drives to hold the de-dup meta-data. That solution has two problems: it needs dedicated SSD drives to speed up the de-dup lookups, and SATA pools still have high latency and can be extremely slow under heavy load – like a full recovery. If anyone has used SATA drives for backup storage pools in the past, you know these “work” but are not the best solution.


I talked with the local CV team and we ran their benchmark in my lab on my CS210.  The results were amazing and we think this is going to be a huge winning configuration for both teams. The smallest array in our quiver delivered 3.3 TB per hour write, and 8.3 TB per hour read performance. Now, the CV test is not a long test, so the read result looks a little questionable and could be a result of all blocks coming from SSD, but that could also be the case in production. We ran the test over and over and the results were pretty consistent.


What you have to remember when talking about de-duplication backups is that doing restores brings EVERYTHING back.  Doing the incremental forever backup is great, and doing de-duplicated full backups does save time, but if you have to restore 2 TB of data to an empty target, you need to transfer 2 TB of data. Those blocks are going to be all over the array, and searching for blocks on SATA disk is slow. Most restores from backup software take from 2x to 6x+ the backup time depending on the number of files, size, and networks.


The array one customer was looking to purchase was a 24 SATA drive and 4 SSD drive configuration. I’m thinking RAID-5 for the SATA, and RAID-10 for the SSDs.  The SSDs are dedicated to the CV meta-data LUN, so they don’t count toward backup performance, and would only see around 5000 IOPS for meta-data. The 24 SATA drives would be the largest bottleneck, being able to provide ~1600 IOPS. The example performance from CV shows the pool writing at 550 MB/s and reading at 608 MB/s. Moving this solution to a Nimble CS2XX series array would increase performance for both the meta-data LUNs and the pool LUNs.

The competitor's configuration we reviewed for the CV solution would have a hard time keeping up in a busy environment.  The CS2XX series delivers 15,000 sustained IOPS without managing tiers of storage (or should I say tears?).  We have an excellent partnership with CV and are on their hardware integration list for IntelliSnap.

We would suggest replacing the proposed storage array for the following reasons.  We would be faster on writes by 8x and reads by 10x (see our test below).  The other solution requires CV to do the compression, which will add a serious load to the CV servers.  We would allow them to turn off compression on the server and use us for compression.  We would estimate a 33% reduction in server CPU requirements with compression disabled. Our solution is smaller and takes less power. You get our all-in license model, which includes replication, snaps (restore points for meta-data and backups), clones, compression, encryption, and enterprise monitoring. And of course the most compelling reason: we are cheaper, according to the customer.


From what I can tell, the current storage partners for CV's storage pools are just trying to make money and not really help customers solve problems.

Here are some details on what I would propose. 

Nimble CS235

  • 15,000 IOPS
  • 2 x 10GbE iSCSI
  • RAID 6
  • ~26TB Effective Capacity
  • Ability to Pin ~330GB to flash (to pin CommVault hash table)
  • All in software licensing including Snaps, Encryption, Replication, Monitoring,  and ability to use on other system…etc. 
  • Ability to expand to ~200TB capacity



Nimble/CommVault disk performance results. 


DiskPerf Version        : 1.3

Path Used : F:\DISK01

Read-Write type         : SEQUENCE

Block Size : 524288

Block Count             : 4096

File Count : 6

Total Bytes Written     : 12884901888

Time Taken to Write(S)  : 12.88

Throughput Write(GB/H)  : 3353.56

Total Bytes Read        : 12884901888

Time Taken to Read(S)   : 5.18

Throughput Read(GB/H)   : 8337.62
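As a sanity check on the figures above, the throughput can be recomputed from the byte count and elapsed time. This sketch assumes that DiskPerf's "GB" is a binary gigabyte (2^30 bytes); that is an inference from the numbers rather than something stated in the output.

```python
GIB = 2 ** 30  # assumption: DiskPerf's "GB" is a binary gigabyte


def gib_per_hour(total_bytes: int, seconds: float) -> float:
    """Convert a byte count and elapsed seconds into GiB per hour."""
    return total_bytes / GIB / seconds * 3600


# Block Size x Block Count x File Count, as reported by DiskPerf.
total_bytes = 524_288 * 4_096 * 6
assert total_bytes == 12_884_901_888  # matches "Total Bytes Written"

write_rate = gib_per_hour(total_bytes, 12.88)  # "Time Taken to Write(S)"
read_rate = gib_per_hour(total_bytes, 5.18)    # "Time Taken to Read(S)"
print(f"write: {write_rate:.0f} GiB/h, read: {read_rate:.0f} GiB/h")
```

The recomputed values, roughly 3354 and 8340 GiB per hour, are within a fraction of a percent of the reported 3353.56 and 8337.62. The small gap is presumably because the reported times are rounded to two decimals.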

Hello all,


I thought I'd write a quick blog post based on a recent observation I've made across my customers when performing InfoSight analysis; the subject today is using Nimble Storage for Backup Repositories.


When creating a backup repository (from the likes of Veeam, Commvault, Symantec etc.) there are a few things to watch out for in order to ensure performance is kept the same for other workloads on the array.


Nowadays, host side backup tools such as those mentioned above have built-in data reduction technologies such as deduplication and compression, which are applied on the proxy server before the data is sent to the repository.


However, the trend I've seen recently is that the repository is given a standard Application Policy on the Nimble array, or a policy which has caching and compression enabled. The problems that have arisen from this are:


  1. The backup repository is being placed into SSD cache - something which shouldn't be done as the backup data is rarely read back.
  2. The array is attempting to compress the backup repository as data is being written, however this data is already compressed at source, meaning CPU cycles are being burnt for no gain on the array itself.


Here's an example taken from InfoSight (customer data removed). Four backup repositories are created (one for each country), yet each volume is being cached (one at 95%!).




This is because the volumes have been allocated a policy called "Veeam", which has compression AND caching enabled. Notice the Compression stat at 0.97x, which is effectively no reduction.




This in turn started to burn CPU cycles and cache (notice the CPU and cache increase from the end of December onwards, when this was configured).



To rectify this issue, create a new policy - I created one called Backup-Repository. I kept the block size at 4KB, but I turned OFF compression (as the data is being compressed at source) and caching (as I don't want to serve backup data through flash cache).


I can then change the Application Policy allocation on the fly for any already created volumes on the array, and any new data being written to the system will a) NOT be compressed and b) NOT be cached. Exactly what we want!




Hope this is useful to you all. We've also just released a new Best Practice Guide which focuses specifically on Veeam Backup & Replication, which is available to download here: