
Storage Performance Benchmarking Basics

Blog Post created by Jonathan Bell on Jul 31, 2017

Introduction-

 

In a universe not long ago, in a data center near you, storage benchmarking was a fairly easy affair.  Hook up some hosts you had lying around, load up IOmeter, and press go.  In most cases you could easily saturate the backend storage and understand how the array performed under pressure.  With the industry-wide flash revolution, this has become more difficult.  Not only are the arrays more performant, but they also have a wide variety of data reduction features that can complicate testing.  This has led to the popularity of more sophisticated test tools such as Vdbench, as well as a need for more horsepower in benchmarking rigs.  In this blog I will examine some of the most frequently seen bottlenecks in testing and provide guidance on how to avoid them.


Expectations-

 

The first step in successful testing is setting your expectations correctly.  Make sure you understand the equipment you are testing and where its limits might be.  For example, I would not expect an entry-level array to perform at the same level as a high-end solution.  As such, I would first consult with product experts to determine a ballpark figure for the testing.  I have seen this pendulum swing in both directions: expectations set too high, leading to disappointment with the results, and testers ecstatic with results when the system is only performing at 50% of its actual ability.  Setting expectations, and testing to them, helps to keep things realistic and to customize systems to meet your needs.

 

Block sizes come to mind immediately when considering expectations.  One misconception I frequently see is that if an array can perform X operations per second, it should be able to meet that number regardless of block size.  The truth of the matter is that you should take into consideration the testing conditions used to arrive at X.  If that measurement was taken using a 4k block size, it would be misguided to assume you could meet the same IOPS count with 128k blocks.  When setting expectations around performance, always seek to understand the conditions your reference numbers were based on.
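To make that concrete, here is a quick back-of-the-envelope sketch showing why an IOPS number only makes sense alongside the block size it was measured at.  The IOPS and block size figures are hypothetical, not a spec for any particular array:

```python
# Rough sketch: IOPS x block size = throughput, so the same IOPS figure
# implies wildly different demands on the array and transport.

def throughput_mbps(iops: float, block_size_kb: float) -> float:
    """Approximate throughput in MB/s for a given IOPS rate and block size."""
    return iops * block_size_kb / 1024

# Suppose a reference test measured 200,000 IOPS at a 4k block size:
ref_iops = 200_000
print(throughput_mbps(ref_iops, 4))    # ~781 MB/s moved at 4k

# Expecting the same 200,000 IOPS at 128k would imply ~25,000 MB/s of
# throughput, far beyond what the same backend (or transport) can likely push:
print(throughput_mbps(ref_iops, 128))
```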


Infrastructure-

 

The infrastructure you are using to benchmark is just as important as the end device you are testing.  Would you use a 10-year-old PC to run applications that require high performance?  Then why would you expect it to be able to push a modern storage system to its limits?

 

Be aware of the tool you are using to benchmark and the overhead it will require to generate IO.  While some tools such as Vdbench are lightweight and generate IO easily, other tools that perform tasks such as sending IO transactions through a database stack will require more resources.  Consider the tool you are using and the effect it will have on your test hosts.  Generally speaking, you do not want to push the test host past 90% utilization of CPU or memory.  If you are experiencing unexpected results during testing, verify the system resource usage and be sure it is within acceptable boundaries.
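As a quick sanity check during a run, something like the following sketch can keep an eye on the load generator itself.  It assumes the third-party psutil package is installed on the test host, and the 90% thresholds simply reflect the guideline above:

```python
# Minimal sketch of watching load-generator host utilization during a run.
import time
import psutil

CPU_LIMIT = 90.0
MEM_LIMIT = 90.0

for _ in range(10):                       # sample for roughly 10 seconds
    cpu = psutil.cpu_percent(interval=1)  # average CPU % over the 1s interval
    mem = psutil.virtual_memory().percent
    flag = "  <-- host may be the bottleneck" if cpu > CPU_LIMIT or mem > MEM_LIMIT else ""
    print(f"cpu={cpu:5.1f}%  mem={mem:5.1f}%{flag}")
```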

 

Transport infrastructure is also a very important piece of the stack.  Having the right number of connections at the right speed contributes greatly to the results.  If your storage array is capable of multiple GB/s of throughput, how are you going to get the data from point A to point B?  Be aware of the theoretical limitations of the technology you are using.  Fibre Channel connectivity is known for being fast with low latency, but a single 8Gb port is only capable of driving ~800MB/s of throughput.  The same applies to 10Gb Ethernet: a single connection will likely yield around 1,000MB/s.
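If it helps, here is a rough rule-of-thumb calculator for per-port ceilings.  The "divide the line rate by roughly ten" shortcut folds in encoding and protocol overhead, so treat the outputs as ballparks rather than guarantees:

```python
# Back-of-the-envelope per-port throughput ceilings. Real-world numbers vary
# with encoding (8b/10b vs 64b/66b), frame sizes, and protocol.

def approx_port_mbps(line_rate_gbps: float) -> float:
    """Approximate usable MB/s for a single port of the given nominal line rate."""
    return line_rate_gbps * 1000 / 10   # ~Gb/s -> ~MB/s after overhead

for label, gbps in [("8Gb FC", 8), ("16Gb FC", 16), ("10GbE", 10), ("25GbE", 25)]:
    print(f"{label:>7}: ~{approx_port_mbps(gbps):,.0f} MB/s per port")
```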

 

Understanding the topology of the transport network is also very important.  Always evaluate how many switches or pieces of equipment there are in the link, as well as the uplinks between them before you begin testing.  Eight connections into the transport will not matter if you only have two uplinks.
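A quick oversubscription calculation makes the point; the port counts and speeds below are made-up examples:

```python
# Hypothetical oversubscription check: eight 10GbE host-facing ports funneling
# into two 10GbE uplinks.
host_ports, uplink_ports, port_gbps = 8, 2, 10

host_bw = host_ports * port_gbps      # 80 Gb/s of host-facing bandwidth
uplink_bw = uplink_ports * port_gbps  # 20 Gb/s available across the uplinks

print(f"Oversubscription ratio: {host_bw / uplink_bw:.0f}:1")  # 4:1
# Anything significantly above 1:1 means the uplinks, not the array,
# may cap the throughput you measure.
```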


Multipathing

 

With multiple links configured for the transport, understanding how and when each of these links is utilized is key.  Whether using Fibre Channel or iSCSI, hosts will have a method of determining the best path on which to send IO.  Windows uses a native multipathing stack with Device Specific Modules (DSMs), and Linux typically uses the built-in DM-Multipath stack with options set in /etc/multipath.conf.  Virtualization hosts will also have settings for controlling multipathing, such as VMware's Path Selection Policies.

 

In an optimally configured environment, you should see traffic flowing across all active paths evenly.  If you see IO using only one path, or if paths appear to carry uneven amounts of traffic, the first place to look is your multipath settings.  Make sure multipathing is configured to allow traffic across all of the active paths using either a round robin or least queue depth policy.  Consult your vendor's implementation guides for specific settings.
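One simple way to eyeball path balance on a Linux host is to compare the per-path I/O counters in sysfs.  This is only a sketch: the sdX names are placeholders for the actual path devices `multipath -ll` lists under your volume:

```python
# Quick check of whether I/O is spread evenly across the paths to one LUN.
from pathlib import Path

PATH_DEVICES = ["sdb", "sdc", "sdd", "sde"]   # hypothetical path devices

def completed_ios(dev: str) -> int:
    """Total completed reads + writes from /sys/block/<dev>/stat."""
    fields = Path(f"/sys/block/{dev}/stat").read_text().split()
    return int(fields[0]) + int(fields[4])    # reads completed + writes completed

counts = {dev: completed_ios(dev) for dev in PATH_DEVICES}
total = sum(counts.values()) or 1
for dev, n in counts.items():
    print(f"{dev}: {n:>12} I/Os ({100 * n / total:.1f}% of total)")
# A path sitting near 0% while others carry the load usually points back to
# the multipath policy (e.g. failover-only instead of round robin).
```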


Troubleshooting

 

You have your beefy test host set up, your transport squared away, your test harness ready... You press go and are immediately disappointed.  You checked with your vendor and they assured you the array would perform to an expectation, but it just isn't getting there.  What should you check first?

 

iSCSI:

 

Start with the basics.  Double check the methodology matches the way your expectation was measured.  Verify block sizes, access patterns, read/write ratio, thread count, and data reduction settings.

 

Check your host utilization and make sure all resources are below 90% utilized.

 

Check your transport and be sure all links are up and negotiating to the right speed, duplex, and MTU.  Also a good time to double check topology and uplinks.
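On Linux hosts, a quick sysfs check like the sketch below can confirm link state, negotiated speed, and MTU on the iSCSI-facing NICs.  The interface names and expected values are examples only:

```python
# Sanity-check iSCSI NICs against the speed and MTU you expect.
from pathlib import Path

EXPECTED = {"eth2": (10000, 9000), "eth3": (10000, 9000)}  # iface: (Mb/s, MTU)

for iface, (want_speed, want_mtu) in EXPECTED.items():
    base = Path(f"/sys/class/net/{iface}")
    state = (base / "operstate").read_text().strip()
    mtu = int((base / "mtu").read_text())
    if state != "up":
        print(f"{iface}: link is {state} <-- check cabling/switch port")
        continue
    speed = int((base / "speed").read_text())   # negotiated speed in Mb/s
    ok = speed == want_speed and mtu == want_mtu
    print(f"{iface}: speed={speed}Mb/s mtu={mtu} {'OK' if ok else '<-- check this link'}")
```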

 

Check your number of iSCSI connections per volume.  By default this is typically 2; more connections will be needed to drive maximum performance, and I recommend 8 as a starting point.  With Nimble arrays this can be configured using Nimble Connection Manager, the Nimble Linux Toolkit, or the VMware host plugin.
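To see where you stand, the open-iscsi `iscsiadm -m session` listing shows the active sessions; the short sketch below simply counts them per target.  It assumes the target IQN is the fourth field of each output line, and the 8-session goal is just the starting point suggested above:

```python
# Count active iSCSI sessions per target from `iscsiadm -m session` output.
import subprocess
from collections import Counter

out = subprocess.run(["iscsiadm", "-m", "session"],
                     capture_output=True, text=True, check=True).stdout

sessions = Counter()
for line in out.splitlines():
    parts = line.split()
    if len(parts) >= 4:
        sessions[parts[3]] += 1           # assumed: field 4 is the target IQN

for target, count in sessions.items():
    note = "" if count >= 8 else "  <-- consider adding sessions"
    print(f"{count} session(s) -> {target}{note}")
```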

 

Verify multipathing and confirm all paths are being utilized equally.

 

If high latency is observed, compare array-reported latency against host-reported latency.  If the host-reported latency is off by more than 0.5 ms, investigate host-side queues/thread counts and transport latency.
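One way to get a host-side number to compare against the array is to sample the block device's I/O counters over a window during the test, which is essentially what iostat's await column reports.  The device name below is a placeholder for the multipath device or LUN under test:

```python
# Measure host-observed average I/O latency over a sampling window.
import time
from pathlib import Path

DEVICE = "dm-0"   # hypothetical multipath device under test

def sample(dev: str):
    f = Path(f"/sys/block/{dev}/stat").read_text().split()
    ios = int(f[0]) + int(f[4])          # completed reads + writes
    ms = int(f[3]) + int(f[7])           # ms spent on reads + writes
    return ios, ms

ios1, ms1 = sample(DEVICE)
time.sleep(10)                            # sample window during the benchmark
ios2, ms2 = sample(DEVICE)

d_ios, d_ms = ios2 - ios1, ms2 - ms1
if d_ios:
    print(f"host-observed average latency: {d_ms / d_ios:.2f} ms over {d_ios} I/Os")
# If this sits more than ~0.5 ms above the array's reported latency, look at
# host-side queues/thread counts and the transport in between.
```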

 

Fibre Channel:

 

Start with the basics.  Double check the methodology matches the way your expectation was measured.  Verify block sizes, access patterns, read/write ratio, thread count, and data reduction settings.

 

Check your host utilization and make sure all resources are below 90% utilized.

 

Check your transport and be sure all links are up and negotiating to the right speed.  Also a good time to double check topology and uplinks.

 

Check that your FC queue depths are properly configured for benchmarking.  Typically you want these as high as possible.  Check your HBA vendor's instructions for how to increase this count in their drivers.  Some are set to as low as 32 by default.
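A quick way to see the per-LUN queue depths currently in effect on a Linux host is to read them from sysfs, as in the sketch below; how to raise them (typically via HBA driver module parameters) is HBA- and vendor-specific:

```python
# List the effective queue depth for each SCSI block device.
from pathlib import Path

for dev_dir in sorted(Path("/sys/block").glob("sd*")):
    qd_file = dev_dir / "device" / "queue_depth"
    if not qd_file.exists():
        continue                          # not a SCSI device
    depth = int(qd_file.read_text())
    note = "  <-- may throttle a benchmark" if depth <= 32 else ""
    print(f"{dev_dir.name}: queue_depth={depth}{note}")
```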

 

Verify multipathing and confirm all paths are being utilized equally.

 

If high latency is observed, compare array-reported latency against host-reported latency.  If the host-reported latency is off by more than 0.5 ms, investigate host-side queues/thread counts and transport latency.


When all else fails, double check the settings again methodically.  With all of the different places in the stack to look, it can be very easy to overlook a setting.
