Someone asked a great question about our joint VMware + Nimble performance video:
" in esxtop, DQLEN is set to 128 in your case, is that at the Queue Depth at LUN level? what is the typical Q-depth at the Array Port in Nimble storage? 128 seems high, is that because Nimble is a Flash storage and LUNs are capable of processing high amount of commands?"
If you are wondering what "DQLEN" is and where it shows up, go into the ESXi shell, invoke esxtop, and type "u" (lowercase) for the disk device view. "DQLEN" appears as one of the columns.
What does it really mean? Thanks to @Eric Forgette from engineering & @Rick Jooss from product management, I got a thorough explanation from both of them:
Per the VMware KB, DQLEN is defined as follows: "The value listed under DQLEN is the queue depth of the storage device. This is the maximum number of ESX VMKernel active commands that the device is configured to support."
Does this strictly mean queue length for the device?
Not strictly. The queue exists at both ends of the wire, since it tracks IOs that are actively being worked on. The ESX host needs a slot to hold and remember each IO, and so does the storage device (Nimble, in this case) on the target side.
Where does the "128" value come from?
The 128 is actually set by Nimble. iSCSI has an advantage over FC in this respect: the target can tell the initiator how much queue depth it has, and the initiator can then use that value (or a lower one). In our case we respond with 128. That value is per session, so if you have multiple paths you'll actually get (#_paths * 128) as the total queue depth for that LUN.
Is "128" the Queue Depth at LUN level?
The queue depth is actually at the connection level. If there are 2 connections/sessions, then it would actually be 256 per LUN/volume.
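To make the arithmetic concrete, here is a minimal Python sketch of the (#_paths * 128) rule described above. It assumes one iSCSI session per path and that every session was given the same depth of 128. The function name total_queue_depth is made up for illustration and is not part of any VMware or Nimble tool.

```python
def total_queue_depth(num_sessions: int, per_session_depth: int = 128) -> int:
    """Aggregate queue depth for one LUN/volume across its iSCSI sessions.

    Assumes one session per path and the same depth on every session
    (128 is the value the Nimble target responds with, per the text above).
    """
    if num_sessions < 1:
        raise ValueError("a LUN needs at least one session/path")
    return num_sessions * per_session_depth


if __name__ == "__main__":
    for sessions in (1, 2, 4):
        print(f"{sessions} session(s): total queue depth {total_queue_depth(sessions)}")
```

With 2 sessions this gives the 256 per LUN/volume mentioned above.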
What is the typical Q-depth at the Array Port in Nimble storage? 128 seems high, is that because Nimble is a Flash storage and LUNs are capable of processing high amount of commands?
The value of 128 was selected so that queue depth would not generally be a limiting factor for performance, so yes, it is because the Nimble arrays are high performance. The cost of more queue depth is also not high in our architecture. Note that the queue depth is not per port but per iSCSI session/connection.
Last but not least, please remember that queue depth has a direct impact on latency. If the queue depth is set too low, the array does not get utilized to its full potential. If the queue depth is too high, queued IO sits in line too long, which translates to higher latency. Always question a storage vendor's latency claims: if the numbers were measured at a queue depth of 1, of course latency is super low.
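One rough way to reason about this trade-off is Little's Law (outstanding IOs = throughput x average latency), which holds for a system in steady state. The sketch below is an illustration only: the IOPS figure is a made-up number, not a measurement from any array, and avg_latency_ms is a hypothetical helper name.

```python
def avg_latency_ms(outstanding_ios: int, iops: float) -> float:
    """Average per-IO latency in milliseconds implied by Little's Law.

    outstanding_ios: IOs in flight (the queue depth actually in use)
    iops: sustained throughput in IOs per second (steady state assumed)
    """
    if iops <= 0:
        raise ValueError("iops must be positive")
    return outstanding_ios / iops * 1000.0


if __name__ == "__main__":
    assumed_iops = 50_000  # hypothetical throughput ceiling, for illustration only
    for depth in (1, 32, 128, 256):
        print(f"{depth} outstanding IOs: {avg_latency_ms(depth, assumed_iops):.2f} ms average")
```

Once throughput has hit its ceiling, every additional outstanding IO only adds waiting time, which is why a latency number quoted at a queue depth of 1 says little about behavior under load.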