Recommendations to turn off TCP Delayed ACK on a VMware hypervisor are usually associated with this VMware KB: "ESX/ESXi hosts might experience read or write performance issues with certain storage arrays".
Let me point out one statement from that VMware KB in particular:
"The affected iSCSI arrays in question take a slightly different approach to handling congestion. Instead of implementing either the slow start algorithm or congestion avoidance algorithm, or both, these arrays take the very conservative approach of retransmitting only one lost data segment at a time and waiting for the host's ACK before retransmitting the next one. This process continues until all lost data segments have been recovered."
In addition to the above statement, TCP Delayed ACK "can" have an impact in some other, very specific scenarios:
1) Certain very specific packet loss scenarios
2) Some situations where an MTU mismatch occurs, specifically in equipment between the host and the array.
Discussions on exactly why these scenarios can allow TCP Delayed ACK to cause a performance impact can be quite lengthy.
Suffice it to say, however, that:
1) Nimble arrays do not fit the description, given by VMware, of arrays where Delayed ACK should be disabled. We have a fully functional TCP congestion control mechanism, including slow start and congestion avoidance.
2) Nimble Storage does not suggest that you disable TCP Delayed ACK, but it is your prerogative to do so.
3) If disabling TCP Delayed ACK on the initiator (as per the VMware KB) does have a positive impact on performance against a Nimble array, then there is something unusual occurring in that environment. Unless the problem is a bug in the initiator's Delayed ACK code, disabling TCP Delayed ACK is probably not the ideal way to address the unusual circumstance.
Now... to your follow-up about the difference in latency numbers between Veeam and the array... this may not indicate anything unusual.
Veeam, by nature, will request large sequential reads on the target dataset.
- From the Windows OS perspective, this would translate to reading something like 1MB or 2MB (some sources would indicate as much as 8MB) at a time from the Windows "Local Disk".
- iSCSI, however, will likely have negotiated to a much lower "burst" size, usually between 64k and 256k.
- For a 2MB SCSI request, the Windows iSCSI initiator may transmit 32 iSCSI requests of up to 64K, or 8 iSCSI requests of up to 256K.
- Assume that each 256K iSCSI request takes ~6ms within the array (as per your graph, assuming 256k iSCSI operations)
- Assume relatively small 10Gbps network delays (this would usually be ~0.1ms or less)
- Latency for the entire 2MB request would be (~6ms per iSCSI operation * 8 operations) = ~48ms, or roughly 50ms
(note: it is also possible that the Windows iSCSI request would be for 8MB, but iSCSI would still have to request between 64k and 256k per "burst"... In terms of how latency is measured, the net result is the same and therefore not that interesting)
- If Veeam measures the latency on "disk operations" as opposed to "iSCSI operations", then Veeam would report about 50ms latency, while the array would report an average of 6ms.
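The arithmetic in the list above can be sketched in a few lines of Python. This is only an illustration: the function name `disk_op_latency_ms` and its parameters are hypothetical, and it assumes the iSCSI operations that make up one disk request complete one after another, with the network delay added once per operation.

```python
import math

KIB = 1024
MIB = 1024 * KIB

def disk_op_latency_ms(request_bytes, burst_bytes, per_iscsi_op_ms, network_ms=0.0):
    """Estimate the latency of one large disk request that is split into
    smaller iSCSI operations.

    Assumption (not from the array or the initiator): the operations are
    issued serially, so their latencies simply add up.
    """
    # Number of iSCSI operations needed to cover the whole request.
    ops = math.ceil(request_bytes / burst_bytes)
    # Each operation pays the array latency plus the network delay.
    return ops, ops * (per_iscsi_op_ms + network_ms)

# 2MB request, 256K bursts, ~6ms per operation in the array, ~0.1ms network delay.
ops_256k, total_256k = disk_op_latency_ms(2 * MIB, 256 * KIB, 6.0, 0.1)
print(f"{ops_256k} operations, ~{total_256k:.1f}ms as seen at the disk level")

# The same 2MB request with 64K bursts needs four times as many operations.
ops_64k, _ = disk_op_latency_ms(2 * MIB, 64 * KIB, 6.0, 0.1)
print(f"{ops_64k} operations with 64K bursts")
```

With these inputs, the per-operation latency the array reports stays at about 6ms, while the latency measured on the whole disk request is the sum across all operations.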
The same concept is confirmed to hold for VMware latencies. Borrowing a diagram from a Nimble KB ("VMWare reports I/O latency increased", KB-000180):
If VMkernel received an 8MB SCSI request and used 64K iSCSI requests (not uncommon), the VMkernel latency on the single 8MB request might be as high as 180ms, even if each of the 128 individual 64K iSCSI requests (8MB / 64K) took only 1.25ms.