Nick Dyer

NimbleOS 3 - VMware Copy Offload / XCOPY VAAI Primitive

Blog Post created by Nick Dyer on Aug 31, 2016

YES - you read that right - NimbleOS 3 brings support for the final (and somewhat elusive) VMware VAAI feature - Copy Offload - also known as XCOPY!


First off, a quick refresher on the Copy Offload feature.

Copy Offload is a primitive that forms part of the VAAI integration introduced by VMware a few years ago (alongside other primitives such as Atomic Test & Set, and SCSI UNMAP). It allows virtual disk copy and migration operations to be offloaded from the network and servers and kept within the storage array. Operations such as "Storage vMotion" and "Clone or Deploy from Template" are therefore handled entirely within the same storage system.

This is because moving something from location A to location B natively within the storage platform is much faster and less resource intensive than having VMware issue thousands of read and write I/Os over the network and perform host-side copy processes. Offloading the copy reduces CPU overhead on the VMware host and cuts network traffic, which in turn leads to faster vMotions, clones and VM deployments.


How do you know what VAAI primitives are in use from your SAN vendor? Run the following command in a shell on one of your VMware hosts:

esxcli storage core device vaai status get
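On a host with many datastores, the output can be narrowed to a single device with the -d flag (the naa identifier below is a placeholder - yours will differ):

```
esxcli storage core device vaai status get -d naa.xxxxxxxxxxxxxxxx
```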

Here's the output from a volume presented from an array running NimbleOS 2.3. Notice that Clone Status == "unsupported".

[Screenshot: esxcli VAAI status output on NimbleOS 2.3, showing Clone Status: unsupported]

Here's the output from a volume presented from an array running NimbleOS 3. Notice that all are now "supported".

[Screenshot: esxcli VAAI status output on NimbleOS 3, showing all primitives supported]
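For reference, the per-device output looks something like this (the device identifier is a placeholder, and the exact layout may vary slightly by ESXi version):

```
naa.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
   VAAI Plugin Name:
   ATS Status: supported
   Clone Status: supported
   Zero Status: supported
   Delete Status: supported
```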

By default, when Copy Offload is enabled, all transfers run at 4MB in size. It's possible to enhance this further by changing the "MaxHWTransferSize" setting to 16MB.

Adjusting the transfer size on the hosts to 16MB can improve performance significantly in some cases, especially where fewer than 6 concurrent Copy Offload operations are being performed. A larger transfer size results in fewer concurrent Copy Offload I/Os consuming the host queue depth... and as a result, more queue depth is available for non-Copy Offload host I/O. Your mileage may vary, of course, but it's worth knowing.

To change the transfer size, the following command needs to be issued on all attached VMware hosts (the value is specified in KB, so 16MB = 16384):

esxcfg-advcfg -s 16384 /DataMover/MaxHWTransferSize
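Since the setting is per-host, it can be handy to script the change across a cluster. Here's a minimal sketch, assuming SSH is enabled on each ESXi host and hostnames are passed as arguments (the helper name is made up for illustration):

```shell
#!/bin/sh
# Push the DataMover transfer size to a list of ESXi hosts over SSH.
# Assumes SSH access is enabled on each host; hostnames are supplied by you.

set_transfer_size() {
  # $1 = ESXi hostname; 16384 KB = 16MB
  ssh root@"$1" 'esxcfg-advcfg -s 16384 /DataMover/MaxHWTransferSize'
}

for host in "$@"; do
  echo "Configuring $host..."
  set_transfer_size "$host"
done
```

Save it, then run it with your host list as arguments and verify each host afterwards with the -g command below.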

And to verify that the change was successful (or to check what the current setting is):

esxcfg-advcfg -g /DataMover/MaxHWTransferSize

[Screenshot: esxcfg-advcfg output confirming MaxHWTransferSize is set to 16384]
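If the change took effect, the get command reports the new value in KB, along these lines (exact wording may vary slightly by ESXi version):

```
Value of MaxHWTransferSize is 16384
```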

That's it! Nice and easy. Enjoy!