In a virtualized environment, you have probably heard some, if not all, of the following terms:
- NTFS allocation unit (cluster size)
- VMFS block size
- Storage Array volume block size
Ever wonder how they relate to each other across the layers of the I/O path, from the VM down through VMFS to the array volume? And what should you choose when formatting your NTFS filesystem, VMFS volume, and/or storage array volume? Here’s a quick explanation of what each one means and how I/O gets passed down through the virtual SCSI layer to the array.

NTFS allocation unit size (a.k.a. cluster size)
- The allocation unit size is the smallest amount of disk space that can be used to hold a file; if the data written by the application is larger than the allocation unit size, multiple allocation units are used to hold it. The default size is 4KB for filesystems smaller than 16TB.

Myth #1: application I/O request size is restricted to the allocation unit size.
This is NOT true! Setting the allocation unit size to 4KB does not mean the application will start issuing I/O at 4KB. Best practice? Follow Microsoft's recommendation: for Exchange/SQL, that is 64KB.
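To make the allocation unit arithmetic concrete, here is a minimal sketch that computes how many allocation units a file of a given size occupies and how much slack is left in the last one. The function name and the sample sizes are made up for illustration, and the sketch models only the arithmetic, not any NTFS internals.

```python
KB = 1024


def clusters_used(file_size_bytes: int, cluster_size_bytes: int) -> tuple[int, int]:
    """Return (allocation units occupied, slack bytes in the last unit).

    Hypothetical helper for illustration only: it rounds the file size up
    to a whole number of allocation units and reports the unused remainder.
    """
    if file_size_bytes < 0 or cluster_size_bytes <= 0:
        raise ValueError("sizes must be non-negative, cluster size must be positive")
    # Integer ceiling division: round up to whole allocation units.
    clusters = (file_size_bytes + cluster_size_bytes - 1) // cluster_size_bytes
    slack = clusters * cluster_size_bytes - file_size_bytes
    return clusters, slack


if __name__ == "__main__":
    # Sample sizes are arbitrary; 4KB and 64KB are the two values discussed above.
    for cluster_size in (4 * KB, 64 * KB):
        units, slack = clusters_used(10 * KB, cluster_size)
        print(f"10KB file, {cluster_size // KB}KB units: {units} unit(s), {slack} bytes slack")
```

Note that nothing here constrains the size of an individual I/O request: the allocation unit only determines how space is handed out on disk.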
Now moving down the stack to the VMFS filesystem: with VMFS5, the block size is fixed at 1MB, with no other choices. Because VMFS5 uses the GUID partition table (GPT) layout, it can address volumes of up to 64TB without requiring a larger block size, as previous versions did. *NOTE* if you upgrade from VMFS3 to VMFS5, the volume retains its original MBR partition table until it is grown beyond 2TB.

Two more myths on VMFS block size:
Myth #2: VMkernel breaks I/O coming from the VM into 4KB chunks
Myth #3: all writes seen by the array are now in 1MB blocks, as that’s the block size for VMFS
Both of these are NOT true. First of all, VMkernel does not break I/O into chunks: whatever I/O size the guest OS issues is the size passed down through the VMFS layer. Secondly, the 1MB VMFS block is a unit of space allocation, not of I/O: when the zeroed-out 1MB blocks are used up by guest OS I/O, new 1MB blocks are allocated to accommodate new writes. (If you use an eagerzeroedthick VMDK, there is no wait for block zeroing to take place; and now that most storage vendors support the VAAI WRITE_SAME primitive, the overhead of this operation is very small.)

From the array side, it would be ideal to have an architecture that supports variable block sizes, because a mixture of applications and workloads is expected to run on shared storage infrastructure.
Different types of applications issue I/O in different block sizes, so it is best to have the array allocate blocks that match the application's I/O size. That way a write operation does not occupy more blocks than it should on the storage side, and a read operation does not fetch more blocks than necessary.

This portion applies to you if you are a Nimble customer: in a virtualized environment with VMware ESX, the generally recommended performance policy is VMware ESX/ESX5, with a 4KB block size and cache enabled. You may wonder, why is that?
- Given that VMFS is a cluster filesystem shared among groups of ESX servers, it is expected to serve VMs with different workloads. From the array's perspective, it is good practice to go with the smallest block size when a mixture of I/O sizes is expected from the virtual machines (you don’t want the array's block allocation size to be larger than the application I/O size).
- When VMDKs are used, a VMware snapshot is expected to be taken if applications such as Exchange/SQL require quiescing. In that case, a smaller block size on the array volume yields better efficiency.
- You have less to worry about when using Storage vMotion/Storage DRS to optimize space usage (or for whatever other reason you use Storage vMotion): if you have a mixture of volumes with various block sizes, you are more constrained when migrating a VM’s VMDKs across your datastore cluster.
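The point about not letting the array block size exceed the application I/O size can be sketched with a toy model. The code below assumes, purely for illustration, that an array with a fixed block size must read or write whole blocks; real arrays (Nimble included) may handle this differently. The function names and the sample I/O sizes are hypothetical.

```python
KB = 1024


def blocks_touched(offset: int, length: int, block_size: int) -> int:
    """Number of fixed-size array blocks covered by an I/O at [offset, offset + length)."""
    if offset < 0 or length <= 0 or block_size <= 0:
        raise ValueError("offset must be >= 0; length and block_size must be > 0")
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return last - first + 1


def amplification(offset: int, length: int, block_size: int) -> float:
    """Bytes of whole blocks touched divided by bytes actually requested.

    Toy-model assumption: the array always handles complete blocks.
    A value of 1.0 means no extra data is touched.
    """
    return blocks_touched(offset, length, block_size) * block_size / length


if __name__ == "__main__":
    # Arbitrary sample workload sizes, aligned at offset 0.
    for io_size in (4 * KB, 8 * KB, 64 * KB):
        for block_size in (4 * KB, 32 * KB):
            ratio = amplification(0, io_size, block_size)
            print(f"{io_size // KB}KB I/O on {block_size // KB}KB blocks: x{ratio:.1f}")
```

Under this model, a 4KB block size never touches more data than an aligned I/O requests, whatever mix of I/O sizes the VMs issue, while a larger block size penalizes every I/O smaller than the block.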
Managing smaller block size volumes does mean more metadata overhead on the array side; however, the simplicity and ease of management are well worth the few percent of efficiency given up. If you want to dedicate a volume to only one VM, consider mounting the volume directly inside the guest with the iSCSI software initiator, and set the performance policy according to your application type. Do keep in mind that this option requires extra work with VMware Site Recovery Manager (SRM), as the in-guest iSCSI storage needs to be mounted manually or through custom scripts invoked via the SRM Recovery Plan.

In conclusion, keep it simple and stick with the default recommendation. If you are the exploratory type, remember to look beyond the vSphere layer.