This is a continually annoying problem. The heart of the issue is that the guest OS, ESXi and NimbleOS each have to be told that the data is gone. If you've deleted the files in the guest, it only updates the filesystem metadata to note this deletion. The real data isn't changed on disk. Aside from building a new VM, migrating all of the data over the network and deleting the old VM, the only way I know of to reconcile the books of all three layers is to put zeroes over the old data.
Thanks to Nimble's inline zero detection and discarding, that's all it takes to reconcile the guest with Nimble. There's another (long) conversation that's been happening on NimbleConnect which has produced a PowerShell script to clean up a Windows guest. Be aware though! This script will inflate the VMDK, so be sure you have enough free space on your datastore to handle it or else Bad Things Will Happen. If your guest isn't powered by Microsoft, you'll need to find another script or whip something up, perhaps using dd.
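For a non-Windows guest, a minimal dd-based sketch of that cleanup might look like the following. It's scaled down to a 10 MB file in /tmp so it's safe to run anywhere; the path and size are examples, not anything from a real guest:

```shell
# Hypothetical sketch of zero-filling free space on a Linux guest,
# scaled down to 10 MB in /tmp for safety. On a real guest, point of=
# at the target filesystem and drop count= so dd runs until the disk
# is full (it will exit with "No space left on device" -- expected).
FILL=/tmp/fill-file
dd if=/dev/zero of="$FILL" bs=1M count=10 2>/dev/null

# Record the size so we can confirm the balloon file was written.
SIZE=$(wc -c < "$FILL" | tr -d ' ')

# Remove the balloon file as a separate step; chaining with && can
# misfire because dd exits nonzero when the disk actually fills.
rm "$FILL"
```

The written zeroes are what Nimble's inline zero detection discards, which is why this reconciles the guest's view with the array's.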
As for getting VMware on board with the changes, your own follow-up about taking the disk offline and using vmkfstools will do it. If you can't accommodate the downtime, I've heard (though never tried first hand) that you can do a Storage vMotion to another datastore that has a different block size. This apparently invokes an older version of the vMotion routine that actually inspects every block, and thus will recognize and discard all-zero blocks. Again, I've never tried it, so a little Googling is advisable. This will, though, leave the old data in the source datastore, which Nimble won't know to discard, so you'll have to put a VMDK there and zero it out to update Nimble's allocation records (just creating a VM with an eager-zeroed disk will do the trick; no need to actually clone another VM or install an OS).
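To make that last trick concrete, here's a rough sketch of the commands involved. The datastore path, folder, and size are made-up examples, and the commands are echoed rather than executed so this is safe anywhere; drop the echo in a real ESXi shell:

```shell
# Hypothetical example paths and size -- adjust for your environment.
DATASTORE="/vmfs/volumes/mydatastore"
FILLER="$DATASTORE/zerofill/zerofill.vmdk"

# -d eagerzeroedthick writes zeroes to every block at creation time,
# which Nimble's inline zero detection will recognize and discard.
echo vmkfstools -c 500G -d eagerzeroedthick "$FILLER"

# Once created, the filler disk has done its job; remove it.
echo vmkfstools -U "$FILLER"
```

You don't even need to attach the disk to a VM; creating and deleting it is enough to push zeroes over the stale blocks.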
I hope that helps!
IMHO, the source of the annoyance here is the double thin provisioning. I've learned things are so much easier if I don't thin provision the VMDK. Let Nimble handle making the space efficient. Frankly, I have yet to see anything gained by thin provisioning at the VMware layer (except maybe that it's easier to see how much data can be reclaimed by comparing the VMDK size to the OS "used" size... but you could also accomplish that by using one VM per VMFS volume, a messy solution, but that's where VVols come in). If you stay thick provisioned, you don't actually use extra space on the Nimble side, and reclaiming space is easy: just use the method Jonathan linked to above, or use SDelete, and you're done. No taking the VM offline to shrink the volume or Storage vMotioning it through a VMFS-3 filesystem to convert between thick and thin.
Well said, Jonathan. When we moved to Nimble last fall, we kept thin provisioning VMDKs and LUNs, since our old storage array didn't have any data reduction features beyond thin provisioning. In that situation, VMware was the "truth" of how much storage was actually used. However, since Nimble utilizes very effective compression, zero detection and soon dedupe, VMware's accounting just seems wrong. In fact, once NimbleOS 3.x is out for hybrid arrays, we're planning on migrating all our VMs to new datastores (deduped of course) and thick provisioning them all at the same time.
Could you also use this on VMware? (Assuming you removed VMDK files rather than just data within a VM):
SSH to one host
Run: esxcli storage vmfs unmap --volume-label=volume_label
To add on to what everyone's saying here, I've got a process I use when our thin-provisioned VMDKs and datastores get out of sync. The scripts and methods Jonathan linked are excellent. We just happen to be small enough that I can maintain things by hand with this quick process. vMotion to a datastore with a different block size is supposed to work, but it's gotten harder and harder to do since block sizes tend to be converging (if not forced) to one size. Mixed environments may fare better than a single-version one like ours.
As context, Nimble doesn't auto-reclaim space from VMware because ESXi no longer automatically issues the SCSI UNMAP command to free space on a LUN. There were some compatibility issues once upon a time, I've read, so VMware turned it into a manual process to leave you in control and responsible for any problems. I wish they'd add a GUI button for this somewhere, though.
These steps might be redundant or old news to you. I've spent some time in the past looking for straightforward directions to do this, though, so I hope this is helpful to someone someday if not today.
Empty the space in your VMDK:
- For Windows, use SDelete to zero out free space on a target partition from inside the OS. The syntax is "sdelete -z x:\" and SDelete will erase its balloon file when it's done. For Linux, use dd with "dd if=/dev/zero of=/path/fill-file bs=1M". You'll need to delete fill-file yourself when it finishes (I've had trouble chaining my dd and rm commands with && before, so be prepared to intervene before the system gets too angry). Both options temporarily fill all free space with zeroes, so use caution: the drive really will run out of storage while this runs. The zeroes will be recognized as empty space in the next step, whereas deleted file data is only seen as empty by the filesystem.
- Enable SSH on a host with access to the target datastore. You'll probably need to start the service manually unless your environment leaves SSH to ESXi hosts on all the time. It's in the host's security profile configuration section.
- Shut down the VM (downtime alert!).
- Use vmkfstools (as you found) from the SSH session to actually remove the empty data and shrink the VMDK. I use the command "vmkfstools --punchzero /vmfs/volumes/<volume_identifier>/vm-folder/vm-disk.vmdk" and it can take a while to complete (an hour or more for large enough drives). I like spelling out --punchzero (same as -K) because it makes explicit what's going to happen. The volume_identifier can be found in the datastore's properties. There are symlinks that alias the datastore's friendly name to this path, but I prefer the explicit form. I made myself a cheat sheet of our datastores and their corresponding IDs so I don't have to look them up every time.
- Start the VM.
- Disable SSH on the host.
- Wait a bit and then check your VMware datastore usage.
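As a sketch, the vmkfstools step above would look something like this. The command is echoed rather than executed so nothing runs outside a real ESXi shell, and the path pieces are the same placeholders as in the step:

```shell
# Placeholder path -- substitute your datastore's volume identifier
# (found in the datastore's properties) and the VM's folder/disk names.
VMDK="/vmfs/volumes/<volume_identifier>/vm-folder/vm-disk.vmdk"

# --punchzero scans the VMDK and deallocates its all-zero blocks;
# only run this while the VM is powered off.
echo vmkfstools --punchzero "$VMDK"
```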
Reclaim the space on Nimble:
- Ensure you have the vSphere CLI on your admin machine.
- Use esxcli from the vSphere CLI to unmap unused blocks. I use the command "esxcli --server host.domain.local --username root storage vmfs unmap --volume-label=mydatastorename --reclaim-unit=5000". You can do this from an SSH session as well, of course; mine are separate because I usually don't run these processes as part of the same maintenance cycle, so SSH isn't running and I want to connect with the vSphere CLI instead. reclaim-unit is how much to unmap at once and can impact datastore (or array) performance if it's too large. I've found 5000 to be fast to execute against our Nimble without significantly impacting normal traffic.
- Wait for Nimble to run garbage collection (a day or so) and then check your utilization stats.
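For reference, the unmap call above as a sketch, with the hostname and datastore name as placeholders for your environment (echoed here so it's safe to run anywhere; drop the echo to actually issue it from the vSphere CLI):

```shell
# Placeholder host and datastore name -- substitute your own.
ESX_HOST="host.domain.local"
DATASTORE="mydatastorename"

# --reclaim-unit is how many VMFS blocks are unmapped per pass; smaller
# values are gentler on the array while production traffic is flowing.
echo esxcli --server "$ESX_HOST" --username root \
  storage vmfs unmap --volume-label="$DATASTORE" --reclaim-unit=5000
```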
Hope this helps, or adds to those above!
Hello, We have a Nimble CS300 with about 9.24TB Space used. We are running VMware ESXi 5.5U3.
When I create new disks in VMware I always make them thin provisioned. A common issue with thin disks is that they grow when required but never shrink. When you needed the capacity only temporarily, you might want to get it back from the virtual machine.
I ran into a small issue when cleaning up some old file shares I removed about 2.5TB of data and would like to reclaim that space.
Is there any way to do this?