2 Replies Latest reply: Jan 14, 2016 8:57 AM by Jonathan Zygmunt

    VMware on Nimble - Identifying "real" vs perceived usage?

    Mark Edwards Newbie

      Hi all,

       

      I have a query on capacity usage when using VMware on Nimble (i.e. vSphere 5.1 on Nimble CS260Gs).

       

      We are running active volumes on both arrays, and replicate each volume to the partner array (for DR).

       

      If I log into my Nimble arrays and check volume usage (the blue section) via the home tab, it tells me I'm using 43TB total across both arrays.

      If I log into VMware and look at total usage across all my datastores, it says I am using 30TB total, i.e. a 13TB discrepancy.

       

      Q1. Our first assumption was that this discrepancy was due to the replication data. However, we aren't sure whether the replication data counts as part of the volume usage (blue) or the snapshot usage (green) on the pie chart, since the replicas are both volume data and snapshot data (snapshots were used to create them). Can someone confirm?

       

      Q2. Our second thought was that this discrepancy could also be due to SCSI UNMAP behaviour. I.e. when you delete or move VMs between datastores within VMware, VMFS knows the data is deleted (and removes it from "used space"), but the Nimble doesn't see this space as having been released. Is that correct? Our testing certainly seems to support this: if we empty a datastore in VMware and then check the volume usage at the Nimble end, it still shows as half full, so ultimately the array appears to think it is more full than it actually is. Our understanding is that to see the correct free space on the Nimble, we have to manually run SCSI UNMAP via vmkfstools? Otherwise the blocks are not actually released until they are needed for performance reasons (as per the VAAI Space Reclamation (SCSI_UNMAP) myth buster). So is regularly running a manual clear-out via vmkfstools the best-practice way of dealing with this, or is there some other way?
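      For context, the manual clear-out we've been testing looks roughly like this (the datastore name is just an example; vmkfstools -y is the 5.0/5.1-era method, and 5.5+ replaced it with esxcli storage vmfs unmap):

      ```shell
      # vSphere 5.1: run from inside the datastore root. This creates a
      # temporary balloon file covering N% of the free space, then issues
      # UNMAP for those blocks and deletes the file.
      cd /vmfs/volumes/Datastore01   # example datastore name
      vmkfstools -y 60               # reclaim using 60% of the free space

      # vSphere 5.5+ equivalent, for reference:
      # esxcli storage vmfs unmap -l Datastore01
      ```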

       

      Unless I'm missing something, it seems that without running vmkfstools on these volumes, we don't really know what our "real" free space is on the Nimbles. It also seems like this can lead to a scenario where the arrays think they are full (which could then impact performance) when in fact they are not. Maybe it's just something we need to be aware of that we hadn't appreciated previously, but I want to be sure we aren't missing something.

       

      I hope that makes sense. If someone could clarify our understanding on this, that would be great.

       

      TIA

        • Re: VMware on Nimble - Identifying "real" vs perceived usage?
          Jonathan Zygmunt Adventurer

          You are correct that you do have to run unmap manually from the ESXi side to release the storage.  The only other way I know to effectively release storage on the Nimble side with regard to ESXi is to Storage vMotion everything off the volume in question and then delete the entire volume...but who would want to do that?  Another important thing to realize is that if you're using thin-provisioned volumes, then depending on the guest operating system, your thin VMDKs may actually be larger than they need to be, and you may want to reclaim space from them and shrink the VMDK files prior to running your unmap.
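          As a rough sketch of that shrink step (the guest drive letter and VMDK path are just placeholders), you zero the guest's free space so deleted blocks become reclaimable, then hole-punch the thin VMDK from the ESXi host:

          ```shell
          # 1) Inside a Windows guest: overwrite free space with zeros
          #    (sdelete is a Sysinternals tool; run in the guest, not on the host):
          #    C:\> sdelete -z C:

          # 2) Then, with the VM powered off, punch out the zeroed blocks
          #    from the thin VMDK on the ESXi host (path is an example):
          vmkfstools -K /vmfs/volumes/Datastore01/myvm/myvm.vmdk
          ```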

           

          As far as running unmap as a scheduled task, be careful.  While the nimble seems to do a decent job at scheduling the actual garbage collection to happen at a time when the array can handle the extra work load, the unmap operation form the ESXi side can be rather impactful on performance.  Also if you're on an older version of the nimble firmware, be aware that there were bugs that would cause the storage controller to crash if you run more than one unmap at a time.  Additionally, I haven't checked this recently, but with some versions of the firmware, unmap can make a mess of your storage utilization graphs in the Nimble UI (I've had cases where the storage graph slowly decrements the storage used all the way to zero and I'm getting concerned that I've lost all my data, but after a number of hours it corrects itself), so don't freak out if you see something like that.