21 Replies Latest reply: Jul 23, 2013 10:31 AM by Justin Rich

    Number of VMs per Datastore

    Justin Rich Adventurer

      Just set up protection (volume collection snapshots) on my VMware datastore and they seem to be working well, but I keep getting an error message saying that it can't delete the snapshots.

       

      "Failed to delete vCenter snapshot associated with volume collection test schedule sched1 since the vCenter virtual machine snapshot tasks have not yet completed."

       

      I sent this in to support and they are saying I shouldn't have any more than 10 VMs on a datastore to avoid this. I only have 28 right now and none have massive amounts of data.

       

      He said he can SSH in and change the timeout for that to help prevent this issue.

       

      Is it just me or does 10 vms per datastore seem really low?

        • Re: Number of VMs per Datastore
          Ted Joffs Newbie

          The key here is the way that VMware handles storage sharing.  Basically, when a VM wants to write to a datastore, it tells ESX.  ESX then puts a little token on the datastore, locking it for write for that VM.  Once the VM writes the data, the locking token is removed.  This is VERY simplified, but gets my point across.  Now, what happens with 25 VMs on 10 different ESXi hosts during a mass snapshot operation?  Well, now they are all battling for the token, and it slows a few things down.  This is basically why you should size your datastores to hold around 8-12 VMs each (depending on the load).  There are variables that can change this, but this is a pretty well-known standard.

            • Re: Number of VMs per Datastore
              Justin Rich Adventurer

              yeah that makes sense. but basically those IO's should be fairly small, again as you stated depending on the IOps of that vm. this is basically the quiescing process, which the tech stated i can actually get some details on (how long each VM takes and if there is a problem VM) which i havent had a chance to dig in to yet. but with SSD (we've got the 4TB cache system) it should be fairly quick.

               

              i completely understand the problem, i just think that unless im doing massive IO (for the most part these VMs should be idle) this shouldnt be too much of a problem.

               

              I guess once i can dig in to this a bit more to see what the VMs are doing i might be able to identify whats causing the issue.

               

              thanks for the response

                • Re: Number of VMs per Datastore
                  Ted Joffs Newbie

                  Generally speaking, true.  But... snapshots result in changed metadata on the LUN.  This then forces a LUN/volume-level lock from the ESXi hosts that triggered the operation.  While these locks are very tiny, when triggered all at once they can create a very noticeable storage I/O latency issue due to random-interval retries that stack up.  If you are using ATS (VAAI), the locks are moved away from the ESXi hosts and onto the array, where rather than locking out the volume as a whole, it only locks the specific data being accessed.  This can help with the issues and allow for a theoretically (not in reality) unlimited number of VMs per datastore.

                   

                  Now, you must take Queue Depth into account.  Basically each LUN in a VMware environment has a pre-defined Queue Depth of, I believe, 32.  That is 32 active I/O threads per host per datastore.  Let's look at two options in math terms:

                   

                  Data:

                  2 Hosts

                  20 VMs (10 VMs Per Host)

                  Number of LUNs Variable

                   

                  Option One:

                  5 Datastores * 2 Hosts * 32 QD Streams / 20 = 16 Available I/O Streams Per VM.

                   

                  Option Two:

                  2 Datastores * 2 Hosts * 32 QD Streams / 20 = 6.4 Available I/O Streams Per VM.

                   

                  So, as you can see, there are multiple factors at play, and I have not even touched on the other I/O factors like how the CPU, memory, network, etc. can come into play.  I suggest sticking with the rule of 8-12 VMs per datastore -- this is a pretty tried and true configuration.  Of course each environment is different, so test away.  I have seen some places get upward of 20-30 per datastore, but they were not doing snapshots, and had pretty low IOPS requirements.
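The queue-depth arithmetic in the two options above can be reproduced with a short Python sketch. It only restates the formula from the post (datastores * hosts * queue depth / VMs) and assumes the default queue depth of 32 that the poster mentions from memory; the function name `io_streams_per_vm` is made up for illustration and is not part of any VMware or Nimble tool.

```python
def io_streams_per_vm(datastores: int, hosts: int, vms: int, queue_depth: int = 32) -> float:
    """Rough per-VM share of outstanding I/O slots, using the formula from the post.

    queue_depth=32 is the per-LUN, per-host default the poster believes applies;
    check the real value for your HBA/vendor before relying on it.
    """
    if vms <= 0:
        raise ValueError("vms must be positive")
    return datastores * hosts * queue_depth / vms


# Option One: 5 datastores, 2 hosts, 20 VMs
option_one = io_streams_per_vm(datastores=5, hosts=2, vms=20)
# Option Two: 2 datastores, 2 hosts, 20 VMs
option_two = io_streams_per_vm(datastores=2, hosts=2, vms=20)

print(option_one)  # 16.0
print(option_two)  # 6.4
```

This is an upper bound on slots per VM under an even spread, not a performance prediction; idle VMs, as in the original question, would rarely come close to using their share.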

                    • Re: Number of VMs per Datastore
                      Ted Joffs Newbie

                      By the way, just in case you want to dig in, you can change the Queue Depth on your ESXi volumes.  While this may help some, there are also the storage vendor and/or HBA (hardware iSCSI card) settings to contend with.  Rather than rewriting their KB, I am giving a link to the VMware KB.  Use it with extreme care, though: if you modify these settings without knowing the matching vendor specifics (what is Nimble's optimal Queue Depth?  Anyone?), you can make things much worse really fast.

                       

                      VMware KB: Controlling LUN queue depth throttling in VMware ESX/ESXi

                        • Re: Number of VMs per Datastore
                          Justin Rich Adventurer

                          so the lun im trying to snapshot is a development env. Im building up this base env, and then when someone needs a new copy of the env (agile teams) i can use that snapshot as the zero write clone and mount and sysprep.

                          because this is the base system, it isnt used at all. this is mostly why i think the 20+ vms shouldnt be an issue.

                            • Re: Number of VMs per Datastore
                              Matthew Andersen Adventurer

                              Justin,

                              You are correct.  It comes down to what is an "acceptable" I/O pause for your environment.  If you aren't experiencing problems this should be fine.  We like 8-10 as a general rule, but there are always exceptions to the rules.  You should be fine, but if you start seeing issues as described below you will know why.

                               

                              Thanks,
                              Matthew

                                • Re: Number of VMs per Datastore
                                  Ted Joffs Newbie

                                  Exactly, and thank you for the response.  It has been a busy day here and I was not yet able to reply!

                                  • Re: Number of VMs per Datastore
                                    Justin Rich Adventurer

                                    well see thats the problem, no one is using these vms, i've just installed software on them to prep them so its not like there is high IO, or really any IO actually... when i look at the nimble perf info its usually single digit MBs and under 100 iops (not uncommon to see it register under 10 iops)

                                     

                                    when i look at infosight the CPU is registered at a MAX of 4% with an average cache usage of 5% or less (these are based on upgrade %, so i dont think its actually utilization %)

                                     

                                    basically what im saying is... its the 460G-X2 with a total of like 30 idle VMs

                                     

                                    also im only using 7.6% of the space, which gives me a near perfect cache hit ratio..

                                      • Re: Number of VMs per Datastore
                                        Ted Joffs Newbie

                                        I have seen this type of issue in other, non-nimble, environments.  I wonder, what are the specs on your vCenter server?  I suggest 4 CPU and 16GB memory minimum.  Without that, I have seen where the snapshots from other solutions can not get executed by vCenter in a timely manner.  May not be your issue, but 5.1 has a much higher resource requirement than 4.x versions.

                            • Re: Number of VMs per Datastore
                              Sammy Bogaert Wayfarer

                              While this was indeed true in the past, locking is now handled by the Nimble at block level through VAAI ATS.  10 VMs per datastore is more like an ESX 3.5 guideline.  We go way above that on vSphere 5 (on different kinds of storage but all with VAAI ATS enabled) and don't see those issues...

                               

                              I find it strange that Nimble recommends such a low number per datastore.  I mean, if you use VDI on Nimble with linked clones, most will go with more than 10 VMs per datastore.

                               

                               

                              Interesting to see how this thread will develop :-)

                                • Re: Number of VMs per Datastore
                                  Ted Joffs Newbie

                                  Sammy,

                                   

                                  In VDI, you are looking at a whole different beast.  The issue is more apparent with NON-VDI workloads (especially without VAAI) due to Queue Depth, the number of hosts, and per-host, per-datastore VM saturation levels.  In every scenario, you need to do the math: check the Queue Depth ratio, the IOPS requirements, and the read/write profile (look at 50/50 for Linked Clone Datastore, 80/10 for persistent Disk, and 100% Read for Replica Datastore -- if you mix these it gets more interesting).  Ultimately with storage, IOPS are great, but if you can't request them -- well, then they are just a feature set that does not matter.  When dealing with iSCSI storage especially, you must look at the whole picture (IOPS, queue, network (pipe, jumbo frames, QoS, etc.), host capabilities, etc.).

                                   

                                  I think the key take away from this whole discussion is perhaps two points:

                                   

                                  1) Everyone's scenario is different, and vendors will normally suggest industry standard practices as a rule.

                                  2) When designing storage, look at the whole picture, do the math and put in what works best for your needs.

                                    • Re: Number of VMs per Datastore
                                      Sammy Bogaert Wayfarer

                                      Completely agree Ted,

                                       

                                      i was pointing more to the locking issue they were referring to.  With VAAI ATS, i don't see that as an issue with 10+ VMs.  And when doing snapshots etc. you are doing quite some locks.

                                       

                                      It indeed all depends a bit on the situation and environment!

                                        • Re: Number of VMs per Datastore
                                          Justin Rich Adventurer

                                          So based on all of these discussions, I dont see any reason i'd have an issue here.

                                           

                                          Also i checked the vcenter host this morning it was 4cpu with 8GB, i've since upped that to 16.

                                           

                                          there are two datastores on the env with 8 hosts.

                                           

                                          4 hosts are in a cluster with one data store (due to MS lic) this is considered core services. vcenter, DC's, other such utility type boxes. there is a total of 19 VMs on this cluster (no snapshots)

                                           

                                          The other side is again 4 hosts, with a total of 39 VMs; of those, 28 are on the datastore I'm trying to snapshot and the rest are on the datastore from above.

                                           

                                          the 28 VMs on the datastore I'm trying to snapshot are to be used like VDI. We are building up the dev env based on production-level code with a scaled-back number of nodes. This is being used by no one currently, and the snapshots are scheduled to get VSS versions to be used for the clones for the developers. This base env (datastore) will never have users on it.

                                           

                                          so in all of these questions there is nothing that says to me this should fail.

                                          plenty of power with the hosts (HP chassis system, C7000 with beefy hosts)

                                          almost all VMs are idle since im building all of it from scratch and there are no users of any kind.

                                           

                                           

                                          Looking at the perf data i am seeing a spike in utilization of that volume between 12am-2, but the spike is 3.6K iops and 60MBs.

                                           

                                          I'll dig in to see what this spike is from but even these numbers dont seem nuts compared to what this box should be able to do.

                                  • Re: Number of VMs per Datastore
                                    Justin Rich Adventurer

                                    hmm that very well could be it, i'll give that a try tomorrow, thanks

                                      • Re: Number of VMs per Datastore
                                        Wayfarer

                                        Hey Justin,

                                         

                                        Do you need to quiesce/snapshot the VMs? Seems like crash consistent, array based, snapshots would be good enough in this situation. Then, it wouldn't matter how many VMs you had in the Datastore.  Especially if you have broken out the DB volumes from the OS volumes.

                                          • Re: Number of VMs per Datastore
                                            Justin Rich Adventurer

                                            for the time being the DBs are on the same datastore, which is why i was trying to use the nimble protection policy to generate the snapshots so that they would be crash consistent.

                                             

                                            from another post what i found was that the only way to trigger VSS was to do a protection policy

                                              • Re: Number of VMs per Datastore
                                                Justin Rich Adventurer

                                                So the memory addition made no difference.

                                                 

                                                I'm trying to track down events within VMware to indicate what caused the problem, but so far no luck. Anyone know where I might find such events? With PowerCLI I'm poking around with Get-VIEvent, but the events don't seem to have much detail.

                                                  • Re: Number of VMs per Datastore
                                                    Sammy Bogaert Wayfarer

                                                    Take a look on the 'Tasks & Events' tab of the VM to see if the 'Create snapshot' task succeeds.  All that Nimble does is send a Create Snapshot command to vCenter.

                                                     

                                                    If it fails, you can look in the Event Log of the VM to see the actual VSS error.

                                                      • Re: Number of VMs per Datastore
                                                        Justin Rich Adventurer

                                                        I don't see any failures, but some of them don't have snapshots on them. I tried taking snapshots of them and they all complete successfully. I suspect it only does one at a time and it's not finishing in a timely manner. The tech said he can extend the timeout period (default 10 min) to something longer.

                                                         

                                                        I've felt I've had other performance issues, so I might work with a tech to try and track that down.

                                                         

                                                        I still feel that this unit should be able to execute the snapshots in under 10 min.

                                                          • Re: Number of VMs per Datastore
                                                            Sammy Bogaert Wayfarer

                                                            Taking a snapshot doesn't take longer than a minute normally in VMware.  Unless VSS is having trouble inside the VM.

                                                             

                                                            I have checked one of our collections, and there is a one second gap between each 'Create Virtual Machine Snapshot' command of the VMs on a single datastore.  So for 10 VMs, it would trigger all of the snapshots in like 10 seconds.

                                                             

                                                            fyi: there is about 90 seconds between my 'Create Snapshot' and 'Remove Snapshot' task on the VMs hosted on Nimble.  So well below the 10 minute marker.
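As a rough sanity check on the timings quoted in this thread, the sketch below compares two simplified models against the 10 minute default timeout mentioned earlier. It is an illustration only: the function names are hypothetical, the 1 second stagger and 90 second create-to-remove gap are the figures observed in the post above, and the 60 second per-VM figure for the serialized case is an assumed worst case based on the "no longer than a minute" remark, not a measured value. Neither model describes how Nimble or vCenter actually schedules the tasks.

```python
TIMEOUT_S = 600  # default timeout of 10 minutes, as quoted by the support tech


def staggered_window(vms: int, stagger_s: float = 1.0, hold_s: float = 90.0) -> float:
    """Seconds from the first create to the last remove when creates are
    issued stagger_s apart and the per-VM snapshots overlap in time."""
    return (vms - 1) * stagger_s + hold_s


def serialized_window(vms: int, per_vm_s: float) -> float:
    """Seconds needed if VM snapshots run strictly one at a time."""
    return vms * per_vm_s


vms = 28  # VMs on the datastore in question

print(staggered_window(vms))         # 117.0
print(serialized_window(vms, 60.0))  # 1680.0
print(staggered_window(vms) < TIMEOUT_S)         # True
print(serialized_window(vms, 60.0) < TIMEOUT_S)  # False
```

Under the overlapping model, 28 VMs fit well inside the timeout; under the one-at-a-time model suspected earlier in the thread, each snapshot would need to finish in roughly 21 seconds (600 / 28) to stay under it.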