5 Replies Latest reply: Nov 6, 2014 9:20 AM by julez

    Virtualized Exchange DAG Failover During VM Snapshot

    julez Adventurer

      This question is for anyone running a virtualized 2010 or 2013 Exchange DAG and leveraging snapshots of the actual datastores the VM is located on.

       

      In our environment we run a couple of Exchange 2013 DAG members and take an array snapshot of the datastores those members are on.  When the array snapshot is taken, VMware takes a VM snapshot as well, as with all snapshot/backup systems that leverage VMware snapshot technology.  When that VMware snapshot occurs, it causes a DAG failover.  I've read several blog entries about changing cluster delays, etc., to eliminate the issue.
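For readers weighing the "cluster delay" changes mentioned above, here is a rough sketch of the arithmetic behind them. It assumes a simplified model in which a node is treated as down once it misses `threshold` consecutive heartbeats sent every `delay_ms` milliseconds, as with the failover cluster properties SameSubnetDelay and SameSubnetThreshold. The default values (1000 ms, 5 misses) are the commonly documented ones for Windows Server 2012-era clusters and should be checked against your own cluster; the relaxed values and the 6-second stun are purely illustrative, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HeartbeatSettings:
    """Simplified view of the cluster heartbeat settings (hypothetical helper)."""
    delay_ms: int   # interval between heartbeats, e.g. SameSubnetDelay
    threshold: int  # missed heartbeats tolerated, e.g. SameSubnetThreshold

    @property
    def tolerance_ms(self) -> int:
        # Longest silence the cluster accepts before declaring the node down.
        return self.delay_ms * self.threshold


def stun_risks_failover(settings: HeartbeatSettings, stun_ms: int) -> bool:
    """True if a VM stun of stun_ms would exhaust the heartbeat tolerance."""
    return stun_ms >= settings.tolerance_ms


# Assumed defaults; verify on your own cluster before relying on them.
DEFAULTS = HeartbeatSettings(delay_ms=1000, threshold=5)
# Illustrative relaxed values, not a recommendation.
RELAXED = HeartbeatSettings(delay_ms=2000, threshold=10)

if __name__ == "__main__":
    stun_ms = 6000  # hypothetical stun duration during snapshot handling
    for name, settings in (("default", DEFAULTS), ("relaxed", RELAXED)):
        print(
            f"{name}: tolerance {settings.tolerance_ms} ms, "
            f"failover risk for {stun_ms} ms stun: "
            f"{stun_risks_failover(settings, stun_ms)}"
        )
```

Under this model, raising either value widens the window a stunned node has to resume sending heartbeats, at the cost of slower detection of a node that has genuinely failed.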

       

      None, however, seems to stand out any more than this Veeam article:

      KB1744: Tips for DAG Exchange Backup and Replication in vSphere

       

      Granted, we aren't using Veeam, but as I said above, Nimble and more or less everyone else who relies on VMware for backups and snapshots uses it in about the same way.

       

      So...
      Question 1, has anyone else seen similar behavior and how did you fix it?
      Question 2, the 5th line in that Veeam KB above talks about reducing snapshot.maxConsolidateTime to 1 second for the VM stun.  The upside, of course, is that it decreases the time the VM has to be stunned.  The downside is that it doesn't give the array nearly as long as the 6 seconds VMware uses by default for that value, and it could cause the array to fail the snapshot if there isn't enough time to complete it.  So do the Nimble engineers see any problem with this?

        • Re: Virtualized Exchange DAG Failover During VM Snapshot
          valdereth Adventurer

          How about protecting the passive nodes with the vCenter synch template and the active nodes without the vCenter synchronization?

           

          I've seen agent-based backup solutions run into the same problem, where the VSS snapshot that gets triggered causes a failover.  So it's not entirely accurate to say that VMware snapshots are the cause.

            • Re: Virtualized Exchange DAG Failover During VM Snapshot
              julez Adventurer

              There isn't really a "passive node" in this case; they all have active databases on them.  Database location shouldn't really matter, since the volumes holding those databases are not snapped during the VM snapshot: they are Windows iSCSI guest volumes, not VMDKs.  Ultimately, I'm sure the problem is the failover timeout, since DAG failover is touchy to say the least.  The members don't fail over every time, but they have definitely done so more than a few times.

               

              But I was curious as to what others have seen.

            • Re: Virtualized Exchange DAG Failover During VM Snapshot

              Q1 - Yep, we had this feature when we upgraded to Exchange 2010 and took a backup. We resolved it by making the cluster changes described in the KB. A Google search shows it's quite a common fix for anything that runs MS clustering (it also fixed our SCOM Ops Mgr failover).

              Q2 - I haven't set this and have no issues.