To add to this:
If you are taking verified snapshots of multiple volumes and they take a long time, try to leverage all the servers in your DAG to get around the single iSCSI stream issue with MS VSS. Normally MS VSS has to queue up each volume in a protection policy; it doesn't process them all at the same time, and it only uses a single NIC.
(MPIO doesn't help snapshot verification speeds.)
So let's say you have multiple database and log volumes in your DAG.
Normally, if you put them all into a single protection policy, it will only leverage one server, and you'll be stuck with VSS waiting in line to get to each volume.
+Volume Protection Policy
INSTEAD TRY THIS:
+Volume Protection Policy1
+Volume Protection Policy2
This way both policies can actually run at the same time, each leveraging the VSS of a different server in the DAG, which in turn uses a separate NIC on each of those servers.
The DAG will replicate the changes to each server at the end. In theory this should cut down your snapshot verification time.
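A rough way to see why splitting helps: the toy model below assumes, as described above, that VSS works through the volumes in one protection policy one at a time, and that policies driven from different DAG members run concurrently. The per-volume hours are made up for illustration, and this is not a model of Nimble's actual scheduler.

```python
# Toy model (assumption, not Nimble's real scheduler):
# - volumes inside one protection policy are verified serially
# - policies driven from different DAG members run side by side

def policy_hours(volume_hours):
    """Serial verification inside a single protection policy."""
    return sum(volume_hours)

def window_hours(policies):
    """Overall window is the longest policy when policies run concurrently."""
    return max(policy_hours(v) for v in policies)

# Hypothetical per-volume verification times, in hours.
db_a, log_a, db_b, log_b = 6.0, 1.5, 6.0, 1.5

single = window_hours([[db_a, log_a, db_b, log_b]])  # one policy, one server
split = window_hours([[db_a, log_a], [db_b, log_b]])  # one policy per server

print(f"single policy: {single:.1f} h, split policies: {split:.1f} h")
```

Under these assumptions the split layout finishes in half the time, but only because the work divides evenly across the two servers; uneven volume sizes would shrink the gain.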
Just to confirm your understanding is correct:
There are two methods to back up Exchange using the Nimble VSS requestor/hardware provider. The first option in the protection policy is "unverified": this takes a full VSS snapshot backup but doesn't truncate logs. The second is "verified": this takes the same full VSS snapshot backup, but afterwards verifies it using eseutil against a cloned copy, and when that completes our VSS requestor tells Exchange that the backup with verification is complete and log truncation can occur on that database. Note that it doesn't matter whether you configure this on an active or a replica database; the process is the same. Once log truncation is complete locally, the Exchange replication service will call the other DAG members that hold a copy of the database and instruct them to truncate as well. Typically, aim the verification protection schedule at the replica copy so that the heavy verification process is offloaded from the active database.
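The two modes described above can be sketched as an ordered list of steps. The step wording and the `backup_steps` helper are hypothetical labels for illustration only; they are not Nimble or Exchange APIs.

```python
def backup_steps(verified: bool) -> list[str]:
    """Hypothetical outline of the unverified vs. verified flows."""
    steps = ["take full VSS snapshot backup"]
    if not verified:
        # Unverified: snapshot only, transaction logs are left alone.
        return steps
    steps += [
        "clone the snapshot",
        "run eseutil against the cloned copy",
        "requestor tells Exchange the backup with verification is complete",
        "Exchange truncates logs on the local database copy",
        "replication service tells other DAG copies to truncate",
    ]
    return steps

for step in backup_steps(verified=True):
    print(step)
```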
For others who might not have seen it yet (I know you have, Julez), Nimble OS 2.1 provides an additional option with verification to skip the database verification (only supported if there are at least 2 copies of a database). See the Nimble OS 2.1 blog series for more info.
Yep, I understand how it works. 2.1 is also something we hope to move to soon.
For now, we use full verified snapshots daily to truncate our transaction logs. Per a discussion with Nimble engineering, VSS writers are locked during the eseutil job, which prevents any other job from starting and basically just queues up everything in that protection policy. Even worse, this process cannot leverage MPIO, as it's only a single iSCSI data stream.
With a DAG, since you have an active and a passive copy of each database, IF you have multiple volumes for the databases you can use a VSS writer on each DAG member, as well as an additional iSCSI path, to effectively shorten that time, as explained above.
We came across this solution because we're looking at 20+ hours to do a verified snap, since we have more than 3TB of database storage split between a couple of volumes.
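For a back-of-envelope sense of scale, the figures above imply an effective verification rate of about 42 MB/s. This sketch assumes decimal units and treats the quoted lower bounds (3 TB, 20 h) as exact values; the even two-way split at the end is an idealized assumption, not a measurement.

```python
# Decimal units: 1 TB = 10**12 bytes, 1 MB = 10**6 bytes.
TB = 10**12
MB = 10**6

data_bytes = 3 * TB       # "more than 3TB" taken as exactly 3 TB
seconds = 20 * 3600       # "20+ hrs" taken as exactly 20 hours

rate_mb_s = data_bytes / seconds / MB
print(f"effective verification rate: ~{rate_mb_s:.1f} MB/s")

# Idealized assumption: the work splits evenly across two DAG members
# running their policies at the same time.
split_hours = seconds / 2 / 3600
print(f"ideal two-server window: ~{split_hours:.1f} h")
```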
I'm in a similar situation: Exchange 2013 DAG (2 servers). I just want to make sure I'm understanding the best practice. Any advice or suggestions?
I have 2 Exchange Mailbox Servers: MAIL1 and MAIL2.
I have 5 Nimble Volumes:
MAIL-OS - C: drive as VMDK for each server
MAIL1-Data - D: drive as M$ iSCSI for MAIL1
MAIL1-Logs - E: drive as M$ iSCSI for MAIL1
MAIL2-Data - D: drive as M$ iSCSI for MAIL2
MAIL2-Logs - E: drive as M$ iSCSI for MAIL2
I have 10 mailbox databases, split with 5 active per server. I realise having all the DBs in a single volume means I can't see which databases are having higher IO, but this isn't a big deal for me.
I have 2 Protection Policies:
MAIL1-PP - Contains MAIL1-Data and MAIL1-Logs
MAIL2-PP - Contains MAIL2-Data and MAIL2-Logs
I use CommVault to take a full backup of the databases and truncate the logs every night.
Hey Christoph, assuming those are just standard databases, you only need to snap the database/logs from one of the servers; you don't actually need a protection plan for both. The way the DAG works, once the snap is taken, the log truncation gets replicated to the other DAG member.
The reason I had multiple protection policies above is that we have 2 volumes for databases and 2 volumes for logs on each server (one set of volumes for active databases, the other for our archive databases).
If you want to keep 2 protection policies like that, I would stagger them so they don't overlap; I could see it causing problems if both servers try to take a VSS snap of the databases at the same time.
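If you do stagger the policies, a quick sanity check is whether two daily schedules ever share a minute, including when a run crosses midnight. The start times and durations below are hypothetical (only the policy names MAIL1-PP and MAIL2-PP come from the post above), and the `daily_overlap` helper is an illustrative sketch, not a Nimble feature; swap in your own schedules and observed run times.

```python
DAY = 24 * 60  # minutes in a day

def to_minutes(hhmm: str) -> int:
    h, m = map(int, hhmm.split(":"))
    return h * 60 + m

def daily_overlap(start_a: str, dur_a: int, start_b: str, dur_b: int) -> bool:
    """True if two jobs that repeat every day share any minute.

    Start times are "HH:MM"; durations are in minutes (assumed < 24 h).
    A run may cross midnight, so job b is also compared one day earlier
    and one day later.
    """
    a0 = to_minutes(start_a)
    a1 = a0 + dur_a
    b0 = to_minutes(start_b)
    for shift in (-DAY, 0, DAY):
        s, e = b0 + shift, b0 + shift + dur_b
        if a0 < e and s < a1:  # half-open windows [start, end) intersect
            return True
    return False

# Hypothetical: MAIL1-PP at 22:00, MAIL2-PP at 01:00, each ~150 minutes.
print(daily_overlap("22:00", 150, "01:00", 150))  # False: staggered
print(daily_overlap("22:00", 150, "23:30", 150))  # True: they collide
```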
The Nimble volume layout looks solid, though, along with the VMDK. Feel free to send me a private message; I have a whole slew of websites, best practice guides from VMware, and other things I could send your way.
However, I'm not familiar with the way CommVault works with the system, so if anyone has more insight into that, feel free to chime in.