The most important job of a DBA is to secure good backups of all production databases. There are many ways to do so, such as:
- Full or incremental RMAN backups to tape or to disk
- *nix commands such as cpio to tape or to disk
- Third-party backup software such as CommVault, NetBackup, etc.
- Storage snapshots
- Storage replication or software replication
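As a sketch of the first option, here is what a minimal RMAN full backup to disk might look like. This is an illustration only: it assumes ORACLE_SID/ORACLE_HOME are set, OS authentication is allowed, and `/backup/rman` is a hypothetical destination directory.

```shell
#!/bin/sh
# Minimal sketch: full RMAN backup of the target database to disk.
# Assumes the shell environment is already set for the target instance;
# the path /backup/rman is a placeholder for your backup destination.
rman target / <<'EOF'
RUN {
  ALLOCATE CHANNEL d1 DEVICE TYPE DISK FORMAT '/backup/rman/%U';
  BACKUP DATABASE PLUS ARCHIVELOG;
  BACKUP CURRENT CONTROLFILE;
}
EOF
```

For tape, you would allocate an SBT channel through your media manager instead of a disk channel; the overall structure stays the same.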
While all of these options are good for certain scenarios, let’s concentrate on disaster recovery. Many moons ago, I remember backing up Oracle databases to tape and sending the tapes offsite for storage. Imagine how many tapes you need if you have terabytes or petabytes of data, not to mention the retention period for those tapes. Over time, the number of tapes in offsite storage can become huge and costly. Now, let’s say a disaster strikes (fire, tornado, hurricane, etc.) and your datacenter is no longer in service. You either bring up production at another datacenter (if you have one) or at a third-party datacenter such as SunGard.
Here are some high-level tasks needed to get your production up and running at a minimum:
- Database servers
- Application servers
- Storage
- Networks
If you’re lucky, you can get all of this up and running in a day. Now comes the time-consuming task: once the tapes have been recalled and are onsite, you need to restore the most business-critical databases first. Again, if your databases are large, a complete restore can take hours or even days. Let’s assume the data restoration goes smoothly; you still need to verify the data before opening the databases up to users. In the best case, you say, “Hah, I can get my production up and running in a day. What’s so difficult?” Well, the difficult part comes when you move back to the original datacenter after it has been repaired. Also, the costs associated with the DR infrastructure and the time spent can be mind-boggling. If you figure this doesn’t happen frequently and you’re okay with spending more time than necessary, then more power to you. After all, you are responsible for your company’s data, and you know what’s best.
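For reference, the tape-restore step above might be sketched in RMAN roughly as follows, assuming the media manager (SBT) is configured at the DR site and the backups are cataloged; every name here is illustrative, not a prescription:

```shell
#!/bin/sh
# Minimal sketch: restore and recover one database from backups at the DR site.
# Assumes the instance environment is set and the backup pieces are accessible
# (e.g. via a configured SBT media manager); details vary per environment.
rman target / <<'EOF'
STARTUP NOMOUNT;
RESTORE CONTROLFILE FROM AUTOBACKUP;
ALTER DATABASE MOUNT;
RESTORE DATABASE;
RECOVER DATABASE;
ALTER DATABASE OPEN RESETLOGS;
EOF
```

Multiply this by every business-critical database, plus tape mount and streaming time, and the hours-to-days estimate above becomes easy to believe.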
As for me, I’d take a different approach. Nowadays, almost all storage vendors offer a replication feature that lets you replicate your production data from one storage array to a second storage array. Ideally, the two datacenters should be far apart; how far depends on your disaster recovery concerns. For example, if both of my datacenters reside in Florida (say 50 miles apart), that will not help me, God forbid, if Florida were hit by a hurricane. But if one of my datacenters is located in Texas or California, the odds of both datacenters being destroyed at the same time are slim.
So, let’s assume my main datacenter is in Miami and my disaster recovery datacenter is in Los Angeles. I could use storage replication to replicate most, if not all, of my data. This way, when a disaster strikes, I can bring up my data at the DR site in minutes rather than hours or days. The best thing about having a DR site is that you can test your DR procedure on a regular basis, which gives you peace of mind knowing you can bring up production without fear.
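A regular DR test doesn’t have to be elaborate. One possible smoke test after bringing the replicated copy up at the DR site is simply to confirm the database opens and a critical table is readable; this is a hedged sketch, and `app_owner.orders` is a hypothetical placeholder for one of your own business-critical tables:

```shell
#!/bin/sh
# Minimal sketch of a post-failover smoke test at the DR site:
# check that the database is open and that a key table can be queried.
# app_owner.orders is a made-up example table name.
sqlplus -s / as sysdba <<'EOF'
SELECT name, open_mode FROM v$database;
SELECT COUNT(*) FROM app_owner.orders;
EOF
```

Scripting a handful of checks like this makes the regular DR test repeatable instead of ad hoc.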
Another point I want to make is that you don’t have to dedicate a datacenter solely to DR. You can spread your production workload across both datacenters and replicate between them. This way you protect your data against a disaster at either site without wasting money and resources maintaining a standby-only DR site.
Can’t afford a second or third datacenter? Try replicating to the cloud! Nowadays, a number of service providers offer DR as a service, meaning you replicate critical data from your primary datacenter to a datacenter managed, owned, and operated by the service provider. Folks such as Virtacore offer various connectivity options (private circuit, IPsec VPN, or MPLS) for replication, as well as reserved compute and network resources (CPU/memory and dedicated VLANs) for DR testing and/or failover. Even if you have a slow pipe, or don’t want to pay for seeding over the wire, you can have a storage array shipped to you, replicate to it locally within your datacenter (for seeding), and then ship it back to the service provider’s datacenter so only the delta sets travel over the network.
In summary, unless there is a compelling or regulatory-compliance reason to use tape, get off that legacy **** and move on to storage replication!