Is the DBA / you just using SQL Jobs to create the backups or are you using a 3rd party program to initiate SQL backups?
If you're not using native SQL backups, I would try that first. Just run a manual backup task on the database to see what performance you get.
I've got an SQL server which has all the disks inside VMDK files. We have 5 volumes (OS, SQL Data, SQL Logs, tempDB, backups). I've just kicked off a manual backup, uncompressed, and the Nimble is sitting at a solid 100-120MB/sec.
I'm not sure what throughput we get when backing this up with the commvault agent.
He's using SQL Jobs to create the backups for each database. The database itself is not compressed, but the backups are. The volume the backups go to has compression turned off because we don't want to hit negative compression on the storage from compressing already-compressed files.
I'll elaborate on the whole setup.
Basically all drives are on the same Nimble array.
VMware 5.1 infrastructure; all NICs for both VM data and guest iSCSI are VMXNET3, and all network connections are 1GbE on the server side: 2 iSCSI connections and 1 data connection.
On the SAN side we have 10GbE connectivity up to our Nexus 5k switches.
Using Nimble OS 2.x with the Nimble connection manager software.
Server 2008 R2 SP1
SQL Server 2008 SP1
And the drive layout looks like this for us.
C: and D: are VMDKs.
C: we normally use for the system drive and D: for a sort of apps drive, so SQL server is actually located there.
The rest from here on out are Windows iSCSI guest initiator volumes:
I: Backup Volume
V: Actual SQL DBs
So we're backing up to I: in this case.
Average bandwidth we're seeing is 20-40MB/s
What is SQL performance like normally? Have you run SQLIO on the SQL data volumes?
I would check:
1. Performance Policy for all volumes. I use the VMware ESX 5 policy for system drives. I made a custom VMware-No-Cache policy for backup drives that aren't iSCSI-attached directly in the VM (4K block size, compression on, caching off). Use the appropriate SQL performance policy for the SQL volumes.
2. Do you run Jumbo Frames? Remember to set jumbo frames on the network adapters inside the VM. Run "ping <nimble iSCSI> -l 8972 -f" to check that jumbo frames are working end-to-end (8972 bytes of payload plus 28 bytes of IP/ICMP headers fills a 9000-byte frame; -l 9000 -f will fail even when jumbo frames are configured correctly).
3. Run SQLIO on the data volume (in a quiet period) to make sure the sql data volume is working correctly.
We don't really see any SQL performance problems. I'm not sure if we've run SQLIO specifically on these volumes since they were created; we can attempt that during the evening to see what it looks like. But considering we followed Nimble and Microsoft best practices for SQL Server (the SQL performance policies per Nimble, and a 64K allocation unit size per Microsoft), I would like to think we should see solid performance.
1. Performance policies for all volumes are set to SQL (with the exception of our C and D drives seen above which are VMDKs use the ESX 5 policy, and I drive for backups use the default policy and have the default NTFS allocation unit size).
2. No jumbo frames, per the suggestion of Nimble engineers, because it wasn't worth the gain. Plus it seems our Nexus 5ks are an all-or-nothing switch for jumbo frames: you can't enable them for certain ports or port groups, which is problematic.
Dave, could you give me some insight into how you do your SQL backups and what kind of hardware they run on? I used you as an example with Nimble support yesterday and they were wondering how you were configured, to see if there was just a difference in setup (CS260 vs 460, 10Gb end to end, multithreaded backup, etc.). That'd be very helpful.
Sure. We're only new to Nimble (in the last month), so we did some performance tests with SQLIO as you do, but that's nowhere close to what it will do in real life.
The setup I'm using here is:
Nimble 460G with 10gbit iSCSI interfaces.
Connected to a Nexus 5548 10gbit switch.
Connecting to a HP C7000 Blade Chassis via 10gbit Flex Connect.
Each blade is then provisioned with 2x 2.5gbit iSCSI interfaces. This is what shows up in ESX.
So theoretically, each ESXi host has 5gbit connection through to the Nimble, with a max throughput available of 600MB/s.
Our SQL server is set up with 5 disks, configured as 5 separate volumes all residing on the Nimble. They're all attached through VMware; we have no iSCSI from within our SQL Server VM directly to the Nimble (contrary to Nimble best practices).
vol1: C Drive, Custom VMware ESX 5 performance profile (4KB blocks, Compression, No cache)
vol2: SQL Data, Standard VMware ESX 5 performance profile (4KB blocks, Compression, Caching)
vol3: SQL Logs, Custom VMware ESX 5 performance profile (4KB blocks, Compression, No cache)
vol4: TempDB, Standard VMware ESX 5 performance profile (4KB blocks, Compression, Caching)
vol5: Backups: Custom VMware ESX 5 performance profile (4KB blocks, Compression, No cache)
I've purposely disabled caching on the C Drive, Logs and Backups volumes to maximise the read cache available for other workloads. I *may* enable it again for the C drives, as with pagefile, registry and so on it may help.
Test 1: Standard backup to backups volume.
Avg transfer of 90MB/s
Test 2: I re-ran setting all Nimble disks in VMware to change round robin path after every IO.
Avg transfer of 90MB/s
Test 3: Manually copied a .BAK file from the backups volume to the SQL data volume, just to test normal windows transfer speed
Transfer started around 200MB/s but over time dropped to a steady 130MB/s.
Test 4: Manually copied that .BAK file from the SQL data volume to the backups volume.
Transfer speed stayed perfectly at 190MB/s the whole time.
So the two things I got out of this was:
1. The major speed killer here seems to be SQL Server's ability to read the data from the database and write it to the .bak file. This could be RAM or integrity checking? CPU usage was fine throughout the process (<20%).
2. Copying data, even large sequential transfers, still seems to perform better from read-cached volumes.
I'd be interested to see how you go with this, as again we're only new to Nimble and my SQL DBA skills certainly aren't the best so I'd be keen to see what extra performance we can squeeze out of it.
Sounds like what you guys have, Dave, is a little higher end than what we're on. We actually have two CS260s right now: one side is 10Gig but still using 1Gig connections to the servers until we can make a switch to UCS or something similar. The other side, which is where our SQL boxes are, is also a CS260, but currently the 1Gig model. We plan on upgrading both sides within a year or so to 10Gig end to end, but until then this is what we have.
I think we're in a similar boat though. This is actually the second company I've helped deploy Nimble at, but previously we were using all VMFS volumes, even for SQL. Since we were using 3rd-party backup utilities, we weren't leveraging Nimble VSS-verified snaps, so that was nearly identical to what you are doing. This company is different, however, and we're using a lot of Windows guest-initiator iSCSI volumes. Our DBA is more comfortable at this point doing the native SQL backups rather than using the Nimble snaps.
I do understand why it's working differently than our old storage did. We were backing up to a different disk aggregate at that point, so we never expected a 1-to-1 performance trend for this type of oddball action. But we also didn't expect to see no disk pressure, CPU pressure, memory pressure or network pressure while seeing marginal-at-best throughput on the array. At worst I would have expected a 2x performance loss, but not 5x as in this case. I'm sure it's a configuration issue of some sort on our end, but I just don't know what.
I'm with you though. I personally am no DBA; my area of expertise is usually VMware and Windows Server. The rest of my knowledge ranges from somewhat knowledgeable to knowing just enough to get myself into trouble. Storage is a middle ground between generally knowing what I'm doing and somewhat knowledgeable. SQL, though, is the lower end of the stick.
Thanks for the info though Dave, knowing your performance metrics on those various tests is very helpful!
Just for giggles and grins, try running the SQL backup to multiple .bak files, even on the same volume. So rather than having SQL go to backup.bak, have it go to backup1.bak, backup2.bak, backup3.bak, etc. I ran our 200 GB database to 10 backup files like this and went from around 40MB/sec to around 130MB/sec. Restores are faster when restoring from multiple files as well.
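To illustrate the idea, here's a small Python sketch that builds the striped T-SQL BACKUP statement (the database name and path below are placeholders, not from this thread); listing several DISK clauses is what lets SQL Server write the backup with multiple streams:

```python
# Sketch: build a striped T-SQL BACKUP statement that targets N files.
# "MyDB" and I:\Backups are placeholder names for illustration only.

def striped_backup_sql(database, path, stripes):
    """Return a BACKUP DATABASE statement writing to `stripes` .bak files."""
    disks = ",\n    ".join(
        "DISK = N'%s\\%s_%d.bak'" % (path, database, i)
        for i in range(1, stripes + 1)
    )
    return "BACKUP DATABASE [%s]\nTO  %s\nWITH INIT, STATS = 10;" % (database, disks)

print(striped_backup_sql("MyDB", r"I:\Backups", 10))
```

Running the generated statement in SQL Server Management Studio (or via sqlcmd) produces the multi-file backup described above; a matching RESTORE must list the same set of files.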
Sorry Joshua, I hadn't updated this thread for a little bit. Our DBA has since tried that and it does increase the speed, but still doesn't explain why we see slower backup performance than before. Basically we're not really doing apples to apples when we do this, but it has worked as a band-aid for the time being, we've got the time closer to where it was. We're still experimenting with this right now to see if we can't figure out where the bottleneck is, because we've yet to find one anywhere, disk, network, memory or CPU. I'd like to see something somewhere show me a bottleneck.
We are in fact seeing some similar behavior, not from SQL but from a different database system. We are a brand new Nimble customer so I haven't had a chance to start beating the bushes to solve the problem yet -- indeed, it's not really a problem, more a curiosity.
The back office system is a program running on a database called Progress, and once a day we run a "live backup" (basically a snapshot). On our crappy old HP MSA2012i it would take around 10 minutes to complete, and the entire company would slow to a crawl if they tried to access that program; after 10 minutes everything was just fine. Since moving everything over to Nimble, it now takes 72 minutes to run the identical backup process (it's only about 15 GB of data).
The server is 2008 R2 on one big vdisk (a VMDK on a version 3 volume), everything under the C: volume. The Performance Policy is "ESX" (the Nimble default). So like you mentioned, it's reading and writing the same data to and from the same volume, and it's dog slow. I do notice the overall response time for this system is better during the backup period with the Nimble, probably due to the caching.
I was going to look at moving just the data files off to a directly mapped iSCSI volume (instead of running inside a VMDK) and see if it's better.
Thought I'd chime in as we're running a similar setup. Our environment:
- SQL 2008
- Windows 2008R2, iSCSI drives
- H: (Data)
- O: (Logs)
- T: (Backups - see what we did there? )
- Servers connected to Nexus 5K
- 2 MPIO 1GB NICs for iSCSI
- 2 Teamed 1GB NICs for standard traffic
We moved our SQL from a physical cluster connected to an old XIO SAN to a virtual cluster inside a Hyper-V cluster on the Nimble. We run full nightly backups and they finish in just under an hour, similar to our old setup. We've always been more focused on uptime than performance, and with a window that didn't change much we didn't really think to investigate from that end. Anyhow, it seems like a similar enough setup that it could be helpful. If it comes down to it, I could probably run some tests for comparison.
Nice drive layout
A couple questions for you.
First, how are you doing your SQL backups? I'm assuming from within SQL Server to a .bak file on T:.
Second, how are your drives mapped? Are you using the 2008 R2 guest initiator directly in that VM, or are you passing storage through Hyper-V as a virtual disk?
Last, you say you're getting the same performance on your backup. When the backup runs, how long does it run, and what sort of bandwidth spike do you see on the Nimble array: 20MB/s, 50MB/s, 200MB/s?
Correct, we are running our backups within SQL to a .bak on the T: drive. All SQL drives are mapped directly with the 2008 R2 initiator with 2 1Gb virtual NICs, MPIO (least queue depth), and jumbo packets. The OS drive is a virtual disk, also on the SAN.
Our backup runs for ~50 minutes nightly and the sum of our .baks for a night is ~45GB. Here's what I'm seeing:
- Data drive
Read spikes to 130MB/sec, averages around 60 MB/sec
- Backup Drive
Write spikes to 60 MB/sec, averages around 20 MB/sec
I'm getting averages by eyeballing; for some reason I can't use a small enough custom window, so the calculated numbers are being skewed by big periods of inactivity.
So we're still not completely sure what's going on with this, since we're not seeing a bottleneck really anywhere. However, we have found that if you split the backup into multiple files (even on the same disk volume), where it can essentially use multiple threads, it runs faster. I haven't had a chance to look at what the array is doing since our DBA modified this. I know it's sort of a band-aid, but it still doesn't explain this hidden bottleneck.
I'm sorry about your issue here. It looks like the backup software here is copying the files in a single threaded fashion.
If you can, try two tests:
1. Manually copy one large file from one drive to another drive. Measure the performance.
2. Manually copy TWO large files from one drive to another drive. Measure the performance.
If test number 2 is faster, that indicates a single threaded operation with the backup software. Depending on the backup software, it may or may not be tunable.
If the performance numbers on both number 1 and 2 are the same, then it may be the array, network, or host.
Feel free to reply back with the results of your tests.
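If it helps, the two tests can be scripted rather than run by hand. This is a rough Python sketch (the drive paths shown in the comments are placeholders for whatever source and destination volumes you want to exercise):

```python
# Sketch of the suggested tests: copy one large file, then two in parallel,
# and compare aggregate throughput. Paths are placeholders for your drives.
import os
import shutil
import time
from concurrent.futures import ThreadPoolExecutor

def timed_copies(pairs):
    """Copy (src, dst) pairs concurrently; return aggregate MB/s."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(pairs)) as pool:
        list(pool.map(lambda p: shutil.copyfile(p[0], p[1]), pairs))
    elapsed = time.perf_counter() - start
    total_mb = sum(os.path.getsize(dst) for _, dst in pairs) / 1e6
    return total_mb / elapsed

# Test 1: one file.
# one = timed_copies([(r"V:\big1.bak", r"I:\big1.bak")])
# Test 2: two files at once.
# two = timed_copies([(r"V:\big1.bak", r"I:\big1.bak"),
#                     (r"V:\big2.bak", r"I:\big2.bak")])
```

If `two` comes out meaningfully higher than `one`, the single-file path is not saturating the array, which points at a single-threaded bottleneck as described above.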
Thanks Huy. As seen above in a response to Joshua Post, we have verified that to be helpful, but it still does not explain why the slowdown is occurring when using the same backup procedure, the same server hardware and the same software. Even single-threaded, we should be seeing a bottleneck somewhere, I would think.
The problem is that single-threaded I/O processes send one I/O and wait, send one I/O and wait.
You can think of this like an amusement park ride. Let's say the ride can take 32 people at a time. You send one person, wait until that person finishes the ride, then send one more person to line up for the next ride. It would be more efficient to send two, or three, or up to 32 people into the line, have them go on the ride, and then fill the line again with 32 people.
This is what is happening with single threaded. The thread would send 1 I/O in the queue (line), wait for it to come back, and then fill the queue again with 1 I/O event. By threading it out, you are filling the queue with more I/Os at the same time.
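The arithmetic behind the analogy can be sketched like this (the 1 ms round trip and 64KB I/O size are assumed figures for illustration, not measurements from this thread):

```python
# Toy model: with a fixed round-trip time per I/O, throughput scales with
# the number of I/Os kept in flight, until the device or link saturates.
# The 1 ms round trip and 64KB I/O size are assumptions for illustration.

def throughput_mb_s(io_size_kb, round_trip_ms, queue_depth):
    """MB/s achieved when `queue_depth` I/Os are outstanding at once."""
    ios_per_sec = queue_depth * (1000.0 / round_trip_ms)
    return ios_per_sec * io_size_kb / 1000.0

print(throughput_mb_s(64, 1.0, 1))   # queue depth 1: 64.0 MB/s
print(throughput_mb_s(64, 1.0, 32))  # queue depth 32: 2048.0 MB/s (before any device limit)
```

The real array caps out far below the queue-depth-32 figure, of course; the point is only that queue depth 1 leaves almost all of the capacity idle.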
The support engineer is correct that, in general, jumbo frames do not help with most workloads. With random workloads they do not help much. With sequential workloads like we are seeing here, they MAY help, but there's no guarantee. From experience, you do sacrifice manageability when configuring jumbo frames, which is why most customers don't implement them.
Right, which is all well and good, but it still doesn't explain why it was faster before and 6x slower now on the same VM on the same hardware. A single-threaded process would be bound by the server hardware it runs on; if it's on the same server hardware, using the same CPU to process the backup, I don't know why there's a difference. That makes absolutely no sense to me. I'm still searching for something tangible I can visibly see bottlenecking the job. Maybe there's that much of a difference in line speed, I don't know yet, but until I see some sort of bottleneck somewhere (queue depth, CPU, memory, network, something, somewhere), I'm going to have to keep searching.
Think of it this way. Old system we saw 30 min backup jobs. New system we see closer to 4hrs. Same server, same backup job, same process.
If I were to multithread with multiple backup files on the old system, as we are now on the new, would we not see that 30-minute backup job decrease as well? Logic says yes.
If I run 8 backup files on that 4-hour job and get 30-minute backups, what would I get on the 30-minute backup job? 4 minutes?
That's the logic I'm looking at here.
The only differences I'm seeing: first, we no longer have a secondary disk aggregate (but we're not seeing any disk pressure on the new storage anyway, so that shouldn't be the problem).
The other is that we're using 1Gb multipathed iSCSI connections instead of 4Gb multipathed FC (but again, I'm not seeing any network pressure either; we're not flooding those iSCSI connections).
Thanks for the additional info.
The serialization latency of 4Gbit FC is roughly 1/4 that of 1GbE, and 10Gbit Ethernet's is roughly 1/10th that of 1GbE.
Normally it doesn't matter, since we now live in a multithreaded world that parallelizes I/O requests. But with single-threaded I/O there is no parallelization.
With single-threaded I/O, you send one I/O and wait, so the transit time from port to port becomes significant. With multithreading it's not an issue, since you're doing more work collectively.
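Rough numbers behind that point, as a sketch: at queue depth 1 the wire serialization time is paid on every round trip, so the link speed shows up as throughput even when bandwidth is nowhere near saturated. The 0.2 ms of array service time below is an assumed figure for illustration, not a measurement from this thread.

```python
# Single-threaded model: each I/O waits for wire serialization plus array
# service time. The 0.2 ms service time is an assumption for illustration.

def single_threaded_mb_s(io_bytes, link_gbps, service_ms=0.2):
    wire_ms = io_bytes * 8 / (link_gbps * 1e9) * 1000  # serialization delay
    round_trip_ms = wire_ms + service_ms
    return io_bytes / 1e6 * (1000 / round_trip_ms)

print(round(single_threaded_mb_s(64 * 1024, 1), 1))   # 1GbE:  ~90 MB/s
print(round(single_threaded_mb_s(64 * 1024, 10), 1))  # 10GbE: ~260 MB/s
```

Under these assumptions, moving the same single-threaded stream from 1GbE to 10GbE roughly triples throughput, without the link ever being "full" in a bandwidth graph.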
Gotcha, so it's a network latency issue, not a network bandwidth issue. So if we transition from 1GbE to 10GbE connectivity on our servers, we should be golden and actually see a lower (single-threaded) backup time than we were seeing previously. Hopefully we can try this out in the near future, since our Nimble already has the 10GbE controllers; we just need to finish the connectivity on our servers. In the meantime I'll keep this thread open; we hope to test this soon.
I would also try turning SQL client-side compression off for the dump file, and turning compression on for the dump volume on the array. For logs and dump volumes you want to turn cache off, but not compression. The array gets better write performance when compression is on. Since the Nimble is going to compress the data in the array, you should see a higher MB/s transfer rate, and the same if not better compression on your storage.
I verified with our DBA that we were already doing this, I was pretty sure we were, but just wanted to verify again, he said that is correct.
The rest is just the Nimble performance policy for SQL logs, which we are using for the appropriate volumes, along with following best practice for the log/database/etc. volume layout.
All that being said, I think that just leaves us with the suggestion above of trying 10GbE, which we hope to do soon to see whether we get a significant performance increase.
1. We're using 64K, I think that was Dave who was using 4K.
2. Not sure about the VLF, as overall backup performance was slower across all databases.
The temporary band-aid is that we are using multiple backup files (which we weren't doing before, but probably should have been) until we can migrate 100% to 10GbE. The belief from Nimble support is that we are being bottlenecked by our 1GbE connectivity, probably exacerbated by the fact that we're not using the maximum number of paths, since we're going from 1GbE servers to a 10GbE controller. Like it's a queue problem or something of that nature.
Unfortunately, until we make it to 10GbE end-to-end, I don't think we can really say that fixed it, but it's also my belief that the 1GbE-to-10GbE mismatch is causing trouble with snapshot duration as well. This is a personal recommendation, and I am by no means a Nimble engineer, but I think if you have 1GbE servers, get a 1GbE controller until you can actually go 10GbE end to end. Either that, or make sure you have additional NICs in your servers to make use of the maximum number of paths supported.