21 Replies. Latest reply: Dec 7, 2016 9:49 AM by Adam Bond

    vCenter Snapshots freeze MS-SQL server

    Adam Bond Wayfarer

      We're battling an issue with a virtual SQL server and vCenter-synchronized snapshots. The snapshots don't fail, but they cause just enough disruption that some of our apps time out during VSS snapshot creation. We don't have the problem with Veeam's application-consistent snapshots, so I think something else must be wrong in my guest OS. I know I can disable unnecessary VSS writers, but I'm unsure which ones are safe to disable while still getting consistent snapshots in Nimble.

       

      Below is our list of enabled VSS writers. They can be disabled by following this article: https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1031200

       

      vssadmin 1.1 - Volume Shadow Copy Service administrative command-line tool
      (C) Copyright 2001-2005 Microsoft Corp.

      Writer name: 'Task Scheduler Writer'
         Writer Id: {d61d61c8-d73a-4eee-8cdd-f6f9786b7124}
         Writer Instance Id: {1bddd48e-5052-49db-9b07-b96f96727e6b}
         State: [1] Stable
         Last error: No error

      Writer name: 'VSS Metadata Store Writer'
         Writer Id: {75dfb225-e2e4-4d39-9ac9-ffaff65ddf06}
         Writer Instance Id: {088e7a7d-09a8-4cc6-a609-ad90e75ddc93}
         State: [1] Stable
         Last error: No error

      Writer name: 'Performance Counters Writer'
         Writer Id: {0bada1de-01a9-4625-8278-69e735f39dd2}
         Writer Instance Id: {f0086dda-9efc-47c5-8eb6-a944c3d09381}
         State: [1] Stable
         Last error: No error

      Writer name: 'System Writer'
         Writer Id: {e8132975-6f93-4464-a53e-1050253ae220}
         Writer Instance Id: {27c00a72-3215-48b6-a699-7bd01b674fa8}
         State: [1] Stable
         Last error: No error

      Writer name: 'SqlServerWriter'
         Writer Id: {a65faa63-5ea8-4ebc-9dbd-a0c4db26912a}
         Writer Instance Id: {c7c46f7d-8a70-4530-9445-30a5a80d7fd2}
         State: [1] Stable
         Last error: No error

      Writer name: 'ASR Writer'
         Writer Id: {be000cbe-11fe-4426-9c58-531aa6355fc4}
         Writer Instance Id: {c29e355b-c672-4ef3-8d34-7e75407e96ae}
         State: [1] Stable
         Last error: No error

      Writer name: 'Shadow Copy Optimization Writer'
         Writer Id: {4dc3bdd4-ab48-4d07-adb0-3bee2926fd7f}
         Writer Instance Id: {0048c718-52fd-416e-bdcc-fc6efac7c369}
         State: [1] Stable
         Last error: No error

      Writer name: 'Registry Writer'
         Writer Id: {afbab4a2-367d-4d15-a586-71dbb18f8485}
         Writer Instance Id: {2c79e02d-8724-469f-8bb8-280e73a21a89}
         State: [1] Stable
         Last error: No error

      Writer name: 'BITS Writer'
         Writer Id: {4969d978-be47-48b0-b100-f328f07ac1e0}
         Writer Instance Id: {abc3b9c3-61f6-4de9-a4de-873133b6c8d9}
         State: [1] Stable
         Last error: No error

      Writer name: 'COM+ REGDB Writer'
         Writer Id: {542da469-d3e1-473c-9f4f-7847f01fc64f}
         Writer Instance Id: {f0908cd6-c87c-4dc3-9bb1-2d4b7862e538}
         State: [1] Stable
         Last error: No error

      Writer name: 'WMI Writer'
         Writer Id: {a6ad56c2-b509-4e6c-bb19-49d8f43532f0}
         Writer Instance Id: {b8fc017d-7c54-439d-a612-e70e80bb98f7}
         State: [1] Stable
         Last error: No error

      Which of these are necessary in order to get a quiesced snap in VMware?
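      The writer list above can also be checked with a short script after each slow or failed snapshot. This is a minimal Python sketch, assuming the English-language `vssadmin list writers` format shown above; `parse_writers` and `unhealthy` are hypothetical helper names. It only reports each writer's state and last error; it does not answer which writers are safe to disable.

```python
import re

# Line patterns taken from the English-language `vssadmin list writers` output shown above.
WRITER_RE = re.compile(r"Writer name: '(?P<name>[^']+)'")
STATE_RE = re.compile(r"State: \[(?P<code>\d+)\] (?P<label>.+)")
ERROR_RE = re.compile(r"Last error: (?P<error>.+)")


def parse_writers(text: str) -> list[dict]:
    """Turn `vssadmin list writers` text into one dict per writer."""
    writers = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if m := WRITER_RE.match(line):
            current = {"name": m["name"]}
            writers.append(current)
        elif current is None:
            continue  # banner and copyright lines before the first writer
        elif m := STATE_RE.match(line):
            current["state_code"] = int(m["code"])
            current["state"] = m["label"].strip()
        elif m := ERROR_RE.match(line):
            current["last_error"] = m["error"].strip()
    return writers


def unhealthy(writers: list[dict]) -> list[str]:
    """Names of writers that are not Stable or that report a last error."""
    return [w["name"] for w in writers
            if w.get("state") != "Stable" or w.get("last_error") != "No error"]


sample = """\
Writer name: 'SqlServerWriter'
   Writer Id: {a65faa63-5ea8-4ebc-9dbd-a0c4db26912a}
   State: [1] Stable
   Last error: No error
"""
print(unhealthy(parse_writers(sample)))  # -> []
```

      Saving the output of `vssadmin list writers` (run from an elevated prompt) right after a snapshot that caused a stall and passing it to `parse_writers` shows whether any writer was left in a non-Stable state; such a writer is a candidate to investigate before disabling anything.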

        • Re: vCenter Snapshots freeze MS-SQL server
          Scout

          Hi Adam,

          I see it's been more than 24 hours without a response to this question. Please feel free to contact support@nimblestorage.com and they'll be happy to help you resolve the time-out problem.

          Thanks for being part of the NimbleConnect community, and sorry we couldn't get this one answered promptly.

          Michael

            • Re: vCenter Snapshots freeze MS-SQL server
              Dan Bauder Wayfarer

              I would like to add that they are not the only ones experiencing this issue. We have had to deal with it too and would appreciate a response and a fix.

               

              Thanks

                • Re: vCenter Snapshots freeze MS-SQL server
                  Adam Bond Wayfarer

                  Thanks for the responses Michael and Dan.  We did have a ticket open with support, but it didn't really lead us anywhere. 

                  • Re: vCenter Snapshots freeze MS-SQL server
                    Dan Bauder Wayfarer

                    Further testing has led me to believe this is an issue with Nimble scheduled replication:

                    Take a VMware snapshot including memory and quiescing the file system - no errors

                    From Nimble volume management click on the Take Snapshot button - no errors

                    Wait for the scheduled Nimble protection snapshot/replication to occur - errors occur

                     

                    To me this pretty much screams that there is a bug in the protection schedule routines, or that they do something very different from the Take Snapshot button. Either case is bad news. Hello Nimble, time to fix this.

                      • Re: vCenter Snapshots freeze MS-SQL server
                        Adam Bond Wayfarer

                        This is not the issue we have. We get the same freeze/pause with scheduled snaps, manual sync'd snaps from the array, or by checking the quiesce box in VMware. We never use the "snap memory" option, and my understanding is that Nimble snaps don't use that option either.

                        • Re: vCenter Snapshots freeze MS-SQL server
                          Nickolay Milovanov Wayfarer

                          Hello Dan,

                           

                          I am sorry you are having an issue with your snapshots. Have you opened a case with Support to troubleshoot the issue you are having?

                           

                          From the little information I have from your post, I can state the following:

                          The replication engine does not involve any host or guest operations; the transfer happens after the snapshot already exists on the source array.

                          When you take a snapshot at the volume level, the snapshot is crash-consistent on the Nimble array. vCenter is not engaged, because that integration is managed by the volume collection.

                          When you take a snapshot through vCenter, you are taking a snapshot of a single VM.

                          When snapshots are taken on a schedule by a Nimble Storage array volume collection configured for vCenter sync, all the volumes in the volume collection call vCenter to quiesce all the VMs within those volumes.

                          If vCenter is unable to finish the quiesce operation within a reasonable time (due to performance problems or errors), the Nimble Storage array will still take a crash-consistent snapshot.

                          You can reproduce the behavior of the scheduled snapshots by manually triggering a snapshot collection on the volume collection in question.
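                          The breakdown above amounts to a small decision table. The Python sketch below encodes it as described in this post; the `Trigger` names are hypothetical labels, and treating a volume collection without vCenter sync as array-only (crash-consistent) is an assumption the post implies but does not state outright.

```python
from enum import Enum


class Trigger(Enum):
    # Hypothetical labels for the three ways of taking a snapshot described above.
    VOLUME_TAKE_SNAPSHOT = "Take Snapshot on a single volume"
    VCENTER_VM_SNAPSHOT = "snapshot taken in vCenter"
    COLLECTION_SNAPSHOT = "scheduled or manual volume collection snapshot"


def expected_consistency(trigger: Trigger, vcenter_sync: bool,
                         quiesced_in_time: bool = True) -> str:
    """Kind of snapshot each trigger should produce, per the explanation above."""
    if trigger is Trigger.VOLUME_TAKE_SNAPSHOT:
        return "crash-consistent"   # vCenter is never engaged
    if trigger is Trigger.VCENTER_VM_SNAPSHOT:
        return "single-vm"          # one VM, handled entirely by vCenter
    if not vcenter_sync:
        return "crash-consistent"   # assumption: no vCenter sync means array-only
    if quiesced_in_time:
        return "vm-quiesced"        # every VM on every volume in the collection
    return "crash-consistent"       # fallback when vCenter is too slow or errors out


for trigger in Trigger:
    print(f"{trigger.value}: {expected_consistency(trigger, vcenter_sync=True)}")
```

                          Read this way, the earlier test results fit: the volume-level Take Snapshot button never calls vCenter, so only a snapshot of the whole volume collection reproduces what the schedule does.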

                            • Re: vCenter Snapshots freeze MS-SQL server
                              Dan Bauder Wayfarer

                              Unfortunately, I have had zero luck with support when reporting these kinds of issues in the past. I see nothing to be gained by going that route.

                              You can have a standalone volume that is protected but has no way to take a volume collection snapshot. In that case I would hope the Take Snapshot button would use the settings from the standalone protection and integrate with vCenter. It does not appear to work that way.

                               

                              Thanks for the info.

                               

                               

                              Dan Bauder, VCP 3, 4, 5, DCV
                              UNC Charlotte, Information & Technology Services
                              9201 University City Blvd., Charlotte, NC 28223
                              Phone: 704-687-0274
                              dbauder@uncc.edu | http://www.uncc.edu

                      • Re: vCenter Snapshots freeze MS-SQL server
                        Nickolay Milovanov Wayfarer

                        Hello Adam,

                        I sincerely apologize that the information shared with you during the case did not satisfactorily answer your question. I would like to clarify the behavior from the Nimble Storage array's perspective.

                         

                        When Nimble Storage makes an API call to vCenter to create a snapshot, the array has to wait until vCenter reports that the freeze was successful; only then can the array take its snapshot and let vCenter know that it has done so. How long the vCenter quiesce takes depends on which volumes are members of the volume collection on the array, which VMs reside on each volume, and which VSS writers are present in each VM, and at times this is not optimal.
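                        The handshake described above is a blocking wait followed by a notification. The Python sketch below models only that ordering; `vcenter` and `array` are hypothetical stand-in objects (not a real Nimble or vSphere API), and `timeout_s` stands in for the "reasonable time" mentioned earlier in the thread, after which the array falls back to a crash-consistent snapshot.

```python
import threading


def vcenter_synced_snapshot(vcenter, array, timeout_s: float) -> list[str]:
    """Order of operations for one vCenter-synced array snapshot (hypothetical objects)."""
    events = []
    frozen = threading.Event()
    events.append("array: asked vCenter to quiesce")
    vcenter.request_quiesce(on_frozen=frozen.set)  # vCenter calls back once the freeze succeeds
    if frozen.wait(timeout_s):                     # the array blocks here until vCenter reports
        events.append("vCenter: freeze reported")
        array.take_snapshot()
        events.append("array: snapshot taken")
        vcenter.snapshot_complete()                 # only now is vCenter told the array is done
        events.append("array: told vCenter it is done")
    else:
        array.take_snapshot()                       # no report in time: crash-consistent snapshot
        events.append("array: crash-consistent snapshot taken")
    return events


class FakeVCenter:
    """Test double that either freezes immediately or never reports a freeze."""
    def __init__(self, freezes: bool):
        self.freezes = freezes
        self.completed = False

    def request_quiesce(self, on_frozen):
        if self.freezes:
            on_frozen()

    def snapshot_complete(self):
        self.completed = True


class FakeArray:
    """Test double that counts snapshots."""
    def __init__(self):
        self.snapshots = 0

    def take_snapshot(self):
        self.snapshots += 1


print(vcenter_synced_snapshot(FakeVCenter(freezes=True), FakeArray(), timeout_s=5.0))
```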

                         

                        I completely understand the need for a consistent application state for the SQL database. The best course of action is the following:

                         

                        a) Create separate Nimble array volumes: one for the database and one for the log

                        b) Connect the VM directly to the Nimble Storage array

                        c) Move the database to the database volume and the log to the log volume on the Nimble Storage array

                        d) Install the Nimble Windows Toolkit on the directly attached VM

                        e) Disable vCenter sync for the operating system, which remains on the VMware datastore

                        f) Enable Nimble array SQL VSS integration on the volume collection that holds this VM's database and log volumes

                         

                        With a direct connection to the Nimble Storage array, the quiesce time is minimized because the hypervisor layer is bypassed. Because the array engages only a single VSS writer on a single VM, the time to quiesce is minimized further.
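                        The point about quiesce scope can be made concrete by counting what one vCenter-synced collection snapshot has to freeze: every VSS writer in every VM on every volume of the collection. The Python sketch below uses a hypothetical inventory (the VM names, volume names, and per-VM writer lists are invented for illustration) and assumes a VM whose disks span several volumes is quiesced once.

```python
from dataclasses import dataclass, field


@dataclass
class VM:
    name: str
    vss_writers: list[str]


@dataclass
class Volume:
    name: str
    vms: list[VM] = field(default_factory=list)


def quiesce_scope(collection: list[Volume]) -> set[tuple[str, str]]:
    """(VM, writer) pairs one synced snapshot of the collection has to freeze.

    Returned as a set, so a VM whose disks span several volumes counts once (an assumption).
    """
    return {(vm.name, writer)
            for volume in collection
            for vm in volume.vms
            for writer in vm.vss_writers}


# Hypothetical inventory: one SQL VM sharing VMFS datastore volumes with five app VMs.
base = ["System Writer", "Registry Writer", "WMI Writer"]
sql = VM("sql01", base + ["SqlServerWriter"])
apps = [VM(f"app{i:02d}", base) for i in range(1, 6)]
datastore_collection = [Volume("ds-data", [sql] + apps), Volume("ds-log", [sql])]

# Direct-attached layout: the array's SQL VSS integration engages only SqlServerWriter.
sql_direct = VM("sql01", ["SqlServerWriter"])
direct_collection = [Volume("sql-db", [sql_direct]), Volume("sql-log", [sql_direct])]

print(len(quiesce_scope(datastore_collection)), len(quiesce_scope(direct_collection)))  # 19 1
```

                        Under these assumptions the direct-attached layout engages one writer instead of nineteen. The real quiesce time also depends on how long each writer takes, which this count ignores.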

                         

                        Please let Nimble Storage Support know if you need any assistance in completing above steps and ensuring the snapshots for your SQL server are application-consistent.

                          • Re: vCenter Snapshots freeze MS-SQL server
                            Adam Bond Wayfarer

                            I appreciate your reply. Your recommendations are similar to those of support; however, we have just P2V'd this machine and migrated its data/log volumes to VMDKs, so we aren't interested in reversing that. We were not aware that the snapshots would behave differently, otherwise we wouldn't have chosen this route, but it is where we are now.

                             

                            I'd really hoped to find a way to get the quiesced snaps to happen more efficiently so that we could keep our RPO low with Nimble replication like we had previously when it was a physical server directly connected to the array.

                              • Re: vCenter Snapshots freeze MS-SQL server
                                Chris McQueen Newbie

                                Hi Adam/Dan,

                                This sounds very similar to a problem we experienced and took through Nimble Support, VMware Support and finally on to Microsoft...

                                 

                                In the Windows Event Log we had errors about disks being surprise-removed. We actually found that all VMs were logging these; however, it was only on the latency-sensitive ones that we noticed it.

                                 

                                We use Veeam along with VMware. What we found was that Veeam did not display any errors because it uses its own VSS, whereas VMware calls on Microsoft VSS, and Nimble calls on VMware, which in turn calls on Microsoft VSS.

                                 

                                This link relates to the original issue we experienced and the proposed workaround:

                                https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2006849


                                We tested the theory by building new test VMs, some with MBR disks and some with GPT. The MBR VMs continued to have the issue; the GPT ones were resolved. It was quite a lot of work, as it's a full infrastructure rebuild, but it did resolve the problems.

                                 

                                We also had the same issues with EMC AppSync with RecoverPoint, and the same solution worked for us.

                                 

                                Thanks

                                 

                                  • Re: vCenter Snapshots freeze MS-SQL server
                                    Dan Bauder Wayfarer

                                    Chris - Thanks for the input. We've got about a dozen VMs running SQL Server that we placed on volumes with vCenter Synchronization enabled. This is where we are seeing the issues, and yes, they do have MBR disks. If I understand your post, it sounds like rebuilding these with GPT disks might help us out. Or do we have to both use GPT disks and implement the VMware KB workaround as well?

                                     

                                    Thanks

                                      • Re: vCenter Snapshots freeze MS-SQL server
                                        Chris McQueen Newbie

                                        Hi Dan,

                                        Unfortunately, the answer for us was to rebuild the VMs as GPT. We have since updated all our VMware templates to use GPT as the standard and no longer use MBR.

                                         

                                        In the VMware link, the first Microsoft KB referenced is 2853247, and its Resolution 2 is the GPT fix. Resolution 1 outlines having a single partition on an MBR disk, which is not possible on newer OSes, so we ruled that straight out.

                                         

                                        It sounds exactly like what we experienced. We went through all the proper support channels to get this outcome, and it was VMware that pointed us to this article. I'm sure you would anyway, but spin up a test VM to confirm it is the same issue you have.
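                                        To see which existing disks are affected, note that inside a Windows guest `Get-Disk` already reports a Partition Style column (MBR or GPT). The Python sketch below does the same check from raw bytes, for example on a copy of a disk's first sectors. It assumes the standard on-disk layout: an MBR sector ends with the 0x55AA signature, a GPT disk carries a protective MBR entry of type 0xEE, and the GPT header at LBA 1 starts with `EFI PART`. `partition_style` and `partition_style_of_file` are hypothetical helper names.

```python
MBR_SIGNATURE = b"\x55\xaa"
GPT_PROTECTIVE_TYPE = 0xEE
GPT_HEADER_SIGNATURE = b"EFI PART"


def partition_style(first_sectors: bytes, sector_size: int = 512) -> str:
    """Return 'gpt', 'mbr', or 'raw', given at least the first two sectors of a disk."""
    if len(first_sectors) < 2 * sector_size:
        raise ValueError("need at least the first two sectors")
    mbr = first_sectors[:512]
    if mbr[510:512] != MBR_SIGNATURE:
        return "raw"  # no MBR boot signature: uninitialized disk
    # The partition type byte sits at offset 4 of each of the four 16-byte entries at offset 446.
    types = [mbr[446 + 16 * i + 4] for i in range(4)]
    header = first_sectors[sector_size:sector_size + 8]
    if GPT_PROTECTIVE_TYPE in types and header == GPT_HEADER_SIGNATURE:
        return "gpt"
    return "mbr"  # also covers a 0xEE entry without a valid GPT header


def partition_style_of_file(path: str, sector_size: int = 512) -> str:
    """Classify a raw disk image file, such as a flat VMDK extent copy (assumption)."""
    with open(path, "rb") as f:
        return partition_style(f.read(2 * sector_size), sector_size)
```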

                                         

                                        Thanks

                                      • Re: vCenter Snapshots freeze MS-SQL server
                                        Nickolay Milovanov Wayfarer

                                        Chris,

                                        Thank you very much! This is a very good bit of information.

                                      • Re: vCenter Snapshots freeze MS-SQL server
                                        Nick Dyer Navigator

                                        Hello Adam,

                                         

                                        Sadly this is the downside to using VMDKs and relying on vCenter snapshots, and it's why for ultimate best practice we do recommend using in-guest iSCSI. Having said that, I'm told there are significant improvements to vCenter snapshot procedures in vSphere 6. I personally haven't tested this myself - here's a blog detailing the benefits from a friend at Veeam: http://www.virtualtothecore.com/en/vsphere-6-snapshot-consolidation-issues-thing-past/.

                                         

                                        All of the above disappears with Virtual Volumes and NimbleOS 3 you'll be pleased to hear.

                                          • Re: vCenter Snapshots freeze MS-SQL server
                                            Adam Bond Wayfarer

                                            Hi Nick,


                                            Sorry I let this one go dark. I did upgrade to ESXi 6 and the snapshots seem to be better; however, I am having an unrelated problem where the OS doesn't seem to recognize the quiesced snapshots, as there are no entries in the application log indicating the freeze, thaw, and I/O resume that we would expect. We're working with Microsoft now to see who wins the finger-pointing match, Microsoft or VMware.

                                             

                                            I have a question about the VVols comment you had above.  What will the migration to that look like?  Will there be a way to convert the volumes over to this, or will it be a manual data migration in the OS to the new volumes?  (I fear it is the latter) 

                                              • Re: vCenter Snapshots freeze MS-SQL server
                                                Nick Dyer Navigator

                                                Adam Bond wrote:

                                                 

                                                I have a question about the VVols comment you had above.  What will the migration to that look like?  Will there be a way to convert the volumes over to this, or will it be a manual data migration in the OS to the new volumes?  (I fear it is the latter)

                                                Hi Adam!

                                                 

                                                The great news is that it's very simple to move to and from VVol deployments, as it is all controlled via Storage vMotion. It's possible to move a VMDK from VMFS to VVol, and vice versa, without any gotchas. And now that we finally support XCOPY in NimbleOS 3, the copy burden is taken off the ESX host and network, meaning these conversions will complete even more rapidly than before.

                                              • Re: vCenter Snapshots freeze MS-SQL server
                                                Adam Bond Wayfarer

                                                Nick Dyer wrote:

                                                All of the above disappears with Virtual Volumes and NimbleOS 3 you'll be pleased to hear.

                                                Nick - can you expand on this perhaps?  Specifically - are there differences in the way application aware snapshots are done with VVols?

                                          • Re: vCenter Snapshots freeze MS-SQL server
                                            Duke Boles Wayfarer

                                            Adam,

                                             

                                            I have dealt with this since we purchased our Nimbles, and I have solved the problem. It is as @Nick Dyer says: if your application is at all latency sensitive, you must use directly attached iSCSI disks. There are lots of gotchas that I would be glad to share if you decide to go that route. One is that you don't want in-guest iSCSI if you use VMware Site Recovery Manager and array-based recovery; in that case you need physical-mode Raw Device Mappings. These are identical to in-guest iSCSI on the Nimble side, but instead of the iSCSI initiator in the guest attaching the volumes, the hosts attach them and present them to the VM.

                                             

                                            There are many more gotchas I am glad to share if you want.

                                             

                                            Duke

                                             

                                            edit: P.S. I love Nimble support and they do a great job, but if you take this to them they will lead you down many wrong paths. I got the final key pieces of info from an internal engineer, and I was only offered that after complaining to my account rep. Once I had access to this person, I had the whole thing solved in two weeks after eight months of gut-wrenching struggle-bus action.