7 Replies Latest reply: Feb 26, 2015 12:00 PM by Jeremy Mooney

    Nimble Replication Priority

    Chris Haynes Wayfarer

      Is there any way to prioritize the order of volume replications?  When I add a new volume to replicate, all my other replicating volumes, which only have MBs to update each day, get queued up behind the one large replication job that is GBs or TBs in size.  I don't care how long the new volume takes to finish its initial replication, but I want the other replication jobs that are just updating hourly or daily changes to have priority so they stay current.

       

      - Chris

        • Re: Nimble Replication Priority
          Dmitriy Sandler Adventurer

          Chris,

          Are you saturating the replication bandwidth, or hitting any QoS limits? There are multiple replication streams open (3 by default), each corresponding to a different volume collection. Is the new vol-coll the only one that is replicating a large amount of data?

            • Re: Nimble Replication Priority
              Chris Haynes Wayfarer

              Yes, I am using all my available bandwidth.  I’ve studied this a little closer and it is indeed using 3 streams, but one stream, the one being used by the newly added multi-TB volume, is taking the majority of the bandwidth and severely hobbling the other two.  I assume the streams are assigned based on the order the jobs are kicked off (largest lag time).

               

              I am transferring from a source site with a 100M connection to a destination site with 50M, and I have QoS throttled to 25M during business hours M-F and unlimited after hours (12 hrs each) and on weekends, which is a fair amount of bandwidth and probably typical of (or better than) the average customer’s.  I just brought these two sites online with Nimbles a couple months ago, with prod data at both sites that needs to be replicated to the opposite site.  I was able to seed the 1st site’s data by locating the Nimbles side by side on the same LAN, but I couldn’t do the same for the 2nd site after moving the Nimble to its final destination, so I’m left seeding that site’s volumes across the WAN one volume at a time, which is what is causing my problems.

               

              Besides the new large volumes that I need to replicate over the WAN one at a time, I have an already-replicated volume collection containing most of the VMs, with a daily change rate of 50-100GB.  Then I have about a dozen other critical Exchange & SQL volume collections, already replicated, that fire on various hourly & daily schedules; the hourly jobs typically only have MB or low-GB change rates.  So what’s happening is: the one big job doing the initial replication takes up 80-90% of the bandwidth; the 50-100GB daily VM job sits in the #2 stream (insert joke here), gets almost all of the remaining bandwidth, and ends up running almost all day.  That leaves the 3rd stream cycling through all the remaining Exchange & SQL hourly jobs, which are typically only MB in size but get what appears to be <1% of the available bandwidth.  Each of those jobs ends up taking hours to complete, and they continue to queue until they eventually catch up on the weekend, when they can run at the full un-throttled bandwidth from Fri night to Mon morn.
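To put rough numbers on that backlog, here is a sketch using the round figures from the posts above (hypothetical values; real replication throughput will be lower once protocol overhead is factored in):

```python
# Rough transfer-time math for the schedule described above.

def hours_to_transfer(gb, mbps):
    """Hours to move `gb` gigabytes at a sustained `mbps` megabits/sec."""
    bits = gb * 8 * 1000**3           # decimal GB -> bits
    return bits / (mbps * 1000**2) / 3600

# Seeding a hypothetical 2 TB volume at the 50 Mbps off-hours cap:
seed_hours = hours_to_transfer(2000, 50)   # ~89 hours of pure transfer

# A 100 GB daily VM delta at the 25 Mbps business-hours throttle:
daily_hours = hours_to_transfer(100, 25)   # ~8.9 hours

print(f"2 TB seed at 50 Mbps: {seed_hours:.0f} h")
print(f"100 GB delta at 25 Mbps: {daily_hours:.1f} h")
```

At those rates, a single multi-TB seed monopolizing a stream for days at a time is consistent with the starvation described above.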

               

              What I would like to be able to do is give the more critical hourly replication jobs the highest priority and/or the initial replication job the lowest priority.  Otherwise I am going to be behind on all my replication jobs for months as I cycle through each of the large multi-TB volumes that need to replicate across the WAN.  Another nice option would be the ability to seed these volumes via non-Nimble storage, which would circumvent this whole issue to begin with.

               

              - Chris

                • Re: Nimble Replication Priority
                  Malcolm Watson Newbie

                  Chris,

                   

                  Can I ask whether you received an answer to your question from any source? I have just taken delivery of 2 Nimble SANs and in a few weeks will be exactly where you were at the start of your problems, with the same issue.

                   

                  Thanks,

                   

                  Malcolm

                    • Re: Nimble Replication Priority
                      Chris Haynes Wayfarer

                      I haven’t received an answer yet, but after asking a couple of local Nimble engineers, there doesn’t appear to be a solution for this, other than a 3rd Nimble as a “floater” seeding system.  In retrospect, knowing what I know now, I would have done things a little differently at the 2nd site: start with the largest (and least critical) volumes and move those one at a time until they were replicated and caught up, follow with the general VM datastores, and move the most critical volumes (SQL/Exchange) last.  That way I could have kept my most critical volumes replicated on our old storage solution until we cut them over last, and we wouldn’t have had the replication backlog on the Nimble.

                       

                      We also had the luxury of not being in a big hurry to migrate, which may not be the case for you or others.  Also, if your large volumes are also critical volumes, that doesn’t help much either.  Of course after seeing how ridiculously low the latency was on the Nimble, I was chompin’ at the bit to get my heaviest workloads migrated over so everyone would see the benefits of the big purchase we just made, so sure enough, that’s what I did.  Didn’t consider the replication impact of that decision until it was too late.  20/20.  Lesson learned.

                       

                      Bottom line: No solution yet, but let’s hope Nimble is working on it.  Anyone else who would like to see an external replication seeding process and/or replication priority functionality, please comment here, so Nimble is aware of the demand.

                       

                      - Chris

                    • Re: Nimble Replication Priority
                      Jeremy Mooney Newbie

                      My first thought would be to look at the links - are they native at those rates, or delivered as a limited handoff? If it's just 100Mbps and 50Mbps handoffs on, say, a native 1G link with the provider, it might be an option to temporarily upgrade the links for a week or two. Alternatively, if you can control traffic yourself and can switch to burstable service with 95th percentile billing, you could burst for up to 36hrs in a billing cycle (just make sure to limit to 100 or 50Mbps for the rest of the month).
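The 36-hour figure follows directly from the billing math (a sketch assuming a standard 30-day cycle):

```python
# Back-of-the-envelope for the 95th-percentile burst window: the
# provider discards the top 5% of (typically 5-minute) usage samples
# before billing, so about 5% of the cycle can run above the committed
# rate without raising the bill.

hours_in_cycle = 30 * 24              # 720 h in a 30-day billing cycle
free_hours = hours_in_cycle * 0.05    # top 5% of samples are discarded

print(f"Burstable hours per 30-day cycle: {free_hours:.0f}")  # 36
```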

                       

                      If the links must stay as-is, it might be worth examining the replication traffic itself. I'd imagine on links that size there's some traffic shaping device. If replication is implemented as one TCP connection per stream, and a connection lasts the life of a replication job (rather than being reused, or re-established regularly), you may be able to use traffic shaping to give long-lived streams lower priority than the short ones (or cap them at, say, no more than 30Mbps each). That might also let you relax the limits during the day, since other traffic would still take precedence. If streams aren't nicely bundled per-job but are at least broken out to a TCP connection per stream, you could at least enforce some per-connection fairness.
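The shaping idea above could be sketched like this on a Linux box in the path. This is purely illustrative: the interface name, the replication TCP port, and all rates are assumptions, not Nimble-documented values, and the HTB cap applies to the replication class as a whole, with an SFQ child qdisc sharing it fairly among the streams.

```shell
# Hypothetical tc sketch: cap replication traffic at ~30 Mbps while
# letting everything else take precedence on a 50 Mbps link.
DEV=eth0          # placeholder interface
REPL_PORT=4213    # placeholder replication port - verify for your setup

# Root HTB qdisc; unmatched traffic falls into class 1:30.
tc qdisc add dev $DEV root handle 1: htb default 30
tc class add dev $DEV parent 1: classid 1:1 htb rate 50mbit ceil 50mbit

# Replication class: guaranteed 10 Mbps, capped at 30 Mbps.
tc class add dev $DEV parent 1:1 classid 1:10 htb rate 10mbit ceil 30mbit

# Everything else: higher priority, may use the full link.
tc class add dev $DEV parent 1:1 classid 1:30 htb rate 40mbit ceil 50mbit prio 0

# SFQ inside the replication class gives rough per-connection fairness
# among the replication streams sharing the 30 Mbps cap.
tc qdisc add dev $DEV parent 1:10 handle 10: sfq perturb 10

# Steer replication traffic into the capped class by destination port.
tc filter add dev $DEV parent 1: protocol ip u32 \
    match ip dport $REPL_PORT 0xffff flowid 1:10
```

This requires root on a device that sees the replication traffic; whether the streams can be matched cleanly by port is one of the open questions in the thread.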