19 Replies Latest reply: Apr 16, 2015 11:07 AM by Ian Noble

    Pulling the Trigger On A CS300

    pokermunkee Newbie

      Hi all,

       

      This will be my first shared storage system. I did help set up an EqualLogic for a sister company a year ago with a 2 node VMware cluster and it's been working great other than a few bad firmware upgrades.  I have a great VAR that I have worked with for a number of years but want another opinion before I spend a lot of money on my employer's behalf. I've been impressed with the Nimble presentations and don't think there is any other product out there I'd want.  I do think VMware's VSAN is interesting, but too new and expensive to get on board at the moment.


      My current environment is 8x ESXi hosts with about 45 VMs.  Not a lot of I/O intensive apps.  I have 3x physical SQL 2000 servers that I can't upgrade for at least another year.


      I'm a Dell shop and have no reason to change.  I'll be going with 3x Dell R630 servers for my hosts. Probably 320GB each, dual procs with 12 cores each, VMware on mirrored SD cards.

       

      Will buy two HP ProCurve switches dedicated for iSCSI traffic.  My core is a ProCurve 5412zl but will keep iSCSI isolated on its own switches.

       

      Our CS300 will have 24TB raw (12x2TB, 4x160GB SSD).

       

      Now to my questions...

       

      My VAR is recommending we use 1GbE instead of 10GbE since our requirements don't justify the costs of 10GbE....with having multiple 1GbE ports trunked and VMware's MPIO, performance will be fine they tell me.  Is this sound advice or should I bump up to 10GbE?  Could I order my servers with 10GbE but connect to 1GbE ports to future proof?  I believe 10GbE is backward compatible, but is that OK for connecting to the iSCSI switches?  Not sure how much more 10GbE controllers are over the 1GbE ones on Nimble.  Upgrading on Dell servers is less than $1K.

       

      Having all of my eggs in the Nimble basket makes me nervous, won't lie.  I have never had an outage in the 7 years I've managed this environment, all with local storage.  Without replicating to another Nimble, what's the most cost effective way in having a Disaster Recovery Plan in the event of a complete Nimble outage?  I'll probably switch from Unitrends to Veeam for our backup solution and I've heard Veeam has a decent replication feature at no additional cost.  Does anyone use Veeam as their DR plan?

       

      In the event a controller fails or we failover on purpose, is an outage created?  Would SQL or Exchange have a problem?

       

      Is there a single point of failure on a Nimble unit?  I can't find any horror stories online with a unit going completely down.  Has anyone seen one go down?  What spare parts are good to keep onsite?

       

      I think that's about it for now.

       

      Appreciate any feedback!

        • Re: About to pull the trigger on a CS300
          Nick Dyer Navigator

          Hi Pokermunkee, welcome to the forum. I'll try to answer your questions as honestly as possible...

          pokermunkee wrote:

           

          My VAR is recommending we use 1GbE instead of 10GbE since our requirements don't justify the costs of 10GbE....with having multiple 1GbE ports trunked and VMware's MPIO, performance will be fine they tell me.  Is this sound advice or should I bump up to 10GbE?  Could I order my servers with 10GbE but connect to 1GbE ports to future proof?  I believe 10GbE is backward compatible, but is that OK for connecting to the iSCSI switches?  Not sure how much more 10GbE controllers are over the 1GbE ones on Nimble.  Upgrading on Dell servers is less than $1K.

          From my experience, 99% of the time environments don't require 10Gb networking, as performance is mostly random IO and thus bottlenecked on IOPS rather than MB/sec (which is typically associated with sequential IO). The optimal networking configuration for a 1Gb Nimble unit would be to reserve at least one NIC for management per controller (two is best practice for resilience, but not needed if you have a single management switch) - this will leave you with four or five 1Gb ports available for iSCSI data. Using Nimble's MPIO toolsets for VMware and Windows will ensure you get as much of that bandwidth as possible, although throughput will be limited by the number of host-side NICs.
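As a rough illustration of the IOPS-versus-bandwidth point above, here is a back-of-the-envelope sketch in Python. The IOPS figure, the block sizes, and the 90% usable-link efficiency are illustrative assumptions, not Nimble specifications or measurements from this thread, and the function names are made up for the example.

```python
import math

# Back-of-the-envelope check: how much iSCSI bandwidth does a workload imply?
# All workload figures used with these helpers are assumptions for illustration.

def required_mb_per_sec(iops: float, block_size_kib: float) -> float:
    """Throughput (decimal MB/s) implied by an IOPS rate at a given I/O size."""
    return iops * block_size_kib * 1024 / 1_000_000


def link_capacity_mb_per_sec(link_gbps: float, links: int = 1,
                             efficiency: float = 0.9) -> float:
    """Approximate usable payload bandwidth (decimal MB/s) of `links` ports.

    `efficiency` is an assumed allowance for Ethernet/IP/iSCSI overhead.
    """
    return link_gbps * 1_000_000_000 * links * efficiency / 8 / 1_000_000


def links_needed(iops: float, block_size_kib: float, link_gbps: float,
                 efficiency: float = 0.9) -> int:
    """Smallest number of equal-speed links that covers the implied throughput."""
    per_link = link_capacity_mb_per_sec(link_gbps, 1, efficiency)
    return max(1, math.ceil(required_mb_per_sec(iops, block_size_kib) / per_link))


if __name__ == "__main__":
    # Hypothetical random workload: 5,000 IOPS at 8 KiB per I/O.
    print(required_mb_per_sec(5000, 8), "MB/s needed (random, 8 KiB)")
    # Hypothetical sequential workload: same IOPS at 64 KiB per I/O.
    print(required_mb_per_sec(5000, 64), "MB/s needed (sequential, 64 KiB)")
    print(link_capacity_mb_per_sec(1, links=4), "MB/s across four 1GbE ports")
    print(links_needed(5000, 64, 1), "x 1GbE links for the sequential case")
```

Under these assumed numbers the small-block random workload fits comfortably in a single 1GbE port, while the large-block case is where multiple ports (or 10GbE) start to matter. Actual sizing should use measured IOPS and I/O sizes from the environment.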

           

          Having said that, you have a few options available here. Firstly, any Nimble system you purchase (aside from a CS210) can be upgraded from 1Gb to 10Gb non-disruptively in the future should you need to make the step up - although the costs associated with doing so are slightly more than when purchasing upfront. Secondly, you also have the option to buy a Nimble system populated with 10Gb Base-T (RJ45 connectors) which can drop down to 1Gb if needed. This could be a good form of investment protection should you ever want to step up to 10Gb in the future. Dell servers are also available with 10Gb Base-T on the motherboard as an option, to make it even simpler.

          Having all of my eggs in the Nimble basket makes me nervous, won't lie.  I have never had an outage in the 7 years I've managed this environment, all with local storage.  Without replicating to another Nimble, what's the most cost effective way in having a Disaster Recovery Plan in the event of a complete Nimble outage?  I'll probably switch from Unitrends to Veeam for our backup solution and I've heard Veeam has a decent replication feature at no additional cost.  Does anyone use Veeam as their DR plan?

          I'll let other forum users comment on their direct experiences with Veeam; however with Nimble we have no single point of failure, so anything that can go wrong (disks, SSDs, fans, controllers, PSUs) is automatically protected through resiliency, and using our InfoSight data analytics engine we can proactively spot and divert these issues before they become a problem. This is also true of firmware and software on the arrays.

           

          There certainly is a need for backups, and either Veeam or Unitrends are good options. With the latter there's a toolset called ReliableDR which actually integrates into Nimble, and can provide VMware SRM-like functionality for DR and failover... this does require Nimble replication so you will need two Nimble systems (don't have to be the same models though).

          In the event a controller fails or we failover on purpose, is an outage created?  Would SQL or Exchange have a problem?

          We've designed the Nimble platform to be more intelligent than standard active/active or active/passive solutions, so as long as the Nimble MPIO toolkit is installed within VMware and Windows, no outages will be registered within VMware or any Windows apps such as SQL and Exchange. In fact during these types of failovers (even when we perform firmware upgrades) we'll typically register anywhere between 3-10 data packets dropped before connectivity resumes. This is a standard test during any POC or installation we deliver.

          Is there a single point of failure on a Nimble unit?  I can't find any horror stories online with a unit going completely down.  Has anyone seen one go down?  What spare parts are good to keep onsite?

          Again, I'll leave this question for our end users to answer (my customer feedback has always been very positive). Things that can fail are drives, SSDs, fans, power supplies etc (as previously mentioned). If you have a 4 hour SLA for support then that should cover you for parts onsite within a short period of time... however it's always a good idea to keep a spare drive and SSD on the shelf should you want to replace the component quicker.

          • Re: About to pull the trigger on a CS300
            christoph.berthoud@vista.co Wayfarer

            G'day!

             

            Your story sounds exactly like mine. We just purchased and installed a CS300 (36TB + 3.2TB), 10G iSCSI and 4x R630 VMware hosts all at 10G.

             

            We already had a pair of Extreme Networks X670V switches (stacked @ 160Gbps) so we had plenty of 10G available. While Nick is correct that most traffic will probably be fine on 1Gb, where 10G truly shines is vMotion (& storage vMotion when migrating data onto the Nimble).

             

            Yes, 10G is backwards compatible to 1Gb so I'd highly recommend making the host NICs 10G from day 1. You'll have to talk to your sales rep about the cost for 10G vs 1Gb cards in the Nimble controller; again, if the cost is minimal I'd suggest doing this from day 1 (having leftover parts from upgrades is just another pain). This will just leave you to purchase some 10G switches further down the track.

             

            We have CommVault already (this was doing our backups & replication). Had I not managed to convince the boss to buy CommVault, Veeam was our next choice

             

            Like you, I was very conscious of the 'whole eggs in one basket' concern but many, many, many hours of research convinced me I'd be fine. We're planning the purchase of another CS300 for a remote office (and replication site) now.

             

            Can't comment on the spare parts, haven't had anything go wrong yet

            • Re: About to pull the trigger on a CS300
              valdereth Adventurer

              pokermunkee wrote:

               

              My VAR is recommending we use 1GbE instead of 10GbE since our requirements don't justify the costs of 10GbE....with having multiple 1GbE ports trunked and VMware's MPIO, performance will be fine they tell me.  Is this sound advice or should I bump up to 10GbE?  Could I order my servers with 10GbE but connect to 1GbE ports to future proof?  I believe 10GbE is backward compatible, but is that OK for connecting to the iSCSI switches?  Not sure how much more 10GbE controllers are over the 1GbE ones on Nimble.  Upgrading on Dell servers is less than $1K.

               

              I've found a lot of customers are stepping up to 10GbE switching when purchasing Nimble arrays.  Depending on your vSphere licensing and Host config you could significantly cut down on the number of cables a Host requires with a dual-port NIC or two.  In the past I've dealt with mixed 1GbE and 10GbE iSCSI and I'll be honest, it always left me suspicious when looking into latency or networking issues. I really wish I'd had consistent bandwidth (all 1GbE or all 10GbE) just to rule that out.

               

              Having all of my eggs in the Nimble basket makes me nervous, won't lie.  I have never had an outage in the 7 years I've managed this environment, all with local storage.  Without replicating to another Nimble, what's the most cost effective way in having a Disaster Recovery Plan in the event of a complete Nimble outage?  I'll probably switch from Unitrends to Veeam for our backup solution and I've heard Veeam has a decent replication feature at no additional cost.  Does anyone use Veeam as their DR plan? 

               

              In the event a controller fails or we failover on purpose, is an outage created?  Would SQL or Exchange have a problem?

               

              Is there a single point of failure on a Nimble unit?  I can't find any horror stories online with a unit going completely down.  Has anyone seen one go down?  What spare parts are good to keep onsite?

               

              Unitrends and Veeam both have the ability to create NFS shared storage from their repositories so you can fire up protected VMs immediately without having to restore them to new storage first.  Obviously you sacrifice some performance in order to gain this ability but it's a lifesaver when you find yourself in that situation.  I've used the feature more for testing and running through mock DR scenarios to ensure the recoverability of my VMs - always a good idea to be prepared for a failure ahead of time!

               

              Unitrends and Veeam both have replication features built into the base products but Veeam has a WAN accelerator that can be utilized with their Enterprise Plus licensing.

               

              In my experience Nimble has provided the fastest controller failovers of all the arrays I've worked with.  vSphere, MSCS, SQL, and Exchange have all continued to function with no hiccup when failing over controllers during controller maintenance.  I think the key here is that you're following Nimble best practice guides and utilizing Nimble's integration kits when appropriate.

                • Re: Pulling the Trigger On A CS300
                  pokermunkee Newbie

                  Thanks everyone.

                   

                  I pulled the trigger and signed the PO today.  Went with recommended 1GbE to save around $10K. 

                   

                  This is what I have coming:

                  1x CS300 24TB Raw

                  3x R630 (8x1GbE, 2x 12 Core 2.5GHz, 256GB, dual SD card)

                  ESXi 5.5 Essentials Plus for 6 procs

                  2x ProCurve 2920-24G for iSCSI network

                   

                  I'm excited to get rid of all of my various ESXi hosts and have a fast, redundant, and compact system.  Can finally start migrating off Server 2003 and not have storage issues.

                    • Re: Pulling the Trigger On A CS300
                      valdereth Adventurer

                      Congrats!

                       

                      Sounds like you've got some projects to look forward to - good luck on those!

                       

                      In the meantime I'd recommend reading up on Nimble best practice guides since it sounds like you'll have some fresh Switches and Hosts to work with - your VAR or Nimble SE should be able to get some material together for ya. 

                  • Re: Pulling the Trigger On A CS300
                    Mark Harrison Adventurer

                    I've deployed 5 Nimble arrays for an organisation we provide IT support to. The first 2 arrays went in 2 years ago and we have done 5 upgrades on these. Absolutely flawless and no outage whatsoever.

                    10GbE has been excellent for our Commvault backup solution. We replicate and have a CoRaid NAS appliance on 10GbE also. Replicated snaps are kept for 7 days and CoRaid keeps 40 days of backups. Monthly to tape for long term retention. Nimble support is OUTSTANDING.

                    • Re: Pulling the Trigger On A CS300
                      pokermunkee Newbie

                      She's racked and running!  I'm still wiring up switches, cables, labeling, etc.  Taking a long time!

                       

                      I have a consultant coming out for 3 days at the first of Feb. to help me get everything set up and working the right way.

                       

                      The fans are spinning pretty fast (loud), is this normal?  Can't find anything in the web GUI about temperature/RPM.

                        • Re: Pulling the Trigger On A CS300
                          Mark Harrison Adventurer

                          The temperature is under MANAGE: ARRAYS: then select your array and you will find a Temp icon. Just select that and you will see the current temperature of the array controller or any extension shelves if you have any.  Hope this helps.


                          Kind Regards,

                                                 Mark.

                            • Re: Pulling the Trigger On A CS300
                              pokermunkee Newbie

                              Thanks, found it.  Everything is GREEN.

                               

                              Going to create a ticket.  Too loud for my taste, server room is across room from our offices and is overbearing.

                               

                              Server room temp is 70-72.

                              Fans are at 13K RPM!  My Dell servers are under 3K RPM.

                               

                              Array is saying 35C on mb and 22C on back plane.
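Since the room temperature above is in Fahrenheit and the array readings are in Celsius, a quick conversion puts them on the same scale. This is a minimal sketch; it assumes the "70-72" room figure is degrees Fahrenheit, and the function names are just for illustration.

```python
def f_to_c(deg_f: float) -> float:
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32) * 5 / 9


def c_to_f(deg_c: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return deg_c * 9 / 5 + 32


if __name__ == "__main__":
    # Room temperature range reported above, assumed to be Fahrenheit.
    for room_f in (70, 72):
        print(f"{room_f}F room = {f_to_c(room_f):.1f}C")
    # Array sensor readings reported above, in Celsius.
    for label, reading_c in (("motherboard", 35), ("backplane", 22)):
        print(f"{label}: {reading_c}C = {c_to_f(reading_c):.1f}F")
```

On that assumption the backplane reading is essentially at room ambient, and the motherboard sits a little over ten degrees Celsius above it.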

                               

                              Anyone know how I can make it stop sounding like a jet engine?

                          • Re: Pulling the Trigger On A CS300
                            pokermunkee Newbie

                            Support had me update to the latest software (2.3).

                             

                            Fans are still at 13K RPM.

                             

                            What RPM do you guys see on the fans?  I would think they should be around 2-3K?

                              • Re: Pulling the Trigger On A CS300
                                valdereth Adventurer

                                Here are the stats from our Demo array (low average I/O); it's a CS210 in a room with an ambient temp of around 75F.  Our production array, a CS460G, is right around the same temp/fan speed but with more average IOPS and a cooler temp in its data center:

                                 

                                Motherboard temp: 31C

                                Backplane temp:     40C

                                Fans all range between 10K-11K RPM

                              • Re: Pulling the Trigger On A CS300
                                pokermunkee Newbie

                                Thanks.  I've never had a device that made this much noise when temps were at 70F.  I suppose 10-13K RPM is normal then.  I'm still going to see if the RPMs can be lowered.  All of my servers have run at 2-3K RPM.  Not sure what other SANs run at, but don't recall an EqualLogic PS4100 being this noisy.

                                • Re: Pulling the Trigger On A CS300
                                  pokermunkee Newbie

                                  CS300 in its new home.  Fans are normal at 10-12K, will get used to the new noise.  I did order some acoustic sound foam to mount on the wall directly behind the rack.  Will be interesting to see if that helps.

                                   


                                  • Re: About to pull the trigger on a CS300
                                    Ian Noble Wayfarer

                                    We required new switching for the project we were doing when we bought Nimble, so we got 10gb switching and upgraded our hosts to 10g at the same time.

                                     

                                    We got a really good deal on some Nexus 9372TXs as they were our first Nexus purchase, so they weren't much more than a good 1Gb switch and give us good investment protection.