6 Replies Latest reply: Nov 30, 2015 12:32 AM by Walter Van Hoolst

    Nimble Storage Throughput Calculation

    An Tran Thanh Linh Wayfarer

      I have a question about how exactly to calculate CS-Series throughput. I mean the throughput across all layers of the storage stack (back-end disks, network throughput).

      For example: how much throughput do we get from a CS300 with 6x1GbE ports (with a large 64KB block size and with a small 4KB block size)?

      With NL7k2 + CASL technology --> the CS300 delivers 30,000 IOPS at a 4KB block size, so it will have 120,000 KB/s ~ 117 MB/s of throughput.

      Its network has 6x1GbE ports, so it will have a total of 6GbE ~ 500 MB/s of throughput.

      As a result, we only get the lower of the two, around 117 MB/s. Is that correct?

      If we upgrade the network to 2x16Gb FC ports, how much will the throughput increase?


        • Re: Nimble Storage Throughput Calculation
          Mitch Gram Adventurer

          The calculations for throughput are really universal regardless of storage platform or protocol: IOPS x block size = throughput. You are correctly pointing out that at 30,000 IOPS using a 4K block size you will drive roughly 1 Gbps / 120 MBps. If you increase the block size of your test to 8K, the throughput will roughly double based on the formula, provided the array sustains the same IOPS.
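The formula above can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and it assumes the thread's rounding of 1 MB = 1000 KB (which is how 30,000 IOPS at 4K becomes 120 MBps).

```python
def throughput_mb_per_s(iops: float, block_size_kb: float) -> float:
    """Throughput in MB/s from IOPS and block size.

    Assumes 1 MB = 1000 KB, matching the rounding used in the thread.
    """
    return iops * block_size_kb / 1000.0


if __name__ == "__main__":
    # Same IOPS at 4K and 8K: the throughput doubles with the block size.
    for block_kb in (4, 8):
        mb_s = throughput_mb_per_s(30_000, block_kb)
        print(f"30,000 IOPS at {block_kb}K: {mb_s:.0f} MB/s")
```

The doubling only holds while the array and host can sustain the same IOPS at the larger block size.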


          I think your questions might be more about what the limiting factor is. Assuming that is true, let's discuss it further. First, you need to know that you cannot simply change a CS300 initially configured with iSCSI interfaces into an FC array just by changing the cards in the PCI interface slots. At initialization time, during production of the array, the OS determines the available hardware in the array and sets the system to optimize the use of the interfaces that were enumerated. So, if you want to run iSCSI, order the array with Ethernet interfaces, and if your infrastructure is already FC, order the array as FC.


          That said, I have tested two arrays side by side, one of each protocol type, with identical compute running the test sets, and after confirming that the switches serving as the storage fabric, whether Ethernet or FC, were not a limiting factor for performance. I observed that small 4K/8K block IO performed better on iSCSI than on FC, while sequential IO at 64K block size and above performed better on FC. In any case, the delta between the two was normally less than 10%.


          Since most IT infrastructure environments have a mix of IO in all different block sizes with both random and sequential patterns, I am of the opinion that the choice of protocol does not matter in the big scheme of things.


          Unlike other storage technologies, where the limiting factor is the performance of the individual spinning disks, Nimble's limiting factor is really the number of CPU cores you have available to do the work. As such, if you are looking to get more than 120 MBps of 4K block IO, what you need to look at is upgrading the controllers to the CS500 or CS700 series, or clustering multiple CS arrays together. Changing the protocol will not make any appreciable difference.

            • Re: Nimble Storage Throughput Calculation
              An Tran Thanh Linh Wayfarer

              I did not mean to compare the FC and iSCSI protocols.

              My point is that even though 16Gb FC or 10Gb iSCSI offers more bandwidth than 1Gb iSCSI, we still only get 120 MBps of 4KB block IO.

              So why would we choose 16Gb FC or 10Gb iSCSI on a CS array when it is more expensive than 1Gb iSCSI? Why does Nimble build 16Gb FC into the CS-Series rather than 8Gb/4Gb FC? And why do we need more bandwidth every year (maybe Nimble will offer 40Gb InfiniBand in the future) if all of it is limited by IOPS in the end (120 MBps at 4KB in my example)?

              Please explain this in more detail for me.

              Thanks a lot

                • Re: Nimble Storage Throughput Calculation
                  Mitch Gram Adventurer

                  A CS300 with 6 x 1Gb ports has the capability of delivering more than 30,000 IOPs at 4K block size.    I have personally tested the CS300 model doing in excess of 60,000 4K random read IOPs.  


                  If you deployed the array in a "flat network" configuration where all 6 x 1Gb interfaces were in the same subnet, you could achieve roughly 120 MBps x 6, or 720 MBps, maximum throughput. The limiting factor on IOPS performance from a Nimble array is primarily CPU. With 2 x 4-core processors per controller, the CS300 can be expected to deliver 30K-60K IOPS at 4K, depending on your mix of reads, writes, random, and sequential data patterns. So, since bandwidth = IOPS x block size, you can expect 120 MBps to 240 MBps at 4K block size. Thus, if you only had  interfaces assigned to data, this would be more than enough to serve the bandwidth. If you did the same test at 8K block size, the bandwidth realized would double to 240 MBps to 480 MBps, which could be accommodated by provisioning 4 x 1Gb paths between the storage and the hosts. If you have all 6 interfaces provisioned to carry iSCSI traffic, you will max out at roughly 720 MBps if your block size is large enough. To exceed 720 MBps, you would need to switch to 10Gb iSCSI interfaces. If the interfaces are large enough, you will find that the CS300 tops out at roughly 1100 MBps. Therefore, if you are going to run larger block IO, you would be well advised to provision the array with 10Gb or FC interfaces.
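The reasoning in this reply amounts to taking the smallest of three limits. Below is a rough Python sketch of that idea, using the figures quoted here (about 120 MBps per 1Gb link, roughly 1100 MBps array maximum for the CS300). The function and parameter names are hypothetical, and treating the IOPS ceiling as constant across block sizes is a simplifying assumption, not a measured property of the array.

```python
def effective_throughput(iops_ceiling, block_kb, links,
                         link_mb_s=120.0, array_cap_mb_s=1100.0):
    """Return (MB/s, limiting factor) as the smallest of three limits.

    link_mb_s and array_cap_mb_s default to the rough figures quoted in
    the thread; iops_ceiling is assumed constant across block sizes,
    which is a simplification.
    """
    limits = {
        "controller IOPS": iops_ceiling * block_kb / 1000.0,
        "network": links * link_mb_s,
        "array maximum": array_cap_mb_s,
    }
    name = min(limits, key=limits.get)
    return limits[name], name


if __name__ == "__main__":
    # 6 x 1Gb links, 60,000 IOPS ceiling, growing block size.
    for block_kb in (4, 8, 16, 64):
        mb_s, why = effective_throughput(60_000, block_kb, links=6)
        print(f"{block_kb:>2}K: {mb_s:6.0f} MB/s, limited by {why}")
```

With small blocks the controller is the limit; once the block size grows, the 6 x 1Gb network takes over, which is the case for ordering 10Gb or FC interfaces.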


                  Nimble FC interfaces are 16Gb by default, but they auto-negotiate to 8Gb and 4Gb based on the FC switch infrastructure they connect to. So, 2 FC ports connected to an 8Gb FC switch would provide more bandwidth than the CS300 can theoretically deliver.


                  If your goal is to get more than the 30K-60K IOPS noted above, you would want to look at the CS500 with 16 cores per controller or the CS700 with 20 cores per controller. These arrays are capable of nearly 3X and 4X the performance of the CS300, respectively. For example, a CS500 running a mixed workload of reads, writes, sequential, and random IO at 4K block size can drive in excess of 90,000 IOPS. At 4K, this would be about 360 MBps, so it would still fit into a data pipe consisting of 3 x 1Gb interfaces.
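The interface sizing in this reply (3 x 1Gb for 90,000 IOPS at 4K) can be reproduced with a small helper. This is a minimal sketch: the function name is hypothetical, the 120 MBps per 1Gb link figure is the rough value used in the thread, and no headroom or path redundancy is included.

```python
import math


def links_needed(iops, block_kb, link_mb_s=120.0):
    """Minimum number of links that can carry iops at block_kb.

    link_mb_s defaults to the thread's rough figure for one 1Gb link.
    No headroom or redundancy is added.
    """
    required_mb_s = iops * block_kb / 1000.0
    return math.ceil(required_mb_s / link_mb_s)


if __name__ == "__main__":
    print(links_needed(90_000, 4))  # CS500 example from the thread
    print(links_needed(60_000, 8))  # CS300 upper estimate at 8K
```

In practice one would likely provision more paths than this minimum, for example for multipathing.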

                    • Re: Nimble Storage Throughput Calculation
                      An Tran Thanh Linh Wayfarer

                      Mitch Gram, I read in some documents that when the block size increases, the IOPS decrease as well.

                      E.g.: the CS300 will have 30,000 IOPS at 4K block size --> 120 MBps

                      But shouldn't the CS300 do only 15,000 IOPS at 8K block size, or 7,500 IOPS at 16KB block size, so that the throughput stays the same as above?

                      How do you calculate that the throughput will reach 240 MBps at 8K block size or 480 MBps at 16K block size?

                      Please let me know


                        • Re: Nimble Storage Throughput Calculation
                          Mitch Gram Adventurer

                          Wayfarer host, I observed about 36,000 IOPS and ~145 MBps at 4K block size. At 8K block size, still about 36,000 IOPS and 290 MBps. At 16K block size, about 24,000 IOPS and 388 MBps. In the first two tests, the IOPS limit is likely the host driving the test or the CS300 controllers, since we had not consumed all the available pipe in the network fabric. In the third test, the IOPS likely dropped off because we approached the pipe size.


                          Basically, calculating max IOPs in an environment with Nimble Storage is most often a function of the controller model, as long as your application and its hosts are capable of driving it hard enough.   If you hit the maximum bandwidth of the connectivity between the storage array and the application host, the max IOPs you can achieve will be bandwidth / block size.
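The bandwidth-bound case described above (max IOPS = bandwidth / block size) can be sketched as follows. The function name is hypothetical and the 1 MB = 1000 KB rounding is an assumption carried over from the rest of the thread.

```python
def max_iops_network_bound(bandwidth_mb_s, block_kb):
    """IOPS ceiling once the link between array and host is saturated.

    Assumes 1 MB = 1000 KB, as in the rest of the thread.
    """
    return bandwidth_mb_s * 1000.0 / block_kb


if __name__ == "__main__":
    # 6 x 1Gb links at roughly 120 MBps each, per the earlier reply.
    for block_kb in (16, 64):
        iops = max_iops_network_bound(720, block_kb)
        print(f"720 MB/s at {block_kb}K: {iops:,.0f} IOPS")
```

Applied to the 16K observation above, 388 MBps / 16K works out to about 24,250 IOPS, in line with the roughly 24,000 IOPS reported.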


                          I hope this helps.

                          • Re: Nimble Storage Throughput Calculation
                            Walter Van Hoolst Adventurer

                            Hello An Tran Thanh Linh,


                            Your assumption that at 8KB a CS300 can only do 15,000 IOPS and at 16KB only 7,500 is not 100% correct. The relationship is not 100% linear, and overhead will be lower with bigger block sizes.

                            When your CS300 has 10Gb or 16Gb connectivity, we can push it over 1000 MB/sec on reads and 875 MB/sec on writes.