4 Replies. Latest reply: Apr 21, 2016 7:12 PM by Tien Lee

    Latency metrics

    Andrew Rolt Wayfarer

      In the Nimble GUI and in InfoSight, do the average read/write latency numbers include performance data from all volumes, including non-cached ones?  I'm seeing some very high average read times (e.g. 30+ ms), and it's taking a lot of work to track down where the high latency is coming from.  I'm assuming the average covers the whole array, which means it doesn't tell me much when troubleshooting high latency like that?  I know that my cache is also a bit under-provisioned, so I'm looking at upgrading that.

      Overall, my IOPS, throughput, and CPU are very low (CS210), but my average latency is higher than I think it should be.

       

      For example, I have an Exchange 2007 VM with 3 volumes/datastores attached: OS, DB and logs.  The logs volume is uncached, using the "exchange log" policy.  InfoSight tells me that the average latency on this VM is 9 ms.  But is that solely because of the uncached logs volume?  I have to drill down into each volume to determine that.

       

      I'm not sure why I'm writing this post.  Perhaps to ask whether it's possible to see latency performance for cached volumes only?  We all know that uncached volumes are going to have high latency.  Those volumes are throwing off the performance data that we all care about and making the overviews and averages much less useful.

        • Re: Latency metrics
          Scout

          What OS are you on?  In the 2.2 code release, Nimble changed the caching method to catch issues like this and increase performance.

            • Re: Latency metrics
              Andrew Rolt Wayfarer

              I'm on the latest OS; 2.3.9.2-303808-opt

                • Re: Latency metrics
                  Tien Lee Wayfarer

                  Hi Andrew,

                   

                  I believe you may be experiencing multiple problems; please engage support to assist you.  The array is experiencing a shortage of cache, and there is possibly also a network/host-side issue.

                   

                  VMVision shows that VMware sees the datastore of the top-latency VM in question as consistently latent.  If you look at the same volume's performance on the array for the same period, you may see that the array isn't performing great due to the cache shortage, but that should not account for the consistently latent behavior on that datastore.  If this is true, then there is an additional cause of latency beyond the storage array.  I saw an increase in flow control packets as well as retransmits in the array's nightly autosupport, which makes me think that other factors are involved (switch or host).  Over WebEx or a support engagement, we should be able to nail this down.

                   

                  We are also eager to hear any feedback and to understand how end users are using InfoSight and which features they wish to see.  All data in the InfoSight VMVision section are counters from the VMware hosts; we enhance them further and present the graphs alongside our array numbers.  You are absolutely correct that the array itself reports latency across the board: all ops are timed, counted and calculated.  I would not worry about log volume latency, since the majority of its I/O should be writes rather than reads, and when it is read, the reads are sequential.  This should not skew the overall latency enough to give an inaccurate reflection.
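
                  As a rough illustration of why a low-read volume has limited pull on an array-wide number, here is a minimal sketch of an op-weighted average.  This is an assumption about how such an average behaves in general, not Nimble's actual calculation; the VolumeStats type, the volume names and all of the figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VolumeStats:
    # Hypothetical per-volume counters for one sample window.
    name: str
    cached: bool
    read_ops: int                # reads completed in the window
    avg_read_latency_ms: float   # mean latency of those reads


def weighted_read_latency(volumes):
    """Op-weighted mean read latency in ms, or None if there were no reads."""
    total_ops = sum(v.read_ops for v in volumes)
    if total_ops == 0:
        return None
    total_time_ms = sum(v.read_ops * v.avg_read_latency_ms for v in volumes)
    return total_time_ms / total_ops


# Made-up numbers: the uncached log volume is slow but serves few reads.
volumes = [
    VolumeStats("exch-os", cached=True, read_ops=2000, avg_read_latency_ms=4.0),
    VolumeStats("exch-db", cached=True, read_ops=7500, avg_read_latency_ms=6.0),
    VolumeStats("exch-logs", cached=False, read_ops=500, avg_read_latency_ms=30.0),
]

overall = weighted_read_latency(volumes)
cached_only = weighted_read_latency([v for v in volumes if v.cached])

print(f"all volumes:  {overall:.2f} ms")
print(f"cached only:  {cached_only:.2f} ms")
```

                  With per-volume read counts and latencies exported from the array, the same filter gives a cached-only figure to compare against the overall average.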

                   

                  I hope this helps. 

                   

                  Tien

              • Re: Latency metrics
                Jonathan Disley Wayfarer

                Hi Andrew,

                 

                You mention datastore volumes in your question, so I'm hoping that means you're running a VMware environment?  If so, have you enabled the 'VMVision' feature within InfoSight (via Administration->VMVision)?  Using VMVision you can pull up quick reports to show the top 10 VMs by latency over the last 24 hours, or show a 'heatmap' that displays all your VMs by IO activity and latency.  You can then select an individual VM to drill down on and see the latency for the datastore, and even for each individual VMDK that's attached to the VM.  Nick Dyer posted a 'how to' guide a while ago on enabling VMVision that might be of use if you haven't already got it turned on: Nimble Storage VMVision Per-VM Monitoring is LIVE!

                 

                Hope this helps!

                Jonathan