This last week, I visited a company whose particular problem was understanding how to improve the performance of a particular application. One core application had been virtualised and wasn't performing particularly well for them; in fact, performance seemed to have steadily degraded over time. Their old legacy storage system had come bundled with software to monitor their storage arrays and report on the virtual infrastructure, but it was complex to set up and understand, and though deployed it was not readily used. The customer's initial decision was to throw infrastructure at the problem: more memory, faster CPUs, a 10GbE network and fast storage (which is what brought us to his attention!). I spent a while understanding his challenges and quickly decided to show him the 'Per VM Monitoring' feature that Nimble's InfoSight team have been working on and are currently rolling out across our customer install base.
Those who follow the blog-o-sphere may have come across an InfoSight feature called 'Per VM Monitoring' back at Storage Field Day 6; the video of that session and its introduction is here. There is also a great overview and whiteboard from Rod Bagg (Nimble VP of Support) and George Crump (Storage Switzerland Analyst) available here. Our InfoSight Engineering team have been hard at work refining that concept, taking the early customer feedback and evolving its capability, and I thought I'd share this with you here.
How do I enable this feature?
Firstly, in order to run Per-VM Monitoring, you need to purchase - nothing (it's part of the standard support package)!
Next, you need to install - nothing! (Just register your vCenter credentials with the Nimble array - in fact, if you have the vSphere plugin running, you have already done this!)
Finally, you need to set up - you guessed it - NOTHING! That assumes your array is sending data home and is running 2.2.6 (which is true for the majority of Nimble customers).
The capability is based on an opt-in mechanism, so you will see more details on how to enable it when you log in to InfoSight after the next major update. (We are in the final stages of testing with a subset of customers.)
Note: Please do not call Nimble Support to ask them to enable this, as they are unable to do so. You will see the instructions on how to opt in as soon as the feature is available.
So what does it look like?
Nimble customers who are already familiar with InfoSight will notice a few improvements in the recent update, along with changes to some menu options. When the feature is enabled, you will see a Virtual Environment option underneath the Manage tab:
Selecting this will take you to the registered vCenter plugin (the data source from which we collect virtual infrastructure information). Below we can see the two vCenter instances we are polling in one of Nimble's internal environments. Expanding the vCenter tabs shows the Datacentres, ESX Nodes and VMs in each of the vCenter servers. These can be navigated to individually and reported on, or you can select the higher-level objects and run reports on those:
The above view shows the selected Datacentre, HQ, with the right-hand view showing the performance of all the hosts in that Datacentre. We can also see there are more reports available:
- Host Activity - reports on the busiest hosts during the last period
- Top VMs - shows the busiest VMs in the Datacentre over the last 24 hours by IOPS and latency
- Inactive VMs - shows which VMs have been dormant and are therefore candidates for clean-up to recoup resources
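To make the Top VMs and Inactive VMs reports a little more concrete, here is a minimal Python sketch of the kind of ranking and filtering they imply. This is not InfoSight's implementation: the `VMStats` record, the `top_vms` and `inactive_vms` helpers, the 1.0 IOPS inactivity threshold and the sample figures are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class VMStats:
    # Hypothetical per-VM summary; field names are illustrative only.
    name: str
    iops_24h: float     # average IOPS over the last 24 hours
    latency_ms: float   # average latency over the same window


def top_vms(vms, n=3):
    """Return the n busiest VMs, ordered by IOPS, then by latency."""
    ranked = sorted(vms, key=lambda v: (v.iops_24h, v.latency_ms), reverse=True)
    return ranked[:n]


def inactive_vms(vms, iops_threshold=1.0):
    """Return VMs whose average IOPS fell below an assumed threshold."""
    return [v for v in vms if v.iops_24h < iops_threshold]


# Made-up sample data, not real customer figures.
vms = [
    VMStats("sql-01", 4200.0, 3.1),
    VMStats("web-01", 850.0, 1.2),
    VMStats("test-old", 0.0, 0.0),
    VMStats("exchange-01", 2100.0, 9.8),
]

top_names = [v.name for v in top_vms(vms, n=2)]
inactive_names = [v.name for v in inactive_vms(vms)]
print(top_names)
print(inactive_names)
```

In practice, what counts as "dormant" would depend on the reporting window and on what the real report measures, so treat the threshold as a placeholder.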
I'm going to focus on the Datastore Treemap view, as it was the capability that particularly impressed the company I spoke about at the beginning of this blog.
Clicking on the Datastore Treemap view displays a tree/heat map of all the datastores in that Datacentre:
Each square in the screenshot above denotes a datastore. The bigger the square, the more IOPS that datastore has seen over the last 24 hours; a smaller square means fewer IOPS. The colour represents latency: a blue square means all VMs have been showing low latency, while a red square means a VM has been experiencing abnormal latency and therefore ought to be investigated. Hovering the mouse over the squares reveals the underlying figures.
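The size-by-IOPS, colour-by-latency idea can be sketched in a few lines of Python. This is only an illustration of the concept, not how InfoSight builds its treemap: the `treemap_tiles` function, the fixed 10 ms threshold for "abnormal" latency and the datastore names are all assumptions.

```python
def treemap_tiles(datastores, latency_threshold_ms=10.0):
    """Turn per-datastore stats into (name, area_share, colour) tiles.

    datastores maps a datastore name to (iops_24h, worst_vm_latency_ms).
    The threshold is an assumed cut-off for "abnormal" latency.
    """
    total_iops = sum(iops for iops, _ in datastores.values())
    tiles = []
    for name, (iops, worst_latency) in datastores.items():
        # Tile area is proportional to the datastore's share of total IOPS.
        share = iops / total_iops if total_iops else 0.0
        # One VM with abnormal latency is enough to turn the tile red.
        colour = "red" if worst_latency > latency_threshold_ms else "blue"
        tiles.append((name, round(share, 2), colour))
    # Largest tiles first, as a treemap layout would typically place them.
    return sorted(tiles, key=lambda t: t[1], reverse=True)


# Made-up sample data.
datastores = {
    "ds-prod": (6000.0, 25.0),
    "ds-dev": (3000.0, 2.0),
    "ds-archive": (1000.0, 4.0),
}
tiles = treemap_tiles(datastores)
for tile in tiles:
    print(tile)
```

A real heat map would more likely use a colour gradient than a two-colour cut-off; the binary version just keeps the sketch short.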
In this case, we'll click on the red square to see which VM is in trouble. Clicking on the red datastore opens it up to show the VMs:
We now get to see the same view, but for the VMs that are hosted on that datastore. Hovering over a square shows us the IOPS and average latency for each VM:
Looks like this particular VM is in trouble, so let's drill down and look at it in more detail. Clicking the VM now gives us a historical view of that VM's performance over time. Mousing over the charts shows us the VM's latency and also which resource was contributing to that latency: host, network or storage:
We can see from the above graph that storage latency has been fine and that the host is in fact the major culprit.
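The reasoning behind that conclusion is simply a comparison of how much each layer contributes to the VM's total latency. As a rough sketch (the sample values, the `dominant_latency_source` helper and the three-way host/network/storage dictionary are assumptions for illustration, not InfoSight's data model), it might look like this in Python:

```python
def dominant_latency_source(breakdown):
    """Return the component contributing the most latency.

    breakdown maps a component name to its latency contribution in ms.
    """
    return max(breakdown, key=breakdown.get)


# Made-up latency samples (ms) for one VM at three points in time.
samples = [
    {"host": 18.0, "network": 1.5, "storage": 0.9},
    {"host": 22.0, "network": 1.1, "storage": 1.0},
    {"host": 2.0, "network": 0.8, "storage": 1.2},
]

# Sum each component across the window to see where the time went overall.
totals = {
    component: sum(sample[component] for sample in samples)
    for component in ("host", "network", "storage")
}
culprit = dominant_latency_source(totals)
print(culprit)
```

Note that the third sample on its own would point at the host too, but only just; looking at the whole window rather than a single point is what makes the host stand out clearly.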
You can also click the icon in the bottom left. InfoSight will then ask you to pick a time from the graph and will analyse the performance of the VM and its neighbouring VMs at that time, so that (historically) competing workloads can be identified and managed:
As you can see, this is all based on the historical data that InfoSight has polled, and it allows InfoSight to provide visibility into how to optimise the infrastructure and plan for future changes. I'm really looking forward to seeing how we develop this in the future to do further event correlation, similar to some of the analytics we have done in Storage reports to show the impact of software upgrades and infrastructure changes.
You can also imagine how excited the customer I spoke to was to see this functionality... the term he used was 'Game changer'.
I've included a video demonstration of the above below: