11 Replies Latest reply: Aug 12, 2016 3:22 PM by Patrick Gavin

    Nimble storage monitoring with check_mk or snmp

    Manuel Rampp Newbie

      Hello everyone,

       

      How can I monitor Nimble storage with check_mk?

      Is there a plugin for check_mk, or is SNMP the only option?

      Can I query more than interfaces over SNMP, for example volumes?

       

      Best regards,

      Manuel

        • Re: Nimble storage monitoring with check_mk or snmp
          Patrick Gavin Newbie

          I would love to see an answer to this.

           

          InfoSight is okay, but I would prefer to get that (and more) telemetry into check_mk where I can really work some magic on it.

           

          Probably the only course is to write a custom check_mk snmp plugin. I haven't had time to look at the MIBs.

           

          -P

          • Re: Nimble storage monitoring with check_mk or snmp
            Rick van Vliet Adventurer

            Hi Manuel,

             

            You can monitor per-volume performance (IOPS and so on) as well as the overall performance of the Nimble array itself.

             

            I use Zabbix and have never worked with check_mk, but with Zabbix I can check all volumes using this OID: .1.3.6.1.4.1.37447.1.2.1.3

            In Zabbix you can create a discovery rule that uses the result of this OID as an index to collect all the other data. I'm not sure whether check_mk can do that too; otherwise you can list all your volumes by running an snmpwalk and working out which OID belongs to which volume.
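For the manual snmpwalk route, a small script can do the matching of OID suffix to volume name. This is only a sketch: it assumes numeric-OID output such as `snmpwalk -On` prints, and the sample lines, index values and volume names below are made up for illustration, not taken from a real array. The helper name `volume_index_map` is hypothetical as well.

```python
import re

# OID of the volume name column mentioned above.
VOL_NAME_OID = ".1.3.6.1.4.1.37447.1.2.1.3"

# Matches lines of the form:  <numeric oid> = STRING: "<value>"
LINE_RE = re.compile(r'^(?P<oid>\.[\d.]+) = STRING: "?(?P<value>[^"]*)"?\s*$')


def volume_index_map(walk_output, base_oid=VOL_NAME_OID):
    """Return {index_suffix: volume_name} for every line under base_oid."""
    volumes = {}
    prefix = base_oid + "."
    for line in walk_output.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # not a STRING line, e.g. a counter or gauge
        oid = match.group("oid")
        if oid.startswith(prefix):
            # Whatever follows the column OID is the row index of the volume.
            volumes[oid[len(prefix):]] = match.group("value")
    return volumes


# Hypothetical sample output; indexes and names are invented.
SAMPLE = '''
.1.3.6.1.4.1.37447.1.2.1.3.1 = STRING: "vmware-ds01"
.1.3.6.1.4.1.37447.1.2.1.3.2 = STRING: "sql-data"
.1.3.6.1.4.1.37447.1.2.1.4.1 = Gauge32: 1048576
'''

print(volume_index_map(SAMPLE))
```

The index suffix found this way can then be appended to the other column OIDs under .1.3.6.1.4.1.37447.1.2.1 to read the remaining values for the same volume.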

             

            Hope it helps.

            Rick.

            • Re: Nimble storage monitoring with check_mk or snmp
              Rohan Fallon Newbie

              Hi Manuel, this is the custom check I wrote to monitor the two Nimble units I have with check_mk. I have included a perfometer as well. Hope that helps.

                
                  #
                  # Nimble Volume Check (Supports inventory and performance data)
                  #
                  # Author: Rohan Fallon 
                  #
                  # FileName: nimble
                  # Location: ~/local/share/check_mk/checks
                  # Usage:  cmk --checks  nimble -II your_nimble
                  #
                  #
                   
                  nimble_default_values = (95.0, 98.0)
                   
                   
                  def inventory_nimble(info):
                     # Debug: lets see how the data we get looks like
                     # print info
                     # return []
                     inventory = []
                     for vol, state, connections, volsize, volusage in info:
                         if state == "1":
                            inventory.append( (vol, nimble_default_values) )
                     return inventory
                   
                  def check_nimble(item, params, info):
                     # unpack check parameters
                     warn, crit = params
                   
                     for vol, state, connections, volsize, volusage in info:
                        if vol == item:
                           if state == "1":
                              size_gb = int(volsize) / 1024.0
                              usage_gb = int(volusage) / 1024.0
                              usage_percent = float(volusage) / float(volsize) * 100.0
                              perfdata = [ ( "percent", usage_percent, warn, crit ) ]
                              if usage_percent > crit:
                                 return (2, "Critical - Volume online - iSCSI Connections = " + connections  + " - Size = " + str(size_gb) + "Gb - Usage = " + "{:6.2f}".format(usage_percent)+ "%", perfdata )
                              elif usage_percent > warn:
                                 return (1, "Warn - Volume online - iSCSI Connections = " + connections  + " - Size = " + str(size_gb) + "Gb - Usage = " + "{:6.2f}".format(usage_percent)+ "%" , perfdata)
                              else:
                                 return (0, "OK - Volume online - iSCSI Connections = " + connections  + " - Size = " + str(size_gb) + "Gb - Usage = " + "{:6.2f}".format(usage_percent)+ "%", perfdata)
                           else:
                              return (2, "CRITICAL - Volume %s offline " % vol)
                     return (3, "UNKNOWN - Volume not found")
                   
                   
                  check_info["nimble"] = {
                      'check_function':            check_nimble,
                      'inventory_function':        inventory_nimble,
                      'service_description':       'Nimble Volume %s',
                      'has_perfdata':              True,
                  }
                   
                  snmp_info["nimble"] = (".1.3.6.1.4.1.37447.1.2.1" , [ "3", "10", "11", "4", "6" ] )
              
              

              Code for Perfometer

              #
              # Perf-o-meter for Nimble Volume Check
              #
              # Author: Rohan Fallon 
              #
              # FileName: nimble.py
              # Location: ~/local/share/check_mk/web/plugins/perfometer
              # Note:  "Service check command" is the key when registering in the perfometers dictionary
              #
              #
               
              def perfometer_nimble(row, check_command, perf_data):
               
                  used = float(perf_data[0][1])
                  warn = float(perf_data[0][3])
                  crit = float(perf_data[0][4])
                  if used > crit:
                      color = "#ff0000"
                  elif used > warn:
                      color = "#ffff00"
                  else:
                      color = "#00ff00"
               
                  return "%.0f%%" % used, perfometer_linear(used, color)
               
              perfometers['check_mk-nimble'] = perfometer_nimble
              
              
              
              
              
              • Re: Nimble storage monitoring with check_mk or snmp
                Patrick Gavin Newbie

                Hi Rohan-

                 

                I made some changes to your check_mk plugin that add more per-volume performance metrics.

                 

                It seems to work, but I think there may be some kind of undocumented unit mismatch because my numbers appear to be off by at least a factor of 1000.

                 

                My question to the nimble developers is...

                 

                Is it possible that this MIB entry-

                 

                volStatTimeEpochSeconds OBJECT-TYPE

                    SYNTAX      Counter64

                    MAX-ACCESS  read-only

                    STATUS      current

                    DESCRIPTION

                    "Time at which the sample was taken, measured in seconds since UNIX epoch."

                    ::= { volEntry 12 }

                 

                should actually be milliseconds instead of seconds?

                 

                I was thinking maybe byte counters might actually be kbyte counters, but the rates are low for IOPS as well.
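To illustrate why a milliseconds timestamp would explain a factor of 1000 on every rate at once (IOPS and bandwidth alike), here is a rough sketch of the arithmetic. The `rate` helper is a hypothetical stand-in for a counter-delta-over-time calculation, and the sample values are invented; nothing here is read from a real array.

```python
def rate(prev_time, prev_value, now_time, now_value):
    """Counter delta divided by elapsed time."""
    elapsed = now_time - prev_time
    if elapsed <= 0:
        raise ValueError("timestamps must increase")
    return (now_value - prev_value) / float(elapsed)


# Hypothetical samples taken 60 seconds apart, with the timestamp
# assumed to be reported in milliseconds.
prev_ms, now_ms = 1470000000000, 1470000060000
prev_ops, now_ops = 1000000, 1030000  # 30000 operations in 60 s

# Treating the milliseconds as if they were seconds: 30000 / 60000
as_seconds = rate(prev_ms, prev_ops, now_ms, now_ops)

# Converting to seconds first: 30000 / 60
converted = rate(prev_ms / 1000.0, prev_ops, now_ms / 1000.0, now_ops)

print(as_seconds, converted, converted / as_seconds)
```

Because the elapsed time sits in the denominator of every rate, a wrong time unit scales all of them by the same factor, which a wrong byte unit would not do for the IOPS values.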

                 

                It's also possible that I screwed something up and can't see it.

                 

                Here's the code...

                 

                -------------------------------------------------------------------------------------

                 

                #
                # Nimble Volume Check (Supports inventory and performance data)
                #
                # Author: Rohan Fallon and Patrick Gavin
                #
                # FileName: nimble_vol
                # Location: ~/local/share/check_mk/checks
                # Usage:  cmk --checks  nimble_vol -II rgtnimbleprod
                #
                #

                nimble_default_values = (95.0, 98.0)


                def inventory_nimble_vol(info):
                   # Debug: uncomment to see what the data looks like
                   # print info
                   # return []
                   for vol, volsize, volusage, state, connections, stat_time, read_ops, read_bytes, write_ops, write_bytes in info:
                       # only inventorize volumes that are online
                       if state == "1":
                          yield (vol, nimble_default_values)

                def check_nimble_vol(item, params, info):
                   # unpack check parameters
                   warn, crit = params

                   for vol, volsize, volusage, state, connections, stat_time, read_ops, read_bytes, write_ops, write_bytes in info:
                      if vol == item:
                         if state == "1":
                            size_gb = int(volsize) / 1024.0
                            usage_gb = int(volusage) / 1024.0
                            usage_percent = float(volusage) / float(volsize) * 100.0
                            # per-second rates derived from the counters and the sample timestamp
                            read_iops = get_rate("read_ops.%s" % item, int(stat_time), int(read_ops))
                            write_iops = get_rate("write_ops.%s" % item, int(stat_time), int(write_ops))
                            read_bw = get_rate("read_bytes.%s" % item, int(stat_time), int(read_bytes))
                            write_bw = get_rate("write_bytes.%s" % item, int(stat_time), int(write_bytes))
                            perfdata = [ ("percent", usage_percent, warn, crit ),
                                        ("connections", connections, 0, 0),
                                        ("read_iops", read_iops, 0, 0),
                                        ("write_iops", write_iops, 0, 0),
                                        ("read_bandwidth", read_bw, 0, 0),
                                        ("write_bandwidth", write_bw, 0, 0),
                                     ]
                            if usage_percent > crit:
                               return (2, "CRITICAL - Volume online - Connections = " + connections  + " - Size = " + str(size_gb) + "Gb - Usage = " + "{0:6.2f}".format(usage_percent)+ "%", perfdata )
                            elif usage_percent > warn:
                               return (1, "WARN - Volume online - Connections = " + connections  + " - Size = " + str(size_gb) + "Gb - Usage = " + "{0:6.2f}".format(usage_percent)+ "%" , perfdata)
                            else:
                               return (0, "OK - Volume online - Connections = " + connections  + " - Size = " + str(size_gb) + "Gb - Usage = " + "{0:6.2f}".format(usage_percent)+ "%", perfdata)
                         else:
                            return (2, "CRITICAL - Volume %s offline " % vol)
                   return (3, "UNKNOWN - Volume not found")


                check_info["nimble_vol"] = {
                    'check_function'       : check_nimble_vol,
                    'inventory_function'   : inventory_nimble_vol,
                    'service_description'  : 'Nimble Volume %s',
                    'has_perfdata'         : True,
                    # columns are fetched in the order the loops above unpack them
                    'snmp_info'            : (".1.3.6.1.4.1.37447.1.2.1" , [ "3", "4", "6", "10", "11", "12", "13", "15", "34", "36" ] )
                }

                  • Re: Nimble storage monitoring with check_mk or snmp
                    Patrick Gavin Newbie

                    Here's the code that assumes milliseconds instead of seconds. The values it produces are consistent.

                     

                    #
                    # Nimble Volume Check (Supports inventory and performance data)
                    #
                    # Author: Rohan Fallon and Patrick Gavin
                    #
                    # FileName: nimble_vol
                    # Location: ~/local/share/check_mk/checks
                    # Usage:  cmk --checks  nimble_vol -II rgtnimbleprod
                    #
                    #

                    nimble_default_values = (95.0, 98.0)


                    def inventory_nimble_vol(info):
                       # Debug: uncomment to see what the data looks like
                       # print info
                       # return []
                       for vol, volsize, volusage, state, connections, stat_time, read_ops, read_bytes, write_ops, write_bytes in info:
                           # only inventorize volumes that are online
                           if state == "1":
                              yield (vol, nimble_default_values)

                    def check_nimble_vol(item, params, info):
                       # unpack check parameters
                       warn, crit = params

                       for vol, volsize, volusage, state, connections, stat_time, read_ops, read_bytes, write_ops, write_bytes in info:
                          if vol == item:
                             if state == "1":
                                # assume the timestamp is in milliseconds; divide by a float
                                # so the sub-second part is kept when converting to seconds
                                stat_secs = int(stat_time) / 1000.0
                                size_gb = int(volsize) / 1024.0
                                usage_gb = int(volusage) / 1024.0
                                usage_percent = float(volusage) / float(volsize) * 100.0
                                read_iops = get_rate("read_ops.%s" % item, stat_secs, int(read_ops))
                                write_iops = get_rate("write_ops.%s" % item, stat_secs, int(write_ops))
                                read_bw = get_rate("read_bytes.%s" % item, stat_secs, int(read_bytes))
                                write_bw = get_rate("write_bytes.%s" % item, stat_secs, int(write_bytes))
                                perfdata = [ ("percent", usage_percent, warn, crit ),
                                            ("connections", connections, 0, 0),
                                            ("read_iops", read_iops, 0, 0),
                                            ("write_iops", write_iops, 0, 0),
                                            ("read_bandwidth", read_bw, 0, 0),
                                            ("write_bandwidth", write_bw, 0, 0),
                                         ]
                                if usage_percent > crit:
                                   return (2, "CRITICAL - Volume online - Connections = " + connections  + " - Size = " + str(size_gb) + "Gb - Usage = " + "{0:6.2f}".format(usage_percent)+ "%", perfdata )
                                elif usage_percent > warn:
                                   return (1, "WARN - Volume online - Connections = " + connections  + " - Size = " + str(size_gb) + "Gb - Usage = " + "{0:6.2f}".format(usage_percent)+ "%" , perfdata)
                                else:
                                   return (0, "OK - Volume online - Connections = " + connections  + " - Size = " + str(size_gb) + "Gb - Usage = " + "{0:6.2f}".format(usage_percent)+ "%", perfdata)
                             else:
                                return (2, "CRITICAL - Volume %s offline " % vol)
                       return (3, "UNKNOWN - Volume not found")


                    check_info["nimble_vol"] = {
                        'check_function'       : check_nimble_vol,
                        'inventory_function'   : inventory_nimble_vol,
                        'service_description'  : 'Nimble Volume %s',
                        'has_perfdata'         : True,
                        # columns are fetched in the order the loops above unpack them
                        'snmp_info'            : (".1.3.6.1.4.1.37447.1.2.1" , [ "3", "4", "6", "10", "11", "12", "13", "15", "34", "36" ] )
                    }
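
Since it is not confirmed in this thread whether the timestamp column is really in milliseconds, one possible safeguard is to guess the unit from the magnitude of the value instead of always dividing by 1000. This is a heuristic sketch only; `to_epoch_seconds` is a hypothetical helper, not part of check_mk or of the plugin above, and the threshold is an assumption.

```python
def to_epoch_seconds(raw):
    """Return a timestamp in seconds, guessing the unit from its size."""
    value = int(raw)
    # Assumption: epoch seconds stay below 10**11 for thousands of years,
    # while epoch milliseconds passed 10**11 back in 1973. Anything at or
    # above the threshold is therefore treated as milliseconds.
    if value >= 10**11:
        return value / 1000.0
    return float(value)


print(to_epoch_seconds("1470000000"))      # already in seconds
print(to_epoch_seconds("1470000000123"))   # milliseconds, converted
```

In the check function, `stat_secs = to_epoch_seconds(stat_time)` could replace the fixed division, so the plugin would keep working if a firmware change ever made the column match its MIB description.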