Hi, I am hoping someone may be able to help with an issue we are having in our production environment.
We recently upgraded to 5 new Cisco UCS C220 M4 servers. These are connected to our Nimble CS300 via two Brocade ICX6610-24 switches configured in a stack. Each host is running the latest Cisco custom ISO of VMware ESXi. During installation everything worked fine, but now that the environment is under production load, we are seeing the iSCSI NICs disconnect. The disconnects occur at random times and usually affect one NIC at a time. On occasion, though, both NICs disconnect, removing all access to the storage.
It appears to be a VMware issue, because when a disconnect occurs both the Brocade switches and the UCS servers still see their adapters as up. VMware, however, is adamant that it is a Brocade issue.
I currently have open cases with Cisco, VMware and Brocade, but we don't seem to be getting anywhere. I have also opened a case with Nimble, but they are not seeing any issues on the storage at all.
Today I tried downgrading the Cisco firmware; this hasn't made any difference either.
One other piece of information that may be relevant: the problem becomes more frequent when the system is under load. We use Veeam as our backup solution, and when it runs at night the disconnects are more frequent.
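One way to put a number on the observation that disconnects cluster around the backup run is to compare the disconnect rate inside and outside the Veeam window. The Python sketch below assumes the disconnect times have already been collected (for example, by hand from the host logs) as a list of datetimes; the 22:00 to 04:00 window and the sample timestamps are hypothetical placeholders, not data from this environment.

```python
from datetime import datetime, time

# Hypothetical nightly backup window (crosses midnight); replace with the real schedule.
BACKUP_START = time(22, 0)
BACKUP_END = time(4, 0)


def in_window(ts, start=BACKUP_START, end=BACKUP_END):
    """True if ts falls in [start, end), allowing the window to wrap past midnight."""
    t = ts.time()
    if start <= end:
        return start <= t < end
    return t >= start or t < end


def window_hours(start, end):
    """Daily length of the window in hours, accounting for wrap-around."""
    s = start.hour + start.minute / 60
    e = end.hour + end.minute / 60
    return (e - s) % 24


def disconnect_rates(events, days, start=BACKUP_START, end=BACKUP_END):
    """Disconnects per hour inside and outside the window over `days` observed days.

    Assumes the window is neither empty nor the full 24 hours.
    """
    inside = sum(1 for ts in events if in_window(ts, start, end))
    outside = len(events) - inside
    hours_in = window_hours(start, end)
    return inside / (hours_in * days), outside / ((24 - hours_in) * days)


# Placeholder timestamps, not real log data.
events = [
    datetime(2015, 1, 5, 23, 10),
    datetime(2015, 1, 6, 1, 45),
    datetime(2015, 1, 6, 14, 20),
]
rate_in, rate_out = disconnect_rates(events, days=2)
print(f"inside window: {rate_in:.3f}/h, outside: {rate_out:.3f}/h")
```

A rate inside the window several times higher than outside would support the load correlation; with only a handful of events, though, the numbers will be noisy.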
Below is our complete setup. I am looking for anyone who has had a similar problem, or any advice on how to continue troubleshooting.
Servers (5 in Total):
BMC Version Info: 2.0(3i)
BIOS Version: C220M220.127.116.11d.0
Product Serial Number : [FCH1850V0A4]
VMware ESXi 5.5.0-2068190-custom-cisco.18.104.22.168
+ Updated one host to ESXi 5.5.0-2068190-custom-cisco.22.214.171.124 to see if this made any difference
Nimble CS300 Storage:
NOS Version: 126.96.36.199-22959-opt
Storage Switches (2 in Total):
Boot Image: 10.1.00T7f5
10GbE Connections (10 in Total):
Cisco – Twinax Cables
Product # SFP-10G-AOC3M
Part # 10-2847-01
Description: Cisco UCS Dual Port 10Gb Ethernet and 4G Fibre Channel CNA SFP+
Description: Cisco UCS 1227 Dual Port 10Gb Ethernet and 4G Fibre Channel CNA SFP+ MLOM
Version: Version 188.8.131.52, Build: 1331820, Interface: 9.2 Built on: Jun 12 2014
UCS support matrix: Checked the UCS HCL; the FNIC drivers appear to be supported on the C220 M4 / ESXi 5.5 U with VIC 1225/1227
The Cisco UCS servers are connected to the Brocade switches using the 10Gb Twinax cables. Each server has two of these plugged into Slot 1: one connects to Switch One in the Brocade stack, the other to Switch Two.
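Since each host is meant to have one iSCSI uplink on each stack member, a quick check of the cabling record can confirm that no host has both paths on the same switch. This is a minimal Python sketch over a hypothetical inventory; the host names, vmnic names and "unit/module/port" port strings are illustrative assumptions, not the actual cabling.

```python
# Hypothetical inventory: host -> {vmnic: switch port as "unit/module/port"}.
# Names and ports are illustrative placeholders, not the real environment.
INVENTORY = {
    "esx01": {"vmnic2": "1/1/1", "vmnic3": "2/1/1"},
    "esx02": {"vmnic2": "1/1/2", "vmnic3": "1/1/3"},  # both uplinks on unit 1
}


def stack_unit(port):
    """Return the stack-unit number from a 'unit/module/port' string."""
    return int(port.split("/")[0])


def single_switch_hosts(inventory):
    """Hosts whose iSCSI uplinks all terminate on the same stack member."""
    bad = []
    for host, uplinks in sorted(inventory.items()):
        units = {stack_unit(p) for p in uplinks.values()}
        if len(units) < 2:
            bad.append(host)
    return bad


print(single_switch_hosts(INVENTORY))  # ['esx02']
```

An empty result means every host's two paths are split across the stack, so losing both at once would point away from a single switch member.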
As part of the troubleshooting process, we have also tried the following:
Brocade 10G Active 3M FCoE cable – Part # 58-1000027-01
A Cisco SFP-10G-SR (10-2415-03) paired with a Brocade 57-0000075-01
These connections also experience the drop-outs.
Have applied VMware KB article 1030265 (Interrupt Remapping), as recommended by Cisco
Open case with Cisco: SR 634639537
Have spoken to VMware and they believe the issue is with the Brocade switches
Open case with VMware: Support Request # 15657062404
Have now opened a case with Brocade: Case # 1438624