3 Replies. Latest reply: Jul 10, 2014 9:40 AM by julez

    MS Server 2012 R2 VMs problems with e1000 and e1000e NICs

    Sean Patterson Wayfarer

      Summary: In VMware guests running Windows Server 2012 or Windows Server 2012 R2, the standard e1000e and e1000 virtual NIC drivers cause lost connections and possible data corruption.


      After more than a month of random problems, I have concluded that the issue affects only our Server 2012 R2 (and non-R2) VMs. It may affect our Windows Server 2008 R2 (and non-R2) servers as well, just not yet to a critical point. Please feel free to add any information you may have.

       

      All of our Server 2012 VMs were built with the default e1000e NICs. They all worked well until we received a notice to stop using the e1000e driver immediately, because data corruption could occur. The suggested fix was to switch to the e1000 or VMXNET3 driver. Since we are a 1Gb environment, I decided the e1000 would be the better driver; others had also reported better performance with e1000 than with VMXNET3.

      With the change in place, we ran for a week or more before we started noticing drives missing on the file server (Server 2012 R2). A restart brings the drives back, as does disabling and re-enabling the iSCSI NICs. This happened randomly, at any time of day or night. Many searches turned up little about the errors, because the server was not really logging any events other than a basic "network no longer available" message.

      Out of frustration, I returned the iSCSI NICs to the e1000e driver. That solved the problem, somewhat: the standard NIC (still on e1000) then started disappearing and the server became unreachable. So I believe the issue is the e1000 driver itself. I have also noticed less frequent, shorter outages with the e1000e driver, which led me to the next test. I have now switched all NICs on the Server 2012 R2 machines to the VMXNET3 driver. No errors have been reported, and there have been no random disconnects/reconnects, short or long. The drives seem stable and the NICs are staying up.
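      Before swapping adapters, it can help to inventory which VMs still use the e1000 family. A minimal sketch, assuming you can read each VM's .vmx file: the virtual NIC type is stored in `ethernetN.virtualDev` entries (for example `"e1000"`, `"e1000e"` or `"vmxnet3"`). The helper names `nic_types` and `flagged_nics` are hypothetical, and an adapter with no `virtualDev` line falls back to a guest-OS default that this sketch does not try to infer.

```python
import re

# Adapter types this thread reports problems with on Server 2012 / 2012 R2 guests.
FLAGGED = {"e1000", "e1000e"}

# Matches .vmx lines such as: ethernet0.virtualDev = "e1000e"
_VDEV = re.compile(r'^\s*(ethernet\d+)\.virtualDev\s*=\s*"([^"]*)"', re.IGNORECASE)


def nic_types(vmx_text: str) -> dict[str, str]:
    """Map each ethernetN adapter to its lower-cased virtualDev value."""
    result = {}
    for line in vmx_text.splitlines():
        m = _VDEV.match(line)
        if m:
            result[m.group(1).lower()] = m.group(2).lower()
    return result


def flagged_nics(vmx_text: str) -> list[str]:
    """Return the adapters that still use an e1000-family device, sorted by name."""
    return sorted(n for n, dev in nic_types(vmx_text).items() if dev in FLAGGED)


if __name__ == "__main__":
    sample = """
ethernet0.present = "TRUE"
ethernet0.virtualDev = "e1000"
ethernet1.virtualDev = "e1000e"
ethernet2.virtualDev = "vmxnet3"
"""
    print(flagged_nics(sample))  # ['ethernet0', 'ethernet1']
```

      Reading the .vmx text keeps the sketch free of any vSphere SDK dependency; in practice you would feed it files copied from the datastore.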

       

      So now I need to know about any issues with running the VMXNET3 driver in a non-10Gb environment. So far I have not seen the speed issues others have warned about. Having a clean Event Viewer is the goal here, and so far the VMXNET3 driver has been the best option.

       

      ^^^^^^ That last paragraph is my actual question! ^^^^^^

        • Re: MS Server 2012 R2 VMs problems with e1000 and e1000e NICs
          Alex Goltz Adventurer

          In my opinion, you are on the right track with the VMXNET3 adapters. We have a similar environment with both 2012 R2 and 2012 non-R2. We started our builds with VMXNET3s for both the regular VM NICs and the VMs' iSCSI adapters. The only reason we did that was that our VMware sales engineer suggested it, regardless of which SAN we purchased.
          As far as performance goes, I'm pretty sure the difference is negligible. As long as your iSCSI distributed switches in VMware have jumbo frames turned on, the VM's iSCSI adapter(s) also have jumbo frames enabled (often missed), the physical switches allow jumbo frames, and so on, you should be good to go.
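          Jumbo frames only pay off when every hop carries them: the largest packet that crosses the path unfragmented is the smallest MTU along it. A minimal sketch of that check, with hypothetical hop names and MTU values (not taken from this thread). It also computes the payload for a don't-fragment ping test, which is the MTU minus 28 bytes (a 20-byte IPv4 header plus an 8-byte ICMP header), so 8972 bytes for a 9000-byte MTU.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header


def effective_mtu(path_mtus: dict[str, int]) -> int:
    """Largest packet that crosses every hop unfragmented: the smallest MTU on the path."""
    return min(path_mtus.values())


def bottlenecks(path_mtus: dict[str, int]) -> list[str]:
    """Hops configured below the largest MTU on the path, i.e. where jumbo frames were missed."""
    largest = max(path_mtus.values())
    return sorted(hop for hop, mtu in path_mtus.items() if mtu < largest)


def df_ping_payload(mtu: int) -> int:
    """ICMP payload size that exactly fills an IPv4 packet of the given MTU."""
    return mtu - IPV4_HEADER - ICMP_HEADER


if __name__ == "__main__":
    # Hypothetical path; the VM adapter is the commonly missed setting.
    path = {
        "distributed_switch": 9000,
        "vm_iscsi_adapter": 1500,
        "physical_switch": 9000,
        "array_data_port": 9000,
    }
    print(effective_mtu(path))    # 1500
    print(bottlenecks(path))      # ['vm_iscsi_adapter']
    print(df_ping_payload(9000))  # 8972
```

          The payload figure is what you would pass to a don't-fragment ping, e.g. `vmkping -d -s 8972 <target>` from an ESXi host or `ping -f -l 8972 <target>` from a Windows guest; if it fails while a 1472-byte payload succeeds, some hop is still at 1500.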

          I've read the same forum posts you have regarding the 'better' performance with e1000.  However, I haven't seen any benchmarks with these statements.  It might have been true in past VMXNET versions.
          I can see how you're trying to squeeze every ounce of performance out of your 1Gb setup.

           

          Take this with a grain of salt:

          If you're locked into a 1Gb network with your Nimble, switching, and hosts, I would just make sure you use jumbo frames across the board, use some form of multipathing, and use direct iSCSI (custom block sizes, custom NTFS cluster sizes, and VM adapter jumbo frames) for your SQL volumes.

          • Re: MS Server 2012 R2 VMs problems with e1000 and e1000e NICs
            Bryan Beulin Adventurer

            Good Afternoon Sean,

                I'm posting this here as well, since it relates to another open thread about e1000 NICs.

             

            This thread has been open for several months, and I'm just checking to see whether you have received an answer. I'm not sure if you've seen the alert bulletin on InfoSight about e1000 NICs in VMware, but this is an issue we've tried to communicate to our customers as often as possible. Here is a link to that notification: https://infosight.nimblestorage.com/cgi-bin/viewPDFFile?alert=NimbleStorageFieldAlertExt0003.pdf

            as well as a link to the corresponding VMware KB Article:

            http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2058692

             

             

            Please let us know whether this helps resolve your issues, and don't hesitate to reach out to Nimble Storage Support with any additional questions.

             

            If you could let us know whether one of the responses answered your question, it would be greatly appreciated.

            Thank you for your time,

            Bryan Beulin

            • Re: MS Server 2012 R2 VMs problems with e1000 and e1000e NICs
              julez Adventurer

              Sean, I almost always use VMXNET3 unless specifically told not to for some reason (and I can't actually recall that ever happening).

              We've been using VMXNET3 for all network and iSCSI connections in our 5.1 environment since we put the CS260 in. Our colocation facility still has a 1Gig controller. The HQ array is a CS260G, but it is currently bottlenecked at 1Gig by our servers (not something I'd recommend).

               

              I've yet to see any issue with either of our arrays in 5.1.

              In a previous life I had purchased a CS220 for a 4.1 environment I built, and again always used VMXNET3.

               

              But then, I also use PVSCSI for nearly every secondary drive (meaning not the boot drive) with moderate to high I/O.