Hi All,
This one is very similar to another thread on these forums.
Really hoping the community might be able to provide some direction on some networking issues I've been having since the ESXi 5 upgrade at my site. Some details:
- ESXi 5
- 8x HP DL380p Gen8 servers
- HP ProLiant networking infrastructure
Basically, since the upgrade (or I should say, fresh installation of ESXi 5) there have been two networking-based issues occurring:
1. Randomly, a vmnic will lose connectivity to the physical network.
2. The physical network can no longer talk to the VM network through a vSwitch.
The network configuration has 4 uplinks going to 2 separate switches (not aggregated). They tag some VLANs, however ignore that element for now (and yes, the default VLANs are the same).
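To picture the layout, the vSwitch side looks roughly like this (just a sketch with placeholder names and VLAN IDs, not my actual esxcfg-vswitch -l output):
~ # esxcfg-vswitch -l
Switch Name    Num Ports  Used Ports  Configured Ports  MTU   Uplinks
vSwitch0       128        12          128               1500  vmnic0,vmnic1,vmnic2,vmnic3
  PortGroup Name       VLAN ID  Used Ports  Uplinks
  VM Network           20       8           vmnic0,vmnic1,vmnic2,vmnic3
  Management Network   0        1           vmnic0,vmnic1,vmnic2,vmnic3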
I'll start with issue 1, as I've been working through a support case with VMware that's gone nowhere at this stage and can't progress until the issue occurs again. This morning I came to site and found that one of the ESX servers in my HA/DRS cluster was disconnected. A ping from my workstation suggested the machine was off the network. When I went to the host's console I restarted management services and found everything was OK again - with the exception that some VMs' network connectivity was still down.
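(For the record, "restarted management services" there was the Restart Management Agents option at the console - from the shell I believe the rough equivalent is the below, though treat the exact scripts as approximate:
~ # /etc/init.d/hostd restart
~ # /etc/init.d/vpxa restart
or ~ # services.sh restart to bounce the lot.)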
When I jumped into vSphere I found that one of the 4 vmnics could NOT see any observed IP range - the rest were OK. This is a single NIC too.
I then jumped into the vMA and found the VMs that didn't seem to have network connectivity were also on this vmnic. So as a work-around, I set this vmnic to Unused on the vSwitch and on the inheriting port groups those VMs belonged to, and they then had connectivity (rough CLI equivalent below). I'm willing to bet that the management interface was on that vmnic before the restart of services.
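For reference, the CLI version of that work-around would be roughly the following - the vSwitch name, port group name and vmnic list are placeholders for my actual config, and the exact esxcli options are from memory, so treat them as approximate:
~ # esxcli network vswitch standard policy failover set --vswitch-name=vSwitch0 --active-uplinks=vmnic0,vmnic1,vmnic2
~ # esxcli network vswitch standard portgroup policy failover set --portgroup-name="VM Network" --active-uplinks=vmnic0,vmnic1,vmnic2
Leaving vmnic3 out of the active (and standby) uplinks is what puts it in Unused.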
So right now you're thinking: faulty NIC, or a switch configuration variation on that port? Perhaps, but what makes this odd is that this exact same issue occurred on another server with another NIC (same models, however). With that, I decided to do some network troubleshooting with mirrored ports. Some results:
Host A (physical) pings Host B (VM on vmnic3)
The ARP broadcast gets forwarded from the switch to the ESX host, however the VM never gets the request.
Host B pings Host A
The ARP broadcast leaves the vSwitch, goes out the uplink and makes it to Host A. Host A responds, and I can see the reply on the mirror port being sent back to the vSwitch, but it never makes it to the VM.
Host A pings Host B again
Host A now has the IP/MAC mapping (ARP), so a directed ICMP echo is sent; it gets to the vSwitch but never hits the VM.
Host B pings Host A again
I had to add a static ARP entry to get the ICMP happening, but while the ICMP echo goes out to the physical device and an ICMP reply comes back to the host, it never reaches the VM.
Weird huh? VMware support said the same thing.
Note that I've only been testing this with the failed vmnic, so traffic isn't going through the other vmnics. I can talk across the vSwitch, but not out to the physical network (or rather, the physical side's responses aren't making their way back through the vSwitch).
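Next time it fails I'm also planning to watch the per-port counters in esxtop while the pings run (press n for the network view and keep an eye on PKTRX/s and %DRPRX for the VM's port and vmnic3), to see whether the frames are being dropped at the vSwitch port or somewhere past it. For now, here's the driver info for the affected vmnic: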
~ # esxcfg-nics -l
~ # ethtool -i vmnic3
driver: tg3
version: 3.120h.v50.2
firmware-version: 5719-v1.24 NCSI v1.0.60.0
bus-info: 0000:03:00.3
I've checked the HCL a number of times here, and the server and NIC hardware and firmware versions are supported. I did have to use the HP ESXi image, however I'm told that's still supported by VMware. I've also taken VLANs out of the mix here as well, and I've swapped switches, cables and ports (both new ones and already-working ESX links) to rule out anything non-VMware.
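For completeness, this is roughly how I've been checking the driver/module side of things (command names should be right for ESXi 5, but treat them as approximate):
~ # vmkload_mod -s tg3
~ # esxcli software vib list | grep -i tg3
~ # esxcli network nic get -n vmnic3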
In the end, I have to put the host into maintenance mode and restart the server to get the NIC working again. I can only assume it's a VMware issue, the hardware is not actually supported (when the HCL says it is), or I've got really unlucky and there's a bad batch of Broadcom NICs getting about.
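One thing I haven't tried yet (happy to hear if anyone has) is bouncing just the NIC rather than the whole host, assuming these sub-commands are available on this build:
~ # esxcli network nic down -n vmnic3
~ # esxcli network nic up -n vmnic3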
Now, as for item 2, that's a bit more intermittent. Basically, we vSphere administrators find we can't manage VMs through the VMware console. When this occurs, if we ping the ESX host's management interface we don't get a response. Other parts of the network seem OK, presumably because they already have that ARP entry in their cache - which is likely why HA remains OK.
We see that the ARP request again makes it to the uplink switch and seems to get to the management vmk0, and the ARP reply goes back (I confirmed this via tcpdump on the SSH console). From there I can't determine if it makes it to the vSwitch, but in any event it doesn't make it back to the pinging workstation.
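The capture on the host side was roughly this (filter from memory):
~ # tcpdump-uw -i vmk0 arp or icmp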
This goes on for a few minutes, and then after a time everything starts working OK again - usually triggered by another host making a connection to that host.
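I assume even something as simple as this from another host in the cluster would do it, though I haven't verified that specific test:
~ # vmkping <management IP of the affected host>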
Any help here would be great! I've raised 2 cases with VMware but I'm not getting anywhere, and I'd rather not have to wait for the issues to occur again. To make matters worse, we're looking at upgrading our control systems' virtual infrastructure and calling contractors in from overseas to support that process. I have to delay that until I can get to the bottom of this issue.
Let me know if I've been too vague or if more specific information is needed.
Thanks muchly!