We had an odd issue come up today in our vSphere installation. We have some HP C7000 chassis with a number of BL465 blades (mostly G7 and G8 models) with 128 or 256GB of RAM. All are running the HP customized image of vSphere 5.1 Update 2. All drivers and firmware are up to date. Over the last few months we've been switching our systems over to 10Gb uplinks into our Cisco Nexus 5000 infrastructure. For each chassis, we migrate all VMs off, rebuild the chassis, blades, and VMware configuration, do some verification, then migrate the VMs back.
Each host has four 10Gb uplinks. Two are dedicated to the VM networks and are on a virtual switch with the various port groups/VLANs. Each of those uplinks goes to a different core switch. Management and vMotion are on a separate vSwitch, each with its own 10Gb port, with the vMotion and management NICs each being "standby" for the other. We're not using distributed switches at this time.
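To make the layout easier to follow, here is a rough sketch of it as a small Python structure with a sanity check. The vSwitch, port group, and vmnic names are placeholders I made up for illustration, not the real device names on our hosts, and treating both VM-network uplinks as active is an assumption.

```python
# Hypothetical model of the per-host uplink layout described above.
# All names (vSwitch0, vmnic0, ...) are placeholders, not real host values.
HOST_LAYOUT = {
    "vSwitch0": {  # management + vMotion, each standby for the other
        "uplinks": ["vmnic0", "vmnic1"],
        "portgroups": {
            "Management": {"active": ["vmnic0"], "standby": ["vmnic1"]},
            "vMotion": {"active": ["vmnic1"], "standby": ["vmnic0"]},
        },
    },
    "vSwitch1": {  # VM networks, one uplink per core switch
        "uplinks": ["vmnic2", "vmnic3"],
        "portgroups": {
            # Assumption: both uplinks active; the post does not say.
            "VM Networks": {"active": ["vmnic2", "vmnic3"], "standby": []},
        },
    },
}


def check_layout(layout):
    """Return a list of human-readable problems found in the layout."""
    problems = []
    seen = {}  # uplink name -> vSwitch that owns it
    for vswitch, cfg in layout.items():
        for nic in cfg["uplinks"]:
            if nic in seen:
                problems.append(f"{nic} is on both {seen[nic]} and {vswitch}")
            seen[nic] = vswitch
        for pg, teaming in cfg["portgroups"].items():
            nics = teaming["active"] + teaming["standby"]
            for nic in nics:
                if nic not in cfg["uplinks"]:
                    problems.append(f"{pg} uses {nic}, not an uplink of {vswitch}")
            if len(set(nics)) < 2:
                problems.append(f"{pg} has no redundant uplink")
            if not teaming["active"]:
                problems.append(f"{pg} has no active uplink")
    return problems


if __name__ == "__main__":
    for line in check_layout(HOST_LAYOUT) or ["layout looks consistent"]:
        print(line)
```

This only checks the intended design on paper; it does not read anything from the hosts themselves.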
Monday afternoon we got reports that one of the Windows 2008 R2 VMs was showing poor network connectivity. Pinging it seemed to be OK, but connecting to it with RDP or other management tools was erratic. Connecting to the console through vCenter, we saw that the Windows logs were showing warnings related to network communications, talking to the domain controllers, etc. Nothing showed any excessive load in Task Manager or in VMware's performance tabs.
After checking a few things (and not finding anything), we tried migrating the VM to another blade. Immediately everything on the VM started working correctly. Thinking it was a blade issue, we checked some other VMs on that host. A couple showed similar issues, but most seemed OK.
We moved the original VM back to the "problem" blade, and it continued to work OK. Migrating the other VMs that were showing issues "fixed" them too, even after they were moved back to the original blade.
We checked some VMs on other blades in the same chassis, with the same configuration and built around the same time, but they all seemed OK.
This chassis and its blades have the same configuration as another chassis we set up a few months ago; that one has been working fine. The other blades in the same chassis seem to be OK.
I've put the blade in maintenance mode until I can sort it out (or just rebuild it), but I can't figure out what would be causing this type of behavior.
Any help would be appreciated.