Bug 789158

Summary: KVM Network Crash
Product: Red Hat Enterprise Linux 6
Reporter: Marc Mercer <mmercer>
Component: kernel
Assignee: Michael S. Tsirkin <mst>
Status: CLOSED NOTABUG
QA Contact: Virtualization Bugs <virt-bugs>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 6.2
CC: acathrow, areis, dallan, juzhang, mst
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-07-15 14:03:22 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description Marc Mercer 2012-02-09 23:58:58 UTC
Physical Connections:
EL6 host ETH0 => Juniper EX2200 ge-0/0/24
EL6 host ETH1 => Juniper EX2200 ge-0/0/26

Logical Configuration:
802.3ad AE2 aggregated interface on ex2200

Configuration from server:

[root@eng-vhost-02 network-scripts]# cat ifcfg-b*
DEVICE="bond0"
BOOTPROTO="none"
ONBOOT="yes"
BRIDGE="br0"

DEVICE="br0"
TYPE="Bridge"
BOOTPROTO="static"
ONBOOT="yes"
NETWORK=10.1.20.0
PREFIX=26
IPADDR=10.1.20.12
GATEWAY="10.1.20.1"
USERCTL=no

[root@eng-vhost-02 network-scripts]# cat ifcfg-eth*
DEVICE="eth0"
HWADDR="00:25:90:58:17:36"
NM_CONTROLLED="no"
ONBOOT="yes"
BOOTPROTO="none"
USERCTL="no"
SLAVE="yes"
MASTER="bond0"

DEVICE="eth1"
HWADDR=00:25:90:58:17:37
NM_CONTROLLED="no"
ONBOOT="yes"
BOOTPROTO="none"
USERCTL="no"
SLAVE="yes"
MASTER="bond0"

Steps to reproduce:
virt-install -n eng-bcouch-00 -r 2048 --disk pool=vg_vmimg,size=80 --network=network:default --nographics -l ftp://192.168.122.1/pub/EL/5/os/ -x console=ttyS0,115200
Get to the screen where it asks if you want to configure eth0 and select "no".

The host's networking "crashes".  It does not *terminate*, but nothing responds for 5+ minutes.  After about 5 minutes everything catches up, resumes, and you get back to where you left off.

Package summary:
qemu-kvm-0.12.1.2-2.209.el6.x86_64
libvirt-0.9.4-23.el6.x86_64

Comment 4 Marc Mercer 2012-04-30 18:30:55 UTC
Surprisingly, I had thought this was specifically related to bonded interfaces under the circumstances, but I did experience the same issue a couple days ago without bonded interfaces.  Unfortunately, since these are active machines, it is difficult to take time to troubleshoot, but it does hang for 5 or so minutes under any given occurrence of the bug, whatever it may be.

Comment 5 Michael S. Tsirkin 2012-04-30 19:03:27 UTC
Best to test without the bond.
When you next see this happen, run tcpdump on the physical
interface, ping outside, and see what 'crash' means
in practice.
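
A minimal sketch of the capture suggested above, assuming eth0 is the physical interface and the gateway 10.1.20.1 from ifcfg-br0 counts as "outside". The function name, the ping count, and the output path are hypothetical and not from this report. It needs root and should be started while the hang is in progress.

```shell
# Sketch only: interface, ping target and pcap path are assumptions.
capture_during_hang() {
    iface="${1:-eth0}"                   # physical NIC, not br0 or bond0
    target="${2:-10.1.20.1}"             # gateway taken from ifcfg-br0
    pcap="${3:-/tmp/hang-$iface.pcap}"   # hypothetical output file

    # -n: skip name resolution (DNS may be stuck too); -w: write raw packets to a file
    tcpdump -n -i "$iface" -w "$pcap" &
    tcpdump_pid=$!

    # 30 pings at the default one-second interval; the summary reports packet loss
    ping -c 30 "$target"

    # Stop the capture and let tcpdump flush the file
    kill "$tcpdump_pid"
    wait "$tcpdump_pid" 2>/dev/null
    echo "capture written to $pcap"
}
```

Calling `capture_during_hang eth0 10.1.20.1` and then reading the file back with `tcpdump -n -r /tmp/hang-eth0.pcap` should show whether packets stop leaving the NIC, stop arriving, or both.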

Comment 6 Marc Mercer 2012-05-17 18:02:19 UTC
Well, it will please you to know I found the actual root cause.

Chalk this up to another of Supermicro's ASPM issues.

While searching at random, I came across bugs filed against the e1000e driver in Fedora, CentOS, RHEL, et cetera (I don't have the links handy at the moment) in which networking would fail completely.  Reading through several of them, I noticed a pattern: the majority involved Supermicro hardware and the e1000e driver.  Further reading led to many reports of people having issues with ASPM on Supermicro systems.

I do not have the exact links I found, and they were not against the specific chipset I have, but the symptoms were similar.

After disabling ASPM at boot time in grub.conf, the issue has been resolved, along with numerous other issues that the system suffered from.
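
For anyone checking their own host: the exact grub.conf change is not quoted in this report. A common way to disable ASPM at boot is to append the standard kernel parameter pcie_aspm=off to the kernel line in /boot/grub/grub.conf, and the sketch below assumes that is what was done. The helper name is hypothetical.

```shell
# Sketch only: assumes ASPM was disabled with the pcie_aspm=off kernel parameter.

# Returns 0 if the given kernel command line contains pcie_aspm=off.
aspm_disabled_on_cmdline() {
    case " $1 " in
        *" pcie_aspm=off "*) return 0 ;;
        *) return 1 ;;
    esac
}

# On the host, after rebooting with the edited grub.conf:
#   aspm_disabled_on_cmdline "$(cat /proc/cmdline)" && echo "ASPM disabled at boot"
#   lspci -vv | grep -i aspm    # shows the ASPM state reported per PCIe device
```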

Not sure how you want to resolve/close this out, or handle notes or anything.

Let me know if you need anything on this.

Comment 7 Marc Mercer 2012-05-23 22:39:25 UTC
Is anything else needed, or do you just want me to close this ticket?

Comment 8 RHEL Program Management 2012-07-10 08:17:40 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 9 RHEL Program Management 2012-07-10 23:32:53 UTC
This request was erroneously removed from consideration in Red Hat Enterprise Linux 6.4, which is currently under development.  This request will be evaluated for inclusion in Red Hat Enterprise Linux 6.4.