Description of problem:
I have a machine with three NICs. eth0 is normally connected to the network and I'm running ssh over it. eth1 and eth2 are slaves of the bonding interface bond0. When I try to ifdown bond0 (or when the system does it during a reboot), the system goes into some kind of deadlock.

Version-Release number of selected component (if applicable):
2.6.18-128.1.1.el5 on i686, but I get the same results with 2.6.18-131. With the upstream kernel (2.6.29-rc6 in my case) this issue does not occur and it works well.

How reproducible:
Always on my machine. I had no luck reproducing it on dell-pe2850-01.rhts.bos.redhat.com, for example.

Steps to Reproduce:
I do the following on my system:
[root@localhost ~]# ifdown bond0

Actual results:
After this I never get the command line back, cannot ssh to the machine, and cannot type on the console, but the machine still replies to pings.

dmesg says:
bonding: bond0: Removing slave eth1
bonding: bond0: Warning: the permanent HWaddr of eth1 - 00:1F:1F:01:2F:22 - is still in use by bond0. Set the HWaddr of eth1 to a different address to avoid conflicts.
bonding: bond0: releasing active interface eth1
bonding: bond0: Removing slave eth2
bonding: bond0: releasing active interface eth2
---
The same messages appear with the upstream kernel, where it works.

ps uax says:
root 2814 0.3 0.6 4612 1300 pts/0 S+ 16:13 0:00 /bin/bash /etc/sysconfig/network-scripts/ifdown-eth ifcfg-bond0
root 2907 0.3 0.6 4616 1300 pts/0 D+ 16:13 0:00 /bin/bash /etc/sysconfig/network-scripts/ifdown-eth ifcfg-eth2
---
pid 2907 cannot be killed, even with -9.

Expected results:
The command line comes back and the system keeps running normally.

Additional info:
[root@localhost ~]# cat /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
BOOTPROTO=none
ONBOOT=yes
NETWORK=10.0.0.0
NETMASK=255.255.255.0
IPADDR=10.0.0.1
USERCTL=no

[root@localhost ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth0
# Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+
DEVICE=eth0
BOOTPROTO=dhcp
HWADDR=00:E0:7D:C2:D9:38
ONBOOT=yes

[root@localhost ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth1
# Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+
DEVICE=eth1
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes
USERCTL=no
HWADDR=00:1f:1f:01:2f:22

[root@localhost ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth2
# Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+
DEVICE=eth2
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes
USERCTL=no
HWADDR=00:1f:1f:01:17:69

[root@localhost ~]# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.2.4 (January 28, 2008)

Bonding Mode: load balancing (round-robin)
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth1
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:1f:1f:01:2f:22

Slave Interface: eth2
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:1f:1f:01:17:69

It doesn't matter which mode bond0 is in.
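In the ps output above, pid 2907 has STAT `D+`. The `D` means uninterruptible sleep, which is why it ignores kill -9. Anyone trying to confirm the same symptom on another box could filter `ps aux` style output for such processes. The following is only a rough sketch: the helper name `find_uninterruptible` is made up for illustration, and it assumes the usual 11-column `ps aux` layout with STAT in the eighth column.

```python
def find_uninterruptible(ps_output):
    """Return (pid, command) pairs for processes whose STAT starts with 'D'.

    Assumes `ps aux` style columns:
    USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    """
    stuck = []
    for line in ps_output.splitlines():
        # Split into at most 11 fields so COMMAND keeps its embedded spaces.
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # header line or something we cannot parse
        if fields[7].startswith("D"):
            stuck.append((int(fields[1]), fields[10]))
    return stuck


# The two lines quoted in this report.
SAMPLE = (
    "root 2814 0.3 0.6 4612 1300 pts/0 S+ 16:13 0:00 "
    "/bin/bash /etc/sysconfig/network-scripts/ifdown-eth ifcfg-bond0\n"
    "root 2907 0.3 0.6 4616 1300 pts/0 D+ 16:13 0:00 "
    "/bin/bash /etc/sysconfig/network-scripts/ifdown-eth ifcfg-eth2\n"
)

if __name__ == "__main__":
    for pid, command in find_uninterruptible(SAMPLE):
        print(pid, command)
```

Run against the sample, this reports only the ifdown-eth process for ifcfg-eth2, matching what was observed here.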
I reproduced this issue on another machine, also with Realtek 8139 NICs.
It could be 8139 specific, but I will try the same steps on my machine with one tg3 and two r8169 based cards.
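To tell which driver backs each slave when comparing machines, one option is to read the `device/driver` symlink under /sys/class/net. This is a minimal sketch, not something used in this report: the helper name `driver_of` and the `sysfs_root` parameter are illustrative, and it assumes the kernel exposes that symlink for the interface (virtual devices such as bond0 have none).

```python
import os


def driver_of(iface, sysfs_root="/sys/class/net"):
    """Return the name of the kernel driver bound to iface, or None.

    Reads the symlink <sysfs_root>/<iface>/device/driver and returns the
    last path component of its target (for example "8139too").
    """
    link = os.path.join(sysfs_root, iface, "device", "driver")
    try:
        return os.path.basename(os.readlink(link))
    except OSError:
        # No such interface, or a virtual interface without a backing device.
        return None


if __name__ == "__main__":
    for name in ("eth0", "eth1", "eth2", "bond0"):
        print(name, driver_of(name))
```

On the affected machine the slaves would be expected to report 8139too. The output on other hardware depends on what is installed.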
Indeed, this issue is 8139too specific. We were digging into this and Michal Schmidt found the upstream patch which fixes the issue: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=83cbb4d2577174e27a91e63a47a2a27c3af50d4e I've backported this into rhel5 and tested with positive results.
I am using 2.6.18-164.el5 on i686, and bonding still does not work correctly there. I have read the changelog, but this issue has not been fixed. 2.6.18-164.el5 uses 8139too NIC driver version 0.9.27, while 2.6.9-89.0.7.EL uses 8139too NIC driver version 0.9.27 4D0198C0EF38F3D25A3DCF7. The bonding interface bond0 always works correctly on 2.6.9-89.0.7.EL, so this issue may be specific to 8139too on RHEL 5. I hope it will be fixed in the next kernel.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
in kernel-2.6.18-168.el5

You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team has sent specific instructions indicating when to do so. However, feel free to provide a comment indicating that this fix has been verified.
@Yasuhiro Could you confirm whether or not the latest kernel available resolves this issue? http://people.redhat.com/dzickus/el5 Thank you!
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days