Bug 487346

Summary: ifdown bond0 causes a deadlock
Product: Red Hat Enterprise Linux 5
Component: kernel
Version: 5.3
Hardware: i386
OS: Linux
Status: CLOSED ERRATA
Severity: low
Priority: low
Target Milestone: rc
Target Release: ---
Reporter: Jiri Pirko <jpirko>
Assignee: Jiri Pirko <jpirko>
QA Contact: Red Hat Kernel QE team <kernel-qe>
Docs Contact:
CC: agospoda, cward, dzickus, ivecera, rkhan, yasuhiro.ozone
Doc Type: Bug Fix
Last Closed: 2010-03-30 07:43:56 UTC
Bug Blocks: 526775, 533192

Description Jiri Pirko 2009-02-25 15:33:21 UTC
Description of problem:
I have a machine with three NICs. eth0 is connected to the network as usual and I'm running ssh over it. eth1 and eth2 are slaves of the bonding interface bond0. When I try to ifdown bond0 (or when the system does it during a reboot), the system goes into some kind of deadlock.

Version-Release number of selected component (if applicable):
2.6.18-128.1.1.el5 on i686, but I get the same results with 2.6.18-131. With the upstream kernel (2.6.29-rc6 in my case) this issue does not occur and everything works well.

How reproducible:
Always on my machine. I could not reproduce it on dell-pe2850-01.rhts.bos.redhat.com, for example.

Steps to Reproduce:
I run the following on my system:
[root@localhost ~]# ifdown bond0

Actual results:
After this I never get the command line back. I cannot ssh to the machine and cannot type on the console, but the machine still replies to pings.

dmesg says:
bonding: bond0: Removing slave eth1
bonding: bond0: Warning: the permanent HWaddr of eth1 - 00:1F:1F:01:2F:22 - is still in use by bond0. Set the HWaddr of eth1 to a different address to avoid conflicts.
bonding: bond0: releasing active interface eth1
bonding: bond0: Removing slave eth2
bonding: bond0: releasing active interface eth2
---
The same messages appear with the upstream kernel, where it works.
ps uax says:
root      2814  0.3  0.6   4612  1300 pts/0    S+   16:13   0:00 /bin/bash /etc/sysconfig/network-scripts/ifdown-eth ifcfg-bond0
root      2907  0.3  0.6   4616  1300 pts/0    D+   16:13   0:00 /bin/bash /etc/sysconfig/network-scripts/ifdown-eth ifcfg-eth2
---
pid 2907 cannot be killed, even with kill -9.
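
Note: the D+ in the STAT column for pid 2907 means uninterruptible sleep, which is why the process does not react to signals, SIGKILL included, until it wakes up. The following is a minimal sketch for spotting such processes. It assumes the `ps aux` column layout shown above (PID in column 2, STAT in column 8). The function name `list_d_state` is hypothetical and does not come from this report.

```shell
# Print the PIDs of processes whose STAT column starts with "D"
# (uninterruptible sleep). Reads `ps aux`-style lines on stdin.
# Assumption: column 2 is the PID and column 8 is STAT, as in the
# listing quoted in this report.
list_d_state() {
    awk '$8 ~ /^D/ { print $2 }'
}
```

Typical use would be `ps aux | list_d_state`. The header line is skipped because its eighth field is the literal word STAT, which does not start with "D".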

Expected results:
The command line returns and the system keeps running normally.

Additional info:
[root@localhost ~]# cat /etc/sysconfig/network-scripts/ifcfg-bond0 
DEVICE=bond0 
BOOTPROTO=none 
ONBOOT=yes 
NETWORK=10.0.0.0 
NETMASK=255.255.255.0 
IPADDR=10.0.0.1 
USERCTL=no
[root@localhost ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth0
# Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+
DEVICE=eth0
BOOTPROTO=dhcp
HWADDR=00:E0:7D:C2:D9:38
ONBOOT=yes
[root@localhost ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth1
# Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+
DEVICE=eth1
BOOTPROTO=none 
ONBOOT=yes 
MASTER=bond0 
SLAVE=yes 
USERCTL=no
HWADDR=00:1f:1f:01:2f:22
[root@localhost ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth2
# Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+
DEVICE=eth2
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes
USERCTL=no
HWADDR=00:1f:1f:01:17:69
[root@localhost ~]# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.2.4 (January 28, 2008)

Bonding Mode: load balancing (round-robin)
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth1
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:1f:1f:01:2f:22

Slave Interface: eth2
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:1f:1f:01:17:69

It doesn't matter in which mode bond0 is.
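
Note: to confirm which interfaces are enslaved before running ifdown, the "Slave Interface:" lines of /proc/net/bonding/bond0 can be extracted. This is a minimal sketch that assumes the file format shown above. The function name `list_slaves` is hypothetical.

```shell
# Print one slave interface name per line from bonding status text.
# Takes file names as arguments, or reads stdin when none are given.
# Assumption: slave entries look like "Slave Interface: eth1", as in
# the /proc/net/bonding/bond0 output quoted in this report.
list_slaves() {
    awk -F': ' '/^Slave Interface:/ { print $2 }' "$@"
}
```

Typical use would be `list_slaves /proc/net/bonding/bond0`.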

Comment 1 Jiri Pirko 2009-06-09 14:05:05 UTC
I reproduced this issue on another machine, also with Realtek 8139 NICs.

Comment 2 Ivan Vecera 2009-06-09 14:41:11 UTC
It could be 8139-specific, but I will try the same steps on my machine with one tg3 and two r8169-based cards.

Comment 3 Jiri Pirko 2009-06-15 10:41:30 UTC
Indeed, this issue is 8139too-specific. We dug into it, and Michal Schmidt found the upstream patch that fixes the issue:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=83cbb4d2577174e27a91e63a47a2a27c3af50d4e

I've backported this to RHEL 5 and tested it with positive results.

Comment 5 Yasuhiro Ozone 2009-09-15 05:00:08 UTC
I am using 2.6.18-164.el5 on i686, but bonding still does not work correctly.
I have read the changelog, but this issue has not been fixed there.

2.6.18-164.el5 uses 8139too NIC driver version 0.9.27.
2.6.9-89.0.7.EL, however, uses 8139too NIC driver version 0.9.27 4D0198C0EF38F3D25A3DCF7.

The bonding interface bond0 always works correctly in 2.6.9-89.0.7.EL.

Maybe this issue is specific to 8139too on RHEL 5.

I hope this issue will be fixed in the next kernel.

Comment 6 RHEL Program Management 2009-09-25 17:36:44 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 7 Don Zickus 2009-10-06 19:36:33 UTC
The patch is in kernel-2.6.18-168.el5.
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so.  However feel free
to provide a comment indicating that this fix has been verified.
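
Note: a rough way to check whether a running RHEL 5 kernel is at or past build 168, the build named in this comment. This is only a sketch. The function name `fix_status` and its output words are hypothetical, and it compares only the first number after "2.6.18-", so it cannot tell whether an earlier z-stream build carries the same backport.

```shell
# Classify a kernel release string against build 168.
# Prints "has-fix", "no-fix", or "unknown".
# Assumption: the release looks like "2.6.18-<build>[.x.y].el5";
# anything else (other base versions, non-numeric builds) is "unknown".
fix_status() {
    release=$1
    case $release in
        2.6.18-*) ;;
        *) echo unknown; return 0 ;;
    esac
    build=${release#2.6.18-}   # e.g. "164.el5" or "128.1.1.el5"
    build=${build%%.*}         # keep only the leading build number
    case $build in
        ''|*[!0-9]*) echo unknown; return 0 ;;
    esac
    if [ "$build" -ge 168 ]; then
        echo has-fix
    else
        echo no-fix
    fi
}
```

Typical use would be `fix_status "$(uname -r)"`. For the kernels mentioned in this report, 2.6.18-128.1.1.el5 and 2.6.18-164.el5 both classify as "no-fix".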

Comment 9 Chris Ward 2010-03-19 12:54:29 UTC
@Yasuhiro

Could you confirm whether or not the latest kernel available resolves this issue?

http://people.redhat.com/dzickus/el5

Thank you!

Comment 11 errata-xmlrpc 2010-03-30 07:43:56 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html

Comment 12 Red Hat Bugzilla 2023-09-14 01:15:22 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days