Bug 487346 - ifdown bond0 causes a deadlock [NEEDINFO]
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.3
Hardware: i386 Linux
Priority: low Severity: low
Target Milestone: rc
Target Release: ---
Assigned To: Jiri Pirko
QA Contact: Red Hat Kernel QE team
Depends On:
Blocks: 533192 526775
Reported: 2009-02-25 10:33 EST by Jiri Pirko
Modified: 2015-05-04 21:16 EDT

Doc Type: Bug Fix
Last Closed: 2010-03-30 03:43:56 EDT
cward: needinfo? (yasuhiro.ozone)


Attachments: None
Description Jiri Pirko 2009-02-25 10:33:21 EST
Description of problem:
I have a machine with three NICs. eth0 is connected to the network and I'm running ssh over it; eth1 and eth2 are slaves of the bonding interface bond0. When I run ifdown bond0 (or when the system does it during a reboot), the system goes into some kind of deadlock.

Version-Release number of selected component (if applicable):
2.6.18-128.1.1.el5 on i686, and I get the same results with 2.6.18-131. With the upstream kernel (2.6.29-rc6 in my case) the issue does not occur and everything works well.

How reproducible:
Always on my machine. I could not reproduce it on dell-pe2850-01.rhts.bos.redhat.com, for example.

Steps to Reproduce:
I run the following on my system:
[root@localhost ~]# ifdown bond0

Actual results:
After this the command prompt never comes back. I cannot ssh to the machine and cannot type on the console, but the machine still replies to pings.

dmesg says:
bonding: bond0: Removing slave eth1
bonding: bond0: Warning: the permanent HWaddr of eth1 - 00:1F:1F:01:2F:22 - is still in use by bond0. Set the HWaddr of eth1 to a different address to avoid conflicts.
bonding: bond0: releasing active interface eth1
bonding: bond0: Removing slave eth2
bonding: bond0: releasing active interface eth2
---
The upstream kernel prints the same messages, but there it works.
ps uax says:
root      2814  0.3  0.6   4612  1300 pts/0    S+   16:13   0:00 /bin/bash /etc/sysconfig/network-scripts/ifdown-eth ifcfg-bond0
root      2907  0.3  0.6   4616  1300 pts/0    D+   16:13   0:00 /bin/bash /etc/sysconfig/network-scripts/ifdown-eth ifcfg-eth2
---
PID 2907 cannot be killed, even with kill -9.
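
The D+ state shown for PID 2907 is uninterruptible sleep, which is why signals, including SIGKILL, have no effect on it. A minimal sketch for spotting such processes and seeing where they wait in the kernel follows. The helper name filter_d_state is hypothetical, and the WCHAN column is only informative when the kernel can resolve symbol names.

```shell
# Keep the header line plus every process whose STAT column starts with "D"
# (uninterruptible sleep). Expects `ps -eo pid,stat,wchan:32,args` style
# input on stdin: PID in column 1, STAT in column 2.
filter_d_state() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Live usage (run from a session that is still responsive):
#   ps -eo pid,stat,wchan:32,args | filter_d_state
```

On a hung system this may only be usable from a shell that was opened before the ifdown call, since new logins no longer work.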

Expected results:
The command prompt returns and the system keeps running normally.

Additional info:
[root@localhost ~]# cat /etc/sysconfig/network-scripts/ifcfg-bond0 
DEVICE=bond0 
BOOTPROTO=none 
ONBOOT=yes 
NETWORK=10.0.0.0 
NETMASK=255.255.255.0 
IPADDR=10.0.0.1 
USERCTL=no
[root@localhost ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth0
# Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+
DEVICE=eth0
BOOTPROTO=dhcp
HWADDR=00:E0:7D:C2:D9:38
ONBOOT=yes
[root@localhost ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth1
# Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+
DEVICE=eth1
BOOTPROTO=none 
ONBOOT=yes 
MASTER=bond0 
SLAVE=yes 
USERCTL=no
HWADDR=00:1f:1f:01:2f:22
[root@localhost ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth2
# Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+
DEVICE=eth2
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes
USERCTL=no
HWADDR=00:1f:1f:01:17:69
[root@localhost ~]# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.2.4 (January 28, 2008)

Bonding Mode: load balancing (round-robin)
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth1
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:1f:1f:01:2f:22

Slave Interface: eth2
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:1f:1f:01:17:69

It doesn't matter which mode bond0 is in.
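
To check which driver sits behind each slave of a bond, the slave names can be pulled out of the /proc/net/bonding/bond0 output shown above. This is only a sketch: the function names bond_slaves and slave_drivers are hypothetical, and it assumes each NIC exposes a driver symlink at /sys/class/net/<iface>/device/driver.

```shell
# Print the slave interface names found in bonding status text on stdin
# (the format of /proc/net/bonding/bond0).
bond_slaves() {
    awk '/^Slave Interface:/ { print $3 }'
}

# Print "<iface> <driver>" for each slave. The driver name is taken from
# the sysfs driver symlink; "unknown" is printed when the link is absent.
slave_drivers() {
    bond_slaves | while read -r iface; do
        link=/sys/class/net/$iface/device/driver
        if [ -e "$link" ]; then
            drv=$(basename "$(readlink -f "$link")")
        else
            drv=unknown
        fi
        printf '%s %s\n' "$iface" "$drv"
    done
}

# Usage:
#   slave_drivers < /proc/net/bonding/bond0
```

`ethtool -i <iface>` reports the same driver name if sysfs is not available.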
Comment 1 Jiri Pirko 2009-06-09 10:05:05 EDT
I reproduced this issue on another machine, also with Realtek 8139 NICs.
Comment 2 Ivan Vecera 2009-06-09 10:41:11 EDT
It could be 8139-specific, but I will try the same steps on my machine with one tg3 and two r8169-based cards.
Comment 3 Jiri Pirko 2009-06-15 06:41:30 EDT
Indeed, this issue is 8139too specific. We were digging into this and Michal Schmidt found the upstream patch which fixes the issue:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=83cbb4d2577174e27a91e63a47a2a27c3af50d4e

I've backported this to RHEL 5 and tested it with positive results.
Comment 5 Yasuhiro Ozone 2009-09-15 01:00:08 EDT
I am using 2.6.18-164.el5 on i686, but it still does not work.
I have read the changelog, but this issue has not been fixed.

8139too NIC driver version 0.9.27 is used in 2.6.18-164.el5.
But 8139too NIC driver version 0.9.27 4D0198C0EF38F3D25A3DCF7 is used in 2.6.9-89.0.7.EL.

The bonding interface bond0 always works correctly in 2.6.9-89.0.7.EL.

Maybe this issue is specific to 8139too on RHEL 5.

I hope this issue will be fixed in the next kernel.
Comment 6 RHEL Product and Program Management 2009-09-25 13:36:44 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 7 Don Zickus 2009-10-06 15:36:33 EDT
in kernel-2.6.18-168.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so.  However feel free
to provide a comment indicating that this fix has been verified.
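
A rough way to tell whether a machine is running a build at or past the test kernel named above is to compare the build number in `uname -r`. This is a sketch under assumptions: the function names are hypothetical, only the 2.6.18-N naming seen in this report is handled, and a lower-numbered z-stream kernel could carry the same backport, so a negative result is not conclusive.

```shell
# Extract the build number from a kernel release such as "2.6.18-164.el5"
# or "2.6.18-128.1.1.el5". Returns 1 for anything that is not 2.6.18-N...
rhel5_build() {
    case $1 in
        2.6.18-[0-9]*) ;;
        *) return 1 ;;
    esac
    rest=${1#2.6.18-}            # e.g. "164.el5"
    printf '%s\n' "${rest%%.*}"  # e.g. "164"
}

# Succeed if the release is build 168 or later.
has_fix_build() {
    build=$(rhel5_build "$1") || return 1
    [ "$build" -ge 168 ]
}

# Usage:
#   has_fix_build "$(uname -r)" && echo "build 168 or later"
```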
Comment 9 Chris Ward 2010-03-19 08:54:29 EDT
@Yasuhiro

Could you confirm whether or not the latest kernel available resolves this issue?

http://people.redhat.com/dzickus/el5

Thank you!
Comment 11 errata-xmlrpc 2010-03-30 03:43:56 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html
