Bug 487346 - ifdown bond0 causes a deadlock
Summary: ifdown bond0 causes a deadlock
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.3
Hardware: i386
OS: Linux
Priority: low
Severity: low
Target Milestone: rc
Target Release: ---
Assignee: Jiri Pirko
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks: 526775 533192
Reported: 2009-02-25 15:33 UTC by Jiri Pirko
Modified: 2023-09-14 01:15 UTC
CC List: 6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-03-30 07:43:56 UTC
Target Upstream Version:
Embargoed:




Links
System: Red Hat Product Errata
ID: RHSA-2010:0178
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Enterprise Linux 5.5 kernel security and bug fix update
Last Updated: 2010-03-29 12:18:21 UTC

Description Jiri Pirko 2009-02-25 15:33:21 UTC
Description of problem:
I have a machine with three NICs. eth0 is normally connected to the network and I'm running ssh over it. eth1 and eth2 are slaves of the bonding interface bond0. When I try to ifdown bond0 (or when the system does so during reboot), the system goes into some kind of deadlock.

Version-Release number of selected component (if applicable):
2.6.18-128.1.1.el5 on i686, but I get the same results with 2.6.18-131. With the upstream kernel (2.6.29-rc6 in my case), the issue does not occur and everything works well.

How reproducible:
Always on my machine. I could not reproduce it on other machines, for example dell-pe2850-01.rhts.bos.redhat.com.

Steps to Reproduce:
I do the following on my system:
[root@localhost ~]# ifdown bond0

Actual results:
After this, I never get the command line back. I cannot ssh to the machine or type on the console, but the machine still replies to pings.

dmesg says:
bonding: bond0: Removing slave eth1
bonding: bond0: Warning: the permanent HWaddr of eth1 - 00:1F:1F:01:2F:22 - is still in use by bond0. Set the HWaddr of eth1 to a different address to avoid conflicts.
bonding: bond0: releasing active interface eth1
bonding: bond0: Removing slave eth2
bonding: bond0: releasing active interface eth2
---
The upstream kernel prints the same messages, but there ifdown completes normally.
ps uax says:
root      2814  0.3  0.6   4612  1300 pts/0    S+   16:13   0:00 /bin/bash /etc/sysconfig/network-scripts/ifdown-eth ifcfg-bond0
root      2907  0.3  0.6   4616  1300 pts/0    D+   16:13   0:00 /bin/bash /etc/sysconfig/network-scripts/ifdown-eth ifcfg-eth2
---
PID 2907 cannot be killed, even with kill -9.
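In the `ps` listing above, process 2907 is in state `D+`. Per ps(1), `D` means uninterruptible sleep and `+` means the process is in the foreground process group. A task in uninterruptible sleep typically does not act on signals, SIGKILL included, until it leaves that state, which is consistent with kill -9 having no effect. Below is a minimal sketch of picking such stuck processes out of a `ps aux` listing. It assumes the standard 11-column `ps aux` layout shown above, and `stuck_processes` is a hypothetical helper, not part of any Red Hat tool.

```python
from typing import List, Tuple

# Column layout of `ps aux`: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
STAT_COLUMN = 7
NUM_COLUMNS = 11


def stuck_processes(ps_output: str) -> List[Tuple[int, str, str]]:
    """Return (pid, stat, command) for every `ps aux` row whose STAT starts with 'D'.

    'D' is uninterruptible sleep; such a task typically does not act on
    signals (SIGKILL included) until it leaves that state.
    """
    stuck = []
    for line in ps_output.splitlines():
        # Split into at most 11 fields so that COMMAND keeps its internal spaces.
        fields = line.split(None, NUM_COLUMNS - 1)
        if len(fields) < NUM_COLUMNS or fields[0] == "USER":
            continue  # skip the header row and blank or truncated lines
        pid, stat, command = fields[1], fields[STAT_COLUMN], fields[-1]
        if stat.startswith("D"):
            stuck.append((int(pid), stat, command))
    return stuck


if __name__ == "__main__":
    # The two rows from the report (ps header omitted).
    sample = (
        "root      2814  0.3  0.6   4612  1300 pts/0    S+   16:13   0:00 "
        "/bin/bash /etc/sysconfig/network-scripts/ifdown-eth ifcfg-bond0\n"
        "root      2907  0.3  0.6   4616  1300 pts/0    D+   16:13   0:00 "
        "/bin/bash /etc/sysconfig/network-scripts/ifdown-eth ifcfg-eth2\n"
    )
    print(stuck_processes(sample))
    # [(2907, 'D+', '/bin/bash /etc/sysconfig/network-scripts/ifdown-eth ifcfg-eth2')]
```

On a live system, the same function could be given the stdout of `subprocess.run(["ps", "aux"], capture_output=True, text=True)`.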

Expected results:
The command line returns and the system keeps running normally.

Additional info:
[root@localhost ~]# cat /etc/sysconfig/network-scripts/ifcfg-bond0 
DEVICE=bond0 
BOOTPROTO=none 
ONBOOT=yes 
NETWORK=10.0.0.0 
NETMASK=255.255.255.0 
IPADDR=10.0.0.1 
USERCTL=no
[root@localhost ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth0
# Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+
DEVICE=eth0
BOOTPROTO=dhcp
HWADDR=00:E0:7D:C2:D9:38
ONBOOT=yes
[root@localhost ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth1
# Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+
DEVICE=eth1
BOOTPROTO=none 
ONBOOT=yes 
MASTER=bond0 
SLAVE=yes 
USERCTL=no
HWADDR=00:1f:1f:01:2f:22
[root@localhost ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth2
# Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+
DEVICE=eth2
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes
USERCTL=no
HWADDR=00:1f:1f:01:17:69
[root@localhost ~]# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.2.4 (January 28, 2008)

Bonding Mode: load balancing (round-robin)
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth1
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:1f:1f:01:2f:22

Slave Interface: eth2
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:1f:1f:01:17:69

The bonding mode of bond0 makes no difference.
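The `/proc/net/bonding/bond0` dump above consists of `Key: value` lines: bond-level fields first, then one section per slave starting at `Slave Interface:`. A snapshot like this can be useful to record before running `ifdown bond0`, to confirm which slaves (here eth1 and eth2) are enslaved and up. Below is a minimal sketch of parsing that text. It assumes exactly the layout shown in this report (other bonding driver versions may print extra fields or order them differently), and `parse_bonding_status` is a hypothetical helper name.

```python
from typing import Dict, List, Tuple


def parse_bonding_status(text: str) -> Tuple[Dict[str, str], List[Dict[str, str]]]:
    """Split /proc/net/bonding/<bond> text into bond-level fields and per-slave fields.

    Assumes the layout shown in the report: bond-level "Key: value" lines first,
    then one section per slave, each starting with "Slave Interface: <name>".
    """
    bond: Dict[str, str] = {}
    slaves: List[Dict[str, str]] = []
    current = bond  # fields go to the bond until the first slave section starts
    for line in text.splitlines():
        # Split on the first colon only; MAC addresses contain colons too.
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = {key: value}
            slaves.append(current)
        else:
            current[key] = value
    return bond, slaves


SAMPLE = """\
Ethernet Channel Bonding Driver: v3.2.4 (January 28, 2008)

Bonding Mode: load balancing (round-robin)
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth1
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:1f:1f:01:2f:22

Slave Interface: eth2
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:1f:1f:01:17:69
"""

if __name__ == "__main__":
    bond, slaves = parse_bonding_status(SAMPLE)
    print(bond["Bonding Mode"])  # load balancing (round-robin)
    for slave in slaves:
        print(slave["Slave Interface"], slave["MII Status"], slave["Permanent HW addr"])
```

Partitioning on the first colon keeps values such as MAC addresses intact. The trade-off is that any bond-level line printed after a slave section would be attributed to the last slave, which does not happen with the layout shown above.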

Comment 1 Jiri Pirko 2009-06-09 14:05:05 UTC
I reproduced this issue on another machine, also with Realtek 8139 NICs.

Comment 2 Ivan Vecera 2009-06-09 14:41:11 UTC
It could be 8139-specific, but I will try the same steps on my machine with one tg3-based and two r8169-based cards.

Comment 3 Jiri Pirko 2009-06-15 10:41:30 UTC
Indeed, this issue is specific to 8139too. While we were digging into it, Michal Schmidt found the upstream patch that fixes the issue:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=83cbb4d2577174e27a91e63a47a2a27c3af50d4e

I've backported this into rhel5 and tested with positive results.

Comment 5 Yasuhiro Ozone 2009-09-15 05:00:08 UTC
I am using 2.6.18-164.el5 on i686, but the problem still occurs.
I have read the changelog, and this issue has not been fixed.

2.6.18-164.el5 uses 8139too NIC driver version 0.9.27,
whereas 2.6.9-89.0.7.EL uses 8139too NIC driver version 0.9.27 4D0198C0EF38F3D25A3DCF7.

The bonding interface bond0 always works correctly on 2.6.9-89.0.7.EL.

Maybe this issue is specific to 8139too on RHEL 5.

I hope this issue will be fixed in the next kernel.

Comment 6 RHEL Program Management 2009-09-25 17:36:44 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 7 Don Zickus 2009-10-06 19:36:33 UTC
The fix is included in kernel-2.6.18-168.el5.
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so.  However feel free
to provide a comment indicating that this fix has been verified.

Comment 9 Chris Ward 2010-03-19 12:54:29 UTC
@Yasuhiro

Could you confirm whether or not the latest kernel available resolves this issue?

http://people.redhat.com/dzickus/el5

Thank you!

Comment 11 errata-xmlrpc 2010-03-30 07:43:56 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html

Comment 12 Red Hat Bugzilla 2023-09-14 01:15:22 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

