Bug 130535 - Bring down bonded ethernet network interface causes oops.
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: i386 Linux
Priority: medium
Severity: high
Assigned To: John W. Linville
QA Contact: Brian Brock

Reported: 2004-08-21 04:40 EDT by Sean Plaice
Modified: 2007-11-30 17:07 EST (History)
Doc Type: Bug Fix
Last Closed: 2005-05-12 09:24:56 EDT

Attachments
sysreport (501.02 KB, application/x-bzip2), 2005-05-11 05:18 EDT, David Juran

Description Sean Plaice 2004-08-21 04:40:13 EDT
Description of problem:
The kernel oopses when taking down a bonded (bonding.ko) Ethernet interface.


Version-Release number of selected component (if applicable):
kernel-smp-2.6.5-1.358
kernel-smp-2.6.7-1.494.2.2
kernel-smp-2.6.8-1.521

How reproducible:
Every time.

Steps to Reproduce:
1. Configure a bonded interface in the ifcfg-ethX config files.
2. Issue ifup bond0.
3. Issue ifdown bond0. This initiates the oops, which is sometimes fatal and,
on some kernel versions, sometimes recoverable.
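
For readers who want to set up step 1, the following is a minimal sketch of what the ifcfg files might look like. It is not the reporter's actual configuration, which is not included in this report. The address and netmask are placeholders, and the ONBOOT and BOOTPROTO values are assumptions; only the interface names bond0, eth2 and eth3 come from the output shown under "Expected results".

```shell
# /etc/sysconfig/network-scripts/ifcfg-bond0
# (IPADDR and NETMASK are placeholder values, not taken from this report)
DEVICE=bond0
IPADDR=192.0.2.10
NETMASK=255.255.255.0
ONBOOT=no
BOOTPROTO=none

# /etc/sysconfig/network-scripts/ifcfg-eth2
# (assumed layout; repeat for eth3 with DEVICE=eth3)
DEVICE=eth2
MASTER=bond0
SLAVE=yes
ONBOOT=no
BOOTPROTO=none
```

The bonding mode itself is a module parameter rather than an ifcfg setting; a later comment notes that mode=4 is needed to trigger the problem.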
  
Actual results:
ifdown starts, then the kernel oopses. I can provide the oops text gathered
from syslog.

Expected results:
[root@corefw02 root]# ifup bond0
Enslaving eth2 to bond0
Enslaving eth3 to bond0
[root@corefw02 root]# ifdown bond0
[root@corefw02 root]#

Additional info:
Please see the messages posted on the bonding sourceforge mailing
list.
http://sourceforge.net/mailarchive/forum.php?thread_id=4678026&forum_id=2094

I have confirmed that the patch included in the post by Jay Vosburgh
works when applied to the 2.6.8-1.521 kernel sources.
Comment 1 Sean Plaice 2004-08-21 04:45:58 EDT
This is an excerpt from the system's syslog that contains the oops
message. The system is left unusable after the oops with the 2.6.8
dist kernel. 2.6.7 had mixed results of being usable/unusable after
the oops. I can provide the oops output from the other kernel version
if required.

Aug 20 17:57:01 corefw02 kernel: Debug: sleeping function called from invalid context at net/core/dev.c:3130
Aug 20 17:57:01 corefw02 kernel: in_atomic():1[expected: 0], irqs_disabled():0
Aug 20 17:57:01 corefw02 kernel:  [<0211e605>] __might_sleep+0x82/0x8c
Aug 20 17:57:01 corefw02 kernel:  [<0228334e>] synchronize_net+0x11/0x1b
Aug 20 17:57:01 corefw02 kernel:  [<22967e4f>] bond_close+0x4d/0xc8 [bonding]
Aug 20 17:57:01 corefw02 kernel:  [<022813aa>] dev_close+0x5c/0x7c
Aug 20 17:57:01 corefw02 kernel:  [<022825d7>] dev_change_flags+0x48/0xee
Aug 20 17:57:01 corefw02 kernel:  [<022ba4a9>] devinet_ioctl+0x255/0x4ce
Aug 20 17:57:01 corefw02 kernel:  [<022bc270>] inet_ioctl+0x47/0x73
Aug 20 17:57:01 corefw02 kernel:  [<02279ece>] sock_ioctl+0x2be/0x317
Aug 20 17:57:01 corefw02 kernel:  [<022794ca>] sock_map_fd+0x33/0x38
Aug 20 17:57:01 corefw02 kernel:  [<0216d078>] sys_ioctl+0x23d/0x2a0
Aug 20 17:57:01 corefw02 kernel: bad: scheduling while atomic!
Aug 20 17:57:01 corefw02 kernel:  [<022d5fe9>] schedule+0x2d/0x6e9
Aug 20 17:57:01 corefw02 kernel:  [<0216d078>] sys_ioctl+0x23d/0x2a0
Aug 20 17:57:01 corefw02 kernel:  [<021312f4>] __kernel_text_address+0x18/0x23
Aug 20 17:57:01 corefw02 kernel:  [<0210613d>] print_context_stack+0x1a/0x4f
Aug 20 17:57:01 corefw02 kernel:  [<021061cb>] show_trace+0x59/0x72
Aug 20 17:57:01 corefw02 kernel:  [<02106276>] dump_stack+0x11/0x13
Aug 20 17:57:01 corefw02 kernel:  [<0211e605>] __might_sleep+0x82/0x8c
Aug 20 17:57:01 corefw02 kernel:  [<02283353>] synchronize_net+0x16/0x1b
Aug 20 17:57:01 corefw02 kernel:  [<22967e4f>] bond_close+0x4d/0xc8 [bonding]
Aug 20 17:57:01 corefw02 kernel:  [<022813aa>] dev_close+0x5c/0x7c
Aug 20 17:57:01 corefw02 kernel:  [<022825d7>] dev_change_flags+0x48/0xee
Aug 20 17:57:01 corefw02 kernel:  [<022ba4a9>] devinet_ioctl+0x255/0x4ce
Aug 20 17:57:01 corefw02 kernel:  [<022bc270>] inet_ioctl+0x47/0x73
Aug 20 17:57:01 corefw02 kernel:  [<02279ece>] sock_ioctl+0x2be/0x317
Aug 20 17:57:01 corefw02 kernel:  [<022794ca>] sock_map_fd+0x33/0x38
Aug 20 17:57:01 corefw02 kernel:  [<0216d078>] sys_ioctl+0x23d/0x2a0
Aug 20 17:57:01 corefw02 kernel: bad: scheduling while atomic!
Aug 20 17:57:01 corefw02 kernel:  [<022d5fe9>] schedule+0x2d/0x6e9
Aug 20 17:57:01 corefw02 kernel:  [<02121438>] call_console_drivers+0xbe/0xe3
Aug 20 17:57:01 corefw02 kernel:  [<02121717>] printk+0x1e5/0x21b
Aug 20 17:57:01 corefw02 kernel:  [<022d677e>] wait_for_completion+0xd9/0x155
Aug 20 17:57:01 corefw02 kernel:  [<021061cb>] show_trace+0x59/0x72
Aug 20 17:57:01 corefw02 kernel:  [<0211d07a>] default_wake_function+0x0/0xc
Aug 20 17:57:01 corefw02 kernel:  [<022d63bb>] schedule+0x3ff/0x6e9
Aug 20 17:57:48 corefw02 kernel:  [<0211d07a>] default_wake_function+0x0/0xc
Aug 20 17:57:48 corefw02 kernel:  [<02130d5d>] synchronize_kernel+0x41/0x46
Aug 20 17:57:48 corefw02 kernel:  [<02130d14>] wakeme_after_rcu+0x0/0x8
Aug 20 17:57:48 corefw02 kernel:  [<22967e4f>] bond_close+0x4d/0xc8 [bonding]
Aug 20 17:57:48 corefw02 kernel:  [<022813aa>] dev_close+0x5c/0x7c
Aug 20 17:57:48 corefw02 kernel:  [<022825d7>] dev_change_flags+0x48/0xee
Aug 20 17:57:48 corefw02 kernel:  [<022ba4a9>] devinet_ioctl+0x255/0x4ce
Aug 20 17:57:48 corefw02 kernel:  [<022bc270>] inet_ioctl+0x47/0x73
Aug 20 17:57:48 corefw02 kernel:  [<02279ece>] sock_ioctl+0x2be/0x317
Aug 20 17:57:48 corefw02 kernel:  [<022794ca>] sock_map_fd+0x33/0x38
Aug 20 17:57:48 corefw02 kernel:  [<0216d078>] sys_ioctl+0x23d/0x2a0
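
To scan a trace like the one above for the call chain (here bond_close calling synchronize_net, which trips __might_sleep), it can help to reduce each frame to its symbol name. The following is a small sketch, not part of the original report. The helper name trace_symbols is hypothetical, and it assumes each "[<address>] symbol+0xOFFSET/0xLENGTH" frame sits on a single line, so frames that a mail or web client has wrapped across two lines are skipped.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and print only the
# symbol name from each "[<addr>] symbol+0xOFF/0xLEN" stack frame.
# Lines that are not stack frames (for example the "bad: scheduling while
# atomic!" banner) produce no output.
trace_symbols() {
    sed -n 's/.*\[<[0-9a-f]*>] *\([A-Za-z_][A-Za-z0-9_]*\)+0x.*/\1/p'
}
```

Typical use would be `dmesg | trace_symbols` or `trace_symbols < /var/log/messages`.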
Comment 2 David Juran 2005-03-30 08:49:21 EST
This problem hits kernel-2.6.9-5.0.3.EL in RHEL4 as well. Note that you must load
the bonding module with the parameter mode=4 for this to happen. At least, the
problem does not occur with mode=0.
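
Since the problem depends on the mode, it is worth confirming which mode a bond is actually running in before testing. In the bonding driver, mode=4 is IEEE 802.3ad link aggregation and mode=0 is round-robin load balancing, and the driver reports the active mode in /proc/net/bonding/bond0. The following is a small sketch, not from the original comment; the helper name bonding_mode is hypothetical and the optional file argument exists only so it can be tried against a saved copy of the status file.

```shell
#!/bin/sh
# Hypothetical helper: print the text after "Bonding Mode:" from a bonding
# status file. With no argument it reads /proc/net/bonding/bond0.
bonding_mode() {
    sed -n 's/^Bonding Mode: *//p' "${1:-/proc/net/bonding/bond0}"
}
```

For example, `bonding_mode` on a machine with bond0 loaded prints the driver's description of the current mode.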
Comment 3 Dave Jones 2005-04-16 01:07:59 EDT
Fedora Core 2 has now reached end of life, and no further updates will be
provided by Red Hat.  The Fedora legacy project will be producing further kernel
updates for security problems only.

If this bug has not been fixed in the latest Fedora Core 2 update kernel, please
try to reproduce it under Fedora Core 3, and reopen if necessary, changing the
product version accordingly.

Thank you.
Comment 4 David Juran 2005-04-16 06:02:37 EDT
This bug _is_ still reproducible in RHEL4. Could someone please reopen this issue?
Comment 5 John W. Linville 2005-04-28 13:34:39 EDT
I am unable to recreate this problem.  I have a bond w/ two interfaces in mode
4, using the kernels here:

   http://people.redhat.com/linville/kernels/rhel4/

Could you please try those kernels and post the results?  If you are still
seeing the problem, then please attach the results of running "sysreport".  Thanks!
Comment 6 David Juran 2005-05-11 05:18:57 EDT
Created attachment 114238
sysreport

Sorry it took so long, but here is a sysreport. 
The steps I did to produce the oops were

1. modprobe bonding mode=4
2. ifconfig bond0 up
3. ifconfig bond0 down
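
The three steps above can be wrapped in a script that also checks the kernel log for the two warnings quoted in comment 1. This is a sketch under stated assumptions, not something posted in the original thread: the function names are hypothetical, it must be run as root, and it should only be run on a disposable test machine because the affected kernels may be left unusable. Nothing is executed unless the script is called with --run.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the log text on stdin contains either of
# the warnings quoted in comment 1.
has_atomic_warning() {
    grep -E -q 'sleeping function called from invalid context|scheduling while atomic'
}

# The reproduction steps from this comment. Needs root.
reproduce() {
    modprobe bonding mode=4 &&
    ifconfig bond0 up &&
    ifconfig bond0 down
}

# Run only on explicit request, since an affected kernel may oops.
if [ "${1:-}" = "--run" ]; then
    reproduce
    if dmesg | has_atomic_warning; then
        echo "warning found in dmesg"
    else
        echo "no warning found in dmesg"
    fi
fi
```

If the machine hangs hard, the dmesg check never runs, so the absence of output is not evidence that the kernel is fixed.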
Comment 7 David Juran 2005-05-11 05:21:50 EDT
Seems like I still have to write something here to get rid of the 'needinfo'
state )-:
Comment 8 John W. Linville 2005-05-11 08:19:43 EDT
David, the sysreport seems to indicate that you are using the stock kernel.  
Did you try to reproduce with the kernels mentioned in comment 5? 
Comment 9 David Juran 2005-05-12 05:28:19 EDT
Sorry... One of these days I guess I'll learn to read as well... Anyway, I can
no longer reproduce the problem with kernel-2.6.9-6.46.EL.jwltest.24 (-:
Comment 10 John W. Linville 2005-05-12 09:24:56 EDT
Cool!  It sounds like this will be fixed when U1 hits the streets.  If it's 
alright with you, I'm going to close this as CURRENTRELEASE (even though U1 
isn't quite out).  If you still have the problem after going to U1, then 
please reopen this bug.  Thanks! 