Bug 130535

Summary: Bringing down a bonded Ethernet network interface causes an oops.
Product: Red Hat Enterprise Linux 4
Component: kernel
Version: 4.0
Hardware: i386
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: medium
Reporter: Sean Plaice <splaice>
Assignee: John W. Linville <linville>
QA Contact: Brian Brock <bbrock>
CC: davej, djuran, splaice, wtogami
Doc Type: Bug Fix
Last Closed: 2005-05-12 13:24:56 UTC

Attachments:
sysreport (flags: none)

Description Sean Plaice 2004-08-21 08:40:13 UTC
Description of problem:
The kernel oopses when taking down a bonded (bonding.ko) Ethernet interface.


Version-Release number of selected component (if applicable):
kernel-smp-2.6.5-1.358
kernel-smp-2.6.7-1.494.2.2
kernel-smp-2.6.8-1.521

How reproducible:
Every time.

Steps to Reproduce:
1. Configure a bonded interface in the ifcfg-ethX config files.
2. Run ifup bond0.
3. Run ifdown bond0. This triggers the oops, which is sometimes fatal and
sometimes recoverable, depending on the kernel version.
  
Actual results:
ifdown starts and then the kernel oopses. I can provide the oops text
gathered from syslog.

Expected results:
[root@corefw02 root]# ifup bond0
Enslaving eth2 to bond0
Enslaving eth3 to bond0
[root@corefw02 root]# ifdown bond0
[root@corefw02 root]#

Additional info:
Please see the messages posted on the bonding SourceForge mailing
list.
http://sourceforge.net/mailarchive/forum.php?thread_id=4678026&forum_id=2094

I have confirmed that the patch included in the post by Jay Vosburgh
works when applied to the 2.6.8-1.521 kernel sources.

Comment 1 Sean Plaice 2004-08-21 08:45:58 UTC
This is an excerpt from the system's syslog that contains the oops
message. With the 2.6.8 dist kernel the system is left unusable after
the oops. With 2.6.7 the results were mixed: the system was sometimes
usable and sometimes not. I can provide the oops output from the other
kernel versions if required.

Aug 20 17:57:01 corefw02 kernel: Debug: sleeping function called from
invalid context at net/core/dev.c:3130
Aug 20 17:57:01 corefw02 kernel: in_atomic():1[expected: 0],
irqs_disabled():0
Aug 20 17:57:01 corefw02 kernel:  [<0211e605>] __might_sleep+0x82/0x8c
Aug 20 17:57:01 corefw02 kernel:  [<0228334e>] synchronize_net+0x11/0x1b
Aug 20 17:57:01 corefw02 kernel:  [<22967e4f>] bond_close+0x4d/0xc8
[bonding]
Aug 20 17:57:01 corefw02 kernel:  [<022813aa>] dev_close+0x5c/0x7c
Aug 20 17:57:01 corefw02 kernel:  [<022825d7>] dev_change_flags+0x48/0xee
Aug 20 17:57:01 corefw02 kernel:  [<022ba4a9>] devinet_ioctl+0x255/0x4ce
Aug 20 17:57:01 corefw02 kernel:  [<022bc270>] inet_ioctl+0x47/0x73
Aug 20 17:57:01 corefw02 kernel:  [<02279ece>] sock_ioctl+0x2be/0x317
Aug 20 17:57:01 corefw02 kernel:  [<022794ca>] sock_map_fd+0x33/0x38
Aug 20 17:57:01 corefw02 kernel:  [<0216d078>] sys_ioctl+0x23d/0x2a0
Aug 20 17:57:01 corefw02 kernel: bad: scheduling while atomic!
Aug 20 17:57:01 corefw02 kernel:  [<022d5fe9>] schedule+0x2d/0x6e9
Aug 20 17:57:01 corefw02 kernel:  [<0216d078>] sys_ioctl+0x23d/0x2a0
Aug 20 17:57:01 corefw02 kernel:  [<021312f4>]
__kernel_text_address+0x18/0x23
Aug 20 17:57:01 corefw02 kernel:  [<0210613d>]
print_context_stack+0x1a/0x4f
Aug 20 17:57:01 corefw02 kernel:  [<021061cb>] show_trace+0x59/0x72
Aug 20 17:57:01 corefw02 kernel:  [<02106276>] dump_stack+0x11/0x13
Aug 20 17:57:01 corefw02 kernel:  [<0211e605>] __might_sleep+0x82/0x8c
Aug 20 17:57:01 corefw02 kernel:  [<02283353>] synchronize_net+0x16/0x1b
Aug 20 17:57:01 corefw02 kernel:  [<22967e4f>] bond_close+0x4d/0xc8
[bonding]
Aug 20 17:57:01 corefw02 kernel:  [<022813aa>] dev_close+0x5c/0x7c
Aug 20 17:57:01 corefw02 kernel:  [<022825d7>] dev_change_flags+0x48/0xee
Aug 20 17:57:01 corefw02 kernel:  [<022ba4a9>] devinet_ioctl+0x255/0x4ce
Aug 20 17:57:01 corefw02 kernel:  [<022bc270>] inet_ioctl+0x47/0x73
Aug 20 17:57:01 corefw02 kernel:  [<02279ece>] sock_ioctl+0x2be/0x317
Aug 20 17:57:01 corefw02 kernel:  [<022794ca>] sock_map_fd+0x33/0x38
Aug 20 17:57:01 corefw02 kernel:  [<0216d078>] sys_ioctl+0x23d/0x2a0
Aug 20 17:57:01 corefw02 kernel: bad: scheduling while atomic!
Aug 20 17:57:01 corefw02 kernel:  [<022d5fe9>] schedule+0x2d/0x6e9
Aug 20 17:57:01 corefw02 kernel:  [<02121438>]
call_console_drivers+0xbe/0xe3
Aug 20 17:57:01 corefw02 kernel:  [<02121717>] printk+0x1e5/0x21b
Aug 20 17:57:01 corefw02 kernel:  [<022d677e>]
wait_for_completion+0xd9/0x155
Aug 20 17:57:01 corefw02 kernel:  [<021061cb>] show_trace+0x59/0x72
Aug 20 17:57:01 corefw02 kernel:  [<0211d07a>]
default_wake_function+0x0/0xc
Aug 20 17:57:01 corefw02 kernel:  [<022d63bb>] schedule+0x3ff/0x6e9
Aug 20 17:57:48 corefw02 kernel:  [<0211d07a>]
default_wake_function+0x0/0xc
Aug 20 17:57:48 corefw02 kernel:  [<02130d5d>]
synchronize_kernel+0x41/0x46
Aug 20 17:57:48 corefw02 kernel:  [<02130d14>] wakeme_after_rcu+0x0/0x8
Aug 20 17:57:48 corefw02 kernel:  [<22967e4f>] bond_close+0x4d/0xc8
[bonding]
Aug 20 17:57:48 corefw02 kernel:  [<022813aa>] dev_close+0x5c/0x7c
Aug 20 17:57:48 corefw02 kernel:  [<022825d7>] dev_change_flags+0x48/0xee
Aug 20 17:57:48 corefw02 kernel:  [<022ba4a9>] devinet_ioctl+0x255/0x4ce
Aug 20 17:57:48 corefw02 kernel:  [<022bc270>] inet_ioctl+0x47/0x73
Aug 20 17:57:48 corefw02 kernel:  [<02279ece>] sock_ioctl+0x2be/0x317
Aug 20 17:57:48 corefw02 kernel:  [<022794ca>] sock_map_fd+0x33/0x38
Aug 20 17:57:48 corefw02 kernel:  [<0216d078>] sys_ioctl+0x23d/0x2a0
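
A trace like the one above can be checked mechanically for the two
atomic-context warnings it contains. The following is a minimal Python
sketch, not an existing kernel or Red Hat tool: the names MARKERS, FRAME,
summarize and SAMPLE are illustrative assumptions, and SAMPLE is a shortened
copy of the excerpt above. It collapses the hard-wrapped syslog lines,
counts the warnings, and lists the functions in the backtrace in order.

```python
import re

# Messages the kernel printed in the excerpt when code slept in atomic context.
MARKERS = (
    "sleeping function called from invalid context",
    "scheduling while atomic",
)

# Matches one backtrace frame such as "[<22967e4f>] bond_close+0x4d/0xc8".
FRAME = re.compile(r"\[<([0-9a-f]+)>\]\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def summarize(log_text):
    """Return (marker counts, ordered unique function names) for a syslog excerpt."""
    # Excerpts pasted into bug reports are often hard-wrapped, so collapse
    # all whitespace into single spaces before searching.
    flat = " ".join(log_text.split())
    counts = {marker: flat.count(marker) for marker in MARKERS}
    functions = []
    for _address, name in FRAME.findall(flat):
        if name not in functions:
            functions.append(name)
    return counts, functions


# Shortened copy of the excerpt from this comment (illustrative input only).
SAMPLE = """\
Aug 20 17:57:01 corefw02 kernel: Debug: sleeping function called from
invalid context at net/core/dev.c:3130
Aug 20 17:57:01 corefw02 kernel: in_atomic():1[expected: 0],
irqs_disabled():0
Aug 20 17:57:01 corefw02 kernel:  [<0211e605>] __might_sleep+0x82/0x8c
Aug 20 17:57:01 corefw02 kernel:  [<0228334e>] synchronize_net+0x11/0x1b
Aug 20 17:57:01 corefw02 kernel:  [<22967e4f>] bond_close+0x4d/0xc8
[bonding]
Aug 20 17:57:01 corefw02 kernel:  [<022813aa>] dev_close+0x5c/0x7c
Aug 20 17:57:01 corefw02 kernel: bad: scheduling while atomic!
"""

if __name__ == "__main__":
    counts, functions = summarize(SAMPLE)
    print(counts)
    print(" <- ".join(functions))
```

On the shortened sample this reports one occurrence of each warning and the
chain __might_sleep <- synchronize_net <- bond_close <- dev_close, which
matches the trace: bond_close reaches synchronize_net while in_atomic() is 1.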

Comment 2 David Juran 2005-03-30 13:49:21 UTC
This problem hits kernel-2.6.9-5.0.3.EL in RHEL4 as well. Note that you must
load the bonding module with the parameter mode=4 for this to happen. At
least, the problem does not occur with mode=0.


Comment 3 Dave Jones 2005-04-16 05:07:59 UTC
Fedora Core 2 has now reached end of life, and no further updates will be
provided by Red Hat.  The Fedora legacy project will be producing further kernel
updates for security problems only.

If this bug has not been fixed in the latest Fedora Core 2 update kernel, please
try to reproduce it under Fedora Core 3, and reopen if necessary, changing the
product version accordingly.

Thank you.


Comment 4 David Juran 2005-04-16 10:02:37 UTC
This bug _is_ still reproducible in RHEL4. Could someone please reopen this issue?

Comment 5 John W. Linville 2005-04-28 17:34:39 UTC
I am unable to recreate this problem.  I have a bond w/ two interfaces in mode
4, using the kernels here:

   http://people.redhat.com/linville/kernels/rhel4/

Could you please try those kernels and post the results?  If you are still
seeing the problem, then please attach the results of running "sysreport".  Thanks!

Comment 6 David Juran 2005-05-11 09:18:57 UTC
Created attachment 114238 [details]
sysreport

Sorry it took so long, but here is a sysreport.
The steps I took to produce the oops were:

1. modprobe bonding mode=4
2. ifconfig bond0 up
3. ifconfig bond0 down
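
The three steps above could be scripted roughly as follows. This is only a
sketch under stated assumptions: the helper names repro_commands and run are
hypothetical, the default is a dry run that just prints the commands, and
actually executing them requires root and may leave an affected machine
unusable.

```python
import subprocess


def repro_commands(bond="bond0", mode=4):
    """Build the command sequence from the steps above as argv lists."""
    return [
        ["modprobe", "bonding", f"mode={mode}"],  # step 1: load bonding in the given mode
        ["ifconfig", bond, "up"],                  # step 2: bring the bond up
        ["ifconfig", bond, "down"],                # step 3: bring it down (triggers the oops)
    ]


def run(commands, dry_run=True):
    """Print each command and, unless dry_run is set, execute it."""
    for argv in commands:
        print("+", " ".join(argv))
        if not dry_run:
            # check=True stops at the first command that fails.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Dry run by default; pass dry_run=False only on a disposable test machine.
    run(repro_commands())
```

Passing mode=0 to repro_commands gives the variant that, according to
comment 2, does not show the problem.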

Comment 7 David Juran 2005-05-11 09:21:50 UTC
Seems like I still have to write something here to get rid of the 'needinfo'
state )-:

Comment 8 John W. Linville 2005-05-11 12:19:43 UTC
David, the sysreport seems to indicate that you are using the stock kernel.  
Did you try to reproduce with the kernels mentioned in comment 5? 

Comment 9 David Juran 2005-05-12 09:28:19 UTC
Sorry... One of these days I guess I'll learn to read as well... Anyway, I can
no longer reproduce the problem with kernel-2.6.9-6.46.EL.jwltest.24 (-:

Comment 10 John W. Linville 2005-05-12 13:24:56 UTC
Cool!  It sounds like this will be fixed when U1 hits the streets.  If it's 
alright with you, I'm going to close this as CURRENTRELEASE (even though U1 
isn't quite out).  If you still have the problem after going to U1, then 
please reopen this bug.  Thanks!