Bug 675664 - Kernel panic when restart network on vlan with bonding [rhel-5.6.z]
Summary: Kernel panic when restart network on vlan with bonding [rhel-5.6.z]
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.6
Hardware: Unspecified
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Jiri Pirko
QA Contact: Weibing Zhang
URL:
Whiteboard:
Keywords: Regression, ZStream
Depends On: 659594
Blocks:
Reported: 2011-02-07 08:22 UTC by RHEL Product and Program Management
Modified: 2018-11-14 14:46 UTC
CC List: 16 users

A bug was discovered in the bonding driver that occurred when using netpoll and changing, adding, or removing slaves from a bond. Misuse of a per-cpu flag in the bonding driver at the wrong time during these operations could lead to the detection of an invalid state in the driver, triggering a kernel panic. With this update, the use of the per-cpu flag has been corrected and a kernel panic no longer occurs.
Clone Of:
Last Closed: 2011-04-12 18:21:15 UTC
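The technical note describes the failure only in general terms. The following is a minimal sketch of how a per-CPU "tx blocked" flag can reach an invalid state. It assumes, purely for illustration, that blocking sets the flag of the CPU the caller is on and that unblocking clears the flag of whichever CPU the caller is on at that later moment. The model, `NR_MODEL_CPUS`, and the function names are hypothetical and are not the bonding driver's code. Where kernel code would hit a BUG_ON, the sketch returns false instead.

```c
#include <assert.h>
#include <stdbool.h>

/* HYPOTHETICAL model for illustration only: NR_MODEL_CPUS and the
 * function names are invented here, not taken from the bonding driver. */
#define NR_MODEL_CPUS 4

/* One "netpoll tx blocked" flag per CPU. */
static bool tx_blocked[NR_MODEL_CPUS];

/* Set the flag of the CPU the caller runs on. Returns false in the
 * situation a kernel BUG_ON would catch: the flag was already set
 * (for example, a nested block on the same CPU). */
static bool flag_block_tx(int cpu)
{
    assert(cpu >= 0 && cpu < NR_MODEL_CPUS);
    if (tx_blocked[cpu])
        return false;            /* invalid state: already blocked */
    tx_blocked[cpu] = true;
    return true;
}

/* Clear the flag of the CPU the caller runs on now. Returns false when
 * that CPU's flag was never set, e.g. because the caller blocked on a
 * different CPU and was migrated in between. */
static bool flag_unblock_tx(int cpu)
{
    assert(cpu >= 0 && cpu < NR_MODEL_CPUS);
    if (!tx_blocked[cpu])
        return false;            /* invalid state: not blocked here */
    tx_blocked[cpu] = false;
    return true;
}
```

For example, `flag_block_tx(0)` followed by `flag_unblock_tx(2)`, as a task that moved from CPU 0 to CPU 2 would issue, makes the second call report the invalid state and leaves CPU 0's flag set.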


Attachments


External Trackers
Tracker: Red Hat Product Errata
ID: RHSA-2011:0429
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: kernel security and bug fix update
Last Updated: 2011-04-12 18:19:57 UTC

Description RHEL Product and Program Management 2011-02-07 08:22:08 UTC
This bug has been copied from bug #659594 and has been proposed
to be backported to 5.6 z-stream (EUS).

Comment 5 Jiri Pirko 2011-03-02 07:53:59 UTC
Included in kernel-2.6.18-238.6.1.el5:

linux-2.6-net-bonding-convert-netpoll-tx-blocking-to-a-counter.patch
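The patch name indicates that the fix replaces per-CPU tx blocking with a counter. A minimal sketch of that idea follows, assuming a single shared atomic count stands in for the driver's blocking state. The names are hypothetical and are not the driver's actual symbols.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* HYPOTHETICAL model for illustration only: the names are invented
 * here, not the bonding driver's actual symbols. */
static atomic_int netpoll_block_count = 0;

/* Every blocker increments one shared count; which CPU it runs on
 * does not matter. */
static void counter_block_tx(void)
{
    atomic_fetch_add(&netpoll_block_count, 1);
}

/* Every blocker later decrements the count exactly once. Nesting and
 * CPU migration are harmless as long as block and unblock are paired. */
static void counter_unblock_tx(void)
{
    int old = atomic_fetch_sub(&netpoll_block_count, 1);
    assert(old > 0);             /* unblock without a matching block */
    (void)old;                   /* keep -Wunused quiet under NDEBUG */
}

/* netpoll transmit stays blocked while any blocker is active. */
static bool counter_tx_blocked(void)
{
    return atomic_load(&netpoll_block_count) > 0;
}
```

The design point is that a counter records only how many blockers are active, not which CPU each started on, so a nested block or a block/unblock pair split across CPUs still leaves the state consistent.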

Comment 10 Weibing Zhang 2011-03-31 03:59:45 UTC
With the reproducer from Liang Zheng:

On kernel 2.6.18-232.el5, restarting the network several times triggers a kernel panic.

[root@hp-bl460cg5-01 ~]# uname  -a 
Linux hp-bl460cg5-01.rhts.eng.bos.redhat.com 2.6.18-232.el5 #1 SMP Mon Nov 15 16:01:45 EST 2010 x86_64 x86_64 x86_64 GNU/Linux 
[root@hp-bl460cg5-01 ~]#  for i in {1..10}; do service network restart; done
......

----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at drivers/net/bonding/bonding.h:135
invalid opcode: 0000 [1] SMP 
last sysfs file: /class/net/bond0.10/broadcast
CPU 3 
Modules linked in: autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc 8021q bonding ipv6 xfrm_nalgo crypto_api cpufreq_ondemand acpi_cpufreq freq_table mperf dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec i2c_core dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport tpm_tis tpm shpchp hpilo i5000_edac tpm_bios serio_raw bnx2 edac_mc pcspkr dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod cciss sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 2792, comm: bond0 Not tainted 2.6.18-232.el5 #1
RIP: 0010:[<ffffffff883bfe95>]  [<ffffffff883bfe95>] :bonding:bond_mii_monitor+0x41e/0x4c0
RSP: 0018:ffff810227d4be10  EFLAGS: 00010286
RAX: 00000000ffffffff RBX: ffff81022c384530 RCX: ffffffff80318f28
RDX: 0000000000000000 RSI: ffff81022de6d400 RDI: ffffffff80357a40
RBP: ffff81022c384500 R08: ffffffff80318f28 R09: 000000000000003f
R10: ffff810227d4bab0 R11: 0000000000000280 R12: ffff81022de6d400
R13: 0000000000000001 R14: 0000000000000002 R15: ffffffff883bfa77
FS:  0000000000000000(0000) GS:ffff81022ff26640(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000006bdc5c CR3: 0000000000201000 CR4: 00000000000006e0
Process bond0 (pid: 2792, threadinfo ffff810227d4a000, task ffff81022e523080)
Stack:  ffff81022c384878 ffff81022c384880 ffff81022c10c8c0 0000000000000282
 ffff81022c384500 ffffffff8004d7aa ffff810227d4be80 ffff81022c10c8c0
 ffffffff80049ff2 ffff81022b9e5d68 0000000000000282 ffff81022b9e5d58
Call Trace:
 [<ffffffff8004d7aa>] run_workqueue+0x99/0xf6
 [<ffffffff80049ff2>] worker_thread+0x0/0x122
 [<ffffffff800a267e>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8004a0e2>] worker_thread+0xf0/0x122
 [<ffffffff8008e414>] default_wake_function+0x0/0xe
 [<ffffffff800a267e>] keventd_create_kthread+0x0/0xc4
 [<ffffffff800a267e>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80032968>] kthread+0xfe/0x132
 [<ffffffff8005dfb1>] child_rip+0xa/0x11
 [<ffffffff800a267e>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8003286a>] kthread+0x0/0x132
 [<ffffffff8005dfa7>] child_rip+0x0/0x11


Code: 0f 0b 68 aa 84 3c 88 c2 87 00 48 8d 5d 34 48 89 df e8 c9 4c 
RIP  [<ffffffff883bfe95>] :bonding:bond_mii_monitor+0x41e/0x4c0
 RSP <ffff810227d4be10>
 <0>Kernel panic - not syncing: Fatal exception

On kernel 238.5.1.el5 and kernel 238.9.1.el5, "service network restart" was run continuously for about 40 hours and no kernel panic was detected.
We therefore consider the bug fixed and have set it to VERIFIED.

Comment 11 errata-xmlrpc 2011-04-12 18:21:15 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0429.html

Comment 12 Martin Prpič 2011-04-14 10:15:02 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
A bug was discovered in the bonding driver that occurred when using netpoll and changing, adding, or removing slaves from a bond. Misuse of a per-cpu flag in the bonding driver at the wrong time during these operations could lead to the detection of an invalid state in the driver, triggering a kernel panic. With this update, the use of the per-cpu flag has been corrected and a kernel panic no longer occurs.

