Bug 675664
| Summary: | Kernel panic when restart network on vlan with bonding [rhel-5.6.z] | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | RHEL Program Management <pm-rhel> |
| Component: | kernel | Assignee: | Jiri Pirko <jpirko> |
| Status: | CLOSED ERRATA | QA Contact: | Weibing Zhang <atzhang> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | ||
| Version: | 5.6 | CC: | anton, atzhang, cww, dhoward, fleitner, jolsa, jpirko, kzhang, lzheng, martin.wilck, mfuruta, nhorman, pm-eus, rkhan, rramacha, tgraf |
| Target Milestone: | rc | Keywords: | Regression, ZStream |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | kernel-2.6.18-238.6.1.el5 | Doc Type: | Bug Fix |
| Doc Text: | A bug was discovered in the bonding driver that occurred when using netpoll and changing, adding or removing slaves from a bond. The misuse of a per-cpu flag in the bonding driver during these operations at the wrong time could lead to the detection of an invalid state in the bonding driver, triggering kernel panic. With this update, the use of the aforementioned per-cpu flag has been corrected and a kernel panic no longer occurs. | Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2011-04-12 18:21:15 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 659594 | ||
| Bug Blocks: | |||
|
Description
RHEL Program Management
2011-02-07 08:22:08 UTC
In kernel-2.6.18-238.6.1.el5: linux-2.6-net-bonding-convert-netpoll-tx-blocking-to-a-counter.patch
With the reproducer from Liang Zheng on kernel 2.6.18-232.el5, restarting the network several times triggers a kernel panic.
[root@hp-bl460cg5-01 ~]# uname -a
Linux hp-bl460cg5-01.rhts.eng.bos.redhat.com 2.6.18-232.el5 #1 SMP Mon Nov 15 16:01:45 EST 2010 x86_64 x86_64 x86_64 GNU/Linux
[root@hp-bl460cg5-01 ~]# for i in {1..10}; do service network restart; done
......
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at drivers/net/bonding/bonding.h:135
invalid opcode: 0000 [1] SMP
last sysfs file: /class/net/bond0.10/broadcast
CPU 3
Modules linked in: autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc 8021q bonding ipv6 xfrm_nalgo crypto_api cpufreq_ondemand acpi_cpufreq freq_table mperf dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec i2c_core dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport tpm_tis tpm shpchp hpilo i5000_edac tpm_bios serio_raw bnx2 edac_mc pcspkr dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod cciss sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 2792, comm: bond0 Not tainted 2.6.18-232.el5 #1
RIP: 0010:[<ffffffff883bfe95>] [<ffffffff883bfe95>] :bonding:bond_mii_monitor+0x41e/0x4c0
RSP: 0018:ffff810227d4be10 EFLAGS: 00010286
RAX: 00000000ffffffff RBX: ffff81022c384530 RCX: ffffffff80318f28
RDX: 0000000000000000 RSI: ffff81022de6d400 RDI: ffffffff80357a40
RBP: ffff81022c384500 R08: ffffffff80318f28 R09: 000000000000003f
R10: ffff810227d4bab0 R11: 0000000000000280 R12: ffff81022de6d400
R13: 0000000000000001 R14: 0000000000000002 R15: ffffffff883bfa77
FS: 0000000000000000(0000) GS:ffff81022ff26640(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000006bdc5c CR3: 0000000000201000 CR4: 00000000000006e0
Process bond0 (pid: 2792, threadinfo ffff810227d4a000, task ffff81022e523080)
Stack: ffff81022c384878 ffff81022c384880 ffff81022c10c8c0 0000000000000282
ffff81022c384500 ffffffff8004d7aa ffff810227d4be80 ffff81022c10c8c0
ffffffff80049ff2 ffff81022b9e5d68 0000000000000282 ffff81022b9e5d58
Call Trace:
[<ffffffff8004d7aa>] run_workqueue+0x99/0xf6
[<ffffffff80049ff2>] worker_thread+0x0/0x122
[<ffffffff800a267e>] keventd_create_kthread+0x0/0xc4
[<ffffffff8004a0e2>] worker_thread+0xf0/0x122
[<ffffffff8008e414>] default_wake_function+0x0/0xe
[<ffffffff800a267e>] keventd_create_kthread+0x0/0xc4
[<ffffffff800a267e>] keventd_create_kthread+0x0/0xc4
[<ffffffff80032968>] kthread+0xfe/0x132
[<ffffffff8005dfb1>] child_rip+0xa/0x11
[<ffffffff800a267e>] keventd_create_kthread+0x0/0xc4
[<ffffffff8003286a>] kthread+0x0/0x132
[<ffffffff8005dfa7>] child_rip+0x0/0x11
Code: 0f 0b 68 aa 84 3c 88 c2 87 00 48 8d 5d 34 48 89 df e8 c9 4c
RIP [<ffffffff883bfe95>] :bonding:bond_mii_monitor+0x41e/0x4c0
RSP <ffff810227d4be10>
<0>Kernel panic - not syncing: Fatal exception
On kernel 238.5.1.el5 and kernel 238.9.1.el5:
After running "service network restart" continuously for about 40 hours, no kernel panic was detected.
So we think the bug is fixed. Set to Verified.
An advisory has been issued which should help with the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2011-0429.html
Technical note added. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.
New Contents:
A bug was discovered in the bonding driver that occurred when using netpoll and changing, adding or removing slaves from a bond. The misuse of a per-cpu flag in the bonding driver during these operations at the wrong time could lead to the detection of an invalid state in the bonding driver, triggering kernel panic. With this update, the use of the aforementioned per-cpu flag has been corrected and a kernel panic no longer occurs.
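The technical note describes a flag whose misuse leads to the detection of an invalid state, and the patch name mentions converting netpoll tx blocking to a counter. As a rough illustration only, the following userspace sketch shows one way a boolean "blocked" flag can reach an invalid state when two blockers overlap, and why a counter tolerates the same sequence. All names here (flag_gate, counter_gate and their helpers) are hypothetical and are not taken from the bonding driver; the model is single-threaded and ignores the per-cpu aspect.

```c
#include <stdbool.h>

/* Hypothetical model; NOT the actual bonding driver code. */

/* Flag scheme: one boolean cannot represent overlapping blockers. */
struct flag_gate {
    bool blocked;
};

/* Returns false when the flag is already set, i.e. the "invalid state"
 * that a kernel-side check would turn into a BUG. */
static bool flag_block(struct flag_gate *g)
{
    if (g->blocked)
        return false;
    g->blocked = true;
    return true;
}

/* Returns false when asked to clear a flag that is not set. */
static bool flag_unblock(struct flag_gate *g)
{
    if (!g->blocked)
        return false;
    g->blocked = false;
    return true;
}

/* Counter scheme: each blocker increments, each unblocker decrements,
 * so overlapping block/unblock pairs stay balanced. */
struct counter_gate {
    int depth;
};

static void counter_block(struct counter_gate *g)
{
    g->depth++;
}

static void counter_unblock(struct counter_gate *g)
{
    g->depth--;
}

/* Transmission stays blocked while at least one blocker is active. */
static bool counter_is_blocked(const struct counter_gate *g)
{
    return g->depth > 0;
}
```

With the flag, a second flag_block() call made before the first flag_unblock() reports the invalid state. With the counter, the same sequence simply leaves depth at 2, and the gate opens again only after both blockers have called counter_unblock().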