Description of problem:
Attempting to create a VLAN interface on a bond of two adapters. Running "service network restart" causes a kernel panic.

Version-Release number of selected component (if applicable):
Kernel panic on 2.6.18-232.el5 but not on 2.6.18-194.el5

How reproducible:
often

Steps to Reproduce:
1. Configure the network as follows:

vim /etc/modprobe.conf
alias eth0 bnx2
alias eth1 bnx2
alias bond0 bonding

/etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
ONBOOT=yes
MASTER=bond0
SLAVE=yes
HOTPLUG=no

/etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE=eth1
ONBOOT=yes
MASTER=bond0
SLAVE=yes
HOTPLUG=no

/etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
BONDING_OPTS="mode=6 miimon=300"
ONBOOT=yes
BOOTPROTO=none

/etc/sysconfig/network-scripts/ifcfg-bond0.10
DEVICE=bond0.10
ONBOOT=yes
REORDER_HDR=no
VLAN=yes
BOOTPROTO=static
IPADDR=192.168.18.18
NETMASK=255.255.255.0

2. service network start
3. Repeat the "service network restart" command several times (about 10 times)

Actual results:
kernel panic

[root@ibm-ls21-03 network-scripts]# service network restart
Shutting down interface bond0.10: Removed VLAN -:bond0.10:-
[ OK ]
Shutting down interface bond0: bonding: bond0: Warning: the permanent HWaddr of eth0 - 00:14:5E:6D:1C:B8 - is still in use by bond0. Set the HWaddr of eth0 to a different address to avoid conflicts.
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at drivers/net/bonding/bonding.h:135
invalid opcode: 0000 [1] SMP
last sysfs file: /class/net/bond0/bonding/slaves
CPU 0
Modules linked in: bonding 8021q autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc ipv6 xfrm_nalgo crypto_api loop dm_multipath scsi_dh video backlight sbs power_meter i2c_ec dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport sg i2c_piix4 tpm_tis k8temp i2c_core k8_edac bnx2 tpm hwmon edac_mc serio_raw tpm_bios pcspkr dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod shpchp mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 12674, comm: ifdown-eth Not tainted 2.6.18-232.el5 #1
RIP: 0010:[<ffffffff884c2c0b>] [<ffffffff884c2c0b>] :bonding:bond_release+0x62/0x4f1
RSP: 0018:ffff810127609e28 EFLAGS: 00010286
RAX: 00000000ffffffff RBX: 00000000000005dc RCX: ffffffff80318f28
RDX: ffffffff80318f28 RSI: ffff81022c488000 RDI: ffff8101281f2530
RBP: ffff8101281f2500 R08: ffffffff80318f28 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000080 R12: ffff8101281f2000
R13: 0000000000000006 R14: ffff81022c488000 R15: ffff81012ea50ac0
FS: 00002addc8667f50(0000) GS:ffffffff80424000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000388ca69220 CR3: 000000012743a000 CR4: 00000000000006e0
Process ifdown-eth (pid: 12674, threadinfo ffff810127608000, task ffff810128bc57a0)
Stack: 00000000000080d0 ffffffff8006456b ffff810128bc57a0 00000000000005dc
 ffff81022c488000 ffff8101281f2500 0000000000000006 0000000000000006
 ffff81012ea50ac0 ffffffff884cbb54 000000316874652d 0000000000000000
Call Trace:
 [<ffffffff8006456b>] __down_write_nested+0x12/0x92
 [<ffffffff884cbb54>] :bonding:bonding_store_slaves+0x25c/0x2f7
 [<ffffffff8010fdb5>] sysfs_write_file+0xb9/0xe8
 [<ffffffff80016af0>] vfs_write+0xce/0x174
 [<ffffffff800173a8>] sys_write+0x45/0x6e
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0
Code: 0f 0b 68 aa d4 4c 88 c2 87 00 4c 8b 6d 08 31 c0 eb 0c 4d 39
RIP [<ffffffff884c2c0b>] :bonding:bond_release+0x62/0x4f1
 RSP <ffff810127609e28>
 <0>Kernel panic - not syncing: Fatal exception

Expected results:

Additional info:
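The repeated restart in step 3 can be scripted. The following is a minimal sketch, not part of the original report: the function name repeat_restart and its arguments are hypothetical, and it assumes a POSIX shell run as root on the affected host. It only repeats the given command and prints each exit status; on an affected kernel the machine may panic before the loop finishes.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): run a command a fixed
# number of times and print the exit status of each run.
#   $1     number of iterations
#   $2...  command to run, with its arguments
repeat_restart() {
    iterations=$1
    shift
    i=1
    while [ "$i" -le "$iterations" ]; do
        echo "attempt $i of $iterations: $*"
        "$@"
        echo "exit status: $?"
        i=$((i + 1))
    done
}

# Usage on the test host, matching "about 10 times" from step 3:
#   repeat_restart 10 service network restart
```

The loop keeps going after a failed restart on purpose, since a non-zero exit from the init script is not the failure being chased here; the panic itself is what ends the run.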
It does not reproduce on 2.6.18-194.el5, so I think it is a regression.
*** Bug 659558 has been marked as a duplicate of this bug. ***
*** Bug 654600 has been marked as a duplicate of this bug. ***
http://marc.info/?l=linux-netdev&m=129166237512572&w=3 I've posted a patch for this upstream, and will backport it for RHEL once it's accepted.
http://brewweb.devel.redhat.com/brew/taskinfo?taskID=2966935 Test build with backport
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Patch is in kernel-2.6.18-243.el5. You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5 Detailed testing feedback is always welcome.
Hello, I have a case open for a similar problem: 00414963. You can check it for further details. My problem doesn't involve VLANs, but a Red Hat cluster where I have 7 services, 5 of them with an associated IP (so 5 IP aliases defined). When I run shutdown -r with all the services running, I get a panic; I'm going to attach a screenshot. If I manually stop the services and then shut down, all goes well without a panic. It seems that kernel-2.6.18-243.el5 solves it for me too.
Created attachment 478208 panic on shutdown rh el 5.6 cluster with 5 ip alias defined on 5 services
*** Bug 679499 has been marked as a duplicate of this bug. ***
*** Bug 689759 has been marked as a duplicate of this bug. ***
As this problem is understood and a fix is available, when can we expect a z-stream release?
Martin, very soon: https://bugzilla.redhat.com/show_bug.cgi?id=675664
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. New Contents: A bug was discovered in the bonding driver that occurred when netpoll was in use and slaves were changed, added to, or removed from a bond. During these operations the bonding driver used a per-cpu flag at the wrong time, which could cause the driver to detect an invalid state and trigger a kernel panic. With this update, the use of this per-cpu flag has been corrected and the kernel panic no longer occurs.
Reproduced in 2.6.18-241.el5 and verified in 2.6.18-243.el5.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2011-1065.html
*** Bug 725849 has been marked as a duplicate of this bug. ***