Bug 467244
| Summary: | On RHEL 5.2 32 bit rmmod bonding results in a kernel panic when configured in balance-tlb mode | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Chris Tatman <ctatman> | ||||
| Component: | kernel | Assignee: | Andy Gospodarek <agospoda> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Martin Jenner <mjenner> | ||||
| Severity: | high | Docs Contact: | |||||
| Priority: | high | ||||||
| Version: | 5.2 | CC: | benl, jfeeney, mgahagan, narendra_k, peterm, syeghiay, tao, thomas_chenault, wwlinuxengineering | ||||
| Target Milestone: | rc | ||||||
| Target Release: | --- | ||||||
| Hardware: | i686 | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2009-01-20 19:44:05 UTC | Type: | --- | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: |
Description
Chris Tatman
2008-10-16 14:50:39 UTC
My original attempt to fix this was no good since it contained the following warning when using alb-mode bonding:

Ethernet Channel Bonding Driver: v3.2.4 (January 28, 2008)
bonding: In ALB mode you might experience client disconnections upon reconnection of a link if the bonding module updelay parameter (0 msec) is incompatible with the forwarding delay time of the switch
bonding: MII link monitoring set to 100 ms
ADDRCONF(NETDEV_UP): bond0: link is not ready
bnx2: eth0: using MSI
ADDRCONF(NETDEV_UP): eth0: link is not ready
bonding: bond0: enslaving eth0 as an active interface with a down link.
bnx2: eth1: using MSI
ADDRCONF(NETDEV_UP): eth1: link is not ready
bonding: bond0: enslaving eth1 as an active interface with a down link.
bnx2: eth0 NIC Copper Link is Up, 1000 Mbps full duplex
ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
bonding: bond0: link status definitely up for interface eth0.
bonding: bond0: making interface eth0 the new active one.
bonding: bond0: first active interface up!
ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
bnx2: eth1 NIC Copper Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON
ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
bonding: bond0: link status definitely up for interface eth1.
bnx2: eth0 NIC Copper Link is Down
bonding: bond0: link status definitely down for interface eth0, disabling it
bonding: bond0: making interface eth1 the new active one.
device eth1 entered promiscuous mode
device eth1 left promiscuous mode
BUG: scheduling while atomic: rmmod/0x00000100/9094
 [<c06074e7>] schedule+0x43/0x9cd
 [<f89e6b6b>] fib6_clean_node+0x11/0x6a [ipv6]
 [<c06095fe>] _write_lock_bh+0x8/0x1a
 [<f89e6629>] fib6_walk+0x69/0x6e [ipv6]
 [<f89e6654>] fib6_clean_tree+0x26/0x2a [ipv6]
 [<c0607f23>] wait_for_completion+0x6b/0x8f
 [<c042027b>] default_wake_function+0x0/0xc
 [<c0434408>] synchronize_rcu+0x2a/0x2f
 [<c0434059>] wakeme_after_rcu+0x0/0x8
 [<f8a2824a>] bond_alb_deinitialize+0x1d/0x52 [bonding]
 [<f8a22c73>] bond_release_all+0x1da/0x1f9 [bonding]
 [<f8a22cf1>] bond_free_all+0x5f/0xd2 [bonding]
 [<f8a2a3b6>] bonding_exit+0x1e/0x28 [bonding]
 [<c043e80a>] sys_delete_module+0x192/0x1b8
 [<c04059bf>] apic_timer_interrupt+0x1f/0x24
 [<c0404eff>] syscall_call+0x7/0xb
 =======================
bonding: bond0: released all slaves

It seems the better option is to leave the functions where they are and check in tlb_clear_slave() if the hash-tbl has already been destroyed.

Created attachment 320601 [details]
/tmp/bond-fix-tx-hashtable-panic.patch
This is probably a better fix.
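
For readers following along, the approach described above (leave the teardown order alone and have tlb_clear_slave() check whether the hash table has already been destroyed) can be sketched in userspace C. This is only an illustration and not the attached patch: `struct tlb_entry`, `struct bond_info`, `struct slave_info`, `clear_slave()` and `deinitialize()` are hypothetical, simplified stand-ins for the bonding driver's structures and functions, and the real code also handles locking.

```c
#include <stdlib.h>

#define TLB_NULL_INDEX 0xffffffffu
#define HASH_SIZE 8

/* Hypothetical, simplified stand-ins for the driver's structures. */
struct tlb_entry {
    unsigned int next;   /* next hash entry owned by the same slave */
    int assigned;        /* nonzero while a slave owns this entry */
};

struct bond_info {
    struct tlb_entry *tx_hashtbl;   /* NULL once the table has been freed */
};

struct slave_info {
    unsigned int head;   /* first hash entry assigned to this slave */
};

/*
 * Detach every hash entry owned by the slave, but only if the table
 * still exists. Returns the number of entries cleared.
 */
static int clear_slave(struct bond_info *bond, struct slave_info *slave)
{
    int cleared = 0;

    if (bond->tx_hashtbl) {   /* the guard: skip the walk after teardown */
        unsigned int index = slave->head;

        while (index != TLB_NULL_INDEX) {
            unsigned int next = bond->tx_hashtbl[index].next;

            bond->tx_hashtbl[index].assigned = 0;
            bond->tx_hashtbl[index].next = TLB_NULL_INDEX;
            index = next;
            cleared++;
        }
    }

    /* Resetting the slave's own bookkeeping is safe either way. */
    slave->head = TLB_NULL_INDEX;
    return cleared;
}

/* Free the table and leave a NULL pointer behind so the guard can see it. */
static void deinitialize(struct bond_info *bond)
{
    free(bond->tx_hashtbl);
    bond->tx_hashtbl = NULL;
}
```

In this sketch, calling `clear_slave()` after `deinitialize()` simply resets the slave's head index instead of dereferencing a freed table, which is the behavior the check is meant to guarantee.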
Andy,

(In reply to comment #1)
> It seems the better option is to leave the functions where they are and check
> in tlb_clear_slave() if the hash-tbl has already been destroyed.

I agree.

(In reply to comment #2)
> This is probably a better fix.

I tested the patch from comment #2. It fixes the issue.

Narendra, thanks for testing that for me. I'll propose that upstream.

Is it OK if I mention it was discovered by you and mention your email address (I'd like to give you credit) or would you prefer that I do not do that?

(In reply to comment #4)
> Narendra, thanks for testing that for me. I'll propose that upstream.
> Is it OK if I mention it was discovered by you and mention your email address
> (I'd like to give you credit) or would you prefer that I do not do that?

Andy,

Thanks for the gesture. Golaz (lee_golaz) from our network team discovered the issue. Thomas (thomas_chenault) and I worked on the root cause.

in kernel-2.6.18-122.el5

You can download this test kernel from http://people.redhat.com/dzickus/el5

I tested the kernel-2.6.18-122.el5 from comment #12 and it fixes the issue.

Andy,

Will this fix make it to RHEL 5.3?

Yes, it will make 5.3.

I tested this on the RHEL 5.2 snapshot 2 kernel (2.6.18-122.el5). This kernel fixes the issue.

Updating the Status field based on comment #16.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0225.html