Bug 605711
| Summary: | [RHEL 5.5] fix panic on xen pv/bonding netconsole | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Jarod Wilson <jarod> | ||||
| Component: | kernel | Assignee: | Thomas Graf <tgraf> | ||||
| Status: | CLOSED NOTABUG | QA Contact: | Red Hat Kernel QE team <kernel-qe> | ||||
| Severity: | medium | Docs Contact: | |||||
| Priority: | high | ||||||
| Version: | 5.6 | CC: | agospoda, drjones, jlv, jolsa, jwest, lwang, mjenner, nhorman, pbonzini, rkhan, tgraf, xen-maint | ||||
| Target Milestone: | rc | ||||||
| Target Release: | --- | ||||||
| Hardware: | All | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2011-03-09 09:36:32 UTC | Type: | --- | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | |||||||
| Bug Blocks: | 514491, 605694 | ||||||
| Attachments: |
|
||||||
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release. Are you sure this is the right patch? I don't see how it can be correct considering that you'd have:
static void poll_napi(struct net_device *dev)
{
...
}
void netpoll_poll(struct netpoll *np)
{
...
if (np->dev->poll)
poll_napi(np);
}
(In reply to comment #3) > Are you sure this is the right patch? I don't see how it can be correct > considering that you'd have: > > static void poll_napi(struct net_device *dev) > { > ... > } > > void netpoll_poll(struct netpoll *np) > { > ... > if (np->dev->poll) > poll_napi(np); > } Pretty sure its the exact patch I extracted from Oracle's kernel srpm. Never said their patches were actually *correct*. ;) But yeah, I see what you mean there, it does look broken. The wrong code is fixed by the follow-up patch, it seems, so the patch is salvageable. However, I'm still not very convinced by the patch. First of all, it's big. I would start by trying to understand which commits this patch is backporting. It must be half a dozen given the files it touches: linux-2.6.18.x86_64/arch/i386/kernel/traps.c linux-2.6.18.x86_64/arch/x86_64/kernel/traps.c linux-2.6.18.x86_64/drivers/char/sysrq.c linux-2.6.18.x86_64/drivers/net/bonding/bond_main.c linux-2.6.18.x86_64/drivers/net/bonding/bonding.h linux-2.6.18.x86_64/drivers/net/e1000/e1000_main.c linux-2.6.18.x86_64/drivers/xen/netfront/netfront.c (wrong, see below) linux-2.6.18.x86_64/include/linux/netpoll.h linux-2.6.18.x86_64/kernel/printk.c linux-2.6.18.x86_64/net/core/netpoll.c Second, the only change that is really Xen-specific is pointless. It changes network_start_xmit (the dev->hard_start_xmit callback) from spin_lock_irq to spin_lock_irqsave, even though interrupts must be enabled when hard_start_xmit is called according to Documentation/networking/netdevices.txt. This change is also not upstream, by the way. Everything else is not Xen-specific and way outside my area of expertise, so I'm reassigning this to kernel. Linda, please do not reassign this to xen-maint, see comment #7. Paolo, got it.. Do we have an actual backtrace of the panic in question? I recently backported my upstream netpoll over bonding changes to RHEL5 and they work fine. My read on this at the moment is that the OEL patch is a upstream deviating hack at this point, that we have no need for. If we, on the latest RHEL5 kernel (after RHEL5 commit 20d3e6aeeaf037228f80da1e3d402412c0644906) can cause a panic when using netconsole over bonding, we should look at this further, otherwise, I'm guessing its fixed. (In reply to comment #11) > My read > on this at the moment is that the OEL patch is a upstream deviating hack at > this point, that we have no need for. Your powers of reading comprehension on this issue cannot be doubted. Is there an actual bug behind this we are aware of? I couldn't link any of the changesets to an obvious bugfix. The bits touched by the "follow-up panic fix patch" come from commit:
commit 5106930bd6b57402205e3de54dae9476e215b622
Author: Stephen Hemminger <shemminger>
Date: Mon Nov 19 19:18:11 2007 -0800
[NETPOLL]: netpoll_poll() cleanup
Restructure code slightly to improve readability:
* dereference device once
* change obvious while() loop
* let poll_napi() handle null list itself
Signed-off-by: Stephen Hemminger <shemminger>
Signed-off-by: David S. Miller <davem>
This is definitely not fixing any bugs.
|
Created attachment 425158 [details] Follow-up panic fix patch