Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.
For bugs related to Red Hat Enterprise Linux 5 product line. The current stable release is 5.10. For Red Hat Enterprise Linux 6 and above, please visit Red Hat JIRA https://issues.redhat.com/secure/CreateIssue!default.jspa?pid=12332745 to report new issues.

Bug 605711

Summary: [RHEL 5.5] fix panic on xen pv/bonding netconsole
Product: Red Hat Enterprise Linux 5 Reporter: Jarod Wilson <jarod>
Component: kernelAssignee: Thomas Graf <tgraf>
Status: CLOSED NOTABUG QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: medium Docs Contact:
Priority: high    
Version: 5.6CC: agospoda, drjones, jlv, jolsa, jwest, lwang, mjenner, nhorman, pbonzini, rkhan, tgraf, xen-maint
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-03-09 09:36:32 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 514491, 605694    
Attachments:
Description Flags
Follow-up panic fix patch none

Comment 1 Jarod Wilson 2010-06-18 15:30:20 UTC
Created attachment 425158 [details]
Follow-up panic fix patch

Comment 2 RHEL Program Management 2010-06-30 18:30:33 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 3 Paolo Bonzini 2010-08-04 11:22:19 UTC
Are you sure this is the right patch?  I don't see how it can be correct considering that you'd have:

static void poll_napi(struct net_device *dev)
{
  ...
}

void netpoll_poll(struct netpoll *np)
{
  ...
  if (np->dev->poll)
    poll_napi(np);
}

Comment 4 Jarod Wilson 2010-08-04 13:09:12 UTC
(In reply to comment #3)
> Are you sure this is the right patch?  I don't see how it can be correct
> considering that you'd have:
> 
> static void poll_napi(struct net_device *dev)
> {
>   ...
> }
> 
> void netpoll_poll(struct netpoll *np)
> {
>   ...
>   if (np->dev->poll)
>     poll_napi(np);
> }    

Pretty sure its the exact patch I extracted from Oracle's kernel srpm. Never said their patches were actually *correct*. ;) But yeah, I see what you mean there, it does look broken.

Comment 7 Paolo Bonzini 2010-11-02 11:12:54 UTC
The wrong code is fixed by the follow-up patch, it seems, so the patch is salvageable.

However, I'm still not very convinced by the patch.  First of all, it's big.  I would start by trying to understand which commits this patch is backporting.  It must be half a dozen given the files it touches:

linux-2.6.18.x86_64/arch/i386/kernel/traps.c
linux-2.6.18.x86_64/arch/x86_64/kernel/traps.c
linux-2.6.18.x86_64/drivers/char/sysrq.c
linux-2.6.18.x86_64/drivers/net/bonding/bond_main.c
linux-2.6.18.x86_64/drivers/net/bonding/bonding.h
linux-2.6.18.x86_64/drivers/net/e1000/e1000_main.c
linux-2.6.18.x86_64/drivers/xen/netfront/netfront.c (wrong, see below)
linux-2.6.18.x86_64/include/linux/netpoll.h
linux-2.6.18.x86_64/kernel/printk.c
linux-2.6.18.x86_64/net/core/netpoll.c

Second, the only change that is really Xen-specific is pointless.  It changes network_start_xmit (the dev->hard_start_xmit callback) from spin_lock_irq to spin_lock_irqsave, even though interrupts must be enabled when hard_start_xmit is called according to Documentation/networking/netdevices.txt.  This change is also not upstream, by the way.

Everything else is not Xen-specific and way outside my area of expertise, so I'm reassigning this to kernel.

Comment 8 Paolo Bonzini 2010-11-08 19:15:36 UTC
Linda, please do not reassign this to xen-maint, see comment #7.

Comment 9 Linda Wang 2010-11-10 18:00:18 UTC
Paolo, got it..

Comment 11 Neil Horman 2010-11-16 13:41:03 UTC
Do we have an actual backtrace of the panic in question?  I recently backported my upstream netpoll over bonding changes to RHEL5 and they work fine.  My read on this at the moment is that the OEL patch is a upstream deviating hack at this point, that we have no need for.  If we, on the latest RHEL5 kernel (after RHEL5 commit 20d3e6aeeaf037228f80da1e3d402412c0644906) can cause a panic when using netconsole over bonding, we should look at this further, otherwise, I'm guessing its fixed.

Comment 12 Andy Gospodarek 2010-11-16 14:51:22 UTC
(In reply to comment #11)
> My read
> on this at the moment is that the OEL patch is a upstream deviating hack at
> this point, that we have no need for.

Your powers of reading comprehension on this issue cannot be doubted.

Comment 13 Thomas Graf 2010-11-16 15:30:52 UTC
Is there an actual bug behind this we are aware of? I couldn't link any of the changesets to an obvious bugfix.

Comment 15 Thomas Graf 2011-03-09 09:07:49 UTC
The bits touched by the "follow-up panic fix patch" come from commit:

commit 5106930bd6b57402205e3de54dae9476e215b622
Author: Stephen Hemminger <shemminger>
Date:   Mon Nov 19 19:18:11 2007 -0800

    [NETPOLL]: netpoll_poll() cleanup
    
    Restructure code slightly to improve readability:
      * dereference device once
      * change obvious while() loop
      * let poll_napi() handle null list itself
    
    Signed-off-by: Stephen Hemminger <shemminger>
    Signed-off-by: David S. Miller <davem>

This is definitely not fixing any bugs.