Bug 459869 - Badness from xenU kernel.
Status: CLOSED NEXTRELEASE
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.7
Hardware: i686 Linux
Priority: medium Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Thomas Graf
QA Contact: Martin Jenner

Reported: 2008-08-23 06:19 EDT by Russell Coker
Modified: 2014-06-18 04:29 EDT
Doc Type: Bug Fix
Last Closed: 2009-01-13 09:30:57 EST

Attachments
Proposed patch (735 bytes, patch)
2008-12-16 11:09 EST, john.haxby@oracle.com
submitted patch (1.96 KB, patch)
2009-01-13 10:53 EST, Thomas Graf


External Trackers
Tracker: CentOS, ID: 3763

Description Russell Coker 2008-08-23 06:19:06 EDT
The following appears in the kernel message log when running kernel 2.6.9-78.0.1.ELxenU on one of my virtual machines. The Dom0 runs 2.6.18-92.1.10.el5xen. The problem occurs with either one or two VCPUs assigned to the DomU (I have not tested with more than two). It does not occur on other RHEL4 DomUs on the same Xen server, and I don't know why.

Badness in local_bh_enable at kernel/softirq.c:141
 [<c0121170>] local_bh_enable+0x3f/0x62
 [<c02177cd>] skb_checksum+0x133/0x25e
 [<c0250efe>] udp_poll+0x66/0x113
 [<c02135f5>] sock_poll+0x19/0x1d
 [<c016d182>] do_select+0x190/0x2c7
 [<c016ce91>] __pollwait+0x0/0x9b
 [<c0144ab4>] __kmalloc+0x56/0xd3
 [<c016d5b8>] sys_select+0x2e7/0x45c
 [<c016bf1a>] sys_fcntl64+0x78/0x7f
 [<c010740f>] syscall_call+0x7/0xb
Comment 1 Frank Arnold 2008-09-29 08:38:57 EDT
We see nearly the same message sporadically getting logged while running stress tests with RHEL4u7 32-bit SMP and 32-bit PAE HVM guests on upstream Xen with the PV network driver (xen-vnif) enabled.

Badness in local_bh_enable at kernel/softirq.c:141
 [<c0126e1d>] local_bh_enable+0x34/0x57
 [<c0287d01>] skb_checksum+0x136/0x260
 [<c02c1a66>] udp_poll+0x5a/0x105
 [<c0283c74>] sock_poll+0x12/0x14
 [<c016d6d9>] do_select+0x196/0x2c6
 [<c016d409>] __pollwait+0x0/0x95
 [<c016dafc>] sys_select+0x2e0/0x43a
 [<c01265f5>] sys_gettimeofday+0x53/0xac
 [<c02e09db>] syscall_call+0x7/0xb
Comment 2 john.haxby@oracle.com 2008-12-16 11:09:29 EST
Created attachment 327119
Proposed patch

The problem is that .../net/ipv4/udp.c udp_poll() acquires the wrong spinlock to protect its critical section.  This patch uses the correct spinlock (the same spinlock that the RHEL5 kernel uses).  The backported patch that included udp_poll() mistakenly picked up the wrong code.
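
The attached patch is not reproduced in this report, so the following is only a userspace toy model of how a lock choice in a poll path can produce this kind of warning. It rests on two assumptions that the report does not confirm: that the "Badness" message is a check that local_bh_enable() is not called while interrupts are disabled, and that the faulty path took an interrupt-disabling lock where a bottom-half-disabling one was intended. Every model_* name is hypothetical and none of this is kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one CPU's state. All names are hypothetical, not kernel API. */
struct model_cpu {
    bool irqs_disabled; /* true while "interrupts" are masked */
    int  bh_depth;      /* nesting depth of bottom-half disabling */
    int  badness;       /* number of times the warning would have fired */
};

static void model_bh_disable(struct model_cpu *cpu)
{
    cpu->bh_depth++;
}

static void model_bh_enable(struct model_cpu *cpu)
{
    assert(cpu->bh_depth > 0);
    /* Assumed check: re-enabling bottom halves with IRQs off is flagged. */
    if (cpu->irqs_disabled)
        cpu->badness++;
    cpu->bh_depth--;
}

/* Interrupt-disabling lock variant (assumed to be the faulty choice). */
static void model_lock_irq(struct model_cpu *cpu)   { cpu->irqs_disabled = true; }
static void model_unlock_irq(struct model_cpu *cpu) { cpu->irqs_disabled = false; }

/* Bottom-half-disabling lock variant (assumed to be the intended choice). */
static void model_lock_bh(struct model_cpu *cpu)   { model_bh_disable(cpu); }
static void model_unlock_bh(struct model_cpu *cpu) { model_bh_enable(cpu); }

/* Stands in for a checksum helper that briefly disables and re-enables
 * bottom halves while it works. */
static void model_checksum(struct model_cpu *cpu)
{
    model_bh_disable(cpu);
    model_bh_enable(cpu);
}

/* Poll path that holds the interrupt-disabling lock around the checksum. */
int model_poll_with_irq_lock(void)
{
    struct model_cpu cpu = { false, 0, 0 };

    model_lock_irq(&cpu);
    model_checksum(&cpu);
    model_unlock_irq(&cpu);
    return cpu.badness;
}

/* Poll path that holds the bottom-half-disabling lock instead. */
int model_poll_with_bh_lock(void)
{
    struct model_cpu cpu = { false, 0, 0 };

    model_lock_bh(&cpu);
    model_checksum(&cpu);
    model_unlock_bh(&cpu);
    return cpu.badness;
}
```

In this model, model_poll_with_irq_lock() counts one warning per call while model_poll_with_bh_lock() counts none, which matches the shape of the trace (local_bh_enable reached from skb_checksum inside udp_poll) but says nothing definite about the real patch.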
Comment 3 john.haxby@oracle.com 2008-12-16 11:11:08 EST
I should add that this patch has been used in anger for some little while and the problem has not recurred.
Comment 4 Chris Lalancette 2008-12-16 11:19:19 EST
OK, the patch looks totally reasonable, and seems to be upstream (in RHEL-5 at least).  I'm going to re-assign this to the regular kernel team, since this doesn't seem to be a virt-specific issue.

Chris Lalancette
Comment 5 john.haxby@oracle.com 2008-12-17 08:36:13 EST
It's also a lot easier to trigger the problem than I first thought. One of my 4.7 Xen guests threw this error a lot while trying to use NFS against servers in California (I'm in the UK). A kernel built with the patch immediately quashed the errors.
Comment 6 Linda Wang 2009-01-13 09:30:57 EST

*** This bug has been marked as a duplicate of bug 459185 ***
Comment 7 john.haxby@oracle.com 2009-01-13 10:26:47 EST
Nice of you to close this as a duplicate of a bug that we can't see! :-)

Do you think you could either re-open this one and close bug 459185 as a duplicate of this one or change the visibility of 459185?
Comment 8 Thomas Graf 2009-01-13 10:53:58 EST
Created attachment 328874
submitted patch

Bug 459185 includes the following patch which also covers this bug.
Comment 9 john.haxby@oracle.com 2009-05-03 08:31:41 EDT
Is there any chance of a fix being released for this soon? I have one 4.7 machine that reports this error dozens of times a day.
Comment 10 Chris Lalancette 2009-05-04 03:38:44 EDT
(In reply to comment #9)
> Is there any chance of a fix being released for this soon? I have one 4.7
> machine that reports this error dozens of times a day.

You can get updated RPMS with this patch in it from here:

http://people.redhat.com/vgoyal/rhel4/RPMS.kernel/

This will be part of 4.8.  If you need an officially supported fix before that, please go through your friendly support channel and ask for this to be added to z-stream (no guarantees that it will, but we can't do anything here in bugzilla).

In the future, you'll probably want to use bz 459185 to get more attention, since this BZ has been closed as a dup of that one.

Chris Lalancette
Comment 11 Frank Arnold 2009-05-04 06:56:50 EDT
(In reply to comment #10)
> In the future, you'll probably want to use bz 459185 to get more attention,
> since this BZ has been closed as a dup of that one.

Chris, see comment #7. Bug 459185 is restricted and we're still not authorized to write or even look into this bug report. But it's fixed in 4.8, that's true.
Comment 12 Jerry Amundson 2009-08-11 10:58:31 EDT
Glad to see it's nearly the end for this bad boy. It's been a long, annoying road.
