The following occurs in the kernel message log when running kernel 2.6.9-78.0.1.ELxenU on one of my virtual machines. The Dom0 runs 2.6.18-92.1.10.el5xen. The problem occurs when I have one or two VCPUs assigned to the DomU (I have not tested with more than two). It does not occur on other RHEL4 DomUs on the same Xen server, and I don't know why.

Badness in local_bh_enable at kernel/softirq.c:141
 [<c0121170>] local_bh_enable+0x3f/0x62
 [<c02177cd>] skb_checksum+0x133/0x25e
 [<c0250efe>] udp_poll+0x66/0x113
 [<c02135f5>] sock_poll+0x19/0x1d
 [<c016d182>] do_select+0x190/0x2c7
 [<c016ce91>] __pollwait+0x0/0x9b
 [<c0144ab4>] __kmalloc+0x56/0xd3
 [<c016d5b8>] sys_select+0x2e7/0x45c
 [<c016bf1a>] sys_fcntl64+0x78/0x7f
 [<c010740f>] syscall_call+0x7/0xb
We see nearly the same message logged sporadically while running stress tests with RHEL4u7 32-bit SMP and 32-bit PAE HVM guests on upstream Xen with the PV network driver (xen-vnif) enabled.

Badness in local_bh_enable at kernel/softirq.c:141
 [<c0126e1d>] local_bh_enable+0x34/0x57
 [<c0287d01>] skb_checksum+0x136/0x260
 [<c02c1a66>] udp_poll+0x5a/0x105
 [<c0283c74>] sock_poll+0x12/0x14
 [<c016d6d9>] do_select+0x196/0x2c6
 [<c016d409>] __pollwait+0x0/0x95
 [<c016dafc>] sys_select+0x2e0/0x43a
 [<c01265f5>] sys_gettimeofday+0x53/0xac
 [<c02e09db>] syscall_call+0x7/0xb
Created attachment 327119: Proposed patch

The problem is that udp_poll() in .../net/ipv4/udp.c acquires the wrong spinlock to protect its critical section. This patch uses the correct spinlock, which is the same one the RHEL5 kernel uses. The backported patch that introduced udp_poll() mistakenly picked up the wrong code.
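Why does holding the "wrong" lock produce a warning from local_bh_enable() called inside skb_checksum()? The following is a user-space model, not kernel code. It rests on two assumptions that this thread does not confirm:

1. As in mainline 2.6 kernels, local_bh_enable() warns when it is called with hard interrupts disabled (WARN_ON(irqs_disabled())).
2. On HIGHMEM builds, skb_checksum() maps paged fragments inside a local_bh_disable()/local_bh_enable() pair.

The thread does not say which lock the faulty backport took. For illustration only, the model assumes it held the receive queue lock with interrupts masked (like spin_lock_irq), as opposed to a bottom-half-only lock (like spin_lock_bh). Every model_* name below is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of one CPU's interrupt/softirq state.
 * Names mirror kernel primitives but are NOT the kernel implementations. */
struct cpu_model {
    bool irqs_disabled;    /* hard interrupts masked (e.g. by an *_irq lock variant) */
    int  bh_disable_count; /* nesting depth of model_local_bh_disable() */
    int  warnings;         /* how many times the "Badness" check fired */
};

static void model_local_bh_disable(struct cpu_model *c)
{
    c->bh_disable_count++;
}

static void model_local_bh_enable(struct cpu_model *c)
{
    assert(c->bh_disable_count > 0); /* enable must pair with a prior disable */
    c->bh_disable_count--;
    /* Assumption: mirrors the WARN_ON(irqs_disabled()) check in 2.6-era kernels. */
    if (c->irqs_disabled)
        c->warnings++;
}

/* Lock variant that masks hard interrupts (modelled on spin_lock_irq). */
static void model_lock_irq(struct cpu_model *c)   { c->irqs_disabled = true; }
static void model_unlock_irq(struct cpu_model *c) { c->irqs_disabled = false; }

/* Lock variant that only disables bottom halves (modelled on spin_lock_bh). */
static void model_lock_bh(struct cpu_model *c)   { model_local_bh_disable(c); }
static void model_unlock_bh(struct cpu_model *c) { model_local_bh_enable(c); }

/* Stand-in for skb_checksum() touching a paged fragment: the fragment
 * mapping is assumed to be bracketed by bh disable/enable. */
static void model_skb_checksum(struct cpu_model *c)
{
    model_local_bh_disable(c);
    /* ... checksum the mapped fragment ... */
    model_local_bh_enable(c);
}

/* Models the udp_poll() critical section; returns how many warnings fired. */
static int run_udp_poll_model(bool use_irq_lock)
{
    struct cpu_model c = { false, 0, 0 };

    if (use_irq_lock) {
        model_lock_irq(&c);
        model_skb_checksum(&c); /* local_bh_enable runs with IRQs off */
        model_unlock_irq(&c);
    } else {
        model_lock_bh(&c);
        model_skb_checksum(&c); /* nested bh disable/enable, IRQs on */
        model_unlock_bh(&c);
    }
    return c.warnings;
}
```

Under these assumptions, run_udp_poll_model(true) records one warning for each checksum pass, and run_udp_poll_model(false) records none. That pattern matches the traces above, where skb_checksum() called from udp_poll() lands in local_bh_enable().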
I should add that this patch has been used in anger for some little while, and the problem has not recurred.
OK, the patch looks totally reasonable, and seems to be upstream (in RHEL-5 at least). I'm going to re-assign this to the regular kernel team, since this doesn't seem to be a virt-specific issue. Chris Lalancette
It's also a lot easier to trigger the problem than I first thought. One of my Xen guests running RHEL 4.7 threw this error constantly when using NFS on servers in California (I'm in the UK). A kernel built with the patch immediately quashed the errors.
*** This bug has been marked as a duplicate of bug 459185 ***
Nice of you to close this as a duplicate of a bug that we can't see! :-) Could you either reopen this one and close bug 459185 as a duplicate of it, or change the visibility of 459185?
Created attachment 328874: submitted patch

Bug 459185 includes the following patch, which also covers this bug.
Is there any chance of a fix being released for this soon? I have one 4.7 machine that reports this error dozens of times a day.
(In reply to comment #9)
> Is there any chance of a fix being released for this soon? I have one 4.7
> machine that reports this error dozens of times a day.

You can get updated RPMs containing this patch from here:

http://people.redhat.com/vgoyal/rhel4/RPMS.kernel/

This will be part of 4.8. If you need an officially supported fix before then, please go through your friendly support channel and ask for it to be added to z-stream. There's no guarantee that it will be, and we can't do that from here in Bugzilla. In the future, you'll probably want to use bz 459185 to get more attention, since this BZ has been closed as a dup of that one.

Chris Lalancette
(In reply to comment #10)
> In the future, you'll probably want to use bz 459185 to get more attention,
> since this BZ has been closed as a dup of that one.

Chris, see comment #7. Bug 459185 is restricted, and we're still not authorized to comment on it or even view it. But it's true that it's fixed in 4.8.
Glad to see it's nearly the end for this bad boy. It's been a long, annoying road.