Bug 510746 - BUG: warning at kernel/softirq.c:138/local_bh_enable() (Tainted: G )
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Hardware: ia64 Linux
Priority: low, Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Paolo Bonzini
QA Contact: Red Hat Kernel QE team
Depends On: CVE-2008-5029 470436
Blocks: 533192
Reported: 2009-07-10 11:01 EDT by Jan Tluka
Modified: 2017-07-18 22:30 EDT (History)
Doc Type: Bug Fix
Last Closed: 2010-03-30 03:45:26 EDT

Attachments:
reproducer xml for RHTS (1.15 KB, text/xml), 2009-07-10 11:18 EDT, Jan Tluka
patch that could fix the bug (15.04 KB, patch), 2009-07-21 10:55 EDT, Paolo Bonzini

Description Jan Tluka 2009-07-10 11:01:40 EDT
Description of problem:
While running regression tests in RHTS on a RHEL5.4 snapshot, I found the following warning after the /kernel/errata/5.3.z/470436 test finished and passed:

Checking dmesg for specific failures!
BUG: warning at kernel/softirq.c:138/local_bh_enable() (Not tainted)
End of log.

Links to RHTS logs in Additional info.

Version-Release number of selected component (if applicable):

How reproducible:
Run the /kernel/errata/5.3.z/470436 test in RHTS on an ia64 machine using the Xen kernel.

Steps to Reproduce:
1. kernel_workflow.py -x -u rhuser@redhat.com -t /kernel/errata/5.3.z/470436 -a ia64 -S rhts.redhat.com -d RHEL5.4-Server-20090708.0
Actual results:
BUG warning in dmesg and test Fails

Expected results:
No BUG warning in dmesg and test Passes

Additional info:
Job where I saw the warning: http://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=71162
Test runs that have failed: http://rhts.redhat.com/cgi-bin/rhts/test_list.cgi?result=Fail&test_filter=/kernel/errata/5.3.z/470436
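
The "Checking dmesg for specific failures!" step quoted above can be approximated offline. The following is only a sketch of such a check, not the RHTS implementation: the function name and the regular expression are assumptions derived from the single warning line shown in this report.

```python
import re

# Pattern modelled on the one warning line quoted in this report, e.g.
#   BUG: warning at kernel/softirq.c:138/local_bh_enable() (Not tainted)
# Other kernels may format the line differently (assumption).
WARNING_RE = re.compile(
    r"BUG: warning at (?P<file>\S+?):(?P<line>\d+)/(?P<func>\w+)\(\)"
    r"\s*\((?P<taint>[^)]*)\)"
)


def find_bug_warnings(dmesg_text):
    """Return one dict per 'BUG: warning at ...' line found in dmesg_text."""
    return [m.groupdict() for m in WARNING_RE.finditer(dmesg_text)]
```

Feeding it the log excerpt from this description yields a single entry whose func field is local_bh_enable and whose taint field is "Not tainted".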
Comment 1 Jan Tluka 2009-07-10 11:18:25 EDT
Created attachment 351275 [details]
reproducer xml for RHTS

The kernel_workflow.py reproducer does not run as I expected, so use the attached XML file to reproduce on the correct system configuration.

/usr/bin/submit_job.py -S rhts.redhat.com -j bug510746.xml

Scheduled job is here: http://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=71644
Comment 2 Jan Tluka 2009-07-10 12:19:32 EDT
The job scheduled in comment 1 was aborted, so I scheduled another one on a specific host: http://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=71657
Comment 4 Paolo Bonzini 2009-07-13 09:33:25 EDT
Unfortunately, the error does not help much.  The log at http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=8983261 does not include the full dmesg output.

Would it be possible (by modifying the workflow?) to include /tmp/dmesg.log at the end of the output if the test fails?
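
One way to do this, sketched in Python: print the result, and on failure append the saved dmesg log to the same output stream. This is a hypothetical helper, not part of RHTS or kernel_workflow.py; the function name and output format are assumptions, and only the /tmp/dmesg.log path comes from this comment.

```python
import sys
from pathlib import Path


def report_result(passed, log_path="/tmp/dmesg.log", out=sys.stdout):
    """Write PASS/FAIL to out; on failure, append the log file if it exists."""
    out.write("PASS\n" if passed else "FAIL\n")
    if passed:
        return
    path = Path(log_path)
    if path.is_file():
        out.write(f"--- {path} ---\n")
        # errors="replace" keeps odd bytes in the kernel log from aborting the report
        out.write(path.read_text(errors="replace"))
    else:
        out.write(f"(no log found at {path})\n")
```

Calling report_result(False) at the end of a failing run would then leave the full dmesg output in the test log.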

Comment 5 Jan Tluka 2009-07-15 09:17:44 EDT
Hi Paolo, I scheduled yet another job because the provided XML file had an error. The job in comment 2 failed to install the Xen kernel and tested the non-Xen one instead.

Here's the job that will additionally include the dmesg log once it's finished:
http://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=72787
Comment 6 Jan Tluka 2009-07-15 13:08:03 EDT
(In reply to comment #5)
> Hi Paolo, I scheduled yet another job because the provided XML file had an error.
> The job in comment 2 failed to install the Xen kernel and tested the non-Xen one instead.
> Here's the job that will additionally include the dmesg log once it's finished:
> http://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=72787  

Damn, my XML file got corrupted somehow so another job is queued ATM.
Comment 7 Jan Tluka 2009-07-17 07:50:24 EDT
Great, now I think I have the stuff you requested.
This happened on the hp-bl870c-02.rhts.bos.redhat.com system.

BUG: warning at kernel/softirq.c:138/local_bh_enable() (Not tainted)

Call Trace:
 [<a00000010001d240>] show_stack+0x40/0xa0
                                sp=e00000018ac97bc0 bsp=e00000018ac915a8
 [<a00000010001d2d0>] dump_stack+0x30/0x60
                                sp=e00000018ac97d90 bsp=e00000018ac91590
 [<a00000010009b520>] local_bh_enable+0x120/0x1c0
                                sp=e00000018ac97d90 bsp=e00000018ac91578
 [<a000000100549f90>] lock_sock+0x190/0x1c0
                                sp=e00000018ac97d90 bsp=e00000018ac91548
 [<a000000100542f20>] sock_fasync+0xe0/0x320
                                sp=e00000018ac97dc0 bsp=e00000018ac914e8
 [<a000000100544970>] sock_close+0x70/0xa0
                                sp=e00000018ac97dc0 bsp=e00000018ac914c0
 [<a0000001001841e0>] __fput+0x1a0/0x420
                                sp=e00000018ac97dc0 bsp=e00000018ac91480
 [<a0000001001844a0>] fput+0x40/0x60
                                sp=e00000018ac97dc0 bsp=e00000018ac91460
 [<a00000010055a6d0>] __scm_destroy+0x130/0x1e0
                                sp=e00000018ac97dc0 bsp=e00000018ac91438
 [<a000000100662e50>] unix_destruct_fds+0x70/0xa0
                                sp=e00000018ac97dd0 bsp=e00000018ac91418
 [<a000000100550b70>] skb_release_head_state+0x1f0/0x300
                                sp=e00000018ac97e00 bsp=e00000018ac913e8
 [<a000000100552640>] __kfree_skb+0x20/0x60
                                sp=e00000018ac97e00 bsp=e00000018ac913c8
 [<a000000100552880>] kfree_skb+0x140/0x160
                                sp=e00000018ac97e00 bsp=e00000018ac91398
 [<a000000100660f00>] unix_release_sock+0x360/0x460
                                sp=e00000018ac97e00 bsp=e00000018ac91340
 [<a000000100661040>] unix_release+0x40/0x60
                                sp=e00000018ac97e00 bsp=e00000018ac91320
 [<a0000001005447c0>] sock_release+0x80/0x1c0
                                sp=e00000018ac97e00 bsp=e00000018ac912f8
 [<a000000100544980>] sock_close+0x80/0xa0
                                sp=e00000018ac97e10 bsp=e00000018ac912d0
 [<a0000001001841e0>] __fput+0x1a0/0x420
                                sp=e00000018ac97e10 bsp=e00000018ac91290
 [<a0000001001844a0>] fput+0x40/0x60
                                sp=e00000018ac97e10 bsp=e00000018ac91270
 [<a00000010017da90>] filp_close+0x110/0x140
                                sp=e00000018ac97e10 bsp=e00000018ac91240
 [<a00000010008fac0>] put_files_struct+0x120/0x1e0
                                sp=e00000018ac97e10 bsp=e00000018ac91200
 [<a000000100093be0>] do_exit+0x7a0/0x1800
                                sp=e00000018ac97e10 bsp=e00000018ac911a8
 [<a000000100094e50>] do_group_exit+0x210/0x220
                                sp=e00000018ac97e30 bsp=e00000018ac91170
 [<a000000100094e80>] sys_exit_group+0x20/0x40
                                sp=e00000018ac97e30 bsp=e00000018ac91118
 [<a00000010006ae00>] xen_trace_syscall+0x100/0x140
                                sp=e00000018ac97e30 bsp=e00000018ac91118
 [<a000000000010620>] __start_ivt_text+0xffffffff00010620/0x400
                                sp=e00000018ac98000 bsp=e00000018ac91118
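
A trace in this ia64 format (a frame line followed by an sp/bsp line) is easier to reason about once reduced to its symbol names. The sketch below is a hypothetical helper for that; the regular expression is an assumption based only on the lines above. It also lists symbols that occur more than once, which in this trace singles out the nested path: sock_close, __fput and fput each appear twice, because releasing the first socket ends up closing a second one.

```python
import re
from collections import Counter

# Matches frame lines such as
#   [<a00000010009b520>] local_bh_enable+0x120/0x1c0
# The sp=/bsp= continuation lines do not match and are skipped.
FRAME_RE = re.compile(r"\[<([0-9a-f]+)>\]\s+(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")


def parse_frames(trace):
    """Return (symbol, offset, size) tuples, innermost frame first."""
    frames = []
    for line in trace.splitlines():
        m = FRAME_RE.search(line)
        if m:
            _addr, symbol, offset, size = m.groups()
            frames.append((symbol, int(offset, 16), int(size, 16)))
    return frames


def repeated_symbols(frames):
    """Return symbols that occur in more than one frame, in first-seen order."""
    counts = Counter(symbol for symbol, _, _ in frames)
    return [symbol for symbol, n in counts.items() if n > 1]
```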
Comment 8 Chris Lalancette 2009-07-17 09:05:46 EDT
Hm, it looks kind of similar to some other local_bh_enable badness we've had elsewhere (bz 508648, bz 498394, bz 470919).  Paolo, care to take a look?

Chris Lalancette
Comment 9 Paolo Bonzini 2009-07-21 07:16:26 EDT
It looks very different from other local_bh_enable problems. :-(
Comment 10 Paolo Bonzini 2009-07-21 08:07:43 EDT
For the record, here is the job that had the failure:
Comment 11 Paolo Bonzini 2009-07-21 10:55:29 EDT
Created attachment 354496 [details]
patch that could fix the bug

This is a backport of upstream 233e70f4228e78eb2f80dc6650f65d3ae3dbf17c.  It will remove the call to sock_fasync in the case of unix.c and thus it should fix the badness.  However, the bug could still be latent for a more complicated testcase.
Comment 13 Paolo Bonzini 2009-07-24 04:49:58 EDT

This bug has been assigned to kernel-xen, but it seems like a non-virtualization-related problem.

It is related to the SCM_RIGHTS DoS of bug 470201, in that it is triggered by the same testcase.  It is not as serious, however, because this is just a WARN_ON_ONCE rather than a kernel panic.

My backport of an upstream patch should fix this bug by removing the execution path that triggered the warning.  However, I didn't really understand the root cause of the problem (i.e. where IRQs end up disabled in the call trace of comment #7), and I'm pretty sure that the bug would resurface if FASYNC usage were added somehow to the unix.c testcase.  Can you take a look?
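
For readers unfamiliar with the path in comment #7, here is a minimal userspace sketch of the situation it describes: a socket descriptor is queued in an SCM_RIGHTS message that is never received, so the queued message holds the last reference, and releasing the receiving socket releases the passed socket too. This is not the unix.c testcase from RHTS. The function name and the use_fasync switch are assumptions for illustration, and the original trace reached this path through process exit rather than explicit close() calls.

```python
import array
import fcntl
import os
import socket


def drop_in_flight_socket(use_fasync=False):
    """Queue a socket via SCM_RIGHTS, then close everything without reading.

    Returns the number of payload bytes queued on the carrier socket.
    """
    # Carrier pair: the SCM_RIGHTS message will sit unread in receiver's queue.
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    # Passenger: the descriptor that travels inside the message.
    passenger, peer = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

    if use_fasync:
        # Optional: enable FASYNC (O_ASYNC) on the passenger, the case the
        # comment above worries about. No owner is set with F_SETOWN.
        flags = fcntl.fcntl(passenger.fileno(), fcntl.F_GETFL)
        fcntl.fcntl(passenger.fileno(), fcntl.F_SETFL, flags | os.O_ASYNC)

    fds = array.array("i", [passenger.fileno()])
    sent = sender.sendmsg([b"x"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])

    # Drop our own reference; the queued message now holds the last one.
    passenger.close()
    peer.close()
    sender.close()
    # Closing the receiver discards the unread message and, with it, the
    # in-flight passenger socket.
    receiver.close()
    return sent
```

On a kernel without this bug the function simply returns 1; it is meant to make the nested close in the trace concrete, not to serve as a verified reproducer.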
Comment 17 Paolo Bonzini 2009-07-27 06:12:07 EDT
Reassigned from kernel-xen to kernel as the patch does not affect Xen at all.
Comment 19 RHEL Product and Program Management 2009-09-25 13:36:09 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 20 Don Zickus 2009-11-17 16:55:57 EST
in kernel-2.6.18-174.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so.  However feel free
to provide a comment indicating that this fix has been verified.
Comment 24 errata-xmlrpc 2010-03-30 03:45:26 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

