Description of problem:
While running regression tests in RHTS on a RHEL5.4 snapshot, I found the following warning after the /kernel/errata/5.3.z/470436 test finished and passed:
Checking dmesg for specific failures!
BUG: warning at kernel/softirq.c:138/local_bh_enable() (Not tainted)
End of log.
Links to the RHTS logs are in Additional info.
Version-Release number of selected component (if applicable):
RHEL5.4-Server-20090708.0
kernel-xen-2.6.18-157.el5
How reproducible:
Run the /kernel/errata/5.3.z/470436 test in RHTS on an ia64 machine using the xen kernel.
Steps to Reproduce:
1. kernel_workflow.py -x -u rhuser -t /kernel/errata/5.3.z/470436 -a ia64 -S rhts.redhat.com -d RHEL5.4-Server-20090708.0
Actual results:
BUG warning in dmesg, and the test fails.
Expected results:
No BUG warning in dmesg, and the test passes.
Additional info:
Job where I saw the warning: http://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=71162
Test runs that have failed: http://rhts.redhat.com/cgi-bin/rhts/test_list.cgi?result=Fail&test_filter=/kernel/errata/5.3.z/470436
Created attachment 351275: reproducer XML for RHTS.
The kernel_workflow.py reproducer does not run as I expected, so use the attached XML file to reproduce on the correct system configuration:
/usr/bin/submit_job.py -S rhts.redhat.com -j bug510746.xml
The scheduled job is here: http://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=71644
The job scheduled in comment 1 was aborted, so I scheduled another one on a specific host: http://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=71657
Unfortunately, the error does not help much. The log at http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=8983261 does not include the full dmesg output. Would it be possible (by modifying the workflow?) to include /tmp/dmesg.log at the end of the output if the test fails? Thanks!
Hi Paolo, I scheduled yet another job because the provided XML file had an error: the job in comment 2 failed to install the xen kernel and was testing the non-xen one. Here's the job that will additionally include the dmesg log once it's finished. http://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=72787
(In reply to comment #5) Damn, my XML file got corrupted somehow, so another job is queued ATM: http://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=72837
Great, now I think I have the stuff you requested. This happened on the hp-bl870c-02.rhts.bos.redhat.com system.
BUG: warning at kernel/softirq.c:138/local_bh_enable() (Not tainted)
Call Trace:
 [<a00000010001d240>] show_stack+0x40/0xa0 sp=e00000018ac97bc0 bsp=e00000018ac915a8
 [<a00000010001d2d0>] dump_stack+0x30/0x60 sp=e00000018ac97d90 bsp=e00000018ac91590
 [<a00000010009b520>] local_bh_enable+0x120/0x1c0 sp=e00000018ac97d90 bsp=e00000018ac91578
 [<a000000100549f90>] lock_sock+0x190/0x1c0 sp=e00000018ac97d90 bsp=e00000018ac91548
 [<a000000100542f20>] sock_fasync+0xe0/0x320 sp=e00000018ac97dc0 bsp=e00000018ac914e8
 [<a000000100544970>] sock_close+0x70/0xa0 sp=e00000018ac97dc0 bsp=e00000018ac914c0
 [<a0000001001841e0>] __fput+0x1a0/0x420 sp=e00000018ac97dc0 bsp=e00000018ac91480
 [<a0000001001844a0>] fput+0x40/0x60 sp=e00000018ac97dc0 bsp=e00000018ac91460
 [<a00000010055a6d0>] __scm_destroy+0x130/0x1e0 sp=e00000018ac97dc0 bsp=e00000018ac91438
 [<a000000100662e50>] unix_destruct_fds+0x70/0xa0 sp=e00000018ac97dd0 bsp=e00000018ac91418
 [<a000000100550b70>] skb_release_head_state+0x1f0/0x300 sp=e00000018ac97e00 bsp=e00000018ac913e8
 [<a000000100552640>] __kfree_skb+0x20/0x60 sp=e00000018ac97e00 bsp=e00000018ac913c8
 [<a000000100552880>] kfree_skb+0x140/0x160 sp=e00000018ac97e00 bsp=e00000018ac91398
 [<a000000100660f00>] unix_release_sock+0x360/0x460 sp=e00000018ac97e00 bsp=e00000018ac91340
 [<a000000100661040>] unix_release+0x40/0x60 sp=e00000018ac97e00 bsp=e00000018ac91320
 [<a0000001005447c0>] sock_release+0x80/0x1c0 sp=e00000018ac97e00 bsp=e00000018ac912f8
 [<a000000100544980>] sock_close+0x80/0xa0 sp=e00000018ac97e10 bsp=e00000018ac912d0
 [<a0000001001841e0>] __fput+0x1a0/0x420 sp=e00000018ac97e10 bsp=e00000018ac91290
 [<a0000001001844a0>] fput+0x40/0x60 sp=e00000018ac97e10 bsp=e00000018ac91270
 [<a00000010017da90>] filp_close+0x110/0x140 sp=e00000018ac97e10 bsp=e00000018ac91240
 [<a00000010008fac0>] put_files_struct+0x120/0x1e0 sp=e00000018ac97e10 bsp=e00000018ac91200
 [<a000000100093be0>] do_exit+0x7a0/0x1800 sp=e00000018ac97e10 bsp=e00000018ac911a8
 [<a000000100094e50>] do_group_exit+0x210/0x220 sp=e00000018ac97e30 bsp=e00000018ac91170
 [<a000000100094e80>] sys_exit_group+0x20/0x40 sp=e00000018ac97e30 bsp=e00000018ac91118
 [<a00000010006ae00>] xen_trace_syscall+0x100/0x140 sp=e00000018ac97e30 bsp=e00000018ac91118
 [<a000000000010620>] __start_ivt_text+0xffffffff00010620/0x400 sp=e00000018ac98000 bsp=e00000018ac91118
Hm, it looks kind of similar to some other local_bh_enable badness we've had elsewhere (bz 508648, bz 498394, bz 470919). Paolo, care to take a look? Chris Lalancette
It looks very different from other local_bh_enable problems. :-(
For the record, here is the job that had the failure: http://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=72909
Created attachment 354496: patch that could fix the bug.
This is a backport of upstream commit 233e70f4228e78eb2f80dc6650f65d3ae3dbf17c. It removes the call to sock_fasync() on the unix.c close path, so it should fix the badness. However, the underlying bug could remain latent and be triggered by a more complicated testcase.
Dave, this bug has been assigned to kernel-xen, but it seems to be a non-virtualization-related problem. It is related to the SCM_RIGHTS DoS of bug 470201, in that it is triggered by the same testcase. It is not as serious, however, because this is just a WARN_ON_ONCE rather than a kernel panic. My backport of an upstream patch should fix this bug by removing the execution path that triggered the warning. However, I didn't really understand the root cause of the problem (i.e. where the IRQs are enabled in the call trace of comment #7), and I'm pretty sure that the bug would resurface if FASYNC usage were somehow added to the unix.c testcase. Can you take a look?
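For reference, the nested close in the comment #7 trace (an outer sock_close() whose unix_release_sock() purges a receive queue, frees an skb carrying SCM_RIGHTS descriptors, and thereby drops the last reference to another socket, calling sock_close() again) can probably be exercised from userspace with something like the sketch below. This assumes the trigger is simply a unix socket left in flight in another unix socket's receive queue when that socket is closed; it is not the RHTS /kernel/errata/5.3.z/470436 test, the helper names send_fd() and close_with_fd_in_flight() are hypothetical, and whether it reproduces the local_bh_enable() warning on an affected kernel is untested.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send `fd` over the AF_UNIX socket `via` as
 * SCM_RIGHTS ancillary data, with one byte of ordinary payload.
 * Returns 0 on success, -1 on failure. */
static int send_fd(int via, int fd)
{
    char byte = 'x';
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr align;              /* forces cmsghdr alignment */
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&msg, 0, sizeof(msg));
    memset(&ctrl, 0, sizeof(ctrl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(via, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: park one end of a unix socketpair in the receive
 * queue of another unix socket, drop every userspace reference to it, then
 * close the receiver. On a kernel like the one in the trace, the final
 * close() goes through unix_release_sock(), whose queue purge frees the skb
 * and fput()s the in-flight socket, calling sock_close() for it.
 * Returns 0 if all syscalls succeed, -1 otherwise. */
int close_with_fd_in_flight(void)
{
    int carrier[2], payload[2];
    int ok;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, carrier) < 0)
        return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, payload) < 0) {
        close(carrier[0]);
        close(carrier[1]);
        return -1;
    }

    /* Queue payload[0] on carrier[1]; nobody ever calls recvmsg(). */
    ok = send_fd(carrier[0], payload[0]) == 0;

    /* After these closes, the queued skb holds the last payload[0] reference. */
    close(payload[0]);
    close(payload[1]);
    close(carrier[0]);

    /* Last close of the receiver: its receive queue is purged here. */
    close(carrier[1]);
    return ok ? 0 : -1;
}
```

Calling close_with_fd_in_flight() from a small test program and then checking dmesg for "BUG: warning at kernel/softirq.c" would mirror what the RHTS run reported. In the trace the outer close came from do_exit(); an explicit close() should reach the same filp_close() path.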
Reassigned from kernel-xen to kernel as the patch does not affect Xen at all.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Included in kernel-2.6.18-174.el5. You can download this test kernel from http://people.redhat.com/dzickus/el5 Please do NOT transition this bugzilla state to VERIFIED until our QE team has sent specific instructions indicating when to do so. However, feel free to provide a comment indicating that this fix has been verified.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2010-0178.html