Bug 510746
| Summary: | BUG: warning at kernel/softirq.c:138/local_bh_enable() (Tainted: G ) | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Jan Tluka <jtluka> |
| Component: | kernel | Assignee: | Paolo Bonzini <pbonzini> |
| Status: | CLOSED ERRATA | QA Contact: | Red Hat Kernel QE team <kernel-qe> |
| Severity: | medium | Docs Contact: | |
| Priority: | low | | |
| Version: | 5.4 | CC: | clalance, davem, dzickus, emcnabb, pbonzini, peterm, prarit, qcai, xen-maint, yzheng |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | ia64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2010-03-30 07:45:26 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 470201, 470436 | | |
| Bug Blocks: | 533192 | | |
Description
Jan Tluka
2009-07-10 15:01:40 UTC
Created attachment 351275
reproducer xml for RHTS

The kernel_workflow.py reproducer does not run as I expected, so use the attached XML file to reproduce on the correct system configuration.

/usr/bin/submit_job.py -S rhts.redhat.com -j bug510746.xml

Scheduled job is here: http://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=71644

The scheduled job in comment 1 was aborted, so I scheduled another one with a specific host: http://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=71657

Unfortunately, the error does not help much. The log at http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=8983261 does not include the full dmesg output. Would it be possible (by modifying the workflow?) to include /tmp/dmesg.log at the end of the output if the test fails? Thanks!

Hi Paolo, I scheduled yet another job, because the provided XML file had an error. The job in comment 2 failed to install the xen kernel and was testing the non-xen one.

Here's the job that will additionally include the dmesg log once it's finished.
http://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=72787

(In reply to comment #5)
> Hi Paolo, I scheduled yet another job, because the provided XML file had an error.
> The job in comment 2 failed to install the xen kernel and was testing the non-xen one.
>
> Here's the job that will additionally include the dmesg log once it's finished.
> http://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=72787

Damn, my XML file got corrupted somehow, so another job is queued ATM.
http://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=72837

Great, now I think I have the stuff you requested.
This happened on the hp-bl870c-02.rhts.bos.redhat.com system.

```
BUG: warning at kernel/softirq.c:138/local_bh_enable() (Not tainted)
Call Trace:
 [<a00000010001d240>] show_stack+0x40/0xa0
                                sp=e00000018ac97bc0 bsp=e00000018ac915a8
 [<a00000010001d2d0>] dump_stack+0x30/0x60
                                sp=e00000018ac97d90 bsp=e00000018ac91590
 [<a00000010009b520>] local_bh_enable+0x120/0x1c0
                                sp=e00000018ac97d90 bsp=e00000018ac91578
 [<a000000100549f90>] lock_sock+0x190/0x1c0
                                sp=e00000018ac97d90 bsp=e00000018ac91548
 [<a000000100542f20>] sock_fasync+0xe0/0x320
                                sp=e00000018ac97dc0 bsp=e00000018ac914e8
 [<a000000100544970>] sock_close+0x70/0xa0
                                sp=e00000018ac97dc0 bsp=e00000018ac914c0
 [<a0000001001841e0>] __fput+0x1a0/0x420
                                sp=e00000018ac97dc0 bsp=e00000018ac91480
 [<a0000001001844a0>] fput+0x40/0x60
                                sp=e00000018ac97dc0 bsp=e00000018ac91460
 [<a00000010055a6d0>] __scm_destroy+0x130/0x1e0
                                sp=e00000018ac97dc0 bsp=e00000018ac91438
 [<a000000100662e50>] unix_destruct_fds+0x70/0xa0
                                sp=e00000018ac97dd0 bsp=e00000018ac91418
 [<a000000100550b70>] skb_release_head_state+0x1f0/0x300
                                sp=e00000018ac97e00 bsp=e00000018ac913e8
 [<a000000100552640>] __kfree_skb+0x20/0x60
                                sp=e00000018ac97e00 bsp=e00000018ac913c8
 [<a000000100552880>] kfree_skb+0x140/0x160
                                sp=e00000018ac97e00 bsp=e00000018ac91398
 [<a000000100660f00>] unix_release_sock+0x360/0x460
                                sp=e00000018ac97e00 bsp=e00000018ac91340
 [<a000000100661040>] unix_release+0x40/0x60
                                sp=e00000018ac97e00 bsp=e00000018ac91320
 [<a0000001005447c0>] sock_release+0x80/0x1c0
                                sp=e00000018ac97e00 bsp=e00000018ac912f8
 [<a000000100544980>] sock_close+0x80/0xa0
                                sp=e00000018ac97e10 bsp=e00000018ac912d0
 [<a0000001001841e0>] __fput+0x1a0/0x420
                                sp=e00000018ac97e10 bsp=e00000018ac91290
 [<a0000001001844a0>] fput+0x40/0x60
                                sp=e00000018ac97e10 bsp=e00000018ac91270
 [<a00000010017da90>] filp_close+0x110/0x140
                                sp=e00000018ac97e10 bsp=e00000018ac91240
 [<a00000010008fac0>] put_files_struct+0x120/0x1e0
                                sp=e00000018ac97e10 bsp=e00000018ac91200
 [<a000000100093be0>] do_exit+0x7a0/0x1800
                                sp=e00000018ac97e10 bsp=e00000018ac911a8
 [<a000000100094e50>] do_group_exit+0x210/0x220
                                sp=e00000018ac97e30 bsp=e00000018ac91170
 [<a000000100094e80>] sys_exit_group+0x20/0x40
                                sp=e00000018ac97e30 bsp=e00000018ac91118
 [<a00000010006ae00>] xen_trace_syscall+0x100/0x140
                                sp=e00000018ac97e30 bsp=e00000018ac91118
 [<a000000000010620>] __start_ivt_text+0xffffffff00010620/0x400
                                sp=e00000018ac98000 bsp=e00000018ac91118
```
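The trace shows a nested close. While the exiting process closes a unix socket (`sock_close` -> `sock_release` -> `unix_release_sock`), freeing an skb still sitting in that socket's receive queue releases a file descriptor that was passed with SCM_RIGHTS (`unix_destruct_fds` -> `__scm_destroy` -> `fput`). That drops the last reference to a second socket, whose own `sock_close` then calls `sock_fasync` -> `lock_sock` -> `local_bh_enable`, where the warning fires. The C sketch below sets up that situation from userspace. It is not the RHTS reproducer or the unix.c testcase from bug 470201: `pass_fd_in_flight` is a hypothetical helper, and the optional O_ASYNC step is only an assumption about what adding FASYNC usage to the testcase (as Paolo suggests later in this report) might look like. It is not claimed to reproduce the warning on an affected kernel.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not the original testcase): leave one AF_UNIX socket
 * "in flight" in another socket's receive queue, then close everything so
 * the in-flight file is dropped while the receiving socket is released.
 * If use_fasync is nonzero, O_ASYNC is set on the in-flight socket first
 * (an assumed way of "adding FASYNC usage").
 * Returns 0 on success, -1 if any system call fails.
 */
int pass_fd_in_flight(int use_fasync)
{
    int sv[2], victim, flags, rc = -1;
    char byte = 'x';
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;   /* guarantees cmsghdr alignment of buf */
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    victim = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (victim < 0)
        goto out_pair;

    if (use_fasync) {
        flags = fcntl(victim, F_GETFL);
        if (flags < 0 || fcntl(victim, F_SETFL, flags | O_ASYNC) < 0)
            goto out_victim;
    }

    memset(&ctrl, 0, sizeof ctrl);
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof ctrl.buf;

    /* Attach victim's descriptor as SCM_RIGHTS ancillary data. */
    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &victim, sizeof(int));

    /* Never received: the message (and victim) stay queued on sv[1]. */
    if (sendmsg(sv[0], &msg, 0) == 1)
        rc = 0;

out_victim:
    close(victim);   /* the in-flight reference keeps victim's file alive */
out_pair:
    close(sv[0]);
    close(sv[1]);    /* releasing sv[1] purges its queue, dropping victim */
    return rc;
}
```

Calling `pass_fd_in_flight(0)` corresponds to the plain SCM_RIGHTS case; `pass_fd_in_flight(1)` additionally sets O_ASYNC on the in-flight socket, so that the release path has a reason to invoke the fasync hook even on a kernel that only calls it for files with FASYNC set. The explicit close order (victim, then sv[0], then sv[1]) matters: the in-flight socket is only freed from inside the release of sv[1] if no other descriptor still refers to it.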
Hm, it looks kind of similar to some other local_bh_enable badness we've had elsewhere (bz 508648, bz 498394, bz 470919). Paolo, care to take a look?

Chris Lalancette

It looks very different from the other local_bh_enable problems. :-(

For the record, here is the job that had the failure: http://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=72909

Created attachment 354496
patch that could fix the bug
This is a backport of upstream commit 233e70f4228e78eb2f80dc6650f65d3ae3dbf17c. It will remove the call to sock_fasync in the case of the unix.c testcase, and thus it should fix the badness. However, the bug could still be latent for a more complicated testcase.
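Why removing that call avoids the warning can be shown with a toy model. The sketch below is hypothetical userspace code, not kernel code: `struct model_file`, `MODEL_FASYNC` and the `model_*` functions are invented names. It assumes, as the call trace suggests, that the unpatched close path invokes the socket's fasync hook (`sock_fasync`, which takes `lock_sock`) unconditionally, and that after the backport the hook is reached only when the file actually has FASYNC set.

```c
/* Hypothetical model; none of these names exist in the kernel. */
#define MODEL_FASYNC 0x1u

struct model_file {
    unsigned int flags;   /* MODEL_FASYNC set if the user enabled O_ASYNC */
    int fasync_calls;     /* number of times the fasync hook ran */
};

/* Stands in for sock_fasync(), which in the trace calls lock_sock(). */
void model_fasync(struct model_file *f)
{
    f->fasync_calls++;
}

/* Assumed unpatched behaviour: the hook runs on every close. */
void model_close_old(struct model_file *f)
{
    model_fasync(f);
}

/* Assumed patched behaviour: the hook runs only when FASYNC is set. */
void model_close_new(struct model_file *f)
{
    if (f->flags & MODEL_FASYNC)
        model_fasync(f);
}
```

Under these assumptions, a socket that never enabled FASYNC (as in the unix.c testcase) no longer reaches `lock_sock` from the close path, while a socket with FASYNC set still does, which is consistent with the concern that the bug may remain latent.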
Dave, this bug has been assigned to kernel-xen, but it seems like a non-virtualization-related problem. It is related to the SCM_RIGHTS DoS of bug 470201, in that it is triggered by the same testcase. It is not as serious, however, because this is just a WARN_ON_ONCE rather than a kernel panic.

My backport of an upstream patch should fix this bug by removing the execution path that triggered it. However, I didn't really understand the root cause of the problem (i.e. where the IRQs are enabled in the call trace of comment #7), and I'm pretty sure that the bug would resurface if FASYNC usage was added somehow to the unix.c testcase. Can you take a look?

Reassigned from kernel-xen to kernel, as the patch does not affect Xen at all.

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

Included in kernel-2.6.18-174.el5

You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team has sent specific instructions indicating when to do so. However, feel free to provide a comment indicating that this fix has been verified.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html