Description of problem:
IBM and TI have reported to us that when free stack space is low, the
x86_64 platform can encounter a stack overflow when handling a
softirq. This happens because local_bh_enable, as defined for
the x86_64 platform, calls do_softirq_thunk, which in turn enters
do_softirq on the process stack rather than on the per-irq
stacks that softirq handling normally uses.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Force process stack usage to the point where less than 1k of free stack remains
2. Lock a spinlock with spin_lock_irqsave
3. Trigger a softirq (I believe scheduling a tasklet will do this)
4. Unlock the spinlock with spin_unlock_irqrestore
Actual results:
system will oops on stack overflow
Expected results:
system should not oops
Additional info:
The above reproducer instructions are generic. The problem was
initially reported in IT numbers 39062 and 46982 as problems with
clearcase, as clearcase makes significant stack usage and can trigger
the problem. However, any method of eating most of a process stack
can trigger this issue.
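
To make the failure condition concrete, here is a minimal userspace sketch (not kernel code) of the arithmetic involved: how much process stack is left at a given stack pointer, and whether a softirq handler entered on that same stack would run past it. The constants THREAD_SIZE and TASK_RESERVED and both function names are assumptions chosen for illustration; they are not taken from the kernel source or from the reports.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration only; not taken from the report. */
#define THREAD_SIZE   8192u  /* hypothetical per-process kernel stack size */
#define TASK_RESERVED 1024u  /* hypothetical space reserved at the stack base */

/* The masking trick below only works for a power-of-two stack size. */
static_assert((THREAD_SIZE & (THREAD_SIZE - 1)) == 0,
              "THREAD_SIZE must be a power of two");

/*
 * Bytes still free on a downward-growing stack: the offset of the stack
 * pointer inside its THREAD_SIZE-aligned stack, minus the reserved area.
 */
static size_t free_stack_bytes(uintptr_t sp)
{
    size_t offset = (size_t)(sp & (THREAD_SIZE - 1));
    return offset > TASK_RESERVED ? offset - TASK_RESERVED : 0;
}

/* Returns 1 if a handler needing `need` bytes would overflow the stack. */
static int softirq_would_overflow(uintptr_t sp, size_t need)
{
    return need > free_stack_bytes(sp);
}
```

With these assumed numbers, a stack pointer 1536 bytes above the stack base leaves 512 bytes free, so a softirq handler needing 2048 bytes on the process stack would overflow, while the same handler entered near the top of the stack would not.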
Created attachment 107177
patch to enable low stack checking for softirqs on x86_64
This patch solves the issue by adding x86_64 to the list of arches which can
detect low stack pressure, and consequently defer their processing until a
Neil posted a patch to RHKL for this on 11/22.
The patch has been accepted into RHEL 3 and is targeted for inclusion
into U5 this week. Will update when the hot fix kernel is available later
A fix for this problem has just been committed to the RHEL3 U5
patch pool this afternoon (in kernel version 2.4.21-27.4.EL).
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.