Red Hat Bugzilla – Bug 140331
stack overflows can occur on x86_64 under stack pressure when softirqs are handled
Last modified: 2007-11-30 17:07:05 EST
Description of problem:
IBM and TI have reported to us that under heavy stack pressure (when little free stack remains), the x86_64 platform can encounter a stack overflow when handling a softirq. This happens because local_bh_enable, as defined for x86_64, calls do_softirq_thunk, which in turn enters do_softirq on the process stack rather than on the per-irq stacks that softirq processing normally uses.

Version-Release number of selected component (if applicable):

How reproducible:
Sometimes

Steps to Reproduce:
1. Force process stack usage up to the point where less than 1k of free stack remains.
2. Lock a spinlock with spin_lock_irqsave.
3. Trigger a softirq (I believe scheduling a tasklet will do this).
4. Unlock the spinlock with spin_unlock_irqrestore.

Actual results:
The system oopses on a stack overflow.

Expected results:
The system should not oops.

Additional info:
The above reproducer instructions are generic. The problem was initially reported in IT numbers 39062 and 46982 as problems with ClearCase, which makes heavy use of the stack and can trigger the problem. However, any method of consuming most of a process stack can trigger this issue.
Created attachment 107177: patch to enable low stack checking for softirqs on x86_64
This patch solves the issue by adding x86_64 to the list of architectures that can detect when little stack remains and, in that case, defer softirq processing until a later time.
Neil has posted a patch to RHKL for this on 11/22.
The patch has been accepted into RHEL 3 and is targeted for inclusion in U5 this week. Will update when a hotfix kernel is available later this week.
A fix for this problem was committed to the RHEL3 U5 patch pool this afternoon (in kernel version 2.4.21-27.4.EL).
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2005-294.html