Red Hat Bugzilla – Bug 110224
kernel stall by aio (ksoftirqd stack overflow during SCSI softirq)
Last modified: 2007-11-30 17:06:59 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; ja-JP; rv:1.5)
Description of problem:
While applying I/O stress with aio to 4 disks through a fibre channel HBA
(qla2300), the kernel becomes unresponsive to both ping and the console.
The problem is not reliably reproducible. It has happened twice so far.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Generate intensive I/O with aio to disks connected via Qlogic QLA2300.
Retrieving the processor's register information
showed that ar.bsp had grown too high:
AR.BSP = 0xE000000008487EA8
(The task_struct + kernel stack area should span
0xE000000008480000 to 0xE000000008487fff, and the
register backing store grows upwards.)
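As a quick check of the addresses above, the following minimal Python sketch computes how far ar.bsp sits into the task_struct + kernel stack area. The three addresses are taken from this report; the function name backing_store_headroom is made up for illustration.

```python
# Addresses as quoted in this report. The region size (32 KB) follows
# from the stated range 0xE000000008480000..0xE000000008487fff.
REGION_START = 0xE000000008480000
REGION_END = 0xE000000008487FFF  # inclusive upper bound
AR_BSP = 0xE000000008487EA8


def backing_store_headroom(bsp, start, end):
    """Return (offset of bsp from start, bytes left before the region ends).

    Hypothetical helper for illustration only, not a kernel interface.
    """
    size = end - start + 1
    offset = bsp - start
    return offset, size - offset


offset, headroom = backing_store_headroom(AR_BSP, REGION_START, REGION_END)
print(f"region size : {REGION_END - REGION_START + 1} bytes")
print(f"bsp offset  : {offset} bytes (0x{offset:X})")
print(f"headroom    : {headroom} bytes")
```

With these values ar.bsp is 0x7EA8 bytes into the area, which leaves 344 bytes before the end of the region. Assuming the usual ia64 layout, in which the memory stack grows downwards from the top of the same area, the backing store would by then have run into it.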
In 2.4.21-4.EL, __scsi_end_request() calls scsi_release_command(),
which calls scsi_queue_next_request().
In Linus' kernel, on the other hand, __scsi_end_request() calls
__scsi_release_command(), which does not call scsi_queue_next_request().
Since scsi_queue_next_request() can eventually call __scsi_end_request()
again, could this difference cause the stack overflow under some conditions and
result in unexpected behaviour of the operating system kernel?
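The concern can be illustrated with a toy model. The Python sketch below is not the kernel code: the function names only mirror the ones mentioned in this report, and it assumes, purely for illustration, that every dispatched request completes immediately. It counts how deeply the completion path nests when releasing a command also dispatches the next request, compared with a variant where the release does not dispatch.

```python
def max_depth_recursive(pending):
    """Toy model of the suspected path: ending a request releases its
    command, the release dispatches the next request, and that request
    ends at once, re-entering end_request one level deeper."""
    depth = 0
    max_depth = 0

    def end_request(remaining):
        nonlocal depth, max_depth
        depth += 1
        max_depth = max(max_depth, depth)
        release_command(remaining)
        depth -= 1

    def release_command(remaining):
        # Assumption: the release path also dispatches the next request.
        queue_next_request(remaining)

    def queue_next_request(remaining):
        if remaining > 0:
            end_request(remaining - 1)

    end_request(pending)
    return max_depth


def max_depth_flat(pending):
    """Toy model of the alternative: the release does not dispatch, so
    each completion returns before the next one is handled."""
    depth = 0
    max_depth = 0

    def end_request():
        nonlocal depth, max_depth
        depth += 1
        max_depth = max(max_depth, depth)
        depth -= 1  # command released, nothing dispatched from here

    end_request()
    for _ in range(pending):
        end_request()
    return max_depth


print(max_depth_recursive(50), max_depth_flat(50))
```

In this model the nesting depth of the first variant grows with the number of pending requests, while the second stays at one level. Whether the real code path can nest this way depends on conditions the model ignores.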
Queued for U1.
Did this fix make it into U1?
Yes. The fix was first committed to the (internal Engineering)
build of kernel version 2.4.21-4.9.EL on 4-Nov-2003.
An erratum has been issued which should help with the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.