Bug 110224 - kernel stall by aio (ksoftirqd stack overflow during SCSI softirq)
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: ia64 Linux
Priority: medium
Severity: high
Assigned To: Jeffrey Moyer
QA Contact: Brian Brock
Reported: 2003-11-17 04:13 EST by Jun'ichi NOMURA
Modified: 2007-11-30 17:06 EST
Doc Type: Bug Fix
Last Closed: 2004-12-02 21:15:52 EST
Description Jun'ichi NOMURA 2003-11-17 04:13:33 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; ja-JP; rv:1.5)
Gecko/20031024 Debian/1.5-2

Description of problem:
While applying I/O stress with aio to 4 disks through a Fibre Channel HBA
(qla2300), the kernel becomes unresponsive to both ping and console
operation.

The problem is not reliably reproducible. It has happened twice so far.

Version-Release number of selected component (if applicable):
kernel-2.4.21-4.EL

How reproducible:
Sometimes

Steps to Reproduce:
1. Generate intensive I/O with aio to disks connected via a QLogic QLA2300.

Additional info:

The processor's register dump showed that ar.bsp had grown too high:
 AR.BSP = 0xE000000008487EA8
(The task_struct + kernel stack area should span
 0xE000000008480000 to 0xE000000008487fff, and the
 register backing store grows upwards.)
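
To make the numbers concrete, here is a small arithmetic sketch using only
the three addresses quoted above. The variable names are made up for
illustration. It assumes the layout the report describes (one contiguous
area holding task_struct and the kernel stacks) and, as is usual on ia64
Linux, that the memory stack grows downwards from the top of that same area.

```python
# Addresses copied from the report; the names are hypothetical.
AREA_START = 0xE000000008480000  # first byte of task_struct + kernel stack area
AREA_END = 0xE000000008487FFF    # last byte of that area
AR_BSP = 0xE000000008487EA8      # observed backing store pointer

area_size = AREA_END - AREA_START + 1  # total bytes in the area
bsp_offset = AR_BSP - AREA_START       # how far up the backing store has grown
headroom = AREA_END + 1 - AR_BSP       # bytes left between ar.bsp and the top

print(f"area size : {area_size} bytes ({area_size // 1024} KiB)")
print(f"bsp offset: {bsp_offset} bytes")
print(f"headroom  : {headroom} bytes")
```

Under those assumptions ar.bsp sits only a few hundred bytes below the top
of a 32 KiB area, so the backing store would already have grown through the
space normally used by the memory stack.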

In 2.4.21-4.EL, __scsi_end_request() calls scsi_release_command(),
which calls scsi_queue_next_request().
In Linus's kernel, on the other hand, __scsi_end_request() calls
__scsi_release_command(), which does not call scsi_queue_next_request().

scsi_queue_next_request() can eventually call __scsi_end_request()
again, as follows:
  scsi_release_command()
    scsi_queue_next_request()
      scsi_request_fn()
        __scsi_end_request()
          scsi_release_command()
            ....

Could this difference cause a stack overflow under some conditions,
resulting in unexpected kernel behaviour?
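
The call chain above can be illustrated with a toy model. This is not
kernel code: the Python functions only borrow the names from the chain,
and the model assumes (the report does not confirm this) that every
queued request completes synchronously inside the request function, so
each one re-enters the completion path. The flag release_restarts_queue
is hypothetical and stands for the difference between
scsi_release_command() and __scsi_release_command() described above.

```python
def deepest_nesting(pending, release_restarts_queue):
    """Return the deepest call nesting reached while completing one request
    with `pending` further requests waiting in the queue (toy model)."""
    state = {"depth": 0, "deepest": 0}

    def enter():
        state["depth"] += 1
        state["deepest"] = max(state["deepest"], state["depth"])

    def leave():
        state["depth"] -= 1

    def end_request(remaining):          # models __scsi_end_request()
        enter()
        release_command(remaining)
        leave()

    def release_command(remaining):      # models the release step
        enter()
        if release_restarts_queue:       # behaviour described for 2.4.21-4.EL
            queue_next_request(remaining)
        leave()

    def queue_next_request(remaining):   # models scsi_queue_next_request()
        enter()
        if remaining > 0:
            request_fn(remaining)
        leave()

    def request_fn(remaining):           # models scsi_request_fn()
        enter()
        end_request(remaining - 1)       # assumption: completes immediately
        leave()

    end_request(pending)
    return state["deepest"]


for n in (0, 10, 100):
    print(n, deepest_nesting(n, True), deepest_nesting(n, False))
```

In this model the nesting grows by four frames per pending request when
the release step restarts the queue, and stays constant when it does not.
Whether the real kernel hits the synchronous-completion case often enough
to exhaust the stack is exactly the open question in this report.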
Comment 1 Jeffrey Moyer 2003-11-25 13:51:35 EST
queued for U1.
Comment 2 Mahesh Kunjal 2004-02-02 18:26:15 EST
Did this fix make it into U1?
Comment 3 Ernie Petrides 2004-02-02 19:45:26 EST
Yes.  The fix was first committed to the (internal Engineering)
build of kernel version 2.4.21-4.9.EL on 4-Nov-2003.
Comment 4 Ernie Petrides 2004-12-02 21:15:52 EST
An errata has been issued which should help the problem 
described in this bug report. This report is therefore being 
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, 
please follow the link below. You may reopen this bug report 
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2004-017.html
