Bug 110224 - kernel stall by aio (ksoftirqd stack overflow during SCSI softirq)
Summary: kernel stall by aio (ksoftirqd stack overflow during SCSI softirq)
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: ia64 Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Jeff Moyer
QA Contact: Brian Brock
URL:
Whiteboard:
Keywords:
Depends On:
Blocks:
Reported: 2003-11-17 09:13 UTC by Jun'ichi NOMURA
Modified: 2007-11-30 22:06 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2004-12-03 02:15:52 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Description Jun'ichi NOMURA 2003-11-17 09:13:33 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; ja-JP; rv:1.5)
Gecko/20031024 Debian/1.5-2

Description of problem:
While running an I/O stress test with aio on 4 disks attached through a
Fibre Channel HBA (qla2300), the kernel becomes unresponsive to both ping
and console operation.

The problem is not reliably reproducible. It has happened twice so far.

Version-Release number of selected component (if applicable):
kernel-2.4.21-4.EL

How reproducible:
Sometimes

Steps to Reproduce:
1. Generate intensive I/O with aio to disks connected via a QLogic QLA2300.


Additional info:

The processor's register information, retrieved after the stall,
shows that ar.bsp had climbed too high:
 AR.BSP = 0xE000000008487EA8
(The task_struct + kernel stack area should span
 0xE000000008480000 to 0xE000000008487fff, and the
 register backing store grows upwards.)
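
The address arithmetic above can be checked with a small sketch. The three
constants are the values quoted in this report; the helper names are
hypothetical and exist only for illustration. Note that the offset from the
base of the area is not the same as the amount of backing store in use,
since task_struct occupies the start of the area and its size is not given
here.

```c
#include <assert.h>
#include <stdint.h>

/* Values quoted in the report. */
#define AREA_BASE 0xE000000008480000ULL /* start of task_struct + kernel stack area */
#define AREA_LAST 0xE000000008487FFFULL /* last byte of that area */
#define AR_BSP    0xE000000008487EA8ULL /* observed backing store pointer */

/* Hypothetical helper: total size of the area in bytes. */
static uint64_t area_size(uint64_t base, uint64_t last)
{
    assert(last >= base);
    return last - base + 1;
}

/* Hypothetical helper: how far ar.bsp sits above the base of the area. */
static uint64_t bsp_offset(uint64_t bsp, uint64_t base)
{
    assert(bsp >= base);
    return bsp - base;
}

/* Hypothetical helper: bytes left between ar.bsp and the end of the area. */
static uint64_t bsp_headroom(uint64_t bsp, uint64_t last)
{
    assert(bsp <= last + 1);
    return last + 1 - bsp;
}
```

With the reported values, area_size(AREA_BASE, AREA_LAST) is 32768 bytes
(32 KB), bsp_offset(AR_BSP, AREA_BASE) is 32424, and
bsp_headroom(AR_BSP, AREA_LAST) is 344. Assuming the memory stack grows
downwards from the top of the same area, a backing store pointer within
344 bytes of the top suggests the two have collided.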

In 2.4.21-4.EL, __scsi_end_request() calls scsi_release_command(),
which calls scsi_queue_next_request().
In Linus' kernel, on the other hand, __scsi_end_request() calls
__scsi_release_command(), which does not call scsi_queue_next_request().

scsi_queue_next_request() can eventually call __scsi_end_request()
again, as follows:
  scsi_release_command()
    scsi_queue_next_request()
      scsi_request_fn()
        __scsi_end_request()
          scsi_release_command()
            ....

Could this difference cause the stack overflow under some conditions and
result in unexpected behaviour of the kernel?
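
To illustrate the concern, here is a user-space sketch of the call chain.
It is a toy model, not the kernel code: every model_* name is hypothetical,
and it assumes that each request issued by the request function completes
synchronously, which is the condition under which the chain would nest. The
requeue_on_release flag selects between behaviour like
scsi_release_command() (1) and like __scsi_release_command() (0).

```c
#include <assert.h>

/* Hypothetical model of a queue whose requests complete immediately. */
struct model_queue {
    unsigned pending;   /* requests still waiting to be issued */
    unsigned depth;     /* current nesting of model_end_request() */
    unsigned max_depth; /* deepest nesting observed */
};

static void model_end_request(struct model_queue *q, int requeue_on_release);

/* Stands in for scsi_request_fn(): issue one request, which completes at once. */
static void model_request_fn(struct model_queue *q, int requeue_on_release)
{
    if (q->pending == 0)
        return;
    q->pending--;
    model_end_request(q, requeue_on_release);
}

/* Stands in for the release step; restarts the queue only if asked to. */
static void model_release_command(struct model_queue *q, int requeue_on_release)
{
    if (requeue_on_release)
        model_request_fn(q, requeue_on_release); /* via queue_next_request */
}

/* Stands in for __scsi_end_request(); tracks how deeply it is nested. */
static void model_end_request(struct model_queue *q, int requeue_on_release)
{
    q->depth++;
    if (q->depth > q->max_depth)
        q->max_depth = q->depth;
    model_release_command(q, requeue_on_release);
    q->depth--;
}

/* Drain `pending` requests and report the deepest nesting reached. */
static unsigned model_drain(unsigned pending, int requeue_on_release)
{
    struct model_queue q = { pending, 0, 0 };

    assert(requeue_on_release == 0 || requeue_on_release == 1);
    while (q.pending > 0)
        model_request_fn(&q, requeue_on_release);
    return q.max_depth;
}
```

In this model, model_drain(100, 1) nests 100 levels deep, one per pending
request, while model_drain(100, 0) never goes deeper than 1 because the
caller's loop restarts the queue instead. Whether the real driver path
completes requests synchronously often enough to nest this way is exactly
the open question above.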

Comment 1 Jeff Moyer 2003-11-25 18:51:35 UTC
queued for U1.

Comment 2 Mahesh Kunjal 2004-02-02 23:26:15 UTC
Did this fix make it into U1?

Comment 3 Ernie Petrides 2004-02-03 00:45:26 UTC
Yes.  The fix was first committed to the (internal Engineering)
build of kernel version 2.4.21-4.9.EL on 4-Nov-2003.


Comment 4 Ernie Petrides 2004-12-03 02:15:52 UTC
An errata has been issued which should help the problem 
described in this bug report. This report is therefore being 
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, 
please follow the link below. You may reopen this bug report 
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2004-017.html


