Bug 110224

Summary: kernel stall by aio (ksoftirqd stack overflow during SCSI softirq)
Product: Red Hat Enterprise Linux 3
Component: kernel
Version: 3.0
Status: CLOSED ERRATA
Severity: high
Priority: medium
Reporter: Jun'ichi NOMURA <junichi.nomura>
Assignee: Jeff Moyer <jmoyer>
QA Contact: Brian Brock <bbrock>
CC: petrides, riel
Hardware: ia64   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2004-12-03 02:15:52 UTC

Description Jun'ichi NOMURA 2003-11-17 09:13:33 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; ja-JP; rv:1.5)
Gecko/20031024 Debian/1.5-2

Description of problem:
While applying I/O stress with aio to 4 disks through a fibre channel HBA
(qla2300), the kernel becomes unresponsive to both ping and console
operation.

The problem is not reliably reproducible. It has happened twice so far.

Version-Release number of selected component (if applicable):
kernel-2.4.21-4.EL

How reproducible:
Sometimes

Steps to Reproduce:
1. Generate intensive I/O with aio to disks connected via Qlogic QLA2300.

Additional info:

After retrieving the processor's register information,
we found that ar.bsp had grown too high:
 AR.BSP = 0xE000000008487EA8
(The task_struct + kernel stack area should span
 0xE000000008480000 to 0xE000000008487fff, and the
 register backing store grows upwards.)
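
To make the arithmetic concrete, here is a minimal sketch in C. The
addresses are the ones quoted above; the helper name and the macros are
hypothetical and only serve the illustration. It computes how many bytes
remain between ar.bsp and the top of the 32 KB region. The result is
0x158 (344) bytes, so the backing store, which grows upwards, has almost
reached the top of the region, where (as I understand the ia64 layout)
the memory stack begins and grows downwards.

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from the report. The 32 KB size follows from the quoted
 * range 0xE000000008480000 .. 0xE000000008487fff. */
#define REGION_BASE  0xE000000008480000ULL
#define REGION_SIZE  0x8000ULL
#define REPORTED_BSP 0xE000000008487EA8ULL

/* Hypothetical helper: bytes left between a backing store pointer and
 * the top of the task_struct + kernel stack region. */
static uint64_t bytes_below_region_top(uint64_t bsp)
{
    /* The pointer must lie inside the region for the result to make sense. */
    assert(bsp >= REGION_BASE && bsp < REGION_BASE + REGION_SIZE);
    return REGION_BASE + REGION_SIZE - bsp;
}
```

Calling bytes_below_region_top(REPORTED_BSP) gives 0x158, which leaves
essentially no room for the memory stack.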

In 2.4.21-4.EL, __scsi_end_request() calls scsi_release_command(),
which calls scsi_queue_next_request().
In Linus's kernel, by contrast, __scsi_end_request() calls
__scsi_release_command(), which does not call scsi_queue_next_request().

scsi_queue_next_request() can eventually call __scsi_end_request()
again, as follows:
  scsi_release_command()
    scsi_queue_next_request()
      scsi_request_fn()
        __scsi_end_request()
          scsi_release_command()
            ....

Could this difference cause a stack overflow under some conditions and
result in the unexpected kernel behaviour described above?
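
The concern can be illustrated with a toy model in C. This is not the
kernel code: every toy_* name is hypothetical, and the model assumes
that each dispatched command completes immediately inside the request
function, which is the kind of condition the question above is asking
about. It only shows that a release path which restarts the queue makes
the nesting depth grow with the number of pending commands, while a
release path that does not restart the queue keeps the depth constant.

```c
#include <stddef.h>

/* Toy queue: counts pending commands and tracks nesting depth. */
struct toy_queue {
    size_t pending;    /* commands still waiting to be dispatched */
    size_t depth;      /* current nesting depth of the request function */
    size_t max_depth;  /* deepest nesting observed so far */
};

static void toy_request_fn(struct toy_queue *q);

/* Models a release path that restarts the queue (the 2.4.21-4.EL shape). */
static void toy_release_and_requeue(struct toy_queue *q)
{
    toy_request_fn(q);
}

/* Models the end-of-request step calling the release path. */
static void toy_end_request(struct toy_queue *q)
{
    toy_release_and_requeue(q);
}

/* Models the request function; each command completes immediately. */
static void toy_request_fn(struct toy_queue *q)
{
    if (q->pending == 0)
        return;
    q->pending--;
    q->depth++;
    if (q->depth > q->max_depth)
        q->max_depth = q->depth;
    toy_end_request(q);   /* re-enters toy_request_fn() */
    q->depth--;
}

/* Release path restarts the queue: depth grows with queue length. */
static size_t max_depth_recursive(size_t n)
{
    struct toy_queue q = { n, 0, 0 };
    toy_request_fn(&q);
    return q.max_depth;
}

/* Release path does not restart the queue: the caller loops instead. */
static size_t max_depth_iterative(size_t n)
{
    struct toy_queue q = { n, 0, 0 };
    while (q.pending > 0) {
        q.pending--;
        q.depth++;
        if (q.depth > q.max_depth)
            q.max_depth = q.depth;
        q.depth--;        /* request ended and released, no re-entry */
    }
    return q.max_depth;
}
```

With 100 pending commands, max_depth_recursive(100) reaches a depth of
100 while max_depth_iterative(100) stays at 1. In the real kernel each
level of nesting would consume both memory stack and register backing
store, which would be consistent with the ar.bsp value seen above.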

Comment 1 Jeff Moyer 2003-11-25 18:51:35 UTC
queued for U1.

Comment 2 Mahesh Kunjal 2004-02-02 23:26:15 UTC
Did this fix make it into U1?

Comment 3 Ernie Petrides 2004-02-03 00:45:26 UTC
Yes.  The fix was first committed to the (internal Engineering)
build of kernel version 2.4.21-4.9.EL on 4-Nov-2003.


Comment 4 Ernie Petrides 2004-12-03 02:15:52 UTC
An errata has been issued which should help the problem 
described in this bug report. This report is therefore being 
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, 
please follow the link below. You may reopen this bug report 
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2004-017.html