Bug 71514 - Infinite recursion in SCSI mid layer
Summary: Infinite recursion in SCSI mid layer
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 2.1
Classification: Red Hat
Component: kernel
Version: 2.1
Hardware: i386
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Tom Coughlan
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks: 87937
Reported: 2002-08-14 17:35 UTC by Tim Wright
Modified: 2007-11-30 22:06 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2003-12-19 18:07:34 UTC
Target Upstream Version:
Embargoed:


Links
System ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata RHSA-2003:368 | 0 | normal | SHIPPED_LIVE | Important: Updated IA64 kernel packages address security vulnerabilities, bugfixes | 2003-12-19 05:00:00 UTC
Red Hat Product Errata RHSA-2004:017 | 0 | normal | SHIPPED_LIVE | Important: Updated kernel packages available for Red Hat Enterprise Linux 3 Update 1 | 2004-01-13 05:00:00 UTC

Description Tim Wright 2002-08-14 17:35:31 UTC
Description of Problem:
The SCSI mid layer in the 2.4.9 kernel contains an evil stack-smashing infinite
recursion bug that triggers when SCpnt->init_command() fails (which happens on
Fibre Channel controllers if the link happens to be down). The stack looks
something like this:
[<f8981f29>] __scsi_end_request [scsi_mod] 0xc9
[<f8982843>] scsi_request_fn [scsi_mod] 0x2a3
[<f899d3e0>] sd_template [sd_mod] 0x0
[<f8981d47>] scsi_queue_next_request [scsi_mod] 0x67
[<f897a524>] scsi_release_command_Rsmp_f12100cc [scsi_mod] 0x124
[<f8981fcd>] __scsi_end_request [scsi_mod] 0x16d
[<f8982843>] scsi_request_fn [scsi_mod] 0x2a3
... repeated dozens of times ...
[<f897a524>] scsi_release_command_Rsmp_f12100cc [scsi_mod] 0x124
[<f8981fcd>] __scsi_end_request [scsi_mod] 0x16d
[<f8982008>] scsi_end_request_Rsmp_8a9b35ba [scsi_mod] 0x18
[<f8982496>] scsi_io_completion_Rsmp_a96c15bb [scsi_mod] 0x3b6
[<f899ada4>] rw_intr [sd_mod] 0x204
[<f8963340>] lpfc_dpc [lpfcdd] 0x0
[<f897b17d>] scsi_finish_command [scsi_mod] 0xad
[<f897aeec>] scsi_bottom_half_handler [scsi_mod] 0xbc
[<c012143d>] bh_action [kernel] 0x4d
[<c01212df>] tasklet_hi_action [kernel] 0x7f
[<c012100b>] do_softirq [kernel] 0x7b
[<c0121635>] ksoftirqd [kernel] 0xf5
[<c0105886>] kernel_thread [kernel] 0x26
[<c0121540>] ksoftirqd [kernel] 0x0

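The problem is that __scsi_end_request() calls scsi_release_command(), which
(as the trace shows) goes through scsi_queue_next_request() and re-enters
scsi_request_fn(), causing the loop. I note that in 2.4.18, it calls
__scsi_release_command() instead, preventing the recursion.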
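
To make the call pattern concrete, here is a minimal user-space sketch of the
two behaviours. It is only a model built from the trace above, not the kernel
code: struct queue_model, model_request_fn(), model_max_nesting() and the
release_restarts_queue flag are hypothetical names, and the model assumes the
link is down so that every command setup fails.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; these are NOT the kernel functions. */
struct queue_model {
    int pending;     /* requests still waiting on the queue */
    int depth;       /* current nesting of model_request_fn() */
    int max_depth;   /* deepest nesting seen so far */
};

/*
 * Models the request function with the link down, so every command
 * setup fails and the request is ended immediately.
 */
static void model_request_fn(struct queue_model *q, bool release_restarts_queue)
{
    q->depth++;
    if (q->depth > q->max_depth)
        q->max_depth = q->depth;

    while (q->pending > 0) {
        q->pending--;   /* take one request; its setup fails */

        /*
         * Ending the failed request releases its command. In the
         * buggy variant the release restarts the queue, which
         * re-enters this function one frame deeper.
         */
        if (release_restarts_queue)
            model_request_fn(q, release_restarts_queue);
        /* Otherwise the loop simply moves on to the next request. */
    }

    q->depth--;
}

/* Returns the deepest nesting reached while failing `pending` requests. */
int model_max_nesting(int pending, bool release_restarts_queue)
{
    struct queue_model q = { pending, 0, 0 };

    assert(pending >= 0);
    model_request_fn(&q, release_restarts_queue);
    return q.max_depth;
}
```

In this model, model_max_nesting(100, true) nests one level per queued request,
while model_max_nesting(100, false) never goes deeper than a single call. On a
small fixed-size kernel stack, the first pattern overflows quickly once a load
is running.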

Version-Release number of selected component (if applicable):
2.4.9-e.3, 2.4.9-e.5 and 2.4.9-e.8 are all susceptible.

How Reproducible:
100%

Steps to Reproduce:
1. Install Advanced Server. Install Emulex LP9000 cards (or LP8000), or QLogic
fibre cards. Connect to a switched fabric.
2. Start a load.
3. Disable the port on the FC switch to which the server is connected.
4. Watch the fireworks :-)

Actual Results:
Infinite recursion, several task structures destroyed, eventual
oops/panic/hang/crash when these data structures are accessed.

Expected Results:
System should continue to operate normally and the I/Os to the downed link
should fail.

Additional Information:

Comment 1 dnd 2003-01-03 17:34:40 UTC
ME TOO:

I have Red Hat Advanced Server 2.1 with the latest errata kernel 2.4.9-e.10 and
am trying to use the "multipath" functionality with the following setup:

 Redhat advanced server 2.1 in a SAN Proliant DL380 G2
 245299-B21          PCI 2Gb FC adapter
 A7346A              HP FC 1Gb/2Gb Entry Switch 8B
 Emulex LPFC (LP950) SCSI on PCI bus 01 device 18 irq 15 scsi1
 Brocade switches (16 ports) 
 VA7410.


Oops dump
---------

multipath: IO failure on sdf1, disabling IO path. 
Unable to handle kernel paging request at virtual address a7fa4070
*pde = 00000000
Oops: 0002
Kernel 2.4.9-e.10custom
CPU:    0
EIP:    0010:[<c1c6a423>]    Tainted: P 
EFLAGS: 00010082
EIP is at ___strtok_R29805c13 [] 0x18e1d0f 
eax: c12d7dfe   ebx: 00000000   ecx: 000000ac   edx: c1c6a418
esi: 00000018   edi: c1c6a418   ebp: c12d7630   esp: c12d7608
ds: 0018   es: 0018   ss: 0018
Process bdflush (pid: 6, stackpage=c12d7000)
Stack: c6808174 c1c60ba0 c1a38c00 c68223e0 c26401a0 c1c6a400 00000000 00000256 
       00000000 00000296 c1c6a418 c68076fd c1c6a418 c1a38c00 c1c6a400 00000296 
       c1c6a400 c68004a5 c1c6a418 00000000 c1c6a400 c1a38cb4 00000000 00000000 
Call Trace: [<c6808174>] scsi_request_fn [scsi_mod] 0x264 
[<c68223e0>] sd_template [sd_mod] 0x0 
[<c68076fd>] scsi_queue_next_request [scsi_mod] 0x3d 
[<c68004a5>] scsi_release_command_Ra9b69956 [scsi_mod] 0x105 
[<c6807939>] __scsi_end_request [scsi_mod] 0x179 
[<c6808195>] scsi_request_fn [scsi_mod] 0x285 
[<c68223e0>] sd_template [sd_mod] 0x0 
[<c68076fd>] scsi_queue_next_request [scsi_mod] 0x3d 
[<c68004a5>] scsi_release_command_Ra9b69956 [scsi_mod] 0x105 
[<c6807939>] __scsi_end_request [scsi_mod] 0x179 
[<c6808195>] scsi_request_fn [scsi_mod] 0x285 
[<c68223e0>] sd_template [sd_mod] 0x0 
[<c68076fd>] scsi_queue_next_request [scsi_mod] 0x3d 
[<c68004a5>] scsi_release_command_Ra9b69956 [scsi_mod] 0x105 
[<c6807939>] __scsi_end_request [scsi_mod] 0x179 
...



Comment 3 Tom Coughlan 2003-08-29 13:00:34 UTC
We will investigate, with a goal of fixing this in the next AS2.1 kernel update.

Comment 5 Jeff Moyer 2003-10-27 15:20:47 UTC
The fix for this bug is in the pensacola tree.

Comment 7 John Flanagan 2003-12-19 18:07:34 UTC
An errata has been issued which should help the problem described in this bug report. 
This report is therefore being closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, please follow the link below. You may reopen 
this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2003-368.html


