Bug 71514 - Infinite recursion in SCSI mid layer
Product: Red Hat Enterprise Linux 2.1
Classification: Red Hat
Component: kernel
Hardware: i386 Linux
Priority: medium, Severity: high
Assigned To: Tom Coughlan
Brian Brock
Blocks: 87937
Reported: 2002-08-14 13:35 EDT by Tim Wright
Modified: 2007-11-30 17:06 EST
Doc Type: Bug Fix
Last Closed: 2003-12-19 13:07:34 EST
Description Tim Wright 2002-08-14 13:35:31 EDT
Description of Problem:
The SCSI mid layer in the 2.4.9 kernel contains an evil stack-smashing infinite
recursion bug in the case that SCpnt->init_command() fails (which happens on
Fibre-Channel controllers if the link happens to be down). The stack looks
something like this:
[<f8981f29>] __scsi_end_request [scsi_mod] 0xc9
[<f8982843>] scsi_request_fn [scsi_mod] 0x2a3
[<f899d3e0>] sd_template [sd_mod] 0x0
[<f8981d47>] scsi_queue_next_request [scsi_mod] 0x67
[<f897a524>] scsi_release_command_Rsmp_f12100cc [scsi_mod] 0x124
[<f8981fcd>] __scsi_end_request [scsi_mod] 0x16d
[<f8982843>] scsi_request_fn [scsi_mod] 0x2a3
... repeated dozens of times ...
[<f897a524>] scsi_release_command_Rsmp_f12100cc [scsi_mod] 0x124
[<f8981fcd>] __scsi_end_request [scsi_mod] 0x16d
[<f8982008>] scsi_end_request_Rsmp_8a9b35ba [scsi_mod] 0x18
[<f8982496>] scsi_io_completion_Rsmp_a96c15bb [scsi_mod] 0x3b6
[<f899ada4>] rw_intr [sd_mod] 0x204
[<f8963340>] lpfc_dpc [lpfcdd] 0x0
[<f897b17d>] scsi_finish_command [scsi_mod] 0xad
[<f897aeec>] scsi_bottom_half_handler [scsi_mod] 0xbc
[<c012143d>] bh_action [kernel] 0x4d
[<c01212df>] tasklet_hi_action [kernel] 0x7f
[<c012100b>] do_softirq [kernel] 0x7b
[<c0121635>] ksoftirqd [kernel] 0xf5
[<c0105886>] kernel_thread [kernel] 0x26
[<c0121540>] ksoftirqd [kernel] 0x0

The problem is that __scsi_end_request() calls scsi_release_command() causing
the loop. I note that in 2.4.18, it calls __scsi_release_command() instead,
preventing the recursion.

Version-Release number of selected component (if applicable):
2.4.9-e.3, 2.4.9-e.5 and 2.4.9-e.8 are all susceptible.

How Reproducible:

Steps to Reproduce:
1. Install Advanced Server. Install Emulex LP9000 cards (or LP8000), or QLogic
fibre cards. Connect to a switched fabric.
2. Start a load.
3. Disable the port on the FC switch to which the server is connected.
4. Watch the fireworks :-)

Actual Results:
Infinite recursion, several task structures destroyed, eventual
oops/panic/hang/crash when these data structures are accessed.

Expected Results:
System should continue to operate normally and the I/Os to the downed link
should fail.

Additional Information:
Comment 1 dnd 2003-01-03 12:34:40 EST

I have Red Hat Advanced Server 2.1 with the latest errata kernel 2.4.9-e.10 and
am trying to use the "multipath" functionality with the following setup:

 Redhat advanced server 2.1 in a SAN Proliant DL380 G2
 245299-B21          PCI 2Gb FC adapter
 A7346A              HP FC 1Gb/2Gb Entry Switch 8B
 Emulex LPFC (LP950) SCSI on PCI bus 01 device 18 irq 15 scsi1
 Brocade switches (16 ports) 

Oops dump

multipath: IO failure on sdf1, disabling IO path. 
Unable to handle kernel paging request at virtual address a7fa4070
*pde = 00000000
Oops: 0002
Kernel 2.4.9-e.10custom
CPU:    0
EIP:    0010:[<c1c6a423>]    Tainted: P 
EFLAGS: 00010082
EIP is at ___strtok_R29805c13 [] 0x18e1d0f 
eax: c12d7dfe   ebx: 00000000   ecx: 000000ac   edx: c1c6a418
esi: 00000018   edi: c1c6a418   ebp: c12d7630   esp: c12d7608
ds: 0018   es: 0018   ss: 0018
Process bdflush (pid: 6, stackpage=c12d7000)
Stack: c6808174 c1c60ba0 c1a38c00 c68223e0 c26401a0 c1c6a400 00000000 00000256 
       00000000 00000296 c1c6a418 c68076fd c1c6a418 c1a38c00 c1c6a400 00000296 
       c1c6a400 c68004a5 c1c6a418 00000000 c1c6a400 c1a38cb4 00000000 00000000 
Call Trace: [<c6808174>] scsi_request_fn [scsi_mod] 0x264 
[<c68223e0>] sd_template [sd_mod] 0x0 
[<c68076fd>] scsi_queue_next_request [scsi_mod] 0x3d 
[<c68004a5>] scsi_release_command_Ra9b69956 [scsi_mod] 0x105 
[<c6807939>] __scsi_end_request [scsi_mod] 0x179 
[<c6808195>] scsi_request_fn [scsi_mod] 0x285 
[<c68223e0>] sd_template [sd_mod] 0x0 
[<c68076fd>] scsi_queue_next_request [scsi_mod] 0x3d 
[<c68004a5>] scsi_release_command_Ra9b69956 [scsi_mod] 0x105 
[<c6807939>] __scsi_end_request [scsi_mod] 0x179 
[<c6808195>] scsi_request_fn [scsi_mod] 0x285 
[<c68223e0>] sd_template [sd_mod] 0x0 
[<c68076fd>] scsi_queue_next_request [scsi_mod] 0x3d 
[<c68004a5>] scsi_release_command_Ra9b69956 [scsi_mod] 0x105 
[<c6807939>] __scsi_end_request [scsi_mod] 0x179 

Comment 3 Tom Coughlan 2003-08-29 09:00:34 EDT
We will investigate, with a goal of fixing this in the next AS2.1 kernel update.
Comment 5 Jeffrey Moyer 2003-10-27 10:20:47 EST
The fix for this bug is in the pensacola tree.
Comment 7 John Flanagan 2003-12-19 13:07:34 EST
An errata has been issued which should help the problem described in this bug report. 
This report is therefore being closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, please follow the link below. You may reopen 
this bug report if the solution does not work for you.
