Bug 192445 - [RHEL3 U7] scsi_request_fn() leaks request structures
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Hardware: All
OS: Linux
Priority: medium
Severity: high
Assigned To: Red Hat Kernel Manager
QA Contact: Brian Brock
Duplicates: 192446
Depends On:
Blocks: 190430
Reported: 2006-05-19 15:08 EDT by Stuart Hayes
Modified: 2007-11-30 17:07 EST (History)
Doc Type: Bug Fix
Last Closed: 2007-07-12 13:21:29 EDT

Attachments
patch to scsi_lib.c to fix leak (315 bytes, patch)
2006-05-19 15:08 EDT, Stuart Hayes
script used to see problem (925 bytes, application/octet-stream)
2006-05-19 15:14 EDT, Stuart Hayes

Description Stuart Hayes 2006-05-19 15:08:16 EDT
Note that this problem is similar to, but not a duplicate of, BZ119771.

A problem was discovered (on RHEL3 U7) by a script that writes 0s to the first 
60M and last 10M of a scsi disk, and then sends a "scsi remove-single-device" 
command to /proc/scsi/scsi for the disk.  When this was done, the system would 
panic with a null pointer dereference even if a "sync" and a delay were put 
between the writes and the "remove-single-device" command.

I found that one of the commandblocks for the SCSI device had gotten lost--it 
wasn't in any of sdev_free_q, sdev_retry_q, or scsi_done_cmds, nor was it active.

The problem is in scsi_request_fn() (in scsi_lib.c).  When this function gets 
a command, it calls scsi_allocate_device() to get one of the device's free 
commandblocks.  It then calls scsi_init_io_fn(), which tries to allocate some 
memory for the write data.  If this call fails (because the alloc fails), the 
commandblock will be lost--it won't be used again, which effectively reduces 
the SCSI device's queue size by 1.

The problem is that, when scsi_init_io_fn() fails, scsi_request_fn() sets 
SCpnt->request.special = SCpnt, but it should set req->special = SCpnt, 
because req->special is what scsi_request_fn() checks earlier to see whether a 
commandblock has already been reserved for the request.  SCpnt->request is a 
COPY of req, not a pointer to req.

I'll attach a patch.  I'm not sure if it's too late to get this into RHEL3 or 

The only way I know of to actually reproduce this is to boot to RHEL3, run a 
small Python script that zeros out the first 60M and last 10M of a RAID volume 
on a Dell SAS5/ir controller, and then do

  echo "scsi remove-single-device 0 0 0 0" >/proc/scsi/scsi

to remove that controller from the kernel (your OS should be on a different 
controller).
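
For anyone without the attachment, the zeroing step can be approximated in C 
roughly as follows.  This is a sketch and not the attached Python script: 
zero_range() and zero_head_and_tail() are hypothetical helper names, and it 
assumes the target's size can be found with lseek(SEEK_END), which works for 
regular files and for block devices on Linux.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Write len zero bytes starting at offset off; returns 0 on success, -1 on error. */
static int zero_range(int fd, off_t off, off_t len)
{
    static const char zeros[64 * 1024];   /* static storage is zero-initialized */

    assert(off >= 0 && len >= 0);
    if (lseek(fd, off, SEEK_SET) == (off_t)-1)
        return -1;
    while (len > 0) {
        size_t chunk = len < (off_t)sizeof(zeros) ? (size_t)len : sizeof(zeros);
        ssize_t n = write(fd, zeros, chunk);
        if (n <= 0)
            return -1;
        len -= n;
    }
    return 0;
}

/* Zero the first head bytes and the last tail bytes of the file or device at path. */
static int zero_head_and_tail(const char *path, off_t head, off_t tail)
{
    int rc = -1;
    int fd = open(path, O_WRONLY);

    if (fd < 0)
        return -1;

    off_t size = lseek(fd, 0, SEEK_END);
    if (size >= 0 && head <= size && tail <= size &&
        zero_range(fd, 0, head) == 0 &&
        zero_range(fd, size - tail, tail) == 0 &&
        fsync(fd) == 0)           /* flush before the device is removed */
        rc = 0;

    close(fd);
    return rc;
}
```

For the case described here the call would be something like 
zero_head_and_tail("/dev/sdX", 60L << 20, 10L << 20), where /dev/sdX is a 
placeholder for the RAID volume.  This destroys data on the target.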

This problem is not specific to that setup--I just haven't tried reproducing 
it any other way.

I'll attach the script, too.
Comment 1 Stuart Hayes 2006-05-19 15:08:16 EDT
Created attachment 129615
patch to scsi_lib.c to fix leak
Comment 2 Stuart Hayes 2006-05-19 15:14:46 EDT
Created attachment 129619
script used to see problem
Comment 3 Ernie Petrides 2006-05-19 18:25:39 EDT
RHEL3 is now closed.
Comment 4 Ernie Petrides 2006-05-19 18:28:40 EDT
*** Bug 192446 has been marked as a duplicate of this bug. ***
Comment 6 John Feeney 2007-07-12 13:21:29 EDT
With RHEL3 now in maintenance mode, where only critical customer issues
can be fixed, this bugzilla has been closed as WONTFIX due to its priority
level and a lack of recent activity.
