Red Hat Bugzilla – Bug 192445
[RHEL3 U7] scsi_request_fn() leaks request structures
Last modified: 2007-11-30 17:07:10 EST
Note that this problem is similar to, but not a duplicate of, BZ119771.
A problem was discovered (on RHEL3 U7) by a script that writes 0s to the first
60M and last 10M of a scsi disk, and then sends a "scsi remove-single-device"
command to /proc/scsi/scsi for the disk. When this was done, the system would
panic with a null pointer dereference even if a "sync" and a delay were put
between the writes and the "remove-single-device" command.
I found that one of the commandblocks for the SCSI device had gotten lost: it
wasn't in sdev_free_q, sdev_retry_q, or scsi_done_cmds, nor was it active.
The problem is in scsi_request_fn() (in scsi_lib.c). When this function gets
a command, it calls scsi_allocate_device() to get one of the device's free
commandblocks. It then calls scsi_init_io_fn(), which tries to allocate some
memory for the write data. If this call fails (because the alloc fails), the
commandblock will be lost--it won't be used again, which effectively reduces
the SCSI device's queue size by 1.
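To make the effect concrete, here is a minimal user-space model in C. It is not kernel code. It assumes a device with a fixed pool of four command blocks (the depth is arbitrary). `pool_alloc()` stands in for scsi_allocate_device(), and the `init_io_ok` flag stands in for whether scsi_init_io_fn() manages to allocate its buffer. All type and function names are hypothetical. On failure the model simply returns, so the block stays marked in use with nothing pointing at it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not RHEL3 code: a device with a fixed pool of
 * command blocks. QUEUE_DEPTH = 4 is an arbitrary assumption. */
#define QUEUE_DEPTH 4

struct cmd_block {
    bool in_use;
};

struct device_pool {
    struct cmd_block blocks[QUEUE_DEPTH];
};

/* Stand-in for scsi_allocate_device(): hand out a free block, or NULL. */
static struct cmd_block *pool_alloc(struct device_pool *p)
{
    for (size_t i = 0; i < QUEUE_DEPTH; i++) {
        if (!p->blocks[i].in_use) {
            p->blocks[i].in_use = true;
            return &p->blocks[i];
        }
    }
    return NULL;
}

/* Return a block to the pool; it must currently be in use. */
static void pool_free(struct cmd_block *b)
{
    assert(b->in_use);
    b->in_use = false;
}

/* Number of blocks available for new commands (the effective queue depth). */
static size_t pool_free_count(const struct device_pool *p)
{
    size_t n = 0;
    for (size_t i = 0; i < QUEUE_DEPTH; i++)
        if (!p->blocks[i].in_use)
            n++;
    return n;
}

/* One pass of a simplified request function. init_io_ok simulates whether
 * the buffer allocation (scsi_init_io_fn() in the report) succeeds.
 * On success the command is treated as completing at once and its block
 * is freed. On failure the function just returns: the block stays marked
 * in use, but no pointer to it survives, so it can never be reused. */
static bool issue_request_leaky(struct device_pool *p, bool init_io_ok)
{
    struct cmd_block *b = pool_alloc(p);
    if (b == NULL)
        return false;   /* pool exhausted: request has to wait */
    if (!init_io_ok)
        return false;   /* the leak: b is dropped here */
    pool_free(b);
    return true;
}
```

In this model, each failed allocation permanently lowers `pool_free_count()` by one, which matches the queue size shrinking by 1 as described above. After four failures the pool is empty, and even a request whose allocation would succeed can no longer be issued.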
The problem is that, when scsi_init_io_fn() fails, scsi_request_fn() is
setting SCpnt->request.special = SCpnt, but it should actually be setting
req->special = SCpnt, because req->special is what scsi_request_fn() checks
earlier to see whether a commandblock has already been reserved for the
request. SCpnt->request is a COPY of req, not a pointer to it, so the
assignment never reaches req: on the next pass req->special is still unset,
a new commandblock is allocated, and the first one is never used again.
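The copy-versus-pointer distinction can be shown with a stripped-down C sketch. Following the report, it assumes the command block embeds the request by value (`struct request request;`) rather than holding a pointer. The struct layouts and the helpers `bind_command`, `init_io_failed_buggy`, `init_io_failed_fixed`, and `command_for` are hypothetical simplifications, not the RHEL3 definitions. `command_for()` models the earlier check in scsi_request_fn() that reuses a commandblock already recorded in `req->special`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-ins for the 2.4-era structures. */
struct request {
    void *special;            /* command block reserved for this request, if any */
};

struct command {
    struct request request;   /* embedded by value: a COPY of the request */
};

/* Command setup copies the request into the command block. */
static void bind_command(struct command *SCpnt, const struct request *req)
{
    assert(SCpnt != NULL && req != NULL);
    SCpnt->request = *req;
}

/* Failure path as described in the report: writes to the copy only. */
static void init_io_failed_buggy(struct request *req, struct command *SCpnt)
{
    (void)req;                       /* the real request is left untouched */
    SCpnt->request.special = SCpnt;
}

/* Failure path with the change the report asks for. */
static void init_io_failed_fixed(struct request *req, struct command *SCpnt)
{
    req->special = SCpnt;
}

/* Model of the earlier check in scsi_request_fn(): reuse the block recorded
 * in req->special if there is one, otherwise take a fresh block. */
static struct command *command_for(struct request *req, struct command *fresh)
{
    if (req->special != NULL)
        return req->special;
    return fresh;
}
```

With the buggy assignment, `req->special` is still NULL on the next pass, so a second block is taken and the first is orphaned. With the fix, the same block is found and reused.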
I'll attach a patch. I'm not sure if it's too late to get this into RHEL3 or
The only way I know of to actually reproduce this is to boot RHEL3, run a
small python script that zeros out the first 60M and last 10M of a RAID volume
on a Dell SAS5/ir controller, and then run
'echo "scsi remove-single-device 0 0 0 0" >/proc/scsi/scsi' to remove that
device (host 0, channel 0, id 0, lun 0) from the kernel. Your OS should be on a
different controller.
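The attached script isn't reproduced here. However, the write phase described above (zero the first 60M and last 10M of the volume, then sync) can be sketched in C. This sketch assumes "M" means MiB and that the script wrote through the block device node. `zero_range()` and `zero_head_and_tail()` are hypothetical helpers. Warning: pointed at a real device, this irreversibly destroys the data in those regions.

```c
#define _POSIX_C_SOURCE 200809L
#define _FILE_OFFSET_BITS 64
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Write len zero bytes starting at byte offset start of fd. */
static int zero_range(int fd, off_t start, off_t len)
{
    static const char zeros[65536] = {0};
    while (len > 0) {
        size_t chunk = len < (off_t)sizeof zeros ? (size_t)len : sizeof zeros;
        ssize_t n = pwrite(fd, zeros, chunk, start);
        if (n < 0 && errno == EINTR)
            continue;               /* interrupted: retry the same chunk */
        if (n <= 0)
            return -1;              /* real error, or no progress */
        start += n;
        len -= n;
    }
    return 0;
}

/* Zero the first `head` and last `tail` bytes of the device or file behind
 * fd, then flush it (the report also ran "sync" before the removal).
 * For the report's case: head = 60 MiB, tail = 10 MiB (assuming "M" = MiB). */
static int zero_head_and_tail(int fd, off_t head, off_t tail)
{
    assert(head >= 0 && tail >= 0);
    off_t size = lseek(fd, 0, SEEK_END);   /* also reports block device size on Linux */
    if (size < 0 || head > size || tail > size)
        return -1;
    if (zero_range(fd, 0, head) != 0 || zero_range(fd, size - tail, tail) != 0)
        return -1;
    return fsync(fd);
}
```

On the test system, the descriptor would come from something like `open("/dev/sdX", O_WRONLY)` (the device path is a placeholder), called as `zero_head_and_tail(fd, 60L << 20, 10L << 20)`. That call would then be followed by the `remove-single-device` write to /proc/scsi/scsi.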
This problem is not specific to that setup--I just haven't tried reproducing
it any other way.
I'll attach the script, too.
Created attachment 129615
patch to scsi_lib.c to fix leak
Created attachment 129619
script used to see problem
RHEL3 is now closed.
*** Bug 192446 has been marked as a duplicate of this bug. ***
With RHEL3 now in maintenance mode, where only critical customer issues
can be fixed, this bugzilla has been closed as won't fix due to its priority
level and a lack of recent activity.