Bug 119771

Summary: scsi_request_fn() leaks request structures
Product: Red Hat Enterprise Linux 3
Reporter: Justin T. Gibbs <gibbs>
Component: kernel
Assignee: Doug Ledford <dledford>
Status: CLOSED ERRATA
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: medium
Version: 3.0
CC: conway_heather, fhirtz, petrides
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2004-09-02 04:31:14 UTC
Bug Blocks: 122950    

Description Justin T. Gibbs 2004-04-02 02:22:04 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.3.1)
Gecko/20030618

Description of problem:
If the initialization of a command structure in
scsi_lib.c:scsi_request_fn() fails, the struct
request is dequeued, but not released, causing
a leak. 

Version-Release number of selected component (if applicable):
kernel-2.4.21-11.EL

How reproducible:
Always

Steps to Reproduce:
1. Queue up lots of I/O to a device
2. Offline that device
    

Actual Results:  Assuming enough I/O has been queued, the requests
for the queue may be exhausted.  If you happen to
have processes sleeping waiting for requests to
free up on this device, they will sleep forever,
never noticing that the device has been offlined.

Expected Results:  Requests for failed transactions should be released.

Additional info:

Looking at the latest 2.4.X kernel it seems that
scsi_request_fn() has been cleaned up such that cleanup
on error is much simpler.  One fix is to re-sync with
the stock 2.4.X code.  The less risky fix is to add a
call to blkdev_release_request() near line 1222 after
the request is dequeued if the request is not embedded
in the SCSI command structure.
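
The effect described above can be modeled in user space. The following is
only a toy sketch, not kernel code: struct toy_queue, POOL_SIZE, and the
toy_* functions are hypothetical names invented for illustration, and the
model assumes command initialization fails for every request (device
offline). It shows how dequeuing without releasing drains the free pool,
and how handing the request back, which is the role
blkdev_release_request() would play in the less risky fix, keeps the pool
intact.

```c
#include <stdbool.h>

#define POOL_SIZE 4  /* hypothetical; stands in for the per-queue request limit */

/* Toy stand-in for a request queue and its free pool; not the kernel's
 * data structure. */
struct toy_queue {
    int free_requests;   /* requests still available to submitters */
    int queued;          /* requests waiting for the request function */
};

void toy_queue_init(struct toy_queue *q)
{
    q->free_requests = POOL_SIZE;
    q->queued = 0;
}

/* Submit one I/O: take a request from the free pool and queue it.
 * Returns false when the pool is empty (a real submitter would sleep). */
bool toy_submit(struct toy_queue *q)
{
    if (q->free_requests == 0)
        return false;
    q->free_requests--;
    q->queued++;
    return true;
}

/* Model of the request function when command init fails for every
 * request. With release_on_failure == false the request is dequeued and
 * dropped, as in the reported bug. With true it is returned to the pool. */
void toy_request_fn(struct toy_queue *q, bool release_on_failure)
{
    while (q->queued > 0) {
        q->queued--;                 /* dequeue the request */
        if (release_on_failure)
            q->free_requests++;      /* release it back to the free pool */
    }
}
```

In this model, filling the queue and then running toy_request_fn() with
release_on_failure set to false leaves free_requests at zero, so every
later toy_submit() fails. That corresponds to the processes sleeping
forever in the actual results above.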

Comment 1 Doug Ledford 2004-04-02 13:58:21 UTC
You need to be more specific.  Are you referring to it failing to init
the command as part of SDpnt->scsi_init_io_fn(SCpnt) or as
STpnt->init_command(SCpnt)?  If the first, then the command is left on
the request queue, so freeing it is the wrong thing to do (this is
considered a temporary failure and we want to retry the command
later).  If it's the latter, then the code I'm looking at calls
blkdev_dequeue_request(), scsi_release_buffers(), and
__end_scsi_request(), which should do what you want.

Comment 2 Justin T. Gibbs 2004-04-02 15:45:52 UTC
I am talking about the latter.  Neither scsi_release_buffers() nor
__end_scsi_request() has the ability to release the incoming request,
since it is not referenced in any way by the scsi_command structure
(it may only have a copy).  That is why the request must be freed by
blkdev_release_request().  See similar code later in the routine that
calls blkdev_release_request() explicitly if the request embedded in
the scsi_command structure is only a copy.
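
A simplified illustration of the copy problem, assuming the command holds
a by-value copy of the request as described above. The struct and function
names here are hypothetical and do not match the 2.4 kernel source. The
point is only that completing the command's copy cannot touch the original
request.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the request and the SCSI command. */
struct toy_request {
    bool in_use;   /* true while the request is allocated */
    int  sector;
};

struct toy_command {
    struct toy_request rq_copy;   /* a copy, not a pointer to the original */
};

/* Initialize the command from the incoming request by copying it. */
void toy_init_command(struct toy_command *cmd, const struct toy_request *rq)
{
    cmd->rq_copy = *rq;
}

/* Completing the command only affects its private copy. */
void toy_end_command(struct toy_command *cmd)
{
    cmd->rq_copy.in_use = false;
}

/* The original must be released explicitly by whoever dequeued it. */
void toy_release_request(struct toy_request *rq)
{
    rq->in_use = false;
}
```

After toy_end_command() the original request is still marked in use. Only
the explicit toy_release_request() call on the original clears it.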

Comment 5 Doug Ledford 2004-05-12 17:01:26 UTC
Justin, I added a call to blkdev_release_request() in the area you
were talking about.  This should be fixed now.  If you want to check
for yourself, the source is located in a bk tree on
linux-scsi.bkbits.net/rhel3-scsi-test

Comment 8 Doug Ledford 2004-05-13 06:21:52 UTC
Justin, someone pointed out to me that my last entry was vague.  I
should note that the change I referenced is *not* in RHEL3 U2, but
instead is a proposed fix for U3.  The BK tree I referenced was
originally populated with the U2 source code and then contains the
patches I've made so far in preparation for U3.  Contrary to previous
updates, I'm doing the work on the SCSI stack as a whole compilation
vs. individual patches in order to make sure some overlapping bugs
I've got all get solved properly and don't cause unintended
interactions.  For that reason, it's easier just to have people that
want to test and make sure their particular problem is solved pull
from the bk tree than anything else.

Comment 9 Ernie Petrides 2004-05-13 19:53:51 UTC
Thanks for the clarification, Doug.  I'm putting this bug back into
"assigned" state to reflect that a fix has not yet been committed to
the U3 kernel patch pool.


Comment 10 Ernie Petrides 2004-06-05 04:37:46 UTC
A fix for this problem has been committed to the RHEL3 U3
patch pool this evening (in kernel version 2.4.21-15.6.EL).


Comment 11 Ernie Petrides 2004-08-05 23:20:13 UTC
*** Bug 129174 has been marked as a duplicate of this bug. ***

Comment 12 John Flanagan 2004-09-02 04:31:14 UTC
An errata has been issued which should help the problem 
described in this bug report. This report is therefore being 
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, 
please follow the link below. You may reopen this bug report 
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2004-433.html