Bug 138517 - (IT_53295) Memory Leak in AS 2.1 and 3.0 kernels
Memory Leak in AS 2.1 and 3.0 kernels
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 2.1
Classification: Red Hat
Component: kernel (Show other bugs)
2.1
All Linux
medium Severity medium
: ---
: ---
Assigned To: Doug Ledford
Brian Brock
:
: 138941 (view as bug list)
Depends On:
Blocks: 143573
  Show dependency treegraph
 
Reported: 2004-11-09 13:56 EST by Peter Martuccelli
Modified: 2007-11-30 17:06 EST (History)
5 users (show)

See Also:
Fixed In Version: RHSA-2005-529
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2005-08-25 09:29:25 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
Proposed patch (949 bytes, patch)
2005-02-17 18:59 EST, Doug Ledford
no flags Details | Diff

  None (edit)
Description Peter Martuccelli 2004-11-09 13:56:09 EST
There is a memory leak in the scsi add-single-device path.
The leak drops queue_depth command blocks for each upper level driver
when a new device is added. This means around 16k lost every time a
new device is added.

In the hardcoded scan routine scsi_build_commandblocks is called once
for each upper level driver after the first that sets attached. This
is the main problem. A secondary problem is that
scsi_build_commandblocks unconditionally overwrites the
SDpnt->device_queue pointer so already allocated commands on that
list are simply lost.


Here is a patch to address the first issue:


scsi_scan_cmnd_leak.rhel.diff
-----------------------------------------
---
/home/osbuild/RHEL3.1/linux-2.4.21-9.ELsmp/drivers/scsi/scsi_scan.c	Tue
May  4 07:09:41 2004
+++ linux-2.4.21-9.ELsmp/drivers/scsi/scsi_scan.c	Fri Oct 29 12:42:23 2004
@@ -440,7 +440,7 @@
 			for (sdtpnt = scsi_devicelist; sdtpnt; sdtpnt = sdtpnt->next) {
 				if (sdtpnt->attach) {
 					(*sdtpnt->attach) (oldSDpnt);
-					if (oldSDpnt->attached) {
+					if (oldSDpnt->attached && !oldSDpnt->has_cmdblocks) {
 						scsi_build_commandblocks(oldSDpnt);
 						if (0 == oldSDpnt->has_cmdblocks) {
 							printk("scan_scsis: DANGER, no command blocks\n");


---------------------------------------

This is the minimal change. This fixes the current problem.



It may also be worth fixing scsi_buildcommand_blocks.
Below are two possible patches that do this in different ways.


scsi_cmnd_leak.rhel1.diff
---------------------------------------
--- /home/osbuild/RHEL3.1/linux-2.4.21-9.ELsmp/drivers/scsi/scsi.c	Tue May
4 07:09:41 2004
+++ linux-2.4.21-9.ELsmp/drivers/scsi/scsi.c	Fri Oct 29 12:40:25 2004
@@ -1475,6 +1475,9 @@
 	Scsi_Cmnd *SCpnt;
 	request_queue_t *q = &SDpnt->request_queue;

+        if (SDpnt->has_cmdblocks)
+                return;
+
 	/*
 	 * Init the spin lock that protects alloc/free of command blocks
 	 */

----------------------------------------



scsi_cmnd_leak.rhel2.diff
----------------------------------------
diff -ruN -X ../patches/dontdiff
/home/osbuild/RHEL3.1/linux-2.4.21-9.ELsmp/drivers/scsi/scsi.c
linux-2.4.21-9.ELsmp/drivers/scsi/scsi.c
--- /home/osbuild/RHEL3.1/linux-2.4.21-9.ELsmp/drivers/scsi/scsi.c	Tue May
4 07:09:41 2004
+++ linux-2.4.21-9.ELsmp/drivers/scsi/scsi.c	Wed Oct 27 09:04:03 2004
@@ -1472,10 +1472,17 @@
 	unsigned long flags;
 	struct Scsi_Host *host = SDpnt->host;
 	int j;
-	Scsi_Cmnd *SCpnt;
+	Scsi_Cmnd *SCpnt , *SCnext;
 	request_queue_t *q = &SDpnt->request_queue;
-
-	/*
+
+        if (SDpnt->has_cmdblocks){
+                for (SCpnt = SDpnt->device_queue; SCpnt; SCpnt =
SCnext) {
+                        SDpnt->device_queue = SCnext = SCpnt->next;
+                        list_del(&SCpnt->sc_list);
+                        kfree((char *) SCpnt);
+                }
+        }
+        /*
 	 * Init the spin lock that protects alloc/free of command blocks
 	 */
 	spin_lock_init(&SDpnt->device_request_lock);


-----------------------------------------



These both have the same result (not leaking memory). The second can
take into account that the current number of commands is not correct
and build a correct number. The first assumes the commands are okay.

----------
Action by: smorin
Issue Registered
----------
Action by: mgahagan
Wendy, RHEL 3 U4?

What about 2.1?

They are reporting this with both 2.1 and 3 (both patches in the top
of the ticket)




----------
Action by: mgahagan
We already have a fix for this for RHEL 3 scheduled for U4, I have
included the RHEL 3 U4 patch below for reference.

# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
#   2004/08/31 13:47:52-04:00 dledford compaq xsintricity com
#   drivers/scsi/scsi.c:
#    Don't re-init existing command blocks
#
# drivers/scsi/scsi.c
#   2004/08/31 13:47:51-04:00 dledford compaq xsintricity com +6 -0
#   Don't re-init existing command blocks
#
diff -Nru a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
--- a/drivers/scsi/scsi.c 2004-08-31 13:48:48 -04:00
+++ b/drivers/scsi/scsi.c 2004-08-31 13:48:48 -04:00
@@ -1477,6 +1477,12 @@
request_queue_t *q = &SDpnt->request_queue;

/*
+ * Only init things once.
+ */
+ if (SDpnt->has_cmdblocks)
+ return;
+
+ /*
* Init the spin lock that protects alloc/free of command blocks
*/
spin_lock_init(&SDpnt->device_request_lock);



----------
Action by: peterm
Escalated to Bugzilla
Comment 1 Peter Martuccelli 2004-12-08 16:56:22 EST

*** This bug has been marked as a duplicate of 131521 ***
Comment 3 Peter Martuccelli 2005-02-08 15:36:10 EST
A fix for this problem has just been committed to the RHEL3 U4
patch pool this evening (in kernel version 2.4.21-20.4.EL).

Comment 8 Doug Ledford 2005-02-17 18:55:06 EST
*** Bug 138941 has been marked as a duplicate of this bug. ***
Comment 9 Doug Ledford 2005-02-17 18:59:16 EST
Created attachment 111186 [details]
Proposed patch

The attached patch should solve the regression that was found in testing of the
original AS2.1 patch.  In case the people have forgotten, the original
regression was that multi-lun devices were no longer properly probed with the
original memleak fix patch.  Patch being submitted for internal review,
although testing will have to be done by someone else since I don't have any
multi-lun devices (I believe the original regression was found when testing on
a MegaRAID controller with multiple logical volumes).
Comment 10 Doug Ledford 2005-02-18 02:13:23 EST
Patch partially tested (I don't have a multilun setup, but I tested
that it works in non-multilun configurations) and submitted for review
and further testing.
Comment 11 Tom Coughlan 2005-02-28 09:46:41 EST
I applied the old patch and re-created the regression on a megaraid controller
(as reported in BZ 140454). Then I applied the new patch and verified that the
problem does not occur. This patch looks good to go. 
Comment 12 Jim Paradis 2005-03-02 22:54:07 EST
A fix for this problem has just been committed to the RHEL2.1 U7
patch pool this evening (in kernel version 2.4.9-61).
Comment 20 Jim Paradis 2005-04-26 17:46:05 EDT
Due to operator error, this patch in fact did *not* make it into the U7 kernel
build, so it is now slated for U8.
Comment 22 Jim Paradis 2005-05-23 17:28:15 EDT
I just verified that this patch *is* in the kernel source pool that will become U8.
Comment 24 Red Hat Bugzilla 2005-08-25 09:29:27 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2005-529.html

Note You need to log in before you can comment on or make changes to this bug.