Bug 131521 - [RHEL3 U4] scsi add-single-device memory leak
Summary: [RHEL3 U4] scsi add-single-device memory leak
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: All
OS: Linux
medium
high
Target Milestone: ---
Assignee: Doug Ledford
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks: 123574
TreeView+ depends on / blocked
 
Reported: 2004-09-01 20:49 UTC by Frank Hirtz
Modified: 2007-11-30 22:07 UTC (History)
7 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2004-12-20 20:56:05 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
Suggested fix (767 bytes, patch)
2004-09-01 20:50 UTC, Frank Hirtz
no flags Details | Diff


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2004:550 0 normal SHIPPED_LIVE Updated kernel packages available for Red Hat Enterprise Linux 3 Update 4 2004-12-20 05:00:00 UTC

Description Frank Hirtz 2004-09-01 20:49:46 UTC
Every time an 'echo "add-single-device 0 0 2 0" >/proc/scsi/scsi'
command is performed (on a hard drive, at least), roughly 16K of low
memory is lost.

The problem is in the scan_scsis function in drivers/scsi/scsi_scan.c.
 This function goes through a loop, attempting to attach the new
device to each of the scsi device drivers (generic, disk, tape, etc.),
and, each time through the loop, if the device is attached, the
function scsi_build_commandblocks is run.  If this function is called
when commandblocks are already allocated, the existing commandblocks
are lost (but not "kfree"d).

The solution is to call scsi_release_commandblocks before calling
scsi_build_commandblocks in this loop, if the device already had
commandblocks allocated to it.

Category set to: Kernel 	
wcheng 	Vaguely recall this was discussed in rhkernel-list... checking...

wcheng assigned to issue for Support Engineering Group. 	
wcheng 	No, it's not in rhkernel-list at all and the fix looks right
to me (check line #1494 and #1521 in drivers/scsi/scsi.c):

  1494         SDpnt->device_queue = NULL;
  1495
  1496         for (j = 0; j < SDpnt->queue_depth; j++) {
  1497                 SCpnt = (Scsi_Cmnd *)
  1498                     kmalloc(sizeof(Scsi_Cmnd),
  1499                                      GFP_ATOMIC |
  1500                                 (host->unchecked_isa_dma ?
GFP_DMA : 0))        ;
  1501                 if (NULL == SCpnt)
  1502                         break;  /* If not, the next line will
oops ... *        /
  1503                 memset(SCpnt, 0, sizeof(Scsi_Cmnd));
  .................................
  1521                 SDpnt->device_queue = SCpnt;
  1522                 SCpnt->state = SCSI_STATE_UNUSED;
  1523                 SCpnt->owner = SCSI_OWNER_NOBODY;
  1524                 list_add(&SCpnt->sc_list, &SDpnt->sdev_free_q);
  1525         }




Issue escalated to Sustaining Engineering by: wcheng
. 	
jneedle 	jneedle assigned to issue for Sustaining Engineering. 	
fhirtz 	Summary edited. 	
faith 	What makes this a Sev1?  I'm changing the internal priority to
2.  Add a justification and change it back if I'm missing something.

Priority set to: 2 	
fhirtz 	It's a Sev 1 since Dell has this listed as a blocker for them.
It's a one line change which should be fairly straightforward to assess. 	
fhirtz 	Have we heard anything on this? It's been a couple of weeks in
the system with no response of substance. 	
fhirtz 	Priority set to: 1 	
wcheng 	Jeff, if you're busy, I'll post this to rhkernel-list for
review ? 	
wcheng 	Frank, could you ask the customer how they decided it was a
memory leak ? Just by code review ? I did a careful testing last night
and found the scsi_build_commandblocks() (#478) is required since
scan_scsis_single() replaces the SDpnt field (#460 and #461 in
scsi_scan.c) as in our newest kernel base (linux-2.4.21-15.19.EL) and
it has correctly released the command block before returning back to
scan_scsis().
 
Unless I miss somthing, I don't think this patch has chances to get
accepted.

   371  *  Detecting SCSI devices :
   372  *  We scan all present host adapter's busses,  from ID 0 to ID
(max_id).
   373  *  We use the INQUIRY command, determine device type, and pass
the ID /
   374  *  lun address of all sequential devices to the tape driver,
all random
   375  *  devices to the disk driver.
   376  */
   377 void scan_scsis(struct Scsi_Host *shpnt,
   378                        uint hardcoded,
   379                        uint hchannel,
   380                        uint hid,
   381                        uint hlun)
   382 {
                          ..................................
   458                 lun0_sl = find_lun0_scsi_level(channel, dev,
shpnt);
   459                 scan_scsis_single(channel, dev, lun, lun0_sl,
&max_dev_lun,
   460                                   &sparse_lun, &SDpnt, shpnt,
scsi_result);
   461                 if (SDpnt != oldSDpnt) {
   462
                                  ...................................
   474                         for (sdtpnt = scsi_devicelist; sdtpnt;
sdtpnt = sdtpnt->next) {
   475                                 if (sdtpnt->attach) {
   476                                         (*sdtpnt->attach)
(oldSDpnt);
   477                                         if (oldSDpnt->attached)  {
   478                                                
scsi_build_commandblocks(oldSDpnt);
   479                                                 if (0 ==
oldSDpnt->has_cmdblocks) {
   480                                                        
printk("scan_scsis: DANGER, no command blocks\n"); 	
fhirtz 	How did you detemine that this is a memory leak? By code
review? We did testing last night and found the
scsi_build_commandblocks() (#478) is required since
scan_scsis_single() replaces the SDpnt field (#460 and #461 in
scsi_scan.c) as in our newest kernel base (linux-2.4.21-15.19.EL) and
it has correctly released the command block before returning back to
scan_scsis().

  371  *  Detecting SCSI devices :
  372  *  We scan all present host adapter's busses,  from ID 0 to ID
(max_id).
  373  *  We use the INQUIRY command, determine device type, and pass
the ID /
  374  *  lun address of all sequential devices to the tape driver,
all random
  375  *  devices to the disk driver.
  376  */
  377 void scan_scsis(struct Scsi_Host *shpnt,
  378                        uint hardcoded,
  379                        uint hchannel,
  380                        uint hid,
  381                        uint hlun)
  382 {
                         ..................................
  458                 lun0_sl = find_lun0_scsi_level(channel, dev, shpnt);
  459                 scan_scsis_single(channel, dev, lun, lun0_sl,
&max_dev_lun,
  460                                   &sparse_lun, &SDpnt, shpnt,
scsi_result);
  461                 if (SDpnt != oldSDpnt) {
  462
                                 ...................................
  474                         for (sdtpnt = scsi_devicelist; sdtpnt;
sdtpnt = sdtpnt->next) {
  475                                 if (sdtpnt->attach) {
  476                                         (*sdtpnt->attach)
(oldSDpnt);
  477                                         if (oldSDpnt->attached)  {
  478                                                
scsi_build_commandblocks(oldSDpnt);
  479                                                 if (0 ==
oldSDpnt->has_cmdblocks) {
  480                                                        
printk("scan_scsis: DANGER, no command blocks\n");

Status set to: Waiting on Client 	
stuart_hayes 	I found it through testing... someone at Dell decided to
run a script that added and removed drives to test a system.  They
found a different problem (a lockup that got fixed in update 3), but,
in the process of testing the patch I had for the lockup, I ran a
script to add & remove drives continuously.  I found that it
eventually ran out of memory.  This is what I tracked it down
to--after fixing this, the free memory remained constant when I ran
the test script... before this patch, one could watch the free memory
decrease steadily as the script ran--until it ran out of low memory
altogether, if it ran that long.  Stopping the test did not recover
the lost memory.

I'll trace through the code to see if I can point out the exact path
that's causing the memory to be lost.

Did you see failures with the patch, or did you just do a code review?



Status set to: Waiting on Tech 	
wcheng 	My test script looped in add-device overnight and the system
with the suggested patch actually depletes the memory faster than the
one without the patch - so I went in to carefully examine the code and
drew that conclusion. What I didn't do (it didn't mention before) was
to delete the device. 	
wcheng 	I'll modify the test script and re-run tonight. 	
stuart_hayes 	Oh, I understand what you are saying now.  The reason we
need the test for "oldSDpnt->has_cmdblocks" is that the loop at line
474 will sometimes cause scsi_build_commandblocks() to be called
multiple times on oldSDpnt.  The patch will allow
scsi_build_commandblocks() to be called the first time through the
loop, but prevents it being called on subsequent iterations.  If you
call scsi_build_commandblocks() twice in a row like that without first
doing a scsi_release_commandblocks(), you will lose the memory from
the existing set of commandblocks.

There were two options:  One is to do what the patch does, and not
call scsi_build_commandblocks() again if it has already been called. 
The other is to first release the command blocks before calling
scsi_build_commandblocks() again if has_cmdblocks shows that we had
already called scsi_build_commandblocks() on that scsi device.  I
can't remember why I decided on the former... I guess I should have
put a comment in there.  I think the queue size was set correctly the
first time through the for loop, and if we released and re-built the
command blocks it would end up with an incorrect number of commmand
blocks.  You might want to run that one by your SCSI expert, though,
to get a second opinion on that.

Thanks
Stuart 	
wcheng 	I still don't see the memory leak in the particular call path
:)  - but I'll further test it out tonight and discuss this with our
kernel folks. Note that the command block malloced in
scsi_build_commandblocks() is backward-chained into SDpnt->sdev_free_q. 	
wcheng 	Sorry I missed the line - #1478 will screw up this backward
chain. Maybe you're right - I'll test it out again ....

  1471 void scsi_build_commandblocks(Scsi_Device * SDpnt)
  1472 {
  1473         unsigned long flags;
  1474         struct Scsi_Host *host = SDpnt->host;
  1475         int j;
  1476         Scsi_Cmnd *SCpnt;
  1477         request_queue_t *q = &SDpnt->request_queue;
  1478
  1479         /*
  1480          * Init the spin lock that protects alloc/free of
command blocks
  1481          */
  1482         spin_lock_init(&SDpnt->device_request_lock);
  1483
  1484         /*
  1485          * Init the list we keep free commands on
  1486          */
  1487         INIT_LIST_HEAD(&SDpnt->sdev_free_q); 	
wcheng 	Just figure this out - it is a memory leak. I moved the
scsi_build_commandblocks() out of previous loop to be less confusing.
Waiting for rhkernel-list's acks.

File uploaded: scsi-leak.patch 	
stuart_hayes 	If you go through the loop at line 474 (above), it will
effectively call scsi_build_commandblocks() twice in a row on the same
SCSI device (oldSDpnt).  And scsi_build_commandblocks is effectively
the following code--oldSDpnt->sdev_free_q gets wiped out the second
time you call scsi_build_commandblocks, rather than getting added on to.

 {
   INIT_LIST_HEAD(&oldSDpnt->sdev_free_q);
   ...
   for (j = 0; j < SDpnt->queue_depth; j++) {
     SCpnt = (Scsi_Cmnd *) kmalloc (...)
     ...
     list_add(&SCpnt->sc_list, &SDpnt->sdev_free_q);
   }
   ...
   SDpnt->has_commandblocks = 1;
 }

And I think I remember now why I did the patch the way I did (I said
above that I couldn't remember)--I think it was this:  The device
structure's queue_depth, which affects how many commandblocks were
created, is wiped out when you do a scsi_release_commandblocks, so
doing a release and then a build could result in the wrong number of
commandblocks being created.  So it is really only important that you
just do the scsi_build_commandblocks once and only once--if you do it
more than once, you'll leak memory, and if you do a release and then a
build again, you might get the wrong queue_depth.
Stuart

Comment 1 Frank Hirtz 2004-09-01 20:50:52 UTC
Created attachment 103373 [details]
Suggested fix

Comment 2 Ernie Petrides 2004-09-10 01:04:58 UTC
A fix for this problem has just been committed to the RHEL3 U4
patch pool this evening (in kernel version 2.4.21-20.4.EL).


Comment 3 Peter Martuccelli 2004-12-08 21:56:40 UTC
*** Bug 138517 has been marked as a duplicate of this bug. ***

Comment 4 John Flanagan 2004-12-20 20:56:05 UTC
An errata has been issued which should help the problem 
described in this bug report. This report is therefore being 
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, 
please follow the link below. You may reopen this bug report 
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2004-550.html



Note You need to log in before you can comment on or make changes to this bug.