Every time an 'echo "add-single-device 0 0 2 0" >/proc/scsi/scsi' command is performed (on a hard drive, at least), roughly 16K of low memory is lost. The problem is in the scan_scsis function in drivers/scsi/scsi_scan.c. This function goes through a loop, attempting to attach the new device to each of the SCSI device drivers (generic, disk, tape, etc.), and, each time through the loop, if the device is attached, the function scsi_build_commandblocks is run. If this function is called when command blocks are already allocated, the existing command blocks are lost (but not kfree()d). The solution is to call scsi_release_commandblocks before calling scsi_build_commandblocks in this loop if the device already has command blocks allocated to it.

Category set to: Kernel

wcheng: Vaguely recall this was discussed in rhkernel-list... checking...

wcheng assigned to issue for Support Engineering Group.

wcheng: No, it's not in rhkernel-list at all, and the fix looks right to me (check lines #1494 and #1521 in drivers/scsi/scsi.c):

    1494         SDpnt->device_queue = NULL;
    1495
    1496         for (j = 0; j < SDpnt->queue_depth; j++) {
    1497                 SCpnt = (Scsi_Cmnd *)
    1498                         kmalloc(sizeof(Scsi_Cmnd),
    1499                                 GFP_ATOMIC |
    1500                                 (host->unchecked_isa_dma ? GFP_DMA : 0));
    1501                 if (NULL == SCpnt)
    1502                         break;  /* If not, the next line will oops ... */
    1503                 memset(SCpnt, 0, sizeof(Scsi_Cmnd));
         .................................
    1521                 SDpnt->device_queue = SCpnt;
    1522                 SCpnt->state = SCSI_STATE_UNUSED;
    1523                 SCpnt->owner = SCSI_OWNER_NOBODY;
    1524                 list_add(&SCpnt->sc_list, &SDpnt->sdev_free_q);
    1525         }

Issue escalated to Sustaining Engineering by: wcheng.

jneedle assigned to issue for Sustaining Engineering.

fhirtz: Summary edited.

faith: What makes this a Sev1? I'm changing the internal priority to 2. Add a justification and change it back if I'm missing something.

Priority set to: 2

fhirtz: It's a Sev 1 since Dell has this listed as a blocker for them. It's a one-line change which should be fairly straightforward to assess.

fhirtz: Have we heard anything on this? It's been a couple of weeks in the system with no response of substance.

fhirtz: Priority set to: 1

wcheng: Jeff, if you're busy, shall I post this to rhkernel-list for review?

wcheng: Frank, could you ask the customer how they decided it was a memory leak? Just by code review? I did careful testing last night and found that the scsi_build_commandblocks() call (#478) is required, since scan_scsis_single() replaces the SDpnt field (#460 and #461 in scsi_scan.c) in our newest kernel base (linux-2.4.21-15.19.EL), and it correctly releases the command blocks before returning to scan_scsis(). Unless I'm missing something, I don't think this patch has a chance of being accepted.

    371  * Detecting SCSI devices :
    372  * We scan all present host adapter's busses, from ID 0 to ID (max_id).
    373  * We use the INQUIRY command, determine device type, and pass the ID /
    374  * lun address of all sequential devices to the tape driver, all random
    375  * devices to the disk driver.
    376  */
    377 void scan_scsis(struct Scsi_Host *shpnt,
    378                 uint hardcoded,
    379                 uint hchannel,
    380                 uint hid,
    381                 uint hlun)
    382 {
        ..................................
    458         lun0_sl = find_lun0_scsi_level(channel, dev, shpnt);
    459         scan_scsis_single(channel, dev, lun, lun0_sl, &max_dev_lun,
    460                           &sparse_lun, &SDpnt, shpnt, scsi_result);
    461         if (SDpnt != oldSDpnt) {
    462
        ...................................
    474         for (sdtpnt = scsi_devicelist; sdtpnt; sdtpnt = sdtpnt->next) {
    475                 if (sdtpnt->attach) {
    476                         (*sdtpnt->attach) (oldSDpnt);
    477                         if (oldSDpnt->attached) {
    478                                 scsi_build_commandblocks(oldSDpnt);
    479                                 if (0 == oldSDpnt->has_cmdblocks) {
    480                                         printk("scan_scsis: DANGER, no command blocks\n");
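For concreteness, the fix the original report proposes would take roughly the following shape. This is an editorial sketch against the attach loop quoted above, not the actual submitted patch (and, as the discussion below explains, the patch ultimately guarded the call instead of releasing first):

    /* Sketch of the report's proposal: release any command blocks built
     * on an earlier pass through this loop before building a new set, so
     * the earlier allocations are not orphaned. */
    for (sdtpnt = scsi_devicelist; sdtpnt; sdtpnt = sdtpnt->next) {
            if (sdtpnt->attach) {
                    (*sdtpnt->attach) (oldSDpnt);
                    if (oldSDpnt->attached) {
                            if (oldSDpnt->has_cmdblocks)
                                    scsi_release_commandblocks(oldSDpnt);
                            scsi_build_commandblocks(oldSDpnt);
                            if (0 == oldSDpnt->has_cmdblocks)
                                    printk("scan_scsis: DANGER, no command blocks\n");
                    }
            }
    }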
fhirtz: How did you determine that this is a memory leak? By code review? We did testing last night and found that the scsi_build_commandblocks() call (#478) is required, since scan_scsis_single() replaces the SDpnt field (#460 and #461 in scsi_scan.c) in our newest kernel base (linux-2.4.21-15.19.EL), and it correctly releases the command blocks before returning to scan_scsis().

Status set to: Waiting on Client

stuart_hayes: I found it through testing... someone at Dell decided to run a script that added and removed drives to test a system. They found a different problem (a lockup that got fixed in update 3), but, in the process of testing the patch I had for the lockup, I ran a script to add and remove drives continuously. I found that it eventually ran out of memory. This is what I tracked it down to: after this fix, the free memory remained constant when I ran the test script; before it, one could watch the free memory decrease steadily as the script ran, until it ran out of low memory altogether if it ran that long. Stopping the test did not recover the lost memory. I'll trace through the code to see if I can point out the exact path that's causing the memory to be lost. Did you see failures with the patch, or did you just do a code review?

Status set to: Waiting on Tech

wcheng: My test script looped on add-device overnight, and the system with the suggested patch actually depleted memory faster than the one without it - so I went in, examined the code carefully, and drew that conclusion. What I didn't do (I didn't mention this before) was delete the device.

wcheng: I'll modify the test script and re-run tonight.

stuart_hayes: Oh, I understand what you are saying now. The reason we need the test for "oldSDpnt->has_cmdblocks" is that the loop at line 474 will sometimes cause scsi_build_commandblocks() to be called multiple times on oldSDpnt. The patch allows scsi_build_commandblocks() to be called the first time through the loop, but prevents it from being called on subsequent iterations. If you call scsi_build_commandblocks() twice in a row like that, without first doing a scsi_release_commandblocks(), you lose the memory from the existing set of command blocks.
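The failure mode stuart_hayes describes can be reproduced outside the kernel. The program below is a userspace analogue, not kernel code: the struct names, the 512-byte payload, and the depth of 32 are illustrative stand-ins for Scsi_Cmnd, sdev_free_q, and queue_depth. Re-running the list-head initialization on the second build makes every block from the first build unreachable:

    #include <stdio.h>
    #include <stdlib.h>

    /* Stand-ins for Scsi_Cmnd and SDpnt->sdev_free_q: a singly linked
     * free list headed in the device structure. */
    struct cmd { struct cmd *next; char payload[512]; };
    struct dev { struct cmd *free_q; int queue_depth; };

    /* Analogue of scsi_build_commandblocks(): unconditionally re-initializes
     * the free-list head, then allocates queue_depth blocks onto it. */
    static void build_commandblocks(struct dev *d)
    {
            d->free_q = NULL;               /* like INIT_LIST_HEAD(&sdev_free_q) */
            for (int j = 0; j < d->queue_depth; j++) {
                    struct cmd *c = malloc(sizeof(*c));
                    if (!c)
                            break;
                    c->next = d->free_q;    /* like list_add(&c->sc_list, ...) */
                    d->free_q = c;
            }
    }

    int main(void)
    {
            struct dev d = { .free_q = NULL, .queue_depth = 32 };

            build_commandblocks(&d);        /* first attach: 32 blocks on free_q */
            build_commandblocks(&d);        /* second attach: the head is reset
                                             * first, so the original 32 blocks
                                             * become unreachable - the leak */
            printf("every extra build leaks %zu bytes\n",
                   (size_t)d.queue_depth * sizeof(struct cmd));
            return 0;
    }

With these illustrative sizes, each extra build orphans on the order of 16K, the same order as the per-add loss in the report.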
stuart_hayes: There were two options. One is to do what the patch does, and not call scsi_build_commandblocks() again if it has already been called. The other is to first release the command blocks before calling scsi_build_commandblocks() again, if has_cmdblocks shows that we had already called scsi_build_commandblocks() on that SCSI device. I can't remember why I decided on the former... I guess I should have put a comment in there. I think the queue size was set correctly the first time through the for loop, and if we released and re-built the command blocks it would end up with an incorrect number of command blocks. You might want to run that one by your SCSI expert, though, to get a second opinion. Thanks, Stuart

wcheng: I still don't see the memory leak in that particular call path :) - but I'll test it further tonight and discuss this with our kernel folks. Note that the command blocks malloc'd in scsi_build_commandblocks() are backward-chained into SDpnt->sdev_free_q.

wcheng: Sorry, I missed the line - #1487 will screw up this backward chain. Maybe you're right - I'll test it out again...

    1471 void scsi_build_commandblocks(Scsi_Device * SDpnt)
    1472 {
    1473         unsigned long flags;
    1474         struct Scsi_Host *host = SDpnt->host;
    1475         int j;
    1476         Scsi_Cmnd *SCpnt;
    1477         request_queue_t *q = &SDpnt->request_queue;
    1478
    1479         /*
    1480          * Init the spin lock that protects alloc/free of command blocks
    1481          */
    1482         spin_lock_init(&SDpnt->device_request_lock);
    1483
    1484         /*
    1485          * Init the list we keep free commands on
    1486          */
    1487         INIT_LIST_HEAD(&SDpnt->sdev_free_q);

wcheng: Just figured this out - it is a memory leak. I moved the scsi_build_commandblocks() out of the previous loop to be less confusing. Waiting for rhkernel-list's acks.

File uploaded: scsi-leak.patch

stuart_hayes: If you go through the loop at line 474 (above), it will effectively call scsi_build_commandblocks() twice in a row on the same SCSI device (oldSDpnt). And scsi_build_commandblocks() is effectively the following code - oldSDpnt->sdev_free_q gets wiped out the second time you call scsi_build_commandblocks(), rather than being added to:

    {
            INIT_LIST_HEAD(&SDpnt->sdev_free_q);
            ...
            for (j = 0; j < SDpnt->queue_depth; j++) {
                    SCpnt = (Scsi_Cmnd *) kmalloc(...);
                    ...
                    list_add(&SCpnt->sc_list, &SDpnt->sdev_free_q);
            }
            ...
            SDpnt->has_cmdblocks = 1;
    }

And I think I remember now why I did the patch the way I did (I said above that I couldn't remember). I think it was this: the device structure's queue_depth, which determines how many command blocks are created, is wiped out when you do a scsi_release_commandblocks(), so doing a release and then a build could result in the wrong number of command blocks being created. So it is really only important that you do the scsi_build_commandblocks() once and only once: if you do it more than once, you'll leak memory, and if you do a release and then a build again, you might get the wrong queue_depth. Stuart
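Pieced together from stuart_hayes's description, the guard variant of the fix would look roughly like this. It is a sketch using only identifiers from the excerpts above, not the contents of scsi-leak.patch or the attachment below (wcheng's version reportedly hoists the build out of the loop instead):

    for (sdtpnt = scsi_devicelist; sdtpnt; sdtpnt = sdtpnt->next) {
            if (sdtpnt->attach) {
                    (*sdtpnt->attach) (oldSDpnt);
                    if (oldSDpnt->attached &&
                        0 == oldSDpnt->has_cmdblocks) {
                            /* Build exactly once: a second build would re-run
                             * INIT_LIST_HEAD() and orphan the first set, and
                             * release-then-rebuild would see the queue_depth
                             * that scsi_release_commandblocks() wiped out. */
                            scsi_build_commandblocks(oldSDpnt);
                            if (0 == oldSDpnt->has_cmdblocks)
                                    printk("scan_scsis: DANGER, no command blocks\n");
                    }
            }
    }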
Created attachment 103373: Suggested fix
A fix for this problem has just been committed to the RHEL3 U4 patch pool this evening (in kernel version 2.4.21-20.4.EL).
*** Bug 138517 has been marked as a duplicate of this bug. ***
An erratum has been issued which should address the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2004-550.html