Bug 114939 - kernel panic in scsi_resize_dma_pool due to memory leak
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: All
OS: Linux
Priority: medium
Severity: high
Assigned To: Doug Ledford
QA Contact: Brian Brock
Reported: 2004-02-04 12:56 EST by Martin Peschke
Modified: 2007-11-30 17:07 EST
CC List: 6 users

Doc Type: Bug Fix
Last Closed: 2004-09-02 00:31:04 EDT

Attachments
debug patch 1 (1.61 KB, patch), 2004-02-04 18:36 EST, Martin Peschke
debug data for each scsi_free() / scsi_malloc() (11.41 KB, application/x-bzip2), 2004-02-09 20:01 EST, Martin Peschke
Make retried commands have their own queue (6.31 KB, patch), 2004-06-08 11:52 EDT, Doug Ledford

Description Martin Peschke 2004-02-04 12:56:54 EST
Description of problem:


Version-Release number of selected component (if applicable):


How reproducible:
almost 100% (reproduced it 3 times in a row)

Steps to Reproduce:
1. Load zfcp, scsi_mod, and sd_mod.
2. Configure 40 LUNs on a single adapter via add-single-device.
3. Start some error injections (warmstarts) in the SCSI target device (IBM ESS), which will cause bursts of QUEUE_FULLs.
4. Let disk exercisers (blast, an IBM-internal tool) run for a while (1-2 hours should do).
5. Stop the exercisers, umount the filesystems, and unload sd_mod.
6. Unload zfcp -> kernel panic in scsi_mod::scsi_resize_dma_pool():
   "SCSI DMA pool memory leak 20585 21808"
Comment 1 Martin Peschke 2004-02-04 16:10:30 EST
I have a Linux dump for that panic, which means that someone could
try to go through the scsi dma pool and identify the pieces that have
not been scsi_free()-ed. Would this be of any help? The crucial point
is figuring out what these pieces were used for, so that we can make
any assumption about the possible reasons for the leak. Or is this
simply a misaccounting of resources? I will run another test with
counters for the number of calls to scsi_malloc() and scsi_free().
Comment 2 Martin Peschke 2004-02-04 18:36:36 EST
Created attachment 97477 [details]
debug patch 1

I ran the test again with the attached debug patch, which counts calls to the
scsi dma pool routines. Result:

panic 21802 21808 0 726523 726517
      |     |     | |      |
      |     |     | |      +- number of calls to scsi_free()
      |     |     | +-------- number of successful calls to scsi_malloc() (got a buffer)
      |     |     +---------- number of failed calls to scsi_malloc() (got no buffer)
      |     +---------------- dma_sectors
      +---------------------- scsi_dma_free_sectors
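
For illustration, a minimal sketch of the counting approach (this is not the
attached patch; do_scsi_malloc()/do_scsi_free() are hypothetical names standing
in for the unmodified allocator bodies):

    /* Minimal sketch of the counting approach (illustration only).
     * Three global counters are bumped on every call and can then be
     * printed next to scsi_dma_free_sectors and dma_sectors in the
     * existing "SCSI DMA pool memory leak" panic message.
     */
    static unsigned long dbg_malloc_got_buffer;   /* scsi_malloc() succeeded */
    static unsigned long dbg_malloc_no_buffer;    /* scsi_malloc() returned NULL */
    static unsigned long dbg_free_calls;          /* calls to scsi_free() */

    void *scsi_malloc(unsigned int len)
    {
            void *buf = do_scsi_malloc(len);      /* hypothetical: original body */

            if (buf)
                    dbg_malloc_got_buffer++;
            else
                    dbg_malloc_no_buffer++;
            return buf;
    }

    int scsi_free(void *obj, unsigned int len)
    {
            dbg_free_calls++;
            return do_scsi_free(obj, len);        /* hypothetical: original body */
    }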

I am writing another debug patch to find out which caller gets how many buffers
of which size, and how many of these buffers are returned. Meanwhile we will
rerun the above test.
Comment 3 Martin Peschke 2004-02-06 11:24:52 EST
Might be a duplicate of bug 114941; I will follow up on that.
Comment 4 Martin Peschke 2004-02-09 20:01:52 EST
Created attachment 97546 [details]
debug data for each scsi_free() / scsi_malloc()

Seems to be a new problem, not a duplicate.

I have attached a patch that modifies all scsi_free() and scsi_malloc() calls
so that it is possible to count how often memory is allocated or released,
respectively, by each caller of these routines.

The result of today's test run is that scsi_io_completion() in scsi_merge.c
(the sg_use case) released 111 fewer scatter-gather lists of 512 bytes and 45
fewer scatter-gather lists of 1024 bytes than __init_io() in scsi_lib.c had
allocated.
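
For illustration, a minimal sketch of the per-caller accounting idea (this is
not the attached patch; the table and helper names below are made up):

    /* Per-caller, per-chunk-size accounting (illustration only, not the
     * attached patch).  Every scsi_malloc()/scsi_free() call site is given
     * a small caller id; comparing allocation and release counts per caller
     * at module unload points at the call site that leaks.
     */
    #define DBG_MAX_CALLERS 16
    #define DBG_MAX_SIZES    8      /* chunk sizes are multiples of 512 bytes */

    static unsigned long dbg_allocated[DBG_MAX_CALLERS][DBG_MAX_SIZES];
    static unsigned long dbg_released[DBG_MAX_CALLERS][DBG_MAX_SIZES];

    static void dbg_count(unsigned long table[][DBG_MAX_SIZES],
                          int caller, unsigned int len)
    {
            unsigned int idx = len / 512 - 1;

            if (caller < DBG_MAX_CALLERS && idx < DBG_MAX_SIZES)
                    table[caller][idx]++;
    }

    /* a call site in __init_io() would then become, e.g.:
     *     sgpnt = scsi_malloc(count * sizeof(struct scatterlist));
     *     if (sgpnt)
     *             dbg_count(dbg_allocated, 3, count * sizeof(struct scatterlist));
     */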

We had to inject some errors into our SCSI target device which usually cause
bursts of QUEUE_FULLs or timed out commands.

Please let me know if we can help with other data.
Comment 5 Martin Peschke 2004-02-09 22:16:29 EST
It seems I can't change the bug's status from NEEDINFO back to
ASSIGNED. Could somebody please take care of it?
Comment 6 Martin Peschke 2004-02-25 19:10:23 EST
This might be unrelated to the initial situation, but it just
occurred to me while reading scsi_dma.c in an attempt to debug the
original problem ...

I think the following (RedHat-specific) snippet from scsi_malloc() is
bogus. The comparison of lowest_known_hole and dma_sectors is a
comparison of apples and oranges, isn't it? The former refers to
pages, the latter to sectors (one eighth of a page each):

while ((lowest_known_hole < dma_sectors) &&
       (dma_malloc_freelist[lowest_known_hole] == (1<<SECTORS_PER_PAGE)-1))
        lowest_known_hole++;

I guess the author's intention must have been the following:

while ((lowest_known_hole < dma_sectors / SECTORS_PER_PAGE) &&
       (dma_malloc_freelist[lowest_known_hole] == (1<<SECTORS_PER_PAGE)-1))
        lowest_known_hole++;
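
For reference, a minimal sketch of how the two quantities relate, assuming the
2.4 scsi_dma.c layout the snippets above come from (the helper function is made
up for illustration):

    /* Assumed layout, based on the snippet above (illustration only):
     *   dma_sectors            - pool size in 512-byte sectors
     *   SECTORS_PER_PAGE       - sectors per page (PAGE_SIZE / 512)
     *   dma_malloc_freelist[i] - one bitmap entry per *page* i, where
     *                            (1<<SECTORS_PER_PAGE)-1 means "page full"
     */
    static inline unsigned int dma_freelist_entries(void)
    {
            /* valid freelist indices are 0 .. dma_freelist_entries() - 1 */
            return dma_sectors / SECTORS_PER_PAGE;
    }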

Apart from reading dma_malloc_freelist beyond its upper bound, I see
no real harm caused, such as memory corruption. The expected
performance gain associated with lowest_known_hole might just be thwarted.

Pete, Doug, what is your opinion?
Comment 7 Martin Peschke 2004-02-26 10:17:08 EST
Ok, re-running with the change proposed in my previous comment, I see
the following after unloading the scsi modules:

Feb 26 00:59:10 linux5 kernel: blk_cleanup_queue: leaked requests (131)
Feb 26 00:59:10 linux5 kernel: blk_cleanup_queue: leaked requests (125)
Feb 26 00:59:10 linux5 kernel: blk_cleanup_queue: leaked requests (684)
Feb 26 00:59:10 linux5 kernel: blk_cleanup_queue: leaked requests (165)
Feb 26 00:59:10 linux5 kernel: blk_cleanup_queue: leaked requests (171)
Feb 26 00:59:10 linux5 kernel: blk_cleanup_queue: leaked requests (110)
Feb 26 00:59:10 linux5 kernel: blk_cleanup_queue: leaked requests (129)
Feb 26 00:59:10 linux5 kernel: blk_cleanup_queue: leaked requests (219)
Feb 26 00:59:10 linux5 kernel: blk_cleanup_queue: leaked requests (165)
Feb 26 00:59:10 linux5 kernel: blk_cleanup_queue: leaked requests (158)
Feb 26 00:59:10 linux5 kernel: blk_cleanup_queue: leaked requests (129)
Feb 26 00:59:10 linux5 kernel: blk_cleanup_queue: leaked requests (159)
Feb 26 00:59:10 linux5 kernel: blk_cleanup_queue: leaked requests (243)
Feb 26 00:59:10 linux5 kernel: blk_cleanup_queue: leaked requests (189)
Feb 26 00:59:10 linux5 kernel: blk_cleanup_queue: leaked requests (144)
Feb 26 00:59:10 linux5 kernel: blk_cleanup_queue: leaked requests (166)
Feb 26 00:59:10 linux5 kernel: scsi : 0 hosts left.

This was preceded by lots of SCSI I/O errors which caused our disk
exercisers to abend tests on all SCSI devices. These I/O errors were
caused by error injections into our target devices, which made for a
somewhat different workload than previously exercised. However, I
don't see why this should cause yet another leak.

I really wish someone at RedHat would comment on this bugzilla.
Comment 8 Martin Peschke 2004-02-26 14:01:12 EST
Earlier messages reveal that the I/O errors were caused by a failed
target device port:

Feb 25 14:54:17 linux5 kernel: zfcp: ERP: zfcp_erp_port_failed: Port
recovery failed on the port with WWPN 0x5005076300c2a3c4 at the
adapter with devno 0x4900.
Feb 25 14:54:22 linux5 kernel: scsi: device set offline - not ready or
command retry failed after bus reset: host 0 channel 0 id 1 lun 0
Feb 25 14:54:22 linux5 last message repeated 31 times
Feb 25 14:54:22 linux5 kernel: scsi: device set offline - not ready or
command retry failed after bus reset: host 0 channel 0 id 1 lun 1
Feb 25 14:54:22 linux5 last message repeated 17 times
Feb 25 14:54:22 linux5 kernel: scsi: device set offline - not ready or
command retry failed after bus reset: host 0 channel 0 id 1 lun 2
Feb 25 14:54:22 linux5 last message repeated 31 times
Feb 25 14:54:22 linux5 kernel: scsi: device set offline - not ready or
command retry failed after bus reset: host 0 channel 0 id 1 lun 3
Feb 25 14:54:28 linux5 last message repeated 31 times
Feb 25 14:54:28 linux5 kernel: scsi: device set offline - not ready or
command retry failed after bus reset: host 0 channel 0 id 1 lun 4
Feb 25 14:54:28 linux5 last message repeated 31 times
Feb 25 14:54:28 linux5 kernel: scsi: device set offline - not ready or
command retry failed after bus reset: host 0 channel 0 id 1 lun 5
Feb 25 14:54:28 linux5 last message repeated 31 times
Feb 25 14:54:28 linux5 kernel: scsi: device set offline - not ready or
command retry failed after bus reset: host 0 channel 0 id 1 lun 6
Feb 25 14:54:28 linux5 last message repeated 31 times
Feb 25 14:54:28 linux5 kernel: scsi: device set offline - not ready or
command retry failed after bus reset: host 0 channel 0 id 1 lun 7
Feb 25 14:54:28 linux5 last message repeated 31 times
Feb 25 14:54:28 linux5 kernel: scsi: device set offline - not ready or
command retry failed after bus reset: host 0 channel 0 id 1 lun 8
Feb 25 14:54:28 linux5 last message repeated 31 times
Feb 25 14:54:28 linux5 kernel: scsi: device set offline - not ready or
command retry failed after bus reset: host 0 channel 0 id 1 lun 9
Feb 25 14:54:28 linux5 last message repeated 31 times
Feb 25 14:54:28 linux5 kernel: scsi: device set offline - not ready or
command retry failed after bus reset: host 0 channel 0 id 1 lun 10
Feb 25 14:54:28 linux5 last message repeated 31 times
Feb 25 14:54:28 linux5 kernel: scsi: device set offline - not ready or
command retry failed after bus reset: host 0 channel 0 id 1 lun 11
Feb 25 14:54:28 linux5 last message repeated 31 times
Feb 25 14:54:28 linux5 kernel: scsi: device set offline - not ready or
command retry failed after bus reset: host 0 channel 0 id 1 lun 12
Feb 25 14:54:28 linux5 last message repeated 31 times
Feb 25 14:54:28 linux5 kernel: scsi: device set offline - not ready or
command retry failed after bus reset: host 0 channel 0 id 1 lun 13
Feb 25 14:54:28 linux5 last message repeated 31 times
Feb 25 14:54:28 linux5 kernel: scsi: device set offline - not ready or
command retry failed after bus reset: host 0 channel 0 id 1 lun 14
Feb 25 14:54:29 linux5 last message repeated 31 times
Feb 25 14:54:29 linux5 kernel: scsi: device set offline - not ready or
command retry failed after bus reset: host 0 channel 0 id 1 lun 15
Feb 25 14:54:29 linux5 last message repeated 31 times

And there was no dma pool memory leak and no panic this time, for
whatever reason:

Feb 26 04:22:03 linux5 kernel: MP3 scsi_resize_dma_pool: enter
Feb 26 04:22:03 linux5 kernel: MP3 scsi_resize_dma_pool: host list empty
Feb 26 04:22:03 linux5 kernel: SCSI DMA pool 0 0 0 1623361 1623361
Feb 26 04:22:03 linux5 kernel: hits/missed: 3246722/0 hits
Feb 26 04:22:03 linux5 kernel: Caller 2 released chunk size 0x00000200
0x000001b2 time(s).
Feb 26 04:22:03 linux5 kernel: Caller 2 released chunk size 0x00000400
0x00000021 time(s).
Feb 26 04:22:03 linux5 kernel: Caller 3 allocated chunk size
0x00000200 0x0017be05 time(s).
Feb 26 04:22:03 linux5 kernel: Caller 3 allocated chunk size
0x00000400 0x000106fd time(s).
Feb 26 04:22:03 linux5 kernel: Caller 3 allocated chunk size
0x00000600 0x00000013 time(s).
Feb 26 04:22:03 linux5 kernel: Caller 3 allocated chunk size
0x00000800 0x0000000c time(s).
Feb 26 04:22:03 linux5 kernel: Caller 5 allocated chunk size
0x00000200 0x00000020 time(s).
Feb 26 04:22:03 linux5 kernel: Caller 5 released chunk size 0x00000200
0x0017bc53 time(s).
Feb 26 04:22:03 linux5 kernel: Caller 5 released chunk size 0x00000400
0x000106dc time(s).
Feb 26 04:22:03 linux5 kernel: Caller 5 released chunk size 0x00000600
0x00000013 time(s).
Feb 26 04:22:03 linux5 kernel: Caller 5 released chunk size 0x00000800
0x0000000c time(s).
Feb 26 04:22:03 linux5 kernel: Caller 8 released chunk size 0x00000200
0x00000020 time(s).
Comment 9 Doug Ledford 2004-05-12 19:20:22 EDT
The fix to bug 119771 might very well be the cause of this.  Please
retest with the bk tree referenced in our email discussion and let me
know if this problem still exists.
Comment 11 Ernie Petrides 2004-05-12 21:40:24 EDT
Doug, if this problem is deemed fixed in U2, please refer me to the
specific patch tracking file or changelog entry corresponding to the
fix.  I should also point out that the official U2 kernel was pushed
via RHN today, so any RHEL customer can now get it (2.4.21-15.EL).

Cheers.  -ernie
Comment 12 Doug Ledford 2004-05-13 02:07:14 EDT
No, it's not fixed in U2.  The tree I was referring to was initially
populated with the U2 source code and contains additional fixes that
are not yet posted.
Comment 14 Doug Ledford 2004-06-08 11:52:39 EDT
Created attachment 100968 [details]
Make retried commands have their own queue

The cause of the scatter gather memory leak has been found.  A change was
needed in the core scsi code for a different issue, and during investigation
the actual cause of this leak was also uncovered.  The patch is being submitted
for the U3 kernel pool.

Basic description: if a device gets a QUEUE_FULL or BUSY return and the driver
passes that back up to the SCSI mid layer (most older drivers don't), then the
command gets requeued on the block request queue so it can be retried.  When
scsi_request_fn() pulls it off the queue, it has to re-init the command because
a merge might have taken place while it was on the queue.  The init_io()
function doesn't check whether there is an already allocated scatter-gather
buffer before allocating a new one.  So, when it sets the scatter-gather list
pointer in the command, it overwrites the old pointer and that allocation is
then lost.
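
For illustration, a rough sketch of the leak path (simplified, not the actual
RHEL3 source; the field names follow the usual 2.4 Scsi_Cmnd conventions):

    /* Rough sketch of the leak path (simplified, not the actual RHEL3
     * source).  On a retry the command may still own the scatter-gather
     * list from its first trip through __init_io(); allocating a new one
     * unconditionally and overwriting the pointer loses that buffer, so
     * it is never scsi_free()-ed and shows up as a dma pool leak later.
     */
    static void init_io_sketch(Scsi_Cmnd *SCpnt, int count)
    {
            struct scatterlist *sgpnt;

            /* missing: a check whether SCpnt->request_buffer already holds
             * a scatter-gather list from the first attempt */
            sgpnt = (struct scatterlist *)
                    scsi_malloc(count * sizeof(struct scatterlist));

            SCpnt->use_sg = count;
            SCpnt->request_buffer = (void *) sgpnt;   /* old pointer lost */
    }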

Resolution: We already have a struct list_head in each command.  This is only
used to thread the commands onto a free command list so we don't have to search
the entire list of commands looking for a free one.  With my patch, whenever we
requeue a command, we utilize the struct list_head to thread the command onto a
dedicated retry queue.  Then we modify scsi_request_fn() to check this
retry queue before the block layer queue when sending commands to a device.
Any command(s) on the retry queue go first.  Since they never made it back to
the request queue, we don't have to re-init the command, so we don't leak any
scatter-gather buffers.  This also solves the issue of new commands possibly
being sent before retried commands, which isn't a big deal, but was something I
wanted to address in a later version of the mid layer queue changes.
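
For illustration, a rough sketch of the retry-queue idea (this is not the
attached patch; the global list below and the field name sc_list are
assumptions made for the sketch):

    /* Rough sketch of the retry-queue idea (not the attached patch).
     * A requeued command is threaded onto a dedicated list via the
     * struct list_head it already carries, and the request function
     * drains that list before looking at the block request queue, so
     * the command is reissued without being re-initialized and keeps
     * its scatter-gather buffer.
     */
    static LIST_HEAD(retry_list);                   /* per-device in reality */

    static void requeue_for_retry(Scsi_Cmnd *SCpnt)
    {
            list_add_tail(&SCpnt->sc_list, &retry_list);
    }

    static Scsi_Cmnd *next_retry_command(void)
    {
            Scsi_Cmnd *SCpnt;

            if (list_empty(&retry_list))
                    return NULL;                    /* fall back to the block queue */

            SCpnt = list_entry(retry_list.next, Scsi_Cmnd, sc_list);
            list_del_init(&SCpnt->sc_list);
            return SCpnt;                           /* no re-init, sg buffer kept */
    }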
Comment 15 Ernie Petrides 2004-06-22 07:14:09 EDT
A fix for this problem has just been committed to the RHEL3 U3
patch pool this evening (in kernel version 2.4.21-15.15.EL).
Comment 16 John Flanagan 2004-09-02 00:31:04 EDT
An errata has been issued which should help the problem 
described in this bug report. This report is therefore being 
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, 
please follow the link below. You may reopen this bug report 
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2004-433.html
