Bug 811946 - dm-thinp: memory leak
Status: CLOSED DUPLICATE of bug 828955
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.4
Platform: x86_64 Linux
Priority: urgent, Severity: high
Target Milestone: rc
Assigned To: Mike Snitzer
QA Contact: Cluster QE
Blocks: 840683
Reported: 2012-04-12 07:30 EDT by Heinz Mauelshagen
Modified: 2012-07-31 01:53 EDT
CC: 9 users

Doc Type: Bug Fix
Type: Bug
Last Closed: 2012-07-30 17:13:06 EDT

Attachments
reinstate mempool_free in cell_release_singleton (983 bytes, patch)
2012-04-12 12:19 EDT, Alasdair Kergon
dm thin: fix memory leak of singleton cell (1.13 KB, patch)
2012-04-12 16:01 EDT, Mike Snitzer

Description Heinz Mauelshagen 2012-04-12 07:30:20 EDT
Description of problem:

After a run of the thinp-test-suite involving many thinp/snapshot configuration changes, a large amount of memory is not freed.

Version-Release number of selected component (if applicable):


How reproducible:

Always, by running the thinp-test-suite


Steps to Reproduce:
1. Start the test system and set up config.rb
2. Run the test suite
Actual results:
Almost 3GB leaked.

Expected results:
Memory freed.

Additional info:
Test system started with ~500MB used.
Extracted from top after test-suite run:
Mem:   6109712k total,  3480524k used,  2629188k free,    13896k buffer

slabtop:
8822550 8819429  99%    0.12K 294085       30   1176340K size-128
I.e., the kfree()s matching those kmalloc()s are missing.
Comment 1 Zdenek Kabelac 2012-04-12 07:50:47 EDT
I'd suggest trying kmemleak (with a Linux debug kernel) - it should get you to the leaking code path pretty quickly.
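For reference, kmemleak is driven through debugfs; assuming a kernel built with CONFIG_DEBUG_KMEMLEAK, a typical session looks like this (standard kmemleak usage from the kernel documentation, not specific to this bug):

```shell
# Make debugfs available (usually already mounted).
mount -t debugfs nodev /sys/kernel/debug 2>/dev/null || true

# Trigger an immediate memory scan.
echo scan > /sys/kernel/debug/kmemleak

# List suspected leaks, each with a backtrace like the ones below.
cat /sys/kernel/debug/kmemleak

# Clear the current list of suspects before the next test run.
echo clear > /sys/kernel/debug/kmemleak
```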
Comment 2 Mike Snitzer 2012-04-12 08:23:05 EDT
The slab that has all the leaked memory is "size-128".

I was going to try kmemleak today... but Heinz you're welcome to take ownership of tracking this down (and thanks for opening this BZ).
Comment 3 Mike Snitzer 2012-04-12 08:25:45 EDT
Ah, comment#0 already pointed to size-128...

Telltale sign of a leak is the memory is unreclaimable (from /proc/meminfo):

Slab:            2052224 kB
SReclaimable:     121952 kB
SUnreclaim:      1930272 kB
Comment 4 Mike Snitzer 2012-04-12 11:18:57 EDT
I am using a modified Linux 3.3 kernel with 3.4's DM changes plus refactoring that split the bio-prison code out to drivers/md/dm-bio-prison.c -- and all exported bio-prison.c symbols were exported via dm-mod.

I ran: ./run_tests -n /test_two_pools_can_create_thins/

which resulted in numerous kmemleak reports like the following:

unreferenced object 0xffff8802b1567f40 (size 128):
  comm "kworker/u:86", pid 651, jiffies 4295253235 (age 302.975s)
  hex dump (first 32 bytes):
    00 01 10 00 00 00 ad de 00 02 20 00 00 00 ad de  .......... .....
    00 80 3b f0 02 88 ff ff 01 00 00 00 03 88 ff ff  ..;.............
  backtrace:
    [<ffffffff814a5f4e>] kmemleak_alloc+0x5e/0xc0
    [<ffffffff811459cf>] __kmalloc+0x13f/0x260
    [<ffffffff810fbe95>] mempool_kmalloc+0x15/0x20
    [<ffffffff810fc040>] mempool_alloc+0x60/0x170
    [<ffffffffa02f99be>] dm_bio_detain+0x10e/0x210 [dm_mod]
    [<ffffffffa06c85b0>] process_bio+0x60/0x170 [dm_thin_pool]
    [<ffffffffa06c8788>] process_deferred_bios+0xc8/0x230 [dm_thin_pool]
    [<ffffffffa06c8942>] do_worker+0x52/0x60 [dm_thin_pool]
    [<ffffffff81057092>] process_one_work+0x132/0x450
    [<ffffffff81058e9b>] worker_thread+0x17b/0x3c0
    [<ffffffff8105dd9e>] kthread+0x9e/0xb0
    [<ffffffff814c66a4>] kernel_thread_helper+0x4/0x10
    [<ffffffffffffffff>] 0xffffffffffffffff

dm_bio_detain+0x10e is drivers/md/dm-bio-prison.c:145, which is dm_bio_detain's:

cell2 = mempool_alloc(prison->cell_pool, GFP_NOIO);

prison->cell_pool was created in dm_prison_create() with:

prison->cell_pool = mempool_create_kmalloc_pool(nr_cells, sizeof(struct cell));

sizeof(struct cell) is 72 bytes:

crash> struct -o cell
struct cell {
   [0] struct hlist_node list;
  [16] struct bio_prison *prison;
  [24] struct cell_key key;
  [48] struct bio *holder;
  [56] struct bio_list bios;
}
SIZE: 72

I'm not yet seeing _why_ there is a leak... but I'll continue looking at the bio-prison code in this area.
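To make the failure mode concrete, here is a minimal userspace model of the suspected bug: a release path that unhooks a cell but never returns it to the pool. Plain malloc/free with a counter stands in for mempool_alloc/mempool_free, and all names are illustrative, not the kernel code.

```c
#include <assert.h>
#include <stdlib.h>

static int cells_outstanding; /* allocations minus frees */

struct cell { int holder; };

static struct cell *cell_alloc(void) {
    cells_outstanding++;                 /* models mempool_alloc() */
    return calloc(1, sizeof(struct cell));
}

static void cell_free(struct cell *c) {
    cells_outstanding--;                 /* models mempool_free() */
    free(c);
}

/* Buggy release: the cell is simply forgotten -> leak. */
static void release_singleton_buggy(struct cell *c) {
    (void)c;
}

/* Fixed release: the cell is returned to the pool. */
static void release_singleton_fixed(struct cell *c) {
    cell_free(c);
}

/* Allocate and release n cells, reporting how many leaked. */
static int leaked_after(void (*release)(struct cell *), int n) {
    cells_outstanding = 0;
    for (int i = 0; i < n; i++)
        release(cell_alloc());
    return cells_outstanding;
}
```

Every bio detained this way leaks one pool object, which is consistent with the steadily growing size-128 slab.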
Comment 5 Alasdair Kergon 2012-04-12 12:19:39 EDT
Created attachment 577128 [details]
reinstate mempool_free in cell_release_singleton

Patch to try attached.
Comment 6 Alasdair Kergon 2012-04-12 12:34:47 EDT
Also, can we consider using mempool_create_slab_pool() so this memory is kept segregated in its own slab cache, instead of kmalloc?
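A rough userspace sketch of why that segregation helps diagnosis: objects from a dedicated, named cache are accounted under their own slabtop row, instead of vanishing into the shared size-128 kmalloc bucket alongside unrelated allocations. The struct and names below are a hypothetical model, not the kernel slab API.

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal stand-in for a named slab cache with per-cache
 * live-object accounting (the number slabtop would display). */
struct cache {
    const char *name;
    size_t obj_size;
    long live;
};

static void *cache_alloc(struct cache *c) {
    c->live++;
    return calloc(1, c->obj_size);
}

static void cache_free(struct cache *c, void *p) {
    c->live--;
    free(p);
}
```

With a dedicated cache, a leak of cell objects would show up directly under its own name rather than inflating the generic size-128 count.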
Comment 7 RHEL Product and Program Management 2012-04-12 12:40:14 EDT
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.
Comment 8 Mike Snitzer 2012-04-12 15:57:24 EDT
(In reply to comment #5)
> Created attachment 577128 [details]
> reinstate mempool_free in cell_release_singleton
> 
> Patch to try attached.

prison is undefined.  I took a different approach.  Will attach.
Comment 9 Mike Snitzer 2012-04-12 16:01:54 EDT
Created attachment 577160 [details]
dm thin: fix memory leak of singleton cell

verified to eliminate leak of single cell in upstream Linux 3.4 -- this leak is _not_ a concern for RHEL6.3.
Comment 10 Mike Snitzer 2012-04-12 16:04:26 EDT
(In reply to comment #6)
> Also, can we consider mempool_create_slab_pool to keep memory like this
> segregated instead of kmalloc?

Yes, I have a patch I'll post to dm-devel.

If you'd like this conversion applied to RHEL6.3 it can easily be ported.
Comment 11 Mike Snitzer 2012-04-12 17:35:52 EDT
kmemleak found what it thinks is another leak (which appears relevant to RHEL6.3's thinp):

unreferenced object 0xffff88032ae2fb40 (size 32):
  comm "dmsetup", pid 5477, jiffies 4296842262 (age 6021.209s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 b0 57 c3 12 03 88 ff ff  .........W......
    03 00 00 00 00 00 00 00 6d 2d 32 00 00 00 00 00  ........m-2.....
  backtrace:
    [<ffffffff814a5f4e>] kmemleak_alloc+0x5e/0xc0
    [<ffffffff81144983>] kmem_cache_alloc_trace+0xd3/0x200
    [<ffffffffa06ba125>] insert_shadow+0x35/0x90 [dm_persistent_data]
    [<ffffffffa06ba20e>] dm_tm_new_block+0x8e/0xa0 [dm_persistent_data]
    [<ffffffffa06b8758>] sm_ll_extend+0xc8/0x120 [dm_persistent_data]
    [<ffffffffa06b9dd3>] dm_sm_metadata_create+0x103/0x1e0 [dm_persistent_data]
    [<ffffffffa06ba593>] dm_tm_create_internal+0xe3/0x200 [dm_persistent_data]
    [<ffffffffa06ba6db>] dm_tm_create_with_sm+0x2b/0x30 [dm_persistent_data]
    [<ffffffffa0357aeb>] init_pmd.clone.0+0x3b/0x370 [dm_thin_pool]
    [<ffffffffa0357f4e>] dm_pool_metadata_open+0x12e/0x2ec [dm_thin_pool]
    [<ffffffffa03540a9>] pool_create+0x39/0x3e0 [dm_thin_pool]
    [<ffffffffa0356056>] pool_ctr+0x296/0x460 [dm_thin_pool]
    [<ffffffffa02fedde>] dm_table_add_target+0x13e/0x3b0 [dm_mod]
    [<ffffffffa03014f9>] table_load+0xc9/0x340 [dm_mod]
    [<ffffffffa0302874>] ctl_ioctl+0x1b4/0x270 [dm_mod]
    [<ffffffffa0302943>] dm_ctl_ioctl+0x13/0x20 [dm_mod]

There were only 3 of these leaked, so it is considerably rarer than the leak from comment#4.

drivers/md/persistent-data/dm-transaction-manager.c:insert_shadow calls:
si = kmalloc(sizeof(*si), GFP_NOIO);

crash> struct -o shadow_info
struct shadow_info {
   [0] struct hlist_node hlist;
  [16] dm_block_t where;
}
SIZE: 24
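The shape of this leak can be modeled in userspace as well: each shadowed block gets a small tracking node linked into a list, and the leak occurs if that list is torn down without freeing the nodes. The singly linked layout and function names below are illustrative, not the kernel code.

```c
#include <assert.h>
#include <stdlib.h>

static int shadow_nodes_live;

struct shadow_info {
    struct shadow_info *next;
    unsigned long where;      /* models dm_block_t */
};

/* Models insert_shadow(): kmalloc a node and link it in. */
static void insert_shadow(struct shadow_info **head, unsigned long where) {
    struct shadow_info *si = malloc(sizeof(*si));
    si->where = where;
    si->next = *head;
    *head = si;
    shadow_nodes_live++;
}

/* Buggy teardown: drops the list head, leaking every node. */
static void wipe_shadows_buggy(struct shadow_info **head) {
    *head = NULL;
}

/* Fixed teardown: frees each node as it is unlinked. */
static void wipe_shadows_fixed(struct shadow_info **head) {
    while (*head) {
        struct shadow_info *si = *head;
        *head = si->next;
        free(si);
        shadow_nodes_live--;
    }
}
```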
Comment 12 Mike Snitzer 2012-04-12 18:27:09 EDT
The leak from comment#11 is induced by thinp-test-suite's 'test_mass_create_apply_remove_dtrandom_device' test.
Comment 13 Mike Snitzer 2012-04-12 18:43:58 EDT
(In reply to comment #12)
> The leak from comment#11 is induced by thinp-test-suite's
> 'test_mass_create_apply_remove_dtrandom_device' test.

But unfortunately, it isn't a reliable reproducer.
Comment 14 Alasdair Kergon 2012-04-12 18:50:48 EDT
I suggest moving this to a separate bugzilla to simplify tracking.
Comment 15 Mike Snitzer 2012-04-12 20:08:11 EDT
(In reply to comment #14)
> I suggest moving this to a separate bugzilla to simplify tracking.

Well, this BZ is suspect as is.  The original leak report is not relevant for RHEL6.3 -- the upstream Linux 3.4 changes aren't in RHEL6.3.

So the only thing this bug would be for (if the leak in comment#11 is moved to a new BZ) is the cleanup to use slab caches in dm-thin.c.

IMHO, that cleanup isn't even really needed given the late date of RHEL6.3 (and the fact that the cleanup doesn't actually fix anything).
Comment 16 Mike Snitzer 2012-04-13 11:04:17 EDT
Here is another stack trace for the leak associated with insert_shadow's allocation of 'struct shadow_info' (first reported in comment#11):

unreferenced object 0xffff88032d7b5880 (size 32):
  comm "dmsetup", pid 6324, jiffies 4296338651 (age 916.002s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 d8 67 4f fd 02 88 ff ff  .........gO.....
    02 00 00 00 00 00 00 00 6d 2d 30 00 ff ff ff ff  ........m-0.....
  backtrace:
    [<ffffffff814a5f4e>] kmemleak_alloc+0x5e/0xc0
    [<ffffffff81144983>] kmem_cache_alloc_trace+0xd3/0x200
    [<ffffffffa05d8125>] insert_shadow+0x35/0x90 [dm_persistent_data]
    [<ffffffffa05d820e>] dm_tm_new_block+0x8e/0xa0 [dm_persistent_data]
    [<ffffffffa05daeab>] new_block+0x1b/0x20 [dm_persistent_data]
    [<ffffffffa05d92e0>] dm_btree_empty+0x30/0xd0 [dm_persistent_data]
    [<ffffffffa05d60de>] sm_ll_new_metadata+0x8e/0xa0 [dm_persistent_data]
    [<ffffffffa05d7db0>] dm_sm_metadata_create+0xe0/0x1e0 [dm_persistent_data]
    [<ffffffffa05d8593>] dm_tm_create_internal+0xe3/0x200 [dm_persistent_data]
    [<ffffffffa05d86db>] dm_tm_create_with_sm+0x2b/0x30 [dm_persistent_data]
    [<ffffffffa054caeb>] init_pmd.clone.0+0x3b/0x370 [dm_thin_pool]
    [<ffffffffa054cf4e>] dm_pool_metadata_open+0x12e/0x2ec [dm_thin_pool]
    [<ffffffffa05490a9>] pool_create+0x39/0x3e0 [dm_thin_pool]
    [<ffffffffa054b056>] pool_ctr+0x296/0x460 [dm_thin_pool]
    [<ffffffffa02dddde>] dm_table_add_target+0x13e/0x3b0 [dm_mod]
    [<ffffffffa02e04f9>] table_load+0xc9/0x340 [dm_mod]

test_discard_random_sectors(DiscardTests): pool status metadata(90/1048576) data(0/262144)
Fri Apr 13 10:53:30 -0400 2012 wiping dev
entering discard loop
Apr 13 10:55:39 rhel-storage-01 kernel: kmemleak: 6 new suspected memory leaks (see /sys/kernel/debug/kmemleak)

Bottom line: running the thinp-test-suite will cause kmemleak to record a leak associated with insert_shadow... but there isn't one specific test that reliably causes it.
Comment 17 Mike Snitzer 2012-04-13 11:13:35 EDT
Any leak fixes will be introduced when the upstream Linux 3.4 thinp code is introduced in RHEL6.4.  Pushing to 6.4.
Comment 19 Mike Snitzer 2012-06-20 14:11:50 EDT
The leak in comment#4 is fixed in this 6.4 patch via bug# 828955:
[RHEL6.4 PATCH 23/34] dm thin: fix stacked bi_next usage

The leak in comment#11 has been fixed with this patch:
http://www.redhat.com/archives/dm-devel/2012-June/msg00110.html

Leaving this BZ open for now; it could be used to commit the fix for the leak from comment#11... or closed as a dup of a new BZ that will be opened to commit the 2nd push of DM fixes for RHEL6.4.
Comment 20 Mike Snitzer 2012-07-30 17:13:06 EDT
The fix for the leak in comment#11 was posted to rhkernel-list using bug#828955

*** This bug has been marked as a duplicate of bug 828955 ***
