Bug 811946 - dm-thinp: memory leak
Summary: dm-thinp: memory leak
Keywords:
Status: CLOSED DUPLICATE of bug 828955
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.4
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Mike Snitzer
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks: 840683
 
Reported: 2012-04-12 11:30 UTC by Heinz Mauelshagen
Modified: 2012-07-31 05:53 UTC
CC List: 9 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-07-30 21:13:06 UTC
Target Upstream Version:
Embargoed:


Attachments
reinstate mempool_free in cell_release_singleton (983 bytes, patch)
2012-04-12 16:19 UTC, Alasdair Kergon
dm thin: fix memory leak of singleton cell (1.13 KB, patch)
2012-04-12 20:01 UTC, Mike Snitzer

Description Heinz Mauelshagen 2012-04-12 11:30:20 UTC
Description of problem:

After a run of the thinp-test-suite involving many thinp/snapshot configuration changes, a large amount of memory is not freed.

Version-Release number of selected component (if applicable):


How reproducible:

Always by running thinp-test-suite


Steps to Reproduce:
1. Start test system and setup config.rb
2. Run test suite
3.
  
Actual results:
Almost 3GB leaked.

Expected results:
Memory freed.

Additional info:
Test system started with ~500MB used.
Extracted from top after test-suite run:
Mem:   6109712k total,  3480524k used,  2629188k free,    13896k buffer

slabtop:
8822550 8819429  99%    0.12K 294085       30   1176340K size-128
I.e., kfree()s are missing for the respective kmalloc()s.

Comment 1 Zdenek Kabelac 2012-04-12 11:50:47 UTC
I'd suggest trying kmemleak (with a Linux debug kernel) - it gets you to the code path that leaks the memory chunks pretty quickly.
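
[For reference, a minimal userspace sketch of driving kmemleak. The /sys/kernel/debug/kmemleak file and its "scan" command are kmemleak's documented debugfs interface; the program itself is illustrative only, equivalent to "echo scan > /sys/kernel/debug/kmemleak" followed by "cat /sys/kernel/debug/kmemleak".]

/* Illustrative only: trigger a kmemleak scan, then dump the report.
 * Requires CONFIG_DEBUG_KMEMLEAK and a mounted debugfs. */
#include <stdio.h>

int main(void)
{
	char buf[4096];
	size_t n;
	FILE *f = fopen("/sys/kernel/debug/kmemleak", "w");

	if (!f) {
		perror("kmemleak (write)");
		return 1;
	}
	fputs("scan\n", f);	/* scan runs synchronously in the writer's context */
	fclose(f);

	f = fopen("/sys/kernel/debug/kmemleak", "r");
	if (!f) {
		perror("kmemleak (read)");
		return 1;
	}
	while ((n = fread(buf, 1, sizeof(buf), f)) > 0)
		fwrite(buf, 1, n, stdout);
	fclose(f);
	return 0;
}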

Comment 2 Mike Snitzer 2012-04-12 12:23:05 UTC
The slab that has all the leaked memory is "size-128".

I was going to try kmemleak today... but Heinz, you're welcome to take ownership of tracking this down (and thanks for opening this BZ).

Comment 3 Mike Snitzer 2012-04-12 12:25:45 UTC
Ah, comment#0 already pointed to slab-128...

The telltale sign of a leak is that the memory is unreclaimable (from /proc/meminfo):

Slab:            2052224 kB
SReclaimable:     121952 kB
SUnreclaim:      1930272 kB

Comment 4 Mike Snitzer 2012-04-12 15:18:57 UTC
I am using a modified Linux 3.3 kernel with 3.4's DM changes, plus refactoring that splits the bio-prison code out to drivers/md/dm-bio-prison.c -- all bio-prison.c symbols are exported via dm-mod.

I ran: ./run_tests -n /test_two_pools_can_create_thins/

which resulted in numerous kmemleak reports like the following:

unreferenced object 0xffff8802b1567f40 (size 128):
  comm "kworker/u:86", pid 651, jiffies 4295253235 (age 302.975s)
  hex dump (first 32 bytes):
    00 01 10 00 00 00 ad de 00 02 20 00 00 00 ad de  .......... .....
    00 80 3b f0 02 88 ff ff 01 00 00 00 03 88 ff ff  ..;.............
  backtrace:
    [<ffffffff814a5f4e>] kmemleak_alloc+0x5e/0xc0
    [<ffffffff811459cf>] __kmalloc+0x13f/0x260
    [<ffffffff810fbe95>] mempool_kmalloc+0x15/0x20
    [<ffffffff810fc040>] mempool_alloc+0x60/0x170
    [<ffffffffa02f99be>] dm_bio_detain+0x10e/0x210 [dm_mod]
    [<ffffffffa06c85b0>] process_bio+0x60/0x170 [dm_thin_pool]
    [<ffffffffa06c8788>] process_deferred_bios+0xc8/0x230 [dm_thin_pool]
    [<ffffffffa06c8942>] do_worker+0x52/0x60 [dm_thin_pool]
    [<ffffffff81057092>] process_one_work+0x132/0x450
    [<ffffffff81058e9b>] worker_thread+0x17b/0x3c0
    [<ffffffff8105dd9e>] kthread+0x9e/0xb0
    [<ffffffff814c66a4>] kernel_thread_helper+0x4/0x10
    [<ffffffffffffffff>] 0xffffffffffffffff

dm_bio_detain+0x10e is drivers/md/dm-bio-prison.c:145, which is dm_bio_detain's:

cell2 = mempool_alloc(prison->cell_pool, GFP_NOIO);

prison->cell_pool was created in dm_prison_create() with:

prison->cell_pool = mempool_create_kmalloc_pool(nr_cells, sizeof(struct cell));

sizeof(struct cell) is 72 bytes, which kmalloc rounds up to the size-128 slab (the one growing above):

crash> struct -o cell
struct cell {
   [0] struct hlist_node list;
  [16] struct bio_prison *prison;
  [24] struct cell_key key;
  [48] struct bio *holder;
  [56] struct bio_list bios;
}
SIZE: 72

I'm not yet seeing _why_ there is a leak... but I'll continue looking at the bio-prison code in this area.
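
[An editorial sketch of the suspected imbalance, pieced together from the analysis above and the patch titles below -- not the shipped function. cell->prison and prison->cell_pool are real per the struct layout and dm_prison_create() quoted above; __cell_release_singleton is a hypothetical helper, and locking is elided.]

/* Sketch only: the singleton release path must hand the cell back to
 * the mempool it came from. */
static void cell_release_singleton(struct cell *cell, struct bio *bio)
{
	struct bio_prison *prison = cell->prison;

	__cell_release_singleton(cell, bio);	/* hypothetical: unhash cell, pass bio on */

	/* The missing piece the attached patches reinstate -- without it,
	 * every 72-byte cell leaks into the size-128 slab: */
	mempool_free(cell, prison->cell_pool);
}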

Comment 5 Alasdair Kergon 2012-04-12 16:19:39 UTC
Created attachment 577128 [details]
reinstate mempool_free in cell_release_singleton

Patch to try attached.

Comment 6 Alasdair Kergon 2012-04-12 16:34:47 UTC
Also, can we consider mempool_create_slab_pool to keep memory like this segregated instead of kmalloc?
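
[A minimal sketch of that suggestion, with assumed names; mempool_create_slab_pool() and kmem_cache_create() are the stock kernel APIs. Backing the pool with a dedicated slab cache keeps cells out of the generic size-128 bucket, so they show up under their own name in slabtop.]

/* Sketch only, inside dm_prison_create(): instead of
 *   prison->cell_pool = mempool_create_kmalloc_pool(nr_cells,
 *                                                   sizeof(struct cell));
 * back the pool with a dedicated slab cache. */
static struct kmem_cache *_cell_cache;	/* assumed name */

_cell_cache = kmem_cache_create("dm_bio_prison_cell", sizeof(struct cell),
				__alignof__(struct cell), 0, NULL);
if (!_cell_cache)
	return NULL;	/* error handling elided */

prison->cell_pool = mempool_create_slab_pool(nr_cells, _cell_cache);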

Comment 7 RHEL Program Management 2012-04-12 16:40:14 UTC
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.

Comment 8 Mike Snitzer 2012-04-12 19:57:24 UTC
(In reply to comment #5)
> Created attachment 577128 [details]
> reinstate mempool_free in cell_release_singleton
> 
> Patch to try attached.

prison is undefined.  I took a different approach.  Will attach.

Comment 9 Mike Snitzer 2012-04-12 20:01:54 UTC
Created attachment 577160 [details]
dm thin: fix memory leak of singleton cell

Verified to eliminate the leak of the singleton cell on upstream Linux 3.4 -- this leak is _not_ a concern for RHEL6.3.

Comment 10 Mike Snitzer 2012-04-12 20:04:26 UTC
(In reply to comment #6)
> Also, can we consider mempool_create_slab_pool to keep memory like this
> segregated instead of kmalloc?

Yes, I have a patch I'll post to dm-devel.

If you'd like this conversion applied to RHEL6.3 it can easily be ported.

Comment 11 Mike Snitzer 2012-04-12 21:35:52 UTC
kmemleak found what it thinks is another leak (which appears relevant to RHEL6.3's thinp):

unreferenced object 0xffff88032ae2fb40 (size 32):
  comm "dmsetup", pid 5477, jiffies 4296842262 (age 6021.209s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 b0 57 c3 12 03 88 ff ff  .........W......
    03 00 00 00 00 00 00 00 6d 2d 32 00 00 00 00 00  ........m-2.....
  backtrace:
    [<ffffffff814a5f4e>] kmemleak_alloc+0x5e/0xc0
    [<ffffffff81144983>] kmem_cache_alloc_trace+0xd3/0x200
    [<ffffffffa06ba125>] insert_shadow+0x35/0x90 [dm_persistent_data]
    [<ffffffffa06ba20e>] dm_tm_new_block+0x8e/0xa0 [dm_persistent_data]
    [<ffffffffa06b8758>] sm_ll_extend+0xc8/0x120 [dm_persistent_data]
    [<ffffffffa06b9dd3>] dm_sm_metadata_create+0x103/0x1e0 [dm_persistent_data]
    [<ffffffffa06ba593>] dm_tm_create_internal+0xe3/0x200 [dm_persistent_data]
    [<ffffffffa06ba6db>] dm_tm_create_with_sm+0x2b/0x30 [dm_persistent_data]
    [<ffffffffa0357aeb>] init_pmd.clone.0+0x3b/0x370 [dm_thin_pool]
    [<ffffffffa0357f4e>] dm_pool_metadata_open+0x12e/0x2ec [dm_thin_pool]
    [<ffffffffa03540a9>] pool_create+0x39/0x3e0 [dm_thin_pool]
    [<ffffffffa0356056>] pool_ctr+0x296/0x460 [dm_thin_pool]
    [<ffffffffa02fedde>] dm_table_add_target+0x13e/0x3b0 [dm_mod]
    [<ffffffffa03014f9>] table_load+0xc9/0x340 [dm_mod]
    [<ffffffffa0302874>] ctl_ioctl+0x1b4/0x270 [dm_mod]
    [<ffffffffa0302943>] dm_ctl_ioctl+0x13/0x20 [dm_mod]

There were only 3 of these leaked, so it is considerably rarer than the leak from comment#4.

drivers/md/persistent-data/dm-transaction-manager.c:insert_shadow calls:
si = kmalloc(sizeof(*si), GFP_NOIO);

crash> struct -o shadow_info
struct shadow_info {
   [0] struct hlist_node hlist;
  [16] dm_block_t where;
}
SIZE: 24
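
[An editorial sketch of where the matching kfree() belongs. HASH_SIZE, tm->buckets, and the function body are assumptions modeled on the hash table implied by shadow_info's hlist_node above, not quoted from the source: every shadow_info that insert_shadow() kmallocs must be freed whenever the transaction manager wipes its shadow table, or entries like the three above are orphaned.]

/* Sketch only (assumed names): release every kmalloc'd shadow_info
 * when the shadow table is wiped. */
static void wipe_shadow_table(struct dm_transaction_manager *tm)
{
	struct shadow_info *si;
	struct hlist_node *pos, *tmp;
	int i;

	for (i = 0; i < HASH_SIZE; i++) {
		hlist_for_each_entry_safe(si, pos, tmp, tm->buckets + i, hlist)
			kfree(si);
		INIT_HLIST_HEAD(tm->buckets + i);
	}
}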

Comment 12 Mike Snitzer 2012-04-12 22:27:09 UTC
The leak from comment#11 is induced by thinp-test-suite's 'test_mass_create_apply_remove_dtrandom_device' test.

Comment 13 Mike Snitzer 2012-04-12 22:43:58 UTC
(In reply to comment #12)
> The leak from comment#11 is induced by thinp-test-suite's
> 'test_mass_create_apply_remove_dtrandom_device' test.

But unfortunately, it isn't a reliable reproducer.

Comment 14 Alasdair Kergon 2012-04-12 22:50:48 UTC
I suggest moving this to a separate bugzilla to simplify tracking.

Comment 15 Mike Snitzer 2012-04-13 00:08:11 UTC
(In reply to comment #14)
> I suggest moving this to a separate bugzilla to simplify tracking.

Well, this BZ is suspect as is.  The original leak report is not relevant for RHEL6.3 -- the upstream Linux 3.4 changes aren't in RHEL6.3.

So the only thing this bug would be for (if the leak in comment#11 is moved to a new BZ) is the cleanup to use slab caches in dm-thin.c.

IMHO, that cleanup isn't even really needed given the late date of RHEL6.3 (and the fact that the cleanup doesn't actually fix anything).

Comment 16 Mike Snitzer 2012-04-13 15:04:17 UTC
Here is another stack trace for the leak associated with insert_shadow's allocation of 'struct shadow_info' (first reported in comment#11):

unreferenced object 0xffff88032d7b5880 (size 32):
  comm "dmsetup", pid 6324, jiffies 4296338651 (age 916.002s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 d8 67 4f fd 02 88 ff ff  .........gO.....
    02 00 00 00 00 00 00 00 6d 2d 30 00 ff ff ff ff  ........m-0.....
  backtrace:
    [<ffffffff814a5f4e>] kmemleak_alloc+0x5e/0xc0
    [<ffffffff81144983>] kmem_cache_alloc_trace+0xd3/0x200
    [<ffffffffa05d8125>] insert_shadow+0x35/0x90 [dm_persistent_data]
    [<ffffffffa05d820e>] dm_tm_new_block+0x8e/0xa0 [dm_persistent_data]
    [<ffffffffa05daeab>] new_block+0x1b/0x20 [dm_persistent_data]
    [<ffffffffa05d92e0>] dm_btree_empty+0x30/0xd0 [dm_persistent_data]
    [<ffffffffa05d60de>] sm_ll_new_metadata+0x8e/0xa0 [dm_persistent_data]
    [<ffffffffa05d7db0>] dm_sm_metadata_create+0xe0/0x1e0 [dm_persistent_data]
    [<ffffffffa05d8593>] dm_tm_create_internal+0xe3/0x200 [dm_persistent_data]
    [<ffffffffa05d86db>] dm_tm_create_with_sm+0x2b/0x30 [dm_persistent_data]
    [<ffffffffa054caeb>] init_pmd.clone.0+0x3b/0x370 [dm_thin_pool]
    [<ffffffffa054cf4e>] dm_pool_metadata_open+0x12e/0x2ec [dm_thin_pool]
    [<ffffffffa05490a9>] pool_create+0x39/0x3e0 [dm_thin_pool]
    [<ffffffffa054b056>] pool_ctr+0x296/0x460 [dm_thin_pool]
    [<ffffffffa02dddde>] dm_table_add_target+0x13e/0x3b0 [dm_mod]
    [<ffffffffa02e04f9>] table_load+0xc9/0x340 [dm_mod]

test_discard_random_sectors(DiscardTests): pool status metadata(90/1048576) data(0/262144)
Fri Apr 13 10:53:30 -0400 2012 wiping dev
entering discard loop
Apr 13 10:55:39 rhel-storage-01 kernel: kmemleak: 6 new suspected memory leaks (see /sys/kernel/debug/kmemleak)

Bottom line: running the thinp-test-suite will cause kmemleak to record a leak associated with insert_shadow... but there isn't one specific test that reliably causes it.

Comment 17 Mike Snitzer 2012-04-13 15:13:35 UTC
Any leak fixes will be introduced when the upstream Linux 3.4 thinp code is introduced in RHEL6.4.  Pushing to 6.4.

Comment 19 Mike Snitzer 2012-06-20 18:11:50 UTC
The leak in comment#4 is fixed in this 6.4 patch via bug #828955:
[RHEL6.4 PATCH 23/34] dm thin: fix stacked bi_next usage

The leak in comment#11 has been fixed with this patch:
http://www.redhat.com/archives/dm-devel/2012-June/msg00110.html

Leaving this BZ open for now; it could be used to commit the fix for the leak from comment#11... or closed as a dup of a new BZ that will be opened to commit the 2nd push of DM fixes for RHEL6.4.

Comment 20 Mike Snitzer 2012-07-30 21:13:06 UTC
The fix for the leak in comment#11 was posted to rhkernel-list under bug #828955.

*** This bug has been marked as a duplicate of bug 828955 ***

