Bug 811946
Summary:          dm-thinp: memory leak
Product:          Red Hat Enterprise Linux 6
Component:        kernel
Version:          6.4
Status:           CLOSED DUPLICATE
Severity:         high
Priority:         urgent
Reporter:         Heinz Mauelshagen <heinzm>
Assignee:         Mike Snitzer <msnitzer>
QA Contact:       Cluster QE <mspqa-list>
CC:               agk, dwysocha, heinzm, jbrassow, msnitzer, prajnoha, prockai, thornber, zkabelac
Target Milestone: rc
Hardware:         x86_64
OS:               Linux
Doc Type:         Bug Fix
Type:             Bug
Bug Blocks:       840683
Last Closed:      2012-07-30 21:13:06 UTC
Description (Heinz Mauelshagen, 2012-04-12 11:30:20 UTC)

I'd suggest trying kmemleak (with a Linux debug kernel); it gets you to the leaking code path pretty quickly. The slab that holds all the leaked memory is "size-128".

I was going to try kmemleak today... but Heinz, you're welcome to take ownership of tracking this down (and thanks for opening this BZ). Ah, comment#0 already pointed to slab-128...

A telltale sign of the leak is that the memory is unreclaimable (from /proc/meminfo):

    Slab:         2052224 kB
    SReclaimable:  121952 kB
    SUnreclaim:   1930272 kB

I am using a modified Linux 3.3 kernel with 3.4's DM changes, plus refactoring that split the bio-prison code out to drivers/md/dm-bio-prison.c; all exported bio-prison.c symbols were exported via dm-mod. I ran:

    ./run_tests -n /test_two_pools_can_create_thins/

which resulted in numerous kmemleak reports like the following:

    unreferenced object 0xffff8802b1567f40 (size 128):
      comm "kworker/u:86", pid 651, jiffies 4295253235 (age 302.975s)
      hex dump (first 32 bytes):
        00 01 10 00 00 00 ad de 00 02 20 00 00 00 ad de  .......... .....
        00 80 3b f0 02 88 ff ff 01 00 00 00 03 88 ff ff  ..;.............
      backtrace:
        [<ffffffff814a5f4e>] kmemleak_alloc+0x5e/0xc0
        [<ffffffff811459cf>] __kmalloc+0x13f/0x260
        [<ffffffff810fbe95>] mempool_kmalloc+0x15/0x20
        [<ffffffff810fc040>] mempool_alloc+0x60/0x170
        [<ffffffffa02f99be>] dm_bio_detain+0x10e/0x210 [dm_mod]
        [<ffffffffa06c85b0>] process_bio+0x60/0x170 [dm_thin_pool]
        [<ffffffffa06c8788>] process_deferred_bios+0xc8/0x230 [dm_thin_pool]
        [<ffffffffa06c8942>] do_worker+0x52/0x60 [dm_thin_pool]
        [<ffffffff81057092>] process_one_work+0x132/0x450
        [<ffffffff81058e9b>] worker_thread+0x17b/0x3c0
        [<ffffffff8105dd9e>] kthread+0x9e/0xb0
        [<ffffffff814c66a4>] kernel_thread_helper+0x4/0x10
        [<ffffffffffffffff>] 0xffffffffffffffff

dm_bio_detain+0x10e is drivers/md/dm-bio-prison.c:145, which is dm_bio_detain's:

    cell2 = mempool_alloc(prison->cell_pool, GFP_NOIO);

prison->cell_pool was created in dm_prison_create() with:

    prison->cell_pool = mempool_create_kmalloc_pool(nr_cells, sizeof(struct cell));

sizeof(struct cell) is 72 bytes:

    crash> struct -o cell
    struct cell {
        [0] struct hlist_node list;
       [16] struct bio_prison *prison;
       [24] struct cell_key key;
       [48] struct bio *holder;
       [56] struct bio_list bios;
    }
    SIZE: 72

I'm not yet seeing _why_ there is a leak... but I'll continue looking at the bio-prison code in this area.

Created attachment 577128 [details]
reinstate mempool_free in cell_release_singleton
Patch to try attached.
Also, can we consider mempool_create_slab_pool to keep memory like this segregated, instead of kmalloc?

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

(In reply to comment #5)
> Created attachment 577128 [details]
> reinstate mempool_free in cell_release_singleton
>
> Patch to try attached.

prison is undefined. I took a different approach. Will attach.

Created attachment 577160 [details]
dm thin: fix memory leak of singleton cell
verified to eliminate leak of single cell in upstream Linux 3.4 -- this leak is _not_ a concern for RHEL6.3.
(In reply to comment #6)
> Also, can we consider mempool_create_slab_pool to keep memory like this
> segregated instead of kmalloc?

Yes, I have a patch I'll post to dm-devel. If you'd like this conversion applied to RHEL6.3, it can easily be ported.

kmemleak found what it thinks is another leak (which appears relevant to RHEL6.3's thinp):

    unreferenced object 0xffff88032ae2fb40 (size 32):
      comm "dmsetup", pid 5477, jiffies 4296842262 (age 6021.209s)
      hex dump (first 32 bytes):
        00 00 00 00 00 00 00 00 b0 57 c3 12 03 88 ff ff  .........W......
        03 00 00 00 00 00 00 00 6d 2d 32 00 00 00 00 00  ........m-2.....
      backtrace:
        [<ffffffff814a5f4e>] kmemleak_alloc+0x5e/0xc0
        [<ffffffff81144983>] kmem_cache_alloc_trace+0xd3/0x200
        [<ffffffffa06ba125>] insert_shadow+0x35/0x90 [dm_persistent_data]
        [<ffffffffa06ba20e>] dm_tm_new_block+0x8e/0xa0 [dm_persistent_data]
        [<ffffffffa06b8758>] sm_ll_extend+0xc8/0x120 [dm_persistent_data]
        [<ffffffffa06b9dd3>] dm_sm_metadata_create+0x103/0x1e0 [dm_persistent_data]
        [<ffffffffa06ba593>] dm_tm_create_internal+0xe3/0x200 [dm_persistent_data]
        [<ffffffffa06ba6db>] dm_tm_create_with_sm+0x2b/0x30 [dm_persistent_data]
        [<ffffffffa0357aeb>] init_pmd.clone.0+0x3b/0x370 [dm_thin_pool]
        [<ffffffffa0357f4e>] dm_pool_metadata_open+0x12e/0x2ec [dm_thin_pool]
        [<ffffffffa03540a9>] pool_create+0x39/0x3e0 [dm_thin_pool]
        [<ffffffffa0356056>] pool_ctr+0x296/0x460 [dm_thin_pool]
        [<ffffffffa02fedde>] dm_table_add_target+0x13e/0x3b0 [dm_mod]
        [<ffffffffa03014f9>] table_load+0xc9/0x340 [dm_mod]
        [<ffffffffa0302874>] ctl_ioctl+0x1b4/0x270 [dm_mod]
        [<ffffffffa0302943>] dm_ctl_ioctl+0x13/0x20 [dm_mod]

Only 3 of these objects were leaked, so this leak is considerably rarer than the one from comment#4.
drivers/md/persistent-data/dm-transaction-manager.c:insert_shadow calls:

    si = kmalloc(sizeof(*si), GFP_NOIO);

    crash> struct -o shadow_info
    struct shadow_info {
        [0] struct hlist_node hlist;
       [16] dm_block_t where;
    }
    SIZE: 24

The leak from comment#11 is induced by thinp-test-suite's 'test_mass_create_apply_remove_dtrandom_device' test.

(In reply to comment #12)
> The leak from comment#11 is induced by thinp-test-suite's
> 'test_mass_create_apply_remove_dtrandom_device' test.

But unfortunately, it isn't a reliable reproducer.

I suggest moving this to a separate bugzilla to simplify tracking.

(In reply to comment #14)
> I suggest moving this to a separate bugzilla to simplify tracking.

Well, this BZ is suspect as is. The original leak report is not relevant for RHEL6.3, because the upstream Linux 3.4 changes aren't in RHEL6.3. So if the leak in comment#11 is moved to a new BZ, the only thing left for this bug is the cleanup to use slab caches in dm-thin.c. IMHO, that cleanup isn't really needed given the late date of RHEL6.3 (and the fact that the cleanup doesn't actually fix anything).

Here is another stack trace for the leak associated with insert_shadow's allocation of 'struct shadow_info' (first reported in comment#11):

    unreferenced object 0xffff88032d7b5880 (size 32):
      comm "dmsetup", pid 6324, jiffies 4296338651 (age 916.002s)
      hex dump (first 32 bytes):
        00 00 00 00 00 00 00 00 d8 67 4f fd 02 88 ff ff  .........gO.....
        02 00 00 00 00 00 00 00 6d 2d 30 00 ff ff ff ff  ........m-0.....
      backtrace:
        [<ffffffff814a5f4e>] kmemleak_alloc+0x5e/0xc0
        [<ffffffff81144983>] kmem_cache_alloc_trace+0xd3/0x200
        [<ffffffffa05d8125>] insert_shadow+0x35/0x90 [dm_persistent_data]
        [<ffffffffa05d820e>] dm_tm_new_block+0x8e/0xa0 [dm_persistent_data]
        [<ffffffffa05daeab>] new_block+0x1b/0x20 [dm_persistent_data]
        [<ffffffffa05d92e0>] dm_btree_empty+0x30/0xd0 [dm_persistent_data]
        [<ffffffffa05d60de>] sm_ll_new_metadata+0x8e/0xa0 [dm_persistent_data]
        [<ffffffffa05d7db0>] dm_sm_metadata_create+0xe0/0x1e0 [dm_persistent_data]
        [<ffffffffa05d8593>] dm_tm_create_internal+0xe3/0x200 [dm_persistent_data]
        [<ffffffffa05d86db>] dm_tm_create_with_sm+0x2b/0x30 [dm_persistent_data]
        [<ffffffffa054caeb>] init_pmd.clone.0+0x3b/0x370 [dm_thin_pool]
        [<ffffffffa054cf4e>] dm_pool_metadata_open+0x12e/0x2ec [dm_thin_pool]
        [<ffffffffa05490a9>] pool_create+0x39/0x3e0 [dm_thin_pool]
        [<ffffffffa054b056>] pool_ctr+0x296/0x460 [dm_thin_pool]
        [<ffffffffa02dddde>] dm_table_add_target+0x13e/0x3b0 [dm_mod]
        [<ffffffffa02e04f9>] table_load+0xc9/0x340 [dm_mod]

    test_discard_random_sectors(DiscardTests):
    pool status metadata(90/1048576) data(0/262144)
    Fri Apr 13 10:53:30 -0400 2012
    wiping dev
    entering discard loop
    Apr 13 10:55:39 rhel-storage-01 kernel: kmemleak: 6 new suspected memory leaks (see /sys/kernel/debug/kmemleak)

Bottom line: running the thinp-test-suite will cause kmemleak to record a leak associated with insert_shadow, but there isn't one specific test that reliably causes it.
or closed as a dup of a new BZ that will be opened to commit the 2nd push of DM fixes for RHEL6.4.

The fix for the leak in comment#11 was posted to rhkernel-list using bug#828955.

*** This bug has been marked as a duplicate of bug 828955 ***