Bug 1419684 - Using raid target leaks kernel memory
Summary: Using raid target leaks kernel memory
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: LVM and device-mapper
Classification: Community
Component: device-mapper
Version: 2.02.169
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Nigel Croxon
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 1519577
 
Reported: 2017-02-06 18:27 UTC by Zdenek Kabelac
Modified: 2017-12-07 13:01 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1519577 (view as bug list)
Environment:
Last Closed: 2017-11-30 23:38:15 UTC
Embargoed:
rule-engine: lvm-technical-solution?




Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1315417 0 unspecified CLOSED Potential memory leak when stressing lvm activation operations 2021-02-22 00:41:40 UTC

Internal Links: 1315417

Description Zdenek Kabelac 2017-02-06 18:27:35 UTC
Description of problem:

Running the lvm2 test suite with 'make check_local T=lvconvert-raid.sh' leaks ~336 4K pages, as reported by kmemleak.

unreferenced object 0xffff8f4311e9c000 (size 4096):
  comm "lvm", pid 19333, jiffies 4295263268 (age 528.265s)
  hex dump (first 32 bytes):
    02 80 02 80 02 80 02 80 02 80 02 80 02 80 02 80  ................
    02 80 02 80 02 80 02 80 02 80 02 80 02 80 02 80  ................
  backtrace:
    [<ffffffffa69471ca>] kmemleak_alloc+0x4a/0xa0
    [<ffffffffa628c10e>] kmem_cache_alloc_trace+0x14e/0x2e0
    [<ffffffffa676cfec>] bitmap_checkpage+0x7c/0x110
    [<ffffffffa676d0c5>] bitmap_get_counter+0x45/0xd0
    [<ffffffffa676d6b3>] bitmap_set_memory_bits+0x43/0xe0
    [<ffffffffa676e41c>] bitmap_init_from_disk+0x23c/0x530
    [<ffffffffa676f1ae>] bitmap_load+0xbe/0x160
    [<ffffffffc04c47d3>] raid_preresume+0x203/0x2f0 [dm_raid]
    [<ffffffffa677762f>] dm_table_resume_targets+0x4f/0xe0
    [<ffffffffa6774b52>] dm_resume+0x122/0x140
    [<ffffffffa6779b9f>] dev_suspend+0x18f/0x290
    [<ffffffffa677a3a7>] ctl_ioctl+0x287/0x560
    [<ffffffffa677a693>] dm_ctl_ioctl+0x13/0x20
    [<ffffffffa62d6b46>] do_vfs_ioctl+0xa6/0x750
    [<ffffffffa62d7269>] SyS_ioctl+0x79/0x90
    [<ffffffffa6956d41>] entry_SYSCALL_64_fastpath+0x1f/0xc2


Version-Release number of selected component (if applicable):
kernel  4.10-rcX
dm raid target 1.9.0/1

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Zdenek Kabelac 2017-02-06 18:59:58 UTC
Actually, 'make check_local T=lvcreate-large-raid.sh' is the one that reliably triggers the problem.

Comment 2 Heinz Mauelshagen 2017-02-13 12:19:57 UTC
The reasoning seems to be that bitmap_free() does not free hijacked bitmap pages:
"if (bp[k].map && !bp[k].hijacked)
     kfree(bp[k].map);"

The current theory is that bp[k].hijacked is not being reset properly before the bitmap is destroyed. Hijacking a bitmap page becomes necessary under low-memory conditions, _but_ the page must be freed on destroy or it leaks.
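To make the theory concrete, here is a minimal userspace sketch (not kernel code). It only models the quoted condition: an entry whose hijacked flag is set is skipped on teardown, so a stale flag on an entry that still owns an allocation would leak that page. The names page_entry, free_pages, leaked_after_teardown and the allocation counter are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for one slot of the bitmap page map array. */
struct page_entry {
	void *map;         /* allocated counter page, or NULL */
	unsigned hijacked; /* nonzero: teardown will not free map */
};

static long live_allocs; /* allocations made and not yet freed */

static void *model_alloc(size_t n)
{
	void *p = calloc(1, n);
	if (p)
		live_allocs++;
	return p;
}

static void model_free(void *p)
{
	if (p) {
		live_allocs--;
		free(p);
	}
}

/* Same condition as the snippet quoted from bitmap_free(). */
static void free_pages(struct page_entry *bp, size_t pages)
{
	for (size_t k = 0; k < pages; k++)
		if (bp[k].map && !bp[k].hijacked)
			model_free(bp[k].map);
}

/* Returns how many allocations survive teardown. */
static long leaked_after_teardown(int stale_flag)
{
	struct page_entry bp[4] = {0};
	long leaked;

	live_allocs = 0;
	for (size_t k = 0; k < 4; k++)
		bp[k].map = model_alloc(4096);
	if (stale_flag)
		bp[2].hijacked = 1; /* flag left set although map owns memory */

	free_pages(bp, 4);
	leaked = live_allocs;

	if (stale_flag)
		free(bp[2].map); /* tidy up so the demo itself does not leak */
	return leaked;
}
```

Calling leaked_after_teardown(0) models a clean teardown, while leaked_after_teardown(1) models the suspected stale flag and reports one surviving 4K page.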

Comment 3 Heinz Mauelshagen 2017-02-13 14:37:47 UTC
Instrumented and tested: never hit a set bp[k].hijacked -> theory's wrong.

Comment 4 Heinz Mauelshagen 2017-02-13 16:48:05 UTC
On second thought, the leaks are being detected on the initial bitmap_load(), when kzalloc() is called from bitmap_checkpage(). The result is held in a local (automatic) pointer variable, which is then conditionally either freed or stored in the respective slot of the bitmap page map array. I don't yet see where any leak could occur in that code path.
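The allocate-then-install-or-free pattern described here can be sketched in userspace as follows. This is an assumption-laden model, not the kernel function: checkpage_model, live_after_checkpage and the simulate_race parameter are hypothetical, and the low-memory hijack fallback is deliberately left out.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for one slot of the bitmap page map array. */
struct page_entry {
	void *map;
	unsigned hijacked;
};

static long live_allocs; /* allocations made and not yet freed */

/*
 * Allocate into a local pointer, then either install it in the slot
 * or free it if the slot was populated in the meantime.
 * Returns 0 on success, -1 if the allocation failed.
 */
static int checkpage_model(struct page_entry *slot, int simulate_race)
{
	void *mappage;

	if (slot->map || slot->hijacked)
		return 0; /* slot already usable, nothing to do */

	mappage = calloc(1, 4096);
	if (!mappage)
		return -1; /* hijack fallback not modelled in this sketch */
	live_allocs++;

	if (simulate_race) {
		/* Pretend another context filled the slot meanwhile. */
		slot->map = calloc(1, 4096);
		if (slot->map)
			live_allocs++;
	}

	if (slot->map || slot->hijacked) {
		free(mappage); /* lost the race: drop our copy */
		live_allocs--;
	} else {
		slot->map = mappage; /* install our allocation */
	}
	return 0;
}

/* Returns the number of live allocations after one checkpage call. */
static long live_after_checkpage(int simulate_race)
{
	struct page_entry slot = {0};
	long live;

	live_allocs = 0;
	if (checkpage_model(&slot, simulate_race))
		return -1;
	live = live_allocs;
	free(slot.map); /* tidy up the demo */
	return live;
}
```

In both branches exactly one page stays referenced by the slot, which matches the observation that this path on its own does not obviously leak.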

Comment 5 Zdenek Kabelac 2017-11-03 12:40:41 UTC
I believe the bug is related to resizing the bitmap, where the contents of the old bitmap page map were not released.

With this patch these leaks are fixed:

diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c
index d2121637b4ab..58ee21027709 100644
--- a/drivers/md/bitmap.c
+++ b/drivers/md/bitmap.c
@@ -2152,6 +2152,7 @@ int bitmap_resize(struct bitmap *bitmap, sector_t blocks,
                                for (k = 0; k < page; k++) {
                                        kfree(new_bp[k].map);
                                }
+                               kfree(new_bp);
 
                                /* restore some fields from old_counts */
                                bitmap->counts.bp = old_counts.bp;
@@ -2202,6 +2203,14 @@ int bitmap_resize(struct bitmap *bitmap, sector_t blocks,
                block += old_blocks;
        }
 
+       if (bitmap->counts.bp != old_counts.bp) {
+               unsigned long k;
+               for (k = 0; k < old_counts.pages; k++)
+                       if (!old_counts.bp[k].hijacked)
+                               kfree(old_counts.bp[k].map);
+               kfree(old_counts.bp);
+       }
+
        if (!init) {
                int i;
                while (block < (chunks << chunkshift)) {
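The effect of the second hunk can be illustrated with a simplified userspace model. It is only a sketch under stated assumptions: the struct counts layout, counts_init, counts_free, counts_resize and the release_old switch are hypothetical, and pages are allocated eagerly here, which the kernel code does not necessarily do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct page_entry { void *map; unsigned hijacked; };
struct counts { struct page_entry *bp; size_t pages; };

static long live_allocs; /* allocations made and not yet freed */

static void *track_alloc(size_t n)
{
	void *p = calloc(1, n);
	if (p)
		live_allocs++;
	return p;
}

static void track_free(void *p)
{
	if (p) {
		live_allocs--;
		free(p);
	}
}

static int counts_init(struct counts *c, size_t pages)
{
	c->bp = track_alloc(pages * sizeof(*c->bp));
	if (!c->bp)
		return -1;
	c->pages = pages;
	for (size_t k = 0; k < pages; k++)
		c->bp[k].map = track_alloc(4096);
	return 0;
}

/* Free every non-hijacked page, then the page map array itself. */
static void counts_free(struct counts *c)
{
	if (!c->bp)
		return;
	for (size_t k = 0; k < c->pages; k++)
		if (!c->bp[k].hijacked)
			track_free(c->bp[k].map);
	track_free(c->bp);
	c->bp = NULL;
	c->pages = 0;
}

/* Swap in a new page map; release_old models the added hunk. */
static int counts_resize(struct counts *c, size_t new_pages, int release_old)
{
	struct counts old = *c;
	struct counts fresh;

	if (counts_init(&fresh, new_pages))
		return -1;
	*c = fresh;
	if (release_old && c->bp != old.bp)
		counts_free(&old);
	return 0;
}

/* Returns allocations still live after resize plus final teardown.
 * With release_old == 0 the demo really leaks, on purpose. */
static long leaked_after_resize(int release_old)
{
	struct counts c;

	live_allocs = 0;
	if (counts_init(&c, 3))
		return -1;
	if (counts_resize(&c, 5, release_old)) {
		counts_free(&c);
		return -1;
	}
	counts_free(&c);
	return live_allocs;
}
```

Without the release step the old array and its three pages are orphaned (four allocations in this model); with it, nothing remains after teardown.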

Comment 6 Heinz Mauelshagen 2017-11-30 23:33:00 UTC
MD upstream kernel commit 0868b99c214a3d55486c700de7c3f770b7243e7c needs to be backported to rhel7 kernel.

