Description of problem:
When kernel memory gets fragmented after a couple of days of heavy use, the thin-pool target starts to fail with this allocation error:

lvm: page allocation failure: order:5, mode:0x40d0
CPU: 0 PID: 15987 Comm: lvm Tainted: G W 4.0.0-0.rc7.git2.1.fc23.x86_64 #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
 0000000000000000 0000000080956201 ffff8800007bf8c8 ffffffff81781898
 0000000000000000 00000000000040d0 ffff8800007bf958 ffffffff811a5bbe
 ffff88007a5eab08 0000000000000005 0000000000000040 ffff88002723d970
Call Trace:
 [<ffffffff81781898>] dump_stack+0x45/0x57
 [<ffffffff811a5bbe>] warn_alloc_failed+0xfe/0x170
 [<ffffffff811a9523>] ? __alloc_pages_direct_compact+0x43/0x100
 [<ffffffff811a9b17>] __alloc_pages_nodemask+0x537/0xa10
 [<ffffffff811f2c21>] alloc_pages_current+0x91/0x110
 [<ffffffff811a5d5b>] alloc_kmem_pages+0x3b/0xf0
 [<ffffffffa012d8ff>] ? dm_bm_unlock+0x2f/0x60 [dm_persistent_data]
 [<ffffffff811c3f6e>] kmalloc_order_trace+0x2e/0xd0
 [<ffffffffa0146766>] pool_ctr+0x486/0x9d0 [dm_thin_pool]
 [<ffffffff815fa65b>] dm_table_add_target+0x15b/0x3b0
 [<ffffffff815fa1b7>] ? dm_table_create+0x87/0x140
 [<ffffffff815fde4b>] table_load+0x14b/0x370
 [<ffffffff815fdd00>] ? retrieve_status+0x1c0/0x1c0
 [<ffffffff815feaf2>] ctl_ioctl+0x232/0x520
 [<ffffffff815fedf3>] dm_ctl_ioctl+0x13/0x20
 [<ffffffff81231d86>] do_vfs_ioctl+0x2c6/0x4d0
 [<ffffffff8114085c>] ? __audit_syscall_entry+0xac/0x100
 [<ffffffff810225d5>] ? do_audit_syscall_entry+0x55/0x80
 [<ffffffff81232011>] SyS_ioctl+0x81/0xa0
 [<ffffffff81788188>] ? int_check_syscall_exit_work+0x34/0x3d
 [<ffffffff81787f49>] system_call_fastpath+0x12/0x17
Mem-Info:
Node 0 DMA per-cpu:
CPU 0: hi: 0, btch: 1 usd: 0
CPU 1: hi: 0, btch: 1 usd: 0
Node 0 DMA32 per-cpu:
CPU 0: hi: 186, btch: 31 usd: 0
CPU 1: hi: 186, btch: 31 usd: 30
active_anon:28189 inactive_anon:24979 isolated_anon:0
 active_file:119411 inactive_file:79073 isolated_file:0
 unevictable:4714 dirty:441 writeback:0 unstable:0
 free:26120 slab_reclaimable:119444 slab_unreclaimable:46909
 mapped:12817 shmem:32535 pagetables:1146 bounce:0
 free_cma:0
Node 0 DMA free:8060kB min:364kB low:452kB high:544kB active_anon:1144kB inactive_anon:1212kB active_file:208kB inactive_file:164kB unevictable:44kB isolated(anon):0kB isolated(file):0kB present:15984kB managed:15900kB mlocked:40kB dirty:0kB writeback:0kB mapped:0kB shmem:1752kB slab_reclaimable:2268kB slab_unreclaimable:1028kB kernel_stack:208kB pagetables:8kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:8 all_unreclaimable? no
lowmem_reserve[]: 0 1893 1893 1893
Node 0 DMA32 free:96420kB min:44688kB low:55860kB high:67032kB active_anon:111612kB inactive_anon:98704kB active_file:477436kB inactive_file:316128kB unevictable:18812kB isolated(anon):0kB isolated(file):0kB present:1988596kB managed:1941380kB mlocked:18800kB dirty:1764kB writeback:0kB mapped:51268kB shmem:128388kB slab_reclaimable:475508kB slab_unreclaimable:186608kB kernel_stack:2416kB pagetables:4576kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:640 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Node 0 DMA: 116*4kB (UEM) 39*8kB (UEM) 7*16kB (U) 9*32kB (UM) 8*64kB (UE) 10*128kB (UEM) 10*256kB (UM) 1*512kB (U) 0*1024kB 1*2048kB (R) 0*4096kB = 8088kB
Node 0 DMA32: 19355*4kB (UEM) 840*8kB (UEM) 346*16kB (UEM) 9*32kB (UM) 3*64kB (M) 1*128kB (M) 0*256kB 0*512kB 0*1024kB 1*2048kB (R) 1*4096kB (R) = 96428kB
Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
234220 total pagecache pages
1507 pages in swap cache
Swap cache stats: add 11781, delete 10274, find 266363/269302
Free swap = 1031824kB
Total swap = 1048572kB
501145 pages RAM
0 pages HighMem/MovableOnly
11825 pages reserved
0 pages hwpoisoned
device-mapper: table: 253:14: thin-pool: Error allocating memory for pool
device-mapper: ioctl: error adding target to table
device-mapper: reload ioctl on (253:14) failed: Cannot allocate memory

--

## DEBUG: libdm-deptree.c:2646   Loading @PREFIX@vg-LV1 table (253:14)
## DEBUG: libdm-deptree.c:2590   Adding target to (253:14): 0 65536 thin-pool 253:11 253:13 128 0 0
## DEBUG: ioctl/libdm-iface.c:1802   dm table (253:14) OF [16384] (*1)
## DEBUG: ioctl/libdm-iface.c:1802   dm reload (253:14) NF [16384] (*1)
## DEBUG: ioctl/libdm-iface.c:1834   device-mapper: reload ioctl on (253:14) failed: Cannot allocate memory
lvm2-> Failed to activate pool logical volume @PREFIX@vg/LV1.

Version-Release number of selected component (if applicable):
kernel 4.0
lvm2 2.02.120

How reproducible:
Run the lvm2 test suite for days without rebooting the machine. test/shell/lvconvert-thin.sh in particular often fails at these lines:

## Line: 140  lvcreate -L32 -n $lv1 $vg
## Line: 141  lvcreate -L16 -n $lv2 $vg
## Line: 142  lvconvert --yes --thinpool $vg/$lv1

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
Created attachment 1030538 [details]
Full test trace with the kernel memory allocation failure report visible
Order 5 is only 128k of memory. Probably allocated by one of the slabs. At what point do you suggest I stop using kmalloc and switch to vmalloc()? 32k? 4k? Or are you suggesting the fragmentation is due to thinp?
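(For reference, assuming the x86-64 default 4 KiB page size: an order-5 allocation is 2^5 contiguous pages = 32 x 4 KiB = 128 KiB.)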
Mikulas, do you have some advice here?
We could try kmalloc() and, if it fails, use vmalloc(). The code that tries kmalloc and falls back to vmalloc is already present in dm-ioctl.c, in the function copy_params. So we should extract that piece of code into a separate function and call it from dm-thin.c as well. I will write a patch that does that.
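A minimal sketch of that pattern, assuming ordinary GFP_KERNEL context; the helper name and exact flag choices here are illustrative assumptions, not the actual patch (the patch set posted later in this bug names its helper dm_kvmalloc):

#include <linux/slab.h>
#include <linux/vmalloc.h>

/*
 * Illustrative sketch of the "try kmalloc, fall back to vmalloc"
 * pattern described above. The function name and gfp flags are
 * assumptions; the real patch set introduces dm_kvmalloc().
 */
static void *alloc_contig_or_virt(size_t size)
{
	void *p = NULL;

	/*
	 * Try a physically contiguous allocation first, but suppress
	 * the allocation-failure warning and the costly retry loop:
	 * a high-order kmalloc is expected to fail on a fragmented
	 * machine, and we have a fallback.
	 */
	if (size <= KMALLOC_MAX_SIZE)
		p = kmalloc(size, GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY);

	/* Fall back to a virtually contiguous allocation. */
	if (!p)
		p = vmalloc(size);

	return p;
}

Callers would free the result with kvfree(), which dispatches to kfree() or vfree() as appropriate.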
https://github.com/jthornber/linux-2.6/commit/356f917245d0b61faf1204344f5d7fb5bac1ae21
I posted a patch set for this bug. The patch set extracts the common pattern (try kmalloc, fall back to vmalloc) into a single function, dm_kvmalloc, and calls it from several places in device mapper.

https://www.redhat.com/archives/dm-devel/2015-July/msg00004.html
https://www.redhat.com/archives/dm-devel/2015-July/msg00005.html
https://www.redhat.com/archives/dm-devel/2015-July/msg00006.html
https://www.redhat.com/archives/dm-devel/2015-July/msg00007.html
https://www.redhat.com/archives/dm-devel/2015-July/msg00008.html
https://www.redhat.com/archives/dm-devel/2015-July/msg00009.html
https://www.redhat.com/archives/dm-devel/2015-July/msg00010.html
https://www.redhat.com/archives/dm-devel/2015-July/msg00011.html
Do you want the Fedora kernel to carry any of these patches before they land upstream, or should we just wait for them to be merged?

The kernel tested in the original comment is "old" now, so we'd be looking at adding whatever is needed to rawhide first if something were to be added.
(In reply to Josh Boyer from comment #7)
> Do you want the Fedora kernel to carry any of these patches before they
> land upstream, or should we just wait for them to be merged?
>
> The kernel tested in the original comment is "old" now, so we'd be looking
> at adding whatever is needed to rawhide first if something were to be added.

I'll be sending a set of 4.2-rc fixes to Linus next week. AFAIK this issue isn't so common as to warrant a rush _now_. But this is the fix that is staged and destined for upstream (and stable@):

https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=for-next&id=a822c83e47d97cdef38c4352e1ef62d9f46cfe98

You're welcome to pick it up and carry it in Fedora rawhide now if you like. The patches that Mikulas listed in comment #6 have been reworked somewhat and will hopefully land in Linux 4.3 (dm-thinp will be adapted accordingly at that time).
The fix has been upstream (and in Fedora) since July 2015.