+++ This bug was initially created as a clone of Bug #1015024 +++

Description of problem:
Conversion of an lvm2 raid LV seems to trigger kmem_cache_sanity_check and results in a failure upon dm table reload:

DEBUG: ioctl/libdm-iface.c:1750 dm table (253:11) OF [16384] (*1)
DEBUG: libdm-deptree.c:2511 Suppressed @PREFIX@vg-LV1_rmeta_0 (253:11) identical table reload.
DEBUG: libdm-deptree.c:2476 Loading @PREFIX@vg-LV1 table (253:19)
DEBUG: libdm-deptree.c:2420 Adding target to (253:19): 0 1536 raid raid5_ls 3 128 region_size 512 4 253:11 253:12 253:20 253:21 253:15 253:16 253:17 253:18
DEBUG: ioctl/libdm-iface.c:1750 dm table (253:19) OF [16384] (*1)
DEBUG: ioctl/libdm-iface.c:1750 dm reload (253:19) NF [16384] (*1)
DEBUG: ioctl/libdm-iface.c:1768 device-mapper: reload ioctl on failed: Input/output error
DEBUG: libdm-deptree.c:2572 <backtrace>
DEBUG: activate/dev_manager.c:2680 <backtrace>
DEBUG: activate/dev_manager.c:2728 <backtrace>
DEBUG: activate/activate.c:1078 <backtrace>
DEBUG: activate/activate.c:1676 <backtrace>
DEBUG: locking/locking.c:394 <backtrace>
DEBUG: locking/locking.c:464 <backtrace>
DEBUG: metadata/raid_manip.c:1751 Failed to suspend @PREFIX@vg/LV1 before committing changes
DEBUG: lvconvert.c:2638 <backtrace>
#lvconvert-raid.sh:238+ lvconvert --replace @TESTDIR@/dev/mapper/@PREFIX@pv2 @PREFIX@vg/LV1

mdX: bitmap initialized from disk: read 1 pages, set 0 of 1 bits
md: recovery of RAID array mdX
md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
md: using 128k window, over a total of 256k.
@TESTDIR@/dev/mapper/@PREFIX@vg-LV1_rmeta_4 not set up by udev: Falling back to direct node creation.
@TESTDIR@/dev/mapper/@PREFIX@vg-LV1_rimage_4 not set up by udev: Falling back to direct node creation.
md: mdX: recovery done.
md/raid:mdX: device dm-18 operational as raid disk 3
md/raid:mdX: device dm-16 operational as raid disk 2
md/raid:mdX: device dm-12 operational as raid disk 0
kmem_cache_sanity_check (raid5-ffff88004ae96010): Cache name already exists.
CPU: 0 PID: 14234 Comm: lvm Not tainted 3.11.1-300.fc20.x86_64 #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
 00007ffffffff000 ffff8800757979f0 ffffffff81643cbb ffff880076f3df00
 ffff880075797a60 ffffffff8115d45d 0000000000000000 0000000000000000
 00000000000005d0 0000000000000000 ffff8800371a5cd0 ffff880075797fd8
Call Trace:
 [<ffffffff81643cbb>] dump_stack+0x45/0x56
 [<ffffffff8115d45d>] kmem_cache_create_memcg+0x12d/0x380
 [<ffffffff8115d6db>] kmem_cache_create+0x2b/0x30
 [<ffffffffa012aa05>] setup_conf+0x5c5/0x790 [raid456]
 [<ffffffff8113ebcd>] ? mempool_create_node+0xdd/0x140
 [<ffffffff8118e527>] ? kmem_cache_alloc_trace+0x1d7/0x230
 [<ffffffff8113e8b0>] ? mempool_alloc_slab+0x20/0x20
 [<ffffffffa012b748>] run+0x858/0xa50 [raid456]
 [<ffffffff811dbff6>] ? bioset_create+0x216/0x2e0
 [<ffffffff8113d365>] ? filemap_write_and_wait+0x55/0x60
 [<ffffffff814daecc>] md_run+0x3fc/0x980
 [<ffffffff811dc76d>] ? bio_put+0x7d/0xa0
 [<ffffffff814d2bb8>] ? sync_page_io+0xc8/0x140
 [<ffffffffa014051c>] raid_ctr+0xecc/0x135d [dm_raid]
 [<ffffffff814e6527>] dm_table_add_target+0x167/0x470
 [<ffffffff814e952b>] table_load+0x10b/0x320
 [<ffffffff814e9420>] ? list_devices+0x180/0x180
 [<ffffffff814ea315>] ctl_ioctl+0x255/0x500
 [<ffffffff814ea5d3>] dm_ctl_ioctl+0x13/0x20
 [<ffffffff811b9bdd>] do_vfs_ioctl+0x2dd/0x4b0
 [<ffffffff811aa3a1>] ? __sb_end_write+0x31/0x60
 [<ffffffff811a7f92>] ? vfs_write+0x172/0x1e0
 [<ffffffff811b9e31>] SyS_ioctl+0x81/0xa0
 [<ffffffff81652e59>] system_call_fastpath+0x16/0x1b
md/raid:mdX: couldn't allocate 4338kB for buffers
md: pers->run() failed ...
device-mapper: table: 253:19: raid: Fail to run raid array
device-mapper: ioctl: error adding target to table
device-mapper: reload ioctl on failed: Input/output error
Failed to suspend @PREFIX@vg/LV1 before committing changes
Node @TESTDIR@/dev/mapper/@PREFIX@vg-LV1_rmeta_1 was not removed by udev. Falling back to direct node removal.
Node @TESTDIR@/dev/mapper/@PREFIX@vg-LV1_rimage_1 was not removed by udev. Falling back to direct node removal.

Version-Release number of selected component (if applicable):
3.11.1-300.fc20.x86_64

How reproducible:
The lvconvert-raid test from the internal lvm2 test suite occasionally triggers this issue; it appears to be timing-dependent.

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:

--- Additional comment from Jonathan Earl Brassow on 2013-10-21 22:00:50 EDT ---

Bug fixed by the following upstream commit:

commit 3e374919b314f20e2a04f641ebc1093d758f66a4
Author: Christoph Lameter <cl>
Date:   Sat Sep 21 21:56:34 2013 +0000

    slab_common: Do not check for duplicate slab names

    SLUB can alias multiple kmem_cache_create requests to one slab cache to save
    memory and increase the cache hotness. As a result the name of the slab can be
    stale. Only check the name for duplicates if we are in debug mode, where we do
    not merge multiple caches.

    This fixes the following problem reported by Jonathan Brassow:

    The problem with kmem_cache* is this:

    *) Assume CONFIG_SLUB is set

    1) kmem_cache_create(name="foo-a")
       - creates a new kmem_cache structure

    2) kmem_cache_create(name="foo-b")
       - If the cache characteristics are identical, it will be merged with the
         previously created cache associated with "foo-a". The cache's refcount
         is incremented and an alias is created via sysfs_slab_alias().

    3) kmem_cache_destroy(<ptr>)
       - Attempting to destroy the cache associated with "foo-a", but instead the
         refcount is simply decremented. I don't even think the sysfs aliases are
         ever removed...

    4) kmem_cache_create(name="foo-a")
       - This FAILS because kmem_cache_sanity_check collides with the existing
         name ("foo-a") associated with the non-removed cache.

    This is a problem for RAID (specifically dm-raid) because the name used for
    kmem_cache_create is ("raid%d-%p", level, mddev). If the cache persists for
    long enough, the memory address of an old mddev will be reused for a new
    mddev, causing an identical formulation of the cache name. Even though
    kmem_cache_destroy had long ago been used to delete the old cache, the
    merging of caches has caused the name and cache of that old instance to be
    preserved, which causes a collision (and thus failure) in
    kmem_cache_create(). I see this regularly in my testing.

    Reported-by: Jonathan Brassow <jbrassow>
    Signed-off-by: Christoph Lameter <cl>
    Signed-off-by: Pekka Enberg <penberg>

--- Additional comment from Jonathan Earl Brassow on 2013-10-21 22:01:43 EDT ---

I will leave this in 'NEW' until someone pulls it into Fedora.

--- Additional comment from Josh Boyer on 2013-10-22 08:51:34 EDT ---

This patch needs to be backported to the 3.11 kernel. The bug is reported against rawhide, and rawhide already has this commit. Does this need to be sent to upstream stable?
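To make the failure sequence from the commit message easier to follow, below is a minimal userspace simulation of the create/merge/destroy/create cycle. It is not kernel code: fake_cache, cache_create and cache_destroy are invented stand-ins for kmem_cache and its create/destroy paths, and SLUB's merge heuristic is reduced to "merge with the first existing cache". In the real failure, dm-raid's setup_conf() names its cache ("raid%d-%p", level, mddev), so a recycled mddev address regenerates an earlier name exactly.

/* Userspace sketch of the SLUB aliasing problem described above.
 * All names here are illustrative; the real logic is in mm/slab_common.c
 * and mm/slub.c. Build with: cc -std=c99 -o slub-alias slub-alias.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct fake_cache {
    char name[64];
    int refcount;              /* bumped instead of creating a new cache when merged */
    struct fake_cache *next;
};

static struct fake_cache *caches;   /* global list, like slab_caches */

/* Stand-in for kmem_cache_sanity_check(): is the name already registered? */
static int name_exists(const char *name)
{
    for (struct fake_cache *c = caches; c; c = c->next)
        if (strcmp(c->name, name) == 0)
            return 1;
    return 0;
}

/* Simplified kmem_cache_create(): a "compatible" request is merged with an
 * existing cache, which keeps the old cache's name and only gains an alias. */
static struct fake_cache *cache_create(const char *name)
{
    if (name_exists(name)) {
        fprintf(stderr, "kmem_cache_sanity_check (%s): Cache name already exists.\n", name);
        return NULL;
    }
    if (caches) {                               /* merge path */
        caches->refcount++;
        printf("merged '%s' into '%s' (refcount %d)\n", name, caches->name, caches->refcount);
        return caches;
    }
    struct fake_cache *c = calloc(1, sizeof(*c));
    snprintf(c->name, sizeof(c->name), "%s", name);
    c->refcount = 1;
    c->next = caches;
    caches = c;
    printf("created cache '%s'\n", name);
    return c;
}

/* Simplified kmem_cache_destroy(): a merged cache only drops the refcount,
 * so the original name stays registered. Full teardown is omitted. */
static void cache_destroy(struct fake_cache *c)
{
    if (--c->refcount > 0)
        printf("'%s' still aliased (refcount %d); name stays registered\n",
               c->name, c->refcount);
}

int main(void)
{
    struct fake_cache *a = cache_create("foo-a");   /* step 1: new cache            */
    cache_create("foo-b");                          /* step 2: merged, refcount++   */
    cache_destroy(a);                               /* step 3: refcount--, name stays */
    if (!cache_create("foo-a"))                     /* step 4: name collision, fails  */
        return 1;
    return 0;
}

Compiled and run, the last cache_create() call fails with the same "Cache name already exists" message that appears in the oops above, which is why relaxing the duplicate-name check for merged (non-debug) caches resolves the bug.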
Without this patch, F19 randomly fails when lvm2 is using md-raid mirrors.
I applied the patch. My question about upstream stable still stands though.
kernel-3.11.9-300.fc20 has been submitted as an update for Fedora 20. https://admin.fedoraproject.org/updates/kernel-3.11.9-300.fc20
kernel-3.11.9-200.fc19 has been submitted as an update for Fedora 19. https://admin.fedoraproject.org/updates/kernel-3.11.9-200.fc19
kernel-3.11.9-100.fc18 has been submitted as an update for Fedora 18. https://admin.fedoraproject.org/updates/kernel-3.11.9-100.fc18
Package kernel-3.11.9-100.fc18:
* should fix your issue,
* was pushed to the Fedora 18 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-3.11.9-100.fc18'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-21822/kernel-3.11.9-100.fc18
then log in and leave karma (feedback).
kernel-3.11.9-200.fc19 has been pushed to the Fedora 19 stable repository. If problems still persist, please make note of it in this bug report.
kernel-3.11.9-300.fc20 has been pushed to the Fedora 20 stable repository. If problems still persist, please make note of it in this bug report.
kernel-3.11.9-100.fc18 has been pushed to the Fedora 18 stable repository. If problems still persist, please make note of it in this bug report.