Description of problem:
Starting with kernel 4.1, lvm2 can no longer properly create raid1 LVs using the dm-raid target (the md raid1 wrapping target). Here is the log of a failing lv creation:

device-mapper: raid: Superblocks created for new array
md/raid1:mdX: active with 2 out of 2 mirrors
Choosing daemon_sleep default (5 sec)
created bitmap (32 pages) for device mdX
attempt to access beyond end of device
dm-6: rw=13329, want=0, limit=16384
md: super_written gets error=-5, uptodate=0
md/raid1:mdX: Disk failure on dm-7, disabling device.
md/raid1:mdX: Operation continuing on 1 devices.
attempt to access beyond end of device
dm-4: rw=13329, want=0, limit=16384
md: super_written gets error=-5, uptodate=0
attempt to access beyond end of device
dm-4: rw=13329, want=0, limit=16384
md: super_written gets error=-5, uptodate=0
mdX: bitmap file is out of date, doing full recovery
mdX: bitmap initialized from disk: read 3 pages, set 65536 of 65536 bits
RAID1 conf printout:
 --- wd:1 rd:2
 disk 0, wo:0, o:1, dev:dm-5
 disk 1, wo:1, o:0, dev:dm-7
RAID1 conf printout:
 --- wd:1 rd:2
 disk 0, wo:0, o:1, dev:dm-5

From what I've experienced: 'lvcreate -L31G vg [--alloc anywhere]' still works, while using size -L32G fails. The user can work around it by using a bigger region size ('--regionsize 1M'), although with a 4.0 kernel even a 512K region size normally works. Also, the messages produced by the kernel are somewhat confusing, since the created bitmap sizes and error messages do not quite correspond to the passed devices, their sizes, and the amount of free space they have to hold bitmaps. From my experiments, 4.0 was the last kernel where the standard arguments used for raid1 creation worked. With the mdraid changes introduced in the 4.1 kernel it no longer works.
If the user tries a size like 1TB, interesting overflow errors can even be seen, i.e.:

attempt to access beyond end of device
dm-6: rw=13329, want=18446744073709551120, limit=64
md: super_written gets error=-5, uptodate=0

The bad part for lvm2 is that this goes completely unnoticed by the lvcreate command - it returns success and no error is printed - even though what has technically been created is a plain 'stripe' device with a dead leg, which can only be noticed if the user is familiar with the attributes reported by lvs.

Version-Release number of selected component (if applicable):
4.2 kernel

How reproducible:

Steps to Reproduce:
1. lvcreate --type raid1 -L32G [--alloc anywhere] --nosync vg
2.
3.

Actual results:
lvcreate pretends there is a working raid1

Expected results:
lvcreate creates a working raid1 and, if it can't, reports a correct error.

Additional info:
Neil Brown suggests:
> Probably bug fixed by:
> commit d3b178adb3a3adf54ecf77758138b654c3ee7f09
> when dm-raid created a bitmap it didn't zero out unused fields.
> So when we started using another field it confused dm-raid.
> The patch zeros things properly, and ignores the field when dm-raid is
> in use.

However you told me you still saw this bug with current upstream kernels, including 4.2-rc8. Is that correct?
Tested on 4.2.0-0.rc8.git1.1.fc24.x86_64
Created attachment 1068013 [details]
Oops stack trace

Here is a reproduced kernel oops with a recent 4.2-rc8 rawhide kernel. The oops is reached via lvm2 test suite usage:

make check_local T=lvcreate-large-raid.sh

So the invalid 'want=' not only causes unusable raid volumes at sizes of 32G and bigger, but may also lead to a complete kernel deadlock.
This is the md bitmap flaw I got this workaround for:

diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c
index e51de52..7bc7595 100644
--- a/drivers/md/bitmap.c
+++ b/drivers/md/bitmap.c
@@ -446,6 +446,7 @@ void bitmap_update_sb(struct bitmap *bitmap)
 	sb->sectors_reserved = cpu_to_le32(bitmap->mddev->bitmap_info.space);
 	kunmap_atomic(sb);
+	bitmap->storage.sb_page->index = 0;
 	write_page(bitmap, bitmap->storage.sb_page, 1);

Neil is aware of it and is investigating as well.
I hit this on my Fedora workstation with kernel-4.1.6-201.fc22.x86_64 and lvm2-2.02.116-3.fc22.x86_64.

I guess there are really two issues here... The first is that the superblock is not written properly on creation but LVM still starts the volume, so the user will then go ahead and create a filesystem and fill it with data. But then some time later, on the next reboot, the volume is unrecoverable. I lost some data because of this, so it seems like quite a serious bug. Maybe LVM could stop and re-assemble the array as part of the creation process? That would catch these kinds of problems where the array starts but cannot actually be assembled again.

The underlying problem is that the dm raid1 code in the kernel is supposed to write a superblock when the array is created, but it's not doing that correctly. Is that right? I found a related patch posted to the raid list:

http://thread.gmane.org/gmane.linux.raid/49398

but this seems to be a workaround to deal with the bad superblocks at assembly time, not a fix for why the superblocks are wrong in the first place?
(In reply to Heinz Mauelshagen from comment #4)
> This is the md bitmap flaw I got this workaround for:

This patch fixes the problem. Now I can create raid1 volumes without errors and they are successfully deactivated and re-activated. Seems like the superblock is being written properly now.

BTW why is the patch in a private comment?
(In reply to Dan Callaghan from comment #6)
> This patch fixes the problem. Now I can create raid1 volumes without errors
> and they are successfully deactivated and re-activated. Seems like the
> superblock is being written properly now.

I was using a 4.1.7 Fedora kernel plus the patch in comment 4.
Re: comment 5.
That patch does fix the problem of why the superblocks are wrong, by adding the __GFP_ZERO flag so that the unused fields in the superblock are zeroed. The patch went upstream as the commit mentioned in comment 1.

Re: comment 3.
The stack trace shows that a page->index was 00000000fffffff1 (eax>>2). I think that is a very different bug, and probably not an md bug. It is hard to be sure though.
(In reply to Dan Callaghan from comment #6)
> BTW why is the patch in a private comment?

All made public.
Neil meanwhile found the flaw (i.e. bitmap->cluster_slot containing -1 in the non-clustered case) worked around by my patch as of comment #4:

diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c
index 50c9373cebdd..4f22e919787a 100644
--- a/drivers/md/bitmap.c
+++ b/drivers/md/bitmap.c
@@ -1995,7 +1995,8 @@ int bitmap_resize(struct bitmap *bitmap, sector_t blocks,
 	if (bitmap->mddev->bitmap_info.offset || bitmap->mddev->bitmap_info.file)
 		ret = bitmap_storage_alloc(&store, chunks,
 					   !bitmap->mddev->bitmap_info.external,
-					   bitmap->cluster_slot);
+					   mddev_is_clustered(bitmap->mddev)
+					   ? bitmap->cluster_slot : 0);
 	if (ret)
 		goto err;
Is there any chance to get this bug fixed in Fedora 22? With the current kernel 4.2.3-200.fc22.x86_64 I can't create new RAID1 LVs.
After updating Fedora 22 to kernel 4.2.6-201.fc22.x86_64 it seems to be fixed. I tested it and it was possible to create an LV >32GB.
(In reply to Frank Haefemeier from comment #12)
> After updating Fedora 22 to kernel 4.2.6-201.fc22.x86_64 it seems to be
> fixed. I tested it and it was possible to create an LV >32GB.

Fixed with upstream commit da6fb7a9e. WFM with Fedora kernel 4.2.6-200.fc22.x86_64, closing.