Bug 1257636 - Unable to create RAID1 > 32GB with kernels v4.1 - v4.3-rc2-27-gda6fb7a
Status: CLOSED CURRENTRELEASE
Product: Fedora
Classification: Fedora
Component: lvm2
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assigned To: Heinz Mauelshagen
QA Contact: Fedora Extras Quality Assurance
Depends On:
Blocks:
Reported: 2015-08-27 09:56 EDT by Zdenek Kabelac
Modified: 2015-12-17 12:43 EST
CC: 16 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-12-17 12:43:48 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
Ooops stack trace (7.35 KB, text/plain)
2015-08-28 10:08 EDT, Zdenek Kabelac

Description Zdenek Kabelac 2015-08-27 09:56:07 EDT
Description of problem:

Starting with kernel 4.1, lvm2 can no longer properly create raid1 LVs using the dm-raid target that wraps MD raid1.

Here is the kernel log of a failing LV creation:

device-mapper: raid: Superblocks created for new array
md/raid1:mdX: active with 2 out of 2 mirrors
Choosing daemon_sleep default (5 sec)
created bitmap (32 pages) for device mdX
attempt to access beyond end of device
dm-6: rw=13329, want=0, limit=16384
md: super_written gets error=-5, uptodate=0
md/raid1:mdX: Disk failure on dm-7, disabling device.
md/raid1:mdX: Operation continuing on 1 devices.
attempt to access beyond end of device
dm-4: rw=13329, want=0, limit=16384
md: super_written gets error=-5, uptodate=0
attempt to access beyond end of device
dm-4: rw=13329, want=0, limit=16384
md: super_written gets error=-5, uptodate=0
mdX: bitmap file is out of date, doing full recovery
mdX: bitmap initialized from disk: read 3 pages, set 65536 of 65536 bits
RAID1 conf printout:
 --- wd:1 rd:2
 disk 0, wo:0, o:1, dev:dm-5
 disk 1, wo:1, o:0, dev:dm-7
RAID1 conf printout:
 --- wd:1 rd:2
 disk 0, wo:0, o:1, dev:dm-5


From what I've experienced:

lvcreate -L31G vg  [--alloc anywhere]    still works,
while using size -L32G fails.

The user can succeed by using a bigger '--regionsize 1M' in this case; however, with the 4.0 kernel even a 512K region size normally works.

Also, the messages produced by the kernel are somewhat confusing, since the reported bitmap sizes and error messages do not quite correspond to the devices passed in, their sizes, and the amount of free space they have to hold bitmaps.
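For what it's worth, the numbers in the log do line up if one assumes one bitmap bit per region, 2-byte in-memory counters, and 4 KiB pages (these constants are my assumption about the md bitmap code, not something stated in the log). A quick sketch:

```shell
# Bitmap sizing sketch for -L32G with a 512K region size.
# Assumed constants (mine): 1 bit per region, 2-byte counters, 4 KiB pages.
lv_bytes=$((32 * 1024 * 1024 * 1024))   # -L32G
region_bytes=$((512 * 1024))            # 512K region size
regions=$((lv_bytes / region_bytes))    # one bitmap bit per region
pages=$((regions * 2 / 4096))           # 2-byte counters packed into 4K pages
echo "regions=$regions pages=$pages"    # regions=65536 pages=32
# Matches the log: "created bitmap (32 pages)" and "set 65536 of 65536 bits".
# Doubling the region size to 1M halves the bit count, which is why the
# --regionsize 1M workaround changes the outcome.
```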

From my experiments, the 4.0 kernel was the last one where the standard arguments used for raid1 creation worked.

With the mdraid changes introduced in the 4.1 kernel, it no longer works.

If the user tries an even bigger size like 1TB, interesting overflow errors are seen, e.g.:

attempt to access beyond end of device
dm-6: rw=13329, want=18446744073709551120, limit=64
md: super_written gets error=-5, uptodate=0
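That want= value is 2^64 - 496, i.e. a small negative offset printed with an unsigned 64-bit format (this interpretation is mine, not confirmed in the thread). It can be checked in shell, whose arithmetic is 64-bit signed:

```shell
# 18446744073709551120 == 2^64 - 496: the offset -496 printed as an
# unsigned 64-bit number. printf '%u' shows the same wrap-around.
want=$(printf '%u' -496)
echo "$want"   # 18446744073709551120
```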


--

The bad part for lvm2 is that this goes completely unnoticed by the lvcreate command: it returns success and no error is printed, even though what is technically created is a plain 'stripe' device with a dead leg, which can only be noticed by a user familiar with the attributes reported by lvs.
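For reference, the dead leg is visible in the lvs attr field: its 9th character is the volume health bit, where 'p' means partial (a device is missing). A minimal sketch; the attr strings below are illustrative examples, not captured output:

```shell
# The 9th character of the lvs 'attr' string is the volume health bit:
# '-' healthy, 'p' partial (missing device), 'r' refresh needed.
health_of() { printf '%s' "$1" | cut -c9; }

health_of 'rwi-a-r-p-'   # illustrative degraded raid1 attr: health bit is 'p'
health_of 'rwi-a-r---'   # illustrative healthy attr: health bit is '-'
# In practice:  lvs -o name,attr,health_status vg
```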


Version-Release number of selected component (if applicable):
4.2 kernel

How reproducible:


Steps to Reproduce:
1. lvcreate --type raid1 -L32G [--alloc anywhere] --nosync vg  
2.
3.

Actual results:
lvcreate pretends there is a working raid1

Expected results:
lvcreate creates a working raid1, and if it can't, reports a correct error.

Additional info:
Comment 1 Alasdair Kergon 2015-08-28 08:43:52 EDT
Neil Brown suggests:

> Probably bug fixed by:
> commit d3b178adb3a3adf54ecf77758138b654c3ee7f09
> when dm-raid created a bitmap it didn't zero out unused fields.
> So when we started using another field it confused dm-raid.
> The patch zeros things properly, and ignores the field when dm-raid is
> in use.

However you told me you still saw this bug with current upstream kernels, including 4.2-rc8.  Is that correct?
Comment 2 Zdenek Kabelac 2015-08-28 09:20:42 EDT
Tested on 4.2.0-0.rc8.git1.1.fc24.x86_64
Comment 3 Zdenek Kabelac 2015-08-28 10:08:28 EDT
Created attachment 1068013 [details]
Ooops stack trace

Here is a reproduced kernel oops with a recent 4.2-rc8 rawhide kernel.

This oops is reached via the lvm2 test suite:

make check_local T=lvcreate-large-raid.sh

So the invalid 'want=' not only causes unusable raid volumes at sizes of 32G and bigger, but may also lead to a complete kernel deadlock.
Comment 4 Heinz Mauelshagen 2015-08-31 10:54:25 EDT
This is the md bitmap flaw I have this workaround for:

diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c
index e51de52..7bc7595 100644
--- a/drivers/md/bitmap.c
+++ b/drivers/md/bitmap.c
@@ -446,6 +446,7 @@ void bitmap_update_sb(struct bitmap *bitmap)
        sb->sectors_reserved = cpu_to_le32(bitmap->mddev->
                                           bitmap_info.space);
        kunmap_atomic(sb);
+       bitmap->storage.sb_page->index = 0;
        write_page(bitmap, bitmap->storage.sb_page, 1);
 
Neil is aware of it and is investigating as well.
Comment 5 Dan Callaghan 2015-09-15 22:22:09 EDT
I hit this on my Fedora workstation with kernel-4.1.6-201.fc22.x86_64 and lvm2-2.02.116-3.fc22.x86_64.

I guess there are really two issues here... The first is that the superblock is not written properly on creation but LVM still starts the volume, so the user will then go ahead and create a filesystem and fill it with data. But then some time later on the next reboot the volume is unrecoverable. I lost some data because of this, so it seems like quite a serious bug.

Maybe LVM could stop and re-assemble the array as part of the creation process? That would catch these kinds of problems where the array starts but it cannot actually be assembled again.

The underlying problem is that the dm raid1 code in the kernel is supposed to write a superblock when the array is created, but it's not doing that correctly. Is that right?

I found a related patch posted to the raid list:

http://thread.gmane.org/gmane.linux.raid/49398

but this seems to be a workaround to deal with the bad superblocks at assembly time, not fixing the problem of why the superblocks are wrong in the first place?
Comment 6 Dan Callaghan 2015-09-16 01:19:14 EDT
(In reply to Heinz Mauelshagen from comment #4)
> This is the md bitmap flaw I got ths workaround for:

This patch fixes the problem. Now I can create raid1 volumes without errors and they are successfully deactivated and re-activated. Seems like the superblock is being written properly now.

BTW why is the patch in a private comment?
Comment 7 Dan Callaghan 2015-09-16 01:20:02 EDT
(In reply to Dan Callaghan from comment #6)
> This patch fixes the problem. Now I can create raid1 volumes without errors
> and they are successfully deactivated and re-activated. Seems like the
> superblock is being written properly now.

I was using a 4.1.7 Fedora kernel plus the patch in comment 4.
Comment 8 NeilBrown 2015-09-16 02:47:25 EDT
Re: comment 5.
That patch does fix the problem of why the superblocks are wrong, by adding the __GFP_ZERO flag so that the unused fields in the superblock are zeroed.
The patch went upstream as the commit mentioned in comment 1.

Re: comment 3.
The stacktrace shows that a page->index was 00000000fffffff1 (eax>>2).

I think that is a very different bug and probably not an md bug. It is hard to be sure though.
Comment 9 Alasdair Kergon 2015-09-16 07:49:53 EDT
(In reply to Dan Callaghan from comment #6)
> BTW why is the patch in a private comment?

All made public.
Comment 10 Heinz Mauelshagen 2015-10-01 16:01:28 EDT
Neil has meanwhile found the flaw (i.e. bitmap->cluster_slot containing -1 in the non-clustered case) that was worked around with my patch in comment #4:

diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c

index 50c9373cebdd..4f22e919787a 100644
--- a/drivers/md/bitmap.c
+++ b/drivers/md/bitmap.c
@@ -1995,7 +1995,8 @@ int bitmap_resize(struct bitmap *bitmap, sector_t blocks,
 	if (bitmap->mddev->bitmap_info.offset || bitmap->mddev->bitmap_info.file)
 		ret = bitmap_storage_alloc(&store, chunks,
 					   !bitmap->mddev->bitmap_info.external,
-					   bitmap->cluster_slot);
+					   mddev_is_clustered(bitmap->mddev)
+					   ? bitmap->cluster_slot : 0);
 	if (ret)
 		goto err;
Comment 11 Frank Haefemeier 2015-10-28 17:11:52 EDT
Is there any chance to get this bug fixed in Fedora 22? With the current version 4.2.3-200.fc22.x86_64 I can't create new RAID1 LVs.
Comment 12 Frank Haefemeier 2015-12-17 12:23:36 EST
After updating Fedora 22 to kernel 4.2.6-201.fc22.x86_64 it seems to be fixed. I tested it and it was possible to create an LV >32GB.
Comment 13 Heinz Mauelshagen 2015-12-17 12:43:48 EST
(In reply to Frank Haefemeier from comment #12)
> After updating Fedora 22 to kernel 4.2.6-201.fc22.x86_64 it seems to be fixed.
> I tested it and it was possible to create an LV >32GB.

Fixed with upstream commit da6fb7a9e.
WFM with Fedora kernel 4.2.6-200.fc22.x86_64, closing.
