Description of problem:

Extending an lvmraid(7) raid1-type mirror for the second time seems to result in the mirror legs not getting synced, *if* there is another raid1 mirror in the VG.

Version-Release number of selected component (if applicable):
2.02.176 (4.1ubuntu3)

How reproducible:
Seems to be reliably reproducible here on Ubuntu 18.04 at least.

Steps to Reproduce:

# quickly fill two 10G files with random data
openssl enc -aes-256-ctr -pass pass:"$(dd if=/dev/urandom bs=128 count=1 2>/dev/null | base64)" -nosalt < /dev/zero | dd bs=$((1024*1024*1024)) count=10 of=pv1.img iflag=fullblock
openssl enc -aes-256-ctr -pass pass:"$(dd if=/dev/urandom bs=128 count=1 2>/dev/null | base64)" -nosalt < /dev/zero | dd bs=$((1024*1024*1024)) count=10 of=pv2.img iflag=fullblock

# change loop devices if you have loads of snaps in use
losetup /dev/loop10 pv1.img
losetup /dev/loop11 pv2.img

pvcreate /dev/loop10
pvcreate /dev/loop11
vgcreate testvg /dev/loop10 /dev/loop11

lvcreate --type raid1 -L2G -n test testvg
watch lvs -o +raid_sync_action,sync_percent,raid_mismatch_count testvg   # wait for sync

lvcreate --type raid1 -L2G -n test2 testvg
watch lvs -o +raid_sync_action,sync_percent,raid_mismatch_count testvg   # wait for sync

# the following will sync OK; observe the kernel message from the md subsystem noting the time taken
lvextend -L+2G testvg/test2
watch lvs -o +raid_sync_action,sync_percent,raid_mismatch_count testvg   # wait for sync

# the following will FAIL to sync; the sync will seem to complete instantly, e.g.:
# Feb 02 15:22:50 asr-host kernel: md: resync of RAID array mdX
# Feb 02 15:22:50 asr-host kernel: md: mdX: resync done.
lvextend -L+2G testvg/test2

lvchange --syncaction check testvg/test2
watch lvs -o +raid_sync_action,sync_percent,raid_mismatch_count testvg   # observe error count

Actual results:
The sync after the final lvextend completes instantly, and a subsequent lvchange --syncaction check reports a high number for raid_mismatch_count.

Expected results:
The sync after the final lvextend should take at least a few seconds, and a subsequent lvchange --syncaction check should not report any errors for raid_mismatch_count (unless the underlying hardware has failed).

Additional info:
Launchpad bug: https://bugs.launchpad.net/ubuntu/+source/lvm2/+bug/1814389
ok, so it seems you are creating 2 RAID1 LVs and extending the second one twice - ultimately leaving the second one 3x its original size.

I've repeated the commands that you ran on RHEL7.7. I found that the second extend (which is where the problem began in your case) did not instantly return for me and took about the same amount of time as the first, proceeding from ~66% to 100% performing the resync. HOWEVER, I too encountered the strange mismatch count.

  LV    VG     Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert SyncAction Cpy%Sync Mismatches
  test  testvg rwi-a-r--- 2.00g                                    100.00           idle         100.00          0
  test2 testvg rwi-a-r-m- 6.00g                                    100.00           idle         100.00    1986560

Importantly, I noticed that 'Mismatches' was 0 up to when the 'check' got to 66% - then the number began increasing. This is highly suggestive that the second extend did not have an effective resync.

Further, when I wrote out 5GiB to 'testvg/test2' and rechecked 'Mismatches', I got:

  LV    VG     Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert SyncAction Cpy%Sync Mismatches
  test  testvg rwi-a-r--- 2.00g                                    100.00           idle         100.00          0
  test2 testvg rwi-a-r-m- 6.00g                                    100.00           idle         100.00     983040

with 'Mismatches' only starting to count at 80+% (~5/6ths), right where you would expect if the last 1GiB was left alone.

There is definitely something to be investigated here.
Importantly, I do not believe there is any danger of corruption or data loss. New writes seem to be operating correctly, so only the unused areas of the disk might contain different data and thus report mismatches. It is a bug and worth investigating, but it shouldn't cause any problems other than scaring the user with a non-zero 'Mismatches' count.
It's been a while since I looked at all this; I assume bug #1640630 is related. I'm pretty sure I still have LVs where my monthly check does not in fact check the whole volume; this potentially runs the risk of not spotting e.g. hardware data corruption.
This does not need 2 RaidLVs in the VG; 4TiB is the magical size after which extents are not being synchronized any more.

On Fedora 30 for instance (5.1.7-300.fc30.x86_64 / 2.02.183(2)), create a 4GiB 2-legged raid1 LV in an empty VG: it initially syncs fine, but any extensions after that do not (they immediately jump to 100 sync percent), thus the extended, unsynced segments are most likely subject to check failures:

[root@vm161 ~]# vgs --noh t
  t  64  0  0 wz--n- 63.75g 63.75g
[root@vm161 ~]# lvcreate -L4g -nt -m1 t;lvs --noh t/t
  Logical volume "t" created.
  t  t  rwi-a-r--- 4.00g  0.00
[root@vm161 ~]# lvs --noh t/t
  t  t  rwi-a-r--- 4.00g  53.98
[root@vm161 ~]# lvextend -y -L+1g t/t;lvs --noh t/t   # <-- this immediately jumps to 100 sync percent!
  Extending 2 mirror images.
  Size of logical volume t/t changed from 4.00 GiB (1024 extents) to 5.00 GiB (1280 extents).
  Logical volume t/t successfully resized.
  t  t  rwi-a-r--- 5.00g  100.00

As Jon elaborated on in comment #2, this is not a corruption or data loss issue, because any data newly written to the extended segment will be mirrored. The extended segment is similar to creating a new raid1 LV with the --nosync option _and_ avoiding reading what has not been written before. You'd also get 'check' failures for such a --nosync-created raid1, unless you had removed a previous, synchronized raid1 of the same or larger size and recreated it anew with --nosync, which is likely to result in the exact same allocation (unless you also changed allocation arguments).

Jon, relative to your '...does not check the whole volume...' remark in comment #2: got examples/logs of that?
> Jon, relative to your '...does not check the whole volume...' remark in comment #2:
> got examples/logs of that?

Not sure if this was actually meant for me - I've just manually scrubbed my RAID LVs and it's working OK at the moment, but I swear I've seen problems in the past where RAID LVs would get "stuck" and repeatedly not fully check/scrub. Hard to be certain though, as my health is variable and it's a while ago that I was looking at this.

I've had a couple of disk changes in my array over the last few months and when that happened I likely consolidated the volumes in question, which may have made a difference..
(In reply to Heinz Mauelshagen from comment #4)
> This does not need 2 RaidLVs in the VG, 4TiB is the magical size after which

Typo: 4GiB it is in my testing
*** Bug 1669999 has been marked as a duplicate of this bug. ***
Analysis so far shows that extending past 4GiB returns 2GiB of skipped sectors from the bitmap layer to the MD sync thread for the extended segment, thus preventing synchronization.

Found a solution by imperatively requesting recovery in the dm-raid target via MD's MD_RECOVERY_REQUESTED flag. Will investigate a bit more, aiming to avoid issues with the MD recovery sync request path into the bitmap layer.
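For context, a minimal sketch of what "requesting recovery imperatively" means on the MD side (illustrative only, not the eventual patch; the helper name is made up, but the MD_RECOVERY_* flags, mddev->recovery and md_wakeup_thread() are the real kernel interfaces). With MD_RECOVERY_REQUESTED set, the raid1 sync path reads every block instead of trusting the bitmap's "clean" answer, which is also how the sysfs 'check'/'repair' actions behave:

	/* Illustrative sketch only - not the posted fix.  Force MD to walk the
	 * whole device instead of trusting the on-disk bitmap; 'mddev' is the
	 * MD device embedded in the dm-raid set (rs->md in dm-raid.c). */
	static void force_requested_resync(struct mddev *mddev)
	{
		set_bit(MD_RECOVERY_REQUESTED, &mddev->recovery); /* don't skip bitmap-clean regions */
		set_bit(MD_RECOVERY_SYNC, &mddev->recovery);      /* run a resync, not a rebuild */
		set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);    /* tell the MD thread there is work */
		md_wakeup_thread(mddev->thread);                  /* kick md_check_recovery() */
	}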
After the second extension, the bitmap skips too much.

[ 1007.906146] md/raid1:mdX: not clean -- starting background reconstruction
[ 1007.945422] md/raid1:mdX: active with 2 out of 2 mirrors
[ 1008.046904] md: resync of RAID array mdX
[ 1008.069887] /* We can skip this 0:4194304 block, and probably several more */
[ 1008.111110] /* We can skip this 4194304:4194304 block, and probably several more */
[ 1008.155622] /* We can skip this 8388608:4194304 block, and probably several more */
[ 1008.200560] md: mdX: resync done.

I'll update with more information.
Hi all,

md uses a bitmap to check whether it needs to do a resync. The bitmap uses one bit to describe one area of disk space; the bitmap chunk is the size of that area - in this case, 2MB. The bitmap is stored on each member disk of the raid device and is read while the raid device is being created. The function chain md_bitmap_load->md_bitmap_init_from_disk does this job.

For the first creation, it reads the bitmap from one member disk and sets every bit to 1, so a full resync is needed. In this case, there are 1024 bits in the bitmap.

For the first extension, lvextend -L+2G testvg/test:
It reads the bitmap from the disk. The first 1024 bits are not set, so it skips this area (2G). The following 1024 bits are set, so it does a resync for this area (2G).

[ 1199.510291] mdX: bitmap initialized from disk: read 1 pages, set 1024 of 2048 bits
[ 1199.531596] md: resync of RAID array mdX
[ 1209.863815] md: mdX: resync done.

For the second extension, lvextend -L+2G testvg/test:
It reads the bitmap from one member disk and finds all the bits are zero.

[ 1370.240692] mdX: bitmap initialized from disk: read 1 pages, set 0 of 3072 bits
[ 1370.263343] md: resync of RAID array mdX
[ 1370.267883] md: mdX: resync done.

		paddr = kmap_atomic(page);
		if (test_bit(BITMAP_HOSTENDIAN, &bitmap->flags))
			b = test_bit(bit, paddr);
		else
			b = test_bit_le(bit, paddr);
		kunmap_atomic(paddr);
		if (b) {	<<<<---------------- all bits are zero
			/* if the disk bit is set, set the memory bit */
			int needed = ((sector_t)(i+1) << bitmap->counts.chunkshift >= start);
			md_bitmap_set_memory_bits(bitmap,
						  (sector_t)i << bitmap->counts.chunkshift,
						  needed);
			bit_cnt++;
		}

So there is some code in lvm that sets the bitmap bits to 1 during the first lvextend, but it doesn't do this job during the second extension.

Thanks
Xiao
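As a quick sanity check of the numbers above (a throwaway user-space calculation, not kernel code; the 2MB chunk size is taken from this comment): a 2G LV needs 1024 bitmap bits, 4G needs 2048 and 6G needs 3072, which lines up with the "set 1024 of 2048 bits" and "set 0 of 3072 bits" kernel messages in the logs:

	#include <stdio.h>

	int main(void)
	{
		const unsigned long long MiB = 1024ULL * 1024ULL;
		const unsigned long long chunk = 2 * MiB;	/* bitmap chunk size from the comment */
		const unsigned long long lv_sizes_mib[] = { 2048, 4096, 6144 };

		/* one bitmap bit per chunk-sized area of the LV */
		for (int i = 0; i < 3; i++)
			printf("%4llu MiB LV -> %llu bitmap bits\n",
			       lv_sizes_mib[i], lv_sizes_mib[i] * MiB / chunk);
		return 0;
	}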
Fixed based on md_bitmap_resize() usage.

Kernel patch series sent to dm-devel:
https://www.redhat.com/archives/dm-devel/2019-October/msg00005.html

Patchwork references:
https://patchwork.kernel.org/patch/11169133
https://patchwork.kernel.org/patch/11169131
https://patchwork.kernel.org/patch/11169135
https://patchwork.kernel.org/patch/11169137
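For readers following along, the rough shape of that approach (a hedged sketch, not the patches linked above: rs_is_grown() is a hypothetical stand-in for however the series detects a grown raid set, while md_bitmap_resize(), mddev->dev_sectors, mddev->bitmap_info.chunksize and DMERR() are real kernel interfaces). The idea is to resize the in-core MD bitmap to the new device size before the array resumes, so the added chunks are flagged as needing resync instead of being read back from disk as clean:

	/* Sketch only.  On (pre)resume of a grown raid set, grow the in-core
	 * bitmap to the new per-leg size; with init=0, md_bitmap_resize()
	 * marks the newly covered chunks as needing resync. */
	if (mddev->bitmap && rs_is_grown(rs)) {		/* rs_is_grown(): hypothetical helper */
		int r = md_bitmap_resize(mddev->bitmap,
					 mddev->dev_sectors,		/* new size in sectors */
					 mddev->bitmap_info.chunksize,	/* keep current chunk size */
					 0);				/* init=0 -> new space needs sync */
		if (r)
			DMERR("Failed to resize bitmap");
	}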
My brain's not working well at the moment for health reasons - will this have affected mdadm / mdraid as well as LVM? Or does mdadm have its own interface? I'm confused about the underlying architecture here..
Also, meant to say - thank you all for looking into this - looks like it was quite involved. I'm one of those people who seems to be irredeemably attractive to obscure corner cases in software :)
(In reply to Steve D from comment #17)
> My brain's not working well at the moment for health reasons - will this
> have affected mdadm / mdraid as well as LVM? Or does mdadm have its own
> interface? I'm confused about the underlying architecture here..

All the best :-)

No mdraid issues here, just the device-mapper kernel side, i.e. the dm-raid target interfacing into kernel mdraid needs fixes.
Upstream since kernel v5.5-rc7.