Test lvconvert-raid-reshape-stripes-load-reload.sh seems to now invoke this kernel bug:

#lvconvert-raid-reshape-stripes-load-reload.sh:78+ for i in {0..5}
#lvconvert-raid-reshape-stripes-load-reload.sh:80+ dmsetup table LVMTEST1286113vg-LV1
#lvconvert-raid-reshape-stripes-load-reload.sh:80+ dmsetup load LVMTEST1286113vg-LV1
/srv/buildbot/lvm2-slave/Fedora_Rawhide_x86_64_KVM/build/test/shell/lvconvert-raid-reshape-stripes-load-reload.sh: line 78: 1286907 Done dmsetup table $vg-$lv1
 1286908 Segmentation fault | dmsetup load $vg-$lv1
set +vx; STACKTRACE; set -vx

------------[ cut here ]------------
kernel BUG at drivers/md/raid5.c:7567!
invalid opcode: 0000 [#2] SMP PTI
CPU: 1 PID: 1286908 Comm: dmsetup Tainted: G D W --------- --- 5.11.0-0.rc3.122.fc34.x86_64 #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
RIP: 0010:raid5_run+0x407/0x4a0 [raid456]
Code: 00 8b 83 3c 01 00 00 39 83 bc 00 00 00 0f 85 ac 00 00 00 48 c7 44 24 08 00 00 00 00 8b bb 30 01 00 00 85 ff 0f 84 8a fd ff ff <0f> 0b 48 8b 43 48 48 c7 c6 60 e3 3b c0 48 c7 c7 58 7c 3c c0 48 85
RSP: 0018:ffffa44540a1bb00 EFLAGS: 00010206
RAX: 0000000000000080 RBX: ffff921486940058 RCX: ffffffffffffffff
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000005
RBP: ffff921486940058 R08: 0000000000000040 R09: 0000000000000000
R10: 000000000000000f R11: ffffa44540035000 R12: ffff921486940070
R13: 0000000000000000 R14: ffff921486940000 R15: ffff921486940070
FS: 00007ff9f5d193c0(0000) GS:ffff9214f8300000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fc84532a8c8 CR3: 00000000286f0005 CR4: 00000000000206e0
Call Trace:
 ? bioset_init+0x1e7/0x270
 md_run+0x528/0xc20
 raid_ctr+0x1370/0x284a [dm_raid]
 dm_table_add_target+0x178/0x340
 table_load+0x10c/0x350
 ? dev_suspend+0x2c0/0x2c0
 ctl_ioctl+0x1bd/0x450
 dm_ctl_ioctl+0xa/0x10
 __x64_sys_ioctl+0x82/0xb0
 do_syscall_64+0x33/0x40
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7ff9f6172f8b
Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d b5 be 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007fffe42e8368 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007ff9f6270c42 RCX: 00007ff9f6172f8b
RDX: 000055b3b15b4d80 RSI: 00000000c138fd09 RDI: 0000000000000003
RBP: 00007fffe42e8430 R08: 000000000000ffff R09: 0000000000000000
R10: 00007ff9f61d3980 R11: 0000000000000202 R12: 00007ff9f62e8e12
R13: 0000000000000000 R14: 00007ff9f62e8e12 R15: 00007ff9f62e8e12
Modules linked in: dm_writecache brd raid0 raid10 dm_delay xfs essiv authenc reiserfs dm_crypt dm_integrity raid1 dm_thin_pool dm_raid raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx dm_cache_smq dm_cache dm_persistent_data dm_bio_prison loop rfkill joydev virtio_net net_failover virtio_balloon failover i2c_piix4 fuse ip_tables crct10dif_pclmul cirrus crc32_pclmul drm_kms_helper crc32c_intel cec drm ghash_clmulni_intel serio_raw ata_generic pata_acpi virtio_blk [last unloaded: scsi_debug]
---[ end trace 950955d2e98f8bbb ]---
RIP: 0010:raid5_run+0x407/0x4a0 [raid456]
Code: 00 8b 83 3c 01 00 00 39 83 bc 00 00 00 0f 85 ac 00 00 00 48 c7 44 24 08 00 00 00 00 8b bb 30 01 00 00 85 ff 0f 84 8a fd ff ff <0f> 0b 48 8b 43 48 48 c7 c6 60 e3 3b c0 48 c7 c7 58 7c 3c c0 48 85
RSP: 0018:ffffa44540773b00 EFLAGS: 00010206
RAX: 0000000000000080 RBX: ffff9214bce30058 RCX: ffffffffffffffff
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000005
RBP: ffff9214bce30058 R08: 0000000000000040 R09: 0000000000000000
R10: 000000000000000f R11: ffffa44540035000 R12: ffff9214bce30070
R13: 0000000000000000 R14: ffff9214bce30000 R15: ffff9214bce30070
FS: 00007ff9f5d193c0(0000) GS:ffff9214f8300000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fc84532a8c8 CR3: 00000000286f0005 CR4: 00000000000206e0

##lvconvert-raid-reshape-stripes-load-reload.sh:80+ set +vx
## - /srv/buildbot/lvm2-slave/Fedora_Rawhide_x86_64_KVM/build/test/shell/lvconvert-raid-reshape-stripes-load-reload.sh:80
## 1 STACKTRACE() called from /srv/buildbot/lvm2-slave/Fedora_Rawhide_x86_64_KVM/build/test/shell/lvconvert-raid-reshape-stripes-load-reload.sh:80

Happens with current kernel in rawhide 5.11-rc3 and lvm2 2.03.11.

Looks like some error in driver initialization/validation code.
Actually this issue was already opened as bug 1859336 for kernel 5.8 - so it's still present.
This bug appears to have been reported against 'rawhide' during the Fedora 34 development cycle. Changing version to 34.
The BUG_ON is triggered easily on single-core systems and much less often on multi-core systems (see below).

Rationale:

1. The MD sync thread calls end_reshape() from raid5_sync_request() when it is done reshaping. end_reshape() _only_ updates the reshape position to MaxSector but keeps the changed layout configuration, i.e. any delta disks, chunk sector or RAID algorithm changes. That inconsistent configuration is stored in the superblock.

2. dm-raid constructs a mapping, loading the inconsistent superblock from step 1 before step 3 has been able to finish, and calls md_run(), which leads to the BUG in raid5.c shown in the description.

3. The MD RAID personality's finish_reshape() is called, which resets the reshape information about chunk sectors, delta disks etc. This explains why the BUG is rarely seen on multi-core machines: finish_reshape() races with the dm-raid constructor from step 2 and thus may finish before the superblock gets loaded in step 2.

Also, dm-raid postsuspend may even completely prevent the MD sync thread from calling finish_reshape() and storing superblocks.
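
To make the ordering dependency in the three steps above concrete, here is a toy model in Python. It is not kernel code: the Superblock class, the constructor_accepts() check and the run() helper are hypothetical names invented for illustration, and the check is only an assumed stand-in for whatever consistency test raid5_run() performs. Only end_reshape(), finish_reshape(), MaxSector and delta disks come from the analysis above.

```python
from dataclasses import dataclass

MAX_SECTOR = 2**64 - 1  # stand-in for the kernel's MaxSector sentinel


@dataclass
class Superblock:
    """Hypothetical, heavily simplified view of the reshape state."""
    reshape_position: int = MAX_SECTOR
    delta_disks: int = 0


def end_reshape(sb: Superblock) -> None:
    # Step 1: only the reshape position is updated; the layout delta stays.
    sb.reshape_position = MAX_SECTOR


def finish_reshape(sb: Superblock) -> None:
    # Step 3: the leftover reshape information is reset.
    sb.delta_disks = 0


def constructor_accepts(sb: Superblock) -> bool:
    # Step 2 (assumed check): "no reshape in progress" must imply "no delta".
    # Returning False models hitting the BUG.
    return not (sb.reshape_position == MAX_SECTOR and sb.delta_disks != 0)


def run(order):
    """Apply the steps in the given order; return what the table load saw."""
    sb = Superblock(reshape_position=1024, delta_disks=1)  # reshape running
    steps = {"end": end_reshape, "finish": finish_reshape}
    result = None
    for step in order:
        if step == "load":
            result = constructor_accepts(sb)
        else:
            steps[step](sb)
    return result


if __name__ == "__main__":
    print(run(["end", "load", "finish"]))  # load wins the race
    print(run(["end", "finish", "load"]))  # finish_reshape() wins the race
```

In this model the first ordering returns False (the load sees the inconsistent state) and the second returns True, which mirrors why the outcome depends on whether finish_reshape() completes before the table load.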
Upstream patch submitted -> https://listman.redhat.com/archives/dm-devel/2021-April/msg00182.html
(In reply to Heinz Mauelshagen from comment #6)
> Upstream patch submitted ->
> https://listman.redhat.com/archives/dm-devel/2021-April/msg00182.html

This patch is now upstream (and was marked for stable@), see:
http://git.kernel.org/linus/f99a8e4373eeacb279bc9696937a55adbff7a28a