Description of problem:

When a raid LV (say, for simplicity, a raid1) has a disk issue with a particular raid image device, the user should still be able to activate such an LV in 'degraded' mode (lvm.conf activation/activation_mode="degraded").

ATM the internal lvm2 logic detects the missing device for rimageX, but the activation code then conflates the meaning of degraded mode with partial LV activation: the resulting table tries to push a raid where the faulty device is replaced with 'error' segments, instead of activating the raid device in 'degraded' mode (i.e. without ANY LV_rimage_X-missing_Y_Z device in the dm table).

How to test (see the sketch under Additional info below):
- create a raid LV & wait for sync
- replace one PV with an error device
- try degraded activation

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

device-mapper: raid: New device injected into existing raid set without 'delta_disks' or 'rebuild' parameter specified
device-mapper: table: 253:9: raid: Unable to assemble array: Invalid superblocks
device-mapper: ioctl: error adding target to table

Expected results:

Correct activation of the 'raid' with just 1 healthy leg.

Additional info:
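A minimal sketch of the test above, under the assumption that the PVs are themselves dm devices (as in the lvm2 test suite), so one of them can have its table swapped for an 'error' target; all device, VG and LV names here are hypothetical:

# vgcreate vg /dev/mapper/pv1 /dev/mapper/pv2
# lvcreate --type raid1 -m1 -L512M -n lv vg
# lvs -o sync_percent vg/lv        <- repeat until it reports 100.00
# lvchange -an vg/lv
# dmsetup suspend pv2
# dmsetup load pv2 --table "0 $(blockdev --getsz /dev/mapper/pv2) error"
# dmsetup resume pv2
# lvchange -ay --activationmode degraded vg/lv

The last command should activate the raid1 LV with only the healthy leg, rather than failing with the 'Invalid superblocks' errors above.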
I've just got a similar issue on Fedora 25 with a RAID1 LV (which is actually the metadata disk of a thin device), but I got it after having repaired it.
Now every time I try to activate the VG, it logs the same error:

Dec 27 20:28:09 plambri-affligem kernel: device-mapper: raid: New device injected into existing raid set without 'delta_disks' or 'rebuild' parameter specified
Dec 27 20:28:09 plambri-affligem kernel: device-mapper: table: 253:7: raid: Unable to assemble array: Invalid superblocks
Dec 27 20:28:09 plambri-affligem kernel: device-mapper: ioctl: error adding target to table

Is there any way to recover from this situation?
(In reply to Pierguido Lambri from comment #1)
> I've just got a similar issue on Fedora 25 with a RAID1 LV (which is
> actually the metadata disk of a thin device) but I got it after having
> repaired it.
> Now every time I try to activate the VG, it logs the same error:
>
> Dec 27 20:28:09 plambri-affligem kernel: device-mapper: raid: New device
> injected into existing raid set without 'delta_disks' or 'rebuild' parameter
> specified
> Dec 27 20:28:09 plambri-affligem kernel: device-mapper: table: 253:7: raid:
> Unable to assemble array: Invalid superblocks
> Dec 27 20:28:09 plambri-affligem kernel: device-mapper: ioctl: error adding
> target to table
>
> Is there any way to recover from this situation?

I've never heard of this before... do you have a reproducer?
Created attachment 1255269 [details]
Trace of lvm2 test suite lvconvert-raid.sh

This is a sample from our buildbot where this 'injection' is visible.
Created attachment 1255284 [details]
Part of lvconvert-raid.txt with more debug

I'm providing a version with more debugging enabled (only a cut of the trace):

[ 0:01] #libdm-deptree.c:2731 Loading @PREFIX@vg-LV1 table (253:10)
[ 0:01] #libdm-deptree.c:2675 Adding target to (253:10): 0 8192 raid raid1 3 0 region_size 1024 2 253:11 253:12 253:13 253:14
[ 0:01] #ioctl/libdm-iface.c:1838 dm table (253:10) [ opencount flush ] [16384] (*1)
[ 0:01] #ioctl/libdm-iface.c:1838 dm reload (253:10) [ noopencount flush ] [16384] (*1)
[ 0:01] #activate/activate.c:2132 Requiring flush for LV @PREFIX@vg/LV1.
[ 0:01] #mm/memlock.c:582 Entering critical section (suspending).
[ 0:01] #mm/memlock.c:551 Lock: Memlock counters: locked:0 critical:1 daemon:0 suspended:0
[ 0:01] #mm/memlock.c:475 Locking memory
[ 0:01] #libdm-config.c:1064 activation/use_mlockall not found in config: defaulting to 0
[ 0:01] 6,10381,158994530818,-;device-mapper: raid: Superblocks created for new raid set
[ 0:01] 6,10382,158994540522,-;md/raid1:mdX: not clean -- starting background reconstruction
[ 0:01] 6,10383,158994540537,-;md/raid1:mdX: active with 2 out of 2 mirrors
[ 0:01] #mm/memlock.c:287 mlock 0KiB 5563adeda000 - 5563ae0df000 r-xp 00000000 08:06 10756824

This looks like the primary suspect: the 'raid' table has only been preloaded, yet the raid target already appears to be taking action based on the characteristics of this new table ('Superblocks created for new raid set' is logged before any resume). lvm2 expects a 'preloaded' table to have no effect until the matching 'resume' operation.
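To illustrate the semantics lvm2 relies on here, this is the expected two-phase table swap expressed with plain dmsetup (the table line is taken from the trace above; this is an illustration of the expected behaviour, not a reproducer):

dmsetup load @PREFIX@vg-LV1 --table "0 8192 raid raid1 3 0 region_size 1024 2 253:11 253:12 253:13 253:14"
    (load only populates the inactive table slot - it should have no side effects)
dmsetup resume @PREFIX@vg-LV1
    (only resume should swap the new table in and let the target act on it)

In the trace, the mdraid messages appear right after the reload ioctl, i.e. before any suspend/resume cycle has completed.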
This upstream patch:

https://www.redhat.com/archives/lvm-devel/2017-February/msg00160.html

should minimize the chance of hitting a 'race' with the mdraid core on removal of an active origin with snapshots, where a couple of extra table reloads were executed before the final 'origin' removal.

Note: this purely addresses the issue with 'lvremove -ff' from the test suite. It does not address the bug from the BZ description - there could possibly be a few more cases where we could be 'smarter' in lvm2 to avoid racy logic.
See transcript below with degraded raid1 LV variations running fine. Closing. Only reopen in case of a reproducer, documented herein.

# lvm version
  LVM version:     2.03.18(2)-git (2022-11-10)
  Library version: 1.02.189-git (2022-11-10)
  Driver version:  4.47.0
  Configuration:   ./configure --build=x86_64-redhat-linux-gnu --host=x86_64-redhat-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --runstatedir=/run --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-default-dm-run-dir=/run --with-default-run-dir=/run/lvm --with-default-pid-dir=/run --with-default-locking-dir=/run/lock/lvm --with-usrlibdir=/usr/lib64 --enable-fsadm --enable-write_install --with-user= --with-group= --with-device-uid=0 --with-device-gid=6 --with-device-mode=0660 --enable-pkgconfig --enable-cmdlib --enable-dmeventd --enable-blkid_wiping --with-udevdir=/usr/lib/udev/rules.d --enable-udev_sync --with-thin=internal --with-cache=internal --enable-lvmpolld --enable-lvmlockd-dlm --enable-lvmlockd-dlmcontrol --enable-lvmlockd-sanlock --enable-dbus-service --enable-notify-dbus --enable-dmfilemapd --with-writecache=internal --with-vdo=internal --with-vdo-format=/usr/bin/vdoformat --with-integrity=internal --with-default-use-devices-file=1 --disable-silent-rules --enable-app-machineid --enable-editline --disable-readline

# lvmconfig --ty default activation/activation_mode
# activation_mode="degraded"

# vgcreate t /dev/sd[a-b]
# vgs --unit=g --nohe -opvcount,size t
  2 2047.99g

# lvcreate -nt -L512 --ty raid1 t
# lvs --nohe -ao+devices t
  t            t rwi-a-r--- 512.00m 100.00 t_rimage_0(0),t_rimage_1(0)
  [t_rimage_0] t iwi-aor--- 512.00m        /dev/sda(1)
  [t_rimage_1] t iwi-aor--- 512.00m        /dev/sdb(1)
  [t_rmeta_0]  t ewi-aor---   4.00m        /dev/sda(0)
  [t_rmeta_1]  t ewi-aor---   4.00m        /dev/sdb(0)

# echo offline > /sys/block/sda/device/state
# lvs --nohe -ao+devices t
  WARNING: Couldn't find device with uuid ciKA6u-SeeH-YeQP-ogAb-eKew-u14o-i3Ddyv.
  WARNING: VG t is missing PV ciKA6u-SeeH-YeQP-ogAb-eKew-u14o-i3Ddyv (last written to /dev/sda).
  t            t rwi-a-r-p- 512.00m 100.00 t_rimage_0(0),t_rimage_1(0)
  [t_rimage_0] t iwi-aor-p- 512.00m        [unknown](1)
  [t_rimage_1] t iwi-aor--- 512.00m        /dev/sdb(1)
  [t_rmeta_0]  t ewi-aor-p-   4.00m        [unknown](0)
  [t_rmeta_1]  t ewi-aor---   4.00m        /dev/sdb(0)

# mkfs -t ext4 /dev/t/t
# fsck -fy /dev/t/t
# lvchange -an t/t
# lvs --nohe t -oattr t
  WARNING: Couldn't find device with uuid ciKA6u-SeeH-YeQP-ogAb-eKew-u14o-i3Ddyv.
  WARNING: VG t is missing PV ciKA6u-SeeH-YeQP-ogAb-eKew-u14o-i3Ddyv (last written to /dev/sda).
  rwi---r-p-

# lvchange -ay t/t
# lvs --nohe t -oattr t
  WARNING: Couldn't find device with uuid ciKA6u-SeeH-YeQP-ogAb-eKew-u14o-i3Ddyv.
  WARNING: VG t is missing PV ciKA6u-SeeH-YeQP-ogAb-eKew-u14o-i3Ddyv (last written to /dev/sda).
  rwi-a-r-p-

# fsck -fy /dev/t/t
# echo running > /sys/block/sda/device/state
# lvs --nohe -ao+devices t
  t            t rwi-a-r-r- 512.00m 100.00 t_rimage_0(0),t_rimage_1(0)
  [t_rimage_0] t Iwi-aor-r- 512.00m        /dev/sda(1)
  [t_rimage_1] t iwi-aor--- 512.00m        /dev/sdb(1)
  [t_rmeta_0]  t ewi-aor-r-   4.00m        /dev/sda(0)
  [t_rmeta_1]  t ewi-aor---   4.00m        /dev/sdb(0)

# lvchange --ref t/t
# lvs --nohe -ao+devices t
  t            t rwi-a-r--- 512.00m 100.00 t_rimage_0(0),t_rimage_1(0)
  [t_rimage_0] t iwi-aor--- 512.00m        /dev/sda(1)
  [t_rimage_1] t iwi-aor--- 512.00m        /dev/sdb(1)
  [t_rmeta_0]  t ewi-aor---   4.00m        /dev/sda(0)
  [t_rmeta_1]  t ewi-aor---   4.00m        /dev/sdb(0)

# fsck -fy /dev/t/t

^ with sdb also wfm (works for me)
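One note on the transcript: '--ref' is the abbreviated form of 'lvchange --refresh', i.e.

# lvchange --refresh t/t

which reloads the LV's kernel tables and clears the transient 'r' ("refresh needed") health attribute once the failed device is back.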