Description of problem:
When a raid LV (say, for simplicity, raid1) has a disk issue with a particular raid image device, the user should still be able to activate such an LV in 'degraded' mode (lvm.conf activation/activation_mode="degraded").

ATM the internal lvm2 logic detects the missing device for rimageX, but the activation code then merges the meaning of degraded mode with partial LV activation: the resulting table tries to push a raid where the faulty device is replaced with 'error' segments, instead of activating the raid device in 'degraded' mode (without ANY LV_rimage_X-missing_Y_Z device in the dm table).

How to test:
- create a raid LV & wait for sync
- replace one PV with an error device
- try degraded activation

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:
device-mapper: raid: New device injected into existing raid set without 'delta_disks' or 'rebuild' parameter specified
device-mapper: table: 253:9: raid: Unable to assemble array: Invalid superblocks
device-mapper: ioctl: error adding target to table

Expected results:
Correct activation of the 'raid' LV with just 1 healthy leg.

Additional info:
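A rough reproducer sketch of the "how to test" steps, assuming two spare test devices; the device names, sizes and the dm-linear wrapper used to swap in the error target are illustrative only (the lvm2 test suite does the swap through its own aux helpers):

  PV1=/dev/sdX1    # hypothetical spare device
  PV2=/dev/sdY1    # hypothetical spare device

  # Wrap the second PV in a linear mapping so it can be swapped for an error target later.
  SECTORS=$(blockdev --getsz "$PV2")
  dmsetup create pv2wrap --table "0 $SECTORS linear $PV2 0"

  vgcreate vg "$PV1" /dev/mapper/pv2wrap
  lvcreate --type raid1 -m 1 -L 64M -n lv vg

  # Wait for the initial resynchronization to finish.
  until lvs --noheadings -o sync_percent vg/lv | grep -q '100\.00'; do sleep 1; done

  lvchange -an vg/lv

  # Replace the wrapped PV with an error target, simulating a failed leg.
  dmsetup suspend pv2wrap
  dmsetup reload pv2wrap --table "0 $SECTORS error"
  dmsetup resume pv2wrap

  # Degraded activation should succeed with one healthy leg, but instead fails
  # with "Unable to assemble array: Invalid superblocks".
  lvchange -ay --activationmode degraded vg/lv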
I've just got a similar issue on Fedora 25 with a RAID1 LV (which is actually the metadata disk of a thin device) but I got it after having repaired it.
Now every time I try to activate the VG, it logs the same error:

Dec 27 20:28:09 plambri-affligem kernel: device-mapper: raid: New device injected into existing raid set without 'delta_disks' or 'rebuild' parameter specified
Dec 27 20:28:09 plambri-affligem kernel: device-mapper: table: 253:7: raid: Unable to assemble array: Invalid superblocks
Dec 27 20:28:09 plambri-affligem kernel: device-mapper: ioctl: error adding target to table

Is there any way to recover from this situation?
(In reply to Pierguido Lambri from comment #1)
> I've just got a similar issue on Fedora 25 with a RAID1 LV (which is
> actually the metadata disk of a thin device) but I got it after having
> repaired it.
> Now every time I try to activate the VG, it logs the same error:
>
> Dec 27 20:28:09 plambri-affligem kernel: device-mapper: raid: New device
> injected into existing raid set without 'delta_disks' or 'rebuild' parameter
> specified
> Dec 27 20:28:09 plambri-affligem kernel: device-mapper: table: 253:7: raid:
> Unable to assemble array: Invalid superblocks
> Dec 27 20:28:09 plambri-affligem kernel: device-mapper: ioctl: error adding
> target to table
>
> Is there any way to recover from this situation?

I've never heard of this before... Do you have a reproducer?
Created attachment 1255269 [details]
Trace of lvm2 test suite lvconvert-raid.sh

This is our buildbot sample where this 'injection' is visible.
Created attachment 1255284 [details]
part of lvconvert-raid.txt with more debug

I'm providing a more verbose debug trace (only a cut of it):

[ 0:01] #libdm-deptree.c:2731 Loading @PREFIX@vg-LV1 table (253:10)
[ 0:01] #libdm-deptree.c:2675 Adding target to (253:10): 0 8192 raid raid1 3 0 region_size 1024 2 253:11 253:12 253:13 253:14
[ 0:01] #ioctl/libdm-iface.c:1838 dm table (253:10) [ opencount flush ] [16384] (*1)
[ 0:01] #ioctl/libdm-iface.c:1838 dm reload (253:10) [ noopencount flush ] [16384] (*1)
[ 0:01] #activate/activate.c:2132 Requiring flush for LV @PREFIX@vg/LV1.
[ 0:01] #mm/memlock.c:582 Entering critical section (suspending).
[ 0:01] #mm/memlock.c:551 Lock: Memlock counters: locked:0 critical:1 daemon:0 suspended:0
[ 0:01] #mm/memlock.c:475 Locking memory
[ 0:01] #libdm-config.c:1064 activation/use_mlockall not found in config: defaulting to 0
[ 0:01] 6,10381,158994530818,-;device-mapper: raid: Superblocks created for new raid set
[ 0:01] 6,10382,158994540522,-;md/raid1:mdX: not clean -- starting background reconstruction
[ 0:01] 6,10383,158994540537,-;md/raid1:mdX: active with 2 out of 2 mirrors
[ 0:01] #mm/memlock.c:287 mlock 0KiB 5563adeda000 - 5563ae0df000 r-xp 00000000 08:06 10756824

This looks like the primary suspect: the 'raid' table is only preloaded, yet the raid target already appears to act on the characteristics of this new table. lvm2 expects a 'pre-loaded' table to have no effect until the matching 'resume' operation.
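For reference, this is the preload contract the deptree code relies on, sketched with dmsetup against a hypothetical device name (the table string is the one from the trace above): the inactive table loaded by 'dmsetup load' should have no effect on the running target until 'dmsetup resume' swaps it in.

  # Hypothetical dm device name; table string taken from the trace above.
  dmsetup load vg-LV1 --table "0 8192 raid raid1 3 0 region_size 1024 2 253:11 253:12 253:13 253:14"
  dmsetup suspend vg-LV1   # flush and suspend the live table
  dmsetup resume vg-LV1    # only at this point should the new raid table take effect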
This upstream patch:
https://www.redhat.com/archives/lvm-devel/2017-February/msg00160.html
should minimize the chance of hitting a 'race' with the mdraid core on removal of an active origin with snapshots, where a couple of extra table reloads were being executed before the final 'origin' removal.

Note: this purely addresses the issue with 'lvremove -ff' from the test suite. It does not address the bug from the BZ description - there could possibly be a few more cases where we could be 'smarter' in lvm2 to avoid the racy logic.