| Summary: | raidX is not supporting degraded activation properly | | |
|---|---|---|---|
| Product: | [Community] LVM and device-mapper | Reporter: | Zdenek Kabelac <zkabelac> |
| Component: | lvm2 | Assignee: | Heinz Mauelshagen <heinzm> |
| lvm2 sub component: | Mirroring and RAID | QA Contact: | cluster-qe <cluster-qe> |
| Status: | NEW --- | Docs Contact: | |
| Severity: | unspecified | | |
| Priority: | unspecified | CC: | agk, heinzm, jbrassow, msnitzer, plambri, prajnoha, zkabelac |
| Version: | 2.02.169 | Flags: | rule-engine: lvm-technical-solution? rule-engine: lvm-test-coverage? |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Attachments: | | | |
Description    Zdenek Kabelac    2016-12-08 15:38:13 UTC

Comment 1    Pierguido Lambri

I've just got a similar issue on Fedora 25 with a RAID1 LV (which is actually the metadata disk of a thin device), but I got it after having repaired it. Now every time I try to activate the VG, it logs the same error:

Dec 27 20:28:09 plambri-affligem kernel: device-mapper: raid: New device injected into existing raid set without 'delta_disks' or 'rebuild' parameter specified
Dec 27 20:28:09 plambri-affligem kernel: device-mapper: table: 253:7: raid: Unable to assemble array: Invalid superblocks
Dec 27 20:28:09 plambri-affligem kernel: device-mapper: ioctl: error adding target to table

Is there any way to recover from this situation?

(In reply to Pierguido Lambri from comment #1)
> Is there any way to recover from this situation?

I've never heard of this before... do you have a reproducer?

Created attachment 1255269 [details]
Trace of lvm2 test suite lvconvert-raid.sh
This is our buildbot sample where this 'injection' is visible.
Created attachment 1255284 [details]
part of lvconvert-raid.txt with more debug
I'm providing a version with more debug (only a cut of the trace).
[ 0:01] #libdm-deptree.c:2731 Loading @PREFIX@vg-LV1 table (253:10)
[ 0:01] #libdm-deptree.c:2675 Adding target to (253:10): 0 8192 raid raid1 3 0 region_size 1024 2 253:11 253:12 253:13 253:14
[ 0:01] #ioctl/libdm-iface.c:1838 dm table (253:10) [ opencount flush ] [16384] (*1)
[ 0:01] #ioctl/libdm-iface.c:1838 dm reload (253:10) [ noopencount flush ] [16384] (*1)
[ 0:01] #activate/activate.c:2132 Requiring flush for LV @PREFIX@vg/LV1.
[ 0:01] #mm/memlock.c:582 Entering critical section (suspending).
[ 0:01] #mm/memlock.c:551 Lock: Memlock counters: locked:0 critical:1 daemon:0 suspended:0
[ 0:01] #mm/memlock.c:475 Locking memory
[ 0:01] #libdm-config.c:1064 activation/use_mlockall not found in config: defaulting to 0
[ 0:01] 6,10381,158994530818,-;device-mapper: raid: Superblocks created for new raid set
[ 0:01] 6,10382,158994540522,-;md/raid1:mdX: not clean -- starting background reconstruction
[ 0:01] 6,10383,158994540537,-;md/raid1:mdX: active with 2 out of 2 mirrors
[ 0:01] #mm/memlock.c:287 mlock 0KiB 5563adeda000 - 5563ae0df000 r-xp 00000000 08:06 10756824
This looks like the primary suspect.
The 'raid' table is preloaded, and the raid target already appears to start acting on the characteristics of this new table.
lvm2 expects a 'pre-loaded' table to have no effect until the matching 'resume' operation.
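For reference, here is a minimal sketch (simplified, not lvm2 source) of the libdevmapper preload/suspend/resume sequence the trace above goes through. It loads the same raid target line into the device's inactive table slot and only expects it to take effect at the resume step. The device name "vg-LV1" and the bare-bones error handling are illustrative assumptions.

```c
/*
 * Minimal sketch, not lvm2 source: the libdevmapper sequence used to swap
 * in a new table - preload with DM_DEVICE_RELOAD, then suspend and resume.
 * The preloaded (inactive) table is expected to stay inert until the
 * resume; the trace above suggests dm-raid already acts on it earlier.
 * Build with: cc sketch.c -ldevmapper
 */
#include <stdio.h>
#include <libdevmapper.h>

static int run_simple(int task_type, const char *name)
{
	struct dm_task *dmt = dm_task_create(task_type);
	int r = 0;

	if (!dmt)
		return 0;
	if (dm_task_set_name(dmt, name))
		r = dm_task_run(dmt);
	dm_task_destroy(dmt);
	return r;
}

int main(void)
{
	const char *name = "vg-LV1";	/* hypothetical DM device name */
	struct dm_task *dmt;
	int r = 0;

	/* Step 1: preload the new table into the device's inactive slot. */
	if (!(dmt = dm_task_create(DM_DEVICE_RELOAD)))
		return 1;
	if (dm_task_set_name(dmt, name) &&
	    /* raid1, chunk_size 0, region_size 1024 sectors, 2 legs given
	     * as <metadata_dev data_dev> pairs - taken from the trace. */
	    dm_task_add_target(dmt, 0, 8192, "raid",
			       "raid1 3 0 region_size 1024 2 "
			       "253:11 253:12 253:13 253:14"))
		r = dm_task_run(dmt);
	dm_task_destroy(dmt);
	if (!r)
		return 1;

	/* Steps 2 and 3: suspend (flush I/O), then resume - only here
	 * should the preloaded table become the live table. */
	if (!run_simple(DM_DEVICE_SUSPEND, name) ||
	    !run_simple(DM_DEVICE_RESUME, name))
		return 1;

	printf("new table for %s activated on resume\n", name);
	return 0;
}
```

In the trace above, the 'Superblocks created for new raid set' and background-reconstruction messages appear before any resume has been issued, which is exactly what the comment above flags as unexpected.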
This upstream patch: https://www.redhat.com/archives/lvm-devel/2017-February/msg00160.html should minimize the chance of hitting a 'race' with the mdraid core on removal of an active origin with snapshots, where a couple of extra table reloads were executed before the final 'origin' removal. Note: this purely addresses the 'lvremove -ff' issue from the test suite. It does not address the bug from the BZ description - there could possibly be a few more cases where lvm2 could be 'smarter' to avoid the racy logic.