Bug 1741016

Summary: dmeventd doesn't fix a raid even when raid_fault_policy = "allocate"
Product: [Community] LVM and device-mapper
Reporter: Nick Owens <mischief>
Component: lvm2
Assignee: Heinz Mauelshagen <heinzm>
lvm2 sub component: dmeventd
QA Contact: cluster-qe <cluster-qe>
Status: CLOSED DUPLICATE
Severity: high
Priority: unspecified
CC: agk, cmarthal, heinzm, jbrassow, msnitzer, prajnoha, thornber, zkabelac
Version: unspecified
Flags: pm-rhel: lvm-technical-solution?, pm-rhel: lvm-test-coverage?
Hardware: x86_64   
OS: Linux   
Last Closed: 2019-12-04 16:17:10 UTC
Type: Bug
Bug Depends On: 1560739
Attachments:
Description                                    Flags
log of repro                                   none
hacky patch to make dmeventd repair the raid   none

Description Nick Owens 2019-08-14 05:21:15 UTC
Created attachment 1603586
log of repro

Description of problem:

I have a set of 3 disks. I've created a VG with all 3 and a raid1 LV on top.

If I remove a disk, dmeventd doesn't automatically repair the RAID, even though, as far as I can tell, it is supposed to with raid_fault_policy = "allocate".

Version-Release number of selected component (if applicable):

lvm2 2.03.02 on RHEL 8.0 and on Debian 10

How reproducible:

Trivially.

Steps to Reproduce:

I have attached a script log of the reproduction, run inside a RHEL 8.0 virtual machine (attachment 1603586).

Actual results:

The RAID is not repaired automatically; running lvconvert --repair manually does fix it.

Expected results:

The RAID is repaired automatically by dmeventd.

Comment 1 Nick Owens 2019-08-14 05:27:16 UTC
I've attached a patch that skips the condition in the relevant part of dmeventd. With this change, the RAID is repaired automatically (tested against lvm2 on Debian 10; not tested on RHEL).

I traced the textual status events the kernel sends, and it appears that `status->insync_regions` is 0, which keeps dmeventd from running lvconvert --repair.

I'm not at all familiar with the lvm2 code or the kernel side, but perhaps the condition in dmeventd is wrong, or the kernel event reporting 0 insync_regions is spurious or incorrect.
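To make the suspected failure mode concrete, here is a minimal Python model of the gate described above. It assumes the status layout documented for the kernel dm-raid target (`<raid_type> <#devices> <health_chars> <sync_ratio> <sync_action> ...`, with health chars `A` alive and in sync, `a` alive but not in sync, `D` dead). The names `RaidStatus`, `parse_raid_status`, `decide`, the "none"/"wait"/"repair" results and the sample status strings are all hypothetical; this is a sketch of the behaviour described in this comment, not the actual dmeventd code or data from the attached log.

```python
from dataclasses import dataclass


@dataclass
class RaidStatus:
    raid_type: str
    health: str          # one char per leg: 'A' in sync, 'a' alive not in sync, 'D' dead
    insync_regions: int
    total_regions: int


def parse_raid_status(params: str) -> RaidStatus:
    """Parse the raid target parameters of a status line, e.g. 'raid1 2 AD 0/1024 idle 0'."""
    fields = params.split()
    raid_type, ndevs, health, ratio = fields[0], int(fields[1]), fields[2], fields[3]
    if len(health) != ndevs:
        raise ValueError(f"expected {ndevs} health chars, got {health!r}")
    # sync_ratio is "<in-sync regions>/<total regions>"
    insync, total = (int(x) for x in ratio.split("/"))
    return RaidStatus(raid_type, health, insync, total)


def decide(status: RaidStatus) -> str:
    """Hypothetical model of the repair gate: only repair a failed leg once fully in sync."""
    if "D" not in status.health:
        return "none"    # no failed leg, nothing to do
    if status.insync_regions < status.total_regions:
        return "wait"    # repair deferred until resynchronization completes
    return "repair"      # here dmeventd would run lvconvert --repair
```

With these made-up inputs, `decide(parse_raid_status("raid1 2 AD 1024/1024 idle 0"))` yields "repair", while `"raid1 2 AD 0/1024 idle 0"` yields "wait". That matches the symptom: if the event keeps reporting 0 in-sync regions, the repair is never started.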

Comment 2 Nick Owens 2019-08-14 05:27:50 UTC
Created attachment 1603587
hacky patch to make dmeventd repair the raid

Comment 3 Corey Marthaler 2019-08-14 14:33:41 UTC
This is a duplicate of the following bugs:
RHEL 7.7: https://bugzilla.redhat.com/show_bug.cgi?id=1560739
RHEL 8.0: https://bugzilla.redhat.com/show_bug.cgi?id=1729303

It was also mentioned earlier for RHEL 7: https://bugzilla.redhat.com/show_bug.cgi?id=1547979#c20

Comment 4 Nick Owens 2019-08-17 21:07:19 UTC
It looks like this was fixed in lvm2 commit ad560a286a0b5d08086324e6194b060c136e9353: https://sourceware.org/git/?p=lvm2.git;a=commit;h=ad560a286a0b5d08086324e6194b060c136e9353

Comment 5 Heinz Mauelshagen 2019-08-21 21:53:47 UTC
Assumed fixed by dependency bug 1560739.

Comment 6 Heinz Mauelshagen 2019-12-04 16:17:10 UTC

*** This bug has been marked as a duplicate of bug 1560739 ***