Bug 1373637

Summary: LVM cache: Ability to repair cache origin if RAID
Product: Red Hat Enterprise Linux 7
Reporter: Jonathan Earl Brassow <jbrassow>
Component: lvm2
Assignee: Heinz Mauelshagen <heinzm>
lvm2 sub component: Mirroring and RAID
QA Contact: cluster-qe <cluster-qe>
Status: CLOSED ERRATA
Docs Contact:
Severity: unspecified
Priority: unspecified
CC: agk, cmarthal, heinzm, jbrassow, msnitzer, prajnoha, prockai, rbednar, zkabelac
Version: 7.2
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: lvm2-2.02.169-1.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-08-01 21:47:18 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1385242

Description Jonathan Earl Brassow 2016-09-06 20:08:09 UTC
Can repair cdata and cmeta sub-LVs, but not corig sub-LV

[root@bp-01 ~]# lvconvert --repair vg/cpool_cdata
mmap: Invalid argument
Attempt to replace failed RAID images (requires full device resync)? [y/n]: n
[root@bp-01 ~]# on.sh sdc
Turning on sdc
[root@bp-01 ~]# off.sh sdc
Turning off sdc
[root@bp-01 ~]# lvconvert --repair vg/cpool_cmeta
mmap: Invalid argument
Attempt to replace failed RAID images (requires full device resync)? [y/n]: n
[root@bp-01 ~]# lvconvert --repair vg/lv_corig
mmap: Invalid argument
  Cannot convert internal LV vg/lv_corig.

Comment 3 Jonathan Earl Brassow 2017-03-07 22:47:14 UTC
Just to be clear, we obviously want things to work in automatic mode too, so set 'allocate' as the fault handling policy and kill a device in the _corig sub-LV.  It should work.

Zdenek/Agk should be consulted if they want to see the top-level LV handled too, but that would be extremely confusing to me (i.e. which sub-LV could be meant by that?).  Better and easier to address the affected sub-LV IMHO.  If more than one sub-LV is affected at the same time, dmeventd should raise multiple events for different dm devices and they should be handled in turn.

Also, please be aware that it is possible this has been fixed in an earlier release, but never formally tested and signed-off by QA.  That certainly would explain the fact that it already works.
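
For the automatic-mode check mentioned above, here is a rough sketch of how one might tell which mode a host is in before killing a device. It assumes the policy is the activation/raid_fault_policy setting in lvm.conf and that `lvmconfig activation/raid_fault_policy` prints a line of the form raid_fault_policy="allocate"; the function name raid_policy_mode is made up for illustration.

```shell
#!/bin/sh
# Sketch only: report whether the configured RAID fault policy would let
# dmeventd replace a failed device on its own. Function name is hypothetical.

# Reads a line such as: raid_fault_policy="allocate"
# (assumed output format of `lvmconfig activation/raid_fault_policy`)
raid_policy_mode() {
    # Extract the quoted value from the first matching line on stdin
    policy=$(sed -n 's/^[[:space:]]*raid_fault_policy[[:space:]]*=[[:space:]]*"\([^"]*\)".*$/\1/p' | head -n 1)
    case "$policy" in
        allocate) echo "automatic" ;;  # failed devices are replaced from spare space in the VG
        warn)     echo "manual" ;;     # failure is only logged; run lvconvert --repair by hand
        *)        echo "unknown" ;;
    esac
}

# Example with a literal line; on a real host, pipe in
#   lvmconfig activation/raid_fault_policy
echo 'raid_fault_policy="allocate"' | raid_policy_mode
```

With 'warn' the manual lvconvert --repair path from the description applies; with 'allocate' the repair of the _corig sub-LV is expected to happen without user intervention.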

Comment 4 Roman Bednář 2017-03-08 09:19:17 UTC
Adding QA ack for 7.4. Automated test for verification available. See QA Whiteboard.

Comment 5 Heinz Mauelshagen 2017-03-08 14:24:53 UTC
Any raid cache SubLV and any raid corig SubLV can be repaired:
# lvs -aoname,attr,size,segtype,syncpercent,datastripes,stripesize,reshapelenle,datacopies,regionsize,devices tb|sed 's/  *$//'
  LV                     Attr       LSize   Type       Cpy%Sync #DStr Stripe RSize #Cpy Region  Devices
  [cache]                Cwi---C--- 100.00m cache-pool 0.00         1     0           1      0  cache_cdata(0)
  [cache_cdata]          Cwi-aor--- 100.00m raid1      100.00       2     0           2 512.00k cache_cdata_rimage_0(0),cache_cdata_rimage_1(0)
  [cache_cdata_rimage_0] iwi-aor--- 100.00m linear                  1     0           1      0  /dev/sda(1)
  [cache_cdata_rimage_1] iwi-aor--- 100.00m linear                  1     0           1      0  /dev/sdaa(1)
  [cache_cdata_rmeta_0]  ewi-aor---   4.00m linear                  1     0           1      0  /dev/sda(0)
  [cache_cdata_rmeta_1]  ewi-aor---   4.00m linear                  1     0           1      0  /dev/sdaa(0)
  [cache_cmeta]          ewi-aor---   8.00m raid1      100.00       2     0           2 512.00k cache_cmeta_rimage_0(0),cache_cmeta_rimage_1(0)
  [cache_cmeta_rimage_0] iwi-aor---   8.00m linear                  1     0           1      0  /dev/sdab(1)
  [cache_cmeta_rimage_1] iwi-aor---   8.00m linear                  1     0           1      0  /dev/sdac(1)
  [cache_cmeta_rmeta_0]  ewi-aor---   4.00m linear                  1     0           1      0  /dev/sdab(0)
  [cache_cmeta_rmeta_1]  ewi-aor---   4.00m linear                  1     0           1      0  /dev/sdac(0)
  [lvol0_pmspare]        ewi-------   8.00m linear                  1     0           1      0  /dev/sdd(0)
  r                      Cwi-a-C---   1.00g cache      0.00         1     0           1      0  r_corig(0)
  [r_corig]              rwi-aoC---   1.00g raid1      100.00       2     0           2 512.00k r_corig_rimage_0(0),r_corig_rimage_1(0)
  [r_corig_rimage_0]     iwi-aor---   1.00g linear                  1     0           1      0  /dev/sdad(1)
  [r_corig_rimage_1]     iwi-aor---   1.00g linear                  1     0           1      0  /dev/sdae(1)
  [r_corig_rmeta_0]      ewi-aor---   4.00m linear                  1     0           1      0  /dev/sdad(0)
  [r_corig_rmeta_1]      ewi-aor---   4.00m linear                  1     0           1      0  /dev/sdae(0)

# lvconvert -y --repair tb/cache_cdata
  tb/cache_cdata does not contain devices specified to replace.
  Faulty devices in tb/cache_cdata successfully replaced.

# lvconvert -y --repair tb/cache_cmeta
  tb/cache_cmeta does not contain devices specified to replace.
  Faulty devices in tb/cache_cmeta successfully replaced.

# lvconvert -y --repair tb/r_corig
  tb/r_corig does not contain devices specified to replace.
  Faulty devices in tb/r_corig successfully replaced.
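
The three repairs above could be wrapped in a small loop. This is only a sketch: the helper name repair_sub_lvs and the RUN variable are hypothetical, and by default it just prints the commands instead of running them, so it is safe to try on a machine without the tb volume group.

```shell
#!/bin/sh
# Sketch only: print (or, with RUN=1, execute) one repair command per
# RAID sub-LV of a cached LV. Helper name and RUN variable are hypothetical.

# Usage: repair_sub_lvs VG SUBLV...
repair_sub_lvs() {
    vg=$1
    shift
    for lv in "$@"; do
        if [ "${RUN:-0}" = 1 ]; then
            # -y answers the "replace failed RAID images" prompt with yes
            lvconvert -y --repair "$vg/$lv"
        else
            echo "lvconvert -y --repair $vg/$lv"
        fi
    done
}

# Dry run for the layout shown above (VG tb, cache pool "cache", origin "r")
repair_sub_lvs tb cache_cdata cache_cmeta r_corig
```

Each sub-LV is addressed by its own name rather than through the top-level LV, which matches the approach suggested in comment 3.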

Comment 7 Roman Bednář 2017-06-05 07:26:20 UTC
Verified. Posting links to test results:


Scenario: kill_primary_synced_raid1_2legs

PASS (warn policy):
https://beaker.cluster-qe.lab.eng.brq.redhat.com/logs/2017/06/587/58770/191571/530715/TESTOUT.log

PASS (allocate policy):
https://beaker.cluster-qe.lab.eng.brq.redhat.com/logs/2017/06/587/58769/191568/530706/TESTOUT.log

3.10.0-671.el7.x86_64

lvm2-2.02.171-2.el7    BUILT: Wed May 24 16:02:34 CEST 2017
lvm2-libs-2.02.171-2.el7    BUILT: Wed May 24 16:02:34 CEST 2017
lvm2-cluster-2.02.171-2.el7    BUILT: Wed May 24 16:02:34 CEST 2017
device-mapper-1.02.140-2.el7    BUILT: Wed May 24 16:02:34 CEST 2017
device-mapper-libs-1.02.140-2.el7    BUILT: Wed May 24 16:02:34 CEST 2017
device-mapper-event-1.02.140-2.el7    BUILT: Wed May 24 16:02:34 CEST 2017
device-mapper-event-libs-1.02.140-2.el7    BUILT: Wed May 24 16:02:34 CEST 2017
device-mapper-persistent-data-0.7.0-0.1.rc6.el7    BUILT: Mon Mar 27 17:15:46 CEST 2017
cmirror-2.02.171-2.el7    BUILT: Wed May 24 16:02:34 CEST 2017


NOTE:

Also verified that both scenarios fail with lvm2-2.02.166-1.el7

Comment 8 errata-xmlrpc 2017-08-01 21:47:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2222