Bug 1373637 - LVM cache: Ability to repair cache origin if RAID
Summary: LVM cache: Ability to repair cache origin if RAID
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: lvm2
Version: 7.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Heinz Mauelshagen
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 1385242
 
Reported: 2016-09-06 20:08 UTC by Jonathan Earl Brassow
Modified: 2021-09-03 12:40 UTC
CC: 9 users

Fixed In Version: lvm2-2.02.169-1.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-08-01 21:47:18 UTC
Target Upstream Version:
Embargoed:


Attachments: none


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2017:2222 0 normal SHIPPED_LIVE lvm2 bug fix and enhancement update 2017-08-01 18:42:41 UTC

Description Jonathan Earl Brassow 2016-09-06 20:08:09 UTC
The cdata and cmeta sub-LVs can be repaired, but the corig sub-LV cannot:

[root@bp-01 ~]# lvconvert --repair vg/cpool_cdata
mmap: Invalid argument
Attempt to replace failed RAID images (requires full device resync)? [y/n]: n
[root@bp-01 ~]# on.sh sdc
Turning on sdc
[root@bp-01 ~]# off.sh sdc
Turning off sdc
[root@bp-01 ~]# lvconvert --repair vg/cpool_cmeta
mmap: Invalid argument
Attempt to replace failed RAID images (requires full device resync)? [y/n]: n
[root@bp-01 ~]# lvconvert --repair vg/lv_corig
mmap: Invalid argument
  Cannot convert internal LV vg/lv_corig.
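
(For context, a stack like the one above can presumably be created as follows; this is only a sketch, with placeholder PV names and arbitrary sizes, chosen so the resulting sub-LV names match those shown:)

  pvcreate /dev/sd[a-d]                                 # placeholder PVs
  vgcreate vg /dev/sd[a-d]
  lvcreate --type raid1 -m 1 -L 1G   -n lv vg           # origin LV (becomes lv_corig)
  lvcreate --type raid1 -m 1 -L 100M -n cpool vg        # cache data LV (becomes cpool_cdata)
  lvcreate --type raid1 -m 1 -L 8M   -n cpool_meta vg   # cache metadata LV (becomes cpool_cmeta)
  lvconvert --type cache-pool --poolmetadata vg/cpool_meta vg/cpool
  lvconvert --type cache --cachepool vg/cpool vg/lv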

Comment 3 Jonathan Earl Brassow 2017-03-07 22:47:14 UTC
Just to be clear, we obviously want things to work in automatic mode too, so set 'allocate' as the fault-handling policy and kill a device in the _corig sub-LV.  It should work.
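
(For reference, the automatic mode mentioned here is driven by the raid_fault_policy setting in lvm.conf; a minimal excerpt, assuming the stock RHEL 7 configuration layout:)

  # /etc/lvm/lvm.conf -- "allocate" lets dmeventd replace failed RAID images
  # from available free space automatically; the default "warn" only logs
  activation {
      raid_fault_policy = "allocate"
  }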

Zdenek/Agk should be consulted if they want to see the top-level LV handled too, but that would be extremely confusing to me (i.e. which sub-LV would be meant by that?).  It is better and easier to address the affected sub-LV, IMHO.  If more than one sub-LV is affected at the same time, dmeventd should raise multiple events for the different dm devices, and they should be handled in turn.

Also, please be aware that it is possible this was fixed in an earlier release but never formally tested and signed off by QA.  That would certainly explain the fact that it already works.

Comment 4 Roman Bednář 2017-03-08 09:19:17 UTC
Adding QA ack for 7.4. Automated test for verification available. See QA Whiteboard.

Comment 5 Heinz Mauelshagen 2017-03-08 14:24:53 UTC
Any RAID cache sub-LVs and any RAID corig sub-LV can be repaired:
# lvs -aoname,attr,size,segtype,syncpercent,datastripes,stripesize,reshapelenle,datacopies,regionsize,devices tb|sed 's/  *$//'
  LV                     Attr       LSize   Type       Cpy%Sync #DStr Stripe RSize #Cpy Region  Devices
  [cache]                Cwi---C--- 100.00m cache-pool 0.00         1     0           1      0  cache_cdata(0)
  [cache_cdata]          Cwi-aor--- 100.00m raid1      100.00       2     0           2 512.00k cache_cdata_rimage_0(0),cache_cdata_rimage_1(0)
  [cache_cdata_rimage_0] iwi-aor--- 100.00m linear                  1     0           1      0  /dev/sda(1)
  [cache_cdata_rimage_1] iwi-aor--- 100.00m linear                  1     0           1      0  /dev/sdaa(1)
  [cache_cdata_rmeta_0]  ewi-aor---   4.00m linear                  1     0           1      0  /dev/sda(0)
  [cache_cdata_rmeta_1]  ewi-aor---   4.00m linear                  1     0           1      0  /dev/sdaa(0)
  [cache_cmeta]          ewi-aor---   8.00m raid1      100.00       2     0           2 512.00k cache_cmeta_rimage_0(0),cache_cmeta_rimage_1(0)
  [cache_cmeta_rimage_0] iwi-aor---   8.00m linear                  1     0           1      0  /dev/sdab(1)
  [cache_cmeta_rimage_1] iwi-aor---   8.00m linear                  1     0           1      0  /dev/sdac(1)
  [cache_cmeta_rmeta_0]  ewi-aor---   4.00m linear                  1     0           1      0  /dev/sdab(0)
  [cache_cmeta_rmeta_1]  ewi-aor---   4.00m linear                  1     0           1      0  /dev/sdac(0)
  [lvol0_pmspare]        ewi-------   8.00m linear                  1     0           1      0  /dev/sdd(0)
  r                      Cwi-a-C---   1.00g cache      0.00         1     0           1      0  r_corig(0)
  [r_corig]              rwi-aoC---   1.00g raid1      100.00       2     0           2 512.00k r_corig_rimage_0(0),r_corig_rimage_1(0)
  [r_corig_rimage_0]     iwi-aor---   1.00g linear                  1     0           1      0  /dev/sdad(1)
  [r_corig_rimage_1]     iwi-aor---   1.00g linear                  1     0           1      0  /dev/sdae(1)
  [r_corig_rmeta_0]      ewi-aor---   4.00m linear                  1     0           1      0  /dev/sdad(0)
  [r_corig_rmeta_1]      ewi-aor---   4.00m linear                  1     0           1      0  /dev/sdae(0)

# lvconvert -y --repair tb/cache_cdata
  tb/cache_cdata does not contain devices specified to replace.
  Faulty devices in tb/cache_cdata successfully replaced.

# lvconvert -y --repair tb/cache_cmeta
  tb/cache_cmeta does not contain devices specified to replace.
  Faulty devices in tb/cache_cmeta successfully replaced.

# lvconvert -y --repair tb/r_corig
  tb/r_corig does not contain devices specified to replace.
  Faulty devices in tb/r_corig successfully replaced.

Comment 7 Roman Bednář 2017-06-05 07:26:20 UTC
Verified. Posting links to test results:


Scenario: kill_primary_synced_raid1_2legs

PASS (warn policy):
https://beaker.cluster-qe.lab.eng.brq.redhat.com/logs/2017/06/587/58770/191571/530715/TESTOUT.log

PASS (allocate policy):
https://beaker.cluster-qe.lab.eng.brq.redhat.com/logs/2017/06/587/58769/191568/530706/TESTOUT.log

3.10.0-671.el7.x86_64

lvm2-2.02.171-2.el7    BUILT: Wed May 24 16:02:34 CEST 2017
lvm2-libs-2.02.171-2.el7    BUILT: Wed May 24 16:02:34 CEST 2017
lvm2-cluster-2.02.171-2.el7    BUILT: Wed May 24 16:02:34 CEST 2017
device-mapper-1.02.140-2.el7    BUILT: Wed May 24 16:02:34 CEST 2017
device-mapper-libs-1.02.140-2.el7    BUILT: Wed May 24 16:02:34 CEST 2017
device-mapper-event-1.02.140-2.el7    BUILT: Wed May 24 16:02:34 CEST 2017
device-mapper-event-libs-1.02.140-2.el7    BUILT: Wed May 24 16:02:34 CEST 2017
device-mapper-persistent-data-0.7.0-0.1.rc6.el7    BUILT: Mon Mar 27 17:15:46 CEST 2017
cmirror-2.02.171-2.el7    BUILT: Wed May 24 16:02:34 CEST 2017


NOTE:

Also verified that both scenarios fail with lvm2-2.02.166-1.el7.
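
(For anyone reproducing this outside the automated suite, a rough manual approximation of the scenario, with placeholder device and LV names; this is presumably what the on.sh/off.sh helpers in the description wrap:)

  echo offline > /sys/block/sdc/device/state        # fail one leg of the raid1 origin
  dd if=/dev/zero of=/dev/vg/lv bs=1M count=16 oflag=direct   # drive I/O through the cached LV
  journalctl | grep -i dmeventd                     # with raid_fault_policy="allocate",
                                                    #   dmeventd should log the repair
  lvs -a -o name,attr,copy_percent,devices vg       # replaced rimage_* devices resyncing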

Comment 8 errata-xmlrpc 2017-08-01 21:47:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2222

