Bug 1451933 - HA LVM agent needs to update metadata (pvscan --cache) before starting/relocation tagged resource
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: resource-agents
Version: 7.4
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Oyvind Albrigtsen
QA Contact: cluster-qe@redhat.com
Depends On:
Blocks:
 
Reported: 2017-05-17 18:44 EDT by Corey Marthaler
Modified: 2017-08-01 11:00 EDT
CC List: 5 users

See Also:
Fixed In Version: resource-agents-3.9.5-102.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-08-01 11:00:11 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---




External Trackers:
  Red Hat Product Errata RHBA-2017:1844 (priority: normal, status: SHIPPED_LIVE) - resource-agents bug fix and enhancement update - last updated 2017-08-01 13:49:20 EDT

Description Corey Marthaler 2017-05-17 18:44:03 EDT
Description of problem:
I found this when attempting to verify feature bug 1159328 (lvmcache support for RH Cluster). 

I set up HA cached volumes and then uncached them a couple of ways ('lvconvert --splitcache' and 'lvconvert --uncache'), then attempted to relocate the resources. However, without the clvmd HA method (cache is only supported with tagging), there is no consistent metadata view, so the node being relocated to didn't know about the change. I checked the resource agent and found that on start it runs either 'vgscan $vg' (which isn't supported and will fail) or a plain 'vgscan', which doesn't pick up metadata changes the way a 'pvscan --cache' would. This affects any LVM volume type that is made HA via the tagging method and has its metadata changed on the active node.


        ocf_log info "Activating volume group $vg"
        if [ "$LVM_MAJOR" -eq "1" ]; then
                ocf_run vgscan $vg
        else
                ocf_run vgscan
        fi
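
The fix suggested in the summary is to refresh the metadata cache before activation. A minimal sketch of that change to the start path is shown below, assuming lvmetad is in use; the change that actually shipped is in the pull request linked in comment 3, so treat this only as an illustration.

        ocf_log info "Activating volume group $vg"
        # Refresh the metadata cache first so a tag-based HA node sees
        # changes made while the VG was active on another node.
        ocf_run pvscan --cache
        if [ "$LVM_MAJOR" -eq "1" ]; then
                ocf_run vgscan $vg
        else
                ocf_run vgscan
        fi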


To illustrate what the agent is basically doing here, I took HA out of the picture and used two machines with shared storage and no locking.


# HARDING-02
[root@harding-02 ~]# lvcreate -n origin -L 100M VG
  Logical volume "origin" created.
[root@harding-02 ~]# lvcreate --type cache-pool -n POOL -L 100M VG /dev/mapper/mpathb1
  Using default stripesize 64.00 KiB.
  Logical volume "POOL" created.
[root@harding-02 ~]# lvconvert --yes --type cache --cachepool VG/POOL VG/origin
  Logical volume VG/origin is now cached.
[root@harding-02 ~]# lvs -a -o +devices
  LV              VG  Attr       LSize   Pool   Origin         Data%  Meta%  Cpy%Sync Devices               
  [POOL]          VG  Cwi---C--- 100.00m                       0.00   0.49   0.00     POOL_cdata(0)         
  [POOL_cdata]    VG  Cwi-ao---- 100.00m                                              /dev/mapper/mpathb1(4)
  [POOL_cmeta]    VG  ewi-ao----   8.00m                                              /dev/mapper/mpathb1(2)
  [lvol0_pmspare] VG  ewi-------   8.00m                                              /dev/mapper/mpathb1(0)
  origin          VG  Cwi-a-C--- 100.00m [POOL] [origin_corig] 0.00   0.49   0.00     origin_corig(0)       
  [origin_corig]  VG  owi-aoC--- 100.00m                                              /dev/mapper/mpatha1(0)


# HARDING-03
[root@harding-03 ~]# pvscan --cache  # it now has a consistent storage view
[root@harding-03 ~]# lvs -a -o +devices
  LV              VG  Attr       LSize    Pool   Origin         Data%  Meta%  Cpy%Sync Devices               
  [POOL]          VG  Cwi---C---  100.00m                                              POOL_cdata(0)         
  [POOL_cdata]    VG  Cwi-------  100.00m                                              /dev/mapper/mpathc1(4)
  [POOL_cmeta]    VG  ewi-------    8.00m                                              /dev/mapper/mpathc1(2)
  [lvol0_pmspare] VG  ewi-------    8.00m                                              /dev/mapper/mpathc1(0)
  origin          VG  Cwi---C---  100.00m [POOL] [origin_corig]                        origin_corig(0)       
  [origin_corig]  VG  owi---C---  100.00m                                              /dev/mapper/mpatha1(0)


# HARDING-02 metadata change
[root@harding-02 ~]# lvconvert --splitcache VG/origin
  Logical volume VG/origin is not cached and cache pool VG/POOL is unused.
[root@harding-02 ~]# lvs -a -o +devices
  LV              VG  Attr       LSize   Pool Origin Data%  Meta%  Cpy%Sync Devices               
  POOL            VG  Cwi---C--- 100.00m                                    POOL_cdata(0)         
  [POOL_cdata]    VG  Cwi------- 100.00m                                    /dev/mapper/mpathb1(4)
  [POOL_cmeta]    VG  ewi-------   8.00m                                    /dev/mapper/mpathb1(2)
  [lvol0_pmspare] VG  ewi-------   8.00m                                    /dev/mapper/mpathb1(0)
  origin          VG  -wi-a----- 100.00m                                    /dev/mapper/mpatha1(0)
[root@harding-02 ~]# lvchange -an VG/origin




# HARDING-03 w/o a consistent storage view now

# vgscan doesn't take a VG argument, so that call in the script is invalid
[root@harding-03 ~]# vgscan VG
  Command does not accept argument: VG.

[root@harding-03 ~]# vgscan
  Reading volume groups from cache.
  Found volume group "VG" using metadata type lvm2

# Still thinks this volume is cached when it's not
[root@harding-03 ~]# lvs -a -o +devices
  LV              VG  Attr       LSize    Pool   Origin         Data%  Meta%  Cpy%Sync Devices               
  [POOL]          VG  Cwi---C---  100.00m                                              POOL_cdata(0)         
  [POOL_cdata]    VG  Cwi-------  100.00m                                              /dev/mapper/mpathc1(4)
  [POOL_cmeta]    VG  ewi-------    8.00m                                              /dev/mapper/mpathc1(2)
  [lvol0_pmspare] VG  ewi-------    8.00m                                              /dev/mapper/mpathc1(0)
  origin          VG  Cwi---C---  100.00m [POOL] [origin_corig]                        origin_corig(0)       
  [origin_corig]  VG  owi---C---  100.00m                                              /dev/mapper/mpatha1(0)
[root@harding-03 ~]# lvchange -ay VG/origin
[root@harding-03 ~]# lvs -a -o +devices
  LV              VG  Attr       LSize    Pool   Origin         Data%  Meta%  Cpy%Sync Devices               
  [POOL]          VG  Cwi---C---  100.00m                       0.12   0.68   0.00     POOL_cdata(0)         
  [POOL_cdata]    VG  Cwi-ao----  100.00m                                              /dev/mapper/mpathc1(4)
  [POOL_cmeta]    VG  ewi-ao----    8.00m                                              /dev/mapper/mpathc1(2)
  [lvol0_pmspare] VG  ewi-------    8.00m                                              /dev/mapper/mpathc1(0)
  origin          VG  Cwi-a-C---  100.00m [POOL] [origin_corig] 0.12   0.68   0.00     origin_corig(0)       
  [origin_corig]  VG  owi-aoC---  100.00m                                              /dev/mapper/mpatha1(0)

[root@harding-03 ~]# pvscan --cache  # now it has a consistent view but it's too late.
[root@harding-03 ~]# lvs -a -o +devices
  Internal error: WARNING: Segment type cache found does not match expected type striped for VG/origin.
  LV              VG  Attr       LSize    Pool Origin Data%  Meta%  Cpy%Sync Devices               
  POOL            VG  Cwi---C---  100.00m                                    POOL_cdata(0)         
  [POOL_cdata]    VG  Cwi-ao----  100.00m                                    /dev/mapper/mpathc1(4)
  [POOL_cmeta]    VG  ewi-ao----    8.00m                                    /dev/mapper/mpathc1(2)
  [lvol0_pmspare] VG  ewi-------    8.00m                                    /dev/mapper/mpathc1(0)
  origin          VG  -wi-XX--X-  100.00m                                    /dev/mapper/mpatha1(0)


 

Version-Release number of selected component (if applicable):
resource-agents-3.9.5-99.el7.x86_64
Comment 2 Corey Marthaler 2017-05-19 12:20:18 EDT
It appears that 'vgscan --cache' would also suffice in these cache alteration cases. I edited the resource agent to use it, altered the LVs, relocated resources, and saw no issues.
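For reference, both metadata-refresh commands mentioned in this bug are standard lvm2 commands; which one the shipped agent ended up using is determined by the pull request in comment 3, not by this comment:

[root@harding-03 ~]# pvscan --cache   # repopulate the metadata cache from the devices
[root@harding-03 ~]# vgscan --cache   # scan devices for VGs and refresh the metadata cache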
Comment 3 Oyvind Albrigtsen 2017-05-22 09:04:00 EDT
https://github.com/ClusterLabs/resource-agents/pull/980
Comment 4 Oyvind Albrigtsen 2017-05-30 10:37:35 EDT
Additional patch for warning message when not using writethrough cache-mode: https://github.com/ClusterLabs/resource-agents/pull/984
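A rough sketch of what such a warning could look like in the agent's start path; the report field name cache_mode and the exact message are assumptions here, and the real check is in the linked pull request:

        # Warn if a cached LV in this VG is not using writethrough mode:
        # with tag-based HA, dirty writeback blocks on one node are not
        # visible to the node the resource relocates to.
        for mode in $(lvs --noheadings -o cache_mode "$vg" 2>/dev/null); do
                if [ "$mode" != "writethrough" ]; then
                        ocf_log warn "LVM cache on $vg is not in writethrough mode; a failover may lose dirty cache blocks"
                fi
        done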
Comment 8 Corey Marthaler 2017-06-13 19:42:07 EDT
Verified that the splitcache and uncache scenarios listed in comment #0 now work with the latest resource-agent. Marking verified in resource-agents-3.9.5-104.

That said, any "more intensive" VG metadata alteration scenarios end up resulting in bug 1430948.
Comment 9 errata-xmlrpc 2017-08-01 11:00:11 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1844
