Bug 1451933

Summary: HA LVM agent needs to update metadata (pvscan --cache) before starting/relocating a tagged resource

Product: Red Hat Enterprise Linux 7
Version: 7.4
Component: resource-agents
Status: CLOSED ERRATA
Severity: high
Hardware: x86_64
OS: Linux
Reporter: Corey Marthaler <cmarthal>
Assignee: Oyvind Albrigtsen <oalbrigt>
QA Contact: cluster-qe <cluster-qe>
CC: agk, cfeist, cluster-maint, fdinitto, tlavigne
Target Milestone: rc
Fixed In Version: resource-agents-3.9.5-102.el7
Last Closed: 2017-08-01 15:00:11 UTC

Description Corey Marthaler 2017-05-17 22:44:03 UTC
Description of problem:
I found this when attempting to verify feature bug 1159328 (lvmcache support for RH Cluster). 

I set up HA cached volumes and then uncached them in a couple of ways ('lvconvert --splitcache' and 'lvconvert --uncache'), then attempted to relocate the resources. However, without the clvmd HA method (cache is only supported with the tagging method), there is no consistent metadata view across nodes, so the node the resource was relocated to didn't know about the change. I checked the resource agent and found that on start it does either a 'vgscan $vg' (which isn't supported and will fail) or a plain 'vgscan', which doesn't pick up metadata changes the way a 'pvscan --cache' would. This affects every LVM volume type that can be made HA via the tagging method and then has its metadata changed on the active node.


        ocf_log info "Activating volume group $vg"
        if [ "$LVM_MAJOR" -eq "1" ]; then
                ocf_run vgscan $vg
        else
                ocf_run vgscan
        fi
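
What this report is asking for is a refresh of the device/metadata cache before activation. A minimal sketch of that kind of change (not the actual upstream patch; it assumes lvmetad is in use, which is what makes 'pvscan --cache' repopulate the metadata cache):

        ocf_log info "Activating volume group $vg"
        if [ "$LVM_MAJOR" -eq "1" ]; then
                ocf_run vgscan $vg
        else
                # Rescan devices and update lvmetad so this node sees
                # metadata changes made while the VG was active elsewhere
                # (e.g. 'lvconvert --splitcache' on the previous node).
                ocf_run pvscan --cache
                ocf_run vgscan
        fi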


To illustrate what the agent is essentially doing, I took HA out of the picture and used two machines attached to the same shared storage, with no locking.


# HARDING-02
[root@harding-02 ~]# lvcreate -n origin -L 100M VG
  Logical volume "origin" created.
[root@harding-02 ~]# lvcreate --type cache-pool -n POOL -L 100M VG /dev/mapper/mpathb1
  Using default stripesize 64.00 KiB.
  Logical volume "POOL" created.
[root@harding-02 ~]# lvconvert --yes --type cache --cachepool VG/POOL VG/origin
  Logical volume VG/origin is now cached.
[root@harding-02 ~]# lvs -a -o +devices
  LV              VG  Attr       LSize   Pool   Origin         Data%  Meta%  Cpy%Sync Devices               
  [POOL]          VG  Cwi---C--- 100.00m                       0.00   0.49   0.00     POOL_cdata(0)         
  [POOL_cdata]    VG  Cwi-ao---- 100.00m                                              /dev/mapper/mpathb1(4)
  [POOL_cmeta]    VG  ewi-ao----   8.00m                                              /dev/mapper/mpathb1(2)
  [lvol0_pmspare] VG  ewi-------   8.00m                                              /dev/mapper/mpathb1(0)
  origin          VG  Cwi-a-C--- 100.00m [POOL] [origin_corig] 0.00   0.49   0.00     origin_corig(0)       
  [origin_corig]  VG  owi-aoC--- 100.00m                                              /dev/mapper/mpatha1(0)


# HARDING-03
[root@harding-03 ~]# pvscan --cache  # it now has a consistent storage view
[root@harding-03 ~]# lvs -a -o +devices
  LV              VG  Attr       LSize    Pool   Origin         Data%  Meta%  Cpy%Sync Devices               
  [POOL]          VG  Cwi---C---  100.00m                                              POOL_cdata(0)         
  [POOL_cdata]    VG  Cwi-------  100.00m                                              /dev/mapper/mpathc1(4)
  [POOL_cmeta]    VG  ewi-------    8.00m                                              /dev/mapper/mpathc1(2)
  [lvol0_pmspare] VG  ewi-------    8.00m                                              /dev/mapper/mpathc1(0)
  origin          VG  Cwi---C---  100.00m [POOL] [origin_corig]                        origin_corig(0)       
  [origin_corig]  VG  owi---C---  100.00m                                              /dev/mapper/mpatha1(0)


# HARDING-02 metadata change
[root@harding-02 ~]# lvconvert --splitcache VG/origin
  Logical volume VG/origin is not cached and cache pool VG/POOL is unused.
[root@harding-02 ~]# lvs -a -o +devices
  LV              VG  Attr       LSize   Pool Origin Data%  Meta%  Cpy%Sync Devices               
  POOL            VG  Cwi---C--- 100.00m                                    POOL_cdata(0)         
  [POOL_cdata]    VG  Cwi------- 100.00m                                    /dev/mapper/mpathb1(4)
  [POOL_cmeta]    VG  ewi-------   8.00m                                    /dev/mapper/mpathb1(2)
  [lvol0_pmspare] VG  ewi-------   8.00m                                    /dev/mapper/mpathb1(0)
  origin          VG  -wi-a----- 100.00m                                    /dev/mapper/mpatha1(0)
[root@harding-02 ~]# lvchange -an VG/origin

# HARDING-03 w/o a consistent storage view now

# vgscan doesn't accept a VG argument, so that call in the agent is invalid
[root@harding-03 ~]# vgscan VG
  Command does not accept argument: VG.

[root@harding-03 ~]# vgscan
  Reading volume groups from cache.
  Found volume group "VG" using metadata type lvm2

# Still thinks this volume is cached when it's not
[root@harding-03 ~]# lvs -a -o +devices
  LV              VG  Attr       LSize    Pool   Origin         Data%  Meta%  Cpy%Sync Devices               
  [POOL]          VG  Cwi---C---  100.00m                                              POOL_cdata(0)         
  [POOL_cdata]    VG  Cwi-------  100.00m                                              /dev/mapper/mpathc1(4)
  [POOL_cmeta]    VG  ewi-------    8.00m                                              /dev/mapper/mpathc1(2)
  [lvol0_pmspare] VG  ewi-------    8.00m                                              /dev/mapper/mpathc1(0)
  origin          VG  Cwi---C---  100.00m [POOL] [origin_corig]                        origin_corig(0)       
  [origin_corig]  VG  owi---C---  100.00m                                              /dev/mapper/mpatha1(0)
[root@harding-03 ~]# lvchange -ay VG/origin
[root@harding-03 ~]# lvs -a -o +devices
  LV              VG  Attr       LSize    Pool   Origin         Data%  Meta%  Cpy%Sync Devices               
  [POOL]          VG  Cwi---C---  100.00m                       0.12   0.68   0.00     POOL_cdata(0)         
  [POOL_cdata]    VG  Cwi-ao----  100.00m                                              /dev/mapper/mpathc1(4)
  [POOL_cmeta]    VG  ewi-ao----    8.00m                                              /dev/mapper/mpathc1(2)
  [lvol0_pmspare] VG  ewi-------    8.00m                                              /dev/mapper/mpathc1(0)
  origin          VG  Cwi-a-C---  100.00m [POOL] [origin_corig] 0.12   0.68   0.00     origin_corig(0)       
  [origin_corig]  VG  owi-aoC---  100.00m                                              /dev/mapper/mpatha1(0)

[root@harding-03 ~]# pvscan --cache  # now it has a consistent view but it's too late.
[root@harding-03 ~]# lvs -a -o +devices
  Internal error: WARNING: Segment type cache found does not match expected type striped for VG/origin.
  LV              VG  Attr       LSize    Pool Origin Data%  Meta%  Cpy%Sync Devices               
  POOL            VG  Cwi---C---  100.00m                                    POOL_cdata(0)         
  [POOL_cdata]    VG  Cwi-ao----  100.00m                                    /dev/mapper/mpathc1(4)
  [POOL_cmeta]    VG  ewi-ao----    8.00m                                    /dev/mapper/mpathc1(2)
  [lvol0_pmspare] VG  ewi-------    8.00m                                    /dev/mapper/mpathc1(0)
  origin          VG  -wi-XX--X-  100.00m                                    /dev/mapper/mpatha1(0)

Version-Release number of selected component (if applicable):
resource-agents-3.9.5-99.el7.x86_64

Comment 2 Corey Marthaler 2017-05-19 16:20:18 UTC
It appears that 'vgscan --cache' would also suffice in these cache alteration cases. I edited the resource agent to use it, altered the LVs, relocated resources, and saw no issues.
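
That edit amounts to replacing the plain 'vgscan' in the start path, roughly as follows (a sketch; with lvmetad enabled, 'vgscan --cache' rescans the devices and updates the metadata daemon instead of rereading the stale cache):

        else
                # --cache forces a real device rescan instead of
                # "Reading volume groups from cache."
                ocf_run vgscan --cache
        fi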

Comment 3 Oyvind Albrigtsen 2017-05-22 13:04:00 UTC
https://github.com/ClusterLabs/resource-agents/pull/980

Comment 4 Oyvind Albrigtsen 2017-05-30 14:37:35 UTC
Additional patch adding a warning message when the cache is not using writethrough cache mode: https://github.com/ClusterLabs/resource-agents/pull/984
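
The gist of that patch is a validation-time check along these lines (a sketch only, not the merged code; the 'cachemode' lvs report field and the exact warning wording are assumptions here):

        # Warn if a cached LV in the HA VG uses a cache mode other than
        # writethrough; with tag-based HA, dirty writeback blocks cannot
        # be guaranteed flushed before the VG moves to another node.
        # NOTE: 'cachemode' as the lvs field name is an assumption.
        cache_mode=$(lvs --noheadings -o cachemode "$OCF_RESKEY_volgrpname" 2>/dev/null | tr -d ' ' | grep -v '^$' | head -1)
        if [ -n "$cache_mode" ] && [ "$cache_mode" != "writethrough" ]; then
                ocf_log warn "LVM cache on $OCF_RESKEY_volgrpname is not in writethrough mode; data loss is possible on failover"
        fi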

Comment 8 Corey Marthaler 2017-06-13 23:42:07 UTC
Verified that the splitcache and uncache scenarios listed in comment #0 now work with the latest resource agent. Marking verified in resource-agents-3.9.5-104.

That said, any "more intensive" VG metadata alteration scenarios end up resulting in bug 1430948.

Comment 9 errata-xmlrpc 2017-08-01 15:00:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1844