Bug 2148997

Summary: Problem with a move of the LVM-activate resource with partial_activation='True'
Product: Red Hat Enterprise Linux 8
Version: 8.7
Component: resource-agents
Reporter: Simon Foucek <sfoucek>
Assignee: Oyvind Albrigtsen <oalbrigt>
QA Contact: cluster-qe <cluster-qe>
Status: ASSIGNED
Severity: unspecified
Priority: unspecified
Type: Bug
CC: agk, cfeist, cluster-maint, fdinitto, teigland
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Cloned to: 2151203 (view as bug list)
Bug Blocks: 2151203

Description Simon Foucek 2022-11-28 13:13:38 UTC
Description of problem:

Unexpected behavior of the 'move' action on an LVM-activate resource.
The resource was created with partial_activation='True', yet it refuses to move to the node that sees the VG as partial, and it writes confusing messages to the log on startup.

Version-Release number of selected component (if applicable):
pacemaker-2.1.4-5.el8.x86_64


How reproducible:
1. Create a VG and an LV on it
2. Create an LVM-activate resource with partial_activation='True' on a 2-node cluster
3. Set a PV to the offline state on the other node
4. Call 'move' on the LVM-activate resource (a condensed sketch follows; the full transcript is under Steps to Reproduce)
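
Condensed, the reproduction looks roughly like this (a sketch only; the commands and the /dev/sda device come from the transcript below, with the verbose lvcreate output omitted):

  # Steps 1-2, on virt-512: raid1 LV in VG 'raidvg', then the cluster resource
  lvcreate -ay --addtag pacemaker --config activation{volume_list=[\"@pacemaker\"]} \
      --name raidlv --type raid1 --extents 100%VG --mirrors 1 --nosync raidvg
  pcs resource create raidvg ocf:heartbeat:LVM-activate \
      vg_access_mode='system_id' vgname='raidvg' partial_activation='True'

  # Step 3, on virt-510 (the other node): take one PV of raidvg offline
  echo offline > /sys/block/sda/device/state

  # Step 4: ask Pacemaker to move the resource; the start on virt-510 fails
  # with "raidvg: failed to activate."
  pcs resource move raidvg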

Steps to Reproduce:
>[root@virt-512 ~]# lvcreate -ay -v --addtag pacemaker --config activation{volume_list=[\"@pacemaker\"]} --name raidlv --type raid1 --extents 100%VG --mirrors 1 --nosync raidvg
  Reloading config files
  Converted 100% of VG (1530) extents into 765 (with mimages 2 and stripes 1 for segtype raid1).
  WARNING: New raid1 won't be synchronised. Don't read what you didn't write!
  Creating logical volume raidlv
  Found fewer allocatable extents for logical volume raidlv than requested: using 765 extents (reduced by 2).
  Creating logical volume raidlv_rimage_0
  Creating logical volume raidlv_rmeta_0
  Creating logical volume raidlv_rimage_1
  Creating logical volume raidlv_rmeta_1
  Archiving volume group "raidvg" metadata (seqno 6).
  activation/volume_list configuration setting defined: Checking the list to match raidvg/raidlv_rmeta_0.
  Creating raidvg-raidlv_rmeta_0
  Loading table for raidvg-raidlv_rmeta_0 (253:2).
  Resuming raidvg-raidlv_rmeta_0 (253:2).
  activation/volume_list configuration setting defined: Checking the list to match raidvg/raidlv_rmeta_1.
  Creating raidvg-raidlv_rmeta_1
  Loading table for raidvg-raidlv_rmeta_1 (253:3).
  Resuming raidvg-raidlv_rmeta_1 (253:3).
  Initializing 4.00 KiB of logical volume raidvg/raidlv_rmeta_0 with value 0.
  Initializing 4.00 KiB of logical volume raidvg/raidlv_rmeta_1 with value 0.
  Removing raidvg-raidlv_rmeta_0 (253:2)
  Removing raidvg-raidlv_rmeta_1 (253:3)
  Archiving volume group "raidvg" metadata (seqno 7).
  Activating logical volume raidvg/raidlv.
  activation/volume_list configuration setting defined: Checking the list to match raidvg/raidlv.
  Creating raidvg-raidlv_rmeta_0
  Loading table for raidvg-raidlv_rmeta_0 (253:2).
  Resuming raidvg-raidlv_rmeta_0 (253:2).
  Creating raidvg-raidlv_rimage_0
  Loading table for raidvg-raidlv_rimage_0 (253:3).
  Resuming raidvg-raidlv_rimage_0 (253:3).
  Creating raidvg-raidlv_rmeta_1
  Loading table for raidvg-raidlv_rmeta_1 (253:4).
  Resuming raidvg-raidlv_rmeta_1 (253:4).
  Creating raidvg-raidlv_rimage_1
  Loading table for raidvg-raidlv_rimage_1 (253:5).
  Resuming raidvg-raidlv_rimage_1 (253:5).
  Creating raidvg-raidlv
  Loading table for raidvg-raidlv (253:6).
  Resuming raidvg-raidlv (253:6).
  Monitored LVM-0fQ5eFt5s2Yq7U8tCseKe4S84AC5SR36FnwsmR8nCaWszdSXVt2xbCQglNDRetZu for events
  Wiping known signatures on logical volume raidvg/raidlv.
  Initializing 4.00 KiB of logical volume raidvg/raidlv with value 0.
  Logical volume "raidlv" created.
  Creating volume group backup "/etc/lvm/backup/raidvg" (seqno 8).
  Reloading config files

>[root@virt-512 ~]# pcs resource create raidvg ocf:heartbeat:LVM-activate vg_access_mode='system_id' vgname='raidvg' partial_activation='True'
>[root@virt-512 ~]# pcs status
Cluster name: STSRHTS14445
Cluster Summary:
  * Stack: corosync
  * Current DC: virt-510 (version 2.1.4-5.el8-dc6eb4362e) - partition with quorum
  * Last updated: Mon Nov 28 13:28:44 2022
  * Last change:  Mon Nov 28 13:28:42 2022 by root via cibadmin on virt-512
  * 2 nodes configured
  * 7 resource instances configured

Node List:
  * Online: [ virt-510 virt-512 ]

Full List of Resources:
  * fence-virt-510	(stonith:fence_xvm):	 Started virt-510
  * fence-virt-512	(stonith:fence_xvm):	 Started virt-512
  * Clone Set: locking-clone [locking]:
    * Started: [ virt-510 virt-512 ]
  * raidvg	(ocf::heartbeat:LVM-activate):	 Started virt-512

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

>[root@virt-510 ~]# echo offline > /sys/block/sda/device/state
>[root@virt-512 ~]# pcs resource move raidvg
Warning: Creating location constraint 'cli-ban-raidvg-on-virt-512' with a score of -INFINITY for resource raidvg on virt-512.
	This will prevent raidvg from running on virt-512 until the constraint is removed
	This will be the case even if virt-512 is the last node in the cluster
>[root@virt-512 ~]# pcs status
Cluster name: STSRHTS14445

WARNINGS:
Following resources have been moved and their move constraints are still in place: 'raidvg'
Run 'pcs constraint location' or 'pcs resource clear <resource id>' to view or remove the constraints, respectively

Cluster Summary:
  * Stack: corosync
  * Current DC: virt-510 (version 2.1.4-5.el8-dc6eb4362e) - partition with quorum
  * Last updated: Mon Nov 28 13:30:28 2022
  * Last change:  Mon Nov 28 13:30:24 2022 by root via crm_resource on virt-512
  * 2 nodes configured
  * 7 resource instances configured

Node List:
  * Online: [ virt-510 virt-512 ]

Full List of Resources:
  * fence-virt-510	(stonith:fence_xvm):	 Started virt-510
  * fence-virt-512	(stonith:fence_xvm):	 Started virt-512
  * Clone Set: locking-clone [locking]:
    * Started: [ virt-510 virt-512 ]
  * raidvg	(ocf::heartbeat:LVM-activate):	 Stopped

Failed Resource Actions:
  * raidvg_start_0 on virt-510 'error' (1): call=37, status='complete', exitreason='raidvg: failed to activate.', last-rc-change='Mon Nov 28 13:30:24 2022', queued=0ms, exec=1628ms

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
>[root@virt-512 ~]# pcs resource debug-start raidvg
Operation force-start for raidvg (ocf:heartbeat:LVM-activate) returned 0 (ok)
  pvscan[95039] PV /dev/sda online.
  pvscan[95039] PV /dev/vda2 online.
  pvscan[95039] PV /dev/sdb online.
  pvscan[95039] PV /dev/sdc online.
  pvscan[95039] PV /dev/sdd online.
  pvscan[95039] PV /dev/sde online.
  pvscan[95039] PV /dev/sdf online.
active

Nov 28 13:30:52 INFO: Activating raidvg
  Volume Group system ID is already "virt-512".
Nov 28 13:30:53 INFO:  PARTIAL MODE. Incomplete logical volumes will be processed. 
Nov 28 13:30:53 INFO: raidvg: activated successfully.



Actual results:
The resource refuses to move to a node with a partial VG, even though partial_activation='True'. Furthermore, the debug-start command reports success, even though the resource does not start on the given node.

Expected results:

Either the resource is moved successfully, or the debug-start command prints why the move could not be performed

Additional info:

Comment 1 Oyvind Albrigtsen 2022-11-28 14:06:51 UTC
If you add --full to debug-start you should be able to see exactly where it fails.

Comment 2 Simon Foucek 2022-11-28 14:22:20 UTC
'debug-start --full raidvg' exits with 0 too; there are no error logs in its output

Comment 3 Simon Foucek 2022-11-28 14:28:25 UTC
>[root@virt-280 ~]# pcs resource debug-start --full raidvg
(unpack_rsc_op_failure) 	warning: Unexpected result (error: raidvg: failed to activate.) was recorded for start of raidvg on virt-281 at Nov 28 15:14:47 2022 | rc=1 id=raidvg_last_failure_0
(log_xmllib_err) 	error: XML Error: Entity: line 1: parser error : Start tag expected, '<' not found
(log_xmllib_err) 	error: XML Error:   pvscan[280336] PV /dev/sda online.
(log_xmllib_err) 	error: XML Error:   ^
(string2xml) 	warning: Parsing failed (domain=1, level=3, code=4): Start tag expected, '<' not found
...

OK, I did find this error message in the output

Comment 4 Simon Foucek 2022-11-28 14:44:13 UTC
(In reply to Simon Foucek from comment #3)
> >[root@virt-280 ~]# pcs resource debug-start --full raidvg
> (unpack_rsc_op_failure) 	warning: Unexpected result (error: raidvg: failed
> to activate.) was recorded for start of raidvg on virt-281 at Nov 28
> 15:14:47 2022 | rc=1 id=raidvg_last_failure_0
> (log_xmllib_err) 	error: XML Error: Entity: line 1: parser error : Start tag
> expected, '<' not found
> (log_xmllib_err) 	error: XML Error:   pvscan[280336] PV /dev/sda online.
> (log_xmllib_err) 	error: XML Error:   ^
> (string2xml) 	warning: Parsing failed (domain=1, level=3, code=4): Start tag
> expected, '<' not found
> ...
> 
> OK, I did find this error message in the output

I find it strange that if I set the same PV offline on both nodes, the resource has no problem running/starting on the current node. When I move it, it generates this error and does not start. If I then remove the constraint and call the start action, it starts fine on the current node again.
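
A minimal sketch of that sequence, assuming the same PV (/dev/sda) as above and the 'pcs resource clear' command mentioned in the pcs status warning:

  # Take the same PV offline on both nodes:
  echo offline > /sys/block/sda/device/state   # run on both nodes

  # The resource keeps running on its current node. Moving it creates the
  # cli-ban constraint and the start on the other node fails:
  pcs resource move raidvg

  # Removing the constraint lets it start cleanly on the original node again:
  pcs resource clear raidvg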

Comment 5 Oyvind Albrigtsen 2022-11-28 14:56:43 UTC
Can you try running it with pcs --debug?

That should show you which commands pcs runs and their output.
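
For reference, a likely invocation (assuming the global --debug flag is placed before the subcommand; pcs then prints every external command it runs together with that command's output):

  pcs --debug resource debug-start raidvg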

Comment 6 Simon Foucek 2022-11-28 15:20:14 UTC
(In reply to Oyvind Albrigtsen from comment #5)
> Can you try running it with pcs --debug ?
> 
> That should show you which commands pcs runs and it's output.
Here is the end of the output, with the errors:

Running: /usr/sbin/crm_resource -r raidvg --force-start
Return Value: 1
--Debug Output Start--
crm_resource: Error performing operation: Error occurred
Operation force-start for raidvg (ocf:heartbeat:LVM-activate) returned 1 (error: raidvg: failed to activate.)
  pvscan[292413] PV /dev/vda2 online.
  pvscan[292413] PV /dev/sdb ignore foreign VG.
  pvscan[292413] PV /dev/sdc ignore foreign VG.
  pvscan[292413] PV /dev/sdd ignore foreign VG.
  pvscan[292413] PV /dev/sde ignore foreign VG.
  pvscan[292413] PV /dev/sdf ignore foreign VG.
active

  Cannot access VG raidvg with system ID virt-281 with local system ID virt-280.
Nov 28 16:14:42 INFO: Activating raidvg
  WARNING: VG raidvg is missing PV dedOMV-6V4n-pw2e-mkoV-0pBN-CP3s-GzyDXx (last written to /dev/sda).
  Cannot change VG raidvg while PVs are missing.
  See vgreduce --removemissing and vgextend --restoremissing.
  Cannot process volume group raidvg
  Cannot access VG raidvg with system ID virt-281 with local system ID virt-280.
Nov 28 16:14:42 ERROR:  PARTIAL MODE. Incomplete logical volumes will be processed. Cannot access VG raidvg with system ID virt-281 with local system ID virt-280. 
ocf-exit-reason:raidvg: failed to activate.
--Debug Output End--

crm_resource: Error performing operation: Error occurred
Operation force-start for raidvg (ocf:heartbeat:LVM-activate) returned 1 (error: raidvg: failed to activate.)
  pvscan[292413] PV /dev/vda2 online.
  pvscan[292413] PV /dev/sdb ignore foreign VG.
  pvscan[292413] PV /dev/sdc ignore foreign VG.
  pvscan[292413] PV /dev/sdd ignore foreign VG.
  pvscan[292413] PV /dev/sde ignore foreign VG.
  pvscan[292413] PV /dev/sdf ignore foreign VG.
active

  Cannot access VG raidvg with system ID virt-281 with local system ID virt-280.
Nov 28 16:14:42 INFO: Activating raidvg
  WARNING: VG raidvg is missing PV dedOMV-6V4n-pw2e-mkoV-0pBN-CP3s-GzyDXx (last written to /dev/sda).
  Cannot change VG raidvg while PVs are missing.
  See vgreduce --removemissing and vgextend --restoremissing.
  Cannot process volume group raidvg
  Cannot access VG raidvg with system ID virt-281 with local system ID virt-280.
Nov 28 16:14:42 ERROR:  PARTIAL MODE. Incomplete logical volumes will be processed. Cannot access VG raidvg with system ID virt-281 with local system ID virt-280. 
ocf-exit-reason:raidvg: failed to activate.



It seems the problem is that access is denied because of the wrong system_id, but I think the resource agent should handle this by itself. Also, partial_activation='True' is set, so missing PVs should not be a problem and the resource should move successfully.
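
A minimal sketch of what appears to be the failing step, assuming the agent has to reassign the VG system ID when taking over the VG (vg_access_mode='system_id'); the error text is copied from the debug output above, and the exact commands the agent runs may differ:

  # Activation itself is permitted in partial mode:
  vgchange -ay --activationmode partial raidvg

  # But reassigning the system ID to the takeover node (virt-280 here) requires
  # writing VG metadata, which LVM refuses while a PV is missing:
  vgchange --yes --systemid virt-280 raidvg
  #   WARNING: VG raidvg is missing PV dedOMV-6V4n-pw2e-mkoV-0pBN-CP3s-GzyDXx (last written to /dev/sda).
  #   Cannot change VG raidvg while PVs are missing.
  #   See vgreduce --removemissing and vgextend --restoremissing.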

Comment 8 Simon Foucek 2022-11-29 15:23:19 UTC
I tried the same scenario with lvmlockd and it worked properly. The error is present only with the system_id option.
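
For comparison, the working lvmlockd variant was presumably created along these lines (a sketch; vg_access_mode='lvmlockd' is a real LVM-activate option, but the shared-VG setup and the ordering/colocation with the locking clone are not shown in this report):

  pcs resource create raidvg ocf:heartbeat:LVM-activate \
      vg_access_mode='lvmlockd' vgname='raidvg' partial_activation='True'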

Comment 9 David Teigland 2022-12-06 18:27:53 UTC
These issues are described here: https://bugzilla.redhat.com/show_bug.cgi?id=2066156#c2

There were two changes I mentioned we could make in that bugzilla:

1. reverting an LVM-activate commit related to partial activation that I believe is incorrect:
   https://github.com/ClusterLabs/resource-agents/commit/30b800156921c9f1524faef07185221a944358f7

2. adding a new vgchange option to enable modifying an incomplete VG so the system ID can be changed (the cause of the error reported above).

Neither change has been made.

I implemented 2, but we will not push it out as a feature until there is a clear need for it (the cluster group could weigh in on that).
A devel branch with the feature is https://sourceware.org/git/?p=lvm2.git;a=commitdiff;h=03653a0a570e94222eda49149e260cf1a8358bd1
The larger issue would be the testing required to support lvm raid with cluster failover.

bug 2066156 was resolved because the user switched to using md raid instead of lvm raid.  This is what we would always recommend, even if the two issues above were resolved.

Using lvmlockd is also a valid solution, but it involves added complexity, so in many cases system ID based failover will be preferred.
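
For completeness, the recommended md-raid-under-VG layout would look roughly like this (a sketch only; /dev/md0 and the /dev/sdX names are placeholders, and the LVM-activate resource is created as before, without partial activation, because the VG sits on a single, always-complete PV):

  # Mirror at the md level instead of the LVM level
  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc

  # Single-PV VG on top of the mirror; losing one leg does not make it partial
  pvcreate /dev/md0
  vgcreate raidvg /dev/md0
  lvcreate --name raidlv --extents 100%VG raidvg

  pcs resource create raidvg ocf:heartbeat:LVM-activate \
      vg_access_mode='system_id' vgname='raidvg'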

Comment 10 Simon Foucek 2022-12-07 08:24:37 UTC
Thank you for your response! I have these questions:
1. Is there a reason to keep the partial_activation option if it doesn't work correctly? Because if I imagine a user situation where I have an N-node cluster, some physical volume fails, so all nodes see the VG/LV as partial, then the node running my resource fails, and the resource fails to move. This seems to be an essential and crucial feature of an HA cluster, and it's broken right now.
2. If we implement the second option from the devel branch, can you provide a bit more description of the lvm raid and cluster failover testing issues that will occur?

Comment 11 David Teigland 2022-12-07 16:13:43 UTC
(In reply to Simon Foucek from comment #10)
> 1. Is there a reason to keep the partial_activation option if it doesn't work correctly? 

As explained in the other bug, I think it should be removed because it doesn't do what I think you want.

> Because if I imagine a user situation where I have an
> N-node cluster, some physical volume fails, so all nodes see the VG/LV as
> partial, then the node running my resource fails, and the resource fails to
> move. This seems to be an essential and crucial feature of an HA cluster,
> and it's broken right now.

It's clearly useful, and it works right now if you use md raid under the VG.  That's what most users do, and that's what we recommend if you want to use software raid.

It's possible that there's a good reason the user needs to use lvm raid instead of md raid, and in that case we'd be interested to hear more about that.  In that case, we can look at adding the devel patch that would permit failing over VGs with missing PVs.

> 2. If we implement the second option from the devel branch, can you provide
> a bit more description of the lvm raid and cluster failover testing issues
> that will occur?

QE will need to write and run tests for failover under those conditions, which I suspect they don't have.  Those scenarios get complicated when you consider all the combinations of: number of failed devices, which specific devices fail, which LVs are using the failed devices, and whether the raid level for those LVs can tolerate that number of failed devices.  Some LVs in the VG may be able to tolerate the missing devices, and other LVs may not.  All of these complications are avoided if you just use md raid under the VG, and that's why I suspect we would prefer to state that the only way we handle raid in an HA setup is via md.