Bug 1688149

Summary: pacemaker cluster will never settle
Product: Red Hat Enterprise Linux 8
Component: pacemaker
Version: 8.0
Hardware: All
OS: All
Status: ON_QA
Severity: medium
Priority: high
Reporter: michal novacek <mnovacek>
Assignee: Reid Wahl <nwahl>
QA Contact: cluster-qe <cluster-qe>
CC: cluster-maint, jrehova, kgaillot, lmiksik, nwahl
Keywords: Reopened, Triaged
Target Milestone: pre-dev-freeze
Target Release: 8.9
Fixed In Version: pacemaker-2.1.6-4.el8
Target Upstream Version: 2.1.7
Bug Depends On: 1682116
Last Closed: 2021-02-01 07:39:27 UTC
Type: Bug
Doc Type: Bug Fix
Doc Text:
 Cause: Pacemaker previously assigned clone instances to equally scored nodes without considering the instances' current nodes.
 Consequence: If a clone had equally scored location constraints on a subset of nodes, clone instances could be assigned to a different node each time and continuously stopped and restarted by the cluster.
 Fix: Instances are now assigned to their current node whenever possible.
 Result: Clone instances do not get restarted unnecessarily.
Attachments:
 'pcs cluster report' output

Description michal novacek 2019-03-13 09:50:45 UTC
Created attachment 1543573 [details]
'pcs cluster report' output

Description of problem:
In a two-node cluster with VirtualDomain resources and gfs2 filesystems [1],
everything seems to be running happily.

However, I found that 'crm_resource --wait' never finishes. It shows a lot of
pending actions [3] that never seem to complete. Once a monitor action times
out, the virtual machine is shut down and started again (very undesirable).
This also means that the cluster never settles.

Version-Release number of selected component (if applicable):
corosync-3.0.0-2.el8.x86_64
pacemaker-2.0.1-4.el8.x86_64
resource-agents-4.1.1-17.el8.x86_64

How reproducible: always


Steps to Reproduce:
1. Create a two-node cluster [1], [2] and observe the pending actions

Actual results: the cluster never settles; there are lots of pending actions

Expected results: the cluster settles


Additional info:

> [1]: pcs config
Cluster Name: STSRHTS3983
Corosync Nodes:
 light-01.cluster-qe.lab.eng.brq.redhat.com light-03.cluster-qe.lab.eng.brq.redhat.com
Pacemaker Nodes:
 light-01.cluster-qe.lab.eng.brq.redhat.com light-03.cluster-qe.lab.eng.brq.redhat.com

Resources:
 Clone: locking-clone
  Meta Attrs: interleave=true
  Group: locking
   Resource: dlm (class=ocf provider=pacemaker type=controld)
    Operations: monitor interval=30s (dlm-monitor-interval-30s)
                start interval=0s timeout=90s (dlm-start-interval-0s)
                stop interval=0s timeout=100s (dlm-stop-interval-0s)
   Resource: lvmlockd (class=ocf provider=heartbeat type=lvmlockd)
    Attributes: with_cmirrord=1
    Operations: monitor interval=30s (lvmlockd-monitor-interval-30s)
                start interval=0s timeout=90s (lvmlockd-start-interval-0s)
                stop interval=0s timeout=90s (lvmlockd-stop-interval-0s)
 Clone: group-var-lib-libvirt-images-clone
  Meta Attrs: clone-max=2 interleave=true ordered=true
  Group: group-var-lib-libvirt-images
   Resource: lv-var-lib-libvirt-images (class=ocf provider=heartbeat type=LVM-activate)
    Attributes: activation_mode=shared lvname=images0 vg_access_mode=lvmlockd vgname=shared
    Operations: monitor interval=30s timeout=90s (lv-var-lib-libvirt-images-monitor-interval-30s)
                start interval=0s timeout=90s (lv-var-lib-libvirt-images-start-interval-0s)
                stop interval=0s timeout=90s (lv-var-lib-libvirt-images-stop-interval-0s)
   Resource: fs-var-lib-libvirt-images (class=ocf provider=heartbeat type=Filesystem)
    Attributes: device=/dev/shared/images0 directory=/var/lib/libvirt/images fstype=gfs2 options=
    Operations: monitor interval=30s (fs-var-lib-libvirt-images-monitor-interval-30s)
                notify interval=0s timeout=60s (fs-var-lib-libvirt-images-notify-interval-0s)
                start interval=0s timeout=60s (fs-var-lib-libvirt-images-start-interval-0s)
                stop interval=0s timeout=60s (fs-var-lib-libvirt-images-stop-interval-0s)
 Clone: group-etc-libvirt-qemu-clone
  Meta Attrs: clone-max=2 interleave=true ordered=true
  Group: group-etc-libvirt-qemu
   Resource: vg-etc-libvirt-qemu (class=ocf provider=heartbeat type=LVM-activate)
    Attributes: activation_mode=shared lvname=etc0 vg_access_mode=lvmlockd vgname=shared
    Operations: monitor interval=30s timeout=90s (vg-etc-libvirt-qemu-monitor-interval-30s)
                start interval=0s timeout=90s (vg-etc-libvirt-qemu-start-interval-0s)
                stop interval=0s timeout=90s (vg-etc-libvirt-qemu-stop-interval-0s)
   Resource: fs-etc-libvirt-qemu (class=ocf provider=heartbeat type=Filesystem)
    Attributes: device=/dev/shared/etc0 directory=/etc/libvirt/qemu fstype=gfs2 options=
    Operations: monitor interval=30s (fs-etc-libvirt-qemu-monitor-interval-30s)
                notify interval=0s timeout=60s (fs-etc-libvirt-qemu-notify-interval-0s)
                start interval=0s timeout=60s (fs-etc-libvirt-qemu-start-interval-0s)
                stop interval=0s timeout=60s (fs-etc-libvirt-qemu-stop-interval-0s)
 Resource: pool-10-37-165-129 (class=ocf provider=heartbeat type=VirtualDomain)
  Attributes: config=/etc/libvirt/qemu/pool-10-37-165-129.xml hypervisor=qemu:///system migration_transport=ssh
  Meta Attrs: allow-migrate=true
  Utilization: cpu=2 hv_memory=1024
  Operations: migrate_from interval=0 timeout=120s (pool-10-37-165-129-migrate_from-interval-0)
              migrate_to interval=0 timeout=120s (pool-10-37-165-129-migrate_to-interval-0)
              monitor interval=10s timeout=30s (pool-10-37-165-129-monitor-interval-10s)
              start interval=0s timeout=90s (pool-10-37-165-129-start-interval-0s)
              stop interval=0s timeout=90s (pool-10-37-165-129-stop-interval-0s)
 Resource: pool-10-37-165-65 (class=ocf provider=heartbeat type=VirtualDomain)
  Attributes: config=/etc/libvirt/qemu/pool-10-37-165-65.xml hypervisor=qemu:///system migration_transport=ssh
  Meta Attrs: allow-migrate=true
  Utilization: cpu=2 hv_memory=1024
  Operations: migrate_from interval=0 timeout=120s (pool-10-37-165-65-migrate_from-interval-0)
              migrate_to interval=0 timeout=120s (pool-10-37-165-65-migrate_to-interval-0)
              monitor interval=10s timeout=30s (pool-10-37-165-65-monitor-interval-10s)
              start interval=0s timeout=90s (pool-10-37-165-65-start-interval-0s)
              stop interval=0s timeout=90s (pool-10-37-165-65-stop-interval-0s)

Stonith Devices:
 Resource: fence-light-01 (class=stonith type=fence_ipmilan)
  Attributes: delay=5 ipaddr=light-01-ilo lanplus=0 login=admin passwd=admin pcmk_host_check=static-list pcmk_host_list=light-01.cluster-qe.lab.eng.brq.redhat.com
  Operations: monitor interval=60s (fence-light-01-monitor-interval-60s)
 Resource: fence-light-03 (class=stonith type=fence_ipmilan)
  Attributes: ipaddr=light-03-ilo lanplus=0 login=admin passwd=admin pcmk_host_check=static-list pcmk_host_list=light-03.cluster-qe.lab.eng.brq.redhat.com
  Operations: monitor interval=60s (fence-light-03-monitor-interval-60s)
Fencing Levels:

Location Constraints:
  Resource: group-etc-libvirt-qemu-clone
    Enabled on: light-01.cluster-qe.lab.eng.brq.redhat.com (score:INFINITY) (id:location-group-etc-libvirt-qemu-clone-light-01.cluster-qe.lab.eng.brq.redhat.com-INFINITY)
    Enabled on: light-03.cluster-qe.lab.eng.brq.redhat.com (score:INFINITY) (id:location-group-etc-libvirt-qemu-clone-light-03.cluster-qe.lab.eng.brq.redhat.com-INFINITY)
    Disabled on: pool-10-37-165-129 (score:-INFINITY) (id:location-group-etc-libvirt-qemu-clone-pool-10-37-165-129--INFINITY)
    Disabled on: pool-10-37-165-65 (score:-INFINITY) (id:location-group-etc-libvirt-qemu-clone-pool-10-37-165-65--INFINITY)
  Resource: group-var-lib-libvirt-images-clone
    Enabled on: light-01.cluster-qe.lab.eng.brq.redhat.com (score:INFINITY) (id:location-group-var-lib-libvirt-images-clone-light-01.cluster-qe.lab.eng.brq.redhat.com-INFINITY)
    Enabled on: light-03.cluster-qe.lab.eng.brq.redhat.com (score:INFINITY) (id:location-group-var-lib-libvirt-images-clone-light-03.cluster-qe.lab.eng.brq.redhat.com-INFINITY)
    Disabled on: pool-10-37-165-129 (score:-INFINITY) (id:location-group-var-lib-libvirt-images-clone-pool-10-37-165-129--INFINITY)
    Disabled on: pool-10-37-165-65 (score:-INFINITY) (id:location-group-var-lib-libvirt-images-clone-pool-10-37-165-65--INFINITY)
  Resource: locking-clone
    Disabled on: pool-10-37-165-129 (score:-INFINITY) (id:location-locking-clone-pool-10-37-165-129--INFINITY)
    Disabled on: pool-10-37-165-65 (score:-INFINITY) (id:location-locking-clone-pool-10-37-165-65--INFINITY)
Ordering Constraints:
  start locking-clone then start group-var-lib-libvirt-images-clone (kind:Mandatory) (id:order-locking-clone-group-var-lib-libvirt-images-clone-mandatory)
  start locking-clone then start group-etc-libvirt-qemu-clone (kind:Mandatory) (id:order-locking-clone-group-etc-libvirt-qemu-clone-mandatory)
  start group-var-lib-libvirt-images-clone then start pool-10-37-165-129 (kind:Mandatory) (id:order-group-var-lib-libvirt-images-clone-pool-10-37-165-129-mandatory)
  start group-etc-libvirt-qemu-clone then start pool-10-37-165-129 (kind:Mandatory) (id:order-group-etc-libvirt-qemu-clone-pool-10-37-165-129-mandatory)
  start group-var-lib-libvirt-images-clone then start pool-10-37-165-65 (kind:Mandatory) (id:order-group-var-lib-libvirt-images-clone-pool-10-37-165-65-mandatory)
  start group-etc-libvirt-qemu-clone then start pool-10-37-165-65 (kind:Mandatory) (id:order-group-etc-libvirt-qemu-clone-pool-10-37-165-65-mandatory)
Colocation Constraints:
  group-var-lib-libvirt-images-clone with locking-clone (score:INFINITY) (id:colocation-group-var-lib-libvirt-images-clone-locking-clone-INFINITY)
  group-etc-libvirt-qemu-clone with locking-clone (score:INFINITY) (id:colocation-group-etc-libvirt-qemu-clone-locking-clone-INFINITY)
Ticket Constraints:

Alerts:
 No alerts defined

Resources Defaults:
 resource-stickiness: 100
Operations Defaults:
 No defaults set

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: STSRHTS3983
 dc-version: 2.0.1-4.el8-0eb7991564
 have-watchdog: false
 no-quorum-policy: freeze

Quorum:
  Options:


> [2]: pcs resource config
 Clone: locking-clone
  Meta Attrs: interleave=true
  Group: locking
   Resource: dlm (class=ocf provider=pacemaker type=controld)
    Operations: monitor interval=30s (dlm-monitor-interval-30s)
                start interval=0s timeout=90s (dlm-start-interval-0s)
                stop interval=0s timeout=100s (dlm-stop-interval-0s)
   Resource: lvmlockd (class=ocf provider=heartbeat type=lvmlockd)
    Attributes: with_cmirrord=1
    Operations: monitor interval=30s (lvmlockd-monitor-interval-30s)
                start interval=0s timeout=90s (lvmlockd-start-interval-0s)
                stop interval=0s timeout=90s (lvmlockd-stop-interval-0s)
 Clone: group-var-lib-libvirt-images-clone
  Meta Attrs: clone-max=2 interleave=true ordered=true
  Group: group-var-lib-libvirt-images
   Resource: lv-var-lib-libvirt-images (class=ocf provider=heartbeat type=LVM-activate)
    Attributes: activation_mode=shared lvname=images0 vg_access_mode=lvmlockd vgname=shared
    Operations: monitor interval=30s timeout=90s (lv-var-lib-libvirt-images-monitor-interval-30s)
                start interval=0s timeout=90s (lv-var-lib-libvirt-images-start-interval-0s)
                stop interval=0s timeout=90s (lv-var-lib-libvirt-images-stop-interval-0s)
   Resource: fs-var-lib-libvirt-images (class=ocf provider=heartbeat type=Filesystem)
    Attributes: device=/dev/shared/images0 directory=/var/lib/libvirt/images fstype=gfs2 options=
    Operations: monitor interval=30s (fs-var-lib-libvirt-images-monitor-interval-30s)
                notify interval=0s timeout=60s (fs-var-lib-libvirt-images-notify-interval-0s)
                start interval=0s timeout=60s (fs-var-lib-libvirt-images-start-interval-0s)
                stop interval=0s timeout=60s (fs-var-lib-libvirt-images-stop-interval-0s)
 Clone: group-etc-libvirt-qemu-clone
  Meta Attrs: clone-max=2 interleave=true ordered=true
  Group: group-etc-libvirt-qemu
   Resource: vg-etc-libvirt-qemu (class=ocf provider=heartbeat type=LVM-activate)
    Attributes: activation_mode=shared lvname=etc0 vg_access_mode=lvmlockd vgname=shared
    Operations: monitor interval=30s timeout=90s (vg-etc-libvirt-qemu-monitor-interval-30s)
                start interval=0s timeout=90s (vg-etc-libvirt-qemu-start-interval-0s)
                stop interval=0s timeout=90s (vg-etc-libvirt-qemu-stop-interval-0s)
   Resource: fs-etc-libvirt-qemu (class=ocf provider=heartbeat type=Filesystem)
    Attributes: device=/dev/shared/etc0 directory=/etc/libvirt/qemu fstype=gfs2 options=
    Operations: monitor interval=30s (fs-etc-libvirt-qemu-monitor-interval-30s)
                notify interval=0s timeout=60s (fs-etc-libvirt-qemu-notify-interval-0s)
                start interval=0s timeout=60s (fs-etc-libvirt-qemu-start-interval-0s)
                stop interval=0s timeout=60s (fs-etc-libvirt-qemu-stop-interval-0s)
 Resource: pool-10-37-165-129 (class=ocf provider=heartbeat type=VirtualDomain)
  Attributes: config=/etc/libvirt/qemu/pool-10-37-165-129.xml hypervisor=qemu:///system migration_transport=ssh
  Meta Attrs: allow-migrate=true
  Utilization: cpu=2 hv_memory=1024
  Operations: migrate_from interval=0 timeout=120s (pool-10-37-165-129-migrate_from-interval-0)
              migrate_to interval=0 timeout=120s (pool-10-37-165-129-migrate_to-interval-0)
              monitor interval=10s timeout=30s (pool-10-37-165-129-monitor-interval-10s)
              start interval=0s timeout=90s (pool-10-37-165-129-start-interval-0s)
              stop interval=0s timeout=90s (pool-10-37-165-129-stop-interval-0s)
 Resource: pool-10-37-165-65 (class=ocf provider=heartbeat type=VirtualDomain)
  Attributes: config=/etc/libvirt/qemu/pool-10-37-165-65.xml hypervisor=qemu:///system migration_transport=ssh
  Meta Attrs: allow-migrate=true
  Utilization: cpu=2 hv_memory=1024
  Operations: migrate_from interval=0 timeout=120s (pool-10-37-165-65-migrate_from-interval-0)
              migrate_to interval=0 timeout=120s (pool-10-37-165-65-migrate_to-interval-0)
              monitor interval=10s timeout=30s (pool-10-37-165-65-monitor-interval-10s)
              start interval=0s timeout=90s (pool-10-37-165-65-start-interval-0s)
              stop interval=0s timeout=90s (pool-10-37-165-65-stop-interval-0s)


> [3]: crm_resource --wait --timeout=20
Pending actions:
	Action 92: pool-10-37-165-65_start_0	on light-03.cluster-qe.lab.eng.brq.redhat.com
	Action 91: pool-10-37-165-65_stop_0	on light-03.cluster-qe.lab.eng.brq.redhat.com
	Action 90: pool-10-37-165-129_start_0	on light-01.cluster-qe.lab.eng.brq.redhat.com
	Action 89: pool-10-37-165-129_stop_0	on light-01.cluster-qe.lab.eng.brq.redhat.com
	Action 80: fs-etc-libvirt-qemu:1_monitor_30000	on light-03.cluster-qe.lab.eng.brq.redhat.com
	Action 79: fs-etc-libvirt-qemu:1_start_0	on light-03.cluster-qe.lab.eng.brq.redhat.com
	Action 78: fs-etc-libvirt-qemu:1_stop_0	on light-01.cluster-qe.lab.eng.brq.redhat.com
	Action 77: vg-etc-libvirt-qemu:1_monitor_30000	on light-03.cluster-qe.lab.eng.brq.redhat.com
	Action 76: vg-etc-libvirt-qemu:1_start_0	on light-03.cluster-qe.lab.eng.brq.redhat.com
	Action 75: vg-etc-libvirt-qemu:1_stop_0	on light-01.cluster-qe.lab.eng.brq.redhat.com
	Action 70: fs-etc-libvirt-qemu:0_monitor_30000	on light-01.cluster-qe.lab.eng.brq.redhat.com
	Action 69: fs-etc-libvirt-qemu:0_start_0	on light-01.cluster-qe.lab.eng.brq.redhat.com
	Action 68: fs-etc-libvirt-qemu:0_stop_0	on light-03.cluster-qe.lab.eng.brq.redhat.com
	Action 67: vg-etc-libvirt-qemu:0_monitor_30000	on light-01.cluster-qe.lab.eng.brq.redhat.com
	Action 66: vg-etc-libvirt-qemu:0_start_0	on light-01.cluster-qe.lab.eng.brq.redhat.com
	Action 65: vg-etc-libvirt-qemu:0_stop_0	on light-03.cluster-qe.lab.eng.brq.redhat.com
	Action 56: fs-var-lib-libvirt-images:1_monitor_30000	on light-03.cluster-qe.lab.eng.brq.redhat.com
	Action 55: fs-var-lib-libvirt-images:1_start_0	on light-03.cluster-qe.lab.eng.brq.redhat.com
	Action 54: fs-var-lib-libvirt-images:1_stop_0	on light-01.cluster-qe.lab.eng.brq.redhat.com
	Action 53: lv-var-lib-libvirt-images:1_monitor_30000	on light-03.cluster-qe.lab.eng.brq.redhat.com
	Action 52: lv-var-lib-libvirt-images:1_start_0	on light-03.cluster-qe.lab.eng.brq.redhat.com
	Action 51: lv-var-lib-libvirt-images:1_stop_0	on light-01.cluster-qe.lab.eng.brq.redhat.com
	Action 46: fs-var-lib-libvirt-images:0_monitor_30000	on light-01.cluster-qe.lab.eng.brq.redhat.com
	Action 45: fs-var-lib-libvirt-images:0_start_0	on light-01.cluster-qe.lab.eng.brq.redhat.com
	Action 44: fs-var-lib-libvirt-images:0_stop_0	on light-03.cluster-qe.lab.eng.brq.redhat.com
	Action 43: lv-var-lib-libvirt-images:0_monitor_30000	on light-01.cluster-qe.lab.eng.brq.redhat.com
	Action 42: lv-var-lib-libvirt-images:0_start_0	on light-01.cluster-qe.lab.eng.brq.redhat.com
	Action 41: lv-var-lib-libvirt-images:0_stop_0	on light-03.cluster-qe.lab.eng.brq.redhat.com
	Action 16: pool-10-37-165-129_monitor_10000	on light-01.cluster-qe.lab.eng.brq.redhat.com
	Action 8: pool-10-37-165-65_monitor_10000	on light-03.cluster-qe.lab.eng.brq.redhat.com
Error performing operation: Timer expired

Comment 1 Ken Gaillot 2019-03-14 22:13:05 UTC
It's actually not running happily; starting at Mar 12 16:03:37 in the logs (when the LVM-activate/Filesystem/VirtualDomain resources and their constraints are added), the resources are continuously restarting. :(

Removing the location constraints for the Filesystem resources seems to work around the problem. (They are equally scored location constraints for both nodes in a symmetric cluster, so they have no effect.)
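
For reference, the workaround amounts to deleting the INFINITY-scored "Enabled on" constraints shown in [1]. A sketch (constraint IDs copied from the 'pcs config' output above; exact pcs syntax may vary by version):

    # Remove the equally scored node-preference constraints for the
    # filesystem clones; in this symmetric cluster they have no effect
    # on placement anyway.
    pcs constraint remove location-group-etc-libvirt-qemu-clone-light-01.cluster-qe.lab.eng.brq.redhat.com-INFINITY
    pcs constraint remove location-group-etc-libvirt-qemu-clone-light-03.cluster-qe.lab.eng.brq.redhat.com-INFINITY
    pcs constraint remove location-group-var-lib-libvirt-images-clone-light-01.cluster-qe.lab.eng.brq.redhat.com-INFINITY
    pcs constraint remove location-group-var-lib-libvirt-images-clone-light-03.cluster-qe.lab.eng.brq.redhat.com-INFINITY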

I also see location constraints keeping various resources off the VirtualDomain resources. Those are not Pacemaker Remote nodes, so the constraints do not mean anything; however, they aren't causing any problems.

There is a separate issue with the simulation (but not the cluster) thinking the fence devices need to be restarted. That might interfere with the --wait as well. This is a known issue that has not been investigated.

Can you try the workaround and see if it helps? We need to fix the underlying issues, but given how difficult it is to get anything into GA at this point, a workaround would be good to have.

Comment 2 michal novacek 2019-03-15 16:16:09 UTC
I can confirm that removing the positive location constraints for the filesystem clones works around the problem.

Comment 8 Ken Gaillot 2020-11-25 18:27:13 UTC
An update:

(In reply to Ken Gaillot from comment #1)
> It's actually not running happily; starting at Mar 12 16:03:37 in the logs
> (when the LVM-activate/Filesystem/VirtualDomain resources and their
> constraints are added), the resources are continuously restarting. :(

Looking at the logs more closely, I was off a bit: the configuration was being repeatedly changed during this time, so resources were starting and stopping appropriately. Problems actually start at Mar 12 16:32:27.

> Removing the location constraints for the Filesystem resources seems to work
> around the problem. (They are equally scored location constraints for both
> nodes in a symmetric cluster, so they have no effect.)

Changing the location constraints to have a score less than INFINITY also works around the problem.
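
One way to do that with pcs might look like the following (the score of 500 is an arbitrary finite example, not from the original configuration):

    # Re-add the node preferences with a finite score instead of INFINITY;
    # any score below INFINITY avoids the problem.
    pcs constraint location group-etc-libvirt-qemu-clone prefers \
        light-01.cluster-qe.lab.eng.brq.redhat.com=500 \
        light-03.cluster-qe.lab.eng.brq.redhat.com=500
    pcs constraint location group-var-lib-libvirt-images-clone prefers \
        light-01.cluster-qe.lab.eng.brq.redhat.com=500 \
        light-03.cluster-qe.lab.eng.brq.redhat.com=500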

Pacemaker assigns an instance number to clone instances on each node. What is going wrong here is that every time Pacemaker runs its scheduler, it assigns the existing active instances different instance numbers from the ones in its intended final state, so it thinks the instances need to be moved.

The cause for that still needs to be found and fixed.
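
The churn is visible from the scheduler alone. As a sketch (option names as of pacemaker 2.0), running a simulation against the live CIB on an affected cluster keeps proposing stop/start actions even when nothing has changed:

    # Run the scheduler against the live cluster state and print
    # the actions it would take plus the allocation scores.
    crm_simulate --live-check --show-scores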

> There is a separate issue with the simulation (but not the cluster) thinking
> the fence devices need to be restarted. That might interfere with the --wait
> as well. This is a known issue that has not been investigated.

As an aside, the simulation issue has been fixed, though the fix will not make it into RHEL 8.4. However, that issue does not affect --wait when used with a live cluster.

Comment 13 RHEL Program Management 2021-02-01 07:39:27 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Comment 14 Ken Gaillot 2021-02-01 14:43:53 UTC
(In reply to RHEL Program Management from comment #13)
> After evaluating this issue, there are no plans to address it further or fix
> it in an upcoming release.  Therefore, it is being closed.  If plans change
> such that this issue will be fixed in an upcoming release, then the bug can
> be reopened.

This is still a high priority and I am hopeful the fix will be in RHEL 8.5. Once we are further along in 8.5 release planning, we will likely reopen this.

Comment 15 Reid Wahl 2023-07-18 00:23:15 UTC
This is fixed by upstream commit 018ad6d5.
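
As a quick check on a build containing the fix (pacemaker-2.1.6-4.el8 or later), waiting for the cluster to settle should now return cleanly rather than timing out as in [3]:

    # Should list no pending actions and exit 0 on a fixed build.
    crm_resource --wait --timeout=60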