Bug 2166388 - Cleaning up migration history can result in unnecessary resource recovery
Summary: Cleaning up migration history can result in unnecessary resource recovery
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: pacemaker
Version: 8.8
Hardware: All
OS: All
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: 8.8
Assignee: Ken Gaillot
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 2166393
 
Reported: 2023-02-01 17:09 UTC by Ken Gaillot
Modified: 2023-05-16 09:52 UTC
CC List: 2 users

Fixed In Version: pacemaker-2.1.5-6.el8
Doc Type: No Doc Update
Doc Text:
The goal is to have a fix before the issue makes it into the release.
Clone Of:
: 2166393
Environment:
Last Closed: 2023-05-16 08:35:22 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments: none


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker CLUSTERQE-6390 0 None None None 2023-02-01 18:02:02 UTC
Red Hat Issue Tracker RHELPLAN-147332 0 None None None 2023-02-01 17:12:32 UTC
Red Hat Product Errata RHBA-2023:2818 0 None None None 2023-05-16 08:35:31 UTC

Description Ken Gaillot 2023-02-01 17:09:12 UTC
Description of problem: If resource history is cleaned on a node with past live migration history for a resource, in some cases the resource (active elsewhere) could be unnecessarily recovered.
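
For reference, a node's recorded operation history for a resource, including past live-migration entries, can be listed with crm_resource, as the verification in comment 4 does; a minimal sketch with placeholder resource and node names:

>    # <resource> and <node> are placeholders; substitute the real names
>    crm_resource --list-all-operations --resource <resource> --node <node>
>    # operations named <resource>_migrate_to_0 / <resource>_migrate_from_0 indicate past live migrations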


Version-Release number of selected component (if applicable): pacemaker-2.1.5-5.el8


How reproducible: Reliably


Steps to Reproduce:
1. Create and start a cluster with at least three nodes.
2. Create a migrateable resource, for example "pcs resource create migrator ocf:pacemaker:Stateful meta allow-migrate=true"
3. Ban the migrateable resource from its current node. It should live-migrate to a new node.
4. Ban the migrateable resource from its new node. It should live-migrate to a third node.
5. Run "pcs resource refresh" for the migrateable resource with node= set to the second node that it was active on (see the consolidated sketch after this list).

Actual results: The resource is restarted on its current node.

Expected results: No resource activity occurs after refreshing the middle node.

Comment 4 Markéta Smazová 2023-02-27 19:01:39 UTC
after fix
----------

>    [root@virt-282 ~]# rpm -q pacemaker
>    pacemaker-2.1.5-8.el8.x86_64

Create and start a cluster with at least three nodes, create a migrateable resource:

>    [root@virt-282 ~]# pcs resource create migrator ocf:pacemaker:Dummy meta allow-migrate=true
>    [root@virt-282 ~]# pcs status
>    Cluster name: STSRHTS31958
>    Status of pacemakerd: 'Pacemaker is running' (last updated 2023-02-27 16:30:53 +01:00)
>    Cluster Summary:
>      * Stack: corosync
>      * Current DC: virt-283 (version 2.1.5-8.el8-a3f44794f94) - partition with quorum
>      * Last updated: Mon Feb 27 16:30:53 2023
>      * Last change:  Mon Feb 27 16:30:50 2023 by root via cibadmin on virt-282
>      * 3 nodes configured
>      * 4 resource instances configured

>    Node List:
>      * Online: [ virt-282 virt-283 virt-284 ]

>    Full List of Resources:
>      * fence-virt-282	(stonith:fence_xvm):	 Started virt-282
>      * fence-virt-283	(stonith:fence_xvm):	 Started virt-283
>      * fence-virt-284	(stonith:fence_xvm):	 Started virt-284
>      * migrator	(ocf::pacemaker:Dummy):	 Started virt-282

>    Daemon Status:
>      corosync: active/enabled
>      pacemaker: active/enabled
>      pcsd: active/enabled

Ban the resource from its current node. It should live-migrate to a new node:

>    [root@virt-282 ~]# pcs resource ban migrator
>    Warning: Creating location constraint 'cli-ban-migrator-on-virt-282' with a score of -INFINITY for resource migrator on virt-282.
>        This will prevent migrator from running on virt-282 until the constraint is removed
>        This will be the case even if virt-282 is the last node in the cluster

>    [root@virt-282 ~]# pcs resource
>      * migrator	(ocf::pacemaker:Dummy):	 Started virt-283

Ban the resource from its new node. It should live-migrate to a third node:

>    [root@virt-282 ~]# pcs resource ban migrator
>    Warning: Creating location constraint 'cli-ban-migrator-on-virt-283' with a score of -INFINITY for resource migrator on virt-283.
>        This will prevent migrator from running on virt-283 until the constraint is removed
>        This will be the case even if virt-283 is the last node in the cluster

>    [root@virt-282 ~]# pcs resource
>      * migrator	(ocf::pacemaker:Dummy):	 Started virt-284

Check the resource operations on the second node, where the resource was previously active:

>    [root@virt-282 ~]# crm_resource --list-all-operations --resource migrator --node virt-283
>    migrator	(ocf::pacemaker:Dummy):	 Started: migrator_migrate_from_0 (node=virt-283, call=58, rc=0, last-rc-change='Mon Feb 27 16:31:05 2023', exec=39ms): complete
>    migrator	(ocf::pacemaker:Dummy):	 Started: migrator_monitor_10000 (node=virt-283, call=60, rc=0, last-rc-change='Mon Feb 27 16:31:05 2023', exec=48ms): complete
>    migrator	(ocf::pacemaker:Dummy):	 Started: migrator_migrate_to_0 (node=virt-283, call=63, rc=0, last-rc-change='Mon Feb 27 16:31:19 2023', exec=54ms): complete
>    migrator	(ocf::pacemaker:Dummy):	 Started: migrator_stop_0 (node=virt-283, call=66, rc=0, last-rc-change='Mon Feb 27 16:31:19 2023', exec=30ms): complete

Run `pcs resource refresh` for the resource with the node parameter set to the second node (virt-283) that it was active on:

>    [root@virt-282 ~]# date && pcs resource refresh migrator node=virt-283
>    Mon 27 Feb 16:33:08 CET 2023
>    Cleaned up migrator on virt-283
>    Waiting for 1 reply from the controller
>    ... got reply (done)

Check the resource operations on the second node again:

>    [root@virt-282 ~]# crm_resource --list-all-operations --resource migrator --node virt-283
>    migrator	(ocf::pacemaker:Dummy):	 Started: migrator_monitor_0 (node=virt-283, call=72, rc=7, last-rc-change='Mon Feb 27 16:33:10 2023', exec=22ms): complete

RESULT: The resource was not restarted after the refresh of the second node.

marking verified in pacemaker-2.1.5-8.el8

Comment 6 errata-xmlrpc 2023-05-16 08:35:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (pacemaker bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:2818

