Note: This bug is displayed in read-only format because
the product is no longer active in Red Hat Bugzilla.
Description (Marian Krcmarik, 2018-11-16 22:47:07 UTC)
Description of problem:
The following OpenStack cluster is configured, with remote nodes and bundles running on them:
Cluster name: tripleo_cluster
Stack: corosync
Current DC: controller-1 (version 1.1.19-8.el7_6.1-c3c624ea3d) - partition with quorum
Last updated: Fri Nov 16 22:27:56 2018
Last change: Fri Nov 16 20:35:13 2018 by hacluster via crmd on controller-2
18 nodes configured
52 resources configured
Online: [ controller-0 controller-1 controller-2 ]
RemoteOnline: [ database-0 database-1 database-2 messaging-0 messaging-1 messaging-2 ]
GuestOnline: [ galera-bundle-0@controller-1 galera-bundle-1@controller-1 galera-bundle-2@controller-2 rabbitmq-bundle-0@controller-2 rabbitmq-bundle-1@controller-1 rabbitmq-bundle-2@controller-2 redis-bundle-0@controller-0 redis-bundle-1@controller-1 redis-bundle-2@controller-2 ]
Full list of resources:
database-0 (ocf::pacemaker:remote): Started controller-1
database-1 (ocf::pacemaker:remote): Started controller-1
database-2 (ocf::pacemaker:remote): Started controller-2
messaging-0 (ocf::pacemaker:remote): Started controller-2
messaging-1 (ocf::pacemaker:remote): Started controller-1
messaging-2 (ocf::pacemaker:remote): Started controller-2
Docker container set: rabbitmq-bundle [192.168.24.1:8787/rhosp13/openstack-rabbitmq:pcmklatest]
rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): Started messaging-0
rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): Started messaging-1
rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): Started messaging-2
Docker container set: galera-bundle [192.168.24.1:8787/rhosp13/openstack-mariadb:pcmklatest]
galera-bundle-0 (ocf::heartbeat:galera): Master database-0
galera-bundle-1 (ocf::heartbeat:galera): Master database-1
galera-bundle-2 (ocf::heartbeat:galera): Master database-2
Docker container set: redis-bundle [192.168.24.1:8787/rhosp13/openstack-redis:pcmklatest]
redis-bundle-0 (ocf::heartbeat:redis): Slave controller-0
redis-bundle-1 (ocf::heartbeat:redis): Slave controller-1
redis-bundle-2 (ocf::heartbeat:redis): Master controller-2
ip-192.168.24.8 (ocf::heartbeat:IPaddr2): Started controller-1
ip-10.0.0.106 (ocf::heartbeat:IPaddr2): Started controller-2
ip-172.17.1.13 (ocf::heartbeat:IPaddr2): Started controller-1
ip-172.17.1.12 (ocf::heartbeat:IPaddr2): Started controller-2
ip-172.17.3.33 (ocf::heartbeat:IPaddr2): Started controller-1
ip-172.17.4.13 (ocf::heartbeat:IPaddr2): Started controller-1
Docker container set: haproxy-bundle [192.168.24.1:8787/rhosp13/openstack-haproxy:pcmklatest]
haproxy-bundle-docker-0 (ocf::heartbeat:docker): Started controller-0
haproxy-bundle-docker-1 (ocf::heartbeat:docker): Started controller-1
haproxy-bundle-docker-2 (ocf::heartbeat:docker): Started controller-2
Docker container: openstack-cinder-volume [192.168.24.1:8787/rhosp13/openstack-cinder-volume:pcmklatest]
openstack-cinder-volume-docker-0 (ocf::heartbeat:docker): Started controller-2
stonith-fence_ipmilan-5254008f15a5 (stonith:fence_ipmilan): Started controller-2
stonith-fence_ipmilan-52540033b7e7 (stonith:fence_ipmilan): Started controller-1
stonith-fence_ipmilan-5254001b9601 (stonith:fence_ipmilan): Started controller-2
stonith-fence_ipmilan-525400ebe877 (stonith:fence_ipmilan): Started controller-1
stonith-fence_ipmilan-5254009054e4 (stonith:fence_ipmilan): Started controller-1
stonith-fence_ipmilan-5254001ba424 (stonith:fence_ipmilan): Started controller-2
stonith-fence_ipmilan-5254004a16bb (stonith:fence_ipmilan): Started controller-2
stonith-fence_ipmilan-5254009dcf5b (stonith:fence_ipmilan): Started controller-1
stonith-fence_ipmilan-5254006c13af (stonith:fence_ipmilan): Started controller-1
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
Bug 1646350 includes a fix that avoids unnecessary recovery in the following cases:
1. pcs resource refresh messaging-0
2. pcs resource refresh rabbitmq-bundle
However, the following command still ends up in an unnecessary recovery, i.e. a restart of the container:
pcs resource refresh rabbitmq-bundle-0
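For clarity, the three refresh variants can be collected into one transcript (a sketch only; it must be run on a cluster node of a deployment like the one in the status output above, and the resource names are taken from that output):

```shell
# Refresh targets already fixed by bug 1646350: no recovery is triggered.
pcs resource refresh messaging-0        # the remote node hosting the instance
pcs resource refresh rabbitmq-bundle    # the bundle as a whole

# Refresh target still affected: the individual bundle instance.
# This restarts the container even though it is running without problems.
pcs resource refresh rabbitmq-bundle-0

# Observe the unnecessary restart in the cluster status.
pcs status | grep rabbitmq-bundle-0
```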
Version-Release number of selected component (if applicable):
pacemaker-cluster-libs-1.1.19-8.el7_6.1.x86_64
pacemaker-libs-1.1.19-8.el7_6.1.x86_64
pacemaker-1.1.19-8.el7_6.1.x86_64
puppet-pacemaker-0.7.2-0.20180423212253.el7ost.noarch
pacemaker-cli-1.1.19-8.el7_6.1.x86_64
pacemaker-remote-1.1.19-8.el7_6.1.x86_64
ansible-pacemaker-1.0.4-0.20180220234310.0e4d7c0.el7ost.noarch
How reproducible:
Always
Steps to Reproduce:
1. Set up a cluster as shown above with bundles running on remote nodes in A/A mode.
2. Refresh a bundle instance by name, e.g. pcs resource refresh rabbitmq-bundle-0
Actual results:
The container managed inside the bundle is restarted even though it is running without problems.
Expected results:
No recovery/restart of container
Additional info:
Some sosreports will be attached.
Further testing shows this is a problem when the bundle's remote connection (the overall bundle name plus -INSTANCE_NUMBER) gets cleaned and the connection has a past failure.
That is, cleanup and refresh work fine if there is no past connection failure, but lead to the bundle being considered failed and recovered if there is a past connection failure, regardless of whether the overall bundle name or the connection name is given to clean/refresh.
A connection failure can be forced with 'killall -9 pacemaker-remoted' on the node hosting the bundle instance.
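Putting the above together, the failing scenario can be sketched as the following sequence (assumptions: an OSP13-style cluster as shown in the status output, rabbitmq-bundle-0 hosted on messaging-0; commands are illustrative and must be run on the indicated nodes):

```shell
# Step 1 (on the node hosting the bundle instance, here messaging-0):
# force a connection failure by killing the remote daemon.
killall -9 pacemaker-remoted

# Step 2 (on a cluster node): wait for pacemaker to recover the
# connection, then confirm the past failure is recorded.
pcs status --full | grep -A2 'Failed'

# Step 3: clean/refresh using either the overall bundle name or the
# connection name; with a past connection failure on record, both
# lead to the bundle being considered failed and recovered.
pcs resource refresh rabbitmq-bundle-0   # connection name
pcs resource refresh rabbitmq-bundle     # overall bundle name
```

Without a past connection failure (skip step 1), the same refresh commands complete without triggering any recovery.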
Comment 7 (RHEL Program Management, 2021-02-01 07:30:34 UTC)
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.
(In reply to RHEL Program Management from comment #7)
> After evaluating this issue, there are no plans to address it further or fix
> it in an upcoming release. Therefore, it is being closed. If plans change
> such that this issue will be fixed in an upcoming release, then the bug can
> be reopened.
This is still a priority but we do not yet know when developer time will become available. When we know what release it will be in, we will reopen.