Bug 1830552 - pcs status on remotes is not working on rhel8.2 any longer
Summary: pcs status on remotes is not working on rhel8.2 any longer
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: pcs
Version: 8.2
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: rc
Target Release: 8.3
Assignee: Tomas Jelinek
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 1832914
 
Reported: 2020-05-02 17:33 UTC by Michele Baldessari
Modified: 2021-04-20 08:52 UTC
CC List: 9 users

Fixed In Version: pcs-0.10.6-1.el8
Doc Type: Bug Fix
Doc Text:
Cause: The user runs 'pcs status' on a remote node.
Consequence: Pcs exits with an error complaining that corosync.conf is missing. This is wrong, because corosync.conf is expected to be missing on remote nodes.
Fix: If corosync.conf is missing, read the cluster name from the CIB instead of corosync.conf, and gracefully skip obtaining and displaying information that depends on the presence of corosync.conf.
Result: The 'pcs status' command works on remote nodes.
Clone Of:
: 1832914 (view as bug list)
Environment:
Last Closed: 2020-11-04 02:28:16 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments
proposed fix + tests (12.66 KB, patch)
2020-05-05 09:54 UTC, Tomas Jelinek


Links
Red Hat Product Errata RHEA-2020:4617 (last updated 2020-11-04 02:28:37 UTC)

Description Michele Baldessari 2020-05-02 17:33:30 UTC
Description of problem:
The following used to work on RHEL 8.1:
[root@compute-0 ~]# ls -l /etc/corosync/corosync.conf
ls: cannot access '/etc/corosync/corosync.conf': No such file or directory
[root@compute-0 ~]# rpm -q pacemaker pcs
pacemaker-2.0.2-3.el8_1.2.x86_64
pcs-0.10.2-4.el8.x86_64
[root@compute-0 ~]# pcs status
Cluster name: tripleo_cluster
Stack: corosync
Current DC: controller-2 (version 2.0.2-3.el8_1.2-744a30d655) - partition with quorum
Last updated: Sat May  2 17:31:33 2020
Last change: Sat May  2 17:29:47 2020 by root via cibadmin on controller-0

11 nodes configured
2 resources configured

Online: [ controller-0 controller-1 controller-2 database-0 database-1 database-2 messaging-0 messaging-1 messaging-2 ]
RemoteOnline: [ compute-0 compute-1 ]

Full list of resources:

 compute-0      (ocf::pacemaker:remote):        Started controller-0
 compute-1      (ocf::pacemaker:remote):        Started controller-1

Daemon Status:
  corosync: inactive/disabled
  pacemaker: inactive/disabled
  pacemaker_remote: active/enabled
  pcsd: active/enabled


Whereas on RHEL 8.2 the very same setup breaks:
[root@compute-0 ~]# rpm -q pacemaker pcs
pacemaker-2.0.3-5.el8.x86_64
pcs-0.10.4-6.el8.x86_64
[root@compute-0 ~]# pcs status
Error: Unable to read /etc/corosync/corosync.conf: No such file or directory
[root@compute-0 ~]# ls -l /etc/corosync/corosync.conf
ls: cannot access '/etc/corosync/corosync.conf': No such file or directory


I.e. /etc/corosync/corosync.conf is never created on the remote node, which is expected, but only on RHEL 8.2 does pcs object to its absence.

crm_mon works just fine on the remote node on both RHEL 8.1 and RHEL 8.2.

Comment 1 Tomas Jelinek 2020-05-04 07:55:18 UTC
This regression was introduced when moving the status command to the new pcs architecture.


corosync.conf is needed there for two reasons:

1) get the cluster name
Here, pcs should check whether the corosync.conf file exists. If it is missing, get the cluster name from the CIB instead.

2) list nodes from corosync.conf to check whether pcsd on them can be reached
This wasn't working before either. Since there is no list of nodes when corosync.conf is missing, this step should simply be skipped. (A rough shell sketch of both fallbacks follows the output below.)

# pcs status --full
Cluster name: rhel82
Cluster Summary:
  * Stack: corosync
  * Current DC: rh82-node2 (2) (version 2.0.3-5.el8-4b1f869f0f) - partition with quorum
  * Last updated: Mon May  4 09:51:45 2020
  * Last change:  Mon May  4 09:36:51 2020 by root via cibadmin on rh82-node2
  * 3 nodes configured
  * 5 resource instances configured

Node List:
  * Online: [ rh82-node2 (2) rh82-node3 (3) ]
  * RemoteOnline: [ rh82-node1 ]

Full List of Resources:
  * xvm (stonith:fence_xvm):    Started rh82-node3
  * d1  (ocf::pacemaker:Dummy): Started rh82-node1
  * d2  (ocf::pacemaker:Dummy): Started rh82-node3
  * d3  (ocf::pacemaker:Dummy): Started rh82-node2
  * rh82-node1  (ocf::pacemaker:remote):        Started rh82-node2

Migration Summary:


PCSD Status:
Error: Unable to read /etc/corosync/corosync.conf: No such file or directory
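
For illustration only, a rough shell sketch of the intended fallback behaviour. This is not the actual pcs code; it assumes the default corosync.conf path and uses the cluster-name property that Pacemaker keeps in the CIB:

# Sketch of the intended behaviour, not the pcs implementation.
COROSYNC_CONF=/etc/corosync/corosync.conf

if [ -r "$COROSYNC_CONF" ]; then
    # Full cluster node: the cluster name comes from the totem section of
    # corosync.conf, and its node list tells pcs which pcsd instances to query.
    awk -F: '/cluster_name/ { gsub(/[ \t]/, "", $2); print "Cluster name: " $2 }' "$COROSYNC_CONF"
else
    # Remote node: corosync.conf is legitimately absent. Fall back to the
    # cluster-name property stored in the CIB and skip the pcsd connectivity
    # check, since there is no node list to work from.
    printf 'Cluster name: %s\n' "$(crm_attribute --type crm_config --name cluster-name --query --quiet)"
fi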

Comment 2 Tomas Jelinek 2020-05-05 09:54:12 UTC
Created attachment 1685156 [details]
proposed fix + tests

Test:
* add a remote node to a cluster: pcs cluster node add-remote ...
* run 'pcs status' on the remote node
* details in comment 0 and comment 1 (a scripted sketch of these steps follows)
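
A scripted form of the steps above; the hostname is a placeholder, and the pcs commands are the ones used elsewhere in this report:

# On an existing full cluster node ("remote1" is a placeholder hostname):
pcs cluster node add-remote remote1

# On the remote node, where /etc/corosync/corosync.conf does not exist:
pcs status          # with the fix, the cluster name is read from the CIB
pcs status --full   # the PCSD Status section is skipped instead of failing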

Comment 7 Miroslav Lisik 2020-06-11 14:37:23 UTC
Test:

[root@r8-node-01 rpms]# rpm -q pcs
pcs-0.10.6-1.el8.x86_64
[root@r8-node-02 ~]# rpm -q pcs
pcs-0.10.6-1.el8.x86_64

[root@r8-node-02 ~]# pcs status nodes
Pacemaker Nodes:
 Online: r8-node-01
 Standby:
 Standby with resource(s) running:
 Maintenance:
 Offline:
Pacemaker Remote Nodes:
 Online: r8-node-02
 Standby:
 Standby with resource(s) running:
 Maintenance:
 Offline:
[root@r8-node-02 ~]# pcs status --full
Cluster name: HAcluster
Cluster Summary:
  * Stack: corosync
  * Current DC: r8-node-01 (1) (version 2.0.3-5.el8-4b1f869f0f) - partition with quorum
  * Last updated: Thu Jun 11 16:35:22 2020
  * Last change:  Thu Jun 11 16:34:33 2020 by root via cibadmin on r8-node-01
  * 2 nodes configured
  * 3 resource instances configured

Node List:
  * Online: [ r8-node-01 (1) ]
  * RemoteOnline: [ r8-node-02 ]

Full List of Resources:
  * fence-r8-node-01	(stonith:fence_xvm):	Started r8-node-01
  * fence-r8-node-02	(stonith:fence_xvm):	Started r8-node-01
  * r8-node-02	(ocf::pacemaker:remote):	Started r8-node-01

Migration Summary:

Tickets:

Daemon Status:
  corosync: inactive/disabled
  pacemaker: inactive/disabled
  pacemaker_remote: active/enabled
  pcsd: active/disabled

Comment 10 Nina Hostakova 2020-07-24 07:50:43 UTC
BEFORE_FIX
=========
[root@virt-044 ~]# rpm -q pcs
pcs-0.10.4-6.el8.x86_64

[root@virt-044 ~]# pcs cluster node add-remote virt-043
No addresses specified for host 'virt-043', using 'virt-043'
Sending 'pacemaker authkey' to 'virt-043'
virt-043: successful distribution of the file 'pacemaker authkey'
Requesting 'pacemaker_remote enable', 'pacemaker_remote start' on 'virt-043'
virt-043: successful run of 'pacemaker_remote enable'
virt-043: successful run of 'pacemaker_remote start'

[root@virt-044 sts-rhel8.3]# pcs status --full
Cluster name: STSRHTS10850
Cluster Summary:
  * Stack: corosync
  * Current DC: virt-044 (1) (version 2.0.3-5.el8_2.1-4b1f869f0f) - partition with quorum
  * Last updated: Fri Jul 24 09:35:34 2020
  * Last change:  Fri Jul 24 09:35:27 2020 by root via cibadmin on virt-044
  * 3 nodes configured
  * 3 resource instances configured

Node List:
  * Online: [ virt-044 (1) virt-048 (2) ]
  * RemoteOnline: [ virt-043 ]

Full List of Resources:
  * fence-virt-044	(stonith:fence_xvm):	Started virt-048
  * fence-virt-048	(stonith:fence_xvm):	Started virt-048
  * virt-043	(ocf::pacemaker:remote):	Started virt-044

Migration Summary:

Tickets:

PCSD Status:
  virt-044: Online
  virt-048: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled


# Check the remote node
[root@virt-043 ~]# pcs cluster corosync
Error: Unable to read /etc/corosync/corosync.conf: No such file or directory

[root@virt-043 ~]# pcs status 
Error: Unable to read /etc/corosync/corosync.conf: No such file or directory

> Status could not be displayed on the remote node because corosync.conf was unavailable


AFTER_FIX
=========
[root@virt-158 ~]# rpm -q pcs
pcs-0.10.6-3.el8.x86_64

[root@virt-158 ~]# pcs cluster node add-remote virt-160
No addresses specified for host 'virt-160', using 'virt-160'
Sending 'pacemaker authkey' to 'virt-160'
virt-160: successful distribution of the file 'pacemaker authkey'
Requesting 'pacemaker_remote enable', 'pacemaker_remote start' on 'virt-160'
virt-160: successful run of 'pacemaker_remote enable'
virt-160: successful run of 'pacemaker_remote start'

[root@virt-158 ~]# pcs status --full
Cluster name: STSRHTS32139
Cluster Summary:
  * Stack: corosync
  * Current DC: virt-159 (2) (version 2.0.4-3.el8-2deceaa3ae) - partition with quorum
  * Last updated: Fri Jul 24 08:47:27 2020
  * Last change:  Fri Jul 24 08:46:30 2020 by root via cibadmin on virt-158
  * 3 nodes configured
  * 4 resource instances configured

Node List:
  * Online: [ virt-158 (1) virt-159 (2) ]
  * RemoteOnline: [ virt-160 ]

Full List of Resources:
  * fence-virt-158	(stonith:fence_xvm):	 Started virt-159
  * fence-virt-159	(stonith:fence_xvm):	 Started virt-158
  * fence-virt-160	(stonith:fence_xvm):	 Started virt-159
  * virt-160	(ocf::pacemaker:remote):	 Started virt-158

Migration Summary:

Tickets:

PCSD Status:
  virt-158: Online
  virt-159: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled


# Check the remote node
[root@virt-160 ~]# pcs cluster corosync
Error: Unable to read /etc/corosync/corosync.conf: No such file or directory

[root@virt-160 ~]# pcs status --full
Cluster name: STSRHTS32139
Cluster Summary:
  * Stack: corosync
  * Current DC: virt-159 (2) (version 2.0.4-3.el8-2deceaa3ae) - partition with quorum
  * Last updated: Fri Jul 24 08:49:03 2020
  * Last change:  Fri Jul 24 08:46:30 2020 by root via cibadmin on virt-158
  * 3 nodes configured
  * 4 resource instances configured

Node List:
  * Online: [ virt-158 (1) virt-159 (2) ]
  * RemoteOnline: [ virt-160 ]

Full List of Resources:
  * fence-virt-158	(stonith:fence_xvm):	 Started virt-159
  * fence-virt-159	(stonith:fence_xvm):	 Started virt-158
  * fence-virt-160	(stonith:fence_xvm):	 Started virt-159
  * virt-160	(ocf::pacemaker:remote):	 Started virt-158

Migration Summary:

Tickets:

Daemon Status:
  corosync: inactive/disabled
  pacemaker: inactive/disabled
  pacemaker_remote: active/enabled
  pcsd: active/enabled

> Instead of corosync.conf, the cluster name is taken from the CIB and the PCSD Status section is skipped
> Status is available even though corosync.conf is not present on the remote node



Marking verified in pcs-0.10.6-3.el8.

Comment 13 errata-xmlrpc 2020-11-04 02:28:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (pcs bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:4617

