Bug 1830552

Summary: pcs status on remotes is not working on rhel8.2 any longer
Product: Red Hat Enterprise Linux 8
Reporter: Michele Baldessari <michele>
Component: pcs
Assignee: Tomas Jelinek <tojeline>
Status: CLOSED ERRATA
QA Contact: cluster-qe <cluster-qe>
Severity: medium
Priority: high
Version: 8.2
CC: cfeist, cluster-maint, idevat, mlisik, mmazoure, mpospisi, nhostako, omular, tojeline
Target Milestone: rc
Keywords: Regression, ZStream
Target Release: 8.3
Hardware: Unspecified
OS: Unspecified
Fixed In Version: pcs-0.10.6-1.el8
Doc Type: Bug Fix
Doc Text:
Cause: User runs 'pcs status' on a remote node.
Consequence: Pcs exits with an error complaining that corosync.conf is missing. This is wrong, as corosync.conf is expected to be missing on remote nodes.
Fix: If corosync.conf is missing, read the cluster name from the CIB instead of corosync.conf. Gracefully skip obtaining and displaying information which depends on corosync.conf presence.
Result: The 'pcs status' command works on remote nodes.
Story Points: ---
Clones: 1832914 (view as bug list)
Last Closed: 2020-11-04 02:28:16 UTC
Type: Bug
Bug Blocks: 1832914
Attachments: proposed fix + tests

Description Michele Baldessari 2020-05-02 17:33:30 UTC
Description of problem:
The following used to work on RHEL 8.1:
[root@compute-0 ~]# ls -l /etc/corosync/corosync.conf
ls: cannot access '/etc/corosync/corosync.conf': No such file or directory
[root@compute-0 ~]# rpm -q pacemaker pcs
pacemaker-2.0.2-3.el8_1.2.x86_64
pcs-0.10.2-4.el8.x86_64
[root@compute-0 ~]# pcs status
Cluster name: tripleo_cluster
Stack: corosync
Current DC: controller-2 (version 2.0.2-3.el8_1.2-744a30d655) - partition with quorum
Last updated: Sat May  2 17:31:33 2020
Last change: Sat May  2 17:29:47 2020 by root via cibadmin on controller-0

11 nodes configured
2 resources configured

Online: [ controller-0 controller-1 controller-2 database-0 database-1 database-2 messaging-0 messaging-1 messaging-2 ]
RemoteOnline: [ compute-0 compute-1 ]

Full list of resources:

 compute-0      (ocf::pacemaker:remote):        Started controller-0
 compute-1      (ocf::pacemaker:remote):        Started controller-1

Daemon Status:
  corosync: inactive/disabled
  pacemaker: inactive/disabled
  pacemaker_remote: active/enabled
  pcsd: active/enabled


Whereas on RHEL 8.2 the very same commands break:
[root@compute-0 ~]# rpm -q pacemaker pcs
pacemaker-2.0.3-5.el8.x86_64
pcs-0.10.4-6.el8.x86_64
[root@compute-0 ~]# pcs status
Error: Unable to read /etc/corosync/corosync.conf: No such file or directory
[root@compute-0 ~]# ls -l /etc/corosync/corosync.conf
ls: cannot access '/etc/corosync/corosync.conf': No such file or directory


I.e. /etc/corosync/corosync.conf is never created on the remote node, which is expected, but only on RHEL 8.2 does pcs complain about this.

crm_mon works just fine on the remote node on both RHEL 8.1 and RHEL 8.2.

Comment 1 Tomas Jelinek 2020-05-04 07:55:18 UTC
This regression was introduced when moving the status command to the new pcs architecture.


corosync.conf is needed there for two reasons:

1) to get the cluster name
Here, pcs should check whether the corosync.conf file exists. If it is missing, get the cluster name from the CIB instead.

2) to list nodes from corosync.conf in order to check whether pcsd can be reached on them
This wasn't working before either. Since we have no list of nodes when corosync.conf is missing, this step should simply be skipped.

# pcs status --full
Cluster name: rhel82
Cluster Summary:
  * Stack: corosync
  * Current DC: rh82-node2 (2) (version 2.0.3-5.el8-4b1f869f0f) - partition with quorum
  * Last updated: Mon May  4 09:51:45 2020
  * Last change:  Mon May  4 09:36:51 2020 by root via cibadmin on rh82-node2
  * 3 nodes configured
  * 5 resource instances configured

Node List:
  * Online: [ rh82-node2 (2) rh82-node3 (3) ]
  * RemoteOnline: [ rh82-node1 ]

Full List of Resources:
  * xvm (stonith:fence_xvm):    Started rh82-node3
  * d1  (ocf::pacemaker:Dummy): Started rh82-node1
  * d2  (ocf::pacemaker:Dummy): Started rh82-node3
  * d3  (ocf::pacemaker:Dummy): Started rh82-node2
  * rh82-node1  (ocf::pacemaker:remote):        Started rh82-node2

Migration Summary:


PCSD Status:
Error: Unable to read /etc/corosync/corosync.conf: No such file or directory
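
The fallback in point 1 above can be sketched roughly as follows. This is a minimal illustration, not pcs's actual code; it only relies on the fact that pacemaker stores the cluster name as a `cluster-name` property in the CIB's `crm_config` section:

```python
import os
import xml.etree.ElementTree as ET

COROSYNC_CONF = "/etc/corosync/corosync.conf"

def cluster_name_from_cib(cib_xml):
    # The CIB stores the cluster name as an nvpair named 'cluster-name'
    # inside a cluster_property_set under crm_config.
    root = ET.fromstring(cib_xml)
    for nvpair in root.iter("nvpair"):
        if nvpair.get("name") == "cluster-name":
            return nvpair.get("value")
    return None

def get_cluster_name(cib_xml):
    # Full cluster nodes have corosync.conf; remote nodes do not,
    # so fall back to the CIB there instead of erroring out.
    if not os.path.exists(COROSYNC_CONF):
        return cluster_name_from_cib(cib_xml)
    # (parsing 'cluster_name:' out of corosync.conf omitted here)
    raise NotImplementedError

# A trimmed CIB fragment in the shape pacemaker uses:
SAMPLE_CIB = """<cib><configuration><crm_config>
  <cluster_property_set id="cib-bootstrap-options">
    <nvpair id="cib-bootstrap-options-cluster-name"
            name="cluster-name" value="tripleo_cluster"/>
  </cluster_property_set>
</crm_config></configuration></cib>"""

print(cluster_name_from_cib(SAMPLE_CIB))  # -> tripleo_cluster
```

The pcsd status check (point 2) has no such fallback: without corosync.conf there is no node list at all, so that section is skipped rather than substituted.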

Comment 2 Tomas Jelinek 2020-05-05 09:54:12 UTC
Created attachment 1685156 [details]
proposed fix + tests

Test:
* add a remote node to a cluster: pcs cluster node add-remote ...
* run 'pcs status' on the remote node
* details in comment 0 and comment 1

Comment 7 Miroslav Lisik 2020-06-11 14:37:23 UTC
Test:

[root@r8-node-01 rpms]# rpm -q pcs
pcs-0.10.6-1.el8.x86_64
[root@r8-node-02 ~]# rpm -q pcs
pcs-0.10.6-1.el8.x86_64

[root@r8-node-02 ~]# pcs status nodes
Pacemaker Nodes:
 Online: r8-node-01
 Standby:
 Standby with resource(s) running:
 Maintenance:
 Offline:
Pacemaker Remote Nodes:
 Online: r8-node-02
 Standby:
 Standby with resource(s) running:
 Maintenance:
 Offline:
[root@r8-node-02 ~]# pcs status --full
Cluster name: HAcluster
Cluster Summary:
  * Stack: corosync
  * Current DC: r8-node-01 (1) (version 2.0.3-5.el8-4b1f869f0f) - partition with quorum
  * Last updated: Thu Jun 11 16:35:22 2020
  * Last change:  Thu Jun 11 16:34:33 2020 by root via cibadmin on r8-node-01
  * 2 nodes configured
  * 3 resource instances configured

Node List:
  * Online: [ r8-node-01 (1) ]
  * RemoteOnline: [ r8-node-02 ]

Full List of Resources:
  * fence-r8-node-01	(stonith:fence_xvm):	Started r8-node-01
  * fence-r8-node-02	(stonith:fence_xvm):	Started r8-node-01
  * r8-node-02	(ocf::pacemaker:remote):	Started r8-node-01

Migration Summary:

Tickets:

Daemon Status:
  corosync: inactive/disabled
  pacemaker: inactive/disabled
  pacemaker_remote: active/enabled
  pcsd: active/disabled

Comment 10 Nina Hostakova 2020-07-24 07:50:43 UTC
BEFORE_FIX
=========
[root@virt-044 ~]# rpm -q pcs
pcs-0.10.4-6.el8.x86_64

[root@virt-044 ~]# pcs cluster node add-remote virt-043
No addresses specified for host 'virt-043', using 'virt-043'
Sending 'pacemaker authkey' to 'virt-043'
virt-043: successful distribution of the file 'pacemaker authkey'
Requesting 'pacemaker_remote enable', 'pacemaker_remote start' on 'virt-043'
virt-043: successful run of 'pacemaker_remote enable'
virt-043: successful run of 'pacemaker_remote start'

[root@virt-044 sts-rhel8.3]# pcs status --full
Cluster name: STSRHTS10850
Cluster Summary:
  * Stack: corosync
  * Current DC: virt-044 (1) (version 2.0.3-5.el8_2.1-4b1f869f0f) - partition with quorum
  * Last updated: Fri Jul 24 09:35:34 2020
  * Last change:  Fri Jul 24 09:35:27 2020 by root via cibadmin on virt-044
  * 3 nodes configured
  * 3 resource instances configured

Node List:
  * Online: [ virt-044 (1) virt-048 (2) ]
  * RemoteOnline: [ virt-043 ]

Full List of Resources:
  * fence-virt-044	(stonith:fence_xvm):	Started virt-048
  * fence-virt-048	(stonith:fence_xvm):	Started virt-048
  * virt-043	(ocf::pacemaker:remote):	Started virt-044

Migration Summary:

Tickets:

PCSD Status:
  virt-044: Online
  virt-048: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled


# Check the remote node
[root@virt-043 ~]# pcs cluster corosync
Error: Unable to read /etc/corosync/corosync.conf: No such file or directory

[root@virt-043 ~]# pcs status 
Error: Unable to read /etc/corosync/corosync.conf: No such file or directory

> Status could not be displayed on the remote node because corosync.conf was unavailable


AFTER_FIX
=========
[root@virt-158 ~]# rpm -q pcs
pcs-0.10.6-3.el8.x86_64

[root@virt-158 ~]# pcs cluster node add-remote virt-160
No addresses specified for host 'virt-160', using 'virt-160'
Sending 'pacemaker authkey' to 'virt-160'
virt-160: successful distribution of the file 'pacemaker authkey'
Requesting 'pacemaker_remote enable', 'pacemaker_remote start' on 'virt-160'
virt-160: successful run of 'pacemaker_remote enable'
virt-160: successful run of 'pacemaker_remote start'

[root@virt-158 ~]# pcs status --full
Cluster name: STSRHTS32139
Cluster Summary:
  * Stack: corosync
  * Current DC: virt-159 (2) (version 2.0.4-3.el8-2deceaa3ae) - partition with quorum
  * Last updated: Fri Jul 24 08:47:27 2020
  * Last change:  Fri Jul 24 08:46:30 2020 by root via cibadmin on virt-158
  * 3 nodes configured
  * 4 resource instances configured

Node List:
  * Online: [ virt-158 (1) virt-159 (2) ]
  * RemoteOnline: [ virt-160 ]

Full List of Resources:
  * fence-virt-158	(stonith:fence_xvm):	 Started virt-159
  * fence-virt-159	(stonith:fence_xvm):	 Started virt-158
  * fence-virt-160	(stonith:fence_xvm):	 Started virt-159
  * virt-160	(ocf::pacemaker:remote):	 Started virt-158

Migration Summary:

Tickets:

PCSD Status:
  virt-158: Online
  virt-159: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled


# Check the remote node
[root@virt-160 ~]# pcs cluster corosync
Error: Unable to read /etc/corosync/corosync.conf: No such file or directory

[root@virt-160 ~]# pcs status --full
Cluster name: STSRHTS32139
Cluster Summary:
  * Stack: corosync
  * Current DC: virt-159 (2) (version 2.0.4-3.el8-2deceaa3ae) - partition with quorum
  * Last updated: Fri Jul 24 08:49:03 2020
  * Last change:  Fri Jul 24 08:46:30 2020 by root via cibadmin on virt-158
  * 3 nodes configured
  * 4 resource instances configured

Node List:
  * Online: [ virt-158 (1) virt-159 (2) ]
  * RemoteOnline: [ virt-160 ]

Full List of Resources:
  * fence-virt-158	(stonith:fence_xvm):	 Started virt-159
  * fence-virt-159	(stonith:fence_xvm):	 Started virt-158
  * fence-virt-160	(stonith:fence_xvm):	 Started virt-159
  * virt-160	(ocf::pacemaker:remote):	 Started virt-158

Migration Summary:

Tickets:

Daemon Status:
  corosync: inactive/disabled
  pacemaker: inactive/disabled
  pacemaker_remote: active/enabled
  pcsd: active/enabled

> Instead of corosync.conf, the cluster name is taken from the CIB, and the PCSD Status section is skipped
> Status is available even though corosync.conf is not present on the remote node



Marking verified in pcs-0.10.6-3.el8.

Comment 13 errata-xmlrpc 2020-11-04 02:28:16 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (pcs bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:4617