Bug 1690419

Summary: Improve guest node error message when pacemaker_remote is running
Product: Red Hat Enterprise Linux 8
Reporter: michal novacek <mnovacek>
Component: pcs
Assignee: Tomas Jelinek <tojeline>
Status: CLOSED ERRATA
QA Contact: cluster-qe <cluster-qe>
Severity: low
Docs Contact: Steven J. Levine <slevine>
Priority: low
Version: 8.0
CC: cfeist, cluster-maint, idevat, lmiksik, mlisik, mmazoure, nhostako, omular, slevine, tojeline
Target Milestone: rc
Keywords: EasyFix, Reopened, Triaged
Target Release: 8.5
Flags: pm-rhel: mirror+
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: pcs-0.10.8-3.el8
Doc Type: No Doc Update
Doc Text:
Improving an error message coming from pcs, no need to document it.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-11-09 17:33:12 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments: proposed fix + tests (Flags: none)

Description michal novacek 2019-03-19 12:52:02 UTC
Description of problem:
The current error message does not clearly state that the problem is a running service. It may appear that pcs is trying to start 'pacemaker_remote', when the actual problem is that it is already running.

Actual message:
Error: pool-10-37-165-14: Running cluster services: 'pacemaker_remote', the host seems to be in a cluster already

Expected message:
Error: pool-10-37-165-14: The host seems to be in a cluster already because the following services are found to be running: 'pacemaker_remote'
Error: pool-10-37-165-14: If the host is not part of the cluster stop the service(s) and retry.
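The wording that eventually shipped also adjusts singular vs. plural forms depending on how many services are running ("service is found" vs. "services are found"). A minimal sketch of such a formatter, with illustrative names only (this is not the actual pcs code), might look like:

```python
def cluster_running_error(host, services):
    """Build an error message that makes clear the listed services are
    ALREADY running, and hint at the remedy, as requested above.
    Handles singular vs. plural service lists."""
    noun, verb = ("service", "is") if len(services) == 1 else ("services", "are")
    listed = ", ".join(f"'{s}'" for s in sorted(services))
    return (
        f"Error: {host}: The host seems to be in a cluster already "
        f"as the following {noun} {verb} found to be running: {listed}. "
        f"If the host is not part of a cluster, stop the {noun} and retry"
    )

print(cluster_running_error("pool-10-37-165-14", ["pacemaker_remote"]))
```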

Version-Release number of selected component (if applicable):
pcs-0.10.1-4.el8.x86_64


Additional info:

$ pcs cluster node add-guest pool-10-37-165-14 R-pool-10-37-165-14 remote-connect-timeout=60
Error: pool-10-37-165-14: Running cluster services: 'pacemaker_remote', the host seems to be in a cluster already
Error: Errors have occurred, therefore pcs is unable to continue

Comment 10 RHEL Program Management 2021-02-01 07:39:38 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Comment 15 Tomas Jelinek 2021-06-24 12:32:43 UTC
Created attachment 1793888 [details]
proposed fix + tests

Test:

# pcs cluster node add-guest rh84-node3 d2
No addresses specified for host 'rh84-node3', using 'rh84-node3'
Error: rh84-node3: Required cluster services not installed: 'pacemaker_remote'
Error: rh84-node3: The host seems to be in a cluster already as the following services are found to be running: 'corosync', 'pacemaker'. If the host is not part of a cluster, stop the services and retry
Error: rh84-node3: The host seems to be in a cluster already as cluster configuration files have been found on the host. If the host is not part of a cluster, run 'pcs cluster destroy' on host 'rh84-node3' to remove those configuration files
Error: Errors have occurred, therefore pcs is unable to continue

Comment 16 Miroslav Lisik 2021-07-09 07:12:37 UTC
Test:

[root@r8-node-01 ~]# rpm -q pcs
pcs-0.10.8-3.el8.x86_64

There is single node cluster running on r8-node-03 host

[root@r8-node-01 ~]# pcs resource create d-01 ocf:pacemaker:Dummy
[root@r8-node-01 ~]# pcs cluster node add-guest r8-node-03 d-01
No addresses specified for host 'r8-node-03', using 'r8-node-03'
Error: r8-node-03: Required cluster services not installed: 'pacemaker_remote'
Error: r8-node-03: The host seems to be in a cluster already as the following services are found to be running: 'corosync', 'pacemaker'. If the host is not part of a cluster, stop the services and retry
Error: r8-node-03: The host seems to be in a cluster already as cluster configuration files have been found on the host. If the host is not part of a cluster, run 'pcs cluster destroy' on host 'r8-node-03' to remove those configuration files
Error: Errors have occurred, therefore pcs is unable to continue

Comment 21 Michal Mazourek 2021-07-26 11:37:43 UTC
BEFORE:
=======

[root@virt-267 ~]# rpm -q pcs
pcs-0.10.8-2.el8.x86_64


## on a guest node
[root@virt-268 ~]# systemctl status pacemaker_remote
● pacemaker_remote.service - Pacemaker Remote executor daemon
   Loaded: loaded (/usr/lib/systemd/system/pacemaker_remote.service; disabled; vendor preset: disabled)
   Active: active (running) since Mon 2021-07-26 10:40:15 CEST; 29s ago
{...}
[root@virt-268 ~]# pcs status
Error: error running crm_mon, is pacemaker running?
  Could not connect to the CIB: Transport endpoint is not connected
  crm_mon: Error: remote-node not connected to cluster


## on a local node
[root@virt-267 ~]# pcs cluster node add-guest virt-268 r1 
No addresses specified for host 'virt-268', using 'virt-268'
Error: virt-268: Running cluster services: 'pacemaker_remote', the host seems to be in a cluster already
Error: Errors have occurred, therefore pcs is unable to continue


AFTER:
======

[root@virt-290 ~]# rpm -q pcs
pcs-0.10.8-3.el8.x86_64


1. Running pacemaker_remote on a guest node

## on a guest node
[root@virt-291 ~]# systemctl status pacemaker_remote
● pacemaker_remote.service - Pacemaker Remote executor daemon
   Loaded: loaded (/usr/lib/systemd/system/pacemaker_remote.service; disabled; vendor preset: disabled)
   Active: active (running) since Mon 2021-07-26 11:26:33 CEST; 1s ago
{...}
[root@virt-291 ~]# pcs status
Error: error running crm_mon, is pacemaker running?
  Could not connect to the CIB: Transport endpoint is not connected
  crm_mon: Error: remote-node not connected to cluster


## on a local node
[root@virt-290 ~]# pcs cluster node add-guest virt-291 r1
No addresses specified for host 'virt-291', using 'virt-291'
Error: virt-291: The host seems to be in a cluster already as the following service is found to be running: 'pacemaker_remote'. If the host is not part of a cluster, stop the service and retry
Error: Errors have occurred, therefore pcs is unable to continue
[root@virt-290 ~]# echo $?
1

> OK: The error message has improved: the running service is shown, and there is a hint to stop the service if the node is not part of the cluster


2. Setting up a cluster on a node where pacemaker_remote is running

[root@virt-291 ~]# pcs cluster setup testcluster virt-291 --start --wait
No addresses specified for host 'virt-291', using 'virt-291'
Error: virt-291: The host seems to be in a cluster already as the following service is found to be running: 'pacemaker_remote'. If the host is not part of a cluster, stop the service and retry, use --force to override
Error: Some nodes are already in a cluster. Enforcing this will destroy existing cluster on those nodes. You should remove the nodes from their clusters instead to keep the clusters working properly, use --force to override
Error: Errors have occurred, therefore pcs is unable to continue

> OK: As with the error message above, there is a hint to stop pacemaker_remote first if the node is not in a cluster


3. Running cluster on a guest node

## on a guest node
[root@virt-291 ~]# systemctl stop pacemaker_remote
[root@virt-291 ~]# pcs cluster setup testcluster virt-291 --start --wait
No addresses specified for host 'virt-291', using 'virt-291'
Destroying cluster on hosts: 'virt-291'...
virt-291: Successfully destroyed cluster
Requesting remove 'pcsd settings' from 'virt-291'
virt-291: successful removal of the file 'pcsd settings'
Sending 'corosync authkey', 'pacemaker authkey' to 'virt-291'
virt-291: successful distribution of the file 'corosync authkey'
virt-291: successful distribution of the file 'pacemaker authkey'
Sending 'corosync.conf' to 'virt-291'
virt-291: successful distribution of the file 'corosync.conf'
Cluster has been successfully set up.
Starting cluster on hosts: 'virt-291'...
Waiting for node(s) to start: 'virt-291'...
virt-291: Cluster started


## on a local node
[root@virt-290 ~]# pcs cluster node add-guest virt-291 r1
No addresses specified for host 'virt-291', using 'virt-291'
Error: virt-291: The host seems to be in a cluster already as the following services are found to be running: 'corosync', 'pacemaker'. If the host is not part of a cluster, stop the services and retry
Error: virt-291: The host seems to be in a cluster already as cluster configuration files have been found on the host. If the host is not part of a cluster, run 'pcs cluster destroy' on host 'virt-291' to remove those configuration files
Error: Errors have occurred, therefore pcs is unable to continue
[root@virt-290 ~]# echo $?
1

> OK: The improved error message shows the running services (corosync and pacemaker) and hints to stop them if the node is not part of the cluster. It also reports that cluster configuration files have been found (as the node actually is part of a cluster) and hints to run the cluster destroy command to delete those files.
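The transcripts above show pcs reporting every applicable check (running services, leftover configuration files) as a separate Error line before failing once with "unable to continue" and exit code 1. A rough sketch of that accumulate-then-fail pattern, using hypothetical function and variable names rather than the actual pcs internals:

```python
def run_checks(host, running_services, config_files_found):
    """Collect all failed-check messages for a host instead of stopping
    at the first one, mirroring the multi-Error output in the transcript."""
    errors = []
    if running_services:
        listed = ", ".join(f"'{s}'" for s in running_services)
        errors.append(
            f"{host}: The host seems to be in a cluster already as the "
            f"following services are found to be running: {listed}. "
            f"If the host is not part of a cluster, stop the services and retry"
        )
    if config_files_found:
        errors.append(
            f"{host}: The host seems to be in a cluster already as cluster "
            f"configuration files have been found on the host. If the host "
            f"is not part of a cluster, run 'pcs cluster destroy' on host "
            f"'{host}' to remove those configuration files"
        )
    return errors

errors = run_checks("virt-291", ["corosync", "pacemaker"], True)
for err in errors:
    print(f"Error: {err}")
# pcs exits with status 1 when any check failed
exit_code = 1 if errors else 0
```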


4. No running services on the guest node

## on a guest node
[root@virt-291 ~]# systemctl status pacemaker_remote
● pacemaker_remote.service - Pacemaker Remote executor daemon
   Loaded: loaded (/usr/lib/systemd/system/pacemaker_remote.service; disabled; vendor preset: disabled)
   Active: inactive (dead)
{...}
[root@virt-291 ~]# pcs status
Error: error running crm_mon, is pacemaker running?
  crm_mon: Error: cluster is not available on this node


## on a local node

[root@virt-290 ~]# pcs cluster node add-guest virt-291 r1
No addresses specified for host 'virt-291', using 'virt-291'
Sending 'pacemaker authkey' to 'virt-291'
virt-291: successful distribution of the file 'pacemaker authkey'
Requesting 'pacemaker_remote enable', 'pacemaker_remote start' on 'virt-291'
virt-291: successful run of 'pacemaker_remote enable'
virt-291: successful run of 'pacemaker_remote start'
[root@virt-290 ~]# echo $?
0

> OK


Marking as VERIFIED in pcs-0.10.8-3.el8

Comment 23 errata-xmlrpc 2021-11-09 17:33:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Low: pcs security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:4142