Bug 1284404 - make restarting pcsd a synchronous operation
Summary: make restarting pcsd a synchronous operation
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: pcs
Version: 7.2
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Tomas Jelinek
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-11-23 09:20 UTC by Tomas Jelinek
Modified: 2017-08-01 18:22 UTC
CC List: 7 users

Fixed In Version: pcs-0.9.158-6.el7
Doc Type: Bug Fix
Doc Text:
Cause: Pcs restarts pcsd after distributing SSL certificates to the cluster nodes in order to reload the certificates.
Consequence: Pcs does not wait for the restart to finish. Subsequent pcs commands may exit with an error if they run while pcsd is still being restarted.
Fix: Make restarting pcsd on the nodes a synchronous operation.
Result: Pcs waits for pcsd on the nodes to fully start.
Clone Of:
Environment:
Last Closed: 2017-08-01 18:22:57 UTC
Target Upstream Version:
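The Fix in the Doc Text amounts to "restart, then block until the daemon is up again". The sketch below shows that pattern in isolation. It is not the code from the attached patches: `restart_and_wait`, `restart`, and `is_ready` are hypothetical names, and the readiness probe is injected so the waiting logic can be exercised without a real pcsd.

```python
import time
from typing import Callable


def restart_and_wait(
    restart: Callable[[], None],
    is_ready: Callable[[], bool],
    timeout: float = 30.0,
    interval: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Trigger a restart, then block until the daemon reports ready.

    Hypothetical helper: returns True as soon as is_ready() succeeds,
    or False if the timeout elapses first.
    """
    restart()
    deadline = clock() + timeout
    while True:
        # Probe first so a daemon that comes back quickly is not delayed.
        if is_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Injecting `clock` and `sleep` is only a testing convenience; a caller would normally pass the restart action and a real health check and keep the defaults.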


Attachments
proposed fix (10.80 KB, patch)
2017-01-12 15:36 UTC, Tomas Jelinek
proposed fix (part2) (5.31 KB, patch)
2017-05-31 12:45 UTC, Ivan Devat


Links
System | ID | Priority | Status | Summary | Last Updated
Red Hat Bugzilla | 1229822 | None | CLOSED | [RFE] make "cluster setup --start", "cluster start" and "cluster standby" support --wait as well | 2019-04-30 15:51:16 UTC
Red Hat Bugzilla | 1394273 | None | CLOSED | [cli] connection interrupted by pcsd restart results in a traceback | 2019-04-30 15:51:16 UTC
Red Hat Bugzilla | 1424944 | None | CLOSED | [Ganesha] : Unable to bring up a Ganesha HA cluster on RHEL 6.9. | 2019-04-30 15:51:16 UTC
Red Hat Bugzilla | 1463327 | None | CLOSED | Starting a larger cluster times out | 2019-04-30 15:51:16 UTC
Red Hat Product Errata | RHBA-2017:1958 | normal | SHIPPED_LIVE | pcs bug fix and enhancement update | 2017-08-01 18:09:47 UTC

Internal Links: 1229822 1394273 1424944 1463327

Description Tomas Jelinek 2015-11-23 09:20:19 UTC
We need to make restarting pcsd a synchronous operation so that scripts calling pcs can reliably wait for the restart to finish before continuing. Otherwise, a script may run a command while pcsd is being restarted, which may cause the command to fail.
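Until pcs waits on its own, a script can guard itself by polling the node before issuing the next command. The sketch below is a hypothetical workaround, not part of pcs: it only checks that pcsd's default TCP port (2224) accepts connections, which is weaker than "pcsd has fully started", since a daemon can accept connections before it is ready to serve requests.

```python
import socket
import time

PCSD_PORT = 2224  # pcsd's default listening port


def wait_for_pcsd(host: str, port: int = PCSD_PORT,
                  timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Hypothetical helper: return True once host:port accepts a TCP
    connection, or False if that does not happen within `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Connecting and immediately closing is enough to see the port is open.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused, unreachable, or timed out: pcsd is probably still restarting.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A script would call `wait_for_pcsd(node)` for each node before running the next pcs command, and abort if any call returns False.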

Comment 5 Tomas Jelinek 2017-01-12 15:36:01 UTC
Created attachment 1239988
proposed fix

Test:

[root@rh73-node1:~]# pcs cluster setup --name test rh73-node1 rh73-node2 && pcs cluster start --all --wait
Destroying cluster on nodes: rh73-node1, rh73-node2...
rh73-node1: Stopping Cluster (pacemaker)...
rh73-node2: Stopping Cluster (pacemaker)...
rh73-node2: Successfully destroyed cluster
rh73-node1: Successfully destroyed cluster

Sending cluster config files to the nodes...
rh73-node1: Succeeded
rh73-node2: Succeeded

Synchronizing pcsd certificates on nodes rh73-node1, rh73-node2...
rh73-node1: Success
rh73-node2: Success
Restarting pcsd on the nodes in order to reload the certificates...
rh73-node1: Success
rh73-node2: Success
rh73-node1: Starting Cluster...
rh73-node2: Starting Cluster...
Waiting for node(s) to start...
rh73-node2: Started
rh73-node1: Started

'pcs cluster start --all --wait' does not crash and successfully waits for the nodes to start.

Comment 7 Ivan Devat 2017-02-20 08:30:56 UTC
After Fix:

[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.156-1.el7.x86_64

[vm-rhel72-1 ~] $ pcs cluster setup --name=devcluster vm-rhel72-1 vm-rhel72-3 && pcs cluster start --all --wait
Destroying cluster on nodes: vm-rhel72-1, vm-rhel72-3...
vm-rhel72-1: Stopping Cluster (pacemaker)...
vm-rhel72-3: Stopping Cluster (pacemaker)...
vm-rhel72-1: Successfully destroyed cluster
vm-rhel72-3: Successfully destroyed cluster

Sending cluster config files to the nodes...
vm-rhel72-1: Succeeded
vm-rhel72-3: Succeeded

Synchronizing pcsd certificates on nodes vm-rhel72-1, vm-rhel72-3...
vm-rhel72-3: Success
vm-rhel72-1: Success
Restarting pcsd on the nodes in order to reload the certificates...
vm-rhel72-3: Success
vm-rhel72-1: Success
vm-rhel72-1: Starting Cluster...
vm-rhel72-3: Starting Cluster...
Waiting for node(s) to start...
vm-rhel72-1: Started
vm-rhel72-3: Started

Comment 11 Tomas Jelinek 2017-05-30 08:12:48 UTC
Creating a new cluster from the GUI fails with an error.

The GUI runs pcs cluster setup over the network on one of the nodes. The command restarts the pcsd daemon on all nodes in the new cluster, including the pcsd daemon that the GUI asked to run the command. That restart causes HTTP 400 to be returned to the pcsd daemon on the GUI node. The GUI daemon then returns an error to the GUI and does not add the new cluster to the list of clusters.
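The underlying problem is that the request is served by a daemon the command itself restarts. The attached part2 patch is not reproduced here; the sketch below is only one hypothetical way a caller could tolerate this: treat an interrupted request as "outcome unknown", wait for the daemon to return, and then check whether the cluster was actually created. All three callables are hypothetical, and the interruption is modelled as a Python `ConnectionError` rather than the HTTP 400 seen in practice.

```python
from typing import Callable


def run_setup_tolerating_restart(
    run_setup: Callable[[], None],
    wait_until_ready: Callable[[], bool],
    cluster_exists: Callable[[], bool],
) -> bool:
    """Hypothetical helper: run a remote setup that may restart the daemon
    serving the request, and decide success by checking the result."""
    try:
        run_setup()
        return True
    except ConnectionError:
        # The daemon handling the request may have been restarted by the
        # setup itself, so the failure does not prove the setup failed.
        if not wait_until_ready():
            return False
        return cluster_exists()
```

Checking the resulting state after the daemon comes back, rather than trusting the interrupted response, avoids both false errors and silently reporting success for a setup that really failed.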

Comment 12 Ivan Devat 2017-05-31 12:45:38 UTC
Created attachment 1283769
proposed fix (part2)

Comment 13 Ivan Devat 2017-05-31 12:46:55 UTC
After Fix:

[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.158-3.el7.x86_64

Open the GUI. Create a new cluster. Wait. The new cluster is added to the list of clusters.

Comment 20 errata-xmlrpc 2017-08-01 18:22:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1958

