Bug 1284404
Summary: make restarting pcsd a synchronous operation

Product: Red Hat Enterprise Linux 7
Component: pcs
Version: 7.2
Reporter: Tomas Jelinek <tojeline>
Assignee: Tomas Jelinek <tojeline>
QA Contact: cluster-qe <cluster-qe>
CC: cfeist, cluster-maint, idevat, omular, rsteiger, tlavigne, tojeline
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Fixed In Version: pcs-0.9.158-6.el7
Doc Type: Bug Fix
Doc Text:
Cause: Pcs restarts pcsd after distributing SSL certificates to the cluster nodes in order to reload the certificates.
Consequence: Pcs does not wait for the restart to finish. Subsequent pcs commands may exit with an error if they run while pcsd is still restarting.
Fix: Make restarting pcsd on the nodes a synchronous operation.
Result: Pcs waits for pcsd on the nodes to fully start.
Last Closed: 2017-08-01 18:22:57 UTC
Type: Bug
Description
Tomas Jelinek
2015-11-23 09:20:19 UTC
Created attachment 1239988 [details]
proposed fix
Test:
[root@rh73-node1:~]# pcs cluster setup --name test rh73-node1 rh73-node2 && pcs cluster start --all --wait
Destroying cluster on nodes: rh73-node1, rh73-node2...
rh73-node1: Stopping Cluster (pacemaker)...
rh73-node2: Stopping Cluster (pacemaker)...
rh73-node2: Successfully destroyed cluster
rh73-node1: Successfully destroyed cluster
Sending cluster config files to the nodes...
rh73-node1: Succeeded
rh73-node2: Succeeded
Synchronizing pcsd certificates on nodes rh73-node1, rh73-node2...
rh73-node1: Success
rh73-node2: Success
Restarting pcsd on the nodes in order to reload the certificates...
rh73-node1: Success
rh73-node2: Success
rh73-node1: Starting Cluster...
rh73-node2: Starting Cluster...
Waiting for node(s) to start...
rh73-node2: Started
rh73-node1: Started
'pcs cluster start --all --wait' does not crash and successfully waits for the nodes to start.
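The synchronous restart described in the Doc Text amounts to polling each node until its pcsd instance accepts connections again before returning. A minimal sketch of that idea in Python (port 2224 is pcsd's default; the helper, node list, and timeout are illustrative assumptions, not the actual pcs implementation):

```python
import socket
import time

def wait_for_pcsd(nodes, port=2224, timeout=60):
    """Poll each node until its pcsd accepts TCP connections,
    or raise TimeoutError. Illustrative sketch only; the real
    pcs code performs this check differently."""
    deadline = time.monotonic() + timeout
    pending = set(nodes)
    while pending:
        for node in list(pending):
            try:
                with socket.create_connection((node, port), timeout=2):
                    pending.discard(node)  # pcsd is listening again
            except OSError:
                pass  # still restarting; keep polling this node
        if pending and time.monotonic() > deadline:
            raise TimeoutError(
                "pcsd did not come back on: " + ", ".join(sorted(pending)))
        if pending:
            time.sleep(1)
```

With a wait like this in place, the subsequent `pcs cluster start --all --wait` only runs once every node's pcsd is reachable again.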
After Fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.156-1.el7.x86_64
[vm-rhel72-1 ~] $ pcs cluster setup --name=devcluster vm-rhel72-1 vm-rhel72-3 && pcs cluster start --all --wait
Destroying cluster on nodes: vm-rhel72-1, vm-rhel72-3...
vm-rhel72-1: Stopping Cluster (pacemaker)...
vm-rhel72-3: Stopping Cluster (pacemaker)...
vm-rhel72-1: Successfully destroyed cluster
vm-rhel72-3: Successfully destroyed cluster
Sending cluster config files to the nodes...
vm-rhel72-1: Succeeded
vm-rhel72-3: Succeeded
Synchronizing pcsd certificates on nodes vm-rhel72-1, vm-rhel72-3...
vm-rhel72-3: Success
vm-rhel72-1: Success
Restarting pcsd on the nodes in order to reload the certificates...
vm-rhel72-3: Success
vm-rhel72-1: Success
vm-rhel72-1: Starting Cluster...
vm-rhel72-3: Starting Cluster...
Waiting for node(s) to start...
vm-rhel72-1: Started
vm-rhel72-3: Started

Creating a new cluster from the GUI fails with an error. The GUI runs pcs cluster setup over the network on one of the nodes. That command restarts the pcsd daemon on all nodes of the new cluster, so the very pcsd daemon the GUI asked to run the command gets restarted mid-request. This returns HTTP 400 to the pcsd daemon on the GUI node, which then reports an error to the GUI and does not add the new cluster to the list of clusters.

Created attachment 1283769 [details]
proposed fix (part2)
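The part-2 problem is an ordering race: the daemon handling the request is killed before its HTTP response reaches the caller, producing the HTTP 400. One way to sketch the idea of deferring the restart until after the response has been flushed (a hypothetical Python helper; the delay value and the systemctl invocation are assumptions for illustration, not pcsd's actual Ruby implementation):

```python
import subprocess
import threading

def restart_pcsd_after_response(delay=1.0):
    """Schedule a pcsd restart to run shortly after the current
    HTTP response has been sent, so the connection is not torn
    down mid-request. Hypothetical sketch; pcsd's real fix
    is implemented differently."""
    def _restart():
        # Deferred restart of the local pcsd service.
        subprocess.run(["systemctl", "restart", "pcsd"], check=False)
    timer = threading.Timer(delay, _restart)
    timer.daemon = True
    timer.start()
    return timer
```

The caller would invoke this just before returning its response, so the GUI node receives a clean reply instead of a dropped connection.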
After Fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.158-3.el7.x86_64
Open the GUI. Create a new cluster. Wait. The new cluster is added to the list of clusters.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1958