| Summary: | when pcs cannot stop pacemaker on a node, it does not stop cman/corosync on the remaining nodes |
|---|---|
| Product: | Red Hat Enterprise Linux 6 |
| Reporter: | Tomas Jelinek <tojeline> |
| Component: | pcs |
| Assignee: | Tomas Jelinek <tojeline> |
| Status: | CLOSED ERRATA |
| QA Contact: | cluster-qe <cluster-qe> |
| Severity: | unspecified |
| Priority: | medium |
| Version: | 6.8 |
| CC: | cfeist, cluster-maint, idevat, omular, rsteiger, swgreenl, tojeline |
| Target Milestone: | rc |
| Target Release: | --- |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Fixed In Version: | pcs-0.9.155-1.el6 |
| Doc Type: | Bug Fix |
| Story Points: | --- |
| Last Closed: | 2017-03-21 11:04:35 UTC |
| Type: | Bug |
| Regression: | --- |

Doc Text:
Cause: User wants to stop a cluster with some of the nodes unreachable.
Consequence: Pcs exits gracefully with an error, saying the nodes are unreachable. Corosync/cman is however left running on the reachable nodes.
Fix: Make pcs proceed and stop corosync/cman on reachable nodes.
Result: Cluster is fully stopped on all nodes where possible.
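To illustrate the consequence and the fix, here is a minimal console sketch; it is not taken from the report, and the node name and the comments describing expected behavior are illustrative only. Before the fix, once an unreachable node is hit, a reachable node is left in a half-stopped state that can be checked like this:

[root@rh68-node1:~]# pcs cluster stop --all
# pre-fix behavior: pacemaker is stopped on the reachable nodes, then pcs
# errors out on the unreachable node and never reaches the cman/corosync phase
[root@rh68-node1:~]# service pacemaker status
# expected to report pacemakerd as stopped on this reachable node
[root@rh68-node1:~]# service cman status
# expected to still report cman/corosync as running, i.e. the stack was only
# partially stopped; the fix makes pcs continue and stop cman/corosync here too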
Description (Tomas Jelinek, 2016-09-29 12:18:39 UTC)
originally reported here: http://clusterlabs.org/pipermail/users/2016-September/004152.html

Workaround: run "pcs cluster stop" again and specify only the nodes on which pacemaker has been stopped successfully (a command example is shown after the test output below).

Tomas,

Thank you very much for converting my pacemaker issue to this bugzilla. I had no idea that was even possible.

I was able to cleanly reproduce this problem again today, so if you need any more information about my config, pacemaker or corosync logs, I will be happy to provide them.

Now that I have a new RedHat bugzilla account, am I at liberty to open bugs here myself in the future?

Thank you for your support.

- Scott Greenlese - Linux on System Z - System Test - IBM Corp.

Another question. I don't see any way for me to get on the .cc for email notifications. In fact, I see that my mail ID swgreenl.com is "excluded". Is it possible for me to get on the Mail To list? Thanks again... - Scott

Please strike that last comment. I see that I am, in fact, on the .cc mailing list. (The exclusion was when it sent out my comment to the .cc list, it excluded me because I was the author.) Sorry for the confusion.

(In reply to Scott Greenlese from comment #4)
> Tomas,
>
> Thank you very much for converting my pacemaker issue to this bugzilla. I
> had no idea that was even possible.
>
> I was able to cleanly reproduce this problem again today, so if you need any
> more information about my config, pacemaker or corosync logs, I will be
> happy to provide them.

Thanks, I am good. I have all information I need to reproduce and fix the bug.

> Now that I have a new RedHat bugzilla account, am I at liberty to open bugs
> here myself in the future?

I am not 100% sure what permissions new accounts have but I believe you can open bugs here.

> Thank you for your support.
>
> - Scott Greenlese - Linux on System Z - System Test - IBM Corp.

Created attachment 1210505 [details]
proposed fix
Test:
[root@rh68-node1:~]# pcs status pcsd
rh68-node1: Online
rh68-node2: Online
rh68-node3: Offline
[root@rh68-node1:~]# pcs cluster stop --all
rh68-node3: Unable to connect to rh68-node3 ([Errno 113] No route to host)
rh68-node1: Stopping Cluster (pacemaker)...
rh68-node2: Stopping Cluster (pacemaker)...
Error: unable to stop all nodes
rh68-node3: Unable to connect to rh68-node3 ([Errno 113] No route to host)
rh68-node3: Not stopping cluster - node is unreachable
rh68-node1: Stopping Cluster (cman)...
rh68-node2: Stopping Cluster (cman)...
Error: unable to stop all nodes
[root@rh68-node1:~]# echo $?
1
[root@rh68-node1:~]# service cman status
corosync is stopped
[root@rh68-node1:~]# service corosync status
corosync is stopped
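For reference, the workaround mentioned in the description (needed only on pcs versions without this fix) amounts to re-running the stop command and listing just the reachable nodes explicitly. The node names below are reused from the test above for illustration:

[root@rh68-node1:~]# pcs cluster stop rh68-node1 rh68-node2
# only the reachable nodes are listed, so cman/corosync gets stopped on them
# even though rh68-node3 cannot be contacted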
Before Fix:

[vm-rhel67-1 ~] $ rpm -q pcs
pcs-0.9.154-1.el6.x86_64
[vm-rhel67-1 ~] $ service pcsd stop
Stopping pcsd: [ OK ]
[vm-rhel67-1 ~] $ pcs status pcsd
vm-rhel67-1: Offline
vm-rhel67-2: Online
vm-rhel67-3: Online
[vm-rhel67-1 ~] $ pcs cluster stop --all
vm-rhel67-1: Unable to connect to vm-rhel67-1 ([Errno 111] Connection refused)
vm-rhel67-3: Stopping Cluster (pacemaker)...
vm-rhel67-2: Stopping Cluster (pacemaker)...
Error: unable to stop all nodes
vm-rhel67-1: Unable to connect to vm-rhel67-1 ([Errno 111] Connection refused)
[vm-rhel67-1 ~] $ echo $?
1
[vm-rhel67-1 ~] $ pcs status nodes both
Corosync Nodes:
 Online: vm-rhel67-1 vm-rhel67-2
 Offline: vm-rhel67-3
Pacemaker Nodes:
 Online: vm-rhel67-1
 Standby: vm-rhel67-2
 Maintenance:
 Offline: vm-rhel67-3
Pacemaker Remote Nodes:
 Online:
 Standby:
 Maintenance:
 Offline:

After Fix:

[vm-rhel67-1 ~] $ rpm -q pcs
pcs-0.9.155-1.el6.x86_64
[vm-rhel67-1 ~] $ service pcsd stop
Stopping pcsd: [ OK ]
[vm-rhel67-1 ~] $ pcs status pcsd
vm-rhel67-1: Offline
vm-rhel67-3: Online
vm-rhel67-2: Online
[vm-rhel67-1 ~] $ pcs cluster stop --all
vm-rhel67-1: Unable to connect to vm-rhel67-1 ([Errno 111] Connection refused)
vm-rhel67-2: Stopping Cluster (pacemaker)...
vm-rhel67-3: Stopping Cluster (pacemaker)...
Error: unable to stop all nodes
vm-rhel67-1: Unable to connect to vm-rhel67-1 ([Errno 111] Connection refused)
vm-rhel67-1: Not stopping cluster - node is unreachable
vm-rhel67-3: Stopping Cluster (cman)...
vm-rhel67-2: Stopping Cluster (cman)...
Error: unable to stop all nodes
[vm-rhel67-1 ~] $ echo $?
1
[vm-rhel67-1 ~] $ pcs status nodes both
Corosync Nodes:
 Online: vm-rhel67-1
 Offline: vm-rhel67-2 vm-rhel67-3
Pacemaker Nodes:
 Online: vm-rhel67-1
 Standby:
 Maintenance:
 Offline: vm-rhel67-2 vm-rhel67-3
Pacemaker Remote Nodes:
 Online:
 Standby:
 Maintenance:
 Offline:

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2017-0707.html