Description of problem:
When stopping multiple nodes, pcs first stops pacemaker on the nodes, and only after pacemaker has stopped on all of them does it proceed with stopping corosync / cman. If pcs is unable to stop pacemaker on at least one node, it exits with an error, leaving corosync running on the rest of the nodes.

Version-Release number of selected component (if applicable):
pcs-0.9.148-7.el6_8.1.x86_64, pcs-0.9.152-8.el7.x86_64

How reproducible:
always, easily

Steps to Reproduce:
[root@rh68-node1:~]# service pcsd stop
Stopping pcsd: [ OK ]
[root@rh68-node1:~]# pcs status nodes both
Corosync Nodes:
 Online: rh68-node1 rh68-node2 rh68-node3
 Offline:
Pacemaker Nodes:
 Online: rh68-node1 rh68-node2 rh68-node3
 Standby:
 Maintenance:
 Offline:
Pacemaker Remote Nodes:
 Online:
 Standby:
 Maintenance:
 Offline:
[root@rh68-node1:~]# pcs cluster stop --all
rh68-node1: Unable to connect to rh68-node1 ([Errno 111] Connection refused)
rh68-node3: Stopping Cluster (pacemaker)...
rh68-node2: Stopping Cluster (pacemaker)...
Error: unable to stop all nodes
rh68-node1: Unable to connect to rh68-node1 ([Errno 111] Connection refused)
[root@rh68-node1:~]# pcs status nodes both
Corosync Nodes:
 Online: rh68-node1 rh68-node2 rh68-node3
 Offline:
Pacemaker Nodes:
 Online: rh68-node1
 Standby: rh68-node2 rh68-node3
 Maintenance:
 Offline:
Pacemaker Remote Nodes:
 Online:
 Standby:
 Maintenance:
 Offline:

Actual results:
corosync running on node2 and node3, where pacemaker has been stopped

Expected results:
corosync stopped on node2 and node3

Additional info:
originally reported here: http://clusterlabs.org/pipermail/users/2016-September/004152.html
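To make the failure mode concrete, below is a minimal sketch, in Python, of the two-phase stop flow described above. It is an illustration only, not the actual pcs code: the node list, the UNREACHABLE set and the stop_pacemaker() / stop_corosync() helpers are hypothetical stand-ins for the per-node requests pcs sends to pcsd.

# Minimal sketch of the pre-fix control flow; NOT the actual pcs code.
# stop_pacemaker() / stop_corosync() are hypothetical stand-ins for the
# per-node requests that pcs sends to pcsd on each cluster node.
import sys

UNREACHABLE = {"rh68-node1"}   # pcsd has been stopped on this node
NODES = ["rh68-node1", "rh68-node2", "rh68-node3"]

def stop_pacemaker(node):
    if node in UNREACHABLE:
        print("%s: Unable to connect to %s ([Errno 111] Connection refused)" % (node, node))
        return False
    print("%s: Stopping Cluster (pacemaker)..." % node)
    return True

def stop_corosync(node):
    print("%s: Stopping Cluster (cman)..." % node)
    return True

# Phase 1: stop pacemaker on every node.
results = [stop_pacemaker(n) for n in NODES]

# Pre-fix behaviour: any failure aborts the whole command right here, so
# the corosync/cman phase below never runs and corosync keeps running on
# the nodes where pacemaker has already been stopped.
if not all(results):
    sys.exit("Error: unable to stop all nodes")

# Phase 2: stop corosync / cman (never reached in the failure case above).
for n in NODES:
    stop_corosync(n)

The important point is the early exit between the two phases: once any node fails in the pacemaker phase, the corosync/cman phase is skipped for every node, including those whose pacemaker is already down.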
Workaround: run "pcs cluster stop" again and specify only the nodes on which pacemaker has been stopped successfully.
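For example, with the node names from the reproducer above, that would be:

[root@rh68-node1:~]# pcs cluster stop rh68-node2 rh68-node3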
Tomas,

Thank you very much for converting my pacemaker issue to this bugzilla. I had no idea that was even possible.

I was able to cleanly reproduce this problem again today, so if you need any more information about my config, pacemaker or corosync logs, I will be happy to provide them.

Now that I have a new RedHat bugzilla account, am I at liberty to open bugs here myself in the future?

Thank you for your support.

- Scott Greenlese - Linux on System Z - System Test - IBM Corp.
Another question. I don't see any way for me to get on the .cc list for email notifications. In fact, I see that my mail ID swgreenl.com is "excluded". Is it possible for me to get on the Mail To list? Thanks again... - Scott
Please strike that last comment. I see that I am, in fact, on the .cc mailing list. (The exclusion was because, when my comment was sent out to the .cc list, I was excluded as the author.) Sorry for the confusion.
(In reply to Scott Greenlese from comment #4)
> Tomas,
> 
> Thank you very much for converting my pacemaker issue to this bugzilla. I
> had no idea that was even possible.
> 
> I was able to cleanly reproduce this problem again today, so if you need any
> more information about my config, pacemaker or corosync logs, I will be
> happy to provide them.

Thanks, I am good. I have all the information I need to reproduce and fix the bug.

> Now that I have a new RedHat bugzilla account, am I at liberty to open bugs
> here myself in the future?

I am not 100% sure what permissions new accounts have, but I believe you can open bugs here.

> Thank you for your support.
> 
> - Scott Greenlese - Linux on System Z - System Test - IBM Corp.
Created attachment 1210505 [details]
proposed fix

Test:

[root@rh68-node1:~]# pcs status pcsd
rh68-node1: Online
rh68-node2: Online
rh68-node3: Offline
[root@rh68-node1:~]# pcs cluster stop --all
rh68-node3: Unable to connect to rh68-node3 ([Errno 113] No route to host)
rh68-node1: Stopping Cluster (pacemaker)...
rh68-node2: Stopping Cluster (pacemaker)...
Error: unable to stop all nodes
rh68-node3: Unable to connect to rh68-node3 ([Errno 113] No route to host)
rh68-node3: Not stopping cluster - node is unreachable
rh68-node1: Stopping Cluster (cman)...
rh68-node2: Stopping Cluster (cman)...
Error: unable to stop all nodes
[root@rh68-node1:~]# echo $?
1
[root@rh68-node1:~]# service cman status
corosync is stopped
[root@rh68-node1:~]# service corosync status
corosync is stopped
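Judging purely from the test output above (not from the patch itself), the fixed flow appears to keep track of unreachable nodes instead of aborting, skip them in the corosync/cman phase, and still exit non-zero. A minimal sketch under that assumption, with hypothetical helper and function names:

# Sketch of the post-fix flow inferred from the test transcript above;
# NOT the actual patch. stop_on_node() is a hypothetical stand-in for
# the per-node pcsd request.

def stop_on_node(node, service, unreachable):
    if node in unreachable:
        print("%s: Unable to connect to %s" % (node, node))
        return False
    print("%s: Stopping Cluster (%s)..." % (node, service))
    return True

def cluster_stop_all(nodes, unreachable):
    # Phase 1: stop pacemaker everywhere; remember failures instead of
    # aborting (this is the behavioural change).
    failed = set(n for n in nodes if not stop_on_node(n, "pacemaker", unreachable))
    if failed:
        print("Error: unable to stop all nodes")
        for n in sorted(failed):
            print("%s: Not stopping cluster - node is unreachable" % n)

    # Phase 2: stop cman/corosync only on the nodes that are still reachable.
    for n in nodes:
        if n not in failed:
            stop_on_node(n, "cman", unreachable)

    # The command still reports overall failure with a non-zero exit code.
    return 1 if failed else 0

# Example mirroring the test above: pcsd is unreachable on rh68-node3.
exit_code = cluster_stop_all(["rh68-node1", "rh68-node2", "rh68-node3"],
                             unreachable={"rh68-node3"})

The behavioural change is that a failure in the pacemaker phase no longer prevents corosync/cman from being stopped on the nodes that are still reachable.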
Before Fix:

[vm-rhel67-1 ~] $ rpm -q pcs
pcs-0.9.154-1.el6.x86_64
[vm-rhel67-1 ~] $ service pcsd stop
Stopping pcsd: [ OK ]
[vm-rhel67-1 ~] $ pcs status pcsd
vm-rhel67-1: Offline
vm-rhel67-2: Online
vm-rhel67-3: Online
[vm-rhel67-1 ~] $ pcs cluster stop --all
vm-rhel67-1: Unable to connect to vm-rhel67-1 ([Errno 111] Connection refused)
vm-rhel67-3: Stopping Cluster (pacemaker)...
vm-rhel67-2: Stopping Cluster (pacemaker)...
Error: unable to stop all nodes
vm-rhel67-1: Unable to connect to vm-rhel67-1 ([Errno 111] Connection refused)
[vm-rhel67-1 ~] $ echo $?
1
[vm-rhel67-1 ~] $ pcs status nodes both
Corosync Nodes:
 Online: vm-rhel67-1 vm-rhel67-2
 Offline: vm-rhel67-3
Pacemaker Nodes:
 Online: vm-rhel67-1
 Standby: vm-rhel67-2
 Maintenance:
 Offline: vm-rhel67-3
Pacemaker Remote Nodes:
 Online:
 Standby:
 Maintenance:
 Offline:

After Fix:

[vm-rhel67-1 ~] $ rpm -q pcs
pcs-0.9.155-1.el6.x86_64
[vm-rhel67-1 ~] $ service pcsd stop
Stopping pcsd: [ OK ]
[vm-rhel67-1 ~] $ pcs status pcsd
vm-rhel67-1: Offline
vm-rhel67-3: Online
vm-rhel67-2: Online
[vm-rhel67-1 ~] $ pcs cluster stop --all
vm-rhel67-1: Unable to connect to vm-rhel67-1 ([Errno 111] Connection refused)
vm-rhel67-2: Stopping Cluster (pacemaker)...
vm-rhel67-3: Stopping Cluster (pacemaker)...
Error: unable to stop all nodes
vm-rhel67-1: Unable to connect to vm-rhel67-1 ([Errno 111] Connection refused)
vm-rhel67-1: Not stopping cluster - node is unreachable
vm-rhel67-3: Stopping Cluster (cman)...
vm-rhel67-2: Stopping Cluster (cman)...
Error: unable to stop all nodes
[vm-rhel67-1 ~] $ echo $?
1
[vm-rhel67-1 ~] $ pcs status nodes both
Corosync Nodes:
 Online: vm-rhel67-1
 Offline: vm-rhel67-2 vm-rhel67-3
Pacemaker Nodes:
 Online: vm-rhel67-1
 Standby:
 Maintenance:
 Offline: vm-rhel67-2 vm-rhel67-3
Pacemaker Remote Nodes:
 Online:
 Standby:
 Maintenance:
 Offline:
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2017-0707.html