Description of problem:
pcs can get into a situation where adding a new node to a cluster with SBD is impossible because two checks conflict: one tells the user that the cluster has to be offline so that auto_tie_breaker can be enabled, and the other fails because the CIB cannot be loaded (because the cluster is offline). pcs should be aware of this situation and provide more intuitive output.

Version-Release number of selected component (if applicable):
Found in pcs-0.11.4-6.el9

How reproducible:
Every time the number of nodes would be even after adding a node (so pcs would enable auto_tie_breaker) and SBD is used without disks.

Steps to Reproduce:

## enable sbd on 3 node cluster

[root@virt-553 ~]# pcs stonith sbd enable
Running SBD pre-enabling checks...
virt-484: SBD pre-enabling checks done
virt-493: SBD pre-enabling checks done
virt-553: SBD pre-enabling checks done
Distributing SBD config...
virt-553: SBD config saved
virt-493: SBD config saved
virt-484: SBD config saved
Enabling sbd...
virt-493: sbd enabled
virt-484: sbd enabled
virt-553: sbd enabled
Warning: Cluster restart is required in order to apply these changes.

[root@virt-553 ~]# pcs cluster stop --all && pcs cluster start --all
virt-553: Stopping Cluster (pacemaker)...
virt-484: Stopping Cluster (pacemaker)...
virt-493: Stopping Cluster (pacemaker)...
virt-493: Stopping Cluster (corosync)...
virt-553: Stopping Cluster (corosync)...
virt-484: Stopping Cluster (corosync)...
virt-553: Starting Cluster...
virt-493: Starting Cluster...
virt-484: Starting Cluster...

## Try to add new node to the cluster

1. in a running cluster

[root@virt-553 ~]# pcs cluster node add virt-551
No addresses specified for host 'virt-551', using 'virt-551'
No watchdog has been specified for node 'virt-551'. Using default watchdog '/dev/watchdog'
Warning: auto_tie_breaker quorum option will be enabled to make SBD fencing effective. Cluster has to be offline to be able to make this change.
Checking corosync is not running on nodes...
Error: virt-493: corosync is running
Error: virt-484: corosync is running
Error: virt-553: corosync is running
Running SBD pre-enabling checks...
virt-551: SBD pre-enabling checks done
Error: Errors have occurred, therefore pcs is unable to continue
[root@virt-553 ~]# echo $?
1

> Node can't be added in a running cluster due to auto_tie_breaker sbd check.

2. in a stopped cluster

[root@virt-553 ~]# pcs cluster stop --all
virt-553: Stopping Cluster (pacemaker)...
virt-493: Stopping Cluster (pacemaker)...
virt-484: Stopping Cluster (pacemaker)...
virt-484: Stopping Cluster (corosync)...
virt-553: Stopping Cluster (corosync)...
virt-493: Stopping Cluster (corosync)...

[root@virt-553 ~]# pcs cluster node add virt-551
No addresses specified for host 'virt-551', using 'virt-551'
No watchdog has been specified for node 'virt-551'. Using default watchdog '/dev/watchdog'
Error: Unable to load CIB to get guest and remote nodes from it, those nodes cannot be considered in configuration validation, use --force to override
Warning: auto_tie_breaker quorum option will be enabled to make SBD fencing effective. Cluster has to be offline to be able to make this change.
Checking corosync is not running on nodes...
virt-484: corosync is not running
virt-553: corosync is not running
virt-493: corosync is not running
Running SBD pre-enabling checks...
virt-551: SBD pre-enabling checks done
Error: Errors have occurred, therefore pcs is unable to continue
[root@virt-553 ~]# echo $?
1

> Node can't be added in a stopped cluster because CIB is unavailable.

Actual results:
In this state, it is not possible to add a node (without using --force) because the two checks are mutually exclusive: the cluster needs to be stopped to enable auto_tie_breaker, and the cluster needs to be running to load the CIB.

Expected results:
A more intuitive error message that explains why the node can never be added in this state and what to do to resolve it (for example, disable sbd first). Alternative solutions can be discussed as well.

Possible solutions:
* enable auto_tie_breaker
* disable sbd temporarily
* use --force in pcs cluster node add
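
The first option (enable auto_tie_breaker before adding the node) could be scripted roughly as below. This is only a sketch: the three pcs commands for stopping the cluster, enabling auto_tie_breaker, and starting the cluster are the ones named in the updated pcs error message, while the function name `add_node_with_atb`, the `run` helper, and the `DRY_RUN` switch are hypothetical and not part of pcs. It also assumes the cluster is ready to accept the new node once `pcs cluster start --all` returns, which may not hold on every setup.

```shell
#!/bin/sh
# Sketch of the "enable auto_tie_breaker first" workaround.
# add_node_with_atb, run and DRY_RUN are illustrative names, not pcs features.

# DRY_RUN=1 (the default here) only prints the commands;
# set DRY_RUN=0 to actually execute them on a cluster node.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

add_node_with_atb() {
    new_node="${1:-}"
    if [ -z "$new_node" ]; then
        echo "usage: add_node_with_atb NODE" >&2
        return 2
    fi
    # auto_tie_breaker can only be changed while the cluster is stopped,
    # so stop everything, flip the option, and start again.
    run pcs cluster stop --all &&
    run pcs quorum update auto_tie_breaker=1 &&
    run pcs cluster start --all &&
    # With auto_tie_breaker already enabled, the node add no longer needs
    # the cluster to be offline, so the CIB can be loaded.
    run pcs cluster node add "$new_node"
}
```

Calling `add_node_with_atb virt-551` with the default `DRY_RUN=1` prints the four commands in order without touching the cluster, which makes it easy to review the sequence before running it for real. Note that this workaround stops the whole cluster, so it implies downtime for all resources.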

Upstream patch: https://github.com/ClusterLabs/pcs/commit/a0a6e4aadebe3cc782cf17c132813d728bc1552d

Test:

> Have a 3-node cluster with auto tie breaker disabled and SBD enabled

[root@rh92-node1:~]# pcs quorum
Options:
  auto_tie_breaker: 0

[root@rh92-node1:~]# pcs stonith sbd enable
Running SBD pre-enabling checks...
rh92-node1: SBD pre-enabling checks done
rh92-node2: SBD pre-enabling checks done
rh92-node3: SBD pre-enabling checks done
Distributing SBD config...
rh92-node1: SBD config saved
rh92-node2: SBD config saved
rh92-node3: SBD config saved
Enabling sbd...
rh92-node1: sbd enabled
rh92-node3: sbd enabled
rh92-node2: sbd enabled
Warning: Cluster restart is required in order to apply these changes.

[root@rh92-node1:~]# pcs cluster stop --all && pcs cluster start --all
rh92-node2: Stopping Cluster (pacemaker)...
rh92-node1: Stopping Cluster (pacemaker)...
rh92-node3: Stopping Cluster (pacemaker)...
rh92-node2: Stopping Cluster (corosync)...
rh92-node3: Stopping Cluster (corosync)...
rh92-node1: Stopping Cluster (corosync)...
rh92-node1: Starting Cluster...
rh92-node3: Starting Cluster...
rh92-node2: Starting Cluster...

> Try adding a node

[root@rh92-node1:~]# pcs cluster node add rh92-node4 --enable --start
No addresses specified for host 'rh92-node4', using 'rh92-node4'
No watchdog has been specified for node 'rh92-node4'. Using default watchdog '/dev/watchdog'
Checking that corosync is not running on nodes...
Error: Corosync is running on node 'rh92-node1'
Error: Corosync is running on node 'rh92-node2'
Error: Corosync is running on node 'rh92-node3'
Error: SBD fencing is enabled in the cluster. To keep it effective, auto_tie_breaker quorum option needs to be enabled. This can only be done when the cluster is stopped. To proceed, stop the cluster, enable auto_tie_breaker, and start the cluster. Then, repeat the requested action. Use commands 'pcs cluster stop --all', 'pcs quorum update auto_tie_breaker=1', 'pcs cluster start --all'.
Running SBD pre-enabling checks...
rh92-node4: SBD pre-enabling checks done
Error: Errors have occurred, therefore pcs is unable to continue

> Try removing a node

[root@rh92-node1:~]# pcs cluster node remove rh92-node3
Checking that corosync is not running on nodes...
Error: Corosync is running on node 'rh92-node1'
Error: Corosync is running on node 'rh92-node2'
Error: SBD fencing is enabled in the cluster. To keep it effective, auto_tie_breaker quorum option needs to be enabled. This can only be done when the cluster is stopped. To proceed, stop the cluster, enable auto_tie_breaker, and start the cluster. Then, repeat the requested action. Use commands 'pcs cluster stop --all', 'pcs quorum update auto_tie_breaker=1', 'pcs cluster start --all'.
Error: Errors have occurred, therefore pcs is unable to continue