Bug 2175797
| Summary: | Improve error message when adding a new node to a cluster with sbd is not possible | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | Michal Mazourek <mmazoure> | |
| Component: | pcs | Assignee: | Tomas Jelinek <tojeline> | |
| Status: | POST --- | QA Contact: | cluster-qe <cluster-qe> | |
| Severity: | low | Docs Contact: | ||
| Priority: | medium | |||
| Version: | 9.2 | CC: | cluster-maint, idevat, mlisik, mpospisi, omular, tojeline | |
| Target Milestone: | rc | Keywords: | Triaged | |
| Target Release: | 9.4 | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | Doc Type: | Enhancement | ||
| Doc Text: |
Feature:
Provide guidance in error messages when adding or removing a node in a cluster with an odd number of nodes, SBD enabled without disks, and auto_tie_breaker disabled.
Reason:
Originally, in this situation pcs only reported that it was going to enable auto_tie_breaker and then exited with an error saying that corosync was running. This did not explain the problem and did not give users enough information to resolve it.
Result:
Error messages have been updated. They now explain that auto_tie_breaker must be enabled because of SBD, and they provide instructions to stop the cluster, enable auto_tie_breaker, start the cluster, and run the command for adding or removing a node again.
|
Story Points: | --- | |
| Clone Of: | ||||
| : | 2227234 | Environment: | ||
| Last Closed: | Type: | Bug | ||
| Bug Blocks: | 2227234 | |||
|
Description
Michal Mazourek
2023-03-06 14:59:59 UTC
possible solutions:
* enable auto_tie_breaker
* disable sbd temporarily
* use --force in pcs cluster node add

Upstream patch: https://github.com/ClusterLabs/pcs/commit/a0a6e4aadebe3cc782cf17c132813d728bc1552d

Test:

> Have a 3-node cluster with auto tie breaker disabled and SBD enabled

[root@rh92-node1:~]# pcs quorum
Options:
  auto_tie_breaker: 0
[root@rh92-node1:~]# pcs stonith sbd enable
Running SBD pre-enabling checks...
rh92-node1: SBD pre-enabling checks done
rh92-node2: SBD pre-enabling checks done
rh92-node3: SBD pre-enabling checks done
Distributing SBD config...
rh92-node1: SBD config saved
rh92-node2: SBD config saved
rh92-node3: SBD config saved
Enabling sbd...
rh92-node1: sbd enabled
rh92-node3: sbd enabled
rh92-node2: sbd enabled
Warning: Cluster restart is required in order to apply these changes.
[root@rh92-node1:~]# pcs cluster stop --all && pcs cluster start --all
rh92-node2: Stopping Cluster (pacemaker)...
rh92-node1: Stopping Cluster (pacemaker)...
rh92-node3: Stopping Cluster (pacemaker)...
rh92-node2: Stopping Cluster (corosync)...
rh92-node3: Stopping Cluster (corosync)...
rh92-node1: Stopping Cluster (corosync)...
rh92-node1: Starting Cluster...
rh92-node3: Starting Cluster...
rh92-node2: Starting Cluster...

> Try adding a node

[root@rh92-node1:~]# pcs cluster node add rh92-node4 --enable --start
No addresses specified for host 'rh92-node4', using 'rh92-node4'
No watchdog has been specified for node 'rh92-node4'. Using default watchdog '/dev/watchdog'
Checking that corosync is not running on nodes...
Error: Corosync is running on node 'rh92-node1'
Error: Corosync is running on node 'rh92-node2'
Error: Corosync is running on node 'rh92-node3'
Error: SBD fencing is enabled in the cluster. To keep it effective, auto_tie_breaker quorum option needs to be enabled. This can only be done when the cluster is stopped. To proceed, stop the cluster, enable auto_tie_breaker, and start the cluster. Then, repeat the requested action. Use commands 'pcs cluster stop --all', 'pcs quorum update auto_tie_breaker=1', 'pcs cluster start --all'.
Running SBD pre-enabling checks...
rh92-node4: SBD pre-enabling checks done
Error: Errors have occurred, therefore pcs is unable to continue

> Try removing a node

[root@rh92-node1:~]# pcs cluster node remove rh92-node3
Checking that corosync is not running on nodes...
Error: Corosync is running on node 'rh92-node1'
Error: Corosync is running on node 'rh92-node2'
Error: SBD fencing is enabled in the cluster. To keep it effective, auto_tie_breaker quorum option needs to be enabled. This can only be done when the cluster is stopped. To proceed, stop the cluster, enable auto_tie_breaker, and start the cluster. Then, repeat the requested action. Use commands 'pcs cluster stop --all', 'pcs quorum update auto_tie_breaker=1', 'pcs cluster start --all'.
Error: Errors have occurred, therefore pcs is unable to continue
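
The recovery sequence named in the new error message can be scripted. The following is a minimal sketch, not part of pcs or of the upstream patch: the three pcs commands are quoted from the error message, while the `run` wrapper, the `DRY_RUN` variable, and the `enable_atb_then` function name are hypothetical helpers introduced only for illustration. By default the sketch prints the commands instead of executing them, because stopping the whole cluster interrupts all services it manages.

```shell
#!/bin/sh
# Sketch only. The pcs commands come from the error message in this bug;
# run(), DRY_RUN and enable_atb_then() are illustrative names, not pcs features.

# Default to printing commands; set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Enable auto_tie_breaker on a stopped cluster, restart the cluster,
# then repeat the originally refused action passed as arguments.
enable_atb_then() {
    # The quorum option can only be changed while the cluster is stopped.
    run pcs cluster stop --all &&
    run pcs quorum update auto_tie_breaker=1 &&
    run pcs cluster start --all &&
    # Repeat the requested action (for example a node add or node remove).
    run "$@"
}
```

For example, `enable_atb_then pcs cluster node add rh92-node4 --enable --start` with `DRY_RUN` left at its default prints the stop, update, and start commands followed by the node add command, one per line. The `&&` chain stops at the first failing step, so the node action is not retried if auto_tie_breaker could not be enabled.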