Bug 2175797
Summary: | Improve error message when adding a new node to a cluster with sbd is not possible | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Linux 9 | Reporter: | Michal Mazourek <mmazoure> | |
Component: | pcs | Assignee: | Tomas Jelinek <tojeline> | |
Status: | CLOSED MIGRATED | QA Contact: | cluster-qe <cluster-qe> | |
Severity: | low | Docs Contact: | ||
Priority: | medium | |||
Version: | 9.2 | CC: | cluster-maint, idevat, mlisik, mpospisi, omular, tojeline | |
Target Milestone: | rc | Keywords: | MigratedToJIRA, Triaged | |
Target Release: | 9.4 | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | Enhancement | ||
Doc Text: |
Feature:
Provide guidance in error messages when adding or removing a node in a cluster with an odd number of nodes, SBD enabled without disks, and auto_tie_breaker disabled.
Reason:
Originally, pcs in this situation only reported that it was going to enable auto_tie_breaker and then exited with an error saying corosync was running. This was not explanatory and did not give users enough information to solve the issue.
Result:
Error messages have been updated. They now explain that auto_tie_breaker must be enabled due to SBD and provide instructions to stop the cluster, enable auto_tie_breaker, start the cluster, and run the command for adding or removing a node again.
|
Story Points: | --- | |
Clone Of: | ||||
: | 2227234 | Environment: | |
Last Closed: | 2023-09-22 20:05:14 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 2227234 |
Description
Michal Mazourek
2023-03-06 14:59:59 UTC
possible solutions:
* enable auto_tie_breaker
* disable sbd temporarily
* use --force in pcs cluster node add

Upstream patch: https://github.com/ClusterLabs/pcs/commit/a0a6e4aadebe3cc782cf17c132813d728bc1552d

Test:

> Have a 3-node cluster with auto tie breaker disabled and SBD enabled

[root@rh92-node1:~]# pcs quorum
Options:
  auto_tie_breaker: 0

[root@rh92-node1:~]# pcs stonith sbd enable
Running SBD pre-enabling checks...
rh92-node1: SBD pre-enabling checks done
rh92-node2: SBD pre-enabling checks done
rh92-node3: SBD pre-enabling checks done
Distributing SBD config...
rh92-node1: SBD config saved
rh92-node2: SBD config saved
rh92-node3: SBD config saved
Enabling sbd...
rh92-node1: sbd enabled
rh92-node3: sbd enabled
rh92-node2: sbd enabled
Warning: Cluster restart is required in order to apply these changes.

[root@rh92-node1:~]# pcs cluster stop --all && pcs cluster start --all
rh92-node2: Stopping Cluster (pacemaker)...
rh92-node1: Stopping Cluster (pacemaker)...
rh92-node3: Stopping Cluster (pacemaker)...
rh92-node2: Stopping Cluster (corosync)...
rh92-node3: Stopping Cluster (corosync)...
rh92-node1: Stopping Cluster (corosync)...
rh92-node1: Starting Cluster...
rh92-node3: Starting Cluster...
rh92-node2: Starting Cluster...

> Try adding a node

[root@rh92-node1:~]# pcs cluster node add rh92-node4 --enable --start
No addresses specified for host 'rh92-node4', using 'rh92-node4'
No watchdog has been specified for node 'rh92-node4'. Using default watchdog '/dev/watchdog'
Checking that corosync is not running on nodes...
Error: Corosync is running on node 'rh92-node1'
Error: Corosync is running on node 'rh92-node2'
Error: Corosync is running on node 'rh92-node3'
Error: SBD fencing is enabled in the cluster. To keep it effective, auto_tie_breaker quorum option needs to be enabled. This can only be done when the cluster is stopped. To proceed, stop the cluster, enable auto_tie_breaker, and start the cluster. Then, repeat the requested action. Use commands 'pcs cluster stop --all', 'pcs quorum update auto_tie_breaker=1', 'pcs cluster start --all'.
Running SBD pre-enabling checks...
rh92-node4: SBD pre-enabling checks done
Error: Errors have occurred, therefore pcs is unable to continue

> Try removing a node

[root@rh92-node1:~]# pcs cluster node remove rh92-node3
Checking that corosync is not running on nodes...
Error: Corosync is running on node 'rh92-node1'
Error: Corosync is running on node 'rh92-node2'
Error: SBD fencing is enabled in the cluster. To keep it effective, auto_tie_breaker quorum option needs to be enabled. This can only be done when the cluster is stopped. To proceed, stop the cluster, enable auto_tie_breaker, and start the cluster. Then, repeat the requested action. Use commands 'pcs cluster stop --all', 'pcs quorum update auto_tie_breaker=1', 'pcs cluster start --all'.
Error: Errors have occurred, therefore pcs is unable to continue

Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.

This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there.

Due to differences in account names between systems, some fields were not replicated. Be sure to add yourself to Jira issue's "Watchers" field to continue receiving updates and add others to the "Need Info From" field to continue requesting information.

To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of 2 footprints next to it, and begin with "RHEL-" followed by an integer. You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like: "Bugzilla Bug" = 1234567

In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues. You can also visit https://access.redhat.com/articles/7032570 for general account information.
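
For reference, the recovery sequence quoted in the updated error message could be wrapped in a small script along these lines. This is only a sketch, not part of pcs: the helper names `run` and `enable_atb_and_retry` and the `DRY_RUN` variable are made up for illustration, and only the pcs commands themselves come from the message and the test above. Note that stopping the cluster on all nodes means downtime for its resources.

```shell
#!/bin/sh
# Sketch only: helper names and DRY_RUN are illustrative assumptions,
# not something pcs provides.

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Stop the cluster, enable auto_tie_breaker, start the cluster again,
# then retry the original pcs subcommand passed as arguments.
# Stops at the first failing step.
enable_atb_and_retry() {
    run pcs cluster stop --all &&
    run pcs quorum update auto_tie_breaker=1 &&
    run pcs cluster start --all &&
    run pcs "$@"
}
```

With `DRY_RUN=1` set, calling `enable_atb_and_retry cluster node add rh92-node4 --enable --start` only prints the four commands in order, which is a cheap way to review the sequence before running it against a live cluster. The retried command may still fail if the cluster has not finished starting, so a pause before the last step may be needed.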