Bug 1794062
Summary: Starting cluster using --wait on all nodes in parallel often ends up with error

Product: Red Hat Enterprise Linux 8
Component: pcs
Status: CLOSED ERRATA
Severity: unspecified
Priority: medium
Version: 8.2
Target Milestone: rc
Target Release: 8.4
Hardware: Unspecified
OS: Unspecified
Fixed In Version: pcs-0.10.8-1.el8
Reporter: Ken Gaillot <kgaillot>
Assignee: Tomas Jelinek <tojeline>
QA Contact: cluster-qe <cluster-qe>
Docs Contact: Steven J. Levine <slevine>
CC: cfeist, clumens, cluster-maint, cluster-qe, idevat, kgaillot, mlisik, mmazoure, mpospisi, nhostako, omular, slevine, tojeline
Keywords: EasyFix, Triaged
Doc Type: Bug Fix
Doc Text:
Cause: The user runs the 'pcs cluster start --wait' command.
Consequence: Pcs checks the pacemaker daemons to determine whether the cluster has already started. A race condition can occur when only some of the pacemaker daemons on the local node have started, which causes pcs to report an error.
Fix: Properly check the status of all pacemaker daemons and wait for all of them to start.
Result: 'pcs cluster start --wait' succeeds.
Clone Of: 1793653
Type: Enhancement
Last Closed: 2021-05-18 15:12:05 UTC
Description  Ken Gaillot  2020-01-22 15:24:51 UTC
> --- Additional comment from Ken Gaillot on 2020-01-22 15:12:10 UTC ---
> 1. I think the first error is something pcs needs to address. crm_node is correctly giving an error when the cluster is not running.

Pcs first tries to run 'crm_mon --one-shot --as-xml --inactive' to get the cluster status from the local node. If that returns non-zero, pcs considers the local node offline (not fully started) and tries again later. If crm_mon returns 0, pcs proceeds and runs 'crm_node --name'. Is the idea that "crm_mon exiting with 0 => the cluster is started and ready" flawed? Or has there perhaps been a related change in pacemaker recently? Again, we are missing the full debug output from pcs here...
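For illustration only, the flow described above corresponds roughly to the following sketch (simplified Python, not the actual pcs source; the function name, timeout, and polling interval are made up):

import subprocess
import time

def wait_for_local_node(timeout=60, interval=2):
    # Poll crm_mon until it succeeds, then ask crm_node for the node name.
    deadline = time.time() + timeout
    while time.time() < deadline:
        mon = subprocess.run(
            ["crm_mon", "--one-shot", "--as-xml", "--inactive"],
            capture_output=True,
        )
        if mon.returncode != 0:
            # Local node is considered offline / not fully started; try again later.
            time.sleep(interval)
            continue
        # crm_mon returned 0, so this logic assumes the cluster is ready.
        # crm_node --name may still fail if the controller is not up yet.
        node = subprocess.run(
            ["crm_node", "--name"], capture_output=True, text=True
        )
        return node.stdout.strip()
    raise TimeoutError("cluster did not start within the timeout")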
Nina, can you provide the full debug output from pcs, perhaps as an attachment if it's too long? See comment 1. Thanks.

(In reply to Tomas Jelinek from comment #1)
> > --- Additional comment from Ken Gaillot on 2020-01-22 15:12:10 UTC ---
> > 1. I think the first error is something pcs needs to address. crm_node is
> > correctly giving an error when the cluster is not running.
>
> Pcs first tries to run 'crm_mon --one-shot --as-xml --inactive' to get the
> cluster status from the local node. If that returns non-zero, pcs considers
> the local node offline (not fully started) and tries again later. If crm_mon
> returns 0, pcs proceeds and runs 'crm_node --name'. Is the idea that
> "crm_mon exiting with 0 => the cluster is started and ready" flawed? Or has
> there perhaps been a related change in pacemaker recently? Again, we are
> missing the full debug output from pcs here...

crm_mon will have exit status 102 when the cluster is down (which is "Not connected" in pacemaker exit codes). That's the same exit status crm_node will give if the cluster is down.

However, I just realized there is a race condition that is the likely culprit here. The crm_mon command only needs to be able to query the CIB, while the crm_node command needs to be able to contact the controller. The CIB is literally the first sub-daemon started, and the controller the last. So there's a small window when the CIB is responding but the controller isn't.

Probably the easiest solution would be to check the crm_node exit status, and if it's 102, do the same "try again later" you do for nonzero crm_mon. Alternatively, you could run the crm_node command first, retrying until it's not 102, and do the crm_mon second.

I can imagine it would be useful to have a pacemaker tool specifically for checking the status of the local daemons, returning codes for "everything is down", "only corosync is up", "some pacemaker daemons are up", and "pacemaker is fully up". But checking for exit status 102 will be simpler.

(In reply to Ken Gaillot from comment #8)
> Probably the easiest solution would be to check the crm_node exit status,
> and if it's 102, do the same "try again later" you do for nonzero crm_mon.
> Alternatively, you could run the crm_node command first, retrying until it's
> not 102, and do the crm_mon second.

We'll do that. Thanks for the analysis!

Created attachment 1746235 [details]
proposed fix

Test: Run 'pcs cluster start --wait' on all cluster nodes simultaneously. For more details see comment 0.

Test:

[root@r8-node-01 ~]# rpm -q pcs
pcs-0.10.8-1.el8.x86_64

[root@r8-node-01 ~]# pcs cluster start --wait
Starting Cluster...
Waiting for node(s) to start...
Started

[root@r8-node-02 ~]# pcs cluster start --wait
Starting Cluster...
Waiting for node(s) to start...
Started

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (pcs bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2021:1737
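For reference, a minimal sketch of the approach agreed on in comment 8: treat crm_node exit status 102 ("Not connected") the same as a failing crm_mon and keep retrying until the controller answers. This is illustrative Python under those assumptions, not the actual pcs change from attachment 1746235; the constant and function names are made up:

import subprocess
import time

CRM_EX_NOT_CONNECTED = 102  # pacemaker exit code for "Not connected"

def get_local_node_name(timeout=60, interval=2):
    # Retry crm_node --name until the controller answers or the timeout expires.
    deadline = time.time() + timeout
    while time.time() < deadline:
        result = subprocess.run(
            ["crm_node", "--name"], capture_output=True, text=True
        )
        if result.returncode == 0:
            return result.stdout.strip()
        if result.returncode == CRM_EX_NOT_CONNECTED:
            # The CIB may already answer crm_mon while the controller is still
            # starting, so wait and retry instead of reporting an error.
            time.sleep(interval)
            continue
        raise RuntimeError("crm_node failed: " + result.stderr.strip())
    raise TimeoutError("pacemaker controller did not become available")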