Bug 2228955
| Summary: | Race condition when DC and attribute writer are both shutting down | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Ken Gaillot <kgaillot> | |
| Component: | pacemaker | Assignee: | Ken Gaillot <kgaillot> | |
| Status: | CLOSED ERRATA | QA Contact: | cluster-qe <cluster-qe> | |
| Severity: | urgent | Docs Contact: | ||
| Priority: | urgent | |||
| Version: | 8.4 | CC: | cfeist, cluster-maint, jrehova, msmazova | |
| Target Milestone: | rc | Keywords: | Triaged, ZStream | |
| Target Release: | 8.9 | Flags: | pm-rhel: mirror+ | |
| Hardware: | All | |||
| OS: | All | |||
| Whiteboard: | ||||
| Fixed In Version: | pacemaker-2.1.6-8.el8 | Doc Type: | Bug Fix | |
| Doc Text: |
Cause: A node's attribute manager writes all its transient node attributes from memory to the CIB after winning the election for attribute writer, even if its node has requested shutdown.
Consequence: If a node is DC, requests shutdown, and wins the attribute writer election after its controller has left the cluster but before its attribute manager has left, it can write out its shutdown attribute to the CIB. The next time it rejoins the cluster, it will be immediately shut down.
Fix: A node's attribute manager should not write out its attributes after winning an election if shutdown has been requested for its node.
Result: A leaving DC node does not have an unexpected shutdown the next time it rejoins.
|
Story Points: | --- | |
| Clone Of: | 2228933 | |||
| : | 2229013 (view as bug list) | Environment: | ||
| Last Closed: | 2023-11-14 15:32:36 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | 2.1.7 | |
| Embargoed: | ||||
| Bug Depends On: | 2228933 | |||
| Bug Blocks: | 2229013 | |||
|
Description
Ken Gaillot
2023-08-03 18:06:09 UTC
Fixed upstream as of commit f5263c94

Version of pacemaker:
> [root@virt-543:~]# rpm -q pacemaker
> pacemaker-2.1.6-7.el8.x86_64

Determining the DC node:
> [root@virt-543:~]# crmadmin -D
> Designated Controller is: virt-544

Determining the attribute writer node --> virt-546:
> [root@virt-543:~]# for n in 543 544 545 546 547; do echo $n; qarsh -l root virt-$n "grep 'Recorded local node as attribute writer' /var/log/pacemaker/pacemaker.log | tail -1"; done
> 543
> Aug 23 14:19:13 virt-543 pacemaker-attrd [65920] (attrd_declare_winner) notice: Recorded local node as attribute writer (was unset)
> 544
> Aug 23 14:19:13 virt-544 pacemaker-attrd [65845] (attrd_declare_winner) notice: Recorded local node as attribute writer (was unset)
> 545
> Aug 23 14:19:13 virt-545 pacemaker-attrd [65698] (attrd_declare_winner) notice: Recorded local node as attribute writer (was unset)
> 546
> Aug 23 14:19:21 virt-546 pacemaker-attrd [65700] (attrd_declare_winner) notice: Recorded local node as attribute writer (was unset)
> 547
> Aug 23 14:19:13 virt-547 pacemaker-attrd [65497] (attrd_declare_winner) notice: Recorded local node as attribute writer (was unset)

Rebooting both DC and attribute writer nodes at the same time:
> [root@virt-544 ~]# reboot
> [root@virt-546 ~]# reboot

Result: "shutdown" attribute is present in the CIB.
> [root@virt-543:~]# pcs cluster cib | xmllint --xpath '//node_state/transient_attributes' -
> <transient_attributes id="1">
> <instance_attributes id="status-1">
> <nvpair id="status-1-.feature-set" name="#feature-set" value="3.17.4"/>
> </instance_attributes>
> </transient_attributes><transient_attributes id="3">
> <instance_attributes id="status-3">
> <nvpair id="status-3-.feature-set" name="#feature-set" value="3.17.4"/>
> </instance_attributes>
> </transient_attributes><transient_attributes id="2">
> <instance_attributes id="status-2">
> <nvpair id="status-2-.feature-set" name="#feature-set" value="3.17.4"/>
> <nvpair id="status-2-shutdown" name="shutdown" value="1692793538"/>
> </instance_attributes>
> </transient_attributes><transient_attributes id="5">
> <instance_attributes id="status-5">
> <nvpair id="status-5-.feature-set" name="#feature-set" value="3.17.4"/>
> </instance_attributes>
> </transient_attributes>

The original fix was found to be incomplete. The completed fix has been merged in upstream main branch as of commit 58400e27.

Marking Verified in version pacemaker-2.1.6-8.el8.x86_64.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (pacemaker bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2023:6970
|
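
The CIB check used in the reproduction above (looking for a leftover "shutdown" transient attribute after both nodes reboot) could be scripted roughly as follows. This is only a sketch: `find_shutdown_attrs` is a hypothetical helper name, not part of pacemaker or pcs, and it assumes each `nvpair` element is written as a single tag with `id` before `name`, as in the output shown in this report. It uses plain grep/sed pattern matching rather than a real XML parser.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with pacemaker or pcs).
# Reads CIB XML on stdin and prints the id of every nvpair
# whose name is exactly "shutdown", one id per line.
# Prints nothing if no such attribute is present.
find_shutdown_attrs() {
    # Isolate each matching <nvpair .../> tag; [^>]* cannot cross a tag
    # boundary, so several tags on one input line are matched separately.
    grep -o '<nvpair [^>]*name="shutdown"[^>]*>' |
        # Keep only the value of the id attribute of each matched tag.
        sed -n 's/.*[[:space:]]id="\([^"]*\)".*/\1/p'
}
```

On a cluster node this would be fed from the same command used above, for example `pcs cluster cib | find_shutdown_attrs`. Any output while no node is actually shutting down would suggest the stale attribute described in this bug.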