Bug 1956687
| Summary: | Pacemaker shuts down if a node is in the CPG membership when a stonith action against it completes | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Reid Wahl <nwahl> |
| Component: | pacemaker | Assignee: | Ken Gaillot <kgaillot> |
| Status: | NEW --- | QA Contact: | cluster-qe <cluster-qe> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 8.3 | CC: | alex.zarifoglu, cfeist, cherrylegler, cluster-maint, franklegler, knickel, sbradley |
| Target Milestone: | beta | Keywords: | Triaged |
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | Type: | Bug | |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Reid Wahl
2021-05-04 08:44:51 UTC
When a node gets a notification of its own successful fencing, it could check its system uptime (or at least pacemaker uptime, which cluster nodes already check for election purposes). If the fencing completion time happened before the node's start, the node can assume it rebooted after fencing and not shut itself down.

We'd have to report the fence action completion time in stonith_event_t, which shouldn't be difficult.

The tricky part would be to prevent other cluster nodes from setting the node's status to lost when getting the notification themselves. A joining node might have to record its uptime in its CIB node_state so all nodes could do the check. There would be a race condition where the node_state CIB update doesn't complete before the fencing notification is received, in which case this issue would still occur, but the window would be greatly reduced.

*** Bug 1947731 has been marked as a duplicate of this bug. ***

*** Bug 2122373 has been marked as a duplicate of this bug. ***
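
The uptime check proposed in the description could look roughly like the sketch below. It is illustrative Python, not Pacemaker code: `FenceEvent` is a hypothetical stand-in for a `stonith_event_t` that carries the proposed completion time, and `node_start` stands for whatever start timestamp (system or pacemaker) a node would record in its CIB `node_state`. All names here are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FenceEvent:
    """Hypothetical stand-in for stonith_event_t with a completion time added."""
    target: str        # name of the node that was fenced
    completed: float   # epoch seconds when the fence action finished (assumed field)


def rebooted_since_fencing(event: FenceEvent, node_start: Optional[float]) -> bool:
    """True if the target is known to have started after the fencing completed.

    node_start is None when no start time is available yet, e.g. the
    node_state CIB update has not completed (the race noted in the description).
    """
    return node_start is not None and event.completed < node_start


def local_node_should_shut_down(local_name: str, event: FenceEvent,
                                local_start: float) -> bool:
    """Decision on the fenced node itself when it sees its own fencing result."""
    if event.target != local_name:
        return False
    # Fencing finished before this node started: it already rebooted, so stay up.
    return not rebooted_since_fencing(event, local_start)


def peer_should_mark_lost(event: FenceEvent,
                          recorded_start: Optional[float]) -> bool:
    """Decision on another node, using the start time the target recorded."""
    return not rebooted_since_fencing(event, recorded_start)
```

In this sketch a peer that has not yet seen the target's recorded start time still marks it lost, which corresponds to the remaining race window mentioned in the description.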