Bug 1759555
| Summary: | After a cluster upgrade, cannot run pcs/crm_attribute on offline CIB on pacemaker remotes | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Damien Ciabrini <dciabrin> |
| Component: | pacemaker | Assignee: | Ken Gaillot <kgaillot> |
| Status: | CLOSED MIGRATED | QA Contact: | cluster-qe <cluster-qe> |
| Severity: | medium | Docs Contact: | |
| Priority: | high | ||
| Version: | 8.0 | CC: | cluster-maint, lmiksik, michele, pkomarov |
| Target Milestone: | rc | Keywords: | MigratedToJIRA, Reopened, Triaged |
| Target Release: | 8.10 | Flags: | pm-rhel: mirror+ |
| Hardware: | All | ||
| OS: | All | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | No Doc Update | |
| Doc Text: | Most users would not encounter this | Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2023-09-22 18:36:44 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | 2.1.7 |
| Embargoed: | |||
|
Description
Damien Ciabrini
2019-10-08 13:35:09 UTC
This is a known issue (Bug 1603613), but since we don't yet have a RHEL 8 clone for it, we can use this bz for that purpose.

A bug that prevents updates, even if it is "just" in a layered, is surely higher than medium severity.

(In reply to Andrew Beekhof from comment #2)
> A bug that prevents updates, even if it is "just" in a layered, is surely
> higher than medium severity.

Do OSP updates require running configuration commands on saved CIBs on remote nodes before they've been updated? As I understood it, only manual commands were affected, and the workaround would be to run them on an updated node.

BTW I've been using "medium" to indicate "not in the next point release", 8.2 in this case, which is pretty locked down due to QA capacity. Since this was internally reported, we could maybe get it in via the rebase bz and close this CURRENTRELEASE when it comes out, but 8.2 dev freeze is in less than 2 months, so that might be a stretch on our end.

(In reply to Ken Gaillot from comment #3)
> Do OSP updates require running configuration commands on saved CIBs on
> remote nodes before they've been updated? As I understood it, only manual
> commands were affected, and the workaround would be to run them on an
> updated node.

Unfortunately, running commands on an offline CIB is part of the update process and is automated in a workflow.

Whenever the user runs a "stack update action" on a host, the update mechanism (in puppet) checks whether the OpenStack services have to be updated. For every OpenStack service that is managed via a pacemaker bundle (e.g. galera, rabbitmq, redis...), the check consists of:
 . dumping the current live cib to a file
 . applying the service configuration that comes with the "stack update action" (i.e. running a series of pcs -f offline-cib.xml <bundle-create/update>)
 . comparing the resulting offline cib against the live cib
 . if there's a change, applying it to the live cib

For the record, we need to use an offline cib because pcs offers no way [1] to update a specific property of a bundle (e.g. change a bind-mount). So we need to calculate ahead of time the potential changes in the offline file, and compute the diffs with the live cib ourselves.

In parallel, as a workaround for this bz, we're investigating ways of splitting our update process to first check whether the host we're running the update process on has the latest feature set, so we could bail out or run an appropriate fallback action if that's the case. But this is only a mitigation, as in OpenStack, we cannot constrain the operators to run the pacemaker upgrade on all the pacemaker remotes first, and ultimately on the real cluster nodes (for a variety of reasons that I'm not discussing here).

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1598197#c7

(In reply to Damien Ciabrini from comment #4)
> For the record, we need to use an offline cib because pcs offers no way [1]
> to update a specific property of a bundle (e.g. change a bind-mount). So we
> need to calculate ahead of time the potential changes in the offline file,
> and compute the diffs with the live cib ourselves.
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1598197#c7

Looks like that was fixed in 7.7, so that might be a good alternative in the meantime.

> In parallel, as a workaround for this bz, we're investigating ways of
> splitting our update process to first check whether the host we're running
> the update process on has the latest feature set, so we could bail out or
> run an appropriate fallback action if that's the case. But this is only a
> mitigation, as in OpenStack, we cannot constrain the operators to run the
> pacemaker upgrade on all the pacemaker remotes first, and ultimately on the
> real cluster nodes (for a variety of reasons that I'm not discussing here).

It's considered best practice to update the cluster nodes first anyway.
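
The four-step check described in comment #4 (dump the live CIB, edit an offline copy with `pcs -f`, compare, push only on change) could be sketched roughly as below. This is not the puppet implementation referred to in the thread. The function names, the `live-cib.xml` file name, and the choice to compare only the `<configuration>` section are assumptions made for illustration; the thread does not say how the comparison is actually done. The sketch also assumes Python 3.8 or later (for `ET.canonicalize`) and a pcs version whose `cib-push` accepts `diff-against=`.

```python
import shutil
import subprocess
import xml.etree.ElementTree as ET


def configuration_of(cib_xml: str) -> str:
    """Return the <configuration> section of a CIB as canonical XML text."""
    config = ET.fromstring(cib_xml).find("configuration")
    if config is None:
        raise ValueError("CIB has no <configuration> section")
    config.tail = None  # drop whitespace that follows the closing tag
    # Canonical form orders attributes; strip_text ignores indentation.
    return ET.canonicalize(ET.tostring(config, encoding="unicode"),
                           strip_text=True)


def cib_changed(live_xml: str, offline_xml: str) -> bool:
    """True when the two CIBs differ in their configuration section.

    Assumption: status and epoch counters are deliberately ignored.
    """
    return configuration_of(live_xml) != configuration_of(offline_xml)


def check_and_apply(pcs_commands, live_path="live-cib.xml",
                    offline_path="offline-cib.xml"):
    """Run the four steps; pcs_commands is a list of pcs argument lists
    supplied by the caller (hypothetical, not taken from the bug report)."""
    # 1. Dump the current live CIB to a file and work on a copy of it.
    subprocess.run(["pcs", "cluster", "cib", live_path], check=True)
    shutil.copyfile(live_path, offline_path)
    # 2. Apply the service configuration to the offline copy only.
    for args in pcs_commands:
        subprocess.run(["pcs", "-f", offline_path, *args], check=True)
    # 3. Compare the resulting offline CIB against the live dump.
    with open(live_path) as live, open(offline_path) as offline:
        changed = cib_changed(live.read(), offline.read())
    # 4. Push to the live cluster only if something changed.
    if changed:
        subprocess.run(["pcs", "cluster", "cib-push", offline_path,
                        f"diff-against={live_path}"], check=True)
    return changed
```

Step 2 is where this bug would surface: on a pacemaker remote that has not been upgraded yet, the `pcs -f` call may fail against a CIB dumped from an already-upgraded cluster, and `check=True` turns that failure into an exception instead of a silent no-op.
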

If I'm following correctly, the update being discussed here is a configuration update, not a software update. In that case, do we support running different software versions on different nodes outside of a rolling upgrade? Upgrading the software on the remote node before doing the configuration update would be a workaround for this issue.

(In reply to Ken Gaillot from comment #5)
> (In reply to Damien Ciabrini from comment #4)
> > For the record, we need to use an offline cib because pcs offers no way [1]
> > to update a specific property of a bundle (e.g. change a bind-mount). So we
> > need to calculate ahead of time the potential changes in the offline file,
> > and compute the diffs with the live cib ourselves.
> >
> > [1] https://bugzilla.redhat.com/show_bug.cgi?id=1598197#c7
>
> Looks like that was fixed in 7.7, so that might be a good alternative in the
> meantime

Oh thanks for the pointer, I didn't know that. We'll look into it!

> > In parallel, as a workaround for this bz, we're investigating ways of
> > splitting our update process to first check whether the host we're running
> > the update process on has the latest feature set, so we could bail out or
> > run an appropriate fallback action if that's the case. But this is only a
> > mitigation, as in OpenStack, we cannot constrain the operators to run the
> > pacemaker upgrade on all the pacemaker remotes first, and ultimately on the
> > real cluster nodes (for a variety of reasons that I'm not discussing here).
>
> It's considered best practice to update the cluster nodes first anyway.
>
> If I'm following correctly, the update being discussed here is a
> configuration update, not a software update. In that case, do we support
> running different software versions on different nodes outside of a rolling
> upgrade? Upgrading the software on the remote node before doing the
> configuration update would be a workaround for this issue.

It's a mix of both actually.

When the operator wants to update their stack, they run an update command and pass a parameter pointing to new container images (which ship with an up-to-date pacemaker_remote rpm). The update command then 1) stops pacemaker locally, 2) updates container images, 3) runs yum update, and 4) restarts the cluster. In normal circumstances, pacemaker always restarts locally with updated rpms on the host and in the images.

Now if the user didn't tell the update command to update the container image, we can end up with a pacemaker discrepancy between host and containers. This is usually not an issue because pacemaker code usually stays compatible within the same RHEL release and feature sets are quite stable, so the next update command run by the operator eventually fixes the discrepancy. But sometimes an issue like this one arises and breaks the OpenStack update process.

Another typical problem is when the operator runs the update with the proper container images, but the update runs first on all pacemaker nodes, and then on the remaining pacemaker remote nodes (e.g. for complex deployments that run the DB on dedicated pacemaker remote nodes). In such a case, the same problem can break our update process.

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

This is still desirable and intended, but no time frame is available. This bz will be reopened once developer time becomes available.

Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.

This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there. Due to differences in account names between systems, some fields were not replicated.
Be sure to add yourself to Jira issue's "Watchers" field to continue receiving updates and add others to the "Need Info From" field to continue requesting information.

To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of 2 footprints next to it, and begin with "RHEL-" followed by an integer. You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like:

"Bugzilla Bug" = 1234567

In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues. You can also visit https://access.redhat.com/articles/7032570 for general account information. |
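
For readers looking at the mitigation mentioned in comment #4 (first checking whether the local host has the latest feature set, then bailing out or falling back), a minimal sketch of such a guard might look like the following. It assumes that comparing the `crm_feature_set` attribute on the root `<cib>` element with the local tools' feature set is a sufficient signal, which the thread does not confirm. The function names are hypothetical, and how the local feature set string is obtained is deliberately left to the caller.

```python
import xml.etree.ElementTree as ET


def parse_feature_set(value: str) -> tuple:
    """Turn a dotted feature set such as "3.0.14" into (3, 0, 14)."""
    return tuple(int(part) for part in value.split("."))


def cib_feature_set(cib_xml: str) -> tuple:
    """Read crm_feature_set from the root <cib> element of a dumped CIB."""
    value = ET.fromstring(cib_xml).get("crm_feature_set")
    if value is None:
        raise ValueError("CIB has no crm_feature_set attribute")
    return parse_feature_set(value)


def can_edit_offline(cib_xml: str, local_feature_set: str) -> bool:
    """True when the local host is at least as new as the dumped CIB.

    Assumption: an older local feature set means offline edits should be
    skipped or replaced by a fallback action.
    """
    return parse_feature_set(local_feature_set) >= cib_feature_set(cib_xml)
```

Comparing the parsed tuples instead of the raw strings keeps the ordering numeric, so a component such as 10 sorts after 2.
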