Bug 2079803
Summary: | Upgrade-triggered etcd backup will be skipped during serial upgrade | |||
---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | liujia <jiajliu> | |
Component: | Etcd | Assignee: | melbeher | |
Status: | CLOSED ERRATA | QA Contact: | ge liu <geliu> | |
Severity: | medium | Docs Contact: | ||
Priority: | medium | |||
Version: | 4.10 | CC: | alray, aos-bugs, geliu, melbeher, wking, yanyang | |
Target Milestone: | --- | |||
Target Release: | 4.11.0 | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | No Doc Update | ||
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 2091604 2097431 (view as bug list) | Environment: | ||
Last Closed: | 2022-08-10 11:09:16 UTC | Type: | Bug | |
Bug Blocks: | 2091604, 2097431, 2105148 |
Description
liujia
2022-04-28 09:47:18 UTC
Reading through [1], I'm not actually noticing anything that moves RecentBackup from True to False after a previous, successful backup is no longer considered "recent". I dunno what the freshness threshold would be (minutes? hours?). But I think it's up to etcd to make that call, and set RecentBackup=False when they think the backup isn't fresh enough (while still keeping the condition around to point at the now-stale backup? Or removing the condition?). If that ends up allowing:

1. Cluster is on 4.9
2. Requested update to 4.10
3. etcd takes a backup
4. Update to 4.10
5. Requested update to 4.11
6. Update to 4.11
7. etcd decides the 4.9 backup is stale, and sets RecentBackup=False

that's fine with me. In the event of a disaster, the user can restore to step 3, and they can repeat as many of the subsequent steps as they like. The value of a second snapshot at step 5 doesn't seem all that high, since it would just pick up an hour or so of the step 4 activity.

[1]: https://github.com/openshift/cluster-etcd-operator/blob/95049e93f7acb4bd9ca7c684702390671e7a1371/pkg/operator/upgradebackupcontroller/upgradebackupcontroller.go

May I ask how you upgraded to 4.11? Did you use the `--force` flag? If you used `--force`, no backup will be taken. @jiajliu

(In reply to melbeher from comment #2)
> May I ask how you upgraded to 4.11 ? .. Have you used `--force` flag ?
>
> If you used --force, no backup will be taken
>
> @jiajliu

No `--force`.

I raised a fix here: https://github.com/openshift/cluster-etcd-operator/pull/835. Please test extensively. @geliu

Hello melbeher, this bug is fixed in 4.11, so I suppose it has not been fixed in 4.10, correct? Will it be backported to 4.10? The status condition on the CVO is listed in Comment 6. @melbeher

The backup status should be in the CEO conditions, not the CVO. You can see this in Comment #1.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

Errata shipped; presumably all the NEEDINFO were addressed :)
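The staleness idea raised in the description (flipping RecentBackup from True to False once a backup is no longer fresh) can be illustrated with a small sketch. This is not the change made in the linked pull request and not the operator's actual code: the `Condition` class, the `refresh_recent_backup` helper, the `BackupStale` and `BackupTaken` reasons, and the 24-hour threshold are all assumptions for illustration, since the thread leaves the real threshold undecided.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical freshness threshold; the thread does not settle on a value.
FRESHNESS_THRESHOLD = timedelta(hours=24)


@dataclass(frozen=True)
class Condition:
    """Simplified stand-in for an operator status condition (hypothetical)."""
    type: str
    status: str  # "True", "False", or "Unknown"
    reason: str
    last_transition_time: datetime


def refresh_recent_backup(cond: Condition, now: datetime,
                          threshold: timedelta = FRESHNESS_THRESHOLD) -> Condition:
    """Return the condition unchanged while the backup is fresh.

    Once a True RecentBackup condition is older than the threshold,
    return a new condition with status "False" so that the next
    requested update triggers a new backup.
    """
    # Only a True RecentBackup condition can go stale.
    if cond.type != "RecentBackup" or cond.status != "True":
        return cond
    # Still within the freshness window: leave it alone.
    if now - cond.last_transition_time <= threshold:
        return cond
    # Too old: record the transition to False at the current time.
    return Condition(
        type="RecentBackup",
        status="False",
        reason="BackupStale",  # hypothetical reason string
        last_transition_time=now,
    )


if __name__ == "__main__":
    taken = datetime(2022, 4, 28, 9, 0, tzinfo=timezone.utc)
    cond = Condition("RecentBackup", "True", "BackupTaken", taken)
    print(refresh_recent_backup(cond, taken + timedelta(hours=1)).status)
    print(refresh_recent_backup(cond, taken + timedelta(hours=25)).status)
```

Under these assumptions, the serial-upgrade sequence in the description would still reuse the first backup as long as the second update is requested inside the freshness window, which matches the trade-off the description says is acceptable.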