Bug 2097067
| Summary: | ClusterVersion history pruner does not always retain initial completed update entry | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Jack Ottofaro <jack.ottofaro> |
| Component: | Cluster Version Operator | Assignee: | Jack Ottofaro <jack.ottofaro> |
| Status: | CLOSED ERRATA | QA Contact: | Evgeni Vakhonin <evakhoni> |
| Severity: | medium | Priority: | medium |
| Version: | 4.11 | CC: | aos-team-ota, wking |
| Target Release: | 4.11.0 | Type: | Bug |
| Hardware: | Unspecified | OS: | Unspecified |
| Last Closed: | 2022-08-10 11:17:59 UTC | Bug Blocks: | 2108292 |
Description
Jack Ottofaro
2022-06-14 20:25:17 UTC
I added a new variable, MaxHistory, here [1]. Perhaps build your own release with MaxHistory dropped to 3, or some other reasonable value, and do that many upgrades. The first completed installed version will end up at index MaxHistory, and it should always be retained. Whenever a version must be removed, it should be the one at index MaxHistory-1. On the other hand, if the first installed version never completed, it would be removed. Otherwise you're looking at a lot of upgrades, although you don't have to wait for them to complete.

[1] https://github.com/openshift/cluster-version-operator/blob/dc927a4c63e2d9fb7f469ecb77503687a60c6564/pkg/cvo/status.go#L35

Reproduced on 4.11.0-0.nightly-2022-06-21-040754 to 4.11.0-0.nightly-2022-06-21-151125.

Ran in a loop in Python:
1) Upgrade to 4.11.0-0.nightly-2022-06-21-151125:
   $ oc adm upgrade --allow-explicit-upgrade --allow-upgrade-with-warnings --force --to-image=.....
2) The moment the message of the 'progressing' condition in the CVO's .status.conditions[] contained '4.11.0-0.nightly-2022-06-21-151125', commanded a rollback to 06-21-040754.
3) Looped until the history had 50 entries:

45 : 06-23T12:02:36Z 06-21-040754 - Partial
46 : 06-23T10:47:20Z 06-21-151125 - Completed
47 : 06-23T10:43:54Z 06-21-040754 - Partial
48 : 06-23T10:22:19Z 06-21-151125 - Partial
49 : 06-23T09:13:15Z 06-21-040754 - Completed <-- this is the first history entry, the initial "installed version"

4) Did one final rollback to 06-21-040754.
46 : 06-23T12:02:36Z 06-21-040754 - Partial
47 : 06-23T10:47:20Z 06-21-151125 - Completed
48 : 06-23T10:43:54Z 06-21-040754 - Partial
49 : 06-23T10:22:19Z 06-21-151125 - Partial
<-- the first history entry was here; it has been removed

Result: the "Completed" version from T09:13:15 was removed, instead of the next "Partial" from T10:22:19.

(In reply to Evgeni Vakhonin from comment #4)
Can you attach the CVO log after running the test? Also, send me your Python script and I'll give it a run. Thanks.
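The retention rule described in this report can be modeled in a few lines of Python. This is only a simplified sketch of the expected behavior, not the CVO's Go implementation in pkg/cvo/status.go. The `Entry` type, the `prune` function, and the small `max_history=5` are hypothetical, and the newest entry in the sample data is made up for illustration. History is ordered newest first, as in the listings above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Entry:
    started: str  # start timestamp as shown in the listings
    version: str  # abbreviated nightly tag
    state: str    # "Completed" or "Partial"


def prune(history: List[Entry], max_history: int) -> List[Entry]:
    """Trim a newest-first history to max_history entries.

    Simplified rule from this report: if the oldest entry is the initial
    Completed install, keep it and remove its neighbor instead.
    """
    if max_history < 2:
        raise ValueError("max_history must be at least 2 in this sketch")
    pruned = list(history)
    while len(pruned) > max_history:
        if pruned[-1].state == "Completed":
            del pruned[-2]  # drop the entry just before the initial install
        else:
            del pruned[-1]  # initial install never completed: drop it
    return pruned


# Tail of the history from the reproduction, plus a hypothetical new entry.
history = [
    Entry("06-23T12:30:00Z", "06-21-040754", "Partial"),  # hypothetical newest
    Entry("06-23T12:02:36Z", "06-21-040754", "Partial"),
    Entry("06-23T10:47:20Z", "06-21-151125", "Completed"),
    Entry("06-23T10:43:54Z", "06-21-040754", "Partial"),
    Entry("06-23T10:22:19Z", "06-21-151125", "Partial"),
    Entry("06-23T09:13:15Z", "06-21-040754", "Completed"),  # initial install
]

result = prune(history, max_history=5)
for index, entry in enumerate(result):
    print(f"{index} : {entry.started} {entry.version} - {entry.state}")
```

Under this model the Partial entry from T10:22:19 is the one removed, and the Completed entry from T09:13:15 stays last, which is the outcome the reproduction expected but did not get.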
Verifying on 4.11.0-0.nightly-2022-06-22-235234 to 4.11.0-0.nightly-2022-06-23-044003, using the same method as in comment #4.

Looped until the history had 50 entries:

45 : 2022-06-23T18:12:47Z 06-22-235234 - Partial
46 : 2022-06-23T18:12:28Z 06-23-044003 - Partial
47 : 2022-06-23T18:12:11Z 06-22-235234 - Partial
48 : 2022-06-23T18:11:50Z 06-23-044003 - Partial <-- this should be removed
49 : 2022-06-23T17:40:13Z 06-22-235234 - Completed <-- this should not

And another one:

46 : 2022-06-23T18:12:47Z 06-22-235234 - Partial
47 : 2022-06-23T18:12:28Z 06-23-044003 - Partial
48 : 2022-06-23T18:12:11Z 06-22-235234 - Partial <-- success!!
49 : 2022-06-23T17:40:13Z 06-22-235234 - Completed

Verified successfully!

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069
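For anyone repeating the verification, listings like the ones in this report can be produced from the ClusterVersion object. The following is a minimal sketch, assuming `oc` is logged in to the cluster and that each `.status.history[]` entry carries `startedTime`, `version`, and `state`. The function names and the two-entry sample document are hypothetical, and this is not the script used in the report.

```python
import json
import subprocess
from typing import List


def fetch_cluster_version() -> dict:
    """Return the ClusterVersion object as a dict (needs a logged-in oc)."""
    out = subprocess.run(
        ["oc", "get", "clusterversion", "version", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def format_history(cluster_version: dict, tail: int = 5) -> List[str]:
    """Format the last `tail` history entries as 'index : started version - state'."""
    history = cluster_version.get("status", {}).get("history", [])
    start = max(len(history) - tail, 0)
    return [
        f"{i} : {e.get('startedTime')} {e.get('version')} - {e.get('state')}"
        for i, e in enumerate(history[start:], start=start)
    ]


def initial_install_retained(cluster_version: dict, initial_started: str) -> bool:
    """True if the oldest history entry still has the recorded start time."""
    history = cluster_version.get("status", {}).get("history", [])
    return bool(history) and history[-1].get("startedTime") == initial_started


# Hypothetical two-entry document, shaped like `oc get ... -o json` output.
sample = {"status": {"history": [
    {"startedTime": "2022-06-23T18:12:11Z",
     "version": "4.11.0-0.nightly-2022-06-22-235234", "state": "Partial"},
    {"startedTime": "2022-06-23T17:40:13Z",
     "version": "4.11.0-0.nightly-2022-06-22-235234", "state": "Completed"},
]}}

lines = format_history(sample, tail=2)
print("\n".join(lines))
```

Recording the start time of the initial install before the loop, then calling `initial_install_retained(fetch_cluster_version(), ...)` after each rollback, would flag the regression as soon as the entry disappears.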