Description of problem:
CVO's ClusterVersion (CV) history pruner has a maxHistory of 50. Whenever an entry is added and the history length exceeds maxHistory, the pruner should remove the first partial update added. If there are no partial updates, it should simply remove the entry at index 0, which would be the first completed update added. Instead, when the history length exceeds maxHistory, the entry at index 0 is removed regardless of whether it is a partial or a completed update.

How reproducible:
The Test_pruneStatusHistory unit test case was modified to show this behavior.
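For illustration, the expected pruning rule described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the CVO implementation: the `HistoryEntry` and `UpdateState` types and the `pruneHistory` function are hypothetical names, and the slice is assumed to be ordered newest first (as in the history listings later in this report), so the entry that was added first is the last element.

```go
package main

import "fmt"

// UpdateState and HistoryEntry are hypothetical stand-ins for illustration;
// they are not the types used in the CVO source.
type UpdateState string

const (
	Completed UpdateState = "Completed"
	Partial   UpdateState = "Partial"
)

type HistoryEntry struct {
	Version string
	State   UpdateState
}

// pruneHistory returns history trimmed to at most maxHistory entries.
// Assumption: history is ordered newest first, so the entry that was
// added first is the last element of the slice.
func pruneHistory(history []HistoryEntry, maxHistory int) []HistoryEntry {
	if maxHistory < 0 {
		maxHistory = 0
	}
	for len(history) > maxHistory {
		// Default: drop the oldest entry.
		drop := len(history) - 1
		// Prefer the oldest partial entry, scanning from oldest to newest.
		for i := len(history) - 1; i >= 0; i-- {
			if history[i].State == Partial {
				drop = i
				break
			}
		}
		// Build a new slice so the caller's backing array is not modified.
		pruned := make([]HistoryEntry, 0, len(history)-1)
		pruned = append(pruned, history[:drop]...)
		pruned = append(pruned, history[drop+1:]...)
		history = pruned
	}
	return history
}

func main() {
	history := []HistoryEntry{
		{"D", Partial}, // newest
		{"C", Completed},
		{"B", Partial},
		{"A", Completed}, // oldest: the initially installed version
	}
	for i, e := range pruneHistory(history, 3) {
		fmt.Printf("%d : %s - %s\n", i, e.Version, e.State)
	}
}
```

With a limit of 3, the sketch drops the partial entry `B` and keeps the completed entry `A`. It does not special-case the newest entry, which a real pruner may need to protect while an update is still in progress.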
I added a new var, MaxHistory, here [1]. Perhaps build your own release, drop MaxHistory to 3 or some other reasonable value, and do that many upgrades. The first completed installed version will end up at index MaxHistory, and that entry should always be retained. Whenever a version must be removed, it should be the one at index MaxHistory-1. On the other hand, if the first installed version did not complete, it would be removed.

Otherwise you're looking at a lot of upgrades, although you don't have to wait for them to complete.

[1] https://github.com/openshift/cluster-version-operator/blob/dc927a4c63e2d9fb7f469ecb77503687a60c6564/pkg/cvo/status.go#L35
Reproduced on 4.11.0-0.nightly-2022-06-21-040754 to 4.11.0-0.nightly-2022-06-21-151125.

Ran in a loop in Python:

1) Upgrade to 4.11.0-0.nightly-2022-06-21-151125:
$ oc adm upgrade --allow-explicit-upgrade --allow-upgrade-with-warnings --force --to-image=.....
2) The moment the CVO .status.conditions[] 'progressing' message contained '4.11.0-0.nightly-2022-06-21-151125', commanded a rollback to 06-21-040754.
3) Looped until the history length is 50:

45 : 06-23T12:02:36Z 06-21-040754 - Partial
46 : 06-23T10:47:20Z 06-21-151125 - Completed
47 : 06-23T10:43:54Z 06-21-040754 - Partial
48 : 06-23T10:22:19Z 06-21-151125 - Partial
49 : 06-23T09:13:15Z 06-21-040754 - Completed  <-- this is the first history entry, the initial installed version

4) Did one final rollback to 06-21-040754:

46 : 06-23T12:02:36Z 06-21-040754 - Partial
47 : 06-23T10:47:20Z 06-21-151125 - Completed
48 : 06-23T10:43:54Z 06-21-040754 - Partial
49 : 06-23T10:22:19Z 06-21-151125 - Partial
                                               <-- this is the first history entry, removed

Result: the "Completed" version from T09:13:15 was removed, instead of the next "Partial" from T10:22:19.
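A listing in the format used above could be produced from the ClusterVersion JSON (for example the output of `oc get clusterversion version -o json`) along the following lines. This is only a sketch, not the script used for this reproduction: the `formatHistory` helper is a hypothetical name, the timestamp column is assumed to be `startedTime`, and the sample input is made up for illustration rather than captured from a cluster.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// clusterVersion mirrors only the ClusterVersion fields this sketch reads.
type clusterVersion struct {
	Status struct {
		History []struct {
			State       string `json:"state"`
			StartedTime string `json:"startedTime"`
			Version     string `json:"version"`
		} `json:"history"`
	} `json:"status"`
}

// formatHistory renders one "index : startedTime version - state" line
// per history entry, in the order the entries appear in the JSON.
func formatHistory(raw []byte) ([]string, error) {
	var cv clusterVersion
	if err := json.Unmarshal(raw, &cv); err != nil {
		return nil, err
	}
	lines := make([]string, 0, len(cv.Status.History))
	for i, h := range cv.Status.History {
		lines = append(lines, fmt.Sprintf("%d : %s %s - %s", i, h.StartedTime, h.Version, h.State))
	}
	return lines, nil
}

func main() {
	// Illustrative input only, not captured from a real cluster.
	sample := []byte(`{"status":{"history":[
		{"state":"Partial","startedTime":"2022-01-02T00:00:00Z","version":"example-b"},
		{"state":"Completed","startedTime":"2022-01-01T00:00:00Z","version":"example-a"}]}}`)
	lines, err := formatHistory(sample)
	if err != nil {
		panic(err)
	}
	for _, l := range lines {
		fmt.Println(l)
	}
}
```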
(In reply to Evgeni Vakhonin from comment #4)
> reproduced on 4.11.0-0.nightly-2022-06-21-040754 to
> 4.11.0-0.nightly-2022-06-21-151125
>
> ran in a loop in python:
> 1) upgrade to 4.11.0-0.nightly-2022-06-21-151125
> $ oc adm upgrade --allow-explicit-upgrade --allow-upgrade-with-warnings
> --force --to-image=.....
> 2) the moment cvo .status.conditions[] 'progressing' had in message
> '4.11.0-0.nightly-2022-06-21-151125', commanded rollback to 06-21-040754
> 3) looped until history is 50
>
> 45 : 06-23T12:02:36Z 06-21-040754 - Partial
> 46 : 06-23T10:47:20Z 06-21-151125 - Completed
> 47 : 06-23T10:43:54Z 06-21-040754 - Partial
> 48 : 06-23T10:22:19Z 06-21-151125 - Partial
> 49 : 06-23T09:13:15Z 06-21-040754 - Completed < -- this is the first history
> entry, the in "installed version"
>
> 4) and did one final rollback to 06-21-040754.
>
> 46 : 06-23T12:02:36Z 06-21-040754 - Partial
> 47 : 06-23T10:47:20Z 06-21-151125 - Completed
> 48 : 06-23T10:43:54Z 06-21-040754 - Partial
> 49 : 06-23T10:22:19Z 06-21-151125 - Partial
> < -- this is the first history
> entry, removed
>
> result: "Completed" version from T09:13:15 removed, instead of next
> "Partial" from T10:22:19

Can you attach the CVO log after running the test? Also, send me your Python script and I'll give it a run. Thanks.
Verifying on 4.11.0-0.nightly-2022-06-22-235234 to 4.11.0-0.nightly-2022-06-23-044003, using the same method as in comment #4.

Looped until the history length is 50:

45 : 2022-06-23T18:12:47Z 06-22-235234 - Partial
46 : 2022-06-23T18:12:28Z 06-23-044003 - Partial
47 : 2022-06-23T18:12:11Z 06-22-235234 - Partial
48 : 2022-06-23T18:11:50Z 06-23-044003 - Partial    <-- this should be removed
49 : 2022-06-23T17:40:13Z 06-22-235234 - Completed  <-- this should not

And after one more rollback:

46 : 2022-06-23T18:12:47Z 06-22-235234 - Partial
47 : 2022-06-23T18:12:28Z 06-23-044003 - Partial
48 : 2022-06-23T18:12:11Z 06-22-235234 - Partial    <-- success!!
49 : 2022-06-23T17:40:13Z 06-22-235234 - Completed

Verified successfully!
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069