Bug 2097067 - ClusterVersion history pruner does not always retain initial completed update entry
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cluster Version Operator
Version: 4.11
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.11.0
Assignee: Jack Ottofaro
QA Contact: Evgeni Vakhonin
URL:
Whiteboard:
Depends On:
Blocks: 2108292
Reported: 2022-06-14 20:25 UTC by Jack Ottofaro
Modified: 2022-08-10 11:18 UTC
CC List: 2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-10 11:17:59 UTC
Target Upstream Version:
Embargoed:

Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-version-operator pull 791 | 0 | None | open | Bug 2097067: pkg/cvo: retain initial completed update history entry | 2022-06-17 15:50:42 UTC
Red Hat Product Errata | RHSA-2022:5069 | 0 | None | None | None | 2022-08-10 11:18:10 UTC

Description Jack Ottofaro 2022-06-14 20:25:17 UTC
Description of problem:

The CVO's ClusterVersion (CV) history pruner has a maxHistory of 50. Whenever an entry is added and the history length is greater than maxHistory, the pruner should remove the first partial update that was added. If there are no partial updates, it simply removes the entry at index 0, which would be the first completed update added.

Instead, whenever the history length is greater than maxHistory, the entry at index 0 is removed regardless of whether it is a partial or a completed update.
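The intended pruning rule can be sketched in Go. This is a minimal illustration under stated assumptions, not the CVO's status.go code or the change in pull 791: `State`, `Entry` and `prune` are hypothetical names, and the history is assumed to be ordered newest-first, as in the listings in the comments below. Under that ordering the first entry added is the last element of the slice.

```go
package main

import "fmt"

// State is a hypothetical stand-in for a history entry's update state;
// it is not the openshift/api configv1 type.
type State string

const (
	Completed State = "Completed"
	Partial   State = "Partial"
)

// Entry is a hypothetical, simplified ClusterVersion history entry.
type Entry struct {
	Version string
	State   State
}

// prune enforces maxHistory on a newest-first history (index 0 is the most
// recent entry, the oldest sits at the highest index). While the history is
// too long it removes the oldest Partial entry, or the oldest entry if no
// Partial entry exists. The caller's slice is not modified.
func prune(history []Entry, maxHistory int) []Entry {
	if maxHistory < 0 {
		maxHistory = 0
	}
	out := append([]Entry(nil), history...)
	for len(out) > maxHistory {
		victim := len(out) - 1 // fallback: the oldest entry
		for i := len(out) - 1; i >= 0; i-- {
			if out[i].State == Partial {
				victim = i // oldest Partial entry
				break
			}
		}
		out = append(out[:victim], out[victim+1:]...)
	}
	return out
}

func main() {
	history := []Entry{
		{"v3", Partial}, // newest
		{"v2", Completed},
		{"v1", Partial},
		{"v0", Completed}, // initial install
	}
	fmt.Println(prune(history, 3)) // [{v3 Partial} {v2 Completed} {v0 Completed}]
}
```

With maxHistory set to 3, the sketch drops v1, the oldest Partial entry, and keeps v0, the initial completed install, at the end of the history.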

How reproducible:

The Test_pruneStatusHistory unit test case was modified to demonstrate this behavior.

Comment 2 Jack Ottofaro 2022-06-22 18:41:16 UTC
I added a new variable, MaxHistory, here [1]. Perhaps build your own release, drop MaxHistory to 3 (or some other small value), and do that many upgrades. The first completed (installed) version will end up at index MaxHistory, and it should always be retained. Whenever a version must be removed, it should be the one at index MaxHistory-1. On the other hand, if the first installed version did not complete, it would be removed.

Otherwise you're looking at a lot of upgrades, although you don't have to wait for them to complete.

[1] https://github.com/openshift/cluster-version-operator/blob/dc927a4c63e2d9fb7f469ecb77503687a60c6564/pkg/cvo/status.go#L35

Comment 4 Evgeni Vakhonin 2022-06-23 17:56:33 UTC
Reproduced on 4.11.0-0.nightly-2022-06-21-040754 upgrading to 4.11.0-0.nightly-2022-06-21-151125.

Ran the following in a loop in Python:
1) upgrade to 4.11.0-0.nightly-2022-06-21-151125
$ oc adm upgrade --allow-explicit-upgrade --allow-upgrade-with-warnings --force --to-image=.....
2) as soon as the message of the CVO's 'Progressing' condition in .status.conditions[] contained '4.11.0-0.nightly-2022-06-21-151125', commanded a rollback to 06-21-040754
3) looped until the history length reached 50

45 : 06-23T12:02:36Z 06-21-040754 - Partial 
46 : 06-23T10:47:20Z 06-21-151125 - Completed 
47 : 06-23T10:43:54Z 06-21-040754 - Partial 
48 : 06-23T10:22:19Z 06-21-151125 - Partial 
49 : 06-23T09:13:15Z 06-21-040754 - Completed <-- this is the first history entry, the "installed version"

4) then did one final rollback to 06-21-040754.

46 : 06-23T12:02:36Z 06-21-040754 - Partial 
47 : 06-23T10:47:20Z 06-21-151125 - Completed 
48 : 06-23T10:43:54Z 06-21-040754 - Partial 
49 : 06-23T10:22:19Z 06-21-151125 - Partial 
                                            <-- the first history entry has been removed

Result: the "Completed" version from T09:13:15 was removed instead of the next "Partial" entry, from T10:22:19.
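The result can be replayed on the five entries listed above (indices 45-49). In this sketch, which scales maxHistory down from 50 to 5, `pruneTruncate` models the observed outcome (keep the newest entries and drop the rest regardless of state) and `pruneOldestPartial` models the intended rule from the description; neither is claimed to be the CVO's actual code. The newest entry created by the final rollback is not shown in the listing, so it is a hypothetical placeholder assumed to be Partial.

```go
package main

import "fmt"

// Entry is a hypothetical, simplified history entry.
type Entry struct {
	Started, Version, State string
}

// pruneTruncate models the outcome observed above (not necessarily the CVO's
// old code): keep the newest maxHistory entries (assumed >= 0), drop the rest.
func pruneTruncate(h []Entry, maxHistory int) []Entry {
	if len(h) <= maxHistory {
		return h
	}
	return h[:maxHistory]
}

// pruneOldestPartial models the intended rule: drop the oldest Partial entry,
// or the oldest entry if none is Partial, until the history fits.
func pruneOldestPartial(h []Entry, maxHistory int) []Entry {
	out := append([]Entry(nil), h...)
	for len(out) > maxHistory {
		victim := len(out) - 1
		for i := len(out) - 1; i >= 0; i-- {
			if out[i].State == "Partial" {
				victim = i
				break
			}
		}
		out = append(out[:victim], out[victim+1:]...)
	}
	return out
}

// dropped returns the entries of before that are missing from after.
func dropped(before, after []Entry) []Entry {
	kept := map[Entry]bool{}
	for _, e := range after {
		kept[e] = true
	}
	var gone []Entry
	for _, e := range before {
		if !kept[e] {
			gone = append(gone, e)
		}
	}
	return gone
}

// comment4History returns the listed entries, newest first, preceded by a
// hypothetical entry for the final rollback.
func comment4History() []Entry {
	return []Entry{
		{"(final rollback, not shown)", "06-21-040754", "Partial"}, // hypothetical
		{"06-23T12:02:36Z", "06-21-040754", "Partial"},
		{"06-23T10:47:20Z", "06-21-151125", "Completed"},
		{"06-23T10:43:54Z", "06-21-040754", "Partial"},
		{"06-23T10:22:19Z", "06-21-151125", "Partial"},
		{"06-23T09:13:15Z", "06-21-040754", "Completed"}, // initial install
	}
}

func main() {
	const maxHistory = 5 // scaled down from 50: only the listed tail is modeled
	h := comment4History()
	fmt.Println("truncate drops:", dropped(h, pruneTruncate(h, maxHistory)))
	fmt.Println("intended drops:", dropped(h, pruneOldestPartial(h, maxHistory)))
}
```

Running it prints `truncate drops: [{06-23T09:13:15Z 06-21-040754 Completed}]` followed by `intended drops: [{06-23T10:22:19Z 06-21-151125 Partial}]`, the same pair of entries named in the result line above.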

Comment 5 Jack Ottofaro 2022-06-23 18:45:47 UTC
(In reply to Evgeni Vakhonin from comment #4)

Can you attach the CVO log after running the test? Also, send me your Python script and I'll give it a run. Thanks.

Comment 6 Evgeni Vakhonin 2022-06-23 18:49:09 UTC
    Verifying on 4.11.0-0.nightly-2022-06-22-235234 upgrading to 4.11.0-0.nightly-2022-06-23-044003, using the same method as comment 4.

    Looped until the history length reached 50:

    45 : 2022-06-23T18:12:47Z 06-22-235234 - Partial 
    46 : 2022-06-23T18:12:28Z 06-23-044003 - Partial 
    47 : 2022-06-23T18:12:11Z 06-22-235234 - Partial 
    48 : 2022-06-23T18:11:50Z 06-23-044003 - Partial    <-- this should be removed
    49 : 2022-06-23T17:40:13Z 06-22-235234 - Completed  <-- this should not

    And after one more iteration:

    46 : 2022-06-23T18:12:47Z 06-22-235234 - Partial 
    47 : 2022-06-23T18:12:28Z 06-23-044003 - Partial 
    48 : 2022-06-23T18:12:11Z 06-22-235234 - Partial 
                                                              <-- success!!
    49 : 2022-06-23T17:40:13Z 06-22-235234 - Completed 


    Verified successfully!
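The property verified here can be written as a reusable check. The sketch is hypothetical (`Entry` and `verifyPrune` are illustrative names, not part of the CVO or its test suite) and assumes newest-first ordering: one prune step must remove exactly the oldest Partial entry, which implies that the Completed entry from 17:40:13Z stays last. The newest entry is not shown in the listing, so it is a placeholder, and the truncated result is constructed for contrast rather than observed on this build.

```go
package main

import "fmt"

// Entry is a hypothetical, simplified history entry.
type Entry struct {
	Started, Version, State string
}

// verifyPrune checks one prune step on a newest-first history: the result
// must fit in maxHistory and equal before with exactly one entry removed,
// namely the oldest Partial entry (or the oldest entry if none is Partial).
func verifyPrune(before, after []Entry, maxHistory int) error {
	if len(after) > maxHistory {
		return fmt.Errorf("history has %d entries, want at most %d", len(after), maxHistory)
	}
	if len(before) != len(after)+1 {
		return fmt.Errorf("expected exactly one entry removed, got %d", len(before)-len(after))
	}
	victim := len(before) - 1
	for i := len(before) - 1; i >= 0; i-- {
		if before[i].State == "Partial" {
			victim = i
			break
		}
	}
	j := 0
	for i, e := range before {
		if i == victim {
			continue
		}
		if after[j] != e {
			return fmt.Errorf("position %d: got %v, want %v (expected %v to be removed)",
				j, after[j], e, before[victim])
		}
		j++
	}
	return nil
}

// comment6Before returns the listed entries (indices 45-49), newest first,
// preceded by a hypothetical newest entry that the listing does not show.
func comment6Before() []Entry {
	return []Entry{
		{"(newest, not shown)", "(not shown)", "Partial"}, // hypothetical
		{"2022-06-23T18:12:47Z", "06-22-235234", "Partial"},
		{"2022-06-23T18:12:28Z", "06-23-044003", "Partial"},
		{"2022-06-23T18:12:11Z", "06-22-235234", "Partial"},
		{"2022-06-23T18:11:50Z", "06-23-044003", "Partial"},
		{"2022-06-23T17:40:13Z", "06-22-235234", "Completed"},
	}
}

func main() {
	const maxHistory = 5 // scaled down: only the listed tail is modeled
	before := comment6Before()
	// Observed above: the 18:11:50Z Partial entry is the one that disappeared.
	observed := []Entry{before[0], before[1], before[2], before[3], before[5]}
	fmt.Println(verifyPrune(before, observed, maxHistory)) // <nil>
	// Constructed contrast: plain truncation would drop the initial entry.
	fmt.Println(verifyPrune(before, before[:maxHistory], maxHistory) != nil) // true
}
```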

Comment 12 errata-xmlrpc 2022-08-10 11:17:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069