Bug 1825006 - [4.3 upgrade][clusterversion] scary: Unable to apply ...: the cluster operator ... has not yet successfully rolled out
Keywords:
Status: CLOSED DUPLICATE of bug 1884334
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cluster Version Operator
Version: 4.3.0
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.7.0
Assignee: Over the Air Updates
QA Contact: liujia
URL:
Whiteboard:
Duplicates: 1825008
Depends On:
Blocks:
Reported: 2020-04-16 20:04 UTC by Hongkai Liu
Modified: 2022-05-06 12:29 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-14 21:36:17 UTC
Target Upstream Version:
Embargoed:



Description Hongkai Liu 2020-04-16 20:04:21 UTC
During an upgrade of a cluster in the CI build farm, we saw a sequence of alerts and failure messages from clusterversion.

oc --context build01 adm upgrade --allow-explicit-upgrade --to-image registry.svc.ci.openshift.org/ocp/release:4.3.0-0.nightly-2020-04-13-190424 --force=true

Eventually the upgrade completed successfully (which is so nice).
But those alerts and messages are too frightening.

I would like to file a bug for each of them so that the next upgrade feels less alarming.

https://coreos.slack.com/archives/CHY2E1BL4/p1587058244434700

Every 10.0s: oc --context build01 get clusterversions.config.openshift.io       Hongkais-MacBook-Pro: Thu Apr 16 13:30:20 2020
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.3.0-0.nightly-2020-03-23-130439   True        True          27m     Unable to apply 4.3.0-0.nightly-2020-04-13-190424: the cluster operator network has not yet successfully rolled out

Comment 2 W. Trevor King 2020-04-16 21:59:31 UTC
It's not the networking component's fault that they were unable to complete their whole update and bump their ClusterOperator status within ~5m of the CVO telling them to start (by bumping their operator's Deployment).  If there's a comfort-level fix for this issue, it will be on the CVO side softening the "Unable to apply" to suggest "and that may be fine, or maybe not, depending on how long it goes on for".
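
A rough sketch of the softened wording suggested above, in Python for brevity. This is not the CVO's actual code: the function name, the 5-minute default grace period (taken from the "~5m" mentioned above), and the exact message texts are all assumptions made for illustration.

```python
from datetime import timedelta

# Hypothetical grace period, based on the "~5m" mentioned in the comment.
GRACE = timedelta(minutes=5)


def rollout_message(version: str, operator: str, waiting: timedelta,
                    grace: timedelta = GRACE) -> str:
    """Return a status message whose tone depends on how long we have waited."""
    base = f"the cluster operator {operator} has not yet successfully rolled out"
    if waiting <= grace:
        # Still within the expected window: report progress, not failure.
        return f"Working towards {version}: {base}"
    # Past the window: flag it, but say that it may still resolve on its own.
    minutes = int(waiting.total_seconds() // 60)
    return (f"Unable to apply {version}: {base} after {minutes}m "
            "(this may still complete; investigate if it persists)")
```

With this shape, the 27m wait from the description would still be reported as "Unable to apply", but with the elapsed time and a hint that the rollout may yet finish.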

Comment 3 W. Trevor King 2020-04-16 21:59:48 UTC
*** Bug 1825008 has been marked as a duplicate of this bug. ***

Comment 4 Lalatendu Mohanty 2020-05-19 11:30:56 UTC
Setting the severity to low as this does not affect the cluster functionality.

Comment 5 Lalatendu Mohanty 2020-06-18 13:08:07 UTC
We do not have time to fix the bug in this sprint as we are working on higher priority bugs and features.  Hence we are adding UpcomingSprint now, and we'll revisit this in the next sprint.

Comment 6 Lalatendu Mohanty 2020-07-09 14:38:47 UTC
We do not have time to fix the bug in this sprint as we are working on higher priority bugs and features.  Hence we are adding UpcomingSprint now, and we'll revisit this in the next sprint.

Comment 8 Lalatendu Mohanty 2020-08-20 18:43:15 UTC
Moving this to 4.7 as this is not a blocker for 4.6.

Comment 9 W. Trevor King 2020-09-12 21:03:56 UTC
I still think we can wordsmith this per comment 2, but I agree that this is cosmetic per comment 4.  Hopefully we'll have time to adjust the wording next sprint.

Comment 10 W. Trevor King 2020-10-02 23:24:49 UTC
Comment 9 is still current.

Comment 11 W. Trevor King 2020-10-14 21:36:17 UTC
(In reply to W. Trevor King from comment #2)
> If there's a comfort-level fix for this issue, it will be on the CVO side
> softening the "Unable to apply" to suggest "and that may be fine, or maybe
> not, depending on how long it goes on for".

Ah, I'm going to close this as a dup of bug 1884334.  We don't need to soften the wording if we stop setting Failing=True just because an operator is slow.  And that bug is part of an ongoing effort to ease ourselves into allowing operators to take as long as they want, and only going Failing=True on them if they set something concerning like Available=False.
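
To make the intended behavior concrete, here is a minimal sketch of that decision in Python. It is illustrative only and not the CVO implementation: the function name and the dict-based representation of ClusterOperator conditions are assumptions, and Available=False is used as the sole trigger only because it is the example given above.

```python
from typing import Mapping


def should_set_failing(conditions: Mapping[str, str],
                       minutes_waiting: float) -> bool:
    """Decide whether a slow operator should flip the update to Failing=True.

    `conditions` maps a condition type (e.g. "Available") to its status
    string ("True", "False" or "Unknown"). `minutes_waiting` is accepted
    only to show that elapsed time is deliberately ignored in this sketch.
    """
    # Slowness alone is not a failure; only a concerning condition is.
    return conditions.get("Available") == "False"
```

Under this rule, an operator that has been rolling out for 27 minutes with Available=True would not cause Failing=True, while one reporting Available=False would, regardless of how long it has been running.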

*** This bug has been marked as a duplicate of bug 1884334 ***

