Bug 1825006

Summary: [4.3 upgrade][clusterversion] scary: Unable to apply ...: the cluster operator ... has not yet successfully rolled out
Product: OpenShift Container Platform Reporter: Hongkai Liu <hongkliu>
Component: Cluster Version Operator Assignee: Over the Air Updates <aos-team-ota>
Status: CLOSED DUPLICATE QA Contact: liujia <jiajliu>
Severity: low Docs Contact:
Priority: low    
Version: 4.3.0 CC: aos-bugs, ccoleman, jokerman, sdodson, wking, yanyang
Target Milestone: --- Keywords: Upgrades
Target Release: 4.7.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-10-14 21:36:17 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Hongkai Liu 2020-04-16 20:04:21 UTC
During an upgrade of a cluster in the CI build farm, we saw a sequence of alerts and failure messages from clusterversion.

oc --context build01 adm upgrade --allow-explicit-upgrade --to-image registry.svc.ci.openshift.org/ocp/release:4.3.0-0.nightly-2020-04-13-190424 --force=true

Eventually the upgrade completed successfully (which is so nice).
But those alerts and messages are too frightening.

I would like to create a bug for each of those and feel better for the next upgrade.

https://coreos.slack.com/archives/CHY2E1BL4/p1587058244434700

Every 10.0s: oc --context build01 get clusterversions.config.openshift.io       Hongkais-MacBook-Pro: Thu Apr 16 13:30:20 2020
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.3.0-0.nightly-2020-03-23-130439   True        True          27m     Unable to apply 4.3.0-0.nightly-2020-04-13-190424: the cluster operator network has not yet successfully rolled out

Comment 2 W. Trevor King 2020-04-16 21:59:31 UTC
It's not the networking component's fault that they were unable to complete their whole update and bump their ClusterOperator status within ~5m of the CVO telling them to start (by bumping their operator's Deployment).  If there's a comfort-level fix for this issue, it will be on the CVO side softening the "Unable to apply" to suggest "and that may be fine, or maybe not, depending on how long it goes on for".
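
To make the suggested softening concrete, here is a minimal Python sketch of a status message whose tone depends on how long the operator has been rolling out. This is not the CVO's code (the CVO is written in Go). The function name, the message wording, and the 40-minute "concern" threshold are all assumptions made for illustration; only the ~5m window comes from the comment above.

```python
from datetime import timedelta

# Hypothetical thresholds, not taken from the CVO source.
GRACE = timedelta(minutes=5)     # roughly the ~5m window mentioned in this comment
CONCERN = timedelta(minutes=40)  # arbitrary point at which slowness starts to look worrying


def rollout_message(version: str, operator: str, elapsed: timedelta) -> str:
    """Return a status string whose tone depends on how long the rollout has run."""
    minutes = int(elapsed.total_seconds() // 60)
    base = f"the cluster operator {operator} has not yet successfully rolled out"
    if elapsed < GRACE:
        # Too early to say anything alarming.
        return f"Working towards {version}: {base}"
    if elapsed < CONCERN:
        # Slow, but possibly fine: say so instead of "Unable to apply".
        return f"Working towards {version}: {base} (waiting {minutes}m, which may be fine)"
    # Slow for long enough that the stronger wording is justified.
    return f"Unable to apply {version}: {base} after {minutes}m"
```

With these assumed thresholds, the 27m case from the description would get the softer middle message rather than "Unable to apply".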

Comment 3 W. Trevor King 2020-04-16 21:59:48 UTC
*** Bug 1825008 has been marked as a duplicate of this bug. ***

Comment 4 Lalatendu Mohanty 2020-05-19 11:30:56 UTC
Setting the severity to low as this does not affect the cluster functionality.

Comment 5 Lalatendu Mohanty 2020-06-18 13:08:07 UTC
We do not have time to fix the bug in this sprint as we are working on higher priority bugs and features.  Hence we are adding UpcomingSprint now, and we'll revisit this in the next sprint.

Comment 6 Lalatendu Mohanty 2020-07-09 14:38:47 UTC
We do not have time to fix the bug in this sprint as we are working on higher priority bugs and features.  Hence we are adding UpcomingSprint now, and we'll revisit this in the next sprint.

Comment 8 Lalatendu Mohanty 2020-08-20 18:43:15 UTC
Moving this to 4.7 as this is not a blocker for 4.6.

Comment 9 W. Trevor King 2020-09-12 21:03:56 UTC
I still think we can wordsmith this per comment 2, but I agree that this is cosmetic per comment 4.  Hopefully we'll have time to adjust the wording next sprint.

Comment 10 W. Trevor King 2020-10-02 23:24:49 UTC
Comment 9 is still current.

Comment 11 W. Trevor King 2020-10-14 21:36:17 UTC
(In reply to W. Trevor King from comment #2)
> If there's a comfort-level fix for this issue, it will be on the CVO side
> softening the "Unable to apply" to suggest "and that may be fine, or maybe
> not, depending on how long it goes on for".

Ah, I'm going to close this as a dup of bug 1884334.  We don't need to soften the wording if we stop setting Failing=True just because an operator is slow.  And that bug is part of an ongoing effort to ease ourselves into allowing operators to take as long as they want, and only going Failing=True on them if they set something concerning like Available=False.
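
The difference between the two policies can be sketched in Python as follows. This is only an illustration of the idea described in this comment, not the CVO implementation; both function names, the dictionary shape for the operator's conditions, and the default 5-minute timeout are assumptions.

```python
from datetime import timedelta


def failing_when_slow(rolled_out: bool, elapsed: timedelta,
                      timeout: timedelta = timedelta(minutes=5)) -> bool:
    """Policy this bug complains about: a slow operator alone triggers Failing=True."""
    return (not rolled_out) and elapsed > timeout


def failing_when_concerning(conditions: dict) -> bool:
    """Policy described above: only a concerning condition triggers Failing=True.

    `conditions` maps a ClusterOperator condition type to its status string,
    for example {"Available": "True", "Progressing": "True"}.
    Available=False is the one example named in this comment; a real
    implementation may well consider other conditions too.
    """
    return conditions.get("Available") == "False"
```

For the network operator in the description (27m in, still progressing, still available), the first function returns True and the second returns False, which is why the wording would no longer need softening.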

*** This bug has been marked as a duplicate of bug 1884334 ***