Bug 1692353

Summary: CVO does not report all components failing to upgrade
Product: OpenShift Container Platform Reporter: Ben Parees <bparees>
Component: Installer Assignee: Abhinav Dahiya <adahiya>
Installer sub component: openshift-installer QA Contact: Johnny Liu <jialiu>
Status: CLOSED NOTABUG Docs Contact:
Severity: high    
Priority: high    
Version: 4.1.0   
Target Milestone: ---   
Target Release: 4.1.0   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2019-03-27 18:24:00 UTC Type: Bug

Description Ben Parees 2019-03-25 12:41:42 UTC
Description of problem:
The CVO seems to report only the most recent operator it saw failing to upgrade, not all of them. This misleads admins about why the upgrade is blocked.


See:
https://openshift-gce-devel.appspot.com/build/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.0/259/


https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.0/259/artifacts/e2e-aws-upgrade/clusteroperators.json

Shows a number of operators that are still reporting the previous version ("version": "4.0.0-0.ci-2019-03-18-152932" is the new version, and "version": "4.0.0-0.ci-2019-03-18-124953" is the old version).
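
For anyone repeating this comparison, a rough Python sketch of the check follows. It assumes the artifact parses to a list object with "items", and that each item carries its versions under status.versions[].version; that layout is an assumption and is not confirmed by anything quoted in this bug. The helper name operators_not_at, the operator "example-operator", and the sample data are hypothetical and are not taken from the CI run.

```python
NEW = "4.0.0-0.ci-2019-03-18-152932"  # new version, from the description
OLD = "4.0.0-0.ci-2019-03-18-124953"  # old version, from the description


def operators_not_at(cluster_operators, target_version):
    """Return sorted names of operators that do not report target_version.

    Assumes (not confirmed here) the layout
    items[].status.versions[].version.
    """
    lagging = []
    for item in cluster_operators.get("items", []):
        versions = item.get("status", {}).get("versions") or []
        reported = {v.get("version") for v in versions}
        if target_version not in reported:
            lagging.append(item["metadata"]["name"])
    return sorted(lagging)


# Illustrative data only; "example-operator" is a made-up name.
sample = {
    "items": [
        {"metadata": {"name": "openshift-controller-manager"},
         "status": {"versions": [{"name": "operator", "version": OLD}]}},
        {"metadata": {"name": "example-operator"},
         "status": {"versions": [{"name": "operator", "version": NEW}]}},
    ]
}

print(operators_not_at(sample, NEW))
```

The same function could be fed the result of json.load() on a downloaded clusteroperators.json, provided the file matches the assumed layout.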

But:
https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.0/259/artifacts/e2e-aws-upgrade/clusterversion.json

only indicates the openshift controller manager operator is upgrading:

                    {
                        "lastTransitionTime": "2019-03-18T17:32:56Z",
                        "message": "Cluster operator openshift-controller-manager is still updating",
                        "reason": "ClusterOperatorNotAvailable",
                        "status": "True",
                        "type": "Failing"
                    },
                    {
                        "lastTransitionTime": "2019-03-18T16:43:50Z",
                        "message": "Unable to apply 4.0.0-0.ci-2019-03-18-152932: the cluster operator openshift-controller-manager has not yet successfully rolled out",
                        "reason": "ClusterOperatorNotAvailable",
                        "status": "True",
                        "type": "Progressing"
                    },
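
A small Python sketch for pulling these conditions out of a ClusterVersion object, in case it helps with triage. It assumes the conditions live under status.conditions of a single parsed object; the helper name active_conditions is hypothetical, and the sample below copies only the two conditions quoted above.

```python
def active_conditions(cluster_version, types=("Failing", "Progressing")):
    """Return (type, reason, message) for conditions whose status is "True".

    Assumes the conditions sit under status.conditions.
    """
    conditions = cluster_version.get("status", {}).get("conditions", [])
    return [
        (c.get("type"), c.get("reason", ""), c.get("message", ""))
        for c in conditions
        if c.get("type") in types and c.get("status") == "True"
    ]


# The two conditions quoted in this report, wrapped in the assumed layout.
sample_cv = {
    "status": {
        "conditions": [
            {
                "lastTransitionTime": "2019-03-18T17:32:56Z",
                "message": "Cluster operator openshift-controller-manager is still updating",
                "reason": "ClusterOperatorNotAvailable",
                "status": "True",
                "type": "Failing",
            },
            {
                "lastTransitionTime": "2019-03-18T16:43:50Z",
                "message": "Unable to apply 4.0.0-0.ci-2019-03-18-152932: the cluster operator openshift-controller-manager has not yet successfully rolled out",
                "reason": "ClusterOperatorNotAvailable",
                "status": "True",
                "type": "Progressing",
            },
        ]
    }
}

for cond_type, reason, message in active_conditions(sample_cv):
    print(f"{cond_type} ({reason}): {message}")
```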

Comment 1 Abhinav Dahiya 2019-03-27 18:24:00 UTC
CVO performs upgrades in stages to make sure that operators such as the control plane are upgraded correctly before any other operator is upgraded, so it reports only the one it is currently upgrading. If operators are failing during an upgrade after reporting Available for a version, please open bugs against the specific operator.
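
As a toy illustration of the staged behaviour described in this comment (not the CVO's actual code), the Python sketch below walks an ordered list of stages and returns the first operator that is not yet at the target version. The stage contents, the function name currently_reported, and every operator name except openshift-controller-manager are hypothetical.

```python
NEW = "4.0.0-0.ci-2019-03-18-152932"
OLD = "4.0.0-0.ci-2019-03-18-124953"


def currently_reported(stages, versions, target):
    """Toy model: return the first operator, in stage order, not at target.

    Operators in later stages are never inspected, which is why they can
    still show the old version without being reported.
    """
    for stage in stages:
        for name in stage:
            if versions.get(name) != target:
                return name
    return None  # every operator already reports the target version


# Hypothetical stage layout and versions, for illustration only.
stages = [
    ["example-control-plane-operator"],
    ["openshift-controller-manager", "example-operator-b"],
    ["example-operator-c"],
]
versions = {
    "example-control-plane-operator": NEW,
    "openshift-controller-manager": OLD,
    "example-operator-b": OLD,
    "example-operator-c": OLD,
}

print(currently_reported(stages, versions, NEW))
```

In this model three operators are still on the old version, but only the one blocking the current stage is surfaced, which matches the single condition message seen in the report.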