
Bug 1711964

Summary: "Error while reconciling" and "the update could not be applied" many hours after upgrade reported complete/successful
Product: OpenShift Container Platform Reporter: Mike Fiedler <mifiedle>
Component: Cluster Version Operator Assignee: Abhinav Dahiya <adahiya>
Status: CLOSED NOTABUG QA Contact: liujia <jiajliu>
Severity: high Docs Contact:
Priority: low    
Version: 4.1.0 CC: aos-bugs, bleanhar, erich, jokerman, jupierce, mmccomas, wking
Target Milestone: ---   
Target Release: 4.3.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-09-30 17:17:59 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
4.1.2 listings none

Description Mike Fiedler 2019-05-20 14:07:08 UTC
Description of problem:

After upgrading from 4.1.0-0.nightly-2019-05-17-041605 to 4.1.0-0.nightly-2019-05-18-050636, I am running oc get clusterversion every minute. Every 3-4 hours the cluster goes through a period where it reports that the update could not be applied. In between, it reports good status.

Some snippets:

NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.0-0.nightly-2019-05-18-050636   True        False         9h      Error while reconciling 4.1.0-0.nightly-2019-05-18-050636: the update could not be applied 

NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.0-0.nightly-2019-05-18-050636   True        False         12h     Error while reconciling 4.1.0-0.nightly-2019-05-18-050636: the update could not be applied


NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.0-0.nightly-2019-05-18-050636   True        False         16h     Error while reconciling 4.1.0-0.nightly-2019-05-18-050636: the update could not be applied


In between it reports like this:

NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.0-0.nightly-2019-05-18-050636   True        False         16h     Cluster version is 4.1.0-0.nightly-2019-05-18-050636                                  


Version-Release number of selected component (if applicable):  4.1.0-0.nightly-2019-05-18-050636


How reproducible: Unknown - 1/1 so far.


Steps to Reproduce:
1. Install 4.1.0-0.nightly-2019-05-17-041605
2. Upgrade to 4.1.0-0.nightly-2019-05-18-050636
3. Run oc get clusterversion every minute and watch the status, especially many hours after the upgrade purportedly succeeds
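
The polling in step 3 could be scripted roughly as follows. This is a minimal sketch, not the script used for this report: it assumes `oc` is on PATH and logged in to the cluster, and the names `fetch_status_row`, `is_reconcile_error`, and `poll` are hypothetical helpers introduced here for illustration.

```python
import subprocess
import time
from datetime import datetime, timezone
from typing import Callable, Iterator, Optional, Tuple

# Text that appears in the STATUS column during the bad periods reported above.
ERROR_MARKER = "Error while reconciling"


def fetch_status_row() -> str:
    """Return the data row printed by `oc get clusterversion`.

    Assumption: `oc` is installed and already logged in to the cluster.
    """
    result = subprocess.run(
        ["oc", "get", "clusterversion", "--no-headers"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def is_reconcile_error(row: str) -> bool:
    """True if the row carries the reconcile error text."""
    return ERROR_MARKER in row


def poll(
    fetch: Callable[[], str] = fetch_status_row,
    interval_seconds: float = 60.0,
    max_polls: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Tuple[str, bool, str]]:
    """Yield (UTC timestamp, is_error, row) once per interval.

    Runs forever when max_polls is None.
    """
    count = 0
    while max_polls is None or count < max_polls:
        row = fetch()
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        yield stamp, is_reconcile_error(row), row
        count += 1
        if max_polls is None or count < max_polls:
            sleep(interval_seconds)
```

Iterating over `poll()` and printing each tuple gives a timestamped log in which the intermittent error periods are easy to spot. The `fetch` and `sleep` parameters exist only so the loop can be exercised without a cluster.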

Actual results:

Error while reconciling 4.1.0-0.nightly-2019-05-18-050636: the update could not be applied

Expected results:

No oc get clusterversion errors

Additional info:

oc adm must-gather will be attached.

Comment 2 Brenton Leanhardt 2019-05-20 18:17:47 UTC
Unfortunately the must-gather logs aren't going to contain any actionable info in this situation.  This is what we need to fix (though in 4.1.z).

At the same time, we'll likely improve the wording in the CVO to make it more clear that the problem is that another operator is flapping.

Comment 3 W. Trevor King 2019-05-20 22:52:50 UTC
> At the same time, we'll likely improve the wording in the CVO to make it more clear that the problem is that another operator is flapping.

Some initial groundwork for this in https://github.com/openshift/cluster-version-operator/pull/194

Comment 5 Justin Pierce 2019-06-17 15:49:12 UTC
Created attachment 1581485
4.1.2 listings

Comment 6 Abhinav Dahiya 2019-06-24 20:33:55 UTC
Based on https://bugzilla.redhat.com/attachment.cgi?id=1581485, the CVO is correctly reporting that it is failing to make progress on reconciling because of cloud-creds-operator.

The summary for `oc get clusterversion version` cannot be all-encompassing. It provides enough detail to prompt a look at the actual object.

I would like to see concrete examples of status updates in the object in contrast to the expected message from users.
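
As a rough illustration of looking at the actual object, the sketch below lists every entry under `status.conditions` of a ClusterVersion object that has already been loaded as a dict, for example from the output of `oc get clusterversion version -o json`. The helper name `summarize_conditions`, the condition type names, and the sample object are assumptions made for illustration; the sample is not taken from the attached listings.

```python
from typing import Any, Dict, List


def summarize_conditions(cluster_version: Dict[str, Any]) -> List[str]:
    """Return one line per entry in status.conditions of a ClusterVersion dict."""
    conditions = cluster_version.get("status", {}).get("conditions", [])
    return [
        "{}={} since {}: {}".format(
            cond.get("type", "?"),
            cond.get("status", "?"),
            cond.get("lastTransitionTime", "?"),
            cond.get("message", ""),
        )
        for cond in conditions
    ]


# Hypothetical object for illustration only. The condition types and the
# second message are assumptions, not output captured from a real cluster.
EXAMPLE = {
    "status": {
        "conditions": [
            {
                "type": "Progressing",
                "status": "False",
                "lastTransitionTime": "2019-06-14T00:00:00Z",
                "message": "Error while reconciling 4.1.2: the update could not be applied",
            },
            {
                "type": "Failing",
                "status": "True",
                "lastTransitionTime": "2019-06-17T00:00:00Z",
                "message": "hypothetical detail naming the operator that is not healthy",
            },
        ]
    }
}

for line in summarize_conditions(EXAMPLE):
    print(line)
```

Printing all conditions side by side shows detail that the one-line summary has no room for.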

Comment 7 Justin Pierce 2019-06-25 13:54:09 UTC
It seems https://bugzilla.redhat.com/show_bug.cgi?id=1714484 was the root cause of my cloud credential operator failing, so the reconciling error message was valid. One area to consider is the message "Cluster version is quay.io/openshift-release-dev/ocp-release:4.1.2", which reads as success to a typical user and seems to contradict "Error while reconciling 4.1.2: the update could not be applied". Consistently displaying the error message, or concatenating the two messages, would have left no room for misunderstanding.

Since there is no ERROR column in the clusterversion output, this message presently serves as an important UX for a human operator to sanity-check the CVO's state.

[ec2-user us-east-1 ~]$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             True        False         2d20h   Cluster version is quay.io/openshift-release-dev/ocp-release:4.1.2
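
The concatenation idea above could look something like the sketch below on the client side. It is only an illustration of the suggestion, not how the CVO or `oc` builds the STATUS column. It assumes the object carries `Progressing` and `Failing` condition types; `combined_status` is a hypothetical helper.

```python
from typing import Any, Dict


def combined_status(cluster_version: Dict[str, Any]) -> str:
    """Join the Progressing message with the Failing message when Failing is True.

    Assumption: the ClusterVersion dict has conditions of type "Progressing"
    and "Failing" under status.conditions.
    """
    conditions = {
        cond["type"]: cond
        for cond in cluster_version.get("status", {}).get("conditions", [])
        if "type" in cond
    }
    progressing_msg = conditions.get("Progressing", {}).get("message", "")
    failing = conditions.get("Failing", {})
    failing_msg = failing.get("message", "")

    # Healthy case, or nothing useful to append: show the usual summary only.
    if failing.get("status") != "True" or not failing_msg:
        return progressing_msg
    if not progressing_msg:
        return failing_msg
    return f"{progressing_msg}; {failing_msg}"
```

With this approach a success-looking summary never appears alone while a failure is being reported, at the cost of a longer status string.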

Comment 9 Scott Dodson 2019-09-30 17:17:59 UTC
This is working as intended.

Comment 10 Red Hat Bugzilla 2023-09-14 05:28:53 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days