Description of problem:

Version-Release number of the following components:
upgrade from 4.1.0-0.nightly-2019-04-22-005054 to 4.1.0-0.nightly-2019-04-22-192604

How reproducible:
Always

Steps to Reproduce:
1. Fresh install a cluster with 4.1.0-0.nightly-2019-04-22-005054.
2. Trigger an upgrade towards 4.1.0-0.nightly-2019-04-22-192604.
3. While the upgrade is in progress, block all traffic from quay.io.
4. Wait until the upgrade fails.

Actual results:
# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.0-0.nightly-2019-04-22-192604   True        True          62m     Unable to apply 4.1.0-0.nightly-2019-04-22-192604: the update could not be applied

# oc describe clusterversion
Name:         version
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  config.openshift.io/v1
Kind:         ClusterVersion
Metadata:
  Creation Timestamp:  2019-04-23T06:36:14Z
  Generation:          2
  Resource Version:    405822
  Self Link:           /apis/config.openshift.io/v1/clusterversions/version
  UID:                 17af3504-6592-11e9-9955-0293645e251a
Spec:
  Channel:     stable-4.0
  Cluster ID:  3b07acc8-66ba-4c9f-a465-5127b755487a
  Desired Update:
    Image:    registry.svc.ci.openshift.org/ocp/release:4.1.0-0.nightly-2019-04-22-192604
    Version:
  Upstream:  https://api.openshift.com/api/upgrades_info/v1/graph
Status:
  Available Updates:  <nil>
  Conditions:
    Last Transition Time:  2019-04-23T06:49:11Z
    Message:               Done applying 4.1.0-0.nightly-2019-04-22-005054
    Status:                True
    Type:                  Available
    Last Transition Time:  2019-04-24T07:07:01Z
    Message:               Could not update deployment "openshift-dns-operator/dns-operator" (47 of 333)
    Reason:                UpdatePayloadFailed
    Status:                True
    Type:                  Failing
    Last Transition Time:  2019-04-23T08:58:22Z
    Message:               Unable to apply 4.1.0-0.nightly-2019-04-22-192604: the update could not be applied
    Reason:                UpdatePayloadFailed
    Status:                True
    Type:                  Progressing
    Last Transition Time:  2019-04-23T06:36:29Z
    Message:               Unable to retrieve available updates: unknown version 4.1.0-0.nightly-2019-04-22-192604
    Reason:                RemoteFailed
    Status:                False
    Type:                  RetrievedUpdates
  Desired:
    Image:    registry.svc.ci.openshift.org/ocp/release:4.1.0-0.nightly-2019-04-22-192604
    Version:  4.1.0-0.nightly-2019-04-22-192604
  History:
    Completion Time:  <nil>
    Image:            registry.svc.ci.openshift.org/ocp/release:4.1.0-0.nightly-2019-04-22-192604
    Started Time:     2019-04-23T08:58:22Z
    State:            Partial
    Version:          4.1.0-0.nightly-2019-04-22-192604
    Completion Time:  2019-04-23T08:58:22Z
    Image:            registry.svc.ci.openshift.org/ocp/release@sha256:3f3628cd9b694705cb0627ce72e61932df5d9938a291fabba1ed691230f7b548
    Started Time:     2019-04-23T06:36:29Z
    State:            Completed
    Version:          4.1.0-0.nightly-2019-04-22-005054
  Observed Generation:  2
  Version Hash:         VDUuFi-LWdE=
Events:  <none>

The output of `oc describe clusterversion` shows the error message 'Could not update deployment "openshift-dns-operator/dns-operator"'.

Expected results:
The CVO should do a bit more and report why it 'Could not update deployment "openshift-dns-operator/dns-operator"' with a better message.

Additional info:
The only way of troubleshooting is checking the cluster-version-operator log.

# oc logs -f cluster-version-operator-694fb8bf89-w86mk -n openshift-cluster-version
<--snip-->
I0423 09:12:35.598277       1 apps.go:77] Deployment dns-operator is not ready. status: (replicas: 2, updated: 1, ready: 1, unavailable: 1)
E0423 09:12:35.598319       1 task.go:77] error running apply for deployment "openshift-dns-operator/dns-operator" (47 of 333): timed out waiting for the condition
I0423 09:12:35.598356       1 task_graph.go:560] Canceled worker 8
I0423 09:12:35.598366       1 task_graph.go:580] Workers finished
I0423 09:12:35.598375       1 task_graph.go:588] Result of work: [Could not update deployment "openshift-dns-operator/dns-operator" (47 of 333)]
I0423 09:12:35.598392       1 sync_worker.go:667] Summarizing 1 errors
I0423 09:12:35.598399       1 sync_worker.go:671] Update error 47/333: UpdatePayloadFailed Could not update deployment "openshift-dns-operator/dns-operator" (47 of 333) (*errors.errorString: timed out waiting for the condition)
I0423 09:12:35.598426       1 task_graph.go:508] No more reachable nodes in graph, continue
I0423 09:12:35.598442       1 task_graph.go:544] No more work
E0423 09:12:35.598461       1 sync_worker.go:288] unable to synchronize image (waiting 49.936801596s): Could not update deployment "openshift-dns-operator/dns-operator" (47 of 333)
<--snip-->
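The readiness check behind the `apps.go:77` log line above amounts to comparing the deployment's replica counters. A minimal sketch of that logic in Python (the `deployment_ready` helper is hypothetical, not actual CVO code; field names follow the Kubernetes Deployment status):

```python
def deployment_ready(status):
    """Rough approximation of the CVO's readiness check: every desired
    replica must be updated and available, with none unavailable."""
    replicas = status.get("replicas", 0)
    updated = status.get("updatedReplicas", 0)
    available = status.get("availableReplicas", 0)
    unavailable = status.get("unavailableReplicas", 0)
    return updated >= replicas and available >= replicas and unavailable == 0

# The stuck state from the log above: 2 replicas, 1 updated, 1 unavailable.
stuck = {"replicas": 2, "updatedReplicas": 1,
         "availableReplicas": 1, "unavailableReplicas": 1}
print(deployment_ready(stuck))  # -> False
```

Because the deployment never reaches the ready state while quay.io is unreachable, the CVO's wait eventually expires with "timed out waiting for the condition".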
The cluster version object need not include detailed errors. Our goal is to make sure it includes high-level information that guides admins on what to look at next.

> I0423 09:12:35.598277       1 apps.go:77] Deployment dns-operator is not ready. status: (replicas: 2, updated: 1, ready: 1, unavailable: 1)
> E0423 09:12:35.598319       1 task.go:77] error running apply for deployment "openshift-dns-operator/dns-operator" (47 of 333): timed out waiting for the condition

The logs you provided show exactly why the CVO thinks it is failing to update the deployment, which I think covers the detail from the CVO's perspective. We *might* try to make these messages better, but they are good enough for now.
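As an illustration of the "high-level information" an admin can start from, the Failing condition can be pulled out of the ClusterVersion object programmatically. A sketch, assuming the parsed JSON from `oc get clusterversion version -o json` (the `failing_condition` helper is made up for this example):

```python
def failing_condition(clusterversion):
    """Return (reason, message) of the Failing=True condition, or None."""
    for cond in clusterversion.get("status", {}).get("conditions", []):
        if cond.get("type") == "Failing" and cond.get("status") == "True":
            return cond.get("reason"), cond.get("message")
    return None

# Trimmed-down version of the object shown in this report.
cv = {"status": {"conditions": [
    {"type": "Available", "status": "True",
     "message": "Done applying 4.1.0-0.nightly-2019-04-22-005054"},
    {"type": "Failing", "status": "True", "reason": "UpdatePayloadFailed",
     "message": 'Could not update deployment "openshift-dns-operator/dns-operator" (47 of 333)'},
]}}
print(failing_condition(cv))
```

The reason/message pair points at the stuck manifest, and the operator's own logs (or the CVO logs quoted above) carry the rest of the detail.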
https://github.com/openshift/cluster-version-operator/pull/187 was merged to include more information about failing deployments in the CVO logs.
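The effect of that change can be approximated as follows: besides the replica counters, the "not ready" log message now also surfaces each deployment condition's reason and message. A rough sketch of that formatting, not the actual CVO code:

```python
def not_ready_message(name, status):
    """Build a 'Deployment X is not ready' log message that also carries
    the deployment's condition reasons/messages, in the spirit of PR 187."""
    parts = [
        "replicas: %d" % status.get("replicas", 0),
        "updated: %d" % status.get("updatedReplicas", 0),
        "ready: %d" % status.get("readyReplicas", 0),
        "unavailable: %d" % status.get("unavailableReplicas", 0),
    ]
    # Append every condition's reason and message so the log line explains
    # *why* the deployment is stuck, not just that it is.
    for cond in status.get("conditions", []):
        parts.append("reason: %s" % cond["reason"])
        parts.append("message: %s" % cond["message"])
    return "Deployment %s is not ready. status: (%s)" % (name, ", ".join(parts))

status = {
    "replicas": 1, "updatedReplicas": 1, "readyReplicas": 0,
    "unavailableReplicas": 1,
    "conditions": [
        {"reason": "MinimumReplicasUnavailable",
         "message": "Deployment does not have minimum availability."},
    ],
}
print(not_ready_message("openshift-apiserver-operator", status))
```

Compare this with the verified log line below, which carries the `MinimumReplicasUnavailable` and `ProgressDeadlineExceeded` reasons that the pre-fix logs omitted.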
No 4.2 nightly build is available yet; verification is pending.
Verified this bug with 4.2.0-0.nightly-2019-06-25-003324, and it PASSED.

$ oc logs cluster-version-operator-65544b6768-vsmg6 -n openshift-cluster-version|grep machine-api
<--snip-->
I0625 10:34:31.093887       1 apps.go:94] Deployment openshift-apiserver-operator is not ready. status: (replicas: 1, updated: 1, ready: 0, unavailable: 1, reason: MinimumReplicasUnavailable, message: Deployment does not have minimum availability., reason: ProgressDeadlineExceeded, message: ReplicaSet "openshift-apiserver-operator-c8cf58dbc" has timed out progressing.)
E0625 10:34:31.093918       1 task.go:77] error running apply for deployment "openshift-apiserver-operator/openshift-apiserver-operator" (96 of 377): timed out waiting for the condition
<--snip-->
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922