Hide Forgot
Description of problem: fail [github.com/openshift/origin/test/e2e/upgrade/upgrade.go:81]: during upgrade Expected error: <*errors.errorString | 0xc421f86580>: { s: "Cluster did not acknowledge request to upgrade: timed out waiting for the condition", } Cluster did not acknowledge request to upgrade: timed out waiting for the condition not to have occurred https://openshift-gce-devel.appspot.com/build/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.0/234/ Version-Release number of the following components: 4.1 How reproducible: Appear to be a flake.
Linked wrong build: https://openshift-gce-devel.appspot.com/build/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.0/242/
From `clsuterversion.yaml` ``` { "apiVersion": "config.openshift.io/v1", "kind": "ClusterVersion", "metadata": { "creationTimestamp": "2019-03-17T04:27:04Z", "generation": 2, "name": "version", "resourceVersion": "65775", "selfLink": "/apis/config.openshift.io/v1/clusterversions/version", "uid": "ea967f21-486c-11e9-bd56-120db6c725be" }, "spec": { "channel": "stable-4.0", "clusterID": "1323fa57-4285-43eb-a338-c36bfb4b0a07", "desiredUpdate": { "image": "registry.svc.ci.openshift.org/ocp/release:4.0.0-0.ci-2019-03-17-041405", "version": "" }, "upstream": "https://api.openshift.com/api/upgrades_info/v1/graph" }, "status": { "availableUpdates": null, "conditions": [ { "lastTransitionTime": "2019-03-17T04:40:43Z", "message": "Done applying 4.0.0-0.ci-2019-03-17-010422", "status": "True", "type": "Available" }, { "lastTransitionTime": "2019-03-17T05:31:14Z", "status": "False", "type": "Failing" }, { "lastTransitionTime": "2019-03-17T04:43:16Z", "message": "Working towards 4.0.0-0.ci-2019-03-17-041405: 90% complete", "status": "True", "type": "Progressing" }, { "lastTransitionTime": "2019-03-17T04:27:18Z", "message": "Unable to retrieve available updates: unknown version 4.0.0-0.ci-2019-03-17-041405", "reason": "RemoteFailed", "status": "False", "type": "RetrievedUpdates" } ], "desired": { "image": "registry.svc.ci.openshift.org/ocp/release:4.0.0-0.ci-2019-03-17-041405", "version": "4.0.0-0.ci-2019-03-17-041405" }, "history": [ { "completionTime": null, "image": "registry.svc.ci.openshift.org/ocp/release:4.0.0-0.ci-2019-03-17-041405", "startedTime": "2019-03-17T05:34:01Z", "state": "Partial", "version": "4.0.0-0.ci-2019-03-17-041405" }, { "completionTime": "2019-03-17T05:34:01Z", "image": "registry.svc.ci.openshift.org/ocp/release:4.0.0-0.ci-2019-03-17-041405", "startedTime": "2019-03-17T04:44:17Z", "state": "Partial", "version": "4.0.0-0.ci-2019-03-17-041405" }, { "completionTime": "2019-03-17T04:44:17Z", "image": "registry.svc.ci.openshift.org/ocp/release:4.0.0-0.ci-2019-03-17-041405", "startedTime": "2019-03-17T04:43:16Z", "state": "Partial", "version": "4.0.0-0.ci-2019-03-17-041405" }, { "completionTime": "2019-03-17T04:43:16Z", "image": "registry.svc.ci.openshift.org/ocp/release@sha256:9972758232e864ad3f4756127ff4401ed1e313f4ae956f42e44354c83fa4d23b", "startedTime": "2019-03-17T04:27:18Z", "state": "Completed", "version": "4.0.0-0.ci-2019-03-17-010422" } ], "observedGeneration": 2, "versionHash": "vjzTSO3pwl0=" } } ``` The desired version is `4.0.0-0.ci-2019-03-17-041405` The only clusteroperator that is not at `4.0.0-0.ci-2019-03-17-041405` is monitoring ``` { "apiVersion": "config.openshift.io/v1", "kind": "ClusterOperator", "metadata": { "creationTimestamp": "2019-03-17T04:33:34Z", "generation": 1, "name": "monitoring", "resourceVersion": "56407", "selfLink": "/apis/config.openshift.io/v1/clusteroperators/monitoring", "uid": "d37049c6-486d-11e9-8028-0a28398410f8" }, "spec": {}, "status": { "conditions": [ { "lastTransitionTime": "2019-03-17T05:22:26Z", "message": "Successfully rolled out the stack.", "status": "True", "type": "Available" }, { "lastTransitionTime": "2019-03-17T05:22:26Z", "status": "False", "type": "Progressing" }, { "lastTransitionTime": "2019-03-17T05:22:26Z", "status": "False", "type": "Failing" } ], "extension": null, "relatedObjects": null, "versions": [ { "name": "operator", "version": "4.0.0-0.ci-2019-03-17-010422" } ] } }, ```
From https://openshift-gce-devel.appspot.com/build/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.0/242/ `clsuterversion.yaml` ``` { "apiVersion": "config.openshift.io/v1", "kind": "ClusterVersion", "metadata": { "creationTimestamp": "2019-03-17T04:27:04Z", "generation": 2, "name": "version", "resourceVersion": "65775", "selfLink": "/apis/config.openshift.io/v1/clusterversions/version", "uid": "ea967f21-486c-11e9-bd56-120db6c725be" }, "spec": { "channel": "stable-4.0", "clusterID": "1323fa57-4285-43eb-a338-c36bfb4b0a07", "desiredUpdate": { "image": "registry.svc.ci.openshift.org/ocp/release:4.0.0-0.ci-2019-03-17-041405", "version": "" }, "upstream": "https://api.openshift.com/api/upgrades_info/v1/graph" }, "status": { "availableUpdates": null, "conditions": [ { "lastTransitionTime": "2019-03-17T04:40:43Z", "message": "Done applying 4.0.0-0.ci-2019-03-17-010422", "status": "True", "type": "Available" }, { "lastTransitionTime": "2019-03-17T05:31:14Z", "status": "False", "type": "Failing" }, { "lastTransitionTime": "2019-03-17T04:43:16Z", "message": "Working towards 4.0.0-0.ci-2019-03-17-041405: 90% complete", "status": "True", "type": "Progressing" }, { "lastTransitionTime": "2019-03-17T04:27:18Z", "message": "Unable to retrieve available updates: unknown version 4.0.0-0.ci-2019-03-17-041405", "reason": "RemoteFailed", "status": "False", "type": "RetrievedUpdates" } ], "desired": { "image": "registry.svc.ci.openshift.org/ocp/release:4.0.0-0.ci-2019-03-17-041405", "version": "4.0.0-0.ci-2019-03-17-041405" }, "history": [ { "completionTime": null, "image": "registry.svc.ci.openshift.org/ocp/release:4.0.0-0.ci-2019-03-17-041405", "startedTime": "2019-03-17T05:34:01Z", "state": "Partial", "version": "4.0.0-0.ci-2019-03-17-041405" }, { "completionTime": "2019-03-17T05:34:01Z", "image": "registry.svc.ci.openshift.org/ocp/release:4.0.0-0.ci-2019-03-17-041405", "startedTime": "2019-03-17T04:44:17Z", "state": "Partial", "version": "4.0.0-0.ci-2019-03-17-041405" }, { "completionTime": "2019-03-17T04:44:17Z", "image": "registry.svc.ci.openshift.org/ocp/release:4.0.0-0.ci-2019-03-17-041405", "startedTime": "2019-03-17T04:43:16Z", "state": "Partial", "version": "4.0.0-0.ci-2019-03-17-041405" }, { "completionTime": "2019-03-17T04:43:16Z", "image": "registry.svc.ci.openshift.org/ocp/release@sha256:9972758232e864ad3f4756127ff4401ed1e313f4ae956f42e44354c83fa4d23b", "startedTime": "2019-03-17T04:27:18Z", "state": "Completed", "version": "4.0.0-0.ci-2019-03-17-010422" } ], "observedGeneration": 2, "versionHash": "vjzTSO3pwl0=" } } ``` The desired version is `4.0.0-0.ci-2019-03-17-041405` The only clusteroperator that is not at `4.0.0-0.ci-2019-03-17-041405` is monitoring ``` { "apiVersion": "config.openshift.io/v1", "kind": "ClusterOperator", "metadata": { "creationTimestamp": "2019-03-17T04:33:34Z", "generation": 1, "name": "monitoring", "resourceVersion": "56407", "selfLink": "/apis/config.openshift.io/v1/clusteroperators/monitoring", "uid": "d37049c6-486d-11e9-8028-0a28398410f8" }, "spec": {}, "status": { "conditions": [ { "lastTransitionTime": "2019-03-17T05:22:26Z", "message": "Successfully rolled out the stack.", "status": "True", "type": "Available" }, { "lastTransitionTime": "2019-03-17T05:22:26Z", "status": "False", "type": "Progressing" }, { "lastTransitionTime": "2019-03-17T05:22:26Z", "status": "False", "type": "Failing" } ], "extension": null, "relatedObjects": null, "versions": [ { "name": "operator", "version": "4.0.0-0.ci-2019-03-17-010422" } ] } }, ```
Also from CVO logs at https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.0/242/artifacts/e2e-aws-upgrade/pods/openshift-cluster-version_cluster-version-operator-565d946558-9ztzw_cluster-version-operator.log.gz ``` I0317 05:37:29.940297 1 sync_worker.go:462] Running sync for clusteroperator "monitoring" (197 of 303) ... E0317 05:38:29.940627 1 task.go:58] error running apply for clusteroperator "monitoring" (197 of 303): Cluster operator monitoring is still updating ``` CVO was ONLY waiting for monitoring towards the end around `05:38` But the upgrade test had defined test failed at `05:37` looking at https://openshift-gce-devel.appspot.com/build/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.0/242/#operator-run-template-e2e-aws-upgrade---e2e-aws-upgrade-container-test Also looking at the `clusterversion.yaml` `.status.history` the upgrade was started at 04:43 and still after around 50 mins upgrade had not finished. ``` { "completionTime": null, "image": "registry.svc.ci.openshift.org/ocp/release:4.0.0-0.ci-2019-03-17-041405", "startedTime": "2019-03-17T05:34:01Z", "state": "Partial", "version": "4.0.0-0.ci-2019-03-17-041405" }, { "completionTime": "2019-03-17T05:34:01Z", "image": "registry.svc.ci.openshift.org/ocp/release:4.0.0-0.ci-2019-03-17-041405", "startedTime": "2019-03-17T04:44:17Z", "state": "Partial", "version": "4.0.0-0.ci-2019-03-17-041405" }, { "completionTime": "2019-03-17T04:44:17Z", "image": "registry.svc.ci.openshift.org/ocp/release:4.0.0-0.ci-2019-03-17-041405", "startedTime": "2019-03-17T04:43:16Z", "state": "Partial", "version": "4.0.0-0.ci-2019-03-17-041405" }, { "completionTime": "2019-03-17T04:43:16Z", "image": "registry.svc.ci.openshift.org/ocp/release@sha256:9972758232e864ad3f4756127ff4401ed1e313f4ae956f42e44354c83fa4d23b", "startedTime": "2019-03-17T04:27:18Z", "state": "Completed", "version": "4.0.0-0.ci-2019-03-17-010422" } ``` from an upgrade test that was green the history there shows upgrade finished in around similar time 35 mins for 4.0.0-0.ci-2019-03-18-223512 ``` { "completionTime": "2019-03-18T23:40:40Z", "image": "registry.svc.ci.openshift.org/ocp/release:4.0.0-0.ci-2019-03-18-223512", "startedTime": "2019-03-18T23:25:10Z", "state": "Completed", "version": "4.0.0-0.ci-2019-03-18-223512" }, { "completionTime": "2019-03-18T23:25:10Z", "image": "registry.svc.ci.openshift.org/ocp/release:4.0.0-0.ci-2019-03-18-223512", "startedTime": "2019-03-18T23:07:10Z", "state": "Partial", "version": "4.0.0-0.ci-2019-03-18-223512" }, { "completionTime": "2019-03-18T23:07:10Z", "image": "registry.svc.ci.openshift.org/ocp/release:4.0.0-0.ci-2019-03-18-223512", "startedTime": "2019-03-18T23:06:24Z", "state": "Partial", "version": "4.0.0-0.ci-2019-03-18-223512" }, { "completionTime": "2019-03-18T23:06:24Z", "image": "registry.svc.ci.openshift.org/ocp/release@sha256:0dfafc0de33a116643c8f89552ed3b82110ef56bf5671f12a16df72bede9e973", "startedTime": "2019-03-18T22:52:31Z", "state": "Completed", "version": "4.0.0-0.ci-2019-03-18-202114" } ```
> CVO was ONLY waiting for monitoring towards the end around `05:38` But the upgrade test had defined test failed at `05:37` Abhinav it sounds like you're saying that basically this wasn't a monitoring upgrade failure, it was a case where the entire upgrade took longer than the test was willing to wait, right? i.e. something else caused the upgrade to take a long time (50+ mins instead of 35), earlier in the process.
$ curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.0/242/artifacts/e2e- aws-upgrade/pods/openshift-cluster-version_cluster-version-operator-565d946558-9ztzw_cluster-version-operator.log.gz | gunzip | head -n1 I0317 05:34:00.593130 1 start.go:23] ClusterVersionOperator v3.11.0-226-ga573b856-dirty But https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.0/242/artifacts/e2e-aws-upgrade/pods/openshift-cluster-version_cluster-version-operator-565d946558-9ztzw_cluster-version-operator_previous.log.gz is empty? Are there logs from the partial deploys started at 2019-03-17T04:43:16Z and 2019-03-17T04:44:17Z? What causes a partial entry in the history? Why is the current CVO pod so young?
This error has not occurred in the 861 *-e2e-aws* CI runs since 2019-03-19T12:28Z (and possibly earlier). Maybe some subsequent pull fixed it? Or maybe it's just low-probability.
no such issue with our upgrade eg: upgrade OCP from 4.0.0-0.9 -> 4.0.0-0.10
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0758