This should have been fixed already by https://github.com/openshift/cluster-version-operator/pull/265
From job [1], I did not see any obvious log indicating whether the failure is related to the API version, and in recent CI builds I notice a similar failure happening again in [2]. @Stefan Schimanski, I'm not quite sure how QE should verify this bug against PR #265. Could you help confirm?

[1] https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/9490
[2] https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/10456
Still seeing this test failure in recent CI runs, e.g. https://openshift-gce-devel.appspot.com/build/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/12127/ :

[Disruptive] Cluster upgrade should maintain a functioning cluster [Feature:ClusterUpgrade] [Suite:openshift] [Serial]
fail [github.com/openshift/origin/test/e2e/upgrade/upgrade.go:130]: during upgrade
Unexpected error:
    <*errors.errorString | 0xc000be4360>: {
        s: "Cluster did not complete upgrade: timed out waiting for the condition: Working towards 4.3.0-0.nightly-2019-12-05-213858: 13% complete",
    }
    Cluster did not complete upgrade: timed out waiting for the condition: Working towards 4.3.0-0.nightly-2019-12-05-213858: 13% complete
occurred
@liujia: Not every run failing with "Cluster upgrade should maintain a functioning cluster" is due to the CRD topic fixed in https://github.com/openshift/cluster-version-operator/pull/265. That was about one very specific case where the upgrade could fail.

What #265 fixed was failing updates of very early CRDs before kube-apiserver was updated to 4.3.

Looking at https://search.svc.ci.openshift.org/?search=resource+may+have+been+deleted&maxAge=168h&context=2&type=all suggests to me that this is no longer the case. Moving back to MODIFIED.
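For anyone repeating this check from a terminal, a rough sketch that scrapes the same search page (it assumes curl and grep are available and just parses the HTML results for upgrade-job hits, so treat the output as an approximation):

  curl -s 'https://search.svc.ci.openshift.org/?search=resource+may+have+been+deleted&maxAge=168h&context=2&type=all' \
    | grep -o 'release-openshift-origin-installer-e2e-aws-upgrade/[0-9]*' | sort -u

No hits over the last 168h is what indicates the #265 failure mode is gone.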
While verifying bug 1779237#c4 yesterday, the tested cluster also failed, stuck at the state below for 20+ hours:

$ oc get clusterversion
version   4.2.0-0.nightly-2019-12-11-171302   True   True   22h   Working towards 4.3.0-0.nightly-2019-12-12-021332: 13% complete

[xxia 2019-12-13 15:47:48 my]$ oc get co | grep -v "True False False"
NAME                      VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
ingress                   4.3.0-0.nightly-2019-12-12-021332   False       True          True       21h
kube-apiserver            4.3.0-0.nightly-2019-12-12-021332   True        True          True       23h
kube-controller-manager   4.3.0-0.nightly-2019-12-12-021332   True        True          True       23h
kube-scheduler            4.3.0-0.nightly-2019-12-12-021332   True        False         True       23h
machine-config            4.2.0-0.nightly-2019-12-11-171302   False       True          True       21h
monitoring                4.3.0-0.nightly-2019-12-12-021332   False       True          True       21h
network                   4.3.0-0.nightly-2019-12-12-021332   True        True          True       23h

[xxia 2019-12-13 15:48:14 my]$ oc get no
NAME                                         STATUS                        ROLES    AGE   VERSION
ip-10-0-135-186.us-east-2.compute.internal   NotReady,SchedulingDisabled   worker   23h   v1.14.6+cebabbf4a
ip-10-0-141-151.us-east-2.compute.internal   Ready                         master   23h   v1.14.6+cebabbf4a
ip-10-0-147-188.us-east-2.compute.internal   Ready                         worker   23h   v1.14.6+cebabbf4a
ip-10-0-153-212.us-east-2.compute.internal   NotReady,SchedulingDisabled   master   23h   v1.14.6+cebabbf4a
ip-10-0-170-139.us-east-2.compute.internal   Ready                         master   23h   v1.14.6+cebabbf4a

[xxia 2019-12-13 15:48:31 my]$ oc describe co kube-apiserver
...
Status:
  Conditions:
    Last Transition Time:  2019-12-12T10:12:55Z
    Message:               NodeControllerDegraded: The master node(s) "ip-10-0-153-212.us-east-2.compute.internal" not ready
    Reason:                NodeControllerDegradedMasterNodesReady
    Status:                True
    Type:                  Degraded
    Last Transition Time:  2019-12-13T03:06:56Z
    Message:               Progressing: 3 nodes are at revision 7; 0 nodes have achieved new revision 9
    Reason:                Progressing
    Status:                True
    Type:                  Progressing
    Last Transition Time:  2019-12-12T08:11:12Z
    Message:               Available: 3 nodes are active; 3 nodes are at revision 7; 0 nodes have achieved new revision 9
    Reason:                AsExpected
    Status:                True
    Type:                  Available
    Last Transition Time:  2019-12-12T08:10:06Z
    Reason:                AsExpected
    Status:                True
    Type:                  Upgradeable
  Extension:               <nil>
  Related Objects:
...

[xxia 2019-12-13 15:52:22 my]$ oc get po -n openshift-kube-apiserver -l apiserver --show-labels
NAME                                                        READY   STATUS    RESTARTS   AGE   LABELS
kube-apiserver-ip-10-0-141-151.us-east-2.compute.internal   3/3     Running   0          22h   apiserver=true,app=openshift-kube-apiserver,revision=7
kube-apiserver-ip-10-0-153-212.us-east-2.compute.internal   3/3     Running   0          22h   apiserver=true,app=openshift-kube-apiserver,revision=7
kube-apiserver-ip-10-0-170-139.us-east-2.compute.internal   3/3     Running   0          22h   apiserver=true,app=openshift-kube-apiserver,revision=7

I checked https://openshift-release.svc.ci.openshift.org/ and clicked the latest payload, 4.3.0-0.nightly-2019-12-13-032731. In https://openshift-release.svc.ci.openshift.org/releasestream/4.3.0-0.nightly/release/4.3.0-0.nightly-2019-12-13-032731 I saw "4.2.10 (changes) - Failed"; clicking it, https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/12586 also shows:

Cluster did not complete upgrade: timed out waiting for the condition: Working towards 4.3.0-0.nightly-2019-12-13-032731: 13% complete

I'm not sure whether this failure is the same as the original bug report, but the 4.2 to 4.3 upgrade indeed does not work now.
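As a follow-up, a few commands that could help narrow down why that master node is NotReady (just a sketch; the node name is the stuck one from the output above, and oc adm node-logs assumes the kubelet on it is still reachable):

  oc describe node ip-10-0-153-212.us-east-2.compute.internal
  oc get mcp
  oc -n openshift-machine-config-operator get pods -o wide
  oc adm node-logs ip-10-0-153-212.us-east-2.compute.internal -u kubelet

The kube-apiserver Degraded reason (NodeControllerDegradedMasterNodesReady) points at that NotReady node rather than at the CRD API version issue from PR #265.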
(In reply to Stefan Schimanski from comment #5)
> @liujia: Not every run failing with "Cluster upgrade should maintain a
> functioning cluster" is due to the CRD topic fixed in
> https://github.com/openshift/cluster-version-operator/pull/265. That was
> about one very specific case where the upgrade could fail.
>
> What #265 fixed was failing updates of very early CRDs before
> kube-apiserver was updated to 4.3.
>
> Looking at
> https://search.svc.ci.openshift.org/?search=resource+may+have+been+deleted&maxAge=168h&context=2&type=all
> suggests to me that this is no longer the case. Moving back to MODIFIED.

Thanks for pointing out the concrete error info behind the failing test "[Disruptive] Cluster upgrade should maintain a functioning cluster". I double-checked and confirmed that the same error was not present in job https://openshift-gce-devel.appspot.com/build/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/12127/ from the last verification.
> https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/12586 also shows:
> Cluster did not complete upgrade: timed out waiting for the condition:
> Working towards 4.3.0-0.nightly-2019-12-13-032731: 13% complete
>
> I'm not sure whether this failure is the same as the original bug report,
> but the 4.2 to 4.3 upgrade indeed does not work now.

I double-checked https://openshift-gce-devel.appspot.com/build/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/12586; it is not the same as the CRD API version issue fixed in PR #265. So @xxia, you could file a new bug for your new issue. I will verify this bug.
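For the record, one way to confirm that a given run does not hit the #265 failure mode (a sketch; it assumes the usual prow layout where build-log.txt sits under the GCS job path):

  curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/12586/build-log.txt \
    | grep -i 'resource may have been deleted' \
    || echo 'no CRD api-version error in this run'

No match means the run failed for some reason other than the CRD issue fixed by PR #265.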
(In reply to liujia from comment #9)
> So @xxia, you could file a new bug for your new issue

Thanks, it should be the same as bug 1778904.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0062
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days