This issue does not apply to 4.3 updates, because we only support 4.y -> 4.(y+1), so 4.4 CVOs will never run on nodes with 4.2-and-earlier kubelets.

+++ This bug was initially created as a clone of Bug #1787422 +++

Although 4.3 kubelet capacity reporting works, we still need to drop the 4.3 request to support flows like the following (see the sketch after the list):

1. 4.2 cluster running with 4.2 CVO and 4.2 kubelets (so no capacity reporting).
2. Admin requests an update to 4.3.1.
3. 4.2 CVO launches a version pod without requests, because of the 4.2 reversion (#288). This works fine.
4. Update gets far enough to run a 4.3 CVO.
5. Update hangs on some 4.3.1 bug, while it's still running 4.2 kubelets.
6. Admin requests an update to 4.3.2.
7. 4.3 CVO launches a version pod with an ephemeral-storage request, which hangs because the 4.2 kubelets are still running and not reporting ephemeral-storage capacity.
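As a quick illustration of step 7, a minimal sketch of how to spot the mismatch from the command line (the pod name below is a placeholder; the jq filters are illustrative, not from this bug):

# Does every kubelet report ephemeral-storage capacity?  4.2 kubelets
# do not, so jq prints "null" for those nodes:
$ oc get nodes -ojson | jq '.items[].status.capacity."ephemeral-storage"'

# A version pod blocked as in step 7 stays Pending with a
# FailedScheduling event; describe it to see the scheduler's reason:
$ oc -n openshift-cluster-version describe pod version--xxxxx-yyyyy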
Current 4.3 nightlies can update to 4.4 nightlies without hitting this, so jumping straight to MODIFIED.
No accepted 4.4 nightlies since the 20th [1], so I've launched a 4.3.0-0.nightly-2020-01-02-141332 -> 4.4.0-0.ci-2020-01-02-161748 job [2] to confirm the "does not apply to 4.3 -> 4.4" assertion.

[1]: https://openshift-release.svc.ci.openshift.org/#4.4.0-0.nightly
[2]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.3/3
The 4.3.0-0.nightly-2020-01-02-141332 -> 4.4.0-0.ci-2020-01-02-161748 update job passed [1].

[1]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.3/3
Attempt to keep this breakage class from sneaking through CI again: https://github.com/openshift/release/pull/6542
An upgrade from 4.4.0-0.nightly-2020-01-08-233510 to 4.4.0-0.nightly-2020-01-09-013524 succeeded. Checked that the version pod requests the ephemeral-storage resource and that the scheduled node reports the capacity:

# ./oc get pod version--8hprx-2nbbg -ojson | jq .spec.containers[].resources
{
  "requests": {
    "cpu": "10m",
    "ephemeral-storage": "2Mi",
    "memory": "50Mi"
  }
}
# ./oc get node control-plane-0 -ojson | jq .status.capacity
{
  "cpu": "4",
  "ephemeral-storage": "30905324Ki",
  "hugepages-1Gi": "0",
  "hugepages-2Mi": "0",
  "memory": "8163844Ki",
  "pods": "250"
}

Since both 4.3 and 4.4 kubelets report ephemeral-storage capacity, this path does not hit the 4.2 -> 4.3 issue described above.

> The 4.3.0-0.nightly-2020-01-02-141332 -> 4.4.0-0.ci-2020-01-02-161748 update job passed [1].

And the 4.3 to 4.4 upgrade works well per that CI job. Verifying the fix.
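For anyone repeating this verification, the same per-node check can be done in one pass; a small sketch using only the oc and jq calls already shown above:

# Print each node's name next to its reported ephemeral-storage capacity,
# to confirm the version pod's 2Mi request is satisfiable everywhere:
$ ./oc get nodes -ojson | jq '.items[] | {name: .metadata.name, "ephemeral-storage": .status.capacity."ephemeral-storage"}'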
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581