Description of problem:

Build-watching the 4.8 nightly jobs, the metal upgrade jobs show a significant performance degradation. Example job tracker:

https://testgrid.k8s.io/redhat-openshift-ocp-release-4.8-informing#periodic-ci-openshift-release-master-nightly-4.8-e2e-metal-ipi-upgrade

Most recently it started failing on MCO. Checking the MCO logs, I see:

  Node worker-1 is reporting: "failed to drain node (5 tries): timed out waiting for the condition: error when evicting pods/\"image-registry-556b7484d5-pbrqc\" -n \"openshift-image-registry\": global timeout reached: 1m30s"

which degrades the upgrade. It is not clear why the pod cannot be drained (eviction fails repeatedly), so I am opening this bug.

Example jobs:

https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-e2e-metal-ipi-upgrade/1377260372473417728
https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-upgrade-from-stable-4.7-e2e-metal-ipi-upgrade/1377201785415929856

See must-gather/cluster-scoped-resources/machineconfig.../machineconfigpools/worker status for the error.

Version-Release number of selected component (if applicable):
4.8 metal

How reproducible:
100% across the 3 jobs I looked at

Steps to Reproduce:
See CI

Actual results:
Fail

Expected results:
Pass

Additional info:
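The repeated eviction failures above are consistent with a PodDisruptionBudget at its limit. A minimal sketch of the PDB arithmetic, assuming a single-replica image-registry Deployment guarded by a `minAvailable: 1` PDB (the replica count is an assumption for illustration, not taken from the must-gather):

```python
# Sketch of the PodDisruptionBudget arithmetic that can wedge a drain.
# Assumed scenario: 1 healthy image-registry pod, PDB minAvailable: 1,
# so disruptionsAllowed is 0 and every eviction attempt is refused
# until the drain's global timeout (1m30s) is reached.
def disruptions_allowed(healthy_pods: int, min_available: int) -> int:
    """Evictions the PDB permits right now (never negative)."""
    return max(healthy_pods - min_available, 0)

print(disruptions_allowed(healthy_pods=1, min_available=1))  # 0 -> drain blocks
print(disruptions_allowed(healthy_pods=2, min_available=1))  # 1 -> eviction proceeds
```

With a second healthy replica the budget would allow one disruption and the drain could make progress, which is why a PDB at its limit on a single-replica workload manifests as the repeated "error when evicting pods" loop seen in the MCO logs.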
At least [1] has PodDisruptionBudgetAtLimit firing as well.

[1]: https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-e2e-metal-ipi-upgrade/1377260372473417728

*** This bug has been marked as a duplicate of bug 1944762 ***