test: Create the release image "latest" containing all images built by this job is failing frequently in CI, see search results:

https://search.svc.ci.openshift.org/?maxAge=168h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job&search=operator%5C.Create+the+release+image+%22latest%22+containing+all+images+built+by+this+job

There seem to be a few different recent failures, for instance:

* https://deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gcs/origin-ci-test/pr-logs/pull/openshift_console/5747/pull-ci-openshift-console-master-images/1275420323147157504
* https://deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-kube-apiserver-operator/888/pull-ci-openshift-cluster-kube-apiserver-operator-master-images/1273439452068319232
* https://deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-kube-apiserver-operator/890/pull-ci-openshift-cluster-kube-apiserver-operator-master-images/1275435876230369280

In some cases the failure may be caused by an outage or an image retrieval failure; however, I'm opening this BZ so it can be investigated.
The build log for [1] wasn't particularly clear:

2020/06/23 13:54:40 Create release image default-route-openshift-image-registry.apps.build01.ci.devcluster.openshift.com/ci-op-r0h8tx5f/release:latest
2020/06/23 13:57:57 error: unable to signal to artifacts container to terminate in pod release-latest, triggering deletion: could not run remote command: unable to upgrade connection: container not found ("artifacts")

But it's a PR presubmit, so it's possible it was just interrupted by a new push. Checking the PR, [2] is a force-push at 13:27Z. So that push might have triggered this images job, but it is unlikely to have led to its termination. And the PR CI history [3] shows the subsequent images job is [4], starting at 2020-06-23 21:22:26Z.

So... I dunno. Possibly the CI cluster was overloaded and terminated some jobs? Or the run flaked on something else?

Do we care about failed image builds for PR presubmits? I'd expect we'd leave those to the PR authors, and only focus on image failures for release promotion jobs.

[1]: https://deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gcs/origin-ci-test/pr-logs/pull/openshift_console/5747/pull-ci-openshift-console-master-images/1275420323147157504
[2]: https://github.com/openshift/console/pull/5747#event-3472947232
[3]: https://deck-ci.apps.ci.l2s4.p1.openshiftapps.com/pr-history/?org=openshift&repo=console&pr=5747
[4]: https://deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/origin-ci-test/pr-logs/pull/openshift_console/5747/pull-ci-openshift-console-master-images/1275539719005933568
We did in fact see a large spike in the number of jobs failing to run the Pod that creates that release image:

https://grafana-prow-monitoring.apps.ci.l2s4.p1.openshiftapps.com/d/8ce131e226b7fd2901c2fce45d4e21c1/dptp-dashboard?orgId=1&from=1592932043046&to=1593018443048
This morning one of our build farms hit the following bug and failed to schedule pods, which caused these failures: https://bugzilla.redhat.com/show_bug.cgi?id=1808588#

*** This bug has been marked as a duplicate of bug 1808588 ***