1850144 – [buildcop] Create the release image "latest" containing all images built by this job is failing

Keywords:
Status:	CLOSED DUPLICATE of bug 1808588
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Test Infrastructure
Sub Component:
Version:	4.6
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	4.6.0
Assignee:	Steve Kuznetsov
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2020-06-23 15:24 UTC by Christian Huffman
Modified:	2020-06-24 17:09 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:	operator.Create the release image "latest" containing all images built by this job
Last Closed:	2020-06-24 17:09:30 UTC
Target Upstream Version:
Embargoed:

Description Christian Huffman 2020-06-23 15:24:06 UTC

The test 'Create the release image "latest" containing all images built by this job' is failing frequently in CI. See the search results:
https://search.svc.ci.openshift.org/?maxAge=168h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job&search=operator%5C.Create+the+release+image+%22latest%22+containing+all+images+built+by+this+job
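For anyone re-running that search, the query is easier to read once the URL is decoded. The following is a minimal sketch using only the Python standard library; it just splits and percent-decodes the URL above. What each parameter means to the search service is not stated in this report, so reading maxAge=168h as a seven-day look-back window is an assumption.

```python
from urllib.parse import urlsplit, parse_qs

# The search URL from the description, copied verbatim.
SEARCH_URL = (
    "https://search.svc.ci.openshift.org/?maxAge=168h&context=1"
    "&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job"
    "&search=operator%5C.Create+the+release+image+%22latest%22"
    "+containing+all+images+built+by+this+job"
)

# keep_blank_values=True preserves the empty "name=" parameter.
params = parse_qs(urlsplit(SEARCH_URL).query, keep_blank_values=True)

for key, values in params.items():
    print(f"{key} = {values[0]!r}")
```

The decoded search value is a regular expression in which the dot after "operator" is escaped, so it matches the test name recorded in the Environment field literally.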

There seem to be a few different kinds of recent failure, for instance:

* https://deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gcs/origin-ci-test/pr-logs/pull/openshift_console/5747/pull-ci-openshift-console-master-images/1275420323147157504
* https://deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-kube-apiserver-operator/888/pull-ci-openshift-cluster-kube-apiserver-operator-master-images/1273439452068319232
* https://deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-kube-apiserver-operator/890/pull-ci-openshift-cluster-kube-apiserver-operator-master-images/1275435876230369280

In some cases it may be caused by an outage or image retrieval failure; however, I'm opening this BZ to see if it can be investigated.

Comment 2 W. Trevor King 2020-06-24 04:17:41 UTC

The build log for [1] wasn't particularly clear:

2020/06/23 13:54:40 Create release image default-route-openshift-image-registry.apps.build01.ci.devcluster.openshift.com/ci-op-r0h8tx5f/release:latest
2020/06/23 13:57:57 error: unable to signal to artifacts container to terminate in pod release-latest, triggering deletion: could not run remote command: unable to upgrade connection: container not found ("artifacts")

But it's a PR presubmit, so it's possible it was just interrupted by a new push. Checking the PR, [2] is a force-push at 13:27Z. That might have triggered this images job, but it is unlikely to have led to its termination. And the PR CI history [3] shows the subsequent images job is [4], starting at 2020-06-23 21:22:26Z. So... I dunno. Possibly the CI cluster felt overwhelmed and terminated some jobs? Or the run flaked out on something else? Do we care about failed image builds for PR presubmits? I'd expect we'd leave those up to the PR authors, and only focus on images failures for release promotion jobs.

[1]: https://deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gcs/origin-ci-test/pr-logs/pull/openshift_console/5747/pull-ci-openshift-console-master-images/1275420323147157504
[2]: https://github.com/openshift/console/pull/5747#event-3472947232
[3]: https://deck-ci.apps.ci.l2s4.p1.openshiftapps.com/pr-history/?org=openshift&repo=console&pr=5747
[4]: https://deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/origin-ci-test/pr-logs/pull/openshift_console/5747/pull-ci-openshift-console-master-images/1275539719005933568
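As a rough aid for reading the two build-log lines quoted in this comment, here is a minimal sketch that computes how long the release-image step ran before the error was logged. It assumes only that each line starts with a "YYYY/MM/DD HH:MM:SS" timestamp, as both quoted lines do; the helper name parse_timestamp is made up for this example.

```python
from datetime import datetime

# The two build-log lines quoted in this comment, copied verbatim.
LOG_LINES = [
    "2020/06/23 13:54:40 Create release image "
    "default-route-openshift-image-registry.apps.build01.ci.devcluster.openshift.com"
    "/ci-op-r0h8tx5f/release:latest",
    "2020/06/23 13:57:57 error: unable to signal to artifacts container to "
    "terminate in pod release-latest, triggering deletion: could not run remote "
    'command: unable to upgrade connection: container not found ("artifacts")',
]


def parse_timestamp(line):
    """Parse the leading 19-character 'YYYY/MM/DD HH:MM:SS' prefix of a log line."""
    return datetime.strptime(line[:19], "%Y/%m/%d %H:%M:%S")


started, failed = (parse_timestamp(line) for line in LOG_LINES)
elapsed = failed - started
print(elapsed)  # 0:03:17
```

The step therefore ran for a little over three minutes before failing, which starts about 27 minutes after the 13:27Z force-push.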

Comment 5 Steve Kuznetsov 2020-06-24 17:07:46 UTC

We did in fact see a large spike in the number of jobs failing to run a Pod to create that release:

https://grafana-prow-monitoring.apps.ci.l2s4.p1.openshiftapps.com/d/8ce131e226b7fd2901c2fce45d4e21c1/dptp-dashboard?orgId=1&from=1592932043046&to=1593018443048
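The time window that dashboard link covers can be read off its from and to query parameters. A minimal sketch, assuming they are Unix epoch timestamps in milliseconds (the usual Grafana convention); the helper name from_epoch_ms is made up for this example.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlsplit, parse_qs

# The dashboard URL from this comment, copied verbatim.
DASHBOARD_URL = (
    "https://grafana-prow-monitoring.apps.ci.l2s4.p1.openshiftapps.com"
    "/d/8ce131e226b7fd2901c2fce45d4e21c1/dptp-dashboard"
    "?orgId=1&from=1592932043046&to=1593018443048"
)

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_epoch_ms(value):
    """Convert an epoch-milliseconds string to an aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=int(value))


query = parse_qs(urlsplit(DASHBOARD_URL).query)
window_start = from_epoch_ms(query["from"][0])
window_end = from_epoch_ms(query["to"][0])

print(window_start.isoformat())
print(window_end.isoformat())
print(window_end - window_start)
```

Under that assumption the link covers roughly the 24 hours ending at 2020-06-24 17:07 UTC, which includes the failing run from 2020-06-23 13:54 UTC examined in comment 2.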

Comment 6 Steve Kuznetsov 2020-06-24 17:09:30 UTC

This morning one of our build farms hit the bug linked below and failed to schedule pods, which caused these failures:

https://bugzilla.redhat.com/show_bug.cgi?id=1808588#

*** This bug has been marked as a duplicate of bug 1808588 ***
