Bug 1850144 - [buildcop] Create the release image "latest" containing all images built by this job is failing
Summary: [buildcop] Create the release image "latest" containing all images built by this job is failing
Keywords:
Status: CLOSED DUPLICATE of bug 1808588
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Test Infrastructure
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.6.0
Assignee: Steve Kuznetsov
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-06-23 15:24 UTC by Christian Huffman
Modified: 2020-06-24 17:09 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment: operator.Create the release image "latest" containing all images built by this job
Last Closed: 2020-06-24 17:09:30 UTC
Target Upstream Version:
Embargoed:



Comment 2 W. Trevor King 2020-06-24 04:17:41 UTC
The build log for [1] wasn't particularly clear:

2020/06/23 13:54:40 Create release image default-route-openshift-image-registry.apps.build01.ci.devcluster.openshift.com/ci-op-r0h8tx5f/release:latest
2020/06/23 13:57:57 error: unable to signal to artifacts container to terminate in pod release-latest, triggering deletion: could not run remote command: unable to upgrade connection: container not found ("artifacts")
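
For context, that last error surfaces when an exec into the pod's "artifacts" container fails because the container is already gone by the time the connection is upgraded. A minimal client-go sketch of that kind of exec (the sentinel command below is a hypothetical placeholder, not ci-operator's actual code):

package sketch

import (
	"os"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/kubernetes/scheme"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/remotecommand"
)

// signalArtifacts execs into the "artifacts" container of the given pod to
// tell it to wrap up. If the pod is deleted mid-flight, the kubelet can no
// longer find the container and the SPDY upgrade fails with exactly the
// `unable to upgrade connection: container not found ("artifacts")` seen above.
func signalArtifacts(config *rest.Config, client kubernetes.Interface, namespace, podName string) error {
	req := client.CoreV1().RESTClient().Post().
		Resource("pods").
		Namespace(namespace).
		Name(podName).
		SubResource("exec").
		VersionedParams(&corev1.PodExecOptions{
			Container: "artifacts",
			Command:   []string{"touch", "/tmp/done"}, // hypothetical termination sentinel
			Stdout:    true,
			Stderr:    true,
		}, scheme.ParameterCodec)
	exec, err := remotecommand.NewSPDYExecutor(config, "POST", req.URL())
	if err != nil {
		return err
	}
	return exec.Stream(remotecommand.StreamOptions{Stdout: os.Stdout, Stderr: os.Stderr})
}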

But it's a PR presubmit, so it's possible it was just interrupted by a new push.  Checking the PR, [2] is a force-push at 13:27Z.  That might have triggered this images job, but it is unlikely to have led to its termination.  And the PR CI history [3] shows the subsequent images job is [4], starting at 2020-06-23 21:22:26Z.  So... I dunno.  Possibly the CI cluster felt overwhelmed and terminated some jobs?  (A sketch for checking the namespace events for that follows the links below.)  Or the run flaked out on something else?  Do we care about failed image builds for PR preflights?  I'd expect we'd leave those up to the PR authors, and only focus on images failures for release promotion jobs.

[1]: https://deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gcs/origin-ci-test/pr-logs/pull/openshift_console/5747/pull-ci-openshift-console-master-images/1275420323147157504
[2]: https://github.com/openshift/console/pull/5747#event-3472947232
[3]: https://deck-ci.apps.ci.l2s4.p1.openshiftapps.com/pr-history/?org=openshift&repo=console&pr=5747
[4]: https://deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/origin-ci-test/pr-logs/pull/openshift_console/5747/pull-ci-openshift-console-master-images/1275539719005933568
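
If the CI cluster evicted or preempted the pod, the namespace events would usually say so. A minimal client-go sketch (assuming kubeconfig access to the build cluster, and that the ephemeral ci-op-r0h8tx5f namespace from the log still exists) for pulling events on the release-latest pod:

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load ~/.kube/config; assumes the current context points at the build cluster.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)
	// Events mentioning the release-latest pod: Evicted/Preempted/Killing reasons
	// would point at the cluster, rather than a new push, terminating the job.
	events, err := client.CoreV1().Events("ci-op-r0h8tx5f").List(context.TODO(), metav1.ListOptions{
		FieldSelector: "involvedObject.name=release-latest",
	})
	if err != nil {
		panic(err)
	}
	for _, e := range events.Items {
		fmt.Printf("%s %s %s: %s\n", e.LastTimestamp, e.Type, e.Reason, e.Message)
	}
}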

Comment 5 Steve Kuznetsov 2020-06-24 17:07:46 UTC
We did in fact see a large spike in the number of jobs failing to run a Pod to create that release:

https://grafana-prow-monitoring.apps.ci.l2s4.p1.openshiftapps.com/d/8ce131e226b7fd2901c2fce45d4e21c1/dptp-dashboard?orgId=1&from=1592932043046&to=1593018443048
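
(That dashboard is presumably backed by a Prometheus datasource in the Prow monitoring stack. A hypothetical sketch of pulling such a spike programmatically; the endpoint and the prowjob_failures metric name below are placeholders, not the dashboard's real query:)

package main

import (
	"context"
	"fmt"
	"time"

	"github.com/prometheus/client_golang/api"
	promv1 "github.com/prometheus/client_golang/api/prometheus/v1"
)

func main() {
	// Placeholder endpoint; the real Prow monitoring address would go here.
	client, err := api.NewClient(api.Config{Address: "https://prometheus.example.com"})
	if err != nil {
		panic(err)
	}
	papi := promv1.NewAPI(client)
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	// "prowjob_failures" is a made-up metric name for illustration only.
	result, warnings, err := papi.Query(ctx, `sum(rate(prowjob_failures[1h]))`, time.Now())
	if err != nil {
		panic(err)
	}
	if len(warnings) > 0 {
		fmt.Println("warnings:", warnings)
	}
	fmt.Println(result)
}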

Comment 6 Steve Kuznetsov 2020-06-24 17:09:30 UTC
This morning one of our build farms hit the bug linked below and failed to schedule pods, which caused these failures:

https://bugzilla.redhat.com/show_bug.cgi?id=1808588

*** This bug has been marked as a duplicate of bug 1808588 ***

