Bug 2077305 - Metal upgrades permafailing on metal3 containers crash looping
Summary: Metal upgrades permafailing on metal3 containers crash looping
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Bare Metal Hardware Provisioning
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.10.z
Assignee: Riccardo Pittau
QA Contact: Amit Ugol
URL:
Whiteboard:
Depends On: 2075024
Blocks:
Reported: 2022-04-21 01:43 UTC by OpenShift BugZilla Robot
Modified: 2022-05-18 11:51 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-05-18 11:51:02 UTC
Target Upstream Version:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift machine-os-images pull 14 | 0 | None | open | [release-4.10] Bug 2077305: Remove destination coreos if it exists | 2022-04-26 01:00:28 UTC
Red Hat Product Errata | RHBA-2022:2178 | 0 | None | Waiting on Customer | [Sat6.8] Satellite RAM requirements seem to have no upper bound | 2022-05-19 09:35:21 UTC

Description OpenShift BugZilla Robot 2022-04-21 01:43:45 UTC
+++ This bug was initially created as a clone of Bug #2075024 +++

periodic-ci-openshift-release-master-nightly-4.11-upgrade-from-stable-4.10-e2e-metal-ipi-upgrade

is failing frequently in CI, see:
https://testgrid.k8s.io/redhat-openshift-ocp-release-4.11-informing#periodic-ci-openshift-release-master-nightly-4.11-upgrade-from-stable-4.10-e2e-metal-ipi-upgrade

Example run:
https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.11-upgrade-from-stable-4.10-e2e-metal-ipi-upgrade/1513021466923241472

The machine-os-images containers are failing with errors like this:

$ oc logs -n openshift-machine-api metal3-image-cache-tqmn4 -c machine-os-images
2022-04-10T07:18:04.556516595Z /shared/html/images/coreos-x86_64-[vmlinuz|initrd.img|rootfs.img] are already up to date
2022-04-10T07:18:04.568801589Z cat: /shared/html/images/coreos-x86_64.iso.sha256: No such file or directory
2022-04-10T07:18:04.572850718Z Extracting ISO file
2022-04-10T07:18:04.572869633Z Adding kernel argument ip=dhcp
2022-04-10T07:18:05.373283528Z Error: persisting output file to /shared/html/images/coreos-x86_64.iso
2022-04-10T07:18:05.373283528Z 
2022-04-10T07:18:05.373283528Z Caused by:
2022-04-10T07:18:05.373283528Z     2022-04-10T07:18:05.373326819Z File exists (os error 17)

--- Additional comment from derekh on 2022-04-13 14:23:17 UTC ---

Looks like output from the coreos-installer command


yup, pushing a patch to remove any dest file
[root@localhost core]# /var/../coreos-installer iso kargs modify -a dd=dd -o /tmp/afile.qcow2 /var/lib/containers/storage/volumes/ironic/_data/html/images/coreos-x86_64.iso 
[root@localhost core]# /var/../coreos-installer iso kargs modify -a dd=dd -o /tmp/afile.qcow2 /var/lib/containers/storage/volumes/ironic/_data/html/images/coreos-x86_64.iso 
Error: persisting output file to /tmp/afile.qcow2

Caused by:
    File exists (os error 17)

I've pushed a PR to remove the dst file if it exists
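
For illustration, the "remove the destination first" approach could look roughly like the sketch below. This is not the code from the actual PR. The helper name `regenerate_output` is hypothetical, and the sketch only assumes that the wrapped command refuses to overwrite an existing output file, as the error above suggests.

```shell
#!/bin/sh
set -eu

# Hypothetical helper (not taken from the real fix): delete any stale
# output file, then run a command that refuses to overwrite its output.
# Usage: regenerate_output DEST CMD [ARGS...]
regenerate_output() {
    dest=$1
    shift
    # rm -f succeeds whether or not the file already exists.
    rm -f -- "$dest"
    # Run the remaining arguments as the command that (re)creates DEST.
    "$@"
}

# Example invocation, using the same coreos-installer subcommand and flags
# shown in this bug ($src and $dest are placeholder variables):
#   regenerate_output "$dest" \
#       coreos-installer iso kargs modify -a ip=dhcp -o "$dest" "$src"
```

With a wrapper like this, a restarted container would no longer trip over an output file left behind by a previous run, at the cost of regenerating the file every time.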

Comment 5 errata-xmlrpc 2022-05-18 11:51:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.10.14 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:2178