Bug 2075024 - Metal upgrades permafailing on metal3 containers crash looping
Summary: Metal upgrades permafailing on metal3 containers crash looping
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Bare Metal Hardware Provisioning
Version: 4.11
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.11.0
Assignee: Derek Higgins
QA Contact: Amit Ugol
URL:
Whiteboard:
Depends On:
Blocks: 2077305
Reported: 2022-04-13 12:38 UTC by Stephen Benjamin
Modified: 2022-08-10 11:07 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-10 11:07:02 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift machine-os-images pull 13 | 0 | None | open | Bug 2075024: Remove destination coreos if it exists | 2022-04-13 14:20:56 UTC
Red Hat Product Errata | RHSA-2022:5069 | 0 | None | None | None | 2022-08-10 11:07:26 UTC

Description Stephen Benjamin 2022-04-13 12:38:43 UTC
The job periodic-ci-openshift-release-master-nightly-4.11-upgrade-from-stable-4.10-e2e-metal-ipi-upgrade
is failing frequently in CI, see:
https://testgrid.k8s.io/redhat-openshift-ocp-release-4.11-informing#periodic-ci-openshift-release-master-nightly-4.11-upgrade-from-stable-4.10-e2e-metal-ipi-upgrade

Example run:
https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.11-upgrade-from-stable-4.10-e2e-metal-ipi-upgrade/1513021466923241472

The machine-os-images containers are failing with errors like this:

$ oc logs -n openshift-machine-api metal3-image-cache-tqmn4 -c machine-os-images
2022-04-10T07:18:04.556516595Z /shared/html/images/coreos-x86_64-[vmlinuz|initrd.img|rootfs.img] are already up to date
2022-04-10T07:18:04.568801589Z cat: /shared/html/images/coreos-x86_64.iso.sha256: No such file or directory
2022-04-10T07:18:04.572850718Z Extracting ISO file
2022-04-10T07:18:04.572869633Z Adding kernel argument ip=dhcp
2022-04-10T07:18:05.373283528Z Error: persisting output file to /shared/html/images/coreos-x86_64.iso
2022-04-10T07:18:05.373283528Z 
2022-04-10T07:18:05.373283528Z Caused by:
2022-04-10T07:18:05.373283528Z     2022-04-10T07:18:05.373326819Z File exists (os error 17)

Comment 1 Derek Higgins 2022-04-13 14:23:17 UTC
Looks like output from the coreos-installer command.

Yup, reproduced below: the second run fails because the output file already exists. Pushing a patch to remove any existing dest file.
[root@localhost core]# /var/../coreos-installer iso kargs modify -a dd=dd -o /tmp/afile.qcow2 /var/lib/containers/storage/volumes/ironic/_data/html/images/coreos-x86_64.iso 
[root@localhost core]# /var/../coreos-installer iso kargs modify -a dd=dd -o /tmp/afile.qcow2 /var/lib/containers/storage/volumes/ironic/_data/html/images/coreos-x86_64.iso 
Error: persisting output file to /tmp/afile.qcow2

Caused by:
    File exists (os error 17)

I've pushed a PR to remove the dst file if it exists.
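
For illustration, the approach described above (remove the destination before regenerating it) could be sketched roughly as follows. This is a minimal sketch, not the contents of the actual PR: the helper name remove_stale_dest and the SRC_ISO variable are hypothetical, and the coreos-installer call is left commented out because its exact arguments in the machine-os-images script are not shown in this report.

```shell
#!/bin/sh
# Hypothetical helper (not from the PR): delete a leftover output file so a
# tool that refuses to overwrite its destination can write it again.
remove_stale_dest() {
    dest="$1"
    if [ -e "$dest" ]; then
        echo "Removing existing $dest"
        rm -f -- "$dest"
    fi
}

# Usage sketch. The destination path comes from the log in the description;
# SRC_ISO is an assumed variable holding the source ISO path.
# remove_stale_dest /shared/html/images/coreos-x86_64.iso
# coreos-installer iso kargs modify -a ip=dhcp \
#     -o /shared/html/images/coreos-x86_64.iso "$SRC_ISO"
```

Calling the helper on a path that does not exist is a no-op, so it should be safe to run on both fresh installs and upgrades.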

Comment 5 errata-xmlrpc 2022-08-10 11:07:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069
