Bug 2077305

Summary: Metal upgrades permafailing on metal3 containers crash looping
Product: OpenShift Container Platform
Reporter: OpenShift BugZilla Robot <openshift-bugzilla-robot>
Component: Bare Metal Hardware Provisioning
Sub Component: OS Image Provider
Assignee: Riccardo Pittau <rpittau>
QA Contact: Amit Ugol <augol>
Status: CLOSED ERRATA
Docs Contact:
Severity: high
Priority: high
CC: aos-bugs, derekh, lshilin, rpittau, sippy, zbitter
Version: 4.10
Keywords: OtherQA, Triaged
Target Milestone: ---   
Target Release: 4.10.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-05-18 11:51:02 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 2075024    
Bug Blocks:    

Description OpenShift BugZilla Robot 2022-04-21 01:43:45 UTC
+++ This bug was initially created as a clone of Bug #2075024 +++

periodic-ci-openshift-release-master-nightly-4.11-upgrade-from-stable-4.10-e2e-metal-ipi-upgrade

is failing frequently in CI, see:
https://testgrid.k8s.io/redhat-openshift-ocp-release-4.11-informing#periodic-ci-openshift-release-master-nightly-4.11-upgrade-from-stable-4.10-e2e-metal-ipi-upgrade

Example run:
https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.11-upgrade-from-stable-4.10-e2e-metal-ipi-upgrade/1513021466923241472

The machine-os-images containers are failing with errors like this:

$ oc logs -n openshift-machine-api metal3-image-cache-tqmn4 -c machine-os-images
2022-04-10T07:18:04.556516595Z /shared/html/images/coreos-x86_64-[vmlinuz|initrd.img|rootfs.img] are already up to date
2022-04-10T07:18:04.568801589Z cat: /shared/html/images/coreos-x86_64.iso.sha256: No such file or directory
2022-04-10T07:18:04.572850718Z Extracting ISO file
2022-04-10T07:18:04.572869633Z Adding kernel argument ip=dhcp
2022-04-10T07:18:05.373283528Z Error: persisting output file to /shared/html/images/coreos-x86_64.iso
2022-04-10T07:18:05.373283528Z 
2022-04-10T07:18:05.373283528Z Caused by:
2022-04-10T07:18:05.373283528Z     2022-04-10T07:18:05.373326819Z File exists (os error 17)

--- Additional comment from derekh on 2022-04-13 14:23:17 UTC ---

Looks like output from the coreos-installer command


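Yup, reproduced below: the second run of the same command fails because the output file already exists. Pushing a patch to remove any existing destination file.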
[root@localhost core]# /var/../coreos-installer iso kargs modify -a dd=dd -o /tmp/afile.qcow2 /var/lib/containers/storage/volumes/ironic/_data/html/images/coreos-x86_64.iso 
[root@localhost core]# /var/../coreos-installer iso kargs modify -a dd=dd -o /tmp/afile.qcow2 /var/lib/containers/storage/volumes/ironic/_data/html/images/coreos-x86_64.iso 
Error: persisting output file to /tmp/afile.qcow2

Caused by:
    File exists (os error 17)

I've pushed a PR to remove the destination file if it exists.
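
For illustration only, the workaround described above (remove the destination before writing) could look roughly like the sketch below. This is not the actual patch from the PR. The helper name run_with_fresh_output is hypothetical, and the paths in the example call are placeholders.

```shell
#!/bin/sh
# Hypothetical helper (not the real fix): clear a stale destination file
# before invoking a tool that refuses to overwrite existing output.
# Usage: run_with_fresh_output DEST COMMAND [ARGS...]
run_with_fresh_output() {
    dest=$1
    shift
    # rm -f succeeds whether or not the file exists.
    rm -f -- "$dest"
    # Run the remaining arguments as the command that writes DEST.
    "$@"
}

# Example call, mirroring the command shown in the comment above
# (placeholder paths, assumed for illustration):
# run_with_fresh_output /tmp/afile.qcow2 \
#     coreos-installer iso kargs modify -a dd=dd -o /tmp/afile.qcow2 /path/to/coreos-x86_64.iso
```

With the stale file removed first, a rerun after a container restart should no longer hit "File exists (os error 17)". A failure of the command itself is still reported through its exit status.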

Comment 5 errata-xmlrpc 2022-05-18 11:51:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.10.14
bug fix update), and where to find the updated files, follow the link
below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:2178