Bug 2075024

Summary: Metal upgrades permafailing on metal3 containers crash looping
Product: OpenShift Container Platform Reporter: Stephen Benjamin <stbenjam>
Component: Bare Metal Hardware Provisioning Assignee: Derek Higgins <derekh>
Bare Metal Hardware Provisioning sub component: OS Image Provider QA Contact: Amit Ugol <augol>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: high CC: aos-bugs, derekh, rpittau, sippy, zbitter
Version: 4.11 Keywords: OtherQA, Triaged
Target Milestone: ---   
Target Release: 4.11.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-08-10 11:07:02 UTC Type: Bug
Bug Depends On:    
Bug Blocks: 2077305    

Description Stephen Benjamin 2022-04-13 12:38:43 UTC
periodic-ci-openshift-release-master-nightly-4.11-upgrade-from-stable-4.10-e2e-metal-ipi-upgrade

is failing frequently in CI, see:
https://testgrid.k8s.io/redhat-openshift-ocp-release-4.11-informing#periodic-ci-openshift-release-master-nightly-4.11-upgrade-from-stable-4.10-e2e-metal-ipi-upgrade

Example run:
https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.11-upgrade-from-stable-4.10-e2e-metal-ipi-upgrade/1513021466923241472

The machine-os-images containers are failing with errors like this:

$ oc logs -n openshift-machine-api metal3-image-cache-tqmn4 -c machine-os-images
2022-04-10T07:18:04.556516595Z /shared/html/images/coreos-x86_64-[vmlinuz|initrd.img|rootfs.img] are already up to date
2022-04-10T07:18:04.568801589Z cat: /shared/html/images/coreos-x86_64.iso.sha256: No such file or directory
2022-04-10T07:18:04.572850718Z Extracting ISO file
2022-04-10T07:18:04.572869633Z Adding kernel argument ip=dhcp
2022-04-10T07:18:05.373283528Z Error: persisting output file to /shared/html/images/coreos-x86_64.iso
2022-04-10T07:18:05.373283528Z 
2022-04-10T07:18:05.373283528Z Caused by:
2022-04-10T07:18:05.373283528Z     2022-04-10T07:18:05.373326819Z File exists (os error 17)

Comment 1 Derek Higgins 2022-04-13 14:23:17 UTC
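This looks like output from the coreos-installer command.

Yup, reproduced below: the second run fails because the output file already exists. Pushing a patch to remove any existing destination file.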
[root@localhost core]# /var/../coreos-installer iso kargs modify -a dd=dd -o /tmp/afile.qcow2 /var/lib/containers/storage/volumes/ironic/_data/html/images/coreos-x86_64.iso 
[root@localhost core]# /var/../coreos-installer iso kargs modify -a dd=dd -o /tmp/afile.qcow2 /var/lib/containers/storage/volumes/ironic/_data/html/images/coreos-x86_64.iso 
Error: persisting output file to /tmp/afile.qcow2

Caused by:
    File exists (os error 17)

I've pushed a PR to remove the destination file if it exists.
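
For illustration only, here is a minimal sketch of the "remove the destination first" approach described above. It is not the actual PR: `write_fresh` and `create_noclobber` are hypothetical helpers, and `create_noclobber` is just a stand-in for any tool that refuses to overwrite an existing output file, as coreos-installer does in the reproduction above.

```shell
#!/bin/sh
# Hypothetical stand-in for a tool that refuses to overwrite its output.
# Usage: create_noclobber DEST CONTENT
# With noclobber (set -C) the redirect fails if DEST already exists.
create_noclobber() {
    ( set -C; echo "$2" > "$1" ) 2>/dev/null
}

# Hypothetical helper: remove DEST if present, then run the given command,
# which is expected to create DEST itself.
# Usage: write_fresh DEST CMD [ARGS...]
write_fresh() {
    dest=$1
    shift
    rm -f -- "$dest"   # -f: no error when the file is already absent
    "$@"
}
```

Assuming the `-a` and `-o` options shown in the reproduction, the real invocation could be wrapped along the lines of `write_fresh "$dest" coreos-installer iso kargs modify -a ip=dhcp -o "$dest" "$src"`, where `$dest` and `$src` are placeholder variables. Removing the file unconditionally with `rm -f` avoids a separate existence check before the write.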

Comment 5 errata-xmlrpc 2022-08-10 11:07:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069