Bug 1772080

Summary: [IPI Baremetal]: gz openstack image suffix breaks deployment
Product: OpenShift Container Platform Reporter: Steven Hardy <shardy>
Component: InstallerAssignee: Steven Hardy <shardy>
Installer sub component: OpenShift on Bare Metal IPI QA Contact: Sasha Smolyak <ssmolyak>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: unspecified CC: augol, jrist, rbartal, stbenjam, xtian
Version: 4.3.0   
Target Milestone: ---   
Target Release: 4.3.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-01-23 11:12:28 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Steven Hardy 2019-11-13 15:17:45 UTC
Description of problem:

The recent changes to add a .gz suffix to the openstack (and qemu) images in data/data/rhcos.json have broken baremetal IPI, there are two problems:

The bootstrap VM which relies on the libvirt/qemu image no longer boots - this may be resolved via openshift/installer#2657 or something similar

The image URLs for terraform and also the machines now contains the .gz suffix, but this is wrong because we now uncompress the file in https://github.com/openshift/ironic-rhcos-downloader so we should strip the path given in the tfvars and the machine providerSpec.


How reproducible:
Always

Steps to Reproduce:
1. Attempt to deploy IPI baremetal via openshift-baremetal-install
2. Observe bootstrap VM fails to boot with a non-bootable disk image
3. Work around the bootstrap VM issue then observe the masters fail to deploy because the image URL contains the .gz suffix

Actual results:

Terraform fails with an error like:


level=error msg="Error: Bad request with: [PUT http://172.22.0.2:6385/v1/nodes/2e4048b2-6e1d-4812-9854-df4f5d4a7fcd/states/provision], error message: {\"error_message\": \"{\\\"faultcode\\\": \\\"Client\\\", \\\"faultstring\\\": \\\"Failed to validate deploy or power info for node 2e4048b2-6e1d-4812-9854-df4f5d4a7fcd. Error: Validation of image href http://172.22.0.2/images/rhcos-43.81.201911081536.0-openstack.x86_64.qcow2.gz/rhcos-43.81.201911081536.0-compressed.x86_64.qcow2.gz failed, reason: Got HTTP code 404 instead of 200 in response to HEAD request.\\\", \\\"debuginfo\\\": null}\"}"

After working around that the machine provider spec is also similarly malformed, check `oc describe machineset ostest-worker-0 -n openshift-machine-api`

Expected results:
bootstrap, master and worker deployments should work as before the gzipped images.
Additional info:
Please attach logs from ansible-playbook with the -vvv flag

Comment 2 Sasha Smolyak 2019-12-11 12:05:28 UTC
gz files are downloaded for bootstrap node and unpacked, installation passes. Verified

Comment 4 errata-xmlrpc 2020-01-23 11:12:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062