Created attachment 1712271 [details] ironic inspector log for master

Description of problem:
While deploying workers in a virtual simulation environment, the ironic inspector ignores the root device hints set in install-config.yaml.

Version-Release number of the following components:
Client Version: 4.6.0-0.nightly-2020-08-23-111716
Server Version: 4.6.0-0.nightly-2020-08-23-111716
Kubernetes Version: v1.19.0-rc.2+3e083ac-dirty

How reproducible:
100%

Steps to Reproduce:
1. Provision 3 masters and 2 workers, each with 2 disks big enough to deploy CoreOS (52G each)
2. In install-config.yaml, add to all nodes:
   rootDeviceHints:
     deviceName: /dev/sdb
3. Run deploy

Actual results:
From the metal3-ironic-inspector log (worker):

2020-08-23 18:43:03.769 1 DEBUG ironic_inspector.process [-] [node: 97e04e17-51bc-47dd-9e7b-b1d617b5a784 state processing MAC 52:54:00:44:bc:2f] Running post-processing hook root_disk_selection _run_post_hooks /usr/lib/python3.6/site-packages/ironic_inspector/process.py:268
2020-08-23 18:43:03.770 1 DEBUG ironic_inspector.plugins.standard [-] [node: 97e04e17-51bc-47dd-9e7b-b1d617b5a784 state processing MAC 52:54:00:44:bc:2f] Root device hints are not provided _process_root_device_hints /usr/lib/python3.6/site-packages/ironic_inspector/plugins/standard.py:46

Expected results:
For masters, the log contains the following lines:

2020-08-23 18:18:47.500 1 DEBUG ironic_inspector.process [-] [node: b625385c-c4fd-4989-a847-c125d7b13da4 state processing MAC 52:54:00:c3:3c:7f] Running post-processing hook root_disk_selection _run_post_hooks /usr/lib/python3.6/site-packages/ironic_inspector/process.py:268
2020-08-23 18:18:47.501 1 DEBUG ironic_lib.utils [-] Trying to find devices from "/dev/sda, /dev/sdb" that match the device hints "{'name': 's== /dev/sdb'}" find_devices_by_hints /usr/lib/python3.6/site-packages/ironic_lib/utils.py:362
2020-08-23 18:18:47.503 1 DEBUG ironic_lib.utils [-] Trying to match the device hint "name" with a value of "s== /dev/sdb" against the same device's (/dev/sda) attribute with a value of "/dev/sda" find_devices_by_hints /usr/lib/python3.6/site-packages/ironic_lib/utils.py:393
2020-08-23 18:18:47.505 1 DEBUG ironic_lib.utils [-] The attribute "name" (with value "/dev/sda") of device "/dev/sda" does not match the hint s== /dev/sdb find_devices_by_hints /usr/lib/python3.6/site-packages/ironic_lib/utils.py:418
2020-08-23 18:18:47.514 1 DEBUG ironic_lib.utils [-] Trying to match the device hint "name" with a value of "s== /dev/sdb" against the same device's (/dev/sdb) attribute with a value of "/dev/sdb" find_devices_by_hints /usr/lib/python3.6/site-packages/ironic_lib/utils.py:393
2020-08-23 18:18:47.516 1 INFO ironic_lib.utils [-] Root device found! The device "{'name': '/dev/sdb', 'model': 'QEMU HARDDISK', 'size': 55834574848, 'rotational': True, 'wwn': None, 'serial': 'drive-scsi0-0-0-1', 'vendor': 'QEMU', 'wwn_with_extension': None, 'wwn_vendor_extension': None, 'hctl': '0:0:0:1', 'by_path': '/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:1'}" matches the root device hints {'name': 's== /dev/sdb'}

The same is expected for workers.

Additional info:
Attaching ironic inspector logs for masters and workers, and install-config.yaml.
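For readers unfamiliar with the hint syntax in the expected log output above, here is a minimal sketch of how a name hint such as "s== /dev/sdb" (string equality) could be checked against each candidate device in turn. The function name find_root_device and its behaviour for empty hints are assumptions made for illustration; this is not the ironic_lib implementation and it handles only the "s==" operator.

```python
# Hypothetical, simplified illustration of name-based root device hint
# matching. This is NOT the ironic_lib code; it only understands the
# "s==" (string equality) operator that appears in the log above.


def find_root_device(devices, hints):
    """Return the first device whose attributes satisfy every hint, else None."""
    if not hints:
        # No hints supplied: this sketch makes no choice and leaves the
        # fallback to the caller.
        return None
    for device in devices:
        for attr, expression in hints.items():
            operator, _, expected = expression.partition(" ")
            if operator != "s==":
                raise ValueError(f"unsupported operator in this sketch: {operator}")
            if str(device.get(attr)) != expected:
                break  # this device fails a hint; move on to the next device
        else:
            return device  # no hint failed, so this device matches
    return None


# Device names taken from the log; all other attributes are omitted.
devices = [{"name": "/dev/sda"}, {"name": "/dev/sdb"}]
root = find_root_device(devices, {"name": "s== /dev/sdb"})
print(root["name"])
```

In the worker log the hints never reach this stage ("Root device hints are not provided"), which corresponds to the empty-hints branch of the sketch.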
Created attachment 1712273 [details] ironic inspector logs for workers
Created attachment 1712274 [details] install-config.yaml
Created attachment 1712342 [details] output of 'openstack baremetal node show openshift-master-0-0'
Created attachment 1712343 [details] output of 'openstack baremetal node show openshift-master-0-0'
Created attachment 1712344 [details] output of 'oc describe bmh openshift-worker-0-1'
I don't see where we're setting the root device hints on the BMH object for workers: https://github.com/openshift/installer/blob/master/pkg/asset/machines/baremetal/hosts.go
*** Bug 1875745 has been marked as a duplicate of this bug. ***
Client Version: 4.6.0-0.nightly-2020-09-10-121352
Server Version: 4.6.0-0.nightly-2020-09-10-121352
Kubernetes Version: v1.19.0-rc.2+068702d

The same problem: rootDeviceHints: deviceName: /dev/sdb is ignored on workers.

[kni@provisionhost-0-0 ~]$ ssh core@worker-0-0 lsblk
NAME                         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                            8:0    0   25G  0 disk
├─sda1                         8:1    0  384M  0 part /boot
├─sda2                         8:2    0  127M  0 part /boot/efi
├─sda3                         8:3    0    1M  0 part
├─sda4                         8:4    0 24.4G  0 part
│ └─coreos-luks-root-nocrypt 253:0    0 24.4G  0 dm   /sysroot
└─sda5                         8:5    0   65M  0 part
sdb                            8:16   0   45G  0 disk

"Root device hints are not provided" is reported in the metal3-ironic-inspector log.
Which version of the installer are you using to test?
It sounds like this may be the same issue as https://bugzilla.redhat.com/show_bug.cgi?id=1867744, in which IPA was not using the configured root device hint. There is a fix merged upstream, https://review.opendev.org/#/c/747072/, and it's available in ironic-ipa-downloader-container-v4.6.0-202009082256.p0. Can you try recent ironic images?
Created attachment 1714680 [details] updated worker ironic-inspector log
Created attachment 1714682 [details] metal3-ironic-conductor log
# ./openshift-baremetal-install version
./openshift-baremetal-install 4.6.0-0.nightly-2020-09-12-230035
built from commit 0a4cc6c428c9d1aaf11d7f05002ab7637cdb872f
release image registry.ocp-edge-cluster-0.qe.lab.redhat.com:5000/localimages/local-release-image@sha256:c6147df7325dfdc4b526a9d799870ccf1fd04334570d01e715376947eb0473f8
Created attachment 1714683 [details] manifest
Bob is looking at the backport for the additional patch to IPA, so I'm going to reassign this to him.
Lubov - yes, the fix has merged and the package has been tagged (https://bugzilla.redhat.com/show_bug.cgi?id=1878856) and should be available.
Verified on:
Client Version: 4.6.0-0.nightly-2020-09-24-095222
Server Version: 4.6.0-0.nightly-2020-09-24-095222
Kubernetes Version: v1.19.0+fff8183

[kni@provisionhost-0-0 ~]$ ssh core@worker-0-1 lsblk
NAME                         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                            8:0    0   25G  0 disk
└─sda1                         8:1    0   25G  0 part
sdb                            8:16   0   45G  0 disk
├─sdb1                         8:17   0  384M  0 part /boot
├─sdb2                         8:18   0  127M  0 part /boot/efi
├─sdb3                         8:19   0    1M  0 part
├─sdb4                         8:20   0 44.4G  0 part
│ └─coreos-luks-root-nocrypt 253:0    0 44.4G  0 dm   /sysroot
└─sdb5                         8:21   0   65M  0 part

[kni@provisionhost-0-0 ~]$ ssh core@worker-0-0 lsblk
NAME                         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                            8:0    0   25G  0 disk
└─sda1                         8:1    0   25G  0 part
sdb                            8:16   0   45G  0 disk
├─sdb1                         8:17   0  384M  0 part /boot
├─sdb2                         8:18   0  127M  0 part /boot/efi
├─sdb3                         8:19   0    1M  0 part
├─sdb4                         8:20   0 44.4G  0 part
│ └─coreos-luks-root-nocrypt 253:0    0 44.4G  0 dm   /sysroot
└─sdb5                         8:21   0   65M  0 part
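The manual lsblk check above could be scripted when many workers need verifying. The following is a rough sketch that assumes the JSON form of the output (lsblk --json) with a top-level "blockdevices" list, nested "children", and a "mountpoint" key per node; the helper name disk_hosting is hypothetical, and the sample document is hand-written from the layout shown above rather than captured from a node.

```python
import json


def disk_hosting(lsblk_json, mountpoint):
    """Return the name of the top-level disk whose subtree has the mountpoint."""

    def contains(node):
        # Check this node, then walk partitions and device-mapper children.
        if node.get("mountpoint") == mountpoint:
            return True
        return any(contains(child) for child in node.get("children", []))

    for disk in json.loads(lsblk_json)["blockdevices"]:
        if contains(disk):
            return disk["name"]
    return None


# Hand-written sample mirroring (a trimmed version of) the verified layout.
sample = json.dumps({
    "blockdevices": [
        {"name": "sda", "mountpoint": None,
         "children": [{"name": "sda1", "mountpoint": None}]},
        {"name": "sdb", "mountpoint": None,
         "children": [
             {"name": "sdb1", "mountpoint": "/boot"},
             {"name": "sdb4", "mountpoint": None,
              "children": [{"name": "coreos-luks-root-nocrypt",
                            "mountpoint": "/sysroot"}]},
         ]},
    ]
})

print(disk_hosting(sample, "/sysroot"))
```

A worker passes the check when the returned disk matches the deviceName hint (here sdb for /dev/sdb). Key names in the JSON output can differ between util-linux versions, so the "mountpoint" assumption should be confirmed on the target image.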
I'm setting this to "No Doc Update" because it's a new feature and the bug was found before the release.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196