Bug 1871653 - Root device hints are ignored while deploying workers
Summary: Root device hints are ignored while deploying workers
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Target Release: 4.6.0
Assignee: Bob Fournier
QA Contact: Lubov
URL:
Whiteboard:
Duplicates: 1875745
Depends On:
Blocks: 1805237 1864092
Reported: 2020-08-23 19:09 UTC by Lubov
Modified: 2020-10-27 16:31 UTC

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-27 16:31:09 UTC
Target Upstream Version:
Embargoed:


Attachments
ironic inspector log for master (267.15 KB, text/plain)
2020-08-23 19:09 UTC, Lubov
ironic inspector logs for workers (303.48 KB, text/plain)
2020-08-23 19:10 UTC, Lubov
install-config.yaml (5.58 KB, text/plain)
2020-08-23 19:11 UTC, Lubov
output of 'openstack baremetal node show openshift-master-0-0' (25.01 KB, text/plain)
2020-08-24 11:57 UTC, Lubov
output of 'openstack baremetal node show openshift-master-0-0' (25.01 KB, text/plain)
2020-08-24 11:57 UTC, Lubov
output of 'oc describe bmh openshift-worker-0-1' (9.22 KB, text/plain)
2020-08-24 11:58 UTC, Lubov
updated worker ironic-inspector log (527.09 KB, text/plain)
2020-09-13 12:28 UTC, Lubov
metal3-ironic-conductor log (4.68 MB, text/plain)
2020-09-13 12:39 UTC, Lubov
manifest (120.00 KB, application/x-tar)
2020-09-13 12:48 UTC, Lubov


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift installer pull 4088 | 0 | None | closed | Bug 1871653: baremetal: set root device hints on host resources | 2021-01-13 19:38:11 UTC
OpenStack gerrit | 747072 | 0 | None | MERGED | Update the cache if we don't have a root device hint | 2021-01-13 19:38:11 UTC
OpenStack gerrit | 750823 | 0 | None | MERGED | Fix backup node lookup | 2021-01-13 19:38:50 UTC
Red Hat Product Errata | RHBA-2020:4196 | 0 | None | None | None | 2020-10-27 16:31:13 UTC

Description Lubov 2020-08-23 19:09:56 UTC
Created attachment 1712271 [details]
ironic inspector log for master

Description of problem:
While deploying workers in a virtual (simulated) environment, the ironic inspector ignores the root device hints set in install-config.yaml.

Version-Release number of the following components:
Client Version: 4.6.0-0.nightly-2020-08-23-111716
Server Version: 4.6.0-0.nightly-2020-08-23-111716
Kubernetes Version: v1.19.0-rc.2+3e083ac-dirty


How reproducible:
100%

Steps to Reproduce:
1. Provision 3 masters and 2 workers, each with 2 disks large enough to deploy CoreOS (52G each)
2. In install-config.yaml, add the following to all nodes:
        rootDeviceHints:
          deviceName: /dev/sdb
3. Run deploy

Actual results:
From metal3-ironic-inspector log:
2020-08-23 18:43:03.769 1 DEBUG ironic_inspector.process [-] [node: 97e04e17-51bc-47dd-9e7b-b1d617b5a784 state processing MAC 52:54:00:44:bc:2f] Running post-processing hook root_disk_selection _run_post_hooks /usr/lib/python3.6/site-packages/ironic_inspector/process.py:268
2020-08-23 18:43:03.770 1 DEBUG ironic_inspector.plugins.standard [-] [node: 97e04e17-51bc-47dd-9e7b-b1d617b5a784 state processing MAC 52:54:00:44:bc:2f] Root device hints are not provided _process_root_device_hints /usr/lib/python3.6/site-packages/ironic_inspector/plugins/standard.py:46
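
For anyone triaging a similar report, the affected nodes can be pulled out of a saved inspector log with a few lines of Python. This is only a sketch: the function name nodes_without_hints is made up for illustration, and it assumes the log lines keep the "[node: <uuid> ..." layout quoted above.

```python
import re

# Matches the node UUID in ironic-inspector log lines shaped like the ones
# quoted in this report (assumption: the "[node: <uuid> " prefix is stable).
NODE_RE = re.compile(r"\[node: ([0-9a-f-]{36}) ")
MISSING = "Root device hints are not provided"


def nodes_without_hints(log_text):
    """Return sorted UUIDs of nodes for which the inspector logged no hints."""
    found = set()
    for line in log_text.splitlines():
        if MISSING in line:
            match = NODE_RE.search(line)
            if match:
                found.add(match.group(1))
    return sorted(found)
```

Feeding it the contents of the attached worker log should list each worker node UUID once, however many times the message repeats.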

Expected results:
For masters there are the following lines:
2020-08-23 18:18:47.500 1 DEBUG ironic_inspector.process [-] [node: b625385c-c4fd-4989-a847-c125d7b13da4 state processing MAC 52:54:00:c3:3c:7f] Running post-processing hook root_disk_selection _run_post_hooks /usr/lib/python3.6/site-packages/ironic_inspector/process.py:268
2020-08-23 18:18:47.501 1 DEBUG ironic_lib.utils [-] Trying to find devices from "/dev/sda, /dev/sdb" that match the device hints "{'name': 's== /dev/sdb'}" find_devices_by_hints /usr/lib/python3.6/site-packages/ironic_lib/utils.py:362
2020-08-23 18:18:47.503 1 DEBUG ironic_lib.utils [-] Trying to match the device hint "name" with a value of "s== /dev/sdb" against the same device's (/dev/sda) attribute with a value of "/dev/sda" find_devices_by_hints /usr/lib/python3.6/site-packages/ironic_lib/utils.py:393
2020-08-23 18:18:47.505 1 DEBUG ironic_lib.utils [-] The attribute "name" (with value "/dev/sda") of device "/dev/sda" does not match the hint s== /dev/sdb find_devices_by_hints /usr/lib/python3.6/site-packages/ironic_lib/utils.py:418
2020-08-23 18:18:47.514 1 DEBUG ironic_lib.utils [-] Trying to match the device hint "name" with a value of "s== /dev/sdb" against the same device's (/dev/sdb) attribute with a value of "/dev/sdb" find_devices_by_hints /usr/lib/python3.6/site-packages/ironic_lib/utils.py:393
2020-08-23 18:18:47.516 1 INFO ironic_lib.utils [-] Root device found! The device "{'name': '/dev/sdb', 'model': 'QEMU HARDDISK', 'size': 55834574848, 'rotational': True, 'wwn': None, 'serial': 'drive-scsi0-0-0-1', 'vendor': 'QEMU', 'wwn_with_extension': None, 'wwn_vendor_extension': None, 'hctl': '0:0:0:1', 'by_path': '/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:1'}" matches the root device hints {'name': 's== /dev/sdb'}

The same is expected for workers
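
To make the master log lines above easier to follow, here is a simplified illustration of the matching they describe: each hint carries an operator prefix ("s==" for string equality) and is compared against the same-named attribute of every candidate device. This is not the ironic_lib implementation; parse_hint and find_root_device are hypothetical names, and only the "s==" operator seen in the log is handled.

```python
def parse_hint(raw):
    """Split a hint such as 's== /dev/sdb' into (operator, value)."""
    op, _, value = raw.partition(" ")
    return op, value


def find_root_device(devices, hints):
    """Return the first device whose attributes satisfy every 's==' hint.

    Returns None when no device matches. With an empty hints dict every
    device trivially matches and the first one is returned; that is a
    property of this sketch, not a claim about Ironic's fallback behaviour.
    """
    for dev in devices:
        matched = True
        for attr, raw in hints.items():
            op, value = parse_hint(raw)
            if op != "s==":
                raise ValueError(f"operator not covered by this sketch: {op}")
            if str(dev.get(attr)) != value:
                matched = False
                break
        if matched:
            return dev
    return None
```

Called with the two devices from the log and the hint {'name': 's== /dev/sdb'}, the sketch skips /dev/sda and returns the /dev/sdb entry, mirroring the "Root device found!" line.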


Additional info:
attaching ironic inspector logs for masters and workers and install-config.yaml

Comment 1 Lubov 2020-08-23 19:10:56 UTC
Created attachment 1712273 [details]
ironic inspector logs for workers

Comment 2 Lubov 2020-08-23 19:11:20 UTC
Created attachment 1712274 [details]
install-config.yaml

Comment 3 Lubov 2020-08-24 11:57:23 UTC
Created attachment 1712342 [details]
output of 'openstack baremetal node show openshift-master-0-0'

Comment 4 Lubov 2020-08-24 11:57:31 UTC
Created attachment 1712343 [details]
output of 'openstack baremetal node show openshift-master-0-0'

Comment 5 Lubov 2020-08-24 11:58:38 UTC
Created attachment 1712344 [details]
output of 'oc describe bmh openshift-worker-0-1'

Comment 6 Stephen Benjamin 2020-08-24 15:49:24 UTC
I don't see where we're setting the root device hints on the BMH object for workers:

https://github.com/openshift/installer/blob/master/pkg/asset/machines/baremetal/hosts.go
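
As a rough picture of what "setting the root device hints on the BMH object" means, the sketch below copies rootDeviceHints from an install-config host entry into the spec of a BareMetalHost manifest. The real installer code is Go and is not reproduced here; build_bmh, the input field layout, and the default namespace are assumptions made for illustration, and BMC credentials are omitted for brevity.

```python
def build_bmh(host, namespace="openshift-machine-api"):
    """Build a BareMetalHost manifest (as a dict) from an install-config host.

    Hypothetical helper: only a handful of fields are carried over.
    """
    spec = {
        "online": True,
        "bootMACAddress": host["bootMACAddress"],
        "bmc": {"address": host["bmc"]["address"]},
    }
    # The step this bug is about: propagate the hints when they are present.
    hints = host.get("rootDeviceHints")
    if hints:
        spec["rootDeviceHints"] = dict(hints)
    return {
        "apiVersion": "metal3.io/v1alpha1",
        "kind": "BareMetalHost",
        "metadata": {"name": host["name"], "namespace": namespace},
        "spec": spec,
    }
```

If the hints are dropped at this step, the resulting host resource has no rootDeviceHints in its spec, which would be consistent with the "Root device hints are not provided" message seen for workers.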

Comment 7 Doug Hellmann 2020-09-04 20:46:55 UTC
*** Bug 1875745 has been marked as a duplicate of this bug. ***

Comment 10 Lubov 2020-09-10 18:32:07 UTC
Client Version: 4.6.0-0.nightly-2020-09-10-121352
Server Version: 4.6.0-0.nightly-2020-09-10-121352
Kubernetes Version: v1.19.0-rc.2+068702d

The same problem persists: rootDeviceHints with deviceName: /dev/sdb is ignored on workers.
[kni@provisionhost-0-0 ~]$ ssh core@worker-0-0 lsblk
NAME                         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                            8:0    0   25G  0 disk 
├─sda1                         8:1    0  384M  0 part /boot
├─sda2                         8:2    0  127M  0 part /boot/efi
├─sda3                         8:3    0    1M  0 part 
├─sda4                         8:4    0 24.4G  0 part 
│ └─coreos-luks-root-nocrypt 253:0    0 24.4G  0 dm   /sysroot
└─sda5                         8:5    0   65M  0 part 
sdb                            8:16   0   45G  0 disk

"Root device hints are not provided" reported in metal3-ironic-inspector log

Comment 11 Doug Hellmann 2020-09-10 18:54:24 UTC
Which version of the installer are you using to test?

Comment 13 Bob Fournier 2020-09-10 22:27:46 UTC
It sounds like this may be the same issue as https://bugzilla.redhat.com/show_bug.cgi?id=1867744, in which IPA was not using the configured root device hint. A fix has merged upstream (https://review.opendev.org/#/c/747072/) and it's available in ironic-ipa-downloader-container-v4.6.0-202009082256.p0. Can you try recent ironic images?

Comment 16 Lubov 2020-09-13 12:28:35 UTC
Created attachment 1714680 [details]
updated worker ironic-inspector log

Comment 18 Lubov 2020-09-13 12:39:58 UTC
Created attachment 1714682 [details]
metal3-ironic-conductor log

Comment 19 Lubov 2020-09-13 12:45:28 UTC
# ./openshift-baremetal-install version
./openshift-baremetal-install 4.6.0-0.nightly-2020-09-12-230035
built from commit 0a4cc6c428c9d1aaf11d7f05002ab7637cdb872f
release image registry.ocp-edge-cluster-0.qe.lab.redhat.com:5000/localimages/local-release-image@sha256:c6147df7325dfdc4b526a9d799870ccf1fd04334570d01e715376947eb0473f8

Comment 20 Lubov 2020-09-13 12:48:50 UTC
Created attachment 1714683 [details]
manifest

Comment 23 Doug Hellmann 2020-09-14 17:09:01 UTC
Bob is looking at the backport for the additional patch to IPA, so I'm going to reassign this to him.

Comment 26 Bob Fournier 2020-09-24 10:55:55 UTC
Lubov - yes, the fix has merged and the package has been tagged (https://bugzilla.redhat.com/show_bug.cgi?id=1878856) and should be available.

Comment 28 Lubov 2020-09-24 15:25:11 UTC
Verified on 
Client Version: 4.6.0-0.nightly-2020-09-24-095222
Server Version: 4.6.0-0.nightly-2020-09-24-095222
Kubernetes Version: v1.19.0+fff8183

[kni@provisionhost-0-0 ~]$ ssh core@worker-0-1 lsblk
NAME                         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                            8:0    0   25G  0 disk 
└─sda1                         8:1    0   25G  0 part 
sdb                            8:16   0   45G  0 disk 
├─sdb1                         8:17   0  384M  0 part /boot
├─sdb2                         8:18   0  127M  0 part /boot/efi
├─sdb3                         8:19   0    1M  0 part 
├─sdb4                         8:20   0 44.4G  0 part 
│ └─coreos-luks-root-nocrypt 253:0    0 44.4G  0 dm   /sysroot
└─sdb5                         8:21   0   65M  0 part 

[kni@provisionhost-0-0 ~]$ ssh core@worker-0-0 lsblk
NAME                         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                            8:0    0   25G  0 disk 
└─sda1                         8:1    0   25G  0 part 
sdb                            8:16   0   45G  0 disk 
├─sdb1                         8:17   0  384M  0 part /boot
├─sdb2                         8:18   0  127M  0 part /boot/efi
├─sdb3                         8:19   0    1M  0 part 
├─sdb4                         8:20   0 44.4G  0 part 
│ └─coreos-luks-root-nocrypt 253:0    0 44.4G  0 dm   /sysroot
└─sdb5                         8:21   0   65M  0 part

Comment 29 Doug Hellmann 2020-10-07 13:47:58 UTC
I'm setting this to "No Doc Update" because it's a new feature and the bug was found before the release.

Comment 31 errata-xmlrpc 2020-10-27 16:31:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

