Bug 1867744

Summary: 4.6 UEFI Node with multiple disks does not boot up
Product: OpenShift Container Platform Reporter: Constantin Vultur <cvultur>
Component: Bare Metal Hardware Provisioning Assignee: Bob Fournier <bfournie>
Bare Metal Hardware Provisioning sub component: ironic QA Contact: Lubov <lshilin>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: medium CC: afasano, bfournie, dtantsur, jkreger, lshilin, mcornea, ohochman, sasha, shardy, tsedovic, yprokule
Version: 4.6 Keywords: Triaged
Target Milestone: ---   
Target Release: 4.6.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: ironic-ipa-downloader-container-v4.6.0-202009082256.p0 Doc Type: Bug Fix
Doc Text:
When doing fast track deployments in Ironic, the ironic-python-agent was not caching the root device hints properly. This fix updates the root device hint correctly so the hint that is set gets used.
Story Points: ---
Clone Of:
: 2109847 Environment:
Last Closed: 2020-10-27 16:27:07 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2109847    
Attachments:
Description Flags
worker_not_booting none

Description Constantin Vultur 2020-08-10 16:14:56 UTC
Created attachment 1710988
worker_not_booting

Description of problem:
Installing 4.6.0-0.nightly-2020-08-09-151434 failed with both worker nodes stuck at the boot screen.
The setup is libvirt based. Each worker node was configured with 5 disks, 4 of them used by the local-storage operator.
Master nodes had 1 disk each.

Version-Release number of selected component (if applicable):
4.6.0-0.nightly-2020-08-09-151434

How reproducible:


Steps to Reproduce:
1. Deploy an OCP cluster with nodes that have more than 1 disk
2. Check the console of the nodes with more than 1 disk
3.

Actual results:
Nodes are stuck at the boot screen and do not find any boot disk.

In another test, where every node had just one disk, the install was successful.

[kni@provisionhost-0-0 ~]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-08-10-110737   True        False         31m     Cluster version is 4.6.0-0.nightly-2020-08-10-110737

Expected results:
Nodes with multiple disks boot properly.

Additional info:
See the attached screenshot of the error.

Comment 3 Andrea Fasano 2020-08-11 16:20:42 UTC
Suggestion: did you try BIOS booting?

Comment 5 Constantin Vultur 2020-08-11 16:59:18 UTC
We are looking into adding rootDeviceHints to the automation.

Still, this worked OK on 4.4 and 4.5, so I opened this BZ.

Comment 6 Marius Cornea 2020-08-12 19:42:31 UTC
Looks like Ironic does not write the image to /dev/sda, which is the first disk and is also set in rootDeviceHints. The log shows it trying /dev/sdb instead:

2020-08-12 17:38:46.278 1 DEBUG ironic.drivers.modules.agent_client [-] Status of agent commands for node 48d6f7dd-c401-41ff-88f0-36b00fab776f: prepare_image: result "None", error "{'type': 'ImageWriteError', 'code': 500, 'message': 'Error writing image to device', 'details': 'Writing image to device /dev/sdb failed with exit code 1. stdout: write_image.sh: Erasing existing GPT and MBR data structures from /dev/sdb\nCreating new GPT entries.\nGPT data structures destroyed! You may now partition the disk using fdisk or\nother utilities.\nwrite_image.sh: Imaging /tmp/rhcos-46.82.202008111140-0-compressed.x86_64.qcow2 to /dev/sdb\n. stderr: 33+0 records in\n33+0 records out\n16896 bytes (17 kB, 16 KiB) copied, 0.0017078 s, 9.9 MB/s\n33+0 records in\n33+0 records out\n16896 bytes (17 kB, 16 KiB) copied, 0.000909784 s, 18.6 MB/s\nqemu-img: /dev/sdb: error while converting host_device: Device is too small\n'}"; get_deploy_steps: result "{'deploy_steps': {'GenericHardwareManager': [{'step': 'apply_configuration', 'priority': 0, 'interface': 'raid', 'reboot_requested': False, 'argsinfo': {'raid_config': {'description': 'The RAID configuration to apply.', 'required': True}, 'delete_existing': {'description': "Setting this to 'True' indicates to delete existing RAID configuration prior to creating the new configuration. Default value is 'True'.", 'required': False}}}, {'step': 'write_image', 'priority': 0, 'interface': 'deploy', 'reboot_requested': False}]}, 'hardware_manager_version': {'generic_hardware_manager': '1.1'}}", error "None"; get_partition_uuids: result "{}", error "None"; install_bootloader: result "None", error "{'type': 'DeviceNotFound', 'code': 404, 'message': 'Error finding the disk or partition device to deploy the image onto', 'details': 'No partition with UUID None found on device /dev/sda'}" get_commands_status /usr/lib/python3.6/site-packages/ironic/drivers/modules/agent_client.py:275

Comment 8 Julia Kreger 2020-08-13 13:46:44 UTC
Try expanding the hard disk of the VMs you're using, and please confirm the disks are at least _40GB_.

You also likely have smaller secondary disks. When root device hints are not defined, the smallest fundamentally usable block storage device is chosen as they are typically mirror sets in physical servers as opposed to large RAID sets. If you've created a small 4GB disk as a secondary disk, Ironic will choose it because it will see it as minimally viable.

Imaging /tmp/rhcos-46.82.202008111140-0-compressed.x86_64.qcow2 to /dev/sdb\n. stderr: 33+0 records in\n33+0 records out\n16896 bytes (17 kB, 16 KiB) copied, 0.0017078 s, 9.9 MB/s\n33+0 records in\n33+0 records out\n16896 bytes (17 kB, 16 KiB) copied, 0.000909784 s, 18.6 MB/s\nqemu-img: /dev/sdb: error while converting host_device: Device is too small\n'}"

Comment 9 Constantin Vultur 2020-08-13 14:11:01 UTC
The disks were 10 GB ones, used by the local-storage operator.

I will do a test with 40 GB disks and report back.

Still, the question remains: why is the rootDeviceHints setting not taken into consideration?

Comment 10 Julia Kreger 2020-08-13 14:54:31 UTC
Well, the root device hint default that was in 4.4 and 4.5 was /dev/sda that was set by the operator. I believe that team changed it so no default hint was supplied because that default was causing as many issues as it was attempting to prevent. Namely with NVME devices not being selected and the like.

A quick look at the logs indicates the hint being supplied is invalid. Keep in mind, these are hints, so no match results in fallback to the default logic which chooses the smallest usable block device greater than four gigabytes.

'root_device': {'name': 's== /dev/sda'}}"

If the hint is being supplied to Metal3 as "/dev/sda", then there is likely a bug in the baremetal-operator. Please check your input parameters.

Comment 12 Constantin Vultur 2020-08-13 15:24:17 UTC
Regarding the hint, it was tested like this:

{% if ocp_version_short is version('4.6', '>=') %}
        rootDeviceHints:
          deviceName: /dev/sda
{% else %}

while the install-config was generated.
Not sure why it reached the operator in an invalid format. Maybe another bug?

Comment 13 Constantin Vultur 2020-08-13 19:04:40 UTC
Managed to get a successful build; 40 GB disks seem to make it work.

Still, it chooses sdb as the disk for installation.

[kni@provisionhost-0-0 ~]$ oc get nodes
NAME         STATUS   ROLES    AGE    VERSION
master-0-0   Ready    master   138m   v1.19.0-rc.2+edbf229-dirty
master-0-1   Ready    master   137m   v1.19.0-rc.2+edbf229-dirty
master-0-2   Ready    master   138m   v1.19.0-rc.2+edbf229-dirty
worker-0-0   Ready    worker   109m   v1.19.0-rc.2+edbf229-dirty
worker-0-1   Ready    worker   109m   v1.19.0-rc.2+edbf229-dirty
[kni@provisionhost-0-0 ~]$ ssh core@worker-0-0  lsblk
NAME                         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                            8:0    0   52G  0 disk 
└─sda1                         8:1    0   52G  0 part 
sdb                            8:16   0   40G  0 disk 
├─sdb1                         8:17   0  384M  0 part /boot
├─sdb2                         8:18   0  127M  0 part /boot/efi
├─sdb3                         8:19   0    1M  0 part 
├─sdb4                         8:20   0 39.4G  0 part 
│ └─coreos-luks-root-nocrypt 253:0    0 39.4G  0 dm   /sysroot
└─sdb5                         8:21   0   65M  0 part 
sdc                            8:32   0   40G  0 disk 
sdd                            8:48   0   40G  0 disk 
sde                            8:64   0   40G  0 disk 
[kni@provisionhost-0-0 ~]$ 
[kni@provisionhost-0-0 ~]$ ssh core@worker-0-1  lsblk
The authenticity of host 'worker-0-1 (192.168.123.119)' can't be established.
ECDSA key fingerprint is SHA256:OVL+/HO4JtoJgHjrM0ZWJIGAY1vnQ0SZhQs5b2rEyTM.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added 'worker-0-1,192.168.123.119' (ECDSA) to the list of known hosts.
NAME                         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                            8:0    0   52G  0 disk 
└─sda1                         8:1    0   52G  0 part 
sdb                            8:16   0   40G  0 disk 
├─sdb1                         8:17   0  384M  0 part /boot
├─sdb2                         8:18   0  127M  0 part /boot/efi
├─sdb3                         8:19   0    1M  0 part 
├─sdb4                         8:20   0 39.4G  0 part 
│ └─coreos-luks-root-nocrypt 253:0    0 39.4G  0 dm   /sysroot
└─sdb5                         8:21   0   65M  0 part 
sdc                            8:32   0   40G  0 disk 
sdd                            8:48   0   40G  0 disk 
sde                            8:64   0   40G  0 disk 
[kni@provisionhost-0-0 ~]$

Comment 14 Julia Kreger 2020-08-13 19:48:35 UTC
Yeah, the use of sdb matches the algorithm used to select the device, so everything is working as intended at least without a root device hint. Just somehow the hint is going in... incorrectly. :\

Comment 15 Julia Kreger 2020-08-13 19:50:15 UTC
Moving to the baremetal-operator component, as Ironic is working as designed and the primary issue observed seems to be that the operator is somehow supplying a malformed hint.

Comment 16 Dmitry Tantsur 2020-08-16 14:37:01 UTC
> Keep in mind, these are hints, so no match results in fallback to the default logic which chooses the smallest usable block device greater than four gigabytes.

This is not quite correct: if a hint is present but does not match (or is malformed), the deployment fails.
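
To make the two cases discussed in this thread concrete, here is a minimal Python sketch. It is not Ironic's actual code: `BlockDevice`, `pick_root_device`, and `RootDeviceNotFound` are hypothetical names, only a device-name hint is modeled, and the 4 GiB floor is an assumption taken from the description in comment 10.

```python
from dataclasses import dataclass
from typing import List, Optional

GIB = 1024 ** 3
# Assumption: "usable" means at least 4 GiB, per the description in comment 10.
MIN_ROOT_SIZE = 4 * GIB


@dataclass(frozen=True)
class BlockDevice:
    name: str   # e.g. "/dev/sda"
    size: int   # size in bytes


class RootDeviceNotFound(Exception):
    """Raised when no device can be selected (hypothetical error type)."""


def pick_root_device(devices: List[BlockDevice],
                     hint_name: Optional[str] = None) -> BlockDevice:
    if hint_name is not None:
        # A hint is present: it must match, otherwise the deployment fails.
        for dev in devices:
            if dev.name == hint_name:
                return dev
        raise RootDeviceNotFound(f"no device matches hint {hint_name!r}")

    # No hint: fall back to the smallest device that is still usable.
    usable = [dev for dev in devices if dev.size >= MIN_ROOT_SIZE]
    if not usable:
        raise RootDeviceNotFound("no usable block device found")
    return min(usable, key=lambda dev: dev.size)
```

With one 52 GiB disk followed by four 10 GiB disks and no hint, this sketch returns the first 10 GiB disk, which mirrors the /dev/sdb selection seen in the logs.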

Comment 17 Constantin Vultur 2020-08-17 07:25:42 UTC
removing the needinfo flag

Comment 20 Steven Hardy 2020-08-18 16:49:38 UTC
Ok from the must-gather we see the rootDeviceHints seems OK in the BMH definition:

  ./openshift-worker-0-0.yaml:    rootDeviceHints:
  ./openshift-worker-0-0.yaml-      deviceName: /dev/sda

However in the BMO logs we see this:

  ./current.log:2020-08-12T17:38:33.949966068Z {"level":"info","ts":1597253913.9499474,"logger":"baremetalhost_ironic","msg":"using root device","host":"openshift-worker-0-0","hints":{"name":"s== /dev/sda"}}

This seems to be expected based on this code:

https://github.com/metal3-io/baremetal-operator/blob/master/pkg/provisioner/ironic/devicehints/devicehints.go#L19

That looks initially OK based on the docs I can find e.g https://docs.openstack.org/ironic/pike/install/include/root-device-hints.html

We need the ironic experts to take a look, and identify what's malformed about the current ironic API input, I'm not entirely clear what it should look like.

Comment 21 Julia Kreger 2020-08-18 18:56:30 UTC
So it turns out I was wrong. The format of the hint is actually valid and I'd just quite literally never seen anyone use that format before. According to additional testing I've added to the underlying library, it gets handled properly. It seems we've got a situation where the first disk is just getting disqualified or viewed as invalid. The only discrepancy we seem to notice, looking at the disk data observed by the agent in the introspection action and the configuration, is that the disk name labels are not initializing in the same order. In this specific case /dev/sdb is actually labeled "sde" in virsh with lun id 4. Since RHCOS seems to be presenting the same order, that seems unrelated and largely cosmetic, as /dev/sda is the 50GB disk.
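
For anyone else who has not seen the operator-prefixed form before: a hint value such as `s== /dev/sda` reads as an operator (`s==`, string equality) followed by the value to compare. The Python sketch below only illustrates that reading. It is not the library code mentioned above; `split_hint` and `hint_matches` are hypothetical names, only two string operators are modeled, and treating a bare value as string equality is an assumption.

```python
from typing import Tuple

# Subset of string operators, for illustration only.
STRING_OPERATORS = {
    "s==": lambda actual, wanted: actual == wanted,
    "s!=": lambda actual, wanted: actual != wanted,
}


def split_hint(raw: str) -> Tuple[str, str]:
    """Split 's== /dev/sda' into ('s==', '/dev/sda')."""
    op, sep, value = raw.partition(" ")
    if sep and op in STRING_OPERATORS:
        return op, value.strip()
    # Assumption: a bare value is compared with string equality.
    return "s==", raw.strip()


def hint_matches(raw_hint: str, actual: str) -> bool:
    """Return True if the device attribute satisfies the hint."""
    op, wanted = split_hint(raw_hint)
    return STRING_OPERATORS[op](actual, wanted)
```

Under this reading, the value logged by the operator, `s== /dev/sda`, and a bare `/dev/sda` ask for the same thing.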

We think the only way forward, since the agent logs are not captured in must-gather, is for this to be reproduced and left up for someone to investigate.

Reassigning to Bob and the HWMgmt squad.

Comment 22 Constantin Vultur 2020-08-19 08:17:51 UTC
It seems that this does not reproduce anymore. Now it deploys ok, without any problems.


[core@worker-0-0 ~]$ lsblk
NAME                         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                            8:0    0   52G  0 disk 
├─sda1                         8:1    0  384M  0 part /boot
├─sda2                         8:2    0  127M  0 part /boot/efi
├─sda3                         8:3    0    1M  0 part 
├─sda4                         8:4    0 51.4G  0 part 
│ └─coreos-luks-root-nocrypt 253:0    0 51.4G  0 dm   /sysroot
└─sda5                         8:5    0   65M  0 part 
sdb                            8:16   0   10G  0 disk 
sdc                            8:32   0   10G  0 disk 
sdd                            8:48   0   10G  0 disk 
sde                            8:64   0   10G  0 disk 
[core@worker-0-0 ~]$ logout
Connection to worker-0-0 closed.
[kni@provisionhost-0-0 ~]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-08-18-125322   True        False         14h     Cluster version is 4.6.0-0.nightly-2020-08-18-125322
[kni@provisionhost-0-0 ~]$ oc get nodes
NAME         STATUS   ROLES    AGE   VERSION
master-0-0   Ready    master   14h   v1.19.0-rc.2+99cb93a-dirty
master-0-1   Ready    master   14h   v1.19.0-rc.2+99cb93a-dirty
master-0-2   Ready    master   14h   v1.19.0-rc.2+99cb93a-dirty
worker-0-0   Ready    worker   14h   v1.19.0-rc.2+99cb93a-dirty
worker-0-1   Ready    worker   14h   v1.19.0-rc.2+99cb93a-dirty
[kni@provisionhost-0-0 ~]$

Comment 23 Constantin Vultur 2020-08-19 12:05:47 UTC
Reran the job with another build, and indeed this does not reproduce.

[kni@provisionhost-0-0 ~]$ ssh core@worker-0-0 lsblk
NAME                         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                            8:0    0   52G  0 disk 
├─sda1                         8:1    0  384M  0 part /boot
├─sda2                         8:2    0  127M  0 part /boot/efi
├─sda3                         8:3    0    1M  0 part 
├─sda4                         8:4    0 51.4G  0 part 
│ └─coreos-luks-root-nocrypt 253:0    0 51.4G  0 dm   /sysroot
└─sda5                         8:5    0   65M  0 part 
sdb                            8:16   0   10G  0 disk 
sdc                            8:32   0   10G  0 disk 
sdd                            8:48   0   10G  0 disk 
sde                            8:64   0   10G  0 disk 
[kni@provisionhost-0-0 ~]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-08-18-165040   True        False         50m     Cluster version is 4.6.0-0.nightly-2020-08-18-165040
[kni@provisionhost-0-0 ~]$ 

Should we close this, or are there still things to clarify?

Comment 29 Bob Fournier 2020-09-09 00:42:15 UTC
ironic-ipa-downloader-container-v4.6.0-202009082256.p0 (https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=1313535) has the ironic-python-agent patch with fix:

openstack-ironic-python-agent   noarch  6.3.1-0.20200904042948.e73b722.el8ost

See also tagging bug https://bugzilla.redhat.com/show_bug.cgi?id=1875510.

Comment 33 Bob Fournier 2020-09-14 17:29:07 UTC
It looks like https://review.opendev.org/#/c/750823/ is also needed, moving this back to POST until that fix is in a container.

Comment 34 Bob Fournier 2020-09-24 10:56:50 UTC
Fix has merged and package is available (https://bugzilla.redhat.com/show_bug.cgi?id=1878856).

Comment 36 Lubov 2020-09-30 09:47:32 UTC
Client Version: 4.6.0-0.nightly-2020-09-29-170625
Server Version: 4.6.0-0.nightly-2020-09-29-170625
Kubernetes Version: v1.19.0+6ef2098

[kni@provisionhost-0-0 ~]$ ssh core@worker-0-0 lsblk
NAME                         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                            8:0    0   52G  0 disk 
├─sda1                         8:1    0  384M  0 part /boot
├─sda2                         8:2    0  127M  0 part /boot/efi
├─sda3                         8:3    0    1M  0 part 
├─sda4                         8:4    0 51.4G  0 part 
│ └─coreos-luks-root-nocrypt 253:0    0 51.4G  0 dm   /sysroot
└─sda5                         8:5    0   65M  0 part 
sdb                            8:16   0   10G  0 disk 
sdc                            8:32   0   10G  0 disk 
sdd                            8:48   0   10G  0 disk 
sde                            8:64   0   10G  0 disk 

rootDeviceHints for workers verified by https://bugzilla.redhat.com/show_bug.cgi?id=1871653
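
The manual check above (which disk ends up holding /sysroot) can be scripted. The sketch below uses a hypothetical helper, `disk_for_mountpoint`, that parses plain `lsblk` output in the column layout shown in this report. It assumes the default columns (NAME, MAJ:MIN, RM, SIZE, RO, TYPE, MOUNTPOINT) and device names without spaces.

```python
from typing import Optional

# Characters lsblk uses to draw the device tree, plus the space between them.
TREE_CHARS = "│├└─ "


def disk_for_mountpoint(lsblk_output: str,
                        mountpoint: str = "/sysroot") -> Optional[str]:
    """Return the name of the disk whose subtree holds the mountpoint."""
    current_disk = None
    for line in lsblk_output.splitlines():
        fields = line.lstrip(TREE_CHARS).split()
        # Skip the header and anything that is not a device row.
        if len(fields) < 6 or fields[0] == "NAME":
            continue
        if fields[5] == "disk":
            current_disk = fields[0]
        # The MOUNTPOINT column is only present on mounted devices.
        if len(fields) > 6 and fields[6] == mountpoint:
            return current_disk
    return None
```

Fed the output from comment 13 it returns `sdb`; fed the output in this comment it returns `sda`, the disk named in rootDeviceHints.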

Comment 38 errata-xmlrpc 2020-10-27 16:27:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196