Created attachment 1710988: worker_not_booting

Description of problem:
Installing 4.6.0-0.nightly-2020-08-09-151434 failed with both worker nodes stuck at the boot screen. The setup is libvirt-based. Each worker node was configured with 5 disks, 4 of them used by the local-storage operator. Each master node had 1 disk.

Version-Release number of selected component (if applicable):
4.6.0-0.nightly-2020-08-09-151434

How reproducible:

Steps to Reproduce:
1. Deploy an OCP cluster with nodes that have more than 1 disk
2. Check the console of the nodes with more than 1 disk
3.

Actual results:
Node stuck at the boot screen, not finding any boot disk.

In another test, where every node had just one disk, the install was successful:

[kni@provisionhost-0-0 ~]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-08-10-110737   True        False         31m     Cluster version is 4.6.0-0.nightly-2020-08-10-110737

Expected results:
Nodes with multiple disks boot properly.

Additional info:
See the attached screenshot of the error.
Suggestion: did you try BIOS booting?
Please check https://github.com/openshift/installer/blob/master/docs/user/metal/install_ipi.md#install-config
We are looking into having rootDeviceHints added to the automation. Still, this worked OK on 4.4 and 4.5, which is why I opened this BZ.
Looks like Ironic fails to write the image on /dev/sda which is the first disk and also set in rootDeviceHints: 2020-08-12 17:38:46.278 1 DEBUG ironic.drivers.modules.agent_client [-] Status of agent commands for node 48d6f7dd-c401-41ff-88f0-36b00fab776f: prepare_image: result "None", error "{'type': 'ImageWriteError', 'code': 500, 'message': 'Error writing image to device', 'details': 'Writing image to device /dev/sdb failed with exit code 1. stdout: write_image.sh: Erasing existing GPT and MBR data structures from /dev/sdb\nCreating new GPT entries.\nGPT data structures destroyed! You may now partition the disk using fdisk or\nother utilities.\nwrite_image.sh: Imaging /tmp/rhcos-46.82.202008111140-0-compressed.x86_64.qcow2 to /dev/sdb\n. stderr: 33+0 records in\n33+0 records out\n16896 bytes (17 kB, 16 KiB) copied, 0.0017078 s, 9.9 MB/s\n33+0 records in\n33+0 records out\n16896 bytes (17 kB, 16 KiB) copied, 0.000909784 s, 18.6 MB/s\nqemu-img: /dev/sdb: error while converting host_device: Device is too small\n'}"; get_deploy_steps: result "{'deploy_steps': {'GenericHardwareManager': [{'step': 'apply_configuration', 'priority': 0, 'interface': 'raid', 'reboot_requested': False, 'argsinfo': {'raid_config': {'description': 'The RAID configuration to apply.', 'required': True}, 'delete_existing': {'description': "Setting this to 'True' indicates to delete existing RAID configuration prior to creating the new configuration. 
Default value is 'True'.", 'required': False}}}, {'step': 'write_image', 'priority': 0, 'interface': 'deploy', 'reboot_requested': False}]}, 'hardware_manager_version': {'generic_hardware_manager': '1.1'}}", error "None"; get_partition_uuids: result "{}", error "None"; install_bootloader: result "None", error "{'type': 'DeviceNotFound', 'code': 404, 'message': 'Error finding the disk or partition device to deploy the image onto', 'details': 'No partition with UUID None found on device /dev/sda'}" get_commands_status /usr/lib/python3.6/site-packages/ironic/drivers/modules/agent_client.py:275
Try expanding the hard disks of the VMs you're using, and please confirm the disks are at least _40GB_. You also likely have smaller secondary disks. When root device hints are not defined, the smallest fundamentally usable block storage device is chosen, because on physical servers the smallest devices are typically mirror sets as opposed to large RAID sets. If you've created a small 4GB disk as a secondary disk, Ironic will choose it because it will see it as minimally viable.

Relevant part of the log:

Imaging /tmp/rhcos-46.82.202008111140-0-compressed.x86_64.qcow2 to /dev/sdb\n. stderr: 33+0 records in\n33+0 records out\n16896 bytes (17 kB, 16 KiB) copied, 0.0017078 s, 9.9 MB/s\n33+0 records in\n33+0 records out\n16896 bytes (17 kB, 16 KiB) copied, 0.000909784 s, 18.6 MB/s\nqemu-img: /dev/sdb: error while converting host_device: Device is too small\n'}"
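The default selection described in this comment can be sketched roughly as follows. This is a minimal illustration and not the ironic-python-agent code: the `Disk` type, the `pick_default_root_disk` name, the 4 GiB threshold, and the disk sizes are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional

GIB = 1024 ** 3
# Assumed minimum usable size, based on the "minimally viable" 4GB disk mentioned above.
MIN_ROOT_DISK_BYTES = 4 * GIB


@dataclass
class Disk:
    name: str
    size_bytes: int


def pick_default_root_disk(disks: List[Disk]) -> Optional[Disk]:
    """Return the smallest disk that meets the minimum size, or None."""
    usable = [d for d in disks if d.size_bytes >= MIN_ROOT_DISK_BYTES]
    # min() returns the first minimal element, so ties go to the earliest disk listed.
    return min(usable, key=lambda d: d.size_bytes, default=None)


# Hypothetical layout: one large disk followed by four smaller, equally sized disks.
workers = [Disk("/dev/sda", 52 * GIB)] + [
    Disk(f"/dev/sd{letter}", 10 * GIB) for letter in "bcde"
]
chosen = pick_default_root_disk(workers)
print(chosen.name if chosen else "no usable disk")
```

With this layout the sketch picks /dev/sdb rather than /dev/sda, which is the same device the image write was attempted on in the log.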
The disks were 10GB ones, used by the local-storage operator. I will do a test with 40 GB disks and report back. Still, the question remains: why is the rootDeviceHints setting not taken into consideration?
Well, the root device hint default in 4.4 and 4.5 was /dev/sda, set by the operator. I believe that team changed it so that no default hint is supplied, because that default was causing as many issues as it was attempting to prevent, namely NVMe devices not being selected and the like.

A quick look at the logs indicates the hint being supplied is invalid. Keep in mind, these are hints, so no match results in fallback to the default logic, which chooses the smallest usable block device greater than four gigabytes.

'root_device': {'name': 's== /dev/sda'}}"

If the hint is being supplied to Metal3 as "/dev/sda", then there is likely a bug in the baremetal-operator. Please check your input parameters.
Regarding the hint, it was tested like this:

{% if ocp_version_short is version('4.6', '>=') %}
rootDeviceHints:
  deviceName: /dev/sda
{% else %}

when the install-config was generated. Not sure why it reached the operator in the invalid format. Maybe another bug?
Managed to get a successful build, and 40 GB disks seem to make it work. It still chooses sdb as the disk for installation.

[kni@provisionhost-0-0 ~]$ oc get nodes
NAME         STATUS   ROLES    AGE    VERSION
master-0-0   Ready    master   138m   v1.19.0-rc.2+edbf229-dirty
master-0-1   Ready    master   137m   v1.19.0-rc.2+edbf229-dirty
master-0-2   Ready    master   138m   v1.19.0-rc.2+edbf229-dirty
worker-0-0   Ready    worker   109m   v1.19.0-rc.2+edbf229-dirty
worker-0-1   Ready    worker   109m   v1.19.0-rc.2+edbf229-dirty
[kni@provisionhost-0-0 ~]$ ssh core@worker-0-0 lsblk
NAME                           MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                              8:0    0   52G  0 disk
└─sda1                           8:1    0   52G  0 part
sdb                              8:16   0   40G  0 disk
├─sdb1                           8:17   0  384M  0 part /boot
├─sdb2                           8:18   0  127M  0 part /boot/efi
├─sdb3                           8:19   0    1M  0 part
├─sdb4                           8:20   0 39.4G  0 part
│ └─coreos-luks-root-nocrypt   253:0    0 39.4G  0 dm   /sysroot
└─sdb5                           8:21   0   65M  0 part
sdc                              8:32   0   40G  0 disk
sdd                              8:48   0   40G  0 disk
sde                              8:64   0   40G  0 disk
[kni@provisionhost-0-0 ~]$
[kni@provisionhost-0-0 ~]$ ssh core@worker-0-1 lsblk
The authenticity of host 'worker-0-1 (192.168.123.119)' can't be established.
ECDSA key fingerprint is SHA256:OVL+/HO4JtoJgHjrM0ZWJIGAY1vnQ0SZhQs5b2rEyTM.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added 'worker-0-1,192.168.123.119' (ECDSA) to the list of known hosts.
NAME                           MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                              8:0    0   52G  0 disk
└─sda1                           8:1    0   52G  0 part
sdb                              8:16   0   40G  0 disk
├─sdb1                           8:17   0  384M  0 part /boot
├─sdb2                           8:18   0  127M  0 part /boot/efi
├─sdb3                           8:19   0    1M  0 part
├─sdb4                           8:20   0 39.4G  0 part
│ └─coreos-luks-root-nocrypt   253:0    0 39.4G  0 dm   /sysroot
└─sdb5                           8:21   0   65M  0 part
sdc                              8:32   0   40G  0 disk
sdd                              8:48   0   40G  0 disk
sde                              8:64   0   40G  0 disk
[kni@provisionhost-0-0 ~]$
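For anyone repeating this check across many nodes, the disk that ended up holding the root filesystem can be read programmatically instead of by eye. The sketch below assumes the JSON shape that `lsblk --json -o NAME,TYPE,MOUNTPOINT` produces (a top-level "blockdevices" list with nested "children"); the `find_root_disk` helper and the sample data are hypothetical, with the sample hand-written to mirror the worker output in this comment.

```python
import json
from typing import Optional


def find_root_disk(lsblk: dict, mountpoint: str = "/sysroot") -> Optional[str]:
    """Return the name of the top-level disk whose subtree has the given mountpoint."""

    def contains_mount(node: dict) -> bool:
        if node.get("mountpoint") == mountpoint:
            return True
        return any(contains_mount(child) for child in node.get("children", []))

    for disk in lsblk.get("blockdevices", []):
        if contains_mount(disk):
            return disk["name"]
    return None


# Hand-written sample, trimmed to the devices relevant to the check.
SAMPLE = json.loads("""
{"blockdevices": [
  {"name": "sda", "type": "disk", "mountpoint": null,
   "children": [{"name": "sda1", "type": "part", "mountpoint": null}]},
  {"name": "sdb", "type": "disk", "mountpoint": null,
   "children": [
     {"name": "sdb1", "type": "part", "mountpoint": "/boot"},
     {"name": "sdb4", "type": "part", "mountpoint": null,
      "children": [{"name": "coreos-luks-root-nocrypt", "type": "dm",
                    "mountpoint": "/sysroot"}]}
   ]},
  {"name": "sdc", "type": "disk", "mountpoint": null}
]}
""")

print(find_root_disk(SAMPLE))
```

On a live node the input would come from something like `json.loads(subprocess.check_output(["lsblk", "--json", "-o", "NAME,TYPE,MOUNTPOINT"]))`, again assuming that output shape.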
Yeah, the use of sdb matches the algorithm used to select the device, so everything is working as intended at least without a root device hint. Just somehow the hint is going in... incorrectly. :\
Moving to the baremetal-operator component, as Ironic is working as designed and the primary issue observed seems to be that the operator is somehow supplying a malformed hint.
> Keep in mind, these are hints, so no match results in fallback to the default logic which chooses the smallest usable block device greater than four gigabytes.

This is not quite correct: if a hint is present but does not match (or is malformed), the deployment fails.
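The distinction made here, between no hint at all and a hint that matches nothing, can be sketched as follows. This is an illustration of the behavior described in this comment, not Ironic's implementation: `select_root_disk`, the dict keys, and the 4 GiB threshold are assumptions for the example.

```python
from typing import Dict, List, Optional


def select_root_disk(disks: List[Dict], hint_name: Optional[str] = None) -> Dict:
    """Pick a root disk from dicts with 'name' and 'size_gib' keys."""
    if hint_name is not None:
        matches = [d for d in disks if d["name"] == hint_name]
        if not matches:
            # A hint that matches nothing is an error, not a reason to fall back.
            raise LookupError(f"no disk matches root device hint {hint_name!r}")
        return matches[0]
    # No hint at all: smallest disk of at least 4 GiB (threshold assumed from this thread).
    usable = [d for d in disks if d["size_gib"] >= 4]
    if not usable:
        raise LookupError("no usable disk found")
    return min(usable, key=lambda d: d["size_gib"])


disks = [
    {"name": "/dev/sda", "size_gib": 52},
    {"name": "/dev/sdb", "size_gib": 10},
]
print(select_root_disk(disks)["name"])              # no hint: default logic
print(select_root_disk(disks, "/dev/sda")["name"])  # matching hint
```

Under these assumptions, a deployment that landed on /dev/sdb while a /dev/sda hint was configured would mean the hint never reached the selection step, rather than that it failed to match.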
removing the needinfo flag
OK, from the must-gather we see that the rootDeviceHints look fine in the BMH definition:

./openshift-worker-0-0.yaml: rootDeviceHints:
./openshift-worker-0-0.yaml- deviceName: /dev/sda

However, in the BMO logs we see this:

./current.log:2020-08-12T17:38:33.949966068Z {"level":"info","ts":1597253913.9499474,"logger":"baremetalhost_ironic","msg":"using root device","host":"openshift-worker-0-0","hints":{"name":"s== /dev/sda"}}

This seems to be expected based on this code:
https://github.com/metal3-io/baremetal-operator/blob/master/pkg/provisioner/ironic/devicehints/devicehints.go#L19

That initially looks OK based on the docs I can find, e.g. https://docs.openstack.org/ironic/pike/install/include/root-device-hints.html

We need the Ironic experts to take a look and identify what is malformed about the current Ironic API input. I'm not entirely clear what it should look like.
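To make the two representations concrete, here is a small sketch of how a plain deviceName could be turned into the operator-prefixed form seen in the BMO log, and how such a string could be matched against a device name. Both function names are hypothetical, and only the `s==` (string equality) operator is handled; the real hint syntax supports more operators than this.

```python
def format_name_hint(device_name: str) -> str:
    """Build the operator-prefixed form seen in the BMO log, e.g. 's== /dev/sda'."""
    return f"s== {device_name}"


def matches_name_hint(hint: str, device_name: str) -> bool:
    """Check a device name against a hint, supporting only a bare value or 's=='."""
    operator, separator, value = hint.partition(" ")
    if not separator:
        # No operator present: treat the hint as a plain string to compare.
        return hint == device_name
    if operator == "s==":
        return value == device_name
    # Other operators exist in the real syntax but are out of scope for this sketch.
    raise ValueError(f"unsupported operator in hint: {operator!r}")


hint = format_name_hint("/dev/sda")
print(hint)
print(matches_name_hint(hint, "/dev/sda"))
print(matches_name_hint(hint, "/dev/sdb"))
```

Read this way, "s== /dev/sda" is an explicit string-equality match on the device name rather than a corrupted value.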
So it turns out I was wrong. The format of the hint is actually valid; I had quite literally never seen anyone use that format before. According to additional testing I've added to the underlying library, it gets handled properly.

It seems we've got a situation where the first disk is being disqualified or viewed as invalid. The only discrepancy we notice between the disk data observed by the agent in the introspection action and the configuration is that the disk name labels are not initializing in the same order. In this specific case /dev/sdb is actually labeled "sde" in virsh with LUN ID 4. Since RHCOS appears to present the same order, that seems unrelated and largely cosmetic, as /dev/sda is the 50GB disk.

We think the only way forward, since the agent logs are not captured in must-gather, is for this to be reproduced and left up for someone to investigate. Reassigning to Bob and the HWMgmt squad.
It seems that this does not reproduce anymore. Now it deploys OK, without any problems.

[core@worker-0-0 ~]$ lsblk
NAME                           MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                              8:0    0   52G  0 disk
├─sda1                           8:1    0  384M  0 part /boot
├─sda2                           8:2    0  127M  0 part /boot/efi
├─sda3                           8:3    0    1M  0 part
├─sda4                           8:4    0 51.4G  0 part
│ └─coreos-luks-root-nocrypt   253:0    0 51.4G  0 dm   /sysroot
└─sda5                           8:5    0   65M  0 part
sdb                              8:16   0   10G  0 disk
sdc                              8:32   0   10G  0 disk
sdd                              8:48   0   10G  0 disk
sde                              8:64   0   10G  0 disk
[core@worker-0-0 ~]$ logout
Connection to worker-0-0 closed.
[kni@provisionhost-0-0 ~]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-08-18-125322   True        False         14h     Cluster version is 4.6.0-0.nightly-2020-08-18-125322
[kni@provisionhost-0-0 ~]$ oc get nodes
NAME         STATUS   ROLES    AGE   VERSION
master-0-0   Ready    master   14h   v1.19.0-rc.2+99cb93a-dirty
master-0-1   Ready    master   14h   v1.19.0-rc.2+99cb93a-dirty
master-0-2   Ready    master   14h   v1.19.0-rc.2+99cb93a-dirty
worker-0-0   Ready    worker   14h   v1.19.0-rc.2+99cb93a-dirty
worker-0-1   Ready    worker   14h   v1.19.0-rc.2+99cb93a-dirty
[kni@provisionhost-0-0 ~]$
Reran the job with another build, and indeed this does not reproduce.

[kni@provisionhost-0-0 ~]$ ssh core@worker-0-0 lsblk
NAME                           MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                              8:0    0   52G  0 disk
├─sda1                           8:1    0  384M  0 part /boot
├─sda2                           8:2    0  127M  0 part /boot/efi
├─sda3                           8:3    0    1M  0 part
├─sda4                           8:4    0 51.4G  0 part
│ └─coreos-luks-root-nocrypt   253:0    0 51.4G  0 dm   /sysroot
└─sda5                           8:5    0   65M  0 part
sdb                              8:16   0   10G  0 disk
sdc                              8:32   0   10G  0 disk
sdd                              8:48   0   10G  0 disk
sde                              8:64   0   10G  0 disk
[kni@provisionhost-0-0 ~]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-08-18-165040   True        False         50m     Cluster version is 4.6.0-0.nightly-2020-08-18-165040
[kni@provisionhost-0-0 ~]$

Should we close this, or are there still things to clarify?
ironic-ipa-downloader-container-v4.6.0-202009082256.p0 (https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=1313535) has the ironic-python-agent patch with fix: openstack-ironic-python-agent noarch 6.3.1-0.20200904042948.e73b722.el8ost See also tagging bug https://bugzilla.redhat.com/show_bug.cgi?id=1875510.
It looks like https://review.opendev.org/#/c/750823/ is also needed, moving this back to POST until that fix is in a container.
Fix has merged and package is available (https://bugzilla.redhat.com/show_bug.cgi?id=1878856).
Client Version: 4.6.0-0.nightly-2020-09-29-170625
Server Version: 4.6.0-0.nightly-2020-09-29-170625
Kubernetes Version: v1.19.0+6ef2098

[kni@provisionhost-0-0 ~]$ ssh core@worker-0-0 lsblk
NAME                           MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                              8:0    0   52G  0 disk
├─sda1                           8:1    0  384M  0 part /boot
├─sda2                           8:2    0  127M  0 part /boot/efi
├─sda3                           8:3    0    1M  0 part
├─sda4                           8:4    0 51.4G  0 part
│ └─coreos-luks-root-nocrypt   253:0    0 51.4G  0 dm   /sysroot
└─sda5                           8:5    0   65M  0 part
sdb                              8:16   0   10G  0 disk
sdc                              8:32   0   10G  0 disk
sdd                              8:48   0   10G  0 disk
sde                              8:64   0   10G  0 disk

rootDeviceHints for workers verified by https://bugzilla.redhat.com/show_bug.cgi?id=1871653
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196