Created attachment 1776028
Deploy logs from a failed node

Description of problem:
Attempting to deploy an overcloud with mixed DL360 G10 and HP E910 blade nodes. The DL360 nodes deploy fine as expected. I am unable to deploy to any of the E910 nodes. Ironic throws the following error:

Apr 27 10:37:52 host-192-168-10-92 ironic-python-agent[1984]: 2021-04-27 10:37:52.011 1984 ERROR root [-] Command execution error: invalid literal for int() with base 10: 'p1': ValueError: invalid literal for int() with base 10: 'p1'
2021-04-27 10:37:52.011 1984 ERROR root Traceback (most recent call last):
2021-04-27 10:37:52.011 1984 ERROR root   File "/usr/lib/python3.6/site-packages/ironic_python_agent/extensions/base.py", line 256, in execute_command
2021-04-27 10:37:52.011 1984 ERROR root     result = ext.execute(command_part, **kwargs)
2021-04-27 10:37:52.011 1984 ERROR root   File "/usr/lib/python3.6/site-packages/ironic_python_agent/extensions/base.py", line 208, in execute
2021-04-27 10:37:52.011 1984 ERROR root     return cmd(**kwargs)
2021-04-27 10:37:52.011 1984 ERROR root   File "/usr/lib/python3.6/site-packages/ironic_python_agent/extensions/base.py", line 326, in wrapper
2021-04-27 10:37:52.011 1984 ERROR root     result = func(self, **command_params)
2021-04-27 10:37:52.011 1984 ERROR root   File "/usr/lib/python3.6/site-packages/ironic_python_agent/extensions/image.py", line 562, in install_bootloader
2021-04-27 10:37:52.011 1984 ERROR root     efi_system_part_uuid=efi_system_part_uuid):
2021-04-27 10:37:52.011 1984 ERROR root   File "/usr/lib/python3.6/site-packages/ironic_python_agent/extensions/image.py", line 289, in _manage_uefi
2021-04-27 10:37:52.011 1984 ERROR root     efi_partition = int(partition.replace(device, ""))
2021-04-27 10:37:52.011 1984 ERROR root ValueError: invalid literal for int() with base 10: 'p1'
2021-04-27 10:37:52.011 1984 ERROR root

Version-Release number of selected component (if applicable):
16.1.latest

How reproducible:
Every time I try in this environment

Steps to Reproduce:
1. Deploy undercloud according to docs
2. Import and introspect all nodes
3. Run overcloud deploy with HP E910 nodes

Actual results:
No hosts found

Expected results:
Overcloud successfully deployed

Additional info:
The DL360s which work have SATA drives, while the E910s only have 5 NVMe drives available. Introspection and cleaning work correctly on both servers.
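For illustration, the failing expression from the traceback can be reproduced in isolation. This is only a sketch: the device and partition paths below are assumed examples (the logs above do not show the actual names), and the function name is hypothetical.

```python
def naive_partition_number(partition: str, device: str) -> int:
    """Mirror the expression shown at image.py line 289 in the traceback."""
    return int(partition.replace(device, ""))


# SATA-style naming: the partition is the device name plus a number.
print(naive_partition_number("/dev/sda1", "/dev/sda"))  # 1

# NVMe-style naming (assumed paths): the partition adds "p" plus a number,
# so stripping the device name leaves "p1", which int() rejects.
try:
    naive_partition_number("/dev/nvme0n1p1", "/dev/nvme0n1")
except ValueError as exc:
    print(exc)  # invalid literal for int() with base 10: 'p1'
```

This matches the pattern in the report: the SATA nodes deploy, while the NVMe-only nodes fail in _manage_uefi.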
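Looks like the issue is that the code for guessing the partition number is not safe for NVMe device naming. NVMe partitions are named with a "p" before the number (for example nvme0n1p1), so stripping the device name leaves 'p1' rather than a bare number. We know customers have deployed on Edgeline hardware using whole disk images. I highly recommend taking that route in the meantime.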
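As a rough sketch of what NVMe-safe handling could look like (this is not the upstream patch, which is not shown in this report; the function name and example paths are hypothetical), the suffix left after the device name can be parsed with an optional "p":

```python
import re


def partition_number(partition: str, device: str) -> int:
    """Return the partition index of `partition` on `device`.

    Accepts both "sda1"-style and "nvme0n1p1"-style names.
    """
    if not partition.startswith(device):
        raise ValueError(f"{partition!r} is not a partition of {device!r}")
    suffix = partition[len(device):]
    # The suffix is either "<n>" (e.g. sda1) or "p<n>" (e.g. nvme0n1p1).
    match = re.fullmatch(r"p?(\d+)", suffix)
    if match is None:
        raise ValueError(f"cannot parse partition suffix {suffix!r}")
    return int(match.group(1))


print(partition_number("/dev/sda1", "/dev/sda"))            # 1
print(partition_number("/dev/nvme0n1p1", "/dev/nvme0n1"))   # 1
```

Anchoring on the device prefix instead of using str.replace also avoids accidentally removing the device string from elsewhere in the path.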
Thanks Julia. We are going to give a whole disk image a try first.
Proposed fix upstream.
Hi Julia, I added your code fix to the IPA initramfs and updated the deployment with the new image. It seems to have solved the issue and the deployment is moving forward. I will let you know if everything works as expected. Thanks for the quick fix!!
For the record, these are the steps I took to make it work with the code fix posted above by Julia.

# make the temp directory and change into it
mkdir /tmp/initramfs
cd /tmp/initramfs/

# unpack the initramfs into current directory
zcat ~/images/ironic-python-agent.initramfs | cpio -idmv

# make the edit to match the fix above
vi +303 ./usr/lib/python3.6/site-packages/ironic_python_agent/extensions/image.py

# create the new initramfs file
find . | cpio -o -c -R root:root | gzip -9 > ~/images/ironic-python-agent2.initramfs

# change into the images directory and do some housekeeping to maintain the original file for backup
cd /home/stack/images/
mv ironic-python-agent.initramfs ironic-python-agent.initramfs.orig
mv ironic-python-agent2.initramfs ironic-python-agent.initramfs

# re-upload the image to glance
openstack overcloud image upload --image-path ~/images/ --update-existing

Those were the steps that worked so far. I am now able to put down the OS and boot from the node.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform 16.1.7 (Train) bug fix and enhancement advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:3762