Bug 1954096 - Unable to deploy overcloud to nodes with nvme drives as OS drive
Summary: Unable to deploy overcloud to nodes with nvme drives as OS drive
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-ironic-python-agent
Version: 16.1 (Train)
Hardware: x86_64
OS: Unspecified
Priority: high
Severity: high
Target Milestone: z7
Target Release: 16.1 (Train on RHEL 8.2)
Assignee: Julia Kreger
QA Contact: Paras Babbar
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-04-27 15:11 UTC by phalmos
Modified: 2021-12-09 20:19 UTC (History)
CC List: 7 users

Fixed In Version: openstack-ironic-python-agent-5.0.4-1.20210608203308.9920532.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-12-09 20:19:00 UTC
Target Upstream Version:


Attachments
Deploy logs from a failed node (50.20 KB, application/gzip)
2021-04-27 15:11 UTC, phalmos


Links
System                 | ID             | Private | Priority | Status | Summary                          | Last Updated
OpenStack Storyboard   | 2008881        | 0       | None     | None   | None                             | 2021-05-04 14:51:17 UTC
OpenStack gerrit       | 788338         | 0       | None     | NEW    | Fix NVMe Partition image on UEFI | 2021-04-27 18:13:05 UTC
Red Hat Issue Tracker  | OSP-3361       | 0       | None     | None   | None                             | 2021-11-18 11:35:52 UTC
Red Hat Product Errata | RHBA-2021:3762 | 0       | None     | None   | None                             | 2021-12-09 20:19:20 UTC

Description phalmos 2021-04-27 15:11:37 UTC
Created attachment 1776028
Deploy logs from a failed node

Description of problem:
Attempting to deploy an overcloud with mixed DL360 G10 and HP E910 blade nodes. The DL360 nodes deploy fine, as expected. I am unable to deploy to any of the E910 nodes. Ironic throws the following error:
Apr 27 10:37:52 host-192-168-10-92 ironic-python-agent[1984]: 2021-04-27 10:37:52.011 1984 ERROR root [-] Command execution error: invalid literal for int() with base 10: 'p1': ValueError: invalid literal for int() with base 10: 'p1'
                                                              2021-04-27 10:37:52.011 1984 ERROR root Traceback (most recent call last):
                                                              2021-04-27 10:37:52.011 1984 ERROR root   File "/usr/lib/python3.6/site-packages/ironic_python_agent/extensions/base.py", line 256, in execute_command
                                                              2021-04-27 10:37:52.011 1984 ERROR root     result = ext.execute(command_part, **kwargs)
                                                              2021-04-27 10:37:52.011 1984 ERROR root   File "/usr/lib/python3.6/site-packages/ironic_python_agent/extensions/base.py", line 208, in execute
                                                              2021-04-27 10:37:52.011 1984 ERROR root     return cmd(**kwargs)
                                                              2021-04-27 10:37:52.011 1984 ERROR root   File "/usr/lib/python3.6/site-packages/ironic_python_agent/extensions/base.py", line 326, in wrapper
                                                              2021-04-27 10:37:52.011 1984 ERROR root     result = func(self, **command_params)
                                                              2021-04-27 10:37:52.011 1984 ERROR root   File "/usr/lib/python3.6/site-packages/ironic_python_agent/extensions/image.py", line 562, in install_bootloader
                                                              2021-04-27 10:37:52.011 1984 ERROR root     efi_system_part_uuid=efi_system_part_uuid):
                                                              2021-04-27 10:37:52.011 1984 ERROR root   File "/usr/lib/python3.6/site-packages/ironic_python_agent/extensions/image.py", line 289, in _manage_uefi
                                                              2021-04-27 10:37:52.011 1984 ERROR root     efi_partition = int(partition.replace(device, ""))
                                                              2021-04-27 10:37:52.011 1984 ERROR root ValueError: invalid literal for int() with base 10: 'p1'
                                                              2021-04-27 10:37:52.011 1984 ERROR root



Version-Release number of selected component (if applicable):
16.1.latest

How reproducible: Every time I try in this environment 


Steps to Reproduce:
1. Deploy undercloud according to docs
2. Import and introspect all nodes
3. Run overcloud deploy with HP E910 nodes

Actual results: 
No hosts found

Expected results: Overcloud successfully deployed 


Additional info: The DL360s, which work, have SATA drives, while the E910s have only 5 NVMe drives available. Introspection and cleaning work correctly on both server types.

Comment 1 Julia Kreger 2021-04-27 15:19:16 UTC
Looks like the issue is that the code for handling/guessing the partition is not NVMe pattern safe. We know customers have deployed on Edgeline hardware using whole disk images. I highly recommend taking that route in the meantime.
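
To illustrate the pattern problem, here is a minimal sketch. It is not the upstream change. The first function mirrors the expression shown at image.py line 289 in the traceback. The second is a hypothetical helper that also accepts the "p" separator the kernel inserts when a disk name ends in a digit. The names legacy_partition_number and partition_number, and the device paths /dev/sda and /dev/nvme0n1, are assumptions for illustration and do not come from the attached logs.

```python
import re


def legacy_partition_number(device: str, partition: str) -> int:
    """Mirror of the expression in the traceback: strip the disk name, then int()."""
    return int(partition.replace(device, ""))


def partition_number(device: str, partition: str) -> int:
    """Hypothetical NVMe-safe variant: tolerate an optional 'p' separator.

    Disks whose names end in a digit (for example nvme0n1) get partitions
    named <disk>p<N>; other disks (for example sda) get <disk><N>.
    """
    if not partition.startswith(device):
        raise ValueError(f"{partition} is not a partition of {device}")
    suffix = partition[len(device):]
    match = re.fullmatch(r"p?(\d+)", suffix)
    if match is None:
        raise ValueError(f"cannot parse partition suffix {suffix!r}")
    return int(match.group(1))


if __name__ == "__main__":
    # SATA/SCSI style names work with both functions.
    print(legacy_partition_number("/dev/sda", "/dev/sda1"))
    print(partition_number("/dev/sda", "/dev/sda1"))

    # NVMe style names leave 'p1' behind, which int() rejects.
    try:
        legacy_partition_number("/dev/nvme0n1", "/dev/nvme0n1p1")
    except ValueError as exc:
        print("legacy parse failed:", exc)
    print(partition_number("/dev/nvme0n1", "/dev/nvme0n1p1"))
```

The ValueError raised by the legacy function carries the same message as the one reported in the description.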

Comment 2 Chris Janiszewski 2021-04-27 15:32:07 UTC
Thanks Julia. We are going to give a whole disk image a try first.

Comment 3 Julia Kreger 2021-04-27 18:13:06 UTC
Proposed fix upstream.

Comment 4 phalmos 2021-04-27 19:36:38 UTC
Hi Julia,

I added your code fix to the IPA initramfs and updated the deployment with the new image.  It seems to have solved the issue and the deployment is moving forward.  I will let you know if everything works as expected.  Thanks for the quick fix!!

Comment 8 phalmos 2021-04-27 20:15:46 UTC
For the record, these are the steps I took to make it work with the code fix posted above by Julia.

# make the temp directory and change into it
 mkdir /tmp/initramfs
 cd /tmp/initramfs/

# unpack the initramfs into current directory
 zcat ~/images/ironic-python-agent.initramfs | cpio -idmv

# make the edit to match the fix above
 vi +303 ./usr/lib/python3.6/site-packages/ironic_python_agent/extensions/image.py

# create the new initramfs file
 find . | cpio -o -c -R root:root | gzip -9 > ~/images/ironic-python-agent2.initramfs

# change into the images directory and do some housekeeping to maintain the original file for backup
 cd /home/stack/images/
 mv ironic-python-agent.initramfs ironic-python-agent.initramfs.orig
 mv ironic-python-agent2.initramfs ironic-python-agent.initramfs

# re-upload the image to glance 
 openstack overcloud image upload --image-path ~/images/ --update-existing

Those were the steps that worked so far.  I am now able to put down the OS and boot from the node.
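
An optional extra step, not part of what I ran: since the edit is made by hand inside the unpacked tree, it may be worth checking that the edited file still parses before repacking. The following is a minimal sketch under that assumption. The function names syntax_error_in and check_file are hypothetical. It parses with whatever Python runs it, not with the Python 3.6 inside the image, so it only catches plain syntax and indentation mistakes.

```python
from typing import Optional


def syntax_error_in(source: str, filename: str = "<edited>") -> Optional[str]:
    """Return a short description of the first syntax error, or None if it parses."""
    try:
        # compile() only parses here; it does not run the code or write .pyc files.
        compile(source, filename, "exec")
    except SyntaxError as exc:  # IndentationError is a subclass of SyntaxError
        return f"{filename}:{exc.lineno}: {exc.msg}"
    return None


def check_file(path: str) -> Optional[str]:
    """Read a file from the unpacked initramfs tree and check that it parses."""
    with open(path, encoding="utf-8") as handle:
        return syntax_error_in(handle.read(), path)
```

For example, from /tmp/initramfs, calling check_file("./usr/lib/python3.6/site-packages/ironic_python_agent/extensions/image.py") returns None when the edited file parses, and a "file:line: message" string otherwise.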

Comment 36 errata-xmlrpc 2021-12-09 20:19:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.7 (Train) bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3762

