Created attachment 1791852 [details]
DNSmasq configuration file

Product: Red Hat OpenStack
Reporter: jlabarre
Component: python-ironic-inspector-client
Version: 16.2 (Train)
Severity: high
Hardware: x86_64
OS: Linux

Summary: Introspection not providing boot for UEFI x86_64 node in a mixed-architecture configuration, iPXE disabled

Description of problem:
Deploying an OpenStack 16.2 environment with mixed-architecture nodes (x86_64 and Power/ppc64le). As the Power nodes cannot use iPXE, the undercloud needs to be deployed with "ipxe_enabled = False". The controller node is a VM and is defined as a BIOS-based system. The entire cluster is defined as:

1) Director: x86_64 VM
2) Controller: x86_64 VM
3) Compute1: Power8
4) Compute2: Power9, OpenBMC
5) Compute3: x86_64, set for UEFI boot only

This configuration should still be able to support iPXE systems, just not as the default. Nodes 2-4 are defined with "boot_interface": "pxe" in the nodes.json file used to import the node definitions, while node 5 has "boot_interface": "ipxe". Additionally, the "capabilities" lines for the nodes have "boot_mode:bios" or "boot_mode:uefi" as appropriate. ipmi_disable_boot_timeout=False is also set for the OpenBMC Power node. After importing the node definitions, the Controller VM and the two Power nodes introspect correctly. The x86_64 UEFI node does not.
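To illustrate the settings described above, a node entry in nodes.json for the UEFI-only x86_64 compute node might look like the following. This is a minimal sketch: the name, BMC address, and credentials are placeholders, not values from the attached nodes definition file.

```json
{
  "nodes": [
    {
      "name": "compute-3",
      "arch": "x86_64",
      "capabilities": "boot_mode:uefi",
      "boot_interface": "ipxe",
      "pm_type": "ipmi",
      "pm_addr": "192.0.2.50",
      "pm_user": "admin",
      "pm_password": "changeme"
    }
  ]
}
```

The BIOS-based Controller VM and the Power nodes would instead carry "boot_interface": "pxe" and "boot_mode:bios" (or the appropriate ppc64le settings).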
Version-Release number of selected component (if applicable):
OpenStack 16.2 (RHOS-16.2-RHEL-8-20210614.n.1)
openstack-ironic-python-agent-builder-2.8.0-2.20210529034815.b133d4d.el8ost.2.noarch
puppet-ironic-15.5.0-2.20210601011633.d553541.el8ost.2.noarch
puppet-nova-15.8.0-2.20210601013941.99789e3.el8ost.2.noarch
python3-ironicclient-3.1.2-2.20210528013403.1220d76.el8ost.1.noarch
python3-ironic-inspector-client-3.7.1-2.20210528020511.3a41127.el8ost.1.noarch
python3-novaclient-15.1.1-2.20210528065428.79959ab.el8ost.1.noarch
rhosp-director-images-all-16.2-20210614.1.el8ost.noarch
rhosp-director-images-base-16.2-20210614.1.el8ost.noarch
rhosp-director-images-ipa-ppc64le-16.2-20210614.1.el8ost.noarch
rhosp-director-images-ipa-x86_64-16.2-20210614.1.el8ost.noarch
rhosp-director-images-metadata-16.2-20210614.1.el8ost.noarch
rhosp-director-images-minimal-16.2-20210614.1.el8ost.noarch
rhosp-director-images-ppc64le-16.2-20210614.1.el8ost.noarch
rhosp-director-images-x86_64-16.2-20210614.1.el8ost.noarch

How reproducible: always

Steps to Reproduce:
1. Deploy the OSP undercloud with ipxe_enabled=False
2. Upload x86_64 and ppc64le images:
   a. openstack overcloud image upload --image-path ~/images/ppc64le --architecture ppc64le \
      --whole-disk --http-boot /var/lib/ironic/tftpboot/ppc64le
   b. openstack overcloud image upload --image-path ~/images/x86_64 --architecture x86_64 \
      --http-boot /var/lib/ironic/tftpboot
3. Import node definitions from the .json file:
   openstack overcloud node import --http-boot /var/lib/ironic/tftpboot ~/nodes.json
3a. If the .json node definition file did not already have "boot_interface" defined, set the boot interface for the nodes:
   openstack baremetal node set --boot-interface ...
4. Set the OpenBMC node's IPMI option:
   openstack baremetal node set {{ name }} --driver-info ipmi_disable_boot_timeout=False
5.
Create custom traits for the nodes:
   openstack --os-placement-api-version 1.6 trait create [trait]
   for traits:
   - CUSTOM_HW_CPU_PPC64LE_POWER8
   - CUSTOM_HW_CPU_PPC64LE_POWER9
   - CUSTOM_HW_CONTROLLER
6. Import introspection rules:
   openstack baremetal introspection rule import ~stack/introspection-rules.json
7. Run introspection against the nodes:
   openstack overcloud node introspect [node name]
   (ran them individually to watch the status/output of each from their consoles)

Actual results:
The Controller (VM) and the Power/ppc64le compute nodes ran introspection successfully (all set as "pxe"). The x86_64 UEFI compute node (configured for ipxe) fails to even get a network boot when it requests one.

Expected results:
All nodes, regardless of being configured as pxe or ipxe, should run introspection successfully. The subsequent overcloud deploy would also be expected to install the nodes successfully (have not reached that point in this configuration).

Additional info:
I had previously tried setting up these same systems with "ipxe_enabled: True". Under that configuration both the pxe Controller and the ipxe x86_64 Compute node were able to run introspection correctly. The Power8 and Power9 nodes both required an alternative TFTP URL to boot the introspection ("PXE Autoconfiguration failed"), and neither could run the overcloud install. Apparently "ipxe_enabled" is supposed to remain "False" for these configurations, but UEFI/iPXE boot needs to work for x86_64, as a customer with mixed x86_64 and ppc64le Compute nodes has new x86_64 systems that can only boot via UEFI/iPXE.
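For context on step 6, ironic-inspector rule files are a JSON list of objects with "description", "conditions", and "actions" keys. A minimal illustrative rule is shown below; it is an invented example to show the format only, not the content of the attached introspection-rules.json.

```json
[
  {
    "description": "Hypothetical example: record the CPU architecture in the node's extra field",
    "conditions": [
      {"op": "eq", "field": "data://inventory.cpu.architecture", "value": "ppc64le"}
    ],
    "actions": [
      {"action": "set-attribute", "path": "/extra/arch", "value": "ppc64le"}
    ]
  }
]
```

A rule like this only fires when introspection actually completes, which is why the x86_64 UEFI node that never network-boots gets no attributes set at all.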
Created attachment 1791853 [details] Nodes definition file for import
Created attachment 1791854 [details] additional introspection rules
Created attachment 1791855 [details] sample log file for dnsmasq
Created attachment 1791881 [details]
"baremetal node show" for Power9 compute node

baremetal node information for one of the Power/ppc64le nodes, after introspection (JSON format)
Created attachment 1791882 [details]
"baremetal node show" for x86_64 UEFI compute node

Undercloud information for the x86_64 UEFI compute node. The node was imported, but introspection failed to run against it. A sample from an earlier undercloud deploy is in the next file.
Created attachment 1791883 [details]
"baremetal node show" for x86_64 UEFI node (earlier run w/different parameters)

In an earlier test, I had set "ipxe_enabled = True". Under this setting the x86_64 UEFI node was able to run introspection, but the Power nodes needed manual intervention and could not install the overcloud. This is the x86_64 output from "openstack baremetal node show" showing what should be discovered in introspection (but it is from an undercloud that doesn't work with Power systems). Supplied for comparison purposes, if useful.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Release of components for Red Hat OpenStack Platform 16.2.2), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:1001