Description of problem:
ceph_spec_bootstrap.py generates a faulty ceph_spec.yaml when the roles_data.yaml file contains roles that don't exist in the overcloud-baremetal-deploy.yaml file.

Version-Release number of selected component (if applicable):
tripleo wallaby
openstack-tripleo-common.noarch 15.4.1-0.20230925193723.e5b18f2.el9
openstack-tripleo-common-containers.noarch 15.4.1-0.20230925193723.e5b18f2.el9
openstack-tripleo-heat-templates.noarch 14.3.1-0.20231102121515.e7c7ce3.el9
python3-tripleoclient.noarch 16.5.1-0.20230926092026.f3599d0.el9 @delorean_wallaby_all_04032024

How reproducible:
When roles_data.yaml contains roles that don't exist in the overcloud-baremetal-deployed.yaml file.

Steps to Reproduce:
1. Define roles_data.yaml with multiple roles, e.g. Controller, Compute, Storage, ComputeDummy, StorageDummy.
2. Define the baremetal provisioning file, overcloud-baremetal-deploy.yaml, which specifies the roles to deploy. Leave some of the roles defined in step 1 out of it, e.g. define only Controller, Compute and StorageDummy. The count for each role doesn't matter.
3. Provision the baremetal nodes:
   openstack overcloud node provision --output ~/overcloud-baremetal-deployed.yaml ~/overcloud-baremetal-deploy.yaml ...
4. Generate a ceph_spec:
   openstack overcloud ceph spec \
     --roles-data /home/stack/templates/roles_data.yaml \
     -o /home/stack/ceph_spec.yaml \
     /home/stack/templates/overcloud-baremetal-deployed.yaml

Actual results:
Ceph services are labeled incorrectly in the ceph_spec.yaml file.

Expected results:
The labels match the roles_data.yaml service list of each provisioned node defined in overcloud-baremetal-deployed.yaml.

Additional info:
I applied a local patch to ceph_spec_bootstrap.py that solved the issue for me.
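The expected result can be stated as a small mapping: each provisioned host gets the Ceph labels implied by the services of its own role in roles_data.yaml, and roles that were never provisioned contribute nothing. The Python sketch below assumes both files are already parsed into plain dicts (the real files are YAML). `SERVICE_TO_LABEL`, `expected_labels` and the hostnames are hypothetical and illustrative only; the table is a partial guess, not the mapping used by ceph_spec_bootstrap.py.

```python
"""Sketch of the *expected* labeling: a host's labels come only from the role
it was provisioned as. Data shapes and names are simplified assumptions."""

# Hypothetical, partial TripleO service -> Ceph label table (illustration only).
SERVICE_TO_LABEL = {
    "OS::TripleO::Services::CephMon": "mon",
    "OS::TripleO::Services::CephMgr": "mgr",
    "OS::TripleO::Services::CephOSD": "osd",
}


def expected_labels(roles_data, deployed_hosts):
    """Return {hostname: sorted list of Ceph labels}.

    roles_data: list of {"name": role, "ServicesDefault": [services]}
    deployed_hosts: {role: [hostnames]} for roles that were actually provisioned
    """
    services_by_role = {r["name"]: r.get("ServicesDefault", []) for r in roles_data}
    labels = {}
    # Iterate over provisioned roles only; unprovisioned roles in roles_data
    # must not contribute labels to any host.
    for role, hosts in deployed_hosts.items():
        role_labels = sorted({SERVICE_TO_LABEL[s]
                              for s in services_by_role.get(role, [])
                              if s in SERVICE_TO_LABEL})
        for host in hosts:
            labels[host] = role_labels
    return labels


if __name__ == "__main__":
    roles = [
        {"name": "Controller", "ServicesDefault": [
            "OS::TripleO::Services::CephMon", "OS::TripleO::Services::CephMgr"]},
        {"name": "Storage", "ServicesDefault": ["OS::TripleO::Services::CephOSD"]},
        {"name": "StorageDummy", "ServicesDefault": []},
    ]
    deployed = {
        "Controller": ["overcloud-controller-0"],
        "StorageDummy": ["overcloud-storagedummy-0"],
    }
    for host, host_labels in expected_labels(roles, deployed).items():
        print(host, host_labels)
```

In this toy data the StorageDummy host should end up with no labels, because the "osd" label belongs to the Storage role, which was never provisioned.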
Created attachment 2039464: Assign roles_to_hosts[role] with matching_hosts only if role was found in metalsmith_data_file

I don't know how to create a PR for this because I couldn't fork the following repository: https://github.com/openstack-archive/tripleo-ansible
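The attachment itself is not reproduced here, so the following Python sketch is reconstructed from its title only: `roles_to_hosts[role]` is assigned only when the role was found in the deployed (metalsmith) data. The function name, data shapes and the `guard=False` failure mode (a role missing from the file silently reusing the `matching_hosts` of the previous role that was found) are hypothetical, chosen to show why such a guard matters, and may not match the real module.

```python
"""Sketch based on the patch title only: assign roles_to_hosts[role] with
matching_hosts only if the role was found in the metalsmith data.
All names and the unguarded failure mode are hypothetical."""


def roles_to_hosts_map(role_names, metal_hosts_by_role, guard=True):
    """role_names: role names from roles_data.yaml, in file order.
    metal_hosts_by_role: {role: [hostnames]} parsed from the deployed file.
    guard=False mimics a hypothetical unguarded loop for comparison."""
    roles_to_hosts = {}
    matching_hosts = []
    for role in role_names:
        found = role in metal_hosts_by_role
        if found:
            matching_hosts = list(metal_hosts_by_role[role])
        if guard and not found:
            # Role is in roles_data.yaml but was never provisioned: skip it,
            # so its services cannot become labels on another role's hosts.
            continue
        # Without the guard, a missing role reuses matching_hosts left over
        # from the previous role that *was* found.
        roles_to_hosts[role] = matching_hosts
    return roles_to_hosts


if __name__ == "__main__":
    roles = ["Controller", "Compute", "Storage", "ComputeDummy", "StorageDummy"]
    metal = {"Controller": ["c0"], "Compute": ["n0"], "StorageDummy": ["s0"]}
    print("guarded:  ", roles_to_hosts_map(roles, metal))
    print("unguarded:", roles_to_hosts_map(roles, metal, guard=False))
```

With the reporter's role list, the unguarded variant would attach the Compute host to Storage and ComputeDummy as well, so that host would receive Storage's Ceph labels; the guarded variant leaves those roles out entirely.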
The issue you mentioned appears when the baremetal file is missing role information that is present in roles_data.yaml. The bootstrap module is designed on the assumption that the roles in the baremetal file always match those in roles_data.yaml, because the baremetal file is generated using the steps below:

1. Generate roles_data from the list of custom roles:
   openstack overcloud roles generate -o /home/stack/composable_roles/roles/roles_data.yaml \
     ControllerStorageNfs \
     CephStorage \
     Compute
2. Extract the provisioned nodes and generate overcloud-baremetal-deploy.yaml from roles_data:
   openstack overcloud node extract provisioned --stack overcloud --roles-file /home/stack/templates/roles_data.yaml --output /home/stack/templates/overcloud-baremetal-deploy.yaml
3. Provision the nodes and generate the final baremetal file, which is the one used by the module:
   openstack overcloud node provision -y --network-config --templates /home/stack/templates --output /home/stack/templates/overcloud-baremetal-deployed.yaml /home/stack/templates/overcloud-baremetal-deploy.yaml

I wouldn't consider it a bug, but your patch looks like an improvement to the module because it avoids generating an incorrect spec regardless of how the baremetal file is generated, so +1 for the patch.
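The workflow above keeps the two files consistent by construction. When the baremetal file is written by hand, as in the original report, a quick pre-check can flag roles_data.yaml roles that have no entry in overcloud-baremetal-deploy.yaml before running `openstack overcloud ceph spec`. This Python sketch assumes both YAML files are already parsed into Python lists (e.g. with a YAML loader), with role entries keyed by `name`; the helper `roles_missing_from_baremetal` is hypothetical and not part of tripleoclient or tripleo-ansible.

```python
"""Hypothetical pre-check: list roles from roles_data.yaml that have no
entry in overcloud-baremetal-deploy.yaml (both assumed already parsed)."""


def roles_missing_from_baremetal(roles_data, baremetal_deploy):
    """roles_data: list of role dicts with a "name" key.
    baremetal_deploy: list of entries with a "name" key (and usually "count").
    Returns the missing role names in roles_data order."""
    provisioned = {entry["name"] for entry in baremetal_deploy}
    return [role["name"] for role in roles_data if role["name"] not in provisioned]


if __name__ == "__main__":
    roles = [{"name": n} for n in
             ("Controller", "Compute", "Storage", "ComputeDummy", "StorageDummy")]
    baremetal = [
        {"name": "Controller", "count": 3},
        {"name": "Compute", "count": 1},
        {"name": "StorageDummy", "count": 1},
    ]
    missing = roles_missing_from_baremetal(roles, baremetal)
    if missing:
        print("roles_data.yaml roles not in the baremetal file:", ", ".join(missing))
```

A check like this only warns; whether to prune roles_data.yaml or rely on the module-side guard from the patch is left to the operator.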
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (RHOSP 17.1.4 bug fix and enhancement advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2024:9974