Description of problem:

The upgrade_tasks_step1.yaml playbook is executed during the Host System upgrade (from RHEL 8.4 to 9.2), and it fails on the first task, "Set noout flag":

    - name: Set noout flag
      shell: "cephadm shell -- ceph osd set {{ item }}"
      become: true
      with_items:
        - noout
        - norecover
        - nobackfill
        - norebalance
        - nodeep-scrub
      delegate_to: "{{ ceph_mon_short_bootstrap_node_name }}"

https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/wallaby/deployment/cephadm/ceph-osd.yaml#L109

The task fails because it is delegated to "ceph_mon_short_bootstrap_node_name", which points to one of the controllers, and that controller is not included in the inventory used for the DCN stack. Even if the delegation worked, I assume it would set the flags on the central ceph cluster, which is pointless when the DCN site has a different ceph cluster. Moreover, I assume the command

    cephadm shell -- ceph osd set {{ item }}

would fail anyway because it would not find the ceph cluster credentials.

So there are two problems that need to be fixed:

1. Select the right ceph_mon node in the delegation.
2. Select the right cluster, assuming we solve step 1.

The command cephadm shell -- ceph osd set {{ item }} should be able to find the right ceph cluster and look something like:

    cephadm --fsid {{ tripleo_cephadm_fsid }} -c /etc/ceph/{{ tripleo_cephadm_cluster }}.conf -k /etc/ceph/{{ tripleo_cephadm_cluster }}.client.{{ select_keyring| default('admin') }}.keyring shell -- ceph osd set <flag>

Version-Release number of selected component (if applicable):

openstack-tripleo-common-containers-15.4.1-17.1.20230927010819.el9ost.noarch
puppet-tripleo-14.2.3-17.1.20231102190827.40278e1.el9ost.noarch
ansible-tripleo-ipsec-11.0.1-17.1.20230620172008.b5559c8.el9ost.noarch
ansible-tripleo-ipa-0.3.1-17.1.20230627190951.8d29d9e.el9ost.noarch
ansible-role-tripleo-modify-image-1.5.1-17.1.20230621064242.b6eedb6.el9ost.noarch
python3-tripleo-common-15.4.1-17.1.20230927010819.el9ost.noarch
openstack-tripleo-common-15.4.1-17.1.20230927010819.el9ost.noarch
tripleo-ansible-3.3.1-17.1.20231101230823.4d015bf.el9ost.noarch
openstack-tripleo-heat-templates-14.3.1-17.1.20231103010823.el9ost.noarch
openstack-tripleo-validations-14.3.2-17.1.20231026020815.2b526f8.el9ost.noarch
python3-tripleoclient-16.5.1-17.1.20230927000827.f3599d0.el9ost.noarch
openstack-tripleo-image-elements-13.1.3-17.1.20230621111410.a641940.el9ost.noarch
openstack-tripleo-puppet-elements-14.1.3-17.1.20230810141019.b4e0cbd.el9ost.noarch

How reproducible:

Always

Steps to Reproduce:

1. Execute the Host system upgrade of HCI compute nodes of DCN env during the FFU procedure.
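
For illustration only, the two fixes listed in the description could be combined into a task along the lines of the sketch below. It is not the change that was merged. The cephadm invocation is copied from the command proposed in the description and has not been checked against the cephadm CLI. The variable dcn_ceph_mon_node is a hypothetical placeholder for whatever resolves to a ceph mon host that is present in the DCN stack inventory. It also assumes that tripleo_cephadm_fsid, tripleo_cephadm_cluster and select_keyring are defined for the DCN stack.

```yaml
- name: Set noout flag
  become: true
  # Command form taken from the proposal in the description; the option
  # placement is unverified. The tripleo_cephadm_* variables are assumed to
  # describe the DCN site's own ceph cluster.
  shell: >-
    cephadm
    --fsid {{ tripleo_cephadm_fsid }}
    -c /etc/ceph/{{ tripleo_cephadm_cluster }}.conf
    -k /etc/ceph/{{ tripleo_cephadm_cluster }}.client.{{ select_keyring | default('admin') }}.keyring
    shell -- ceph osd set {{ item }}
  with_items:
    - noout
    - norecover
    - nobackfill
    - norebalance
    - nodeep-scrub
  # Hypothetical variable: must point at a mon node of the DCN ceph cluster
  # that exists in the inventory used for the DCN stack.
  delegate_to: "{{ dcn_ceph_mon_node }}"
```

Passing the fsid, config file and keyring explicitly is meant to address problem 2 (the command targets the DCN cluster rather than relying on defaults), while the delegation target addresses problem 1.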
Removed needinfo as John answered it in comment 3.
Thanks Erin for the Doc text update, it looks good.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: openstack-tripleo-heat-templates and tripleo-ansible update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2024:2736