Before this update, during a DCN FFU system upgrade of nodes in a setup with multiple stacks, the Red Hat Ceph Storage task `Set noout flag` might fail to run the `ceph` command on the correct host.
+
With this update, a system upgrade of any node in a multi-stack setup delegates the Red Hat Ceph Storage task `Set noout flag` to the relevant host, and the `ceph` commands run against that host's specific cluster.
Description (Marian Krcmarik, 2023-12-11 16:27:51 UTC)
Description of problem:
The upgrade_tasks_step1.yaml playbook is executed during the host system upgrade (from RHEL 8.4 to 9.2), and it fails on the first task, called "Set noout flag":
- - name: Set noout flag
    shell: "cephadm shell -- ceph osd set {{ item }}"
    become: true
    with_items:
      - noout
      - norecover
      - nobackfill
      - norebalance
      - nodeep-scrub
    delegate_to: "{{ ceph_mon_short_bootstrap_node_name }}"
https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/wallaby/deployment/cephadm/ceph-osd.yaml#L109
The task is delegated to "ceph_mon_short_bootstrap_node_name", which points to one of the controllers. That controller is not included in the inventory used for the DCN stack, and I assume the task would set the flags on the central Ceph cluster anyway, which is pointless when the DCN site has a different Ceph cluster.
Moreover, I assume the command cephadm shell -- ceph osd set {{ item }} would fail anyway because it would not find the Ceph cluster credentials.
So there are two problems that need to be fixed:
1. Select the right ceph_mon node in the delegation.
2. Select the right cluster, assuming problem 1 is solved.
The command cephadm shell -- ceph osd set {{ item }} should be able to find the right Ceph cluster and look something like:
cephadm --fsid {{ tripleo_cephadm_fsid }} -c /etc/ceph/{{ tripleo_cephadm_cluster }}.conf -k /etc/ceph/{{ tripleo_cephadm_cluster }}.client.{{ select_keyring| default('admin') }}.keyring shell -- ceph osd set <flag>
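To make the proposed command shape concrete, here is a minimal dry-run sketch that only prints the command line for each of the five flags and executes nothing. The function names and the positional arguments (FSID, cluster name, keyring user) are hypothetical stand-ins for the templated values tripleo_cephadm_fsid, tripleo_cephadm_cluster and select_keyring. The option placement simply mirrors the command proposed above and has not been verified against cephadm.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print the cluster-aware command for each OSD flag.
# Nothing is executed against a Ceph cluster. All names here are
# hypothetical; the command layout is copied from the proposal in this report.

# Build one "ceph osd set <flag>" command line for a given cluster.
# Arguments: fsid, cluster name, flag, optional keyring user (default: admin).
build_osd_set_cmd() {
    local fsid="$1" cluster="$2" flag="$3" keyring_user="${4:-admin}"
    printf 'cephadm --fsid %s -c /etc/ceph/%s.conf -k /etc/ceph/%s.client.%s.keyring shell -- ceph osd set %s\n' \
        "$fsid" "$cluster" "$cluster" "$keyring_user" "$flag"
}

# Print the command for every flag that the "Set noout flag" task iterates over.
print_all_flag_cmds() {
    local fsid="$1" cluster="$2" flag
    for flag in noout norecover nobackfill norebalance nodeep-scrub; do
        build_osd_set_cmd "$fsid" "$cluster" "$flag"
    done
}

# Placeholder values only; they do not come from a real deployment.
print_all_flag_cmds "00000000-0000-0000-0000-000000000000" "dcn0"
```

Printing the commands first makes it easy to check that the configuration and keyring paths match the files actually present on the delegated host before any flag is set.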
Version-Release number of selected component (if applicable):
openstack-tripleo-common-containers-15.4.1-17.1.20230927010819.el9ost.noarch
puppet-tripleo-14.2.3-17.1.20231102190827.40278e1.el9ost.noarch
ansible-tripleo-ipsec-11.0.1-17.1.20230620172008.b5559c8.el9ost.noarch
ansible-tripleo-ipa-0.3.1-17.1.20230627190951.8d29d9e.el9ost.noarch
ansible-role-tripleo-modify-image-1.5.1-17.1.20230621064242.b6eedb6.el9ost.noarch
python3-tripleo-common-15.4.1-17.1.20230927010819.el9ost.noarch
openstack-tripleo-common-15.4.1-17.1.20230927010819.el9ost.noarch
tripleo-ansible-3.3.1-17.1.20231101230823.4d015bf.el9ost.noarch
openstack-tripleo-heat-templates-14.3.1-17.1.20231103010823.el9ost.noarch
openstack-tripleo-validations-14.3.2-17.1.20231026020815.2b526f8.el9ost.noarch
python3-tripleoclient-16.5.1-17.1.20230927000827.f3599d0.el9ost.noarch
openstack-tripleo-image-elements-13.1.3-17.1.20230621111410.a641940.el9ost.noarch
openstack-tripleo-puppet-elements-14.1.3-17.1.20230810141019.b4e0cbd.el9ost.noarch
How reproducible:
Always
Steps to Reproduce:
1. Execute the host system upgrade of the HCI compute nodes of a DCN environment during the FFU procedure.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory (Moderate: openstack-tripleo-heat-templates and tripleo-ansible update), and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2024:2736