Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 2272946

Summary: tripleo_ceph_client role is not applied when allovercloud,undercloud is used as a limit
Product: Red Hat OpenStack
Reporter: Alex Stupnikov <astupnik>
Component: tripleo-ansible
Assignee: Manoj Katari <mkatari>
Status: CLOSED ERRATA
QA Contact: Alfredo <alfrgarc>
Severity: medium
Priority: high
Version: 17.1 (Wallaby)
CC: enothen, fpantano, johfulto, jpretori, mariel, mkatari, pkomarov, ramishra
Target Milestone: z4
Keywords: Triaged
Target Release: 17.1
Hardware: All
OS: All
Fixed In Version: tripleo-ansible-3.3.1-17.1.20240502120759.8debef3.el9ost
Doc Type: No Doc Update
Last Closed: 2024-11-21 09:40:06 UTC
Type: Bug

Description Alex Stupnikov 2024-04-03 12:23:51 UTC
Description of problem:

This was originally reported by a customer who ran the following command during the FFU procedure:

openstack overcloud upgrade run --yes --stack <stack> --debug --limit allovercloud,undercloud --playbook all

https://access.redhat.com/documentation/de-de/red_hat_openstack_platform/17.1/html-single/framework_for_upgrades_16.2_to_17.1/index#upgrading-rhosp-on-all-nodes-in-each-stack_overcloud-upgrade


The problem is that "allovercloud" is a group name, but it is treated as a plain string/host name by the following code:
https://github.com/openstack-archive/tripleo-ansible/blob/stable/wallaby/tripleo_ansible/roles/tripleo_ceph_client/tasks/effective_clients_limit.yml#L20-L24

As a result, tripleo_ceph_client_include, which is built from tripleo_ceph_client_limit_list, is the following list:
tripleo_ceph_client_include: ['allovercloud', 'undercloud']

Since 'allovercloud' is treated as a string, Ansible cannot find a valid intersection between the list of Ceph client hosts and 'allovercloud' when building tripleo_ceph_client_effective_clients.
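
The effect can be reproduced outside Ansible with a simplified model. The Python sketch below is only an illustration, not the role's actual code: the inventory, the host names, and the helper functions are hypothetical, and the role's real parsing of the limit may differ.

```python
# Hypothetical inventory; host and group names are illustrative only.
inventory_groups = {
    "allovercloud": ["controller-0", "compute-0", "compute-1"],
    "undercloud": ["undercloud-0"],
    "ceph_client": ["controller-0", "compute-0", "compute-1"],
}


def split_limit(limit):
    """Split a --limit string into its comma-separated items."""
    return [item.strip() for item in limit.split(",") if item.strip()]


def intersect(first, second):
    """Order-preserving intersection of two lists."""
    return [item for item in first if item in second]


# The group names survive as plain strings instead of being expanded to hosts.
include = split_limit("allovercloud,undercloud")
print(include)  # ['allovercloud', 'undercloud']

# No Ceph client host is literally named 'allovercloud' or 'undercloud'.
effective_clients = intersect(inventory_groups["ceph_client"], include)
print(effective_clients)  # []
```

With an explicit host name such as "compute-0" in the limit, the same intersection is non-empty, which matches the behaviour described above.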


Version-Release number of selected component (if applicable):
RHOSP 17.1

How reproducible:
Steps and analysis are provided in the description.


Actual results:
The Ceph client play is not applied to valid overcloud hosts.

Expected results:
From the python-tripleoclient perspective, the --limit argument should contain a list of hosts, so other recommendations in the document may be incorrect. But if we decide to keep the documented usage, then the Ceph client configuration should be applied to all groups['ceph_client'] hosts.

Comment 3 Manoj Katari 2024-04-05 11:53:55 UTC
Thanks @fpantano for your inputs.

As the upgrade command uses `--limit allovercloud,undercloud`, I think ansible_limit is generated as 'allovercloud,undercloud', whereas the code in [1] expects it as 'listofnodes_in_overcloud, undercloud', so the tripleo_ceph_client_effective_clients generated in L55 ends up as an empty list.

@john, we need to review [1] and decide whether the fix is needed in the code or in the upgrade doc.

[1] https://github.com/openstack-archive/tripleo-ansible/blob/stable/wallaby/tripleo_ansible/roles/tripleo_ceph_client/tasks/effective_clients_limit.yml#L22

Comment 4 Eric Nothen 2024-04-08 11:06:52 UTC
The customer is currently working around this issue by passing an explicit list of hostnames to the --limit parameter. This works in the smaller test/dev clusters and should work in production as well, unless they hit a CLI limit that prevents them from explicitly passing ~200 FQDNs.

FWIW, I have tested the openstack overcloud upgrade command with >2000 FQDNs as the value of --limit and the command runs (it fails down the road because I don't actually have 2k overcloud nodes, but it does seem to run so it's not an issue for bash).
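
A rough way to sanity-check the size of such a --limit value before running the command is sketched below in Python. The host names are hypothetical, and the per-argument cap is an assumption based on Linux limiting a single argument string to 32 pages (128 KiB with 4 KiB pages); the check ignores the space taken by the environment and by the other arguments.

```python
import os

# Assumption: Linux caps one argv string at 32 pages (MAX_ARG_STRLEN),
# which is 128 KiB when the page size is 4 KiB.
PER_ARG_CAP = 32 * 4096


def build_limit(hostnames):
    """Join host names into the comma-separated form passed to --limit."""
    return ",".join(hostnames)


def fits_on_command_line(argument, arg_max=None, per_arg_cap=PER_ARG_CAP):
    """Rough check that one argument fits within the kernel's limits."""
    if arg_max is None:
        # Total bytes allowed for arguments plus environment.
        arg_max = os.sysconf("SC_ARG_MAX")
    size = len(argument.encode("utf-8")) + 1  # include the terminating NUL byte
    return size <= per_arg_cap and size < arg_max


# Hypothetical FQDNs, used only to get a realistic argument length.
hosts = [f"compute-{i}.overcloud.example.com" for i in range(2000)]
limit = build_limit(hosts)
print(len(limit), fits_on_command_line(limit))
```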

Comment 5 John Fulton 2024-04-08 21:59:35 UTC
If the upgrade is run with "CephConfigPath: /etc/ceph", as suggested in the docs section 3.1, I expect the workaround from comment #4 will be unnecessary. Check /etc/ceph/ on the compute nodes to confirm whether the keys are already present.

This is what I think is happening:

1. The system has "CephConfigPath: /var/lib/tripleo-config/ceph" and that path is empty so the keys appear to be missing. The director is creating new versions of the keys and trying to copy them to compute nodes.

2. Per docs section 8.1, "--limit allovercloud,undercloud" is passed which effectively results in keys not getting copied to any computes.

I suspect this problem never presented itself as a bug in our testing because we used "CephConfigPath: /etc/ceph". I.e. the ceph client role computed an empty list of hosts but it didn't matter since the keys were already there and the upgrade could continue.

[3.1] https://access.redhat.com/documentation/de-de/red_hat_openstack_platform/17.1/html-single/framework_for_upgrades_16.2_to_17.1/index#updating-ceph-client-configuration-for-rhosp-171-external-ceph-deployments

[8.1] https://access.redhat.com/documentation/de-de/red_hat_openstack_platform/17.1/html-single/framework_for_upgrades_16.2_to_17.1/index#upgrading-rhosp-on-all-nodes-in-each-stack_overcloud-upgrade

Comment 9 John Fulton 2024-04-09 13:04:54 UTC
This is a bug. The documentation tells you to limit by allovercloud,undercloud (two groups), but the sync.yml tasks in tripleo_ceph_client can only handle hostnames in the limit.

This bug shouldn't block an upgrade if "CephConfigPath: /etc/ceph" is set as documented. However, because it's producing unintended behavior (a task limited to an empty set), we will update the tripleo_ceph_client role to handle the case where "--limit allovercloud,undercloud" is passed.
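
One possible way to handle group names in the limit is to expand them to their member hosts before intersecting with the Ceph client hosts. The Python sketch below illustrates that idea only; it is not the change shipped in the fixed tripleo-ansible build, the inventory and function names are hypothetical, and it ignores the rest of Ansible's pattern syntax (wildcards, exclusions, and so on).

```python
# Hypothetical inventory; host and group names are illustrative only.
GROUPS = {
    "allovercloud": ["controller-0", "compute-0", "compute-1"],
    "undercloud": ["undercloud-0"],
    "ceph_client": ["controller-0", "compute-0", "compute-1"],
}


def expand_limit(limit, groups):
    """Replace group names in a comma-separated limit with their member hosts."""
    hosts = []
    for pattern in limit.split(","):
        pattern = pattern.strip()
        if not pattern:
            continue
        # A known group expands to its members; anything else is kept as a host name.
        for host in groups.get(pattern, [pattern]):
            if host not in hosts:
                hosts.append(host)
    return hosts


def ceph_clients_in_limit(limit, groups):
    """Ceph client hosts that fall inside the expanded limit."""
    allowed = expand_limit(limit, groups)
    return [host for host in groups["ceph_client"] if host in allowed]


print(ceph_clients_in_limit("allovercloud,undercloud", GROUPS))
# ['controller-0', 'compute-0', 'compute-1']
print(ceph_clients_in_limit("compute-1", GROUPS))
# ['compute-1']
```

With this expansion, a limit made of groups and a limit made of explicit host names both resolve to the expected Ceph client hosts.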

Comment 23 errata-xmlrpc 2024-11-21 09:40:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (RHOSP 17.1.4 bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2024:9974