Bug 2272946
| Summary: | tripleo_ceph_client role is not applied when allovercloud,undercloud is used as a limit | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Alex Stupnikov <astupnik> |
| Component: | tripleo-ansible | Assignee: | Manoj Katari <mkatari> |
| Status: | CLOSED ERRATA | QA Contact: | Alfredo <alfrgarc> |
| Severity: | medium | Docs Contact: | |
| Priority: | high | ||
| Version: | 17.1 (Wallaby) | CC: | enothen, fpantano, johfulto, jpretori, mariel, mkatari, pkomarov, ramishra |
| Target Milestone: | z4 | Keywords: | Triaged |
| Target Release: | 17.1 | ||
| Hardware: | All | ||
| OS: | All | ||
| Whiteboard: | |||
| Fixed In Version: | tripleo-ansible-3.3.1-17.1.20240502120759.8debef3.el9ost | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2024-11-21 09:40:06 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Alex Stupnikov
2024-04-03 12:23:51 UTC
Thanks @fpantano for your input. Because the upgrade command uses `--limit allovercloud,undercloud`, I think ansible_limit is generated as 'allovercloud,undercloud', whereas the code in [1] expects 'listofnodes_in_overcloud, undercloud', so the tripleo_ceph_client_effective_clients generated in L55 results in an empty list. @john, we need to review [1] and decide whether the fix belongs in the code or in the upgrade doc.

[1] https://github.com/openstack-archive/tripleo-ansible/blob/stable/wallaby/tripleo_ansible/roles/tripleo_ceph_client/tasks/effective_clients_limit.yml#L22

The customer is currently working around this issue by passing an explicit list of hostnames to the --limit parameter. This works in the smaller test/dev clusters and should also work in production, unless they hit a CLI limit that prevents them from explicitly passing ~200 FQDNs.

FWIW, I have tested the openstack overcloud upgrade command with >2000 FQDNs as the value of --limit and the command runs (it fails later because I don't actually have 2k overcloud nodes, but it does run, so it's not an issue for bash).

If the upgrade is run with "CephConfigPath: /etc/ceph", as suggested in docs section 3.1, I expect the workaround from comment #4 will be unnecessary. Check /etc/ceph/ on the compute nodes to confirm whether the keys are already present.

This is what I think is happening:

1. The system has "CephConfigPath: /var/lib/tripleo-config/ceph" and that path is empty, so the keys appear to be missing. The director creates new versions of the keys and tries to copy them to the compute nodes.
2. Per docs section 8.1, "--limit allovercloud,undercloud" is passed, which effectively results in the keys not being copied to any compute node.

I suspect this problem never showed up as a bug in our testing because we used "CephConfigPath: /etc/ceph". That is, the ceph client role computed an empty list of hosts, but it didn't matter because the keys were already there and the upgrade could continue.
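To illustrate the mismatch described above, here is a minimal Python sketch. It is not the role's actual code. It assumes the effective client list is computed by comparing the comma-separated entries of the limit against host names literally. The inventory, host names, and function names are all hypothetical. Real Ansible limit patterns also support exclusions, intersections, and wildcards, which this sketch ignores.

```python
# Hypothetical inventory; group and host names are illustrative only.
INVENTORY = {
    "allovercloud": ["controller-0", "compute-0", "compute-1"],
    "undercloud": ["undercloud-0"],
}


def naive_effective_clients(clients, ansible_limit):
    """Keep only clients whose host name appears literally in the limit."""
    limit_entries = [entry.strip() for entry in ansible_limit.split(",")]
    return [client for client in clients if client in limit_entries]


def expand_limit(ansible_limit, inventory):
    """Expand group names to their hosts; keep plain host names unchanged."""
    hosts = []
    for entry in (e.strip() for e in ansible_limit.split(",")):
        # An entry that is not a known group is treated as a host name.
        for host in inventory.get(entry, [entry]):
            if host not in hosts:
                hosts.append(host)
    return hosts


def group_aware_effective_clients(clients, ansible_limit, inventory):
    """Keep clients that are covered by the limit after group expansion."""
    allowed = expand_limit(ansible_limit, inventory)
    return [client for client in clients if client in allowed]


clients = ["compute-0", "compute-1"]

# Group names never match host names, so the literal comparison is empty.
print(naive_effective_clients(clients, "allovercloud,undercloud"))

# The workaround: an explicit host list matches as expected.
print(naive_effective_clients(clients, "compute-0,compute-1,undercloud"))

# One possible fix: expand groups before comparing.
print(group_aware_effective_clients(clients, "allovercloud,undercloud", INVENTORY))
```

The first call returns an empty list, which corresponds to a key-copy task limited to no hosts. The other two calls return both compute nodes.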
[3.1] https://access.redhat.com/documentation/de-de/red_hat_openstack_platform/17.1/html-single/framework_for_upgrades_16.2_to_17.1/index#updating-ceph-client-configuration-for-rhosp-171-external-ceph-deployments
[8.1] https://access.redhat.com/documentation/de-de/red_hat_openstack_platform/17.1/html-single/framework_for_upgrades_16.2_to_17.1/index#upgrading-rhosp-on-all-nodes-in-each-stack_overcloud-upgrade

This is a bug. The documentation tells you to limit by allovercloud,undercloud (two groups), but the sync.yml tasks in tripleo_ceph_client can only handle hostnames in the limit. This bug shouldn't block an upgrade if "CephConfigPath: /etc/ceph" is set as documented. However, because it produces unintended behavior (a task limited to an empty set), we will update the tripleo_ceph_client role to handle the case where "--limit allovercloud,undercloud" is passed.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (RHOSP 17.1.4 bug fix and enhancement advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2024:9974