Description: Francesco Pantano
2023-07-20 13:54:07 UTC
Description of problem:
During the FFU from 16.2 to 17.1, when RGW is deployed as part of
Director-deployed Ceph, the procedure fails on the next stack update.
In particular, haproxy-bundle cannot start via Pacemaker because it fails
to bind to the RGW port (8080).
After inspecting the affected environment, we found that RGW was not
redeployed on the storage network and is instead bound to * (all addresses).
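The bind conflict described above can be illustrated in isolation. The following is a minimal sketch, not taken from the affected environment: it uses an ephemeral port and the loopback address instead of 8080 and the haproxy VIP, and the function name is hypothetical. It only shows that, on Linux, a listener bound to the wildcard address prevents a second plain bind to a specific address on the same port.

```python
import errno
import socket


def wildcard_blocks_specific_bind(address="127.0.0.1"):
    """Return True if a wildcard listener prevents binding `address` on the same port.

    Hypothetical helper for illustration only: the wildcard socket stands in
    for RGW bound on *, the second socket for haproxy binding a single address.
    """
    wildcard = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    specific = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Port 0 lets the kernel pick a free ephemeral port.
        wildcard.bind(("0.0.0.0", 0))
        wildcard.listen(1)
        port = wildcard.getsockname()[1]
        try:
            specific.bind((address, port))
        except OSError as exc:
            # EADDRINUSE is the same class of failure haproxy would report.
            return exc.errno == errno.EADDRINUSE
        return False
    finally:
        specific.close()
        wildcard.close()


blocked = wildcard_blocks_specific_bind()
print("specific bind blocked by wildcard listener:", blocked)
```

If RGW were bound only to its storage network address, the two binds would target different addresses and would not collide.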
The resulting spec gathered from the adopted cluster shows:
---
service_type: rgw
service_id: controller-0
service_name: rgw.controller-0
placement:
  count_per_host: 1
  hosts:
  - controller-0
spec:
  rgw_frontend_port: 8080
---
service_type: rgw
service_id: controller-1
service_name: rgw.controller-1
placement:
  count_per_host: 1
  hosts:
  - controller-1
spec:
  rgw_frontend_port: 8080
---
service_type: rgw
service_id: controller-2
service_name: rgw.controller-2
placement:
  count_per_host: 1
  hosts:
  - controller-2
spec:
  rgw_frontend_port: 8080
whereas Director normally deploys RGW with the following spec:
---
service_type: rgw
service_id: rgw
service_name: rgw.rgw
placement:
  hosts:
  - controller-0
  - controller-1
  - controller-2
networks:
- 172.17.3.0/24
spec:
  rgw_frontend_port: 8080
  rgw_realm: default
  rgw_zone: default
The code responsible for the RGW adoption appears to be [1]; it should account
for the fact that the three RGW instances were bound to the storage network.
The failure has been observed in the job [2] that can be used to build a reproducer.
[1] https://github.com/ceph/ceph-ansible/blob/main/infrastructure-playbooks/cephadm-adopt.yml#L952
[2] https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/view/Upgrades/job/DFG-storage-ffu-17.1-from-16.2-passed_phase2-3cont_2comp_3ceph-ipv4-ovn_dvr-ceph-nfs-ganesha/
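To make the difference between the two specs concrete, here is a minimal sketch of how the three per-host specs from the adopted cluster could be folded into a single spec restricted to the storage network. This is not the logic in cephadm-adopt.yml or the fix shipped in the advisory; the function name `consolidate_rgw_specs` and its parameters are hypothetical, the specs are represented as plain dictionaries rather than parsed YAML, and `rgw_realm` and `rgw_zone` are left out for brevity.

```python
def consolidate_rgw_specs(per_host_specs, network, service_id="rgw"):
    """Merge per-host RGW specs into one spec bound to `network`.

    Hypothetical helper for illustration; not part of ceph-ansible or cephadm.
    """
    hosts = []
    ports = set()
    for spec in per_host_specs:
        if spec.get("service_type") != "rgw":
            continue  # ignore any non-RGW service specs
        for host in spec.get("placement", {}).get("hosts", []):
            if host not in hosts:
                hosts.append(host)
        port = spec.get("spec", {}).get("rgw_frontend_port")
        if port is not None:
            ports.add(port)
    if len(ports) != 1:
        raise ValueError(f"expected exactly one frontend port, found {sorted(ports)}")
    return {
        "service_type": "rgw",
        "service_id": service_id,
        "service_name": f"rgw.{service_id}",
        "placement": {"hosts": hosts},
        # The networks list is what keeps RGW off the wildcard address.
        "networks": [network],
        "spec": {"rgw_frontend_port": ports.pop()},
    }


# The three specs gathered from the adopted cluster, as dictionaries.
adopted = [
    {
        "service_type": "rgw",
        "service_id": f"controller-{i}",
        "service_name": f"rgw.controller-{i}",
        "placement": {"count_per_host": 1, "hosts": [f"controller-{i}"]},
        "spec": {"rgw_frontend_port": 8080},
    }
    for i in range(3)
]

merged = consolidate_rgw_specs(adopted, network="172.17.3.0/24")
print(merged)
```

The key point is the `networks` entry: the adopted specs lack it, which matches the observation that RGW ended up bound on * rather than on the storage network.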
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory (Red Hat Ceph Storage 5.3 Bug Fix update), and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2023:4760