In order to replace or scale Ceph monitors, the ceph-ansible fetch_directory from the original Ceph cluster deployment must exist and be referenced by ceph-ansible during the run that scales up or replaces a monitor. If it does not exist, you will encounter bugs like bz 1600202 and bz 1548026.
It would be better if ceph-ansible could retrieve its state information from the existing Ceph deployment and act accordingly, without needing to derive that state from the fetch_directory. This is a request for ceph-ansible to be able to handle monitor scale up or replacement without a fetch_directory.
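For context, the monitor state that the fetch_directory carries (essentially the mon keyring and the monmap) can be recovered from a running cluster, which is roughly what this request asks ceph-ansible to do internally. A minimal sketch of the manual equivalent, assuming a healthy cluster; the mon ID (mon2) and the /tmp paths are only examples:

# Pull the current monitor state from the running cluster instead of a fetch_directory
ceph auth get mon. -o /tmp/mon-keyring   # monitor keyring
ceph mon getmap -o /tmp/monmap           # current monitor map

# Initialize the new monitor's data directory from that state (mon2 is illustrative)
ceph-mon --mkfs -i mon2 --monmap /tmp/monmap --keyring /tmp/mon-keyring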
OSPd (TripleO) works around this bug by backing up the fetch_directory in Swift and restoring it, if necessary, before running subsequent ceph-ansible operations; implementation details are linked from bz 1548026 and bz 1613847. However, in a future version of TripleO there is a plan to remove the Swift service from the undercloud, and this bug would be a blocker for that.
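For illustration, the Swift-based workaround amounts to something like the following; this is a rough sketch, not the actual TripleO implementation, and the local fetch directory path is only an example:

# After a successful ceph-ansible run, archive the fetch directory and push it to Swift
tar -czf fetch_dir.tar.gz -C /var/lib/ceph-ansible/fetch .
swift upload overcloud_ceph_ansible_fetch_dir fetch_dir.tar.gz

# Before the next ceph-ansible run that scales or replaces a mon, restore it
swift download overcloud_ceph_ansible_fetch_dir fetch_dir.tar.gz
tar -xzf fetch_dir.tar.gz -C /var/lib/ceph-ansible/fetch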
Updating the QA Contact to Hemant. Hemant will reroute it to the appropriate QE Associate.
How to test this:
- Deploy OSP16 with the fixed-in ceph-ansible and 3 mons on the controllers
- Remove the fetch directory backup (see the swift loop below)
- Follow the procedure to replace a controller node (this should include removing an existing mon and adding a new mon)
- Verify whether you encounter the symptoms of bug 1600202 (if you do not, we pass and I'll remove fetch directory management from upstream OpenStack TripleO U and T)
# Delete every object from the fetch directory backup container
for F in $(swift list overcloud_ceph_ansible_fetch_dir); do
  swift delete overcloud_ceph_ansible_fetch_dir "$F";
done
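As a suggested verification step (not part of the official procedure), confirm the backup is gone before replacing the controller; the now-empty container can optionally be dropped as well:

swift list overcloud_ceph_ansible_fetch_dir     # should print nothing
swift delete overcloud_ceph_ansible_fetch_dir   # optionally remove the empty container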
Any updates? Is this going to be fixed for RHCS 4.0?
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.