There are cases where someone might want to be able to run `ceph orch host drain` and have the contents of /etc/ceph/ not be deleted. This is similar to the following PR, but it would be nice to have a flag so it's user controllable, since the change below doesn't address a use case that happens in RH OpenStack 17: https://github.com/ceph/ceph/pull/45174

In Red Hat OpenStack 17 we have a controller replacement procedure which involves draining the Ceph mon/mgr/rgw/mds daemons from a node before we shut the node down and replace it with a new node. We automate the testing of that procedure with this Ansible playbook: https://review.gerrithub.io/c/rhos-infra/cloud-config/+/547322/1/post_tasks/roles/replace-controller/tasks/remove_ceph_monitor.yml#160

Note the steps we have to take to not lose the key and conf:

  - name: drain ceph daemons on host being removed
    shell: |
      cp /etc/ceph/ceph.conf /home/tripleo-admin
      cp /etc/ceph/ceph.client.admin.keyring /home/tripleo-admin
      cephadm shell ceph orch host drain {{ install.controller.to.remove }}
      cp /home/tripleo-admin/ceph.conf /etc/ceph
      cp /home/tripleo-admin/ceph.client.admin.keyring /etc/ceph
      rm /home/tripleo-admin/ceph.conf
      rm /home/tripleo-admin/ceph.client.admin.keyring
    delegate_to: "{{ install.controller.to.remove }}"
    when:
      - rc_controller_is_reachable
      - '"No daemons reported" not in ceph_daemon_status_predrain.stdout'

If we don't do the above workaround to back up the ceph conf and keyring and restore them, then we hit an error in our procedure because we can't run any more cephadm commands on that node:

"When doing the procedure manually I went into cephadm shell and then executed the host drain command. It worked with no problem. When doing the automation it was all one step, i.e. the automation executes the command: cephadm shell ceph orch host drain. After that command is executed, all ceph commands failed with error: ObjectNotFound('RADOS object not found.
After looking at it for a long time I saw that /etc/ceph/ceph.conf and /etc/ceph/ceph.client.admin.keyring were being deleted. After that I saved them, let them be deleted, and then replaced them. The automation worked after that. If you know of a way to do the ceph drain without it deleting those files, then I would do it. I could not find a way."

This bug requests that the above workaround not be necessary.
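The backup/restore workaround in the playbook above can be sketched as a small POSIX shell helper. This is only an illustration: the function name and the directory parameter (standing in for /etc/ceph) are not part of any real cephadm interface.

```shell
#!/bin/sh
# Sketch of the workaround: back up ceph.conf and the admin keyring,
# run the command that deletes them, then restore both files.
# The first argument stands in for /etc/ceph; the rest is the
# destructive command (e.g. the host drain).
preserve_ceph_files_across() {
  conf_dir=$1; shift
  backup=$(mktemp -d)
  cp "$conf_dir/ceph.conf" "$conf_dir/ceph.client.admin.keyring" "$backup"/
  "$@"   # e.g. cephadm shell ceph orch host drain <host>
  cp "$backup/ceph.conf" "$backup/ceph.client.admin.keyring" "$conf_dir"/
  rm -rf "$backup"
}
```

Usage on the real node would look like `preserve_ceph_files_across /etc/ceph cephadm shell ceph orch host drain "$HOST"`, mirroring the playbook's cp/drain/cp/rm sequence.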
Please specify the severity of this bug. Severity is defined here: https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.
(In reply to John Fulton from comment #0)
> If we don't do the above workaround to back up the ceph conf and keyring and
> restore them, then we hit an error in our procedure because we can't run
> any more cephadm commands on that node.
...
> This bug requests that the above workaround not be necessary.

Update: we avoid having to back up and restore the files by executing all subsequent cephadm commands on a different node; i.e., drain is the last cephadm command run on that node. Regardless, it would be nice to have an option to not delete /etc/ceph (deleting files in /etc after removing a daemon violates the principle of least surprise, IMHO).
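The "drain last, then manage from elsewhere" approach described in the update above can be sketched as follows. The host names and the replaceable runner parameter are illustrative assumptions, not part of the actual procedure; `ceph orch host drain` and `ceph orch host rm` are real cephadm commands.

```shell
#!/bin/sh
# Sketch: make the drain the LAST cephadm command issued on the node being
# removed (its /etc/ceph is deleted by the drain), and issue every later
# orchestration command from a surviving node whose /etc/ceph is intact.
# $3 lets the remote runner (normally ssh) be swapped out for dry runs.
drain_then_finish_elsewhere() {
  doomed=$1; survivor=$2; run=${3:-ssh}
  # Last command on the node being removed:
  $run "$doomed" "cephadm shell -- ceph orch host drain $doomed"
  # Everything afterwards runs on a node the drain did not touch:
  $run "$survivor" "cephadm shell -- ceph orch host rm $doomed"
}
```

Passing `echo` as the runner prints the commands instead of executing them, which is handy for rehearsing the sequence.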
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat Ceph Storage 7.0 Bug Fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2023:7780