Description of problem:
ceph orch host rm <host> does not stop the services deployed on the removed host.

Version-Release number of selected component (if applicable):
Using recent ceph image registry-proxy.engineering.redhat.com/rh-osbs/rhceph:ceph-5.0-rhel-8-containers-candidate-83100-20200929173915
ceph version 16.0.0-5974.el8cp (8135ff644ca71c2ae8c2d37db20a75166cdb15ef) pacific (dev)

Issue is filed in the upstream tracker: https://tracker.ceph.com/issues/47782

How reproducible:

Steps to Reproduce:
1. Set up a cluster with more than one mon node and at least 1 daemon (e.g. an OSD) running on every host.
2. ceph orch host rm <host>
3. ceph orch ps | grep <host>

The last command produces no output; however, if you check the containers running on the removed host, they remain started. If you check ceph -s, there are no changes with respect to the situation before removing the host. You can also check with systemctl, which shows the services are still in an active state even though the host was removed from the cluster.

Actual results:

Expected results:

Additional info:
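The reproduction steps above can be sketched as the following transcript (the host name "host04" is a placeholder; podman is assumed as the container runtime, as in RHCS 5):

```shell
# Remove the host from the orchestrator:
ceph orch host rm host04

# The orchestrator no longer reports any daemons on the host:
ceph orch ps | grep host04

# ...but on host04 itself, the daemon containers keep running:
podman ps                       # ceph containers still up
systemctl list-units 'ceph-*'   # units still shown as active
```

Nothing in the output of the first two commands indicates that the host's daemons were left running.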
Hi Juan, similar BZs are being reported by customers for ceph-ansible about cleaning up nodes after a cluster purge. For cephadm, we would like to have a good experience with removing cluster nodes using cephadm/orch in the 5.0 release.
We are not able to do this kind of operation in 5.0. To remove a host properly, you need to manually modify the services (ceph orch apply ...) that have daemons running on the host, in order to remove those daemons. Once no daemons are running on the host, you can use the "ceph orch host rm" command.
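A minimal sketch of that manual workaround, assuming hypothetical hosts host01..host04 where host04 is the one being removed (host names, service placements, and the OSD id are placeholders):

```shell
# Shrink each service's placement so cephadm stops its daemons on host04:
ceph orch apply mon --placement="host01 host02 host03"
ceph orch apply mgr --placement="host01 host02"

# OSDs are removed individually rather than via 'ceph orch apply':
ceph orch osd rm 3        # hypothetical id of an OSD hosted on host04

# Once 'ceph orch ps' no longer lists daemons on host04:
ceph orch host rm host04
```

The key point is that the host removal itself only succeeds cleanly after every daemon has been drained off the host.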
Decision taken:

For 5.0, we are going to return an error message if the user tries to remove a host with ceph daemons running. Example:

# ceph orch host rm testhost
Operation not allowed (daemons must be removed first). <testhost> has the following daemons running: <mon, mgr, osd>.

For 5.x, usability will be improved when we implement the "drain" command, which will remove the daemons running on a host: https://tracker.ceph.com/issues/48624
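Once "drain" lands, the planned 5.x workflow would look roughly like this (a sketch based on the tracker above; the host name is a placeholder):

```shell
ceph orch host drain host04   # schedule removal of all daemons on host04
ceph orch osd rm status       # watch OSD removal/draining progress
ceph orch ps host04           # wait until no daemons remain on the host
ceph orch host rm host04      # now succeeds without the error above
```

This replaces the manual per-service 'ceph orch apply' dance with a single command.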
@Juan, "ceph orch rm <service_id>" removes a service only after the service has been stopped with "ceph orch stop <service>". We need to improve usability here, I feel. Also, for purging/removing MON, OSD, MGR, etc., we use "ceph orch apply" to redeploy. We should make use of remove options instead of running "ceph orch apply" again, is what I feel.
In version 5.0 (and I do not know if it will be possible in 5.1), the removal of hosts with ceph daemons running is not going to be allowed. https://github.com/ceph/ceph/pull/39850
*** Bug 1930341 has been marked as a duplicate of this bug. ***
Doc text: LGTM
https://github.com/ceph/ceph/pull/39850 was closed and replaced with https://github.com/ceph/ceph/pull/42017
upstream trackers: https://tracker.ceph.com/issues/49622 https://tracker.ceph.com/issues/48624
Backported to Pacific: https://github.com/ceph/ceph/pull/42736
'ceph orch osd rm <id>' is run as part of 'ceph orch host drain <host>'. If you check 'ceph orch osd rm status', it should show that those OSDs are trying to be removed. The problem with your cluster, though, is that you only have 3 hosts and your replication count is 3, so you cannot remove those OSDs: without them, data cannot be replicated 3 times. The operation you are trying to do is not safe, which is why it is not happening.
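To confirm this is what is blocking the drain, you can inspect the pending removals and the pool replication settings (the pool name is a placeholder):

```shell
# Pending OSD removals and their drain state:
ceph orch osd rm status

# With only 3 hosts and size=3 pools, there is nowhere to re-replicate
# the PGs from a drained host, so the removal stalls. Verify with:
ceph osd pool get <pool> size   # replication count, typically 3
ceph osd df tree                # OSD/host topology and utilization
```

Adding a fourth host (or reducing pool size, which is generally not recommended) would give the cluster somewhere to move the data and let the drain complete.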
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat Ceph Storage 5.1 Security, Enhancement, and Bug Fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:1174