Bug 1886120

Summary: ceph orch host rm <host> is not stopping the services deployed in the respective removed hosts
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Preethi <pnataraj>
Component: CephadmAssignee: Adam King <adking>
Status: CLOSED ERRATA QA Contact: Rahul Lepakshi <rlepaksh>
Severity: high Docs Contact: Ranjini M N <rmandyam>
Priority: high    
Version: 5.0CC: adking, jolmomar, rmandyam, sunnagar, vereddy
Target Milestone: ---   
Target Release: 5.1   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: ceph-16.2.6-1.el8cp Doc Type: Enhancement
Doc Text:
.The `ceph orch host drain` command is now available to remove hosts from the storage cluster

Previously, the `ceph orch host rm _HOSTNAME_` command would not remove the Ceph daemons on the host of the {storage-product} cluster. It would stop managing the host, and the remaining daemons on the host would end up marked as stray.

With this release, the `ceph orch host rm _HOSTNAME_` command provides the following output:
+
.Example
----
[ceph: root@host01 /]# ceph orch host rm host02
Error EINVAL: Not allowed to remove host02 from cluster. The following daemons are running in the host:
type                 id
-------------------- ---------------
mon                  host02
node-exporter        host02
crash                host02
Please run 'ceph orch host drain host02' to remove daemons from host
----
You can remove the Ceph daemons from the host with the `ceph orch host drain _HOSTNAME_` command, which applies the `_no_schedule` label and causes all the daemons to be removed from the host, except for daemons of "unmanaged" services. Those must be removed manually with the `ceph orch daemon rm _DAEMON_NAME_` command. The OSDs are drained before they can be removed; you can check the status of the OSD removal with the `ceph orch osd rm status` command. Once all daemons are removed from the host, `ceph orch host rm _HOSTNAME_` succeeds and the host is no longer part of the storage cluster.
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-04-04 10:19:53 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1959686    

Description Preethi 2020-10-07 17:00:51 UTC
Description of problem:
ceph orch host rm <host> is not stopping the services deployed in the respective removed hosts

Version-Release number of selected component (if applicable):

Using the recent ceph image registry-proxy.engineering.redhat.com/rh-osbs/rhceph:ceph-5.0-rhel-8-containers-candidate-83100-20200929173915
ceph version 16.0.0-5974.el8cp (8135ff644ca71c2ae8c2d37db20a75166cdb15ef) pacific (dev)


The issue is filed in the upstream tracker:
https://tracker.ceph.com/issues/47782


How reproducible:


Steps to Reproduce:

1. Deploy a cluster with more than one MON node and with at least one daemon (OSD) running on every host.
2. Run ceph orch host rm <host>
3. Run ceph orch ps | grep <host>
   This command does not provide any output; however, if you check the containers running on the removed host, they are still started, and ceph -s shows no change compared to the situation before the host was removed.

You can also check with systemctl, which shows the services are still in the active state even though the host was removed from the cluster (see the console sketch below).
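
For illustration, a minimal console sketch of the reproduction flow; the hostname host02 and the use of podman as the container runtime are assumptions, not taken from this report:

# On the admin node:
ceph orch host rm host02
ceph orch ps | grep host02      # prints nothing; cephadm no longer tracks the host
ceph -s                         # cluster status is unchanged

# On the removed host itself:
podman ps                       # the ceph daemon containers are still running
systemctl list-units 'ceph*'    # the ceph systemd units are still active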

Actual results:


Expected results:


Additional info:

Comment 1 Veera Raghava Reddy 2020-12-02 16:43:17 UTC
Hi Juan,
Similar BZs are being reported by customers for Ceph Ansible for cleaning the nodes after a cluster purge.
For cephadm, we would like to have a good experience removing cluster nodes using cephadm/orch in the 5.0 release.

Comment 2 Juan Miguel Olmo 2020-12-11 09:55:09 UTC
We are not able to support this kind of operation in 5.0.
In order to remove a host properly, you need to manually modify the services that have daemons running on that host (ceph orch apply ...) so that those daemons are removed.

Once no daemons are running on the host, you can use the "ceph orch host rm" command.
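
For illustration, a rough sketch of that manual workflow for a simple case (the mon service and the hostnames are placeholders, not from this report):

# Re-apply the service so its placement no longer includes the host to be removed:
ceph orch apply mon --placement="host01 host03"
# Repeat for every service that places daemons on that host, then verify and remove it:
ceph orch ps | grep host02      # should eventually return nothing
ceph orch host rm host02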

Comment 3 Juan Miguel Olmo 2020-12-16 10:39:59 UTC
Decision taken:

For 5.0
We are going to return an error message if the user tries to remove a host with ceph daemons running:

Example:
# ceph orch host rm testhost
Operation not allowed (daemons must be removed first). <testhost> has the following daemons running <mon, mgr, osd>. 


For 5.x

Usability will be improved when we implement the "drain" command:
This command will remove the daemons running on a host.
https://tracker.ceph.com/issues/48624
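
Based on the drain workflow described in the Doc Text above, the eventual flow looks roughly like this (hostnames are placeholders):

ceph orch host drain host02     # applies the _no_schedule label; daemon removal is scheduled
ceph orch osd rm status         # monitor the progress of the OSD drain
ceph orch ps | grep host02      # wait until no daemons remain on the host
ceph orch host rm host02        # now succeeds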

Comment 4 Preethi 2021-01-06 07:14:56 UTC
@Juan, "Ceph orch rm <service id> will remove service only after services are stopped using ceph orch stop <service> is applied. We need to improve on usability here i feel. Also, for purging/rm MON, OSD, MGR etc we use ceph orch apply option to redeploy. We should make use of remove options instead of orch apply again is what i feel.

Comment 5 Juan Miguel Olmo 2021-03-05 12:03:23 UTC
In this version, 5.0 (and I do not know if it will be possible in 5.1), the removal of hosts with ceph daemons running is not going to be allowed.

https://github.com/ceph/ceph/pull/39850

Comment 6 Sebastian Wagner 2021-05-31 12:44:45 UTC
*** Bug 1930341 has been marked as a duplicate of this bug. ***

Comment 9 Juan Miguel Olmo 2021-06-21 07:21:14 UTC
Doc text: LGTM

Comment 10 Daniel Pivonka 2021-06-28 14:26:51 UTC
https://github.com/ceph/ceph/pull/39850 was closed and replaced with https://github.com/ceph/ceph/pull/42017

Comment 11 Daniel Pivonka 2021-06-28 14:29:36 UTC
upstream trackers: https://tracker.ceph.com/issues/49622  https://tracker.ceph.com/issues/48624

Comment 12 Daniel Pivonka 2021-08-24 16:41:50 UTC
backported to pacific https://github.com/ceph/ceph/pull/42736

Comment 18 Daniel Pivonka 2021-11-11 23:53:24 UTC
'ceph orch osd rm <id>' is run as part of 'ceph orch host drain <host>'. If you check 'ceph orch osd rm status', it should show that those OSDs are trying to be removed.

The problem with your cluster, though, is that you only have 3 hosts and your replication count is 3, so you cannot remove those OSDs: without them, data cannot be replicated 3 times.
The operation you are trying to do is not safe, hence it is not happening.
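
A hedged sketch of how that constraint can be confirmed on such a cluster (the pool name is a placeholder):

ceph orch osd rm status              # lists the OSDs that are still waiting to be drained
ceph osd pool get <pool_name> size   # replication size is 3
ceph orch host ls                    # only 3 hosts, so data cannot be kept 3x without those OSDs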

Comment 30 errata-xmlrpc 2022-04-04 10:19:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Ceph Storage 5.1 Security, Enhancement, and Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1174