Bug 1886120 - ceph orch host rm <host> is not stopping the services deployed in the respective removed hosts
Summary: ceph orch host rm <host> is not stopping the services deployed in the respective removed hosts
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Cephadm
Version: 5.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 5.1
Assignee: Adam King
QA Contact: Rahul Lepakshi
Docs Contact: Ranjini M N
URL:
Whiteboard:
Duplicates: 1930341 (view as bug list)
Depends On:
Blocks: 1959686
 
Reported: 2020-10-07 17:00 UTC by Preethi
Modified: 2022-04-04 10:20 UTC (History)
CC List: 5 users

Fixed In Version: ceph-16.2.6-1.el8cp
Doc Type: Enhancement
Doc Text:
.The `ceph orch host drain` command is now available to remove hosts from the storage cluster

Previously, the `ceph orch host rm _HOSTNAME_` command would not remove the Ceph daemons on the host of the {storage-product} cluster. It would stop managing the host, and the remaining daemons on the host would end up marked as stray.

With this release, the `ceph orch host rm _HOSTNAME_` command provides the following output:
+
.Example
----
[ceph: root@host01 /]# ceph orch host rm host02
Error EINVAL: Not allowed to remove host02 from cluster. The following daemons are running in the host:
type                 id
-------------------- ---------------
mon                  host02
node-exporter        host02
crash                host02
Please run 'ceph orch host drain host02' to remove daemons from host
----

You can remove the Ceph daemons from the host with the `ceph orch host drain _HOSTNAME_` command, which applies the `_no_schedule` label and causes all daemons to be removed from the host, except for "unmanaged" services. Those must be removed manually with the `ceph orch daemon rm _DAEMON_NAME_` command. The OSDs are drained before they can be removed; you can check the status of the OSD removal with the `ceph orch osd rm status` command. Once all daemons are removed from the host, `ceph orch host rm _HOSTNAME_` should succeed and the host is no longer part of the storage cluster.
Clone Of:
Environment:
Last Closed: 2022-04-04 10:19:53 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Ceph Project Bug Tracker 45769 0 None None None 2020-12-11 09:52:08 UTC
Ceph Project Bug Tracker 47782 0 None None None 2020-12-16 10:39:58 UTC
Ceph Project Bug Tracker 48624 0 None None None 2020-12-16 10:39:58 UTC
Ceph Project Bug Tracker 49622 0 None None None 2021-03-05 11:50:13 UTC
Github ceph ceph pull 39850 0 None open mgr/cephadm: Do not allow remove hosts with daemons running 2021-03-11 08:11:28 UTC
Github ceph ceph pull 42017 0 None closed mgr/cephadm: add ceph orch host drain and limit host removal to empty hosts 2021-07-07 13:29:15 UTC
Red Hat Product Errata RHSA-2022:1174 0 None None None 2022-04-04 10:20:24 UTC

Description Preethi 2020-10-07 17:00:51 UTC
Description of problem:
ceph orch host rm <host> is not stopping the services deployed in the respective removed hosts

Version-Release number of selected component (if applicable):

Using recent ceph image registry-proxy.engineering.redhat.com/rh-osbs/rhceph:ceph-5.0-rhel-8-containers-candidate-83100-20200929173915
ceph version 16.0.0-5974.el8cp (8135ff644ca71c2ae8c2d37db20a75166cdb15ef) pacific (dev)


The issue is filed in the upstream tracker:
https://tracker.ceph.com/issues/47782


How reproducible:


Steps to Reproduce:

1. Set up a cluster with more than one MON node and with at least one daemon (an OSD) running on every host.
2. Run: ceph orch host rm <host>
3. Run: ceph orch ps | grep <host>
   This command returns no output for the removed host; however, if you check the containers running on the removed host, they are still started.
4. ceph -s shows no changes with respect to the situation before the host was removed.

You can also check with systemctl, which shows the services are still in the active state even though the host was removed from the cluster (see the command sketch below).
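
A minimal verification sketch, assuming you can SSH into the removed host (names below are generic; the actual unit names include the cluster FSID):

# On the removed host, the Ceph containers are still up:
podman ps --filter name=ceph
# The systemd units for the Ceph daemons are still active:
systemctl list-units 'ceph-*'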

Actual results:


Expected results:


 Additional info:

Comment 1 Veera Raghava Reddy 2020-12-02 16:43:17 UTC
Hi Juan,
Similar BZs are being reported by customers against Ceph Ansible for cleaning up nodes after a cluster purge.
For cephadm, we would like to have a good experience with removing cluster nodes via cephadm/orch in the 5.0 release.

Comment 2 Juan Miguel Olmo 2020-12-11 09:55:09 UTC
We are not able to do this kind of operation in 5.0.
In order to remove a host properly, you need to manually modify the services (ceph orch apply ...) that have daemons running on that host, so that those daemons are removed.

Once there are no daemons running on the host, you can use the "ceph orch host rm" command.
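
For illustration, a minimal sketch of that manual workaround, assuming hypothetical hostnames host01/host02/host03 and OSD id 2 (adapt the placements to your own service specs):

# Re-apply each service with a placement that no longer includes the host to be removed,
# e.g. shrink the mon placement so it skips host02:
ceph orch apply mon --placement="host01,host03"
# OSDs are not re-placed by 'ceph orch apply'; remove them individually and watch progress:
ceph orch osd rm 2
ceph orch osd rm status
# Once no daemons remain on the host, remove it:
ceph orch host rm host02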

Comment 3 Juan Miguel Olmo 2020-12-16 10:39:59 UTC
Decision taken:

For 5.0
We are going to return an error message if the user tries to remove a host with ceph daemons running:

Example:
# ceph orch host rm testhost
Operation not allowed (daemons must be removed first). <testhost> has the following daemons running <mon, mgr, osd>. 


For 5.x

Usability will be improved when we implement the "drain" command, which will remove the daemons running on a host.
https://tracker.ceph.com/issues/48624
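
For reference, a minimal sketch of the planned drain workflow (hostname host02 is illustrative; the drain command only exists once the tracker above is implemented):

# Drain the host: cephadm applies the _no_schedule label and schedules removal of its daemons:
ceph orch host drain host02
# Follow the OSD removal progress:
ceph orch osd rm status
# Verify that no daemons are left on the host:
ceph orch ps host02
# Then the host itself can be removed:
ceph orch host rm host02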

Comment 4 Preethi 2021-01-06 07:14:56 UTC
@Juan, "Ceph orch rm <service id> will remove service only after services are stopped using ceph orch stop <service> is applied. We need to improve on usability here i feel. Also, for purging/rm MON, OSD, MGR etc we use ceph orch apply option to redeploy. We should make use of remove options instead of orch apply again is what i feel.

Comment 5 Juan Miguel Olmo 2021-03-05 12:03:23 UTC
In this version, 5.0 (and I do not know if it will be possible in 5.1), the removal of hosts with Ceph daemons running is not going to be allowed.

https://github.com/ceph/ceph/pull/39850

Comment 6 Sebastian Wagner 2021-05-31 12:44:45 UTC
*** Bug 1930341 has been marked as a duplicate of this bug. ***

Comment 9 Juan Miguel Olmo 2021-06-21 07:21:14 UTC
Doc text: LGTM

Comment 10 Daniel Pivonka 2021-06-28 14:26:51 UTC
https://github.com/ceph/ceph/pull/39850 was closed and replaced with https://github.com/ceph/ceph/pull/42017

Comment 11 Daniel Pivonka 2021-06-28 14:29:36 UTC
upstream trackers: https://tracker.ceph.com/issues/49622  https://tracker.ceph.com/issues/48624

Comment 12 Daniel Pivonka 2021-08-24 16:41:50 UTC
backported to pacific https://github.com/ceph/ceph/pull/42736

Comment 18 Daniel Pivonka 2021-11-11 23:53:24 UTC
'ceph orch osd rm <id>' is run as part of 'ceph orch host drain <host>'. If you check 'ceph orch osd rm status', it should show that those OSDs are pending removal.

The problem with your cluster, though, is that you only have 3 hosts and your replication count is 3, so you cannot remove those OSDs: without them the data cannot be replicated 3 times.
The operation you are trying to do is not safe, which is why it is not happening.
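
For reference, a hedged sketch of how to see why the drain is stuck:

# Show the state of the OSD removals scheduled by the drain:
ceph orch osd rm status
# Check the replication size of the pools; with only 3 hosts and size 3,
# the data cannot be re-replicated elsewhere, so the OSDs cannot be drained:
ceph osd pool ls detail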

Comment 30 errata-xmlrpc 2022-04-04 10:19:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Ceph Storage 5.1 Security, Enhancement, and Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1174

