Description of problem:
=======================
unable to shrink cluster - remove MON and/or OSD from the cluster using ceph-ansible

Version-Release number of selected component (if applicable):
=============================================================
ceph-ansible-1.0.5-10.el7scon.noarch

How reproducible:
=================
always

Steps to Reproduce:
1.
2.
3.

Expected results:
=================
ceph-ansible should provide a way to remove an OSD and/or MON from the cluster.

Additional info:
Federico, this defect has been re-targeted to Ceph release 3. Is product management OK with this? Please confirm. If this is not going to be in 2, then what is the alternate plan for customers who want to remove/add nodes to the cluster? Regards, Harish
There was a brief discussion on this BZ in today's program meeting, and it was decided to continue the discussion via the BZ. I am changing the target release to 2.0 until we reach a final decision.
Gregory, before we punt this to Console v3, we need confirmation that removing a node without using Ansible will not impact the rest of the install/scale-out Ansible/ceph-install stack. The question is around day-to-day operations, failed nodes, etc. - not about console-initiated operations, but about the need to ditch a node. We know RHS-C will not do this in 2.0; we need to decide whether we indeed do not need this in Ansible either, operationally. What say you?
We can't know with absolute certainty that in such a case ceph-ansible would work in the future. There is currently no support for removing a node in ceph-ansible.
Will USM break if a customer manually removes a node? We don't have this support in ceph-ansible and we don't have the time to implement it.
Tried to deploy a Ceph cluster using ceph-ansible and removed a Ceph OSD node using ceph-deploy; everything seems to be normal and the cluster is healthy.

1. Deployed a Ceph cluster on a 3-node test setup [1 mon+osd, 2 osd nodes] using ceph-ansible.
2. Removed an OSD node from the cluster using ceph-deploy [ceph-deploy purge <node>, ceph-deploy purgedata <node>].
3. Set the crushmap to use OSD-level replication instead of host-level replication [since I am now left with only 2 nodes] - this step is not needed in an environment that has more than 2 OSD nodes. See the command sketch below.
4. Ceph cluster is healthy.

So I believe that, until we have support in ceph-ansible or Console to remove a node [for faulty disks or whatever reason], we can use ceph-deploy to do it. ceph-deploy will however still be shipped in the RH Ceph Tools repo [rhel-7-server-rhceph-2-tools-rpms], although deprecated.
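For reference, here is a rough sketch of the commands behind steps 2 and 3, assuming a hypothetical OSD hostname "osd2" and a CRUSH rule that uses the default host-level chooseleaf step; adjust the names for the actual environment:

    # Step 2: remove the OSD node with ceph-deploy (hypothetical host "osd2")
    ceph-deploy purge osd2         # uninstall Ceph packages from the node
    ceph-deploy purgedata osd2     # wipe /var/lib/ceph and /etc/ceph on the node

    # Step 3: switch the CRUSH rule from host-level to OSD-level replication
    ceph osd getcrushmap -o crushmap.bin        # export the compiled CRUSH map
    crushtool -d crushmap.bin -o crushmap.txt   # decompile it to editable text
    # in crushmap.txt, change "step chooseleaf firstn 0 type host"
    #                      to "step chooseleaf firstn 0 type osd"
    crushtool -c crushmap.txt -o crushmap.new   # recompile the edited map
    ceph osd setcrushmap -i crushmap.new        # inject it into the cluster

    # Step 4: confirm cluster health
    ceph -s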
Neil, can you please check comment 10 and let us know your opinion? My guess is that this workaround will be exposed only to the Support team and not to the customer. If this is correct, then where do we document the info?
We can't use ceph-deploy for any operational actions on the cluster. This goes for both Support and the customer. How long is the manual node-removal process that we would need to document?
Harish and Neil, please note that comment 10 only answers the question in comment 6, which is what we tested and confirmed: removing a node from a running cluster will not impact the cluster in any way.

ceph-deploy was only used as a shortcut to do that.

As Neil pointed out, we may have to document the manual process to remove a node from the cluster.
(In reply to Tamil from comment #13)
> Harish and Neil, please note that comment 10 only answers the question in
> comment 6, which is what we tested and confirmed: removing a node from a
> running cluster will not impact the cluster in any way.
>
> ceph-deploy was only used as a shortcut to do that.
>
> As Neil pointed out, we may have to document the manual process to remove a
> node from the cluster.

Ah. OK, thanks for clearing that up.
(In reply to Tamil from comment #13)
> Harish and Neil, please note that comment 10 only answers the question in
> comment 6, which is what we tested and confirmed: removing a node from a
> running cluster will not impact the cluster in any way.

Thanks Tamil!

> ceph-deploy was only used as a shortcut to do that.
>
> As Neil pointed out, we may have to document the manual process to remove a
> node from the cluster.

Can you please let me know who will come up with the manual process (steps)?
Harish, I bet there is already a document upstream on how to remove a monitor or OSD from a running cluster. The docs team has to make a similar copy for downstream. Reference: http://docs.ceph.com/docs/master/rados/operations/
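For quick reference, the upstream procedure amounts to roughly the following (a sketch only, assuming a hypothetical OSD id 3 and monitor id "mon1"; the linked doc remains the authoritative source):

    # Remove an OSD from a running cluster (hypothetical OSD id 3)
    ceph osd out 3                 # mark the OSD out and let data rebalance
    systemctl stop ceph-osd@3      # on the OSD host, stop the daemon
    ceph osd crush remove osd.3    # remove the OSD from the CRUSH map
    ceph auth del osd.3            # delete its authentication key
    ceph osd rm 3                  # remove the OSD from the cluster

    # Remove a monitor (hypothetical mon id "mon1")
    systemctl stop ceph-mon@mon1   # stop the monitor daemon on its host
    ceph mon remove mon1           # remove it from the monitor map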
Ken, can you please get us the downstream documentation from comment 16?
Verified the doc, looks good!
We aren't ready to fully support this yet. This was initially targeted for 3.0, and we are working towards having it fully tested in upstream CI.
Shrinking is tracked in bz 1366807.

*** This bug has been marked as a duplicate of bug 1366807 ***