Bug 1366807 - [RFE] ceph-ansible: remove MON and OSD nodes
Summary: [RFE] ceph-ansible: remove MON and OSD nodes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Ansible
Version: 3.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: 3.0
Assignee: seb
QA Contact: Vasishta
Docs Contact: Bara Ancincova
URL:
Whiteboard:
Duplicates: 1335569 1414092
Depends On:
Blocks: 1322504 1383917 1412948 1494421
 
Reported: 2016-08-12 22:29 UTC by Federico Lucifredi
Modified: 2017-12-05 23:31 UTC
CC List: 14 users

Fixed In Version: RHEL: ceph-ansible-3.0.0-0.1.rc6.el7cp Ubuntu: ceph-ansible_3.0.0~rc6-2redhat1
Doc Type: Enhancement
Doc Text:
.Ansible now supports removing Monitors and OSDs
You can use the `ceph-ansible` utility to remove Monitors and OSDs from a Ceph cluster. For details, see the link:https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html-single/administration_guide/#removing-monitors-with-ansible[Removing Monitors with Ansible] and link:https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html-single/administration_guide/#removing-osds-with-ansible[Removing OSDs with Ansible] sections in the Red Hat Ceph Storage 3 Administration Guide. The same procedures also apply to removing Monitors and OSDs from a containerized Ceph cluster.
Clone Of:
Environment:
Last Closed: 2017-12-05 23:31:14 UTC
Embargoed:


Attachments
File containing ansible-playbook log and conf file after removing a monitor (10.02 KB, text/plain)
2017-09-11 08:35 UTC, Vasishta
no flags
File containing ansible-playbook log and conf files from different nodes (45.02 KB, text/x-vhdl)
2017-09-14 04:26 UTC, Vasishta
no flags


Links
System ID Private Priority Status Summary Last Updated
Github ceph ceph-ansible pull 1836 0 None closed shrink mon and osd 2020-03-25 15:40:21 UTC
Red Hat Product Errata RHBA-2017:3387 0 normal SHIPPED_LIVE Red Hat Ceph Storage 3.0 bug fix and enhancement update 2017-12-06 03:03:45 UTC

Description Federico Lucifredi 2016-08-12 22:29:59 UTC
Description of problem:


The current version of ceph-ansible does not support removal of MON and OSD nodes. 

This is a regression against ceph-deploy functionality.

Shrinking a cluster is not supported by Console, but we need to provide a way to remove nodes from the cluster at least on the CLI.

Resolution:

Sébastien is implementing this in ceph-ansible; the latest upstream version will be able to perform this operation, and we should package it as an async release.
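
For reference, a sketch of the eventual CLI usage, based on the upstream "shrink mon and osd" pull request linked in the Links section above; the playbook paths and the mon_to_kill/osd_to_kill variable names should be verified against the installed ceph-ansible version, and the hostname and OSD ID below are placeholders:

$ cd /usr/share/ceph-ansible
$ ansible-playbook infrastructure-playbooks/shrink-mon.yml -e mon_to_kill=magna051
$ ansible-playbook infrastructure-playbooks/shrink-osd.yml -e osd_to_kill=1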

Comment 2 Federico Lucifredi 2016-08-12 22:31:46 UTC
This should be targeted at the first Async — but I only see targets 2 and 3....

Comment 5 seb 2016-10-07 08:38:59 UTC
Yup fixed in v1.0.8

Comment 6 Federico Lucifredi 2016-10-07 16:58:56 UTC
This will ship concurrently with RHCS 2.1.

Comment 7 Ken Dreyer (Red Hat) 2017-03-03 16:29:23 UTC
What automated tests cover this feature as implemented today?

From discussion with Andrew, it sounds like the current implementation requires the admin to run Ansible *on* the Ceph cluster nodes (that is, it runs local commands)? If so, we need to change that.

Comment 9 Ken Dreyer (Red Hat) 2017-03-03 16:50:03 UTC
*** Bug 1335569 has been marked as a duplicate of this bug. ***

Comment 13 Drew Harris 2017-06-29 14:01:26 UTC
*** Bug 1414092 has been marked as a duplicate of this bug. ***

Comment 16 Vasishta 2017-09-11 08:35:08 UTC
Created attachment 1324368 [details]
File containing ansible-playbook log and conf file after removing a monitor

Hi all,

I worked on shrinking a MON from the cluster. The playbook run was successful, but
1) the monitor was still in the cluster, even though the "verify the monitor is out of the cluster" task completed without any errors,
and
2) the configuration file still had an entry for the removed monitor.

Based on the steps in the Admin Doc for removing a monitor from the cluster, I expect Ansible to remove the MON from the cluster and also to modify and redistribute the config file, to make the feature more usable.

I'm moving the BZ back to the ASSIGNED state; please let me know if my expectation is not appropriate. I've attached a file containing the ansible log and the conf file after removing a MON.

(Terminal log after removing a MON from node magna051)

$ sudo ceph -s --cluster 12_3a
-------
    health: HEALTH_WARN
-------       
            1/3 mons down, quorum magna033,magna040
 
  services:
    mon: 3 daemons, quorum magna033,magna040, out of quorum: magna051
-------
$ sudo ceph mon stat --cluster 12_3a
e2: 3 mons at {magna033=10.8.128.33:6789/0,magna040=10.8.128.40:6789/0,magna051=10.8.128.51:6789/0}, election epoch 12, leader 0 magna033, quorum 0,1 magna033,magna040

$ sudo ceph mon remove magna051 --cluster 12_3a
removing mon.magna051 at 10.8.128.51:6789/0, there will be 2 monitors

$ sudo ceph mon stat --cluster 12_3a
e3: 2 mons at {magna033=10.8.128.33:6789/0,magna040=10.8.128.40:6789/0}, election epoch 14, leader 0 magna033, quorum 0,1 magna033,magna040


Regards,
Vasishta

Comment 17 seb 2017-09-12 17:46:17 UTC
That's odd. Can you retry and run Ansible in debug mode, with -vvvv, please?
I need to make sure the command was issued properly.
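
For example (assuming the shrink-mon playbook and the 12_3a cluster name from the log above; the exact path and variable names may differ by version):

$ ansible-playbook -vvvv infrastructure-playbooks/shrink-mon.yml -e mon_to_kill=magna051 -e cluster=12_3a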

Thanks!

Comment 18 seb 2017-09-13 22:24:16 UTC
FYI I haven't been able to reproduce.

Comment 19 Vasishta 2017-09-14 04:26:51 UTC
Created attachment 1325704 [details]
File containing ansible-playbook log and conf files from different nodes

Hi Sebastien,

This time it worked partially: the MON was removed from the cluster as expected, but the conf files on the rest of the cluster nodes were not updated.

I've attached those conf files and the ansible log with verbose output enabled. Can you please take a look?

Comment 20 seb 2017-09-14 05:38:18 UTC
It is expected that the user will update ceph.conf. It's difficult for us to do the update and redistribution ourselves, because that would mean modifying their inventory.

Modifying the inventory is not possible; even if we overrode it, the next Ansible run would override it again.
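
As a concrete example of the manual step (hostnames, IPs, and cluster name taken from the log above; adjust to your environment), the removed monitor has to be deleted from the mon initial members and mon host lines in the cluster configuration file on every remaining node:

# /etc/ceph/12_3a.conf on each remaining node, before:
mon initial members = magna033, magna040, magna051
mon host = 10.8.128.33, 10.8.128.40, 10.8.128.51

# and after removing magna051:
mon initial members = magna033, magna040
mon host = 10.8.128.33, 10.8.128.40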

Comment 21 seb 2017-09-15 13:13:24 UTC
Since you've been able to make it work eventually, I'm moving this back to POST.
Also, as described in my earlier comment, I don't think we can do much more than we currently do.

Thanks.

Comment 22 Ken Dreyer (Red Hat) 2017-09-18 14:48:19 UTC
Vasishta is this still an issue in rc7?

Comment 26 Federico Lucifredi 2017-10-13 20:27:53 UTC
It is acceptable, and yes, let's please add this step to the docs.

A prompt indicating that ceph.conf needs to be updated may also be in order (Seb's call).

Comment 27 Sébastien Han 2017-10-16 07:59:10 UTC
At the end of the play, we prompt the user with a message saying:

"The monitor has been successfully removed from the cluster.
 Please remove the monitor entry from the rest of your ceph configuration files, cluster wide."

Comment 28 Vasishta 2017-10-16 10:52:28 UTC
Hi Ken,

Can you please move this BZ to ON_QA ?

Regards,
Vasishta

Comment 30 Vasishta 2017-10-17 11:08:42 UTC
Tried with ceph-ansible-3.0.2-1.el7cp.noarch and observed that a message is displayed asking the user to remove the monitor entry from the rest of the ceph configuration files, cluster wide.

Looks good to me, moving to VERIFIED state.

Comment 34 errata-xmlrpc 2017-12-05 23:31:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3387

