Description of problem:
The current version of ceph-ansible does not support removal of MON and OSD nodes. This is a regression against ceph-deploy functionality. Shrinking a cluster is not supported by Console, but we need to provide a way to remove nodes from the cluster at least on the CLI.

Resolution:
Sébastien is implementing this in ceph-ansible; the latest upstream version will support it, and we should package it as an async.
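For reference, a CLI-driven shrink would look roughly like the sketch below. The playbook names and the mon_to_kill/osd_to_kill variables reflect the upstream ceph-ansible shrink playbooks and may differ between versions; the path and the host/OSD values are placeholders, not the shipped interface.

$ cd /usr/share/ceph-ansible
$ ansible-playbook infrastructure-playbooks/shrink-mon.yml -e mon_to_kill=<mon_hostname>
$ ansible-playbook infrastructure-playbooks/shrink-osd.yml -e osd_to_kill=<osd_id>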
This should be targeted at the first async, but I only see targets 2 and 3.
Yup, fixed in v1.0.8.
This will ship concurrently with RHCS 2.1.
What automated tests cover this feature as implemented today? From discussion with Andrew, it sounds like the current implementation requires the admin to run Ansible *on* the Ceph cluster nodes? (runs local commands?) If so, we need to change that.
*** Bug 1335569 has been marked as a duplicate of this bug. ***
*** Bug 1414092 has been marked as a duplicate of this bug. ***
Created attachment 1324368 [details]
File contains ansible-playbook log and conf file after removing a monitor

Hi all,

I worked on shrinking a MON from the cluster. The playbook run was successful, but:
1) the monitor was still in the cluster, even though "verify the monitor is out of the cluster" completed without any errors, and
2) the configuration file still had an entry for the removed monitor.

Going by the steps in the Admin Doc for removing a monitor from the cluster, I expect ansible to remove the mon from the cluster and to modify and re-distribute the config file, to increase the usability of the feature. I'm moving the BZ back to ASSIGNED state; please let me know if my expectation is not appropriate. I've attached a file containing the ansible log and conf file after removing a mon.

(Terminal log after removing a MON from node magna051)

# sudo ceph -s --cluster 12_3a
-------
    health: HEALTH_WARN
-------
    1/3 mons down, quorum magna033,magna040
  services:
    mon: 3 daemons, quorum magna033,magna040, out of quorum: magna051
-------

$ sudo ceph mon stat --cluster 12_3a
e2: 3 mons at {magna033=10.8.128.33:6789/0,magna040=10.8.128.40:6789/0,magna051=10.8.128.51:6789/0}, election epoch 12, leader 0 magna033, quorum 0,1 magna033,magna040

$ sudo ceph mon remove magna051 --cluster 12_3a
removing mon.magna051 at 10.8.128.51:6789/0, there will be 2 monitors

$ sudo ceph mon stat --cluster 12_3a
e3: 2 mons at {magna033=10.8.128.33:6789/0,magna040=10.8.128.40:6789/0}, election epoch 14, leader 0 magna033, quorum 0,1 magna033,magna040

Regards,
Vasishta
That's weird. Can you retry and run ansible in debug mode, with -vvvv please? I need to make sure the command was issued properly. Thanks!
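For example, a verbose re-run captured to a file would look something like this (the playbook name and the mon_to_kill variable are assumptions based on the upstream shrink-mon playbook; substitute whatever command you actually ran):

$ ansible-playbook -vvvv shrink-mon.yml -e mon_to_kill=magna051 2>&1 | tee shrink-mon-debug.log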
FYI I haven't been able to reproduce.
Created attachment 1325704 [details]
File contains ansible-playbook log and conf files from different nodes

Hi Sebastien,

This time it worked partially. The mon was removed from the cluster as expected, but the conf files on the rest of the cluster were not updated. I've copied those conf files and the ansible log with verbose enabled. Can you please check this?
It is expected that the user will update ceph.conf. It's difficult for us to do the update and redistribute the file, because that would mean modifying their inventory. Modifying the inventory is not possible; even if we override it, the next ansible run will overwrite it again.
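Concretely, the manual cleanup on each remaining node would look roughly like the sketch below: drop the removed monitor (magna051 / 10.8.128.51 in the logs above) from the monitor keys. The option names are the standard ceph.conf monitor settings, the path /etc/ceph/12_3a.conf is assumed from the cluster name used above, and the exact contents of the generated file depend on your ceph-ansible settings.

$ sudo vi /etc/ceph/12_3a.conf

  [global]
  mon initial members = magna033,magna040
  mon host = 10.8.128.33,10.8.128.40

Also remove the host from the [mons] group in the ansible inventory, otherwise a later site.yml run would likely try to redeploy the monitor.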
Since you've been able to make it work eventually I'm moving this back to POST. Also as described in my earlier comment, I don't think we can do much more than what we currently do. Thanks.
Vasishta, is this still an issue in rc7?
It is acceptable, and yes, let's please add this step to the docs. A prompt indicating ceph.conf needs to be updated may also be in order (Seb's call).
At the end of the play, we prompt the user with a message saying: "The monitor has been successfully removed from the cluster. Please remove the monitor entry from the rest of your ceph configuration files, cluster wide."
Hi Ken,

Can you please move this BZ to ON_QA?

Regards,
Vasishta
Tried with ceph-ansible-3.0.2-1.el7cp.noarch and observed that a message is displayed asking the user to remove the monitor entry from the rest of the ceph configuration files, cluster wide. Looks good to me, moving to VERIFIED state.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:3387