Description of problem:

I followed the doc https://gitlab.cee.redhat.com/jowilkin/red-hat-ceph-storage-administration-guide/blob/v1.3/cluster-size.adoc for my mon add/remove testing. It gives the following commands to remove a mon:

Stop the monitor:
    service ceph -a stop mon.{mon-id}
Remove the monitor from the cluster:
    ceph mon remove {mon-id}
Remove the monitor entry from ceph.conf.

The monitor stops and is removed from the quorum list after executing these steps. But when I try to add the mon on the same node again using the manual (non-ceph-deploy) steps in the doc above, one of the commands fails:

[cephuser@magna040 ~]$ sudo ceph-mon -i magna040 --mkfs --monmap temp/map-filename --keyring temp/key-filename
'/var/lib/ceph/mon/ceph-magna040' already exists and is not empty: monitor may already exist

After removing '/var/lib/ceph/mon/ceph-magna040', adding the mon was successful. So:

1) The doc should contain a step to remove the above-mentioned old dir before adding the mon on the same node. I will file a separate defect for that.
2) After the mon entry is removed from ceph.conf, the new ceph.conf file should be distributed to the other nodes. Please check and add this instruction too.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
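Putting the two requests together, the complete manual removal on this node would look something like the sketch below. The node name magna040 is from this report; the node1 node2 node3 hostnames in the config push are placeholders, and the push assumes ceph-deploy manages ceph.conf for the cluster.

```
# Sketch of the full manual mon removal, combining the doc's steps
# with the two fixes requested above (run on an admin node):
sudo service ceph -a stop mon.magna040    # 1. stop the monitor
sudo ceph mon remove magna040             # 2. remove it from the monmap
# 3. edit ceph.conf to drop the mon.magna040 entry, then distribute the
#    new ceph.conf so every node agrees (request 2 above):
ceph-deploy --overwrite-conf config push node1 node2 node3
# 4. clear the old monitor data dir so a later re-add does not fail with
#    "already exists and is not empty" (request 1 above):
sudo rm -rf /var/lib/ceph/mon/ceph-magna040
```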
Correction: please read "1) the doc should contain a step to remove the above mentioned old dir before adding the mon on the same node. I will file a separate defect for that." as "1) the doc should contain a step to remove the above mentioned old dir".

Thanks,
Harish
https://gitlab.cee.redhat.com/jowilkin/red-hat-ceph-storage-administration-guide/commit/9a6bed924ac78f9724a53497f3acad3961a9a5ba I've added a step for that now.
Two things:

A] There is no instruction to distribute ceph.conf after removing the mon entry. Please add that.

B] I am getting the following error when I use the umount command:

[cephuser@magna040 ~]$ ll /var/lib/ceph/mon
total 4
drwxr-xr-x. 3 root root 4096 Nov  7 03:44 ceph-magna040
[cephuser@magna040 ~]$ sudo umount /var/lib/ceph/mon/ceph-magna040
umount: /var/lib/ceph/mon/ceph-magna040: not mounted

The following commands were run:

sudo service ceph -a stop mon.magna040
sudo ceph mon remove magna040
sudo umount /var/lib/ceph/mon/ceph-magna040

Note that "destroy_mon" in mon.py archives this /var/lib/ceph/mon/ceph-<mon-id> directory under /var/lib/ceph/mon-removed/. We may have to do the same when removing a mon manually. You may want to check with Alfredo/Kefu on this. Kefu, can you please help here?
Added an archive instruction and a ceph-deploy config push. Umount actually applies only if you set the directory up manually with the intent to use a separate drive, such as an SSD. Otherwise you can just use rm. https://gitlab.cee.redhat.com/jowilkin/red-hat-ceph-storage-administration-guide/commit/acd2f0b42322773d0479316e209238ae59713f6b
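The umount-vs-rm decision can be checked mechanically: umount only applies when the mon directory is a separate mount point. A small sketch, using a throwaway mktemp directory as a stand-in for the real /var/lib/ceph/mon/ceph-<mon-id>:

```shell
# Decide between umount and rm: mountpoint(1) reports whether a directory
# is its own mount point. A temp dir stands in here for the real
# /var/lib/ceph/mon/ceph-<mon-id> so the sketch runs anywhere.
MON_DIR=$(mktemp -d)
if mountpoint -q "$MON_DIR"; then
    ACTION="umount"     # separate drive (e.g. SSD): umount first, then rm
else
    ACTION="rm"         # plain directory: rm alone is enough
fi
echo "use: $ACTION"
rm -rf "$MON_DIR"
```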
Thanks, John! Kefu, please let us know if the steps mentioned above are fine with you.
@Harish and @John, s/cp/mv/, and I agree that it's optional to back up the monstore.
Hi Kefu,

When the 'mon destroy' command is used, it archives the directory as:

/var/lib/ceph/mon-removed/ceph-magna040-2015-11-06Z11:59:02

Is there any requirement in Ceph that a removed mon's directory should be archived in a specific format as shown above? Please clarify.

The manual procedure suggests moving it to just /var/lib/ceph/mon-removed/ceph-<mon-id>. If the same mon is removed and added again, the user has to rename the old archive to something else. I hope that is fine. Please confirm.

Moving the defect to the assigned state to incorporate Kefu's suggestions.

Regards,
Harish
> Is there any requirement in Ceph that removed mon's directory should be archived in a specific format as mentioned above?

No. Actually, I think we back up the monstore just in case it's needed later. But the monstore is updated very often, and a newly joined monitor is always synced with the latest monstore from its peers, so it's fine to keep only one copy of the monstore.
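Given that no specific format is required, the manual archive step only needs to avoid clobbering an earlier backup. One way is to mirror the destroy_mon naming with a UTC timestamp; this is a sketch, with the mon ID magna040 taken from this report and the mv itself commented out because it needs a real mon node:

```shell
# Build a destroy_mon-style archive path with a UTC timestamp so that
# removing the same mon twice never collides with an earlier backup.
MON_ID=magna040
STAMP=$(date -u +%Y-%m-%dZ%H:%M:%S)
ARCHIVE="/var/lib/ceph/mon-removed/ceph-${MON_ID}-${STAMP}"
echo "$ARCHIVE"
# On the mon node itself you would then run:
# sudo mkdir -p /var/lib/ceph/mon-removed
# sudo mv /var/lib/ceph/mon/ceph-${MON_ID} "$ARCHIVE"
```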
Verified the document and the changes are properly made. However, one more thing needs to be added in the "Manual" mon addition section: https://gitlab.cee.redhat.com/jowilkin/red-hat-ceph-storage-administration-guide/blob/v1.3/cluster-size.adoc

At step 4 of the "Manual" mon addition section the user has to run the "ceph auth get mon. -o {tmp}/{key-filename}" command on the new mon node. Since it is a new machine, there is no admin keyring on it, so the command fails with:

ubuntu@magna110:~$ sudo ceph auth get mon. -o ./tmp/keyring
2015-11-16 09:31:35.461940 7fe62e10a700 -1 monclient(hunting): ERROR: missing keyring, cannot use cephx for authentication
2015-11-16 09:31:35.461946 7fe62e10a700  0 librados: client.admin initialization error (2) No such file or directory
Error connecting to cluster: ObjectNotFound

@John, so I think we need to copy this key before running the command. Am I missing something, or is this covered elsewhere?
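One way to unblock step 4 is to run the auth export on a node that already has the admin keyring and copy the results over. A sketch, assuming magna110 is the new mon node from this comment and cephuser is the login user; the exact hostnames and paths are placeholders:

```
# On an existing admin/mon node (one that already has
# /etc/ceph/ceph.client.admin.keyring):
sudo ceph auth get mon. -o /tmp/mon-keyring
sudo ceph mon getmap -o /tmp/monmap
# Copy both files to the new monitor node:
scp /tmp/mon-keyring /tmp/monmap cephuser@magna110:/tmp/
# Alternatively, push the admin keyring and ceph.conf so the new node
# can run the ceph commands itself:
ceph-deploy --overwrite-conf admin magna110
```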
@John, in the "Manual" mon removal section, if the user performs step 5 then step 6 is not required. Could you please highlight this in the doc? Glancing through the doc, it looks like both steps are required, but they are not.
Changing to the assigned state to incorporate comments 10 and 11.
See https://gitlab.cee.redhat.com/jowilkin/red-hat-ceph-storage-administration-guide/commit/4fafbb099ae526515558a53816501183f9166016
@John, can you incorporate comment 10 as well? Otherwise "ceph auth get" fails with permission errors.
See https://gitlab.cee.redhat.com/jowilkin/red-hat-ceph-storage-administration-guide/commit/070689191fbeeaaf0ec794b16ef485e23a52088e
Added the "ceph-deploy --overwrite-conf admin <ceph-node>" command to the doc; this copies the client.admin keyring and the ceph.conf file to the destination node. Hence marking this bug as verified.
Fixed for the 1.3.1 release.