Bug 1278558 - Managing Cluster Size: one more step needs to be added to 'Remove a Monitor -> manual'
Status: CLOSED CURRENTRELEASE
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: Documentation
Version: 1.3.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: 1.3.1
Assigned To: ceph-docs@redhat.com
QA Contact: ceph-qe-bugs
Docs Contact:
Depends On:
Blocks:
Reported: 2015-11-05 14:28 EST by Harish NV Rao
Modified: 2015-12-18 04:59 EST
5 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-12-18 04:59:56 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Harish NV Rao 2015-11-05 14:28:55 EST
Description of problem:

I followed the doc https://gitlab.cee.redhat.com/jowilkin/red-hat-ceph-storage-administration-guide/blob/v1.3/cluster-size.adoc for my mon add/remove testing. It has the following commands to remove the mon:


    Stop the monitor:

     service ceph -a stop mon.{mon-id}

    Remove the monitor from the cluster:

     ceph mon remove {mon-id}

    Remove the monitor entry from ceph.conf.

The monitor stops and is also removed from the quorum list after executing these steps.

But when I tried to add the mon on the same node again using the manual (non-ceph-deploy) steps mentioned in the above doc, it failed at one of the commands:
[cephuser@magna040 ~]$ sudo ceph-mon -i magna040 --mkfs --monmap temp/map-filename --keyring temp/key-filename
'/var/lib/ceph/mon/ceph-magna040' already exists and is not empty: monitor may already exist

After removing the '/var/lib/ceph/mon/ceph-magna040', add mon was successful.

So:
1) The doc should contain a step to remove the above mentioned old dir before adding the mon on the same node. I will file a separate defect for that.

2) After the mon entry is removed from ceph.conf, the new ceph.conf file should be distributed to the other nodes. Please check and add this instruction too.
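Taken together, the removal sequence being requested would look roughly like this (a sketch only, not verified doc text; {mon-id} and the node names are placeholders, and the config push assumes ceph-deploy is in use):

```shell
# Sketch of the full manual mon removal, including the two missing steps above.
# {mon-id} and {ceph-nodeN} are placeholders.
sudo service ceph -a stop mon.{mon-id}    # stop the monitor
sudo ceph mon remove {mon-id}             # remove it from the cluster/quorum
# ...edit /etc/ceph/ceph.conf to delete the [mon.{mon-id}] entry, then
# redistribute the updated ceph.conf to the other nodes (assumes ceph-deploy):
ceph-deploy --overwrite-conf config push {ceph-node1} {ceph-node2}
# Clear the old data dir so a later re-add on the same node does not fail:
sudo rm -rf /var/lib/ceph/mon/ceph-{mon-id}
```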

Comment 2 Harish NV Rao 2015-11-05 14:35:00 EST
correction:

Please read "1) the doc should contain a step to remove the above mentioned old dir before adding the mon on the same node. I will file a separate defect for that." as "1) the doc should contain a step to remove the above mentioned old dir"

Thanks,
Harish
Comment 4 Harish NV Rao 2015-11-07 04:15:26 EST
Two things:

A] There is no instruction to distribute the ceph.conf after removing the mon entry. Please add that.

B] I am getting the following error when I use the umount command.

[cephuser@magna040 ~]$ ll /var/lib/ceph/mon
total 4
drwxr-xr-x. 3 root root 4096 Nov  7 03:44 ceph-magna040

[cephuser@magna040 ~]$ sudo umount /var/lib/ceph/mon/ceph-magna040
umount: /var/lib/ceph/mon/ceph-magna040: not mounted

-----------------------------------------------------------

Following commands were run:

sudo service ceph -a stop mon.magna040
sudo ceph mon remove magna040
sudo umount /var/lib/ceph/mon/ceph-magna040

---------------------------------------------------------

The "destroy_mon" in mon.py archives this /var/lib/ceph/mon/ceph-<mon-id> directory under /var/lib/ceph/mon-removed/.

We may have to do the same when removing a mon manually.
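A manual equivalent of that archiving step might be sketched like this (assumptions: mon id magna040 and the timestamp suffix format ceph-deploy uses; this is not verified doc text):

```shell
# Sketch: archive the mon data dir the way destroy_mon does, instead of deleting it.
mon_id=magna040                           # assumption: the mon being removed
sudo mkdir -p /var/lib/ceph/mon-removed
sudo mv "/var/lib/ceph/mon/ceph-${mon_id}" \
        "/var/lib/ceph/mon-removed/ceph-${mon_id}-$(date -u +%Y-%m-%dZ%H:%M:%S)"
```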

You may want to check with Alfredo/Kefu on this.

Kefu, can you please help here?
Comment 5 John Wilkins 2015-11-09 13:25:43 EST
Added an archive instruction and a ceph-deploy config push. Umount is actually needed only if you added the monitor manually with the intent to use a separate drive, like an SSD. Otherwise you can just use rm.

https://gitlab.cee.redhat.com/jowilkin/red-hat-ceph-storage-administration-guide/commit/acd2f0b42322773d0479316e209238ae59713f6b
Comment 6 Harish NV Rao 2015-11-09 13:42:54 EST
Thanks John!

Kefu, please let us know if the steps mentioned above are fine with you.
Comment 7 Kefu Chai 2015-11-10 03:46:02 EST
@Harish and @John, s/cp/mv/, and I agree that it's optional to back up the monstore.
Comment 8 Harish NV Rao 2015-11-10 04:20:59 EST
Hi Kefu,

When the 'mon destroy' command is used, it archives the dir as:

/var/lib/ceph/mon-removed/ceph-magna040-2015-11-06Z11:59:02

Is there any requirement in Ceph that a removed mon's directory should be archived in the specific format mentioned above?

Please clarify

The manual way suggests moving it to just /var/lib/ceph/mon-removed/ceph-<mon-id>.

If the same mon is removed and added again, then the user has to rename it to something else. I hope that should be fine.

Please confirm.
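The rename-before-archive could be handled with a small check, e.g. (a sketch; the mon id and paths are the ones from this thread):

```shell
# Sketch: keep only one archived copy, renaming any previous one first.
mon_id=magna040
archive="/var/lib/ceph/mon-removed/ceph-${mon_id}"
sudo mkdir -p /var/lib/ceph/mon-removed
if [ -e "$archive" ]; then
    sudo mv "$archive" "${archive}.old"   # or remove it, if one copy is enough
fi
sudo mv "/var/lib/ceph/mon/ceph-${mon_id}" "$archive"
```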

Moving the defect to the assigned state to incorporate Kefu's suggestions.

Regards,
Harish
Comment 9 Kefu Chai 2015-11-10 12:08:38 EST
> Is there any requirement in Ceph that removed mon's directory should be archived in a specific format as mentioned above?

No. Actually, I think we back up the monstore just in case it's needed later. But the monstore is updated very often, and a newly joined monitor will always be synced with the latest monstore from its peer.

So it's fine to keep only one copy of the monstore.
Comment 10 shylesh 2015-11-16 09:51:29 EST
Verified the document and the changes are properly made.
However, one more thing needs to be added in the "Manual" mon addition section: https://gitlab.cee.redhat.com/jowilkin/red-hat-ceph-storage-administration-guide/blob/v1.3/cluster-size.adoc

At step 4 of the "Manual" mon addition section, the user has to run the "ceph auth get mon. -o {tmp}/{key-filename}" command on the new mon node. Since it's a new machine, there is no admin keyring on it, so this command fails with:


"ubuntu@magna110:~$ sudo ceph auth get mon. -o ./tmp/keyring
2015-11-16 09:31:35.461940 7fe62e10a700 -1 monclient(hunting): ERROR: missing keyring, cannot use cephx for authentication
2015-11-16 09:31:35.461946 7fe62e10a700  0 librados: client.admin initialization error (2) No such file or directory
Error connecting to cluster: ObjectNotFound"

@John,
So I think we need to copy this key before running this command. Am I missing something, or is it covered elsewhere?
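One way to get the keyring and conf onto the new node before running "ceph auth get" might be (a sketch; magna110 is the node from the error above, and the keyring path is the usual default, assumed here):

```shell
# Sketch: put ceph.conf and the client.admin keyring on the new mon node first.
# Option 1: from the ceph-deploy admin node (assumes ceph-deploy is in use):
ceph-deploy --overwrite-conf admin magna110
# Option 2: copy by hand from an existing cluster node (default path assumed):
scp /etc/ceph/ceph.client.admin.keyring ubuntu@magna110:/etc/ceph/
```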
Comment 11 shylesh 2015-11-16 13:46:57 EST
@John,

In the "Manual" mon removal section, if the user performs step 5 then step 6 is not required. Could you please highlight this in the doc? Glancing through the doc, it looks like both steps are required, but they are not.
Comment 12 Harish NV Rao 2015-11-16 14:12:28 EST
Changing to the assigned state to incorporate comments 10 and 11.
Comment 14 shylesh 2015-11-18 13:30:24 EST
@John,

Can you incorporate comment 10 as well? Otherwise "ceph auth get" fails with permission errors.
Comment 16 shylesh 2015-11-19 12:17:10 EST
Added the "ceph-deploy --overwrite-conf admin <ceph-node>" command in the doc; this will copy the client.admin keyring and the ceph.conf file to the destination node.

Hence, marking this bug as verified.
Comment 17 Anjana Suparna Sriram 2015-12-18 04:59:56 EST
Fixed for 1.3.1 Release.
