Created attachment 1082405 [details]
Current Leader Mon Log

Description of problem:
After continuous monitor addition and removal (new hosts were even added as MONs), my cluster is now in an unusable state. Running any ceph command produces a continuous stream of error messages like:

"2015-10-13 23:01:25.332621 7fac6c50c700 0 -- :/1034060 >> 10.70.44.42:6789/0 pipe(0x7fac5c0008c0 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7fac5c0136c0).fault"

Version-Release number of selected component (if applicable):
ceph-0.94.3-2.el7cp.x86_64

How reproducible:
NA

Steps to Reproduce:
1. Started the test with 1 MON.
2. Added 2 more MONs.
3. Killed 1 MON and added a new host, which acted as a new MON.
4. Added 2 more MONs (at this point there were 5 MONs, the cluster was healthy, and it was in quorum).
5. Destroyed the leader MON:
   ceph-deploy mon destroy Node-Name

NOTE: There was I/O happening in the cluster the whole time.

Actual results:
After that I am unable to perform any operation in the cluster; it is in an unusable state.

Expected results:
4 nodes should be enough to maintain quorum.

Additional info:
I tried the same on my 1.3.0 cluster with the same MON configuration and it worked fine.

From the other MON's log I can see that the health is OK and there are 4 MONs:
==============================================================================
2015-10-13 21:55:14.927477 7f4bde543700 0 log_channel(cluster) log [INF] : mon.cephqe3@0 won leader election with quorum 0,1,2,3
2015-10-13 21:55:14.933411 7f4bde543700 0 log_channel(cluster) log [INF] : HEALTH_OK
2015-10-13 21:55:14.934109 7f4bde543700 0 log_channel(cluster) log [WRN] : mon.3 10.70.44.56:6789/0 clock skew 23.1797s > max 0.05s
2015-10-13 21:55:14.934184 7f4bde543700 0 log_channel(cluster) log [WRN] : mon.2 10.70.44.54:6789/0 clock skew 7.02393s > max 0.05s
2015-10-13 21:55:14.952291 7f4bde543700 0 log_channel(cluster) log [INF] : monmap e6: 4 mons at {cephqe10=10.70.44.54:6789/0,cephqe11=10.70.44.56:6789/0,cephqe3=10.70.44.40:6789/0,cephqe7=10.70.44.48:6789/0}
===============================================================================

Attaching the current leader MON log.
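A rough sketch of how these steps map to ceph-deploy commands, assuming default ceph-deploy usage; the names in angle brackets are placeholders, not the exact hosts used:

ceph-deploy new <initial-mon>            # step 1: start with a single MON
ceph-deploy mon create-initial
ceph-deploy mon add <new-mon>            # steps 2-4: repeated for each MON added
ceph-deploy mon destroy <leader-mon>     # step 5: remove the current leader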
The ceph.conf on cephqe3 looks like:

# cat /etc/ceph/ceph.conf
[global]
osd crush location hook = /usr/bin/calamari-crush-location
fsid = 3461ab41-2b16-4e45-a350-902fe73ea98a
mon_initial_members = cephqe4
mon_host = 10.70.44.42
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true

So this is the expected behaviour of the "ceph" CLI and the monitor, but I am not sure it is expected from ceph-deploy's perspective. As per Tanay, he installed cephqe4 using "ceph-deploy install --mon <mon>", added the other monitors using "ceph-deploy mon add <mon>", and then took cephqe4 offline. It appears we do not update ceph.conf when adding a new monitor with ceph-deploy.

@Alfredo, is this expected behaviour?

Looking at ceph-deploy/hosts/common.py, we do update the conf file in mon_add(), but it does not rewrite the monmap-related options ("mon host", "mon addr") in the "mon[.*]" and "global" sections. The ceph CLI builds its initial monmap by reading those options from the config file.

What if a user starts the cluster with only one monitor, adds more of them, and then takes that very first monitor offline? That leaves a conf file describing an out-of-date monmap, and the CLI will not be able to connect to the cluster with it.
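For illustration, a sketch of what the [global] section would need to contain for the surviving quorum, with host names and addresses taken from monmap e6 in the description (this is what a manually corrected file could look like, not what ceph-deploy writes):

[global]
fsid = 3461ab41-2b16-4e45-a350-902fe73ea98a
mon_initial_members = cephqe3, cephqe7, cephqe10, cephqe11
mon_host = 10.70.44.40, 10.70.44.48, 10.70.44.54, 10.70.44.56
# remaining auth/filestore options unchanged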
ceph-deploy does not have any mechanism to *update* values in a configuration file arbitrarily. When creating a mon or adding a mon, the same configuration file that exists locally in the CWD is written to the remote host. This seems to me like an advanced usage scenario, where we shouldn't expect ceph-deploy to be able to keep up with the effort of updating and synchronizing the configuration file.
Thank you, Alfredo! Tanay, if you believe we should update the documentation to address this issue, could you update the ticket accordingly? Thanks.
Hi Kefu/Alfredo,

How can we document this? As discussed earlier, this can easily happen at a customer site: a customer who starts with a single initial MON node later decides to expand the MON cluster and adds a few more. If something then goes wrong with the original MON node, we hit this issue.

The workaround is to manually change ceph.conf and remove the old entry (see the sketch below).

Can we make the change in code, to avoid this workaround?
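A hedged sketch of that manual workaround on the admin node; the host names are taken from the monmap in the description, and the exact push targets depend on the deployment:

# edit ceph.conf in the ceph-deploy working directory so that
# mon_initial_members and mon_host list only the live monitors,
# then redistribute it to the cluster nodes:
ceph-deploy --overwrite-conf config push cephqe3 cephqe7 cephqe10 cephqe11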
To clarify: it sounds like we're saying the "Adding a Monitor" documentation should be updated to say "When adding a new monitor host, you should also add it to the 'mon initial members' configuration option in ceph.conf". Right? The alternative is updating ceph-deploy to dynamically insert new monitors into ceph.conf on the fly. Unfortunately that ceph-deploy change is probably not going to happen any time soon :(
Let's make this a documentation change per comment #7.
Approved at the program meeting on Oct 28, 2015.
John,

This change only talks about removing the old MON reference from the ceph.conf file. In addition, once the new MON has been added to the cluster, we also need to add the new monitor's IP and hostname; that change should also be made in the ceph.conf file.

Thanks,
Tanay
> We need to manually change the ceph.conf and remove the old entry.

And, more importantly, to add the new one, because the client needs to build the initial monmap so that it can contact at least one of the alive monitors and fetch the latest monmap, osdmap and other important cluster information.
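As a side note, a client stuck with a stale ceph.conf can still reach the cluster by pointing it at a live monitor explicitly; for example, with an address taken from the monmap above and the admin keyring in its default location:

ceph -m 10.70.44.40:6789 -s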
See:
https://gitlab.cee.redhat.com/jowilkin/red-hat-ceph-storage-administration-guide/commit/7bc27031c1ba98ea6902e0797753bb5582c5e237
https://gitlab.cee.redhat.com/jowilkin/red-hat-ceph-storage-administration-guide/commit/47cf365e16b63b121255177fd30fffb56705a94b
See https://gitlab.cee.redhat.com/jowilkin/red-hat-ceph-storage-administration-guide/commit/4fafbb099ae526515558a53816501183f9166016
John,

As mentioned in comment 17 above, we need to add both options in the documentation: mon_initial_members AND mon_host.

Another minor change: the opening of the Important section currently reads "If are adding a monitor to a cluster that has only one monitor". A possible fix: "If you are adding a monitor to a cluster that has only one monitor" or "If adding a monitor to a cluster that has only one monitor".
See https://gitlab.cee.redhat.com/jowilkin/red-hat-ceph-storage-administration-guide/commit/80ec873960e71e111f920488b405adfe16cea042
Marking this Bug as Verified.
Fixed for 1.3.1 Release.