Bug 1271227 - Monitor thrashing causing the Cluster in a dead state
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: Documentation
Hardware: x86_64 Linux
Priority: unspecified
Severity: urgent
Target Milestone: rc
Target Release: 1.3.1
Assigned To: ceph-docs@redhat.com
Reported: 2015-10-13 08:20 EDT by Tanay Ganguly
Modified: 2016-09-19 21:50 EDT
Doc Type: Bug Fix
Last Closed: 2015-12-18 04:59:28 EST
Type: Bug

Attachments
Current Leader Mon Log (2.00 MB, text/plain)
2015-10-13 08:20 EDT, Tanay Ganguly

Description Tanay Ganguly 2015-10-13 08:20:10 EDT
Created attachment 1082405
Current Leader Mon Log

Description of problem:
After repeatedly adding and removing monitors (including adding new hosts as monitors), my cluster is now in an unusable state. Running any ceph command produces this error message continuously:

"2015-10-13 23:01:25.332621 7fac6c50c700  0 -- :/1034060 >> pipe(0x7fac5c0008c0 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7fac5c0136c0).fault"

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Started the test with 1 Mon.
2. Added 2 more Mons.
3. Killed 1 Mon and added a new host, which acted as a new Mon.
4. Added 2 more Mons (at this point I had 5 Mons, and the cluster was healthy and in quorum).
5. Destroyed the leader Mon:
ceph-deploy mon destroy Node-Name

NOTE: I/O was running on the cluster throughout the test.

Actual results:
After that I was unable to perform any operation on the cluster; it went into an unusable state.

Expected results:
The 4 remaining Mon nodes should be enough to form a quorum.
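
The expectation above rests on majority arithmetic: Ceph monitors form a quorum when a strict majority of the monitors in the monmap are up. A minimal sketch of that calculation follows; the function names are illustrative and not part of Ceph.

```python
def quorum_size(monmap_size: int) -> int:
    """Smallest number of monitors that is a strict majority of the monmap."""
    if monmap_size < 1:
        raise ValueError("monmap must contain at least one monitor")
    return monmap_size // 2 + 1


def has_quorum(monmap_size: int, monitors_up: int) -> bool:
    """True when enough monitors are up to form a quorum."""
    return monitors_up >= quorum_size(monmap_size)


# Scenario from this report: 5 monitors, then the leader is destroyed.
# Even if the destroyed monitor were still counted in the monmap,
# 4 of 5 is a majority; once it is removed, 4 of 4 is as well.
print(has_quorum(5, 4))  # True
print(has_quorum(4, 4))  # True
```

By this arithmetic, losing one of five monitors should not cost the cluster its quorum.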

Additional info:
I tried the same steps on my 1.3.0 cluster with the same Mon configuration, and it worked fine.

From the log of another Mon I can see that the health is OK and that there are 4 Mons in quorum:
2015-10-13 21:55:14.927477 7f4bde543700  0 log_channel(cluster) log [INF] : mon.cephqe3@0 won leader election with quorum 0,1,2,3
2015-10-13 21:55:14.933411 7f4bde543700  0 log_channel(cluster) log [INF] : HEALTH_OK
2015-10-13 21:55:14.934109 7f4bde543700  0 log_channel(cluster) log [WRN] : mon.3 clock skew 23.1797s > max 0.05s
2015-10-13 21:55:14.934184 7f4bde543700  0 log_channel(cluster) log [WRN] : mon.2 clock skew 7.02393s > max 0.05s
2015-10-13 21:55:14.952291 7f4bde543700  0 log_channel(cluster) log [INF] : monmap e6: 4 mons at {cephqe10=,cephqe11=,cephqe3=,ceph

Attaching the current leader mon log
Comment 3 Kefu Chai 2015-10-14 03:37:45 EDT
the ceph.conf on cephqe3 looks like

# cat /etc/ceph/ceph.conf
osd crush location hook = /usr/bin/calamari-crush-location
fsid = 3461ab41-2b16-4e45-a350-902fe73ea98a
mon_initial_members = cephqe4
mon_host =
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true

so this is expected behaviour for the "ceph" CLI and the monitors, but i am not sure it is expected from ceph-deploy's perspective. as per Tanay, he installed cephqe4 using "ceph-deploy install --mon <mon>", added the other monitors using "ceph-deploy mon add <mon>", and then took cephqe4 offline.

it seems we don't update ceph.conf when adding a new monitor using ceph-deploy. @Alfredo, is this expected behaviour? i am looking at ceph-deploy/hosts/common.py: mon_add() does update the conf file, but it does not seem to rewrite "monmap", "mon host", or the "mon addr" settings in the "mon[.*]" and "global" sections. the ceph CLI tries to build up its initial monmap by reading those variables from the config file.

what if a user starts a cluster with only one monitor, adds more of them, and then takes the very first monitor offline? this leaves the user with a conf file that holds an outdated monmap, and the CLI will not be able to connect to the cluster with it.
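
The mismatch described above can be sketched as a comparison between the monitor names listed in ceph.conf and the names in the current monmap. This is only an illustration: diff_mon_members is a hypothetical helper, not part of ceph or ceph-deploy, and the monmap names are the three that are legible in the truncated "monmap e6" log line from the description.

```python
def diff_mon_members(configured, monmap):
    """Compare monitor names from ceph.conf with those in the current monmap.

    Returns (stale, missing): names found only in ceph.conf, and names
    found only in the monmap.
    """
    configured_set = set(configured)
    monmap_set = set(monmap)
    stale = sorted(configured_set - monmap_set)
    missing = sorted(monmap_set - configured_set)
    return stale, missing


# mon_initial_members from the ceph.conf shown in this comment.
configured = ["cephqe4"]
# Names visible in the truncated monmap e6 log line; the fourth name is
# cut off in the log, so it is left out here.
monmap = ["cephqe10", "cephqe11", "cephqe3"]

stale, missing = diff_mon_members(configured, monmap)
print("stale entries:", stale)
print("missing entries:", missing)
```

A non-empty stale list together with an empty overlap is the situation reported here: the only monitor the client knows about is the one that was taken offline.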
Comment 4 Alfredo Deza 2015-10-14 08:10:08 EDT
ceph-deploy does not have anything to be able to *update* values in a configuration file arbitrarily. When creating a mon or adding a mon the same configuration file that exists locally in the CWD is then written on the remote host.

This seems to me like an advanced usage scenario, where we shouldn't expect ceph-deploy to be able to keep up with the effort of updating and synchronizing the configuration file.
Comment 5 Kefu Chai 2015-10-14 10:44:39 EDT
thank you, Alfredo!

Tanay, if you believe that we should update the documentation to address this issue, could you update the ticket accordingly? thanks.
Comment 6 Tanay Ganguly 2015-10-15 02:59:12 EDT
Hi Kefu/Alfredo,

How should we document this? As discussed earlier, this can easily happen at a customer site: a customer who starts with a single initial Mon node later decides to expand the Mon cluster and adds a few more.

If something then goes wrong with the original Mon node, they will hit this issue.

Workaround:
We need to manually edit ceph.conf and remove the old entry.

Can we make a code change to avoid this workaround?
Comment 7 Ken Dreyer (Red Hat) 2015-10-20 11:38:33 EDT
To clarify: it sounds like we're saying the "Adding a Monitor" documentation should be updated to say "When adding a new monitor host, you should also add it to the 'mon initial members' configuration option in ceph.conf". Right?

The alternative is updating ceph-deploy to dynamically insert new monitors into ceph.conf on the fly. Unfortunately that ceph-deploy change is probably not going to happen any time soon :(
Comment 8 Federico Lucifredi 2015-10-21 12:42:11 EDT
Let's make it documentation per comment #7.
Comment 10 John Poelstra 2015-10-28 12:13:17 EDT
approved at program meeting on Oct 28, 2015
Comment 12 Tanay Ganguly 2015-10-29 06:28:59 EDT

This change only talks about removing the old Mon reference from the ceph.conf file.

In addition, once the new Mon has been added to the cluster, we also need to add its IP address and hostname to the ceph.conf file.

Comment 13 Kefu Chai 2015-10-29 06:53:03 EDT
> We need to manually change the ceph.conf and remove the old entry.

and more importantly, to add the new one. because the client needs to create the initial monmap so it is able to contact at least one of the alive monitors to get the latest monmap, osdmap and other important cluster information.
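
As a rough illustration of the manual edit discussed in this thread (drop the old monitor, add the new ones), the sketch below rewrites mon_initial_members and mon_host with Python's configparser. Several things here are assumptions: the file is taken to have a [global] section header, the 192.0.2.x addresses are placeholders rather than this cluster's addresses, update_mon_settings is a hypothetical helper, and only the underscore spelling of the option names is handled.

```python
import configparser
import io

# Hypothetical sample: the [global] header and the 192.0.2.x addresses are
# placeholders, not values taken from this cluster.
SAMPLE_CONF = """\
[global]
fsid = 3461ab41-2b16-4e45-a350-902fe73ea98a
mon_initial_members = cephqe4
mon_host = 192.0.2.4
auth_cluster_required = cephx
"""


def update_mon_settings(conf_text, mon_names, mon_addrs):
    """Return conf_text with the monitor lists in [global] replaced."""
    # interpolation=None keeps any literal '%' in values untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    parser["global"]["mon_initial_members"] = ", ".join(mon_names)
    parser["global"]["mon_host"] = ",".join(mon_addrs)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


updated = update_mon_settings(
    SAMPLE_CONF,
    ["cephqe3", "cephqe10", "cephqe11"],
    ["192.0.2.3", "192.0.2.10", "192.0.2.11"],
)
print(updated)
```

configparser discards comments when it writes the file back, and the result would still have to be distributed to every node and client that reads ceph.conf.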
Comment 20 Tanay Ganguly 2015-11-18 14:15:48 EST

As mentioned in comment 17, we need to add both in the documentation.

Another minor change, at the start of the Important section:
If are adding a monitor to a cluster that has only one monitor

Possible fixes:
If you are adding a monitor to a cluster that has only one monitor
If adding a monitor to a cluster that has only one monitor
Comment 22 Tanay Ganguly 2015-11-19 04:01:48 EST
Marking this Bug as Verified.
Comment 23 Anjana Suparna Sriram 2015-12-18 04:59:28 EST
Fixed for 1.3.1 Release.
