Description of problem:
Stand-by MGRs are going down after enabling all mgr modules sequentially

Version-Release number of selected component (if applicable):
ceph-base-12.2.12-70.el7cp.x86_64
ceph-mgr-12.2.12-70.el7cp.x86_64
ceph-ansible-3.2.27-1.el7cp.noarch
ceph-common-12.2.12-70.el7cp.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Deploy a ceph cluster with at least 3 mgrs
2. Enable all available mgr modules one after another on the active mgr

Actual results:
Stand-by mgrs are going down

Expected results:
Stand-by mgrs should be up and running

Additional info:
The stand-by mgr status is Active and Running, but the stand-bys are not displayed in ceph status
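The reproduction steps above can be sketched as a small script. This is only an illustration, not the procedure the reporter used: it assumes the `ceph` CLI is on the PATH with admin credentials, that `ceph mgr module ls` and `ceph mgr dump` emit JSON with `disabled_modules` and `standbys` keys, and that a fixed pause (the hypothetical `SETTLE_SECONDS`) is enough for the mgr daemons to settle after each module change. The helper names are made up for this example.

```python
import json
import subprocess
import time

# Hypothetical pause to let mgr daemons settle after a module change.
SETTLE_SECONDS = 15


def disabled_modules(module_ls_json):
    """Return names of disabled modules from `ceph mgr module ls` JSON output."""
    data = json.loads(module_ls_json)
    names = []
    for entry in data.get("disabled_modules", []):
        # Entries may be plain strings or objects carrying a "name" key,
        # depending on the release; accept both forms.
        names.append(entry["name"] if isinstance(entry, dict) else entry)
    return names


def standby_names(mgr_dump_json):
    """Return standby mgr names from `ceph mgr dump` JSON output."""
    data = json.loads(mgr_dump_json)
    return [standby["name"] for standby in data.get("standbys", [])]


def ceph(*args):
    """Run a ceph CLI command and return its stdout as text."""
    return subprocess.check_output(("ceph",) + args).decode()


def main():
    before = standby_names(ceph("mgr", "dump", "--format", "json"))
    to_enable = disabled_modules(ceph("mgr", "module", "ls", "--format", "json"))
    for module in to_enable:
        # Enable one module at a time, as in step 2 of the report.
        ceph("mgr", "module", "enable", module)
        time.sleep(SETTLE_SECONDS)
        after = standby_names(ceph("mgr", "dump", "--format", "json"))
        missing = sorted(set(before) - set(after))
        print(module, "-> missing standbys:", missing)


if __name__ == "__main__":
    main()
```

Comparing the standby list before and after each `enable` call shows which module, if any, coincides with the stand-bys dropping out of the mgr map. Some modules may refuse to enable without extra options; the sketch makes no attempt to handle that.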
When all the ceph-mgr modules are enabled, the standby mgrs appear to hit a deadlock. The daemons are still running but are not reachable by the cluster, i.e. ceph -s no longer shows the standbys. Retargeting to rados since this looks like a ceph-mgr daemon issue.
i am able to reproduce it. looking
Upstream PR pending review: https://github.com/ceph/ceph/pull/30623
fix pushed to ceph-3.3-rhel-patches
Thanks Thomas!
The upstream PR is actually https://github.com/ceph/ceph/pull/27639; my change duplicates it.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:3173