Description of problem:
From upstream tracker: http://tracker.ceph.com/issues/35985

StandbyPyModule::get_config calls state.with_config without dropping the GIL before taking the lock. The standby mgr process hangs and stops responding, is removed from the mgrmap, and does not take over the active role when the active mgr stops. Without an MGR daemon, Ceph reports 0 space, which affects OSP when it spawns new instances, because the available space is checked.

Version-Release number of selected component (if applicable):
12.2.8-52

How reproducible:
Random

Steps to Reproduce:
1.
2.
3.

Actual results:
The standby mgr process stops responding but is still running; no messages are logged.

Expected results:
The standby mgr process keeps responding; if the active mgr stops, one of the standby mgrs becomes active.

Additional info:
See analysis in https://tracker.ceph.com/issues/35985
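
For readers unfamiliar with this class of hang, the lock-ordering problem named in the description can be sketched roughly as below. This is an illustration only, not Ceph code: the two threading.Lock objects are stand-ins for the GIL and for the lock taken by with_config, and the names simulate, get_config and config_holder are hypothetical. It also assumes that some other thread holds the config lock and then needs the GIL, which the description does not spell out. The timeout exists only so the demo terminates; in a real deadlock both threads would wait forever.

```python
import threading


def simulate(drop_gil, timeout=0.2):
    """Return True if the thread holding the config lock could also get the 'GIL'."""
    gil = threading.Lock()          # stand-in for the Python GIL (assumption)
    config_lock = threading.Lock()  # stand-in for the lock behind with_config (assumption)
    both_ready = threading.Barrier(2)
    result = {}

    def get_config():
        gil.acquire()               # called from Python, so the "GIL" is held
        both_ready.wait()           # make sure the other thread holds config_lock
        if drop_gil:
            gil.release()           # fixed ordering: drop the "GIL" first
        with config_lock:           # blocks while the other thread holds it
            result["value"] = "some-value"
        if drop_gil:
            gil.acquire()           # retake the "GIL" before returning to Python
        gil.release()

    def config_holder():
        with config_lock:
            both_ready.wait()       # make sure get_config holds the "GIL"
            # This thread needs the "GIL" while it still holds config_lock.
            got = gil.acquire(timeout=timeout)
            if got:
                gil.release()
            result["holder_got_gil"] = got

    threads = [
        threading.Thread(target=get_config),
        threading.Thread(target=config_holder),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["holder_got_gil"]


if __name__ == "__main__":
    print("GIL kept while locking:", simulate(drop_gil=False))
    print("GIL dropped before locking:", simulate(drop_gil=True))
```

When the "GIL" is kept while waiting for the config lock, the other thread can never obtain it and only gets out through the demo timeout. When it is dropped first, both threads make progress.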
https://github.com/ceph/ceph/pull/26613
We are still waiting on thread dumps to confirm this issue is the same as https://tracker.ceph.com/issues/35985.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2019:0911