Description of problem:
All three MGR daemons crashed at the same time with the same abort message:

** Caught signal (Aborted) **
in thread 7f4117eb8700 thread_name:safe_timer

Version-Release number of selected component (if applicable):
RHCS 5
16.2.0-117.el8cp

    0> 2021-09-03T20:45:37.923+0000 7f4117eb8700 -1 *** Caught signal (Aborted) **
 in thread 7f4117eb8700 thread_name:safe_timer

 ceph version 16.2.0-117.el8cp (0e34bb74700060ebfaa22d99b7d2cdc037b28a57) pacific (stable)
 1: /lib64/libpthread.so.0(+0x12b20) [0x7f43aca0bb20]
 2: gsignal()
 3: abort()
 4: /usr/bin/ceph-mgr(+0x154588) [0x55d650d08588]
 5: (DaemonServer::adjust_pgs()+0x3f04) [0x55d650dc0c94]
 6: (DaemonServer::tick()+0x103) [0x55d650dc5673]
 7: (Context::complete(int)+0xd) [0x55d650d50c4d]
 8: (SafeTimer::timer_thread()+0x1b7) [0x7f43adf0dc67]
 9: (SafeTimerThread::entry()+0x11) [0x7f43adf0f241]
 10: /lib64/libpthread.so.0(+0x814a) [0x7f43aca0114a]
 11: clone()
# ceph -s
  cluster:
    id:     08890e38-0cc9-11ec-9c28-bc97e178dd80
    health: HEALTH_WARN
            no active mgr

  services:
    mon: 3 daemons, quorum f18-h02-000-r640.rdu2.scalelab.redhat.com,f18-h07-000-r640,f18-h06-000-r640 (age 7h)
    mgr: no daemons active (since 2h)
    osd: 192 osds: 192 up (since 3h), 192 in (since 7h)
    rgw: 8 daemons active (8 hosts, 1 zones)

  data:
    pools:   7 pools, 2239 pgs
    objects: 2.50M objects, 9.4 TiB
    usage:   26 TiB used, 335 TiB / 361 TiB avail
    pgs:     2231 active+clean
             7    active+clean+scrubbing+deep
             1    active+clean+scrubbing
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat Ceph Storage 5.0 Bug Fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:4105