Description of problem (please be detailed as possible and provide log snippets):

Customer upgraded OpenShift from 4.8.36 to 4.9 and also upgraded ODF. After the upgrade, the MDS daemons began crashing repeatedly (around 3 times per day). After trying many things with no success, the customer opted to upgrade again from 4.9 to 4.10. The MDS daemons are continuing to crash.

The MDS pods show restarts, but they always return to the "Running" state (i.e. they are never in CrashLoopBackOff or other error states).

The error reported by Ceph Crash (now on OCP 4.10):

sh-4.4$ ceph crash info 2022-11-09T22:08:40.143859Z_4c04a689-d1d7-457c-97c5-2d579518da57
{
    "assert_condition": "g_conf()->mds_wipe_sessions",
    "assert_file": "/builddir/build/BUILD/ceph-16.2.7/src/mds/journal.cc",
    "assert_func": "void EMetaBlob::replay(MDSRank*, LogSegment*, MDPeerUpdate*)",
    "assert_line": 1618,
    "assert_msg": "/builddir/build/BUILD/ceph-16.2.7/src/mds/journal.cc: In function 'void EMetaBlob::replay(MDSRank*, LogSegment*, MDPeerUpdate*)' thread 7f90c0b25700 time 2022-11-09T22:08:40.137925+0000\n/builddir/build/BUILD/ceph-16.2.7/src/mds/journal.cc: 1618: FAILED ceph_assert(g_conf()->mds_wipe_sessions)\n",
    "assert_thread_name": "md_log_replay",
    "backtrace": [
        "/lib64/libpthread.so.0(+0x12ce0) [0x7f90cfb3bce0]",
        "gsignal()",
        "abort()",
        "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x7f90d0b4dd4f]",
        "/usr/lib64/ceph/libceph-common.so.2(+0x276f18) [0x7f90d0b4df18]",
        "(EMetaBlob::replay(MDSRank*, LogSegment*, MDPeerUpdate*)+0x5ae5) [0x562541693065]",
        "(EUpdate::replay(MDSRank*)+0x40) [0x562541694a80]",
        "(MDLog::_replay_thread()+0xcd1) [0x56254161adb1]",
        "(MDLog::ReplayThread::entry()+0x11) [0x56254131c941]",
        "/lib64/libpthread.so.0(+0x81cf) [0x7f90cfb311cf]",
        "clone()"
    ],
    "ceph_version": "16.2.7-126.el8cp",
    "crash_id": "2022-11-09T22:08:40.143859Z_4c04a689-d1d7-457c-97c5-2d579518da57",
    "entity_name": "mds.ocs-storagecluster-cephfilesystem-a",
    "os_id": "rhel",
    "os_name": "Red Hat Enterprise Linux",
    "os_version": "8.6 (Ootpa)",
    "os_version_id": "8.6",
    "process_name": "ceph-mds",
    "stack_sig": "52ebd581300a13e6933b6db0f2b6a61d1132bb285ec25dbeb28e31658f657a01",
    "timestamp": "2022-11-09T22:08:40.143859Z",
    "utsname_hostname": "rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-6597794f54496",
    "utsname_machine": "x86_64",
    "utsname_release": "4.18.0-305.62.1.el8_4.x86_64",
    "utsname_sysname": "Linux",
    "utsname_version": "#1 SMP Thu Aug 11 12:07:27 EDT 2022"
}

Version of all relevant components (if applicable):

OCP 4.8 had no issues. Problems started the same day that OCP was upgraded from 4.8->4.9. After doing another upgrade from 4.9->4.10, the issue persists.

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?

Yes. The MDS daemons crashing can cause the CephFS filesystem to go offline, which impacts applications in the environment.

Is there any workaround available to the best of your knowledge?

None.

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

1 - just an upgrade

Is this issue reproducible?

Unknown.

Can this issue reproduce from the UI?

Unknown.

If this is a regression, please provide more details to justify this:

Unknown.

Steps to Reproduce:
1. Run OCP 4.8 with OCS.
2. Upgrade to OCP 4.9.
3. Wait for the MDS daemons to start crashing.

Actual results:

MDS daemons crash about three times per day. Occasionally, the CephFS filesystem is corrupted and requires repair.

Expected results:

MDS daemons should not crash.

Additional info:

This issue appears to be the same as BZ 2056935.

Here is a list of most of the MDS crashes (the ones closest to the upgrade have been purged already). All of the crashes have basically the same output.

sh-4.4$ ceph crash ls
ID                                                                ENTITY                                   NEW
2022-10-03T18:53:32.381587Z_bdbd732e-ff61-4aa4-9837-83199f84a7c1  mds.ocs-storagecluster-cephfilesystem-b
2022-10-05T10:50:44.778476Z_4f069073-9f97-4ce7-9954-2a9585b467e1  mds.ocs-storagecluster-cephfilesystem-b
2022-10-06T10:40:25.297537Z_93408c06-981a-4e1c-a1c3-1a8b899ab085  mds.ocs-storagecluster-cephfilesystem-b
2022-10-07T17:16:27.262168Z_bda8dc9b-9da1-4f4b-a6af-15c583d8fed4  mds.ocs-storagecluster-cephfilesystem-b
2022-10-07T20:07:47.310891Z_f51f7a12-4174-495c-b362-4fbd00e28816  mds.ocs-storagecluster-cephfilesystem-b
2022-10-08T18:17:31.609481Z_22b9eccc-3ee3-4519-9b48-195173d6271c  mds.ocs-storagecluster-cephfilesystem-b
2022-10-08T22:16:30.086527Z_07963168-d4c1-4ff5-a5cb-d68802442061  mds.ocs-storagecluster-cephfilesystem-b
2022-10-09T02:43:17.565650Z_003b9811-118a-4eae-b456-df9330c3e2c9  mds.ocs-storagecluster-cephfilesystem-b
2022-10-09T05:38:38.911928Z_ee024171-894e-4c88-9286-fdc8313d4d1a  mds.ocs-storagecluster-cephfilesystem-b
2022-10-09T11:46:16.590673Z_9bc5e1ac-cffc-41bc-b0c8-9c3c56f6abe0  mds.ocs-storagecluster-cephfilesystem-b
2022-10-10T00:10:05.999169Z_e06ef522-eec3-4b1d-b626-4554552ecccb  mds.ocs-storagecluster-cephfilesystem-b
2022-10-10T10:08:36.686005Z_db3a983a-acc5-4b69-9678-664f8ee597f0  mds.ocs-storagecluster-cephfilesystem-b
2022-10-11T04:16:33.038254Z_b9950bb6-d1c6-4ea6-af25-757a70cc29b5  mds.ocs-storagecluster-cephfilesystem-b
2022-10-11T07:06:58.985809Z_5a98cd12-1d0c-4fa7-bbcf-c8c0e1af7d06  mds.ocs-storagecluster-cephfilesystem-b
2022-10-11T23:37:26.973810Z_bc8569b7-2a6b-44e2-8dc2-45111f98a52b  mds.ocs-storagecluster-cephfilesystem-b
2022-10-12T13:37:19.459402Z_3a583ace-0cbb-4654-baa3-d6051d9d9dd7  mds.ocs-storagecluster-cephfilesystem-b
2022-10-12T13:37:22.507612Z_5fd6ff2c-101e-4795-a3ad-f63b77d5c687  mds.ocs-storagecluster-cephfilesystem-b
2022-10-12T13:38:31.365456Z_a33fc777-4595-4127-b3f8-edee351ab1b2  mds.ocs-storagecluster-cephfilesystem-b
2022-10-12T13:39:00.190922Z_14cc3bc9-6723-4b21-9502-2ac2f6a6c62a  mds.ocs-storagecluster-cephfilesystem-b
2022-10-12T13:39:49.053235Z_c9d391c8-c22b-4f82-88de-7100024e6367  mds.ocs-storagecluster-cephfilesystem-b
2022-10-12T13:41:17.560776Z_d638be1f-7f6d-4908-b084-6b3fd90ef534  mds.ocs-storagecluster-cephfilesystem-b
2022-10-12T13:44:02.872969Z_6a2d47b7-baff-4966-b43e-80258429572a  mds.ocs-storagecluster-cephfilesystem-b
2022-10-12T23:47:48.381060Z_03b19db5-b25a-460c-94ff-49265dbdc7af  mds.ocs-storagecluster-cephfilesystem-b
2022-10-13T07:16:04.379333Z_ecf68561-6ce7-460e-8d16-59b517a25077  mds.ocs-storagecluster-cephfilesystem-b
2022-10-14T01:15:49.768824Z_69431f01-5c92-43e2-9798-4e5eb83c88af  mds.ocs-storagecluster-cephfilesystem-b
2022-10-14T13:11:27.619303Z_2350cdc7-8fdf-4d75-a88d-8c5e64e9ca1c  mds.ocs-storagecluster-cephfilesystem-b
2022-10-15T20:38:39.696705Z_b7829023-a745-4af2-a5bc-d3744ef0ab73  mds.ocs-storagecluster-cephfilesystem-b
2022-10-15T23:22:53.837330Z_052120ad-6a9e-474c-8461-0e0e632806a2  mds.ocs-storagecluster-cephfilesystem-b
2022-10-16T04:09:57.956446Z_2388d4ea-b78f-4d68-87b7-dc71c48da7dd  mds.ocs-storagecluster-cephfilesystem-b
2022-10-16T10:37:19.093492Z_d10e980d-c926-48c9-9ef2-beb3d60367ad  mds.ocs-storagecluster-cephfilesystem-b
2022-10-18T08:03:37.247281Z_83cdcdb0-62bd-4294-bc5f-b1ed622f5fa1  mds.ocs-storagecluster-cephfilesystem-b
2022-10-18T20:45:40.365667Z_d4b6548f-95f6-47a1-95dd-f9b8ceb08101  mds.ocs-storagecluster-cephfilesystem-b
2022-10-19T03:16:32.387364Z_c2698703-b209-4825-8eaa-86e7d1794cda  mds.ocs-storagecluster-cephfilesystem-b
2022-10-22T03:04:33.600885Z_9900ff19-f328-4ac4-a90a-4213474e05c3  mds.ocs-storagecluster-cephfilesystem-b
2022-10-22T07:59:10.834201Z_3eb1ef50-f4fb-451f-978c-24c5a377248a  mds.ocs-storagecluster-cephfilesystem-b
2022-10-22T11:03:32.833396Z_6e8f4f91-3d90-4ba2-abe0-83a190aab285  mds.ocs-storagecluster-cephfilesystem-b
2022-11-02T13:31:48.620757Z_aa8b6658-bcd8-48fb-93be-8312a47a4bca  mds.ocs-storagecluster-cephfilesystem-b
2022-11-03T05:06:23.371895Z_695c2dac-538a-4d7b-8f59-45724fb2d1e2  mds.ocs-storagecluster-cephfilesystem-b
2022-11-03T21:53:19.368170Z_ad1ec006-dfcb-42e8-9316-cb27baa02a40  mds.ocs-storagecluster-cephfilesystem-b
2022-11-04T01:30:11.849954Z_03d13983-7f87-4b99-97e4-476347bc30a4  mds.ocs-storagecluster-cephfilesystem-b
2022-11-04T03:49:25.907894Z_a6d4a662-8520-4005-9e72-a87714d1d058  mds.ocs-storagecluster-cephfilesystem-b
2022-11-04T11:52:09.209665Z_2a9b081b-c89a-4c6f-8745-a5688111f03d  mds.ocs-storagecluster-cephfilesystem-b
2022-11-05T01:51:12.988572Z_3ea8b80b-5ad1-431c-81cd-416e2214c09c  mds.ocs-storagecluster-cephfilesystem-b
2022-11-05T18:00:38.810350Z_8794997b-dc5d-4d55-9c8b-aef79d8bf0f0  mds.ocs-storagecluster-cephfilesystem-b
2022-11-06T15:06:09.812841Z_55ba0505-d200-44bc-a5cd-993a196ae14a  mds.ocs-storagecluster-cephfilesystem-a
2022-11-09T22:08:40.143859Z_4c04a689-d1d7-457c-97c5-2d579518da57  mds.ocs-storagecluster-cephfilesystem-a   *

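To check the crash frequency against a listing like the one above, the `ceph crash ls` text can be tallied per day and per MDS daemon. This is only a rough sketch, not part of the original report: the helper name `tally_crashes` is hypothetical, and it assumes each data row is whitespace-separated and begins with a crash ID of the form `<UTC timestamp>Z_<uuid>`, as in the pasted output.

```python
from collections import Counter


def tally_crashes(crash_ls_output):
    """Count crashes per (UTC date, entity) from pasted `ceph crash ls` text.

    Assumption: every data row starts with a crash ID shaped like
    <timestamp>Z_<uuid>, followed by the entity name. Prompt and header
    rows do not match that shape and are skipped.
    """
    counts = Counter()
    for line in crash_ls_output.splitlines():
        fields = line.split()
        if len(fields) < 2 or "Z_" not in fields[0]:
            continue  # shell prompt, header row, or blank line
        crash_id, entity = fields[0], fields[1]
        day = crash_id.split("T", 1)[0]  # date part of the timestamp
        counts[(day, entity)] += 1
    return counts


# Three rows copied from the listing in this report.
sample = """\
ID  ENTITY  NEW
2022-10-12T13:37:19.459402Z_3a583ace-0cbb-4654-baa3-d6051d9d9dd7  mds.ocs-storagecluster-cephfilesystem-b
2022-10-12T13:37:22.507612Z_5fd6ff2c-101e-4795-a3ad-f63b77d5c687  mds.ocs-storagecluster-cephfilesystem-b
2022-11-09T22:08:40.143859Z_4c04a689-d1d7-457c-97c5-2d579518da57  mds.ocs-storagecluster-cephfilesystem-a  *
"""

for (day, entity), n in sorted(tally_crashes(sample).items()):
    print(day, entity, n)
```

Feeding the full listing through the same function gives a per-day count for each daemon, which makes bursts of crashes on a single day easy to spot.
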
Thanks, Steve. I'll have a look. (keeping NI)