Hello, I have linked a case where the customer hit the reported issue:

> running 'scrub / recursive repair' results in active MDS crash

Current state:

  cluster:
    id:     9af2f934-61de-462d-b5ab-25439dace333
    health: HEALTH_ERR
            1 filesystem is degraded
            1 filesystem is offline
            1 mds daemon damaged

  services:
    mon: 3 daemons, quorum b,f,g (age 2d)
    mgr: a(active, since 2d)
    mds: ocs-storagecluster-cephfilesystem:0/1 2 up:standby, 1 damaged

Debug logs are uploaded to supportshell under /cases/03370989:

  | 11 | 0110 | ceph-mds.log.tar.gz                                  |  8346.13 | 2022-11-27 12:40 UTC | S3 | Yes |
  | 12 | 0120 | ceph-mds.ocs-storagecluster-cephfilesystem-a.log.bz2 |   187.72 | 2022-11-28 19:10 UTC | S3 | Yes |
  | 13 | 0130 | ceph-mds.ocs-storagecluster-cephfilesystem-b.log.bz2 |   591.04 | 2022-11-28 19:10 UTC | S3 | Yes |
  | 14 | 0140 | ceph-mds.ocs-storagecluster-cephfilesystem-b.log.bz2 | 11714.43 | 2022-11-28 19:14 UTC | S3 | Yes |
  | 15 | 0150 | ceph-mds.ocs-storagecluster-cephfilesystem-a.log.bz2 |  3834.11 | 2022-11-28 19:14 UTC | S3 | Yes |

Please let me know if the data set is incomplete or if additional logs are needed.
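For anyone picking this up, a minimal sketch of the checks I would run from the toolbox pod to confirm which rank/daemon is damaged before attempting any journal surgery. This assumes the usual ODF toolbox deployment name (rook-ceph-tools) and uses an example MDS daemon name; substitute whatever ceph fs dump actually reports for this cluster:

  # enter the Ceph toolbox (assumed deployment name)
  oc -n openshift-storage rsh deploy/rook-ceph-tools

  # filesystem / MDS overview and the exact health messages
  ceph fs status ocs-storagecluster-cephfilesystem
  ceph fs dump
  ceph health detail

  # damage entries recorded for the rank (needs an MDS holding the rank; daemon name is an example)
  ceph tell mds.ocs-storagecluster-cephfilesystem-a damage ls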
Customer update from my case 03370989:

From what we have gathered, our timeline was:

  Fri 25th, ~14:20: the MDS pod is OOM-killed and the standby pod went into up:replay
  Fri 25th, ~20:45: resource limits are patched and the scrub MDS starts -> the MDS goes into down:damaged after the liveness probes fail and the pod is once again killed by the kubelet
  Sat/Sun 26th/27th: marking the MDS as repaired leaves them in a standby state
  Mon: we decided to recover from the journal, which got things back to normal

To recover we basically followed the Ceph documentation (https://docs.ceph.com/en/nautilus/cephfs/disaster-recovery-experts/):

Recover metadata from the journal:
  cephfs-journal-tool --rank=ocs-storagecluster-cephfilesystem:0 event recover_dentries summary

Truncate the journal:
  cephfs-journal-tool --rank=ocs-storagecluster-cephfilesystem:0 journal reset

Reset the session map:
  cephfs-table-tool ocs-storagecluster-cephfilesystem:0 reset session

We then restarted the MDS pods and saw one go into up:replay. However, the liveness probe didn't complete in time, so we temporarily replaced the probe command with a simple echo to keep the pod from being repeatedly killed during replay. Replay finished after some 15 minutes and the filesystem was up again.

Finally we ran the MDS scrub:
  ceph tell mds.0 scrub start / recursive repair
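For the record, a rough sketch of what that temporary probe swap can look like, assuming the usual Rook/ODF deployment names (rook-ceph-operator, rook-ceph-mds-ocs-storagecluster-cephfilesystem-a) and an exec-style liveness probe on container index 0; this is not necessarily the exact patch the customer applied. The operator is scaled down first so it does not immediately reconcile the deployment back, and both changes need to be reverted once replay completes:

  # stop the operator from reverting the manual edit (assumed deployment names)
  oc -n openshift-storage scale deployment rook-ceph-operator --replicas=0

  # replace the MDS liveness probe command with a no-op echo for the duration of replay
  oc -n openshift-storage patch deployment rook-ceph-mds-ocs-storagecluster-cephfilesystem-a \
    --type=json \
    -p='[{"op":"replace","path":"/spec/template/spec/containers/0/livenessProbe/exec/command","value":["/bin/sh","-c","echo ok"]}]'

  # once the MDS is active again, revert the probe change and scale the operator back up
  oc -n openshift-storage scale deployment rook-ceph-operator --replicas=1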
Hi Venkey,

Unfortunately, the customer was unwilling to wait and applied the upstream solution. I think all the issues they encountered are part of a trend we are seeing more and more with MDS/CephFS: customer over-utilization of the CephFS storage class.

In this case, 03370989:

  $ less namespaces/openshift-storage/oc_output/volumesnapshot_-A | grep k10-csi-snap | wc -l
  133

The k10-clone-ocs-storagecluster-cephfsplugin-snapclass volumesnapshotclass has its deletion policy set to Retain:

  NAME                                                  DRIVER                                  DELETIONPOLICY   AGE
  k10-clone-ocs-storagecluster-cephfsplugin-snapclass   openshift-storage.cephfs.csi.ceph.com   Retain           38d
  ocs-storagecluster-cephfsplugin-snapclass             openshift-storage.cephfs.csi.ceph.com   Delete           657d
  ocs-storagecluster-rbdplugin-snapclass                openshift-storage.rbd.csi.ceph.com      Delete           657d

  $ less namespaces/openshift-storage/oc_output/volumesnapshotcontent | grep k10-clone-ocs-storagecluster-cephfsplugin-snapclass | wc -l
  1042

  RAW STORAGE:
    CLASS     SIZE       AVAIL       USED       RAW USED     %RAW USED
    hdd       12 TiB     5.4 TiB     6.4 TiB     6.6 TiB         54.88
    TOTAL     12 TiB     5.4 TiB     6.4 TiB     6.6 TiB         54.88

  POOLS:
    POOL                                              ID     STORED      OBJECTS     USED        %USED     MAX AVAIL     QUOTA OBJECTS     QUOTA BYTES     DIRTY      USED COMPR     UNDER COMPR
    ocs-storagecluster-cephblockpool                   1      52 GiB      13.83k     155 GiB      3.65       1.3 TiB     N/A               N/A             13.83k            0 B             0 B
    .rgw.root                                          2     4.6 KiB          16     2.8 MiB         0       1.3 TiB     N/A               N/A                 16            0 B             0 B
    ocs-storagecluster-cephobjectstore.rgw.control     3         0 B           8         0 B         0       1.3 TiB     N/A               N/A                  8            0 B             0 B
    ocs-storagecluster-cephfilesystem-metadata         4      14 GiB      19.21k      15 GiB      0.38       1.3 TiB     N/A               N/A             19.21k            0 B             0 B
    ocs-storagecluster-cephobjectstore.rgw.meta        5     3.7 KiB          14     2.3 MiB         0       1.3 TiB     N/A               N/A                 14            0 B             0 B
    ocs-storagecluster-cephfilesystem-data0            6     319 GiB      37.01M     5.4 TiB     57.24       1.3 TiB     N/A               N/A             37.01M            0 B             0 B
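The greps above come from must-gather output; on a live cluster the same numbers can be pulled directly with something like the following (a sketch only; the class name and k10-csi-snap prefix are taken from the output above):

  # CephFS VolumeSnapshots created by K10, across all namespaces
  oc get volumesnapshot -A | grep k10-csi-snap | wc -l

  # VolumeSnapshotContents held back by the Retain-policy class
  oc get volumesnapshotcontent | grep k10-clone-ocs-storagecluster-cephfsplugin-snapclass | wc -l

  # deletion policy on each snapshot class
  oc get volumesnapshotclass

  # pool-level usage, from the toolbox pod
  ceph df detail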