Description of problem (please be detailed as possible and provide log snippets):
MDS in standby-replay status is removed by the monitor repeatedly.

Version of all relevant components (if applicable):
ODF 4.10

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?

Is there any workaround available to the best of your knowledge?
no

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

Is this issue reproducible?
Persistent in customer environment

Can this issue be reproduced from the UI?
n/a

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1.
2.
3.

Actual results:
MDS is removed by the mon during replay.

Expected results:
The standby-replay MDS finishes replay and joins the cluster.

Additional info:
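To help narrow down why the monitor keeps evicting the standby-replay daemon, a minimal sketch of diagnostic commands that could be run from the rook-ceph-tools pod (an assumption; adjust to the environment, and note mds_beacon_grace is only relevant if the mon is dropping the daemon for missed beacons):

$ ceph fs status                            # current MDS ranks and standby-replay state
$ ceph health detail                        # any MDS-related warnings at the time of removal
$ ceph config get mds mds_beacon_grace      # how long the mon waits for MDS beacons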
Hi Kotresh,

The cluster is more stable now after migrating unused data and some workloads off CephFS.

Three areas are hindering ODF performance and contributing to slow Ceph recovery. I continue to see a growing trend of customers deploying on rotational devices and on OCP with an MTU of 1500, and customers continue to put DB workloads on CephFS. An MTU of 1500 would fall into the network-issues category IMHO.

1. Rotational device configuration at the node layer:

sh-4.4# lsblk -t
NAME                         ALIGNMENT MIN-IO OPT-IO PHY-SEC LOG-SEC ROTA SCHED       RQ-SIZE   RA WSAME
loop0                                0    512      0     512     512    0 mq-deadline     256  128    0B
loop1                                0    512      0     512     512    0 mq-deadline     256  128    0B
sda                                  0    512      0     512     512    1 mq-deadline     256 4096    4M
|-sda1                               0    512      0     512     512    1 mq-deadline     256 4096    4M
|-sda2                               0    512      0     512     512    1 mq-deadline     256 4096    4M
|-sda3                               0    512      0     512     512    1 mq-deadline     256 4096    4M
`-sda4                               0    512      0     512     512    1 mq-deadline     256 4096    4M
  `-coreos-luks-root-nocrypt         0    512      0     512     512    1                 128 4096    4M
sdb                                  0    512      0     512     512    1 mq-deadline     256 4096    4M
sdc                                  0    512      0     512     512    1 mq-deadline     256  128    4M
sdd                                  0    512      0     512     512    1 mq-deadline     256  128    4M

2. MTU size is 1500:

sh-4.4# ip link list
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 00:50:56:86:b8:b4 brd ff:ff:ff:ff:ff:ff

3. DB workloads on the CephFS storage class, and the number of objects this has created in the cephfilesystem metadata and data pools:

omg get pv | grep cephfs | grep db    (snip of a few for example)
pvc-0d0299b2-da74-4e20-bf01-3c023a5adbc0   1Gi   RWO   Delete   Bound   sandbox-j16877r/mongodb        ocs-storagecluster-cephfs   384d
pvc-1ab487bc-5e93-4ce3-92d9-528aa4b07b2e   8Gi   RWO   Delete   Bound   sqa-neoload-web-test/mongodb   ocs-storagecluster-cephfs   326d

sh-4.4$ ceph df
--- RAW STORAGE ---
CLASS  SIZE    AVAIL    USED     RAW USED  %RAW USED
ssd    12 TiB  4.2 TiB  7.8 TiB   7.8 TiB      65.06
TOTAL  12 TiB  4.2 TiB  7.8 TiB   7.8 TiB      65.06

--- POOLS ---
POOL                                            ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
.rgw.root                                        1    8  4.7 KiB       16  2.8 MiB      0    678 GiB
ocs-storagecluster-cephobjectstore.rgw.control   2    8      0 B        8      0 B      0    678 GiB
ocs-storagecluster-cephblockpool                 3  128  769 GiB  197.80k  2.3 TiB  53.16    678 GiB
ocs-storagecluster-cephfilesystem-metadata       4   32  236 GiB    8.16M  682 GiB  25.11    678 GiB
ocs-storagecluster-cephfilesystem-data0          5  116  301 GiB   22.99M  4.7 TiB  70.36    678 GiB

We also uncovered a lot of unneeded Kasten volume snapshots on CephFS, which the customer cleaned up:

$ less namespaces/openshift-storage/oc_output/volumesnapshot_-A | grep cephfs | wc -l
14072
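For ongoing tracking of the three areas above, a rough sketch of commands that could quantify them on a live cluster (assumptions: oc access plus the rook-ceph-tools pod, and the 'mongo|mysql|postgres|db' filter is only an illustrative guess at which PVs back database workloads):

$ ceph osd metadata | grep '"rotational"'                              # how the OSDs classify their backing devices
$ oc get pv | grep cephfs | grep -Ei 'mongo|mysql|postgres|db' | wc -l # CephFS-backed DB volumes
$ oc get volumesnapshot -A | grep cephfs | wc -l                       # remaining CephFS volume snapshots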