Description of problem (please be as detailed as possible and provide log
snippets):
We frequently get the CephClusterWarningState alert on both our prod and non-prod clusters. Ceph reports HEALTH_WARN because the MDS daemon has crashed repeatedly, as shown below.
$ ceph -s
  cluster:
    id:     16fff585-704d-499b-9084-bc3c97534601
    health: HEALTH_WARN
            8 daemons have recently crashed
  services:
    mon: 3 daemons, quorum a,f,g (age 3d)
    mgr: a(active, since 3d)
    mds: 1/1 daemons up, 1 hot standby
    osd: 6 osds: 6 up (since 3d), 6 in (since 2y)
    rgw: 1 daemon active (1 hosts, 1 zones)
  data:
    volumes: 1/1 healthy
    pools:   11 pools, 369 pgs
    objects: 7.90M objects, 4.7 TiB
    usage:   15 TiB used, 8.9 TiB / 24 TiB avail
    pgs:     369 active+clean
  io:
    client:   4.6 MiB/s rd, 43 MiB/s wr, 270 op/s rd, 640 op/s wr
sh-4.4$ ceph health detail
HEALTH_WARN 8 daemons have recently crashed
[WRN] RECENT_CRASH: 8 daemons have recently crashed
mds.ocs-storagecluster-cephfilesystem-a crashed on host rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-7959f576k7sqg at 2023-08-02T01:40:26.576663Z
mds.ocs-storagecluster-cephfilesystem-a crashed on host rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-7959f576k7sqg at 2023-08-02T07:00:52.824774Z
mds.ocs-storagecluster-cephfilesystem-a crashed on host rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-7959f576k7sqg at 2023-08-03T00:01:24.797894Z
mds.ocs-storagecluster-cephfilesystem-a crashed on host rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-7959f576k7sqg at 2023-08-03T00:01:41.095726Z
mds.ocs-storagecluster-cephfilesystem-a crashed on host rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-7959f576k7sqg at 2023-08-03T06:00:36.311469Z
mds.ocs-storagecluster-cephfilesystem-a crashed on host rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-7959f576k7sqg at 2023-08-03T09:00:56.592728Z
mds.ocs-storagecluster-cephfilesystem-a crashed on host rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-7959f576k7sqg at 2023-08-03T17:40:37.542708Z
mds.ocs-storagecluster-cephfilesystem-a crashed on host rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-7959f576k7sqg at 2023-08-04T01:00:43.379724Z
sh-4.4$
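To see whether the crashes cluster on particular days, the timestamps in the ceph health detail output above can be tallied. The following is only a minimal sketch and was not run as part of this report. The function names crash_times and crashes_per_day are hypothetical helpers, and the sketch assumes each RECENT_CRASH entry ends with its timestamp, as in the output above.

```shell
# Hypothetical helper: print the timestamp (last field) of every
# "... crashed on host ... at <timestamp>" line read from stdin.
crash_times() {
  awk '/ crashed on host / { print $NF }'
}

# Hypothetical helper: count crashes per UTC day as "<date> <count>" lines.
crashes_per_day() {
  crash_times | cut -dT -f1 | sort | uniq -c | awk '{ print $2, $1 }'
}

# Example (assumption: run where the ceph CLI is available, e.g. the toolbox shell):
#   ceph health detail | crashes_per_day
```

Applied to the eight entries listed above, this gives 2 crashes on 2023-08-02, 5 on 2023-08-03, and 1 on 2023-08-04. Per-crash details such as the backtrace would come from ceph crash ls and ceph crash info <crash-id>; that output is not included in this report.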
Version of all relevant components (if applicable):
ODF 4.10
Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Yes.
Is there any workaround available to the best of your knowledge?
No
Is this issue reproducible?
Yes.