Bug 2229151 - [GSS] Ceph MDS daemon crashing frequently [NEEDINFO]
Summary: [GSS] Ceph MDS daemon crashing frequently
Keywords:
Status: NEW
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: ceph
Version: 4.10
Hardware: All
OS: All
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Xiubo Li
QA Contact: Elad
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-08-04 12:55 UTC by Manjunatha
Modified: 2023-08-16 06:35 UTC
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Embargoed:
xiubli: needinfo? (mmanjuna)


Attachments


Links
Ceph Project Bug Tracker 60629 (last updated 2023-08-04 13:07:10 UTC)

Description Manjunatha 2023-08-04 12:55:04 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
We are frequently getting the CephClusterWarningState alert on our prod and non-prod clusters because the MDS daemon keeps crashing.
$ ceph -s
  cluster:
    id:     16fff585-704d-499b-9084-bc3c97534601
    health: HEALTH_WARN
            8 daemons have recently crashed

  services:
    mon: 3 daemons, quorum a,f,g (age 3d)
    mgr: a(active, since 3d)
    mds: 1/1 daemons up, 1 hot standby
    osd: 6 osds: 6 up (since 3d), 6 in (since 2y)
    rgw: 1 daemon active (1 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   11 pools, 369 pgs
    objects: 7.90M objects, 4.7 TiB
    usage:   15 TiB used, 8.9 TiB / 24 TiB avail
    pgs:     369 active+clean

  io:
    client:   4.6 MiB/s rd, 43 MiB/s wr, 270 op/s rd, 640 op/s wr
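
Since ceph -s only shows "1/1 daemons up, 1 hot standby", the per-rank view below can help confirm which MDS is active and which standby-replay daemon takes over after each crash. A minimal sketch, run from the same toolbox shell; the filesystem name is taken from the crash entries further down:

sh-4.4$ ceph fs status ocs-storagecluster-cephfilesystem   # active vs standby-replay MDS per rank
sh-4.4$ ceph fs dump | grep -i standby                     # raw MDS map standby entries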

sh-4.4$ ceph health detail
HEALTH_WARN 8 daemons have recently crashed
[WRN] RECENT_CRASH: 8 daemons have recently crashed
    mds.ocs-storagecluster-cephfilesystem-a crashed on host rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-7959f576k7sqg at 2023-08-02T01:40:26.576663Z
    mds.ocs-storagecluster-cephfilesystem-a crashed on host rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-7959f576k7sqg at 2023-08-02T07:00:52.824774Z
    mds.ocs-storagecluster-cephfilesystem-a crashed on host rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-7959f576k7sqg at 2023-08-03T00:01:24.797894Z
    mds.ocs-storagecluster-cephfilesystem-a crashed on host rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-7959f576k7sqg at 2023-08-03T00:01:41.095726Z
    mds.ocs-storagecluster-cephfilesystem-a crashed on host rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-7959f576k7sqg at 2023-08-03T06:00:36.311469Z
    mds.ocs-storagecluster-cephfilesystem-a crashed on host rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-7959f576k7sqg at 2023-08-03T09:00:56.592728Z
    mds.ocs-storagecluster-cephfilesystem-a crashed on host rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-7959f576k7sqg at 2023-08-03T17:40:37.542708Z
    mds.ocs-storagecluster-cephfilesystem-a crashed on host rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-7959f576k7sqg at 2023-08-04T01:00:43.379724Z
sh-4.4$
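
The crash reports behind the RECENT_CRASH warning contain the MDS backtraces that the linked Ceph tracker (60629) will need. A minimal sketch for collecting them from the toolbox shell, assuming the crash module is enabled (it is by default); mds-crashes.txt is just an illustrative output file name:

sh-4.4$ ceph crash ls-new                 # list unarchived crash reports and their IDs
sh-4.4$ ceph crash info <crash-id>        # full report, including the backtrace, for one crash
# dump every new crash report into a single file for the support case
sh-4.4$ for id in $(ceph crash ls-new | awk 'NR>1 {print $1}'); do ceph crash info "$id"; done > mds-crashes.txt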

Version of all relevant components (if applicable):
ODF 4.10
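
To confirm the exact Ceph build running under ODF 4.10, the versions can be read from the toolbox. A sketch assuming the default openshift-storage namespace and that the rook-ceph-tools deployment is enabled:

$ oc -n openshift-storage get csv                                    # installed ODF/OCS operator versions
$ oc -n openshift-storage rsh deploy/rook-ceph-tools ceph versions   # Ceph version reported per daemon type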

Does this issue impact your ability to continue to work with the product
(please explain in detail what the user impact is)?
Yes. 

Is there any workaround available to the best of your knowledge?
No
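
There is no workaround for the crashes themselves, but once the crash reports above have been collected, the RECENT_CRASH warning (and with it the CephClusterWarningState alert) can be acknowledged. This only silences the alert; it does not prevent further crashes:

sh-4.4$ ceph crash archive <crash-id>   # acknowledge a single crash report
sh-4.4$ ceph crash archive-all          # acknowledge all; HEALTH_WARN clears until the next crash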

Is this issue reproducible?
Yes.

