Bug 2105881 - MDS crash observed on 2 OCP clusters configured in Regional-DR setup with workloads running for some time
Summary: MDS crash observed on 2 OCP clusters configured in Regional-DR setup with workloads running for some time
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: CephFS
Version: 5.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 5.2
Assignee: Kotresh HR
QA Contact: Amarnath
URL:
Whiteboard:
Depends On:
Blocks: 2102272 2104790
 
Reported: 2022-07-11 04:42 UTC by ngangadh
Modified: 2022-08-11 17:51 UTC
CC List: 8 users

Fixed In Version: ceph-16.2.8-79.el8cp
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-09 17:39:24 UTC
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Ceph Project Bug Tracker 56012 0 None None None 2022-07-12 06:16:52 UTC
Red Hat Issue Tracker RHCEPH-4735 0 None None None 2022-07-11 04:44:02 UTC
Red Hat Product Errata RHSA-2022:5997 0 None None None 2022-08-09 17:39:54 UTC

Description ngangadh 2022-07-11 04:42:52 UTC
MDS crash observed on 2 OCP clusters configured in Regional-DR setup with workloads running for some time.


Version-Release number of selected component (if applicable): "ceph_version": "16.2.8-59.el8cp"


How reproducible: Observed once (1/1)


Steps to Reproduce:
1. Deploy 3 OCP clusters.
2. Configure the RDR setup.
3. Run busybox workloads.
4. Enable the ceph-tools pod and log in to the Ceph cluster (see the sketch after this list).
5. Observe MDS daemon crashes on both OCP clusters configured as MirrorPeers in the RDR setup.
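
For reference, a minimal CLI sketch of steps 4-5, assuming the standard ODF toolbox setup (the openshift-storage namespace, the ocsinit OCSInitialization resource, and the app=rook-ceph-tools label are the usual defaults, not values taken from this report):

# Enable the rook-ceph tools pod (standard ODF toggle; resource and namespace names assumed)
oc patch ocsinitialization ocsinit -n openshift-storage --type json \
  --patch '[{ "op": "replace", "path": "/spec/enableCephTools", "value": true }]'

# Shell into the tools pod and check whether any MDS daemon has crashed
oc rsh -n openshift-storage $(oc get pod -n openshift-storage -l app=rook-ceph-tools -o name)
ceph status
ceph crash ls                # lists recorded crash reports, if any
ceph crash info <crash-id>   # backtrace for a specific entry from "ceph crash ls"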

Actual results: MDS crash observed.


Expected results: No crashes should be seen.


Additional info: OCP 4.11, ODF 4.11 with internal Ceph deployed, ACM 2.5.0, Ceph version 16.2.8-59.el8cp

Comment 1 RHEL Program Management 2022-07-11 04:43:17 UTC
Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

Comment 16 Amarnath 2022-08-02 16:37:33 UTC
Hi Kotresh,

I tried the steps below and did not observe any MDS crash.

1. Created a filesystem with 1 active and 1 standby-replay MDS (a creation sketch follows the status output below)
[root@ceph-fix-amk-dyllph-node7 ~]# ceph fs status
cephfs - 0 clients
======
RANK      STATE                         MDS                       ACTIVITY     DNS    INOS   DIRS   CAPS  
 0        active      cephfs.ceph-fix-amk-dyllph-node4.ekdixi  Reqs:    0 /s    14     13     12      0   
0-s   standby-replay  cephfs.ceph-fix-amk-dyllph-node5.xsajnk  Evts:    0 /s     0      0      0      0   
       POOL           TYPE     USED  AVAIL  
cephfs.cephfs.meta  metadata   480k  56.9G  
cephfs.cephfs.data    data       0   56.9G  
MDS version: ceph version 16.2.8-83.el8cp (b9e2e7dfc1a402ccdd33751fff71b4bb717017ff) pacific (stable)
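
(For context, one possible way to get this one-active/one-standby-replay layout on a cephadm-managed cluster; the exact commands used for this setup aren't shown in the comment, so treat this as a sketch:)

# Create the volume (deploys MDS daemons via the orchestrator)
ceph fs volume create cephfs
# Ensure two MDS daemons and let the second one follow the active MDS as standby-replay
ceph orch apply mds cephfs --placement=2
ceph fs set cephfs allow_standby_replay true
ceph fs status cephfs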
2. Mounted the ceph-fuse client
[root@ceph-fix-amk-dyllph-node7 ~]# mkdir /mnt/ceph-fuse

[root@ceph-fix-amk-dyllph-node7 ~]# ceph-fuse /mnt/ceph-fuse
ceph-fuse[9273]: starting ceph client
2022-08-02T12:13:51.931-0400 7f322d7b1380 -1 init, newargv = 0x5638de6ed580 newargc=15
ceph-fuse[9273]: starting fuse
[root@ceph-fix-amk-dyllph-node7 ~]# ceph fs status
cephfs - 1 clients
======
RANK      STATE                         MDS                       ACTIVITY     DNS    INOS   DIRS   CAPS  
 0        active      cephfs.ceph-fix-amk-dyllph-node4.ekdixi  Reqs:    0 /s    14     13     12      1   
0-s   standby-replay  cephfs.ceph-fix-amk-dyllph-node5.xsajnk  Evts:    0 /s    18      4      3      0   
       POOL           TYPE     USED  AVAIL  
cephfs.cephfs.meta  metadata   480k  56.9G  
cephfs.cephfs.data    data       0   56.9G  
MDS version: ceph version 16.2.8-83.el8cp (b9e2e7dfc1a402ccdd33751fff71b4bb717017ff) pacific (stable)

3. Mounted a kernel client from a different machine (a mount sketch follows the status output below)
[root@ceph-fix-amk-dyllph-node7 ~]# ceph fs status
cephfs - 2 clients
======
RANK      STATE                         MDS                       ACTIVITY     DNS    INOS   DIRS   CAPS  
 0        active      cephfs.ceph-fix-amk-dyllph-node4.ekdixi  Reqs:    0 /s    14     13     12      2   
0-s   standby-replay  cephfs.ceph-fix-amk-dyllph-node5.xsajnk  Evts:    0 /s    18      4      3      0   
       POOL           TYPE     USED  AVAIL  
cephfs.cephfs.meta  metadata   480k  56.9G  
cephfs.cephfs.data    data       0   56.9G  
MDS version: ceph version 16.2.8-83.el8cp (b9e2e7dfc1a402ccdd33751fff71b4bb717017ff) pacific (stable)
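
(The kernel mount command itself isn't captured above; a minimal sketch of step 3 on the second machine, with placeholder monitor address and key:)

mkdir -p /mnt/cephfs-kernel
# <mon-host> and <admin-key> are placeholders; a secretfile option can be used instead of secret=
mount -t ceph <mon-host>:6789:/ /mnt/cephfs-kernel -o name=admin,secret=<admin-key>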

4. Got the client info
[root@ceph-fix-amk-dyllph-node7 ~]# ceph tell mds.0 client ls | grep inst
2022-08-02T12:27:31.697-0400 7f73fe7f4700  0 client.25157 ms_handle_reset on v2:10.0.208.215:6800/3122095013
2022-08-02T12:27:31.720-0400 7f73fe7f4700  0 client.15450 ms_handle_reset on v2:10.0.208.215:6800/3122095013
        "inst": "client.15423 v1:10.0.210.208:0/1379679400",
        "inst": "client.15411 10.0.209.23:0/482600482",

[root@ceph-fix-amk-dyllph-node7 ~]# ceph crash ls

5. Blocklisted client 1
[root@ceph-fix-amk-dyllph-node7 ~]# ceph tell mds.0 client ls | grep inst
2022-08-02T12:28:32.614-0400 7ff0b2ffd700  0 client.15477 ms_handle_reset on v2:10.0.208.215:6800/3122095013
2022-08-02T12:28:32.638-0400 7ff0b2ffd700  0 client.25196 ms_handle_reset on v2:10.0.208.215:6800/3122095013
        "inst": "client.15423 v1:10.0.210.208:0/1379679400",
        "inst": "client.15411 10.0.209.23:0/482600482",
[root@ceph-fix-amk-dyllph-node7 ~]# ceph osd blocklist add 10.0.210.208:0/1379679400
blocklisting 10.0.210.208:0/1379679400 until 2022-08-02T17:29:13.496250+0000 (3600 sec)
[root@ceph-fix-amk-dyllph-node7 ~]# ceph crash ls

6. Blocklisted client 2 (an optional cleanup sketch follows the output below)
[root@ceph-fix-amk-dyllph-node7 ~]# ceph osd blocklist add 10.0.209.23:0/482600482
blocklisting 10.0.209.23:0/482600482 until 2022-08-02T17:29:41.487694+0000 (3600 sec)
[root@ceph-fix-amk-dyllph-node7 ~]# ceph crash ls
[root@ceph-fix-amk-dyllph-node7 ~]# ceph fs status
cephfs - 0 clients
======
RANK      STATE                         MDS                       ACTIVITY     DNS    INOS   DIRS   CAPS  
 0        active      cephfs.ceph-fix-amk-dyllph-node4.ekdixi  Reqs:    0 /s    14     13     12      0   
0-s   standby-replay  cephfs.ceph-fix-amk-dyllph-node5.xsajnk  Evts:    0 /s    18      4      3      0   
       POOL           TYPE     USED  AVAIL  
cephfs.cephfs.meta  metadata   480k  56.9G  
cephfs.cephfs.data    data       0   56.9G  
MDS version: ceph version 16.2.8-83.el8cp (b9e2e7dfc1a402ccdd33751fff71b4bb717017ff) pacific (stable)
[root@ceph-fix-amk-dyllph-node7 ~]# 
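
(Optional cleanup after the test, using the addresses blocklisted above; blocklist ls/rm are the standard Pacific commands:)

# Confirm the entries and remove them once verification is done
ceph osd blocklist ls
ceph osd blocklist rm 10.0.210.208:0/1379679400
ceph osd blocklist rm 10.0.209.23:0/482600482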

No crash observed.
Tested on:
[root@ceph-fix-amk-dyllph-node7 ~]# ceph versions
{
    "mon": {
        "ceph version 16.2.8-83.el8cp (b9e2e7dfc1a402ccdd33751fff71b4bb717017ff) pacific (stable)": 3
    },
    "mgr": {
        "ceph version 16.2.8-83.el8cp (b9e2e7dfc1a402ccdd33751fff71b4bb717017ff) pacific (stable)": 2
    },
    "osd": {
        "ceph version 16.2.8-83.el8cp (b9e2e7dfc1a402ccdd33751fff71b4bb717017ff) pacific (stable)": 12
    },
    "mds": {
        "ceph version 16.2.8-83.el8cp (b9e2e7dfc1a402ccdd33751fff71b4bb717017ff) pacific (stable)": 2
    },
    "overall": {
        "ceph version 16.2.8-83.el8cp (b9e2e7dfc1a402ccdd33751fff71b4bb717017ff) pacific (stable)": 19
    }
}
[root@ceph-fix-amk-dyllph-node7 ~]# 

Can you please review the above steps?

Regards,
Amarnath

Comment 20 errata-xmlrpc 2022-08-09 17:39:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Ceph Storage Security, Bug Fix, and Enhancement Update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5997

