Bug 2105881

Summary: MDS crash observed on 2 OCP clusters configured in Regional-DR setup with workloads running for sometime
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: ngangadh
Component: CephFS
Assignee: Kotresh HR <khiremat>
Status: CLOSED ERRATA
QA Contact: Amarnath <amk>
Severity: high
Docs Contact:
Priority: unspecified
Version: 5.2
CC: akraj, ceph-eng-bugs, cephqe-warriors, gfarnum, khiremat, tserlin, vereddy, vshankar
Target Milestone: ---
Keywords: Regression
Target Release: 5.2
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: ceph-16.2.8-79.el8cp
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-08-09 17:39:24 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
Bug Depends On:    
Bug Blocks: 2102272, 2104790    

Description ngangadh 2022-07-11 04:42:52 UTC
MDS crashes were observed on two OCP clusters configured in a Regional-DR setup after workloads had been running for some time.


Version-Release number of selected component (if applicable):
"ceph_version": "16.2.8-59.el8cp"


How reproducible: observed 1/1


Steps to Reproduce:
1. Deploy 3 OCP clusters.
2. Configure the Regional-DR (RDR) setup.
3. Run busybox workloads.
4. Enable the ceph tools pod and log in to the Ceph cluster (see the sketch after this list).
5. MDS daemon crashes were observed on both OCP clusters configured as MirrorPeers in the RDR setup.
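
For step 4, the toolbox on an ODF internal-mode cluster can be enabled roughly as follows (a sketch, assuming the default openshift-storage namespace, the default OCSInitialization resource name "ocsinit", and the rook-ceph-tools deployment that ODF creates):

# Enable the rook-ceph-tools pod
oc patch OCSInitialization ocsinit -n openshift-storage --type json \
  --patch '[{ "op": "replace", "path": "/spec/enableCephTools", "value": true }]'

# Open a shell in the toolbox and inspect the cluster
oc -n openshift-storage rsh deploy/rook-ceph-tools
ceph status
ceph crash ls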

Actual results: MDS crash observed.


Expected results: No crashes should be seen.


Additional info: OCP - 4.11, ODF - 4.11 with internal ceph deployed, ACM - 2.5.0, ceph version - 16.2.8-59.el8cp
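
Crash details for a report like this are normally pulled from the cluster's crash module; the standard commands are:

ceph crash ls                # list recorded crashes and their IDs
ceph crash info <crash-id>   # show the backtrace and metadata for one crash
ceph crash archive-all       # acknowledge the crashes once they are collected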

Comment 1 RHEL Program Management 2022-07-11 04:43:17 UTC
Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

Comment 16 Amarnath 2022-08-02 16:37:33 UTC
Hi Kotresh,

I tried the steps below and did not observe any MDS crash.

1. Created a filesystem with 1 active and 1 standby-replay MDS (a creation sketch follows the status output below).
[root@ceph-fix-amk-dyllph-node7 ~]# ceph fs status
cephfs - 0 clients
======
RANK      STATE                         MDS                       ACTIVITY     DNS    INOS   DIRS   CAPS  
 0        active      cephfs.ceph-fix-amk-dyllph-node4.ekdixi  Reqs:    0 /s    14     13     12      0   
0-s   standby-replay  cephfs.ceph-fix-amk-dyllph-node5.xsajnk  Evts:    0 /s     0      0      0      0   
       POOL           TYPE     USED  AVAIL  
cephfs.cephfs.meta  metadata   480k  56.9G  
cephfs.cephfs.data    data       0   56.9G  
MDS version: ceph version 16.2.8-83.el8cp (b9e2e7dfc1a402ccdd33751fff71b4bb717017ff) pacific (stable)
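
The commands used to create this filesystem are not shown above; on a cephadm-managed cluster a roughly equivalent setup (a sketch, assuming the orchestrator places the MDS daemons) would be:

# Create the filesystem and its data/metadata pools
ceph fs volume create cephfs
# Let one MDS follow rank 0 in standby-replay
ceph fs set cephfs allow_standby_replay true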
[root@ceph-fix-amk-dyllph-node7 ~]# mkdir /mnt/ceph-fuse
2. Mounted a ceph-fuse client (see the note after the status output).

[root@ceph-fix-amk-dyllph-node7 ~]# ceph-fuse /mnt/ceph-fuse
ceph-fuse[9273]: starting ceph client
2022-08-02T12:13:51.931-0400 7f322d7b1380 -1 init, newargv = 0x5638de6ed580 newargc=15
ceph-fuse[9273]: starting fuse
[root@ceph-fix-amk-dyllph-node7 ~]# ceph fs status
cephfs - 1 clients
======
RANK      STATE                         MDS                       ACTIVITY     DNS    INOS   DIRS   CAPS  
 0        active      cephfs.ceph-fix-amk-dyllph-node4.ekdixi  Reqs:    0 /s    14     13     12      1   
0-s   standby-replay  cephfs.ceph-fix-amk-dyllph-node5.xsajnk  Evts:    0 /s    18      4      3      0   
       POOL           TYPE     USED  AVAIL  
cephfs.cephfs.meta  metadata   480k  56.9G  
cephfs.cephfs.data    data       0   56.9G  
MDS version: ceph version 16.2.8-83.el8cp (b9e2e7dfc1a402ccdd33751fff71b4bb717017ff) pacific (stable)
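
The bare "ceph-fuse /mnt/ceph-fuse" above relies on /etc/ceph/ceph.conf and the admin keyring on the node; a more explicit invocation would look roughly like this (placeholders, not taken from this cluster):

ceph-fuse -n client.admin -m <mon-ip>:6789 --client_fs cephfs /mnt/ceph-fuse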

3. Mounted a kernel client from a different machine (an example mount command follows the status output).
[root@ceph-fix-amk-dyllph-node7 ~]# ceph fs status
cephfs - 2 clients
======
RANK      STATE                         MDS                       ACTIVITY     DNS    INOS   DIRS   CAPS  
 0        active      cephfs.ceph-fix-amk-dyllph-node4.ekdixi  Reqs:    0 /s    14     13     12      2   
0-s   standby-replay  cephfs.ceph-fix-amk-dyllph-node5.xsajnk  Evts:    0 /s    18      4      3      0   
       POOL           TYPE     USED  AVAIL  
cephfs.cephfs.meta  metadata   480k  56.9G  
cephfs.cephfs.data    data       0   56.9G  
MDS version: ceph version 16.2.8-83.el8cp (b9e2e7dfc1a402ccdd33751fff71b4bb717017ff) pacific (stable)
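
The kernel mount from the second machine is not captured above; a typical invocation would look something like this (monitor address and key are placeholders):

mkdir -p /mnt/cephfs-kernel
mount -t ceph <mon-ip>:6789:/ /mnt/cephfs-kernel -o name=admin,secret=<admin-key>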

4. Retrieved the client info (a jq-based alternative is noted after this step).
[root@ceph-fix-amk-dyllph-node7 ~]# ceph tell mds.0 client ls | grep inst
2022-08-02T12:27:31.697-0400 7f73fe7f4700  0 client.25157 ms_handle_reset on v2:10.0.208.215:6800/3122095013
2022-08-02T12:27:31.720-0400 7f73fe7f4700  0 client.15450 ms_handle_reset on v2:10.0.208.215:6800/3122095013
        "inst": "client.15423 v1:10.0.210.208:0/1379679400",
        "inst": "client.15411 10.0.209.23:0/482600482",

[root@ceph-fix-amk-dyllph-node7 ~]# ceph crash ls
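
Since ceph tell mds.0 client ls returns JSON, the client addresses can also be pulled out without grep (assuming jq is available on the node):

ceph tell mds.0 client ls | jq -r '.[].inst'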

5. Blocklisted client 1 (the entry can be verified as noted after this step).
[root@ceph-fix-amk-dyllph-node7 ~]# ceph tell mds.0 client ls | grep inst
2022-08-02T12:28:32.614-0400 7ff0b2ffd700  0 client.15477 ms_handle_reset on v2:10.0.208.215:6800/3122095013
2022-08-02T12:28:32.638-0400 7ff0b2ffd700  0 client.25196 ms_handle_reset on v2:10.0.208.215:6800/3122095013
        "inst": "client.15423 v1:10.0.210.208:0/1379679400",
        "inst": "client.15411 10.0.209.23:0/482600482",
[root@ceph-fix-amk-dyllph-node7 ~]# ceph osd blocklist add 10.0.210.208:0/1379679400
blocklisting 10.0.210.208:0/1379679400 until 2022-08-02T17:29:13.496250+0000 (3600 sec)
[root@ceph-fix-amk-dyllph-node7 ~]# ceph crash ls
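
The entry added in step 5 can be confirmed with the standard listing command:

ceph osd blocklist ls    # prints each blocklisted address with its expiry time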

6. Blocklisted client 2 (blocklist cleanup is noted after the final status output).
[root@ceph-fix-amk-dyllph-node7 ~]# ceph osd blocklist add 10.0.209.23:0/482600482
blocklisting 10.0.209.23:0/482600482 until 2022-08-02T17:29:41.487694+0000 (3600 sec)
[root@ceph-fix-amk-dyllph-node7 ~]# ceph crash ls
[root@ceph-fix-amk-dyllph-node7 ~]# ceph fs status
cephfs - 0 clients
======
RANK      STATE                         MDS                       ACTIVITY     DNS    INOS   DIRS   CAPS  
 0        active      cephfs.ceph-fix-amk-dyllph-node4.ekdixi  Reqs:    0 /s    14     13     12      0   
0-s   standby-replay  cephfs.ceph-fix-amk-dyllph-node5.xsajnk  Evts:    0 /s    18      4      3      0   
       POOL           TYPE     USED  AVAIL  
cephfs.cephfs.meta  metadata   480k  56.9G  
cephfs.cephfs.data    data       0   56.9G  
MDS version: ceph version 16.2.8-83.el8cp (b9e2e7dfc1a402ccdd33751fff71b4bb717017ff) pacific (stable)
[root@ceph-fix-amk-dyllph-node7 ~]# 
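
Once the test is done, the blocklist entries (which would otherwise expire after the 3600-second interval shown above) can be removed manually using the addresses blocklisted in steps 5 and 6:

ceph osd blocklist rm 10.0.210.208:0/1379679400
ceph osd blocklist rm 10.0.209.23:0/482600482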

No crash observed.
Tested on:
[root@ceph-fix-amk-dyllph-node7 ~]# ceph versions
{
    "mon": {
        "ceph version 16.2.8-83.el8cp (b9e2e7dfc1a402ccdd33751fff71b4bb717017ff) pacific (stable)": 3
    },
    "mgr": {
        "ceph version 16.2.8-83.el8cp (b9e2e7dfc1a402ccdd33751fff71b4bb717017ff) pacific (stable)": 2
    },
    "osd": {
        "ceph version 16.2.8-83.el8cp (b9e2e7dfc1a402ccdd33751fff71b4bb717017ff) pacific (stable)": 12
    },
    "mds": {
        "ceph version 16.2.8-83.el8cp (b9e2e7dfc1a402ccdd33751fff71b4bb717017ff) pacific (stable)": 2
    },
    "overall": {
        "ceph version 16.2.8-83.el8cp (b9e2e7dfc1a402ccdd33751fff71b4bb717017ff) pacific (stable)": 19
    }
}
[root@ceph-fix-amk-dyllph-node7 ~]# 

Can you please review the above steps?

Regards,
Amarnath

Comment 20 errata-xmlrpc 2022-08-09 17:39:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Ceph Storage Security, Bug Fix, and Enhancement Update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5997