MDS crash observed on 2 OCP clusters configured in a Regional-DR (RDR) setup with workloads running for some time.

Version-Release number of selected component (if applicable):
"ceph_version": "16.2.8-59.el8cp"

How reproducible:
Observed 1/1

Steps to Reproduce:
1. Deploy 3 OCP clusters.
2. Configure the RDR setup.
3. Run busybox workloads.
4. Enable the ceph-tools pod and log in to the Ceph cluster (toolbox commands are sketched after this report).
5. Observe MDS daemon crashes on both OCP clusters configured as MirrorPeers in the RDR setup.

Actual results:
MDS crash observed.

Expected results:
No crashes should be seen.

Additional info:
OCP - 4.11, ODF - 4.11 with internal Ceph deployed, ACM - 2.5.0, ceph version - 16.2.8-59.el8cp
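For anyone re-running step 4, this is roughly how the toolbox pod can be enabled and a crash inspected on an ODF internal-mode cluster. It is a minimal sketch, assuming the default openshift-storage namespace and an OCSInitialization resource named ocsinit; adjust for the actual deployment:

# Enable the rook-ceph toolbox pod (assumes namespace "openshift-storage" and
# OCSInitialization resource "ocsinit")
oc patch OCSInitialization ocsinit -n openshift-storage --type json \
  --patch '[{ "op": "replace", "path": "/spec/enableCephTools", "value": true }]'

# Shell into the toolbox pod once it is Running
oc rsh -n openshift-storage $(oc get pods -n openshift-storage -l app=rook-ceph-tools -o name)

# Inside the toolbox: list and inspect any MDS crashes
ceph crash ls
ceph crash info <crash-id>    # <crash-id> taken from the "ceph crash ls" output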
Please specify the severity of this bug. Severity is defined here: https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.
Hi Kotresh,

I tried the below steps and did not observe any MDS crash.

1. Created a filesystem with 1 active and 1 standby-replay MDS (the creation and kernel-mount commands are sketched after this comment):

[root@ceph-fix-amk-dyllph-node7 ~]# ceph fs status
cephfs - 0 clients
======
RANK      STATE                       MDS                       ACTIVITY     DNS    INOS   DIRS   CAPS
 0        active          cephfs.ceph-fix-amk-dyllph-node4.ekdixi  Reqs:    0 /s    14     13     12      0
0-s   standby-replay      cephfs.ceph-fix-amk-dyllph-node5.xsajnk  Evts:    0 /s     0      0      0      0
        POOL            TYPE     USED  AVAIL
cephfs.cephfs.meta    metadata   480k  56.9G
cephfs.cephfs.data      data        0  56.9G
MDS version: ceph version 16.2.8-83.el8cp (b9e2e7dfc1a402ccdd33751fff71b4bb717017ff) pacific (stable)
[root@ceph-fix-amk-dyllph-node7 ~]# mkdir /mnt/ceph-fuse

2. Mounted a ceph-fuse client:

[root@ceph-fix-amk-dyllph-node7 ~]# ceph-fuse /mnt/ceph-fuse
ceph-fuse[9273]: starting ceph client
2022-08-02T12:13:51.931-0400 7f322d7b1380 -1 init, newargv = 0x5638de6ed580 newargc=15
ceph-fuse[9273]: starting fuse
[root@ceph-fix-amk-dyllph-node7 ~]# ceph fs status
cephfs - 1 clients
======
RANK      STATE                       MDS                       ACTIVITY     DNS    INOS   DIRS   CAPS
 0        active          cephfs.ceph-fix-amk-dyllph-node4.ekdixi  Reqs:    0 /s    14     13     12      1
0-s   standby-replay      cephfs.ceph-fix-amk-dyllph-node5.xsajnk  Evts:    0 /s    18      4      3      0
        POOL            TYPE     USED  AVAIL
cephfs.cephfs.meta    metadata   480k  56.9G
cephfs.cephfs.data      data        0  56.9G
MDS version: ceph version 16.2.8-83.el8cp (b9e2e7dfc1a402ccdd33751fff71b4bb717017ff) pacific (stable)

3. Mounted a kernel client from a different machine:

[root@ceph-fix-amk-dyllph-node7 ~]# ceph fs status
cephfs - 2 clients
======
RANK      STATE                       MDS                       ACTIVITY     DNS    INOS   DIRS   CAPS
 0        active          cephfs.ceph-fix-amk-dyllph-node4.ekdixi  Reqs:    0 /s    14     13     12      2
0-s   standby-replay      cephfs.ceph-fix-amk-dyllph-node5.xsajnk  Evts:    0 /s    18      4      3      0
        POOL            TYPE     USED  AVAIL
cephfs.cephfs.meta    metadata   480k  56.9G
cephfs.cephfs.data      data        0  56.9G
MDS version: ceph version 16.2.8-83.el8cp (b9e2e7dfc1a402ccdd33751fff71b4bb717017ff) pacific (stable)

4. Got the client info:

[root@ceph-fix-amk-dyllph-node7 ~]# ceph tell mds.0 client ls | grep inst
2022-08-02T12:27:31.697-0400 7f73fe7f4700  0 client.25157 ms_handle_reset on v2:10.0.208.215:6800/3122095013
2022-08-02T12:27:31.720-0400 7f73fe7f4700  0 client.15450 ms_handle_reset on v2:10.0.208.215:6800/3122095013
        "inst": "client.15423 v1:10.0.210.208:0/1379679400",
        "inst": "client.15411 10.0.209.23:0/482600482",
[root@ceph-fix-amk-dyllph-node7 ~]# ceph crash ls

5. Blocked client 1:

[root@ceph-fix-amk-dyllph-node7 ~]# ceph tell mds.0 client ls | grep inst
2022-08-02T12:28:32.614-0400 7ff0b2ffd700  0 client.15477 ms_handle_reset on v2:10.0.208.215:6800/3122095013
2022-08-02T12:28:32.638-0400 7ff0b2ffd700  0 client.25196 ms_handle_reset on v2:10.0.208.215:6800/3122095013
        "inst": "client.15423 v1:10.0.210.208:0/1379679400",
        "inst": "client.15411 10.0.209.23:0/482600482",
[root@ceph-fix-amk-dyllph-node7 ~]# ceph osd blocklist add 10.0.210.208:0/1379679400
blocklisting 10.0.210.208:0/1379679400 until 2022-08-02T17:29:13.496250+0000 (3600 sec)
[root@ceph-fix-amk-dyllph-node7 ~]# ceph crash ls
6. Blocked client 2:

[root@ceph-fix-amk-dyllph-node7 ~]# ceph osd blocklist add 10.0.209.23:0/482600482
blocklisting 10.0.209.23:0/482600482 until 2022-08-02T17:29:41.487694+0000 (3600 sec)
[root@ceph-fix-amk-dyllph-node7 ~]# ceph crash ls
[root@ceph-fix-amk-dyllph-node7 ~]# ceph fs status
cephfs - 0 clients
======
RANK      STATE                       MDS                       ACTIVITY     DNS    INOS   DIRS   CAPS
 0        active          cephfs.ceph-fix-amk-dyllph-node4.ekdixi  Reqs:    0 /s    14     13     12      0
0-s   standby-replay      cephfs.ceph-fix-amk-dyllph-node5.xsajnk  Evts:    0 /s    18      4      3      0
        POOL            TYPE     USED  AVAIL
cephfs.cephfs.meta    metadata   480k  56.9G
cephfs.cephfs.data      data        0  56.9G
MDS version: ceph version 16.2.8-83.el8cp (b9e2e7dfc1a402ccdd33751fff71b4bb717017ff) pacific (stable)
[root@ceph-fix-amk-dyllph-node7 ~]#

No crash observed.

Tested on:

[root@ceph-fix-amk-dyllph-node7 ~]# ceph versions
{
    "mon": {
        "ceph version 16.2.8-83.el8cp (b9e2e7dfc1a402ccdd33751fff71b4bb717017ff) pacific (stable)": 3
    },
    "mgr": {
        "ceph version 16.2.8-83.el8cp (b9e2e7dfc1a402ccdd33751fff71b4bb717017ff) pacific (stable)": 2
    },
    "osd": {
        "ceph version 16.2.8-83.el8cp (b9e2e7dfc1a402ccdd33751fff71b4bb717017ff) pacific (stable)": 12
    },
    "mds": {
        "ceph version 16.2.8-83.el8cp (b9e2e7dfc1a402ccdd33751fff71b4bb717017ff) pacific (stable)": 2
    },
    "overall": {
        "ceph version 16.2.8-83.el8cp (b9e2e7dfc1a402ccdd33751fff71b4bb717017ff) pacific (stable)": 19
    }
}
[root@ceph-fix-amk-dyllph-node7 ~]#

Can you please review the above steps?

Regards,
Amarnath
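The commands behind steps 1 and 3 are not in the transcript above, so here is a minimal sketch of how they might look. The filesystem name matches the output above, but <mon-ip> and <admin-key> are placeholders rather than values taken from this setup:

# Step 1 (sketch): create the filesystem and enable one standby-replay MDS
ceph fs volume create cephfs
ceph fs set cephfs max_mds 1
ceph fs set cephfs allow_standby_replay true

# Step 3 (sketch): kernel-client mount from a different machine
mkdir -p /mnt/cephfs-kernel
mount -t ceph <mon-ip>:6789:/ /mnt/cephfs-kernel -o name=admin,secret=<admin-key>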
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat Ceph Storage Security, Bug Fix, and Enhancement Update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5997