Bug 2105881
| Summary: | MDS crash observed on 2 OCP clusters configured in Regional-DR setup with workloads running for sometime | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | ngangadh |
| Component: | CephFS | Assignee: | Kotresh HR <khiremat> |
| Status: | CLOSED ERRATA | QA Contact: | Amarnath <amk> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 5.2 | CC: | akraj, ceph-eng-bugs, cephqe-warriors, gfarnum, khiremat, tserlin, vereddy, vshankar |
| Target Milestone: | --- | Keywords: | Regression |
| Target Release: | 5.2 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | ceph-16.2.8-79.el8cp | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-08-09 17:39:24 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 2102272, 2104790 | | |
Description
ngangadh
2022-07-11 04:42:52 UTC
Please specify the severity of this bug. Severity is defined here: https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

Hi Kotresh,
I tried the below steps and did not observe any MDS crash.
1. Created a filesystem with 1 active and 1 standby-replay MDS (a sketch of the setup commands follows the status output below)
[root@ceph-fix-amk-dyllph-node7 ~]# ceph fs status
cephfs - 0 clients
======
RANK STATE MDS ACTIVITY DNS INOS DIRS CAPS
0 active cephfs.ceph-fix-amk-dyllph-node4.ekdixi Reqs: 0 /s 14 13 12 0
0-s standby-replay cephfs.ceph-fix-amk-dyllph-node5.xsajnk Evts: 0 /s 0 0 0 0
POOL TYPE USED AVAIL
cephfs.cephfs.meta metadata 480k 56.9G
cephfs.cephfs.data data 0 56.9G
MDS version: ceph version 16.2.8-83.el8cp (b9e2e7dfc1a402ccdd33751fff71b4bb717017ff) pacific (stable)
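The filesystem creation itself is not captured above; a minimal sketch of how a 1-active/1-standby-replay setup like this can be created (the filesystem name matches the status output, everything else is generic):
ceph fs volume create cephfs                    # creates the filesystem and deploys MDS daemons via the orchestrator
ceph fs set cephfs max_mds 1                    # keep a single active rank
ceph fs set cephfs allow_standby_replay true    # let one standby follow rank 0 as standby-replay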
[root@ceph-fix-amk-dyllph-node7 ~]# mkdir /mnt/ceph-fuse
2. Mounted a FUSE client
[root@ceph-fix-amk-dyllph-node7 ~]# ceph-fuse /mnt/ceph-fuse
ceph-fuse[9273]: starting ceph client
2022-08-02T12:13:51.931-0400 7f322d7b1380 -1 init, newargv = 0x5638de6ed580 newargc=15
ceph-fuse[9273]: starting fuse
[root@ceph-fix-amk-dyllph-node7 ~]# ceph fs status
cephfs - 1 clients
======
RANK STATE MDS ACTIVITY DNS INOS DIRS CAPS
0 active cephfs.ceph-fix-amk-dyllph-node4.ekdixi Reqs: 0 /s 14 13 12 1
0-s standby-replay cephfs.ceph-fix-amk-dyllph-node5.xsajnk Evts: 0 /s 18 4 3 0
POOL TYPE USED AVAIL
cephfs.cephfs.meta metadata 480k 56.9G
cephfs.cephfs.data data 0 56.9G
MDS version: ceph version 16.2.8-83.el8cp (b9e2e7dfc1a402ccdd33751fff71b4bb717017ff) pacific (stable)
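Note that the ceph-fuse call above relies on the default /etc/ceph/ceph.conf and admin keyring on the node; if those defaults are not in place, the monitor and client name can be passed explicitly, e.g. (placeholder monitor address, not taken from this cluster):
ceph-fuse -n client.admin -m <mon-host>:6789 /mnt/ceph-fuse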
3. Mounted a kernel client from a different machine (a sketch of the mount command follows the status output below)
[root@ceph-fix-amk-dyllph-node7 ~]# ceph fs status
cephfs - 2 clients
======
RANK STATE MDS ACTIVITY DNS INOS DIRS CAPS
0 active cephfs.ceph-fix-amk-dyllph-node4.ekdixi Reqs: 0 /s 14 13 12 2
0-s standby-replay cephfs.ceph-fix-amk-dyllph-node5.xsajnk Evts: 0 /s 18 4 3 0
POOL TYPE USED AVAIL
cephfs.cephfs.meta metadata 480k 56.9G
cephfs.cephfs.data data 0 56.9G
MDS version: ceph version 16.2.8-83.el8cp (b9e2e7dfc1a402ccdd33751fff71b4bb717017ff) pacific (stable)
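The kernel mount itself was run on the other machine and is not shown above; it would have been of this general form (monitor address, mount point and secret file are placeholders):
mount -t ceph <mon-host>:6789:/ /mnt/cephfs-kernel -o name=admin,secretfile=/etc/ceph/admin.secret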
4. Got the client info
[root@ceph-fix-amk-dyllph-node7 ~]# ceph tell mds.0 client ls | grep inst
2022-08-02T12:27:31.697-0400 7f73fe7f4700 0 client.25157 ms_handle_reset on v2:10.0.208.215:6800/3122095013
2022-08-02T12:27:31.720-0400 7f73fe7f4700 0 client.15450 ms_handle_reset on v2:10.0.208.215:6800/3122095013
"inst": "client.15423 v1:10.0.210.208:0/1379679400",
"inst": "client.15411 10.0.209.23:0/482600482",
[root@ceph-fix-amk-dyllph-node7 ~]# ceph crash ls
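As an aside, since client ls returns JSON, the client instance addresses can also be extracted without grep (assuming jq is available on the node):
ceph tell mds.0 client ls | jq -r '.[].inst'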
5. Blocked client 1
[root@ceph-fix-amk-dyllph-node7 ~]# ceph tell mds.0 client ls | grep inst
2022-08-02T12:28:32.614-0400 7ff0b2ffd700 0 client.15477 ms_handle_reset on v2:10.0.208.215:6800/3122095013
2022-08-02T12:28:32.638-0400 7ff0b2ffd700 0 client.25196 ms_handle_reset on v2:10.0.208.215:6800/3122095013
"inst": "client.15423 v1:10.0.210.208:0/1379679400",
"inst": "client.15411 10.0.209.23:0/482600482",
[root@ceph-fix-amk-dyllph-node7 ~]# ceph osd blocklist add 10.0.210.208:0/1379679400
blocklisting 10.0.210.208:0/1379679400 until 2022-08-02T17:29:13.496250+0000 (3600 sec)
[root@ceph-fix-amk-dyllph-node7 ~]# ceph crash ls
6. Blocked client 2
[root@ceph-fix-amk-dyllph-node7 ~]# ceph osd blocklist add 10.0.209.23:0/482600482
blocklisting 10.0.209.23:0/482600482 until 2022-08-02T17:29:41.487694+0000 (3600 sec)
[root@ceph-fix-amk-dyllph-node7 ~]# ceph crash ls
[root@ceph-fix-amk-dyllph-node7 ~]# ceph fs status
cephfs - 0 clients
======
RANK STATE MDS ACTIVITY DNS INOS DIRS CAPS
0 active cephfs.ceph-fix-amk-dyllph-node4.ekdixi Reqs: 0 /s 14 13 12 0
0-s standby-replay cephfs.ceph-fix-amk-dyllph-node5.xsajnk Evts: 0 /s 18 4 3 0
POOL TYPE USED AVAIL
cephfs.cephfs.meta metadata 480k 56.9G
cephfs.cephfs.data data 0 56.9G
MDS version: ceph version 16.2.8-83.el8cp (b9e2e7dfc1a402ccdd33751fff71b4bb717017ff) pacific (stable)
[root@ceph-fix-amk-dyllph-node7 ~]#
No crash observed.
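For cleanup after this kind of test, the blocklist entries can be listed and removed again (the addresses are the two blocklisted above; they would otherwise expire on their own after the 3600 s shown):
ceph osd blocklist ls
ceph osd blocklist rm 10.0.210.208:0/1379679400
ceph osd blocklist rm 10.0.209.23:0/482600482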
Tested on:
[root@ceph-fix-amk-dyllph-node7 ~]# ceph versions
{
"mon": {
"ceph version 16.2.8-83.el8cp (b9e2e7dfc1a402ccdd33751fff71b4bb717017ff) pacific (stable)": 3
},
"mgr": {
"ceph version 16.2.8-83.el8cp (b9e2e7dfc1a402ccdd33751fff71b4bb717017ff) pacific (stable)": 2
},
"osd": {
"ceph version 16.2.8-83.el8cp (b9e2e7dfc1a402ccdd33751fff71b4bb717017ff) pacific (stable)": 12
},
"mds": {
"ceph version 16.2.8-83.el8cp (b9e2e7dfc1a402ccdd33751fff71b4bb717017ff) pacific (stable)": 2
},
"overall": {
"ceph version 16.2.8-83.el8cp (b9e2e7dfc1a402ccdd33751fff71b4bb717017ff) pacific (stable)": 19
}
}
[root@ceph-fix-amk-dyllph-node7 ~]#
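Besides the empty ceph crash ls output above, the absence of crashes can be double-checked via the health and crash modules:
ceph health detail    # no RECENT_CRASH warning should be present
ceph crash ls-new     # lists only new (unarchived) crash reports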
Can you please review the above steps?
Regards,
Amarnath
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat Ceph Storage Security, Bug Fix, and Enhancement Update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5997