Description of problem: Kernel mount command returns an "MDS is laggy" error for an unauthorized client

Steps followed:
1. Created 2 filesystems (cephfs, cephfs1)
2. Authorized two clients, assigning each to a distinct filesystem (client1 for "cephfs" and client2 for "cephfs1"); the authorize commands used are sketched at the end of this comment.
3. Attempted to mount "cephfs1" using client1, resulting in the error message: "mount error: no MDS server is up or the cluster is laggy."
4. Conversely, when attempting the mount operation with client2 on "cephfs1", it succeeded without errors.

Observation and Question:
1. The encountered error suggests that there is no active MDS (Metadata Server) or a potential cluster lag when client1 attempts to mount "cephfs1", but the filesystem is up and running.
2. The expected behavior would be to receive an unauthorized/permission error instead of the "no MDS server is up" message.

Please find the commands executed:

[root@ceph-mirror-amk-pwsavd-node7 ~]# cat /etc//ceph/ceph.client.client1.keyring
[client.client1]
        key = AQCGcIBlLD/cNBAA4ESJx6hW82bDOu3thGqI5w==
        caps mds = "allow rw fsname=cephfs"
        caps mon = "allow r fsname=cephfs"
        caps osd = "allow rw tag cephfs data=cephfs"
[root@ceph-mirror-amk-pwsavd-node7 ~]# cat /etc//ceph/ceph.client.client2.keyring
[client.client2]
        key = AQCQcIBlb9wiFxAAv0KUVguMQKY4cSejYsAOuQ==
        caps mds = "allow rw fsname=cephfs1"
        caps mon = "allow r fsname=cephfs1"
        caps osd = "allow rw tag cephfs data=cephfs1"
[root@ceph-mirror-amk-pwsavd-node7 ~]# ceph fs status
cephfs - 8 clients
======
RANK  STATE                      MDS                         ACTIVITY     DNS    INOS   DIRS   CAPS
 0    active  cephfs.ceph-mirror-amk-pwsavd-node5.hjouso  Reqs:    0 /s  1448   1302     87     12
 1    active  cephfs.ceph-mirror-amk-pwsavd-node4.xxssaw  Reqs:    0 /s   331    282     28     12
       POOL          TYPE     USED  AVAIL
cephfs.cephfs.meta  metadata  4060M  47.8G
cephfs.cephfs.data    data       0   47.8G
cephfs1 - 1 clients
=======
RANK  STATE                      MDS                          ACTIVITY     DNS    INOS   DIRS   CAPS
 0    active  cephfs1.ceph-mirror-amk-pwsavd-node2.dljcau  Reqs:    0 /s    10     13     12      1
       POOL           TYPE     USED  AVAIL
cephfs.cephfs1.meta  metadata  96.0k  47.8G
cephfs.cephfs1.data    data       0   47.8G
            STANDBY MDS
cephfs1.ceph-mirror-amk-pwsavd-node5.omovtd
cephfs.ceph-mirror-amk-pwsavd-node6.xlqsfz
MDS version: ceph version 18.2.0-128.el9cp (d38df712b9120eae50f448fe0847719d3567c2d1) reef (stable)
[root@ceph-mirror-amk-pwsavd-node7 ~]# mount -t ceph 10.0.211.1,10.0.210.182,10.0.211.126:/ /mnt/test_client1 -o name=client1,secretfile=/etc/ceph/client1.secret,fs=cephfs1
mount error: no mds server is up or the cluster is laggy
[root@ceph-mirror-amk-pwsavd-node7 ~]# mkdir /mnt/test_client2
[root@ceph-mirror-amk-pwsavd-node7 ~]# mount -t ceph 10.0.211.1,10.0.210.182,10.0.211.126:/ /mnt/test_client2 -o name=client2,secretfile=/etc/ceph/client2.secret,fs=cephfs
mount error: no mds server is up or the cluster is laggy
[root@ceph-mirror-amk-pwsavd-node7 ~]# mount -t ceph 10.0.211.1,10.0.210.182,10.0.211.126:/ /mnt/test_client2 -o name=client2,secretfile=/etc/ceph/client2.secret,fs=cephfs1
[root@ceph-mirror-amk-pwsavd-node7 ~]# mount -t ceph 10.0.211.1,10.0.210.182,10.0.211.126:/ /mnt/test_client1 -o name=client1,secretfile=/etc/ceph/client1.secret,fs=cephfs
[root@ceph-mirror-amk-pwsavd-node7 ~]#

Version-Release number of selected component (if applicable):
ceph version 18.2.0-128.el9cp (d38df712b9120eae50f448fe0847719d3567c2d1) reef (stable)

How reproducible:

Steps to Reproduce:
1. Create two filesystems (cephfs, cephfs1)
2. Authorize client1 for cephfs only and client2 for cephfs1 only
3. Kernel-mount cephfs1 with client1

Actual results:
mount error: no mds server is up or the cluster is laggy

Expected results:
An authorization/permission error for the unauthorized client

Additional info:
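For reference, fs-scoped caps like the ones in the keyrings above are normally produced by "ceph fs authorize". The exact commands were not captured in this report, so the following is only a hedged reconstruction using the same fs names and client IDs:

# assumed reconstruction of the client setup (not taken from the report)
ceph fs authorize cephfs  client.client1 / rw     # yields mds/mon/osd caps scoped to fsname=cephfs
ceph fs authorize cephfs1 client.client2 / rw     # yields mds/mon/osd caps scoped to fsname=cephfs1
# extract bare keys for the kernel mount's secretfile= option
ceph auth get-key client.client1 > /etc/ceph/client1.secret
ceph auth get-key client.client2 > /etc/ceph/client2.secret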
(In reply to Amarnath from comment #0)
> 1. The encountered error suggests that there is no active MDS (Metadata
> Server) or a potential cluster lag when client1 attempts to mount "cephfs1."
> But filesystem is up and running
> 2. An expected behavior would be to receive an unauthorized error instead of
> the "No MDS server is UP" message

This error message is thrown by the mount helper when mount returns the -EHOSTUNREACH errno, and also by the kernel driver in the kernel ring buffer:

> [517841.491998] libceph: auth protocol 'cephx' msgr authentication failed: -13
> [517841.492242] ceph: No mds server is up or the cluster is laggy

So it looks like a generic message is thrown. Since we do get errno -13, I think a specific error message can be shown.
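A minimal way to correlate the generic mount-helper message with the underlying errno is to re-run the failing mount and check the kernel ring buffer immediately afterwards (sketch only; the exact log wording varies by kernel version):

mount -t ceph 10.0.211.1,10.0.210.182,10.0.211.126:/ /mnt/test_client1 \
      -o name=client1,secretfile=/etc/ceph/client1.secret,fs=cephfs1
dmesg | grep -E 'libceph|ceph:' | tail -n 5    # comment #1 saw the -13 (-EACCES) line here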
(In reply to Venky Shankar from comment #1)
> This error message is thrown by the mount helper when mount returns
> -EHOSTUNREACH errno and also by the kernel driver in the kernel ring buffer
>
> So, it looks like a generic message is thrown. Since we do get errno -13, I
> think specific error message can be shown.

Because an unauthorized 'fsname' parameter was specified, the ceph mon returns only the list of fsnames the client is allowed. The client fails to find a match, returns -2 (-ENOENT) and bails out, leaving the local mdsmap cache empty. Then, just before failing the mount, the upper layer checks the local mdsmap cache, finds no MDS up, and switches the errno to -113 (-EHOSTUNREACH).

The problem is that the same thing can also happen when cephx is disabled, so it's hard to distinguish which case it is.

Venky,

Maybe we should just change the debug log from:

"No mds server is up or the cluster is laggy" ---> "No mds server is up or the cluster is laggy or unauthorized" ?

Thanks
- Xiubo
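The cap-filtered view described above can also be observed from user space (a hedged check; how much the mon filters may vary by release) by listing filesystems with each client's own credentials:

# client1's mon cap is "allow r fsname=cephfs", so only cephfs should be listed
ceph fs ls -n client.client1 -k /etc/ceph/ceph.client.client1.keyring
# client2 should correspondingly only see cephfs1
ceph fs ls -n client.client2 -k /etc/ceph/ceph.client.client2.keyring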
(In reply to Xiubo Li from comment #2)
> Maybe we should just change the debug logs to:
>
> "No mds server is up or the cluster is laggy" ---> "No mds server is up or
> the cluster is laggy or unauthorized" ?

Absolutely. There's no need to complicate things by sending the mdsmap on a mismatch just for error-string correctness.

Neeraj, please create a tracker.
How about "No mds server is available — it may be laggy or down, or you may not be authorized"
(In reply to Greg Farnum from comment #4)
> How about "No mds server is available — it may be laggy or down, or you may
> not be authorized"

Yeah, much better. Thanks!
Hi All,

We are seeing the updated error message concerning authorization:

[root@ceph-nfs-fail-pff5tt-node8 ~]# ceph auth get client.client1 -o /etc/ceph/ceph.client.client1.keyring
[root@ceph-nfs-fail-pff5tt-node8 ~]# mount -t ceph 10.0.211.170,10.0.209.108,10.0.211.130:/ /mnt/test_client1 -o name=client1,fs=cephfs
[root@ceph-nfs-fail-pff5tt-node8 ~]# ceph auth get client.client2 -o /etc/ceph/ceph.client.client2.keyring
[root@ceph-nfs-fail-pff5tt-node8 ~]# mount -t ceph 10.0.211.170,10.0.209.108,10.0.211.130:/ /mnt/test_client2 -o name=client2,fs=cephfs
mount error: no mds (Metadata Server) is up. The cluster might be laggy, or you may not be authorized
[root@ceph-nfs-fail-pff5tt-node8 ~]# mount -t ceph 10.0.211.170,10.0.209.108,10.0.211.130:/ /mnt/test_client2 -o name=client2,fs=cephfs1
[root@ceph-nfs-fail-pff5tt-node8 ~]# ceph versions
{
    "mon": {
        "ceph version 18.2.1-20.el9cp (171d20b9d47e6145ad666c10de8e45efe66b8f50) reef (stable)": 3
    },
    "mgr": {
        "ceph version 18.2.1-20.el9cp (171d20b9d47e6145ad666c10de8e45efe66b8f50) reef (stable)": 2
    },
    "osd": {
        "ceph version 18.2.1-20.el9cp (171d20b9d47e6145ad666c10de8e45efe66b8f50) reef (stable)": 12
    },
    "mds": {
        "ceph version 18.2.1-20.el9cp (171d20b9d47e6145ad666c10de8e45efe66b8f50) reef (stable)": 7
    },
    "overall": {
        "ceph version 18.2.1-20.el9cp (171d20b9d47e6145ad666c10de8e45efe66b8f50) reef (stable)": 24
    }
}
[root@ceph-nfs-fail-pff5tt-node8 ~]#
[root@ceph-nfs-fail-pff5tt-node8 ~]# ceph fs status
cephfs - 1 clients
======
RANK  STATE                     MDS                        ACTIVITY     DNS    INOS   DIRS   CAPS
 0    active  cephfs.ceph-nfs-fail-pff5tt-node7.gwjydi  Reqs:    0 /s   177     72     71      1
 1    active  cephfs.ceph-nfs-fail-pff5tt-node4.azrceg  Reqs:    0 /s   371     21     19      0
       POOL          TYPE     USED  AVAIL
cephfs.cephfs.meta  metadata  1401M  49.5G
cephfs.cephfs.data    data       0   49.5G
cephfs1 - 1 clients
=======
RANK  STATE                     MDS                         ACTIVITY     DNS    INOS   DIRS   CAPS
 0    active  cephfs1.ceph-nfs-fail-pff5tt-node5.xrkvwi  Reqs:    0 /s    10     13     12      1
       POOL           TYPE     USED  AVAIL
cephfs.cephfs1.meta  metadata  96.0k  49.5G
cephfs.cephfs1.data    data       0   49.5G
           STANDBY MDS
cephfs.ceph-nfs-fail-pff5tt-node5.sumcxl
cephfs.ceph-nfs-fail-pff5tt-node3.ydhhkv
cephfs.ceph-nfs-fail-pff5tt-node6.ozpqjo
cephfs1.ceph-nfs-fail-pff5tt-node2.jzzydo
MDS version: ceph version 18.2.1-20.el9cp (171d20b9d47e6145ad666c10de8e45efe66b8f50) reef (stable)
[root@ceph-nfs-fail-pff5tt-node8 ~]#

Regards,
Amarnath
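For completeness, the denial of client2 on "cephfs" can be cross-checked against its caps and the kernel log (a hedged verification sketch; the dmesg wording may differ slightly between kernel builds):

ceph auth get client.client2                    # caps should be scoped to fsname=cephfs1
journalctl -k --no-pager | grep -E 'libceph|ceph:' | tail -n 5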
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Critical: Red Hat Ceph Storage 7.1 security, enhancements, and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2024:3925