Description of problem:
=====================
NFS deployment over a Ceph cluster is failing with the latest IBM Storage Ceph 8.0 build. The NFS daemons go into an error state and mounts fail on the clients.

[ceph: root@ceph-msaini-qb43i4-node1-installer /]# ceph nfs cluster create nfsganesha "ceph-msaini-qb43i4-node1-installer ceph-msaini-qb43i4-node2"

[ceph: root@ceph-msaini-qb43i4-node1-installer /]# ceph nfs cluster ls
[
  "nfsganesha"
]

[ceph: root@ceph-msaini-qb43i4-node1-installer /]# ceph nfs cluster info nfsganesha
{
  "nfsganesha": {
    "backend": [
      {
        "hostname": "ceph-msaini-qb43i4-node1-installer",
        "ip": "10.0.67.94",
        "port": 2049
      },
      {
        "hostname": "ceph-msaini-qb43i4-node2",
        "ip": "10.0.66.226",
        "port": 2049
      }
    ],
    "virtual_ip": null
  }
}

[ceph: root@ceph-msaini-qb43i4-node1-installer /]# ceph nfs export create cephfs nfsganesha /ganesha1 cephfs --path=/
{
  "bind": "/ganesha1",
  "cluster": "nfsganesha",
  "fs": "cephfs",
  "mode": "RW",
  "path": "/"
}

[ceph: root@ceph-msaini-qb43i4-node1-installer /]# ceph nfs export info nfsganesha /ganesha1
{
  "access_type": "RW",
  "clients": [],
  "cluster_id": "nfsganesha",
  "export_id": 1,
  "fsal": {
    "fs_name": "cephfs",
    "name": "CEPH",
    "user_id": "nfs.nfsganesha.1"
  },
  "path": "/",
  "protocols": [
    3,
    4
  ],
  "pseudo": "/ganesha1",
  "security_label": true,
  "squash": "none",
  "transports": [
    "TCP"
  ]
}

[ceph: root@ceph-msaini-qb43i4-node1-installer /]# ceph orch ps | grep nfs
nfs.nfsganesha.0.0.ceph-msaini-qb43i4-node1-installer.wictxd  ceph-msaini-qb43i4-node1-installer  *:2049  error  3m ago  24m  -  -  <unknown>  <unknown>  <unknown>
nfs.nfsganesha.1.0.ceph-msaini-qb43i4-node2.qyecux            ceph-msaini-qb43i4-node2            *:2049  error  3m ago  24m  -  -  <unknown>  <unknown>  <unknown>

[ceph: root@ceph-msaini-qb43i4-node1-installer /]# ceph -s
  cluster:
    id:     92150a44-657c-11ef-99fc-fa163e39316c
    health: HEALTH_WARN
            2 failed cephadm daemon(s)

  services:
    mon: 3 daemons, quorum ceph-msaini-qb43i4-node1-installer,ceph-msaini-qb43i4-node2,ceph-msaini-qb43i4-node3 (age 45m)
    mgr: ceph-msaini-qb43i4-node1-installer.xzasxy(active, since 46m), standbys: ceph-msaini-qb43i4-node2.fpcmtw
    mds: 1/1 daemons up, 1 standby
    osd: 18 osds: 18 up (since 43m), 18 in (since 44m)
    rgw: 2 daemons active (2 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   9 pools, 721 pgs
    objects: 222 objects, 458 KiB
    usage:   1.2 GiB used, 269 GiB / 270 GiB avail
    pgs:     721 active+clean

[ceph: root@ceph-msaini-qb43i4-node1-installer /]# ceph health detail
HEALTH_WARN 2 failed cephadm daemon(s)
[WRN] CEPHADM_FAILED_DAEMON: 2 failed cephadm daemon(s)
    daemon nfs.nfsganesha.0.0.ceph-msaini-qb43i4-node1-installer.wictxd on ceph-msaini-qb43i4-node1-installer is in error state
    daemon nfs.nfsganesha.1.0.ceph-msaini-qb43i4-node2.qyecux on ceph-msaini-qb43i4-node2 is in error state

Mounting the export on a client fails with "Connection refused":

[root@ceph-msaini-qb43i4-node7 mnt]# mount -t nfs 10.0.67.94:/ganesha1 /mnt/ganesha/
Created symlink /run/systemd/system/remote-fs.target.wants/rpc-statd.service → /usr/lib/systemd/system/rpc-statd.service.
mount.nfs: Connection refused
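Not part of the original report, but a rough sketch of how the failed daemons can be inspected on the affected nodes. The daemon names are copied from the "ceph orch ps" output above, and the systemd unit name assumes the usual cephadm pattern ceph-<fsid>@<daemon-name>, so adjust as needed:

# pull the container logs for one of the failed ganesha daemons (run on its host)
cephadm logs --name nfs.nfsganesha.0.0.ceph-msaini-qb43i4-node1-installer.wictxd

# or query journald directly, using the cluster fsid from 'ceph -s'
journalctl -u ceph-92150a44-657c-11ef-99fc-fa163e39316c@nfs.nfsganesha.0.0.ceph-msaini-qb43i4-node1-installer.wictxd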
Version-Release number of selected component (if applicable):
=========================
[ceph: root@ceph-msaini-qb43i4-node1-installer /]# rpm -qa | grep nfs
libnfsidmap-2.5.4-25.el9.x86_64
nfs-utils-2.5.4-25.el9.x86_64
nfs-ganesha-selinux-6.0-1.el9cp.noarch
nfs-ganesha-6.0-1.el9cp.x86_64
nfs-ganesha-ceph-6.0-1.el9cp.x86_64
nfs-ganesha-rados-grace-6.0-1.el9cp.x86_64
nfs-ganesha-rados-urls-6.0-1.el9cp.x86_64
nfs-ganesha-rgw-6.0-1.el9cp.x86_64

[ceph: root@ceph-msaini-qb43i4-node1-installer /]# ceph --version
ceph version 19.1.0-59.el9cp (22f005343241d56f3c48549d7a4ec0e3995538df) squid (rc)

How reproducible:
==================
Every time

Steps to Reproduce:
=================
1. Deploy NFS over a Ceph cluster

Actual results:
==============
Deployment fails; the NFS daemons go into an error state.

Expected results:
================
Deployment should succeed and the exports should be mountable.

Additional info:
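(Added for context, not from the original report.) A minimal sketch of the checks used when retesting, assuming the same hosts and export as above:

# on a backend node: confirm whether anything is actually listening on the NFS port
ss -tlnp | grep 2049

# on the client, once the daemons report 'running': mount with explicit NFSv4.1 options
mount -t nfs -o nfsvers=4.1,proto=tcp 10.0.67.94:/ganesha1 /mnt/ganesha/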
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat Ceph Storage 8.0 security, bug fix, and enhancement updates), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2024:10216
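A minimal verification sketch after moving to the errata build, assuming the same cluster and service name as above (run from the cephadm shell, as in the report):

# confirm the ganesha daemons come up cleanly and check the package level inside the container
ceph orch ps --daemon-type nfs
ceph versions
rpm -qa | grep nfs-ganesha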