Description of problem:
=======================
On the scale lab cluster:

--> On Node 3 and Node 4, observed multiple log messages related to "rados_cluster_grace_enforcing: ret=-45", each followed by the server going into a 90-second GRACE period.

[root@e24-h25-740xd ~]# cat f1 | grep "Server Now IN GRACE, duration 90" | wc -l
67
[root@e24-h25-740xd ~]# cat f1 | grep ":rados_cluster_grace_enforcing" | wc -l
66
[root@e24-h27-740xd ~]# cat f1 | grep ":rados_cluster_grace_enforcing" | wc -l
73
[root@e24-h27-740xd ~]# cat f1 | grep "Server Now IN GRACE, duration 90" | wc -l
73

--------------------------------------
4-h27-740xd-mfelxg[21201]: 19/08/2023 19:59:20 : epoch 64e11e0d : e24-h27-740xd.alias.bos.scalelab.redhat.com : ganesha.nfsd-2[main] rados_cluster_grace_enforcing :CLIENT ID :EVENT :rados_cluster_grace_enforcing: ret=-45
4-h27-740xd-mfelxg[21201]: 19/08/2023 19:59:24 : epoch 64e11e0d : e24-h27-740xd.alias.bos.scalelab.redhat.com : ganesha.nfsd-2[main] nfs_start_grace :STATE :EVENT :NFS Server Now IN GRACE, duration 90
4-h27-740xd-mfelxg[21201]: 19/08/2023 19:59:24 : epoch 64e11e0d : e24-h27-740xd.alias.bos.scalelab.redhat.com : ganesha.nfsd-2[main] nfs_start_grace :STATE :EVENT :grace reload client info completed from backend
4-h27-740xd-mfelxg[21201]: 19/08/2023 19:59:24 : epoch 64e11e0d : e24-h27-740xd.alias.bos.scalelab.redhat.com : ganesha.nfsd-2[main] nfs_try_lift_grace :STATE :EVENT :check grace:reclaim complete(0) clid count(0)
4-h27-740xd-mfelxg[21201]: 19/08/2023 19:59:24 : epoch 64e11e0d : e24-h27-740xd.alias.bos.scalelab.redhat.com : ganesha.nfsd-2[main] rados_cluster_grace_enforcing :CLIENT ID :EVENT :rados_cluster_grace_enforcing: ret=-45
4-h27-740xd-mfelxg[21201]: 19/08/2023 19:59:28 : epoch 64e11e0d : e24-h27-740xd.alias.bos.scalelab.redhat.com : ganesha.nfsd-2[main] nfs_start_grace :STATE :EVENT :NFS Server Now IN GRACE, duration 90
4-h27-740xd-mfelxg[21201]: 19/08/2023 19:59:28 : epoch 64e11e0d : e24-h27-740xd.alias.bos.scalelab.redhat.com : ganesha.nfsd-2[main] nfs_start_grace :STATE :EVENT :grace reload client info completed from backend
4-h27-740xd-mfelxg[21201]: 19/08/2023 19:59:28 : epoch 64e11e0d : e24-h27-740xd.alias.bos.scalelab.redhat.com : ganesha.nfsd-2[main] nfs_try_lift_grace :STATE :EVENT :check grace:reclaim complete(0) clid count(0)
4-h27-740xd-mfelxg[21201]: 19/08/2023 19:59:28 : epoch 64e11e0d : e24-h27-740xd.alias.bos.scalelab.redhat.com : ganesha.nfsd-2[main] rados_cluster_grace_enforcing :CLIENT ID :EVENT :rados_cluster_grace_enforcing: ret=-45
-----------------------------------

--> Also, on Node 1, observed multiple log messages related to "nfs_try_lift_grace :STATE :EVENT :check grace:reclaim complete(0) clid count(2)".

[root@e24-h21-740xd var]# cat f1 | grep "check grace:reclaim complete(0)" | wc -l
1770

---------------------------------------
4-h21-740xd-jevbro[44672]: 19/08/2023 09:26:20 : epoch 64dff537 : e24-h21-740xd.alias.bos.scalelab.redhat.com : ganesha.nfsd-2[reaper] nfs_try_lift_grace :STATE :EVENT :check grace:reclaim complete(0) clid count(2)
4-h21-740xd-jevbro[44672]: 19/08/2023 09:26:30 : epoch 64dff537 : e24-h21-740xd.alias.bos.scalelab.redhat.com : ganesha.nfsd-2[reaper] nfs_try_lift_grace :STATE :EVENT :check grace:reclaim complete(0) clid count(2)
4-h21-740xd-jevbro[44672]: 19/08/2023 09:26:40 : epoch 64dff537 : e24-h21-740xd.alias.bos.scalelab.redhat.com : ganesha.nfsd-2[reaper] nfs_try_lift_grace :STATE :EVENT :check grace:reclaim complete(0) clid count(2)
4-h21-740xd-jevbro[44672]: 19/08/2023 09:26:50 : epoch 64dff537 : e24-h21-740xd.alias.bos.scalelab.redhat.com : ganesha.nfsd-2[reaper] nfs_try_lift_grace :STATE :EVENT :check grace:reclaim complete(0) clid count(2)
4-h21-740xd-jevbro[44672]: 19/08/2023 09:27:00 : epoch 64dff537 : e24-h21-740xd.alias.bos.scalelab.redhat.com : ganesha.nfsd-2[reaper] nfs_try_lift_grace :STATE :EVENT :check grace:reclaim complete(0) clid count(2)
4-h21-740xd-jevbro[44672]: 19/08/2023 09:27:10 : epoch 64dff537 : e24-h21-740xd.alias.bos.scalelab.redhat.com : ganesha.nfsd-2[reaper] nfs_try_lift_grace :STATE :EVENT :check grace:reclaim complete(0) clid count(2)
4-h21-740xd-jevbro[44672]: 19/08/2023 09:27:20 : epoch 64dff537 : e24-h21-740xd.alias.bos.scalelab.redhat.com : ganesha.nfsd-2[reaper] nfs_try_lift_grace :STATE :EVENT :check grace:reclaim complete(0) clid count(2)
4-h21-740xd-jevbro[44672]: 19/08/2023 09:27:30 : epoch 64dff537 : e24-h21-740xd.alias.bos.scalelab.redhat.com : ganesha.nfsd-2[reaper] nfs_try_lift_grace :STATE :EVENT :check grace:reclaim complete(0) clid count(2)
--------------------------------------

Version-Release number of selected component (if applicable):
=============================================================
# rpm -qa | grep ganesha
nfs-ganesha-selinux-5.4-1.el9cp.noarch
nfs-ganesha-5.4-1.el9cp.x86_64
nfs-ganesha-rgw-5.4-1.el9cp.x86_64
nfs-ganesha-ceph-5.4-1.el9cp.x86_64
nfs-ganesha-rados-grace-5.4-1.el9cp.x86_64
nfs-ganesha-rados-urls-5.4-1.el9cp.x86_64

How reproducible:
==================
1/1

Steps to Reproduce:
==================
1. Configure Ganesha on 4 nodes with HA.
2. Create 1 CephFS volume and mount the volume on 3 clients with a single VIP.
3. On clients 1, 2 and 3, run the following workload: create a 1G file in a loop using dd, along with fio (a sketch of this workload is included under Additional info below).

Actual results:
=============
The Ganesha server is going into the grace period continuously:

rados_cluster_grace_enforcing :CLIENT ID :EVENT :rados_cluster_grace_enforcing: ret=-45
nfs_start_grace :STATE :EVENT :NFS Server Now IN GRACE, duration 90

On another server the following logs were observed:

nfs_try_lift_grace :STATE :EVENT :check grace:reclaim complete(0) clid count(2)

Expected results:
==============
Continuous messages should not flood ganesha.log.
The Ganesha server should not go into the GRACE period multiple times.

Additional info:
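For reference, a minimal sketch of the dd + fio workload described in step 3 of Steps to Reproduce. The mount point, file name and fio job parameters below are assumptions for illustration, not the exact commands used in the test:

--------------------------------------
# Hypothetical NFS mount point used by the clients
MNT=/mnt/cephfs-nfs

# dd loop: recreate a 1G file repeatedly (file name is an assumption)
while true; do
    dd if=/dev/zero of=$MNT/ddfile bs=1M count=1024 oflag=direct
done &

# fio: mixed random read/write load on the same mount (job parameters are assumptions)
fio --name=nfs-load --directory=$MNT --rw=randrw --bs=4k --size=1g \
    --numjobs=4 --time_based --runtime=600 --group_reporting
--------------------------------------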
Any update on the RCA?
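For the RCA it may also help to capture the state of the shared grace database while the issue is reproducing, e.g. with the ganesha-rados-grace tool shipped in nfs-ganesha-rados-grace. The pool and namespace values below are assumptions and should be replaced with the RADOS_KV settings of this deployment:

--------------------------------------
# Dump the current grace epochs and per-node flags (pool/namespace are assumptions)
ganesha-rados-grace --pool .nfs --ns <cluster-id> dump
--------------------------------------

Running this periodically on one node while the ret=-45 messages appear would show whether the nodes keep re-entering the cluster-wide grace period.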
Could you share the full config, not just the EXPORT block but also the NFS_CORE_PARAM and NFSv4 sections, etc.?
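For reference, a skeleton of the config sections being requested, with placeholder values only; this is not the configuration of this cluster, just an illustration of the relevant blocks:

--------------------------------------
NFS_CORE_PARAM {
        # core protocol settings (value is a placeholder)
        Enable_NLM = false;
}

NFSv4 {
        # recovery backend and grace timing are the settings of interest
        RecoveryBackend = rados_cluster;
        Grace_Period = 90;
        Lease_Lifetime = 60;
}

RADOS_KV {
        # location of the shared grace/recovery database (placeholders)
        pool = ".nfs";
        namespace = "<cluster-id>";
        nodeid = "<node-id>";
}
--------------------------------------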