Bug 2233673 - Observed continuous messages related to "rados_cluster_grace_enforcing :CLIENT ID :EVENT :rados_cluster_grace_enforcing: ret=-45" on 2 out of 4 nodes running ganesha service and server keep on going in 90 sec grace period [NEEDINFO]
Keywords:
Status: NEW
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: NFS-Ganesha
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 7.0
Assignee: Frank Filz
QA Contact: Manisha Saini
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-08-22 21:26 UTC by Manisha Saini
Modified: 2023-09-07 18:04 UTC (History)
4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Embargoed:
ffilz: needinfo? (brgardne)


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHCEPH-7254 0 None None None 2023-08-22 21:29:24 UTC

Description Manisha Saini 2023-08-22 21:26:21 UTC
Description of problem:
=======================

On the scale lab cluster, 

 --> On Node 3 and Node 4, observed multiple log messages "rados_cluster_grace_enforcing: ret=-45", each followed by the server going into a 90-second GRACE period

[root@e24-h25-740xd ~]# cat f1 | grep "Server Now IN GRACE, duration 90" | wc -l
67

[root@e24-h25-740xd ~]# cat f1 | grep ":rados_cluster_grace_enforcing" | wc -l
66

[root@e24-h27-740xd ~]# cat f1 | grep ":rados_cluster_grace_enforcing" | wc -l
73

[root@e24-h27-740xd ~]#  cat f1 | grep "Server Now IN GRACE, duration 90" | wc -l
73

--------------------------------------

4-h27-740xd-mfelxg[21201]: 19/08/2023 19:59:20 : epoch 64e11e0d : e24-h27-740xd.alias.bos.scalelab.redhat.com : ganesha.nfsd-2[main] rados_cluster_grace_enforcing :CLIENT ID :EVENT :rados_cluster_grace_enforcing: ret=-45
4-h27-740xd-mfelxg[21201]: 19/08/2023 19:59:24 : epoch 64e11e0d : e24-h27-740xd.alias.bos.scalelab.redhat.com : ganesha.nfsd-2[main] nfs_start_grace :STATE :EVENT :NFS Server Now IN GRACE, duration 90
4-h27-740xd-mfelxg[21201]: 19/08/2023 19:59:24 : epoch 64e11e0d : e24-h27-740xd.alias.bos.scalelab.redhat.com : ganesha.nfsd-2[main] nfs_start_grace :STATE :EVENT :grace reload client info completed from backend
4-h27-740xd-mfelxg[21201]: 19/08/2023 19:59:24 : epoch 64e11e0d : e24-h27-740xd.alias.bos.scalelab.redhat.com : ganesha.nfsd-2[main] nfs_try_lift_grace :STATE :EVENT :check grace:reclaim complete(0) clid count(0)
4-h27-740xd-mfelxg[21201]: 19/08/2023 19:59:24 : epoch 64e11e0d : e24-h27-740xd.alias.bos.scalelab.redhat.com : ganesha.nfsd-2[main] rados_cluster_grace_enforcing :CLIENT ID :EVENT :rados_cluster_grace_enforcing: ret=-45
4-h27-740xd-mfelxg[21201]: 19/08/2023 19:59:28 : epoch 64e11e0d : e24-h27-740xd.alias.bos.scalelab.redhat.com : ganesha.nfsd-2[main] nfs_start_grace :STATE :EVENT :NFS Server Now IN GRACE, duration 90
4-h27-740xd-mfelxg[21201]: 19/08/2023 19:59:28 : epoch 64e11e0d : e24-h27-740xd.alias.bos.scalelab.redhat.com : ganesha.nfsd-2[main] nfs_start_grace :STATE :EVENT :grace reload client info completed from backend
4-h27-740xd-mfelxg[21201]: 19/08/2023 19:59:28 : epoch 64e11e0d : e24-h27-740xd.alias.bos.scalelab.redhat.com : ganesha.nfsd-2[main] nfs_try_lift_grace :STATE :EVENT :check grace:reclaim complete(0) clid count(0)
4-h27-740xd-mfelxg[21201]: 19/08/2023 19:59:28 : epoch 64e11e0d : e24-h27-740xd.alias.bos.scalelab.redhat.com : ganesha.nfsd-2[main] rados_cluster_grace_enforcing :CLIENT ID :EVENT :rados_cluster_grace_enforcing: ret=-45

-----------------------------------


 --> Also, on Node 1, observed repeated log messages "nfs_try_lift_grace :STATE :EVENT :check grace:reclaim complete(0) clid count(2)"

[root@e24-h21-740xd var]# cat f1 | grep "check grace:reclaim complete(0)" | wc -l
1770

---------------------------------------

4-h21-740xd-jevbro[44672]: 19/08/2023 09:26:20 : epoch 64dff537 : e24-h21-740xd.alias.bos.scalelab.redhat.com : ganesha.nfsd-2[reaper] nfs_try_lift_grace :STATE :EVENT :check grace:reclaim complete(0) clid count(2)
4-h21-740xd-jevbro[44672]: 19/08/2023 09:26:30 : epoch 64dff537 : e24-h21-740xd.alias.bos.scalelab.redhat.com : ganesha.nfsd-2[reaper] nfs_try_lift_grace :STATE :EVENT :check grace:reclaim complete(0) clid count(2)
4-h21-740xd-jevbro[44672]: 19/08/2023 09:26:40 : epoch 64dff537 : e24-h21-740xd.alias.bos.scalelab.redhat.com : ganesha.nfsd-2[reaper] nfs_try_lift_grace :STATE :EVENT :check grace:reclaim complete(0) clid count(2)
4-h21-740xd-jevbro[44672]: 19/08/2023 09:26:50 : epoch 64dff537 : e24-h21-740xd.alias.bos.scalelab.redhat.com : ganesha.nfsd-2[reaper] nfs_try_lift_grace :STATE :EVENT :check grace:reclaim complete(0) clid count(2)
4-h21-740xd-jevbro[44672]: 19/08/2023 09:27:00 : epoch 64dff537 : e24-h21-740xd.alias.bos.scalelab.redhat.com : ganesha.nfsd-2[reaper] nfs_try_lift_grace :STATE :EVENT :check grace:reclaim complete(0) clid count(2)
4-h21-740xd-jevbro[44672]: 19/08/2023 09:27:10 : epoch 64dff537 : e24-h21-740xd.alias.bos.scalelab.redhat.com : ganesha.nfsd-2[reaper] nfs_try_lift_grace :STATE :EVENT :check grace:reclaim complete(0) clid count(2)
4-h21-740xd-jevbro[44672]: 19/08/2023 09:27:20 : epoch 64dff537 : e24-h21-740xd.alias.bos.scalelab.redhat.com : ganesha.nfsd-2[reaper] nfs_try_lift_grace :STATE :EVENT :check grace:reclaim complete(0) clid count(2)
4-h21-740xd-jevbro[44672]: 19/08/2023 09:27:30 : epoch 64dff537 : e24-h21-740xd.alias.bos.scalelab.redhat.com : ganesha.nfsd-2[reaper] nfs_try_lift_grace :STATE :EVENT :check grace:reclaim complete(0) clid count(2)

--------------------------------------
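When debugging a grace loop like the one above, the shared grace database that the rados_cluster_* recovery backend uses can be inspected with the ganesha-rados-grace tool (shipped in the nfs-ganesha-rados-grace package listed below). A minimal sketch; the pool and namespace here are assumptions (cephadm deployments default to pool ".nfs" with the namespace set to the NFS cluster name) and must be adjusted to this deployment:

```shell
#!/bin/sh
# Dump the cluster grace DB (pool/namespace are assumptions: cephadm
# deployments default to pool ".nfs" with the namespace set to the NFS
# cluster name; adjust both for your environment).
POOL="${POOL:-.nfs}"
NS="${NS:-nfsganesha}"   # hypothetical cluster name

if command -v ganesha-rados-grace >/dev/null 2>&1; then
    # "dump" prints the current/recovery epochs and per-node flags,
    # i.e. the state the messages above keep cycling through
    OUT=$(ganesha-rados-grace --pool "$POOL" --ns "$NS" dump 2>&1)
else
    OUT="ganesha-rados-grace not installed on this host"
fi
printf '%s\n' "$OUT"
```

Comparing the dump output across the four nodes while the loop runs would show which nodes still have recovery flags set and which keep restarting grace.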


Version-Release number of selected component (if applicable):
=============================================================

# rpm -qa | grep ganesha
nfs-ganesha-selinux-5.4-1.el9cp.noarch
nfs-ganesha-5.4-1.el9cp.x86_64
nfs-ganesha-rgw-5.4-1.el9cp.x86_64
nfs-ganesha-ceph-5.4-1.el9cp.x86_64
nfs-ganesha-rados-grace-5.4-1.el9cp.x86_64
nfs-ganesha-rados-urls-5.4-1.el9cp.x86_64


How reproducible:
==================

1/1


Steps to Reproduce:
==================
1. Configure Ganesha on 4 nodes with HA
2. Create 1 CephFS volume and mount the volume on 3 clients with a single VIP
3. On clients 1, 2 and 3, run the following workload

Clients 1, 2 and 3:
------
Create 1G files in a loop using the dd command, with fio running alongside
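The workload above can be sketched as a short script. This is a hedged sketch only: the report gives no exact dd or fio options, so the mount path, file size, iteration count, and fio job parameters below are illustrative assumptions.

```shell
#!/bin/sh
# Hedged sketch of the client workload from "Steps to Reproduce": a dd loop
# creating large files while fio runs in parallel. All sizes, paths, and fio
# job parameters are illustrative assumptions, not taken from the report.
MNT="${MNT:-$(mktemp -d)}"   # point this at the NFS mount on a real client
COUNT="${COUNT:-2}"          # dd iterations; loop indefinitely to match the report
SIZE_MB="${SIZE_MB:-16}"     # use 1024 (1G per file) to match the report

i=1
while [ "$i" -le "$COUNT" ]; do
    # create one large file per iteration
    dd if=/dev/zero of="$MNT/ddfile.$i" bs=1M count="$SIZE_MB" 2>/dev/null
    i=$((i + 1))
done

# fio alongside the dd loop (skipped when fio is not installed)
if command -v fio >/dev/null 2>&1; then
    fio --name=nfs-mixed --directory="$MNT" --rw=randrw --bs=4k \
        --size="${SIZE_MB}M" --time_based --runtime=30 --group_reporting
fi
```

On a real reproducer the script would run on all three clients against the CephFS export mounted via the VIP.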



Actual results:
=============
Ganesha server is going into the grace period continuously

rados_cluster_grace_enforcing :CLIENT ID :EVENT :rados_cluster_grace_enforcing: ret=-45
nfs_start_grace :STATE :EVENT :NFS Server Now IN GRACE, duration 90 

On another server, the following logs were observed -

nfs_try_lift_grace :STATE :EVENT :check grace:reclaim complete(0) clid count(2)

Expected results:
==============
Continuous messages should not flood ganesha.log.
The Ganesha server should not go into the GRACE period multiple times.

Additional info:

Comment 2 Manisha Saini 2023-09-05 08:44:30 UTC
Any update on the RCA?

Comment 5 Frank Filz 2023-09-06 20:04:01 UTC
Could you share the full config, not just the EXPORT config but also NFS_CORE_PARAM, NFSV4, etc.?

