Bug 2331790 - nfs-ganesha 4.1+ server does not process reclaim_complete correctly after moving to another node
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: NFS-Ganesha
Version: 6.0
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 8.1
Assignee: Sachin Punadikar
QA Contact: Manisha Saini
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2024-12-11 21:30 UTC by jeff.a.smith
Modified: 2025-06-26 12:20 UTC (History)
4 users

Fixed In Version: nfs-ganesha-6.5-9.el9cp
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2025-06-26 12:20:08 UTC
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHCEPH-10343 0 None None None 2024-12-11 21:31:35 UTC
Red Hat Product Errata RHSA-2025:9775 0 None None None 2025-06-26 12:20:11 UTC

Description jeff.a.smith 2024-12-11 21:30:53 UTC
Description of problem:
After moving a ganesha instance (with its VIP) to another protocol node, our wire trace shows that the client sent RECLAIM_COMPLETE, but log messages from the nfs-ganesha server show that it never processed / accounted for the reclaim_complete, and a full grace period is enforced.  For our test scenario, we configured a ganesha cluster with 3 nfs-ganesha instances over a set of 6 protocol nodes, then placed 1 node into maintenance mode, which changed the placement of the ganesha instance to another protocol node.

The ganesha instance had only a single client, and it was a 4.1 client.  The problem reproduced 100% of the time using the rados_cluster recovery backend, but was less common (about 50%) using the rados_ng and rados_kv recovery backends.  With rados_ng, the first time the instance was moved, reclaim_complete processing usually worked as expected; but if the grace period exited early that first time, the server would usually enforce a full grace period when the experiment was repeated to return the instance to its original node.

When an nfs-ganesha instance is simply restarted without changing nodes, reclaim_complete is processed as expected, and the grace period exits early after the client sends reclaim_complete.

I configured nfs-ganesha with additional logging, and these messages (at the code locations linked below) were emitted by the rados_cluster and rados_ng backends:

https://github.com/nfs-ganesha/nfs-ganesha/blob/c9ff03bb11397d525e8b768772a2a26b84628796/src/SAL/recovery/recovery_rados_cluster.c#L164


https://github.com/nfs-ganesha/nfs-ganesha/blob/c9ff03bb11397d525e8b768772a2a26b84628796/src/SAL/recovery/recovery_rados_ng.c#L312



Version-Release number of selected component (if applicable):


How reproducible: 100% with the rados_cluster backend, about 50% with rados_ng


Steps to Reproduce:
1. Configure an nfs-ganesha cluster for NFSv4.1+ with one export accessible to one client.
2. Set up the NFSv4.1+ client, start a tcpdump capture, and mount the export.
3. Open a file from the 4.1+ mount and start doing IO.
4. Move the nfs-ganesha instance (with its VIP) to another node.
5. Observe that client IO is blocked because the server enforces a full grace period, even though the client sent reclaim_complete and it was the only client.


Actual results:
After relocating the nfs-ganesha instance to another node, the client completed recovery and sent RECLAIM_COMPLETE.  It was the only client of the server, yet the server enforced a full grace period as if the client had never sent RECLAIM_COMPLETE.


Expected results:
The nfs-ganesha server should have exited grace early after its only client sent RECLAIM_COMPLETE.  Instead it enforced a full grace period, still waiting for RECLAIM_COMPLETE.

Additional info:

These experiments were performed on an Acadia Storage cluster, and the mechanism we used to move the VIP for the ganesha instance is unique to Acadia.  The mechanics of initiating HA-NFS failover in your lab environment will be different, but that's fine.

This bz is about the rados_cluster and rados_ng recovery backends emitting the log messages linked above as they bail out early from reading persisted lease info from their rados objects after restarting on another node.  After these messages are logged, the nfs-ganesha instance does not do reclaim_complete accounting / process RECLAIM_COMPLETE correctly, and a full grace period is enforced.

The really strange part, which I don't understand at all, is that it isn't 100% reproducible for rados_ng.  Typically the first failover works, but the failback will then reproduce the problem.

Comment 1 Storage PM bot 2024-12-11 21:31:02 UTC
Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

Comment 3 jeff.a.smith 2024-12-13 20:54:44 UTC
Severity set to medium.  When the nfs-ganesha server does not process reclaim_complete correctly after moving to another node, it enforces a full grace period.  Until the nfs-ganesha server exits its grace period, client workloads that create or destroy protocol state are blocked.  So the NFS client impact is a temporary workload disruption until the nfs-ganesha server exits its grace period.

Comment 10 errata-xmlrpc 2025-06-26 12:20:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat Ceph Storage 8.1 security, bug fix, and enhancement updates), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2025:9775

