Bug 2332349

Summary: Need supported nfs-ganesha recovery backend for HA-NFS planned and unplanned failover
Product: [Red Hat Storage] Red Hat Ceph Storage
Component: NFS-Ganesha
Version: 6.0
Target Release: 8.1
Hardware: Unspecified
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: unspecified
Reporter: jeff.a.smith
Assignee: Sachin Punadikar <spunadik>
QA Contact: Manisha Saini <msaini>
CC: cephqe-warriors, kkeithle, spunadik, tserlin
Fixed In Version: nfs-ganesha-6.5-9.el9cp
Last Closed: 2025-06-26 12:20:15 UTC
Type: Bug

Description jeff.a.smith 2024-12-13 22:12:30 UTC
Description of problem:

Acadia Storage must deploy NFS-ganesha clusters with full support for HA-NFS failover (planned and unplanned).  Because our nfs-ganesha clusters provide NFS 4.1+ only, NFS clients will not always have to sit out a full grace period after failover: thanks to RECLAIM_COMPLETE, when every client mounts with NFS 4.1 or later, the nfs-ganesha server can exit grace early once all clients have completed recovery.

When a ganesha instance starts its grace period after failover, that grace period must be enforced by every node in the nfs-ganesha cluster.  Our understanding is that only the "rados_cluster" recovery backend can enforce a cluster-wide grace period.  Whichever recovery backend the nfs-ganesha dev team recommends, it must support a cluster-wide grace period, because we intend to make shares accessible from any cluster node.

The NFS-ganesha dev team has recommended that we use the rados_ng backend, but they are still investigating to ensure that it fully supports HA-NFS.

So far we have run failover experiments with both the rados_cluster and rados_ng backends, but we are not seeing consistent behavior.  After inspecting the ganesha logs, we sometimes see "Unable to perform takeover with rados_ng recovery backend" from this code: https://github.com/nfs-ganesha/nfs-ganesha/blob/c9ff03bb11397d525e8b768772a2a26b84628796/src/SAL/recovery/recovery_rados_ng.c#L312

We saw a similar message from the rados_cluster recovery backend after failover,
"Clustered rados backend does not support takeover!":
https://github.com/nfs-ganesha/nfs-ganesha/blob/c9ff03bb11397d525e8b768772a2a26b84628796/src/SAL/recovery/recovery_rados_cluster.c#L164

We created this Bugzilla issue to track delivery of a recommendation from the nfs-ganesha dev team as to which recovery backend fully supports HA-NFS failover (planned and unplanned).  We also need any nfs-ganesha configuration required to provide correct HA-NFS failover behavior, which for us means the following (a minimal configuration sketch follows this list):

1) Clients are able to reclaim/recover all previous state.

2) When only 4.1+ clients exist, the nfs-ganesha server will exit the grace period early after all clients send RECLAIM_COMPLETE.

3) When any nfs-ganesha server has a failover event, its grace period is enforced by all nfs-ganesha instances across the cluster.  We need this feature because a client's shares can be accessible simultaneously from any of the ganesha instances in the cluster.

4) Failover events must be seamless to NFS clients.  Clients must not be required to unmount and remount after an nfs-ganesha failover event (planned or unplanned).
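
For reference, here is a minimal sketch of the NFSv4 settings we have been experimenting with, assuming an NFS 4.1+-only deployment; the RecoveryBackend value shown is the tentatively recommended rados_ng (rados_cluster is the other candidate we have tested):

        # Sketch: restrict mounts to NFSv4.1/4.2 so the RECLAIM_COMPLETE-based
        # early grace exit applies; the RecoveryBackend choice is what this bug
        # asks the dev team to confirm.
        NFSv4 {
           RecoveryBackend = "rados_ng";
           Minor_Versions = 1, 2;
        }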


Comment 1 Sachin Punadikar 2025-01-20 10:20:39 UTC
Hello,
The IBM Cloud team needs to use "rados_ng" as the recovery backend.

Code:
The required code is available at:
https://gitlab.cee.redhat.com/ceph/nfs-ganesha/-/tree/ceph-8.0-rhel-patches-for-ceph8.1-features  and/or
https://github.com/nfs-ganesha/nfs-ganesha (latest code has required change)

Configuration:
To use the failover functionality, some changes are required in the ganesha.conf file and on the ganesha command line.
The recovery backend should be set to "rados_ng" in the NFSv4 block (as shown below).
        NFSv4 {
           RecoveryBackend = "rados_ng";
           Minor_Versions = 1, 2;
        }
Every node (physical system / VM / container) must have a unique numeric node id (such as 1, 2, etc.), and that id should be passed to the ganesha startup command via the "-I" option (see below).
e.g.
 # env CEPH_CONF=/home/data/code/ceph8/ceph/build/ceph.conf ganesha.nfsd -L /home/data/code/ceph8/ceph/build/out/ganesha-a.log -f /home/data/code/ceph8/ceph/build/dev/ganesha.a/ganesha-a.conf -p /home/data/code/ceph8/ceph/build/out/ganesha-a.pid -I 1
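
For illustration, a sketch of starting three instances, each with its own unique node id; the paths and config file names below are placeholders rather than an actual deployment layout, and only the options already shown above are used:

 # env CEPH_CONF=/etc/ceph/ceph.conf ganesha.nfsd -f /etc/ganesha/ganesha-1.conf -L /var/log/ganesha/ganesha-1.log -p /var/run/ganesha-1.pid -I 1
 # env CEPH_CONF=/etc/ceph/ceph.conf ganesha.nfsd -f /etc/ganesha/ganesha-2.conf -L /var/log/ganesha/ganesha-2.log -p /var/run/ganesha-2.pid -I 2
 # env CEPH_CONF=/etc/ceph/ceph.conf ganesha.nfsd -f /etc/ganesha/ganesha-3.conf -L /var/log/ganesha/ganesha-3.log -p /var/run/ganesha-3.pid -I 3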

Testing of failover:
Consider Ganesha instances with nodeids 1, 2, and 3.
When you want to fail over node 1 to node 3 (a consolidated sketch follows these steps):
1. Take down the IPs running on node 1.
2. Ask the Ganesha instances with nodeids 2 and 3 to enter grace using ganesha_mgr:
   nodeid 2 - # ganesha_mgr grace "0"
   nodeid 3 - # ganesha_mgr grace "4:1"
3. After the reclaim/grace period is over, the IPs previously hosted by nodeid 1 will be served by nodeid 3.
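
Pulling those steps together, a rough shell sketch of the drill; the VIP address and interface are placeholders, the grace arguments are exactly the ones listed above, and how the IP actually moves depends on the HA tooling in use:

 # On node 1: drop the virtual IP(s) it is serving (address/interface are placeholders)
 # ip addr del 192.0.2.10/24 dev eth0

 # On node 2: tell its ganesha instance to enter grace
 # ganesha_mgr grace "0"

 # On node 3: enter grace with the argument given in step 2 above
 # ganesha_mgr grace "4:1"

 # After the reclaim/grace period ends, node 3 serves the IP(s) previously hosted by node 1
 # ip addr add 192.0.2.10/24 dev eth0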

Comment 2 Sachin Punadikar 2025-01-20 10:21:25 UTC
Next step: working on code changes to allow specifying the nodeid in the config file.

Comment 3 Sachin Punadikar 2025-01-21 15:52:39 UTC
Code changes to support specifying the nodeid in the config file have been pushed for review:
https://review.gerrithub.io/c/ffilz/nfs-ganesha/+/1207554

The nodeid can now be given in the ganesha.conf file instead of as a command-line parameter.
It should be placed in the RADOS_KV block as shown below; the nodeid is a numeric value.

        RADOS_KV {
           pool = ".nfs";
           namespace = "vstart";
           UserId = "vstart";
           nodeid = 1;
        }
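
Putting the two pieces together, a per-node ganesha.conf fragment might look like the following sketch; the pool/namespace/UserId values are the vstart examples from above rather than recommendations, and only the nodeid differs between nodes:

        # Sketch for node 1; nodes 2 and 3 use the same layout with nodeid = 2 and 3.
        NFSv4 {
           RecoveryBackend = "rados_ng";
           Minor_Versions = 1, 2;
        }

        RADOS_KV {
           pool = ".nfs";
           namespace = "vstart";
           UserId = "vstart";
           nodeid = 1;
        }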

Comment 10 errata-xmlrpc 2025-06-26 12:20:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat Ceph Storage 8.1 security, bug fix, and enhancement updates), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2025:9775