Description of problem:
By default, cephadm configures an nfs-ganesha cluster to embed the hostname inside Server_Scope. For NFSv4.1+ clients, this breaks state recovery after a ganesha instance (and its VIP) is moved to another protocol node: the client reacts to the change in "Server_Scope" by not even attempting to recover state created under the prior incarnation/epoch of the nfs-ganesha instance that moved. Instead, cephadm should configure every ganesha instance in the NFS cluster with the same Server_Scope value by embedding the name of the nfs-ganesha cluster rather than the hostname.

Version-Release number of selected component (if applicable):

How reproducible:
100% for nfs-ganesha clusters spanning multiple protocol nodes

Steps to Reproduce:
1. With the cephadm CLI, create an nfs-ganesha cluster spanning multiple protocol nodes.
2. Dump the nfs-ganesha config for each ganesha instance in the cluster and compare the values of Server_Scope. Currently the value is different for each instance because cephadm by default embeds the hostname in the Server_Scope value. (See the sketch after this section.)

Actual results:
Each ganesha instance in an nfs-ganesha cluster configured by cephadm has a unique value for Server_Scope.

Expected results:
Each ganesha instance in the same cluster must be configured to use the same value for Server_Scope.

Additional info:
These experiments were conducted using Acadia Storage clusters, but Acadia storage is not required; this is simply a bug in cephadm's default config value for Server_Scope when creating an nfs-ganesha cluster.
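A minimal reproduction sketch, assuming a two-node deployment; the cluster name "mynfs" and hosts "host1"/"host2" are placeholders, and the /var/lib/ceph path reflects the usual cephadm daemon layout (adjust the fsid and daemon name for your deployment):

    # Create an nfs-ganesha cluster placed on two protocol nodes
    ceph nfs cluster create mynfs "2 host1,host2"

    # On each protocol node, inspect the ganesha config that cephadm
    # generated for the local daemon and compare Server_Scope values
    grep -ri server_scope /var/lib/ceph/*/nfs.mynfs.*/etc/ganesha/

With the current default, the value found on each node differs because the hostname is embedded in Server_Scope.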
Please specify the severity of this bug. Severity is defined here: https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.
I set severity to medium. The default ganesha config produced by cephadm gives each ganesha instance in an nfs-ganesha cluster a unique value for "Server_Scope" (the default behavior embeds the hostname in Server_Scope), so the default nfs-ganesha config cannot support HANFS. The impact is that NFSv4.1+ clients of a ganesha instance that is moved to another node will never be able to reclaim/recover their protocol state. The problem is easily mitigated by setting a Server_Scope value that is the same for all ganesha instances; we set Server_Scope to the nfs cluster name (see the sketch below).
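A hedged sketch of the mitigation, assuming a cluster named "mynfs" (a placeholder) and using the cluster-wide user config that "ceph nfs cluster config set" distributes to all instances; whether an included user config may override settings in the NFSv4 block can depend on the ganesha version, so treat this as an illustration rather than the exact procedure we used:

    # Shared Server_Scope for every instance in the cluster
    cat > nfs-scope.conf <<'EOF'
    NFSv4 {
        # Same value on all instances; we use the nfs cluster name
        Server_Scope = "mynfs";
    }
    EOF

    # Push the user config to all ganesha instances in the cluster
    ceph nfs cluster config set mynfs -i nfs-scope.conf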
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat Ceph Storage 8.1 security, bug fix, and enhancement updates), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2025:9775