Bug 2331781

Summary: cephadm generates nfs-ganesha config with incorrect server_scope value
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: jeff.a.smith
Component: Cephadm
Assignee: Adam King <adking>
Status: CLOSED ERRATA
QA Contact: Manisha Saini <msaini>
Severity: medium
Priority: unspecified
Version: 6.0
CC: cephqe-warriors, ffilz, mobisht, msaini, tserlin
Target Release: 8.1
Hardware: All
OS: Linux
Fixed In Version: ceph-19.2.1-33.el9cp
Last Closed: 2025-06-26 12:20:06 UTC
Type: Bug

Description jeff.a.smith 2024-12-11 20:39:49 UTC
Description of problem: By default, cephadm configures an nfs-ganesha cluster so that the hostname is embedded inside Server_Scope.  For NFSv4.1+ clients, this breaks state recovery after a ganesha instance (and its VIP) moves to another protocol node.  The client reacts to the change in "Server_Scope" by not even attempting to recover the state it created under the prior incarnation/epoch of the nfs-ganesha instance that moved to the other node.

Instead, cephadm should configure every ganesha instance in the NFS cluster to use the same Server_Scope value by embedding the name of the nfs-ganesha cluster rather than the hostname.
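
For illustration, after such a change the relevant fragment of every instance's generated ganesha.conf would look roughly like the sketch below ("mynfs" is a hypothetical cluster name, and the exact block layout cephadm emits may differ):

  NFSv4 {
      # Identical on every ganesha instance of the "mynfs" cluster, so an
      # NFSv4.1+ client recognizes the server after failover and attempts
      # to reclaim its state.
      Server_Scope = "mynfs";
  }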

Version-Release number of selected component (if applicable):


How reproducible: 100% for nfs-ganesha clusters spanning multiple protocol nodes


Steps to Reproduce:
1. With the cephadm CLI, create an nfs-ganesha cluster spanning multiple protocol nodes.
2. Dump the nfs-ganesha config for each ganesha instance in the cluster and compare the value of Server_Scope (see the sketch below).  Currently the value differs for each instance because, by default, cephadm embeds the hostname within the Server_Scope value.
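
A rough sketch of those steps with the ceph CLI (the cluster name "mynfs", the host names, and the on-disk path are illustrative; the daemon directory name under /var/lib/ceph/<fsid>/ varies by deployment):

  # 1. create an nfs-ganesha cluster with a daemon on each of two protocol nodes
  ceph nfs cluster create mynfs "host1,host2"

  # confirm where the ganesha daemons were placed
  ceph orch ps --daemon-type nfs

  # 2. on each host, inspect the generated ganesha.conf and compare the scope values
  grep -ri server_scope /var/lib/ceph/*/nfs.mynfs.*/etc/ganesha/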

Actual results:
Each ganesha instance in an nfs-ganesha cluster configured by cephadm has a unique value for Server_Scope.


Expected results:
Each ganesha instance in the same cluster must be configured to use the same value for Server_Scope.

Additional info:
These experiments were conducted using Acadia Storage clusters, but Acadia Storage is not required.  This is simply a bug in cephadm's default config value for Server_Scope when creating an nfs-ganesha cluster.

Comment 1 Storage PM bot 2024-12-11 20:40:00 UTC
Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

Comment 4 jeff.a.smith 2024-12-13 19:33:06 UTC
I set severity to medium because the default ganesha config produced by cephadm yields an nfs-ganesha cluster in which each ganesha instance is configured with a unique value for "Server_Scope" (the default behavior embeds the hostname in Server_Scope).  The default nfs-ganesha config therefore cannot support HA NFS.  The impact is that NFSv4.1+ clients of a ganesha instance that is moved to another node will never be able to reclaim/recover their protocol state.

This problem is easily mitigated by setting a Server_Scope value that is the same for all ganesha instances; see the sketch below.  We set Server_Scope to the NFS cluster name.
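
For example, something along the following lines should pin the scope through the cluster's user-defined config (a hypothetical sketch: "mynfs" is an example cluster name, and whether this extra NFSv4 block merges cleanly with the block cephadm already generates would need to be verified on the deployed version):

  # write a small user-defined ganesha config fragment
  cat > mynfs-userconf.conf <<'EOF'
  NFSv4 {
      # same value for every ganesha instance in the "mynfs" cluster
      Server_Scope = "mynfs";
  }
  EOF

  # hand it to the nfs module, which distributes it to all daemons of the cluster
  ceph nfs cluster config set mynfs -i mynfs-userconf.conf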

Comment 12 errata-xmlrpc 2025-06-26 12:20:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat Ceph Storage 8.1 security, bug fix, and enhancement updates), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2025:9775