Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.
This project is now read-only. Starting Monday, February 2, please use https://ibm-ceph.atlassian.net/ for all bug tracking management.

Bug 2374362

Summary: [8.1z backport] [NFS-Ganesha] - NFS Ganesha daemon crashes after upgrade
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Hemanth Kumar <hyelloji>
Component: NFS-Ganesha
Assignee: Marcus Watts <mwatts>
Status: CLOSED ERRATA
QA Contact: Hemanth Kumar <hyelloji>
Severity: high
Docs Contact: Rivka Pollack <rpollack>
Priority: unspecified
Version: 8.1
CC: bkunal, cephqe-warriors, kkeithle, msaini, rpollack, tserlin, vereddy
Target Milestone: ---
Keywords: External
Target Release: 8.1z2
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: nfs-ganesha-6.5-25.el9cp
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 2385952 (view as bug list)
Environment:
Last Closed: 2025-08-18 14:01:31 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 2385952
Bug Blocks:

Description Hemanth Kumar 2025-06-23 17:47:53 UTC
Description of problem:
------------------------
While running negative QoS functionality tests on an NFS cluster using Ceph CLI commands (with intentionally invalid parameters), the NFS Ganesha daemon crashed after the upgrade to 8.1z1, entering an error state and dumping core.


Version-Release number of selected component (if applicable):
----------
ceph version 19.2.1-224.el9cp (7a698d1865dee2d91ba1430045db051c4def6957) squid 

Steps to Reproduce:
-----------
* Set up a Ceph NFS cluster:
ceph nfs cluster create cephfs-nfs
* Run a series of negative test CLI commands:

  #ceph nfs cluster qos enable bandwidth_control cephfs-nfs PerShare
  #ceph nfs cluster qos get cephfs-nfs
  #ceph nfs cluster qos enable bandwidth_control cephfs-nfs PerShare
  #ceph nfs cluster qos enable bandwidth_control cephfs-nfs PerClient
  #ceph nfs cluster qos enable bandwidth_control cephfs-nfs PerCluster --max_export_write_bw 10MB --max_export_read_bw 20MB
  #ceph nfs cluster qos enable bandwidth_control cephfs-nfs PerShare  --max_export_write_bw 10AB --max_export_read_bw 20XY
  #ceph -s
  #ceph nfs cluster qos disable bandwidth_control cephfs-nfs
  #ceph nfs cluster qos get cephfs-nfs
  #ceph nfs export qos enable bandwidth_control cephfs-nfs /export1 --max_export_write_bw 10MB --max_export_read_bw 10MB
  #ceph nfs export create cephfs cephfs-nfs /export1 cephfs /
  #ceph nfs export qos enable bandwidth_control cephfs-nfs /export1 --max_export_write_bw 10MB --max_export_read_bw 10MB
  #ceph nfs cluster qos enable bandwidth_control cephfs-nfs PerShare --combined-rw-bw-ctrl --max_export_write_bw 10MB --max_export_read_bw 10MB
  #ceph nfs cluster qos enable bandwidth_control cephfs-nfs PerShare --max_export_write_bw 1000000000000000000MB --max_export_read_bw 1000000000000000000MB
  #ceph nfs cluster qos enable bandwidth_control cephfs-nfs PerShare  --max_export_write_bw -100MB --max_export_read_bw -200MB
  
* Upgrade the cluster from 8.1 (ceph version 19.2.1-222.el9cp) to the 8.1z1 Early Test Build (ceph version 19.2.1-224.el9cp)

* Observe the NFS daemon enters an error state:
# ceph health detail
HEALTH_WARN 1 failed cephadm daemon(s)
[WRN] CEPHADM_FAILED_DAEMON: 1 failed cephadm daemon(s)
daemon nfs.cephfs-nfs.0.0.magna022 is in error state


From the NFS logs:
--------------------
Stack trace
Jun 23 12:48:44 magna022 ceph-45abfd24-4835-11f0-8d6d-002590fc2a2e-nfs-cephfs-nfs-0-0-magna022-kgcjkx[1542269]: 23/06/2025 12:48:44 : epoch 68594d1f : magna022 : ganesha.nfsd-2[main] insert_gsh_export :RW
 LOCK :F_DBG :Unlocked 0x7f3720a15480 (&export_by_id.eid_lock) at ./support/export_mgr.c:270
Jun 23 12:48:44 magna022 ceph-45abfd24-4835-11f0-8d6d-002590fc2a2e-nfs-cephfs-nfs-0-0-magna022-kgcjkx[1542269]: 23/06/2025 12:48:44 : epoch 68594d1f : magna022 : ganesha.nfsd-2[main] LogExportClients :RW
LOCK :F_DBG :Got read lock on 0x556c107e0ba8 (&export->exp_lock) at ./support/exports.c:344
Jun 23 12:48:44 magna022 ceph-45abfd24-4835-11f0-8d6d-002590fc2a2e-nfs-cephfs-nfs-0-0-magna022-kgcjkx[1542269]: 23/06/2025 12:48:44 : epoch 68594d1f : magna022 : ganesha.nfsd-2[main] LogExportClients :RW
LOCK :F_DBG :Unlocked 0x556c107e0ba8 (&export->exp_lock) at ./support/exports.c:357
Jun 23 12:48:44 magna022 ceph-45abfd24-4835-11f0-8d6d-002590fc2a2e-nfs-cephfs-nfs-0-0-magna022-kgcjkx[1542269]: 23/06/2025 12:48:44 : epoch 68594d1f : magna022 : ganesha.nfsd-2[main] export_commit_common :CONFIG :INFO :Export 1 created at pseudo (/export1) with path (/) and tag ((null)) perms (options=022031e0/077801e7 no_root_squash, RWrw, -4-, ---, TCP, ----,               ,         ,                ,                , expire=       0)
Jun 23 12:48:44 magna022 ceph-45abfd24-4835-11f0-8d6d-002590fc2a2e-nfs-cephfs-nfs-0-0-magna022-kgcjkx[1542269]: 23/06/2025 12:48:44 : epoch 68594d1f : magna022 : ganesha.nfsd-2[main] export_commit_common :CONFIG :INFO :Export 1 has 0 defined clients
Jun 23 12:48:44 magna022 ceph-45abfd24-4835-11f0-8d6d-002590fc2a2e-nfs-cephfs-nfs-0-0-magna022-kgcjkx[1542269]: 23/06/2025 12:48:44 : epoch 68594d1f : magna022 : ganesha.nfsd-2[main] export_commit_common :EXPORT :F_DBG :put export ref for id 1 /export1, exp_refcount = 1
Jun 23 12:48:44 magna022 ceph-45abfd24-4835-11f0-8d6d-002590fc2a2e-nfs-cephfs-nfs-0-0-magna022-kgcjkx[1542269]: 23/06/2025 12:48:44 : epoch 68594d1f : magna022 : ganesha.nfsd-2[main] export_display :EXPORT :M_DBG :RESULT 0x556c107e0a30 Export 1 pseudo (/export1) with path (/) and tag ((null)) perms (options=022031e0/077801e7 no_root_squash, RWrw, -4-, ---, TCP, ----,               ,         ,                ,                , expire=       0)
Jun 23 12:48:44 magna022 ceph-45abfd24-4835-11f0-8d6d-002590fc2a2e-nfs-cephfs-nfs-0-0-magna022-kgcjkx[1542269]: 23/06/2025 12:48:44 : epoch 68594d1f : magna022 : ganesha.nfsd-2[main] proc_block :CONFIG :F_DBG :------ At ("rados://.nfs/cephfs-nfs/export-1":1): process block EXPORT link_mem = 0x7f3720964c5e
Jun 23 12:48:44 magna022 ceph-45abfd24-4835-11f0-8d6d-002590fc2a2e-nfs-cephfs-nfs-0-0-magna022-kgcjkx[1542269]: 23/06/2025 12:48:44 : epoch 68594d1f : magna022 : ganesha.nfsd-2[main] proc_block :CONFIG :F_DBG :------ At ("rados://.nfs/cephfs-nfs/export-1":1): do_block_init EXPORT
Jun 23 12:48:45 magna022 systemd-coredump[1542331]: Process 1542273 (ganesha.nfsd) of user 0 dumped core.

                                                    Stack trace of thread 2:
                                                    #0  0x00007f3720665faa n/a (/usr/lib64/libc.so.6 + 0xacfaa)
                                                    #1  0x00007f371e8e4b36 n/a (n/a + 0x0)
                                                    ELF object binary architecture: AMD x86-64
Jun 23 12:48:46 magna022 podman[1542336]: 2025-06-23 12:48:46.120144576 +0000 UTC m=+0.029490491 container died 5e1cea8a21d6cbf15a19e94df84268dff8eddf9b34b979ac3dd7f3014486fbeb (image=cp.stg.icr.io/cp/ibm-ceph/ceph-8-rhel9@sha256:15e81a41f0c70565818a9c78cb79edf821606255f0ff42b298588586407f836f, name=ceph-45abfd24-4835-11f0-8d6d-002590fc2a2e-nfs-cephfs-nfs-0-0-magna022-kgcjkx, GIT_REPO=https://github.com/ceph/ceph-container.git, io.k8s.display-name=IBM Storage Ceph 8, summary=Provides the latest IBM Storage Ceph 8 in a fully featured and supported base image., GIT_COMMIT=eadbe5f6c4471e17c1721f9f08dde7964a4f491b, maintainer=Guillaume Abrioux <gabrioux>, vcs-type=git, distribution-scope=public, url=https://access.redhat.com/containers/#/registry.access.redhat.com/ibm-ceph/images/8-155.1.TEST.8.1z1, vendor=Red Hat, Inc., ceph=True, RELEASE=main, version=8, io.k8s.description=IBM Storage Ceph 8, GIT_BRANCH=main, CEPH_POINT_RELEASE=, com.redhat.license_terms=https://www.redhat.com/agreements, com.redhat.component=ibm-ceph-container, io.buildah.version=1.33.12, vcs-ref=3ece9b835558c52d067ecdc55b3bd164410a357f, GIT_CLEAN=True, build-date=2025-06-20T21:01:18, description=IBM Storage Ceph 8, io.openshift.expose-services=, io.openshift.tags=ibm ceph, architecture=x86_64, name=ibm-ceph, release=155.1.TEST.8.1z1)
Jun 23 12:48:46 magna022 podman[1542336]: 2025-06-23 12:48:46.658236769 +0000 UTC m=+0.567582669 container remove 5e1cea8a21d6cbf15a19e94df84268dff8eddf9b34b979ac3dd7f3014486fbeb (image=cp.stg.icr.io/cp/ibm-ceph/ceph-8-rhel9@sha256:15e81a41f0c70565818a9c78cb79edf821606255f0ff42b298588586407f836f, name=ceph-45abfd24-4835-11f0-8d6d-002590fc2a2e-nfs-cephfs-nfs-0-0-magna022-kgcjkx, summary=Provides the latest IBM Storage Ceph 8 in a fully featured and supported base image., version=8, release=155.1.TEST.8.1z1, RELEASE=main, CEPH_POINT_RELEASE=, distribution-scope=public, GIT_CLEAN=True, io.k8s.description=IBM Storage Ceph 8, GIT_REPO=https://github.com/ceph/ceph-container.git, url=https://access.redhat.com/containers/#/registry.access.redhat.com/ibm-ceph/images/8-155.1.TEST.8.1z1, GIT_BRANCH=main, vendor=Red Hat, Inc., GIT_COMMIT=eadbe5f6c4471e17c1721f9f08dde7964a4f491b, io.k8s.display-name=IBM Storage Ceph 8, architecture=x86_64, com.redhat.license_terms=https://www.redhat.com/agreements, build-date=2025-06-20T21:01:18, maintainer=Guillaume Abrioux <gabrioux>, io.buildah.version=1.33.12, vcs-type=git, name=ibm-ceph, io.openshift.tags=ibm ceph, ceph=True, vcs-ref=3ece9b835558c52d067ecdc55b3bd164410a357f, io.openshift.expose-services=, description=IBM Storage Ceph 8, com.redhat.component=ibm-ceph-container)


bt of the core:
-----------------
(gdb) bt
#0  0x00007fae48607faa in __strlen_sse2 () from /lib64/libc.so.6
#1  0x00007fae485bb6c8 in __vfprintf_internal () from /lib64/libc.so.6
#2  0x00007fae485da55a in __vsnprintf_internal () from /lib64/libc.so.6
#3  0x00007fae4881d152 in vsnprintf (__ap=0x7fff00613ba0, __fmt=0x7fae48922a50 "%s %p Export %d pseudo (%s) with path (%s) and tag (%s) perms (%s)", __n=1947, __s=<optimized out>) at /usr/include/bits/stdio2.h:68
#4  display_vprintf (dspbuf=dspbuf@entry=0x7fff00613a60, fmt=fmt@entry=0x7fae48922a50 "%s %p Export %d pseudo (%s) with path (%s) and tag (%s) perms (%s)", args=args@entry=0x7fff00613ba0)
    at /usr/src/debug/nfs-ganesha-6.5-20.el9cp.x86_64/src/log/display.c:310
#5  0x00007fae4881de06 in display_log_component_level (component=COMPONENT_EXPORT, file=<optimized out>, line=1689, function=0x7fae48921dd8 <__func__.36.lto_priv.0> "export_display", level=NIV_MID_DEBUG,
    format=0x7fae48922a50 "%s %p Export %d pseudo (%s) with path (%s) and tag (%s) perms (%s)", arguments=0x7fff00613ba0) at /usr/src/debug/nfs-ganesha-6.5-20.el9cp.x86_64/src/log/log_functions.c:1423
#6  0x00007fae4881e39b in DisplayLogComponentLevel (component=<optimized out>, file=<optimized out>, line=<optimized out>, function=<optimized out>, level=<optimized out>, format=<optimized out>)
    at /usr/src/debug/nfs-ganesha-6.5-20.el9cp.x86_64/src/log/log_functions.c:1629
#7  0x00007fae48852b7e in export_display (step=0x7fae4891015f "DEFAULTS", node=<optimized out>, link_mem=<optimized out>, self_struct=0x7fae48906c5e) at /usr/src/debug/nfs-ganesha-6.5-20.el9cp.x86_64/src/support/exports.c:1689
#8  0x00007fae488075ca in proc_block (node=node@entry=0x55f81d99ead0, item=item@entry=0x7fae48984928 <unrelax_export_param.lto_priv+8>, link_mem=link_mem@entry=0x7fae48906c5e, err_type=err_type@entry=0x7fff00614810)
    at /usr/src/debug/nfs-ganesha-6.5-20.el9cp.x86_64/src/config_parsing/config_parsing.c:1354
#9  0x00007fae48808939 in proc_block (err_type=0x7fff00614810, link_mem=0x7fae48906c5e, item=0x7fae48984928 <unrelax_export_param.lto_priv+8>, node=0x55f81d99ead0)
    at /usr/src/debug/nfs-ganesha-6.5-20.el9cp.x86_64/src/config_parsing/config_parsing.c:1994
#10 load_config_from_parse (config=0x55f81d7ffe20, conf_blk=0x7fae48984920 <unrelax_export_param.lto_priv>, param=0x7fae48906c5e, unique=<optimized out>, err_type=0x7fff00614810)
    at /usr/src/debug/nfs-ganesha-6.5-20.el9cp.x86_64/src/config_parsing/config_parsing.c:1994
#11 0x00007fae48854f3b in ReadExports (in_config=0x55f81d7ffe20, err_type=0x7fff00614810) at /usr/src/debug/nfs-ganesha-6.5-20.el9cp.x86_64/src/support/exports.c:2739
#12 0x000055f81bcdc07e in main (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/nfs-ganesha-6.5-20.el9cp.x86_64/src/MainNFSD/nfs_main.c:655
(gdb)


Actual results:
----------------
The daemon crashes and dumps core on startup, possibly due to a bad configuration; the most likely trigger is the malformed parameters persisted earlier, though this is not confirmed.

Expected results:
-----------------
The daemon should not crash due to incorrect user input.


Logs and Cores are copied here : http://magna002.ceph.redhat.com/ceph-qe-logs/hemanth_kumar/BZ-Ceph-Upgrade-Crash/

Comment 3 Naresh 2025-07-01 13:28:31 UTC
This issue is not related to QoS.

There are a few wrong values in the export block, which are causing the issue.

Please change the bug title to remove QoS from it.

I am currently checking the core.

Comment 4 Naresh 2025-07-04 13:04:00 UTC
This issue is caused by the BYOK (Bring Your Own Key) changes.

In these changes, export details are printed even before the load has completed.

We have the fix for this issue.

Please change the bug description and remove QoS from it.

Comment 12 errata-xmlrpc 2025-08-18 14:01:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 8.1 security and bug fix updates), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2025:14015