Bug 2374362
| Summary: | [8.1z backport] [NFS-Ganesha] - NFS Ganesha daemon crashes after upgrade | |||
|---|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Hemanth Kumar <hyelloji> | |
| Component: | NFS-Ganesha | Assignee: | Marcus Watts <mwatts> | |
| Status: | CLOSED ERRATA | QA Contact: | Hemanth Kumar <hyelloji> | |
| Severity: | high | Docs Contact: | Rivka Pollack <rpollack> | |
| Priority: | unspecified | |||
| Version: | 8.1 | CC: | bkunal, cephqe-warriors, kkeithle, msaini, rpollack, tserlin, vereddy | |
| Target Milestone: | --- | Keywords: | External | |
| Target Release: | 8.1z2 | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | nfs-ganesha-6.5-25.el9cp | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 2385952 (view as bug list) | Environment: | ||
| Last Closed: | 2025-08-18 14:01:31 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | 2385952 | |||
| Bug Blocks: | ||||
This issue is not related to QoS. There are a few wrong values in the export block, which is causing the issue. Please change the bug title to remove QoS from it. I am currently checking the core.

This issue is caused by the BYOK (Bring Your Own Key) changes: in those code changes, export details are printed even before the load has completed. We have a fix for this issue. Please update the bug description to remove QoS from it.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat Ceph Storage 8.1 security and bug fix updates), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2025:14015
Description of problem:
------------------------
While running negative QoS functionality tests on an NFS cluster using Ceph CLI commands (with intentionally invalid parameters), the NFS Ganesha daemon crashed after the upgrade to 8.1z1, entering an error state and dumping core.

Version-Release number of selected component (if applicable):
----------
ceph version 19.2.1-224.el9cp (7a698d1865dee2d91ba1430045db051c4def6957) squid

Steps to Reproduce:
-----------
* Set up a Ceph NFS cluster:

```
# ceph nfs cluster create cephfs-nfs
```

* Run a series of negative test CLI commands:

```
# ceph nfs cluster qos enable bandwidth_control cephfs-nfs PerShare
# ceph nfs cluster qos get cephfs-nfs
# ceph nfs cluster qos enable bandwidth_control cephfs-nfs PerShare
# ceph nfs cluster qos enable bandwidth_control cephfs-nfs PerClient
# ceph nfs cluster qos enable bandwidth_control cephfs-nfs PerCluster --max_export_write_bw 10MB --max_export_read_bw 20MB
# ceph nfs cluster qos enable bandwidth_control cephfs-nfs PerShare --max_export_write_bw 10AB --max_export_read_bw 20XY
# ceph -s
# ceph nfs cluster qos disable bandwidth_control cephfs-nfs
# ceph nfs cluster qos get cephfs-nfs
# ceph nfs export qos enable bandwidth_control cephfs-nfs /export1 --max_export_write_bw 10MB --max_export_read_bw 10MB
# ceph nfs export create cephfs cephfs-nfs /export1 cephfs /
# ceph nfs export qos enable bandwidth_control cephfs-nfs /export1 --max_export_write_bw 10MB --max_export_read_bw 10MB
# ceph nfs cluster qos enable bandwidth_control cephfs-nfs PerShare --combined-rw-bw-ctrl --max_export_write_bw 10MB --max_export_read_bw 10MB
# ceph nfs cluster qos enable bandwidth_control cephfs-nfs PerShare --max_export_write_bw 1000000000000000000MB --max_export_read_bw 1000000000000000000MB
# ceph nfs cluster qos enable bandwidth_control cephfs-nfs PerShare --max_export_write_bw -100MB --max_export_read_bw -200MB
```

* Upgrade the cluster from 8.1 (ceph version 19.2.1-222.el9cp) to the 8.1z1 Early Test Build (ceph version 19.2.1-224.el9cp)
Observe the NFS daemon enter an error state:

```
# ceph health detail
HEALTH_WARN 1 failed cephadm daemon(s)
[WRN] CEPHADM_FAILED_DAEMON: 1 failed cephadm daemon(s)
    daemon nfs.cephfs-nfs.0.0.magna022 is in error state
```

From the NFS logs:
--------------------

```
Jun 23 12:48:44 magna022 ceph-45abfd24-4835-11f0-8d6d-002590fc2a2e-nfs-cephfs-nfs-0-0-magna022-kgcjkx[1542269]: 23/06/2025 12:48:44 : epoch 68594d1f : magna022 : ganesha.nfsd-2[main] insert_gsh_export :RW LOCK :F_DBG :Unlocked 0x7f3720a15480 (&export_by_id.eid_lock) at ./support/export_mgr.c:270
Jun 23 12:48:44 magna022 ceph-45abfd24-4835-11f0-8d6d-002590fc2a2e-nfs-cephfs-nfs-0-0-magna022-kgcjkx[1542269]: 23/06/2025 12:48:44 : epoch 68594d1f : magna022 : ganesha.nfsd-2[main] LogExportClients :RW LOCK :F_DBG :Got read lock on 0x556c107e0ba8 (&export->exp_lock) at ./support/exports.c:344
Jun 23 12:48:44 magna022 ceph-45abfd24-4835-11f0-8d6d-002590fc2a2e-nfs-cephfs-nfs-0-0-magna022-kgcjkx[1542269]: 23/06/2025 12:48:44 : epoch 68594d1f : magna022 : ganesha.nfsd-2[main] LogExportClients :RW LOCK :F_DBG :Unlocked 0x556c107e0ba8 (&export->exp_lock) at ./support/exports.c:357
Jun 23 12:48:44 magna022 ceph-45abfd24-4835-11f0-8d6d-002590fc2a2e-nfs-cephfs-nfs-0-0-magna022-kgcjkx[1542269]: 23/06/2025 12:48:44 : epoch 68594d1f : magna022 : ganesha.nfsd-2[main] export_commit_common :CONFIG :INFO :Export 1 created at pseudo (/export1) with path (/) and tag ((null)) perms (options=022031e0/077801e7 no_root_squash, RWrw, -4-, ---, TCP, ----, , , , , expire= 0)
Jun 23 12:48:44 magna022 ceph-45abfd24-4835-11f0-8d6d-002590fc2a2e-nfs-cephfs-nfs-0-0-magna022-kgcjkx[1542269]: 23/06/2025 12:48:44 : epoch 68594d1f : magna022 : ganesha.nfsd-2[main] export_commit_common :CONFIG :INFO :Export 1 has 0 defined clients
Jun 23 12:48:44 magna022 ceph-45abfd24-4835-11f0-8d6d-002590fc2a2e-nfs-cephfs-nfs-0-0-magna022-kgcjkx[1542269]: 23/06/2025 12:48:44 : epoch 68594d1f : magna022 : ganesha.nfsd-2[main] export_commit_common :EXPORT :F_DBG :put export ref for id 1 /export1, exp_refcount = 1
Jun 23 12:48:44 magna022 ceph-45abfd24-4835-11f0-8d6d-002590fc2a2e-nfs-cephfs-nfs-0-0-magna022-kgcjkx[1542269]: 23/06/2025 12:48:44 : epoch 68594d1f : magna022 : ganesha.nfsd-2[main] export_display :EXPORT :M_DBG :RESULT 0x556c107e0a30 Export 1 pseudo (/export1) with path (/) and tag ((null)) perms (options=022031e0/077801e7 no_root_squash, RWrw, -4-, ---, TCP, ----, , , , , expire= 0)
Jun 23 12:48:44 magna022 ceph-45abfd24-4835-11f0-8d6d-002590fc2a2e-nfs-cephfs-nfs-0-0-magna022-kgcjkx[1542269]: 23/06/2025 12:48:44 : epoch 68594d1f : magna022 : ganesha.nfsd-2[main] proc_block :CONFIG :F_DBG :------ At ("rados://.nfs/cephfs-nfs/export-1":1): process block EXPORT link_mem = 0x7f3720964c5e
Jun 23 12:48:44 magna022 ceph-45abfd24-4835-11f0-8d6d-002590fc2a2e-nfs-cephfs-nfs-0-0-magna022-kgcjkx[1542269]: 23/06/2025 12:48:44 : epoch 68594d1f : magna022 : ganesha.nfsd-2[main] proc_block :CONFIG :F_DBG :------ At ("rados://.nfs/cephfs-nfs/export-1":1): do_block_init EXPORT
Jun 23 12:48:45 magna022 systemd-coredump[1542331]: Process 1542273 (ganesha.nfsd) of user 0 dumped core.
                                                    Stack trace of thread 2:
                                                    #0  0x00007f3720665faa n/a (/usr/lib64/libc.so.6 + 0xacfaa)
                                                    #1  0x00007f371e8e4b36 n/a (n/a + 0x0)
                                                    ELF object binary architecture: AMD x86-64
Jun 23 12:48:46 magna022 podman[1542336]: 2025-06-23 12:48:46.120144576 +0000 UTC m=+0.029490491 container died 5e1cea8a21d6cbf15a19e94df84268dff8eddf9b34b979ac3dd7f3014486fbeb (image=cp.stg.icr.io/cp/ibm-ceph/ceph-8-rhel9@sha256:15e81a41f0c70565818a9c78cb79edf821606255f0ff42b298588586407f836f, name=ceph-45abfd24-4835-11f0-8d6d-002590fc2a2e-nfs-cephfs-nfs-0-0-magna022-kgcjkx, GIT_REPO=https://github.com/ceph/ceph-container.git, io.k8s.display-name=IBM Storage Ceph 8, summary=Provides the latest IBM Storage Ceph 8 in a fully featured and supported base image., GIT_COMMIT=eadbe5f6c4471e17c1721f9f08dde7964a4f491b, maintainer=Guillaume Abrioux <gabrioux>, vcs-type=git, distribution-scope=public, url=https://access.redhat.com/containers/#/registry.access.redhat.com/ibm-ceph/images/8-155.1.TEST.8.1z1, vendor=Red Hat, Inc., ceph=True, RELEASE=main, version=8, io.k8s.description=IBM Storage Ceph 8, GIT_BRANCH=main, CEPH_POINT_RELEASE=, com.redhat.license_terms=https://www.redhat.com/agreements, com.redhat.component=ibm-ceph-container, io.buildah.version=1.33.12, vcs-ref=3ece9b835558c52d067ecdc55b3bd164410a357f, GIT_CLEAN=True, build-date=2025-06-20T21:01:18, description=IBM Storage Ceph 8, io.openshift.expose-services=, io.openshift.tags=ibm ceph, architecture=x86_64, name=ibm-ceph, release=155.1.TEST.8.1z1)
Jun 23 12:48:46 magna022 podman[1542336]: 2025-06-23 12:48:46.658236769 +0000 UTC m=+0.567582669 container remove 5e1cea8a21d6cbf15a19e94df84268dff8eddf9b34b979ac3dd7f3014486fbeb (image=cp.stg.icr.io/cp/ibm-ceph/ceph-8-rhel9@sha256:15e81a41f0c70565818a9c78cb79edf821606255f0ff42b298588586407f836f, name=ceph-45abfd24-4835-11f0-8d6d-002590fc2a2e-nfs-cephfs-nfs-0-0-magna022-kgcjkx, summary=Provides the latest IBM Storage Ceph 8 in a fully featured and supported base image., version=8, release=155.1.TEST.8.1z1, RELEASE=main, CEPH_POINT_RELEASE=, distribution-scope=public, GIT_CLEAN=True, io.k8s.description=IBM Storage Ceph 8, GIT_REPO=https://github.com/ceph/ceph-container.git, url=https://access.redhat.com/containers/#/registry.access.redhat.com/ibm-ceph/images/8-155.1.TEST.8.1z1, GIT_BRANCH=main, vendor=Red Hat, Inc., GIT_COMMIT=eadbe5f6c4471e17c1721f9f08dde7964a4f491b, io.k8s.display-name=IBM Storage Ceph 8, architecture=x86_64, com.redhat.license_terms=https://www.redhat.com/agreements, build-date=2025-06-20T21:01:18, maintainer=Guillaume Abrioux <gabrioux>, io.buildah.version=1.33.12, vcs-type=git, name=ibm-ceph, io.openshift.tags=ibm ceph, ceph=True, vcs-ref=3ece9b835558c52d067ecdc55b3bd164410a357f, io.openshift.expose-services=, description=IBM Storage Ceph 8, com.redhat.component=ibm-ceph-container)
```

bt of the core:
-----------------

```
(gdb) bt
#0  0x00007fae48607faa in __strlen_sse2 () from /lib64/libc.so.6
#1  0x00007fae485bb6c8 in __vfprintf_internal () from /lib64/libc.so.6
#2  0x00007fae485da55a in __vsnprintf_internal () from /lib64/libc.so.6
#3  0x00007fae4881d152 in vsnprintf (__ap=0x7fff00613ba0, __fmt=0x7fae48922a50 "%s %p Export %d pseudo (%s) with path (%s) and tag (%s) perms (%s)", __n=1947, __s=<optimized out>) at /usr/include/bits/stdio2.h:68
#4  display_vprintf (dspbuf=dspbuf@entry=0x7fff00613a60, fmt=fmt@entry=0x7fae48922a50 "%s %p Export %d pseudo (%s) with path (%s) and tag (%s) perms (%s)", args=args@entry=0x7fff00613ba0) at /usr/src/debug/nfs-ganesha-6.5-20.el9cp.x86_64/src/log/display.c:310
#5  0x00007fae4881de06 in display_log_component_level (component=COMPONENT_EXPORT, file=<optimized out>, line=1689, function=0x7fae48921dd8 <__func__.36.lto_priv.0> "export_display", level=NIV_MID_DEBUG, format=0x7fae48922a50 "%s %p Export %d pseudo (%s) with path (%s) and tag (%s) perms (%s)", arguments=0x7fff00613ba0) at /usr/src/debug/nfs-ganesha-6.5-20.el9cp.x86_64/src/log/log_functions.c:1423
#6  0x00007fae4881e39b in DisplayLogComponentLevel (component=<optimized out>, file=<optimized out>, line=<optimized out>, function=<optimized out>, level=<optimized out>, format=<optimized out>) at /usr/src/debug/nfs-ganesha-6.5-20.el9cp.x86_64/src/log/log_functions.c:1629
#7  0x00007fae48852b7e in export_display (step=0x7fae4891015f "DEFAULTS", node=<optimized out>, link_mem=<optimized out>, self_struct=0x7fae48906c5e) at /usr/src/debug/nfs-ganesha-6.5-20.el9cp.x86_64/src/support/exports.c:1689
#8  0x00007fae488075ca in proc_block (node=node@entry=0x55f81d99ead0, item=item@entry=0x7fae48984928 <unrelax_export_param.lto_priv+8>, link_mem=link_mem@entry=0x7fae48906c5e, err_type=err_type@entry=0x7fff00614810) at /usr/src/debug/nfs-ganesha-6.5-20.el9cp.x86_64/src/config_parsing/config_parsing.c:1354
#9  0x00007fae48808939 in proc_block (err_type=0x7fff00614810, link_mem=0x7fae48906c5e, item=0x7fae48984928 <unrelax_export_param.lto_priv+8>, node=0x55f81d99ead0) at /usr/src/debug/nfs-ganesha-6.5-20.el9cp.x86_64/src/config_parsing/config_parsing.c:1994
#10 load_config_from_parse (config=0x55f81d7ffe20, conf_blk=0x7fae48984920 <unrelax_export_param.lto_priv>, param=0x7fae48906c5e, unique=<optimized out>, err_type=0x7fff00614810) at /usr/src/debug/nfs-ganesha-6.5-20.el9cp.x86_64/src/config_parsing/config_parsing.c:1994
#11 0x00007fae48854f3b in ReadExports (in_config=0x55f81d7ffe20, err_type=0x7fff00614810) at /usr/src/debug/nfs-ganesha-6.5-20.el9cp.x86_64/src/support/exports.c:2739
#12 0x000055f81bcdc07e in main (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/nfs-ganesha-6.5-20.el9cp.x86_64/src/MainNFSD/nfs_main.c:655
(gdb)
```

Actual results:
----------------
The daemon crashes and dumps core on startup, most likely because the malformed parameters persisted earlier left a bad configuration behind; the exact trigger has not been confirmed.

Expected results:
-----------------
The daemon should not crash due to incorrect user input.
Logs and Cores are copied here : http://magna002.ceph.redhat.com/ceph-qe-logs/hemanth_kumar/BZ-Ceph-Upgrade-Crash/