Description of problem:
=================
Deployed NFS-Ganesha using nfs-ganesha-6.5-1.2.el9cp, added QoS-related parameters to the ganesha.conf file, created the export, and initiated IO operations from the mount point. The NFS-Ganesha service crashed with a segmentation fault and generated a core dump.

---
Core was generated by `/usr/bin/ganesha.nfsd -F -L STDERR -N NIV_EVENT'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  setNode_pc (qos_block=0x0, client_addr=<optimized out>, node=0x7f3b74012ac0) at /usr/src/debug/nfs-ganesha-6.5-1.2.el9cp.x86_64/src/MainNFSD/nfs_qos.c:385
385             node->combined_rw_bw_control = qos_block->combined_rw_bw_control;
[Current thread is 1 (LWP 77)]
(gdb) bt
#0  setNode_pc (qos_block=0x0, client_addr=<optimized out>, node=0x7f3b74012ac0) at /usr/src/debug/nfs-ganesha-6.5-1.2.el9cp.x86_64/src/MainNFSD/nfs_qos.c:385
#1  pspc_allocate_and_init_client (qos_block=0x0, client_addr=<optimized out>) at /usr/src/debug/nfs-ganesha-6.5-1.2.el9cp.x86_64/src/MainNFSD/nfs_qos.c:434
#2  pspc_alloc_init_add_client (qos_block=0x0, client_addr=<optimized out>, head=0x7f3b740089d8) at /usr/src/debug/nfs-ganesha-6.5-1.2.el9cp.x86_64/src/MainNFSD/nfs_qos.c:453
#3  QoS_Process_pspc (op_type=1, data=0x7f3b7400ccd0, caller_data=0x7f3b7400afd0, size=<optimized out>) at /usr/src/debug/nfs-ganesha-6.5-1.2.el9cp.x86_64/src/MainNFSD/nfs_qos.c:831
#4  QoS_Process (size=<optimized out>, caller_data=0x7f3b7400afd0, data=0x7f3b7400ccd0, op_type=1) at /usr/src/debug/nfs-ganesha-6.5-1.2.el9cp.x86_64/src/MainNFSD/nfs_qos.c:858
#5  0x00007f3b931eef94 in nfs4_op_write (op=0x7f3b7400c6b0, data=0x7f3b7400ccd0, resp=0x7f3b74012a10) at /usr/src/debug/nfs-ganesha-6.5-1.2.el9cp.x86_64/src/Protocols/NFS/nfs4_op_write.c:501
#6  0x00007f3b931d0485 in process_one_op (data=data@entry=0x7f3b7400ccd0, status=status@entry=0x7f3b397f864c) at /usr/src/debug/nfs-ganesha-6.5-1.2.el9cp.x86_64/src/Protocols/NFS/nfs4_Compound.c:905
#7  0x00007f3b931d1288 in nfs4_Compound (arg=<optimized out>, req=0x7f3b74009670, res=0x7f3b74012680) at /usr/src/debug/nfs-ganesha-6.5-1.2.el9cp.x86_64/src/Protocols/NFS/nfs4_Compound.c:1386
#8  0x00007f3b93150bc5 in nfs_rpc_process_request (reqdata=<optimized out>, retry=<optimized out>) at /usr/src/debug/nfs-ganesha-6.5-1.2.el9cp.x86_64/src/MainNFSD/nfs_worker_thread.c:1479
#9  0x00007f3b92ea25e7 in svc_request (xprt=0x7f3b40001330, xdrs=<optimized out>) at /usr/src/debug/libntirpc-6.3-1.el9cp.x86_64/src/svc_rqst.c:1229
#10 0x00007f3b92ea6e5a in svc_rqst_xprt_task_recv (wpe=<optimized out>) at /usr/src/debug/libntirpc-6.3-1.el9cp.x86_64/src/svc_rqst.c:1210
#11 0x00007f3b92ea991b in svc_rqst_epoll_loop (wpe=0x564e192190b8) at /usr/src/debug/libntirpc-6.3-1.el9cp.x86_64/src/svc_rqst.c:1585
#12 0x00007f3b92eb2cbc in work_pool_thread (arg=0x7f3b40016460) at /usr/src/debug/libntirpc-6.3-1.el9cp.x86_64/src/work_pool.c:187
#13 0x00007f3b92f52d22 in pthread_detach.5 () from /lib64/libc.so.6
#14 0x0000000000000000 in ?? ()
(gdb)
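The backtrace points to a plain NULL-pointer dereference in the QoS path: an NFSv4 WRITE enters QoS_Process/QoS_Process_pspc, which passes qos_block=0x0 down through pspc_alloc_init_add_client and pspc_allocate_and_init_client, and setNode_pc then dereferences the NULL block at nfs_qos.c:385 while copying combined_rw_bw_control into the per-client node. The C sketch below only illustrates that failure pattern and a defensive guard, under assumed, simplified definitions: the qos_block/client_node structs and set_node_* functions here are hypothetical stand-ins, not the real nfs-ganesha code, and the fix shipped in the errata may instead make sure a valid QoS block is resolved before this point.

/*
 * Illustrative sketch only. Field names (combined_rw_bw_control,
 * max_client_write_bw, max_client_read_bw) come from the backtrace and the
 * QOS_DEFAULT_CONFIG block below; everything else is a simplified stand-in
 * for the definitions in src/MainNFSD/nfs_qos.c.
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct qos_block {                /* hypothetical subset of the QoS config block */
        bool combined_rw_bw_control;
        uint64_t max_client_write_bw;
        uint64_t max_client_read_bw;
};

struct client_node {              /* hypothetical per-client QoS state */
        bool combined_rw_bw_control;
        uint64_t write_bw_limit;
        uint64_t read_bw_limit;
};

/* Crash pattern seen in frame #0: the QoS block is dereferenced without a
 * guard, so a NULL block (qos_block=0x0 in the core) faults immediately. */
void set_node_unchecked(struct client_node *node,
                        const struct qos_block *qos_block)
{
        /* SIGSEGV here when qos_block == NULL */
        node->combined_rw_bw_control = qos_block->combined_rw_bw_control;
}

/* Defensive variant: report "no QoS block" to the caller so the WRITE path
 * can skip QoS enforcement instead of crashing the whole server. */
int set_node_checked(struct client_node *node,
                     const struct qos_block *qos_block)
{
        if (node == NULL || qos_block == NULL)
                return -1;        /* caller treats this as QoS not configured */

        node->combined_rw_bw_control = qos_block->combined_rw_bw_control;
        node->write_bw_limit = qos_block->max_client_write_bw;
        node->read_bw_limit = qos_block->max_client_read_bw;
        return 0;
}

Whichever approach the actual fix takes, the invariant is the same: per-client QoS state must never be initialized from an unresolved (NULL) QoS block reached from the WRITE path.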
----- ganesha.log ----
Jan 27 12:27:42 ceph-manisaini-su4kp8-node2 ceph-2fefc25c-d8bb-11ef-a07d-fa163e4cf23a-nfs-nfsganesha-0-0-ceph-manisaini-su4kp8-node2-wdszsj[329222]: 27/01/2025 12:27:42 : epoch 67977bbd : ceph-manisaini-su4kp8-node2 : ganesha.nfsd-2[main] nfs_start :NFS STARTUP :EVENT :-------------------------------------------------
Jan 27 12:27:42 ceph-manisaini-su4kp8-node2 ceph-2fefc25c-d8bb-11ef-a07d-fa163e4cf23a-nfs-nfsganesha-0-0-ceph-manisaini-su4kp8-node2-wdszsj[329222]: 27/01/2025 12:27:42 : epoch 67977bbd : ceph-manisaini-su4kp8-node2 : ganesha.nfsd-2[main] nfs_start :NFS STARTUP :EVENT : NFS SERVER INITIALIZED
Jan 27 12:27:42 ceph-manisaini-su4kp8-node2 ceph-2fefc25c-d8bb-11ef-a07d-fa163e4cf23a-nfs-nfsganesha-0-0-ceph-manisaini-su4kp8-node2-wdszsj[329222]: 27/01/2025 12:27:42 : epoch 67977bbd : ceph-manisaini-su4kp8-node2 : ganesha.nfsd-2[main] nfs_start :NFS STARTUP :EVENT :-------------------------------------------------
Jan 27 12:27:46 ceph-manisaini-su4kp8-node2 systemd-coredump[329301]: Process 329226 (ganesha.nfsd) of user 0 dumped core.
    Stack trace of thread 46:
    #0 0x00007f82a294ed11 n/a (/usr/lib64/libganesha_nfsd.so.6.5 + 0x5ad11)
    ELF object binary architecture: AMD x86-64
Jan 27 12:27:46 ceph-manisaini-su4kp8-node2 podman[329306]: 2025-01-27 12:27:46.779033541 +0000 UTC m=+0.035736089 container died 2cc5d0c75c5318042b9492be0a36dbc079f50f0bf0c8e925dba4ca7ec7059d07 (image=registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:5c3a4ff92a3205922f1b4d25b43864013bd145a415cf922ff2e4fb33db5818e7, name=ceph-2fefc25c-d8bb-11ef-a07d-fa163e4cf23a-nfs-nfsganesha-0-0-ceph-manisaini-su4kp8-node2-wdszsj, summary=Provides the latest Red Hat Ceph Storage 8 on RHEL 9 in a fully featured and supported base image., GIT_CLEAN=True, GIT_REPO=https://github.com/ceph/ceph-container.git, CEPH_POINT_RELEASE=, GIT_BRANCH=main, com.redhat.license_terms=https://www.redhat.com/agreements, io.k8s.description=Red Hat Ceph Storage 8, io.openshift.tags=rhceph ceph, vcs-type=git, vcs-ref=6bc17c430374b15a8dee08107281b6a4fa5b8ce9, RELEASE=main, ceph=True, com.redhat.component=rhceph-container, build-date=2025-01-20T13:41:51, io.openshift.expose-services=, vendor=Red Hat, Inc., distribution-scope=public, version=8, name=rhceph, release=228, description=Red Hat Ceph Storage 8, io.buildah.version=1.33.8, maintainer=Guillaume Abrioux <gabrioux>, GIT_COMMIT=55ad0f204a1d654ee565abf874aecad0cc209d0e, architecture=x86_64, io.k8s.display-name=Red Hat Ceph Storage 8 on RHEL 9, url=https://access.redhat.com/containers/#/registry.access.redhat.com/rhceph/images/8-228)
Jan 27 12:27:46 ceph-manisaini-su4kp8-node2 podman[329306]: 2025-01-27 12:27:46.81613975 +0000 UTC m=+0.072842290 container remove 2cc5d0c75c5318042b9492be0a36dbc079f50f0bf0c8e925dba4ca7ec7059d07 (image=registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:5c3a4ff92a3205922f1b4d25b43864013bd145a415cf922ff2e4fb33db5818e7, name=ceph-2fefc25c-d8bb-11ef-a07d-fa163e4cf23a-nfs-nfsganesha-0-0-ceph-manisaini-su4kp8-node2-wdszsj, GIT_REPO=https://github.com/ceph/ceph-container.git, io.k8s.display-name=Red Hat Ceph Storage 8 on RHEL 9, vcs-type=git, summary=Provides the latest Red Hat Ceph Storage 8 on RHEL 9 in a fully featured and supported base image., name=rhceph, description=Red Hat Ceph Storage 8, version=8, com.redhat.license_terms=https://www.redhat.com/agreements, maintainer=Guillaume Abrioux <gabrioux>, GIT_CLEAN=True, vcs-ref=6bc17c430374b15a8dee08107281b6a4fa5b8ce9, distribution-scope=public, GIT_BRANCH=main, CEPH_POINT_RELEASE=, RELEASE=main, io.buildah.version=1.33.8, url=https://access.redhat.com/containers/#/registry.access.redhat.com/rhceph/images/8-228, release=228, io.openshift.expose-services=, com.redhat.component=rhceph-container, ceph=True, io.k8s.description=Red Hat Ceph Storage 8, io.openshift.tags=rhceph ceph, build-date=2025-01-20T13:41:51, GIT_COMMIT=55ad0f204a1d654ee565abf874aecad0cc209d0e, architecture=x86_64, vendor=Red Hat, Inc.)
Jan 27 12:27:46 ceph-manisaini-su4kp8-node2 systemd[1]: ceph-2fefc25c-d8bb-11ef-a07d-fa163e4cf23a.0.0.ceph-manisaini-su4kp8-node2.wdszsj.service: Main process exited, code=exited, status=139/n/a
Jan 27 12:27:47 ceph-manisaini-su4kp8-node2 systemd[1]: ceph-2fefc25c-d8bb-11ef-a07d-fa163e4cf23a.0.0.ceph-manisaini-su4kp8-node2.wdszsj.service: Failed with result 'exit-code'.
Jan 27 12:27:47 ceph-manisaini-su4kp8-node2 systemd[1]: ceph-2fefc25c-d8bb-11ef-a07d-fa163e4cf23a.0.0.ceph-manisaini-su4kp8-node2.wdszsj.service: Consumed 1.225s CPU time.
Jan 27 12:27:57 ceph-manisaini-su4kp8-node2 systemd[1]: ceph-2fefc25c-d8bb-11ef-a07d-fa163e4cf23a.0.0.ceph-manisaini-su4kp8-node2.wdszsj.service: Scheduled restart job, restart counter is at 5.
Jan 27 12:27:57 ceph-manisaini-su4kp8-node2 systemd[1]: Stopped Ceph nfs.nfsganesha.0.0.ceph-manisaini-su4kp8-node2.wdszsj for 2fefc25c-d8bb-11ef-a07d-fa163e4cf23a.
Jan 27 12:27:57 ceph-manisaini-su4kp8-node2 systemd[1]: ceph-2fefc25c-d8bb-11ef-a07d-fa163e4cf23a.0.0.ceph-manisaini-su4kp8-node2.wdszsj.service: Consumed 1.225s CPU time.
Jan 27 12:27:57 ceph-manisaini-su4kp8-node2 systemd[1]: ceph-2fefc25c-d8bb-11ef-a07d-fa163e4cf23a.0.0.ceph-manisaini-su4kp8-node2.wdszsj.service: Start request repeated too quickly.
Jan 27 12:27:57 ceph-manisaini-su4kp8-node2 systemd[1]: ceph-2fefc25c-d8bb-11ef-a07d-fa163e4cf23a.0.0.ceph-manisaini-su4kp8-node2.wdszsj.service: Failed with result 'exit-code'.
Jan 27 12:27:57 ceph-manisaini-su4kp8-node2 systemd[1]: Failed to start Ceph nfs.nfsganesha.0.0.ceph-manisaini-su4kp8-node2.wdszsj for 2fefc25c-d8bb-11ef-a07d-fa163e4cf23a.
----------

Version-Release number of selected component (if applicable):
# ceph --version
ceph version 19.2.0-61.el9cp (1addfd37086eff688a3ec62ee4b6aa98d5982a31) squid (stable)

# rpm -qa | grep nfs
libnfsidmap-2.5.4-27.el9.x86_64
nfs-utils-2.5.4-27.el9.x86_64
nfs-ganesha-selinux-6.5-1.2.el9cp.noarch
nfs-ganesha-6.5-1.2.el9cp.x86_64
nfs-ganesha-rgw-6.5-1.2.el9cp.x86_64
nfs-ganesha-ceph-6.5-1.2.el9cp.x86_64
nfs-ganesha-rados-grace-6.5-1.2.el9cp.x86_64
nfs-ganesha-rados-urls-6.5-1.2.el9cp.x86_64

How reproducible:
=================
1/1

Steps to Reproduce:
==================
1. Create nfs ganesha cluster on ceph
2. Enable QoS in ganesha.conf:

# ceph config-key get mgr/cephadm/services/nfs/ganesha.conf
# {{ cephadm_managed }}
NFS_CORE_PARAM {
        Enable_NLM = {{ enable_nlm }};
        Enable_RQUOTA = false;
        Protocols = 3, 4;
        mount_path_pseudo = true;
        Allow_Set_Io_Flusher_Fail = true;
        Enable_UDP = false;
        NFS_Port = {{ port }};
{% if bind_addr %}
        Bind_addr = {{ bind_addr }};
{% endif %}
{% if haproxy_hosts %}
        HAProxy_Hosts = {{ haproxy_hosts|join(", ") }};
{% endif %}
}

QOS_DEFAULT_CONFIG {
        enable_qos = true;
        enable_tokens = false;
        enable_bw_control = true;
        combined_rw_bw_control = true;
        combined_rw_token_control = false;
        qos_type = 3;
        max_export_write_bw = 41943040;
        max_export_read_bw = 83886080;
        max_client_write_bw = 10485760;
        max_client_read_bw = 20971520;
        max_export_read_tokens = 0;
        max_export_write_tokens = 0;
        max_client_read_tokens = 0;
        max_client_write_tokens = 0;
        export_read_tokens_renew_time = 0;
        export_write_tokens_renew_time = 0;
        client_read_tokens_renew_time = 0;
        client_write_tokens_renew_time = 0;
}

NFSv4 {
        Delegations = false;
        RecoveryBackend = 'rados_cluster';
        Minor_Versions = 1, 2;
{% if nfs_idmap_conf %}
        IdmapConf = "{{ nfs_idmap_conf }}";
{% endif %}
}

RADOS_KV {
        UserId = "{{ user }}";
        nodeid = "{{ nodeid }}";
        pool = "{{ pool }}";
        namespace = "{{ namespace }}";
}

RADOS_URLS {
        UserId = "{{ user }}";
        watch_url = "{{ url }}";
}

RGW {
        cluster = "ceph";
        name = "client.{{ rgw_user }}";
}

%url {{ url }}

3. Mount the export on an NFS client and create a file using the dd command

Actual results:
=========
NFS crashed and dumped core

# ceph orch ps | grep nfs
nfs.nfsganesha.0.0.ceph-manisaini-su4kp8-node2.wdszsj  ceph-manisaini-su4kp8-node2  *:2049  error  3m ago  17h  -  -  <unknown>  <unknown>  <unknown>

Expected results:
=========
NFS should not crash

Additional info:
===========
# ceph -s
  cluster:
    id:     2fefc25c-d8bb-11ef-a07d-fa163e4cf23a
    health: HEALTH_WARN
            1 failed cephadm daemon(s)

  services:
    mon: 3 daemons, quorum ceph-manisaini-su4kp8-node1-installer,ceph-manisaini-su4kp8-node3,ceph-manisaini-su4kp8-node2 (age 17h)
    mgr: ceph-manisaini-su4kp8-node1-installer.mvcgix(active, since 5h), standbys: ceph-manisaini-su4kp8-node3.muaqsz
    mds: 1/1 daemons up, 1 standby
    osd: 18 osds: 18 up (since 17h), 18 in (since 5d)
    rgw: 2 daemons active (2 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   9 pools, 721 pgs
    objects: 268 objects, 44 MiB
    usage:   2.0 GiB used, 268 GiB / 270 GiB avail
    pgs:     721 active+clean

# ceph health detail
HEALTH_WARN 1 failed cephadm daemon(s)
[WRN] CEPHADM_FAILED_DAEMON: 1 failed cephadm daemon(s)
    daemon nfs.nfsganesha.0.0.ceph-manisaini-su4kp8-node2.wdszsj on ceph-manisaini-su4kp8-node2 is in error state

# ls
'core.ganesha\x2enfsd.0.0c4c1c25a83b44da8097219002c185d3.326625.1737980809000000.zst'
'core.ganesha\x2enfsd.0.0c4c1c25a83b44da8097219002c185d3.328786.1737980822000000.zst'
'core.ganesha\x2enfsd.0.0c4c1c25a83b44da8097219002c185d3.328937.1737980836000000.zst'
'core.ganesha\x2enfsd.0.0c4c1c25a83b44da8097219002c185d3.329088.1737980849000000.zst'
'core.ganesha\x2enfsd.0.0c4c1c25a83b44da8097219002c185d3.329226.1737980865000000.zst'
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat Ceph Storage 8.0 security, bug fix, and enhancement updates), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2025:3635