Bug 2342266 - [NFS-Ganesha] Ganesha crashed in setNode_pc (qos_block=0x0, client_addr=<optimized out>, node=0x7f3b74012ac0) after enabling the QoS feature [NEEDINFO]
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: NFS-Ganesha
Version: 8.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 8.0z3
Assignee: Naresh
QA Contact: Manisha Saini
Docs Contact: Rivka Pollack
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2025-01-27 15:25 UTC by Manisha Saini
Modified: 2025-04-07 15:26 UTC
CC List: 8 users

Fixed In Version: nfs-ganesha-6.5-5.el9cp
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2025-04-07 15:26:06 UTC
Embargoed:
ffilz: needinfo? (nchillar)




Links
Red Hat Issue Tracker RHCEPH-10509 (last updated 2025-01-27 15:27:44 UTC)
Red Hat Product Errata RHSA-2025:3635 (last updated 2025-04-07 15:26:09 UTC)

Description Manisha Saini 2025-01-27 15:25:25 UTC
Description of problem:
=================

NFS-Ganesha was deployed with nfs-ganesha-6.5-1.2.el9cp, QoS-related parameters were added to the ganesha.conf file, the export was created, and IO was started from the mount point.
The NFS-Ganesha service then crashed and generated a core dump.

---
Core was generated by `/usr/bin/ganesha.nfsd -F -L STDERR -N NIV_EVENT'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  setNode_pc (qos_block=0x0, client_addr=<optimized out>, node=0x7f3b74012ac0) at /usr/src/debug/nfs-ganesha-6.5-1.2.el9cp.x86_64/src/MainNFSD/nfs_qos.c:385
385		node->combined_rw_bw_control = qos_block->combined_rw_bw_control;
[Current thread is 1 (LWP 77)]
(gdb) bt
#0  setNode_pc (qos_block=0x0, client_addr=<optimized out>, node=0x7f3b74012ac0) at /usr/src/debug/nfs-ganesha-6.5-1.2.el9cp.x86_64/src/MainNFSD/nfs_qos.c:385
#1  pspc_allocate_and_init_client (qos_block=0x0, client_addr=<optimized out>) at /usr/src/debug/nfs-ganesha-6.5-1.2.el9cp.x86_64/src/MainNFSD/nfs_qos.c:434
#2  pspc_alloc_init_add_client (qos_block=0x0, client_addr=<optimized out>, head=0x7f3b740089d8) at /usr/src/debug/nfs-ganesha-6.5-1.2.el9cp.x86_64/src/MainNFSD/nfs_qos.c:453
#3  QoS_Process_pspc (op_type=1, data=0x7f3b7400ccd0, caller_data=0x7f3b7400afd0, size=<optimized out>) at /usr/src/debug/nfs-ganesha-6.5-1.2.el9cp.x86_64/src/MainNFSD/nfs_qos.c:831
#4  QoS_Process (size=<optimized out>, caller_data=0x7f3b7400afd0, data=0x7f3b7400ccd0, op_type=1) at /usr/src/debug/nfs-ganesha-6.5-1.2.el9cp.x86_64/src/MainNFSD/nfs_qos.c:858
#5  0x00007f3b931eef94 in nfs4_op_write (op=0x7f3b7400c6b0, data=0x7f3b7400ccd0, resp=0x7f3b74012a10) at /usr/src/debug/nfs-ganesha-6.5-1.2.el9cp.x86_64/src/Protocols/NFS/nfs4_op_write.c:501
#6  0x00007f3b931d0485 in process_one_op (data=data@entry=0x7f3b7400ccd0, status=status@entry=0x7f3b397f864c) at /usr/src/debug/nfs-ganesha-6.5-1.2.el9cp.x86_64/src/Protocols/NFS/nfs4_Compound.c:905
#7  0x00007f3b931d1288 in nfs4_Compound (arg=<optimized out>, req=0x7f3b74009670, res=0x7f3b74012680) at /usr/src/debug/nfs-ganesha-6.5-1.2.el9cp.x86_64/src/Protocols/NFS/nfs4_Compound.c:1386
#8  0x00007f3b93150bc5 in nfs_rpc_process_request (reqdata=<optimized out>, retry=<optimized out>) at /usr/src/debug/nfs-ganesha-6.5-1.2.el9cp.x86_64/src/MainNFSD/nfs_worker_thread.c:1479
#9  0x00007f3b92ea25e7 in svc_request (xprt=0x7f3b40001330, xdrs=<optimized out>) at /usr/src/debug/libntirpc-6.3-1.el9cp.x86_64/src/svc_rqst.c:1229
#10 0x00007f3b92ea6e5a in svc_rqst_xprt_task_recv (wpe=<optimized out>) at /usr/src/debug/libntirpc-6.3-1.el9cp.x86_64/src/svc_rqst.c:1210
#11 0x00007f3b92ea991b in svc_rqst_epoll_loop (wpe=0x564e192190b8) at /usr/src/debug/libntirpc-6.3-1.el9cp.x86_64/src/svc_rqst.c:1585
#12 0x00007f3b92eb2cbc in work_pool_thread (arg=0x7f3b40016460) at /usr/src/debug/libntirpc-6.3-1.el9cp.x86_64/src/work_pool.c:187
#13 0x00007f3b92f52d22 in pthread_detach.5 () from /lib64/libc.so.6
#14 0x0000000000000000 in ?? ()
(gdb)
-----
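From the backtrace, setNode_pc() is entered with qos_block=0x0, so the unconditional dereference at nfs_qos.c:385 (node->combined_rw_bw_control = qos_block->combined_rw_bw_control;) segfaults. A minimal sketch of a defensive NULL guard in that call path is shown below; the function and field names are taken from the frames above, but the types, return values, and overall shape are assumptions for illustration only, not the actual fix shipped in nfs-ganesha-6.5-5.el9cp.

---
/*
 * Hypothetical sketch only -- NOT the actual upstream/downstream patch.
 * It assumes the call chain and the field at nfs_qos.c:385 seen in the
 * backtrace; the real fix may instead fall back to QOS_DEFAULT_CONFIG.
 */
#include <stdbool.h>
#include <stdlib.h>

struct qos_node {                  /* per-client node; field inferred from nfs_qos.c:385 */
	bool combined_rw_bw_control;
};

struct qos_block {                 /* QoS settings attached to the export */
	bool combined_rw_bw_control;
};

/* Copy QoS settings into the per-client node, refusing a NULL block. */
static bool setNode_pc(struct qos_block *qos_block, const char *client_addr,
		       struct qos_node *node)
{
	(void)client_addr;

	if (qos_block == NULL || node == NULL)
		return false;      /* previously dereferenced unconditionally */

	node->combined_rw_bw_control = qos_block->combined_rw_bw_control;
	/* ... remaining bandwidth/token limits would be copied here ... */
	return true;
}

/* Caller sketch mirroring pspc_allocate_and_init_client(): return NULL
 * instead of crashing when no QoS block exists for the export. */
static struct qos_node *pspc_allocate_and_init_client(struct qos_block *qos_block,
						      const char *client_addr)
{
	struct qos_node *node;

	if (qos_block == NULL)
		return NULL;

	node = calloc(1, sizeof(*node));
	if (node != NULL && !setNode_pc(qos_block, client_addr, node)) {
		free(node);
		node = NULL;
	}
	return node;
}
---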

ganesha.log
----
Jan 27 12:27:42 ceph-manisaini-su4kp8-node2 ceph-2fefc25c-d8bb-11ef-a07d-fa163e4cf23a-nfs-nfsganesha-0-0-ceph-manisaini-su4kp8-node2-wdszsj[329222]: 27/01/2025 12:27:42 : epoch 67977bbd : ceph-manisaini-su4kp8-node2 : ganesha.nfsd-2[main] nfs_start :NFS STARTUP :EVENT :-------------------------------------------------
Jan 27 12:27:42 ceph-manisaini-su4kp8-node2 ceph-2fefc25c-d8bb-11ef-a07d-fa163e4cf23a-nfs-nfsganesha-0-0-ceph-manisaini-su4kp8-node2-wdszsj[329222]: 27/01/2025 12:27:42 : epoch 67977bbd : ceph-manisaini-su4kp8-node2 : ganesha.nfsd-2[main] nfs_start :NFS STARTUP :EVENT :             NFS SERVER INITIALIZED
Jan 27 12:27:42 ceph-manisaini-su4kp8-node2 ceph-2fefc25c-d8bb-11ef-a07d-fa163e4cf23a-nfs-nfsganesha-0-0-ceph-manisaini-su4kp8-node2-wdszsj[329222]: 27/01/2025 12:27:42 : epoch 67977bbd : ceph-manisaini-su4kp8-node2 : ganesha.nfsd-2[main] nfs_start :NFS STARTUP :EVENT :-------------------------------------------------
Jan 27 12:27:46 ceph-manisaini-su4kp8-node2 systemd-coredump[329301]: Process 329226 (ganesha.nfsd) of user 0 dumped core.

                                                                      Stack trace of thread 46:
                                                                      #0  0x00007f82a294ed11 n/a (/usr/lib64/libganesha_nfsd.so.6.5 + 0x5ad11)
                                                                      ELF object binary architecture: AMD x86-64
Jan 27 12:27:46 ceph-manisaini-su4kp8-node2 podman[329306]: 2025-01-27 12:27:46.779033541 +0000 UTC m=+0.035736089 container died 2cc5d0c75c5318042b9492be0a36dbc079f50f0bf0c8e925dba4ca7ec7059d07 (image=registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:5c3a4ff92a3205922f1b4d25b43864013bd145a415cf922ff2e4fb33db5818e7, name=ceph-2fefc25c-d8bb-11ef-a07d-fa163e4cf23a-nfs-nfsganesha-0-0-ceph-manisaini-su4kp8-node2-wdszsj, summary=Provides the latest Red Hat Ceph Storage 8 on RHEL 9 in a fully featured and supported base image., GIT_CLEAN=True, GIT_REPO=https://github.com/ceph/ceph-container.git, CEPH_POINT_RELEASE=, GIT_BRANCH=main, com.redhat.license_terms=https://www.redhat.com/agreements, io.k8s.description=Red Hat Ceph Storage 8, io.openshift.tags=rhceph ceph, vcs-type=git, vcs-ref=6bc17c430374b15a8dee08107281b6a4fa5b8ce9, RELEASE=main, ceph=True, com.redhat.component=rhceph-container, build-date=2025-01-20T13:41:51, io.openshift.expose-services=, vendor=Red Hat, Inc., distribution-scope=public, version=8, name=rhceph, release=228, description=Red Hat Ceph Storage 8, io.buildah.version=1.33.8, maintainer=Guillaume Abrioux <gabrioux>, GIT_COMMIT=55ad0f204a1d654ee565abf874aecad0cc209d0e, architecture=x86_64, io.k8s.display-name=Red Hat Ceph Storage 8 on RHEL 9, url=https://access.redhat.com/containers/#/registry.access.redhat.com/rhceph/images/8-228)
Jan 27 12:27:46 ceph-manisaini-su4kp8-node2 podman[329306]: 2025-01-27 12:27:46.81613975 +0000 UTC m=+0.072842290 container remove 2cc5d0c75c5318042b9492be0a36dbc079f50f0bf0c8e925dba4ca7ec7059d07 (image=registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:5c3a4ff92a3205922f1b4d25b43864013bd145a415cf922ff2e4fb33db5818e7, name=ceph-2fefc25c-d8bb-11ef-a07d-fa163e4cf23a-nfs-nfsganesha-0-0-ceph-manisaini-su4kp8-node2-wdszsj, GIT_REPO=https://github.com/ceph/ceph-container.git, io.k8s.display-name=Red Hat Ceph Storage 8 on RHEL 9, vcs-type=git, summary=Provides the latest Red Hat Ceph Storage 8 on RHEL 9 in a fully featured and supported base image., name=rhceph, description=Red Hat Ceph Storage 8, version=8, com.redhat.license_terms=https://www.redhat.com/agreements, maintainer=Guillaume Abrioux <gabrioux>, GIT_CLEAN=True, vcs-ref=6bc17c430374b15a8dee08107281b6a4fa5b8ce9, distribution-scope=public, GIT_BRANCH=main, CEPH_POINT_RELEASE=, RELEASE=main, io.buildah.version=1.33.8, url=https://access.redhat.com/containers/#/registry.access.redhat.com/rhceph/images/8-228, release=228, io.openshift.expose-services=, com.redhat.component=rhceph-container, ceph=True, io.k8s.description=Red Hat Ceph Storage 8, io.openshift.tags=rhceph ceph, build-date=2025-01-20T13:41:51, GIT_COMMIT=55ad0f204a1d654ee565abf874aecad0cc209d0e, architecture=x86_64, vendor=Red Hat, Inc.)
Jan 27 12:27:46 ceph-manisaini-su4kp8-node2 systemd[1]: ceph-2fefc25c-d8bb-11ef-a07d-fa163e4cf23a.0.0.ceph-manisaini-su4kp8-node2.wdszsj.service: Main process exited, code=exited, status=139/n/a
Jan 27 12:27:47 ceph-manisaini-su4kp8-node2 systemd[1]: ceph-2fefc25c-d8bb-11ef-a07d-fa163e4cf23a.0.0.ceph-manisaini-su4kp8-node2.wdszsj.service: Failed with result 'exit-code'.
Jan 27 12:27:47 ceph-manisaini-su4kp8-node2 systemd[1]: ceph-2fefc25c-d8bb-11ef-a07d-fa163e4cf23a.0.0.ceph-manisaini-su4kp8-node2.wdszsj.service: Consumed 1.225s CPU time.
Jan 27 12:27:57 ceph-manisaini-su4kp8-node2 systemd[1]: ceph-2fefc25c-d8bb-11ef-a07d-fa163e4cf23a.0.0.ceph-manisaini-su4kp8-node2.wdszsj.service: Scheduled restart job, restart counter is at 5.
Jan 27 12:27:57 ceph-manisaini-su4kp8-node2 systemd[1]: Stopped Ceph nfs.nfsganesha.0.0.ceph-manisaini-su4kp8-node2.wdszsj for 2fefc25c-d8bb-11ef-a07d-fa163e4cf23a.
Jan 27 12:27:57 ceph-manisaini-su4kp8-node2 systemd[1]: ceph-2fefc25c-d8bb-11ef-a07d-fa163e4cf23a.0.0.ceph-manisaini-su4kp8-node2.wdszsj.service: Consumed 1.225s CPU time.
Jan 27 12:27:57 ceph-manisaini-su4kp8-node2 systemd[1]: ceph-2fefc25c-d8bb-11ef-a07d-fa163e4cf23a.0.0.ceph-manisaini-su4kp8-node2.wdszsj.service: Start request repeated too quickly.
Jan 27 12:27:57 ceph-manisaini-su4kp8-node2 systemd[1]: ceph-2fefc25c-d8bb-11ef-a07d-fa163e4cf23a.0.0.ceph-manisaini-su4kp8-node2.wdszsj.service: Failed with result 'exit-code'.
Jan 27 12:27:57 ceph-manisaini-su4kp8-node2 systemd[1]: Failed to start Ceph nfs.nfsganesha.0.0.ceph-manisaini-su4kp8-node2.wdszsj for 2fefc25c-d8bb-11ef-a07d-fa163e4cf23a.
----------


Version-Release number of selected component (if applicable):

# ceph --version
ceph version 19.2.0-61.el9cp (1addfd37086eff688a3ec62ee4b6aa98d5982a31) squid (stable)

# rpm -qa | grep nfs
libnfsidmap-2.5.4-27.el9.x86_64
nfs-utils-2.5.4-27.el9.x86_64
nfs-ganesha-selinux-6.5-1.2.el9cp.noarch
nfs-ganesha-6.5-1.2.el9cp.x86_64
nfs-ganesha-rgw-6.5-1.2.el9cp.x86_64
nfs-ganesha-ceph-6.5-1.2.el9cp.x86_64
nfs-ganesha-rados-grace-6.5-1.2.el9cp.x86_64
nfs-ganesha-rados-urls-6.5-1.2.el9cp.x86_64


How reproducible:
=================
1/1


Steps to Reproduce:
==================

1. Create an NFS-Ganesha cluster on Ceph.

2. Enable QoS in ganesha.conf:

# ceph config-key  get mgr/cephadm/services/nfs/ganesha.conf
# {{ cephadm_managed }}
NFS_CORE_PARAM {
        Enable_NLM = {{ enable_nlm }};
        Enable_RQUOTA = false;
        Protocols = 3, 4;
        mount_path_pseudo = true;
        Allow_Set_Io_Flusher_Fail = true;
        Enable_UDP = false;
        NFS_Port = {{ port }};
{% if bind_addr %}
        Bind_addr = {{ bind_addr }};
{% endif %}
{% if haproxy_hosts %}
        HAProxy_Hosts = {{ haproxy_hosts|join(", ") }};
{% endif %}
}
QOS_DEFAULT_CONFIG {
        enable_qos = true;
        enable_tokens = false;
        enable_bw_control = true;
        combined_rw_bw_control = true;
        combined_rw_token_control = false;
        qos_type = 3;
        max_export_write_bw = 41943040;
        max_export_read_bw = 83886080;
        max_client_write_bw = 10485760;
        max_client_read_bw = 20971520;
        max_export_read_tokens = 0;
        max_export_write_tokens = 0;
        max_client_read_tokens = 0;
        max_client_write_tokens = 0;
        export_read_tokens_renew_time = 0;
        export_write_tokens_renew_time = 0;
        client_read_tokens_renew_time = 0;
        client_write_tokens_renew_time = 0;
}
NFSv4 {
        Delegations = false;
        RecoveryBackend = 'rados_cluster';
        Minor_Versions = 1, 2;
{% if nfs_idmap_conf %}
        IdmapConf = "{{ nfs_idmap_conf }}";
{% endif %}
}

RADOS_KV {
        UserId = "{{ user }}";
        nodeid = "{{ nodeid }}";
        pool = "{{ pool }}";
        namespace = "{{ namespace }}";
}

RADOS_URLS {
        UserId = "{{ user }}";
        watch_url = "{{ url }}";
}

RGW {
        cluster = "ceph";
        name = "client.{{ rgw_user }}";
}

%url    {{ url }}

3. Mount the export on an NFS client and create a file using the dd command.

Actual results:
=========
NFS-Ganesha crashed and dumped core.

# ceph orch ps | grep nfs
nfs.nfsganesha.0.0.ceph-manisaini-su4kp8-node2.wdszsj  ceph-manisaini-su4kp8-node2            *:2049            error             3m ago  17h        -        -  <unknown>        <unknown>     <unknown>


Expected results:
=========
The NFS-Ganesha service should not crash.


Additional info:
===========

# ceph -s
  cluster:
    id:     2fefc25c-d8bb-11ef-a07d-fa163e4cf23a
    health: HEALTH_WARN
            1 failed cephadm daemon(s)

  services:
    mon: 3 daemons, quorum ceph-manisaini-su4kp8-node1-installer,ceph-manisaini-su4kp8-node3,ceph-manisaini-su4kp8-node2 (age 17h)
    mgr: ceph-manisaini-su4kp8-node1-installer.mvcgix(active, since 5h), standbys: ceph-manisaini-su4kp8-node3.muaqsz
    mds: 1/1 daemons up, 1 standby
    osd: 18 osds: 18 up (since 17h), 18 in (since 5d)
    rgw: 2 daemons active (2 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   9 pools, 721 pgs
    objects: 268 objects, 44 MiB
    usage:   2.0 GiB used, 268 GiB / 270 GiB avail
    pgs:     721 active+clean

# ceph health detail
HEALTH_WARN 1 failed cephadm daemon(s)
[WRN] CEPHADM_FAILED_DAEMON: 1 failed cephadm daemon(s)
    daemon nfs.nfsganesha.0.0.ceph-manisaini-su4kp8-node2.wdszsj on ceph-manisaini-su4kp8-node2 is in error state


# ls
'core.ganesha\x2enfsd.0.0c4c1c25a83b44da8097219002c185d3.326625.1737980809000000.zst'
'core.ganesha\x2enfsd.0.0c4c1c25a83b44da8097219002c185d3.328786.1737980822000000.zst'
'core.ganesha\x2enfsd.0.0c4c1c25a83b44da8097219002c185d3.328937.1737980836000000.zst'
'core.ganesha\x2enfsd.0.0c4c1c25a83b44da8097219002c185d3.329088.1737980849000000.zst'
'core.ganesha\x2enfsd.0.0c4c1c25a83b44da8097219002c185d3.329226.1737980865000000.zst'

Comment 10 errata-xmlrpc 2025-04-07 15:26:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat Ceph Storage 8.0 security, bug fix, and enhancement updates), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2025:3635

